This page describes how each captured construct behaves under different Informix source DB_LOCALE settings. The CDC capture pipe is byte-oriented: column data arrives as raw bytes from the logical log and is never transcoded on the wire, so character data is locale-agnostic by construction. Numeric and temporal literal formatting goes through the Informix CSDK formatters and is forced to a locale-neutral encoding at the ripper's process boundary, so captured SQL is portable to any target dialect regardless of source locale.
The matrix below shows the verified behaviour for each (source locale × captured construct) cell. Every cell was exercised live on this build: INSERT, UPDATE (with WHERE-image) and DELETE statements were driven against an Informix source running the named locale, and the captured SQL was compared with the source bytes hex-for-hex. See the SQL Mapping page for the per-target dialect rewriter and the Schema Translation page for end-to-end target-side replay examples.
| Captured construct | en_US.819 (Latin-1, baseline) | en_US.utf8 | de_de.819 | ja_jp.utf8 |
|---|---|---|---|---|
| NCHAR / NVARCHAR multi-byte content | single-byte high-bit (e.g. 0xE9=é) | byte-perfect (Cyrillic, Greek, Japanese pass through unchanged) | single-byte high-bit (German umlauts äöüß) | byte-perfect (Hiragana / Katakana / Kanji pass through unchanged) |
| VARCHAR with high-bit / multi-byte chars | byte-preserved | byte-preserved (multi-byte UTF-8 sequences emit verbatim in INSERT / UPDATE / DELETE) | byte-preserved | byte-preserved |
| DECIMAL value emission (decimal separator) | ASCII dot (1234.56) | ASCII dot | ASCII dot (forced via CLIENT_LOCALE=en_us.819 at process startup — the ripper overrides the operator's env so a comma-decimal locale on the source never leaks into the captured SQL) | ASCII dot |
| DECIMAL emission, large precision (32+ digits) | full digits preserved | full digits preserved | full digits preserved with dot separator | full digits preserved |
| DATETIME literal | 'YYYY-MM-DD HH:MM:SS[.fffff]' (ISO) | ISO | ISO (no comma drift in the date string under DE locale) | ISO (no Japanese-era drift; '2026-04-01 09:15:30' not '令和8') |
| UPDATE / DELETE WHERE-image (full row reconstruction) | preserved | preserved (multi-byte WHERE values match source bytes verbatim) | preserved (umlauts and dot-decimal both clean) | preserved (multi-byte WHERE values match source bytes verbatim) |
| NCHAR length semantics | byte-padded | byte-padded (server-side octet_length matches captured byte count) | byte-padded | byte-padded |
The capture path is locale-agnostic for column data because the CDC log records carry raw byte sequences, not character strings: no client-side decoding or re-encoding happens between the source log and the captured SQL stream. Numeric and temporal formatting goes through Informix CSDK functions (dectoasc, dttoasc), which honour the GLS locale system; the ripper forces CLIENT_LOCALE=en_us.819 at process startup so that those formatters always emit an ASCII-dot decimal separator and ISO DATETIME literals, regardless of the operator's shell environment or the source database's locale. Other locale categories (collation, character classification) stay at the operator's setting; only the formatter output is normalised.
The same byte-oriented capture path that makes same-locale replication trivial leaves cross-locale matching to the operator: the captured INSERT carries whatever bytes the source column held, and the target accepts or rejects them according to its own column-charset declaration. The ripper does not transcode.
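The startup override described above can be sketched as follows. This is a minimal illustration, not the ripper's actual code: the helper name `locale_neutral_env` is hypothetical, and the sketch only models the environment mapping. In a real process the override would have to be applied before the CSDK formatters are first used.

```python
# Value taken from the text above; everything else here is illustrative.
NEUTRAL_CLIENT_LOCALE = "en_us.819"


def locale_neutral_env(env):
    """Return a copy of env with CLIENT_LOCALE forced to the neutral value.

    Only CLIENT_LOCALE is overridden; every other variable (for example
    DB_LOCALE) keeps whatever the operator set. The input is not mutated.
    """
    forced = dict(env)
    forced["CLIENT_LOCALE"] = NEUTRAL_CLIENT_LOCALE
    return forced


# Hypothetical operator shell environment with a comma-decimal locale.
operator_env = {"CLIENT_LOCALE": "de_de.819", "DB_LOCALE": "de_de.819"}
forced_env = locale_neutral_env(operator_env)

print(forced_env["CLIENT_LOCALE"])  # en_us.819
print(forced_env["DB_LOCALE"])      # de_de.819
```

Returning a copy rather than mutating the input keeps the operator's original settings available for logging or diagnostics.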
| Source DB_LOCALE encoding | Target column charset | Result |
|---|---|---|
| Latin-1 (en_US.819, de_de.819, …) | latin1 (MySQL/MariaDB), SQL_Latin1_General_CP1_CI_AS (MSSQL), WE8ISO8859P1 (Oracle), 819 (Db2) | byte-perfect — high-bit chars (0xE9=é) land verbatim |
| UTF-8 (en_US.utf8, ja_jp.utf8, …) | utf8mb4 (MySQL/MariaDB), UTF-8 (PG), AL32UTF8 (Oracle), 1208 (Db2) | byte-perfect — multi-byte sequences pass through unchanged |
| Latin-1 | utf8mb4 / UTF-8 | requires a per-target charset workaround: a lone Latin-1 high-bit byte (e.g. 0xE9) is not a valid UTF-8 sequence, so the target rejects it (MySQL/MariaDB report Incorrect string value). The ripper's connector init issues SET NAMES binary on MySQL/MariaDB and SET client_encoding TO LATIN1 on PG so the target accepts the bytes as opaque; the column should be declared with a CHARACTER SET latin1 attribute (or the per-dialect equivalent) for the bytes to render correctly to applications reading the target. |
| UTF-8 | latin1 / single-byte | multi-byte source content does not fit a single-byte target column: the target either rejects it or silently truncates it. The operator must widen the target column to a multi-byte charset (utf8mb4 / UTF-8) before pointing the ripper at it. |
The ripper itself does not narrow or widen captured bytes; the target column's declared charset and the connection's session charset (SET NAMES on the MySQL family, client_encoding on PG) determine whether the bytes land cleanly. For mixed environments, declare the target column with a charset matching the source's storage encoding and let the connection-level SET NAMES binary path (already wired into the MySQL/MariaDB connectors) carry the bytes opaquely.
At startup, after each direct-DB connector has issued its connection-level charset directive, the ripper additionally probes the declared charset of every captured-table column on the target and compares it against the source's DB_LOCALE encoding family. The probe is informational and does not block startup, but it surfaces cross-family mismatches that the operator may not have noticed when the target schema was built.
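The two cross-family rows of the table can be reproduced with plain byte strings. The sketch below uses only Python's built-in codecs and does not touch any database; the helper names are hypothetical and exist only for this illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def fits_latin1(text: str) -> bool:
    """True if every character has a single-byte Latin-1 encoding."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


latin1_bytes = "é".encode("latin-1")  # the single byte 0xE9
utf8_bytes = "é".encode("utf-8")      # the two bytes 0xC3 0xA9

# Latin-1 source bytes sent to a UTF-8 column: not a valid sequence.
print(is_valid_utf8(latin1_bytes))     # False
print(is_valid_utf8(utf8_bytes))       # True

# UTF-8 source content sent to a single-byte column: does not fit.
print(fits_latin1("日本語"))            # False

# The same byte reads correctly once the reader assumes Latin-1.
print(latin1_bytes.decode("latin-1"))  # é
```

The last line is the reason the column's declared charset matters even when the bytes transit opaquely: the bytes are unchanged, but readers need to know which encoding to apply.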
| Target | Probe query | Granularity |
|---|---|---|
| postgres | SHOW server_encoding | per database (PG has one encoding per database, not per column — same answer for every captured column) |
| mysql, mariadb | SELECT CHARACTER_SET_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = DATABASE() AND ... | per column — MySQL/MariaDB carry a charset attribute on every text column, so the probe surfaces per-column mismatch |
For each captured column the probe classifies both the source DB_LOCALE family and the target column's declared family (Latin-1 / UTF-8 / Shift-JIS / multi-byte-other / unknown). On a cross-family split (e.g. Latin-1 source + UTF-8 target column) the ripper emits a one-time WARN per column naming the table.column, the source family, the target family, and a brief remediation hint (set the target column's charset to match, or rely on the connection-level binary-transit directive plus a per-application re-decode). Same-family pairs produce no log line.
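The classify-and-warn step might look roughly like the sketch below. It is illustrative only: the lookup tables cover just a few of the names mentioned on this page, the function names are hypothetical, and treating an unknown family paired with a known one as a mismatch is an assumption of this sketch rather than documented ripper behaviour.

```python
import logging

log = logging.getLogger("charset_probe")

# Hypothetical lookup tables, keyed by lower-cased codeset / charset name.
SOURCE_FAMILIES = {"819": "Latin-1", "utf8": "UTF-8"}
TARGET_FAMILIES = {
    "latin1": "Latin-1",
    "utf8": "UTF-8",
    "utf8mb4": "UTF-8",
    "sjis": "Shift-JIS",
}

_warned = set()  # (table, column) pairs that have already produced a WARN


def source_family(db_locale: str) -> str:
    """Map a DB_LOCALE such as 'en_US.819' to an encoding family."""
    codeset = db_locale.rpartition(".")[2].lower()
    return SOURCE_FAMILIES.get(codeset, "unknown")


def target_family(charset: str) -> str:
    """Map a target charset name such as 'utf8mb4' to an encoding family."""
    return TARGET_FAMILIES.get(charset.lower(), "unknown")


def check_column(table: str, column: str, db_locale: str, charset: str) -> bool:
    """Emit at most one WARN per column on a family split; return True if emitted."""
    src, tgt = source_family(db_locale), target_family(charset)
    key = (table, column)
    if src == tgt or key in _warned:
        return False  # same family, or already reported: no log line
    _warned.add(key)
    log.warning(
        "%s.%s: source family %s, target column family %s; set the target "
        "column's charset to match, or rely on binary transit plus an "
        "application-side re-decode",
        table, column, src, tgt,
    )
    return True
```

For example, `check_column("orders", "note", "en_US.819", "utf8mb4")` returns `True` and logs once; calling it again for the same column returns `False`, as does any same-family pair.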
The probe can be disabled with skip_charset_check: true at the top level of the YAML (the same knob disables the connection-level verify-back covered on the Schema Translation page). Db2, Oracle and MSSQL targets do not yet carry a per-column probe: their charset model is either set at database level (Db2) or a fixed CSDK-controlled configuration (Oracle NLS_CHARACTERSET), which is better verified by the connection-level directive check.
The SQL Mapping page covers the per-target dialect rewrite once the captured SQL is in hand; the Schema Translation page covers per-dialect target column types and the connection-level charset directive verify-back; the Configuration page covers the YAML knobs the operator sets per target.