Transforms run inside the capture path, between the column-value decoder and the captured-SQL emitter. Every transform is declared in YAML on the source table and applies to a single column. The operator chooses an op, supplies the op's parameters, and optionally renames the column on emit. The transformed value lands in INSERT, UPDATE…SET, and the WHERE-image of UPDATE and DELETE uniformly — the target sees no plaintext that the operator marked for transformation.
Transforms are upstream of the per-target dialect rewriter: the same redacted / hashed / masked / cast value lands at every configured replay target without per-dialect duplication. The rewriter is also transform-aware — a hashed SHA-256 hex string won't trip the BOOLEAN-context rewrite by accident.
This page documents every op the ripper ships today. Every example is operator-runnable as written; the captured SQL is shown below each example so the round-trip is verifiable end-to-end.
The transform: sequence sits on the extended per-table entry alongside columns: and skip_unsupported_columns:. Each entry targets one column and chooses one op (plus optional rename and optional when: predicate). Operators can stack multiple transforms on different columns of the same table.
source:
tables:
- name: t_customer
transform:
- column: email
op: redact
value: "'***REDACTED***'"
- column: phone
op: hash
- column: ssn
op: mask
match: "^([0-9]{3})-([0-9]{2})-([0-9]{4})$"
pattern: "'XXX-XX-${3}'"
- column: cust_id
op: rename
rename: customer_id
- column: notes
op: cast
to: "VARCHAR(40)"
when: "status = 'archived'"
Replace a column's emitted value with an operator-supplied literal. The operator owns SQL quoting — string literals wrap in '…'; numeric literals stay bare; NULL emits unquoted.
transform:
- column: email
op: redact
value: "'***'"
- column: salary
op: redact
value: "0"
- column: phone
op: redact
# value: omitted → emits NULL
The replacement applies uniformly to INSERT, UPDATE SET, and the WHERE-image of UPDATE / DELETE. Combinable with rename so the target column also gets a different name.
Emit the column under a different name on the target. Useful for schema-evolution scenarios where the source still carries a legacy name but the target has migrated. The transform is pure-emit — the source schema is untouched.
transform:
- column: cust_id
op: rename
rename: customer_id
Combines with any other op — redact + rename, hash + rename, mask + rename, cast + rename all valid.
Replace the column's value with a SHA-256 hash of its wire bytes, emitted as a single-quoted 64-character lowercase hex literal. The hash is deterministic — identical source values produce identical hashes — so foreign-key joins downstream still hold. No libcrypto dependency on the ripper; the SHA-256 implementation is inline.
transform:
- column: email
op: hash
Captured SQL for a customer table:
INSERT INTO t_customer (id, email, ...) VALUES (
127,
'3d193523d3dbfb2dbe24d119ac9de893beac279385b1ca60ece693590d8b1bd9',
...
);
Caveat for operators: the hash covers the captured column's wire bytes, not the ASCII string the column logically holds. For VARCHAR / LVARCHAR the wire image is the value bytes verbatim, so the hash is reproducible with a plain sha256sum. For DECIMAL, DATETIME, NCHAR, and other type-padded forms the wire image includes framing (length byte, trailing NUL pad, locale conversions) so external reproduction needs to mirror that framing. The simpler operator workflow is to capture once and document the digests as the canonical reference.
Template-substitution rewrite. The pattern: string is emitted verbatim with substitution rules applied. Two reference forms supported by every pattern:
Forward references (the named column hasn't been processed yet in column order) emit the literal placeholder string ?{name}? — replay-safe SQL, operator notices the placeholder if they intended a back-reference but specified a forward one.
With match:, a POSIX Extended Regular Expression matches against the column's value and capture groups become available as ${1} … ${9} in the pattern. Closes the canonical "mask CC keeping last 4 digits" / "mask phone keeping area code" idioms.
transform:
- column: email
op: mask
pattern: "'masked-${fname}-at-***'"
- column: card_number
op: mask
match: "^([0-9]{12})([0-9]{4})$"
pattern: "'************${2}'"
- column: phone
op: mask
match: "^\(([0-9]{3})\)"
pattern: "'(${1}) XXX-XXXX'"
No-match fallback emits the literal NULL rather than the raw value — rows stay replayable, the operator notices the miss via captured-target audit.
Reformat the column's value to fit a different target type. The to: string specifies the target spec; the cast handler picks the right reformat based on the source column type and the target spec.
| Source type | Target spec (to:) | Behaviour |
|---|---|---|
| VARCHAR / CHAR / NVARCHAR / NCHAR | VARCHAR(N), CHAR(N), NVARCHAR(N), NCHAR(N) | byte-truncate the value to N bytes; leading and trailing single-quotes preserved |
| DECIMAL / MONEY | DECIMAL(p,s), NUMERIC(p,s), NUMBER(p,s) | re-format via the Informix decimal formatter at the new scale; when total digits exceed the target precision, emit NULL with a runtime WARN naming the table.column + the value that overflowed |
| DATETIME | TIMESTAMP(N), TIMESTAMPTZ, DATETIME YEAR TO FRACTION(N), DATETIME YEAR TO SECOND | truncate fractional seconds to the target precision (N for FRACTION / TIMESTAMP, 0 for SECOND, 6 for TIMESTAMPTZ); no-op if the source is already at or below target precision |
| BLOB / CLOB (SQLUDTFIXED) | VARCHAR(N) HEX, VARCHAR(N) BASE64 | re-encode the captured X'NN…' hex literal as a plain VARCHAR; HEX strips the X prefix and truncates the inner hex; BASE64 decodes the hex to bytes and base64-encodes (padded to 4-char multiples), then truncates the result to N characters |
transform:
- column: email
op: cast
to: "VARCHAR(8)" # byte-truncate
- column: amount
op: cast
to: "DECIMAL(12,4)" # re-precision via Informix dectoasc
- column: audit_ts
op: cast
to: "TIMESTAMP(3)" # truncate fractional seconds
- column: payload
op: cast
to: "VARCHAR(48) BASE64" # encode BLOB as base64 text
When the source type does not match any handler for the target spec, the row's column passes through verbatim and the ripper emits a one-time WARN naming the table, column, source type, and target spec. Subsequent rows on the same column suppress the line so a misconfiguration produces one diagnostic rather than per-row log spam.
A when: predicate gates any transform to rows matching a column = literal equality check. The predicate is byte-for-byte case-sensitive; the operator owns the literal format the same way they own redact's value: field — quoted strings stay quoted, numerics stay bare.
transform:
- column: email
op: redact
value: "'***'"
when: "country = 'EU'" # GDPR redaction only
- column: salary
op: hash
when: "is_executive = 1"
The predicate column's value comes from the same per-record cache that ${col} mask references use, so forward references (the predicate column comes later in column order than the transform's target column) miss the cache and the predicate never matches. The cache populates in column order — place the predicate column to the left of the transformed column in the source schema.
A malformed when: (no =) emits a stderr warning at config load and leaves the predicate empty (= unconditional, matching v1 default). No silent behaviour change.
Multiple transforms on the same table are independent — each targets one column. A column can carry at most one op, plus rename, plus an optional when:. Rename always combines with any op:
transform:
- column: email
op: hash
rename: email_hash # target column name on replay
- column: cust_id
op: rename # name-change-only
rename: customer_id
- column: card_number
op: mask
match: "^([0-9]{12})([0-9]{4})$"
pattern: "'************${2}'"
rename: card_last_four
The per-target dialect rewriter knows when a column carries a transform and suppresses any source-type-driven rewrite that would re-interpret the transformed value. The canonical case is a SHA-256 hash literal whose first byte happens to be 't' or 'f' — without transform-awareness the BOOLEAN- context rewriter would translate the digest's first character into TRUE or FALSE. With transforms wired through, the rewriter leaves transformed columns alone.
CDC wire bytes → column-value decoder → transform
→ dialect rewriter
→ captured SQL → target
The transform runs once per captured record per configured column, inside the worker thread. No round-trip to the target, no separate process. The redacted / hashed / masked / cast value is the only value the target ever sees — there's no plaintext leak through any intermediate stage.
To discuss how Oninit ® can assist please call on +1-913-732-8892 or alternatively just send an email specifying your requirements.
You get all this for free.. think about what you get if you pay us