Oninit Logo
The Down System Specialists
+1-913-732-8892
+44-2081-337529
Partnerships Contact

Oninit® Log Ripper — Per-Column Transforms

Transforms run inside the capture path, between the column-value decoder and the captured-SQL emitter. Every transform is declared in YAML on the source table and applies to a single column. The operator chooses an op, supplies the op's parameters, and optionally renames the column on emit. The transformed value lands in INSERT, UPDATE…SET, and the WHERE-image of UPDATE and DELETE uniformly — the target sees no plaintext that the operator marked for transformation.

Transforms are upstream of the per-target dialect rewriter: the same redacted / hashed / masked / cast value lands at every configured replay target without per-dialect duplication. The rewriter is also transform-aware — a hashed SHA-256 hex string won't trip the BOOLEAN-context rewrite by accident.

This page documents every op the ripper ships today. Every example is operator-runnable as written; the captured SQL is shown below each example so the round-trip is verifiable end-to-end.

YAML shape

The transform: sequence sits on the extended per-table entry alongside columns: and skip_unsupported_columns:. Each entry targets one column and chooses one op (plus optional rename and optional when: predicate). Operators can stack multiple transforms on different columns of the same table.

source:
  tables:
    - name: t_customer
      transform:
        - column: email
          op:     redact
          value:  "'***REDACTED***'"
        - column: phone
          op:     hash
        - column: ssn
          op:     mask
          match:  "^([0-9]{3})-([0-9]{2})-([0-9]{4})$"
          pattern: "'XXX-XX-${3}'"
        - column: cust_id
          op:     rename
          rename: customer_id
        - column: notes
          op:     cast
          to:     "VARCHAR(40)"
          when:   "status = 'archived'"

Op: redact

Replace a column's emitted value with an operator-supplied literal. The operator owns SQL quoting — string literals wrap in '…'; numeric literals stay bare; NULL emits unquoted.

transform:
  - column: email
    op:     redact
    value:  "'***'"
  - column: salary
    op:     redact
    value:  "0"
  - column: phone
    op:     redact
    # value: omitted → emits NULL

The replacement applies uniformly to INSERT, UPDATE SET, and the WHERE-image of UPDATE / DELETE. Combinable with rename so the target column also gets a different name.

Op: rename

Emit the column under a different name on the target. Useful for schema-evolution scenarios where the source still carries a legacy name but the target has migrated. The transform is pure-emit — the source schema is untouched.

transform:
  - column: cust_id
    op:     rename
    rename: customer_id

Combines with any other op — redact + rename, hash + rename, mask + rename, cast + rename all valid.

Op: hash

Replace the column's value with a SHA-256 hash of its wire bytes, emitted as a single-quoted 64-character lowercase hex literal. The hash is deterministic — identical source values produce identical hashes — so foreign-key joins downstream still hold. No libcrypto dependency on the ripper; the SHA-256 implementation is inline.

transform:
  - column: email
    op:     hash

Captured SQL for a customer table:

INSERT INTO t_customer (id, email, ...) VALUES (
    127,
    '3d193523d3dbfb2dbe24d119ac9de893beac279385b1ca60ece693590d8b1bd9',
    ...
);

Caveat for operators: the hash covers the captured column's wire bytes, not the ASCII string the column logically holds. For VARCHAR / LVARCHAR the wire image is the value bytes verbatim, so the hash is reproducible with a plain sha256sum. For DECIMAL, DATETIME, NCHAR, and other type-padded forms the wire image includes framing (length byte, trailing NUL pad, locale conversions) so external reproduction needs to mirror that framing. The simpler operator workflow is to capture once and document the digests as the canonical reference.

Op: mask

Template-substitution rewrite. The pattern: string is emitted verbatim with substitution rules applied. Two reference forms supported by every pattern:

  • ${self} — the current column's pre-mask formatted value (with surrounding SQL quotes stripped)
  • ${col_name} — another column's post-transform value (also unquoted)

Forward references (the named column hasn't been processed yet in column order) emit the literal placeholder string ?{name}? — replay-safe SQL, operator notices the placeholder if they intended a back-reference but specified a forward one.

With match:, a POSIX Extended Regular Expression matches against the column's value and capture groups become available as ${1}${9} in the pattern. Closes the canonical "mask CC keeping last 4 digits" / "mask phone keeping area code" idioms.

transform:
  - column: email
    op:     mask
    pattern: "'masked-${fname}-at-***'"

  - column: card_number
    op:     mask
    match:  "^([0-9]{12})([0-9]{4})$"
    pattern: "'************${2}'"

  - column: phone
    op:     mask
    match:  "^\(([0-9]{3})\)"
    pattern: "'(${1}) XXX-XXXX'"

No-match fallback emits the literal NULL rather than the raw value — rows stay replayable, the operator notices the miss via captured-target audit.

Op: cast

Reformat the column's value to fit a different target type. The to: string specifies the target spec; the cast handler picks the right reformat based on the source column type and the target spec.

Source type Target spec (to:) Behaviour
VARCHAR / CHAR / NVARCHAR / NCHAR VARCHAR(N), CHAR(N), NVARCHAR(N), NCHAR(N) byte-truncate the value to N bytes; leading and trailing single-quotes preserved
DECIMAL / MONEY DECIMAL(p,s), NUMERIC(p,s), NUMBER(p,s) re-format via the Informix decimal formatter at the new scale; when total digits exceed the target precision, emit NULL with a runtime WARN naming the table.column + the value that overflowed
DATETIME TIMESTAMP(N), TIMESTAMPTZ, DATETIME YEAR TO FRACTION(N), DATETIME YEAR TO SECOND truncate fractional seconds to the target precision (N for FRACTION / TIMESTAMP, 0 for SECOND, 6 for TIMESTAMPTZ); no-op if the source is already at or below target precision
BLOB / CLOB (SQLUDTFIXED) VARCHAR(N) HEX, VARCHAR(N) BASE64 re-encode the captured X'NN…' hex literal as a plain VARCHAR; HEX strips the X prefix and truncates the inner hex; BASE64 decodes the hex to bytes and base64-encodes (padded to 4-char multiples), then truncates the result to N characters
transform:
  - column: email
    op:     cast
    to:     "VARCHAR(8)"        # byte-truncate

  - column: amount
    op:     cast
    to:     "DECIMAL(12,4)"     # re-precision via Informix dectoasc

  - column: audit_ts
    op:     cast
    to:     "TIMESTAMP(3)"      # truncate fractional seconds

  - column: payload
    op:     cast
    to:     "VARCHAR(48) BASE64"  # encode BLOB as base64 text

When the source type does not match any handler for the target spec, the row's column passes through verbatim and the ripper emits a one-time WARN naming the table, column, source type, and target spec. Subsequent rows on the same column suppress the line so a misconfiguration produces one diagnostic rather than per-row log spam.

Op: when (per-row conditional)

A when: predicate gates any transform to rows matching a column = literal equality check. The predicate is byte-for-byte case-sensitive; the operator owns the literal format the same way they own redact's value: field — quoted strings stay quoted, numerics stay bare.

transform:
  - column: email
    op:     redact
    value:  "'***'"
    when:   "country = 'EU'"      # GDPR redaction only

  - column: salary
    op:     hash
    when:   "is_executive = 1"

The predicate column's value comes from the same per-record cache that ${col} mask references use, so forward references (the predicate column comes later in column order than the transform's target column) miss the cache and the predicate never matches. The cache populates in column order — place the predicate column to the left of the transformed column in the source schema.

A malformed when: (no =) emits a stderr warning at config load and leaves the predicate empty (= unconditional, matching v1 default). No silent behaviour change.

Combining transforms

Multiple transforms on the same table are independent — each targets one column. A column can carry at most one op, plus rename, plus an optional when:. Rename always combines with any op:

transform:
  - column: email
    op:     hash
    rename: email_hash            # target column name on replay

  - column: cust_id
    op:     rename                # name-change-only
    rename: customer_id

  - column: card_number
    op:     mask
    match:  "^([0-9]{12})([0-9]{4})$"
    pattern: "'************${2}'"
    rename: card_last_four

Transforms and the dialect rewriter

The per-target dialect rewriter knows when a column carries a transform and suppresses any source-type-driven rewrite that would re-interpret the transformed value. The canonical case is a SHA-256 hash literal whose first byte happens to be 't' or 'f' — without transform-awareness the BOOLEAN- context rewriter would translate the digest's first character into TRUE or FALSE. With transforms wired through, the rewriter leaves transformed columns alone.

Where transforms run in the capture pipeline

CDC wire bytes  →  column-value decoder  →  transform
                                          →  dialect rewriter
                                          →  captured SQL  →  target

The transform runs once per captured record per configured column, inside the worker thread. No round-trip to the target, no separate process. The redacted / hashed / masked / cast value is the only value the target ever sees — there's no plaintext leak through any intermediate stage.

To discuss how Oninit ® can assist please call on +1-913-732-8892 or alternatively just send an email specifying your requirements.


You get all this for free.. think about what you get if you pay us