Oninit® Log Ripper — Troubleshooting

What to look at when the Ripper doesn't start, doesn't capture, falls behind, or refuses a table. Cross-reference the CDC Error Codes page for the canonical numeric ↔ symbolic mapping; this page is about how those codes show up in practice and what to do.

Pre-flight checklist

Check: Source DB has logical logging
How: onstat -d on the source server; the database flag column shows L for buffered or U for unbuffered.
If missing: CDC has nothing to read: cdc_set_fullrowlogging returns CDC_E_DBNOTLOGGED. Use ondblog buf <db> or ondblog unbuf <db> on the source server.

Check: syscdcv1 database exists
How: echo "SELECT count(*) FROM syscdcv1:syscdcerrcodes" | dbaccess.
If missing: Run the bundled syscdcv1.sql from $INFORMIXDIR/etc/ as user informix. The Ripper raises CDC_E_NOCDCDB on every cdc_opensess until this exists.

Check: INFORMIXSERVER set, sqlhosts resolvable
How: The wrapper at /usr/bin/oni_ripper sources /etc/oni_ripper/environment first; verify with echo $INFORMIXSERVER after sourcing.
If missing: ESQL/C connection fails before any CDC call. Edit /etc/oni_ripper/environment or the YAML source.server key.

Check: Connecting user has DBA
How: GRANT DBA TO <user> in the source DB and on syscdcv1.
If missing: cdc_opensess returns CDC_E_NOSESS. The CDC API requires DBA on both catalogs.

Check: For ODBC: DSN works under isql
How: isql -v <dsn> <user> <pass> on the Ripper host.
If missing: If isql can't connect, the Ripper won't either. SQLSTATE IM002 = DSN missing, IM003 = driver missing.
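
The local half of this checklist can be scripted. The sketch below is a hypothetical helper, not part of the Ripper: it only checks that INFORMIXSERVER is set and that the utilities named above are on PATH, and it assumes it runs on a host where the Informix client tools are installed. The server-side checks (logging mode, syscdcv1, DBA grants) still need the commands listed above.

```python
import os
import shutil

# Utilities the checklist above relies on; isql is only needed for ODBC targets.
INFORMIX_TOOLS = ("onstat", "dbaccess")


def preflight_problems(env=None, which=shutil.which, odbc=False):
    """Return a list of problems found; an empty list means these local checks passed.

    Hypothetical helper: `env` and `which` are injectable so the function
    can be exercised without an Informix installation.
    """
    env = os.environ if env is None else env
    problems = []
    # An unset or empty INFORMIXSERVER fails the ESQL/C connect before any CDC call.
    if not env.get("INFORMIXSERVER"):
        problems.append("INFORMIXSERVER is not set")
    tools = INFORMIX_TOOLS + (("isql",) if odbc else ())
    for tool in tools:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling preflight_problems(odbc=True) after sourcing /etc/oni_ripper/environment returns an empty list when these local prerequisites are in place; anything else is a list of messages to fix before moving on to the connectivity test.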

Validate the YAML before going live

The fastest way to catch config issues is to run the connectivity test that ships with every Ripper build:

oni_ripper -c your_config.yml -t auto

The -t flag runs every startup phase except worker spawn (source database connect, table resolution, LSN monitor open, log-fallback pre-flight, output backend init), then prints a summary and exits cleanly. It implies -v, so each step is visible. No server-side state is modified: orphan-session cleanup is skipped, no CDC sessions are opened, and no DML is captured. Run it as often as you like.

On success you get:

=========================================================
  Connectivity test PASSED
=========================================================
  Source DB:           stores_demo @ ol_informix1410
  Tables resolved:     15
  Output mode:         file

  This was a dry run. No workers were spawned, no DML
  was captured, and no server-side state was modified.
  Re-run without -t to start the live capture.

On failure the test stops at the offending phase and prints which phase failed plus how to retry — e.g. for a missing target.mysql.host:

[T095] mysql: target.mysql.host is required
Error: failed to initialize output

=== Connectivity test FAILED ===
  Source DB:    OK (resolved 15 table(s))
  Output mode:  FAILED to initialize
  Edit your YAML and re-run: oni_ripper -c <config.yml> -t mysql

The <mode> argument is either auto (use the target.mode from the YAML, the common case) or any specific mode name to override target.mode for the test run only:

oni_ripper -c your_config.yml -t auto         # test as-configured
oni_ripper -c your_config.yml -t odbc         # try the ODBC target
oni_ripper -c your_config.yml -t informix     # try the direct-Informix target

Run -t first whenever a YAML is new, has just been edited, or has just been moved to a new host. Most of the entries in the two failure tables below are exactly the conditions -t catches in seconds without ever starting capture.
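
For deployments where the YAML is pushed by automation, the dry run can gate the rollout. The following is a minimal sketch, assuming oni_ripper is on PATH; because this page does not document the exit status of -t, it looks for the PASSED and FAILED banners shown above rather than trusting the return code. The function names are illustrative, not part of the Ripper.

```python
import subprocess

# Banner strings as they appear in the sample output above.
PASS_BANNER = "Connectivity test PASSED"
FAIL_BANNER = "Connectivity test FAILED"


def connectivity_test_passed(output: str) -> bool:
    """True only if the PASSED banner is present and the FAILED banner is not."""
    return PASS_BANNER in output and FAIL_BANNER not in output


def run_connectivity_test(config_path: str, mode: str = "auto") -> bool:
    """Run `oni_ripper -c <config> -t <mode>` and report whether it passed.

    stdout and stderr are combined because this sketch does not assume
    which stream the summary is written to.
    """
    result = subprocess.run(
        ["oni_ripper", "-c", config_path, "-t", mode],
        capture_output=True,
        text=True,
        check=False,
    )
    return connectivity_test_passed(result.stdout + result.stderr)
```

A deploy script would call run_connectivity_test("your_config.yml") and refuse to start the live capture when it returns False.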

Startup failures

Symptom in the log: cdc_opensess returned sid=0 (or sid < 0)
Cause: Multi-worker race against syscdcsess or stale orphan sessions from a previous crash.
Fix: The Ripper auto-cleans orphan sessions at startup when threading.cleanup_orphans: true (default). If it still trips, bump the verbose flag and watch the [ORPHAN] line; the fallback is a server restart of the source.

Symptom in the log: CDC error CDC_E_NOTAB (-83705) for a configured table
Cause: Table dropped between config edit and startup, or typo in tables: entry.
Fix: If the DDL is staged in schema_archive_dir, the Ripper logs [ARCHIVE-FALLBACK] and starts with skip_capture=1 on that table so the rest of the set still captures. Otherwise restore the table or remove the entry.

Symptom in the log: CDC error CDC_E_TABPROPERTIES (-83719) on cdc_startcapture
Cause: Table carries a column type CDC rejects: TEXT, BYTE, BLOB, CLOB, SET, MULTISET, LIST, ROW, or a UDT.
Fix: Default policy is "abort the whole table" ([CRITICAL] log line, skip_capture=1). Set skip_unsupported_columns: true on that table's extended entry to drop the offending column from the captured set, or use "!col" to filter explicitly.

Symptom in the log: schema drift detected at first TABSCHEMA record
Cause: Live DESCRIBE doesn't match what CDC reports for the table: the YAML config is stale relative to the running schema, or the Ripper restarted after an ALTER it didn't see.
Fix: Place the matching DDL in schema_archive_dir as <db>_<tab>.sql. The Ripper re-tries the parse against that layout. Without an archive the worker aborts rather than emit garbage SQL.

Symptom in the log: error reading from CDC session on ifx_lo_read
Cause: Source server bounced, network blip, smart-large-object timeout.
Fix: The Ripper closes the failed session, drains active tx as _NO_COMMIT incomplete files, sleeps with linear backoff, and re-opens via cdc_activatesess at last_committed_lsn. Up to 5 consecutive failures before the worker gives up. Recovery count surfaces in the per-worker [STATUS] line.
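
The recovery behaviour in the last entry (sleep with linear backoff, re-open, give up after 5 consecutive failures) can be pictured with a short sketch. This is an illustration only, not the Ripper's code: the base delay of 2 seconds and the reopen callback are assumptions, and only the limit of 5 comes from the text above.

```python
import time

MAX_CONSECUTIVE_FAILURES = 5  # limit stated in the entry above


def linear_backoff_delays(base_seconds, max_failures=MAX_CONSECUTIVE_FAILURES):
    """Delays grow by a fixed step: base, 2*base, 3*base, ..."""
    return [base_seconds * attempt for attempt in range(1, max_failures + 1)]


def recover(reopen, base_seconds=2.0, sleep=time.sleep):
    """Sleep, then try to re-open the session, up to the failure limit.

    `reopen` is a hypothetical callable that returns True once the session
    is re-established (standing in for the re-open at last_committed_lsn).
    Returns the attempt number that succeeded, or None if the worker gives up.
    """
    delays = linear_backoff_delays(base_seconds)
    for attempt, delay in enumerate(delays, start=1):
        sleep(delay)
        if reopen():
            return attempt
    return None
```

With a 2-second base the worker in this model waits 2, 4, 6, 8 and 10 seconds, so a source that stays down longer than about 30 seconds exhausts all five attempts.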

Runtime failures

Symptom: CDC_REC_DISCARD record / [CRITICAL] discard line
Cause: The Ripper fell behind log recycling on the source. Records committed during the lag window are unrecoverable.
Fix: Operationally: increase logical log size on the source so there's more headroom, OR raise throughput (workers / max_recs_per_read) so the lag closes. The configured alerting.discard_command fires on every discard, rate-limited by discard_alert_min_sec.

Symptom: Lag growing across [LSN] reports
Cause: Output side is slower than the source's commit rate, or the Ripper is running with too few workers for the table count.
Fix: See the Performance Tuning page. Quick lever: bump threading.max_recs_per_read, then threading.max_threads.

Symptom: Mid-capture [CRITICAL] schema drift on a table
Cause: An ALTER ran on the source and CDC emitted a fresh TABSCHEMA that no longer matches the live DESCRIBE.
Fix: Drop the new DDL into schema_archive_dir or use the live release / re-acquire path: drop a release_<tab> sentinel under control_dir before the ALTER, then acquire_<tab> after.

Symptom: Output file directory keeps growing
Cause: File-mode output writes one file per committed tx; no auto-pruning by design.
Fix: This is the operator's job: logrotate, tmpwatch, or a periodic shipper. The Ripper will not delete output files because they may be in-flight to a downstream consumer.

Symptom: ODBC target init fails on startup
Cause: DSN not in /etc/odbc.ini (IM002), driver not registered (IM003), or auth wrong (28000).
Fix: The ODBC connector reports the SQLSTATE plus a one-line fix hint at startup. Verify the DSN with isql -v first; see Install for the package names per distro.

Symptom: [BUFFER] target offline followed by growing mem= / files= in the periodic [BUFFER] line
Cause: The configured target returned a connection-class failure and the dispatcher routed the in-flight transactions into buffering.memory_max_bytes / spilled them under buffering.disk_overflow_path. Capture is still running; no data loss.
Fix: Bring the target back. The probe thread retries every buffering.target_check_seconds; on success the backlog drains chronologically, you'll see [DRAIN] complete (N tx), and the buffer returns to online.

Symptom: [CRITICAL] memory + disk both full — CDC capture suspended
Cause: Both buffering.memory_max_bytes and buffering.disk_max_bytes are exhausted. The state machine moved to SUSPENDED: capture pauses so the source LSN holds and no records are dropped.
Fix: Either bring the target back so the buffer drains, or free space under disk_overflow_path (move spill files elsewhere; on next start they're picked up by the recovery scan before CDC opens). Then SIGHUP / restart to lift suspension. Raise disk_max_bytes for more headroom.
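
For the two buffering entries above, a rough estimate of how long the target can stay offline before capture suspends helps when sizing memory_max_bytes and disk_max_bytes. The arithmetic below is a back-of-the-envelope sketch under loud assumptions: the capture rate is constant, and buffered transactions occupy about as many bytes as they do on the way to the target. Neither is guaranteed by the Ripper, and the function name is hypothetical.

```python
def outage_headroom_seconds(memory_max_bytes, disk_max_bytes, capture_bytes_per_second):
    """Estimate seconds of target outage the buffer can absorb before SUSPENDED.

    Assumes a steady capture rate and no per-transaction overhead in the
    buffer; treat the result as an upper bound, not a promise.
    """
    if capture_bytes_per_second <= 0:
        raise ValueError("capture_bytes_per_second must be positive")
    total_buffer_bytes = memory_max_bytes + disk_max_bytes
    return total_buffer_bytes / capture_bytes_per_second
```

For example, 1,000,000 bytes of memory plus 9,000,000 bytes of disk at a steady 50,000 bytes per second gives 200 seconds of headroom in this model; doubling disk_max_bytes to 18,000,000 raises that to 380 seconds.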

Diagnostic flags

When something is misbehaving, layer the debug flags from cheap to expensive:

Flag: -v
YAML key: verbose
What it adds: High-level info messages. Always the first flag to add: it is cheap and gives you [ARCHIVE-FALLBACK], [ACQUIRE], [RELEASE], recovery progress.

Flag: -T
YAML key: debug_trace
What it adds: [TRACE] function-entry lines. Useful when you suspect a hang or want to see which function gave up.

Flag: -S
YAML key: debug_sql
What it adds: [SQL] shows every generated statement before it's written. The fast way to spot quoting / type / WHERE-clause bugs.

Flag: -C
YAML key: debug_cdc
What it adds: [CDC] traces every CDC API call with its return code, plus record-type histogram.

Flag: -D
YAML key: debug_detail
What it adds: [DET] per-column dump: column name, type, byte offset, formatted value.

Flag: -X
YAML key: debug_hex
What it adds: [HEX] raw byte dump of the wire record. Last resort: verbose, and only useful when cross-checking the wire format directly.

All six are SIGHUP-reloadable: to flip them on a running daemon, edit the YAML and run kill -HUP $(cat /tmp/oni_ripper.pid). There is no restart and no LSN gap, and the [RELOAD] line confirms what changed.
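
Where the reload is triggered from a management script rather than a shell, the same signal can be sent from Python. This is a minimal sketch, assuming the PID file at /tmp/oni_ripper.pid holds a single decimal PID as the kill command above implies; the helper names are illustrative.

```python
import os
import signal
from pathlib import Path

DEFAULT_PID_FILE = "/tmp/oni_ripper.pid"  # path taken from the kill command above


def parse_pid(text: str) -> int:
    """Parse the PID file contents; rejects anything that is not a positive integer."""
    pid = int(text.strip())
    if pid <= 0:
        raise ValueError(f"invalid PID: {pid}")
    return pid


def reload_ripper(pid_file: str = DEFAULT_PID_FILE) -> int:
    """Send SIGHUP to the running daemon so it re-reads the YAML; returns the PID signalled."""
    pid = parse_pid(Path(pid_file).read_text())
    os.kill(pid, signal.SIGHUP)  # raises ProcessLookupError if the daemon is not running
    return pid
```

The positive-PID check matters because os.kill with a PID of 0 or a negative number signals a whole process group instead of one process.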
