
Oninit® Snooper — Troubleshooting

The snoop accepts but the client times out

The forward path requires the snoop host to reach the upstream IDS on the configured port. Test from the snoop host:

nc -vz 198.51.100.42 19089

If that fails, the snoop will accept the client connection, attempt the upstream connect, log # upstream <host>:<port> unreachable: ... on its own stderr, and close the client side. The client typically sees an immediate connection close (not a hang). Fix the upstream reachability and retry.
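The nc test above has no timeout, so against a firewall that silently drops packets it can sit for a long time before reporting anything. A minimal sketch of a bounded check, assuming the installed nc supports -z and -w (the function name upstream_reachable is illustrative and not part of the snoop):

```shell
# upstream_reachable HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connect succeeds within the timeout, non-zero otherwise.
# Assumption: the installed nc supports -z (connect only) and -w (timeout).
upstream_reachable() {
    nc -z -w "${3:-3}" "$1" "$2"
}
```

Run from the snoop host, for example: upstream_reachable 198.51.100.42 19089 || echo "upstream unreachable". A non-zero status here corresponds to the case where the snoop would log the upstream-unreachable line and close the client side.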

A token logs as ONI_UNKNOWN_xxxx

The walker now sizes most server response tokens directly (ONI_DONE, ONI_COST, ONI_NFETCH, IUS-mode ONI_TUPLE, ONI_ERR, ONI_PUTERR, ONI_DESCRIBE, ONI_BBIND) so they appear with their real name and length. The remaining hold-outs are the login response token (id 0x011c) and a small handful of negotiated-state encodings whose layout the walker can't yet derive from the bytes alone. When the walker can't size a token, the snoop logs the token name — or ONI_UNKNOWN_<hex> if even the id is unrecognised — and resyncs at the next ONI_EOT.

This is benign. The forward path is unaffected, and downstream PFPDUs continue to be parsed correctly. Future library releases will extend sqli_pkt_token_len to cover each remaining token as wire captures confirm its layout.
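To see which ids are still falling through, the captured log can be tallied after the fact. The sketch below assumes only that unrecognised ids appear in the log as ONI_UNKNOWN_<hex>, as described above; the function name and the log path passed to it are illustrative.

```shell
# unknown_token_tally LOGFILE
# Prints one line per distinct ONI_UNKNOWN_<hex> name, most frequent first,
# in the form "<count> <name>" (uniq -c pads the count with leading spaces).
# Assumption: unrecognised ids are logged literally as ONI_UNKNOWN_<hex>.
unknown_token_tally() {
    grep -o 'ONI_UNKNOWN_[0-9A-Fa-f]*' "$1" | sort | uniq -c | sort -rn
}
```

An empty result means every token id in the capture was recognised by name, even if some could not be sized.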

# conn N dropped=K (logger fell behind) at connection close

The per-connection ring buffer overflowed during the session and K log lines were dropped. The forwarder didn't pause — it dropped the events and kept moving bytes — but the captured log is missing those entries. Common causes:

  • The output destination is on a slow filesystem (NFS, sshfs, USB stick) and can't keep up with the throughput of a busy SELECT.
  • A tail -f consumer attached to the file is paused (terminal scrolling stopped, console multiplexer suspended) and the kernel backed up the writer.
  • Disk is full or the filesystem is throttled.

Move the log to faster storage, unblock the consumer, or pair --out with --only=A,B,... to drop the noise floor so the ring keeps up. Forward bytes are never lost; only some log records were not preserved.
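To judge how much of a capture is missing, the dropped counts can be pulled out of the log. This is a sketch that assumes the close-time line starts exactly as shown in the heading (# conn N dropped=K ...) and that it lands in the file being inspected; both function names are illustrative.

```shell
# dropped_per_conn LOGFILE
# Prints "<conn> <dropped>" for every connection that reported drops.
# Assumption: the line format is "# conn N dropped=K (logger fell behind)".
dropped_per_conn() {
    sed -n 's/^# conn \([0-9][0-9]*\) dropped=\([0-9][0-9]*\).*/\1 \2/p' "$1"
}

# dropped_total LOGFILE
# Prints the sum of all dropped counts (0 if no connection reported drops).
dropped_total() {
    dropped_per_conn "$1" | awk '{ total += $2 } END { print total + 0 }'
}
```

A total of 0 means the ring kept up for every connection in that log; a large count concentrated on one connection usually points at a single busy session rather than a slow destination.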

All round_us columns are -

round_us is populated only on response rows (dir=<). On request rows it is always -. If response rows also show -, the snoop hasn't seen a request on that connection yet — the server has spoken first (server-initiated push, which is rare in SQLI but possible).

All stmt_us columns are -

stmt_us resets on every ONI_PREPARE / ONI_COMMAND. If no PREPARE / COMMAND has been seen on the connection yet (e.g. login has just completed and no SQL has been issued), the column is -. As soon as the first PREPARE or COMMAND lands, every subsequent row carries a populated stmt_us.

The log file grows fast on a busy server

Every PFPDU produces one line. A server fielding thousands of prepared-statement executions per second writes thousands of lines per second. Three mitigations:

  • Capture only the window around the slow event; kill the snoop afterwards. The default capture-all behaviour is intentional — filtering on the wire would mean the operator misses something they didn't know to ask for.
  • Use --only=A,B,... to drop the bulk of the chatter at the snoop itself. --only=ONI_PREPARE,ONI_COMMAND,ONI_DONE,ONI_ERR,ONI_PUTERR keeps statement boundaries and errors and discards TUPLE / FETCH / EOT noise. Per-connection timing rollups still measure the full stream.
  • Or pipe the snoop log through grep --line-buffered with a tighter filter and redirect the result to a file; the snoop still emits the unfiltered firehose on stderr, but only the matching subset hits the disk.

# Inline filter (leanest):
oni_snoop --listen ... --upstream ... \
    --out /var/log/oni_snoop.log \
    --only=ONI_PREPARE,ONI_COMMAND,ONI_DONE,ONI_ERR,ONI_PUTERR

# Post-process filter: the snoop still emits everything, grep decides what reaches disk:
oni_snoop --listen ... --upstream ... 2>&1 \
    | grep --line-buffered -E $'\tONI_(EOT|ERR|PREPARE|COMMAND|DONE)\t' \
    > snoop.log
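
If the full stream should also survive for later re-analysis, tee can keep a complete copy while grep writes the filtered one. This is a sketch and not a snoop feature: the function name and both file names are illustrative, the token list mirrors the --only example above, and the full copy brings back the disk growth this section is trying to avoid, so it belongs on fast or temporary storage.

```shell
# split_snoop_log FULL_FILE FILTERED_FILE
# Reads the snoop stream on stdin, keeps every line in FULL_FILE, and writes
# only statement boundaries and errors to FILTERED_FILE.
# Assumption: token names appear as tab-delimited fields, as in the grep
# example above.
split_snoop_log() {
    tab=$(printf '\t')
    tee "$1" \
        | grep --line-buffered -E "${tab}ONI_(PREPARE|COMMAND|DONE|ERR|PUTERR)${tab}" \
        > "$2"
}
```

Usage follows the same shape as the post-process filter: oni_snoop --listen ... --upstream ... 2>&1 | split_snoop_log full.log snoop.log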

The snoop adds noticeable latency to every query

The forward path is a plain read() / write() loop with TCP_NODELAY on the upstream socket. Per-PFPDU overhead is < 50 μs on commodity hardware. If the operator measures more than that, the most common causes are:

  • Logging to a slow disk — the snoop's stderr writes are line-buffered and serialised under one mutex. Redirect 2> to a file on a fast filesystem (or to /dev/null for a baseline measurement).
  • The upstream IDS itself is on a different network segment — the snoop adds one TCP hop, which is one round-trip-time on top of the original.
  • An overloaded snoop host — the forwarder threads run at the priority of the snoop process. Pin the snoop to a dedicated CPU set or run it on a less-loaded host.

bind: Address already in use

Either another process is already bound to the requested listen port, or a previous snoop run has not yet released it. Wait a few seconds for the kernel to release the port, or pick a different listen port. The snoop sets SO_REUSEADDR on its listen socket, so connections lingering in TIME_WAIT after a clean shutdown do not block an immediate restart of the same instance.

No connections show up after starting the snoop

Check the listen address and port:

ss -ltnp | grep 9089

The snoop should be on the line. If it's not, check the --listen argument. If it is on a specific IP and the client is reaching a different IP, the snoop never sees the connection. Bind to 0.0.0.0:<port> for any-interface accept.
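A substring match on the port number also matches longer ports that contain it (9089 matches 19089, the upstream port used in the reachability test earlier). A small sketch of an exact-port filter over ss output, assuming the usual ss -ltn column layout where the fourth column is Local Address:Port; the function name is illustrative:

```shell
# listeners_on_port PORT
# Filters ss -ltn style output on stdin down to sockets whose local address
# ends in exactly :PORT. The header line, if present, is dropped because its
# fourth field never ends in :PORT.
# Assumption: column 4 is "Local Address:Port", as in ss -ltn / ss -ltnp.
listeners_on_port() {
    awk -v port="$1" '$4 ~ (":" port "$")'
}
```

Usage: ss -ltnp | listeners_on_port 9089. The filter built into ss should give the same answer: ss -ltnp 'sport = :9089'.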

For the special case of a local-loopback IDS, where the IDS lives on 198.51.100.42:9089 but routes via lo, the snoop still works. Either move the IDS off 9089 and let the snoop take over that port, or leave the IDS where it is, start the snoop with --listen 198.51.100.42:<new port>, and use an iptables PREROUTING DNAT rule to redirect 9089 connections to the snoop.

Static binary fails on an old Linux host

The snoop is built against a glibc that ELF-tags the binary for GNU/Linux 3.2.0. Hosts running a kernel older than 3.2 (2.6-series kernels, for example) will not run the binary; this includes extremely old Linux distributions whose kernel ABI predates the snoop's build target. For those environments, build from source on a host with the same vintage glibc, or use a musl-libc rebuild.
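
Whether a host clears the 3.2 floor can be checked before copying the binary over. The sketch below uses plain POSIX sh only, since hosts of that age may lack newer tool options; the function name is illustrative, and it assumes the release string starts with major.minor, as uname -r output does.

```shell
# kernel_meets_build_target RELEASE
# Returns 0 if RELEASE (for example the output of uname -r) is 3.2 or newer.
# The body runs in a subshell so the IFS change does not leak to the caller.
# Assumption: RELEASE begins with "major.minor", e.g. 5.15.0-91-generic.
kernel_meets_build_target() (
    release=${1%%[!0-9.]*}      # drop any suffix such as -91-generic
    IFS=.
    set -- $release             # split into major, minor, patch
    major=${1:-0} minor=${2:-0}
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 2 ]; }
)
```

Usage: kernel_meets_build_target "$(uname -r)" || echo "kernel older than 3.2, a rebuild is needed"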
