Oninit® Log Ripper — Grafana & Prometheus

The ripper exposes capture health as Prometheus metrics. Drop-in provisioning files and curated dashboards ship in the package, so an operator can wire the ripper into Grafana without authoring any JSON. This page covers the metric set, the install path for each of the two datasource flavors (Prometheus or the Oninit Grafana plugin), the panels in each dashboard, and the operator playbook for the SLO signals.

The /metrics endpoint

An embedded HTTP server inside the ripper serves /metrics in Prometheus text-exposition format v0.0.4 and /healthz for orchestrator liveness probes. The endpoint is default-disabled: the ripper opens no listening sockets unless monitoring.prometheus.port is non-zero. Enable in the YAML:

monitoring:
  prometheus:
    port: 9091
    bind: "0.0.0.0"   # 127.0.0.1 to keep loopback-only

The default bind is 127.0.0.1, so enabling the port without setting bind exposes nothing off-host. Set bind to 0.0.0.0, as in the example above, to let a Prometheus server on a different host scrape the ripper. The embedded server is plain HTTP with no TLS and no authentication, so if the link crosses an untrusted network, front it with a reverse proxy (nginx / haproxy / Caddy) or tunnel it.
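The enable and bind rules above can be restated as a small decision function. This is a minimal Python sketch for illustration only: classify_exposure is a hypothetical helper, not part of the ripper, and its defaults simply mirror the documented ones (port 0 means disabled, bind defaults to 127.0.0.1).

```python
import ipaddress


def classify_exposure(port: int = 0, bind: str = "127.0.0.1") -> str:
    """Classify how reachable /metrics is for a monitoring.prometheus config.

    Hypothetical helper; it only restates the documented behavior.
    """
    if port == 0:
        return "disabled"           # no listening socket is opened
    if ipaddress.ip_address(bind).is_loopback:
        return "loopback-only"      # scrapable only from the same host
    return "network-reachable"      # plain HTTP: use a proxy or tunnel


print(classify_exposure())                 # disabled
print(classify_exposure(9091))             # loopback-only
print(classify_exposure(9091, "0.0.0.0"))  # network-reachable
```

A check like this could run in a config lint step to flag a network-reachable endpoint that has no proxy in front of it.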

Metric families

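Five families ship in v1, all named with the oni_logripper_ prefix per CNCF Prometheus naming convention:

  Metric                          Type     Labels      Meaning
  oni_logripper_build_info        gauge    version     Constant 1 with build metadata labels
  oni_logripper_records_total     counter  worker, op  Records emitted per worker per op (insert, update, delete, truncate, discard)
  oni_logripper_lag_seconds       gauge    worker      Real-time capture lag in seconds
  oni_logripper_recovery_count    counter  worker      Worker recovery attempts
  oni_logripper_worker_running    gauge    worker      1 if worker is running, 0 otherwise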

Sample scrape:

$ curl -s http://ripper-host:9091/metrics
# HELP oni_logripper_build_info Constant 1 with build metadata labels.
# TYPE oni_logripper_build_info gauge
oni_logripper_build_info{version="1.0.0"} 1
# HELP oni_logripper_records_total Records emitted per worker per op.
# TYPE oni_logripper_records_total counter
oni_logripper_records_total{worker="0",op="insert"} 1234
oni_logripper_records_total{worker="0",op="update"} 567
oni_logripper_records_total{worker="0",op="delete"} 12
oni_logripper_records_total{worker="0",op="truncate"} 0
oni_logripper_records_total{worker="0",op="discard"} 0
oni_logripper_records_total{worker="1",op="insert"} 845
oni_logripper_records_total{worker="1",op="update"} 230
# HELP oni_logripper_lag_seconds Real-time capture lag in seconds.
# TYPE oni_logripper_lag_seconds gauge
oni_logripper_lag_seconds{worker="0"} 3
oni_logripper_lag_seconds{worker="1"} 2
# HELP oni_logripper_recovery_count Worker recovery attempts.
# TYPE oni_logripper_recovery_count counter
oni_logripper_recovery_count{worker="0"} 0
oni_logripper_recovery_count{worker="1"} 0
# HELP oni_logripper_worker_running 1 if worker is running, 0 otherwise.
# TYPE oni_logripper_worker_running gauge
oni_logripper_worker_running{worker="0"} 1
oni_logripper_worker_running{worker="1"} 1
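For scripts that need to read the scrape output without a Prometheus server, the text format is simple enough to parse by hand. The following Python sketch is a deliberately simplified parser (no label escapes, no timestamps, no histogram handling), and parse_metrics is a name made up for this example. It runs against an excerpt of the sample above.

```python
import re
from collections import defaultdict

SAMPLE = """\
# HELP oni_logripper_lag_seconds Real-time capture lag in seconds.
# TYPE oni_logripper_lag_seconds gauge
oni_logripper_lag_seconds{worker="0"} 3
oni_logripper_lag_seconds{worker="1"} 2
oni_logripper_records_total{worker="0",op="insert"} 1234
oni_logripper_records_total{worker="1",op="insert"} 845
"""

# name{label="value",...} value   (labels optional)
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return {metric_name: [(labels_dict, float_value), ...]}."""
    families = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        families[name].append((labels, float(value)))
    return dict(families)


metrics = parse_metrics(SAMPLE)
worst_lag = max(v for _, v in metrics["oni_logripper_lag_seconds"])
inserts = sum(v for lbl, v in metrics["oni_logripper_records_total"]
              if lbl["op"] == "insert")
print(worst_lag, inserts)  # 3.0 2079.0
```

For anything beyond a quick check, a maintained exposition-format parser is the safer choice.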

/healthz liveness probe

/healthz returns HTTP 200 with body OK when every worker reports error == 0, otherwise HTTP 503 with body UNHEALTHY. Standard shape for orchestrator liveness probes.

# Kubernetes liveness probe
livenessProbe:
  httpGet:
    path: /healthz
    port: 9091
  initialDelaySeconds: 10
  periodSeconds: 30

# systemd readiness check (via curl)
ExecStartPost=/usr/bin/curl --fail --silent http://127.0.0.1:9091/healthz

# Datadog Agent http_check
init_config:
instances:
  - name: oni_ripper
    url: http://ripper-host:9091/healthz
    timeout: 5
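Where none of the probe integrations above fit, the same contract can be checked from a script. The Python sketch below assumes only what is stated above (HTTP 200 with body OK means healthy, anything else does not). interpret_healthz and check_healthz are hypothetical names, and treating connection failures as unhealthy is a choice made for this example.

```python
import urllib.error
import urllib.request


def interpret_healthz(status: int, body: str) -> bool:
    """Healthy only for HTTP 200 with body OK, per the /healthz contract."""
    return status == 200 and body.strip() == "OK"


def check_healthz(url: str, timeout: float = 5.0) -> bool:
    """Fetch /healthz; connection failures count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return interpret_healthz(resp.status, body)
    except urllib.error.HTTPError as err:
        # urllib raises for non-2xx responses such as the documented 503.
        return interpret_healthz(err.code, err.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError):
        return False  # endpoint disabled, host down, or timed out
```

A caller would use it as check_healthz("http://127.0.0.1:9091/healthz") and alert or restart when it returns False.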

What ships in share/grafana/

The package places the provisioning files and dashboards under share/grafana/ (typically /usr/share/oni_ripper/grafana/ when installed via the RPM / DEB):

share/grafana/
├── provisioning/
│   ├── datasources/
│   │   ├── oni_logripper_prometheus.yaml   # Prometheus DS
│   │   └── oni_logripper_oninit.yaml       # Oninit Grafana plugin DS
│   └── dashboards/
│       └── oni_ripper.yaml                  # dashboard provider
└── dashboards/
    ├── oni_logripper_capture_health.json       # main board
    └── oni_logripper_capture_drilldown.json    # per-worker drilldown

Install — Prometheus path

The canonical CNCF setup. A Prometheus server scrapes the ripper’s /metrics on a 15-30s cadence and rolls up across all instances. Grafana queries Prometheus.

  1. Enable the ripper’s metrics endpoint (monitoring.prometheus.port non-zero, see above).
  2. Point your existing Prometheus server at the ripper:
    scrape_configs:
      - job_name: oni_ripper
        static_configs:
          - targets: ["ripper-host:9091"]
    
  3. Drop the datasource provisioning file into Grafana:
    cp share/grafana/provisioning/datasources/oni_logripper_prometheus.yaml \
       /etc/grafana/provisioning/datasources/
    
  4. Drop the dashboard provisioning manifest:
    cp share/grafana/provisioning/dashboards/oni_ripper.yaml \
       /etc/grafana/provisioning/dashboards/
    
  5. Copy the dashboard JSON to the location the manifest points at:
    mkdir -p /var/lib/grafana/dashboards/oni_ripper
    cp share/grafana/dashboards/*.json \
       /var/lib/grafana/dashboards/oni_ripper/
    
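  6. Set PROMETHEUS_URL in the Grafana service’s environment (or edit the YAML’s url: directly), then restart Grafana:
    # add to the service environment file, e.g. /etc/default/grafana-server (DEB)
    # or /etc/sysconfig/grafana-server (RPM); a bare shell assignment is not seen by systemd
    PROMETHEUS_URL=http://prom.local:9090
    systemctl restart grafana-server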
    
  7. Open Grafana → Dashboards → Oninit Log Ripper folder. Both dashboards are now provisioned.
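If the dashboards come up empty after these steps, a quick first check is whether Prometheus is actually scraping the ripper. The Python sketch below asks the Prometheus HTTP API for the built-in up series of the oni_ripper job from step 2. The helper names are made up for this example, http://prom.local:9090 is the placeholder URL from step 6, and the canned response only imitates the shape of an instant-query result so the sketch runs offline.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return prometheus_url.rstrip("/") + "/api/v1/query?" + query


def targets_up(response: dict) -> dict:
    """Map instance -> True/False from an instant-query response for `up`."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % response.get("error"))
    return {
        sample["metric"].get("instance", "?"): sample["value"][1] == "1"
        for sample in response["data"]["result"]
    }


def scrape_status(prometheus_url: str, job: str = "oni_ripper") -> dict:
    """Query a live Prometheus server (needs network access)."""
    url = build_query_url(prometheus_url, 'up{job="%s"}' % job)
    with urllib.request.urlopen(url, timeout=5) as resp:
        return targets_up(json.load(resp))


# Canned response in the API's shape, so the parsing can be tried offline.
canned = {"status": "success", "data": {"resultType": "vector", "result": [
    {"metric": {"__name__": "up", "job": "oni_ripper",
                "instance": "ripper-host:9091"},
     "value": [1700000000.0, "1"]}]}}
print(targets_up(canned))  # {'ripper-host:9091': True}
```

An empty result from scrape_status("http://prom.local:9090") would suggest the scrape job is missing or misnamed. A False entry would point at the ripper endpoint itself (port disabled, loopback-only bind, or a firewall).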

Install — Oninit Datasource path

For shops already running the Oninit Grafana plugin across the Oninit product family (InformixAnalyser, snooper, etc.). The plugin queries the source Informix directly via its own protocol — no separate Prometheus deployment required, and lag / DML rates surface alongside your existing Informix dashboards.

  1. Install the Oninit Grafana plugin on the Grafana host (separate deliverable from the Oninit product team; not shipped in this package).
  2. Drop the Oninit datasource provisioning file:
    cp share/grafana/provisioning/datasources/oni_logripper_oninit.yaml \
       /etc/grafana/provisioning/datasources/
    
  3. Set the plugin connection environment in Grafana:
    ONINIT_DS_PLUGIN_TYPE=oninit-datasource    # whatever the local plugin id is
    ONINIT_DS_URL=https://informix.local:9088
    ONINIT_DS_USER=monitor
    ONINIT_DS_PASSWORD=<...>
    INFORMIXSERVER=ol_informix1410
    
  4. Follow steps 4, 5, and 7 of the Prometheus path unchanged (same dashboard manifest, same dashboard JSON), and restart Grafana as in step 6, without setting PROMETHEUS_URL.

Both datasources can coexist. Drop both provisioning files and the dashboard’s ${DS} variable lets users pick at view time which datasource backs the panel queries.

Dashboard: Capture Health

Top-of-funnel board. Eight panels arranged for an at-a-glance read on the entire capture pipeline.

Two template variables: ${DS} picks Prometheus vs. Oninit DS; ${worker} filters every panel to a worker subset (default All). Refresh defaults to 30s, time range to now-1h.

Dashboard: Worker Drilldown

Linked from the health board’s “Drilldown” link. Same metrics, narrowed to one $worker at a time. Five panels:

Operator playbook

What each panel signal means and the first thing to check.

Ad-hoc PromQL

For one-off questions outside the curated dashboards. All queries below assume the Prometheus datasource.

Total inserts/sec across all workers:

sum(rate(oni_logripper_records_total{op="insert"}[1m]))

Per-worker DML mix:

sum by (worker, op) (rate(oni_logripper_records_total[5m]))

Worst-case lag across the fleet:

max(oni_logripper_lag_seconds)

Workers currently more than 30s behind (returns no series, rather than 0, when none are):

count(oni_logripper_lag_seconds > 30)

Recovery attempts over the last hour, per worker:

increase(oni_logripper_recovery_count[1h])

Discards in the last 24 hours (any non-zero is a problem):

sum(increase(oni_logripper_records_total{op="discard"}[24h]))
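The rate() and increase() queries above work from raw counter samples, and it helps to know what they do when a ripper restart resets a counter to zero. The Python sketch below shows the arithmetic in simplified form. It is not Prometheus's implementation, which also extrapolates to the edges of the range window. The function names and the sample values are invented for this example.

```python
def counter_increase(samples):
    """Total increase across (timestamp, value) samples.

    Any drop in value is treated as a counter reset. Simplified: unlike
    Prometheus, there is no extrapolation to the window boundaries.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a restart the counter starts again from 0, so the new
        # value itself is the increase since the reset.
        total += (cur - prev) if cur >= prev else cur
    return total


def counter_rate(samples):
    """Average per-second rate over the sampled span."""
    span = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / span if span > 0 else 0.0


# Hypothetical 15 s scrapes of oni_logripper_records_total{op="insert"};
# the ripper restarts between the third and fourth sample.
scrapes = [(0, 1200.0), (15, 1230.0), (30, 1260.0), (45, 20.0), (60, 50.0)]
print(counter_increase(scrapes))  # 110.0
print(counter_rate(scrapes))
```

Because resets are absorbed this way, a restart shows up as a small undercount rather than a negative spike.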

Out of scope (today)

See config.html for the YAML knobs and reference.html for the per-key reference.