The ripper exposes capture health as Prometheus metrics. Drop-in provisioning files + curated dashboards ship in the package so an operator can wire the ripper into Grafana with no JSON authoring. This page covers the metric set, the install path for two datasource flavors (Prometheus or the Oninit Grafana plugin), the panels in each dashboard, and the operator playbook for the SLO signals.
An embedded HTTP server inside the ripper serves /metrics in Prometheus text-exposition format v0.0.4 and /healthz for orchestrator liveness probes. The endpoint is default-disabled: the ripper opens no listening sockets unless monitoring.prometheus.port is non-zero. Enable in the YAML:
monitoring:
  prometheus:
    port: 9091
    bind: "0.0.0.0"   # 127.0.0.1 to keep loopback-only
The default bind is 127.0.0.1, so a freshly enabled endpoint carries no external attack surface. Set bind to 0.0.0.0 to let a Prometheus server on a different host scrape it. The embedded server is plain HTTP with no TLS and no authentication, so put it behind a reverse proxy (nginx / haproxy / Caddy) or a tunnel if the link crosses an untrusted network.
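The enable and bind rules can be summarized in a few lines of Python. This is only an illustrative sketch: the function name exposure and its return strings are hypothetical and not part of the ripper. It assumes the documented behavior that a zero port disables the endpoint and that the default bind is 127.0.0.1.

```python
import ipaddress


def exposure(port: int = 0, bind: str = "127.0.0.1") -> str:
    """Classify how far the metrics endpoint is reachable for a given config.

    Hypothetical helper for illustration; not part of the ripper.
    """
    if port == 0:
        # Port 0 means the ripper opens no listening socket at all.
        return "disabled"
    if ipaddress.ip_address(bind).is_loopback:
        # Only processes on the same host can scrape.
        return "loopback-only"
    # 0.0.0.0 or a routable address: reachable from other hosts over plain HTTP.
    return "network-reachable"


print(exposure())                 # disabled
print(exposure(9091))             # loopback-only
print(exposure(9091, "0.0.0.0"))  # network-reachable
```

A check like this could run in a deployment pipeline to flag any config that ends up network-reachable without a proxy in front of it.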
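Five metric families ship in v1, all named with the oni_logripper_ prefix per CNCF Prometheus naming convention:
  oni_logripper_build_info (gauge): constant 1 with build metadata labels
  oni_logripper_records_total (counter; labels worker, op): records emitted per worker per op
  oni_logripper_lag_seconds (gauge; label worker): real-time capture lag in seconds
  oni_logripper_recovery_count (counter; label worker): worker recovery attempts
  oni_logripper_worker_running (gauge; label worker): 1 if the worker is running, 0 otherwise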
Sample scrape:
$ curl -s http://ripper-host:9091/metrics
# HELP oni_logripper_build_info Constant 1 with build metadata labels.
# TYPE oni_logripper_build_info gauge
oni_logripper_build_info{version="1.0.0"} 1
# HELP oni_logripper_records_total Records emitted per worker per op.
# TYPE oni_logripper_records_total counter
oni_logripper_records_total{worker="0",op="insert"} 1234
oni_logripper_records_total{worker="0",op="update"} 567
oni_logripper_records_total{worker="0",op="delete"} 12
oni_logripper_records_total{worker="0",op="truncate"} 0
oni_logripper_records_total{worker="0",op="discard"} 0
oni_logripper_records_total{worker="1",op="insert"} 845
oni_logripper_records_total{worker="1",op="update"} 230
# HELP oni_logripper_lag_seconds Real-time capture lag in seconds.
# TYPE oni_logripper_lag_seconds gauge
oni_logripper_lag_seconds{worker="0"} 3
oni_logripper_lag_seconds{worker="1"} 2
# HELP oni_logripper_recovery_count Worker recovery attempts.
# TYPE oni_logripper_recovery_count counter
oni_logripper_recovery_count{worker="0"} 0
oni_logripper_recovery_count{worker="1"} 0
# HELP oni_logripper_worker_running 1 if worker is running, 0 otherwise.
# TYPE oni_logripper_worker_running gauge
oni_logripper_worker_running{worker="0"} 1
oni_logripper_worker_running{worker="1"} 1
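For ad-hoc checks without a Prometheus server, the text format is simple enough to parse by hand. The following is a minimal sketch, not a full exposition-format parser: it ignores escaped quotes inside label values and any trailing timestamps, and the names parse_metrics, LINE, and LABEL are illustrative. The sample text is a subset of the scrape shown above.

```python
import re
from collections import defaultdict

# A few lines copied from the sample scrape.
SAMPLE = '''\
# HELP oni_logripper_records_total Records emitted per worker per op.
# TYPE oni_logripper_records_total counter
oni_logripper_records_total{worker="0",op="insert"} 1234
oni_logripper_records_total{worker="0",op="update"} 567
oni_logripper_records_total{worker="1",op="insert"} 845
oni_logripper_lag_seconds{worker="0"} 3
oni_logripper_lag_seconds{worker="1"} 2
'''

# metric_name, optional {labels}, then the sample value
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return {metric_name: [(labels_dict, float_value), ...]}."""
    samples = defaultdict(list)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = LINE.match(line)
        if m:
            labels = dict(LABEL.findall(m.group("labels") or ""))
            samples[m.group("name")].append((labels, float(m.group("value"))))
    return samples


metrics = parse_metrics(SAMPLE)
inserts = sum(
    v for labels, v in metrics["oni_logripper_records_total"]
    if labels["op"] == "insert"
)
worst_lag = max(v for _, v in metrics["oni_logripper_lag_seconds"])
print(inserts, worst_lag)  # 2079.0 3.0
```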
/healthz returns HTTP 200 with body OK when every worker reports error == 0, otherwise HTTP 503 with body UNHEALTHY. Standard shape for orchestrator liveness probes.
# Kubernetes liveness probe
livenessProbe:
  httpGet:
    path: /healthz
    port: 9091
  initialDelaySeconds: 10
  periodSeconds: 30
# systemd readiness check (via curl)
ExecStartPost=/usr/bin/curl --fail --silent http://127.0.0.1:9091/healthz
# Datadog Agent http_check
init_config:
instances:
  - name: oni_ripper
    url: http://ripper-host:9091/healthz
    timeout: 5
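Where none of the probes above fit (a cron job or a custom supervisor, say), the same /healthz contract can be checked from a script. This is a minimal sketch using only the Python standard library; the function names is_healthy and probe are illustrative, and stripping trailing whitespace from the body is an assumption about the exact response bytes.

```python
import urllib.error
import urllib.request


def is_healthy(status: int, body: str) -> bool:
    """Apply the documented contract: 200 with body OK is healthy, anything else is not."""
    return status == 200 and body.strip() == "OK"


def probe(url: str, timeout: float = 5.0) -> bool:
    """Fetch /healthz and classify the result; network failures count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        # urllib raises for non-2xx responses such as the 503 UNHEALTHY case.
        return is_healthy(err.code, err.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


print(is_healthy(200, "OK"))         # True
print(is_healthy(503, "UNHEALTHY"))  # False
```

Calling probe("http://127.0.0.1:9091/healthz") on the ripper host would then return True only while every worker is error-free.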
The package places provisioning + dashboards under share/grafana/ (typically /usr/share/oni_ripper/grafana/ when installed via the RPM / DEB):
share/grafana/
├── provisioning/
│   ├── datasources/
│   │   ├── oni_logripper_prometheus.yaml   # Prometheus DS
│   │   └── oni_logripper_oninit.yaml       # Oninit Grafana plugin DS
│   └── dashboards/
│       └── oni_ripper.yaml                 # dashboard provider
└── dashboards/
    ├── oni_logripper_capture_health.json      # main board
    └── oni_logripper_capture_drilldown.json   # per-worker drilldown
The canonical CNCF setup. A Prometheus server scrapes the ripper’s /metrics on a 15-30s cadence and rolls up across all instances. Grafana queries Prometheus.
scrape_configs:
  - job_name: oni_ripper
    static_configs:
      - targets: ["ripper-host:9091"]
# Datasource and dashboard-provider provisioning files
cp share/grafana/provisioning/datasources/oni_logripper_prometheus.yaml \
   /etc/grafana/provisioning/datasources/
cp share/grafana/provisioning/dashboards/oni_ripper.yaml \
   /etc/grafana/provisioning/dashboards/
# Dashboard JSON
mkdir -p /var/lib/grafana/dashboards/oni_ripper
cp share/grafana/dashboards/*.json \
   /var/lib/grafana/dashboards/oni_ripper/
# Set in the Grafana service environment, then restart Grafana
PROMETHEUS_URL=http://prom.local:9090
systemctl restart grafana-server
For shops already running the Oninit Grafana plugin across the Oninit product family (InformixAnalyser, snooper, etc.). The plugin queries the source Informix directly via its own protocol — no separate Prometheus deployment required, and lag / DML rates surface alongside your existing Informix dashboards.
cp share/grafana/provisioning/datasources/oni_logripper_oninit.yaml \
   /etc/grafana/provisioning/datasources/
ONINIT_DS_PLUGIN_TYPE=oninit-datasource   # whatever the local plugin id is
ONINIT_DS_URL=https://informix.local:9088
ONINIT_DS_USER=monitor
ONINIT_DS_PASSWORD=<...>
INFORMIXSERVER=ol_informix1410
Both can coexist. Drop both provisioning files and the dashboard’s ${DS} variable lets users pick at view time which datasource backs the panel queries.
Top-of-funnel board. Eight panels arranged for an at-a-glance read on the entire capture pipeline.
Two template variables: ${DS} picks Prometheus vs. Oninit DS; ${worker} filters every panel to a worker subset (default All). Refresh defaults to 30s, time range to now-1h.
Linked from the health board’s “Drilldown” link. Same metrics, narrowed to one $worker at a time. Five panels:
What each panel signal means and the first thing to check.
For one-off questions outside the curated dashboards. All queries below assume the Prometheus datasource.
Total inserts/sec across all workers:
sum(rate(oni_logripper_records_total{op="insert"}[1m]))
Per-worker DML mix:
sum by (worker, op) (rate(oni_logripper_records_total[5m]))
Worst-case lag across the fleet:
max(oni_logripper_lag_seconds)
Workers currently behind > 30s:
count(oni_logripper_lag_seconds > 30)
Recovery rate over the last hour:
increase(oni_logripper_recovery_count[1h])
Discards in the last 24 hours (any non-zero is a problem):
sum(increase(oni_logripper_records_total{op="discard"}[24h]))
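Any of the queries above can also be run from a script through the Prometheus HTTP API instant-query endpoint (/api/v1/query). The following is a minimal sketch using only the standard library; the helper names query_url and instant_query are illustrative, and the base URL reuses the prom.local:9090 address from the install steps as an assumed location of the Prometheus server.

```python
import json
import urllib.parse
import urllib.request


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base.rstrip('/')}/api/v1/query?{params}"


def instant_query(base: str, promql: str, timeout: float = 10.0):
    """Run the query and return the list of result series from the JSON response."""
    with urllib.request.urlopen(query_url(base, promql), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


# Building the URL needs no network access; instant_query() does.
print(query_url("http://prom.local:9090", "max(oni_logripper_lag_seconds)"))
```

For a vector result, each returned series carries its labels under "metric" and a [timestamp, value] pair under "value", so the worst-case lag query yields a single series whose value is the lag in seconds as a string.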
See config.html for the YAML knobs and reference.html for the per-key reference.