1 Commits

Author SHA1 Message Date
Nelson Yen
18082f70b4 dynstats: add opt-in state persistence; fix worker lifecycle
Operators want dynstats to survive restarts for consistent metrics and
smoother observability in containers and rolling deploys.

Before: dynstats buckets were ephemeral; restarts reset counters.
After: optional on-disk persistence restores counters; worker thread is
started on demand and torn down with the owning rsconf.

Impact: New state files under WorkDirectory (or statefile.directory)
when enabled; slight I/O overhead each time a configured threshold
triggers a write. Defaults preserve previous behavior (persistence off).

This adds two thresholds to trigger persistence, both default 0
(disabled):
- persistStateInterval (count-based)
- persistStateTimeInterval (time-based)
A new statefile.directory parameter can override WorkDirectory for
dynstats files.
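
The threshold behavior can be pictured with a small sketch. This is an
illustration in Python, not the C code from this commit: the class
PersistTrigger, its field names, the use of seconds for the time
interval, and the rule that either threshold alone triggers a write are
all assumptions made for the example.

```python
import time


class PersistTrigger:
    """Hypothetical helper: decides when a bucket's state should be written."""

    def __init__(self, count_interval=0, time_interval=0.0, clock=time.monotonic):
        self.count_interval = count_interval  # 0 disables the count-based trigger
        self.time_interval = time_interval    # 0 disables the time-based trigger (seconds assumed)
        self._clock = clock
        self._updates = 0
        self._last_persist = clock()

    def on_update(self):
        """Record one counter update; return True if state should be persisted now."""
        self._updates += 1
        now = self._clock()
        by_count = self.count_interval > 0 and self._updates >= self.count_interval
        by_time = self.time_interval > 0 and (now - self._last_persist) >= self.time_interval
        if by_count or by_time:
            # Reset both trackers so the next window starts from this write.
            self._updates = 0
            self._last_persist = now
            return True
        return False
```

With both intervals left at 0, on_update() never asks for a write,
which mirrors the "persistence off" default described above.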
On bucket creation, existing JSON state ("dynstats-state:<bucket>")
is loaded to rehydrate counters. Updates may enqueue async writes to a
lazily-started file-write worker; teardown performs a final sync flush
without holding the bucket lock to avoid I/O-induced deadlocks.

Worker lifecycle is tied to rsconf: init in dynstats_initCnf(),
start on first persistent bucket, stop in dynstats_destroyAllBuckets().
The latter now takes rsconf_t* and is invoked from rsconf destruct,
avoiding prior hangs when loadConf/runConf differed.

Per-bucket stats track flushed bytes/counts/errors; a
"file-write-worker" group reports queue size/enqueues. Docs updated;
tests add dynstats-persist(+vg) to verify restore-after-restart and
clean shutdown.
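
The restore-on-creation and flush-without-lock pattern described above
can be sketched as follows. This is a Python illustration, not the
rsyslog implementation: the Bucket class, the "<name>.json" file
layout, and the write-to-temp-then-rename step are assumptions added
for the example.

```python
import json
import os
import threading


class Bucket:
    """Hypothetical stand-in for a dynstats bucket with persisted counters."""

    def __init__(self, name, state_dir):
        self.name = name
        self.path = os.path.join(state_dir, name + ".json")  # assumed file layout
        self.lock = threading.Lock()
        self.counters = {}
        self._load()

    def _load(self):
        # Rehydrate counters from an earlier run, if a state file exists.
        try:
            with open(self.path, encoding="utf-8") as f:
                self.counters = json.load(f)
        except FileNotFoundError:
            pass  # first start: nothing to restore

    def inc(self, metric):
        with self.lock:
            self.counters[metric] = self.counters.get(metric, 0) + 1

    def flush(self):
        # Take a snapshot while holding the lock ...
        with self.lock:
            snapshot = json.dumps(self.counters)
        # ... and do the slow file I/O after releasing it, so updaters
        # are never blocked behind a disk write.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(snapshot)
        os.replace(tmp, self.path)  # atomic rename (an addition of this sketch)
```

Creating a second Bucket with the same name and directory after a
flush() starts from the saved counters, which is the behavior the
restore-after-restart test is meant to check.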

With the help of AI Agents: GitHub Copilot, cubic-dev-ai, ChatGPT codex

Co-authored-by: Rainer Gerhards <rgerhards@adiscon.com>
2026-01-20 10:56:28 +01:00