Why:
omrelp keepalive settings were accepted but could be compiled out
because configure never defined HAVE_RELPCLTSETKEEPALIVE.
Impact: keepalive settings are now applied when librelp exports
the keepalive API.
Before/After: before, keepalive options could be silently ignored;
after, they are compiled in when librelp supports them.
Technical Overview:
- Add an AC_CHECK_FUNC probe for relpCltSetKeepAlive in the RELP
configure block.
- Define HAVE_RELPCLTSETKEEPALIVE when the symbol is available.
- This aligns configure-time feature detection with the existing
omrelp compile-time guard around relpCltSetKeepAlive().
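A minimal sketch of why a missing define fails silently: the guarded call simply disappears from the build. The keepalive_cfg struct and apply_keepalive() are hypothetical names used only for illustration. In omrelp, the guarded body would call librelp's relpCltSetKeepAlive(), whose exact signature is not reproduced here. Note also that Autoconf's AC_CHECK_FUNC only runs the actions it is given and does not define a HAVE_ macro on its own (unlike AC_CHECK_FUNCS), so the define has to be spelled out explicitly.

```c
#include <stdbool.h>

/* Hypothetical keepalive settings as parsed from the omrelp action config. */
typedef struct {
    bool enabled;
    int interval;
    int probes;
    int time;
} keepalive_cfg;

/* Returns true only if the settings were actually handed to librelp.
 * Without HAVE_RELPCLTSETKEEPALIVE the call is compiled out and the
 * settings are dropped without any diagnostic. */
static bool apply_keepalive(const keepalive_cfg *cfg)
{
    if (!cfg->enabled)
        return false;
#ifdef HAVE_RELPCLTSETKEEPALIVE
    /* omrelp would call relpCltSetKeepAlive() here with cfg's values. */
    return true;
#else
    return false;
#endif
}
```

Compiled with -DHAVE_RELPCLTSETKEEPALIVE, an enabled configuration returns true. Without the define, the same configuration is still accepted but returns false, and nothing reports it.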
With the help of AI-Agents: GPT-5.3-Codex
Why:
The de-flake campaign exposed a real imptcp race in the
processOnPoller="off" path under Ubuntu 26 TSAN. Multiple helper
workers could process one session concurrently and race on parser state.
Impact:
Fixes imptcp helper-worker session handling without reducing test scope.
Before/After:
Before, helper workers could race on one session; after, one worker owns
session processing, close, and rearm at a time.
Technical Overview:
Add a per-session queued-work flag protected by rsyslog's atomic helper.
Claim session epoll work before queueing it to helper workers.
Serialize receive parsing, zlib finish, session close, and epoll rearm.
Drop duplicate events for a session that is already queued or being processed.
Release the work claim before rearming the EPOLLONESHOT descriptor so a
fresh event cannot be lost behind the processing guard.
Avoid holding a pthread mutex across recv(), which would both hurt the
hot path and trip the clang static analyzer's blocking-in-critical-section
check.
Keep listener work concurrent and preserve helper parallelism across
independent sessions.
Document the non-processing-poller test intent and oracle.
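The claim/release protocol can be sketched with C11 atomics. rsyslog uses its own atomic helper instead, and session_t, session_try_claim(), and session_finish_and_rearm() are hypothetical names. What matters is the ordering: the poller claims the session with a compare-and-swap before queueing it, and the worker releases the claim before the EPOLL_CTL_MOD rearm.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Hypothetical per-session state: true while the session is queued for a
 * helper worker or being processed by one. */
typedef struct {
    atomic_bool work_claimed;
} session_t;

/* Poller side: returns true if this event won the claim and the session
 * must be queued; false means it is a duplicate and can be dropped. */
static bool session_try_claim(session_t *s)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&s->work_claimed, &expected, true);
}

/* Worker side, after recv(), parsing and zlib finish are done (a closed
 * session would skip the rearm): release the claim first, then rearm the
 * EPOLLONESHOT descriptor, so an event raised right after the rearm can be
 * claimed again. */
static int session_finish_and_rearm(session_t *s, int epfd, int fd, void *ptr)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.ptr = ptr };

    atomic_store(&s->work_claimed, false);
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}
```

If the order were reversed (rearm, then release), an event delivered in between would see the claim still held and be dropped. Because EPOLLONESHOT already disarmed the descriptor for that event, nobody would rearm it again and the pending data would never be read.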
With the help of AI-Agents: Codex
Why:
Regular PR CI should avoid waking long-running service-backed tests when a
change only touches unrelated helper code. Kafka, imfile, and Elasticsearch
are frequent long-tail costs, so they need focused relevance gates without
weakening full CI and flake-testing workflows.
Impact:
PR CI omits Kafka, imfile, and Elasticsearch tests for unrelated helper-only
changes, while direct module/test changes and plausible shared runtime paths
still run those families. Local CI-container runs can apply the same
relevance policy before devtools/run-ci.sh.
Before/After:
Before, broad runtime patterns made these expensive families run too often;
after, they use explicit focused dependency rules with full-run overrides.
Technical Overview:
Move the remaining root-level runtime C/H files under runtime/ so path-based
rules can reason about core code consistently. Keep conservative broad
relevance for service families that do not yet have focused dependency
rules. Add focused relevance for Kafka, imfile, and Elasticsearch covering
module paths, tests, build/testbench plumbing, config/message/action/queue,
worker, template, ruleset, parser, stats, and selected family-specific
runtime helpers. Keep isolated helpers such as lookup tables, dynstats, DNS
cache, crypto/KSI, GSSAPI, and unrelated protocol helpers from waking those
families. Add devtools/apply-service-relevance.sh so GitHub Actions and local
container testing share the same relevance-to-configure suppression logic.
Centralize Elasticsearch and Kafka job decisions on the top-level
change-scope outputs so scheduled jobs always run their test body. Preserve
RSYSLOG_TESTBENCH_FORCE_SERVICE_TESTS,
RSYSLOG_TESTBENCH_FORCE_<MODULE>_TESTS, and
RSYSLOG_TESTBENCH_SKIP_SERVICE_RELEVANCE so daily, weekly, and flake runs
can still force all tests even when there are no relevant changes. Document
that AI agents must validate both the relevance decision layer and the
resulting configured test list when changing these gates.
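At its core, the focused gate is path-prefix matching plus force overrides. The real rules are shell code in tests/diag.sh and devtools/apply-service-relevance.sh. This C sketch uses an illustrative, incomplete prefix list and a hypothetical kafka_tests_needed() helper. It assumes an override is active whenever its environment variable is set to any non-empty value. Only the two generic overrides are checked here; the per-module RSYSLOG_TESTBENCH_FORCE_<MODULE>_TESTS variables are left out.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative, incomplete prefix list; the real focused rules also cover
 * tests, build/testbench plumbing and more runtime files. */
static const char *const kafka_prefixes[] = {
    "plugins/omkafka/",
    "plugins/imkafka/",
    "runtime/action.c",
    "runtime/queue.c",
    "runtime/ruleset.c",
    NULL
};

/* Assumption: an override counts as active when set to a non-empty value. */
static bool override_active(const char *name)
{
    const char *v = getenv(name);
    return v != NULL && v[0] != '\0';
}

/* Hypothetical gate: true if any changed path matches a relevant prefix,
 * or if a force/skip override requests a full run. */
static bool kafka_tests_needed(const char *const *changed, size_t n)
{
    if (override_active("RSYSLOG_TESTBENCH_FORCE_SERVICE_TESTS") ||
        override_active("RSYSLOG_TESTBENCH_SKIP_SERVICE_RELEVANCE"))
        return true;
    for (size_t i = 0; i < n; i++)
        for (const char *const *p = kafka_prefixes; *p != NULL; p++)
            if (strncmp(changed[i], *p, strlen(*p)) == 0)
                return true;
    return false;
}
```

With this list, a change to runtime/lookup.c alone leaves the family asleep, while a change to runtime/queue.c wakes it.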
Validation:
bash -n tests/diag.sh devtools/apply-service-relevance.sh
git diff --check
actionlint .github/workflows/run_checks.yml
shellcheck -S warning devtools/apply-service-relevance.sh
module-needs-testing rule matrix for kafka, imfile, elasticsearch, mysql
Temporary git-diff probes for runtime/lookup.c and runtime/action.c
Source helper checks for runtime/lookup.c and runtime/action.c
Ubuntu 26.04 container make distclean plus MOCK-OK run-ci for runtime/lookup.c
With the help of AI-Agents: Codex
Why:
The de-flake campaign exposed a get_free_port race in the OTEL
collector test helper. A parallel test could claim the selected port
before otelcol bound it, and readiness checks could then connect to the
wrong service.
Impact:
Makes OTEL-backed tests publish only collector-owned listener ports.
Before/After:
Before, OTEL tests preselected a racy port; after, otelcol binds
localhost port 0 and the testbench discovers the owned OTLP listener.
Technical Overview:
Configure the OTEL collector receiver and metrics endpoint with
localhost dynamic ports by default.
Start otelcol with exec so the stored PID owns the listener sockets.
Discover the actual OTLP HTTP port from /proc socket ownership and a
/v1/logs probe.
Write the test port file only after discovery and readiness succeed.
Keep explicit nonzero OTEL_COLLECTOR_ENDPOINT overrides working.
Move the discovery logic into an in-tree Python helper so normal Python
linting can inspect it.
Register the helper in EXTRA_DIST.
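The new approach is race-free because the port number is learned only after the socket that owns it is bound. This C sketch shows the same principle with bind() to 127.0.0.1 port 0 followed by getsockname(). It is not the collector's or the Python helper's code, and listen_on_dynamic_port() is a hypothetical name. A get_free_port-style helper, by contrast, typically releases its probe socket before passing the number on, which leaves a window in which another test can take the port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP listener on 127.0.0.1 port 0 and report the kernel-assigned
 * port. Returns the listening fd (caller closes it) or -1 on error. */
static int listen_on_dynamic_port(unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0); /* 0 = let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(fd, 16) != 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) != 0) {
        close(fd);
        return -1;
    }
    /* The port is already owned by fd, so no other process can claim it. */
    *port_out = ntohs(addr.sin_port);
    return fd;
}
```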
With the help of AI-Agents: Codex
closes https://github.com/rsyslog/rsyslog/issues/6017
Why:
The internal offAfterPRI field tracks the offset in raw messages
immediately after the PRI. This was inconsistently calculated across
modules (e.g. imuxsock omitted the closing '>') and was prone to
accepting invalid prefixes (e.g. '<>') as valid PRIs. This
caused misalignments and potential out-of-bounds risks in downstream
parser modules.
Impact:
Stabilizes syslog parsing; downstream modules consistently receive
accurate raw message text.
Before/After:
offAfterPRI was inconsistently calculated or misaligned on
malformed/special inputs; now it is centrally validated and correct.
Technical Overview:
Extracted the PRI offset logic into a strict static helper
compute_off_after_pri in runtime/parser.c to parse 1..3 digits
between '<' and '>'. Refactored ParsePRI to use this helper. Enhanced
MsgSetAfterPRIOffs in runtime/msg.c with defensive assertions to
validate offsets and enclosing brackets. Updated the legacy imuxsock
parser to set the correct offs + 1 offset when the closing '>' is
present. Created a pure C unit test checking 10 distinct
RFC3164/RFC5424 corner cases.
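A strict check matching the description above (1 to 3 digits between '<' and '>') can be sketched as follows. The signature is an assumption, since the real static helper in runtime/parser.c may take different arguments. Returning -1 stands in for whatever error reporting that helper uses, and the sketch does not range-check the numeric PRI value.

```c
#include <stddef.h>

/* Hypothetical signature. Returns the offset of the first byte after '>'
 * for a PRI of 1..3 digits, or -1 if the prefix is not a well-formed PRI. */
static int compute_off_after_pri(const unsigned char *msg, size_t len)
{
    size_t i = 1;

    if (len < 3 || msg[0] != '<')
        return -1;
    /* Consume at most three digits starting after '<'. */
    while (i < len && i <= 3 && msg[i] >= '0' && msg[i] <= '9')
        i++;
    /* i - 1 digits were consumed: require at least one and a closing '>'. */
    if (i == 1 || i >= len || msg[i] != '>')
        return -1;
    return (int)(i + 1);
}
```

For "<13>msg" this yields 4, the index of 'm'. "<>", "<1234>", and "<12" are all rejected.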
With the help of AI-Agents: Antigravity
* imfifo: implement named pipe input module
Why:
Allows rsyslog to read logs line-by-line from local POSIX named pipes
(FIFOs) without blocking the startup sequence or spinning in EOF
reopen loops when writers disconnect.
Impact:
Adds the 'imfifo' input module and registers its test suite.
Before/After:
Rsyslog had no native named pipe input capability; now imfifo
provides dynamic, non-blocking FIFO input instances.
Technical Overview:
- Integrated imfifo into the autotools build system with
--enable-imfifo.
- Implemented plugins/imfifo/imfifo.c using the modern v6 config
syntax.
- Used open(path, O_RDWR) to keep a dummy writer, avoiding
startup hangs and EOF reopen loops.
- Implemented a select()-based polling loop with a 100 ms timeout so
  shutdown requests are handled promptly.
- Split incoming chunks on newlines and submitted complete
  messages using submitMsg2.
- Created tests/imfifo.sh and tests/imfifo-vg.sh to verify
correct function and Valgrind compatibility.
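The open/poll/split techniques in the list can be sketched in C as below. fifo_open(), fifo_wait(), and line_buf are hypothetical names, not the module's API. POSIX leaves O_RDWR on a FIFO undefined. Linux supports it, and there the extra write end both keeps open() from blocking until an external writer appears and keeps read() from returning EOF when the last external writer closes. The splitter holds a trailing partial line until its newline arrives. In imfifo, each complete line would then be handed to submitMsg2.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Open the FIFO read/write so this process is also a writer (Linux). */
static int fifo_open(const char *path)
{
    return open(path, O_RDWR);
}

/* Wait up to timeout_ms for input: >0 readable, 0 timeout, -1 error. A
 * short timeout lets the input loop check for shutdown regularly. */
static int fifo_wait(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv = { .tv_sec = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}

/* Carry buffer for splitting reads into newline-terminated messages. */
typedef struct {
    char buf[4096];
    size_t used;  /* bytes buffered */
    size_t start; /* first byte not yet returned as a line */
} line_buf;

/* Drop already-returned lines, then append a new chunk.
 * Returns -1 if the chunk does not fit (sketch: no truncation policy). */
static int line_buf_append(line_buf *lb, const char *chunk, size_t n)
{
    memmove(lb->buf, lb->buf + lb->start, lb->used - lb->start);
    lb->used -= lb->start;
    lb->start = 0;
    if (n > sizeof(lb->buf) - lb->used)
        return -1;
    memcpy(lb->buf + lb->used, chunk, n);
    lb->used += n;
    return 0;
}

/* Returns the length of the next complete line (newline excluded) and sets
 * *line to it, or -1 if only a partial line is left. *line stays valid
 * until the next line_buf_append(). */
static long line_buf_next(line_buf *lb, const char **line)
{
    const char *nl = memchr(lb->buf + lb->start, '\n', lb->used - lb->start);
    long len;

    if (nl == NULL)
        return -1;
    *line = lb->buf + lb->start;
    len = (long)(nl - *line);
    lb->start += (size_t)len + 1;
    return len;
}
```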
closes https://github.com/rsyslog/rsyslog/issues/440
With the help of AI-Agents: Antigravity
* tests: widen service relevance defaults
Why: Service-backed tests were skipped for broad, non-module edits that
can still affect service integrations.
Impact: Elasticsearch, MySQL/libdbi, and Kafka setup paths run for
shared core, build, workflow, and testbench changes.
Before/After: Before, only runtime and a narrow allow-list triggered
service tests; after, common cross-cutting edits also trigger them.
Technical Overview: Extend the generic module_needs_testing()
changed-file gate in tests/diag.sh.
Treat top-level C/H changes as globally relevant because they include
shared engine files such as action.c/template.c.
Treat build and CI metadata updates (.mk, m4, workflows) as relevant
so service jobs selected by CI do not self-skip prematurely.
Treat testbench shell/testsuites edits as relevant because service
orchestration and service-specific assertions live under tests/.
Keep module-specific path matching unchanged for targeted triggering.
With the help of AI-Agents: GPT-5.3-Codex
Why:
The mmjsonparse find-json ownership fix is already present via PR #7016,
but the conflict-container path still needs explicit regression coverage.
Impact:
Adds focused normal and Valgrind testbench coverage for msgAddJSON failure
after mmjsonparse hands off a parsed JSON object.
Before/After:
Before, the negative path relied on manual reasoning and broad coverage.
After, the testbench asserts rsyslog continues processing the trigger
message, and the Valgrind wrapper checks that the parsed object is not
released twice.
Technical Overview:
1. Add mmjsonparse-find-json-conflict.sh for the conflicting-container path.
2. Add a Valgrind wrapper for the same scenario.
3. Register both tests in tests/Makefile.am.
With the help of AI-Agents: Codex
Why:
da-mainmsg-q is meant to exercise disk-assisted main queue draining,
but its diagnostic injector could overrun the deliberately tiny queue
under CI stress. That made the test report message loss before it had
actually isolated the DA queue behavior it intends to verify.
Impact:
Reduces da-mainmsg-q flakes without weakening the tested DA queue oracle.
Before/After:
Before, imdiag injected a 2000-message burst as non-delayable traffic;
after, the burst participates in queue flow control and the final output
count is observed before shutdown.
Technical Overview:
Set RSTB_IMDIAG_INJECT_DELAY_MODE=full before generate_conf so imdiag
marks generated messages as fully delayable. This keeps the test's small
queue configuration intact while avoiding diagnostic-input loss as a side
effect of the stress setup.
The test still verifies the complete sequence 0..2099 after forcing DA
mode. It now also waits for the final 2100 output lines after the post-DA
recovery burst, so shutdown is not used as a substitute for the omfile
output oracle.
The header comment was updated to document the setup, stimulus, oracle,
and why the injection mode is part of the test plumbing rather than the
behavior under test.
With the help of AI-Agents: OpenAI Codex
Why: percentile stats buckets should become visible only after they are fully initialized.
Impact: avoids publishing partially initialized buckets and removes redundant teardown unlinking.
Before/After: before, bucket construction linked the new bucket into the list before stats registration; after, list insertion is the final locked step.
Technical Overview:
Move perctile bucket insertion after stats object and metric setup complete.
Protect the list mutation with the existing perctile bucket-list rwlock.
Remove the helper that unlinked buckets during destruction, as failed setup never publishes buckets and normal teardown already pops buckets before destruction.
With the help of AI-Agents: GPT-5.5
Co-authored-by: Andre Lorbach <alorbach@adiscon.com>
Why: keep the percentile stats teardown cleanup path consistent and easier to audit.
Impact: no behavior change; cleanup code now uses the existing helper.
Before/After: before, one error path manually destroyed counters; after, it uses the shared helper.
Technical Overview:
Use perctileDestroyCounter() in the perctileAddBucketMetrics() finalize path.
This keeps reference clearing in one helper and mirrors dynstats cleanup style.
The helper still handles NULL references and clears each counter handle after destruction.
With the help of AI-Agents: GPT-5.5
Co-authored-by: Andre Lorbach <alorbach@adiscon.com>
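A helper in the style described for perctileDestroyCounter() might look like the sketch below. counter_t and destroy_counter() are hypothetical, and free() stands in for the real stats counter destruction. Because the helper takes a pointer to the handle, it can clear that handle, so a repeated call on an error path is a harmless no-op rather than a double destroy.

```c
#include <stdlib.h>

/* Hypothetical counter handle. */
typedef struct {
    long value;
} counter_t;

/* NULL-safe, idempotent destroy: frees the counter (if any) and clears the
 * caller's handle so a second call on the same reference does nothing. */
static void destroy_counter(counter_t **ref)
{
    if (ref == NULL || *ref == NULL)
        return;
    free(*ref);
    *ref = NULL;
}
```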
Why: stats-enabled reloads and shutdowns must not hang while tearing down dynamic statistics buckets.
Impact: percentile and dynstats teardown no longer inverts locks with impstats scrapes.
Before/After: before, teardown could hold bucket locks while waiting for the global stats list mutex; after, stats objects are detached before bucket storage is freed.
Technical Overview:
Detach per-bucket stats objects from the global stats registry before taking bucket teardown locks.
Unlink counter lists before stats object destruction so backing counter handles can be released after registry removal.
Pop buckets from the configured bucket list one at a time, releasing the list lock before object teardown.
Track percentile bucket parent state so global counter handles can be released correctly after the global stats object is detached.
Reuse helper cleanup paths to clear counter references and avoid double destruction after partial setup failures.
With the help of AI-Agents: GPT-5.5
Co-authored-by: Andre Lorbach <alorbach@adiscon.com>
Wrap tcpsrv/tcps_sess objUse/objRelease of lmregexp in FEATURE_REGEXP guards.
This prevents unconditional lmregexp dependency in regexp-disabled builds while preserving existing regex framing behavior when FEATURE_REGEXP is enabled.
Why: Fix reachable unload cleanup corruption in regex cache teardown.
Impact: Prevents module-unload crashes when per-thread cache remains.
Before/After: Before, class-exit could free perthread entries twice; after, each entry is freed once.
Technical Overview:
- perthread_regexs stores the same pointer as key and value.
- hashtable_destroy(..., 1) frees both key and value.
- That pattern double-frees perthread_regex_t entries on class exit.
- Destroy perthread_regexs with free_values=0 so only keys are freed.
- Keep regex_to_uncomp destruction unchanged because key/value differ.
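The ownership rule can be sketched with a minimal stand-in for the table entries. entry_t and destroy_entries() are hypothetical. They mirror the described hashtable_destroy(h, free_values) contract, where keys are always freed and values only when free_values is nonzero. The sketch adds a debug assertion that catches the key == value misuse before it becomes a double free.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal stand-in for a hashtable entry. */
typedef struct {
    void *key;
    void *value;
} entry_t;

/* Keys are always freed; values only when free_values is nonzero. The
 * assertion flags the key == value case that would be freed twice. */
static void destroy_entries(entry_t *e, size_t n, int free_values)
{
    for (size_t i = 0; i < n; i++) {
        assert(!free_values || e[i].key != e[i].value);
        free(e[i].key);
        if (free_values)
            free(e[i].value);
        e[i].key = NULL;
        e[i].value = NULL;
    }
}
```

Under this contract, perthread_regexs (same pointer as key and value) is torn down with free_values = 0. regex_to_uncomp, whose keys and values are distinct allocations, keeps free_values = 1.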
With the help of AI-Agents: Codex