20848 Commits

Author SHA1 Message Date
Rainer Gerhards
2d44870958
Merge pull request #7080 from rgerhards/ai-fix-lookup-table-flakes
lookup: report active table reloads as pending
2026-05-28 08:27:21 +02:00
Rainer Gerhards
0a1cf045d5
Merge pull request #7093 from rsyslog/codex/propose-fix-for-fromhost-port-vulnerability
runtime: fix fromhost-port extraction before rcvFrom union swap
2026-05-28 08:05:18 +02:00
Rainer Gerhards
12a5ccfc21
Merge pull request #7074 from rsyslog/codex/propose-fix-for-retry-wrapper-issue
tests: clear stale retry fail marker
2026-05-28 07:58:38 +02:00
Rainer Gerhards
6e23c9655c
Merge pull request #7077 from rsyslog/codex/fix-omusrmsg-rate-limiter-thread-safety
omusrmsg: enable thread-safe action ratelimiter
2026-05-28 07:57:06 +02:00
Rainer Gerhards
37b83722eb runtime: clean up DNS props on port allocation failure 2026-05-27 20:44:59 +02:00
Rainer Gerhards
3c5bc56efe runtime: fix fromhost-port extraction before rcvFrom union swap 2026-05-27 19:06:10 +02:00
Rainer Gerhards
6157fb6d69
Merge pull request #7075 from rsyslog/ai-explore-mysql-flakes
tests: avoid mysql action queue timeout flakes
2026-05-27 17:02:39 +02:00
Rainer Gerhards
5b7197b2dd tests: suppress retry marker for intermediate attempts 2026-05-27 17:00:17 +02:00
Rainer Gerhards
974e345de4 runtime: initialize ratelimit mutex once 2026-05-27 17:00:17 +02:00
Rainer Gerhards
da0b5b8baa
lookup: report active table reloads as pending
Why:
The de-flake campaign exposed lookup-table reload tests that could resume
after HUP while the reload worker was still installing the replacement
table. The wait helper saw no pending reload and injected messages against
stale or stub lookup state.

Impact:
Lookup reload waiters now observe the full reload lifecycle.

Before/After:
Before, only queued reload requests were pending; after, an active reload
also remains pending until the table swap completes.

Technical Overview:
Track the interval after the reloader consumes do_reload but before
lookupDoReload() returns. lookupPendingReloadCount() now treats that
interval as pending, so imdiag's AwaitLookupTableReload command cannot
return while a reload is still applying. Initialize and clear the new state
alongside the existing reloader flags to keep startup, activation, and
shutdown state consistent.
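
The pending-interval idea can be sketched as follows. This is a minimal illustration, not the code in the lookup subsystem: the struct, its field names, and all function names are hypothetical, and a plain pthread mutex stands in for whatever synchronization the reloader really uses.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical reloader state; field names are illustrative only. */
struct reload_state {
    pthread_mutex_t mut;
    bool do_reload;     /* a reload request is queued */
    bool reload_active; /* request consumed, table swap not finished */
};

static void reload_state_init(struct reload_state *st) {
    pthread_mutex_init(&st->mut, NULL);
    st->do_reload = false;
    st->reload_active = false;
}

/* Requester side: queue a reload (for example after HUP). */
static void reload_request(struct reload_state *st) {
    pthread_mutex_lock(&st->mut);
    st->do_reload = true;
    pthread_mutex_unlock(&st->mut);
}

/* Worker side: consume the request and mark the reload active in one
 * critical section, so there is no instant with neither flag set. */
static bool reload_begin(struct reload_state *st) {
    bool claimed = false;
    pthread_mutex_lock(&st->mut);
    if (st->do_reload) {
        st->do_reload = false;
        st->reload_active = true;
        claimed = true;
    }
    pthread_mutex_unlock(&st->mut);
    return claimed;
}

/* Worker side: call after the replacement table is installed. */
static void reload_end(struct reload_state *st) {
    pthread_mutex_lock(&st->mut);
    st->reload_active = false;
    pthread_mutex_unlock(&st->mut);
}

/* Waiter side: queued and active reloads both count as pending. */
static int reload_pending_count(struct reload_state *st) {
    pthread_mutex_lock(&st->mut);
    int n = (st->do_reload ? 1 : 0) + (st->reload_active ? 1 : 0);
    pthread_mutex_unlock(&st->mut);
    return n;
}
```

A waiter that polls reload_pending_count() until it returns 0 cannot resume between reload_begin() and reload_end(), which is the window the flaky tests were hitting.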

Validation:
- ./autogen.sh --enable-debug --enable-testbench --enable-imdiag --enable-omstdout
- make -j$(nproc) check TESTS=""
- ./tests/array_lookup_table.sh
- ./tests/lookup_table_bad_configs.sh
- git diff --check
- Ubuntu 26.04 dev container focused lookup reload run, 9/9 pass:
  array_lookup_table.sh array_lookup_table-vg.sh
  array_lookup_table_misuse-vg.sh lookup_table_bad_configs.sh
  lookup_table_bad_configs-vg.sh lookup_table_rscript_reload.sh
  lookup_table_rscript_reload-vg.sh
  lookup_table_rscript_reload_without_stub.sh
  lookup_table_rscript_reload_without_stub-vg.sh

With the help of AI-Agents: OpenAI Codex
2026-05-27 16:16:54 +02:00
Rainer Gerhards
9378fffaec omusrmsg: enable thread-safe action ratelimiter 2026-05-27 15:55:02 +02:00
Rainer Gerhards
1a33aa202c
tests: relax mysql action queue enqueue timeout
Why:
The MySQL action queue tests validate lossless ommysql delivery under
bounded queue pressure. Recent flake evidence showed the fixed 30000ms
enqueue timeout becoming the effective oracle when CI load or the local
MySQL server delayed draining.

Impact:
The tests allow a slower stressed MySQL service to drain queued bursts,
but still fail if the action queue cannot make progress within 80s.

Before/After:
Before, the tests could drop one message after a 30000ms enqueue wait.
After, they use an 80000ms CI tolerance budget and still pass only when
the final MySQL sequence is complete.

Technical Overview:
Keep the queue size and worker-thread settings that exercise bounded
multi-worker action queue behavior. Raise queue.timeoutEnqueue to 80000ms
rather than removing the timeout entirely, so persistent database stalls
or test plumbing problems remain visible. Add head comments documenting
each test's invariant, stimulus, oracle, and why the timeout is a bounded
CI tolerance rather than the behavior under test.

Validation:
- bash -n tests/mysql-actq-mt.sh tests/mysql-actq-mt-withpause.sh
- git diff --check

With the help of AI-Agents: Codex
2026-05-27 15:48:42 +02:00
Rainer Gerhards
1b3fe4cdc8 tests: clear stale retry fail marker
Why:
The retry wrapper can inherit a failure marker left by the first
attempt. That marker can block the second attempt when abort-all is
enabled and can leave stale failure state after a successful retry.

Impact:
Retry behavior is now deterministic and does not leak stale testbench
failure state across attempts.

Before/After:
Before: retries could short-circuit or leave false failure artifacts.
After: retries clear marker state before rerun and on success.

Technical Overview:
The wrapper now removes testbench_test_failed_rsyslog when a retry
attempt succeeds.
It also removes the same marker after a failed attempt when another
attempt remains.
This preserves the intended retry flow while keeping final-failure
reporting unchanged on the last attempt.
The change is scoped to the test wrapper and does not modify runtime
code paths.
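
The real wrapper is a shell script; the C sketch below only mirrors its control flow as described above. The callback type, run_with_retry(), and the flaky_attempt() demo are all hypothetical names introduced for illustration.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical attempt callback: returns 0 on success, non-zero on failure. */
typedef int (*attempt_fn)(void *ctx);

static int run_with_retry(attempt_fn attempt, void *ctx, int max_attempts,
                          const char *marker_path) {
    int rc = -1;
    for (int i = 1; i <= max_attempts; ++i) {
        rc = attempt(ctx);
        if (rc == 0) {
            unlink(marker_path); /* success: drop any marker left earlier */
            return 0;
        }
        if (i < max_attempts) {
            unlink(marker_path); /* another attempt remains: clear marker */
        }
        /* last attempt failed: leave the marker for final reporting */
    }
    return rc;
}

/* Demo attempt: fails on the first call and leaves a marker, then succeeds. */
struct flaky_ctx {
    const char *marker_path;
    int calls;
};

static int flaky_attempt(void *p) {
    struct flaky_ctx *c = p;
    if (c->calls++ == 0) {
        FILE *f = fopen(c->marker_path, "w");
        if (f != NULL) fclose(f);
        return 1;
    }
    return 0;
}
```

With two attempts allowed, flaky_attempt() fails once, the marker is cleared before the rerun, and no marker file remains after the successful second attempt.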

With the help of AI-Agents: GPT-5.3-Codex
2026-05-27 15:45:33 +02:00
Rainer Gerhards
ea8589dfdf
Merge pull request #7073 from rsyslog/codex/fix-documentation-for-sd_id-parameter
doc: correct mmrfc5424addhmac SD-ID parameter spelling
2026-05-27 15:43:19 +02:00
Rainer Gerhards
212133bb1d
Merge pull request #7058 from rsyslog/codex/propose-fix-for-regexp-unload-double-free
runtime: avoid regexp unload double free
2026-05-27 15:40:09 +02:00
Rainer Gerhards
cc9f436e9f
Merge pull request #7071 from rsyslog/codex/fix-omrelp-keepalive-compilation-issue
configure: detect relpCltSetKeepAlive support
2026-05-27 15:38:45 +02:00
Rainer Gerhards
6a4feaca75
Merge pull request #7072 from rsyslog/codex/fix-omclickhouse-enable-flag-in-dockerfile
docker: fix ClickHouse configure flag in Debian 13 dev image
2026-05-27 15:32:17 +02:00
Rainer Gerhards
265f0cb2c2
Merge pull request #7064 from rsyslog/ai-fix-imptcp-nonprocessing-poller
imptcp: serialize helper work per session
2026-05-27 15:31:12 +02:00
Rainer Gerhards
3fddf8db8f doc: fix mmrfc5424addhmac sd_id parameter name 2026-05-27 15:30:04 +02:00
Rainer Gerhards
bb9dc79120
Merge pull request #7068 from rgerhards/ai-ci-relevance-gates
ci: gate expensive PR test families
2026-05-27 14:49:14 +02:00
Rainer Gerhards
fea61cc350 docker: fix clickhouse configure flag in Debian 13 dev image 2026-05-27 14:02:19 +02:00
Rainer Gerhards
5164242c94
Merge pull request #7069 from rgerhards/codex/i4199-ommysql-disconnect
ommysql: guard transaction commit after disconnect
2026-05-27 13:58:12 +02:00
Rainer Gerhards
e02fd0853d runtime: guard per-thread regexp iterator allocation 2026-05-27 13:35:13 +02:00
Rainer Gerhards
f1f34f0460
Merge pull request #7065 from rsyslog/ai-otel-port0-probe
testbench: bind OTEL collector on dynamic port
2026-05-27 13:27:45 +02:00
Rainer Gerhards
ffafda3b55 configure: detect relpCltSetKeepAlive support
Why:
omrelp keepalive settings were accepted but could be compiled out
because configure never defined HAVE_RELPCLTSETKEEPALIVE.

Impact: keepalive settings are now applied when librelp exports
the keepalive API.

Before/After: before, keepalive options could be ignored silently;
after, they are compiled in when supported by librelp.

Technical Overview:
- Add an AC_CHECK_FUNC probe for relpCltSetKeepAlive in the RELP
  configure block.
- Define HAVE_RELPCLTSETKEEPALIVE when the symbol is available.
- This aligns configure-time feature detection with the existing
  omrelp compile-time guard around relpCltSetKeepAlive().
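
The shape of such a compile-time guard can be sketched as below. This is an illustration of the pattern only: the keepalive_cfg struct and apply_keepalive() are hypothetical, and the librelp call itself is left as a comment so the sketch builds without librelp.

```c
/* HAVE_RELPCLTSETKEEPALIVE would come from config.h when the configure
 * probe finds the symbol; here it is undefined unless the build passes
 * -DHAVE_RELPCLTSETKEEPALIVE. */

/* Hypothetical settings holder; not the module's real instance data. */
struct keepalive_cfg {
    int enabled;
    int probes;
    int interval;
    int idle_time;
};

/* Returns 1 if the keepalive path is compiled in, 0 if the settings
 * are accepted but have no effect in this build. */
static int apply_keepalive(const struct keepalive_cfg *cfg) {
#ifdef HAVE_RELPCLTSETKEEPALIVE
    /* Real code would call relpCltSetKeepAlive() here with cfg values. */
    (void)cfg;
    return 1;
#else
    (void)cfg;
    return 0;
#endif
}
```

If configure never defines the macro, the #else branch is always taken, which matches the silent-ignore behavior described under Before/After.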

With the help of AI-Agents: GPT-5.3-Codex
2026-05-27 13:24:10 +02:00
Rainer Gerhards
acea40e0c4 imptcp: serialize helper work per session
Why:
The de-flake campaign exposed a real imptcp race in the
processOnPoller="off" path under Ubuntu 26 TSAN. Multiple helper
workers could process one session concurrently and race on parser state.

Impact:
Fixes imptcp helper-worker session handling without reducing test scope.

Before/After:
Before, helper workers could race on one session; after, one worker owns
session processing, close, and rearm at a time.

Technical Overview:
Add a per-session queued-work flag protected by rsyslog's atomic helper.
Claim session epoll work before queueing it to helper workers.
Serialize receive parsing, zlib finish, session close, and epoll rearm.
Drop duplicate same-session events while already queued or processing.
Release the work claim before rearming the EPOLLONESHOT descriptor so a
fresh event cannot be lost behind the processing guard.
Avoid holding a pthread mutex across recv(), which would both hurt the
hot path and trip the clang static analyzer's blocking-in-critical-section
check.
Keep listener work concurrent and preserve helper parallelism across
independent sessions.
Document the non-processing-poller test intent and oracle.
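
The claim, process, release, rearm ordering can be sketched as follows. This is a minimal illustration under stated assumptions: it uses C11 <stdatomic.h> in place of rsyslog's own atomic helper, and the session struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical session object; only the claim flag matters here. */
struct session {
    atomic_bool work_claimed; /* true while queued or being processed */
    int events_handled;
};

static void session_init(struct session *s) {
    atomic_init(&s->work_claimed, false);
    s->events_handled = 0;
}

/* Poller side: try to claim the session before queueing helper work.
 * Returns false for a duplicate event, which the caller drops. */
static bool session_try_claim(struct session *s) {
    bool expected = false;
    return atomic_compare_exchange_strong(&s->work_claimed, &expected, true);
}

/* Helper side: release the claim. Call this BEFORE rearming the one-shot
 * descriptor so an event that fires right after the rearm can claim. */
static void session_release(struct session *s) {
    atomic_store(&s->work_claimed, false);
}

/* Helper side: process one claimed session. rearm() stands in for the
 * epoll_ctl(EPOLL_CTL_MOD, ...) call that re-enables EPOLLONESHOT. */
static void session_process(struct session *s,
                            void (*rearm)(struct session *)) {
    s->events_handled++; /* receive and parse would happen here, no mutex held */
    session_release(s);
    if (rearm != NULL) rearm(s);
}
```

Because the claim is a single compare-and-swap rather than a mutex held across recv(), independent sessions still proceed in parallel while a second event for the same session is dropped until the first worker releases it.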

With the help of AI-Agents: Codex
2026-05-27 13:20:25 +02:00
Rainer Gerhards
83807078ce
Merge pull request #7037 from rgerhards/antigravity-i-6017
parser: verify and fix offAfterPRI calculation
2026-05-27 13:18:11 +02:00
Rainer Gerhards
9c83614ce4
Merge pull request #7057 from rsyslog/codex/propose-fix-for-addlf-signature-vulnerability
runtime: bound KSI debug record logging by length
2026-05-27 13:17:19 +02:00
Rainer Gerhards
1e80f70356 ci: gate expensive PR test families
Why:
Regular PR CI should avoid waking long-running service-backed tests when a
change only touches unrelated helper code. Kafka, imfile, and Elasticsearch
are frequent long-tail costs, so they need focused relevance gates without
weakening full CI and flake-testing workflows.

Impact:
PR CI omits Kafka, imfile, and Elasticsearch tests for unrelated helper-only
changes, while direct module/test changes and plausible shared runtime paths
still run those families. Local CI-container runs can apply the same
relevance policy before devtools/run-ci.sh.

Before/After:
Before, broad runtime patterns made these expensive families run too often;
after, they use explicit focused dependency rules with full-run overrides.

Technical Overview:
Move the remaining root-level runtime C/H files under runtime/ so path-based
rules can reason about core code consistently. Keep conservative broad
relevance for service families that do not yet have focused dependency
rules. Add focused relevance for Kafka, imfile, and Elasticsearch covering
module paths, tests, build/testbench plumbing, config/message/action/queue,
worker, template, ruleset, parser, stats, and selected family-specific
runtime helpers. Keep isolated helpers such as lookup tables, dynstats, DNS
cache, crypto/KSI, GSSAPI, and unrelated protocol helpers from waking those
families. Add devtools/apply-service-relevance.sh so GitHub Actions and local
container testing share the same relevance-to-configure suppression logic.
Centralize Elasticsearch and Kafka job decisions on the top-level
change-scope outputs so scheduled jobs always run their test body. Preserve
RSYSLOG_TESTBENCH_FORCE_SERVICE_TESTS,
RSYSLOG_TESTBENCH_FORCE_<MODULE>_TESTS, and
RSYSLOG_TESTBENCH_SKIP_SERVICE_RELEVANCE so daily, weekly, and flake runs
can still force all tests even when there are no relevant changes. Document
that AI agents must validate both the relevance decision layer and the
resulting configured test list when changing these gates.

Validation:
bash -n tests/diag.sh devtools/apply-service-relevance.sh
git diff --check
actionlint .github/workflows/run_checks.yml
shellcheck -S warning devtools/apply-service-relevance.sh
module-needs-testing rule matrix for kafka, imfile, elasticsearch, mysql
Temporary git-diff probes for runtime/lookup.c and runtime/action.c
Source helper checks for runtime/lookup.c and runtime/action.c
Ubuntu 26.04 container make distclean plus MOCK-OK run-ci for runtime/lookup.c

With the help of AI-Agents: Codex
2026-05-27 12:46:58 +02:00
Rainer Gerhards
9ccb1dad14 testbench: bind OTEL collector on dynamic port
Why:
The de-flake campaign exposed a get_free_port race in the OTEL
collector test helper. A parallel test could claim the selected port
before otelcol bound it, while readiness checks still connected to the
wrong service.

Impact:
Makes OTEL-backed tests publish only collector-owned listener ports.

Before/After:
Before, OTEL tests preselected a racy port; after, otelcol binds
localhost port 0 and the testbench discovers the owned OTLP listener.

Technical Overview:
Configure the OTEL collector receiver and metrics endpoint with
localhost dynamic ports by default.
Start otelcol with exec so the stored PID owns the listener sockets.
Discover the actual OTLP HTTP port from /proc socket ownership and a
/v1/logs probe.
Write the test port file only after discovery and readiness succeed.
Keep explicit nonzero OTEL_COLLECTOR_ENDPOINT overrides working.
Move the discovery logic into an in-tree Python helper so normal Python
linting can inspect it.
Register the helper in EXTRA_DIST.
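
The underlying port-0 technique can be shown in-process with a short C sketch. It is not the testbench helper, which is a Python script that inspects /proc because otelcol runs as a separate process; the function name here is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP listener to 127.0.0.1 port 0 and report the port the kernel
 * assigned. Returns the listening fd (caller closes it) or -1 on error. */
static int listen_on_dynamic_port(unsigned short *port_out) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0); /* 0 = let the kernel pick a free port */

    socklen_t len = sizeof(addr);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(fd, 16) != 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) != 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}
```

Since the port number is read back from a socket that is already bound, no other process can claim it between selection and use, which is the race the preselected-port approach had.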

With the help of AI-Agents: Codex
2026-05-27 12:46:58 +02:00
Rainer Gerhards
107446947a ommysql: handle closed connection before commit
closes https://github.com/rsyslog/rsyslog/issues/4199
2026-05-27 12:43:45 +02:00
Rainer Gerhards
10fe8ba585 parser: verify and fix offAfterPRI calculation
closes https://github.com/rsyslog/rsyslog/issues/6017

Why:
The internal offAfterPRI field tracks the offset in raw messages
immediately after the PRI. This was inconsistently calculated across
modules (e.g. imuxsock omitted the closing '>') and was prone to
parsing invalid strings (e.g. '<>') as valid PRI offsets. This
caused misalignments and potential out-of-bounds risks in downstream
parser modules.

Impact:
Stabilizes syslog parsing; downstream modules consistently receive
accurate raw message text.

Before/After:
offAfterPRI was inconsistently calculated or misaligned on
malformed/special inputs; now it is centrally validated and correct.

Technical Overview:
Extracted the PRI offset logic into a strict static helper
compute_off_after_pri in runtime/parser.c to parse 1..3 digits
between '<' and '>'. Refactored ParsePRI to use this helper. Enhanced
MsgSetAfterPRIOffs in runtime/msg.c with defensive assertions to
validate offsets and enclosing brackets. Updated the legacy imuxsock
parser to set the correct offs + 1 offset when the closing '>' is
present. Created a pure C unit test checking 10 distinct
RFC3164/RFC5424 corner cases.
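
A strict "1..3 digits between '<' and '>'" check might look like the sketch below. The function name, signature, and the return-0-on-invalid convention are assumptions made for illustration; the commit message does not show the real helper's interface, and this sketch does not range-check the numeric PRI value.

```c
#include <stddef.h>

/* Hypothetical stand-in for the helper described above.
 * Returns the offset just past '>' for a well-formed "<N>", "<NN>" or
 * "<NNN>" prefix, or 0 when there is no valid PRI. */
static size_t sketch_off_after_pri(const char *msg, size_t len) {
    if (len < 3 || msg[0] != '<') return 0;

    size_t i = 1;
    /* Consume at most three decimal digits. */
    while (i < len && i <= 3 && msg[i] >= '0' && msg[i] <= '9') {
        ++i;
    }

    size_t ndigits = i - 1;
    if (ndigits < 1 || i >= len || msg[i] != '>') return 0;
    return i + 1; /* first byte after the closing '>' */
}
```

For "<13>msg" this yields 4, the index of 'm'; "<>" and "<1234>" are both rejected rather than producing an offset.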

With the help of AI-Agents: Antigravity
2026-05-27 12:16:25 +02:00
Rainer Gerhards
26627f5a35
imfifo: implement named pipe input module (#7029)
* imfifo: implement named pipe input module

Why:
Allows rsyslog to read logs line-by-line from local POSIX named pipes
(FIFOs) without blocking the startup sequence or spinning on EOF
disconnect loops.

Impact:
Adds the 'imfifo' input module and registers its test suite.

Before/After:
Rsyslog had no native named pipe input capability; now imfifo
provides dynamic, non-blocking FIFO input instances.

Technical Overview:
- Integrated imfifo into the autotools build system with
  --enable-imfifo.
- Implemented plugins/imfifo/imfifo.c using the modern v6 config
  syntax.
- Used open(path, O_RDWR) to keep a dummy writer, avoiding
  startup hangs and EOF reopen loops.
- Implemented select-polling loop with 100ms timeout for
  clean, quick shutdown responses.
- Split incoming chunks by newline, submitting complete
  messages using submitMsg2.
- Created tests/imfifo.sh and tests/imfifo-vg.sh to verify
  correct function and Valgrind compatibility.
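
The O_RDWR open and the bounded select() wait can be sketched as follows. This is a minimal illustration, not the module source; both function names are hypothetical.

```c
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Open a FIFO read-write so this process also counts as a writer: the
 * open does not block waiting for a peer, and reads do not hit EOF when
 * the last external writer closes. Opening a FIFO with O_RDWR is
 * unspecified by POSIX; it behaves this way on Linux. */
static int open_fifo_rdwr(const char *path) {
    return open(path, O_RDWR);
}

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms) {
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int r = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (r < 0) return -1;
    return r > 0 ? 1 : 0;
}
```

An input loop built on this would call wait_readable(fd, 100) and check its shutdown flag on every timeout, so termination is noticed within roughly one timeout interval.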

closes https://github.com/rsyslog/rsyslog/issues/440

With the help of AI-Agents: Antigravity
2026-05-27 11:47:21 +02:00
Rainer Gerhards
c568975546
tests: widen service relevance defaults (#7055)
* tests: widen service relevance defaults

Why: Service-backed tests were skipped for broad, non-module edits that
can still affect service integrations.

Impact: Elasticsearch, MySQL/libdbi, and Kafka setup paths run for
shared core, build, workflow, and testbench changes.

Before/After: Before, only runtime and a narrow allow-list triggered
service tests; after, common cross-cutting edits also trigger them.

Technical Overview: Extend the generic module_needs_testing()
changed-file gate in tests/diag.sh.
Treat top-level C/H changes as globally relevant because they include
shared engine files such as action.c/template.c.
Treat build and CI metadata updates (.mk, m4, workflows) as relevant
so service jobs selected by CI do not self-skip prematurely.
Treat testbench shell/testsuites edits as relevant because service
orchestration and service-specific assertions live under tests/.
Keep module-specific path matching unchanged for targeted triggering.

With the help of AI-Agents: GPT-5.3-Codex
2026-05-27 11:45:05 +02:00
cursor[bot]
a1611a71ab
tests: cover mmjsonparse find-json conflict path
Why:
The mmjsonparse find-json ownership fix is already present via PR #7016,
but the conflict-container path still needs explicit regression coverage.

Impact:
Adds focused normal and Valgrind testbench coverage for msgAddJSON failure
after mmjsonparse hands off a parsed JSON object.

Before/After:
Before, the negative path relied on manual reasoning and broad coverage.
After, the testbench asserts rsyslog continues processing the trigger
message, and the Valgrind wrapper checks that the parsed object is not
released twice.

Technical Overview:
1. Add mmjsonparse-find-json-conflict.sh for the conflicting-container path.
2. Add a Valgrind wrapper for the same scenario.
3. Register both tests in tests/Makefile.am.

With the help of AI-Agents: Codex
2026-05-27 11:24:33 +02:00
Rainer Gerhards
b4e306ca1b
Merge pull request #7062 from rsyslog/codex/propose-fix-for-dynstats-persistence-vulnerability
dynstats: coalesce pending persistence writes per bucket
2026-05-27 10:56:34 +02:00
Rainer Gerhards
3bb845b37f
Merge pull request #7063 from rsyslog/codex/fix-nul-byte-certificate-impersonation-vulnerability
runtime: reject embedded NULs in mbedtls certificate names
2026-05-27 10:55:15 +02:00
Rainer Gerhards
e594341051
Merge pull request #7067 from rgerhards/ai-fix-da-mainmsg-q-flowctl
testbench: fix da-mainmsg-q flow control
2026-05-27 10:50:38 +02:00
Rainer Gerhards
de69e1859b
Merge pull request #7066 from rsyslog/codex/fix-man-page-sigint-shutdown-option
doc: fix invalid SIGINT config parameter spelling in rsyslogd.8
2026-05-27 10:47:51 +02:00
Rainer Gerhards
7969e3ec95
testbench: fix da-mainmsg-q flow control
Why:
da-mainmsg-q is meant to exercise disk-assisted main queue draining,
but its diagnostic injector could overrun the deliberately tiny queue
under CI stress. That made the test report message loss before it had
actually isolated the DA queue behavior it intends to verify.

Impact:
Reduces da-mainmsg-q flakes without weakening the tested DA queue oracle.

Before/After:
Before, imdiag injected a 2000-message burst as non-delayable traffic;
after, the burst participates in queue flow control and the final output
count is observed before shutdown.

Technical Overview:
Set RSTB_IMDIAG_INJECT_DELAY_MODE=full before generate_conf so imdiag
marks generated messages as fully delayable. This keeps the test's small
queue configuration intact while avoiding diagnostic-input loss as a side
effect of the stress setup.

The test still verifies the complete sequence 0..2099 after forcing DA
mode. It now also waits for the final 2100 output lines after the post-DA
recovery burst, so shutdown is not used as a substitute for the omfile
output oracle.

The header comment was updated to document the setup, stimulus, oracle,
and why the injection mode is part of the test plumbing rather than the
behavior under test.

With the help of AI-Agents: OpenAI Codex
2026-05-27 10:31:02 +02:00
Rainer Gerhards
723d50dd48 doc: fix shutdown.enable.ctlc spelling in rsyslogd man page 2026-05-27 09:51:32 +02:00
Rainer Gerhards
1662279a00
Merge pull request #7048 from rsyslog/codex/fix-ocsp-cache-stale-response-vulnerability
ossl: avoid caching stale OCSP responses when nextUpdate is not future
2026-05-27 09:39:14 +02:00
Rainer Gerhards
fe0834698d
Merge pull request #7046 from rsyslog/codex/fix-double-free-in-impstats-batching
impstats: fix double-free in remote write batching cleanup
2026-05-27 09:37:46 +02:00
Rainer Gerhards
eea4307d4e
Merge pull request #7047 from rsyslog/codex/fix-imuxsock-ratelimit.name-vulnerability
imuxsock: prevent ratelimit.name bypass in credentialed sender path
2026-05-27 09:33:47 +02:00
Rainer Gerhards
a6490d80cf
Merge pull request #7044 from rsyslog/codex/fix-unbounded-http-response-buffering
omazuredce: bound HTTP response buffering
2026-05-27 09:29:15 +02:00
Rainer Gerhards
b18f0feb93
Merge pull request #7042 from rsyslog/codex/fix-gzip-buffer-underallocation-issue
omazuredce: avoid gzip buffer underallocation
2026-05-27 09:25:08 +02:00
Rainer Gerhards
649f62da74
Merge pull request #7059 from rsyslog/codex/fix-dependency-on-disabled-regexp-module
runtime: guard optional regexp object lifecycle in tcp server
2026-05-27 09:22:11 +02:00
Rainer Gerhards
8edc4ba40a
Merge pull request #7049 from rsyslog/codex/fix-unbounded-json-batch-fan-out-in-imkafka
imkafka: cap split.json.records fan-out
2026-05-27 09:16:54 +02:00
Rainer Gerhards
185d878a7f
Merge pull request #7056 from rsyslog/codex/fix-libgcrypt-configure-check-bug
configure: avoid leaking -lgcrypt into global LIBS
2026-05-27 09:15:49 +02:00
Rainer Gerhards
afa1f15419 mbedtls-refine-nul-name-checks 2026-05-27 08:34:29 +02:00