20970 Commits

Author SHA1 Message Date
Rainer Gerhards
64fa239c71 tls: preserve send retry without reconnect 2026-05-29 13:24:17 +02:00
Rainer Gerhards
0b2d239a03 tls: keep send-side read retries local 2026-05-29 12:16:02 +02:00
Rainer Gerhards
b495454435 imudp: validate listen port formatting 2026-05-29 11:20:05 +02:00
Rainer Gerhards
495c9b48b4 tests: tighten mmsnareparse regex anchor check 2026-05-29 11:20:05 +02:00
Rainer Gerhards
6c6b3ad92e
Merge pull request #7109 from rgerhards/codex/i6663-omfwd-rebind-leak
omfwd: avoid tcpclt leak on rebind
2026-05-29 10:45:08 +02:00
Rainer Gerhards
7d06d8900b
Merge pull request #7105 from rgerhards/codex/i6308-omfwd-lb-flake
tests: retry omfwd one-target flake wrappers
2026-05-29 10:43:15 +02:00
Rainer Gerhards
bd71ea0083 mmsnareparse: preserve regex end-anchor semantics
Why: Regex trailing extra-data detection must not let a bounded search
window change the meaning of end-anchored patterns.

Impact: Prevents false trailing-token removal for regex patterns that
match only a truncated prefix.

Before/After: Before, '$' could match an artificial NUL at the search
limit; after, it only matches a real token end.

Technical Overview: When the regex input is temporarily NUL-terminated
at the configured search limit, pass REG_NOTEOL to regexec(). This keeps
the bounded input optimization and the start-anchored behavior, but
prevents '$' from matching the artificial boundary. Add a regression test
with an end-anchored numeric pattern and a longer alphanumeric final
token to prove the token remains parsed data instead of extradata.

With the help of AI-Agents: GPT-5.3-Codex
2026-05-29 10:19:15 +02:00
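The REG_NOTEOL behavior described in bd71ea0083 can be illustrated with a small POSIX sketch. This is not the mmsnareparse code: match_bounded() and its parameters are hypothetical names, and copying the search window into a heap buffer stands in for the temporary in-place NUL termination the commit describes.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: match `pattern` against at most `limit` bytes of
 * `input`. Returns 1 on match, 0 on no match, -1 on error. */
int match_bounded(const char *pattern, const char *input, size_t limit)
{
    regex_t re;
    size_t len = strlen(input);
    int truncated = len > limit;
    size_t n = truncated ? limit : len;
    char *window;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    window = malloc(n + 1);
    if (window == NULL) {
        regfree(&re);
        return -1;
    }
    memcpy(window, input, n);
    window[n] = '\0'; /* artificial boundary when the input was truncated */

    /* REG_NOTEOL keeps '$' from matching at the artificial boundary;
     * it is only passed when the window is shorter than the real input. */
    rc = regexec(&re, window, 0, NULL, truncated ? REG_NOTEOL : 0);

    free(window);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Under these assumptions, an end-anchored numeric pattern such as "^[0-9]+$" no longer matches the token "12345abc" at a limit of 5, while the untruncated token "12345" still matches.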
Rainer Gerhards
b3e47281b2 imudp: harden listen port file writes
Why:
Operators may configure imudp to publish its bound UDP port before
privilege drop. The handoff file must not let a local user abuse
symlinks or special files in attacker-writable directories.

Impact:
Symlink, FIFO, and other special-file handoff paths now fail startup.

Before/After:
imudp used fopen("w"); now it opens and validates regular files.

Technical Overview:
Replace stdio truncation with open(2) using O_NOFOLLOW, O_CLOEXEC,
O_NONBLOCK, and owner-only creation mode.
Inspect the configured path before opening so existing non-regular
files are rejected without blocking on FIFOs.
Validate the opened descriptor with fstat before writing so races and
device files do not become handoff targets.
Write the port through a retrying write loop and preserve close-error
propagation.
Add a regression test that configures listenPortFileName as a symlink
and verifies the target content remains unchanged.
Update the documentation example to use /run/rsyslog and warn against
untrusted writable directories.

With the help of AI-Agents: Codex
2026-05-29 09:54:03 +02:00
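The hardening steps listed in b3e47281b2 can be sketched roughly as follows. This is an illustration under assumptions, not the imudp implementation: write_port_file() is a hypothetical name, and truncating with ftruncate() only after the descriptor has been validated is a choice made for this sketch.

```c
#define _POSIX_C_SOURCE 200809L /* for O_NOFOLLOW, O_CLOEXEC, lstat() */

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success, -1 on any failure. */
int write_port_file(const char *path, unsigned port)
{
    struct stat st;
    char buf[16];
    size_t len, off = 0;
    int fd;

    /* Pre-check: reject an existing symlink, FIFO, or device without
     * opening it, so a FIFO cannot block startup. */
    if (lstat(path, &st) == 0 && !S_ISREG(st.st_mode))
        return -1;

    fd = open(path, O_WRONLY | O_CREAT | O_NOFOLLOW | O_CLOEXEC | O_NONBLOCK,
              S_IRUSR | S_IWUSR); /* owner-only creation mode */
    if (fd < 0)
        return -1;

    /* Post-check on the descriptor covers a swap between lstat() and
     * open(); truncate only once the target is known to be regular. */
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode) || ftruncate(fd, 0) != 0) {
        close(fd);
        return -1;
    }

    len = (size_t)snprintf(buf, sizeof(buf), "%u\n", port);
    while (off < len) { /* retrying write loop */
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        off += (size_t)n;
    }

    /* Propagate close() errors instead of ignoring them. */
    return close(fd) == 0 ? 0 : -1;
}
```

With this shape, a path that is a symlink is rejected before anything is written, so the link target keeps its previous content.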
Rainer Gerhards
d525958533 tls: propagate send-side receive retry
Why: avoid worker CPU spin when TLS send progress needs peer input.

Impact: nonblocking TLS send now re-arms I/O readiness instead of looping.

Before/After: send-side WANT_READ retried immediately; now it returns retry.

Technical Overview:

OpenSSL and GnuTLS send paths may need a receive operation to process TLS control traffic before a write can continue. The receive helpers already return RS_RET_RETRY for nonblocking not-ready states and set internal retry state. After clearing that temporary receive retry state, Send() must still return RS_RET_RETRY to its caller so the event loop can wait for socket readiness. Restore that propagation in both TLS backends while keeping buffered application-data handling unchanged.

With the help of AI-Agents: GPT-5.3-Codex
2026-05-29 08:20:57 +02:00
Rainer Gerhards
2817c88815
Merge pull request #7110 from rgerhards/codex/i4945-imjournal-future-warning
imjournal: warn when newest entry is in the future
2026-05-29 08:01:27 +02:00
Rainer Gerhards
75423a0b3b imjournal: handle clock jumps in future probe 2026-05-29 02:25:23 +02:00
Rainer Gerhards
676f485a6c imjournal: warn when newest entry is in the future
When sd_journal_next() reports no entries, probe the journal tail with a separate handle and emit a rate-limited warning if the newest journal entry is ahead of current wall clock time. This gives operators a concrete diagnostic for post-crash/time-jump stalls without disturbing the main journal cursor.

closes https://github.com/rsyslog/rsyslog/issues/4945
2026-05-29 02:02:16 +02:00
Rainer Gerhards
1d2c6a49b5 omfwd: avoid tcpclt leak on rebind
Rebind teardown already drops the target transport state and lets poolTryResume() establish the next connection. Re-running initTCP() during each rebind overwrote the worker tcpclt pointers, leaking the old objects on every interval.

Keep the per-worker tcpclt objects for the worker lifetime and add a Valgrind regression that drives repeated TCP rebinds.

closes https://github.com/rsyslog/rsyslog/issues/6663
2026-05-29 00:52:15 +02:00
Rainer Gerhards
82e991219c tests: cover imrelp TLS random disconnects 2026-05-28 23:32:11 +02:00
Rainer Gerhards
0bf116858f template: guard NULL property rendering
Harden list-template rendering against unexpected NULL property values before copying into the action parameter buffer.

Add a regression test for issue #3311's queued list template with missing JSON fields so action workers render empty values instead of crashing.
2026-05-28 23:17:54 +02:00
Rainer Gerhards
d2fcc3dd9f imuxsock: handle embedded NUL datagrams safely
When imuxsock sanitized a raw message containing an embedded NUL, tag parsing continued to walk the original receive buffer while using the sanitized length. That could read beyond initialized datagram bytes and was reported by Valgrind in issue #4941.

Rebase parsing onto the sanitized raw message buffer, initialize the reserved listener slot used during cleanup, and add focused normal plus Valgrind regression coverage for embedded-NUL Unix datagrams.
2026-05-28 20:42:50 +02:00
Rainer Gerhards
e6d3a35736 tests: make omfwd retry wrappers vpath-safe
Why: The retry wrappers execute shared skeletons as child scripts, so VPATH builds need the source directory available in that child environment.

Impact: The wrappers no longer depend on skeleton executable bits and preserve the Bash testbench environment.

Before/After: Before, child skeletons could lose srcdir; after, srcdir is exported and skeletons run explicitly with bash.

Technical Overview: Export the resolved srcdir value before computing the skeleton path.

Technical Overview: Invoke the Bash skeletons with bash instead of direct execution.

Technical Overview: Keep the retry loop and fail-marker suppression behavior unchanged.

Refs: https://github.com/rsyslog/rsyslog/issues/6308

With the help of AI-Agents: Codex
2026-05-28 19:20:00 +02:00
Rainer Gerhards
926a59498a tests: retry omfwd one-target flake wrappers
Why: CI occasionally trips over a known TCP timing race in the omfwd one-target retry variants, especially on constrained workers.

Impact: The 1-byte-buffer wrappers now get the same bounded retry tolerance as the full-buffer wrapper.

Before/After: Before, two wrappers failed the suite on one unlucky reconnect window; after, they retry once and still fail if the scenario remains broken.

Technical Overview: Keep the existing skeletons as the single source of test behavior.

Technical Overview: Execute each 1-byte-buffer wrapper through a two-attempt loop.

Technical Overview: Suppress the fail marker only on the first attempt so final failure reporting remains intact.

Technical Overview: Preserve the forced target-failure and normal one-target scenarios unchanged.

Technical Overview: Document the retry as a flake mitigation, not a semantic oracle change.

Refs: https://github.com/rsyslog/rsyslog/issues/6308

With the help of AI-Agents: Codex
2026-05-28 19:07:00 +02:00
Rainer Gerhards
993068f961
Merge pull request #7100 from rsyslog/codex/propose-fix-for-double-free-vulnerability
mmjsontransform: fix dotted conflict ownership
2026-05-28 18:38:02 +02:00
Rainer Gerhards
d406ffab40 mmdblookup: clean up failed worker creation
When opening the MaxMind database fails during worker-instance creation, the module-template helper still writes the partially allocated worker pointer back to the action slot. The action code then skips pAction assignment because creation failed, leaving shutdown cleanup with orphan worker data and no owning action.

Initialize and track the worker mutex before opening the database, destroy any initialized resources on failure, and return a NULL worker pointer so the runtime has no stale slot to remove.

Add normal and Valgrind regression coverage for a missing mmdbfile path.

Refs: https://github.com/rsyslog/rsyslog/issues/4024
2026-05-28 18:24:15 +02:00
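The cleanup pattern described in d406ffab40 (track what was initialized, undo it on failure, and hand back NULL) might look like the sketch below. All names here are hypothetical, and fopen() merely stands in for opening the MaxMind database; the real module uses rsyslog's worker-instance interfaces.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical worker state for illustration only. */
struct worker {
    pthread_mutex_t lock;
    int lock_ready; /* set once the mutex has been initialized */
    FILE *db;       /* stand-in for the database handle */
};

/* Returns 0 and stores the worker in *out on success.
 * On failure returns -1 and leaves *out == NULL, so the caller has
 * no stale pointer to clean up later. */
int worker_create(const char *dbpath, struct worker **out)
{
    struct worker *w = calloc(1, sizeof(*w));

    *out = NULL;
    if (w == NULL)
        return -1;

    if (pthread_mutex_init(&w->lock, NULL) != 0)
        goto fail;
    w->lock_ready = 1;

    w->db = fopen(dbpath, "rb");
    if (w->db == NULL)
        goto fail;

    *out = w;
    return 0;

fail:
    /* Destroy only what was actually initialized. */
    if (w->lock_ready)
        pthread_mutex_destroy(&w->lock);
    free(w);
    return -1;
}

void worker_destroy(struct worker *w)
{
    if (w == NULL)
        return;
    fclose(w->db);
    pthread_mutex_destroy(&w->lock);
    free(w);
}
```

A missing database path then yields an error code and a NULL worker pointer, which is the state the regression coverage in the commit is described as exercising.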
Rainer Gerhards
ad91c5fa44
Merge pull request #7101 from rgerhards/ai-avoid-container-uid-permission-trap
agents: avoid local container UID trap
2026-05-28 18:22:30 +02:00
Rainer Gerhards
c7ae67aded
agents: avoid local container UID trap
Why:
Local validation should prevent known workflow traps instead of making each
agent rediscover and repair them after a container run.

Impact:
Local container validation now defaults to host UID ownership, reducing
permission cleanup churn in worktrees.

Before/After:
Before, the helper forced the dev image default user; after, it lets the
container wrapper map to the host uid/gid by default.

Technical Overview:
The local validation helper no longer exports an empty
RSYSLOG_CONTAINER_UID for Ubuntu 26.04 check and focused-test lanes.
Leaving the variable unset uses the existing devcontainer.sh behavior that
passes the host uid/gid to docker and injects passwd/group entries when
needed.
The container-testing skill now documents this as the normal local mode and
keeps RSYSLOG_CONTAINER_UID='' reserved for intentional GitHub Actions user
reproduction.
The fallback cleanup guidance remains for already-polluted or intentionally
CI-user worktrees.

With the help of AI-Agents: Codex
2026-05-28 18:11:11 +02:00
Rainer Gerhards
7c7c94fd5f mmjsontransform: fix dotted conflict ownership
Why: malformed dotted keys in policy preprocessing could reuse a JSON value after jsontransformInsertDotted() had already consumed it.

Impact: prevents policy-mode conflict paths from double-freeing or leaking transformed JSON values.

Before/After: conflict callers kept stale ownership; now they clear ownership immediately after the insert handoff.

Technical Overview:

jsontransformInsertDotted() owns the value argument after it is called, including conflict and error exits. The policy preprocessing caller now clears rewrittenValue as soon as that transfer occurs, so loop cleanup cannot put the same json-c object again. The unflatten rewrite caller follows the same immediate handoff pattern before checking the return code. The policy regression test now sends a malformed trailing-empty dotted key and asserts that the daemon logs the hierarchy conflict, drops that message, and shuts down cleanly.

With the help of AI-Agents: GPT-5.3-Codex
2026-05-28 18:07:56 +02:00
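The immediate-handoff rule from 7c7c94fd5f can be sketched without json-c. Everything below is hypothetical: insert_consuming() plays the role of a function that owns its argument on every exit path, and a release counter replaces the real reference counting so the effect is observable.

```c
#include <stdlib.h>

struct value {
    int payload;
};

int g_release_count; /* counts releases so the sketch can be checked */

void value_release(struct value *v)
{
    if (v != NULL) {
        free(v);
        g_release_count++;
    }
}

/* Hypothetical sink: owns `v` after the call on every exit path,
 * including the conflict path. A real sink would store the value on
 * success; the sketch simply releases it in both cases. */
int insert_consuming(struct value *v, int conflict)
{
    value_release(v);
    return conflict ? -1 : 0;
}

int process_key(int conflict)
{
    struct value *rewritten = malloc(sizeof(*rewritten));
    int rc;

    if (rewritten == NULL)
        return -1;
    rewritten->payload = 42;

    rc = insert_consuming(rewritten, conflict);
    rewritten = NULL; /* ownership is gone: clear before checking rc */

    /* Shared cleanup is now harmless because the pointer was cleared. */
    value_release(rewritten);
    return rc;
}
```

If the caller cleared the pointer only on success, the conflict path would release the same object twice, which is the failure mode the commit message describes.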
Rainer Gerhards
d8a747b69e
Merge pull request #7099 from rsyslog/ai-fix-omfwd-lb-flakes
testbench: synchronize omfwd retry test
2026-05-28 17:27:00 +02:00
Rainer Gerhards
f9a6c2697c
Merge pull request #7097 from rgerhards/mmexternal-security-review-fixes
mmexternal: fix single-instance reply handling
2026-05-28 17:25:16 +02:00
Rainer Gerhards
b69ed7991a
mmexternal: fix EINTR reply handling
Guard the reply loop against EINTR retries before any bytes were read
so the newline check cannot underflow the response buffer. Drop the
unreachable debug block and apply the repository formatter to satisfy
style-check.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-28 15:35:37 +02:00
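The guard described in b69ed7991a can be illustrated with a bounded read loop. This is a minimal sketch under assumptions, not the mmexternal code: read_reply() is a hypothetical name and the buffer bound is a plain parameter rather than the module's maxMessageSize setting.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one newline-terminated reply into buf.
 * Returns the number of bytes stored (excluding the NUL), or -1 on a
 * read error or when no newline arrives within cap - 1 bytes. */
ssize_t read_reply(int fd, char *buf, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;

    for (;;) {
        ssize_t n;

        /* Look at buf[used - 1] only when at least one byte was read.
         * After an EINTR retry `used` can still be 0, and an unguarded
         * check would index before the start of the buffer. */
        if (used > 0 && buf[used - 1] == '\n')
            break;
        if (used == cap - 1)
            return -1; /* oversized reply */

        n = read(fd, buf + used, cap - 1 - used);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break; /* EOF */
        used += (size_t)n;
    }

    buf[used] = '\0';
    return (ssize_t)used;
}
```

The same loop also shows one way to bound a helper reply: once the buffer is full without a newline, the call fails instead of growing the buffer.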
Rainer Gerhards
3ef0ddd699
testbench: synchronize omfwd retry test
Why:
The omfwd load-balancing retry test could still depend on timing luck under
heavy parallel test load. It waited for target 2 to listen, then slept a fixed
fractional interval before assuming omfwd had retried the suspended pool
member.

Impact:
Fixes a test synchronization bug exposed by the deflake campaign.

Before/After:
Before, the test could inject the second batch before the retry connection was
actually accepted; after, it waits for the retry accept event before checking
line-count oracles.

Technical Overview:
Extend minitcpsrvr with an optional accept-ready marker file written after a
successful accept(). Thread that option through start_minitcpsrvr in diag.sh.
Update omfwd-lb-2target-retry.sh to wait for the whole-second retry window used
by omfwd, inject the second batch, and then wait until target 2 has actually
accepted the retry connection.

This preserves the existing test semantics: the first batch must go only to
target 1 while target 2 is unavailable, and the second batch must split evenly
after target 2 becomes usable again.

With the help of AI-Agents: OpenAI Codex
2026-05-28 15:28:55 +02:00
Rainer Gerhards
6928ed313b
Merge pull request #7098 from rsyslog/ai-update-local-validation-artifacts
agents: add interim testing skill helper
2026-05-28 15:23:37 +02:00
Rainer Gerhards
3cf354dabb
agents: add local validation helper
Why:
Local validation instructions had become detailed enough that agents and
external contributors could easily run the wrong subset, miss uncommitted
files, or treat missing local tooling as a hard blocker.

Impact:
Adds a first in-repository PoC for deterministic local validation planning.

Before/After:
Before, local validation was primarily checklist text; after, a helper can
classify the delta and run the available applicable checks.

Technical Overview:
Add devtools/local-validation-plan.sh as a POSIX sh helper that classifies
committed, staged, unstaged, and untracked local changes. In plan mode it
prints the recommended validation path. In --run mode it executes the selected
available checks, keeps shell and Python checks diff-scoped, and warns rather
than failing when local tools such as Docker, Cubic, shellcheck, or
checkbashisms are unavailable.

Document the helper in AGENTS.md and the local container testing skill. Clarify
that agent and skill documentation changes do not require runtime container CI,
rendered user docs should use a docs build, and limited local environments must
run whatever applicable checks are available while reporting skipped coverage.

Add checkbashisms guidance for changed scripts that claim POSIX sh portability,
including the Debian/Ubuntu devscripts package hint, without turning the whole
Bash-owned testbench into a noisy portability gate.

With the help of AI-Agents: OpenAI Codex
2026-05-28 15:15:20 +02:00
Rainer Gerhards
a423a4d13d
Merge pull request #7078 from rsyslog/codex/propose-fix-for-regex-cpu-dos-vulnerability
mmsnareparse: cap regex input in trailing token
2026-05-28 14:41:51 +02:00
Rainer Gerhards
90c13e1f56
Merge pull request #7081 from rsyslog/codex/propose-fix-for-omhttp-vulnerability
omhttp: refresh health-check headers after gzip rebuild
2026-05-28 14:40:05 +02:00
Rainer Gerhards
708d6de2f4
Merge pull request #7084 from jjourdin/imptcp-autocompression
imptcp: add stream:auto compression mode
2026-05-28 14:37:12 +02:00
Rainer Gerhards
5e18aa601c
mmexternal: fix single-instance reply handling
Make forceSingleInstance share one helper process across workers, clean up shared and per-worker child state on teardown, and bound helper replies by the existing maxMessageSize setting while removing the realloc NULL-deref path.

Add focused mmexternal regressions for shared single-instance behavior and oversized helper replies, and record the module locking model in the AI module map.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-28 14:27:41 +02:00
Rainer Gerhards
3e20f6ff88
Merge pull request #7079 from rsyslog/codex/fix-assertion-failure-in-mmanon
mmanon: avoid assert on malformed embedded IPv4-in-IPv6
2026-05-28 13:18:52 +02:00
Rainer Gerhards
a643a73948
Merge pull request #7083 from rsyslog/codex/propose-fix-for-mmanon-availability-issue
mmanon: make random suffix generation fully reachable
2026-05-28 13:14:56 +02:00
Rainer Gerhards
202378969d
Merge pull request #7085 from rsyslog/codex/fix-omotlp-invalid-pointer-dereference
omotel: use getProgramName() to avoid unsafe PROGNAME union access
2026-05-28 13:07:21 +02:00
Rainer Gerhards
68f6a891d5
Merge pull request #7070 from rgerhards/codex/i4452-omkafka-unreachable
omkafka: harden failed delivery replay paths
2026-05-28 13:00:50 +02:00
Rainer Gerhards
05a0a62f5a
Merge pull request #7088 from rsyslog/codex/propose-fix-for-mbedtls-documentation-issue
doc: clarify mbedtls requires StreamDriverMode=1 to enable TLS
2026-05-28 12:57:59 +02:00
Rainer Gerhards
a4f5e678f0
Merge pull request #7089 from rsyslog/codex/fix-double-free-in-mmjsontransform
mmjsontransform: fix child ownership on dotted-key merge conflicts
2026-05-28 12:53:52 +02:00
Rainer Gerhards
2f048b0da0
Merge pull request #7090 from rsyslog/codex/fix-cpu-denial-of-service-in-mmsnarewinsec
mmsnareparse: avoid quadratic provider scans in Snare payload detection
2026-05-28 12:52:44 +02:00
Rainer Gerhards
61b4d3337d
Merge pull request #7091 from rsyslog/codex/propose-fix-for-memory-exhaustion-vulnerability
omelasticsearch: bound startup version probe response size
2026-05-28 12:47:50 +02:00
Rainer Gerhards
758387b5fd
Merge pull request #7092 from rsyslog/codex/propose-fix-for-tls-keyupdate-vulnerability
tls: avoid recv-helper misuse in send WANT_READ
2026-05-28 12:45:48 +02:00
Rainer Gerhards
11ad47b9c6
Merge pull request #7096 from rsyslog/ai-daily-distro-ci-lanes
ci: move duplicate distro lanes to daily checks
2026-05-28 12:36:40 +02:00
Rainer Gerhards
87a0e984a9
ci: move duplicate distro lanes to daily checks
Why:
Fedora 42 and CentOS 8 still provide useful portability and
image-drift signal, but they mostly duplicate adjacent PR runtime
lanes. Keeping them in every PR makes the regular matrix slower
without adding enough per-change confidence to justify the cost.
The Debian sid PR lane no longer provides reliable rolling-Debian
signal because its devcontainer image is not rebuilt frequently.

Impact:
Regular PR CI runs fewer duplicate or misleading distro lanes; daily CI
keeps full configured coverage for the moved distro lanes and opens or
updates tracking issues when scheduled lanes fail.

Before/After:
Before, centos_8, fedora_42, and a stale debian_sid image ran on every
PR. After, centos_8 and fedora_42 run as full-suite daily distro lanes,
and the stale debian_sid lane/container is removed.

Technical Overview:
Remove centos_8, fedora_42, and debian_sid from the run_checks.yml PR
matrix.
Add run_distro_daily.yml for full configured distro test runs using the
same devcontainer images and configure options as the removed centos_8
and fedora_42 PR lanes.
Delete the Debian sid devcontainer definition because an unreliably
rebuilt sid image is a stale snapshot rather than a trustworthy
upcoming-Debian canary.
Do not apply PR relevance pruning to daily distro runs; scheduled runs
must test the full configured lane because any code may have changed
since the previous run.
Use the same ci-failure artifact naming and log globs as regular PR CI
so the flake collector can process scheduled failures through the same
path.
Add or align tracking issue reporting for the touched daily and weekly
scheduled workflows so failures provide a persistent triage handle.
Restrict tracking issue search to open issues so failures cannot update
a closed tracker and become hidden.
Clarify in issue summaries that failures must be classified as one-off
flakes or regressions and that the long-term expectation is fewer
flakes as recurring causes are fixed.
Keep issue-write permission scoped to reporting jobs only.

With the help of AI-Agents: Codex
2026-05-28 10:49:09 +02:00
Rainer Gerhards
10c3f84e8b
Merge pull request #7095 from rgerhards/ai-test-antipattern-guidance
testbench: document flake-prone test antipatterns
2026-05-28 10:43:50 +02:00
Rainer Gerhards
0ca511efc8
testbench: improve antipattern scanner portability
Why:
The advisory flake-pattern scanner should be useful on developer
machines without producing avoidable false positives or requiring
GNU-only command options.

Impact:
The scanner remains advisory but is more portable and less noisy.

Before/After:
Before, the scanner used GNU xargs behavior and could flag commented
background-helper examples. After, it runs without xargs -r and ignores
comment-only lines for background-helper findings.

Technical Overview:
Remove the GNU-specific xargs -r usage from both rg and grep paths.
Keep the existing early empty-input check, so xargs still receives at
least one file when the scanner reaches the search path.
Fix the fallback branch indentation while touching the scanner.
Tighten the background-helper regex so the first non-blank character
must not be a comment marker.
This keeps actual backgrounded helper commands visible while avoiding
comment-only matches.

With the help of AI-Agents: Codex
2026-05-28 10:37:01 +02:00
Rainer Gerhards
d37513835a
Merge pull request #7094 from rgerhards/codex/continuous-issue-session-skill
agents: add continuous issue session skill
2026-05-28 09:31:02 +02:00
Rainer Gerhards
9e68654db6 agents: add continuous issue session skill
Why: Long-running issue sessions need a clear state-machine workflow so agents keep active work slots filled instead of stopping after one PR is green.

Impact: Documents the rolling active-set workflow, validation expectations, PR babysitting, cleanup, and refill behavior for future agents.

Before/After: Previously this workflow was spread across chat history; now it is captured as a dedicated skill with a ledger template.

Technical Overview: Adds rsyslog_continuous_issue_session as an orchestration skill. The skill composes existing triage, test, container, commit, and PR babysitting skills instead of replacing their stricter rules. It makes the any-PR-merged refill rule explicit, preserves shared-cadence babysitting, records validation and AI review requirements, and makes merged-PR local cleanup mandatory before refilling a slot. AGENTS.md now lists the skill for discovery.

With the help of AI-Agents: Codex
2026-05-28 09:21:19 +02:00
Rainer Gerhards
f57b231bb6
testbench: document flake-prone test antipatterns
Why:
The deflake campaign found repeated test design patterns that create
races only under loaded CI or high-concurrency local runs. Capturing
those patterns gives agents and reviewers reusable guidance instead of
rediscovering the same failure modes test by test.

Impact:
Adds advisory testbench guidance and a non-blocking scanner for shell
tests.

Before/After:
Before, flake-prone patterns were implicit campaign knowledge; after,
they are documented and can be scanned locally.

Technical Overview:
Add devtools/check-test-antipatterns.sh as an advisory scanner for shell
test patterns seen in prior flakes. The script prefers rg when present
and falls back to find plus grep for minimal environments.

Document the antipattern list in tests/AGENTS.md so test authors know
how to avoid or justify risky patterns. Mirror the same guidance in the
rsyslog_test skill so generated or AI-assisted tests inherit the same
review checklist.

The scanner exits successfully by design. Findings are review prompts,
not CI blockers, because some existing tests intentionally use timing,
background helpers, or thresholds with a documented oracle.

With the help of AI-Agents: Codex
2026-05-28 09:13:39 +02:00
Rainer Gerhards
2d44870958
Merge pull request #7080 from rgerhards/ai-fix-lookup-table-flakes
lookup: report active table reloads as pending
2026-05-28 08:27:21 +02:00