Why: The deflake campaign found repeated test design patterns that create races only under loaded CI or high-concurrency local runs. Capturing those patterns gives agents and reviewers reusable guidance instead of rediscovering the same failure modes test by test.
Impact: Adds advisory testbench guidance and a non-blocking scanner for shell tests.
Before/After: Before, flake-prone patterns were implicit campaign knowledge; after, they are documented and can be scanned locally.
Technical Overview: Add devtools/check-test-antipatterns.sh as an advisory scanner for shell test patterns seen in prior flakes. The script prefers rg when present and falls back to find plus grep for minimal environments. Document the antipattern list in tests/AGENTS.md so test authors know how to avoid or justify risky patterns. Mirror the same guidance in the rsyslog_test skill so generated or AI-assisted tests inherit the same review checklist. The scanner exits successfully by design. Findings are review prompts, not CI blockers, because some existing tests intentionally use timing, background helpers, or thresholds with a documented oracle.
With the help of AI-Agents: Codex
AGENTS.md – Testbench subtree
This guide covers everything under tests/, including shell test cases,
helpers, suppressions, and supporting binaries. Pair it with
tests/README for human- and CI-facing run instructions; this
document focuses on authoring guidance and the knobs that matter most to AI
agents.
Purpose & scope
- The directory implements the Automake testbench that exercises rsyslog.
- Each `.sh` script is a standalone scenario that can be executed directly or through `make check`.
- Treat `tests/` as the single recursive test-owning subtree. Keep new unit test sources under `tests/unit/` or similar subfolders, but register and run them from `tests/Makefile.am` instead of creating another recursive `tests/.../Makefile.am` harness. Recursive Automake propagates `make check TESTS=...` into every subdirectory, so splitting test ownership across multiple subdirs creates fragile name collisions and selection failures.
- Use this guide together with the top-level `AGENTS.md` and the component guide that matches the module you are testing.
Flake-prone test antipatterns
These patterns come from actual deflake fixes. They are not banned in every case, but new or changed tests must either avoid them or explain why they are safe in that specific test. Before adding or heavily changing tests, run the advisory scan:
devtools/check-test-antipatterns.sh tests/<changed-test>.sh
The tool uses rg when available and falls back to grep, so it can run in
minimal containers. Findings are review prompts, not automatic blockers.
- Port preselection with `get_free_port`: selecting a free port before the listener binds is racy. Another process can take the port in the gap. Prefer listener `port="0"` plus a port file, or helper-owned readiness where the file is written only after `listen(2)` succeeds.
- Fixed sleeps as synchronization: `sleep 1`, `msleep 3000`, and similar waits usually encode host-speed assumptions. Prefer explicit readiness or completion oracles such as port files, `wait_file_lines`, `wait_queueempty`, stats counters, or imdiag waits. If a sleep intentionally creates retry pressure, document that timing role and the success/failure oracle.
- Readiness files written too early: a port or ready file must mean the consumer can proceed now, not merely that setup has started or a port number was chosen.
- Negative-path tests without a deterministic oracle: auth failures, retries, disconnects, timeouts, and shutdown-abort paths must wait for a specific state, diagnostic, queue condition, or process result instead of assuming it happened after a delay.
- CPU tick, runtime, or timeout thresholds without rationale: thresholds are acceptable only when the test header explains what the number proves and why it is high enough under loaded CI runners.
- Background helpers without lifecycle control: backgrounded servers, clients, or probes need deterministic readiness and cleanup. Prefer existing testbench helpers; if custom plumbing is required, make ownership and cleanup explicit.
- Queue tests assuming immediate drain or shutdown ordering: use queue-specific synchronization where possible. Do not assume that input completion, shutdown start, or a fixed delay means all queued messages reached the tested stage.
- Shared external state: fixed filenames, spool directories, service names, ports, topics, databases, and state files can collide under parallel `make check`. Use dynamic names derived from the testbench instance where possible.
- Scope-reducing deflake fixes: do not make a test pass by removing the behavior, race window, or invariant it was meant to exercise. If the oracle or setup changes, refresh the head comment and verify the original behavior is still covered.
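The readiness and lifecycle points above can be illustrated with a small, self-contained sketch. It is not testbench code: `wait_for_nonempty_file`, `start_stub_helper`, `stop_stub_helper`, and the port value `4711` are hypothetical names chosen for illustration, and real tests should prefer the existing `diag.sh` helpers. The sketch shows a helper that publishes its port file only once it is ready, a consumer that polls with a deadline instead of a fixed sleep, and explicit cleanup of the background process.

```shell
#!/bin/bash
# Sketch only: these functions are hypothetical, not diag.sh helpers.

# Poll until FILE exists and is non-empty, or fail after TIMEOUT_S seconds.
# The short sleep is just a poll interval; the file itself is the oracle.
wait_for_nonempty_file() {
    local file=$1 timeout_s=${2:-30} deadline
    deadline=$(( $(date +%s) + timeout_s ))
    while [ ! -s "$file" ]; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "timeout waiting for $file" >&2
            return 1
        fi
        sleep 0.1
    done
    return 0
}

# Stand-in for a background server. A real helper would bind and listen
# first; only then is the port published, via rename, so a reader never
# observes a half-written or premature readiness file.
start_stub_helper() {
    local port_file=$1
    (
        printf '%s\n' "4711" > "$port_file.tmp"
        mv "$port_file.tmp" "$port_file"
        exec sleep 30   # placeholder for the helper's serving loop
    ) >/dev/null 2>&1 &
    HELPER_PID=$!
}

# Explicit ownership: the test that started the helper also stops and reaps it.
stop_stub_helper() {
    kill "$HELPER_PID" 2>/dev/null
    wait "$HELPER_PID" 2>/dev/null
    return 0
}
```

A test built this way would call `start_stub_helper`, then `wait_for_nonempty_file` on a per-test path (for example one derived from `RSYSLOG_DYNNAME`), read the port from the file, and call `stop_stub_helper` on every exit path.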
Writing & updating tests
- Base new shell tests on existing ones; include `. $srcdir/diag.sh` to gain the helper functions (timeouts, Valgrind integration, rsyslogd launch helpers).
- Every new or changed test must document the exact behavior, regression, or invariant it covers. Add or refresh a short comment near the top of the test when the current intent is missing, stale, or vague. For timing, retry, sampling, concurrency, or negative-path tests, also explain the oracle: what proves success or failure, and why any wait or threshold exists. When changing a test, verify that the head comment still matches the actual setup, stimulus, oracle, and pass/fail conditions after the edit; update it in the same commit if it does not.
- For diagnostics emitted by rsyslog itself, prefer asserting the configured rsyslog output destination, usually testbench omfile output such as `RSYSLOG_OUT_LOG`, after synchronized shutdown. Do not use rsyslogd stdout/stderr as the oracle unless the behavior being tested is specifically process-level stdout/stderr emission, startup before configuration is usable, or another documented case where the message cannot pass through rsyslog's normal output path. Explain such exceptions in the test header.
- Prefer harness helpers such as `content_check`, `content_count_check`, `custom_content_check`, `check_not_present`, `cmp_exact`, `command_deny`, and `require_plugin` over ad-hoc shell to keep diagnostics uniform. In particular, use `content_check "needle" "$file"` instead of hand-written `grep ... || { cat "$file"; error_exit 1; }` blocks when asserting log or output content.
- Fix practical matches from `devtools/check-test-antipatterns.sh` when touching a test. If a match is intentional, explain the deterministic oracle in the test header instead of leaving the pattern as unexplained timing or concurrency folklore.
- Config format coverage: When a module parameter or config object is tested via RainerScript, add a companion YAML test (or extend an existing `yaml-<area>-*.sh`) that exercises the same parameter. Both frontends share the same backend, but bugs can exist in the YAML parser alone. Name YAML tests `yaml-<area>-<what>.sh` and use a `.yaml` extension for the config fixture in `tests/testsuites/` so the YAML loader is triggered automatically. Register the test in `tests/Makefile.am` under the same module conditionals as its RainerScript counterpart. See the `rsyslog_config` skill for full conventions.
- Name Valgrind-enabled wrappers with the `-vg.sh` suffix and toggle Valgrind by exporting `USE_VALGRIND` before including the non-`vg` script using the `.` command. Use `tests/timereported-utc-vg.sh` as the reference layout: it includes the base scenario using the `.` command instead of copying it and demonstrates paring back emitted messages when the underlying test is slow, which is especially important under Valgrind. Older wrappers still duplicate logic; prefer the modern pattern when touching them.
- Put auxiliary binaries next to their scripts (e.g. `*.c` programs compiled via the Automake harness) and list them in `tests/Makefile.am`. Unit-test sources may live in `tests/unit/`, but the owning harness remains `tests/Makefile.am`.
- Keep long-lived configuration snippets in `tests/testsuites/` and reuse them instead of copying large config blocks into multiple scripts.
- Document new environment flags or helper functions inside `diag.sh` so other tests can discover them. Mention the addition in `tests/README` if operators should know about the new knob.
Running tests locally
- Build rsyslog first using the efficient incremental command:
      make -j$(nproc) check TESTS=""
  This ensures the testbench can load freshly built binaries and modules. If the `Makefile` is missing, see "Step 2" in the top-level `AGENTS.md`.
- Execute individual scenarios directly for quick feedback (`./tests/imfile-basic.sh`).
- Use `make check TESTS='script.sh'` when you need Automake logging, parallelisation control, or to exercise the Valgrind wrappers.
- For unit binaries registered in `tests/Makefile.am`, use `make check TESTS='binary_name'` from the repository root so Automake builds the required runtime libraries before entering `tests/`.
Multi-Module Test Guards
When adding a test that requires multiple modules (e.g., imtcp AND imptcp), you MUST wrap the test definition in `tests/Makefile.am` in separate, nested conditionals for ALL required modules. Do not assume one implies the others.
Example:
if ENABLE_IMTCP
if ENABLE_IMPTCP
TESTS += multi_module_test.sh
endif
endif
Failing to do this breaks the build on systems where only one of the modules is enabled.
- Remove stale `.log`/`.trs` files before re-running a flaky test to avoid Automake caching previous outcomes.
- For configuration validation changes, run `./tests/validation-run.sh` to confirm both failure and success paths.
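Clearing stale result files can be wrapped in a tiny function so it is not forgotten before a re-run. This is a minimal sketch: `clean_test_results` is a hypothetical name, not an existing testbench helper, and the stem must be given exactly as it appears before `.log`/`.trs` in your build tree, which depends on how the harness names its result files.

```shell
#!/bin/bash
# Hypothetical helper (not part of the testbench): remove the cached
# Automake result files for one test so the next run really re-executes it.
#   $1 = directory holding the result files
#   $2 = file stem as it appears before ".log" / ".trs" (assumption: check
#        your build tree for the exact naming)
clean_test_results() {
    local dir=$1 stem=$2
    # -f keeps this quiet and successful when the files do not exist yet.
    rm -f -- "$dir/$stem.log" "$dir/$stem.trs"
}
```

Calling it just before `make check TESTS='script.sh'` keeps the cleanup and the re-run together in one command line.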
Debugging & environment control
- `tests/diag.sh` documents environment variables such as `SUDO`, `USE_VALGRIND`, `RSYSLOG_DEBUG`, and timeout overrides; prefer these knobs over ad-hoc `sleep` loops.
- `tests/README` mirrors the operator-facing knobs so tooling and human docs stay aligned.
- Use `USE_GDB=1 make <test>.log` to pause execution and attach a debugger as described in `tests/README`.
- Keep suppression files (e.g. `*.supp`) current when adding new Valgrind noise; failing to do so will cause CI false positives.
Enabling Debug Output
To enable rsyslog debug logging for a test, temporarily uncomment these lines in tests/diag.sh (around lines 88-89):
export RSYSLOG_DEBUG="debug nologfuncflow noprintmutexaction nostdout"
export RSYSLOG_DEBUGLOG="log"
This creates a log file in the tests directory with detailed execution traces.
Important: Remember to re-comment these lines after debugging to avoid cluttering test output.
Preventing Test Cleanup for Inspection
To examine test output files after a test runs, temporarily comment out exit_test at the end of the test script:
# exit_test # Temporarily disabled to inspect logs
This preserves:
- `rstb_*.out.log` - The actual test output
- `rstb_*.conf` - The generated rsyslog configuration
- `log` - Debug log (if enabled)
- `rstb_*.input*` - Test input files
Example Debugging Workflow
1. Enable debug output in `diag.sh`:
       # Uncomment lines 88-89
       export RSYSLOG_DEBUG="debug nologfuncflow noprintmutexaction nostdout"
       export RSYSLOG_DEBUGLOG="log"
2. Disable cleanup in your test script:
       # Comment the exit_test line
       #exit_test
3. Run the test:
       cd tests
       ./mmsnareparse-trailing-extradata.sh
4. Examine output:
       # Check actual output vs expected
       cat rstb_*.out.log
       # Search debug log for specific patterns
       grep "extradata_section" log
       grep "Truncated trailing" log
5. Restore test environment:
   - Re-comment debug exports in `diag.sh`
   - Uncomment `exit_test` in your test script
   - Clean up test artifacts: `rm -f rstb_* log`
Understanding Test Output
When a test fails with content_check, the error shows:
FAIL: content_check failed to find "expected content"
FILE "rstb_*.out.log" content:
1 actual line 1
2 actual line 2
This helps identify:
- What the test expected vs what was produced
- Whether the module parsed the message correctly
- If fields are populated as expected
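To make the shape of that report concrete, here is a stand-in that only approximates it. `content_check_sketch` is a hypothetical function written for this illustration; the real `content_check` lives in `diag.sh`, and its matching rules and exit behavior may differ. The sketch assumes a fixed-string match and returns a status instead of aborting the test.

```shell
#!/bin/bash
# Stand-in only: mimics the failure report shown above. Not the diag.sh
# implementation; matching and exit behavior are assumptions.
content_check_sketch() {
    local needle=$1 file=$2
    # Fixed-string search; succeed quietly when the needle is present.
    if grep -qF -- "$needle" "$file"; then
        return 0
    fi
    # On failure, name the missing needle and dump the file with line
    # numbers so the reader can compare expected and actual output.
    printf 'FAIL: content_check failed to find "%s"\n' "$needle"
    printf 'FILE "%s" content:\n' "$file"
    cat -n "$file"
    return 1
}
```

Because every assertion reports the needle and the numbered file content in the same way, a failing run can be diagnosed from the log alone, which is the reason for preferring the harness helper over hand-written `grep` blocks.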
YAML-only test mode
generate_conf --yaml-only creates a pure-YAML preamble (no RainerScript preamble)
that is used as the rsyslogd startup configuration directly. Use it when a test
must validate YAML-loader behaviour or when no RainerScript is desired.
How it works
- `generate_conf --yaml-only [instance]` writes `${TESTCONF_NM}[instance].yaml` containing `version: 2`, `global:`, and `testbench_modules:` (imdiag setup).
- `testbench_modules:` is a YAML key understood by rsyslogd as an alias for `modules:` and is reserved for testbench infrastructure; it avoids any conflict with the test's own `modules:` section.
- Tests add their own `modules:` section (and `inputs:`, `rulesets:`, etc.) via `add_yaml_conf`.
- `add_yaml_conf 'fragment' [instance]` appends arbitrary YAML to the same file.
- `add_yaml_imdiag_input [instance]` is a historical compatibility helper. imdiag startup detection is configured by `generate_conf --yaml-only` via module-scoped testbench parameters, so new tests should not call it.
- `startup_common` detects `RSYSLOG_YAML_ONLY=1` and passes the `.yaml` file to rsyslogd instead of the usual `.conf` file.
Limitations
The following testbench features are not available in yaml-only mode:
| Feature | Reason | Workaround |
|---|---|---|
| Legacy `$` directives | Legacy syntax is not parsed by the YAML loader | Use v2 RainerScript (`module()`, `input()`) or YAML keys instead |
Note: Startup detection uses the imdiag port file in both RainerScript and yaml-only modes. The `.started` marker file mechanism has been removed; the imdiag port file is the sole startup signal in all modes.
Example test structure
. ${srcdir:=.}/diag.sh init
require_plugin imtcp
export NUMMESSAGES=100
export QUEUE_EMPTY_CHECK_FUNC=wait_file_lines
generate_conf --yaml-only
# Test-specific modules in their own modules: section (testbench_modules: is in preamble)
add_yaml_conf 'modules:'
add_yaml_conf ' - load: "../plugins/imtcp/.libs/imtcp"'
add_yaml_conf ''
add_yaml_conf 'inputs:'
add_yaml_conf " - type: imtcp"
add_yaml_conf " port: \"0\""
add_yaml_conf " listenPortFileName: \"${RSYSLOG_DYNNAME}.tcpflood_port\""
add_yaml_conf " ruleset: main"
add_yaml_conf 'rulesets:'
add_yaml_conf ' - name: main'
add_yaml_conf ' script: |'
add_yaml_conf " action(type=\"omfile\" file=\"${RSYSLOG_OUT_LOG}\")"
startup
# ... test body ...
exit_test
Naming and registration
- Name yaml-only tests `yaml-<area>-yamlonly.sh` (or append `-yamlonly` to an existing test name) so they are easy to find.
- Register them in `tests/Makefile.am` under `TESTS_LIBYAML`.
Coordination
- When adding tests for a plugin or runtime subsystem, mention them in the component’s `AGENTS.md` so future authors know smoke coverage exists.
- Update `KNOWN_ISSUES` or module metadata if a test encodes a known bug or a skipped scenario.
- If a change requires additional docker services or fixtures, document setup steps in `tests/CI/README` (or create it) and link from the relevant module guide.