rsyslog/AGENTS.md
Rainer Gerhards 1e80f70356 ci: gate expensive PR test families
Why:
Regular PR CI should avoid waking long-running service-backed tests when a
change only touches unrelated helper code. Kafka, imfile, and Elasticsearch
are frequent long-tail costs, so they need focused relevance gates without
weakening full CI and flake-testing workflows.

Impact:
PR CI omits Kafka, imfile, and Elasticsearch tests for unrelated helper-only
changes, while direct module/test changes and plausible shared runtime paths
still run those families. Local CI-container runs can apply the same
relevance policy before devtools/run-ci.sh.

Before/After:
Before, broad runtime patterns made these expensive families run too often;
after, they use explicit focused dependency rules with full-run overrides.

Technical Overview:
Move the remaining root-level runtime C/H files under runtime/ so path-based
rules can reason about core code consistently. Keep conservative broad
relevance for service families that do not yet have focused dependency
rules. Add focused relevance for Kafka, imfile, and Elasticsearch covering
module paths, tests, build/testbench plumbing, config/message/action/queue,
worker, template, ruleset, parser, stats, and selected family-specific
runtime helpers. Keep isolated helpers such as lookup tables, dynstats, DNS
cache, crypto/KSI, GSSAPI, and unrelated protocol helpers from waking those
families. Add devtools/apply-service-relevance.sh so GitHub Actions and local
container testing share the same relevance-to-configure suppression logic.
Centralize Elasticsearch and Kafka job decisions on the top-level
change-scope outputs so scheduled jobs always run their test body. Preserve
RSYSLOG_TESTBENCH_FORCE_SERVICE_TESTS,
RSYSLOG_TESTBENCH_FORCE_<MODULE>_TESTS, and
RSYSLOG_TESTBENCH_SKIP_SERVICE_RELEVANCE so daily, weekly, and flake runs
can still force all tests even when there are no relevant changes. Document
that AI agents must validate both the relevance decision layer and the
resulting configured test list when changing these gates.

Validation:
bash -n tests/diag.sh devtools/apply-service-relevance.sh
git diff --check
actionlint .github/workflows/run_checks.yml
shellcheck -S warning devtools/apply-service-relevance.sh
module-needs-testing rule matrix for kafka, imfile, elasticsearch, mysql
Temporary git-diff probes for runtime/lookup.c and runtime/action.c
Source helper checks for runtime/lookup.c and runtime/action.c
Ubuntu 26.04 container make distclean plus MOCK-OK run-ci for runtime/lookup.c

With the help of AI-Agents: Codex
2026-05-27 12:46:58 +02:00


# AGENTS.md: rsyslog Repository Agent Guide
This file defines the high-level roadmap for AI assistants to understand and contribute to the rsyslog codebase. Technical workflows are now modularized into **Skills**.
## Local Overlay
Before starting work in this repository, read `AGENTS.local.md` if it exists.
That file contains machine- and workflow-specific instructions that are not
duplicated here.
## AI Agent Skills
To ensure consistency and high-quality contributions, AI agents SHOULD use the following standardized skills located in `.agent/skills/`:
| Skill | Purpose |
|-------|---------|
| [`rsyslog_build`](.agent/skills/rsyslog_build/SKILL.md) | Environment setup and incremental parallel builds. |
| [`rsyslog_test`](.agent/skills/rsyslog_test/SKILL.md) | Standardized validation and debugging via `diag.sh`. |
| [`rsyslog_local_container_testing`](.agent/skills/rsyslog_local_container_testing/SKILL.md) | CI-style local dev-container validation, analyzer-first flow, service-skip checks, and clean-tree rules. |
| [`rsyslog_pr_babysitting`](.agent/skills/rsyslog_pr_babysitting/SKILL.md) | Post-push PR monitoring, including CI failures, reruns, and unresolved review-thread checks. |
| [`rsyslog_changelog`](.agent/skills/rsyslog_changelog/SKILL.md) | Selective ChangeLog maintenance that follows release-note style and avoids low-signal churn. |
| [`rsyslog_doc`](.agent/skills/rsyslog_doc/SKILL.md) | Structured, RAG-optimized documentation and metadata. |
| [`rsyslog_doc_dist`](.agent/skills/rsyslog_doc_dist/SKILL.md) | Syncing documentation files in `doc/Makefile.am`. |
| [`rsyslog_module`](.agent/skills/rsyslog_module/SKILL.md) | Technical patterns for concurrency and module authoring. |
| [`rsyslog_config`](.agent/skills/rsyslog_config/SKILL.md) | Dual-frontend config architecture (RainerScript + YAML) and parity rules. |
| [`rsyslog_issue_triage`](.agent/skills/rsyslog_issue_triage/SKILL.md) | GitHub issue backlog triage, clustering, closure comments, and local evidence boards. |
| [`rsyslog_commit`](.agent/skills/rsyslog_commit/SKILL.md) | Compliant commit messages and branching policies. |
## Agent Quick Start: The "Happy Path"
Follow these steps for a typical development task:
1. **Build**: Use the `rsyslog_build` skill to set up and compile.
2. **Validate**: Use the `rsyslog_test` skill to run relevant shell tests.
3. **Container Validation**: Use the `rsyslog_local_container_testing` skill
when Docker or Podman container tooling is available.
4. **Local AI Review**: Run local Cubic review when `cubic` is available.
5. **Commit**: Use the `rsyslog_commit` skill to format code and draft your message.

Tip: You do NOT need to re-run your build, test, or container validation cycle
after formatting if you already validated the code immediately before.
## Repository Overview
- **Primary Language**: C (v8 worker model)
- **Architecture**: Microkernel core (`runtime/`) + Loadable Plugins (`plugins/`)
- **Metadata**: Every module directory contains `MODULE_METADATA.yaml`.
- **Knowledge Base**: `doc/ai/` contains canonical patterns for RAG ingestion.
- **Security Triage**: [`doc/ai/security_triage_rubric.md`](./doc/ai/security_triage_rubric.md)
defines how AI agents must distinguish confirmed issues from potential
issues, hardening, and invalid findings before using security severity or CWE
language.
## Container Images
- Runtime container definitions live in `packaging/docker/rsyslog`.
- Local GitHub Actions-style validation commands for the Ubuntu 26.04 dev
container, `-j80` check runs, clang static analyzer, disabled external
services, and Docker storage cleanup are documented in the
[`rsyslog_local_container_testing`](.agent/skills/rsyslog_local_container_testing/SKILL.md)
skill. AI agents should use that skill when running or planning this
validation.
- The container Makefile default version must stay clearly non-release.
Use explicit `VERSION=...` values for release-like local rehearsals and for
any publish automation.
- Release-tagged container images are downstream of package publishing.
AI agents must not add or use release container flows that bypass the
Adiscon PPA readiness check.
- Manual release flows use two fixed channels:
`stable` maps `8.yymm.0` to `20yy-mm` via `ppa:adiscon/v8-stable`,
and `daily-stable` uses `ppa:adiscon/daily-stable` with the fixed tag
`daily-stable`.
- AI agents must not introduce release-looking fallback tags such as
`2026-03` as the default local container build version.
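
The `stable` channel mapping described above can be sketched in shell. This is a minimal illustration, not the repository's release tooling: the function name `stable_tag_from_version` is hypothetical, and the sketch assumes the version string has exactly the `8.yymm.0` shape.

```bash
# Hypothetical helper (not part of the rsyslog tree): derive the container
# tag for the `stable` channel from a release version of the form 8.yymm.0.
stable_tag_from_version() {
    version=$1
    case "$version" in
        8.[0-9][0-9][0-9][0-9].0) ;;   # accept only the documented shape
        *)
            echo "not a stable release version: $version" >&2
            return 1
            ;;
    esac
    yymm=${version#8.}     # strip the leading "8."
    yymm=${yymm%.0}        # strip the trailing ".0"
    yy=${yymm%??}          # first two digits: year
    mm=${yymm#??}          # last two digits: month
    echo "20${yy}-${mm}"
}
```

Under these assumptions, `stable_tag_from_version 8.2603.0` prints `2026-03`. Any other shape is rejected rather than guessed, in line with the rule that release-looking tags must be requested explicitly.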
## Required Final Validation Gate
For implementation tasks, AI agents MUST treat full local container validation
as the final validation gate when container tooling is available.
- If Docker or Podman is available and usable, run the
[`rsyslog_local_container_testing`](.agent/skills/rsyslog_local_container_testing/SKILL.md)
skill's full local validation before reporting the task complete.
- Full local container validation means the skill's ordered full sequence,
including the static analyzer and Ubuntu 26.04 `run-ci.sh` check run. Focused
container tests are useful targeted evidence, but they are not the full gate
unless the skill explicitly allows the reduced lane for the touched area.
- Use the skill's configured CI-equivalent dev image, including Docker Hub dev
images when appropriate. Use a locally built image only when validating that
local image or the runtime container produced by the task.
- Run local Cubic validation when `cubic` is installed and reachable. Hosted
Cubic or Gemini PR comments are additional review feedback, not substitutes
for local Cubic or local container validation.
- Relax expensive or service-backed lanes only for the narrow touched-area
cases documented in the container-testing skill, and record the rationale.
- If Docker or Podman is not installed, not running, lacks required
permissions, or the required image cannot be obtained, state that exact
blocker in the final response.
- If full local container validation is skipped or blocked, list the targeted
validation that was run instead and explicitly mark the work as **not fully
container-validated**.
- Do not describe implementation work as fully validated or complete unless
full local container validation passed, or the user explicitly accepted the
reduced validation scope after the blocker was reported.
- Session ledgers and final summaries for PR work must distinguish fully
container-validated work from targeted container-tested-only work. Include the
local Cubic status, hosted AI review status, image tag and ID, exact commands,
lane relaxations, and pass/fail results.
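
The blocker-reporting rule above could be supported by a small probe such as the one below. It is a sketch under stated assumptions: the function name `find_container_tool` is hypothetical, and it treats a successful `<tool> info` call as evidence that the tool is running and that permissions are sufficient. It does not check whether the required image can be obtained.

```bash
# Hypothetical probe (not part of the rsyslog tree): print the first usable
# container tool from the argument list, or report each exact blocker on
# stderr and return non-zero when none is usable.
find_container_tool() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "blocker: $tool is not installed" >&2
            continue
        fi
        # `docker info` and `podman info` fail when the daemon or socket is
        # unreachable or the user lacks permission to talk to it.
        if ! "$tool" info >/dev/null 2>&1; then
            echo "blocker: $tool is installed but not usable" >&2
            continue
        fi
        echo "$tool"
        return 0
    done
    return 1
}
```

A typical call would be `find_container_tool docker podman`. When it returns non-zero, the lines written to stderr are the blocker text to quote in the final response.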
## Context Discovery (Subtree Guides)
Each major subtree contains a specialized `AGENTS.md` that points to area-specific context and requirements:
- **Documentation**: [`doc/AGENTS.md`](./doc/AGENTS.md)
- **Core Plugins**: [`plugins/AGENTS.md`](./plugins/AGENTS.md)
- **Contrib Modules**: [`contrib/AGENTS.md`](./contrib/AGENTS.md)
- **Runtime Core**: [`runtime/AGENTS.md`](./runtime/AGENTS.md)
- **Testbench**: [`tests/AGENTS.md`](./tests/AGENTS.md)
- **Built-in Tools**: [`tools/AGENTS.md`](./tools/AGENTS.md)
## Test Structure Rule
- For this recursive Automake tree, keep `tests/` as the single recursive
test-owning subtree.
- New and changed tests must include inline intent documentation that says what
behavior, regression, or invariant they test. If an existing test lacks that
context, add it while touching the test.
For timing, retry, sampling, concurrency, or negative-path tests, also explain
the oracle: what proves success or failure, and why any wait or threshold
exists.
When changing a test, verify that the head comment still matches the actual
setup, stimulus, oracle, and pass/fail conditions after the edit; update it in
the same commit if it does not.
- It is fine to organize sources under `tests/unit/`, `tests/helpers/`, or
similar folders, but register and run those tests from `tests/Makefile.am`.
- Do not introduce additional recursive `tests/.../Makefile.am` test harnesses.
Top-level `make check TESTS=...` propagates into every subdirectory, and
multiple test-owning subdirs make targeted selection fragile.
## Python Style Validation
- Python style is governed by `setup.cfg` with `pycodestyle` line length set
to 120 columns.
- For Python edits, run `devtools/format-python.sh <changed-python-files>`
when `pycodestyle` is installed. Use `devtools/format-python.sh --fix
<changed-python-files>` to run `autopep8` first.
- If `pycodestyle` or `autopep8` is not installed in a local agent environment,
suggest installing it (`sudo apt-get install -y pycodestyle
python3-autopep8` on Debian/Ubuntu) but do not block unrelated build or
test validation. Agents may use `devtools/format-python.sh --check-if-available ...` for
optional local checks.
- The GitHub Actions `python_style.yml` workflow installs `pycodestyle` and
checks only changed Python files in pull requests. It does not run `autopep8`.
Do not introduce full-tree Python style gates unless the baseline is
intentionally refreshed in the same change.
- Be cautious with legacy Python-2-style helper scripts: review any `autopep8`
changes that touch print statements, exception syntax, imports, or line
continuations.
## Optional Local Linter Passes
CodeFactor and CI provide centralized lint feedback, but agents SHOULD run
useful local linters on the PR diff when the tools are already installed. These
checks are advisory local validation: if a tool is missing, suggest installing
it and continue with the normal build/test flow.
Use a freshly fetched upstream base when computing changed files:
```bash
git fetch upstream main --prune
```
- For changed shell scripts, run `shellcheck` when installed:
`command -v shellcheck >/dev/null && git diff -z --name-only
--diff-filter=ACMR upstream/main...HEAD -- '*.sh' | xargs -0 -r
shellcheck -S warning`
- For changed Dockerfiles, run `hadolint` when installed:
`command -v hadolint >/dev/null && git diff -z --name-only
--diff-filter=ACMR upstream/main...HEAD -- '*Dockerfile*' 'Dockerfile' |
xargs -0 -r hadolint`
- For changed infrastructure/config files, run `trivy config` when installed.
Prefer changed paths or the smallest relevant directory over a full-repo scan.
- For larger PRs, run `jscpd` on changed source/test files when installed to
catch accidental copy/paste duplication. Treat findings as review prompts,
not automatic blockers.

Do not add `cppcheck` to the routine local PR checklist for this repository
unless a maintainer explicitly asks for it; it has historically produced too
much low-value noise on the rsyslog code base.
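
The "advisory, never blocking" behaviour of these linter passes can be captured in one wrapper. The following is a sketch, not an existing repository script: `lint_changed` is a hypothetical name, and it expects a NUL-delimited file list on standard input, as produced by `git diff -z --name-only`.

```bash
# Hypothetical wrapper (not part of the rsyslog tree): run a linter over a
# NUL-delimited file list from stdin, but only when the tool is installed.
lint_changed() {
    tool=$1
    shift
    if ! command -v "$tool" >/dev/null 2>&1; then
        # Advisory only: drain stdin so the producer is not killed by
        # SIGPIPE, suggest nothing more than a skip, and report success.
        cat >/dev/null
        echo "advisory: $tool not installed, skipping" >&2
        return 0
    fi
    # -r: do not run the tool at all when no files changed.
    xargs -0 -r "$tool" "$@"
}
```

With this helper, the shell-script pass becomes `git diff -z --name-only --diff-filter=ACMR upstream/main...HEAD -- '*.sh' | lint_changed shellcheck -S warning`. A missing tool yields a note on stderr and a zero exit status, so the normal build/test flow continues.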
## GitHub Actions Validation
- When editing files under `.github/workflows/`, validate locally with
`actionlint .github/workflows/<file>.yml` and the pinned zizmor version:
`python3 -m venv .zizmor-venv && .zizmor-venv/bin/python -m pip install -r .github/requirements-zizmor.txt && .zizmor-venv/bin/zizmor --strict-collection .github/workflows`.
- Avoid direct `${{ ... }}` template expansion inside shell `run:` scripts.
Pass expression values through `env:` variables and expand those variables in
the shell script instead.
## PR Test Relevance Policy
- Regular pull-request CI may use approximate relevance gates to avoid
scheduling expensive service-backed test families that cannot reasonably be
affected by the change. The goal is to omit irrelevant tests from the
configured Automake `TESTS` set, not merely to start those tests and skip
them late after service setup.
- Relevance gates must be conservative. Direct module changes, related tests,
testbench/build plumbing, workflow files, configure inputs, and shared runtime
paths that can plausibly affect the service family must keep that family
enabled.
- Isolated helper areas may be excluded from a heavy family only when there is a
clear rationale that the family cannot use that code path. Current examples
include keeping Kafka, imfile, and Elasticsearch tests disabled for unrelated
helper-only changes such as lookup tables or dynstats.
- Agents changing relevance rules must validate both levels:
`tests/diag.sh module-needs-testing <family>` with representative changed-file
sets, and a container/mock CI run that confirms the generated test list omits
irrelevant heavy-family tests before execution.
- Full coverage must remain forceable. Daily, weekly, release, flake-campaign,
and maintainer-requested runs must be able to bypass relevance gates via
`RSYSLOG_TESTBENCH_FORCE_SERVICE_TESTS=1`,
`RSYSLOG_TESTBENCH_FORCE_<FAMILY>_TESTS=1`, or
`RSYSLOG_TESTBENCH_SKIP_SERVICE_RELEVANCE=1`.
- Do not present a relevance-filtered PR run as equivalent to an unconditional
full-suite run. Report which families were disabled and why when that matters
for the validation claim.
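
The policy above can be illustrated with a deliberately simplified decision function. This is a hypothetical sketch and NOT the logic behind `tests/diag.sh module-needs-testing`: the function name, the path patterns, and the classification of individual files are assumptions chosen only to show the conservative shape of a gate. The per-family override `RSYSLOG_TESTBENCH_FORCE_<FAMILY>_TESTS` is left out to keep the example short.

```bash
# Hypothetical relevance gate (illustration only). Returns 0 when the family
# must stay enabled for the given changed files, 1 when it may be omitted.
family_needs_testing() {
    family=$1
    shift
    # Global overrides named in the policy keep full coverage forceable.
    if [ "${RSYSLOG_TESTBENCH_FORCE_SERVICE_TESTS:-0}" = "1" ] ||
        [ "${RSYSLOG_TESTBENCH_SKIP_SERVICE_RELEVANCE:-0}" = "1" ]; then
        return 0
    fi
    # No change information at all: stay conservative and run the family.
    if [ "$#" -eq 0 ]; then
        return 0
    fi
    for path in "$@"; do
        case "$path" in
            # Build, testbench, and workflow plumbing can affect any family.
            configure.ac | Makefile.am | tests/diag.sh | .github/workflows/*) return 0 ;;
            # Direct module or test changes (assumed naming pattern).
            plugins/*"$family"* | tests/*"$family"*) return 0 ;;
            # Shared runtime paths that plausibly affect the family (assumed).
            runtime/action.c | runtime/queue.c) return 0 ;;
            # Isolated helpers with a clear rationale: do not wake the family.
            runtime/lookup.c | runtime/dynstats.c) ;;
            # Unclassified paths: conservative default is to run.
            *) return 0 ;;
        esac
    done
    return 1
}
```

In this sketch, `family_needs_testing kafka runtime/lookup.c` returns 1, while adding `runtime/action.c` to the same call returns 0. The design point is the default branch: a path excludes a family only when it is explicitly listed as isolated, so new or unknown files never silently reduce coverage.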
## Agent Chat Keywords
- `SETUP`: Triggers the `rsyslog_build` setup workflow.
- `BUILD`: Triggers the `rsyslog_build` incremental build workflow.
- `TEST`: Triggers the `rsyslog_test` validation workflow.
- `CHANGELOG`: Triggers the `rsyslog_changelog` release-note maintenance workflow.
- `SUMMARIZE`: Generates PR and commit summaries using `rsyslog_commit` templates.
- `FINISH`: Final review of code and style before conclusion.
---
*For human-facing guidelines, see [CONTRIBUTING.md](CONTRIBUTING.md) and [DEVELOPING.md](DEVELOPING.md).*