Rainer Gerhards 2b2fcd5ef5
tests: document diagnostic oracle expectations
Why: Test assertions should prove the observable behavior operators rely
on, not just that rsyslogd wrote a message to its process streams.

Impact: Clarifies test-authoring and babysitting guidance for rsyslog
diagnostics.

Before/After: Tests could use stdout/stderr oracles without a clear
exception; guidance now prefers configured rsyslog output destinations
after synchronized shutdown.

Technical Overview: Add the authoritative rule to tests/AGENTS.md.
Point rsyslog_test at that rule when explaining test oracle
documentation. Add a review-time reminder to rsyslog_pr_babysitting so
PR feedback catches weak stdout/stderr diagnostic oracles. Exceptions
remain allowed when the tested behavior is process output or
configuration is not usable yet, but they must be documented in the test
header.

With the help of AI-Agents: OpenAI Codex
2026-05-21 16:58:31 +02:00


name: rsyslog_pr_babysitting
description: Monitor rsyslog pull requests after push or rerun, including GitHub Actions checks, unresolved review threads, bot comments, reruns for known flakes, and concise status reporting.

rsyslog_pr_babysitting

Use this skill when babysitting an rsyslog PR after pushing, rerunning CI, or waiting for review. Babysitting is incomplete unless both CI status and unresolved review threads have been checked. Read review comments as part of babysitting, including bot or CI comments that explain failed or skipped jobs.

When a user says "babysit PR", the default goal is to watch and act until the PR is fully babysat (see stop conditions below), then stop. Do not keep polling until merge unless the user explicitly asks for merge-time monitoring.

Poll Cadence

  • Poll every 10 to 20 minutes while checks are running; 15 minutes is the default.
  • If checks are complete but actionable review threads or CI/bot comments remain unresolved, continue on the same cadence until they are handled or the user says to stop.
  • If babysitting more than one PR, check them together on the same cadence instead of starting separate tight loops.
  • Stop babysitting each PR as soon as that PR is green: no failed checks, no running or queued required checks, and no unresolved actionable review threads. Merge is a maintainer decision and may happen days later, so a green unmerged PR is not a reason to keep polling unless the user explicitly asked for merge-time monitoring.
  • Do not leave background sleep or polling processes alive when stopping work.
  • Report the commit SHA, failed checks, running checks, and unresolved review findings at each meaningful update.
  • Do not keep polling when progress is blocked by a maintainer decision, missing credentials, unavailable external service, or any other condition the agent cannot resolve safely. Notify once with the blocker and recommended next step, then stop or pause the automation.
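The cadence rule above can be sketched as a small shell helper. This is only an illustration: the function name `poll_delay_minutes` is hypothetical and not part of any rsyslog tooling. It clamps a requested interval to the 10 to 20 minute window and falls back to the 15 minute default.

```shell
# Hypothetical helper (an assumption, not existing tooling): clamp a requested
# poll interval in minutes to the 10-20 minute window; default to 15.
poll_delay_minutes() {
    requested="${1:-15}"
    if [ "$requested" -lt 10 ]; then
        echo 10
    elif [ "$requested" -gt 20 ]; then
        echo 20
    else
        echo "$requested"
    fi
}
```

A foreground wait such as `sleep "$(( $(poll_delay_minutes) * 60 ))"` would then honor the cadence without leaving a background process behind.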

Poll Decision Table

Run this decision table on every poll before deciding whether to report, fix, or keep waiting:

  • Merged: stop tracking the PR. Worktree and branch cleanup is a separate maintenance task; do it only when explicitly requested.
  • Checks running or queued: keep polling on cadence, unless review threads already contain an actionable simple fix that can be handled while checks run.
  • Checks failed: inspect failing logs before guessing. Classify the failure as likely flake, PR-caused, external/unresolvable, or decision-needed.
  • Likely flake: record the failing test and reason, rerun failed jobs only, and keep polling.
  • PR-caused failure: fix only after tying the failing path to the PR's changes, validate locally, amend or commit as appropriate, push, and keep polling.
  • Active review threads: classify each unresolved, non-outdated thread as actionable/simple, needs maintainer decision, external/unresolvable, or response-only.
  • Actionable/simple review thread: if the PR branch is in the agent's write scope, fix it immediately, validate the affected area, amend or commit as appropriate, push, and continue babysitting. Do not report it as a blocker before attempting the fix.
  • Needs maintainer decision or external/unresolvable: notify the user once with the exact blocker and recommended next step, then stop or pause babysitting for that PR.
  • Response-only: draft the needed response for the user and stop or pause unless the user explicitly asked the agent to post replies.
  • Fully babysat: stop tracking this PR when there are no failed checks, no running or queued required checks, and no unresolved actionable review threads. Do not keep polling a green PR while it waits for maintainer merge unless the user explicitly asked for merge-time monitoring.
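As a rough sketch, the top-level branches of the decision table could be encoded like this. The function name `next_poll_action`, its argument order, and the output strings are assumptions made for illustration. The finer classifications (flake versus PR-caused failure, simple versus decision-needed thread) still require reading logs and comments and are not modeled here.

```shell
# Hypothetical helper, not part of any rsyslog tooling.
# Arguments: STATE FAILED PENDING ACTIONABLE_THREADS
#   STATE               PR state as reported by gh (for example OPEN or MERGED)
#   FAILED              number of failed checks
#   PENDING             number of running or queued required checks
#   ACTIONABLE_THREADS  number of unresolved actionable review threads
next_poll_action() {
    state="$1"; failed="$2"; pending="$3"; threads="$4"
    if [ "$state" = "MERGED" ]; then
        echo "stop: merged"
    elif [ "$failed" -gt 0 ]; then
        echo "inspect failing logs"
    elif [ "$threads" -gt 0 ]; then
        # Actionable threads are handled even while checks are still running.
        echo "handle review threads"
    elif [ "$pending" -gt 0 ]; then
        echo "keep polling"
    else
        echo "stop: fully babysat"
    fi
}
```

For example, `next_poll_action OPEN 0 2 1` prints `handle review threads`, reflecting that a simple review fix does not have to wait for running checks.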

CI Poll

Use gh pr view for check rollups:

gh pr view PR_NUMBER --repo rsyslog/rsyslog \
  --json statusCheckRollup,reviewDecision,mergeStateStatus,mergeable,state,headRefOid,url

Treat mergeable: CONFLICTING or mergeStateStatus: DIRTY as an actionable blocker, not just a status to report. Before resolving it, fetch a fresh copy of the official upstream main, for example:

git fetch upstream main --prune

Then rebase or merge the PR branch onto that fetched upstream/main, resolve conflicts locally, validate the affected files, and push the updated branch. Do not rely on the fork's origin/main for conflict resolution.
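A minimal sketch of the conflict check, assuming the two values have already been extracted from the gh pr view JSON. The helper name `is_conflict_blocker` is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) when the values reported by
# gh pr view indicate a merge conflict that needs action.
# Arguments: MERGEABLE MERGE_STATE_STATUS
is_conflict_blocker() {
    [ "$1" = "CONFLICTING" ] || [ "$2" = "DIRTY" ]
}

# Possible usage (not executed here; PR_NUMBER is a placeholder):
#   set -- $(gh pr view PR_NUMBER --repo rsyslog/rsyslog \
#       --json mergeable,mergeStateStatus \
#       --jq '.mergeable + " " + .mergeStateStatus')
#   is_conflict_blocker "$1" "$2" && echo "conflict: rebase onto upstream/main"
```

Keeping the check separate from the gh call makes it easy to exercise with fixed values before wiring it into a poll loop.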

For failures, inspect logs before guessing:

gh run view RUN_ID --repo rsyslog/rsyslog --job JOB_ID --log-failed
gh api -H 'Accept: application/vnd.github+json' \
  /repos/rsyslog/rsyslog/actions/jobs/JOB_ID/logs

If a failed test is a known or likely flake, record the failing test name and reason, then rerun failed jobs only:

gh run rerun RUN_ID --repo rsyslog/rsyslog --failed

Do not change code for CI failures until the failing path has been tied to the PR's changes.

Review Thread Poll

Fetch review threads with GraphQL. Flat PR comments are not enough because they omit thread state. If a GitHub review-comment skill is available, use its thread-aware fetch helper; otherwise query reviewThreads directly:

gh api graphql \
  -F owner=rsyslog \
  -F repo=rsyslog \
  -F number=PR_NUMBER \
  -f query='
query($owner:String!,$repo:String!,$number:Int!) {
  repository(owner:$owner,name:$repo) {
    pullRequest(number:$number) {
      reviewThreads(first:100) {
        nodes {
          isResolved
          isOutdated
          path
          line
          originalLine
          comments(first:20) {
            nodes { author { login } createdAt body }
          }
        }
      }
    }
  }
}'

For each unresolved, non-outdated thread, track:

  • author, path, and line
  • priority words such as high, critical, regression, skip, CI, or workflow
  • whether the requested change is actionable or only needs a response
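The priority-word check in the list above could be approximated with a whole-word, case-insensitive grep. The helper name `has_priority_word` is an assumption for illustration. A keyword match is only a triage hint and does not replace reading the comment.

```shell
# Hypothetical helper: succeed when a review comment body contains one of the
# priority words, matched case-insensitively and as whole words only.
has_priority_word() {
    printf '%s\n' "$1" | grep -Eiqw 'high|critical|regression|skip|CI|workflow'
}
```

Whole-word matching (`-w`) keeps a word such as "specific" from matching on its embedded "ci".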

When the PR branch belongs to the current agent task or the user has asked the agent to babysit its own PR, simple review comments are part of the babysitting work. Handle small, localized fixes directly, validate them, and push the updated branch. Examples include typo fixes, documentation wording, metadata formatting, simple example corrections, and narrow test expectation updates.

When reviewing or adjusting test assertions, apply the oracle rule from tests/AGENTS.md: diagnostics emitted by rsyslog itself should normally be asserted through the configured rsyslog output destination after synchronized shutdown, not through rsyslogd stdout/stderr. Treat stdout/stderr diagnostic oracles as actionable unless the test documents a specific exception.

AI review comments need an explicit GitHub reply so later readers know the comment was considered. If the requested change was implemented, a short Done. reply is sufficient. If the comment is invalid, not applicable, or not implemented, reply with the specific reason.

Do not push fixes for PRs outside the agent's write scope, broad design changes, compatibility changes, ambiguous review requests, or anything requiring maintainer policy judgment. Report those once and stop or pause polling. Do not reply to non-AI reviewers or resolve threads unless explicitly asked.

Status Summary

When reporting, separate:

  • CI failures that need code changes
  • likely flakes already rerun
  • unresolved review comments needing decisions
  • checks still running

If all checks pass but unresolved actionable review threads remain, say the PR is not fully babysat yet.
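A one-line status report along these lines could be produced by a helper such as the following. The name `babysit_status` and the output format are assumptions, not an established convention of the project.

```shell
# Hypothetical helper: print a one-line babysitting status.
# Arguments: SHA FAILED RUNNING UNRESOLVED_ACTIONABLE
babysit_status() {
    sha="$1"; failed="$2"; running="$3"; unresolved="$4"
    if [ "$failed" -eq 0 ] && [ "$running" -eq 0 ] && [ "$unresolved" -eq 0 ]; then
        verdict="fully babysat"
    else
        verdict="not fully babysat yet"
    fi
    echo "$sha: failed=$failed running=$running unresolved=$unresolved ($verdict)"
}
```

With all checks green but two actionable threads open, `babysit_status abc1234 0 0 2` prints `abc1234: failed=0 running=0 unresolved=2 (not fully babysat yet)`.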