Merge pull request #5920 from rgerhards/action-flow-step1

doc: reference AI module map in AGENTS.md and DEVELOPING.md
Rainer Gerhards 2025-08-12 18:21:01 +02:00 committed by GitHub
commit d436497c1e
GPG Key ID: B5690EEEBB952194
4 changed files with 164 additions and 18 deletions


@ -2,14 +2,15 @@
This file defines guidelines and instructions for AI assistants (e.g., Codex, GitHub Copilot Workspace, ChatGPT agents) to understand and contribute effectively to the rsyslog codebase.
## Repository Overview
- **Primary Language**: C
- **Build System**: autotools (`autogen.sh`, `configure`, `make`)
- **Modules**: Dynamically loaded from `modules/`
- **Modules**: Dynamically loaded from `plugins/`
- **Contrib Modules**: Community-contributed under `contrib/`
- **Contributions**: Additional modules and features are placed in `contrib/`, which contains community-contributed plugins not actively maintained by the core rsyslog team. These are retained in `contrib/` even if adopted later, to avoid disruptions in dependent software.
- **Documentation**: Maintained in the doc/ subdirectory
- **AI module map**: `doc/ai/module_map.yaml` (per-module paths & locking hints)
- **docker definitions**: Maintained in the packaging/docker/ subdirectory
- **Side Libraries** (each in its own repo within the rsyslog GitHub org):
- [`liblognorm`](https://github.com/rsyslog/liblognorm)
@ -40,13 +41,14 @@ AI agents should follow this process:
Formatting-only commits listed in .git-blame-ignore-revs.
AI Agent Note: run devtools/format-code.sh as the final formatting step before commit.
-----
## Development Workflow
### Base Repository
- URL: [https://github.com/rsyslog/rsyslog](https://github.com/rsyslog/rsyslog)
- URL: https://github.com/rsyslog/rsyslog
- **Default base branch: `main`**
> The `main` branch is now the canonical base for all development.
> Some older references to `master` may still exist in documentation
@ -57,7 +59,7 @@ AI Agent Note: run devtools/format-code.sh as the final formatting step before c
1. Fork the repository (for personal development)
2. Create a feature/fix branch
3. Push changes to your fork
4. Open a **pull request directly into `rsyslog/rsyslog:master`**
4. Open a **pull request directly into `rsyslog/rsyslog:main`**
> **Important**: AI-generated PRs must target the `rsyslog/rsyslog` repository directly.
@ -77,7 +79,7 @@ There are no strict naming rules, but these conventions are used frequently:
## Coding Standards
- Commit messages **must include all relevant information**, not just in the PR
- Commit message titles **must not exceed 70 characters**
- Commit message titles **must not exceed 65 characters** (aim for 62)
- Commit message text must be plain US-ASCII; line length must not exceed 86 characters
- When referencing GitHub issues, use the **full GitHub URL** to assist in `git log`-based reviews
- Favor **self-documenting code** over excessive inline comments
@ -90,7 +92,7 @@ When fixing compiler warnings like `stringop-overread`, explain in the commit me
- Why the warning occurred
- What part of the code was changed
- How the fix prevents undefined behavior or aligns with compiler expectations
- Optionally link: [https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wstringop-overread](https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wstringop-overread)
- Optionally link: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wstringop-overread
-----
@ -255,7 +257,7 @@ This ensures Codex can build core components even in constrained environments. S
### `imdocker`
- Depends on: `libcurl` (\>= 7.40.0)
- Depends on: `libcurl` (>= 7.40.0)
### `impcap`
@ -263,15 +265,15 @@ This ensures Codex can build core components even in constrained environments. S
### `imczmq` and `omczmq`
- Depends on: `libczmq` (\>= 4.0.0)
- Depends on: `libczmq` (>= 4.0.0)
### `omrabbitmq`
- Depends on: `librabbitmq` (\>= 0.2.0)
- Depends on: `librabbitmq` (>= 0.2.0)
### `omdtls` and `imdtls`
- Depends on: `openssl` (\>= 1.0.2 for output, \>= 1.1.0 for input)
- Depends on: `openssl` (>= 1.0.2 for output, >= 1.1.0 for input)
### `omhttp`
@ -283,11 +285,11 @@ This ensures Codex can build core components even in constrained environments. S
### `mmnormalize`
- Depends on: `liblognorm` (\>= 2.0.3)
- Depends on: `liblognorm` (>= 2.0.3)
### `mmkubernetes`
- Depends on: `libcurl` and `liblognorm` (\>= 2.0.3)
- Depends on: `libcurl` and `liblognorm` (>= 2.0.3)
### `mmgrok`
@ -321,16 +323,11 @@ This ensures Codex can build core components even in constrained environments. S
## AI-Specific Hints
- The `modules/` directory contains dynamically loaded input/output plugins
- The `plugins/` directory contains dynamically loaded input/output plugins
- `contrib/` contains external contributions (e.g., plugins) that are not core-maintained
- `statsobj.c` implements the statistics interface
- Documentation resides in the monorepo's `doc/` directory
- You may reference `rsyslog-docker` for dev/test environment setup
- Side libraries are external GitHub repos, not subdirectories
- **Shell Script Documentation**
@ -364,3 +361,28 @@ If you are an AI agent contributing code or documentation:
- Do not install third-party dependencies unless explicitly approved
- PRs must pass standard CI and review checks
- All code **must be reviewed manually**; AI output is subject to full review
-----
## Quickstart for AI coding agents (v8 concurrency & state)
**Read these first:**
* [`DEVELOPING.md`](./DEVELOPING.md) — v8 worker model & locking rules
* [`MODULE_AUTHOR_CHECKLIST.md`](./MODULE_AUTHOR_CHECKLIST.md) — one-screen checklist
* [doc/ai/module_map.yaml](./doc/ai/module_map.yaml) — seed list of modules, paths, and known locking needs
**Rules you must not break**
1. The framework may run **multiple workers per action**.
2. `wrkrInstanceData_t` (WID) is **per-worker**; never share it.
3. Shared mutable state lives in **pData** (per-action) and **must be protected**
by the module (mutex/rwlock). Do **not** rely on `mutAction` for this.
4. **Inherently serial resources** (e.g., a shared stream) must be serialized
inside the module via a mutex in **pData**.
5. **Direct queues** do not remove the need to serialize serial resources.
**Common agent tasks**
* Consult `doc/ai/module_map.yaml` to understand module paths and known locking.
* Add a “Concurrency & Locking” block at the top of output modules.
* Ensure serial modules guard stream/flush with a **pData** mutex.
* For modules with a library I/O thread (e.g., Proton), verify read/write locks
are taken on **all** callback paths.

DEVELOPING.md Normal file

@ -0,0 +1,65 @@
# Developing rsyslog (v8 engine essentials)
This short guide is the **canonical entry point** for humans and AI coding agents.
It captures the v8 worker model and locking rules that module code must follow.
## v8 worker model (normative)
* The framework may run **multiple workers per action**.
* `wrkrInstanceData_t` (WID) is **per-worker, thread-local** → **no locks needed** inside WID.
* `pData` (per-action/shared) may be touched by multiple workers and/or extra threads → **guard mutable/shared members**.
* **Inherently serial resources** (e.g., a FILE/strm shared in pData) **must be serialized** inside the module using a mutex in pData.
* **Direct queues** do **not** remove the need to serialize serial resources.
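
The worker model above can be illustrated with a minimal pthread sketch. It is not rsyslog code: `sketch_action_data_t`, `sketch_wrkr_data_t`, `sketch_worker` and `sketch_run` are hypothetical names, and a byte counter stands in for the shared stream. It only shows the split between unlocked per-worker state and mutex-guarded per-action state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define SKETCH_MAX_WORKERS 16

/* Hypothetical stand-in for per-action data (pData): shared by all workers. */
typedef struct {
    pthread_mutex_t mut_write;  /* guards bytes_out and records_out */
    size_t bytes_out;           /* stands in for an inherently serial stream */
    unsigned long records_out;
} sketch_action_data_t;

/* Hypothetical stand-in for wrkrInstanceData_t (WID): owned by one worker. */
typedef struct {
    sketch_action_data_t *pData;
    int id;
    int n_msgs;
    char buf[64];               /* worker-private scratch space, never locked */
} sketch_wrkr_data_t;

static void *sketch_worker(void *arg)
{
    sketch_wrkr_data_t *wid = arg;
    for (int i = 0; i < wid->n_msgs; ++i) {
        /* Formatting touches only WID state, so it runs without a lock. */
        int len = snprintf(wid->buf, sizeof(wid->buf), "w%d m%d\n", wid->id, i);
        assert(len > 0 && (size_t)len < sizeof(wid->buf));

        /* The "write" touches shared state, so it is serialized via pData. */
        pthread_mutex_lock(&wid->pData->mut_write);
        wid->pData->bytes_out += (size_t)len;
        wid->pData->records_out++;
        pthread_mutex_unlock(&wid->pData->mut_write);
    }
    return NULL;
}

/* Runs n_workers concurrent workers; returns the number of records written. */
unsigned long sketch_run(int n_workers, int msgs_per_worker)
{
    sketch_action_data_t pData = { .bytes_out = 0, .records_out = 0 };
    sketch_wrkr_data_t wid[SKETCH_MAX_WORKERS];
    pthread_t tid[SKETCH_MAX_WORKERS];

    if (n_workers < 1 || n_workers > SKETCH_MAX_WORKERS)
        return 0;
    pthread_mutex_init(&pData.mut_write, NULL);
    for (int i = 0; i < n_workers; ++i) {
        wid[i].pData = &pData;
        wid[i].id = i;
        wid[i].n_msgs = msgs_per_worker;
        int rc = pthread_create(&tid[i], NULL, sketch_worker, &wid[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < n_workers; ++i)
        pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&pData.mut_write);
    return pData.records_out;
}
```

Calling `sketch_run(4, 1000)` starts four workers that each emit 1000 records. Without the mutex, the shared counters could lose updates, regardless of how the queue in front of the action is configured.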
### Where state goes
| Kind of state | Put it in | Locking |
|--------------------------------------|---------------------|----------------------------------------------------|
| Config, shared counters, streams | `pData` (per-action)| Guard if mutated or not thread-safe |
| Live handles, sockets, buffers | `wrkrInstanceData_t`| No locks (WID is owned by exactly one worker) |
| Library thread + callbacks (I/O) | `pData` + rwlock | Define who reads/writes; document in comments |
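
As a rough illustration of the first two table rows, the following sketch places config and a shared counter in a per-action struct and a live handle and buffer in a per-worker struct. All names (`place_action_data_t`, `place_wrkr_data_t`, and the `place_*` functions) are assumptions made for this example and are not part of the rsyslog module interface.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-action data (pData): config plus shared, guarded state. */
typedef struct {
    char target[64];            /* config: written once before workers start */
    pthread_mutex_t mut_stats;  /* guards msgs_sent */
    unsigned long msgs_sent;    /* shared counter, mutated by every worker */
} place_action_data_t;

/* Hypothetical per-worker data (WID): live handle and buffer, no locks. */
typedef struct {
    place_action_data_t *pData; /* back-pointer to the shared action data */
    int sock;                   /* live handle; -1 means "not connected" */
    char *buf;
    size_t buf_len;
} place_wrkr_data_t;

place_action_data_t *place_action_create(const char *target)
{
    place_action_data_t *pData = calloc(1, sizeof(*pData));
    if (pData == NULL)
        return NULL;
    snprintf(pData->target, sizeof(pData->target), "%s", target);
    if (pthread_mutex_init(&pData->mut_stats, NULL) != 0) {
        free(pData);
        return NULL;
    }
    return pData;
}

void place_action_free(place_action_data_t *pData)
{
    if (pData == NULL)
        return;
    pthread_mutex_destroy(&pData->mut_stats);
    free(pData);
}

/* Each worker gets its own WID with its own buffer. */
place_wrkr_data_t *place_wrkr_create(place_action_data_t *pData, size_t buf_len)
{
    assert(buf_len > 0);
    place_wrkr_data_t *wid = calloc(1, sizeof(*wid));
    if (wid == NULL)
        return NULL;
    wid->buf = malloc(buf_len);
    if (wid->buf == NULL) {
        free(wid);
        return NULL;
    }
    wid->buf_len = buf_len;
    wid->sock = -1;
    wid->pData = pData;
    return wid;
}

void place_wrkr_free(place_wrkr_data_t *wid)
{
    if (wid == NULL)
        return;
    free(wid->buf);
    free(wid);
}

/* Updating the shared counter is the only step that needs the pData lock. */
void place_count_sent(place_wrkr_data_t *wid)
{
    pthread_mutex_lock(&wid->pData->mut_stats);
    wid->pData->msgs_sent++;
    pthread_mutex_unlock(&wid->pData->mut_stats);
}
```

The design point is that two WIDs created for the same action share one `pData` pointer but never share a buffer or handle, so only `place_count_sent` needs a lock.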
### Locking rules (quick)
1. Never share a WID across workers.
2. Any shared mutable `pData` member → protect with `pthread_mutex_t`
(or `pthread_rwlock_t` if a background thread is involved).
3. If a module uses an external/library thread, document **who takes read/write**.
4. Transactions follow the same rules as `doAction`.
5. Test with `queue.workerThreads` > 1.
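
Rules 2 and 3 can be sketched with a `pthread_rwlock_t` shared between a worker and a background thread. This is an assumption-laden illustration, not code from any rsyslog module: `rw_action_data_t`, `rw_callback_thread`, `rw_replace_handle` and `rw_run` are hypothetical, and a pair of integers stands in for a library handle. The background thread plays the reader and the worker plays the writer.

```c
#define _POSIX_C_SOURCE 200809L  /* make pthread_rwlock_t visible in strict ISO C modes */
#include <assert.h>
#include <pthread.h>

/* Hypothetical pData shared between a worker and a library I/O thread. */
typedef struct {
    pthread_rwlock_t lock;  /* writer: worker thread; reader: callback thread */
    long handle_id;         /* stands in for a connection handle */
    long handle_check;      /* invariant under the lock: handle_check == -handle_id */
    long torn_reads;        /* written only by the callback thread */
    int n_reads;            /* set before the callback thread starts */
} rw_action_data_t;

/* Stands in for a library callback: takes the read lock on every path. */
static void *rw_callback_thread(void *arg)
{
    rw_action_data_t *pData = arg;
    for (int i = 0; i < pData->n_reads; ++i) {
        pthread_rwlock_rdlock(&pData->lock);
        if (pData->handle_id + pData->handle_check != 0)
            pData->torn_reads++;  /* would indicate a half-updated handle */
        pthread_rwlock_unlock(&pData->lock);
    }
    return NULL;
}

/* Worker path that mutates the handle: takes the write lock. */
static void rw_replace_handle(rw_action_data_t *pData, long new_id)
{
    pthread_rwlock_wrlock(&pData->lock);
    pData->handle_id = new_id;
    pData->handle_check = -new_id;
    pthread_rwlock_unlock(&pData->lock);
}

/* Returns the number of inconsistent (torn) reads the callback thread saw. */
long rw_run(int n_updates, int n_reads)
{
    rw_action_data_t pData = {
        .handle_id = 0, .handle_check = 0, .torn_reads = 0, .n_reads = n_reads
    };
    pthread_t tid;

    pthread_rwlock_init(&pData.lock, NULL);
    int rc = pthread_create(&tid, NULL, rw_callback_thread, &pData);
    assert(rc == 0);
    for (long id = 1; id <= n_updates; ++id)
        rw_replace_handle(&pData, id);
    pthread_join(tid, NULL);
    pthread_rwlock_destroy(&pData.lock);
    return pData.torn_reads;
}
```

Because both fields are only changed together under the write lock, a reader holding the read lock should never observe a mismatched pair. The comment on `lock` is the kind of "who takes read/write" note that rule 3 asks for.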
## Canonical examples (start here)
* `plugins/omfile/*` → inherently serial: uses a pData mutex to guard write/flush.
* `plugins/omfwd/*` → per-worker sockets in WID, typically no additional locks.
* `plugins/omelasticsearch/*` → per-worker curl & batch in WID.
* `plugins/omazureeventhubs/*` → two-thread model (rsyslog worker + Proton): needs an rwlock across threads (fix in progress).
* `contrib/omhttp/*` → per-worker curl & buffers in WID.
* `contrib/mmkubernetes/*` → WID is per-thread; shared caches live in pData and
must be guarded (see code).
## Entry points (what modules implement)
* `createWrkrInstance` / `freeWrkrInstance` — allocate/free WID (thread-local).
* `doAction` — may run concurrently across workers of the same action.
* `beginTransaction` / `commitTransaction` / `rollbackTransaction` — same concurrency rules as `doAction`.
> **Rule of thumb:** if multiple workers could touch it, it lives in `pData` and must be protected.
## Notes on current code
* `ommysql`: a global RW-lock was introduced historically; it will be refactored soon.
Consider it a historical note, not a pattern to copy.
* `omazureeventhubs`: ensure all Proton callbacks take the read lock and worker paths
that mutate handles take the write lock. A follow-up patch will address this.
## Coding style
* Commit messages: US-ASCII only, subject ≤ 65 chars (aim for 62); see `COMMENTING_STYLE.md` and `CONTRIBUTING.md`.
## Safe starter tasks for agents
* Add a **“Concurrency & Locking”** comment block at the top of output modules.
* Ensure inherently serial modules guard pData streams with a mutex.
* Verify modules with external threads hold the documented read/write locks on **all** callback paths (TSAN already runs in CI).
## Pointers
* Developer docs: `doc/` (Sphinx).
* Tests: `tests/`.
* Module sources: `plugins/` and `contrib/`.
* AI module map: `doc/ai/module_map.yaml` (per-module paths & locking hints).
---
_This file is intentionally brief and normative. Longer background belongs in Sphinx._


@ -0,0 +1,25 @@
# Module author checklist (v8)
**State placement**
- [ ] Config/static/shared data in **pData** (per-action).
- [ ] Live handles/buffers in **wrkrInstanceData_t** (per-worker).
- [ ] No sharing of WID across workers.
**Serialization**
- [ ] Inherently serial resources (e.g., a shared stream) guarded by a **mutex in pData**.
- [ ] If any library thread or callback touches shared state, define a **pthread_rwlock_t**
and document who reads/writes.
**Entry points**
- [ ] `createWrkrInstance`/`freeWrkrInstance` allocate/free WID only.
- [ ] `doAction`/tx callbacks may run concurrently across workers; they **never** share WID.
**Docs in code**
- [ ] Top-of-file “Concurrency & Locking” block explains the above for this module.
- [ ] Doxygen comments on pData/WID typedefs describe lifetime & locking rules.
**Testing**
- [ ] Run with `queue.workerThreads > 1`. CI already runs TSAN on the full suite.
**Style**
- [ ] Commit subject ≤ 65 chars; aim for 62. ASCII only. See `CONTRIBUTING.md`.
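
For the "Docs in code" items, one possible shape of the top-of-file block and the Doxygen comments is sketched below. The wording and the names `doc_action_data_t`, `doc_wrkr_data_t`, `mut_out` and `doc_reserve` are invented for illustration; a real module would describe its own members and threads.

```c
/*
 * Concurrency & Locking
 *
 * - pData (doc_action_data_t) is shared by all workers of one action.
 *   out_offset must only be touched with mut_out held.
 * - WID (doc_wrkr_data_t) is owned by exactly one worker; no locks.
 * - No library or background threads touch this module's state.
 */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/**
 * Per-action data (hypothetical example).
 * Lifetime: created when the action is configured, freed at shutdown.
 * Locking: mut_out guards out_offset.
 */
typedef struct {
    pthread_mutex_t mut_out;
    size_t out_offset;
} doc_action_data_t;

/**
 * Per-worker data (hypothetical example).
 * Lifetime: one instance per worker thread.
 * Locking: none; never shared across workers.
 */
typedef struct {
    doc_action_data_t *pData;
    char buf[256];
} doc_wrkr_data_t;

/* Advances the shared offset under the documented lock; returns the old offset. */
size_t doc_reserve(doc_wrkr_data_t *wid, size_t len)
{
    assert(wid != NULL && wid->pData != NULL);
    pthread_mutex_lock(&wid->pData->mut_out);
    size_t start = wid->pData->out_offset;
    wid->pData->out_offset += len;
    pthread_mutex_unlock(&wid->pData->mut_out);
    return start;
}
```

Keeping the lock name in both the header block and the typedef comment lets a reviewer check each access to `out_offset` against a single stated rule.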

doc/ai/module_map.yaml Normal file

@ -0,0 +1,34 @@
# Seed map for AI agents; keep minimal and truthful. Extend over time.
omfile:
paths: ["plugins/omfile/"]
requires_serialization: true
locks:
- type: mutex
field: pData.mutWrite
omfwd:
paths: ["plugins/omfwd/"]
requires_serialization: false
omelasticsearch:
paths: ["plugins/omelasticsearch/"]
requires_serialization: false
omazureeventhubs:
paths: ["plugins/omazureeventhubs/"]
requires_serialization: true
locks:
- type: rwlock
field: pData.pnLock
writer: rsyslog_worker
reader: proton_thread
contrib_omhttp:
paths: ["contrib/omhttp/"]
requires_serialization: false
contrib_mmkubernetes:
paths: ["contrib/mmkubernetes/"]
requires_serialization: false
notes:
- WID is per-thread; shared caches in pData must be guarded.