doc(yamlconf): add architecture doc, expand RainerScript relation

Why:
YAML configuration architecture was undocumented at the developer
and AI-agent level. The nvlst/cnfobj translation pipeline existed
only in C file headers and a one-liner in runtime/AGENTS.md, making
it invisible to contributors reading doc/ and to AI ingestion tools.
The first-move rationale and the deliberate confinement strategy had
no canonical home.

Technical Overview:
- Add doc/source/development/yaml_config_architecture.rst: new
  developer concept page explaining the full pipeline from libyaml
  event stream through nvlst chains, cnfobj construction, cnfDoObj()
  dispatch, and cnfAddConfigBuffer() ruleset injection. Includes a
  Mermaid flowchart with colour-coded layers (input/parse/IR/core).
  States that the translation approach keeps the change surface
  minimal and may be revisited only with concrete justification.
- Expand yaml_config.rst 'Relationship to RainerScript' section:
  names nvlst/cnfobj/cnfDoObj() explicitly, explains there is no
  independent YAML runtime, and cross-links to the new arch doc.
- Update doc/ai/terminology.md: add YAML Configuration section with
  canonical definitions for nvlst, cnfobj, cnfDoObj(),
  cnfAddConfigBuffer(), and the YAML front-end concept for AI RAG
  ingestion.

All pages pass authoring policy checks (summary <=200 chars,
description <=160 chars, page <=800 words, Mermaid rules).

With the help of AI-Agents: GitHub Copilot (claude-sonnet-4.6)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rainer Gerhards 2026-03-28 10:23:45 +01:00
parent e224ca3a19
commit f8321aadbb
3 changed files with 168 additions and 2 deletions


@ -7,3 +7,24 @@
- **Queue** — buffering between stages.
Avoid "data pipeline" unless context demands.
## YAML Configuration
- **YAML front-end** — the translation layer in `runtime/yamlconf.c` that converts
`.yaml` config files into rsyslog's internal structures. It has no independent
runtime path; all execution goes through the shared RainerScript back-end.
- **nvlst** — ordered linked list of `(name, value)` pairs; rsyslog's canonical
intermediate representation for configuration parameters. Both the RainerScript
grammar and the YAML parser produce `nvlst` chains; `nvlstGetParams()` consumes
them in a format-agnostic way.
- **cnfobj** — a typed wrapper around an `nvlst` chain. The type tag
(`CNFOBJ_GLOBAL`, `CNFOBJ_ACTION`, `CNFOBJ_INPUT`, …) tells `cnfDoObj()` which
module handler to invoke.
- **cnfDoObj()** — the central dispatcher in `rsconf.c` that receives a `cnfobj`
and calls the appropriate subsystem initialisation function. Both RainerScript
and the YAML front-end target this function.
- **cnfAddConfigBuffer()** — pushes a RainerScript text snippet onto the lex buffer
stack so the running `yyparse()` processes it. The YAML front-end uses this to
handle ruleset `script:` / `statements:` bodies without a separate interpreter.
For the full pipeline diagram see `doc/source/development/yaml_config_architecture.rst`.
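
The buffer-stack behaviour described for `cnfAddConfigBuffer()` can be pictured with a toy model. This is a Python sketch of the idea only, not rsyslog's C implementation: `BufferStack`, `push()`, `next_token()` and `drain()` are hypothetical names, and whitespace splitting stands in for real lexing.

```python
class BufferStack:
    """Toy model of a lexer input stack (hypothetical, illustration only)."""

    def __init__(self):
        self._stack = []  # last element = buffer the lexer reads next

    def push(self, text):
        # Models the role of cnfAddConfigBuffer(): queue a text snippet
        # so that the already-running parser consumes it next.
        self._stack.append(text.split())

    def next_token(self):
        # Discard exhausted buffers, then read from the most recent one.
        while self._stack and not self._stack[-1]:
            self._stack.pop()
        if not self._stack:
            return None  # all input consumed
        return self._stack[-1].pop(0)


def drain(stack):
    """Collect every remaining token in the order the parser would see it."""
    tokens = []
    while (tok := stack.next_token()) is not None:
        tokens.append(tok)
    return tokens
```

In this model the most recently pushed snippet is consumed first, after which reading falls back to the buffer underneath. That fallback is what lets a single parser instance handle injected text without a second interpreter.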


@ -623,8 +623,20 @@ a typical ``/etc/rsyslog.conf``:
Relationship to RainerScript
-----------------------------
YAML configuration is a thin front-end over the same internal machinery
that RainerScript uses. In particular:
YAML configuration is a thin translation front-end over the same internal
machinery that RainerScript uses. The YAML parser (``runtime/yamlconf.c``)
converts each YAML block into ``cnfobj`` + ``nvlst`` structures — the identical
intermediate representation that the RainerScript lex/bison grammar produces —
and hands them to the shared ``cnfDoObj()`` dispatcher. There is no independent
YAML runtime; the shared back-end handles all validation, module initialisation,
and execution.
This approach deliberately minimises the change surface: the YAML-specific code
is confined to one file with no runtime presence after configuration loading.
It may be refactored in the future, but only when concrete requirements justify
the additional maintenance surface.
In practical terms:
- Parameter names are identical; all per-module parameter documentation
applies without change.
@ -635,6 +647,9 @@ that RainerScript uses. In particular:
- Template, ruleset, and lookup-table names are shared; a YAML-defined
ruleset can be referenced from a RainerScript ``action()``.
For a detailed description of the pipeline see
:doc:`../development/yaml_config_architecture`.
Limitations (current implementation)
--------------------------------------


@ -0,0 +1,130 @@
.. _dev_yaml_config_architecture:
YAML Configuration Architecture
================================
.. meta::
   :description: Developer guide to rsyslog YAML config architecture: translation layer, nvlst intermediate representation, and RainerScript back-end sharing.
   :keywords: yaml, configuration, architecture, nvlst, rainerscript, translation layer, cnfobj, cnfDoObj, yamlconf

.. summary-start
YAML config is a thin translation front-end: yamlconf.c converts .yaml files into the same nvlst/cnfobj structures RainerScript produces, feeding them to the shared cnfDoObj() back-end.
.. summary-end
Overview
--------
rsyslog's YAML support is a first step toward a structured, operator-friendly
configuration format. The guiding principle was to **minimise the change surface**:
rather than building a parallel engine, ``runtime/yamlconf.c`` translates each YAML
block into the same intermediate representation (``cnfobj`` + ``nvlst``) that the
lex/bison RainerScript grammar already produces.
This confinement strategy means:
- Parameter validation, type coercion, and error reporting reuse the existing
``nvlstGetParams()`` layer — no duplication.
- Every module that works in RainerScript works identically in YAML.
- Defects in the translation layer are less likely to silently corrupt the shared
  back-end, because its output passes through the same validation as RainerScript input.
- The YAML-specific code is a single, self-contained file with no runtime presence
after configuration loading completes.
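
The format-agnostic consumption described in the list above can be sketched as follows. This is a toy Python model under stated assumptions, not rsyslog's algorithm: `get_params()`, the `DESCRIPTORS` table and the parameter names `file` and `port` are hypothetical and only illustrate the role the text assigns to ``nvlstGetParams()``.

```python
# Hypothetical parameter descriptors: name -> coercion function.
DESCRIPTORS = {"file": str, "port": int}


def get_params(pairs, descriptors):
    """Validate and coerce (name, value) pairs.

    Loosely models the role of nvlstGetParams(); the consumer never learns
    which front-end produced the pairs.
    """
    values, errors = {}, []
    for name, raw in pairs:
        coerce = descriptors.get(name)
        if coerce is None:
            errors.append(f"unknown parameter '{name}'")
            continue
        try:
            values[name] = coerce(raw)
        except ValueError:
            errors.append(f"invalid value for '{name}': {raw!r}")
    return values, errors


# Identical pairs, as either front-end might produce them.
from_rainerscript = [("file", "/var/log/app.log"), ("port", "514")]
from_yaml = [("file", "/var/log/app.log"), ("port", "514")]
```

Because both lists are indistinguishable to ``get_params()``, validation and error reporting behave identically for either source, which is the point of sharing one consumer.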
This approach may be revisited if future requirements (richer error messages,
schema validation, live reload) justify a deeper integration. Any such refactoring
requires clear evidence of user benefit that outweighs the additional maintenance
surface.
Processing Pipeline
-------------------
.. mermaid::

   flowchart TD
     YF["YAML file<br>(.yaml / .yml)"]:::src
     LY["libyaml parser<br>(event stream)"]:::parse
     NV["nvlst chains<br>(name-value lists)"]:::ir
     CO["cnfobj<br>(typed config objects)"]:::ir
     CD["cnfDoObj()<br>(rsconf.c dispatcher)"]:::core
     MH["Module handlers<br>(global / input / action …)"]:::core
     SC["script: / statements:<br>content"]:::src
     CB["cnfAddConfigBuffer()<br>(RainerScript text)"]:::parse
     BP["lex / bison parser<br>(existing grammar)"]:::parse
     RE["Ruleset engine<br>(execution tree)"]:::core

     YF --> LY
     LY --> NV
     NV --> CO
     CO --> CD
     CD --> MH
     SC -->|"verbatim or<br>synthesised"| CB
     CB --> BP
     BP --> RE

     classDef src fill:#fff2cc,stroke:#d6b656;
     classDef parse fill:#dae8fc,stroke:#6c8ebf;
     classDef ir fill:#f8cecc,stroke:#b85450;
     classDef core fill:#d5e8d4,stroke:#82b366;

**Legend:** Yellow — input files/text. Blue — parsing stages.
Red — intermediate representation. Green — shared rsyslog core back-end.
Key Structures
--------------
**``nvlst``** (``runtime/conf.h``) is an ordered linked list of ``(name, value)``
pairs — rsyslog's canonical intermediate representation for configuration
parameters. Both the RainerScript grammar and ``yamlconf.c`` produce ``nvlst``
chains; ``nvlstGetParams()`` is format-agnostic.
**``cnfobj``** wraps an ``nvlst`` with a type tag (``CNFOBJ_GLOBAL``,
``CNFOBJ_ACTION``, ``CNFOBJ_INPUT``, etc.). One ``cnfobj`` is constructed per
top-level YAML block.
**``cnfDoObj()``** (``rsconf.c``) receives a ``cnfobj``, reads the type tag, and
calls the same module initialisation function that RainerScript would have called.
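
The relationship between the three structures above can be sketched in a few lines. This is a toy Python model, not the C definitions: the class and field names (`Nvlst`, `Cnfobj`, `obj_type`) and the helper functions are hypothetical, and string tags stand in for the ``CNFOBJ_*`` enum values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nvlst:
    """One node of an ordered (name, value) linked list (toy model)."""
    name: str
    value: str
    next: Optional["Nvlst"] = None


@dataclass
class Cnfobj:
    """Typed wrapper around an nvlst chain (toy model)."""
    obj_type: str            # stands in for CNFOBJ_GLOBAL, CNFOBJ_ACTION, ...
    nvlst: Optional[Nvlst]


def build_chain(pairs):
    """Build a chain that preserves the order of the given pairs."""
    head = None
    for name, value in reversed(pairs):
        head = Nvlst(name, value, head)
    return head


def chain_to_list(node):
    """Walk the chain from head to tail and collect its pairs."""
    out = []
    while node is not None:
        out.append((node.name, node.value))
        node = node.next
    return out


def cnf_do_obj(obj, handlers):
    """Dispatch on the type tag, as the text describes for cnfDoObj()."""
    handler = handlers.get(obj.obj_type)
    if handler is None:
        raise ValueError(f"no handler for object type {obj.obj_type}")
    return handler(chain_to_list(obj.nvlst))
```

The dispatcher only inspects the type tag and forwards the parameter chain, so it has no way to tell whether the object came from the grammar or from the YAML translation.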
Ruleset Script Handling
-----------------------
YAML offers three ways to express ruleset logic; all three end up in the same
RainerScript execution tree via ``cnfAddConfigBuffer()``:
- **``script:``** — raw RainerScript block, passed through verbatim.
- **``statements:``** — YAML-native ``if / set / unset / call / foreach`` maps;
``yamlconf.c`` synthesises them into a RainerScript string.
- **``filter:`` + ``actions:``** — one-level shortcut; synthesised into an
``if … then { … }`` fragment.
No separate interpreter exists. The lex/bison parser handles all ruleset logic.
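
As an illustration of the synthesis step for the ``filter:`` + ``actions:`` shortcut, the sketch below builds an ``if … then { … }`` fragment from already-parsed values. It is a Python approximation under stated assumptions, not the code in ``yamlconf.c``: both helper names are hypothetical, the output layout is a guess, and values are not escaped.

```python
def render_action(params):
    """Render one action as RainerScript text: action(name="value" ...)."""
    body = " ".join(f'{name}="{value}"' for name, value in params.items())
    return f"action({body})"


def synthesise_filter_block(filter_expr, actions):
    """Combine a filter expression and a list of action parameter dicts
    into one 'if ... then { ... }' fragment.

    Hypothetical helper: a real implementation would also have to escape
    quotes and backslashes inside parameter values.
    """
    rendered = "\n".join(f"    {render_action(a)}" for a in actions)
    return f"if {filter_expr} then {{\n{rendered}\n}}"


fragment = synthesise_filter_block(
    "$syslogseverity <= 3",
    [{"type": "omfile", "file": "/var/log/errors.log"}],
)
```

The resulting string is plain RainerScript text, which is why it can be handed to the existing parser instead of being executed by YAML-specific code.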
Parity and Maintenance
----------------------
Because both formats share the back-end, parity is largely automatic:
- **New global parameters** — no YAML change needed; ``nvlstGetParams()`` picks
them up automatically.
- **New top-level statement types** — requires a ``parse_*`` function in
``yamlconf.c`` and an entry in the dispatch table.
- **Renamed/removed parameters** — update ``yamlconf.c`` if it special-cases the
name, and update the user docs for both formats.
See ``runtime/AGENTS.md``: *"Any change to config objects, statement types,
template modifiers, or global parameters must be reflected here as well as in
``grammar/``."*
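
The difference between the first two maintenance cases can be made concrete with a toy dispatch table. This is a Python sketch with hypothetical names throughout (`DISPATCH`, `translate()`, the top-level keys `global`, `input` and `widget`, and the parameters used as sample data); it does not mirror the actual table in ``yamlconf.c``.

```python
def parse_global(body):
    # Parameters pass through untouched; the shared back-end validates them.
    return ("CNFOBJ_GLOBAL", list(body.items()))


def parse_input(body):
    return ("CNFOBJ_INPUT", list(body.items()))


# Hypothetical dispatch table: top-level key -> parse_* function.
DISPATCH = {"global": parse_global, "input": parse_input}


def translate(config, dispatch=DISPATCH):
    """Translate top-level blocks into (type tag, parameter list) tuples."""
    objs, errors = [], []
    for key, body in config.items():
        parser = dispatch.get(key)
        if parser is None:
            errors.append(f"unsupported top-level key '{key}'")
        else:
            objs.append(parser(body))
    return objs, errors
```

A parameter that the translation layer has never seen still flows through ``parse_global()`` unchanged, whereas an unknown top-level key is rejected until a matching ``parse_*`` function is registered in the table.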
See Also
--------
- :doc:`architecture` — rsyslog microkernel architecture overview
- :doc:`design_decisions` — libyaml library choice rationale
- :doc:`config_data_model` — rsyslog configuration object model
- :ref:`yaml_config` — user-facing YAML reference