This improves code readability and maintainability by correcting
typos in comments. While non-functional, clear comments reduce
confusion for contributors and support long-term project quality.
Before: comments contained minor typos and inconsistencies.
After: comments use corrected spelling and clearer wording.
Impact: none (no functional or behavioral changes).
The patch updates comment text only, without modifying logic,
interfaces, or runtime behavior. No changes to queue semantics,
transactions, or module interactions are introduced.
This aligns with ongoing maintenance efforts to keep the codebase
clean and easier to understand for contributors and reviewers.
Fixes: https://github.com/rsyslog/rsyslog/issues/6023
AI-Agent: Copilot 2026-03
Why:
This branch combines two related hardening steps for disk queue reliability:
- robust corruption detection/recovery handling in disk queue state/file validation
- worker startup cancellation-race closure that could lead to shutdown wait loops
Impact:
- disk queue scan now rejects out-of-range segment sequence numbers early and
reports corruption deterministically.
- worker startup no longer exposes a cancellation window before cleanup
registration.
- test/CI diagnostics preserve timeout backtraces (gdb) in ARM jobs and print
them to stdout for post-mortem debugging.
- test script cleanup removes redundant operations and uses a macOS-friendlier
segment enumeration path.
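The early rejection of out-of-range segment sequence numbers can be pictured with a minimal sketch. The helper name, the inclusive [lo, hi] range and the return convention are assumptions made for illustration; they are not taken from runtime/queue.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the actual runtime/queue.c code.
 * Returns the index of the first sequence number outside the
 * inclusive range [lo, hi], or -1 if every entry is in range. */
static int first_out_of_range(const uint64_t *seqs, size_t n,
                              uint64_t lo, uint64_t hi)
{
    assert(seqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (seqs[i] < lo || seqs[i] > hi) {
            return (int)i; /* stop at the first offender */
        }
    }
    return -1;
}
```

A caller would treat any result other than -1 as corruption and report it right away, instead of relying only on a later range check.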
Technical Overview:
- runtime/queue.c:
- add out-of-range sequence-number rejection during spool scan
- keep orphan-loop range check as defensive fallback
- runtime/wtp.c:
- disable cancellation and register cleanup before publishing RUNNING
- document startup/cancellation invariant inline
- runtime/wti.c:
- add concise cancellation-contract comment
- devtools/ci/Dockerfile.arm:
- install gdb for CI timeout diagnostics
- tests/diskqueue-oncorruption-missing-segment.sh:
- emit timeout gdb backtraces to stdout
- drop redundant STARTED_LOG truncate
- avoid GNU find -printf/mapfile dependency in segment listing
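The worker startup ordering described for runtime/wtp.c (disable cancellation, register cleanup, then publish RUNNING) can be sketched with plain POSIX threads. The struct, the state constants and the step that restores the previous cancel state are assumptions for illustration, not rsyslog's actual code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

enum { WRK_STOPPED = 0, WRK_RUNNING = 1 };

/* Hypothetical worker record; rsyslog's own structures differ. */
struct worker {
    atomic_int state;
    atomic_int cleanup_runs;
};

static void worker_cleanup(void *arg)
{
    struct worker *w = arg;
    atomic_fetch_add(&w->cleanup_runs, 1);
    atomic_store(&w->state, WRK_STOPPED);
}

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    int old_state;

    assert(w != NULL);
    /* 1. no cancellation until the cleanup handler is registered */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    pthread_cleanup_push(worker_cleanup, w);
    /* 2. only now let other threads see the worker as running */
    atomic_store(&w->state, WRK_RUNNING);
    /* 3. assumption: the previous cancel state is restored afterwards */
    pthread_setcancelstate(old_state, NULL);
    pthread_testcancel();   /* stands in for the real work loop */
    pthread_cleanup_pop(1); /* a normal exit also runs the handler */
    return NULL;
}
```

With this ordering, a thread that observes RUNNING and then cancels the worker can rely on the cleanup handler being in place.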
Revert the problematic condition added in commit 4748c5746 that
activated the DA worker pool when disk queue (pqDA) has data.
Root Cause:
The DA worker pool (pWtpDA, ConsumerDA function) moves data FROM
the in-memory parent queue TO the disk queue. When activated with
an empty parent queue, it immediately terminates (parent below low
watermark), but the condition remains true, causing an infinite
start/stop loop.
Why the original logic was incorrect:
The commit misunderstood the queue architecture. It tried to solve
slow disk queue draining by activating the DA worker pool, but:
- DA worker pool: Moves memory → disk (for spillover)
- Disk queue workers: Process disk → actions (automatic on load)
When rsyslog restarts with persisted disk queue data:
1. pqDA (disk queue) is loaded from files
2. pqDA's own regular workers start automatically via qqueueStart()
3. Those workers process messages from disk
4. No DA worker pool activation needed!
Test Results:
- With buggy code: 372 DA worker starts, test unstable
- With revert: 2 DA worker starts (normal), 19/20 test passes
- The 1/20 failure is pre-existing test flakiness
The original issue #2646 likely had a different root cause that
needs separate investigation. This revert prevents the regression
while restoring system stability.
Fixes regression in test: daqueue-drain-without-traffic.sh
Relates to: issue #2646, commit 4748c5746
Why:
Disk-assisted queues were taking days to drain after recovery
because the DA worker only activated when the in-memory queue
reached the high watermark, creating a catch-22 when starting
with an empty memory queue but full disk queue.
Impact:
This fix enables proper recovery from backlogs and prevents data
loss from queues that cannot drain. Existing behavior for normal
operations is preserved.
Before:
DA worker only started when: memQueueSize >= highWatermark
After:
DA worker starts when: memQueueSize >= highWatermark OR
diskQueueSize > 0
Technical Overview:
Modified qqueueAdviseMaxWorkers() in runtime/queue.c to check
both the memory queue size against the high watermark (original
condition) and whether the disk queue (pqDA) has pending messages.
This ensures the DA worker activates whenever there is data on
disk to process, not just when new incoming traffic fills the
memory queue. The NULL check for pqDA prevents dereferencing
before the DA queue is initialized. This change maintains the
original high-watermark behavior while adding the recovery path.
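The before/after activation conditions, including the NULL check, can be written as two predicates. This is a minimal sketch: the struct layout and function names are hypothetical stand-ins, and only the name pqDA comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real queue objects. */
struct disk_queue {
    int size;
};

struct mem_queue {
    int size;
    int high_watermark;
    struct disk_queue *pqDA; /* NULL until the DA queue is initialized */
};

/* Before: only the high watermark triggers the DA worker. */
static bool da_needed_before(const struct mem_queue *q)
{
    assert(q != NULL);
    return q->size >= q->high_watermark;
}

/* After: pending data on disk triggers it as well. */
static bool da_needed_after(const struct mem_queue *q)
{
    assert(q != NULL);
    return q->size >= q->high_watermark
        || (q->pqDA != NULL && q->pqDA->size > 0);
}
```

With an empty memory queue and a non-empty disk queue, the second predicate stays true for as long as the disk queue holds data.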
closes https://github.com/rsyslog/rsyslog/issues/2646
With the help of AI-Agents: GitHub Copilot
Hardens disk-queue recovery after an invalid .qi so read/write pointers
realign and on-disk size is corrected. This prevents stuck queues and
stabilizes the daqueue dirty shutdown test.
Bug Fixes
- On anomaly (rd==wr and offsets equal), seek the read-delete cursor to
the writer, subtract deleted bytes from sizeOnDisk, and align the
read-dequeue cursor; keep draining if seek fails.
- Log errors when pointer resets or seeks fail.
- Add strm.Sync() to keep stream state consistent after pointer updates.
- Refactor invalid .qi recovery and startup seek errors into helpers.
- When spool read files are missing on startup, align read to write and
continue recovery.
With the help of AI-Agents: gpt-5.2-codex
Replace opaque/variadic callback usage with explicit, type-safe function
signatures to reduce undefined behavior and clarify intent.
Adapter helpers bridge the existing APIs without raw variadic casts, enabling
the transition incrementally. Callback setup sites are standardized for
consistent readability. This tightens the contract on callbacks, eases future
refactoring, and makes their roles more self-documenting.
Inspired by https://github.com/rsyslog/rsyslog/pull/5882
With AI support: Codex, Gemini
This commit applies the new canonical formatting style using `clang-format` with custom settings (notably 4-space indentation), as part of our shift toward automated formatting normalization.
⚠️ No functional changes are included — only whitespace and layout modifications as produced by `clang-format`.
This change is part of the formatting modernization strategy discussed in:
https://github.com/rsyslog/rsyslog/issues/5747
Key context:
- Formatting is now treated as a disposable view, normalized via tooling.
- The `.clang-format` file defines the canonical style.
- A fixup script (`devtools/format-code.sh`) handles remaining edge cases.
- Formatting commits are added to `.git-blame-ignore-revs` to reduce noise.
- Developers remain free to format code however they prefer locally.
* Fix issue with queue.maxDiskSpace validation
When queue.maxDiskSpace is set smaller than queue.maxfilesize, rsyslog
could enter an infinite loop during shutdown. This fix adds validation
to ensure maxDiskSpace is at least as large as maxFileSize.
If an invalid configuration is detected, the system will log a warning
and automatically adjust maxDiskSpace to match maxFileSize to prevent
the shutdown loop.
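The validation can be sketched roughly as follows. The helper name and message text are hypothetical, and treating 0 as "no limit" is an assumption that should be checked against the actual parameter semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper; returns true if the value was adjusted.
 * Assumption: 0 means "no limit" and is left untouched. */
static bool fix_max_disk_space(int64_t *max_disk_space, int64_t max_file_size)
{
    assert(max_disk_space != NULL);
    if (*max_disk_space > 0 && *max_disk_space < max_file_size) {
        fprintf(stderr,
                "warning: queue.maxDiskSpace (%lld) is smaller than "
                "queue.maxFileSize (%lld), using %lld instead\n",
                (long long)*max_disk_space, (long long)max_file_size,
                (long long)max_file_size);
        *max_disk_space = max_file_size;
        return true;
    }
    return false;
}
```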
closes: https://github.com/rsyslog/rsyslog/issues/2693
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
This commit performs a broad modernization of widely used rsyslog
macros to align with modern C practices and support automated
formatting tools like clang-format. The changes focus on improving
syntactic regularity, readability, and tooling compatibility — without
altering behavior.
Macros refactored in this commit now follow a consistent,
statement-like form with explicit trailing semicolons. Where
applicable, macro blocks that define module interfaces (`queryEtryPt`)
have been updated to use simple `if` statements instead of `else if`
chains. While this slightly increases evaluation time, the affected
functions are only called once per module during load time to register
supported interfaces — making the performance cost irrelevant in
practice.
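To illustrate the statement-like form and the independent `if` checks, here is a minimal sketch. `CHECK_ENTRY`, `query_entry` and the two entry functions are hypothetical names; they are not the real rsyslog macros or their expansion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*entry_fn)(void);

static void do_action(void) {}
static void do_hup(void) {}

/* Statement-like macro: expands to a single statement, and the caller
 * supplies the trailing semicolon. Each use is an independent `if`,
 * not a link in an `else if` chain. */
#define CHECK_ENTRY(wanted, fn)            \
    do {                                   \
        if (strcmp(name, (wanted)) == 0) { \
            *out = (fn);                   \
        }                                  \
    } while (0)

static int query_entry(const char *name, entry_fn *out)
{
    assert(name != NULL && out != NULL);
    *out = NULL;
    CHECK_ENTRY("doAction", do_action);
    CHECK_ENTRY("doHUP", do_hup);
    return (*out != NULL) ? 0 : -1;
}
```

Every check runs even after a match, which is the small extra evaluation cost mentioned above.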
These improvements serve multiple purposes:
- Enable reliable clang-format usage without mangling macro logic
- Simplify reasoning about macro-expanded code for human readers
- Reduce style drift and merge conflicts
- Facilitate development for contributors using assistive tools
- Support future formatting pipelines using:
1. `clang-format`
2. a post-fixup normalization script
Refactored macros:
- MODULE_TYPE_NOKEEP
- MODULE_TYPE_KEEP
- MODULE_TYPE_INPUT
- MODULE_TYPE_OUTPUT
- MODULE_TYPE_FUNCTION
- MODULE_TYPE_PARSER
- MODULE_TYPE_LIB
- DEF_IMOD_STATIC_DATA
- DEF_OMOD_STATIC_DATA
- DEF_PMOD_STATIC_DATA
- DEF_FMOD_STATIC_DATA
- DEFobjStaticHelpers
- SIMP_PROP(...)
And all `queryEtryPt()` dispatch macros:
- CODEqueryEtryPt_STD_MOD_QUERIES
- CODEqueryEtryPt_STD_OMOD_QUERIES
- CODEqueryEtryPt_STD_OMODTX_QUERIES
- CODEqueryEtryPt_STD_OMOD8_QUERIES
- CODEqueryEtryPt_TXIF_OMOD_QUERIES
- CODEqueryEtryPt_IsCompatibleWithFeature_IF_OMOD_QUERIES
- CODEqueryEtryPt_STD_IMOD_QUERIES
- CODEqueryEtryPt_STD_CONF2_QUERIES
- CODEqueryEtryPt_STD_CONF2_setModCnf_QUERIES
- CODEqueryEtryPt_STD_CONF2_OMOD_QUERIES
- CODEqueryEtryPt_STD_CONF2_IMOD_QUERIES
- CODEqueryEtryPt_STD_CONF2_PREPRIVDROP_QUERIES
- CODEqueryEtryPt_STD_CONF2_CNFNAME_QUERIES
- CODEqueryEtryPt_STD_PMOD_QUERIES
- CODEqueryEtryPt_STD_PMOD2_QUERIES
- CODEqueryEtryPt_STD_FMOD_QUERIES
- CODEqueryEtryPt_STD_SMOD_QUERIES
- CODEqueryEtryPt_doHUPWrkr
- CODEqueryEtryPt_doHUP
This general modernization reduces macro misuse, improves DX, and
lays the foundation for a robust, automated style normalization
system.
See also: https://github.com/rsyslog/rsyslog/issues/5747
among others, remove some warning suppressions by "fixing" the
respective constructs with work-arounds (root cause is compilers
do not handle enums in switch well).
Note: The upcoming gnu23 C standard is overdoing it with type-safety. Inside
rsyslog, we historically have method tables for generic calls, which
keeps the code small and easy to understand. This would not decently be
possible with the new type-safety requirements.
So this commit works around these warnings in a way that pretends to
provide more type safety. We have done this in the least intrusive
way to reduce the risk of regressions in code that has worked well
for decades. Also note that the code already does parameter
validation.
There would have been more elaborate ways to make gnu23 compilation happy,
e.g. by using a union of structs to provide the data element. Some folks
consider this type safe. In reality, it is not a bit better than
traditional C without types at all, because the caller still needs to
ensure it picks the right struct from the union. As this approach
would also have larger regression potential, we have not used it.
Right now, we have suppressed some of the new warnings, as working
around them would have required an even larger time budget and
potentially larger regression potential. In the long term we may
want to look into enabling them, as they would potentially be
beneficial for new code not involving method tables.
Some nits, however, were detected and have been fixed.
This patch also "fixes" some false positive test failures, mostly
by disabling some test functionality after confirming these are
flakes.
see also https://github.com/rsyslog/rsyslog/issues/5507
When switching to Disk queue emergency mode, we destructed the in-memory
queue object. Practice has shown that this MAY cause races during
destruction which themselves can lead to a segfault. For that reason, we
now keep the disk queue object. This will keep some resources,
including disk space, allocated. But we prefer that over a segfault.
After all, it only happens after a serious queue error when we are
already at the edge of hard problems.
see also: https://github.com/rsyslog/rsyslog/issues/4963
Add NULL value handling for pDeqRoot. The missing handling caused
segfaults if messages were discarded during dequeue.
Also fix iOverallQueueSize calculation (discarded items) in imdiag.
While building a testcase for issue #4437, I discovered an issue with the
iOverallQueueSize counter not subtracting discarded messages. This caused
the testcase to fail with a testcase timeout at the count of the
"discardMark" queue setting.
closes: https://github.com/rsyslog/rsyslog/issues/4437
Direct queues do not apply queue parameters because they are not
actually a physical queue. As such, any parameter set is ignored. This
can lead to unintended results.
The new code detects this case and warns the user.
closes https://github.com/rsyslog/rsyslog/issues/77
This is a fine-tuning option which controls whether or not
rsyslog shall always take the flow control setting from the message. If
so, non-primary queues may also block when reaching the high water mark.
This permits adding some synchronous processing to the rsyslog core
engine. However, it is dangerous, as improper use may make the core
engine stall. As such, enabling this option requires very careful planning
of the rsyslog configuration and a deep understanding of the consequences.
Note that the option is applied to individual queues, so a configuration
with a large number of queues can (and must, if used) be fine-tuned to
the exact use case.
The rsyslog team strongly recommends leaving the option turned off,
which is the default setting.
see also https://github.com/rsyslog/rsyslog/issues/3941
This was added for a specific debug effort and evidently never
cleaned up. This issue is not included in any scheduled release. It
was added a few days ago (we did not try to hunt down the exact
commit that caused it).
Thanks to github user eshadesu for alerting us.
closes https://github.com/rsyslog/rsyslog/issues/3955
This was a long-standing bug where the DA queue always had a fixed small batch
size because the setting was not propagated from the memory queue. This also
removes a needless and counter-productive "debug aid" which seems to have been
in the code for quite a while. It did not cause harm because of the batch
size issue.
The action name is stored in modified form for the debug header and
some messages. If it is extremely long, a buffer can be overrun,
resulting in misaddressing and a potential segfault for rsyslog. This
can also happen if the action is NOT named, but a custom path to
the output module is given and that path is very long. This triggers
the same issue because by default the module load path is included
in the action name.
This patch corrects the problem and truncates overly long names
when they are used for name generation.
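Truncating instead of overrunning can be sketched with `snprintf`, which never writes more than the given buffer size and always NUL-terminates when the size is non-zero. The helper name is hypothetical; the actual patch may use a different mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: copies name into dst, truncating if needed.
 * Returns true if the name did not fit and was cut short. */
static bool copy_name_truncated(char *dst, size_t dstlen, const char *name)
{
    assert(dst != NULL && dstlen > 0 && name != NULL);
    int needed = snprintf(dst, dstlen, "%s", name);
    return needed < 0 || (size_t)needed >= dstlen;
}
```

The return value lets a caller log that truncation happened.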
The problem was detected during testbench work. We never received
a bug report from practice.
The queue subsystem now provides additional information messages which
may help a regular user to maintain system health. Most importantly,
DA queues now output when they persist queue data at end of run and
when they restart the queue based on persisted data.
If the same name is specified for multiple queues, the queue files
will become corrupted. This commit adds a check during config parsing.
If duplicate names are detected the config parser errors out and the
related object is not created.
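A duplicate-name check of this kind can be sketched as a small registry that is consulted before a queue object is created. The registry type, its fixed capacity and the function name are assumptions for illustration, not the config parser's real data structures.

```c
#include <assert.h>
#include <string.h>

#define MAX_QUEUE_NAMES 64

/* Hypothetical registry of queue names seen so far. */
struct queue_names {
    const char *names[MAX_QUEUE_NAMES];
    int count;
};

/* Returns 0 if the name was registered, -1 if it is a duplicate
 * (or the registry is full). On -1 the caller would error out and
 * not create the related object. */
static int register_queue_name(struct queue_names *reg, const char *name)
{
    assert(reg != NULL && name != NULL);
    for (int i = 0; i < reg->count; i++) {
        if (strcmp(reg->names[i], name) == 0) {
            return -1;
        }
    }
    if (reg->count >= MAX_QUEUE_NAMES) {
        return -1;
    }
    reg->names[reg->count++] = name;
    return 0;
}
```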
Note: this may look like a change of behaviour to some users. However,
this never worked and it was pure luck that these users did not run
into big problems (e.g. DA queues were never going to disk at the
same time). So it is acceptable to error out in this hard error case.
closes https://github.com/rsyslog/rsyslog/issues/1385
New semantic: if lightDelayMark is 0, it is set to the max queue
size, effectively disabling the "light delay" functionality.
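The new semantic amounts to a one-line normalization. The function name below is hypothetical, and plain `int` is used only to keep the sketch short.

```c
#include <assert.h>

/* Hypothetical helper: a lightDelayMark of 0 is replaced by the
 * maximum queue size, so the mark can never be reached early. */
static int effective_light_delay_mark(int light_delay_mark, int max_queue_size)
{
    assert(light_delay_mark >= 0 && max_queue_size > 0);
    return (light_delay_mark == 0) ? max_queue_size : light_delay_mark;
}
```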
Thanks to Yury Bushmelev for mentioning issues related to the light
delay mark and proposing the solution (which actually is what
this commit does).
closes https://github.com/rsyslog/rsyslog/issues/1778
We see error reports from users who have configured excessively large queues
and receive an OOM condition or other problems.
With this patch we generate a warning message if a queue is configured very
large. "Very large" is defined to be in excess of 500000 messages.
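As a minimal sketch, the threshold check could look like this. The constant and function names are hypothetical; only the value 500000 and the "in excess of" comparison come from the text above.

```c
#include <assert.h>
#include <stdbool.h>

#define VERY_LARGE_QUEUE_SIZE 500000L

/* Hypothetical helper: true if the configured size should trigger
 * the warning, i.e. it is strictly greater than the threshold. */
static bool queue_size_is_very_large(long configured_size)
{
    assert(configured_size >= 0);
    return configured_size > VERY_LARGE_QUEUE_SIZE;
}
```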
see also https://github.com/rsyslog/rsyslog/issues/3314
closes https://github.com/rsyslog/rsyslog/issues/3334
While this is useful for users as well, we have done it so
that we can handle slow CI systems during CI runs. It is also
required for massively parallel testing, which makes each
individual test rather slow.
With the new settings, the testbench framework can now set
longer timeouts by default. Also updated the framework accordingly.
The gtls and ossl drivers used a default buffer size of 8 x 1024 bytes to
store received TLS packets. When a TLS read returned more than the buffer
size, the additional data was not processed until new data arrived on the
socket again.
The TLS RFCs require a buffer of up to 16KB for a single TLS record.
closes https://github.com/rsyslog/rsyslog/issues/3325
Due to some old regression (commit not exactly identified, but for
sure a regression; 9 years ago it was correct), an error message
is emitted when no .qi file exists on startup of the queue, which
is a normal condition.
Actually, the code should not have tried to open the .qi file in
the first place because it detected that it did not exist. That
(necessary) shortcut had been removed a while ago.
closes https://github.com/rsyslog/rsyslog/issues/3117
This prevents new workers from being spawned when the system
is in shutdown state, except when needed to persist queue
data to disk. In all other cases, re-spawning workers brings
the system to an unstable state and is not desired. We think
this happens when very late messages are newly generated
(e.g. a status message) or received.
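The rule can be summarized as a single predicate. This is a minimal sketch with a hypothetical state struct; the real code derives these conditions from the queue and worker pool state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the relevant queue state. */
struct queue_state {
    bool shutting_down;      /* system is in shutdown state */
    bool persisting_to_disk; /* workers are needed to save queue data */
};

/* New workers are allowed in normal operation, and during shutdown
 * only when queue data still has to be persisted to disk. */
static bool may_spawn_worker(const struct queue_state *q)
{
    assert(q != NULL);
    return !q->shutting_down || q->persisting_to_disk;
}
```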