376 Commits

Author SHA1 Message Date
Aditi Prakash
075bf3006a core: Fix typos in comments
This improves code readability and maintainability by correcting
typos in comments. While non-functional, clear comments reduce
confusion for contributors and support long-term project quality.

Before: comments contained minor typos and inconsistencies.
After: comments use corrected spelling and clearer wording.

Impact: none (no functional or behavioral changes).

The patch updates comment text only, without modifying logic,
interfaces, or runtime behavior. No changes to queue semantics,
transactions, or module interactions are introduced.

This aligns with ongoing maintenance efforts to keep the codebase
clean and easier to understand for contributors and reviewers.

Fixes: https://github.com/rsyslog/rsyslog/issues/6023

AI-Agent: Copilot 2026-03
2026-03-16 23:40:43 -07:00
Rainer Gerhards
b1ea93aed9
queue/runtime/tests: harden diskqueue recovery and shutdown race handling
Why:
This branch combines two related hardening steps for disk queue reliability:
- robust corruption detection/recovery handling in disk queue state/file validation
- worker startup cancellation-race closure that could lead to shutdown wait loops

Impact:
- disk queue scan now rejects out-of-range segment sequence numbers early and
  reports corruption deterministically.
- worker startup no longer exposes a cancellation window before cleanup
  registration.
- test/CI diagnostics preserve timeout backtraces (gdb) in ARM jobs and print
  them to stdout for post-mortem debugging.
- test script cleanup removes redundant operations and uses a macOS-friendlier
  segment enumeration path.

Technical Overview:
- runtime/queue.c:
  - add out-of-range sequence-number rejection during spool scan
  - keep orphan-loop range check as defensive fallback
- runtime/wtp.c:
  - disable cancellation and register cleanup before publishing RUNNING
  - document startup/cancellation invariant inline
- runtime/wti.c:
  - add concise cancellation-contract comment
- devtools/ci/Dockerfile.arm:
  - install gdb for CI timeout diagnostics
- tests/diskqueue-oncorruption-missing-segment.sh:
  - emit timeout gdb backtraces to stdout
  - drop redundant STARTED_LOG truncate
  - avoid GNU find -printf/mapfile dependency in segment listing
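
The early out-of-range rejection during spool scan can be pictured with a small sketch. This is not the code from runtime/queue.c: the helper name `parse_segment_seq`, the `name.NNNNNNNN` segment naming, and the caller-supplied bounds are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not rsyslog code: parse the numeric suffix of a
 * spool segment name such as "mainq.00000042" and accept it only if it
 * lies within [lo, hi]. Returns 0 and stores the number in *seq on
 * success; returns -1 if the suffix is missing, is not a decimal
 * number, or is out of range. */
int parse_segment_seq(const char *name, long lo, long hi, long *seq)
{
    const char *dot;
    char *end = NULL;
    long val;

    assert(name != NULL && seq != NULL);

    dot = strrchr(name, '.');
    if (dot == NULL || dot[1] == '\0')
        return -1;              /* no suffix at all */

    errno = 0;
    val = strtol(dot + 1, &end, 10);
    if (errno != 0 || end == dot + 1 || *end != '\0')
        return -1;              /* overflow or non-numeric, e.g. ".qi" */

    if (val < lo || val > hi)
        return -1;              /* out of range: treat as corruption */

    *seq = val;
    return 0;
}
```

Rejecting the name at scan time means a later loop over sequence numbers never has to iterate toward an implausible value, which is why the range check in the orphan loop can remain a defensive fallback only.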
2026-02-23 10:17:04 +01:00
Rainer Gerhards
6a636f2527
queue: fix DA worker infinite loop regression from commit 4748c5746
Revert the problematic condition added in commit 4748c5746 that
activated the DA worker pool when disk queue (pqDA) has data.

Root Cause:
The DA worker pool (pWtpDA, ConsumerDA function) moves data FROM
the in-memory parent queue TO the disk queue. When activated with
an empty parent queue, it immediately terminates (parent below low
watermark), but the condition remains true, causing an infinite
start/stop loop.

Why the original logic was incorrect:
The commit misunderstood the queue architecture. It tried to solve
slow disk queue draining by activating the DA worker pool, but:
- DA worker pool: Moves memory → disk (for spillover)
- Disk queue workers: Process disk → actions (automatic on load)

When rsyslog restarts with persisted disk queue data:
1. pqDA (disk queue) is loaded from files
2. pqDA's own regular workers start automatically via qqueueStart()
3. Those workers process messages from disk
4. No DA worker pool activation needed!

Test Results:
- With buggy code: 372 DA worker starts, test unstable
- With revert: 2 DA worker starts (normal), 19/20 test runs pass
- The 1/20 failure is pre-existing test flakiness

The original issue #2646 likely had a different root cause that
needs separate investigation. This revert prevents the regression
while restoring system stability.

Fixes regression in test: daqueue-drain-without-traffic.sh
Relates to: issue #2646, commit 4748c5746
2026-01-29 17:30:03 +01:00
Rainer Gerhards
4748c57462 queue: fix slow drain of disk-assisted queues
Why:
Disk-assisted queues were taking days to drain after recovery
because the DA worker only activated when the in-memory queue
reached the high watermark, creating a catch-22 when starting
with an empty memory queue but full disk queue.

Impact:
This fix enables proper recovery from backlogs and prevents data
loss from queues that cannot drain. Existing behavior for normal
operations is preserved.

Before:
DA worker only started when: memQueueSize >= highWatermark

After:
DA worker starts when: memQueueSize >= highWatermark OR
diskQueueSize > 0

Technical Overview:
Modified qqueueAdviseMaxWorkers() in runtime/queue.c to check
both the memory queue size against the high watermark (original
condition) and whether the disk queue (pqDA) has pending messages.
This ensures the DA worker activates whenever there is data on
disk to process, not just when new incoming traffic fills the
memory queue. The NULL check for pqDA prevents dereferencing
before the DA queue is initialized. This change maintains the
original high-watermark behavior while adding the recovery path.
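
As a rough sketch, the before/after conditions look like this. It is not the body of qqueueAdviseMaxWorkers(): `struct disk_queue` and both function names are hypothetical, and only the boolean logic stated in this message is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the disk queue object (pqDA). */
struct disk_queue {
    int size;   /* pending messages on disk */
};

/* Before: only the high-watermark test. */
bool da_should_start_before(int memQueueSize, int highWatermark)
{
    assert(highWatermark > 0);
    return memQueueSize >= highWatermark;
}

/* After: high watermark OR pending disk data. The NULL check comes
 * first so pqDA is never dereferenced before it is initialized. */
bool da_should_start_after(int memQueueSize, int highWatermark,
                           const struct disk_queue *pqDA)
{
    assert(highWatermark > 0);
    return memQueueSize >= highWatermark ||
           (pqDA != NULL && pqDA->size > 0);
}
```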

closes https://github.com/rsyslog/rsyslog/issues/2646

With the help of AI-Agents: GitHub Copilot
2026-01-29 10:07:08 +01:00
08027ecff5 queue: harden disk recovery after invalid .qi
Hardens disk-queue recovery after an invalid .qi so read/write pointers
realign and on-disk size is corrected. This prevents stuck queues and
stabilizes the daqueue dirty shutdown test.

Bug Fixes
- On anomaly (rd==wr and offsets equal), seek the read-delete cursor to
  the writer, subtract deleted bytes from sizeOnDisk, and align the
  read-dequeue cursor; keep draining if seek fails.
- Log errors when pointer resets or seeks fail.
- Add strm.Sync() to keep stream state consistent after pointer updates.
- Refactor invalid .qi recovery and startup seek errors into helpers.
- When spool read files are missing on startup, align read to write and
  continue recovery.

With the help of AI-Agents: gpt-5.2-codex
2026-01-16 11:37:14 +01:00
Rainer Gerhards
b715e92f3d
queue: re-implement queue size warnings
This restores the warnings for invalid queue.highWaterMark and
queue.lowWaterMark configurations which were reverted in a previous
commit due to regressions.

The logic has been improved to only emit warnings if the user
explicitly configured an invalid value (i.e., not the default -1).
Previously, default values (which are -1) were triggering the warnings
incorrectly.
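
A minimal sketch of the "warn only on explicit values" rule follows. The function name, the `WATERMARK_UNSET` constant and the caller-supplied valid range are assumptions for illustration; the actual validity rules are not spelled out in this message.

```c
#include <assert.h>
#include <stdbool.h>

#define WATERMARK_UNSET (-1)    /* default: the user configured nothing */

/* Hypothetical check: warn only if the user explicitly set a value
 * and that value lies outside the caller-supplied range [lo, hi]. */
bool should_warn_watermark(int configured, int lo, int hi)
{
    assert(lo <= hi);
    if (configured == WATERMARK_UNSET)
        return false;           /* defaults must never trigger a warning */
    return configured < lo || configured > hi;
}
```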

With the help of AI Agents: Google Jules

See also https://github.com/rsyslog/rsyslog/issues/5615
See also https://github.com/rsyslog/rsyslog/issues/5586
See also https://github.com/rsyslog/rsyslog/issues/5676

closes https://github.com/rsyslog/rsyslog/issues/5677
2026-01-05 14:08:17 +01:00
Rainer Gerhards
3830484fdb
queue: refactor batch deletion with explicit phases (#6013)
Introduce explicit phases for DeleteBatchFromQStore to
streamline logic and enforce deterministic dequeue IDs.

With the help of AI-Agent: ChatGPT
2025-08-28 18:23:31 +02:00
Rainer Gerhards
c8ab665951
core: migrate callback invocations to type-safe signatures
Replace opaque/variadic callback usage with explicit, type-safe function
signatures to reduce undefined behavior and clarify intent.
Adapter helpers bridge the existing APIs without raw variadic casts, enabling
the transition incrementally. Callback setup sites are standardized for
consistent readability. This tightens the contract on callbacks, eases future
refactoring, and makes their roles more self-documenting.

Inspired by https://github.com/rsyslog/rsyslog/pull/5882

With AI support: Codex, Gemini
2025-08-01 13:02:10 +02:00
Rainer Gerhards
b326c76f45 style: normalize C source formatting via clang-format (PoC)
This commit applies the new canonical formatting style using `clang-format` with custom settings (notably 4-space indentation), as part of our shift toward automated formatting normalization.

⚠️ No functional changes are included — only whitespace and layout modifications as produced by `clang-format`.

This change is part of the formatting modernization strategy discussed in:
https://github.com/rsyslog/rsyslog/issues/5747

Key context:
- Formatting is now treated as a disposable view, normalized via tooling.
- The `.clang-format` file defines the canonical style.
- A fixup script (`devtools/format-code.sh`) handles remaining edge cases.
- Formatting commits are added to `.git-blame-ignore-revs` to reduce noise.
- Developers remain free to format code however they prefer locally.
2025-07-16 13:56:21 +02:00
4a42252af8 Investigate and resolve rsyslog issue 2693 (#5793)
* Fix issue with queue.maxDiskSpace validation

When queue.maxDiskSpace is set smaller than queue.maxfilesize, rsyslog
could enter an infinite loop during shutdown. This fix adds validation
to ensure maxDiskSpace is at least as large as maxFileSize.

If an invalid configuration is detected, the system will log a warning
and automatically adjust maxDiskSpace to match maxFileSize to prevent
the shutdown loop.

closes: https://github.com/rsyslog/rsyslog/issues/2693

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-07-15 17:42:23 +02:00
Rainer Gerhards
7225999b77 refactor: modernize macro definitions to support formatting and clarity
This commit performs a broad modernization of widely used rsyslog
macros to align with modern C practices and support automated
formatting tools like clang-format. The changes focus on improving
syntactic regularity, readability, and tooling compatibility — without
altering behavior.

Macros refactored in this commit now follow a consistent,
statement-like form with explicit trailing semicolons. Where
applicable, macro blocks that define module interfaces (`queryEtryPt`)
have been updated to use simple `if` statements instead of `else if`
chains. While this slightly increases evaluation time, the affected
functions are only called once per module during load time to register
supported interfaces — making the performance cost irrelevant in
practice.

These improvements serve multiple purposes:
- Enable reliable clang-format usage without mangling macro logic
- Simplify reasoning about macro-expanded code for human readers
- Reduce style drift and merge conflicts
- Facilitate development for contributors using assistive tools
- Support future formatting pipelines using:
  1. `clang-format`
  2. a post-fixup normalization script

Refactored macros:
- MODULE_TYPE_NOKEEP
- MODULE_TYPE_KEEP
- MODULE_TYPE_INPUT
- MODULE_TYPE_OUTPUT
- MODULE_TYPE_FUNCTION
- MODULE_TYPE_PARSER
- MODULE_TYPE_LIB
- DEF_IMOD_STATIC_DATA
- DEF_OMOD_STATIC_DATA
- DEF_PMOD_STATIC_DATA
- DEF_FMOD_STATIC_DATA
- DEFobjStaticHelpers
- SIMP_PROP(...)

And all `queryEtryPt()` dispatch macros:
- CODEqueryEtryPt_STD_MOD_QUERIES
- CODEqueryEtryPt_STD_OMOD_QUERIES
- CODEqueryEtryPt_STD_OMODTX_QUERIES
- CODEqueryEtryPt_STD_OMOD8_QUERIES
- CODEqueryEtryPt_TXIF_OMOD_QUERIES
- CODEqueryEtryPt_IsCompatibleWithFeature_IF_OMOD_QUERIES
- CODEqueryEtryPt_STD_IMOD_QUERIES
- CODEqueryEtryPt_STD_CONF2_QUERIES
- CODEqueryEtryPt_STD_CONF2_setModCnf_QUERIES
- CODEqueryEtryPt_STD_CONF2_OMOD_QUERIES
- CODEqueryEtryPt_STD_CONF2_IMOD_QUERIES
- CODEqueryEtryPt_STD_CONF2_PREPRIVDROP_QUERIES
- CODEqueryEtryPt_STD_CONF2_CNFNAME_QUERIES
- CODEqueryEtryPt_STD_PMOD_QUERIES
- CODEqueryEtryPt_STD_PMOD2_QUERIES
- CODEqueryEtryPt_STD_FMOD_QUERIES
- CODEqueryEtryPt_STD_SMOD_QUERIES
- CODEqueryEtryPt_doHUPWrkr
- CODEqueryEtryPt_doHUP

This general modernization reduces macro misuse, improves DX, and
lays the foundation for a robust, automated style normalization
system.

See also: https://github.com/rsyslog/rsyslog/issues/5747
2025-07-15 08:25:58 +02:00
Rainer Gerhards
6bba692bba
dev doc: add design notes to queue subsystem files 2025-07-12 12:23:46 +02:00
Rainer Gerhards
e7dfcc9a74
cleanup and doc improvement
Among other things, remove some warning suppressions by "fixing" the
respective constructs with work-arounds (the root cause is that compilers
do not handle enums in switch well).
2025-06-14 14:38:51 +02:00
Rainer Gerhards
d72f5a364f
Revert "queue: emit better warning messages on queue param mismatch" 2025-06-09 18:28:21 +02:00
Rainer Gerhards
525a6f1bbf
queue: emit better warning messages on queue param mismatch
We silently adjust some queue parameters if they are set to invalid values.
Warning messages are now emitted for at least some important settings.
2025-02-27 09:32:44 +01:00
Rainer Gerhards
08501b930f
CI: add check for compile with -std=gnu23 gcc option
Note: The upcoming gnu23 C standard is overdoing it with type-safety. Inside
rsyslog, we historically have method tables for generic calls, which
keeps the code small and easy to understand. This would not reasonably be
possible with the new type-safety requirements.

So this commit works around these warnings in a way that pretends to
provide more type safety. We have done this in the least intrusive
way to reduce the risk of regressions in code that has worked well
for decades. Also note that the code already does parameter
validation.

There would have been more elaborate ways to make the gnu23 compile happy,
e.g. by using a union of structs to provide the data element. Some folks
consider this type safe. In reality, it is no better than
traditional C without types at all, because the caller still needs to
ensure it picks the right struct from the union. As this approach
would also have larger regression potential, we have not used it.

Right now, we have suppressed some of the new warnings, as working
around them would have required an even larger time budget and
potentially larger regression potential. In the long term we may
want to look into enabling them, as they would potentially be
beneficial for new code not involving method tables.

Some nits, however, were detected and have been fixed.

This patch also "fixes" some false positive test failures, mostly
by disabling some test functionality after confirmation these are
flakes.

see also https://github.com/rsyslog/rsyslog/issues/5507
2024-12-31 10:29:00 +01:00
Rainer Gerhards
eaac48d0d2
bugfix: prevent potential segfault when switching to queue emergency mode
When switching to disk queue emergency mode, we destructed the in-memory
queue object. Practice has shown that this MAY cause races during
destruction which themselves can lead to a segfault. For that reason, we
now keep the disk queue object. This will keep some resources,
including disk space, allocated. But we prefer that over a segfault.
After all, it only happens after a serious queue error when we are
already at the edge of hard problems.

see also: https://github.com/rsyslog/rsyslog/issues/4963
2022-11-10 12:59:26 +01:00
Rainer Gerhards
b0435d5e89
Merge pull request #4791 from Cropi/dynamic-config-queue
Make the main message queue part of the config
2022-03-16 12:48:22 +01:00
alakatos
452b62b4a4 Make the main message queue part of the config 2022-03-01 09:56:39 +01:00
Michael Biebl
6569133c75
Typo fixes (#4801)
* typo fix: ambigious -> ambiguous

* typo fix: aquire -> acquire

* typo fix: assgined -> assigned

* typo fix: cancelation -> cancellation

* typo fix: childs -> children

* typo fix: configuraton -> configuration

* typo fix: delemiter -> delimiter

* typo fix: forwardig -> forwarding

* typo fix: initializiation -> initialization

* typo fix: intializing -> initializing

* typo fix: lengh -> length

* typo fix: mesage -> message

* typo fix: occured -> occurred

* typo fix: occurence -> occurrence

* typo fix: paramter -> parameter

* typo fix: remaing -> remaining

* typo fix: resetted -> reset

* typo fix: suppored -> supported

* typo fix: Sytem -> System

* typo fix: uncommited -> uncommitted

* typo fix: depricated -> deprecated

* typo fix: stoping -> stopping

* typo fix: allow to -> allow one to
2022-02-17 10:54:12 +01:00
alakatos
321fc76f0f Move rsyslog global parameters to rsconf_t struct 2022-01-13 12:43:21 +01:00
74a49d3b63 queue: Add NULL check in qDeqLinkedList
Add NULL value handling for pDeqRoot. A NULL pDeqRoot caused segfaults if
messages were discarded during dequeue.
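
The kind of guard involved can be shown on a generic singly linked queue. This is a sketch, not qDeqLinkedList itself: the `ll_` types and functions are hypothetical, and only the NULL-root check mirrors what this message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative linked-list queue; names are not taken from rsyslog. */
struct ll_node {
    void *data;
    struct ll_node *next;
};

struct ll_queue {
    struct ll_node *root;   /* dequeue end, NULL when empty */
    struct ll_node *last;   /* enqueue end */
};

int ll_enqueue(struct ll_queue *q, void *data)
{
    struct ll_node *n;

    assert(q != NULL);
    n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->data = data;
    n->next = NULL;
    if (q->root == NULL)
        q->root = n;
    else
        q->last->next = n;
    q->last = n;
    return 0;
}

/* Returns 0 and stores the element in *out, or -1 if the list is
 * empty. Without the NULL check, `n->data` would dereference NULL
 * whenever elements were removed behind the caller's back. */
int ll_dequeue(struct ll_queue *q, void **out)
{
    struct ll_node *n;

    assert(q != NULL && out != NULL);
    n = q->root;
    if (n == NULL) {
        *out = NULL;
        return -1;
    }
    *out = n->data;
    q->root = n->next;
    if (q->root == NULL)
        q->last = NULL;
    free(n);
    return 0;
}
```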

Also fix iOverallQueueSize calculation (discarded items) in imdiag.

While building a testcase for issue #4437, I discovered that the
iOverallQueueSize counter was not subtracting discarded messages. This caused
the testcase to fail with a testcase timeout at the count of the "discardMark" queue
setting.

closes: https://github.com/rsyslog/rsyslog/issues/4437
2021-09-06 16:02:40 +02:00
Rainer Gerhards
879a645bfb
Merge pull request #4069 from rgerhards/i4020
queue: permit ability to double size at shutdown
2020-06-22 12:46:37 +02:00
Rainer Gerhards
2e7207e3a5
queue subsystem: cap max queue size to 2^31-1
closes https://github.com/rsyslog/rsyslog/issues/4192
2020-03-04 10:33:55 +01:00
Rainer Gerhards
a9dd12b967
queue: permit ability to double size at shutdown
This prevents message loss due to "queue full" when re-enqueueing data
under quite exotic settings.

see also https://github.com/rsyslog/rsyslog/issues/3941#issuecomment-549765813
closes https://github.com/rsyslog/rsyslog/issues/4020
2020-03-03 16:14:31 +01:00
Rainer Gerhards
deb98fecc1
bugfixes: small issues detected by clang static analyzer 10 2019-12-18 11:37:00 +01:00
Rainer Gerhards
3b571a9201
core queue: emit warning if parameters are set for direct queue
Direct queues do not apply queue parameters because they are not actually
physical queues. As such, any parameter set is ignored. This can
lead to unintended results.

The new code detects this case and warns the user.

closes https://github.com/rsyslog/rsyslog/issues/77
2019-11-14 16:57:12 +01:00
Rainer Gerhards
76d582a59b core queue: add config param "queue.takeFlowCtlFromMsg"
This is a fine-tuning option which controls whether or not
rsyslog shall always take the flow control setting from the message. If
so, non-primary queues may also block when reaching the high water mark.
This permits adding some synchronous processing to the rsyslog core engine.
However, it is dangerous, as improper use may make the core engine
stall. As such, enabling this option requires very careful planning
of the rsyslog configuration and deep understanding of the consequences.

Note that the option is applied to individual queues, so a configuration
with a large number of queues can (and must, if used) be fine-tuned to
the exact use case.

The rsyslog team strongly recommends leaving the option turned off,
which is the default setting.

see also https://github.com/rsyslog/rsyslog/issues/3941
2019-11-11 13:06:32 +01:00
Rainer Gerhards
f585c715ca
queue cleanup: remove no longer needed debug output
This was added for a specific debug effort and evidently never
cleaned up. This issue is not included in any scheduled release. It
was added a few days ago (we did not try to hunt down the exact
commit that caused it).

Thanks to github user eshadesu for alerting us.

closes https://github.com/rsyslog/rsyslog/issues/3955
2019-11-11 11:08:40 +01:00
Rainer Gerhards
e88b956e9d
core/queue: provide ability to run diskqueue on multiple threads
see also https://github.com/rsyslog/rsyslog/issues/3543
closes https://github.com/rsyslog/rsyslog/issues/3833
2019-10-30 10:04:09 +01:00
Rainer Gerhards
db7d639a83
core queue bugfix: propagate batch size to DA queue
This was a long-standing bug where the DA queue always had a fixed small batch
size because the setting was not propagated from the memory queue. This also
removes a needless and counter-productive "debug aid" which seems to have been in
the code for quite a while. It did not cause harm because of the batch
size issue.
2019-10-14 08:56:19 +02:00
Rainer Gerhards
f275487aab
core bugfix: potential abort on very long action name
The action name is stored in modified form for the debug header and
some messages. If it is extremely long, a buffer can be overrun,
resulting in misaddressing and potential segfault for rsyslog. This
can also happen if the action is NOT named, but a custom path to
the output module is given and that path is very long. This triggers
the same issue because by default the module load path is included
in the action name.

This patch corrects the problem and truncates overly long names
when they are used for name generation.
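
Bounded name generation of this kind is commonly done with `snprintf`, which never writes past the given size. The sketch below assumes a hypothetical `build_action_tag` helper and an `action-` prefix; neither is taken from the rsyslog sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build "action-<name>" into dst without ever
 * writing more than dstlen bytes. Returns 1 if the name had to be
 * truncated (or formatting failed), 0 if it fit completely. */
int build_action_tag(char *dst, size_t dstlen, const char *name)
{
    int n;

    assert(dst != NULL && dstlen > 0 && name != NULL);

    /* snprintf always NUL-terminates when dstlen > 0 and returns the
     * length the full string would have had. */
    n = snprintf(dst, dstlen, "action-%s", name);
    return n < 0 || (size_t)n >= dstlen;
}
```

Checking the return value against the buffer size is what distinguishes a silent truncation from one the caller can report.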

The problem was detected during testbench work. We never received
a bug report from practice.
2019-10-08 12:55:20 +02:00
Rainer Gerhards
166acbfeca
queue subsystem bugfix: oversize queue warning message shown as error
The warning message was emitted as an error message, which is misleading
and may also break some automated procedures.
2019-06-06 14:07:06 +02:00
Rainer Gerhards
359288d788
global config: new parameters for ruleset queue defaults
specifically:

* default.ruleset.queue.timeoutshutdown
* default.ruleset.queue.timeoutactioncompletion
* default.ruleset.queue.timeoutenqueue
* default.ruleset.queue.timeoutworkerthreadshutdown

closes https://github.com/rsyslog/rsyslog/issues/3656
2019-05-09 15:18:55 +02:00
Rainer Gerhards
ca7fb2408e
queue subsystem: provide better user status messages
The queue subsystem now provides additional information messages which
may help a regular user to maintain system health. Most importantly,
DA queues now output when they persist queue data at end of run and
when they restart the queue based on persisted data.
2019-05-08 12:44:18 +02:00
Rainer Gerhards
e22fb205a3
config processing: check disk queue file is unique
If the same name is specified for multiple queues, the queue files
will become corrupted. This commit adds a check during config parsing.
If duplicate names are detected the config parser errors out and the
related object is not created.
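
One simple way to picture such a uniqueness check is a registry that refuses a name it has already seen. Everything in this sketch (the `qfile_registry` type, the fixed capacity, the return codes) is hypothetical and not how the config parser is actually structured.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_QUEUE_FILES 64      /* arbitrary capacity for the sketch */

/* Illustrative registry of queue file names seen during config load.
 * It stores the caller's pointers and does not copy the strings. */
struct qfile_registry {
    const char *names[MAX_QUEUE_FILES];
    size_t count;
};

/* Returns 0 if the name was registered, -1 if it is a duplicate (the
 * caller would then error out and not create the queue object), and
 * -2 if the registry is full. */
int register_queue_file(struct qfile_registry *reg, const char *name)
{
    assert(reg != NULL && name != NULL);

    for (size_t i = 0; i < reg->count; i++) {
        if (strcmp(reg->names[i], name) == 0)
            return -1;
    }
    if (reg->count == MAX_QUEUE_FILES)
        return -2;
    reg->names[reg->count++] = name;
    return 0;
}
```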

Note: this may look like a change of behaviour to some users. However,
this never worked and it was pure luck that these users did not run
into big problems (e.g. DA queues were never going to disk at the
same time). So it is acceptable to error out in this hard error case.

closes https://github.com/rsyslog/rsyslog/issues/1385
2019-05-02 11:36:49 +02:00
Rainer Gerhards
86564757bf
queue subsystem: permit to disable "light delay mark"
New semantic: if lightDelayMark is 0, it is set to the max queue
size, effectively disabling the "light delay" functionality.
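
Expressed as code, the new semantic is a one-line mapping. The function name below is hypothetical; only the "0 means max queue size" rule comes from this message.

```c
#include <assert.h>

/* Hypothetical sketch: a lightDelayMark of 0 is replaced by the
 * maximum queue size, so the mark can never be exceeded and the
 * "light delay" logic is effectively switched off. */
int effective_light_delay_mark(int lightDelayMark, int maxQueueSize)
{
    assert(maxQueueSize > 0 && lightDelayMark >= 0);
    return lightDelayMark == 0 ? maxQueueSize : lightDelayMark;
}
```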

Thanks to Yury Bushmelev for mentioning issues related to light
delay mark and proposing the solution (which actually is what
this commit does).

closes https://github.com/rsyslog/rsyslog/issues/1778
2019-04-30 18:08:17 +02:00
Rainer Gerhards
f826c2afad
core: emit a warning message for ultra-large queue size definitions
We see error reports from users who have configured excessively large queues
and receive an OOM condition or other problems.

With this patch we generate a warning message if a queue is configured very
large. "Very large" is defined to be in excess of 500000 messages.

see also https://github.com/rsyslog/rsyslog/issues/3314
closes https://github.com/rsyslog/rsyslog/issues/3334
2019-04-18 11:50:18 +02:00
Philippe Duveau
9ad7324dfa AIX_port: second phase 2019-02-14 14:36:05 +01:00
Rainer Gerhards
8c8e1e92d0
queue: add support for minimum batch sizes
closes https://github.com/rsyslog/rsyslog/issues/495
2019-01-18 17:34:55 +01:00
Rainer Gerhards
e2b87769f1
global object: add ability to set action queue timeout defaults
While this is useful for users as well, we have done it so
that we can handle slow CI systems during CI runs. It is also
required for massively parallel testing, which makes each
individual test rather slow.

With the new settings, the testbench framework can now set
longer timeouts by default. Also updated the framework accordingly.
2018-12-20 13:56:30 +01:00
937dbcb801 bugfix tls subsystem: Receiver hang due to insufficient TLS buffersize.
gtls and ossl driver used a default buffersize of 8 x 1024 bytes to store
received TLS packets. When tls read returned more than buffersize, the additional
buffer was not processed until new data arrived on the socket again.

TLS RFCs require up to 16KB buffer for a single TLS record.

closes https://github.com/rsyslog/rsyslog/issues/3325
2018-12-18 14:44:58 +01:00
Josh Soref
bfd9248670 spelling: https 2018-11-14 11:56:57 -05:00
Josh Soref
d642d984d3 canonical url www.rsyslog.com/doc/ 2018-11-14 12:03:20 -05:00
Rainer Gerhards
9894bea047
simplify code 2018-10-31 08:23:17 +01:00
Rainer Gerhards
9bece39dc6
SQUASH
debug cleanup: remove some old, no longer used macros
2018-10-30 12:46:04 +01:00
Rainer Gerhards
f48b10383d
queue bugfix: invalid error message on queue startup
Due to an old regression (commit not exactly identified, but for
sure a regression; 9 years ago it was correct), an error message
is emitted when no .qi file exists on startup of the queue, which
is a normal condition.

Actually, the code should not have tried to open the .qi file in
the first place because it detected that it did not exist. That
(necessary) shortcut had been removed a while ago.

closes https://github.com/rsyslog/rsyslog/issues/3117
2018-10-22 12:21:58 +02:00
Rainer Gerhards
d33aa84eb4
shutdown: fix race in thread termination
This prevents new workers from being spawned when the system
is in shutdown state, except when needed to persist queue
data to disk. In all other cases, re-spawning workers brings
the system to an unstable state and is not desired. We think
this happens when very late messages are newly generated
(e.g. status messages) or received.
2018-10-19 12:15:55 +02:00
Jan Gerhards
32b1033e7b queue: fix invalid assert location
Also improve error checking. Detected by Codacy/cppcheck.
2018-10-16 13:02:37 +02:00
Rainer Gerhards
e229fee062
Merge pull request #3111 from rgerhards/grcy-memleak
gcry crypto driver: small memleak
2018-10-11 12:28:33 +02:00