Why
The queue security audit found several reachable validation and
recovery bugs in queue configuration, worker lifecycle handling,
and disk queue housekeeping. These failures were spread across the
v6 object parser, legacy sysline parser, and runtime recovery paths.
Impact
Queue configs now reject empty spool directories and non-positive
worker thread counts, and worker/disk queue recovery paths fail
closed instead of proceeding with inconsistent state.
Before/After
Before, invalid queue settings and worker startup/join failures could
slip through or corrupt recovery bookkeeping; after, those paths are
validated and handled deterministically with regression coverage.
Technical Overview
Switch queue worker thread settings to positive-integer handling in
both object config and legacy sysline plumbing.
Reject empty queue.spooldirectory values before trailing-slash
normalization and propagate cryprov initialization failures.
Keep the to-delete list sorted correctly and clear the duplicate-file
registry root after config teardown.
Avoid counting uninitialized stat data during multi-file queue seek and
reject invalid stream iMaxFiles values.
Handle pthread_create and pthread_join failures without leaving worker
state inconsistent, and add overflow guards to transactional worker
parameter growth.
Register focused regression tests for empty spool directories and zero
worker-thread configs and keep them in the distributed test set.
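The two config checks can be sketched as follows. This is a minimal illustration only: the function name and result codes are hypothetical and are not the identifiers used in the rsyslog sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; rsyslog has its own rsRetVal values. */
enum cfg_result { CFG_OK = 0, CFG_ERR_EMPTY_SPOOLDIR, CFG_ERR_BAD_WORKERS };

/* Validate both settings before any normalization runs. The empty-string
 * check comes first because a trailing-slash normalizer would typically
 * look at the last character, which does not exist for "". */
static enum cfg_result validate_queue_cfg(const char *spool_dir, long worker_threads) {
    assert(spool_dir != NULL);
    if (spool_dir[0] == '\0') {
        return CFG_ERR_EMPTY_SPOOLDIR;
    }
    if (worker_threads <= 0) {
        return CFG_ERR_BAD_WORKERS; /* zero and negative counts are rejected */
    }
    return CFG_OK;
}
```

A caller would abort config processing on any result other than CFG_OK instead of continuing with a partially valid queue.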
With the help of AI-Agents: Codex
This resolves inconsistent data type usage in srUtilItoA() calls
throughout the runtime. The function signature expects number_t
(typedef'd as int64/long long), but call sites were passing mixed
types (int, long) with inconsistent casting.
Why:
Type inconsistencies can lead to subtle bugs on platforms where
long and long long differ in size, particularly in queue handling
and serialization code.
Impact:
Internal only. No behavior change, improved type safety.
Before:
Mixed casts: (long) for short/int, bare long, bare int64.
After:
All call sites explicitly cast to number_t for consistency.
Technical Overview:
Standardized all 6 srUtilItoA() call sites:
- runtime/obj.c: 4 cases (SHORT, INT, LONG, INT64) now cast to
number_t explicitly
- runtime/stream.c: strmWriteLong() casts parameter to number_t
- runtime/stringbuf.c: rsCStrAppendInt() casts parameter to
number_t
The PROPTYPE_INT64 case already passed int64 (equivalent to
number_t), so it remains functionally unchanged but now has the
explicit cast for consistency.
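The call-site pattern looks roughly like the sketch below. fmt_number() is a hypothetical stand-in for srUtilItoA(), and the typedef assumes number_t is a 64-bit signed integer as described above.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption for this sketch: number_t is long long. */
typedef long long number_t;

/* Hypothetical stand-in for srUtilItoA(): formats n into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int fmt_number(char *buf, size_t buflen, number_t n) {
    assert(buf != NULL);
    int written = snprintf(buf, buflen, "%lld", n);
    return (written >= 0 && (size_t)written < buflen) ? 0 : -1;
}

/* Narrower types are cast explicitly, so the widening is visible at the
 * call site rather than left to implicit conversion. */
static int fmt_short(char *buf, size_t buflen, short v) {
    return fmt_number(buf, buflen, (number_t)v);
}

static int fmt_long(char *buf, size_t buflen, long v) {
    return fmt_number(buf, buflen, (number_t)v);
}
```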
closes https://github.com/rsyslog/rsyslog/issues/5733
With the help of AI-Agents: GitHub Copilot CLI
Why: Avoid a rare async close/open race and prevent
lock-order inversion when internal logging runs during
open failures.
Impact: Error-path logging timing changes; tests:
./tests/imtcp-tls-ossl-x509fingerprint.sh
Before/After: Before, async open released its lock before
init and could log while locked; after, open/init stays
locked and LogError is deferred until unlock.
Technical Overview:
- Hold the async writer mutex for open(2), isatty, and
crypto init.
- Defer LogError emission until after mutex release.
- Keep errno and error code for accurate post-unlock
logging.
- Add a clarifying doc comment for doPhysOpen.
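The locking pattern can be sketched as below. The struct and function names are hypothetical, and fprintf() stands in for LogError; only the ordering (open under lock, save errno, unlock, then log) is the point.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical writer state; the real stream object has many more fields. */
struct writer {
    pthread_mutex_t mut;
    int fd;
};

/* Open the file while holding the mutex, but report a failure only after
 * the mutex has been released. errno is copied right after the failing
 * call so that later calls cannot overwrite it. */
static int writer_open(struct writer *w, const char *path) {
    assert(w != NULL && path != NULL);
    int saved_errno = 0;

    pthread_mutex_lock(&w->mut);
    w->fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (w->fd < 0) {
        saved_errno = errno;
    }
    pthread_mutex_unlock(&w->mut);

    if (saved_errno != 0) {
        /* Logging happens outside the lock, so a logger that ends up
         * writing through the same object cannot invert the lock order. */
        fprintf(stderr, "open '%s' failed: %s\n", path, strerror(saved_errno));
        return -1;
    }
    return 0;
}
```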
Closes https://github.com/rsyslog/rsyslog/issues/3404
With the help of AI-Agents: Codex
Why: rotation scripts need the target file name to act without
external logrotate and to retain existing args.
Impact: new action param rotation.sizeLimitCommandPassFileName
(default on) appends the file name; legacy outchannel stays unchanged.
Before: sizeLimitCommand could pass at most one arg and never the
current file name.
After: optional file name is appended as the last argument when enabled.
Technical Overview:
- Add rotation.sizeLimitCommandPassFileName and plumb to streams.
- Default to on for action config; disable for legacy outchannel.
- Simplify execProg to accept up to two args and update callers.
- Update docs/tutorial and register new doc in doc/Makefile.am.
- Extend size-limit test to validate arg order and file name.
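The argument order can be sketched like this. build_rotate_argv() is a hypothetical helper written for illustration and is not the execProg signature; it only shows that an existing argument keeps its place and the file name goes last.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill argv for a rotation command.
 * argv needs room for 4 entries: program, up to two args, NULL.
 * Returns the number of entries before the terminating NULL. */
static size_t build_rotate_argv(const char *argv[4], const char *prog,
                                const char *user_arg, const char *file_name,
                                int pass_file_name) {
    assert(argv != NULL && prog != NULL);
    size_t n = 0;
    argv[n++] = prog;
    if (user_arg != NULL) {
        argv[n++] = user_arg; /* existing argument keeps its position */
    }
    if (pass_file_name && file_name != NULL) {
        argv[n++] = file_name; /* file name is always the last argument */
    }
    argv[n] = NULL; /* execv-style terminator */
    return n;
}
```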
Issue: https://github.com/rsyslog/rsyslog/issues/6435
Tests: make -j$(nproc) check TESTS=""; ./tests/omfile-sizelimitcmd-many.sh
With the help of AI-Agents: Codex
Hardens disk-queue recovery after an invalid .qi so read/write pointers
realign and on-disk size is corrected. This prevents stuck queues and
stabilizes the daqueue dirty shutdown test.
Bug Fixes
- On anomaly (rd==wr and offsets equal), seek the read-delete cursor to
the writer, subtract deleted bytes from sizeOnDisk, and align the
read-dequeue cursor; keep draining if seek fails.
- Log errors when pointer resets or seeks fail.
- Add strm.Sync() to keep stream state consistent after pointer updates.
- Refactor invalid .qi recovery and startup seek errors into helpers.
- When spool read files are missing on startup, align read to write and
continue recovery.
With the help of AI-Agents: gpt-5.2-codex
This commit applies the new canonical formatting style using `clang-format` with custom settings (notably 4-space indentation), as part of our shift toward automated formatting normalization.
⚠️ No functional changes are included — only whitespace and layout modifications as produced by `clang-format`.
This change is part of the formatting modernization strategy discussed in:
https://github.com/rsyslog/rsyslog/issues/5747
Key context:
- Formatting is now treated as a disposable view, normalized via tooling.
- The `.clang-format` file defines the canonical style.
- A fixup script (`devtools/format-code.sh`) handles remaining edge cases.
- Formatting commits are added to `.git-blame-ignore-revs` to reduce noise.
- Developers remain free to format code however they prefer locally.
This commit performs a broad modernization of widely used rsyslog
macros to align with modern C practices and support automated
formatting tools like clang-format. The changes focus on improving
syntactic regularity, readability, and tooling compatibility — without
altering behavior.
Macros refactored in this commit now follow a consistent,
statement-like form with explicit trailing semicolons. Where
applicable, macro blocks that define module interfaces (`queryEtryPt`)
have been updated to use simple `if` statements instead of `else if`
chains. While this slightly increases evaluation time, the affected
functions are only called once per module during load time to register
supported interfaces — making the performance cost irrelevant in
practice.
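A reduced sketch of the style follows. QUERY_ENTRY and the function names are hypothetical and much simpler than the real rsyslog macros; the sketch only shows the statement-like form with a trailing semicolon and independent `if` tests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*entry_fn)(void);

static int mod_exit(void) { return 1; }
static int do_hup(void) { return 2; }

/* Statement-like macro: expands to a complete `if` and is written with a
 * trailing semicolon at the use site, so a formatter sees an ordinary
 * statement. */
#define QUERY_ENTRY(wanted, fn)        \
    if (strcmp(name, (wanted)) == 0) { \
        *out = (fn);                   \
    }

/* Independent `if`s instead of an `else if` chain: every comparison runs,
 * which costs a little on a path taken once per module load. */
static int query_entry_point(const char *name, entry_fn *out) {
    assert(name != NULL && out != NULL);
    *out = NULL;
    QUERY_ENTRY("modExit", mod_exit);
    QUERY_ENTRY("doHUP", do_hup);
    return (*out == NULL) ? -1 : 0;
}
```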
These improvements serve multiple purposes:
- Enable reliable clang-format usage without mangling macro logic
- Simplify reasoning about macro-expanded code for human readers
- Reduce style drift and merge conflicts
- Facilitate development for contributors using assistive tools
- Support future formatting pipelines using:
1. `clang-format`
2. a post-fixup normalization script
Refactored macros:
- MODULE_TYPE_NOKEEP
- MODULE_TYPE_KEEP
- MODULE_TYPE_INPUT
- MODULE_TYPE_OUTPUT
- MODULE_TYPE_FUNCTION
- MODULE_TYPE_PARSER
- MODULE_TYPE_LIB
- DEF_IMOD_STATIC_DATA
- DEF_OMOD_STATIC_DATA
- DEF_PMOD_STATIC_DATA
- DEF_FMOD_STATIC_DATA
- DEFobjStaticHelpers
- SIMP_PROP(...)
And all `queryEtryPt()` dispatch macros:
- CODEqueryEtryPt_STD_MOD_QUERIES
- CODEqueryEtryPt_STD_OMOD_QUERIES
- CODEqueryEtryPt_STD_OMODTX_QUERIES
- CODEqueryEtryPt_STD_OMOD8_QUERIES
- CODEqueryEtryPt_TXIF_OMOD_QUERIES
- CODEqueryEtryPt_IsCompatibleWithFeature_IF_OMOD_QUERIES
- CODEqueryEtryPt_STD_IMOD_QUERIES
- CODEqueryEtryPt_STD_CONF2_QUERIES
- CODEqueryEtryPt_STD_CONF2_setModCnf_QUERIES
- CODEqueryEtryPt_STD_CONF2_OMOD_QUERIES
- CODEqueryEtryPt_STD_CONF2_IMOD_QUERIES
- CODEqueryEtryPt_STD_CONF2_PREPRIVDROP_QUERIES
- CODEqueryEtryPt_STD_CONF2_CNFNAME_QUERIES
- CODEqueryEtryPt_STD_PMOD_QUERIES
- CODEqueryEtryPt_STD_PMOD2_QUERIES
- CODEqueryEtryPt_STD_FMOD_QUERIES
- CODEqueryEtryPt_STD_SMOD_QUERIES
- CODEqueryEtryPt_doHUPWrkr
- CODEqueryEtryPt_doHUP
This general modernization reduces macro misuse, improves DX, and
lays the foundation for a robust, automated style normalization
system.
See also: https://github.com/rsyslog/rsyslog/issues/5747
Among other things, remove some warning suppressions by "fixing" the
respective constructs with work-arounds (the root cause is that
compilers do not handle enums in switch well).
When a monitored file is being rotated, some messages may be lost or
duplicated. In case of duplication, many file lines may be duplicated
depending on actual timing. The bug was primarily timing dependent.
In practice it was most often visible when the monitored file was
rotated very frequently (one report had rotation every few
seconds).
Note that while we try hard to not lose any messages, input file
rotation always has some loss potential. This is inevitable if
the monitored file is being truncated.
Also note that this bugfix affects imfile only. It is unrelated to
rsyslog output files being rotated on HUP.
closes: https://github.com/rsyslog/rsyslog/issues/4797
The zstd library provides better and faster compression than zlib.
This patch integrates zstd as a dynamically-loadable functionality.
As such, no further dependencies need to be added to the rsyslog
base package.
Due to the increased performance, usage of zstd is highly recommended
for high-volume use cases.
This patch also refactors zlib compression in order to unify handling
in both compression cases.
This error message is most probably rooted in a kernel problem. At
least nobody knows how it can happen. It's definitely not an
rsyslog issue. We have also been able to recover from it for a long
time now, so there is no reason to irritate users by emitting this
"error" message.
If imfile is ingesting log files with readMode set to 2 or 1, the resulting
messages all have a '#' character at the end. This patch corrects the behaviour.
Note: if some external script "supported" the bug of extra hash character at
the end of line, it may be necessary to update them.
closes https://github.com/rsyslog/rsyslog/issues/4491
When using a disk queue with a queue.filename that cannot be created
by rsyslog, the service does not switch to another queue type as
it is supposed to and crashes at a later step.
closes: https://github.com/rsyslog/rsyslog/issues/4282
The check was previously done in strmPhysWrite, which caused syslog
messages to be split in the middle if the syslog message batch exceeded
the default IO buffer size.
closes: https://github.com/rsyslog/rsyslog/issues/4233
- if cstrLen(pThis->prevMsgSegment) > maxMsgSize, the len calculation
becomes negative if cstrLen(thisLine) < cstrLen(pThis->prevMsgSegment).
This causes an illegal memory access and thus a segfault.
- assign len = 0 if cstrLen(pThis->prevMsgSegment) > maxMsgSize so that
the correct memory location is accessed.
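The guard can be sketched as below. The exact expression used in the runtime is not shown in this message, so bytes_to_append() and its parameters are hypothetical; the sketch only shows clamping to zero before a subtraction that could otherwise go negative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many bytes of the current line may still be
 * appended when prev_len bytes are already buffered and the message is
 * capped at max_msg_size. Returns 0 when the buffered segment alone
 * already exceeds the cap. */
static size_t bytes_to_append(size_t prev_len, size_t line_len, size_t max_msg_size) {
    if (prev_len > max_msg_size) {
        return 0; /* guard: the subtraction below must never underflow */
    }
    size_t room = max_msg_size - prev_len;
    assert(room <= max_msg_size);
    return (line_len < room) ? line_len : room;
}
```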
Signed-off-by: Ankit Jain <ankitja@vmware.com>
The new parameter permits specifying the replacement that is used
when "escapeLF" is set to "on". Previously, a fixed replacement string
was used ("#012"/"\n") depending on circumstances. If the parameter is
set to an empty string, the LF is simply discarded.
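The replacement semantics can be sketched as follows. escape_lf() is a hypothetical helper, not the stream class code; it shows that a non-empty replacement is substituted for each LF and an empty one drops the LF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src to dst, replacing each LF with repl.
 * An empty repl drops the LF. Returns the output length, or (size_t)-1
 * if dst (dstlen bytes, including the NUL) is too small. */
static size_t escape_lf(char *dst, size_t dstlen, const char *src, const char *repl) {
    assert(dst != NULL && src != NULL && repl != NULL);
    if (dstlen == 0) {
        return (size_t)-1;
    }
    const size_t repl_len = strlen(repl);
    size_t o = 0;
    for (const char *p = src; *p != '\0'; ++p) {
        const char *chunk = (*p == '\n') ? repl : p;
        const size_t chunk_len = (*p == '\n') ? repl_len : 1;
        if (o + chunk_len >= dstlen) {
            return (size_t)-1; /* keep one byte free for the NUL */
        }
        memcpy(dst + o, chunk, chunk_len);
        o += chunk_len;
    }
    dst[o] = '\0';
    return o;
}
```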
closes https://github.com/rsyslog/rsyslog/issues/3889
The stream class did not close re-opened file descriptors.
This led to leaking file handles and ultimately to the inability
to open any files/sockets/etc as rsyslog ran out of handles.
The bug depended on timing. This involved different OS
thread scheduler timing as well as workload. The bug was more
common under the following conditions:
- async writing of files
- dynafiles
- not committing file data at end of transaction
However, it could be triggered under other conditions as well.
The refactoring done in 8.1908 increased the likelihood of
experiencing this bug. But it was not a real regression: the new
code was valid, but changed the timing so that the race was more
likely.
Thanks to Michael Biebl for reporting this bug and helping to
analyze it.
closes https://github.com/rsyslog/rsyslog/issues/3885
Rsyslog may leave some dangling disk queue files under the following
conditions:
- batch sizes and/or messages are large
- queue files are comparatively small
- a batch spans more than two queue files (from n to n+m with m>1)
In this case, queue files n+1 to (n+m-1) are not deleted. This can
lead to problems when the queue is re-opened again. In extreme cases
this can also lead to stalled processing when the max disk space is
used up by such left-over queue files.
Using defaults this scenario is very unlikely, but it can happen,
especially when large messages are being processed.
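The idea behind the fix can be sketched as below. consumed_files() is a hypothetical helper and not the queue code: it lists every file number between the one where a batch started and the one the reader is in now, so the intermediate files n+1 to n+m-1 are included and not only the first one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: a batch started in file `first` and the read
 * cursor is now in file `current`. Every file in [first, current) is
 * fully consumed and can be deleted. Returns how many numbers were
 * written to out (at most outcap). */
static size_t consumed_files(unsigned first, unsigned current,
                             unsigned *out, size_t outcap) {
    assert(out != NULL || outcap == 0);
    size_t n = 0;
    for (unsigned f = first; f < current && n < outcap; ++f) {
        out[n++] = f; /* includes the intermediate files, not just `first` */
    }
    return n;
}
```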
This seems to be a long-standing bug, introduced around 7 years ago.
It became more visible by properly closing files during HUP, which
was done in 8.1905.0 (and was another bugfix).
closes https://github.com/rsyslog/rsyslog/issues/3772
This was originally added as an aid to solve potential regressions.
But it has looked good for a while now, so we remove some of it as
it really is overdone.
Note: some other debug messages had already been removed, so this
closes https://github.com/rsyslog/rsyslog/issues/3046
The flush was only done to the last dynafile in use at end of
transactions. Dynafiles that were also modified during the
transaction were not flushed.
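The corrected behaviour can be sketched like this. The struct and function names are hypothetical, and a counter stands in for the real flush call; the point is that every file written during the transaction is flushed, not only the last one used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dynafile cache entry. */
struct dynafile {
    int dirty;       /* written to during the current transaction */
    int flush_count; /* stand-in for observable flush activity */
};

static void write_to(struct dynafile *f) {
    assert(f != NULL);
    f->dirty = 1;
}

/* At end of transaction, walk the whole cache and flush every dirty
 * entry. Returns the number of files flushed. */
static size_t end_transaction(struct dynafile *files, size_t n) {
    assert(files != NULL || n == 0);
    size_t flushed = 0;
    for (size_t i = 0; i < n; ++i) {
        if (files[i].dirty) {
            files[i].flush_count++; /* the real code would flush the stream here */
            files[i].dirty = 0;
            flushed++;
        }
    }
    return flushed;
}
```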
Special thanks to Duy Nguyen for pointing us to the bug and
suggesting a solution.
This commit also contains a bit of cosmetic cleanup inside
the file stream class.
closes https://github.com/rsyslog/rsyslog/issues/2502
This works around an issue we can reproduce e.g. via the
imtcp-tls-ossl-x509fingerprint.sh test. Here, omfile gets a write
error with reason EBADF. So far, I was not able to see an actual
coding error. However I traced this down to a multithreaded race
on open and close calls. I am very surprised to see this type
of issue, as I think the kernel guarantees that it does not happen.
Here is what I see in strace -f:
openssl accepts a socket:
[pid 66386] accept(4, {sa_family=AF_INET, sin_port=htons(59054), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 10
then, it works a bit with that socket, detects a failure and shuts it down. Sometimes, at the very same instant omfile on another thread tries to open an output file. Then the following happens:
[pid 66386] close(10) = 0
[pid 66389] openat(AT_FDCWD, "./rstb_356100_31fa9d20.out.log", O_WRONLY|O_CREAT|O_NOCTTY|O_APPEND|O_CLOEXEC, 0644 <unfinished ...>
[pid 66386] close(10 <unfinished ...>
[pid 66389] <... openat resumed> ) = 10
[pid 66386] <... close resumed> ) = 0
[pid 66386] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 2, -1 <unfinished ...>
[pid 66389] write(2, "file './rstb_356100_31fa9d20.out"..., 66file './rstb_356100_31fa9d20.out.log' opened as #10 with mode 420
) = 66
[pid 66389] ioctl(10, TCGETS, 0x7f59aeb89540) = -1 EBADF (Bad file descriptor)
This is **literally** from the log, without deleting or reordering
lines. I read it as a race between `open` and `close`
where fd 10 is reused, but seemingly closed, resulting in the `EBADF`.
While it smells like a kernel issue, it may be a well-hidden program
bug; if so, one I currently cannot find. HOWEVER, this commit
works around the issue by reopening the file when we receive EBADF.
That's the best thing to do in that case, especially if it really is
a kernel bug. Data loss should not occur, as the previous writes
succeeded in that case.
The drawback of this work-around is that it only "fixes" omfile. In
theory every part of rsyslog can be affected by this issue (queue
files, for example). So this is not to be considered a final solution
of the root issue (but a big step forward for known problem cases).
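The work-around can be sketched as below. write_reopen_on_ebadf() is a hypothetical helper, not the omfile code; it retries exactly once after reopening the file when write() reports EBADF.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write len bytes. If the descriptor turns out to be invalid (EBADF),
 * reopen the file in append mode once and retry the write. */
static ssize_t write_reopen_on_ebadf(int *fd, const char *path,
                                     const void *buf, size_t len) {
    assert(fd != NULL && path != NULL && buf != NULL);
    ssize_t r = write(*fd, buf, len);
    if (r < 0 && errno == EBADF) {
        *fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (*fd < 0) {
            return -1; /* reopen failed, report the error to the caller */
        }
        r = write(*fd, buf, len);
    }
    return r;
}
```

Because the file is reopened in append mode, data written successfully before the failure stays in place.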
see also https://github.com/rsyslog/rsyslog/issues/3404
while in theory, the fd should immediately be rewritten, in practice
we sometimes see some errors "bad file descriptor" that we cannot
explain. So we clean this up to remove a potential trouble cause.
* truncation check did not necessarily detect if re-read of last
block was too short (only hard errors were detected)
* consistently use correct lseek64() return type off64_t
* improve performance of rotation detection a bit
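Why the return type matters can be sketched as follows. The helper names are hypothetical; the sketch only shows that an offset beyond 2GiB does not fit a 32-bit signed integer and therefore must stay in a 64-bit type.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a 64-bit file offset can be stored in a 32-bit signed
 * integer without loss, 0 otherwise. */
static int fits_in_int32(int64_t offset) {
    return offset >= INT32_MIN && offset <= INT32_MAX;
}

/* Narrow only after the range check has passed. */
static int32_t checked_narrow(int64_t offset) {
    assert(fits_in_int32(offset));
    return (int32_t)offset;
}
```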
after we fixed this code, we can go back to real backwards
seeking, which spares us one system call (which was enabled in 8.39
for debugging purposes, so this is an overall win!).
This occurs always if and only if
- reopenOnTruncate="on" is set
- file grows over 2GiB in size
Then, the data is continuously re-sent until the file becomes smaller
than 2GiB (due to truncation) or is deleted.
It is a regression introduced by 2d15cbc8221e385c5aa821e4a851d7498ed81850
closes https://github.com/rsyslog/rsyslog/issues/3249
Rotation detection seeks backwards, which caused issues in at least one
isolated case. We try to work around this by only doing positive seeks.
We have also added diagnostic information to the warning messages rsyslog
emits on rotation detection.
see also https://github.com/rsyslog/rsyslog/issues/3249
A change in the inode was not detected under all circumstances,
most importantly not in some logrotate cases.
Includes new tests made by Andre Lorbach. They now use the
logrotate tool natively to reproduce the issue.
previously, truncation was only detected at end of file. Especially with
busy files that could cause loss of data and possibly also stall imfile
reading. The new code now also checks during each read. Obviously, there
is some additional overhead associated with that, but this is unavoidable.
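A per-read truncation check can be sketched as below. truncated_since() and truncation_demo() are hypothetical helpers and not the stream class code; the check compares the current file size from fstat() with the offset already read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the file behind fd is now shorter than the offset already
 * read, 0 if not, -1 if fstat() fails. This is the extra system call
 * paid on each read. */
static int truncated_since(int fd, off_t read_offset) {
    struct stat st;
    if (fstat(fd, &st) != 0) {
        return -1;
    }
    return (st.st_size < read_offset) ? 1 : 0;
}

/* Usage sketch: write 10 bytes, pretend all were read, then truncate the
 * file to 4 bytes. Returns the result of the check after truncation. */
static int truncation_demo(const char *path) {
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        return -1;
    }
    int result = -1;
    if (write(fd, "0123456789", 10) == 10 && truncated_since(fd, 10) == 0 &&
        ftruncate(fd, 4) == 0) {
        result = truncated_since(fd, 10);
    }
    close(fd);
    unlink(path);
    return result;
}
```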
It still is highly recommended NOT to turn on "reopenOnTruncate" in imfile.
Note that there are also inherent reliability issues. There is no way to
"fix" these, as they are caused by races between the process(es) that truncate
and rsyslog reading the file. But with the new code, the "problem window"
should be much smaller and, more importantly, imfile should not stall.
see also https://github.com/rsyslog/rsyslog/issues/2659
see also https://github.com/rsyslog/rsyslog/issues/1605
This adds support for endmsg.regex. It is similar to
startmsg.regex except that it matches the line that denotes
the end of the message, rather than the start of the next message.
This is primarily for container log file use cases such as this:
date stdout P start of message
date stdout P middle of message
date stdout F end of message
The `F` means this is the line which contains the final part of
the message. The fully assembled message should be
`start of message middle of message end of message`.
`endmsg.regex="^[^ ]+ stdout F "` will match.
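End-of-message detection with endmsg.regex can be sketched with POSIX regex calls as below. count_messages() is a hypothetical helper, not the imfile code; it counts how many messages are completed by a line matching the end pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Count how many complete messages a sequence of lines contains when a
 * line matching end_pattern closes the message being assembled.
 * Returns -1 if the pattern does not compile. */
static int count_messages(const char *const *lines, size_t n,
                          const char *end_pattern) {
    assert(lines != NULL && end_pattern != NULL);
    regex_t re;
    if (regcomp(&re, end_pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }
    int complete = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Every line belongs to the pending message; only a match on
         * the end pattern marks that message as finished. */
        if (regexec(&re, lines[i], 0, NULL, 0) == 0) {
            complete++;
        }
    }
    regfree(&re);
    return complete;
}
```

With the three container log lines above and the pattern `^[^ ]+ stdout F `, only the third line matches, so exactly one message is completed.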
this can happen if imfile reads a state file. On each open, memory for the
file name can be lost.
We detected this while working on imfile refactoring, there is no related
bug report. No specific test has been crafted, as the refactored imfile
tests catch it (as soon as they are merged).
Bug is actually in stream object, but currently exposed only via imfile.
Happens when in readMode 0 a partial line is read and no more data is
present in the file during that iteration. One partial message is lost
in this case.
closes https://github.com/rsyslog/rsyslog/issues/2421
The buffer modification currently done (adding '\0') is bad, especially when
multiple threads access the same string. It is not really an issue that needs
to be fixed urgently, as the same data is always written. However, among other
things, it will pollute the thread debugger and as such prevent more elaborate
automated tests.
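One way to avoid writing a terminator into a shared buffer is to pass the length explicitly, as sketched below. The struct and function are hypothetical and not the stringbuf code; the buffer is only read, never modified.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counted string without a guaranteed trailing NUL. */
struct counted_str {
    const char *buf;
    size_t len;
};

/* Format the string by passing its length as the precision of %.*s.
 * The cast to int assumes len is small enough to fit. */
static int format_counted(char *out, size_t outlen, const struct counted_str *s) {
    assert(out != NULL && s != NULL && s->buf != NULL);
    return snprintf(out, outlen, "[%.*s]", (int)s->len, s->buf);
}
```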
closes https://github.com/rsyslog/rsyslog/issues/1993