When data is read from the journal using sd_journal_get_data(), it may
be truncated at a certain threshold (64KiB by default).
If the rsyslog MaxMessageSize is larger than the threshold, there's a
chance rsyslog will receive incomplete messages from the journal.
Empirically, this appears to happen reliably when journald uses XZ
compression. Systems where journald uses LZ4 compression do not
appear to suffer this issue reliably, if at all.
This change sets the threshold to the MaxMessageSize when the
journal is opened.
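A minimal sketch of the idea, with hypothetical names: the change boils down to passing MaxMessageSize to sd_journal_set_data_threshold() right after the journal is opened. The helper below additionally refuses to lower the threshold beneath journald's default; that guard is an assumption of this sketch, not necessarily what the actual code does.

```c
#include <stddef.h>

/* journald's default data threshold for sd_journal_get_data() */
#define JOURNAL_DFLT_DATA_THRESHOLD ((size_t)64 * 1024)

/* Pick the value to hand to sd_journal_set_data_threshold().
 * Keeping at least the 64KiB default is an assumption of this
 * sketch; the changelog only states the threshold is set to
 * MaxMessageSize. */
static size_t journal_data_threshold(size_t max_message_size)
{
    return max_message_size > JOURNAL_DFLT_DATA_THRESHOLD
               ? max_message_size
               : JOURNAL_DFLT_DATA_THRESHOLD;
}
```

With a MaxMessageSize of, say, 256KiB this yields a 256KiB threshold, so sd_journal_get_data() no longer truncates journal fields below that size.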
The stream class did not close file descriptors on re-open.
This led to leaked file handles and, ultimately, to the inability
to open any files, sockets, etc. once rsyslog ran out of handles.
The bug was timing-dependent, influenced by the OS thread
scheduler as well as by workload. It was more
common under the following conditions:
- async writing of files
- dynafiles
- not committing file data at end of transaction
However, it could be triggered under other conditions as well.
The refactoring done in 8.1908 increased the likelihood of
experiencing this bug. It was not a real regression: the new
code was valid, but it changed the timing so that the race became
more likely.
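The essence of the fix can be sketched as follows (a simplified illustration, not the actual stream class code): on re-open, the previous descriptor must be closed first, otherwise every re-open leaks one handle until open() fails with EMFILE.

```c
#include <fcntl.h>
#include <unistd.h>

/* Simplified sketch of the fix: close the old descriptor before
 * opening the new one.  The buggy path skipped the close(), so
 * each re-open leaked a handle until the process hit its fd limit. */
static int stream_reopen(int oldfd, const char *path)
{
    if (oldfd >= 0)
        close(oldfd);            /* this close was missing */
    return open(path, O_RDONLY);
}
```

Because the old descriptor is released first, repeated re-opens reuse the same small range of fd numbers instead of exhausting the per-process limit.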
Thanks to Michael Biebl for reporting this bug and helping to
analyze it.
closes https://github.com/rsyslog/rsyslog/issues/3885
commit 78976a9bc059 introduced a regression that caused writing
the journal state file to fail. This happens when the state file
is given as a relative file name and the working directory is also
a relative path. This situation is very uncommon, so most deployments
will never experience it. We discovered the issue during CI runs,
where the trigger condition is present. Note that it also takes
multiple journal loads to actually see the bug.
see also https://github.com/rsyslog/rsyslog/pull/3878
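A hedged sketch of the corrective logic (the function name is invented; the actual fix lives in the code touched by the commit above): a relative state file name is anchored at the working directory, which is assumed here to have already been resolved to an absolute path.

```c
#include <stdio.h>
#include <string.h>

/* Sketch: build the effective state file path.  An absolute name is
 * used as-is; a relative one is anchored at the working directory
 * (assumed to be absolute by this point). */
static int resolve_state_file(char *out, size_t outlen,
                              const char *workdir, const char *statefile)
{
    if (statefile[0] == '/') {               /* already absolute */
        if (strlen(statefile) >= outlen)
            return -1;
        strcpy(out, statefile);
        return 0;
    }
    /* relative: prefix the working directory */
    if ((size_t)snprintf(out, outlen, "%s/%s", workdir, statefile) >= outlen)
        return -1;
    return 0;
}
```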
Rsyslog may leave some dangling disk queue files under the following
conditions:
- batch sizes and/or messages are large
- queue files are comparatively small
- a batch spans more than two queue files (from n to n+m with m>1)
In this case, queue files n+1 to (n+m-1) are not deleted. This can
lead to problems when the queue is re-opened. In extreme cases
it can also stall processing once the maximum disk space is
used up by such left-over queue files.
With default settings this scenario is very unlikely, but it can
happen, especially when large messages are processed.
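The deletion logic can be sketched like this (illustrative only; the names and the exact file bookkeeping are invented): a batch spanning queue files n to n+m must release every fully-consumed file in that range, not just the first one.

```c
/* Sketch: collect the queue-file numbers a finished batch must delete.
 * A batch spanning files n .. n+m leaves file n+m in use, so files
 * n .. n+m-1 must all be unlinked.  The buggy code effectively
 * deleted only the first file, leaving n+1 .. n+m-1 dangling. */
static int queue_files_to_delete(int n, int m, int *out, int cap)
{
    int cnt = 0;
    for (int i = n; i < n + m && cnt < cap; ++i)
        out[cnt++] = i;   /* real code would unlink the file here */
    return cnt;
}
```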
The test was timing-sensitive because we did not properly check that
all data had been written to the output file; we just relied on sleep
periods. This has been changed. We also made some changes to the test
framework to fully support sequence checking across multiple ZIP files.
Decomposed ReadJournal() a bit, coupled the journald
variables into one struct, added a few warning messages and debug
prints to help with future bug hunts, and removed two
needless journald calls. WorkAroundJournalBug is now deprecated.
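As an illustration of the struct coupling (every field name below is invented; the actual struct in the module differs), the idea is to carry the per-read journald state in one place instead of as loose variables:

```c
#include <stddef.h>

/* Hypothetical grouping of the journald state that ReadJournal()
 * previously handled as separate local variables. */
struct journal_read_state {
    const char *message;      /* MESSAGE= field */
    size_t      message_len;
    int         severity;     /* PRIORITY= field */
    int         facility;     /* SYSLOG_FACILITY= field */
};
```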
Added an option to pull journald records from outside the local machine.
This was a long-standing bug: the DA queue always had a fixed, small batch
size because the setting was not propagated from the in-memory queue. This
change also removes a needless and counter-productive "debug aid" that seems
to have been in the code for quite a while. It did not cause harm, due to
the batch size issue.