If the queue parameters were given incorrectly, a NULL pointer dereference
could happen during config parsing. Once past that stage, no problem could
occur.
Detected by Coverity scan, CID 185339
This is not a real problem because imdiag intentionally makes multiple
attempts to validate the predicate. However, it is reported by the
clang thread sanitizer, so we need to fix it.
When rulesets are nested, a segfault can occur when shutting down
rsyslog. The reason is that rulesets are destructed in load order,
which means a "later" ruleset may still be active when an "earlier"
one was already destructed. In these cases, a "call" can invalidly
call into the earlier ruleset, which is destructed, and so leads to
invalid memory access. Whether a segfault actually happens depends on
the OS, but it is highly probable.
The cure is to split the queue shutdown sequence. In a first step,
all worker threads are terminated and the queue is set to enqOnly.
While some are terminated, it is still possible that the others
enqueue messages into the queue (which are then just placed into the
queue, not processed). After this happens, a call can no longer
be issued (as there are no more workers). So then we can destruct
the rulesets in any order.
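
The split shutdown sequence can be sketched as follows. This is a
minimal model under stated assumptions, not rsyslog's actual code: the
type demo_queue_t and all demo_* functions are hypothetical names, and
worker threads are modelled as a plain counter so that the ordering
argument is easy to follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a ruleset queue; not rsyslog's real type. */
typedef struct {
    int  nWorkers;   /* active worker threads (modelled as a counter) */
    bool enqOnly;    /* true: messages are only enqueued, not processed */
    bool destructed; /* true once the owning ruleset is gone */
    int  nQueued;    /* messages waiting in the queue */
} demo_queue_t;

/* Step 1: terminate the workers and switch to enqueue-only mode. */
static void demo_stop_workers(demo_queue_t *q)
{
    assert(q != NULL);
    q->nWorkers = 0;
    q->enqOnly = true;
}

/* Enqueueing stays possible until the queue is destructed. */
static bool demo_enqueue(demo_queue_t *q)
{
    assert(q != NULL);
    if (q->destructed)
        return false; /* real code would access invalid memory here */
    q->nQueued++;
    return true;
}

/* A "call" into another ruleset can only be issued by a live worker. */
static bool demo_can_call(const demo_queue_t *q)
{
    return q->nWorkers > 0;
}

/* Step 2: destruct. Safe in any order once no workers are left. */
static void demo_destruct(demo_queue_t *q)
{
    assert(q->nWorkers == 0 && q->enqOnly);
    q->destructed = true;
}

/* Shut down all queues in two passes instead of one. */
static void demo_shutdown_all(demo_queue_t *qs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        demo_stop_workers(&qs[i]);
    for (size_t i = 0; i < n; i++)
        demo_destruct(&qs[i]);
}
```

The point of the two passes is that the second loop only starts after
every queue has lost its workers, so no "call" can reach a ruleset that
has already been destructed.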
closes https://github.com/rsyslog/rsyslog/issues/1122
With the recent change that permits us to emit error messages at
(almost) any time, we can now provide better end-user diagnostics.
This patch generates error messages that could previously only be
logged to the debug log.
This is useful for tracking queue corruption. Was added as part
of tracking down github issue #1404.
As an unrelated small change, we slightly improve the debug
information imuxsock emits.
see also https://github.com/rsyslog/rsyslog/issues/1404
First of all, this creates race issues and as such is not clean. We
also no longer need it, as the changes we made to file rollover
cover this use case as well. So instead of working on proper sync,
which is complex, we remove this operation again. Note that it
was NOT present in any released version; it has only been in
the code for the past couple of days while we worked on queue robustness.
Previously, a kill -9 during the .qi write could leave the file
in an inconsistent state. Now, we first write a temp file and
(atomically) rename it to "the real thing". So if something happens
while writing the .qi, at least the old state is still consistent.
This is inspired by the way the pid file has been handled since v8.19.
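
The write-then-rename pattern can be sketched as follows. This is an
illustration only, not the code from this patch: the function name
demo_write_atomic and the ".tmp" suffix are assumptions made for the
example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write `data` to `path` by writing a temporary file first and then
 * renaming it over the target. Returns 0 on success, -1 on error.
 * The ".tmp" suffix is an assumption made for this sketch. */
static int demo_write_atomic(const char *path, const char *data)
{
    char tmp[4096];
    assert(path != NULL && data != NULL);

    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -1; /* path too long for the temp name buffer */

    FILE *fp = fopen(tmp, "w");
    if (fp == NULL)
        return -1;

    size_t len = strlen(data);
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        remove(tmp);
        return -1;
    }
    if (fclose(fp) != 0) {
        remove(tmp);
        return -1;
    }

    /* On POSIX systems rename() replaces an existing target in one
     * step, so readers see either the old or the new content. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

If the process is killed before rename() runs, only the temp file is
affected and the previous file stays intact. A production version would
probably also flush the data to disk before renaming; that step is left
out here to keep the sketch short.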
This now also guards the first write to the new file after
queue rollover. That permits rsyslog to clean out the file
after restart.
Also, a robustness write is now done when a queue file is
fully processed and deleted. Otherwise, the .qi file would contain
a read file number that no longer exists.
The most probable error cases are a) power off and b) forceful termination
during regular system shutdown. This patch writes the .qi file out whenever
a new queue file is begun. This prevents the loss of more than a single
queue data file in abort conditions.
Remove the use of sizeof(char) or sizeof(uchar) in calculations for memory
allocation or string length. There are no known platforms for which
sizeof(char) or sizeof(uchar) is not 1, and C99 defines sizeof(char)
to be 1 (section 6.5.3.4 of C99).
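
As a small illustration of this cleanup, assuming uchar is an alias for
unsigned char, the helper below allocates a string copy without the
redundant factor. The name demo_strdup is hypothetical and not taken
from the rsyslog sources.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate a string. Hypothetical helper for illustration only. */
static char *demo_strdup(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);

    /* before the cleanup: malloc(sizeof(char) * (len + 1)) */
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, len + 1); /* copies the terminating '\0' as well */
    return copy;
}
```

Both forms allocate the same number of bytes; the shorter one just drops
a multiplication by 1.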
First issue: Error 14 was generated on the .qi file directory handle.
As the .qi filestream does not have a directory set, fsync
was called on an empty directory, causing an error 14 in the debug log.
closes https://github.com/rsyslog/rsyslog/issues/402
Second issue: When queue files existed on startup, the bSyncQueueFiles
strm property was not set to 1. This is now done in the
qqueueLoadPersStrmInfoFixup function.
closes https://github.com/rsyslog/rsyslog/issues/403
This "fixes" a false positive from the clang static analyzer, which
cannot detect with the previous code that the queue type cannot
change during function calls. Strictly speaking, memory corruption
*could* cause an inconsistent queue type to be used (but so could
corruption of the new variable, albeit that is far less likely).
Error: NULL_RETURNS (CWE-476):
rsyslog-7.4.10/runtime/queue.c:2126: returned_null: Function "malloc(size_t)" returns null (checked 140 out of 168 times).
rsyslog-7.4.10/action.c:1197: example_checked: Example 1: "malloc(batchNumMsgs(pBatch) * 1UL)" has its value checked in "(active = malloc(batchNumMsgs(pBatch) * 1UL)) == NULL".
rsyslog-7.4.10/grammar/lexer.l:302: example_checked: Example 2: "malloc(40UL)" has its value checked in "(bs = malloc(40UL)) == NULL".
rsyslog-7.4.10/grammar/rainerscript.c:2483: example_checked: Example 3: "malloc(8UL)" has its value checked in "(ar->arr = malloc(8UL)) == NULL".
rsyslog-7.4.10/plugins/imklog/bsd.c:221: example_checked: Example 4: "malloc(1UL * (iMaxLine + 1))" has its value checked in "(pRcv = (uchar *)malloc(1UL * (iMaxLine + 1))) == NULL".
rsyslog-7.4.10/plugins/imuxsock/imuxsock.c:968: example_checked: Example 5: "malloc(1UL * (iMaxLine + 1))" has its value checked in "(pRcv = (uchar *)malloc(1UL * (iMaxLine + 1))) == NULL".
rsyslog-7.4.10/runtime/queue.c:2126: var_assigned: Assigning: "pThis->mut" = null return value from "malloc(size_t)".
rsyslog-7.4.10/runtime/queue.c:2127: dereference: Dereferencing a pointer that might be null "pThis->mut" when calling "pthread_mutex_init(pthread_mutex_t *, pthread_mutexattr_t const *)".
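
A typical way to address this kind of finding is to check the malloc()
result before handing the pointer to pthread_mutex_init(). The sketch
below shows that pattern under assumptions: demo_obj_t, demo_init_mutex
and demo_free_mutex are hypothetical names, and this is not necessarily
how the reported code was changed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical object holding a heap-allocated mutex. */
typedef struct {
    pthread_mutex_t *mut;
} demo_obj_t;

/* Allocate and initialize the mutex. Returns 0 on success, -1 on error. */
static int demo_init_mutex(demo_obj_t *pThis)
{
    assert(pThis != NULL);

    pThis->mut = malloc(sizeof(pthread_mutex_t));
    if (pThis->mut == NULL)
        return -1; /* out of memory: do not dereference */

    if (pthread_mutex_init(pThis->mut, NULL) != 0) {
        free(pThis->mut);
        pThis->mut = NULL;
        return -1;
    }
    return 0;
}

/* Destroy and release the mutex if it was set up. */
static void demo_free_mutex(demo_obj_t *pThis)
{
    assert(pThis != NULL);
    if (pThis->mut != NULL) {
        pthread_mutex_destroy(pThis->mut);
        free(pThis->mut);
        pThis->mut = NULL;
    }
}
```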
So far, we only checked whether the main queue size had become zero,
ignoring the sizes of action queues. For some tests, this caused raciness
and unreliability. Now, we check all queues. This should make matters
much more stable.