The check was previously done in strmPhysWrite, which caused syslog
messages to be split in the middle if the syslog message batch exceeded
the default IO buffer size.
closes: https://github.com/rsyslog/rsyslog/issues/4233
- If cstrLen(pThis->prevMsgSegment) > maxMsgSize, the len calculation
becomes negative when cstrLen(thisLine) < cstrLen(pThis->prevMsgSegment).
This causes an illegal memory access and thus a segfault.
- We now assign len = 0 if cstrLen(pThis->prevMsgSegment) > maxMsgSize, so
that only valid memory locations are accessed.
Signed-off-by: Ankit Jain <ankitja@vmware.com>
The new parameter permits specifying the replacement string used
when "escapeLF" is set to "on". Previously, a fixed replacement string
was used ("#012"/"\n") depending on circumstances. If the parameter is
set to an empty string, the LF is simply discarded.
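A hypothetical usage sketch; the parameter name "escapeLF.replacement" and its placement next to the existing "escapeLF" switch on imtcp are assumptions based on the description above:

```
# hypothetical sketch - actual module/parameter names may differ
module(load="imtcp" escapeLF="on" escapeLF.replacement="\\n")
input(type="imtcp" port="514")
```

Setting the replacement to an empty string ("") would simply discard the LF.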
closes https://github.com/rsyslog/rsyslog/issues/3889
The stream class does not close re-opened file descriptors.
This led to leaking file handles and ultimately to the inability
to open any files/sockets/etc as rsyslog ran out of handles.
The bug depended on timing, involving both OS thread scheduler
behavior and workload. The bug was more
common under the following conditions:
- async writing of files
- dynafiles
- not committing file data at end of transaction
However, it could be triggered under other conditions as well.
The refactoring done in 8.1908 increased the likelihood of
experiencing this bug. But it was not a real regression: the new
code was valid, but it changed the timing so that the race became
more likely.
Thanks to Michael Biebl for reporting this bug and helping to
analyze it.
closes https://github.com/rsyslog/rsyslog/issues/3885
Rsyslog may leave some dangling disk queue files under the following
conditions:
- batch sizes and/or messages are large
- queue files are comparatively small
- a batch spans more than two queue files (from n to n+m with m>1)
In this case, queue files n+1 to (n+m-1) are not deleted. This can
lead to problems when the queue is re-opened. In extreme cases
this can also lead to stalled processing when the max disk space is
used up by such left-over queue files.
Using defaults this scenario is very unlikely, but it can happen,
especially when large messages are being processed.
This seems to be a long-standing bug, introduced around 7 years ago.
It became more visible by properly closing files during HUP, which
was done in 8.1905.0 (and was another bugfix).
closes https://github.com/rsyslog/rsyslog/issues/3772
This was originally added as an aid to solve potential regressions.
But things have now looked good for a while, and we remove some of it
as it really is overdone.
Note: some other debug messages had already been removed, so this
closes https://github.com/rsyslog/rsyslog/issues/3046
At the end of a transaction, only the dynafile most recently in use
was flushed. Other dynafiles that had also been modified during the
transaction were not flushed.
Special thanks to Duy Nguyen for pointing us to the bug and
suggesting a solution.
This commit also contains a bit of cosmetic cleanup inside
the file stream class.
closes https://github.com/rsyslog/rsyslog/issues/2502
This works around an issue we can reproduce e.g. via the
imtcp-tls-ossl-x509fingerprint.sh test. Here, omfile gets a write
error with reason EBADF. So far, I was not able to see an actual
coding error. However, I traced this down to a multithreaded race
on open and close calls. I am very surprised to see this type
of issue, as I think the kernel guarantees that it does not happen.
Here is what I see in strace -f:
openssl accepts a socket:
[pid 66386] accept(4, {sa_family=AF_INET, sin_port=htons(59054), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 10
Then, it works a bit with that socket, detects a failure, and shuts it
down. Sometimes, at the very same instant, omfile on another thread
tries to open an output file. Then the following happens:
[pid 66386] close(10) = 0
[pid 66389] openat(AT_FDCWD, "./rstb_356100_31fa9d20.out.log", O_WRONLY|O_CREAT|O_NOCTTY|O_APPEND|O_CLOEXEC, 0644 <unfinished ...>
[pid 66386] close(10 <unfinished ...>
[pid 66389] <... openat resumed> ) = 10
[pid 66386] <... close resumed> ) = 0
[pid 66386] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 2, -1 <unfinished ...>
[pid 66389] write(2, "file './rstb_356100_31fa9d20.out"..., 66file './rstb_356100_31fa9d20.out.log' opened as #10 with mode 420
) = 66
[pid 66389] ioctl(10, TCGETS, 0x7f59aeb89540) = -1 EBADF (Bad file descriptor)
This is **literally** from the log, without deleting or reordering
lines. I read it as a race between `open` and `close`
where fd 10 is reused, but seemingly closed - resulting in the `EBADF`.
While it smells like a kernel issue, it may be a well-hidden program
bug - if so, one I currently do not find. HOWEVER, this commit
works around the issue by reopening the file when we receive EBADF.
That's the best thing to do in that case, especially if it really is
a kernel bug. Data loss should not occur, as the previous writes
succeeded in that case.
The drawback of this work-around is that it only "fixes" omfile. In
theory, every part of rsyslog can be affected by this issue (queue
files, for example). So this is not to be considered a final solution
to the root issue (but a big step forward for known problem cases).
see also https://github.com/rsyslog/rsyslog/issues/3404
While in theory the fd should immediately be overwritten, in practice
we sometimes see "bad file descriptor" errors that we cannot
explain. So we clean this up to remove a potential source of trouble.
* truncation check did not necessarily detect if re-read of last
block was too short (only hard errors were detected)
* consistently use correct lseek64() return type off64_t
* improve performance of rotation detection a bit
After we fixed this code, we can go back to real backwards
seeking, which spares us one system call (the extra call was
enabled in 8.39 for debugging purposes, so this is an overall win!).
This occurs if and only if
- reopenOnTruncate="on" is set
- file grows over 2GiB in size
Then, the data is continuously re-sent until the file becomes smaller
than 2GiB (due to truncation) or is deleted.
It is a regression introduced by 2d15cbc8221e385c5aa821e4a851d7498ed81850
closes https://github.com/rsyslog/rsyslog/issues/3249
Rotation detection seeks backwards, which caused issues in at least one
isolated case. We try to work around this by doing only positive seeks. We have also
added diagnostic information to the warning messages rsyslog emits on
rotation detection.
see also https://github.com/rsyslog/rsyslog/issues/3249
A change in the inode was not detected under all circumstances,
most importantly not in some logrotate cases.
Includes new tests made by Andre Lorbach. They now use the
logrotate tool natively to reproduce the issue.
Previously, truncation was only detected at end of file. Especially with
busy files, that could cause loss of data and possibly also stall imfile
reading. The new code now also checks during each read. Obviously, there
is some additional overhead associated with that, but it is unavoidable.
It is still highly recommended NOT to turn on "reopenOnTruncate" in imfile.
Note that there are also inherent reliability issues. There is no way to
"fix" these, as they are caused by races between the process(es) that truncate
the file and rsyslog reading it. But with the new code, the "problem window"
should be much smaller and, more importantly, imfile should not stall.
see also https://github.com/rsyslog/rsyslog/issues/2659
see also https://github.com/rsyslog/rsyslog/issues/1605
This adds support for endmsg.regex. It is similar to
startmsg.regex except that it matches the line that denotes
the end of the message, rather than the start of the next message.
This is primarily for container log file use cases such as this:
date stdout P start of message
date stdout P middle of message
date stdout F end of message
The `F` means this is the line which contains the final part of
the message. The fully assembled message should be
`start of message middle of message end of message`.
`endmsg.regex="^[^ ]+ stdout F "` will match.
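A possible imfile configuration for the container log case above (the file path and tag are illustrative):

```
input(type="imfile"
      file="/var/log/containers/*.log"
      tag="container"
      endmsg.regex="^[^ ]+ stdout F ")
```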
This can happen when imfile reads a state file: on each open, the memory
for the file name can be leaked.
We detected this while working on the imfile refactoring; there is no related
bug report. No specific test has been crafted, as the refactored imfile
tests catch it (as soon as they are merged).
The bug is actually in the stream object, but it is currently exposed only
via imfile. It happens when, in readMode 0, a partial line is read and no
more data is present in the file during that iteration. One partial message
is lost in this case.
closes https://github.com/rsyslog/rsyslog/issues/2421
The current buffer modification (adding a '\0') is bad, especially when
multiple threads access the same string. It is not really an issue that
urgently needs to be fixed, as the same data is always written. However,
among other things, it pollutes the thread debugger and as such prevents
more elaborate automated tests.
closes https://github.com/rsyslog/rsyslog/issues/1993
If a file cannot be opened but would need to be for the crypto provider
to work correctly, an error message is now emitted.
Root issue detected by Coverity scan, CID 185338
We ensure that the previous line segment is always valid... actually this
was already guaranteed by the existing code, but the Coverity scan did not
detect this. Hopefully Coverity can now follow the control flow, because
we now do explicitly what already happened in this case...
CID 185423
When a queue was restarted from a disk file, it almost always
emitted a message claiming
"file opened for non-append write, but already contains xxx bytes"
This message was wrong and did not indicate a real error condition.
The predicate check was incorrect.
The timeout feature for multiline reads did not work correctly
for files for which a state file existed. This is usually the
case for files that had been processed by a previous run and
that still exist at the new start. For all other files,
especially those monitored by a wildcard and newly created after
rsyslog started, the timeout worked as expected.
closes https://github.com/rsyslog/rsyslog/issues/1445