- session.opened
- session.openfailed
- session.closed
These are useful for monitoring, capacity planning and troubleshooting. They
are reported on a per-listener basis.
imptcp first tries to remove a to-be-shut-down socket from the
epoll set, and errors out if that does not work. In that case, the
underlying socket will be leaked.
This patch refactors the code; most importantly, it is not necessary
to remove the socket from the epoll set, as this happens automatically
on close. As such, we simply remove that part of the code, which
also removes the root cause of the socket leak.
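The claim that close removes a socket from the epoll set can be checked with a small sketch. This is an illustration in Python on Linux, not the imptcp code; the function name is made up for this example. Note that the kernel only drops the entry once all file descriptors referring to the same open file description are closed, which holds here because the descriptor is never duplicated.

```python
import select
import socket


def closed_socket_leaves_epoll_set():
    """Return the epoll events seen before and after closing a registered socket."""
    reader, writer = socket.socketpair()
    ep = select.epoll()
    try:
        ep.register(reader.fileno(), select.EPOLLIN)
        writer.send(b"x")      # make the reader side readable
        before = ep.poll(0)    # the reader is reported as readable
        reader.close()         # note: no explicit ep.unregister() call
        after = ep.poll(0)     # the closed socket no longer reports events
        return before, after
    finally:
        writer.close()
        ep.close()
```

The second poll returns no events even though unread data was pending, because closing the descriptor removed it from the set.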
... and failures of getaddrinfo() when obtaining hostname.
This requires a number of testing LD_PRELOAD libraries, which just
simulate system errors. The actual test is rather small.
see also https://github.com/rsyslog/rsyslog/issues/1573
rsyslog will segfault on startup if
a) the local machine's hostname is set to a non-FQDN name
b) the getaddrinfo() system call fails
This scenario is highly unlikely, but may exist especially with
provisioned VMs which may not properly be able to do name queries
on startup (seen for example on AWS).
This patch fixes the situation and also provides more robustness
for very early startup error messages when some of the error-reporting
subsystem is not yet properly initialized. Note that under these
circumstances, errors may only show up on stderr.
closes https://github.com/rsyslog/rsyslog/issues/1573
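The failure mode described above (a non-FQDN hostname combined with a failing getaddrinfo()) can be sketched as follows. This is a Python illustration of the defensive pattern, not the actual rsyslog fix; local_fqdn and failing_resolver are hypothetical names, and falling back to the short hostname is an assumption made for this example.

```python
import socket


def local_fqdn(hostname, resolver=socket.getaddrinfo):
    """Return a canonical name for hostname, or hostname itself if lookup fails."""
    if "." in hostname:
        return hostname              # already looks fully qualified
    try:
        infos = resolver(hostname, None, flags=socket.AI_CANONNAME)
    except OSError:                  # socket.gaierror is a subclass of OSError
        return hostname              # lookup failed: keep the short name
    canonname = infos[0][3] if infos else ""
    return canonname or hostname


def failing_resolver(*args, **kwargs):
    """Hypothetical stand-in that simulates a getaddrinfo() failure."""
    raise socket.gaierror(socket.EAI_AGAIN, "simulated lookup failure")
```

Passing failing_resolver plays the same role as the LD_PRELOAD test libraries: it forces the error path without depending on the real name service.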
permitnonkernelfacility doesn't work when the new configuration syntax
is used, e.g. 'module(load="imklog" permitnonkernelfacility="on")'.
It does work with the old syntax, e.g. '$KLogPermitNonKernelFacility
on'
This is because the old style config is stored in a static global
struct "cs", while the new style config is passed in as a pointer.
Code in imklog will put old style config entries into the new config
struct, and almost all the code in imklog uses the new config struct
like it should. Except for a check for bPermitNonKernel in Syslog()
that continued to use the static global that only has old style
configs.
Fix this by passing pModConf down into Syslog() and using that in
place of the static global.
closes https://github.com/rsyslog/rsyslog/issues/477
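The bug pattern (one check still reading a legacy global while everything else uses the passed-in config) can be modeled in a few lines. This is a Python sketch with hypothetical names (ModConf, legacy_cs, accepts_buggy, accepts_fixed); it mirrors the description above, not the imklog source.

```python
from dataclasses import dataclass


@dataclass
class ModConf:
    """Hypothetical stand-in for the imklog module configuration."""
    permit_non_kernel: bool = False


# Stand-in for the static global that only legacy directives fill in.
legacy_cs = ModConf()


def accepts_buggy(is_kernel_facility, mod_conf):
    # Bug pattern: consults the legacy global and ignores the active config.
    return is_kernel_facility or legacy_cs.permit_non_kernel


def accepts_fixed(is_kernel_facility, mod_conf):
    # Fix pattern: consults the config object passed down by the caller.
    return is_kernel_facility or mod_conf.permit_non_kernel
```

With a config created via the new syntax, only the fixed variant honors the setting, because the legacy global was never written to.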
Generally improved udp-related error messages (e.g. they now contain the
socket number, which makes it easier to relate them to errors reported by
the net.c subsystem).
We also deprecated (removed) the "maxerrormessages" configuration parameters.
It provided some very rough rate-limiting capabilities and was introduced
before we had native rate-limiters. The default was that only the first 5
error messages were actually reported. For long-running instances, that
meant that in many cases no errors were ever reported. We now use the default
internal message rate limiter, which works far better and ensures that
long-running instances can still emit error messages after prolonged
runtime. Conversely, this also means that users will see more error
messages from rsyslog, but that should actually improve the end user
experience.
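The difference between the old "first N messages only" cap and an interval-based rate limiter can be sketched as follows. This is a Python illustration only; the class names and the interval and burst values are assumptions for this example and do not reflect rsyslog's actual defaults or implementation.

```python
class FirstNLimiter:
    """Report only the first `limit` errors, then stay silent forever."""

    def __init__(self, limit=5):
        self.remaining = limit

    def allow(self, now):
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


class IntervalLimiter:
    """Allow up to `burst` errors per `interval` seconds (illustrative values)."""

    def __init__(self, interval=5.0, burst=3):
        self.interval = interval
        self.burst = burst
        self.window_start = None
        self.count = 0

    def allow(self, now):
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now   # start a new window
            self.count = 0
        if self.count >= self.burst:
            return False
        self.count += 1
        return True
```

Once the first limiter has used up its budget it never reports again, no matter how much time passes, while the interval limiter allows new messages as soon as a fresh window begins.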
On very busy systems, we see "udp send error 11" inside the logs, and the requesting
action is being suspended (and later resumed). During the suspension period (in
default configuration), messages are lost. Error 11 translates to EAGAIN and the
cause of this problem is that the system is running out of UDP buffer space. This
can happen on very busy systems (with busy networks).
It is not an error per se. Doing a short wait will resolve the issue. The real root
cause of the issue is that omfwd uses a nonblocking socket for sending. If it were
blocking, the OS would block until the situation is resolved. The need for
non-blocking sockets is a purely historical one. In the days of single-threaded
processing (pre v3), everything needed to be done by multiplexing, and blocking was
not permitted. Since then, the engine has dramatically changed. Actions now run on
their own thread(s). As such, there is no longer a hard need to use non-blocking i/o
for sending data. Many other output plugins also do blocking wait (e.g. omelasticsearch).
As such, the real root cause of the trouble is unnecessarily using non-blocking mode,
and consequently the right solution is to change that.
Note that using blocking i/o might change some timing inside rsyslog, especially
during shutdown. So theoretically there is regression potential in that area. However,
the core is designed to handle that situation (e.g. there is special shutdown code to
handle the blocking case), so this does not stand against the "proper" solution.
This patch applies the change on the rsyslog core level, within net.c. The only
users of the changed functionality are omfwd and omudpspoof. Imudp is unaffected as
it requests server sockets.
Note that according to the sendto() man page, there is a second cause for the EAGAIN
error: the system temporarily running out of ephemeral ports. It is not
100% clear if this can also happen in the blocking case. However, if so, we can argue
this is a case where we really want the default retry logic. So for the time being,
it is appropriate to not handle EAGAIN in a special case any longer.
closes https://github.com/rsyslog/rsyslog/issues/1665
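The blocking versus non-blocking distinction discussed above can be sketched with plain UDP sockets. This is a Python illustration, not the net.c change; make_udp_sender and send_datagram are hypothetical helper names.

```python
import socket


def make_udp_sender(blocking=True):
    """Create a UDP client socket; blocking mode is the Python default."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(blocking)
    return sock


def send_datagram(sock, payload, address):
    """Send one datagram and return the number of bytes sent.

    On a non-blocking socket this may raise BlockingIOError (EAGAIN) when
    the send buffer is full; on a blocking socket the call waits instead.
    """
    return sock.sendto(payload, address)
```

With a blocking sender, the caller needs no special EAGAIN handling for the buffer-full case, which is the simplification the patch relies on.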
The MsgDup() function will return a garbled message object under these conditions:
The message was originally created with message length equal to or larger than CONF_RAWMSG_BUFSIZE.
This makes rsyslog store the message in dynamically allocated buffer space. Then,
a component reduces the message size to a size lower than CONF_RAWMSG_BUFSIZE. A
frequent example is the parser removing a known-bad LF at the end of the message.
Then, MsgDup is executed. It checks the message size and finds that it is below
CONF_RAWMSG_BUFSIZE, which makes it copy the msg object's internal buffer instead
of the dynamically allocated one. That buffer was not written to in the first place, so
uninitialized data is copied. Note that no segfault can happen, as the copied location was
properly allocated, just not used in this processing flow.
In the end result, the new message object contains garbage data. Whenever the new object
is used (e.g. in an async ruleset or action) that garbage will be used. Whenever the old
object is accessed, correct data will be used. Both types of access can happen inside
the same processing flow, which makes the problem appear to be random.
closes https://github.com/rsyslog/rsyslog/issues/1658
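The sequence of events described above can be reproduced with a small model. This is a Python sketch, not the rsyslog msg.c code: BUFSIZE is an illustrative stand-in for CONF_RAWMSG_BUFSIZE, the "?" filler stands in for uninitialized memory, and dup_fixed shows one possible way to avoid the problem (deciding by which buffer is actually in use), which is an assumption and not necessarily how the patch does it.

```python
BUFSIZE = 8  # illustrative; stands in for CONF_RAWMSG_BUFSIZE


class Msg:
    def __init__(self, raw):
        self.inline = bytearray(b"?" * BUFSIZE)  # models the unwritten inline buffer
        self.dynamic = None
        self.length = len(raw)
        if self.length >= BUFSIZE:
            self.dynamic = bytearray(raw)        # long message: dynamic storage
        else:
            self.inline[: self.length] = raw     # short message: inline storage

    def text(self):
        buf = self.dynamic if self.dynamic is not None else self.inline
        return bytes(buf[: self.length])

    def truncate(self, new_length):
        self.length = new_length                 # storage location is unchanged


def dup_buggy(msg):
    # Chooses the source buffer by current length only.
    source = msg.inline if msg.length < BUFSIZE else msg.dynamic
    return Msg(bytes(source[: msg.length]))


def dup_fixed(msg):
    # Chooses the source buffer by where the data actually lives.
    source = msg.dynamic if msg.dynamic is not None else msg.inline
    return Msg(bytes(source[: msg.length]))
```

Creating a message of exactly BUFSIZE bytes, truncating it by one byte, and duplicating it with dup_buggy yields the filler bytes, while the original object still returns the correct text.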
The KSI subsystem has been replaced by a newer Guardtime-provided
subsystem. Note that the old KSI subsystem no longer works
due to Guardtime backend changes.
Libgt still continues to work.
closes https://github.com/rsyslog/rsyslog/issues/1590