Rainer Gerhards 1c0f9bba50
omfwd: implement native load balancing - phase 1
This patch implements a simple round-robin load balancer
for omfwd. It provides equal distribution of load to a pool
of target servers.
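
To illustrate what "equal distribution" means for a round-robin pool, here is
a minimal sketch. It is not omfwd's actual code; the helper pick_target and
the target names are hypothetical.

```shell
# Illustrative only: NOT taken from omfwd.
# pick_target N T1 T2 ...: print the pool member that message number N
# (0-based) would be sent to under plain round-robin.
pick_target() {
    n=$1
    shift
    [ $# -gt 0 ] || return 1      # an empty pool has no target
    shift $(( n % $# ))           # skip to the selected pool member
    printf '%s\n' "$1"
}

# Messages 0..5 alternate evenly across a three-member pool.
for i in 0 1 2 3 4 5; do
    pick_target "$i" hostA hostB hostC
done
```

Each target receives every third message, so the load is spread equally as
long as all pool members are reachable.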

The code currently has no different modes and no special tuning
for the load balancer. However, it works very well in the most
common use cases. Furthermore, it provides a solid base on which
more elaborate functionality can be built if needed.

The new functionality is fully backwards compatible with previous
configuration settings.

New action() config params:
* pool.resumeinterval

New/"changed" pstats counters
Each target receives its own set of pstats counters. Most
importantly, this is the case for byte counts. That counter retains
the same name, but there may now be several of these counters,
one for each (target IP, port) tuple.

New pstats message count to target
Among other things, this can be used to check that the load balancer
works as intended. The byte count, the only counter emitted so far,
does not provide a clear indication of how many messages the targets
actually processed.

For obvious reasons, this message count makes most sense in
advanced load balancing scenarios, but it also provides additional
insight into round-robin. Counts that do not match across targets
indicate that targets went offline, and we can now evaluate the
impact this had on processing.

- re-design rebind functionality

This now works at the transaction level. It causes a rebind of all
pool members. The previous code has not worked fully correctly for a
couple of years (since output batching was integrated).

As cleanup, rebindInterval support has been removed from tcpClt,
because omfwd is the only user. This permits a cleaner code path.

We also noticed a bug with rebindInterval: it caused some mild
message duplication for quite some time, which went unnoticed.
To address that efficiently, rebindInterval will from now on
be evaluated once per batch. That means up to (maxBatchSize - 1)
messages beyond the configured rebindInterval may be transmitted
before the rebind happens. That's the cleanest mode of operation
and should not make any difference for real deployments.
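
The overshoot can be worked through with a little arithmetic. The sketch
below assumes that every batch is full and that the interval is checked once
after each batch. The exact check inside omfwd may differ, and the helper
name messages_until_rebind is hypothetical.

```shell
# Illustrative arithmetic only; the exact check inside omfwd is an assumption.
# messages_until_rebind INTERVAL BATCH: number of messages sent on one
# connection before the rebind, if all batches are full and the interval
# is evaluated once after each batch.
messages_until_rebind() {
    interval=$1
    batch=$2
    # round INTERVAL up to the next multiple of BATCH
    echo $(( (interval + batch - 1) / batch * batch ))
}

messages_until_rebind 1000 128   # 8 full batches: 1024 messages
messages_until_rebind 1025 128   # 9 full batches: 1152 messages
```

In the second call the overshoot is 1152 - 1025 = 127 messages, which is the
worst case of (maxBatchSize - 1) for a batch size of 128.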

Some additional work done in this commit:

netstream: harden component against upper-layer logic errors

network subsystem: better handle API errors and provide more info

omfwd: add new parameter "iobuffer.maxsize"

add new global parameter debug.abortoninternalerror and use it

This parameter makes it possible to fail test runs when an internal
error is detected and gracefully handled by rsyslog. While graceful
handling is great in practice, we should not accept such errors
during testing. The new parameter causes rsyslog to abort in this
case, emitting the related error message beforehand. It is turned on
by default in our regular tests.

add dedicated error code for "hard" program errors

omfwd: some cleanup + error message fix + new debug level messages

imptcp: improve error messages

add omfwd option to NOT do extended connection check

also output wrkr id in some omfwd messages (primarily debugging aid)

better debug info via LogMsg() interface

improve messages regarding imptcp and omfwd suspension / thread IDs

refactor and enhance minitcpsrvr to mimic dead servers

new global (debugging) option, correction of an informational msg

add global option allmessagestostderr

add new tests
2024-08-19 08:54:31 +02:00

This directory contains the rsyslog testbench. It is slowly
evolving. New tests are always welcome. So far, most tests check
out the functionality of a single module. More complex tests are
welcome.

For a simple sample, see rtinit.c, which does a simple
init/deinit check of the runtime system.

Test Naming
===========

Tests that use valgrind shall end in "-vg.sh".
Tests that use valgrind's helgrind thread debugger shall end in "-vgthread.sh".
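
The naming rule can be checked mechanically. The following is a minimal
sketch; test_kind is a hypothetical helper and not part of the testbench, and
the example file names other than imfile-basic.sh are made up.

```shell
# test_kind is a hypothetical helper, not part of the rsyslog testbench.
# It classifies a test script by the suffix conventions described above.
test_kind() {
    case "$1" in
        *-vgthread.sh) echo "helgrind" ;;
        *-vg.sh)       echo "valgrind" ;;
        *.sh)          echo "plain" ;;
        *)             echo "unknown" ;;
    esac
}

test_kind imfile-basic.sh
test_kind example-vg.sh
test_kind example-vgthread.sh
```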

Setting up Test Environments
============================

Setting up MariaDB/MySQL
------------------------
To create the necessary user:

    echo "create user 'rsyslog'@'localhost' identified by 'testbench';" | mysql -u root
    mysql -u root < ../plugins/ommysql/createDB.sql
    echo "grant all on Syslog.* to 'rsyslog'@'localhost';" | mysql -u root

openSUSE
--------
To configure system properties like hostname and firewall, use the
graphical "yast2" administration tool. Note that ssh access is
disabled in the firewall by default!

Before running tests
====================
make check - this compiles all of the C code used in the tests, performs any
other preparations, and then starts running all of the tests.  Press Ctrl-C to
stop the run.

Running all tests
=================
make check

Running named tests
===================
make testname.log

For example, to run the imfile-basic.sh test, use

    make imfile-basic.log

Test output is in imfile-basic.log.

To re-run the test, first remove imfile-basic.log, then run make again.

Alternatively, run

    make check TESTS='imfile-basic.sh'
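
The remove-then-make steps can be wrapped in a small helper. This is only a
sketch; log_for_test and rerun_test are hypothetical names and not part of
the testbench. It relies on the naming shown above, where imfile-basic.sh
produces imfile-basic.log.

```shell
# Hypothetical helpers, not part of the rsyslog testbench.

# log_for_test SCRIPT: print the log target that belongs to a test script,
# e.g. imfile-basic.sh -> imfile-basic.log
log_for_test() {
    printf '%s.log\n' "${1%.sh}"
}

# rerun_test SCRIPT: remove the stale log so that make runs the test again.
rerun_test() {
    log=$(log_for_test "$1")
    rm -f "$log"
    make "$log"
}
```

For example, "rerun_test imfile-basic.sh" removes imfile-basic.log and then
runs "make imfile-basic.log".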

Using gdb to debug rsyslog during a test run
============================================

Edit your test like this:

    . $srcdir/diag.sh startup
    if [ -n "${USE_GDB:-}" ] ; then
        echo attach gdb here
        sleep 54321 || :
    fi

Run your test in the background:

    USE_GDB=1 make mytest.sh.log &

Tail mytest.sh.log until you see 'attach gdb here'.  The log should also
tell you what the rsyslogd pid is.

    gdb ../tools/rsyslogd $rsyslogd_pid

Set breakpoints as needed, then 'continue'.

In another window, run ps -ef|grep 54321 and kill that pid. This ends the
sleep so that the test proceeds.
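
Finding the sleep process by hand can pick up the grep itself. The following
sketch filters a process listing for the exact "sleep 54321" command instead.
The helper names gdb_sleep_pids and release_test are hypothetical.

```shell
# Hypothetical helpers, not part of the rsyslog testbench.

# Read "pid args" lines (as printed by `ps -eo pid,args`) on stdin and
# print the pids whose command line is exactly "sleep 54321".
gdb_sleep_pids() {
    awk 'NF == 3 && $2 == "sleep" && $3 == "54321" { print $1 }'
}

# Kill the waiting sleep so that the test script continues.
release_test() {
    for pid in $(ps -eo pid,args | gdb_sleep_pids); do
        kill "$pid"
    done
}
```

Call release_test from the second window once gdb is attached and the
breakpoints are set.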

Core Dump Analysis
==================
The testbench contains some limited (yet useful) support for automatically
analyzing core dumps. In order for this to work, obviously core files need
to be generated. This often doesn't work as intended. If you hit this problem,
check the following:

1. ulimit -c unlimited (or a reasonable limit)
   Note that root may need to increase a system-wide limit, which is
   usually recorded in /etc/security/limits.conf
   You need:
   *     soft    core      unlimited

2. cat /proc/sys/kernel/core_pattern
   On systemd systems (and some others), the pattern is changed to save
   core files so that systemd can import them -- with the result that the
   testbench doesn't see them any longer. We require classic format, which
   can be set via
   $ sudo bash -c "echo \"core\" > /proc/sys/kernel/core_pattern"

Note that you probably do not want to make either of these changes on a
production system.
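
Both checks can be scripted. The sketch below only reports problems and
changes nothing. The helper names are hypothetical, and it assumes that a
core_pattern starting with "|" (cores piped to a handler program, as
systemd-coredump is typically set up) is the case the testbench cannot use.

```shell
# Hypothetical helpers, not part of the rsyslog testbench.

# Succeeds if the given core_pattern pipes cores to a handler program
# (a leading "|"), which is how systemd-coredump is typically hooked in.
core_pattern_is_piped() {
    case "$1" in
        "|"*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Print a warning for each of the two settings discussed above.
check_core_setup() {
    if [ "$(ulimit -c)" = "0" ]; then
        echo "core size limit is 0: run 'ulimit -c unlimited'"
    fi
    if [ -r /proc/sys/kernel/core_pattern ]; then
        pattern=$(cat /proc/sys/kernel/core_pattern)
        if core_pattern_is_piped "$pattern"; then
            echo "core_pattern pipes to a handler: $pattern"
        fi
    fi
}

check_core_setup
```

If the script prints nothing, neither of the two problems was detected.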