Mirror of https://github.com/rsyslog/rsyslog.git, synced 2025-12-13 04:50:41 +01:00
This patch implements a simple round-robin load balancer for omfwd. It
distributes load equally across a pool of target servers.

The code currently has no different modes and no special tuning for the
load balancer. However, it works very well in the most common use cases.
Furthermore, it provides a solid base on which more elaborate
functionality can be built if the need arises.

The new functionality is fully backwards compatible with previous
configuration settings.

New action() config params:
* pool.resumeinterval

New/"changed" pstats counters

Each target receives its own set of pstats counters. Most importantly,
this is the case for byte counts. That counter retains the same naming,
but there may now be multiple of these counters, one for each target
(ip, port) tuple.

New pstats message count to target

Among other things, this can be used to check that the load balancer
works as intended. The byte count emitted so far does not give a clear
indication of how many messages the targets have actually processed.
This message count makes most sense in advanced load balancing
scenarios, but it also provides additional insight into round-robin:
mismatches between the per-target counts indicate that targets went
offline, and we can now evaluate the impact this had on processing.

- re-design rebind functionality

This now works at the transaction level. It causes a rebind of all pool
members. The previous code has not worked 100% correctly for a couple of
years now (since the output batching integration). As cleanup,
rebindInterval support has been removed from tcpClt, because omfwd is
the only user. This permits a cleaner code path.

We also noticed a bug with rebindInterval: it has caused some mild
message duplication for quite some time, and this went unnoticed. To
address that efficiently, rebindInterval will in the future be evaluated
once per batch. That means up to (maxBatchSize - 1) messages beyond the
configured rebindInterval may be transmitted before the rebind happens.
That's the cleanest mode of operation and should not make any difference
for real deployments.

Some additional work done in this commit:

- netstream: harden component against upper-layer logic errors
- network subsystem: better handle API errors and provide more info
- omfwd: add new parameter "iobuffer.maxsize"
- add new global parameter debug.abortoninternalerror and use it
  This parameter makes test runs fail when an internal error is detected
  and gracefully handled by rsyslog. While it is great to have it
  gracefully handled in practice, we should not accept this during
  testing. The new parameter aborts in this case and emits the related
  error message beforehand. It is turned on by default in our regular
  tests.
- add dedicated error code for "hard" program errors
- omfwd: some cleanup + error message fix + new debug level messages
- imptcp: improve error messages
- add omfwd option to NOT do extended connection check
- also output wrkr id in some omfwd messages (primarily debugging aid)
- better debug info via LogMsg() interface
- improve messages regarding imptcp and omfwd suspension / thread IDs
- refactor and enhance minitcpsrvr for mimicking died servers
- new global (debugging) option, correction of an informational msg
- add global option allmessagestostderr
- add new tests
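
The per-batch rebindInterval check described in the commit message can be illustrated with a small model. This is only a sketch, not rsyslog code: the function name `msgs_before_rebind` is hypothetical, and it assumes that every batch is completely full and that the rebind check runs once after each batch has been sent.

```shell
#!/bin/bash
# Illustrative model only, NOT rsyslog code. Assumes every batch is full
# and that the rebind check happens once, after a batch has been sent.
# usage: msgs_before_rebind REBIND_INTERVAL BATCH_SIZE
msgs_before_rebind() {
	local interval=$1 batch=$2 sent=0
	# keep sending whole batches until the interval has been reached
	while [ "$sent" -lt "$interval" ]; do
		sent=$(( sent + batch ))
	done
	echo "$sent"
}

msgs_before_rebind 1000 128   # prints 1024 (24 messages past the interval)
msgs_before_rebind 1 128      # prints 128 (127 past, i.e. batch size - 1)
```

The second call shows the worst case named in the commit message: with an interval of 1 and a batch size of 128, the rebind happens 127 messages (maxBatchSize - 1) later than configured.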
72 lines
2.3 KiB
Bash
Executable File
#!/bin/bash
# added 2024-02-24 by rgerhards. Released under ASL 2.0
. ${srcdir:=.}/diag.sh init
generate_conf
export NUMMESSAGES=10000 # MUST be an EVEN number!

# starting minitcpsrvr receivers so that we can obtain their port
# numbers
start_minitcpsrvr "$RSYSLOG_OUT_LOG" 1

# regular startup
add_conf '
$MainMsgQueueTimeoutShutdown 10000
$MainMsgQueueWorkerThreads 2

template(name="outfmt" type="string" string="%msg:F,58:2%\n")
module(load="builtin:omfwd" template="outfmt")

if $msg contains "msgnum:" then {
	action(type="omfwd" target=["127.0.0.1", "127.0.0.1"]
		port=["'$MINITCPSRVR_PORT1'", "'$TCPFLOOD_PORT'"]
		protocol="tcp"
		pool.resumeInterval="1"
		action.resumeRetryCount="-1" action.resumeInterval="5")
}
'

startup
# we need special logic. In the first iteration, the second target is offline,
# so everything is expected to go to the first target only.
injectmsg
# combine both files to check for correct message content
#cat "$RSYSLOG_OUT_LOG" "$RSYSLOG2_OUT_LOG" > "$SEQ_CHECK_FILE"
wait_queueempty
wait_file_lines
cp "$RSYSLOG_OUT_LOG" tmp.log
seq_check
printf "\nSUCCESS for part 1 of the test\n\n"

echo WARNING: The next part of this test is flaky, because there is an
echo inevitable race on the port number for minitcpsrvr. If another
echo parallel test has acquired it in the interim, this test here will
echo invalidly fail.
./minitcpsrv -t127.0.0.1 -p $TCPFLOOD_PORT -f "$RSYSLOG2_OUT_LOG" \
	-P "$RSYSLOG_DYNNAME.minitcpsrvr_port2" &
# Note: we use the port file just to make sure minitcpsrvr has initialized!
wait_file_exists "$RSYSLOG_DYNNAME.minitcpsrvr_port2"
BGPROCESS=$!
echo "### background minitcpsrv process id is $BGPROCESS port $TCPFLOOD_PORT ###"
echo "waiting a bit to ensure rsyslog retries the currently suspended pool member"

sleep 3

injectmsg $NUMMESSAGES $NUMMESSAGES

shutdown_when_empty
wait_shutdown

if [ "$(wc -l < "$RSYSLOG2_OUT_LOG")" != "$(( NUMMESSAGES / 2 ))" ]; then
	echo "ERROR: RSYSLOG2_OUT_LOG has invalid number of messages, expected $(( NUMMESSAGES / 2 ))"
	cat -n "$RSYSLOG2_OUT_LOG" | head -10
	error_exit 100
fi

# combine both files to check for correct message content
export SEQ_CHECK_FILE="$RSYSLOG_DYNNAME.log-combined"
export NUMMESSAGES=$((NUMMESSAGES*2))
cat "$RSYSLOG_OUT_LOG" "$RSYSLOG2_OUT_LOG" > "$SEQ_CHECK_FILE"
seq_check

exit_test
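
The message counts this test checks (everything on the first target while the second is offline, then an even split once both are up) follow from plain round-robin over the online pool members. The sketch below models only that arithmetic. It is not rsyslog code: `rr_counts` is a hypothetical helper, the 1/0 online flags are an assumption of this example, and it ignores retry timing such as pool.resumeInterval.

```shell
#!/bin/bash
# Hypothetical model, NOT rsyslog code: distribute N messages round-robin
# over the pool members that are currently online (flag 1 = online).
# usage: rr_counts NUM_MESSAGES ONLINE_FLAG_1 ONLINE_FLAG_2 ...
rr_counts() {
	local n=$1; shift
	local online=("$@")
	local counts=() i next=0 size=${#online[@]}
	for ((i = 0; i < size; i++)); do counts[i]=0; done
	for ((i = 0; i < n; i++)); do
		# skip offline members; assumes at least one member is online
		while [ "${online[next]}" != 1 ]; do
			next=$(( (next + 1) % size ))
		done
		counts[next]=$(( counts[next] + 1 ))
		next=$(( (next + 1) % size ))
	done
	echo "${counts[*]}"
}

rr_counts 10000 1 0   # part 1, second target offline: prints "10000 0"
rr_counts 10000 1 1   # part 2, both targets online:   prints "5000 5000"
```

Under this model the second target ends up with NUMMESSAGES / 2 lines, which is what the `wc -l` check expects, and both files together hold 2 * NUMMESSAGES messages for the final `seq_check`.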