core/action bugfix: 100% CPU utilization on suspension of output module

Triggering condition:

* output module using the legacy transaction interface
  (e.g. omelasticsearch, omlibdbi)
* output module needs to suspend itself

In these cases, rsyslog enters a busy loop trying to resolve the
suspend condition. The bug is rooted in rsyslog core action code.
This patch fixes it by inserting a 1-second sleep during calls
to the resume handler.

Note: we cannot sleep exactly as long as tryResume needs. This
would require larger refactoring, which is probably not worth it
for the legacy interface. The current solution is almost as good,
as the one-second sleep has very little overhead on a real system.
Thus we have chosen that approach.

This patch now also ensures that failed messages are properly
handled and do not cause an eternal hang.
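
The retry behaviour described above can be sketched in isolation. This
is a minimal illustration, not the rsyslog code: result_t, submit_all
and the demo_* helpers are hypothetical names, and the pause is passed
in as a callback (an assumption made so the sketch can run without
really sleeping). A suspended result backs off for one second and
retries the same message; any other error is returned to the caller
instead of being retried forever.

```c
#include <stddef.h>

/* Hypothetical result codes, stand-ins for rsyslog's RS_RET_* values. */
typedef enum { RES_OK, RES_DEFER_COMMIT, RES_SUSPENDED, RES_FAILED } result_t;

typedef result_t (*process_fn)(size_t msg_idx, void *ctx);
typedef void (*pause_fn)(unsigned seconds, void *ctx);

/* Submit messages 0..nmsgs-1 one by one.
 * Returns RES_OK if all were accepted, else the first hard error. */
static result_t submit_all(size_t nmsgs, process_fn process,
                           pause_fn pause, void *ctx)
{
    size_t i = 0;
    while (i < nmsgs) {
        result_t r = process(i, ctx);
        if (r == RES_SUSPENDED) {
            /* Output is suspended: wait instead of spinning,
             * then re-submit the same message. */
            pause(1, ctx);
            continue;
        }
        if (r != RES_OK && r != RES_DEFER_COMMIT) {
            /* Hard error: stop and let the caller handle it. */
            return r;
        }
        ++i; /* message accepted, move on */
    }
    return RES_OK;
}

/* Hypothetical test double for an output module. */
struct demo_ctx {
    unsigned suspends_left;  /* how many times to report "suspended" */
    size_t fail_at;          /* message index that fails hard */
    unsigned slept_seconds;  /* total back-off requested */
    size_t accepted;         /* messages accepted so far */
};

static result_t demo_process(size_t msg_idx, void *ctx)
{
    struct demo_ctx *d = ctx;
    if (d->suspends_left > 0) {
        d->suspends_left--;
        return RES_SUSPENDED;
    }
    if (msg_idx == d->fail_at)
        return RES_FAILED;
    d->accepted++;
    return RES_OK;
}

/* Records the requested pause instead of actually sleeping. */
static void demo_pause(unsigned seconds, void *ctx)
{
    struct demo_ctx *d = ctx;
    d->slept_seconds += seconds;
}
```

In real use the pause callback would wrap an actual sleep call; the
recording variant above only exists to make the control flow visible.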

closes https://github.com/rsyslog/rsyslog/issues/2113

Note: the legacy interface must also be considered when implementing
error file functionality,
see also https://github.com/rsyslog/rsyslog/issues/1836
Rainer Gerhards 2018-01-05 09:09:42 +01:00
parent c0163337ce
commit ee104ac13b

@@ -1189,11 +1189,19 @@ doTransaction(action_t *__restrict__ const pThis, wti_t *__restrict__ const pWti
 			 */
 			iRet = actionProcessMessage(pThis,
 				&actParam(iparams, pThis->iNumTpls, i, 0), pWti);
-			if(iRet != RS_RET_DEFER_COMMIT && iRet != RS_RET_PREVIOUS_COMMITTED &&
-				iRet != RS_RET_OK)
-				--i; /* we need to re-submit */
 			DBGPRINTF("doTransaction: action %d, processing msg %d, result %d\n",
 				pThis->iActionNbr, i,iRet);
+			if(iRet == RS_RET_SUSPENDED) {
+				--i; /* we need to re-submit */
+				/* note: we are suspended and need to retry. In order not to
+				 * hammer the CPU, we now do a voluntarly wait of 1 second.
+				 * The rest will be handled by the standard retry handler.
+				 */
+				srSleep(1, 0);
+			} else if(iRet != RS_RET_DEFER_COMMIT && iRet != RS_RET_PREVIOUS_COMMITTED &&
+				iRet != RS_RET_OK) {
+				FINALIZE; /* let upper peer handle the error condition! */
+			}
 		}
 	}
 finalize_it:
@@ -1335,6 +1343,7 @@ actionCommit(action_t *__restrict__ const pThis, wti_t *__restrict__ const pWti)
 	 * do no real harm. - rgerhards, 2017-10-06
 	 */
 	iRet = actionTryCommit(pThis, pWti, wrkrInfo->p.tx.iparams, wrkrInfo->p.tx.currIParam);
+	DBGPRINTF("actionCommit[%s]: return actionTryCommit %d\n", pThis->pszName, iRet);
 	if(iRet == RS_RET_OK) {
 		FINALIZE;
 	}