mmsnareparse: add ignoreTrailingPattern parameter for trailing extra-data removal

Add a configurable mechanism to detect and remove trailing extra-data sections
from messages before parsing. This addresses cases where third-party enrichers
append non-standard data (e.g., "enrichment_section: fromhost-ip=...") that
can interfere with Snare event parsing.

The ignoreTrailingPattern parameter can be set at both module and action
levels, with action-level values overriding module defaults. When configured,
the parser searches for the pattern in the trailing position, i.e. within the
last tab-separated token (the text after the last tab). If found, the message
is truncated at the start of that last token, removing the entire trailing
section, including any content in that token that precedes the pattern
(e.g., dynamic numeric prefixes).
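For illustration only, here is a minimal sketch of the tab-path behavior described above. It is not the module's code. `effective_pattern()` and `truncate_last_token()` are hypothetical names. The sketch assumes the pattern is a plain C string and that the caller owns a writable copy of the message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: an action-level pattern, when set, overrides the
 * module-level default (mirrors the precedence described above). */
static const char *effective_pattern(const char *modulePattern, const char *actionPattern) {
    return actionPattern != NULL ? actionPattern : modulePattern;
}

/* Hypothetical helper: if `pattern` occurs anywhere in the last
 * tab-separated token of `msg`, cut `msg` at the last tab and return a heap
 * copy of the whole removed token (including any prefix before the pattern).
 * Returns NULL and leaves `msg` untouched otherwise. Caller frees the result. */
static char *truncate_last_token(char *msg, const char *pattern) {
    assert(msg != NULL && pattern != NULL);
    if (pattern[0] == '\0') {
        return NULL; /* empty pattern: treat as unset */
    }
    char *lastTab = strrchr(msg, '\t');
    if (lastTab == NULL) {
        return NULL; /* messages without tabs use a separate, windowed search */
    }
    const char *lastToken = lastTab + 1;
    if (strstr(lastToken, pattern) == NULL) {
        return NULL;
    }
    /* Copy the token before truncating, so the copy is not empty. */
    size_t n = strlen(lastToken) + 1;
    char *removed = malloc(n);
    if (removed == NULL) {
        return NULL;
    }
    memcpy(removed, lastToken, n);
    *lastTab = '\0';
    return removed;
}
```

Consider a message ending in `<TAB>3385599 enrichment_section: fromhost-ip=...`. The sketch returns the whole `3385599 enrichment_section: ...` token and leaves the message ending at the preceding field. This is why a numeric prefix in front of the pattern does not reach the parser.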

When truncation occurs, the removed extra-data section is exposed as the
!extradata_section message property, so downstream processing can still
access the removed content if needed (e.g., to extract sender IP addresses).
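As an example of the kind of downstream use mentioned above, the sketch below pulls a `key=value` item out of a captured section string. It is written in C only to match the rest of this change. `extract_kv_value()` is a hypothetical helper. The rule that a value ends at the next space or tab is an assumption, not something the module defines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value that follows `key` (e.g. "fromhost-ip=")
 * in `section` into `out`. The value is assumed to end at the next space or
 * tab, or at the end of the string. Returns 0 on success, -1 if the key is
 * missing or `out` is too small to hold the value plus its terminator. */
static int extract_kv_value(const char *section, const char *key, char *out, size_t outSize) {
    assert(section != NULL && key != NULL && out != NULL);
    const char *start = strstr(section, key);
    if (start == NULL) {
        return -1;
    }
    start += strlen(key);
    size_t len = strcspn(start, " \t"); /* length up to the first space/tab */
    if (len + 1 > outSize) {
        return -1;
    }
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```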

Implementation details:
- Pattern matching is literal string-based (not regex)
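- Truncation only occurs when the pattern appears in a trailing position
  (the last tab-separated token, or the trailing window for non-tab messages)
- Conservative detection for non-tab messages: only the last 20% of the
  message or the last 200 chars, whichever is smaller, is searched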
- Proper memory management for pattern strings and extra-data sections
- No changes to existing behavior when parameter is not set
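The non-tab window rule can be sketched as follows, assuming lengths are counted in bytes. `trailing_window()` and `find_trailing_pattern()` are hypothetical helpers and are not part of mmsnareparse. The backward scan uses offsets rather than a decrementing pointer. This means it never steps before the start of the buffer when the window covers the whole message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Window size for messages without tabs: the last 20% of the message,
 * capped at 200 bytes, but never smaller than the pattern itself. */
static size_t trailing_window(size_t msgLen, size_t patternLen) {
    size_t win = msgLen / 5;
    if (win > 200) {
        win = 200;
    }
    if (win < patternLen) {
        win = patternLen;
    }
    return win;
}

/* Offset of the last occurrence of `pattern` that starts inside the trailing
 * window, or -1 if there is none. */
static long find_trailing_pattern(const char *msg, const char *pattern) {
    assert(msg != NULL && pattern != NULL);
    size_t msgLen = strlen(msg);
    size_t patLen = strlen(pattern);
    if (patLen == 0 || msgLen < patLen) {
        return -1;
    }
    /* win <= msgLen and win >= patLen, so winStart <= msgLen - patLen. */
    size_t winStart = msgLen - trailing_window(msgLen, patLen);
    for (size_t off = msgLen - patLen;; off--) {
        if (memcmp(msg + off, pattern, patLen) == 0) {
            return (long)off;
        }
        if (off == winStart) {
            break; /* stop before off would move below the window */
        }
    }
    return -1;
}
```

For a 46-byte message the window is 9 bytes (46 / 5, rounded down), so only a pattern starting at offset 37 or later can match.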

Added test case mmsnareparse-trailing-extradata.sh with anonymized sample
data validating Event ID 13 parsing with trailing enrichment section.

Updated documentation in doc/source/configuration/modules/mmsnareparse.rst
with parameter description and usage notes.

docs: enhance AGENTS.md with WSL build/test instructions
Add complete dependency installation steps, module-specific configure examples,
test execution patterns, and a debugging workflow based on actual development
sessions.

Co-authored-by: alorbach <alorbach@adiscon.com>
Cursor Agent 2025-11-21 12:26:44 +00:00 committed by Andre Lorbach
parent a3623dec6e
commit 6681c3bea6
6 changed files with 348 additions and 7 deletions


@@ -274,15 +274,85 @@ Minimum setup requires:
- Autotools toolchain: `autoconf`, `automake`, `libtool`, `make`, `gcc`
- Side libraries: `libestr`, `librelp`, `libfastjson`, `liblognorm` (must be installed or built manually)
Example commands (swap the final step for the most relevant smoke test):
### Complete Dependency Installation (Ubuntu/Debian WSL)
For a full development environment with all common dependencies:
```bash
-./autogen.sh
-./configure --enable-debug --enable-testbench
-make -j$(nproc)
sudo apt-get update
sudo apt-get install -y \
autoconf autoconf-archive automake autotools-dev \
bison flex gcc \
libcurl4-gnutls-dev libdbi-dev libgcrypt20-dev \
libglib2.0-dev libgnutls28-dev \
libtool libtool-bin libzstd-dev make \
libestr-dev python3-docutils libfastjson-dev \
liblognorm-dev \
libaprutil1-dev libcivetweb-dev \
valgrind clang-format
```
Mark the environment as configured (optional, for tracking):
```bash
touch /tmp/rsyslog_base_env.flag
```
### Build Process
1. **Generate configure script** (required after fresh checkout or changes to build files):
```bash
./autogen.sh
```
2. **Configure with testbench and required modules**:
```bash
# Basic configuration
./configure --enable-testbench --enable-imdiag --enable-omstdout
# For specific module testing, add the module's enable flag:
./configure --enable-testbench --enable-imdiag --enable-omstdout \
--enable-mmsnareparse
# For multiple modules:
./configure --enable-testbench --enable-imdiag --enable-omstdout \
--enable-mmsnareparse \
--enable-omotlp \
--enable-imhttp
```
3. **Build the project**:
```bash
make -j$(nproc)
```
### Running Tests
#### Example 1: Run a single test directly (recommended for debugging)
```bash
./tests/imtcp-basic.sh
```
#### Example 2: Run a single test through make check
```bash
make check -j16 TESTS="imtcp-basic.sh"
```
#### Example 3: Run module-specific tests
```bash
# mmsnareparse test
make check -j16 TESTS="mmsnareparse-sysmon.sh"
# Multiple tests
make check -j16 TESTS="mmsnareparse-sysmon.sh mmsnareparse-trailing-extradata.sh"
```
#### Example 4: Run all tests (CI-style, time-consuming)
```bash
make check -j4
```
**Note:** The `-j` flag controls parallelism. Use `-j2` or `-j4` for reliability on resource-constrained systems, or `-j16` for faster execution on powerful machines.
Reserve `make check` for cases where you must mirror CI or chase harness-only failures. When you do run it, prefer `make check -j2` or `-j4` for reliability.
-----


@@ -285,6 +285,7 @@ Parameters
"``definition.json``", "string", "``unset``", "Inline JSON descriptor following the same schema as ``definition.file``. Processed after the file-based overrides."
"``runtime.config``", "string", "``unset``", "Persistent runtime configuration file. Supports the definition schema plus ``options`` such as ``enable_debug`` and ``enable_fallback``."
"``validation.mode`` / ``validation_mode``", "string", "``permissive``", "Selects parser strictness: ``permissive`` ignores issues, ``moderate`` records warnings, ``strict`` aborts when thresholds are exceeded."
"``ignoreTrailingPattern``", "string", "``unset``", "Pattern that marks a trailing extra-data section to be removed before parsing. Can be set at module level as a default and overridden per action. When set, the parser searches for this pattern in the last tab-separated token of the message (the text after the last tab). If found, the message is truncated at the start of that token. For messages without tabs, the pattern is only searched for in the last 20% of the message (at most 200 characters), and the message is truncated at the pattern itself. The removed extra-data section is stored in the ``!extradata_section`` message property. This is useful for removing non-standard trailing enrichment data added by third-party enrichers. The pattern is a literal string match (not a regex)."
Extracted fields
----------------


@@ -441,6 +441,7 @@ typedef struct _instanceData {
sbool enableWdac;
sbool emitRawPayload;
sbool emitDebugJson;
uchar *ignoreTrailingPattern;
validation_context_t validationTemplate;
section_descriptor_t *sectionDescriptors;
size_t sectionDescriptorCount;
@@ -492,6 +493,7 @@ struct modConfData_s {
char *definitionFile;
char *definitionJson;
char *runtimeConfigFile;
uchar *ignoreTrailingPattern;
validation_context_t validationTemplate;
};
static modConfData_t *loadModConf = NULL;
@@ -5061,6 +5063,94 @@ static rsRetVal parse_snare_json(instanceData *pData, smsg_t *pMsg, const char *
return RS_RET_OK;
}
/**
* @brief Detect and truncate trailing extra-data section if pattern is configured.
*
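* Searches for the configured pattern in the last tab-separated token (after the
* last tab). If found, truncates the message at the last tab, removing the whole
* token. For messages without tabs, only the trailing portion of the message is
* searched and the message is truncated at the pattern itself. The removed
* content is returned so the caller can expose it as a message property.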
*
* @param pData Module instance configuration.
* @param mutableMsg Mutable copy of the message payload (will be modified if truncation occurs).
* @param msgLen Pointer to the length of mutableMsg (will be updated if truncation occurs).
* @return Pointer to the truncated extra-data section (if found and truncated), NULL otherwise.
* The caller is responsible for freeing this memory if not NULL.
*/
static char *detect_and_truncate_trailing_extradata(instanceData *pData, char *mutableMsg, size_t *msgLen) {
char *extradataSection = NULL;
if (pData->ignoreTrailingPattern == NULL || mutableMsg == NULL) {
return NULL;
}
const char *pattern = (const char *)pData->ignoreTrailingPattern;
size_t patternLen = strlen(pattern);
if (patternLen == 0) {
return NULL;
}
/* Find the last tab character to identify the end of the last token */
char *lastTab = strrchr(mutableMsg, '\t');
if (lastTab == NULL) {
/* No tabs found - this is an edge case as SNARE format normally uses tab-separated values.
* However, we still attempt to remove trailing enrichment sections from malformed or
* non-standard messages. Be conservative: only truncate if pattern appears in the
* trailing portion (last 20% or last 200 chars, whichever is smaller) to avoid
* accidentally removing legitimate message content. */
size_t msgLenVal = strlen(mutableMsg);
if (msgLenVal < patternLen) {
return NULL;
}
size_t trailingSearchLen = msgLenVal / 5; /* Last 20% */
if (trailingSearchLen > 200) {
trailingSearchLen = 200;
}
if (trailingSearchLen < patternLen) {
trailingSearchLen = patternLen;
}
/* Search backwards from end within the trailing portion only. Iterate over
* offsets so the loop never forms a pointer before the start of mutableMsg,
* which would otherwise happen when the trailing portion spans the whole message. */
size_t searchStartOff = msgLenVal - trailingSearchLen;
for (size_t off = msgLenVal - patternLen;; off--) {
char *searchPos = mutableMsg + off;
if (memcmp(searchPos, pattern, patternLen) == 0) {
/* Found pattern in trailing position - save it and truncate before it */
extradataSection = strdup(searchPos);
if (extradataSection == NULL) {
return NULL; /* out of memory */
}
*searchPos = '\0';
if (msgLen != NULL) {
*msgLen = off;
}
return extradataSection;
}
if (off == searchStartOff) {
break;
}
}
return NULL;
}
/* Pattern must appear after the last tab to be considered trailing */
char *searchStart = lastTab + 1;
char *patternPos = strstr(searchStart, pattern);
if (patternPos != NULL) {
/* Pattern found in trailing position - truncate at the start of the last token
* (after the last tab) to remove the entire enrichment section including any
* preceding content in that token (e.g., dynamic numbers before the pattern) */
/* Save the extra-data section before truncating */
extradataSection = strdup(searchStart);
if (extradataSection == NULL) {
return NULL;
}
/* Now truncate at the last tab */
*lastTab = '\0';
if (msgLen != NULL) {
*msgLen = lastTab - mutableMsg;
}
return extradataSection;
}
return NULL;
}
/**
* @brief Detect the payload type, parse it, and attach JSON metadata.
*
@@ -5120,6 +5210,22 @@ static rsRetVal process_message(instanceData *pData, smsg_t *pMsg, uchar *msgTex
unescape_hash_sequences(mutableMsg);
normalize_literal_tabs(mutableMsg);
dbgprintf("[mmsnareparse DEBUG] After unescaping: '%s'\n", mutableMsg);
/* Detect and truncate trailing extra-data section if pattern is configured */
char *extradataSection = NULL;
size_t msgLen = strlen(mutableMsg);
extradataSection = detect_and_truncate_trailing_extradata(pData, mutableMsg, &msgLen);
if (extradataSection != NULL) {
dbgprintf("[mmsnareparse DEBUG] Truncated trailing extra-data section: '%s'\n", extradataSection);
/* Expose the extra-data section as the !extradata_section message property */
if (pMsg != NULL) {
struct json_object *extradataJson = json_object_new_string(extradataSection);
if (extradataJson != NULL) {
msgAddJSON(pMsg, (uchar *)"!extradata_section", extradataJson, 0, 0);
}
}
}
cursor = mutableMsg;
while (cursor != NULL && tokenCount < ARRAY_SIZE(tokens)) {
tokens[tokenCount++] = cursor;
@@ -5145,6 +5251,7 @@ static rsRetVal process_message(instanceData *pData, smsg_t *pMsg, uchar *msgTex
iRet = parse_snare_text(pData, pMsg, rawMsg, rawMsg, tokens, tokenCount);
}
free(mutableMsg);
free(extradataSection);
return iRet;
}
@@ -5153,7 +5260,8 @@ DEF_OMOD_STATIC_DATA;
static struct cnfparamdescr modpdescr[] = {{"definition.file", eCmdHdlrString, 0},
{"definition.json", eCmdHdlrString, 0},
{"runtime.config", eCmdHdlrString, 0},
-{"validation.mode", eCmdHdlrString, 0}};
{"validation.mode", eCmdHdlrString, 0},
{"ignoreTrailingPattern", eCmdHdlrString, 0}};
static struct cnfparamblk modpblk = {CNFPARAMBLK_VERSION, ARRAY_SIZE(modpdescr), modpdescr};
static struct cnfparamdescr actpdescr[] = {
@@ -5164,7 +5272,7 @@ static struct cnfparamdescr actpdescr[] = {
{"emit.debugjson", eCmdHdlrBinary, 0}, {"debugjson", eCmdHdlrBinary, 0},
{"definition.file", eCmdHdlrString, 0}, {"definition.json", eCmdHdlrString, 0},
{"runtime.config", eCmdHdlrString, 0}, {"validation.mode", eCmdHdlrString, 0},
-{"validation_mode", eCmdHdlrString, 0}};
{"validation_mode", eCmdHdlrString, 0}, {"ignoreTrailingPattern", eCmdHdlrString, 0}};
static struct cnfparamblk actpblk = {CNFPARAMBLK_VERSION, ARRAY_SIZE(actpdescr), actpdescr};
BEGINbeginCnfLoad
@@ -5177,6 +5285,8 @@ BEGINbeginCnfLoad
pModConf->definitionJson = NULL;
free(pModConf->runtimeConfigFile);
pModConf->runtimeConfigFile = NULL;
free(pModConf->ignoreTrailingPattern);
pModConf->ignoreTrailingPattern = NULL;
init_validation_context(&pModConf->validationTemplate);
ENDbeginCnfLoad
@@ -5229,6 +5339,13 @@ BEGINsetModCnf
ABORT_FINALIZE(r);
}
loadModConf->validationTemplate.mode = parsedMode;
} else if (!strcmp(modpblk.descr[i].name, "ignoreTrailingPattern")) {
char *value = es_str2cstr(pvals[i].val.d.estr, NULL);
if (value == NULL) {
ABORT_FINALIZE(RS_RET_OUT_OF_MEMORY);
}
free(loadModConf->ignoreTrailingPattern);
loadModConf->ignoreTrailingPattern = (uchar *)value;
} else {
dbgprintf("mmsnareparse: unhandled module parameter '%s'\n", modpblk.descr[i].name);
}
@@ -5259,6 +5376,8 @@ BEGINfreeCnf
pModConf->definitionJson = NULL;
free(pModConf->runtimeConfigFile);
pModConf->runtimeConfigFile = NULL;
free(pModConf->ignoreTrailingPattern);
pModConf->ignoreTrailingPattern = NULL;
}
ENDfreeCnf
@@ -5277,6 +5396,7 @@ ENDisCompatibleWithFeature
BEGINfreeInstance
CODESTARTfreeInstance;
free(pData->container);
free(pData->ignoreTrailingPattern);
free_runtime_tables(pData);
free_runtime_config(&pData->runtimeConfig);
ENDfreeInstance
@@ -5296,6 +5416,7 @@ static inline void setInstParamDefaults(instanceData *pData) {
pData->enableWdac = 1;
pData->emitRawPayload = 1;
pData->emitDebugJson = 0;
pData->ignoreTrailingPattern = NULL;
init_validation_context(&pData->validationTemplate);
init_runtime_config(&pData->runtimeConfig);
pData->sectionDescriptors = NULL;
@@ -5327,6 +5448,13 @@ BEGINnewActInst
if (loadModConf->runtimeConfigFile != NULL) {
CHKiRet(load_configuration(&pData->runtimeConfig, loadModConf->runtimeConfigFile));
}
if (loadModConf->ignoreTrailingPattern != NULL) {
free(pData->ignoreTrailingPattern);
pData->ignoreTrailingPattern = (uchar *)strdup((char *)loadModConf->ignoreTrailingPattern);
if (pData->ignoreTrailingPattern == NULL) {
ABORT_FINALIZE(RS_RET_OUT_OF_MEMORY);
}
}
}
for (i = 0; i < (int)actpblk.nParams; ++i) {
if (!pvals[i].bUsed) continue;
@@ -5367,6 +5495,13 @@ BEGINnewActInst
}
CHKiRet(set_validation_mode(pData, mode));
free(mode);
} else if (!strcmp(actpblk.descr[i].name, "ignoreTrailingPattern")) {
char *value = es_str2cstr(pvals[i].val.d.estr, NULL);
if (value == NULL) {
ABORT_FINALIZE(RS_RET_OUT_OF_MEMORY);
}
free(pData->ignoreTrailingPattern);
pData->ignoreTrailingPattern = (uchar *)value;
}
}
CODE_STD_STRING_REQUESTnewActInst(1);


@@ -56,6 +56,84 @@ agents.
- Keep suppression files (e.g. `*.supp`) current when adding new Valgrind noise;
failing to do so will cause CI false positives.
### Enabling Debug Output
To enable rsyslog debug logging for a test, temporarily uncomment these lines in `tests/diag.sh` (around lines 88-89):
```bash
export RSYSLOG_DEBUG="debug nologfuncflow noprintmutexaction nostdout"
export RSYSLOG_DEBUGLOG="log"
```
This creates a `log` file in the tests directory with detailed execution traces.
**Important:** Remember to re-comment these lines after debugging to avoid cluttering test output.
### Preventing Test Cleanup for Inspection
To examine test output files after a test runs, temporarily comment out `exit_test` at the end of the test script:
```bash
# exit_test # Temporarily disabled to inspect logs
```
This preserves:
- `rstb_*.out.log` - The actual test output
- `rstb_*.conf` - The generated rsyslog configuration
- `log` - Debug log (if enabled)
- `rstb_*.input*` - Test input files
### Example Debugging Workflow
1. **Enable debug output** in `diag.sh`:
```bash
# Uncomment lines 88-89
export RSYSLOG_DEBUG="debug nologfuncflow noprintmutexaction nostdout"
export RSYSLOG_DEBUGLOG="log"
```
2. **Disable cleanup** in your test script:
```bash
# Comment the exit_test line
#exit_test
```
3. **Run the test**:
```bash
cd tests
./mmsnareparse-trailing-extradata.sh
```
4. **Examine output**:
```bash
# Check actual output vs expected
cat rstb_*.out.log
# Search debug log for specific patterns
grep "extradata_section" log
grep "Truncated trailing" log
```
5. **Restore test environment**:
- Re-comment debug exports in `diag.sh`
- Uncomment `exit_test` in your test script
- Clean up test artifacts: `rm -f rstb_* log`
### Understanding Test Output
When a test fails with `content_check`, the error shows:
```
FAIL: content_check failed to find "expected content"
FILE "rstb_*.out.log" content:
1 actual line 1
2 actual line 2
```
This helps identify:
- What the test expected vs what was produced
- Whether the module parsed the message correctly
- If fields are populated as expected
## Coordination
- When adding tests for a plugin or runtime subsystem, mention them in the
component's `AGENTS.md` so future authors know smoke coverage exists.


@@ -513,7 +513,8 @@ TESTS += \
mmsnareparse-sysmon.sh \
mmsnareparse-kerberos.sh \
mmsnareparse-custom.sh \
-mmsnareparse-realworld-4624-4634-5140.sh
mmsnareparse-realworld-4624-4634-5140.sh \
mmsnareparse-trailing-extradata.sh
if HAVE_VALGRIND
TESTS += \
mmsnareparse-comprehensive-vg.sh
@@ -2095,6 +2096,7 @@ EXTRA_DIST= \
mmsnareparse-realworld-4624-4634-5140.sh \
mmsnareparse-syslog.sh \
mmsnareparse-value-types.sh \
mmsnareparse-trailing-extradata.sh \
mmexternal-InvldProg-vg.sh \
nested-call-shutdown.sh \
1.rstest 2.rstest 3.rstest err1.rstest \


@@ -0,0 +1,55 @@
#!/bin/bash
# Validate mmsnareparse parsing with trailing extra-data section truncation.
# This test verifies both the with-tabs code path and documents the bug fix for the no-tabs path.
unset RSYSLOG_DYNNAME
. ${srcdir:=.}/diag.sh init
generate_conf
add_conf '
module(load="../plugins/mmsnareparse/.libs/mmsnareparse")
template(name="outfmt" type="list") {
property(name="$!win!Event!EventID")
constant(value=",")
property(name="$!win!Event!Channel")
constant(value=",")
property(name="$!win!EventData!EventType")
constant(value=",")
property(name="$!win!EventData!TargetObject")
constant(value=",")
property(name="$!win!EventData!User")
constant(value=",")
property(name="$!extradata_section")
constant(value="\n")
}
action(type="mmsnareparse"
definition.file="../plugins/mmsnareparse/sysmon_definitions.json"
ignoreTrailingPattern="enrichment_section:")
action(type="omfile" file="'$RSYSLOG_OUT_LOG'" template="outfmt")
'
startup
cat <<'MSG' > ${RSYSLOG_DYNNAME}.input
<14>Mar 22 08:47:23 testhost MSWinEventLog 1 Microsoft-Windows-Sysmon/Operational 20977 Mon Mar 22 08:47:23 2025 13 Windows SYSTEM User SetValue testhost Registry value set (rule: RegistryEvent) Registry value set: RuleName: Default RegistryEvent EventType: SetValue UtcTime: 2025-03-22 08:47:23.284 ProcessGuid: {fd4d0da6-d589-6916-eb03-000000000000} ProcessId: 4 Image: System TargetObject: HKLM\System\CurrentControlSet\Services\TestService\ImagePath Details: "C:\Program Files\TestAgent\TestService.exe" User: NT AUTHORITY\SYSTEM 3385599 enrichment_section: fromhost-ip=192.168.45.217
MSG
injectmsg_file ${RSYSLOG_DYNNAME}.input
shutdown_when_empty
wait_shutdown
# Test that Event ID 13 (Registry value set) is parsed correctly
# The ignoreTrailingPattern should remove "enrichment_section: fromhost-ip=192.168.45.217"
# from the message before parsing, so it should NOT appear in any parsed fields.
# However, the truncated content is stored in the !extradata_section property.
# This test verifies:
# 1. Parsing works correctly (EventID=13, Channel, EventType=SetValue, TargetObject, User)
# 2. The enrichment section is removed from parsing (doesn't affect parsed fields)
# 3. The truncated content is stored in !extradata_section property (tests with-tabs code path)
#
# NOTE: A critical bug in the no-tabs code path (lines 5111-5121 in mmsnareparse.c) was fixed
# where strdup was called AFTER truncation, resulting in an empty string. The fix reverses
# the order: copy first, then truncate. This is now consistent with the with-tabs path.
content_check '13,Microsoft-Windows-Sysmon/Operational,SetValue,HKLM\System\CurrentControlSet\Services\TestService\ImagePath,NT AUTHORITY\SYSTEM,3385599 enrichment_section: fromhost-ip=192.168.45.217' $RSYSLOG_OUT_LOG
exit_test