rsyslog/IMFILE
Rainer Gerhards ab1bd8c01b imfile: large refactoring of complete module
This commit greatly refactors imfile internal workings. It changes the
handling of inotify, FEN, and polling modes. Mostly unchanged is the
processing of the way a file is read and state files are kept.

This is about a 50% rewrite of the module.

Polling, inotify, and FEN modes now use greatly unified code. Some
differences still exists and may be changed with further commits. The
internal handling of wildcards and file detection has been completely
re-written from scratch. For example, previously when multi-level
wildcards were used these were not reliably detected. The code also
now provides much of the same functionality in all modes, most importantly
wildcards are now also supported in polling mode.

The refactoring sets ground for further enhancements and smaller
refactorings. This commit provides the same feature set that imfile
had previously and all existing CI tests pass, as do some newly
created tests.

Some specific changes:
- bugfix: module parameter "sortfiles" ignored
  This parameter only works in Solaris FEN mode, but is otherwise
  ignored.  Most importantly it is ignored under Linux.
  fixes https://github.com/rsyslog/rsyslog/issues/2528
- bugfix: imfile did not pick up all files when not present
  at startup
  fixes https://github.com/rsyslog/rsyslog/issues/2241
  fixes https://github.com/rsyslog/rsyslog/issues/2230
  fixes https://github.com/rsyslog/rsyslog/issues/2354
- bugfix: directories only support "*" wildcard, no others
  fixes https://github.com/rsyslog/rsyslog/issues/2303
- bugfix: parameter "sortfiles" did only work in FEN mode
  fixes https://github.com/rsyslog/rsyslog/issues/2528
- provides the ability to dynamically add and remove files via
  multi-level wildcards
  see also https://github.com/rsyslog/rsyslog/issues/1280
- the state file name currently has been changed to inode number
  This will further be worked on in upcoming PRs
  see also https://github.com/rsyslog/rsyslog/issues/2231
- some enhancements were also done to CI tests, most importantly
  they were made more compatibile with BSD

Note that most of the mentioned bug fixes cannot be applied to older
versions, as they fix design issues which are solved by the refactoring.
Thus there are not separate commits for them.

Distro maintainers: you need to decide to apply this patch as whole
or not. Believe me, it is not worth the effort to try to extract
specific patches from this commit. There is a good reason we do
not have multiple commits.

closes https://github.com/rsyslog/rsyslog/issues/2359
2018-03-22 08:25:47 +01:00

132 lines
4.5 KiB
Plaintext

Use Cases
---------
Monitor:
[M1] /dir/*/*.log ; tag="monitor1"
[M2] /dir/abc/a.* ; tag="monitor2"
[M3] /dir/*/a/*.txt ; tag="monitor3"
[M4] /di?/*/b/*.txt ; tag="monitor4"
- if a.log is created, we have kind of a conflict: create two monitors then?
(sounds best so far)
- /dir/abc may not initially exist, but later on created
-> same problem for any path depth (polling? multiple inotify? fail?)
Let's assume /dir/abc exist, but nothing else under /dir
Setup initial inotify:
/dir [M1]
/dir [M3]
/dir/abc [M1]
/dir/abc [M2]
/dir/abc [M3]
/dir/abc [M4]
Rule? The parent of each component must be monitored
/ [M4] -- really??? Means we need to monitor whole FS tree!
/dir [M2] - /dir/abc may not yet exists but may be created later on
Q: can we combine /dir/abc?
A: well, we need to know what shall be monitored. e.g.
/dir/abc/a.logfile is created
watch /dir/abc must be evaluated
does not match [M1], but [M2]!
Event for /dir/abc is generated
need to check [M1], [M2]
/dir/def is created
watch /dir must be evaluated --> [M1], [M2], [M3], [M4] must be checked
matches [M1],[M3],[M4]
create new watch
/dir/def [M1, M3, M4] - this watch should not exist if all is right [-> assert()]
/dir/def/new.log is created
watch /dir/def must be evaluated --> [M1, M3, M4]
create new file watch
/dir/def/new.log [M1]
/dir/abc/a.log is created
watch /dir/abc must be evaluated --> [M1, M2, M3, M4]
create new file watch
/dir/abc/a.log [M1, M2]
At each level, I must know which child components to watch out for, and which
monitor/listener they belong to. Comparison must be done on immediate-child level.
Data Structures
--------------
Monitors
........
- path needs to be split into individial components --> array
File Watches
............
- contains active file state
- one file watch per file per monitor (/dir/abc/a.log sample with two active file entries)
- multiple entries are not properly supported by inotify (and possibly FEN)
--> need to multiplex ourselfes based on single notifcation
Dir Watches
...........
- contains active directory state
- used to discover new to-be-monitored files
Problems
--------
- we need to find, for each level, the proper "next component" that we need to check
against (fnmatch on component?)
- this must be done om a per-monitor basis
- one entry per monitored file, but a list of monitors (and component names to check against)
--> how do we efficiently describe the monitor so that we can do an O(1) name comparison?
--> we can NOT assume that the full path matches, just component-by-component!
--> we may NOT get notifications if path level are rapidly created, as we may not
have setup the watch sufficiently fast - so we need to poll the full tree
in any case
- state files: different state if the same file is monitored by more than one monitor
--> or should we flag this as an error case?
Poll vs. Watch
--------------
Polling is always needed (due to watch racieness). So it probably is best to have some generic
poll code, and use watches just to reduce the amount of polling. That also means that
poll mode can do exactly the same the notification mode can do!
Procedures
----------
e.g. /dir/aaa is created
this will be reported on /dir watch, which is watched by all 4 monitors
foreach monitor:
check if next-level component matches - this is the case for [M1, M2, M3]
if so:
create new/update existing entry (existing for all but first!)
arm watch for this entry (if newly created)
do a full poll of that entry and its subentries
AS FAR AS THE MONITOR IN QUESTION is affected:
M1 will scan subdir for "*.log"
M2 will scan for "abc", if it exists scan for "a.*"
M3 will scan for "a", if it exists scan for "*.txt"
--> rest of the path components need to be scanned, files
processed for each monitor
this is a recursive function
The above-mentioned poll process is fundamental, and can occur on each level
of the file system/monitor definitions. It needs to work both on active monitors as well
as the configured path components. Does ist make sense to have them in a single structure?
Is a tree more useful than an dirs table?
now the opposite:
e.g. /dir/aaa is deleted
this will be reported on /dir watch, which is watched by all 4 monitors
check if "aaa" exists as active(dynamic) entry
if so:
check for sub-entries, initiate delete events if any exists
(TODO: is MOVE here a problem?)
delete "aaa" entry
Note: we MUST NOT remove entries that are statically configured
(e.g. /dir in our samples, even if it is deleted - it may re-appear)