ROSI Collector could get stuck when Loki rejected a pushed sample with HTTP 400, for example because old or clock-skewed sender timestamps were outside Loki's accepted time window. In that state one bad backlog entry could keep the omhttp action retrying instead of advancing to newer logs.
Remove the stray /var/log/debug selector from the Loki ruleset.
Teach init.sh to hash the installed rsyslog.conf directory before and after copying/generating collector config. If the installed collector rsyslog config changed and an rsyslog Compose container already exists, init.sh recreates only that service with --force-recreate so Docker refreshes the bind-mounted config. First installs still defer to the normal docker compose up -d flow.
Document the Loki 400 troubleshooting path, the collector timestamp choice, the automatic init.sh recreate behavior, and the manual checksum check for stale config suspicions.
closes: https://github.com/rsyslog/rsyslog/issues/6843
ROSI Collector
Rsyslog Operations Stack Initiative - A production-ready centralized log collection and monitoring stack.
ROSI Collector provides a complete solution for collecting, storing, and visualizing system logs from multiple hosts using rsyslog, Loki, Grafana, and Prometheus.
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ ROSI Collector │
│ ┌─────────────┐ ┌─────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ Traefik │ │ Loki │ │ Grafana │ │ Prometheus │ │
│ │ (proxy) │ │ (logs) │ │ (viz) │ │ (metrics) │ │
│ └──────┬──────┘ └────┬────┘ └─────┬────┘ └──────┬───────┘ │
│ │ ▲ │ │ │
│ │ │ │ │ │
│ ┌──────┴──────────────┴─────────────┴──────────────┴───────┐ │
│ │ rsyslog (omhttp) │ │
│ │ (log receiver, TCP 10514/6514) │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
▲ ▲
│ logs (TCP 10514/6514) │ metrics (9100)
│ │
┌────────┴────────┐ ┌────────┴────────┐
│ Client Host │ │ Client Host │
│ (rsyslog + │ ... │ (rsyslog + │
│ node-exporter) │ │ node-exporter) │
└─────────────────┘ └─────────────────┘
▲ Server also forwards its own logs to localhost:10514
│ (configured automatically by init.sh)
Quick Start
Prerequisites
- Linux server (Ubuntu 24.04 LTS recommended, tested)
- Docker Engine 20.10+
- Docker Compose v2
- 2GB+ RAM (4GB+ recommended)
- Domain name (for TLS) or local deployment
Note: The installation scripts have been tested on Ubuntu 24.04 LTS. Other Debian-based distributions should work with minor adjustments.
1. Clone Repository
git clone https://github.com/rsyslog/rsyslog.git
cd rsyslog/deploy/docker-compose/rosi-collector
2. Initialize Environment
Run the initialization script (as root):
sudo ./scripts/init.sh
The script will interactively prompt for:
- Installation directory (default: /opt/rosi-collector)
- TRAEFIK_DOMAIN - Domain or IP for accessing Grafana (required)
- TRAEFIK_EMAIL - Email for Let's Encrypt certificates
- GRAFANA_ADMIN_PASSWORD - Leave empty to auto-generate
- TLS Configuration - Enable encrypted syslog on port 6514
- Server syslog forwarding - Forward server's own logs to collector
For non-interactive installation, set environment variables:
sudo TRAEFIK_DOMAIN=logs.example.com \
TRAEFIK_EMAIL=admin@example.com \
./scripts/init.sh
For non-interactive server self-monitoring configuration:
sudo TRAEFIK_DOMAIN=logs.example.com \
TRAEFIK_EMAIL=admin@example.com \
SERVER_SYSLOG_FORWARDING=true \
./scripts/init.sh
The script will:
- Copy all configuration files
- Generate .env with secure passwords
- Install and configure node_exporter for server self-monitoring
- Add server to Prometheus targets (using prometheus-target helper)
- Configure server syslog forwarding to collector (optional)
- Create Docker network and systemd service
- Generate TLS certificates (if TLS enabled)
3. Start the Stack
cd /opt/rosi-collector # or your chosen install directory
docker compose up -d
# Check status
docker compose ps
# View logs
docker compose logs -f
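Services can take a short while to become ready after docker compose up -d. The following is a minimal sketch of a generic retry helper for scripted installs; wait_until is a hypothetical name and is not part of the ROSI scripts.

```shell
# Retry a command until it succeeds or the attempt budget is used up.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
# NOTE: wait_until is an illustrative helper, not shipped with ROSI Collector.
wait_until() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        # Run the probe quietly; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Pause between attempts, but not after the final one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (uses the Loki readiness endpoint from the Troubleshooting section):
#   wait_until 30 2 curl -fsS http://localhost:3100/ready && echo "Loki is ready"
```

Passing the probe as arguments keeps the helper independent of any one service, so the same loop can wait on Loki, Grafana, or any other check.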
4. Access Grafana
- Open https://logs.example.com (your configured TRAEFIK_DOMAIN)
- Login with admin / password from .env file: grep GRAFANA_ADMIN_PASSWORD /opt/rosi-collector/.env
- Pre-configured dashboards are available under Dashboards. To change dashboard layout or panels, edit the JSON files in grafana/provisioning/dashboards/templates/ and run python3 scripts/render-dashboards.py to update generated/; then copy to the install directory and restart Grafana.
5. Configure Clients
On each client host, forward logs to your ROSI Collector:
# Download and run client setup script
wget https://your-domain.com/downloads/install-rsyslog-client.sh
sudo bash install-rsyslog-client.sh
Note: omfwd is built-in in rsyslog and does not require module(load="omfwd").
If you see errors about omfwd.so missing, remove any explicit module load.
See clients/README.md for detailed client setup instructions.
Services
| Service | Port | Description |
|---|---|---|
| Traefik | 80, 443 | Reverse proxy with automatic TLS |
| Grafana | 3000 | Log visualization and dashboards |
| Loki | 3100 | Log storage and querying |
| Prometheus | 9090 | Metrics collection |
| rsyslog | 10514 | Log receiver (TCP plaintext) |
| rsyslog | 6514 | Log receiver (TCP TLS, if enabled) |
| node_exporter | 9100 | Host metrics (auto-installed and configured on server) |
| rsyslog-impstats exporter | 9898 | rsyslog impstats sidecar metrics (optional) |
Directory Structure
rosi-collector/
├── docker-compose.yml # Main stack definition
├── .env.template # Environment template
├── loki-config.yml # Loki configuration
├── prometheus.yml # Prometheus configuration
├── logrotate.conf # Log rotation
├── rsyslog.conf/ # rsyslog configuration
├── traefik/ # Traefik configuration
├── prometheus-targets/ # Prometheus scrape targets
│ ├── nodes.yml # node_exporter targets
│ └── impstats.yml # rsyslog impstats sidecar targets
├── grafana/
│ └── provisioning/
│ ├── dashboards/
│ │ ├── dashboards.yml # Points to generated/
│ │ ├── templates/ # Edit these; run render-dashboards.py
│ │ └── generated/ # Provisioned JSON (do not edit by hand)
│ ├── datasources/ # Loki & Prometheus configs
│ └── alerting/ # Alert rules (default.yml)
├── scripts/
│ ├── init.sh # Initialize environment
│ ├── monitor.sh # Health monitoring (rosi-monitor)
│ ├── prometheus-target.sh # Manage scrape targets (node + impstats)
│ ├── render-dashboards.py # Build generated/ from templates/
│ ├── generate-ca.sh # Generate TLS CA and server certs
│ ├── generate-client-cert.sh # Generate client certificates
│ ├── install-server.sh # Prepare fresh Ubuntu server
│ └── configs/ # Server config templates
└── clients/ # Client setup resources
Pre-built Dashboards
ROSI Collector includes several pre-configured Grafana dashboards (provisioned from grafana/provisioning/dashboards/generated/):
| Dashboard | Description |
|---|---|
| Syslog Explorer | Search and browse logs by host, time range, and filters |
| Syslog Analysis | Distribution analysis (severity, hosts, facilities) |
| Syslog Health | rsyslog impstats metrics (queues, actions, input, output, CPU) — see below |
| Host Metrics Overview | Node exporter metrics (CPU, memory, disk) per host |
| Alerting Overview | Active alerts and alert rule status |
Syslog Health (impstats) dashboard
When client hosts run the impstats sidecar (default with install-rsyslog-client.sh), Prometheus scrapes the sidecar’s /metrics endpoint. The Syslog Health dashboard visualizes those metrics. Add impstats targets with:
prometheus-target --job impstats add <client-ip>:9898 host=<hostname> role=rsyslog network=internal
# Or add both node and impstats for a client in one step:
prometheus-target add-client <client-ip> host=<hostname> role=rsyslog network=internal
The dashboard is grouped into sections:
| Section | Contents |
|---|---|
| Overview | Exporter count, action failures, suspended actions, queue full, open files, max RSS |
| Queues | Queue size vs max, utilization %, drop rates (discarded_full / discarded_nf) |
| Input | Ingest rate by input (e.g. imuxsock) |
| Actions | Action processed rate, action failures by action, suspended duration (collapsed by default) |
| Output & resource usage | Output bytes (omfwd) and CPU usage (impstats) side by side |
Use the Host dropdown to filter by client. For metric details and suggested alert thresholds, see ROSI_IMPSTATS_PLAN.md.
Management Scripts
Initialize Environment
# Run from rosi-collector directory
sudo ./scripts/init.sh
# With custom install directory (one-time)
sudo INSTALL_DIR=/srv/rosi ./scripts/init.sh
Configuration Persistence: On first run, the script will prompt for a custom
install directory. Your choice is saved to ~/.config/rsyslog/rosi-collector.conf
and automatically used for future runs.
Config file locations (in priority order):
- Environment variable: INSTALL_DIR=/path ./scripts/init.sh
- User config: ~/.config/rsyslog/rosi-collector.conf
- System config: /etc/rsyslog/rosi-collector.conf
- Default: /opt/rosi-collector
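The priority order above can be sketched as a small shell function. This is an illustration only: it assumes each config file contains a line of the form INSTALL_DIR=/some/path, which may not match the format init.sh actually writes, and the function names are hypothetical.

```shell
# Print the INSTALL_DIR value stored in config file $1, or fail if none.
# ASSUMPTION: the file holds a line like INSTALL_DIR=/some/path.
read_install_dir() {
    [ -r "$1" ] || return 1
    local value
    value=$(sed -n 's/^INSTALL_DIR=//p' "$1" | tail -n 1)
    # Strip optional surrounding double quotes.
    value=${value#\"}
    value=${value%\"}
    [ -n "$value" ] && printf '%s\n' "$value"
}

# Resolve the install directory: environment, user config, system config, default.
resolve_install_dir() {
    local user_conf="${HOME}/.config/rsyslog/rosi-collector.conf"
    local system_conf="/etc/rsyslog/rosi-collector.conf"
    if [ -n "${INSTALL_DIR:-}" ]; then
        printf '%s\n' "$INSTALL_DIR"
    elif read_install_dir "$user_conf"; then
        :   # value already printed by read_install_dir
    elif read_install_dir "$system_conf"; then
        :
    else
        printf '%s\n' "/opt/rosi-collector"
    fi
}
```

Calling resolve_install_dir with INSTALL_DIR set in the environment prints that value without consulting either config file, which mirrors the one-time override shown earlier.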
Impstats exporter firewall rule:
If UFW/firewalld is active, init.sh will add a rule to allow the Docker
subnet to reach the rsyslog impstats exporter on port 9898. To disable this:
ENABLE_IMPSTATS_FIREWALL=false ./scripts/init.sh.
Prometheus Targets
Use the prometheus-target helper to manage scrape targets. The default job
is node (node_exporter). For the rsyslog impstats sidecar exporter, use the
impstats job:
# Impstats only (sidecar on port 9898)
# Note: init.sh can install the impstats sidecar on the server interactively
# (or use SERVER_IMPSTATS_SIDECAR=true for non-interactive).
sudo prometheus-target --job impstats add 127.0.0.1:9898 host=rosi-collector role=rsyslog network=internal
sudo prometheus-target --job impstats list
# Add both node_exporter (9100) and impstats (9898) for a client in one step
sudo prometheus-target add-client 10.0.0.5 host=client01 role=rsyslog network=internal
Prepare Fresh Server (Optional)
⚠️ WARNING: Only run this on fresh/new systems. Do NOT run on servers you maintain yourself - it modifies system configuration, firewall rules, and installs packages.
For fresh Ubuntu 24.04 systems, this script installs Docker, configures firewall, and prepares the server:
# Interactive installation (asks about each config file)
sudo ./scripts/install-server.sh
# Non-interactive (install all configs)
sudo NONINTERACTIVE=1 ./scripts/install-server.sh
# With custom settings
sudo SSH_PORT=22 \
OPEN_TCP_PORTS="80 443 10514" \
HOSTNAME_FQDN="rosi.example.com" \
./scripts/install-server.sh
The script interactively asks before installing each configuration file:
- sysctl-hardening.conf - Kernel networking and security settings
- rsyslog-docker.conf - Docker container log routing
- docker-logs-logrotate - Log rotation for Docker logs
- fail2ban-custom.local - SSH brute-force protection
- docker-daemon.json - Docker daemon configuration
- docker-override.conf - Docker systemd service overrides
Monitor Stack Health
After running init.sh, the rosi-monitor command is available system-wide:
# Show container status (includes Docker internal IPs)
rosi-monitor status
# Show recent logs
rosi-monitor logs
# Health check
rosi-monitor health
# Check SMTP configuration
rosi-monitor smtp
# Backup configuration and data
rosi-monitor backup [name]
# Restore from backup
rosi-monitor restore [name]
# List backups
rosi-monitor backups
# Reset Grafana admin password (reads from .env)
rosi-monitor reset-password
# Reset to custom password
rosi-monitor reset-password mynewpassword
# Interactive debug menu
rosi-monitor debug
# Shell into container
rosi-monitor shell grafana
The status command displays:
- Docker Compose container status
- Individual container health
- Docker network information (network name, subnet, gateway)
- Internal container IPs (useful for debugging connectivity)
- Resource usage (CPU, memory, network I/O)
Alternatively, run from the installation directory:
cd /opt/rosi-collector
./scripts/monitor.sh status
Manage Prometheus Targets
After running init.sh, the prometheus-target command is available system-wide to manage node_exporter scrape targets:
# Add a target (host label is required)
prometheus-target add 10.0.0.5:9100 host=webserver
# Add with multiple labels (role, network, env)
prometheus-target add 10.0.0.5:9100 host=webserver role=web network=internal env=production
# List all configured targets
prometheus-target list
# Remove a target by IP:port
prometheus-target remove 10.0.0.5:9100
# Remove a target by hostname
prometheus-target remove webserver
Available labels:
| Label | Description |
|---|---|
| host=<name> | Hostname (required) |
| role=<value> | Server role (e.g., web, db, app) |
| env=<value> | Environment (e.g., production, staging) |
| network=<value> | Network zone (e.g., internal, dmz) |
| <key>=<value> | Any custom label |
Changes are picked up by Prometheus automatically within 5 minutes.
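To make the label handling concrete, here is a sketch of what one resulting target entry might look like. It assumes prometheus-targets/nodes.yml follows Prometheus's file-based service discovery format; the exact layout written by prometheus-target is not shown in this README, and render_target is a hypothetical function used only for illustration.

```shell
# Print one file_sd-style target group.
# Usage: render_target <ip:port> host=<name> [key=value ...]
# ASSUMPTION: nodes.yml uses the Prometheus file_sd YAML layout.
render_target() {
    local target="$1" pair has_host=0
    shift
    # The host label is required, so check for it before printing anything.
    for pair in "$@"; do
        case $pair in
            host=?*) has_host=1 ;;
        esac
    done
    if [ "$has_host" -ne 1 ]; then
        echo "render_target: a host=<name> label is required" >&2
        return 1
    fi
    printf '%s\n' '- targets:'
    printf '    - "%s"\n' "$target"
    printf '%s\n' '  labels:'
    for pair in "$@"; do
        # Split on the first '=' only, so values may contain '='.
        printf '    %s: "%s"\n' "${pair%%=*}" "${pair#*=}"
    done
}

# Example:
#   render_target 10.0.0.5:9100 host=webserver role=web
```

For real changes, prefer the prometheus-target helper, since it manages the file in the install directory for you.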
Installation directory auto-detection:
The script automatically detects your ROSI Collector installation by checking:
- INSTALL_DIR environment variable
- User config (~/.config/rsyslog/rosi-collector.conf)
- System config (/etc/rsyslog/rosi-collector.conf)
- Common locations (/opt/rosi-collector, /srv/central, /srv/rosi-collector)
To override, set the INSTALL_DIR environment variable:
INSTALL_DIR=/custom/path prometheus-target list
Configuration
Environment Variables
Core configuration (prompted interactively or set via environment):
| Variable | Default | Description |
|---|---|---|
| TRAEFIK_DOMAIN | - | Domain or IP for the stack (required) |
| TRAEFIK_EMAIL | admin@domain | Let's Encrypt email |
| GRAFANA_ADMIN_PASSWORD | (generated) | Grafana admin password |
| INSTALL_DIR | /opt/rosi-collector | Installation directory |
Server self-monitoring (non-interactive mode):
| Variable | Default | Description |
|---|---|---|
| SERVER_SYSLOG_FORWARDING | false | Forward server's syslog to collector |
Note: node_exporter is always installed and configured on the server. The script will:
- Install the binary if not present
- Create/configure the systemd service if missing or not running
- Bind to the Docker bridge gateway IP (so Prometheus container can scrape it)
- Configure firewall rules (UFW/firewalld/iptables) to allow container access
- Add the server to Prometheus targets automatically
- Detect and update binding if Docker network changes
Optional features:
| Variable | Default | Description |
|---|---|---|
| WRITE_JSON_FILE | off | Also write logs to JSON file |
| SMTP_ENABLED | false | Enable email alerting |
| SMTP_HOST | - | SMTP server hostname |
| SMTP_PORT | 587 | SMTP server port |
| SMTP_USER | - | SMTP username |
| SMTP_PASSWORD | - | SMTP password |
TLS/mTLS configuration (for encrypted syslog):
| Variable | Default | Description |
|---|---|---|
| SYSLOG_TLS_ENABLED | false | Enable TLS on port 6514 |
| SYSLOG_TLS_HOSTNAME | TRAEFIK_DOMAIN | Certificate hostname |
| SYSLOG_TLS_CA_DAYS | 3650 | CA certificate validity (days) |
| SYSLOG_TLS_SERVER_DAYS | 365 | Server cert validity (days) |
| SYSLOG_TLS_CLIENT_DAYS | 365 | Client cert validity (days) |
| SYSLOG_TLS_AUTHMODE | anon | Auth mode (anon, x509/certvalid, or x509/name) |
| SYSLOG_TLS_PERMITTED_PEERS | *.hostname | Allowed client CN patterns (x509/name only) |
TLS Configuration
Syslog TLS (Port 6514)
When TLS is enabled during init.sh, the script automatically:
- Generates a self-signed CA certificate
- Generates a server certificate signed by the CA
- Configures rsyslog to accept TLS connections on port 6514
Authentication Modes:
| Mode | Description |
|---|---|
| anon | TLS encryption only, no client certificates required |
| x509/certvalid | mTLS - clients must present valid CA-signed certificate |
| x509/name | mTLS - clients must have certificate matching PERMITTED_PEERS |
Generating Client Certificates:
# Generate and download client certificate package
rosi-generate-client-cert --download client-hostname
# The package includes: client-key.pem, client-cert.pem, ca.pem
Traefik TLS (HTTPS)
Traefik automatically obtains Let's Encrypt certificates for HTTPS access. For local/development without a domain:
- Comment out the TLS configuration in docker-compose.yml
- Access services directly via their ports
Log Retention
Loki is configured with:
- 30-day retention period
- Local filesystem storage
- Compaction enabled
Modify loki-config.yml to adjust retention settings.
Troubleshooting
When a service fails to start or becomes unhealthy after installation, follow this diagnostic flow.
Step 1: Run the health check
rosi-monitor health
This reports which containers are running and which endpoints are accessible. If the health check fails for a specific service, use the per-service section below.
Step 2: Per-service diagnostics
| Symptom | Commands to run |
|---|---|
| Container restarting / unhealthy | docker inspect <container> --format '{{.State.Health.Status}}' |
| Grafana web interface not accessible | See Grafana troubleshooting |
| 429 Too Many Requests on dashboard reload | See 429 rate limit |
| Loki not ready | curl http://localhost:3100/ready then docker compose logs loki --tail 100 |
| Prometheus targets down | docker exec prometheus-central wget -qO- http://localhost:9090/api/v1/targets |
| Logs not in Grafana | See Logs not appearing |
| Traefik / HTTPS 502 | docker compose logs traefik --tail 50 |
| Downloads URL returns 404 | See Downloads 404 |
For interactive debugging: rosi-monitor debug (menu with status, logs, shell access, restart).
Grafana not starting or not accessible
If rosi-monitor health reports "Grafana web interface is not accessible" or the Grafana container is unhealthy:
Note: Container names use the project name (e.g. rosi-collector-grafana-1 for /opt/rosi-collector). Use docker compose ps to see your actual container names.
- Check container status:
  docker inspect rosi-collector-grafana-1 --format '{{.State.Health.Status}}'
  docker logs rosi-collector-grafana-1 --tail 80
- Permission denied on dashboards (crash-loop): Grafana runs as UID 472 and needs read access to the provisioning directory. Fix with:
  cd /opt/rosi-collector # or your INSTALL_DIR
  chmod -R o+rX grafana/provisioning
  docker compose restart grafana
- Test Grafana API locally:
  curl -s http://127.0.0.1:3000/api/health
  Expect JSON with "database":"ok". If connection refused, Grafana is not listening yet.
- HTTPS returns 404: Ensure you access https://YOUR_TRAEFIK_DOMAIN/ (not a subdomain) and that GF_SERVER_ROOT_URL in the container matches your URL. Check:
  docker exec rosi-collector-grafana-1 env | grep GF_SERVER
429 Too Many Requests
If you see 429 Too Many Requests when reloading Grafana dashboards (especially on /api/annotations or panel queries), Traefik's rate limiter is triggering. A dashboard reload sends many parallel API calls.
The stack uses a separate, more permissive limit for Grafana (600 req/min, burst 300). To apply the fix:
- Re-run init.sh to regenerate traefik/dynamic.yml
- If you have docker-compose.override.yml, ensure Grafana uses rate-limit-grafana@file (not rate-limit@file)
- Restart Traefik: docker compose restart traefik
Container won't start
# Check container logs (replace <service> with grafana, loki, prometheus, rsyslog, traefik)
docker compose logs <service> --tail 100
# Verify disk space
df -h /opt/rosi-collector
# Check Docker status
systemctl status docker
# If container is restarting, inspect exit reason
docker inspect rosi-collector-grafana-1 --format '{{.State.Status}} {{.State.Error}}'
Logs not appearing in Grafana
- Check rsyslog is receiving logs:
  docker compose logs rsyslog --tail 50
- Verify rsyslog omhttp is sending to Loki:
  docker compose logs rsyslog | grep -i loki
- Check Loki health:
  curl http://localhost:3100/ready
- Test log flow with the monitor:
  rosi-monitor health # Includes "Log flow (rsyslog -> omhttp -> Loki)" check
If rsyslog logs show omhttp: checkResult error http status code: 400
or Loki reports entries as "too far behind", Loki rejected the sample and
the collector should advance to newer messages. The shipped
rsyslog.conf/30-send-loki-http.conf marks HTTP 400 as non-retriable with
httpIgnorableCodes=["400"] and uses rsyslog timegenerated (collector
receive/process time) as the Loki stream timestamp. The sender-reported
timestamp remains in the original message text.
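To gauge how often Loki is rejecting samples, the rsyslog container logs can be scanned for the error text quoted above. The helper below is a small sketch; count_loki_400 is a hypothetical name, and the match string is taken from the message shown in this section.

```shell
# Count omhttp HTTP 400 results in log text read from stdin.
# NOTE: count_loki_400 is an illustrative helper, not part of ROSI Collector.
count_loki_400() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at 0 in that case.
    grep -c 'http status code: 400' || true
}

# Example:
#   docker compose logs rsyslog --since 1h | count_loki_400
```

A steadily growing count with new logs still arriving in Grafana suggests the rejected entries are being skipped as intended; a growing count with no new logs points at a stale or older config.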
After changing files under rsyslog.conf/, re-run scripts/init.sh to copy
the updated files into the install directory. If an existing rsyslog
container is present and the installed rsyslog.conf/ content changed,
init.sh recreates only that container so the bind-mounted config is
reloaded. To force the same refresh manually:
docker compose up -d --force-recreate rsyslog
When debugging a stale config suspicion, compare the host and container copy:
md5sum rsyslog.conf/30-send-loki-http.conf
docker compose exec rsyslog md5sum /etc/rsyslog.d/30-send-loki-http.conf
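The change detection described above can be approximated by hashing the whole rsyslog.conf/ directory before and after an update. This is a sketch under assumptions: init.sh's actual hashing method is not shown in this README, sha256sum is chosen here for illustration, and hash_dir is a hypothetical name.

```shell
# Print a single digest covering every regular file under directory $1.
# File paths are part of the input, so renames count as changes too.
# ASSUMPTION: this is not necessarily how init.sh computes its hash.
hash_dir() {
    (
        cd "$1" || exit 1
        find . -type f -exec sha256sum {} + \
            | LC_ALL=C sort -k 2 \
            | sha256sum \
            | cut -d ' ' -f 1
    )
}

# Example flow (run from the install directory):
#   before=$(hash_dir rsyslog.conf)
#   ... copy or regenerate the config files ...
#   after=$(hash_dir rsyslog.conf)
#   [ "$before" = "$after" ] || docker compose up -d --force-recreate rsyslog
```

Sorting by path before the final hash keeps the digest stable regardless of the order in which find returns files.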
Prometheus can't scrape node_exporter (server target down)
The node_exporter on the server must bind to the Docker bridge gateway IP for Prometheus (running inside a container) to reach it.
- Check current binding:
  grep listen-address /etc/systemd/system/node_exporter.service
- Check Docker network gateway:
  rosi-monitor status # Shows network info and gateway
  # Or manually:
  docker network inspect rosi-collector-net --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}'
- Fix binding if mismatched:
  BIND_IP=$(docker network inspect rosi-collector-net --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}')
  sed -i "s/listen-address=[0-9.]*/listen-address=${BIND_IP}/" /etc/systemd/system/node_exporter.service
  systemctl daemon-reload
  systemctl restart node_exporter
- Check firewall allows container access:
  # For UFW:
  ufw status | grep 9100
  # Add rule if missing:
  ufw allow from 172.20.0.0/16 to 172.20.0.1 port 9100 proto tcp
- Test from Prometheus container:
  docker exec prometheus-central wget -q -O - --timeout=3 http://172.20.0.1:9100/metrics | head -3
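The binding comparison in the first three steps can be scripted. The sketch below assumes the unit file passes the address as listen-address=<ip>:<port>, matching the sed pattern used above; unit_listen_ip and binding_matches are hypothetical helper names.

```shell
# Print the IP from the first listen-address=<ip>:<port> flag in unit file $1.
unit_listen_ip() {
    sed -n 's/.*listen-address=\([0-9.]*\).*/\1/p' "$1" | head -n 1
}

# Succeed when the unit file binds node_exporter to the expected gateway IP.
# Usage: binding_matches <unit-file> <gateway-ip>
binding_matches() {
    local unit_file="$1" gateway="$2" current
    current=$(unit_listen_ip "$unit_file")
    [ -n "$current" ] && [ "$current" = "$gateway" ]
}

# Example:
#   GW=$(docker network inspect rosi-collector-net --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}')
#   binding_matches /etc/systemd/system/node_exporter.service "$GW" || echo "binding mismatch"
```

A mismatch typically means the Docker network was recreated with a different subnet; the fix step above rewrites the unit file accordingly.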
Downloads URL returns 404
Symptom: https://YOUR_DOMAIN/downloads/install-rsyslog-client.sh returns 404.
- Check downloads container is running:
  docker compose ps downloads
- Verify file exists:
  ls -la /opt/rosi-collector/downloads/install-rsyslog-client.sh
  If missing, run init.sh to populate downloads, then restart:
  cd /path/to/rsyslog/deploy/docker-compose/rosi-collector
  sudo ./scripts/init.sh
  cd /opt/rosi-collector && docker compose up -d
- Test from inside the container:
  docker exec downloads-central wget -qO- http://127.0.0.1/downloads/install-rsyslog-client.sh | head -5
High memory usage
Loki stores recent logs in memory. Reduce chunk_idle_period in loki-config.yml for lower memory usage.
Client Setup
See the clients/ directory for:
- rsyslog-forward.conf - Full rsyslog forwarding config
- rsyslog-forward-minimal.conf - Minimal forwarding config
- install-rsyslog-client.sh - Automated client setup
- install-node-exporter.sh - Prometheus node exporter setup
Security Considerations
- TLS is enabled by default via Traefik
- Grafana is protected by basic auth
- rsyslog port (10514) should be firewalled to trusted clients
- Prometheus targets file contains client IPs
Contributing
This is part of the rsyslog project. See the main repository's CONTRIBUTING.md for guidelines.
License
Apache License 2.0 - See LICENSE file in the rsyslog repository root.