Integrate Infrastructure Service Logs into Observability Stack
Purpose
Currently, only Node.js application logs (Ganymede, Gateway) are integrated into the observability stack via OTLP. Infrastructure service logs (PostgreSQL, Nginx, dnsmasq, PowerDNS, and gateway container services) are not yet collected, limiting our ability to correlate application issues with underlying infrastructure problems.
This issue tracks the implementation of OpenTelemetry Collector sidecars (called "OTLP Collector" throughout this issue) that collect infrastructure service logs and forward them over OTLP to the main collector, which delivers them to Loki.
Current State
✅ Already Integrated
- Node.js applications (Ganymede, Gateway) send logs via OTLP
- Logs are structured with trace context and service metadata
- Viewable in Grafana via Loki datasource
❌ Not Yet Integrated
Dev Container Services:
- PostgreSQL - Database logs
- Nginx (Stage 1) - Web server access/error logs (SSL termination)
- dnsmasq - DNS forwarder logs
- PowerDNS - DNS authoritative server logs
Gateway Container Services:
- Nginx (Stage 2) - Reverse proxy logs inside gateway containers
- OpenVPN - VPN server logs inside gateway containers
- app-gateway - Already sends logs via OTLP ✅
Architecture
The OTLP Collector runs in a sibling Docker container, not inside the dev container. This means we need log shipper agents (OTLP Collector sidecars) running inside:
- Dev container - To collect dev container service logs
- Gateway containers - To collect gateway container service logs
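Both sidecars forward logs to the main OTLP Collector over OTLP, using either OTLP/HTTP (conventionally port 4318, the endpoint used in the tasks below) or OTLP/gRPC (conventionally port 4317).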
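Before wiring up either sidecar, it can help to confirm that the main collector is reachable from inside the container. The following is a minimal sketch, assuming `curl` is available in the container and that the main collector exposes its OTLP/HTTP receiver at the base URL passed in. The helper names `otlp_logs_url` and `probe_collector` are illustrative and not part of the existing scripts.

```shell
#!/bin/sh
# Build the OTLP/HTTP logs URL from a collector base URL.
# /v1/logs is the standard OTLP/HTTP path for log export.
otlp_logs_url() {
  base="${1%/}"            # drop one trailing slash, if present
  printf '%s/v1/logs\n' "$base"
}

# POST an empty OTLP/JSON logs payload and print the HTTP status code.
# curl prints 000 when no HTTP response was received at all.
probe_collector() {
  curl -s -o /dev/null -w '%{http_code}' \
    --max-time 5 \
    -H 'Content-Type: application/json' \
    -d '{"resourceLogs":[]}' \
    "$(otlp_logs_url "$1")"
}
```

For example, `probe_collector http://observability-otlp-collector:4318` run inside the dev container prints a status code. A 2xx code suggests the network path and receiver are working, while 000 points to a DNS or network problem.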
Implementation Tasks
Phase 1: Dev Container Sidecar
- Install OTLP Collector binary in dev container
  - Download and install the `otelcol-contrib` binary
  - Add to PATH or install to `/usr/local/bin/`
- Create sidecar configuration file
  - Location: `/root/.local-dev/observability/collector-sidecar-config.yaml`
  - Configure `filelog` receivers for:
    - Nginx access logs (`/var/log/nginx/access.log`)
    - Nginx error logs (`/var/log/nginx/error.log`)
    - PowerDNS logs (`/var/log/pdns.log`)
    - dnsmasq logs (`/var/log/dnsmasq.log`)
  - Configure `journald` receiver for PostgreSQL
  - Set OTLP exporter to forward to main collector
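- Enable service logging
  - Configure dnsmasq: `log-queries` and `log-facility=/var/log/dnsmasq.log` in `/etc/dnsmasq.conf`
  - Configure PowerDNS: `log-dns-queries=yes` and `log-facility=/var/log/pdns.log` in `/etc/powerdns/pdns.conf` (to verify: PowerDNS's `logging-facility` setting selects a syslog LOCAL facility number rather than a file path, so writing to `/var/log/pdns.log` may need a syslog rule or a stderr redirect instead)
  - Verify Nginx logs to `/var/log/nginx/` (already configured)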
- Create startup script/service
  - Script to start sidecar collector as background process
  - Ensure it starts automatically on dev container startup
  - Handle collector restarts on failure
- Update network configuration
  - Ensure dev container can access main OTLP Collector
  - Use Docker network: `http://observability-otlp-collector:4318`
  - Or Docker bridge gateway IP if needed
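The Phase 1 configuration tasks above could translate into a collector config along the following lines. This is a sketch under stated assumptions, not a tested configuration. The log paths and exporter endpoint come from the task list, and the `service.name` values match the Grafana queries in Phase 3. The instance names after each `/`, the helper name `write_sidecar_config`, and the access-log regex (which assumes Nginx's default `combined` format) are illustrative. The `journald` receiver also assumes `journalctl` is available inside the dev container.

```shell
#!/bin/sh
# Write a sketch of the dev-container sidecar config to the given path.
# Receiver and processor instance names (the part after "/") are illustrative.
write_sidecar_config() {
  config_path="$1"
  mkdir -p "$(dirname "$config_path")"
  cat > "$config_path" <<'EOF'
receivers:
  filelog/nginx_access:
    include: [/var/log/nginx/access.log]
    resource:
      service.name: nginx
    operators:
      # Assumption: nginx writes its default "combined" access log format.
      - type: regex_parser
        regex: '^(?P<remote_addr>\S+) \S+ \S+ \[(?P<time_local>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<body_bytes_sent>\d+)'
  filelog/nginx_error:
    include: [/var/log/nginx/error.log]
    resource:
      service.name: nginx
  filelog/powerdns:
    include: [/var/log/pdns.log]
    resource:
      service.name: powerdns
  filelog/dnsmasq:
    include: [/var/log/dnsmasq.log]
    resource:
      service.name: dnsmasq
  journald:
    units: [postgresql]

processors:
  # journald entries get their service.name from a resource processor.
  resource/postgresql:
    attributes:
      - key: service.name
        value: postgresql
        action: upsert
  batch: {}

exporters:
  otlphttp:
    endpoint: http://observability-otlp-collector:4318

service:
  pipelines:
    logs/files:
      receivers: [filelog/nginx_access, filelog/nginx_error, filelog/powerdns, filelog/dnsmasq]
      processors: [batch]
      exporters: [otlphttp]
    logs/postgresql:
      receivers: [journald]
      processors: [resource/postgresql, batch]
      exporters: [otlphttp]
EOF
}
```

Calling `write_sidecar_config /root/.local-dev/observability/collector-sidecar-config.yaml` writes the file to the location named in the task list. The heredoc delimiter is quoted so the shell leaves the regex and YAML untouched.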
Phase 2: Gateway Container Sidecar
- Update gateway Dockerfile
  - Install OTLP Collector binary (`otelcol-contrib`)
  - Add collector config file to image: `/opt/gateway/observability/collector-config.yaml`
- Create gateway sidecar configuration
  - Configure `filelog` receivers for:
    - Nginx access logs (`/var/log/nginx/access.log`)
    - Nginx error logs (`/var/log/nginx/error.log`)
    - OpenVPN logs (`/tmp/ovpn-*/logs/openvpn.log`) - wildcard pattern for multiple VPN instances
  - Add resource attributes: `service.name`, `gateway_id`, `deployment.environment`
  - Set OTLP exporter to forward to main collector
- Update gateway entrypoint
  - Start collector sidecar after services are started
  - Run as background process: `otelcol-contrib --config=/opt/gateway/observability/collector-config.yaml &`
- Update gateway container network configuration
  - Ensure gateway containers can access main OTLP Collector
  - Add gateway containers to `observability-network` OR
  - Use Docker bridge and access collector via host IP
  - Update `gateway-pool.sh` if needed
    - Ensure network configuration allows collector access
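A gateway sidecar config covering the Phase 2 items might look like the sketch below. The log paths, the wildcard pattern, and the three resource attributes come from the task list, and the `service.name` values match the Phase 3 queries. The environment variable names `GATEWAY_ID` and `DEPLOYMENT_ENVIRONMENT`, the instance names, and the helper name `write_gateway_config` are assumptions made for illustration. The gateway entrypoint would need to export whatever variables the real setup uses.

```shell
#!/bin/sh
# Write a sketch of the gateway sidecar config to the given path.
# GATEWAY_ID and DEPLOYMENT_ENVIRONMENT are hypothetical variable names that
# the collector resolves at startup through its ${env:...} syntax.
write_gateway_config() {
  config_path="$1"
  mkdir -p "$(dirname "$config_path")"
  cat > "$config_path" <<'EOF'
receivers:
  filelog/nginx:
    include:
      - /var/log/nginx/access.log
      - /var/log/nginx/error.log
    resource:
      service.name: gateway-nginx
  filelog/openvpn:
    # The glob is re-evaluated on each poll, so logs from VPN instances
    # started later are picked up as well.
    include: [/tmp/ovpn-*/logs/openvpn.log]
    # Records the source file path so individual instances can be told apart.
    include_file_path: true
    resource:
      service.name: gateway-openvpn

processors:
  resource/gateway:
    attributes:
      - key: gateway_id
        value: ${env:GATEWAY_ID}
        action: upsert
      - key: deployment.environment
        value: ${env:DEPLOYMENT_ENVIRONMENT}
        action: upsert
  batch: {}

exporters:
  otlphttp:
    endpoint: http://observability-otlp-collector:4318

service:
  pipelines:
    logs:
      receivers: [filelog/nginx, filelog/openvpn]
      processors: [resource/gateway, batch]
      exporters: [otlphttp]
EOF
}
```

The Dockerfile could run `write_gateway_config /opt/gateway/observability/collector-config.yaml` at build time, or simply copy an equivalent static file into the image. Because the heredoc delimiter is quoted, the shell leaves `${env:...}` alone and the collector expands it when it starts.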
Phase 3: Testing & Verification
- Verify dev container sidecar is running
  - Check process: `ps aux | grep otelcol-contrib`
  - Check logs for errors
- Verify gateway container sidecars are running
  - Check in each gateway container: `docker exec <container> ps aux | grep otelcol-contrib`
- Verify logs appear in Grafana
  - Query: `{service_name="nginx"}`
  - Query: `{service_name="dnsmasq"}`
  - Query: `{service_name="powerdns"}`
  - Query: `{service_name="postgresql"}`
  - Query: `{service_name="gateway-nginx"}`
  - Query: `{service_name="gateway-openvpn"}`
- Test log parsing
  - Verify Nginx access logs are parsed correctly (remote_addr, method, path, status, etc.)
  - Verify error logs include severity levels
  - Verify gateway logs include `gateway_id` attribute
- Test log volume and performance
  - Monitor collector resource usage
  - Verify no performance degradation
  - Check log delivery latency
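The Grafana checks above could also be scripted against Loki's HTTP API. The sketch below assumes `curl` is installed and that Loki is reachable at the base URL passed in; that URL and the helper names are illustrative. It relies on `query_range` defaulting to the last hour when no time range is given, and treats the presence of a `"stream"` entry in the response as evidence that at least one log line matched.

```shell
#!/bin/sh
# Service names taken from the Grafana queries listed in Phase 3.
SERVICES="nginx dnsmasq powerdns postgresql gateway-nginx gateway-openvpn"

# Print the LogQL stream selector for one service.
logql_selector() {
  printf '{service_name="%s"}\n' "$1"
}

# Succeed if Loki returns at least one stream for the service.
# $1 = Loki base URL (an assumption, e.g. http://localhost:3100), $2 = service name.
loki_has_logs() {
  curl -sG --max-time 10 "$1/loki/api/v1/query_range" \
    --data-urlencode "query=$(logql_selector "$2")" \
    --data-urlencode "limit=1" | grep -q '"stream"'
}

# Report every service and return non-zero if any of them has no logs.
check_all_services() {
  status=0
  for name in $SERVICES; do
    if loki_has_logs "$1" "$name"; then
      echo "ok       $name"
    else
      echo "MISSING  $name"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_all_services http://localhost:3100` after generating some traffic gives a quick pass/fail view. It does not replace checking the parsed fields in Grafana.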
Phase 4: Documentation & Cleanup
- Update observability setup documentation
  - Document sidecar configuration
  - Document log locations and formats
  - Add troubleshooting section
- Create Grafana dashboard examples
  - Infrastructure service health dashboard
  - Log volume and error rate dashboards
- Add log retention configuration
  - Configure Loki retention policies
  - Document storage impact estimates
Acceptance Criteria
- All dev container service logs (PostgreSQL, Nginx, dnsmasq, PowerDNS) are visible in Grafana
- All gateway container service logs (Nginx, OpenVPN) are visible in Grafana
- Logs are properly labeled with `service.name` and other relevant attributes
- Gateway logs include `gateway_id` for filtering per-gateway logs
- Logs are parsed correctly (structured fields extracted)
- Sidecars start automatically and handle failures gracefully
- No performance degradation observed
- Documentation is updated with setup and troubleshooting information
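One way to meet the "start automatically and handle failures gracefully" criterion without a process supervisor is a small restart wrapper in the startup script or gateway entrypoint. The following is a minimal sketch; the function name `run_with_restart` and its arguments are illustrative. It treats a clean exit (status 0) as an intentional shutdown and gives up after a bounded number of restarts so that a broken config does not loop forever.

```shell
#!/bin/sh
# Run a command and restart it when it exits with a non-zero status.
# $1 = maximum number of restarts, $2 = delay in seconds between attempts,
# remaining arguments = the command to run.
run_with_restart() {
  max_restarts="$1"
  delay="$2"
  shift 2
  restarts=0
  while :; do
    # A clean exit is treated as an intentional stop.
    "$@" && return 0
    restarts=$((restarts + 1))
    if [ "$restarts" -gt "$max_restarts" ]; then
      echo "giving up after $max_restarts restarts: $1" >&2
      return 1
    fi
    sleep "$delay"
  done
}
```

In the gateway entrypoint this could be used as `run_with_restart 5 2 otelcol-contrib --config=/opt/gateway/observability/collector-config.yaml &`, which keeps the background invocation described in Phase 2 while adding bounded retries.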
Log Locations Reference
Dev Container:
- Nginx: `/var/log/nginx/access.log`, `/var/log/nginx/error.log`
- PowerDNS: `/var/log/pdns.log` (after configuration)
- dnsmasq: `/var/log/dnsmasq.log` (after configuration)
- PostgreSQL: Systemd journal (`journalctl -u postgresql`)
Gateway Containers:
- Nginx: `/var/log/nginx/access.log`, `/var/log/nginx/error.log`
- OpenVPN: `/tmp/ovpn-{random}/logs/openvpn.log` (per VPN instance)
Related Documentation
- See `doc/guides/INFRASTRUCTURE_LOGS_OBSERVABILITY.md` for detailed analysis and configuration examples
- See `scripts/local-dev/OBSERVABILITY_SETUP.md` for observability stack setup
Notes
- The OTLP Collector sidecar approach was chosen to keep a unified pipeline (all logs via OTLP)
- Alternative solutions (Promtail, Fluent Bit) were considered, but the OTLP Collector integrates better with the existing stack
- Log volume considerations: Nginx access logs can be high volume; consider sampling or filtering if needed
- Gateway containers may run multiple OpenVPN instances, so the sidecar must handle wildcard log paths