A high-throughput C++17 log aggregation engine that collects, streams, and stores logs from distributed nodes. Ten worker threads simulate independent distributed nodes, each generating 100,000 log entries for a total of 1 million events per run. Logs are streamed in real time to Apache Kafka via librdkafka, then batch-compressed with zlib and uploaded to AWS S3 -- providing an end-to-end pipeline from log generation to durable, cost-efficient cloud storage.
Security monitoring depends on reliable, low-latency log pipelines. Intrusion detection systems, SIEM platforms, and forensic analysis tools are only as good as the data feeding them. Dropped or delayed logs create blind spots that attackers exploit. This project builds that foundational infrastructure from scratch in C++, exercising the same concurrency, compression, and cloud-integration patterns found in production security tooling.
+---------------------+ +--------------------+ +------------------+ +-----------+
| Simulated Nodes (10)| ---> | Thread Pool | ---> | Mutex-Protected | ---> | Kafka |
| 100K logs each | | std::thread x 10 | | Shared Vector | | Producer |
+---------------------+ +--------------------+ +------------------+ +-----------+
|
v
+------------------+
| zlib Compression |
| (DEFLATE) |
+------------------+
|
v
+------------------+
| AWS S3 Upload |
| (PutObject) |
+------------------+
Flow: Worker threads produce logs concurrently into a mutex-protected shared vector and fire async messages to Kafka. After all threads are joined, the main thread compresses the aggregated logs and performs a single batch upload to S3.
- 10 worker threads spawned via `std::thread`, each representing a simulated distributed node.
- Master-worker pattern: the main thread creates the pool, joins all workers, then handles the S3 upload.
- Each worker generates 100,000 log entries (1 million total across all workers).
- A shared `std::vector<std::string>` serves as the central log buffer.
- Access is protected by a `std::mutex` using `std::lock_guard` for RAII-based automatic lock management.
- Producer-consumer pattern: worker threads produce log entries into the shared buffer; the S3 upload path consumes the aggregated result after all producers complete.
- Uses the librdkafka C++ bindings (`RdKafka::Producer`).
- Bootstrap servers configured at `localhost:9092`.
- Topic: `"logs"` with automatic partitioning (`RD_KAFKA_PARTITION_UA`).
- Produce calls are async and non-blocking, issued outside the critical section to avoid holding the mutex during I/O.
- Fire-and-forget delivery model for maximum throughput.
- The AWS SDK for C++ (`Aws::S3::S3Client`) handles cloud storage.
- Logs are compressed with zlib DEFLATE at `Z_DEFAULT_COMPRESSION` (level 6) before upload.
- A single `PutObject` call uploads the entire batch of concatenated, compressed logs after all threads have completed.
| Metric | Value |
|---|---|
| Total throughput | ~1,000,000 log events per run |
| Concurrent streams | 10 |
| Kafka produce latency | < 100 ms |
| Compression savings | ~25% storage cost reduction |
- `Rate_Limiter` -- token bucket and sliding window rate limiting library.

Together these two projects cover core security infrastructure primitives: reliable log ingestion and request rate control.
- `g++` with C++17 support
- `libssl-dev`, `zlib1g-dev`, `librdkafka-dev`
- AWS SDK for C++ (S3 module)
- A running Kafka broker on `localhost:9092`
- AWS credentials configured (`~/.aws/credentials` or environment variables)
```sh
sudo apt install g++ cmake libssl-dev zlib1g-dev librdkafka-dev
g++ -std=c++17 src/log_aggregator.cpp -o log_aggregator -laws-cpp-sdk-s3 -laws-cpp-sdk-core -lrdkafka++ -lz -pthread
./log_aggregator
```

This project exercises skills directly applicable to security engineering roles:
- Systems-level memory management -- manual control over buffers, compression streams, and SDK resources in C++.
- Concurrent programming primitives -- mutexes, lock guards, and thread joins that mirror the patterns used in high-performance security tooling.
- Security infrastructure fundamentals -- building the log pipeline that intrusion detection, SIEM, and incident response systems depend on.
This project is licensed under the GPL-3.0 License.