A high-throughput C++17 log aggregation engine that collects, streams, and stores logs from distributed nodes. Ten worker threads simulate independent distributed nodes, each generating 100,000 log entries for a total of 1 million events per run. Logs are streamed in real time to Apache Kafka via librdkafka, then batch-compressed with zlib and uploaded to AWS S3 -- providing an end-to-end pipeline from log generation to durable, cost-efficient cloud storage.
Security monitoring depends on reliable, low-latency log pipelines. Intrusion detection systems, SIEM platforms, and forensic analysis tools are only as good as the data feeding them. Dropped or delayed logs create blind spots that attackers exploit. This project builds that foundational infrastructure from scratch in C++, exercising the same concurrency, compression, and cloud-integration patterns found in production security tooling.
+---------------------+ +--------------------+ +------------------+ +-----------+
| Simulated Nodes (10)| ---> | Thread Pool | ---> | Mutex-Protected | ---> | Kafka |
| 100K logs each | | std::thread x 10 | | Shared Vector | | Producer |
+---------------------+ +--------------------+ +------------------+ +-----------+
|
v
+------------------+
| zlib Compression |
| (DEFLATE) |
+------------------+
|
v
+------------------+
| AWS S3 Upload |
| (PutObject) |
+------------------+
Flow: Worker threads produce logs concurrently into a mutex-protected shared vector and fire async messages to Kafka. After all threads are joined, the main thread compresses the aggregated logs and performs a single batch upload to S3.
- 10 worker threads spawned via `std::thread`, each representing a simulated distributed node.
- Master-worker pattern: the main thread creates the pool, joins all workers, then handles the S3 upload.
- Each worker generates 100,000 log entries (1 million total across all workers).
- A shared `std::vector<std::string>` serves as the central log buffer.
- Access is protected by a `std::mutex` using `std::lock_guard` for RAII-based automatic lock management.
- Producer-consumer pattern: worker threads produce log entries into the shared buffer; the S3 upload path consumes the aggregated result after all producers complete.
- Uses the librdkafka C++ bindings (`RdKafka::Producer`).
- Bootstrap servers configured at `localhost:9092`.
- Topic: `"logs"` with automatic partitioning (`RD_KAFKA_PARTITION_UA`).
- Produce calls are async and non-blocking, issued outside the critical section to avoid holding the mutex during I/O.
- Fire-and-forget delivery model for maximum throughput.
- The AWS SDK for C++ (`Aws::S3::S3Client`) handles cloud storage.
- Logs are compressed with zlib DEFLATE at `Z_DEFAULT_COMPRESSION` (level 6) before upload.
- A single `PutObject` call uploads the entire batch of concatenated, compressed logs after all threads have completed.
| Metric | Value |
|---|---|
| Total throughput | ~1,000,000 log events per run |
| Concurrent streams | 10 |
| Kafka produce latency | < 100 ms |
| Compression savings | ~25% storage cost reduction |
- `Rate_Limiter` -- token bucket and sliding window rate limiting library.

Together these two projects cover core security infrastructure primitives: reliable log ingestion and request rate control.
- `g++` with C++17 support
- `libssl-dev`, `zlib1g-dev`, `librdkafka-dev`
- AWS SDK for C++ (S3 module)
- A running Kafka broker on `localhost:9092`
- AWS credentials configured (`~/.aws/credentials` or environment variables)
```sh
sudo apt install g++ cmake libssl-dev zlib1g-dev librdkafka-dev
g++ -std=c++17 src/log_aggregator.cpp -o log_aggregator -laws-cpp-sdk-s3 -laws-cpp-sdk-core -lrdkafka++ -lz -pthread
./log_aggregator
```

This project exercises skills directly applicable to security engineering roles:
- Systems-level memory management -- manual control over buffers, compression streams, and SDK resources in C++.
- Concurrent programming primitives -- mutexes, lock guards, and thread joins that mirror the patterns used in high-performance security tooling.
- Security infrastructure fundamentals -- building the log pipeline that intrusion detection, SIEM, and incident response systems depend on.
This project is licensed under the GPL-3.0 License.