Mananshah237/Log-Aggregator

Real-Time Log Aggregator for Distributed Systems

A high-throughput C++17 log aggregation engine that collects, streams, and stores logs from distributed nodes. Ten worker threads simulate independent distributed nodes, each generating 100,000 log entries for a total of 1 million events per run. Logs are streamed in real time to Apache Kafka via librdkafka, then batch-compressed with zlib and uploaded to AWS S3 -- providing an end-to-end pipeline from log generation to durable, cost-efficient cloud storage.

Why Build This?

Security monitoring depends on reliable, low-latency log pipelines. Intrusion detection systems, SIEM platforms, and forensic analysis tools are only as good as the data feeding them. Dropped or delayed logs create blind spots that attackers exploit. This project builds that foundational infrastructure from scratch in C++, exercising the same concurrency, compression, and cloud-integration patterns found in production security tooling.

Architecture

+---------------------+        +--------------------+        +------------------+        +-----------+
| Simulated Nodes (10)|  --->  | Thread Pool        |  --->  | Mutex-Protected  |  --->  | Kafka     |
| 100K logs each      |        | std::thread x 10   |        | Shared Vector    |        | Producer  |
+---------------------+        +--------------------+        +------------------+        +-----------+
                                                                      |
                                                                      v
                                                              +------------------+
                                                              | zlib Compression |
                                                              | (DEFLATE)        |
                                                              +------------------+
                                                                      |
                                                                      v
                                                              +------------------+
                                                              | AWS S3 Upload    |
                                                              | (PutObject)      |
                                                              +------------------+

Flow: Worker threads produce logs concurrently into a mutex-protected shared vector and fire async messages to Kafka. After all threads are joined, the main thread compresses the aggregated logs and performs a single batch upload to S3.

Technical Design

Threading Model

  • 10 worker threads spawned via std::thread, each representing a simulated distributed node.
  • Master-worker pattern: the main thread creates the pool, joins all workers, then handles the S3 upload.
  • Each worker generates 100,000 log entries (1 million total across all workers).

Synchronization

  • A shared std::vector<std::string> serves as the central log buffer.
  • Access is protected by a std::mutex using std::lock_guard for RAII-based automatic lock management.
  • Producer-consumer pattern: worker threads produce log entries into the shared buffer; the S3 upload path consumes the aggregated result after all producers complete.

Kafka Integration

  • Uses librdkafka C++ bindings (RdKafka::Producer).
  • Bootstrap servers configured at localhost:9092.
  • Topic: "logs" with automatic partitioning (RD_KAFKA_PARTITION_UA).
  • Produce calls are async and non-blocking, issued outside the critical section to avoid holding the mutex during I/O.
  • Fire-and-forget delivery model for maximum throughput.
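The produce path described above might look roughly like the following sketch using the librdkafka C++ API. The broker address and topic name come from this README; error handling is minimal, and the program needs librdkafka installed and a broker running to do anything useful:

```cpp
#include <iostream>
#include <string>
#include <librdkafka/rdkafkacpp.h>

int main() {
    std::string errstr;
    RdKafka::Conf* conf = RdKafka::Conf::create(RdKafka::Conf::CONF_GLOBAL);
    conf->set("bootstrap.servers", "localhost:9092", errstr);

    RdKafka::Producer* producer = RdKafka::Producer::create(conf, errstr);
    if (!producer) { std::cerr << errstr << "\n"; return 1; }

    std::string payload = "node-0 event-0";
    // Async, non-blocking produce: enqueues the message and returns
    // immediately (fire-and-forget; no delivery callback registered).
    RdKafka::ErrorCode err = producer->produce(
        "logs",                          // topic
        RD_KAFKA_PARTITION_UA,           // unassigned: broker picks the partition
        RdKafka::Producer::RK_MSG_COPY,  // librdkafka copies the payload
        const_cast<char*>(payload.data()), payload.size(),
        nullptr, 0,                      // no message key
        0,                               // timestamp (0 = current time)
        nullptr);                        // no per-message opaque
    if (err != RdKafka::ERR_NO_ERROR)
        std::cerr << RdKafka::err2str(err) << "\n";

    producer->flush(5000);  // on shutdown, wait up to 5 s for in-flight messages
    delete producer;
    delete conf;
    return 0;
}
```

Holding no lock across produce() matters: the call can touch internal queues and sockets, so keeping it outside the critical section keeps other workers from stalling.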

S3 Storage & Compression

  • AWS SDK for C++ (Aws::S3::S3Client) handles cloud storage.
  • Logs are compressed using zlib DEFLATE at Z_DEFAULT_COMPRESSION (level 6) before upload.
  • A single PutObject call uploads the entire batch of concatenated, compressed logs after all threads have completed.

Performance

Metric                  Value
----------------------  -----------------------------
Total throughput        ~1,000,000 log events per run
Concurrent streams      10
Kafka produce latency   < 100 ms
Compression savings     ~25% storage cost reduction

Companion Project

Rate_Limiter -- Token bucket and sliding window rate limiting library. Together these two projects cover core security infrastructure primitives: reliable log ingestion and request rate control.

Build & Run

Prerequisites

  • g++ with C++17 support
  • libssl-dev
  • zlib1g-dev
  • librdkafka-dev
  • AWS SDK for C++ (S3 module)
  • A running Kafka broker on localhost:9092
  • AWS credentials configured (~/.aws/credentials or environment variables)

Install Dependencies (Debian/Ubuntu)

sudo apt install g++ cmake libssl-dev zlib1g-dev librdkafka-dev
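The apt line covers everything except the AWS SDK for C++, which is not packaged on Debian/Ubuntu. One common approach is to build just the S3 module from source (BUILD_ONLY is a documented SDK CMake option; directory names here are arbitrary):

# Build and install only the S3 module of the AWS SDK for C++
git clone --recurse-submodules https://github.com/aws/aws-sdk-cpp.git
cmake -S aws-sdk-cpp -B aws-sdk-build -DBUILD_ONLY="s3" -DCMAKE_BUILD_TYPE=Release
cmake --build aws-sdk-build -j"$(nproc)"
sudo cmake --install aws-sdk-build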

Compile

g++ -std=c++17 src/log_aggregator.cpp -o log_aggregator -laws-cpp-sdk-s3 -laws-cpp-sdk-core -lrdkafka++ -lz -pthread

Run

./log_aggregator

Security Relevance

This project exercises skills directly applicable to security engineering roles:

  • Systems-level memory management -- manual control over buffers, compression streams, and SDK resources in C++.
  • Concurrent programming primitives -- mutexes, lock guards, and thread joins that mirror the patterns used in high-performance security tooling.
  • Security infrastructure fundamentals -- building the log pipeline that intrusion detection, SIEM, and incident response systems depend on.

License

This project is licensed under the GPL-3.0 License.
