MapReduce (learning project)

A minimal MapReduce implementation (master + workers) used for learning and demos. The repo contains a read-only web dashboard for monitoring, gRPC-based master/worker communication, and example scripts for demos and benchmarks.

For a deeper narrative and design notes, see Published Article.

Quick demo (recommended)

Prerequisites

Docker & Docker Compose

Start the demo (master + 3 workers + dashboard)

docker-compose -f docker-compose.benchmark.yml up --build

Open the dashboard at: http://localhost:5000

Stop the demo

docker-compose -f docker-compose.benchmark.yml down

Automated benchmark

Run automated benchmarks (no dashboard):

python3 examples/benchmark.py

This runs three tests: 1-worker baseline, 3-worker parallel run, and a failure-recovery test. Results are printed and saved to benchmark_results.txt.
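
To compare the 1-worker baseline with the 3-worker run, a common approach is to compute speedup and parallel efficiency from the two wall-clock times. The sketch below is not part of examples/benchmark.py; the function names are hypothetical and the timings are placeholder values, not measured results.

```python
def speedup(baseline_seconds: float, parallel_seconds: float) -> float:
    """How many times faster the parallel run finished than the baseline."""
    return baseline_seconds / parallel_seconds


def efficiency(baseline_seconds: float, parallel_seconds: float, workers: int) -> float:
    """Speedup divided by worker count; 1.0 would be perfect linear scaling."""
    return speedup(baseline_seconds, parallel_seconds) / workers


# Placeholder timings for illustration only (not real benchmark output).
baseline = 12.0   # seconds with 1 worker
parallel = 6.0    # seconds with 3 workers

print(f"speedup:    {speedup(baseline, parallel):.2f}x")
print(f"efficiency: {efficiency(baseline, parallel, workers=3):.2f}")
```

Efficiency well below 1.0 is expected on small inputs, where scheduling and shuffle overhead dominate the actual map and reduce work.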

Project layout (short)

protos/ — gRPC .proto definitions
src/master/ — master server, scheduler, and state management
src/worker/ — worker client and task executor
web/ — Flask-based read-only dashboard and templates
examples/ — demo and benchmark scripts
docker-compose*.yml — compose files for demo, benchmark, and DAG runs
docs/WRITEUP.md — detailed writeup and design notes

How it works (TL;DR)

Master splits the input into map tasks.
Workers register and request tasks from the master via gRPC.
Mappers write partitioned intermediate files (one file per reducer partition).
Reducers fetch intermediate files for their partition (the shuffle) and produce final output.
The master exposes /status which the dashboard polls for live updates.
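
The map, partition, and shuffle steps above can be sketched in a single process. This is a minimal illustration using word count, assuming keys are assigned to reducers by a stable hash modulo the number of reducers; the function names are hypothetical and the repo's actual partitioning scheme and file layout may differ. In-memory lists stand in for the intermediate files.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_reducers):
    """Pick a reducer partition for a key using a stable hash."""
    # zlib.crc32 gives the same value in every process, unlike the built-in
    # hash(), which is randomized per interpreter run for strings.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_word_count(split, num_reducers):
    """Map task: emit (word, 1) pairs, grouped by reducer partition."""
    partitions = defaultdict(list)
    for word in split.lower().split():
        partitions[partition_for(word, num_reducers)].append((word, 1))
    return dict(partitions)


def reduce_partition(pairs_from_all_mappers):
    """Reduce task: sum the counts for every word in one partition."""
    counts = defaultdict(int)
    for pairs in pairs_from_all_mappers:
        for word, n in pairs:
            counts[word] += n
    return dict(counts)


def run_job(splits, num_reducers):
    """Run all map tasks, then shuffle and reduce each partition."""
    map_outputs = [map_word_count(split, num_reducers) for split in splits]
    result = {}
    for r in range(num_reducers):
        # Shuffle: reducer r gathers partition r from every mapper's output.
        shuffled = [output.get(r, []) for output in map_outputs]
        result.update(reduce_partition(shuffled))
    return result


print(run_job(["a b a", "b c"], num_reducers=2))
```

Because every occurrence of a word hashes to the same partition, each reducer sees all counts for its words and no cross-reducer merge is needed.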

Key points

The web dashboard is read-only and monitors the master's /status endpoint.
There is no master failover implemented (master is a single point of failure).
Tasks use sequence numbers and completion tracking to tolerate duplicate reports from slow or restarted workers.
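
One way sequence numbers and completion tracking can work together is shown below. This is a sketch of a possible policy, not the code in src/master/: the class and method names are hypothetical, and it assumes the master bumps a per-task sequence number on every assignment and accepts only a report that carries the current number.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TaskState:
    sequence: int = 0              # bumped on every (re)assignment
    worker: Optional[str] = None   # worker holding the current assignment
    completed: bool = False


class CompletionTracker:
    """Hypothetical tracker that ignores duplicate and stale reports."""

    def __init__(self) -> None:
        self.tasks: Dict[str, TaskState] = {}

    def assign(self, task_id: str, worker: str) -> int:
        """Assign (or reassign) a task and return its new sequence number."""
        state = self.tasks.setdefault(task_id, TaskState())
        state.sequence += 1
        state.worker = worker
        return state.sequence

    def report_done(self, task_id: str, sequence: int) -> bool:
        """Return True only for the first report matching the current assignment."""
        state = self.tasks.get(task_id)
        if state is None or state.completed:
            return False   # unknown task, or a duplicate of an accepted report
        if sequence != state.sequence:
            return False   # stale report from a superseded assignment
        state.completed = True
        return True


tracker = CompletionTracker()
old = tracker.assign("map-0", "worker1")   # worker1 is slow
new = tracker.assign("map-0", "worker2")   # task reassigned to worker2
print(tracker.report_done("map-0", old))   # stale report is ignored
print(tracker.report_done("map-0", new))   # current report is accepted
print(tracker.report_done("map-0", new))   # repeat report is ignored
```

Rejecting stale reports keeps the bookkeeping simple. An alternative policy would accept the first valid result from any worker, since map and reduce outputs are deterministic.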

Development (without Docker)

Install dependencies and generate protobufs

pip install -r requirements.txt
./scripts/generate_proto.sh

Start master

python -m src.master.server --input data/input/sample.txt

Start a worker (in another terminal)

python -m src.worker.client worker1

Useful commands

Demo: docker-compose -f docker-compose.benchmark.yml up --build
Stop demo: docker-compose -f docker-compose.benchmark.yml down
Run benchmarks: python3 examples/benchmark.py
Regenerate protobufs: ./scripts/generate_proto.sh

Simple manual run (no benchmark)

Follow these steps to run the master and workers manually in separate terminals. This is useful for demos, because you can stop an individual worker by closing its terminal.

Install dependencies

pip install -r requirements.txt

Generate Protobuf code

chmod +x scripts/generate_proto.sh
./scripts/generate_proto.sh

Start Master (new terminal)

python -m src.master.server

Start Workers (each in its own terminal; you can stop any worker by closing that terminal)

python -m src.worker.client worker1
python -m src.worker.client worker2
python -m src.worker.client worker3

Simulate Worker Failure (unstable worker)

Start a worker that deliberately fails after completing N tasks. The example below starts one that fails after 2 tasks:

python -m src.worker.client worker_unstable --fail-after 2
# This worker will exit/fail after finishing 2 tasks; watch master logs for reassignment

Notes

Closing a worker's terminal (or killing its process) simulates a crash: the master detects the failure via missed heartbeats and reassigns that worker's tasks.
The examples/benchmark.py script automates bringing up/down workers for repeatable tests; use the manual steps above for ad-hoc experimentation.
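
Heartbeat-based failure detection can be sketched as a table of last-seen timestamps plus a periodic sweep. The code below is an illustration only: the class name, the 10-second default timeout, and the data structures are assumptions, and the real logic in src/master/monitor.py may differ. The clock is injectable so the behavior can be exercised without sleeping.

```python
import time


class HeartbeatMonitor:
    """Hypothetical monitor that finds workers with overdue heartbeats."""

    def __init__(self, timeout_seconds=10.0, clock=time.monotonic):
        self.timeout = timeout_seconds   # assumed value, not taken from the repo
        self.clock = clock
        self.last_seen = {}              # worker_id -> time of last heartbeat
        self.running = {}                # worker_id -> set of assigned task ids

    def heartbeat(self, worker_id):
        """Record that a worker is alive right now."""
        self.last_seen[worker_id] = self.clock()
        self.running.setdefault(worker_id, set())

    def assign(self, worker_id, task_id):
        """Remember which tasks a worker is currently running."""
        self.running.setdefault(worker_id, set()).add(task_id)

    def reap(self):
        """Drop overdue workers and return their task ids for reassignment."""
        now = self.clock()
        orphaned = []
        for worker_id, seen in list(self.last_seen.items()):
            if now - seen > self.timeout:
                orphaned.extend(sorted(self.running.pop(worker_id, set())))
                del self.last_seen[worker_id]
        return orphaned


# Simulated clock: worker1 stops sending heartbeats, worker2 keeps going.
now = [0.0]
monitor = HeartbeatMonitor(timeout_seconds=5.0, clock=lambda: now[0])
monitor.heartbeat("worker1")
monitor.heartbeat("worker2")
monitor.assign("worker1", "map-0")
monitor.assign("worker2", "map-1")
now[0] = 4.0
monitor.heartbeat("worker2")
now[0] = 6.0
print(monitor.reap())   # tasks that need a new worker
```

A master would call reap() on a timer and put the returned task ids back on the pending queue.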

Notes & limitations

Master failover is not implemented — the master is a single point of failure.
The dashboard is monitoring-only and does not expose control endpoints.
This project is intended for learning and small-scale demos, not production use.

Troubleshooting

Workers not connecting

# Check if master is running
docker-compose ps master

# Restart the cluster
./scripts/run_cluster.sh restart

No tasks being processed

# Check master logs for task creation
docker-compose logs master | grep "Created"

# Verify workers are registered
docker-compose logs master | grep "registered"

Protobuf import errors

# Regenerate protobuf code
./scripts/generate_proto.sh

Key Files

protos/mapreduce.proto: gRPC service definitions
src/master/server.py: Master coordination logic
src/master/monitor.py: Fault detection mechanism
src/worker/client.py: Worker implementation
web/app.py: Flask web dashboard backend
web/templates/index.html: Dashboard UI
docker-compose.yml: Basic multi-container orchestration
docker-compose.benchmark.yml: Full demo with web dashboard

Metrics Tracked

Workers registered/failed
Tasks pending/running/completed/failed
Tasks per worker
Heartbeat status
Task reassignment on failure
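
As a rough illustration of how the task counters above could be derived, the sketch below aggregates a list of task records. The record shape (dicts with "state" and "worker" keys) and the function name are hypothetical; the actual /status payload is defined by the master and may look different.

```python
from collections import Counter


def summarize_tasks(tasks):
    """Count tasks by state and completed tasks per worker."""
    by_state = Counter(task["state"] for task in tasks)
    per_worker = Counter(
        task["worker"]
        for task in tasks
        if task["state"] == "completed" and task.get("worker")
    )
    return {"tasks": dict(by_state), "tasks_per_worker": dict(per_worker)}


# Hypothetical task records for illustration.
example_tasks = [
    {"id": "map-0", "state": "completed", "worker": "worker1"},
    {"id": "map-1", "state": "completed", "worker": "worker2"},
    {"id": "map-2", "state": "running", "worker": "worker1"},
    {"id": "reduce-0", "state": "pending", "worker": None},
]
print(summarize_tasks(example_tasks))
```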
