A high-performance parallel image processing pipeline using Dask for Shared Memory Parallelism (SMP) on multi-core systems. This implementation demonstrates efficient utilization of CPU cores for batch image processing tasks with automatic workload distribution.
Modern image processing applications require efficient handling of large image datasets within strict time constraints. This project implements automatic parallelization of image processing tasks using Dask, achieving:
- 1.99× speedup for I/O-bound operations (simple resize)
- 3.56× speedup for CPU-intensive operations (filters + transformations)
- Zero failures processing 10,000+ images
- 72% time reduction for complex processing tasks
- Automatic Parallelization: Leverages all available CPU cores without manual thread management
- Dual Processing Modes: Optimized schedulers for both I/O-bound and CPU-bound workloads
- Batch Processing: Handles thousands of images efficiently
- Performance Metrics: Built-in benchmarking and comparison tools
- Validation Tools: Automated verification of processing accuracy
- Error Handling: Robust error management with detailed reporting
- Cross-Platform: Works on Windows, Linux, and macOS
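The features above all rest on one small Dask pattern: wrap a per-image function with `delayed`, then run every task at once with `compute`. A minimal sketch (the `process_image` here is a placeholder for the real load → resize → save work):

```python
# Minimal sketch of the delayed/compute pattern this project relies on.
from dask import delayed, compute

def process_image(n):
    # Stand-in for real per-image work (load -> resize -> save).
    return n * n

# Build a lazy task graph: nothing executes yet.
tasks = [delayed(process_image)(i) for i in range(8)]

# Execute all tasks in parallel across worker threads.
results = compute(*tasks, scheduler='threads', num_workers=4)
print(results)  # (0, 1, 4, 9, 16, 25, 36, 49)
```

No manual thread management is needed: Dask's scheduler assigns pending tasks to idle workers automatically.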
- Installation
- Quick Start
- Usage
- Project Structure
- Performance Results
- Configuration
- Technical Details
- Troubleshooting
- Contributing
- License
- Python 3.7 or higher
- pip package manager
1. Clone the repository

   ```bash
   git clone https://github.com/yourusername/parallel-image-processing.git
   cd parallel-image-processing
   ```

2. Install required dependencies

   ```bash
   pip install -r requirements.txt
   ```

   Or install manually:

   ```bash
   pip install dask pillow numpy
   ```

3. Verify installation

   ```bash
   python -c "import dask, PIL, numpy; print('All dependencies installed successfully!')"
   ```
Process 10,000 images in 4 simple steps:

```bash
# Step 1: Generate test dataset (10,000 images)
python dummy_image_gen.py

# Step 2: Run I/O-bound processing (basic resize)
python new.py

# Step 3: Run CPU-intensive processing (filters + enhancements)
python cpu_intensive.py

# Step 4: Verify results
python verify.py
```

Expected output:

```
Total images found: 10000
Sequential Processing Time: 26.79 seconds
Parallel Processing Time: 13.44 seconds
Speedup Achieved: 1.99x faster

✅ All checks passed! Processing was successful.
```
Process your own images:

1. Place your images in a folder (e.g., `my_images/`)
2. Update the input folder in the script:

   ```python
   input_folder = "my_images"
   ```

3. Run the processing script:

   ```bash
   python new.py
   ```
Customize processing parameters:

```python
# In new.py or cpu_intensive.py

# Change output dimensions
img_resized = img.resize((512, 512), Image.Resampling.LANCZOS)

# Adjust number of workers
compute(*tasks, scheduler='threads', num_workers=16)

# Change output directory
output_folder = "my_output_folder"
```

`python new.py`
- Best for: Large batches, simple transformations
- Operations: Load → Resize → Save
- Scheduler: Thread-based
- Speedup: ~2× faster

`python cpu_intensive.py`
- Best for: Quality enhancement, complex filters
- Operations: Load → Resize → Filters → Matrix Operations → Save
- Scheduler: Process-based
- Speedup: ~3.5× faster
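The process-based mode differs from the thread-based one mainly in the scheduler argument and the required `__main__` guard. A sketch, assuming a hypothetical `heavy_op` in place of the real filter pipeline:

```python
from dask import delayed, compute

def heavy_op(n):
    # Stand-in for per-image filters and matrix operations.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    # The process scheduler requires the __main__ guard on Windows.
    tasks = [delayed(heavy_op)(10_000) for _ in range(4)]
    results = compute(*tasks, scheduler='processes', num_workers=2)
    print(len(results))
```

Because each worker is a separate process with its own interpreter, the GIL no longer serializes the computation, which is why this mode scales better on CPU-heavy work.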
```
parallel-image-processing/
├── dummy_image_gen.py          # Test dataset generator
├── new.py                      # I/O-bound parallel processing
├── cpu_intensive.py            # CPU-intensive parallel processing
├── verify.py                   # Result validation script
├── requirements.txt            # Python dependencies
├── README.md                   # This file
│
├── images/                     # Input images (generated)
│   ├── test_0.jpg
│   ├── test_1.jpg
│   └── ...
│
├── processed_seq/              # Sequential processing output
├── processed_par/              # Parallel processing output
├── processed_seq_intensive/    # Sequential CPU-intensive output
└── processed_par_intensive/    # Parallel CPU-intensive output
```
- CPU: 28 cores
- Dataset: 10,000 images (300-600px, random colors)
- Output: 256×256 pixels
- Platform: Windows with Python 3.13
| Metric | Sequential | Parallel | Improvement |
|---|---|---|---|
| Time | 26.79s | 13.44s | 13.35s saved |
| Speedup | 1.0× | 1.99× | 99% faster |
| Throughput | 373 img/s | 744 img/s | +99% |
| Efficiency | 100% | 7.1% | - |
| Metric | Sequential | Parallel | Improvement |
|---|---|---|---|
| Time | 51.23s | 14.38s | 36.85s saved |
| Speedup | 1.0× | 3.56× | 256% faster |
| Throughput | 195 img/s | 695 img/s | +256% |
| Efficiency | 100% | 12.7% | - |
```
I/O-Bound:     ████████████ 1.99×
CPU-Intensive: ████████████████████ 3.56× (79% better!)
```
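The speedup and efficiency figures in the tables above follow directly from the measured timings. As a sanity check:

```python
def speedup(t_seq, t_par):
    """How many times faster the parallel run is than the sequential run."""
    return t_seq / t_par

def efficiency(t_seq, t_par, workers):
    """Speedup normalized by worker count (1.0 would be perfect scaling)."""
    return speedup(t_seq, t_par) / workers

# Figures from the benchmark tables (28 cores).
print(round(speedup(26.79, 13.44), 2))         # 1.99  (I/O-bound)
print(round(efficiency(26.79, 13.44, 28), 3))  # 0.071 -> 7.1%
print(round(speedup(51.23, 14.38), 2))         # 3.56  (CPU-intensive)
```

The low efficiency values are expected: with 28 cores, I/O bandwidth (not CPU) limits how far the wall-clock time can drop.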
Target Image Size:

```python
# Change in process_image() function
img_resized = img.resize((512, 512), Image.Resampling.LANCZOS)
```

Number of Workers:

```python
# Auto-detect (recommended)
num_workers = os.cpu_count()

# Manual setting
num_workers = 16
```

Scheduler Type:

```python
# For I/O-bound tasks
scheduler = 'threads'

# For CPU-bound tasks
scheduler = 'processes'
```

Supported image formats:

- JPEG (`.jpg`, `.jpeg`)
- PNG (`.png`)
- BMP (`.bmp`)
- GIF (`.gif`)
```
Input Layer → Task Generation → Parallel Execution → Output Layer
      ↓              ↓                  ↓                 ↓
File Discovery   Dask Delayed       Scheduler     Result Aggregation
```
1. Lazy Task Graph: Create delayed tasks without immediate execution

   ```python
   tasks = [delayed(process_image)(path, output) for path in images]
   ```

2. Parallel Execution: Execute all tasks concurrently

   ```python
   results = compute(*tasks, scheduler='threads', num_workers=28)
   ```

3. Automatic Load Balancing: Dask distributes work across available cores
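The "zero failures" and detailed-reporting behavior listed in the features can be achieved by catching exceptions inside each task so that one bad image never aborts the batch. A sketch of that wrapper pattern (the `.bad` check merely simulates a corrupt file):

```python
from dask import delayed, compute

def safe_process(path):
    """Run one task and report failure instead of aborting the whole batch."""
    try:
        if path.endswith('.bad'):  # simulated corrupt input
            raise ValueError(f"cannot decode {path}")
        return ('ok', path)
    except Exception as exc:
        return ('error', path, str(exc))

tasks = [delayed(safe_process)(p) for p in ['a.jpg', 'b.bad', 'c.jpg']]
results = compute(*tasks, scheduler='threads')
failures = [r for r in results if r[0] == 'error']
print(len(failures))  # 1
```

Since every task returns a status tuple rather than raising, the aggregation step can count successes and failures and print a detailed report at the end.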
| Feature | Thread Scheduler | Process Scheduler |
|---|---|---|
| Best For | I/O operations | CPU computations |
| Overhead | Low | High |
| GIL Impact | Limited by GIL | Bypasses GIL |
| Memory | Shared | Replicated |
| Setup | Simple | Requires `if __name__ == '__main__'` guard |
1. RuntimeError: freeze_support() on Windows

   Problem: The process scheduler fails with a "freeze_support" error.

   Solution: Wrap the entry point in an `if __name__ == '__main__':` block:

   ```python
   if __name__ == '__main__':
       main()
   ```

2. ModuleNotFoundError: No module named 'PIL'

   Problem: Pillow is not installed.

   Solution:

   ```bash
   pip install pillow
   ```

3. Low Speedup (<1.5×)

   Problem: The task is I/O-bound and the disk is the bottleneck.

   Solutions:
   - Use faster storage (SSD/NVMe)
   - Reduce the number of workers to avoid I/O contention
   - Consider the CPU-intensive processing mode

4. High Memory Usage

   Problem: Processing large images exhausts RAM.

   Solutions:
   - Process in smaller batches
   - Reduce the number of workers
   - Use the thread scheduler (shared memory)
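"Process in smaller batches" can be as simple as slicing the task list and calling `compute()` once per slice, so only one batch's worth of images is in flight at a time. A sketch:

```python
def batches(items, size):
    """Yield successive fixed-size batches; the last batch may be smaller."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

images = [f"img_{i}.jpg" for i in range(10)]
print([len(b) for b in batches(images, 4)])  # [4, 4, 2]
```

Each batch would then be wrapped in delayed tasks and computed before the next batch is built, capping peak memory use.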
✅ Use SSD storage for faster I/O

✅ Match workers to cores (don't over-provision)

✅ Choose the correct scheduler for the workload type

✅ Monitor resource usage during execution

✅ Profile bottlenecks before optimizing