A high-performance parallel image processing pipeline using Dask for Shared Memory Parallelism (SMP) on multi-core systems. This implementation demonstrates efficient utilization of CPU cores for batch image processing tasks with automatic workload distribution.
Modern image processing applications require efficient handling of large image datasets within strict time constraints. This project implements automatic parallelization of image processing tasks using Dask, achieving:
- 1.99× speedup for I/O-bound operations (simple resize)
- 3.56× speedup for CPU-intensive operations (filters + transformations)
- Zero failures processing 10,000+ images
- 72% time reduction for complex processing tasks
- Automatic Parallelization: Leverages all available CPU cores without manual thread management
- Dual Processing Modes: Optimized schedulers for both I/O-bound and CPU-bound workloads
- Batch Processing: Handles thousands of images efficiently
- Performance Metrics: Built-in benchmarking and comparison tools
- Validation Tools: Automated verification of processing accuracy
- Error Handling: Robust error management with detailed reporting
- Cross-Platform: Works on Windows, Linux, and macOS
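The features above all rest on one small Dask pattern: wrap a per-image function with `delayed`, then run every task at once with `compute`. A minimal sketch (the `process_image` here is a placeholder for the real load → resize → save work):

```python
# Minimal sketch of the delayed/compute pattern this project relies on.
from dask import delayed, compute

def process_image(n):
    # Stand-in for real per-image work (load -> resize -> save).
    return n * n

# Build a lazy task graph: nothing executes yet.
tasks = [delayed(process_image)(i) for i in range(8)]

# Execute all tasks in parallel across worker threads.
results = compute(*tasks, scheduler='threads', num_workers=4)
print(results)  # (0, 1, 4, 9, 16, 25, 36, 49)
```

No manual thread management is needed: Dask's scheduler assigns pending tasks to idle workers automatically.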
- Installation
- Quick Start
- Usage
- Project Structure
- Performance Results
- Configuration
- Technical Details
- Troubleshooting
- Contributing
- License
- Python 3.7 or higher
- pip package manager
1. Clone the repository

   ```bash
   git clone https://github.com/yourusername/parallel-image-processing.git
   cd parallel-image-processing
   ```

2. Install required dependencies

   ```bash
   pip install -r requirements.txt
   ```

   Or install manually:

   ```bash
   pip install dask pillow numpy
   ```

3. Verify installation

   ```bash
   python -c "import dask, PIL, numpy; print('All dependencies installed successfully!')"
   ```
Process 10,000 images in 4 simple steps:

```bash
# Step 1: Generate test dataset (10,000 images)
python dummy_image_gen.py

# Step 2: Run I/O-bound processing (basic resize)
python new.py

# Step 3: Run CPU-intensive processing (filters + enhancements)
python cpu_intensive.py

# Step 4: Verify results
python verify.py
```

Expected output:

```
Total images found: 10000
Sequential Processing Time: 26.79 seconds
Parallel Processing Time: 13.44 seconds
Speedup Achieved: 1.99x faster

✅ All checks passed! Processing was successful.
```
Process your own images:

1. Place your images in a folder (e.g., `my_images/`)
2. Update the input folder in the script:

   ```python
   input_folder = "my_images"
   ```

3. Run the processing script:

   ```bash
   python new.py
   ```
Customize processing parameters:

```python
# In new.py or cpu_intensive.py

# Change output dimensions
img_resized = img.resize((512, 512), Image.Resampling.LANCZOS)

# Adjust number of workers
compute(*tasks, scheduler='threads', num_workers=16)

# Change output directory
output_folder = "my_output_folder"
```

`python new.py`
- Best for: Large batches, simple transformations
- Operations: Load → Resize → Save
- Scheduler: Thread-based
- Speedup: ~2× faster

`python cpu_intensive.py`
- Best for: Quality enhancement, complex filters
- Operations: Load → Resize → Filters → Matrix Operations → Save
- Scheduler: Process-based
- Speedup: ~3.5× faster
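The process-based mode differs from the thread-based one mainly in the scheduler argument and the required `__main__` guard. A sketch, assuming a hypothetical `heavy_op` in place of the real filter pipeline:

```python
from dask import delayed, compute

def heavy_op(n):
    # Stand-in for per-image filters and matrix operations.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    # The process scheduler requires the __main__ guard on Windows.
    tasks = [delayed(heavy_op)(10_000) for _ in range(4)]
    results = compute(*tasks, scheduler='processes', num_workers=2)
    print(len(results))
```

Because each worker is a separate process with its own interpreter, the GIL no longer serializes the computation, which is why this mode scales better on CPU-heavy work.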
```
parallel-image-processing/
├── dummy_image_gen.py          # Test dataset generator
├── new.py                      # I/O-bound parallel processing
├── cpu_intensive.py            # CPU-intensive parallel processing
├── verify.py                   # Result validation script
├── requirements.txt            # Python dependencies
├── README.md                   # This file
│
├── images/                     # Input images (generated)
│   ├── test_0.jpg
│   ├── test_1.jpg
│   └── ...
│
├── processed_seq/              # Sequential processing output
├── processed_par/              # Parallel processing output
├── processed_seq_intensive/    # Sequential CPU-intensive output
└── processed_par_intensive/    # Parallel CPU-intensive output
```
- CPU: 28 cores
- Dataset: 10,000 images (300-600px, random colors)
- Output: 256×256 pixels
- Platform: Windows with Python 3.13
| Metric | Sequential | Parallel | Improvement |
|---|---|---|---|
| Time | 26.79s | 13.44s | 13.35s saved |
| Speedup | 1.0× | 1.99× | 99% faster |
| Throughput | 373 img/s | 744 img/s | +99% |
| Efficiency | 100% | 7.1% | - |
| Metric | Sequential | Parallel | Improvement |
|---|---|---|---|
| Time | 51.23s | 14.38s | 36.85s saved |
| Speedup | 1.0× | 3.56× | 256% faster |
| Throughput | 195 img/s | 695 img/s | +256% |
| Efficiency | 100% | 12.7% | - |
```
I/O-Bound:     ████████████ 1.99×
CPU-Intensive: ████████████████████ 3.56× (79% better!)
```
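The speedup and efficiency figures in the tables above follow directly from the measured timings. As a sanity check:

```python
def speedup(t_seq, t_par):
    """How many times faster the parallel run is than the sequential run."""
    return t_seq / t_par

def efficiency(t_seq, t_par, workers):
    """Speedup normalized by worker count (1.0 would be perfect scaling)."""
    return speedup(t_seq, t_par) / workers

# Figures from the benchmark tables (28 cores).
print(round(speedup(26.79, 13.44), 2))         # 1.99  (I/O-bound)
print(round(efficiency(26.79, 13.44, 28), 3))  # 0.071 -> 7.1%
print(round(speedup(51.23, 14.38), 2))         # 3.56  (CPU-intensive)
```

The low efficiency values are expected: with 28 cores, I/O bandwidth (not CPU) limits how far the wall-clock time can drop.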
Target Image Size:

```python
# Change in process_image() function
img_resized = img.resize((512, 512), Image.Resampling.LANCZOS)
```

Number of Workers:

```python
# Auto-detect (recommended)
num_workers = os.cpu_count()

# Manual setting
num_workers = 16
```

Scheduler Type:

```python
# For I/O-bound tasks
scheduler = 'threads'

# For CPU-bound tasks
scheduler = 'processes'
```

Supported image formats:

- JPEG (`.jpg`, `.jpeg`)
- PNG (`.png`)
- BMP (`.bmp`)
- GIF (`.gif`)
```
Input Layer → Task Generation → Parallel Execution → Output Layer
      ↓              ↓                  ↓                 ↓
File Discovery   Dask Delayed       Scheduler     Result Aggregation
```
1. Lazy Task Graph: Create delayed tasks without immediate execution

   ```python
   tasks = [delayed(process_image)(path, output) for path in images]
   ```

2. Parallel Execution: Execute all tasks concurrently

   ```python
   results = compute(*tasks, scheduler='threads', num_workers=28)
   ```

3. Automatic Load Balancing: Dask distributes work across available cores
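The "zero failures" and detailed-reporting behavior listed in the features can be achieved by catching exceptions inside each task so that one bad image never aborts the batch. A sketch of that wrapper pattern (the `.bad` check merely simulates a corrupt file):

```python
from dask import delayed, compute

def safe_process(path):
    """Run one task and report failure instead of aborting the whole batch."""
    try:
        if path.endswith('.bad'):  # simulated corrupt input
            raise ValueError(f"cannot decode {path}")
        return ('ok', path)
    except Exception as exc:
        return ('error', path, str(exc))

tasks = [delayed(safe_process)(p) for p in ['a.jpg', 'b.bad', 'c.jpg']]
results = compute(*tasks, scheduler='threads')
failures = [r for r in results if r[0] == 'error']
print(len(failures))  # 1
```

Since every task returns a status tuple rather than raising, the aggregation step can count successes and failures and print a detailed report at the end.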
| Feature | Thread Scheduler | Process Scheduler |
|---|---|---|
| Best For | I/O operations | CPU computations |
| Overhead | Low | High |
| GIL Impact | Limited by GIL | Bypasses GIL |
| Memory | Shared | Replicated |
| Setup | Simple | Requires `if __name__ == '__main__'` guard |
1. RuntimeError: freeze_support() on Windows

   Problem: The process scheduler fails with a "freeze_support" error.

   Solution: Wrap the entry point in an `if __name__ == '__main__':` block:

   ```python
   if __name__ == '__main__':
       main()
   ```

2. ModuleNotFoundError: No module named 'PIL'

   Problem: Pillow is not installed.

   Solution:

   ```bash
   pip install pillow
   ```

3. Low Speedup (<1.5×)

   Problem: The task is I/O-bound and the disk is the bottleneck.

   Solutions:
   - Use faster storage (SSD/NVMe)
   - Reduce the number of workers to avoid I/O contention
   - Consider the CPU-intensive processing mode

4. High Memory Usage

   Problem: Processing large images exhausts RAM.

   Solutions:
   - Process in smaller batches
   - Reduce the number of workers
   - Use the thread scheduler (shared memory)
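"Process in smaller batches" can be as simple as slicing the task list and calling `compute()` once per slice, so only one batch's worth of images is in flight at a time. A sketch:

```python
def batches(items, size):
    """Yield successive fixed-size batches; the last batch may be smaller."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

images = [f"img_{i}.jpg" for i in range(10)]
print([len(b) for b in batches(images, 4)])  # [4, 4, 2]
```

Each batch would then be wrapped in delayed tasks and computed before the next batch is built, capping peak memory use.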
✅ Use SSD storage for faster I/O

✅ Match workers to cores (don't over-provision)

✅ Choose the correct scheduler for the workload type

✅ Monitor resource usage during execution

✅ Profile bottlenecks before optimizing