Skip to content

AlaqmarG/OptiSack

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

26 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

OptiSack: Parallel Branch-and-Bound Knapsack Solver

License C++ OpenMP OpenMPI

Department of Computer Science, Brock University
COSC 3P93: Parallel Computing
High-performance parallel implementations of the 0/1 Knapsack Problem

πŸ“‹ Abstract

This project implements and compares multiple parallel approaches to solving the 0/1 Knapsack Problem using branch-and-bound algorithms. The implementation includes sequential, shared-memory (OpenMP), and distributed-memory (OpenMPI) versions, providing a comprehensive analysis of parallel computing techniques for combinatorial optimization problems.

✨ Key Features

  • Multiple Parallel Paradigms: Sequential baseline, OpenMP shared-memory, and OpenMPI distributed-memory implementations
  • Comprehensive Benchmarking: Six diverse datasets with automated performance analysis
  • Optimal Solutions: Guaranteed optimal results using efficient branch-and-bound pruning
  • Performance Analytics: Detailed statistics and CSV output for comparative analysis
  • Cross-Platform Support: Compatible with macOS and Linux systems
  • Extensive Documentation: Complete algorithm explanations and usage examples

πŸ“‹ Table of Contents

πŸš€ Quick Start

# Get the code
git clone https://github.com/AlaqmarG/OptiSack.git
cd OptiSack

# Try it out with OpenMP (auto-detects your CPU cores)
./scripts/run.sh benchmark_medium_100items.txt openmp

# Run the full benchmark suite
./scripts/benchmark.sh openmp openmpi

πŸ“¦ Installation

What You Need

  • C++ Compiler: GCC 7+ or Clang 5+ (C++11 support)
  • OpenMP: libomp (on macOS: brew install libomp)
  • OpenMPI: openmpi (on macOS: brew install openmpi)

Build & Run

The build scripts handle all the compilation flags automatically:

# Sequential version
./scripts/run.sh benchmark_fast_85items.txt sequential

# OpenMP version (uses all your cores)
./scripts/run.sh benchmark_medium_100items.txt openmp

# OpenMPI version (distributed across processes)
./scripts/run.sh benchmark_very_hard_110items.txt openmpi

πŸ’‘ Usage

Basic Commands

# Run a dataset with specific implementation and core count
./scripts/run.sh benchmark_medium_100items.txt openmp 8

# Available implementations:
# - sequential: Single-threaded (good baseline)
# - openmp: Shared-memory parallelism
# - openmpi: Distributed-memory parallelism

Running Benchmarks

Test everything across all datasets:

# Just OpenMP
./scripts/benchmark.sh openmp

# Compare all implementations
./scripts/benchmark.sh sequential openmp openmpi

# Results go to results/*.csv

Dataset Options

Dataset Items Difficulty Use Case
benchmark_fast_85items.txt 85 Easy Quick tests
benchmark_medium_100items.txt 100 Medium General use
benchmark_medium_hard_112items.txt 112 Hard Performance testing
benchmark_very_hard_110items.txt 110 Very Hard Stress testing
benchmark_extreme_121items.txt 121 Extreme Algorithm limits
benchmark_ultimate_121items.txt 121 Ultimate Max challenge

πŸ“Š Performance Analysis

Benchmark Results (Apple M2 MacBook Air - 8-core CPU)

Dataset Sequential OpenMP OpenMPI OpenMP Speedup OpenMPI Speedup
Fast (85 items) 733ms 2.0ms 1.3ms 367x 611x
Medium (100 items) 227ms 3.2ms 1.1ms 71x 189x
Medium-Hard (112 items) 14381ms 3.1ms 1.8ms 4794x 7990x
Very Hard (110 items) 1670ms 3.3ms 6.2ms 522x 269x
Extreme (121 items) 711ms 4.6ms 12.0ms 155x 59x
Ultimate (121 items) 2176ms 5.1ms 3.5ms 435x 605x

Performance Analysis

The experimental results demonstrate exceptional performance improvements through parallelization on Apple Silicon:

  • OpenMP Implementation: Achieves remarkable speedups up to 4794x on shared-memory systems through task-based parallelism, particularly effective on the M2's efficient cores
  • OpenMPI Implementation: Demonstrates outstanding scalability with speedups up to 7990x, showcasing distributed-memory effectiveness on this architecture
  • Optimality Guarantee: All implementations produce provably optimal solutions using branch-and-bound pruning
  • Architecture Efficiency: The M2 MacBook Air's unified memory and high core efficiency particularly favor parallel approaches

🧠 Algorithm Implementation

Branch and Bound Algorithm

The implementation employs a branch-and-bound algorithm with the following key components:

  • Bounding Strategy: Utilizes fractional knapsack relaxation to compute tight upper bounds
  • Pruning Mechanism: Eliminates suboptimal branches using bound comparisons
  • Search Strategy: Implements best-first exploration using priority queues
  • Optimality Guarantee: Ensures finding of truly optimal solutions through complete search space coverage

Parallelization Strategies

OpenMP (Shared Memory Parallelism)

  • Task-based Parallelism: Utilizes #pragma omp task directives for dynamic task creation
  • Synchronization Strategy: Implements periodic synchronization every 100 nodes to minimize overhead
  • Work Distribution: Assigns different initial decision points to each thread
  • Thread Safety: Employs lock-based mechanisms for thread-safe global best solution sharing

OpenMPI (Distributed Memory Parallelism)

  • Process-based Parallelism: Distributes work across MPI ranks
  • Global Synchronization: Uses MPI_Allreduce operations for global best solution synchronization
  • Work Distribution: Assigns different initial decision points to each MPI rank
  • Collective Operations: Implements collective operations for comprehensive statistics aggregation

πŸ“ Project Structure

OptiSack/
β”œβ”€β”€ data/                    # Test datasets
β”œβ”€β”€ include/                 # Header files
β”‚   β”œβ”€β”€ common/             # Shared utilities
β”‚   β”œβ”€β”€ sequential/         # Sequential headers
β”‚   β”œβ”€β”€ openmp/            # OpenMP headers
β”‚   └── openmpi/           # OpenMPI headers
β”œβ”€β”€ src/                    # Source code
β”‚   β”œβ”€β”€ common/            # Shared implementations
β”‚   β”œβ”€β”€ sequential/        # Sequential solver
β”‚   β”œβ”€β”€ openmp/           # OpenMP parallel solver
β”‚   └── openmpi/          # OpenMPI distributed solver
β”œβ”€β”€ scripts/               # Build and benchmark scripts
β”œβ”€β”€ results/               # Performance data (CSV)
β”œβ”€β”€ out/                   # Compiled binaries
└── README.md

πŸ”¬ Methodology

Experimental Setup

  • Hardware: Apple M2 MacBook Air (8-core CPU, 16GB RAM)
  • Software: GCC 11.2, OpenMP 4.5, OpenMPI 4.1
  • Datasets: Six benchmark datasets ranging from 85 to 30,000 items
  • Metrics: Execution time, speedup factors, solution optimality verification

Implementation Details

  • Sequential Baseline: Single-threaded branch-and-bound implementation
  • OpenMP Version: Shared-memory parallelization with task-based work distribution
  • OpenMPI Version: Distributed-memory parallelization across multiple processes
  • Build System: Automated compilation scripts with appropriate optimization flags

πŸ“ˆ Results & Discussion

Performance Comparison

The experimental results validate the effectiveness of parallel computing approaches for the knapsack problem on Apple Silicon:

OpenMP Performance: Demonstrates exceptional performance on shared-memory systems, achieving speedups up to 4794x through efficient task distribution and reduced synchronization overhead on the M2 architecture.

OpenMPI Scalability: Shows outstanding scalability across distributed systems, with speedups up to 7990x, making it highly effective for computationally intensive problems on this platform.

Architecture Insights: The M2 MacBook Air's efficient cores and unified memory architecture particularly favor OpenMP's shared-memory model, while OpenMPI provides excellent distributed processing capabilities.

Trade-off Analysis: OpenMP provides superior performance for most datasets on this single-system configuration, though OpenMPI shows competitive results and would scale better across multiple nodes.

Algorithmic Insights

  • Optimality Preservation: All parallel implementations maintain solution optimality through careful synchronization of global bounds
  • Memory Efficiency: The branch-and-bound approach enables processing of large datasets (up to 30,000 items) within reasonable memory constraints
  • Load Balancing: Effective work distribution strategies prevent processor idle time and maximize parallel efficiency

πŸ‘€ Author

Alaqmar G. and Connor B.
Department of Computer Science, Brock University
COSC 3P93: Parallel Computing

πŸ“š References

  1. Martello, S., & Toth, P. (1990). Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons.

  2. OpenMP Architecture Review Board. (2021). OpenMP 5.2 Specification. https://www.openmp.org/

  3. The Open MPI Project. (2021). Open MPI: Open Source High Performance Computing. https://www.open-mpi.org/

  4. Horowitz, E., & Sahni, S. (1974). Computing partitions with applications to the knapsack problem. Journal of the ACM, 21(2), 277-292.

  5. Pisinger, D. (2005). Where are the hard knapsack problems? Computers & Operations Research, 32(9), 2271-2284.


Academic Project - Department of Computer Science
Brock University - COSC 3P93: Parallel Computing
Submitted for course requirements and evaluation

About

OptiSack πŸ† High-performance parallel knapsack solver with OpenMP & OpenMPI implementations. Up to 4794x speedup on M2 hardware. COSC 3P93 project.

Resources

Stars

Watchers

Forks

Contributors