Pendu/collisions_paper

Curriculum RL meets Monte Carlo Planning: Optimization of a Real World Container Management Problem

📝 Abstract

In this work, we augment reinforcement learning with an inference-time collision model to ensure safe and efficient container management in a waste-sorting facility with limited processing capacity. Each container has two optimal emptying volumes that trade off higher throughput against overflow risk.

Conventional reinforcement learning (RL) approaches struggle with delayed rewards, sparse critical events, and high-dimensional uncertainty. As a result, they fail to consistently balance higher-volume empties against the risk of safety-limit violations. To address these challenges, we propose a hybrid method comprising:

  1. A curriculum-learning pipeline that incrementally trains a PPO agent to handle delayed rewards and class imbalance
  2. An offline pairwise collision model used at inference time to proactively avert collisions with minimal online cost

Experimental results show that our targeted inference-time collision checks significantly improve collision avoidance, reduce safety-limit violations, maintain high throughput, and scale effectively across varying container-to-PU ratios. These findings offer actionable guidelines for designing safe and efficient container-management systems in real-world facilities.
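To make the inference-time check concrete, here is a minimal sketch of how a pairwise collision estimate could veto risky actions proposed by a policy. It is not the repository's implementation: the function name `select_action`, the `threshold` parameter, the matrix layout (`collision_prob[i, j]` as the predicted probability that emptying container `i` now collides with container `j`), and the fallback rule are all assumptions made for illustration.

```python
import numpy as np


def select_action(policy_scores, collision_prob, threshold=0.5):
    """Pick the best-scoring container whose predicted collision risk is acceptable.

    policy_scores: shape (n,), the policy's preference for emptying each container.
    collision_prob: shape (n, n), assumed pairwise collision probabilities.
    threshold: hypothetical risk cutoff above which an action is vetoed.
    """
    scores = np.asarray(policy_scores, dtype=float)
    probs = np.array(collision_prob, dtype=float)  # copy, so the input is not mutated
    np.fill_diagonal(probs, 0.0)                   # a container cannot collide with itself
    risk = probs.max(axis=1)                       # worst-case pairwise risk per action
    safe = risk < threshold
    if not safe.any():
        # Assumed fallback: if every action is risky, take the least risky one.
        return int(risk.argmin())
    masked = np.where(safe, scores, -np.inf)       # vetoed actions can never win
    return int(masked.argmax())


if __name__ == "__main__":
    scores = [0.9, 0.5, 0.1]
    probs = [[0.0, 0.8, 0.1],
             [0.2, 0.0, 0.1],
             [0.05, 0.1, 0.0]]
    print(select_action(scores, probs))
```

In this toy input the policy prefers container 0, but its predicted risk exceeds the cutoff, so the check falls through to the next-best safe container. The matrix lookup is cheap at decision time because the expensive modelling happens offline.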

🏭 Real-World Facility Schematic

The image below shows a schematic layout of a waste-sorting facility with 12 containers and a processing unit (PU), connected by conveyor belts. Containers are continuously filled from above at varying rates, with their current fill states indicated by the shaded areas. The control system must decide which container to empty next to maximize throughput while avoiding safety violations.
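The control problem described above can be illustrated with a toy simulation. This is a simplified sketch, not the repository's environment: the fill-rate ranges, the `trigger` volume, the number of steps the PU stays busy, and the greedy "empty the fullest container" rule are all assumptions chosen for illustration.

```python
import numpy as np


def simulate(n_containers=12, n_steps=500, capacity=1.0, trigger=0.7,
             pu_busy_steps=5, seed=0):
    """Toy facility: containers fill at random-walk rates, one PU empties them."""
    rng = np.random.default_rng(seed)
    rates = rng.uniform(0.002, 0.01, size=n_containers)  # assumed initial fill rates
    fill = np.zeros(n_containers)
    busy = 0              # remaining steps until the PU is free again
    throughput = 0.0      # total volume sent to the PU
    overflow_steps = 0    # container-steps spent above capacity
    for _ in range(n_steps):
        # Random-walk perturbation of the fill rates, clipped to stay positive.
        rates = np.clip(rates + rng.normal(0.0, 0.0005, size=n_containers),
                        0.0005, 0.02)
        fill = fill + rates
        overflow_steps += int(np.count_nonzero(fill > capacity))
        fill = np.minimum(fill, capacity)
        if busy > 0:
            busy -= 1
            continue
        # Greedy baseline: empty the fullest container once it passes the trigger.
        target = int(fill.argmax())
        if fill[target] >= trigger:
            throughput += float(fill[target])
            fill[target] = 0.0
            busy = pu_busy_steps
    return {"throughput": throughput, "overflow_steps": overflow_steps,
            "final_fill": fill}


if __name__ == "__main__":
    result = simulate()
    print(result["throughput"], result["overflow_steps"])
```

Even this greedy baseline shows the tension in the task: while the PU is busy, other containers keep filling, so a poorly timed emptying can let a different container overflow.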

🖊 Info

Example render of the environment during evaluation

🚀 Getting Started

Prerequisites

  • Python >=3.8.0,<3.10
  • Conda package manager

Installation

  1. Clone the repository:

    git clone git@gitlab.com:anonymousppocl_cm1/anonymous_collisions_paper.git
    cd anonymous_collisions_paper
  2. Create and activate a conda environment:

    conda create -n myenv python=3.8.8
    conda activate myenv
  3. Install dependencies:

    pip install -r requirements.txt
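Before installing dependencies, it can help to confirm that the active interpreter falls within the supported range stated above. The helper below is a small illustrative check; the function name `python_version_supported` is hypothetical and not part of the repository.

```python
import sys


def python_version_supported(version_info=sys.version_info):
    """Return True if the version satisfies the stated range >=3.8.0,<3.10."""
    return (3, 8, 0) <= tuple(version_info[:3]) < (3, 10, 0)


if __name__ == "__main__":
    if python_version_supported():
        print("Python version is within the supported range.")
    else:
        print("Unsupported Python version:", sys.version.split()[0])
```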

📊 Reproducing Results

To reproduce the results from the paper, simply run:

python reproduce_results.py

This script will:

  1. Generate CV comparison grid plots
  2. Create emptying volume analysis plots
  3. Generate results tables for Overleaf
  4. Copy all results to the results_overleaf directory in the project root

You can specify a custom output directory if needed:

python reproduce_results.py --output-dir custom_results_dir
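For readers unfamiliar with how such a flag is usually wired up, the sketch below shows one common way to parse `--output-dir` with `argparse`, using `results_overleaf` as the default. It is only an illustration of the pattern; the actual argument handling inside `reproduce_results.py` may differ, and `parse_args` is a hypothetical helper name.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; `argv=None` means "use sys.argv"."""
    parser = argparse.ArgumentParser(
        description="Illustrative sketch of an --output-dir option.")
    parser.add_argument(
        "--output-dir",
        type=Path,
        default=Path("results_overleaf"),  # default directory named in this README
        help="Directory that receives the generated plots and tables.",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print("Results would be written to:", args.output_dir)
```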

📁 Repository Structure

  • reproduce_results.py: Script to reproduce paper results

  • README_pipeline.md: Detailed pipeline for training and evaluation

  • results_overleaf/: Default directory for generated results and visualizations

  • configs/: Configuration files for experiments

    • collisions/: Configuration files for collision-related experiments
      • 5mil_budget/: Configurations for PPO-CL, PPO-CL-CM and Naive PPO agents
      • ppo_baseline_multimodal_*.jsonc: Configuration files for different bunker setups
    • RW_param_all_bunkers_*.csv: Random walk parameters for container filling rates
  • trials/RL_trials/: Contains experiment results and analysis scripts

    • inference_rollouts/step2_multimodal/: Analysis scripts and visualization tools
      • plot_cv_comparision_grid.py: Generates CV comparison plots
      • plot_emptying_volume_analysis_plots.py: Analyzes emptying volume distributions
      • plot_results_table_overleaf.py: Creates results tables for publication
  • models_and_agents/: Implementation of RL agents and models

    • press_models.py: Processing unit model
    • random_walk.py: Container filling rates (stochastic random walk)
  • utils/: Utility functions and helper modules

    • callbacks.py: Custom callbacks for training monitoring
    • collisions_utils.py: Utilities for collision detection and prediction
    • config_utils.py: Functions for loading and parsing configuration files
    • container_utils.py: Container-related helper functions
    • episode_plotting.py: Functions for plotting episode data
    • inference_plotting.py: Visualization tools for inference results
    • metrics.py: Performance metrics calculation
    • penalty_multimodal.py: Reward function implementations
    • train_custom_policies.py: Custom policy implementations
    • train_plotting.py: Training visualization utilities
    • train_utils.py: General training helper functions
    • inference_utils/: Specialized utilities for inference
      • determine_next_action_*.py: Action determination strategies
      • inference_utils_RL.py: RL-specific inference utilities
  • SL_for_collision_matrix/: Supervised-learning (SL) assets, i.e. trained XGBoost models for collision matrix prediction

    • models_and_dataset_in_use/: Pre-trained models and datasets for different configurations
      • 7b1p/, 13b1p/: Configuration-specific models and datasets
  • env_one_press_multimodalrew_col.py: Environment implementation

  • test_collisions_parallel.py: Script for parallel testing of collision prediction models

    • Runs inference across multiple seeds and thresholds
    • Compares different collision avoidance strategies
  • create_metrics.py: Generates performance metrics and visualizations

    • Creates comparison plots between different agent types
    • Analyzes metrics across different thresholds and seeds
  • train_SL_models.py: Trains XGBoost models for collision prediction

    • Handles model training for collision probability and timestep prediction
    • Evaluates model performance and saves trained models
  • collect_collision_data_multiseed.py: Generates training data for collision models

    • Simulates random walks to create synthetic collision data
    • Uses parallel processing for efficient data generation
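The data-generation idea listed above (simulating random walks to obtain synthetic collision data) can be sketched as a small Monte Carlo estimate. This is an assumption-laden illustration, not the code in `collect_collision_data_multiseed.py`: here a "collision" is assumed to mean that two containers reach their target volume within `window` steps of each other, and the function names, rate bounds, and `sigma` are hypothetical.

```python
import numpy as np


def time_to_volume(start_fill, rate, target, rng, sigma=0.0003, max_steps=2000):
    """Steps until a container with a random-walk fill rate reaches `target`."""
    fill, r = start_fill, rate
    for step in range(1, max_steps + 1):
        # Random-walk update of the fill rate, clipped to an assumed valid range.
        r = min(max(r + rng.normal(0.0, sigma), 0.0005), 0.02)
        fill += r
        if fill >= target:
            return step
    return max_steps


def collision_probability(fill_a, fill_b, rate_a, rate_b, target=0.8,
                          window=5, n_rollouts=200, seed=0):
    """Monte Carlo estimate that two containers need the PU at about the same time."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rollouts):
        t_a = time_to_volume(fill_a, rate_a, target, rng)
        t_b = time_to_volume(fill_b, rate_b, target, rng)
        hits += int(abs(t_a - t_b) <= window)  # assumed collision criterion
    return hits / n_rollouts


if __name__ == "__main__":
    # Two similar containers are far more likely to collide than dissimilar ones.
    print(collision_probability(0.5, 0.5, 0.006, 0.006))
    print(collision_probability(0.0, 0.75, 0.006, 0.006))
```

Estimates like these, computed for many container pairs and starting states, are the kind of labels a supervised model could be trained on, so that no rollouts are needed at inference time.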
