In this work, we augment reinforcement learning with an inference-time collision model to ensure safe and efficient container management in a waste-sorting facility with limited processing capacity. Each container has two optimal emptying volumes that trade off higher throughput against overflow risk.
Conventional reinforcement learning (RL) approaches struggle under delayed rewards, sparse critical events, and high-dimensional uncertainty—failing to consistently balance higher-volume empties with the risk of safety-limit violations. To address these challenges, we propose a hybrid method comprising:
- A curriculum-learning pipeline that incrementally trains a PPO agent to handle delayed rewards and class imbalance
- An offline pairwise collision model used at inference time to proactively avert collisions with minimal online cost
Experimental results show that our targeted inference-time collision checks significantly improve collision avoidance, reduce safety-limit violations, maintain high throughput, and scale effectively across varying container-to-PU ratios. These findings offer actionable guidelines for designing safe and efficient container-management systems in real-world facilities.
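As a concrete illustration of the inference-time check described above, the sketch below masks out candidate actions whose predicted pairwise-collision probability exceeds a threshold before the agent's best remaining action is taken. All names and the fallback rule are illustrative assumptions, not the repository's actual interface:

```python
import numpy as np

def mask_unsafe_actions(action_logits, collision_probs, threshold=0.5):
    """Suppress actions whose predicted collision probability exceeds
    `threshold`, then pick the highest-scoring remaining action.

    Illustrative sketch only; the repository's interface may differ.
    """
    logits = np.asarray(action_logits, dtype=float).copy()
    collision_probs = np.asarray(collision_probs, dtype=float)
    unsafe = collision_probs > threshold
    if unsafe.all():
        # No action is predicted safe: fall back to the least risky one.
        return int(np.argmin(collision_probs))
    logits[unsafe] = -np.inf  # remove unsafe actions from consideration
    return int(np.argmax(logits))
```

Because the collision model is queried only at decision points, this kind of check adds minimal online cost on top of the trained PPO policy.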
The image below shows a schematic layout of a waste-sorting facility with 12 containers and a processing unit (PU), connected by conveyor belts. Containers are continuously filled from above at varying rates, with their current fill states indicated by the shaded areas. The control system must decide which container to empty next to maximize throughput while avoiding safety violations.
- Python >=3.8.0,<3.10
- Conda package manager
- Clone the repository:

  ```bash
  git clone git@gitlab.com:anonymousppocl_cm1/anonymous_collisions_paper.git
  cd anonymous_collisions_paper
  ```

- Create and activate a conda environment:

  ```bash
  conda create -n myenv python=3.8.8
  conda activate myenv
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
To reproduce the results from the paper, simply run:

```bash
python reproduce_results.py
```

This script will:

- Generate CV comparison grid plots
- Create emptying volume analysis plots
- Generate results tables for Overleaf
- Copy all results to the `results_overleaf` directory in the project root
You can specify a custom output directory if needed:
```bash
python reproduce_results.py --output-dir custom_results_dir
```

- `reproduce_results.py`: Script to reproduce paper results
- `README_pipeline.md`: Detailed pipeline for training and evaluation
- `results_overleaf/`: Default directory for generated results and visualizations
- `configs/`: Configuration files for experiments
  - `collisions/`: Configuration files for collision-related experiments
    - `5mil_budget/`: Configurations for the PPO-CL, PPO-CL-CM, and Naive PPO agents
      - `ppo_baseline_multimodal_*.jsonc`: Configuration files for different bunker setups
      - `RW_param_all_bunkers_*.csv`: Random walk parameters for container filling rates
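The container filling rates are driven by a stochastic random walk. A minimal, hedged sketch of the idea (parameter names and bounds are illustrative, not the repository's actual CSV schema or `random_walk.py` implementation):

```python
import numpy as np

def random_walk_fill_rates(n_steps, start=0.5, step_std=0.05,
                           low=0.0, high=1.0, seed=0):
    """Generate a bounded random-walk sequence of per-step fill rates.

    Each step perturbs the previous rate with Gaussian noise and clips
    the result to [low, high]. Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    rates = np.empty(n_steps)
    rate = start
    for t in range(n_steps):
        rate = float(np.clip(rate + rng.normal(0.0, step_std), low, high))
        rates[t] = rate
    return rates
```

Varying `step_std` per container would reproduce the "varying rates" behaviour described in the facility schematic above.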
- `trials/RL_trials/`: Contains experiment results and analysis scripts
  - `inference_rollouts/step2_multimodal/`: Analysis scripts and visualization tools
    - `plot_cv_comparision_grid.py`: Generates CV comparison plots
    - `plot_emptying_volume_analysis_plots.py`: Analyzes emptying volume distributions
    - `plot_results_table_overleaf.py`: Creates results tables for publication
- `models_and_agents/`: Implementation of RL agents and models
  - `press_models.py`: Processing unit model
  - `random_walk.py`: Container filling rates (stochastic random walk)
- `utils/`: Utility functions and helper modules
  - `callbacks.py`: Custom callbacks for training monitoring
  - `collisions_utils.py`: Utilities for collision detection and prediction
  - `config_utils.py`: Functions for loading and parsing configuration files
  - `container_utils.py`: Container-related helper functions
  - `episode_plotting.py`: Functions for plotting episode data
  - `inference_plotting.py`: Visualization tools for inference results
  - `metrics.py`: Performance metrics calculation
  - `penalty_multimodal.py`: Reward function implementations
  - `train_custom_policies.py`: Custom policy implementations
  - `train_plotting.py`: Training visualization utilities
  - `train_utils.py`: General training helper functions
  - `inference_utils/`: Specialized utilities for inference
    - `determine_next_action_*.py`: Action determination strategies
    - `inference_utils_RL.py`: RL-specific inference utilities
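To convey the multimodal reward idea (each container has two optimal emptying volumes, with the higher-volume one paying more but sitting closer to the safety limit), a hedged sketch is shown below. The peak locations, widths, and penalty value are illustrative assumptions, not the values used in `penalty_multimodal.py`:

```python
import math

def multimodal_reward(volume, peaks=(0.6, 0.9), widths=(0.05, 0.03),
                      heights=(0.7, 1.0), overflow_limit=1.0,
                      overflow_penalty=-10.0):
    """Reward with two Gaussian peaks at the two 'optimal' emptying volumes.

    The higher-volume peak yields more reward (higher throughput) but lies
    closer to the overflow limit. Illustrative sketch only.
    """
    if volume > overflow_limit:
        return overflow_penalty  # safety-limit violation
    return sum(h * math.exp(-((volume - p) ** 2) / (2 * w ** 2))
               for p, w, h in zip(peaks, widths, heights))
```

This shape makes the core trade-off explicit: emptying near 0.9 is more rewarding than near 0.6, but random-walk fill noise makes it riskier.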
- `SL_for_collision_matrix/`: Trained XGBoost models for collision matrix prediction
  - `models_and_dataset_in_use/`: Pre-trained models and datasets for different configurations
    - `7b1p/`, `13b1p/`: Configuration-specific models and datasets
- `env_one_press_multimodalrew_col.py`: Environment implementation
- `test_collisions_parallel.py`: Script for parallel testing of collision prediction models
  - Runs inference across multiple seeds and thresholds
  - Compares different collision avoidance strategies
- `create_metrics.py`: Generates performance metrics and visualizations
  - Creates comparison plots between different agent types
  - Analyzes metrics across different thresholds and seeds
- `train_SL_models.py`: Trains XGBoost models for collision prediction
  - Handles model training for collision probability and timestep prediction
  - Evaluates model performance and saves trained models
- `collect_collision_data_multiseed.py`: Generates training data for collision models
  - Simulates random walks to create synthetic collision data
  - Uses parallel processing for efficient data generation
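The multi-seed, parallel data-collection pattern can be sketched as below. The per-seed simulation, record format, and pool size are illustrative assumptions; only the overall structure (one independent rollout per seed, fanned out over worker processes) mirrors the script's description:

```python
from multiprocessing import Pool

import numpy as np

def simulate_seed(seed):
    """Run one synthetic rollout for a seed and return collision records
    as (seed, pair_index, collided) tuples. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    gaps = rng.random(50)        # time gaps between candidate empties
    collided = gaps < 0.3        # assume: small gap -> collision
    return [(seed, i, bool(c)) for i, c in enumerate(collided)]

def collect_collision_data(seeds, processes=4):
    """Fan the per-seed simulations out over worker processes and
    flatten the results into one training dataset."""
    with Pool(processes=processes) as pool:
        per_seed = pool.map(simulate_seed, list(seeds))
    return [rec for records in per_seed for rec in records]
```

Because each seed's rollout is independent, this parallelization is embarrassingly parallel and scales with the number of cores.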

