In this work, we augment reinforcement learning with an inference-time collision model to ensure safe and efficient container management in a waste-sorting facility with limited processing capacity. Each container has two optimal emptying volumes that trade off higher throughput against overflow risk.
Conventional reinforcement learning (RL) approaches struggle under delayed rewards, sparse critical events, and high-dimensional uncertainty—failing to consistently balance higher-volume empties with the risk of safety-limit violations. To address these challenges, we propose a hybrid method comprising:
- A curriculum-learning pipeline that incrementally trains a PPO agent to handle delayed rewards and class imbalance
- An offline pairwise collision model used at inference time to proactively avert collisions with minimal online cost
Experimental results show that our targeted inference-time collision checks significantly improve collision avoidance, reduce safety-limit violations, maintain high throughput, and scale effectively across varying container-to-PU ratios. These findings offer actionable guidelines for designing safe and efficient container-management systems in real-world facilities.
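As a concrete illustration of the inference-time check described above, the sketch below masks out candidate actions whose predicted pairwise-collision probability exceeds a threshold before the agent's best remaining action is taken. All names and the fallback rule are illustrative assumptions, not the repository's actual interface:

```python
import numpy as np

def mask_unsafe_actions(action_logits, collision_probs, threshold=0.5):
    """Suppress actions whose predicted collision probability exceeds
    `threshold`, then pick the highest-scoring remaining action.

    Illustrative sketch only; the repository's interface may differ.
    """
    logits = np.asarray(action_logits, dtype=float).copy()
    collision_probs = np.asarray(collision_probs, dtype=float)
    unsafe = collision_probs > threshold
    if unsafe.all():
        # No action is predicted safe: fall back to the least risky one.
        return int(np.argmin(collision_probs))
    logits[unsafe] = -np.inf  # remove unsafe actions from consideration
    return int(np.argmax(logits))
```

Because the collision model is queried only at decision points, this kind of check adds minimal online cost on top of the trained PPO policy.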
The image below shows a schematic layout of a waste-sorting facility with 12 containers and a processing unit (PU), connected by conveyor belts. Containers are continuously filled from above at varying rates, with their current fill states indicated by the shaded areas. The control system must decide which container to empty next to maximize throughput while avoiding safety violations.
- Python >=3.8.0,<3.10
- Conda package manager
- Clone the repository:

  ```bash
  git clone git@gitlab.com:anonymousppocl_cm1/anonymous_collisions_paper.git
  cd anonymous_collisions_paper
  ```

- Create and activate a conda environment:

  ```bash
  conda create -n myenv python=3.8.8
  conda activate myenv
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
To reproduce the results from the paper, simply run:

```bash
python reproduce_results.py
```

This script will:

- Generate CV comparison grid plots
- Create emptying volume analysis plots
- Generate results tables for Overleaf
- Copy all results to the `results_overleaf` directory in the project root
You can specify a custom output directory if needed:
```bash
python reproduce_results.py --output-dir custom_results_dir
```

- `reproduce_results.py`: Script to reproduce paper results
- `README_pipeline.md`: Detailed pipeline for training and evaluation
- `results_overleaf/`: Default directory for generated results and visualizations
- `configs/`: Configuration files for experiments
  - `collisions/`: Configuration files for collision-related experiments
    - `5mil_budget/`: Configurations for the PPO-CL, PPO-CL-CM, and Naive PPO agents
      - `ppo_baseline_multimodal_*.jsonc`: Configuration files for different bunker setups
      - `RW_param_all_bunkers_*.csv`: Random walk parameters for container filling rates
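The container filling rates are driven by a stochastic random walk. A minimal, hedged sketch of the idea (parameter names and bounds are illustrative, not the repository's actual CSV schema or `random_walk.py` implementation):

```python
import numpy as np

def random_walk_fill_rates(n_steps, start=0.5, step_std=0.05,
                           low=0.0, high=1.0, seed=0):
    """Generate a bounded random-walk sequence of per-step fill rates.

    Each step perturbs the previous rate with Gaussian noise and clips
    the result to [low, high]. Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    rates = np.empty(n_steps)
    rate = start
    for t in range(n_steps):
        rate = float(np.clip(rate + rng.normal(0.0, step_std), low, high))
        rates[t] = rate
    return rates
```

Varying `step_std` per container would reproduce the "varying rates" behaviour described in the facility schematic above.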
- `trials/RL_trials/`: Contains experiment results and analysis scripts
  - `inference_rollouts/step2_multimodal/`: Analysis scripts and visualization tools
    - `plot_cv_comparision_grid.py`: Generates CV comparison plots
    - `plot_emptying_volume_analysis_plots.py`: Analyzes emptying volume distributions
    - `plot_results_table_overleaf.py`: Creates results tables for publication
- `models_and_agents/`: Implementation of RL agents and models
  - `press_models.py`: Processing unit model
  - `random_walk.py`: Container filling rates (stochastic random walk)
- `utils/`: Utility functions and helper modules
  - `callbacks.py`: Custom callbacks for training monitoring
  - `collisions_utils.py`: Utilities for collision detection and prediction
  - `config_utils.py`: Functions for loading and parsing configuration files
  - `container_utils.py`: Container-related helper functions
  - `episode_plotting.py`: Functions for plotting episode data
  - `inference_plotting.py`: Visualization tools for inference results
  - `metrics.py`: Performance metrics calculation
  - `penalty_multimodal.py`: Reward function implementations
  - `train_custom_policies.py`: Custom policy implementations
  - `train_plotting.py`: Training visualization utilities
  - `train_utils.py`: General training helper functions
  - `inference_utils/`: Specialized utilities for inference
    - `determine_next_action_*.py`: Action determination strategies
    - `inference_utils_RL.py`: RL-specific inference utilities
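To convey the multimodal reward idea (each container has two optimal emptying volumes, with the higher-volume one paying more but sitting closer to the safety limit), a hedged sketch is shown below. The peak locations, widths, and penalty value are illustrative assumptions, not the values used in `penalty_multimodal.py`:

```python
import math

def multimodal_reward(volume, peaks=(0.6, 0.9), widths=(0.05, 0.03),
                      heights=(0.7, 1.0), overflow_limit=1.0,
                      overflow_penalty=-10.0):
    """Reward with two Gaussian peaks at the two 'optimal' emptying volumes.

    The higher-volume peak yields more reward (higher throughput) but lies
    closer to the overflow limit. Illustrative sketch only.
    """
    if volume > overflow_limit:
        return overflow_penalty  # safety-limit violation
    return sum(h * math.exp(-((volume - p) ** 2) / (2 * w ** 2))
               for p, w, h in zip(peaks, widths, heights))
```

This shape makes the core trade-off explicit: emptying near 0.9 is more rewarding than near 0.6, but random-walk fill noise makes it riskier.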
- `SL_for_collision_matrix/`: Trained XGBoost models for collision matrix prediction
  - `models_and_dataset_in_use/`: Pre-trained models and datasets for different configurations
    - `7b1p/`, `13b1p/`: Configuration-specific models and datasets
- `env_one_press_multimodalrew_col.py`: Environment implementation
- `test_collisions_parallel.py`: Script for parallel testing of collision prediction models
  - Runs inference across multiple seeds and thresholds
  - Compares different collision avoidance strategies
- `create_metrics.py`: Generates performance metrics and visualizations
  - Creates comparison plots between different agent types
  - Analyzes metrics across different thresholds and seeds
- `train_SL_models.py`: Trains XGBoost models for collision prediction
  - Handles model training for collision probability and timestep prediction
  - Evaluates model performance and saves trained models
- `collect_collision_data_multiseed.py`: Generates training data for collision models
  - Simulates random walks to create synthetic collision data
  - Uses parallel processing for efficient data generation
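The multi-seed, parallel data-collection pattern can be sketched as below. The per-seed simulation, record format, and pool size are illustrative assumptions; only the overall structure (one independent rollout per seed, fanned out over worker processes) mirrors the script's description:

```python
from multiprocessing import Pool

import numpy as np

def simulate_seed(seed):
    """Run one synthetic rollout for a seed and return collision records
    as (seed, pair_index, collided) tuples. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    gaps = rng.random(50)        # time gaps between candidate empties
    collided = gaps < 0.3        # assume: small gap -> collision
    return [(seed, i, bool(c)) for i, c in enumerate(collided)]

def collect_collision_data(seeds, processes=4):
    """Fan the per-seed simulations out over worker processes and
    flatten the results into one training dataset."""
    with Pool(processes=processes) as pool:
        per_seed = pool.map(simulate_seed, list(seeds))
    return [rec for records in per_seed for rec in records]
```

Because each seed's rollout is independent, this parallelization is embarrassingly parallel and scales with the number of cores.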

