A Reproducible Framework for Fair Evaluation of Variance-Aware Bandit Algorithms

This repository contains the implementation of various multi-armed bandit algorithms and a dashboard for visualizing their performance. The goal is to compare the effectiveness of different algorithms in maximizing rewards and minimizing regret over time.

Version Information

The official SKILL 2025 submission uses the branch v1.0-SKILL2025 of this repository; this documentation and code refer to that branch.

Algorithms and Tuning Parameters

The following algorithms are implemented, each with its own set of tuning parameters:

  • ETC (Explore-then-Commit): Explores all available arms for a certain number of rounds before committing to the arm with the highest estimated reward.

    • Tuning Parameters: exploration_rounds
    • Scenarios:
      • exploration_rounds: 10
      • exploration_rounds: 100
      • exploration_rounds: 1000
      • exploration_rounds: 10000
      • exploration_rounds: 100000
  • Epsilon-Greedy: Balances exploration and exploitation by choosing a random action with probability epsilon and the action with the highest estimated reward with probability 1 - epsilon.

    • Tuning Parameters: epsilon
    • Scenarios:
      • epsilon: 0.5
      • epsilon: 0.1
      • epsilon: 0.01
      • epsilon: 0.05
      • epsilon: 0.005
  • UCB (Upper Confidence Bound): Selects the arm with the highest upper confidence bound to balance exploration and exploitation.

    • Tuning Parameters: None
  • UCB-Tuned: Adjusts the confidence bound by considering the variance of the rewards.

    • Tuning Parameters: None
  • UCB-V: Incorporates variance estimates into the upper confidence bounds.

    • Tuning Parameters: theta, c, b
    • Scenarios:
      • theta: 1, c: 1, b: 1
  • PAC-UCB: Provides PAC-style guarantees, i.e. with high probability the regret stays close to that of the optimal policy.

    • Tuning Parameters: c, b, q, beta
    • Scenarios:
      • c: 1, b: 1, q: 1.3, beta: 0.05
  • UCB-Improved: Enhances UCB with more sophisticated exploration strategies.

    • Tuning Parameters: delta
    • Scenarios:
      • delta: 1
  • EUCBV (Efficient-UCB with Variance): Uses empirical estimates of variance to adjust the upper confidence bounds.

    • Tuning Parameters: rho
    • Scenarios:
      • rho: 0.5
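
As a concrete illustration of the exploration-exploitation trade-off these algorithms manage, here is a minimal epsilon-greedy sketch on Bernoulli arms. This is a simplified stand-in, not the repository's implementation; the function and variable names are illustrative only:

```python
import random

def epsilon_greedy(probs, horizon, epsilon, rng):
    """Epsilon-greedy on Bernoulli arms: explore with probability epsilon,
    otherwise exploit the arm with the highest estimated mean reward."""
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k
    total_reward = 0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                        # explore: uniform random arm
        else:
            arm = max(range(k), key=lambda a: means[a])   # exploit: best estimate so far
        reward = 1 if rng.random() < probs[arm] else 0    # Bernoulli reward draw
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm] # incremental mean update
        total_reward += reward
    return total_reward

rng = random.Random(0)
reward = epsilon_greedy([0.8, 0.89, 0.895, 0.9], horizon=1000, epsilon=0.1, rng=rng)
print(reward)
```

With arm means between 0.8 and 0.9, the total reward over 1,000 rounds should land well above 600 for any reasonable epsilon, which makes the reward and regret curves in the dashboard easy to sanity-check.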

Bandit Model

The bandit model used in this repository is a multi-armed bandit problem with Bernoulli-distributed arms. The arms use the reward probabilities [0.8, 0.89, 0.895, 0.9]; two- and three-armed scenarios, as well as all permutations of these probabilities, can be chosen as well.

Each algorithm is run 100 times, and the results are stored in separate directories for different time steps. Additionally, there is a 'results_average' file for each algorithm, providing the average values at each time step across the 100 runs.
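
The scenario construction described above can be sketched with itertools; this is only an illustration of the combinatorics, and the repository's own scenario generation may differ:

```python
from itertools import permutations

# Bernoulli reward probabilities of the four base arms.
base = [0.8, 0.89, 0.895, 0.9]

# All orderings of the full four-armed problem.
four_armed = list(permutations(base))

# Two- and three-armed scenarios: all ordered selections from the same pool.
two_armed = list(permutations(base, 2))
three_armed = list(permutations(base, 3))

print(len(four_armed), len(two_armed), len(three_armed))  # 24 12 24
```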

Plots in the Dashboard

Visualizations and dashboards were created using Plotly and Dash. The dashboard provides the following plots:

  1. Average Total Reward Over Time: Displays how effectively each algorithm maximizes rewards over time.
  2. Average Regret Over Time: Shows how well each algorithm minimizes regret over time.
  3. Reward Distribution: A boxplot showing the distribution of zero and one rewards for each algorithm.
  4. Distribution of Total Regret at Timestep 100,000: A histogram of total regret values at timestep 100,000 across 100 iterations for a selected algorithm.
  5. Value-at-Risk (VaR) Function: Displays the VaR function for alpha values 0.01, 0.05, and 0.1, indicating the maximum potential loss at a given confidence level.
  6. Proportion of Suboptimal Arms Pulled: Shows the proportion of suboptimal arm selections compared to all selections up to each timestep.
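
The Value-at-Risk plot (item 5) can be computed as a simple empirical quantile. A minimal sketch, assuming VaR at level alpha is the (1 - alpha)-quantile of the per-run total-regret samples (the repository's exact estimator may differ):

```python
import math

def value_at_risk(samples, alpha):
    """Empirical VaR at level alpha: the regret level that is exceeded
    with probability at most alpha among the given samples."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, math.ceil((1 - alpha) * len(ordered)) - 1)
    return ordered[idx]

# Illustrative regret samples, one value per run; real values would come
# from the per-algorithm results files.
regrets = list(range(1, 101))
for alpha in (0.01, 0.05, 0.1):
    print(alpha, value_at_risk(regrets, alpha))  # 99, 95, 90
```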

Setup Instructions

1. Clone the Repository

First, you need to clone the repository from GitHub to your local machine. Open your terminal (or command prompt) and run the following command:

git clone https://github.com/eelisee/bandit_playground.git
cd bandit_playground

Once the repository is cloned, check out the v1.0-SKILL2025 branch by running:

git checkout -b v1.0-SKILL2025 origin/v1.0-SKILL2025

2. Set Up a Virtual Environment

It is strongly recommended to use a virtual environment to manage the project's dependencies. You can create and activate a virtual environment by running the following commands:

For macOS/Linux:

python3 -m venv .venv
source .venv/bin/activate

For Windows:

python -m venv .venv
.venv\Scripts\activate

3. Install Dependencies

Once the virtual environment is activated, you need to install the required Python packages. Install them by running the following command:

pip install -r requirements.txt

This command will install all the dependencies listed in the requirements.txt file. Ensure that all packages install without errors.

4. Running the Dashboard

Once the installation is complete, you can start the dashboard by running the following command:

python src/dashboard.py

After the command runs, your default web browser should automatically open with the dashboard at this URL:

http://127.0.0.1:8050

If it doesn't open automatically, you can manually copy and paste this URL into your browser.

Reproducibility

The code is designed for easy extensibility:

  • Configuration Management: All simulation parameters and settings are centrally defined in the configuration file config.py, enabling quick adjustments without changing the core code.
  • Adding New Algorithms: New multi-armed bandit algorithms can be integrated easily following a clear structure.
  • Customizing Plots: Additional plots for analysis and visualization can be added with minimal changes.
  • Flexible Scenario Definition: Different simulation scenarios can be defined via configurable arm distributions.
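
As a hypothetical illustration of this kind of central configuration, a config.py might group arm distributions and tuning-parameter scenarios as plain data. All names below are invented for the example and are not the repository's actual identifiers:

```python
# Hypothetical config.py layout (illustrative names only).

ARM_PROBABILITIES = [0.8, 0.89, 0.895, 0.9]  # Bernoulli reward probabilities
NUM_RUNS = 100                               # independent repetitions per algorithm

# One entry per algorithm, each mapping to its list of tuning-parameter scenarios.
ALGORITHM_SCENARIOS = {
    "etc": [{"exploration_rounds": r} for r in (10, 100, 1000, 10000, 100000)],
    "epsilon_greedy": [{"epsilon": e} for e in (0.5, 0.1, 0.05, 0.01, 0.005)],
    "ucb_v": [{"theta": 1, "c": 1, "b": 1}],
}

print(len(ALGORITHM_SCENARIOS["etc"]))  # 5
```

Keeping scenarios as data like this would let a new algorithm or arm distribution be added by editing one dictionary rather than the simulation code.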

Detailed instructions for extending the simulation are available in the repository's documentation files.

Authors

Elise Wolf – elise.marie.wolf@students.uni-mannheim.de – University of Mannheim

Citation

If you use this project, please cite:

    @misc{wolf2025frameworkfairevaluationvarianceaware,
      title={A Framework for Fair Evaluation of Variance-Aware Bandit Algorithms},
      author={Elise Wolf},
      year={2025},
      eprint={2510.27001},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.27001},
    }

License

This project is licensed under the MIT License – see the LICENSE file for details.
