This repository contains the source code and experimental framework for the research on Adaptive Quantization Observers (AQO) for Hoeffding Tree Regressors. The project utilizes a custom fork of the River library to implement and evaluate novel splitting criteria against established baselines in concept drift scenarios.
The main goal of this framework is to evaluate the impact of different kernel density estimation methods (Triangular, Epanechnikov, Smooth) within the Quantization Observer splitter. The experiments measure performance across synthetic datasets (with controlled drift) and real-world data streams.
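The Triangular and Epanechnikov kernels mentioned above are standard density-estimation kernels; the exact form used by the AQO splitter (including the "Smooth" variant) is defined in the River fork. As a point of reference, a minimal sketch of the two standard kernels:

```python
def triangular(u: float) -> float:
    """Standard triangular kernel: K(u) = 1 - |u| for |u| <= 1, else 0."""
    return max(0.0, 1.0 - abs(u))

def epanechnikov(u: float) -> float:
    """Standard Epanechnikov kernel: K(u) = 3/4 * (1 - u^2) for |u| <= 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0
```

Both kernels weight observations by their distance `u` from a bin center, which is how a quantization observer can smooth the target statistics it accumulates per bin.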
- Prequential Evaluation: Rigorous testing using parallel processing.
- Novel Splitters: Implementation and testing of HTR-AQO (Adaptive Quantization Observer).
- Drift Simulation: Comprehensive synthetic data generation (Friedman, Hyperplane, RBF) with abrupt and gradual drifts.
- Visualization: Automated plotting scripts for CD diagrams and performance analysis over time.
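Prequential (test-then-train) evaluation means each instance is first used to test the model and then to train it, so the error estimate never touches data the model has already seen. A minimal self-contained sketch of the loop, using a hypothetical running-mean regressor in place of the real River models (which expose the same `predict_one`/`learn_one` interface):

```python
class RunningMean:
    """Hypothetical stand-in regressor that always predicts the running mean."""
    def __init__(self):
        self.n, self.total = 0, 0.0

    def predict_one(self, x):
        return self.total / self.n if self.n else 0.0

    def learn_one(self, x, y):
        self.n += 1
        self.total += y

def prequential_mae(model, stream):
    """Test on each instance first, then train on it; return the mean absolute error."""
    abs_err, n = 0.0, 0
    for x, y in stream:
        abs_err += abs(y - model.predict_one(x))  # test before training
        model.learn_one(x, y)                     # then train on the same instance
        n += 1
    return abs_err / n if n else 0.0
```

The actual framework additionally parallelizes these loops across datasets and models with joblib.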
To reproduce the experiments, it is recommended to use a virtual environment. Crucially, this project requires a specific branch of the River library fork.
```bash
# 1. Clone the repository
git clone https://github.com/caiogimenes/river-evaluate.git
cd river-evaluate

# 2. Create a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# 3. Install standard dependencies
pip install -r requirements.txt

# 4. Install the custom River fork (the 'feat/adaptive-qo' branch is required)
pip install git+https://github.com/caiogimenes/river.git@feat/adaptive-qo
```
Main Dependencies:
- Python 3.10+
- river (fork: caiogimenes/river, branch: feat/adaptive-qo)
- numpy
- pandas
- matplotlib / seaborn
- joblib
```
river-evaluate/
├── logs/                 # Stores raw experiment results (.pkl files)
├── output/               # Generated plots and diagrams
├── src/
│   ├── data/             # Data generators and adapters (Synthetic & Real)
│   ├── models/           # Definition of Regressors and Splitters
│   ├── plot/             # Visualization utilities
│   ├── stats/            # Statistical tests (Friedman, Nemenyi)
│   └── utils.py          # Evaluation loops
├── run_experiment.py     # Main entry point for execution
├── log_analysis.ipynb    # Jupyter notebook for result exploration
└── requirements.txt      # Project dependencies
```
To run the full experimental suite, execute the main script. This will trigger the prequential evaluation on the defined datasets.
```bash
python run_experiment.py
```
Note: By default, the script is configured to process 1,000,000 instances per dataset. You can modify the `INSTANCES` constant in `run_experiment.py` for quicker debugging.
The experiments compare the following variations of Hoeffding Tree Regressors (HTR):
- Baselines:
  - HATR: Hoeffding Adaptive Tree Regressor (standard implementation).
  - HTR-QO-0.25: HTR with Quantization Observer (radius=0.25).
  - HTR-QO-0.5: HTR with Quantization Observer (radius=0.5).
- Proposed Methods (Adaptive QO):
  - HTR-AQO-Triangular: Adaptive QO with Triangular kernel.
  - HTR-AQO-Epanechnikov: Adaptive QO with Epanechnikov kernel.
  - HTR-AQO-Smooth: Adaptive QO with Smooth kernel.
The framework utilizes a diverse set of data streams:
- Synthetic: Friedman (Gradual/Abrupt Drift), Hyperplane, RandomRBF.
- Real-world: Bikes, Elec2, CoverType.
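The actual generators live in `src/data/`. To illustrate what "abrupt drift" means for the Friedman stream, here is a self-contained sketch (not the project's implementation) that draws from the standard Friedman #1 function and, at a hypothetical `drift_at` index, swaps the roles of two relevant inputs so the target concept changes instantly:

```python
import math
import random

def friedman_stream(n, drift_at=None, seed=42):
    """Yield (x, y) pairs from the Friedman #1 function:
    y = 10*sin(pi*x0*x1) + 20*(x2 - 0.5)^2 + 10*x3 + 5*x4 + noise.
    After `drift_at` instances, x3 and x4 swap coefficients (abrupt drift)."""
    rng = random.Random(seed)
    for i in range(n):
        x = [rng.random() for _ in range(10)]  # 10 uniform features, 5 relevant
        # Swap the two linear terms once the drift point is reached
        a, b = (x[3], x[4]) if drift_at is None or i < drift_at else (x[4], x[3])
        y = (10 * math.sin(math.pi * x[0] * x[1])
             + 20 * (x[2] - 0.5) ** 2
             + 10 * a + 5 * b
             + rng.gauss(0, 1))
        yield {f"x{j}": v for j, v in enumerate(x)}, y
```

A gradual drift would instead interpolate between the two concepts over a transition window rather than switching at a single index.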
After running the experiments, logs are saved in the `logs/` directory. You can use the provided notebook `log_analysis.ipynb` or the scripts in `src/plot/` to generate:
- Performance over time plots.
- Critical Difference (CD) diagrams.
- Resource usage analysis (Memory/Time).
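If you prefer to explore the results outside the notebook, a minimal loader sketch is below. It only assumes the `.pkl` extension mentioned above; the file naming scheme and the structure of each pickled object are experiment-specific:

```python
import pickle
from pathlib import Path

def load_results(log_dir="logs"):
    """Load every pickled result file in `log_dir` into a dict keyed by file stem.
    Returns an empty dict if the directory is missing or contains no .pkl files."""
    results = {}
    for path in sorted(Path(log_dir).glob("*.pkl")):
        with path.open("rb") as f:
            results[path.stem] = pickle.load(f)
    return results
```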
If you use this code in your research, please cite:
```bibtex
@article{dias2026adaptive,
  title={Adaptive Quantization Observers for Online Regression Trees},
  author={Dias, C. G.},
  journal={},
  year={2026}
}
```
Distributed under the MIT License. See LICENSE for more information.