Beaver is a DSL for machine learning on live data. Its purpose is to simplify data retrieval and preprocessing, model training, model prediction, and output display. It uses several tools to achieve this:
- Kafka
- Quixstreams
- River
- Plotly
- Dash
- Docker
- TextX
- Jinja
Beaver also offers the following features:
- 🔍 Static Model Validation: Validate your .bvr files before code generation
- 📊 Model Analysis: Get insights and suggestions for improving your models
- 🛠️ Enhanced CLI: Unified command-line interface for all operations
- ✅ Syntax Checking: Validate generated Python code syntax and compilation
- 📈 Comprehensive Algorithm Support: Full support for all River algorithm types
- 🔧 Detailed Error Reporting: Clear error messages with suggestions for fixes
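To make the syntax-checking feature in the list above more concrete, here is a minimal sketch of how generated Python source can be checked for syntax and compilation errors using only the standard library. This is an illustration under assumptions, not Beaver's actual implementation: the function name `check_python_syntax` and the error message format are hypothetical.

```python
import ast


def check_python_syntax(source, filename="<generated>"):
    """Return (True, None) if the source parses and compiles, else (False, message).

    Hypothetical helper for illustration; Beaver's own checker may differ.
    """
    try:
        tree = ast.parse(source, filename=filename)  # syntax check
        compile(tree, filename, "exec")              # compilation check
    except SyntaxError as exc:
        return False, f"{filename}:{exc.lineno}: {exc.msg}"
    return True, None
```

A caller would pass the text of the generated pipeline and report the message when the first element of the result is `False`.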
You can find a full description of the language, as well as examples and an FAQ, on the documentation page.
To download the project, run:
git clone https://github.com/deepblue597/beaver.git
When the download has finished, go to the repository by running:
cd beaver
Create a new virtual environment:
python -m venv <YOUR-VENV-NAME>
Activate the environment, depending on your shell:
Bash:
source <YOUR-VENV-NAME>/bin/activate
PowerShell:
<YOUR-VENV-NAME>\Scripts\activate
To install all the necessary libraries, run:
pip install -e .
Open a text editor of your choice and create a .bvr file.
If you are unsure how to structure a .bvr file, you can check the docs or use one of the examples provided in the examples folder.
Beaver now includes a powerful CLI with validation, analysis, and code generation features:
python beaver_cli.py examples
python beaver_cli.py validate --input examples/linear.bvr --verbose
# Basic generation with validation
python beaver_cli.py generate --input examples/linear.bvr --output my_pipeline.py
# Generation with comprehensive checking
python beaver_cli.py generate --input examples/linear.bvr --output my_pipeline.py --check-syntax --verbose
# Preview without creating files
python beaver_cli.py generate --input examples/linear.bvr --dry-run
# Analyze a specific file
python beaver_cli.py analyze --input examples/linear.bvr
# Analyze all examples
python beaver_cli.py analyze --directory examples
When you have generated your pipeline, you can run it using:
python my_pipeline.py
# Show extended help with examples
python beaver_cli.py help
# Get help for specific commands
python beaver_cli.py generate --help
You can still use the original generator directly:
python beaver/gen_enhanced.py --metamodel <PATH-TO-YOUR-METAMODEL> --generated_file_name <PATH-TO-THE-GENERATED-FILE> --check-syntax --verbose
Recommended Workflow:
- Validate first: python beaver_cli.py validate --input your_model.bvr
- Generate code: python beaver_cli.py generate --input your_model.bvr --check-syntax
- Run your pipeline: python generated_pipeline.py
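The recommended workflow can also be scripted. The sketch below chains the validate, generate, and run steps with `subprocess`, using only the CLI flags shown in this README. The helper names `build_workflow` and `run_workflow` are hypothetical, and the sketch assumes that `beaver_cli.py` exits with a non-zero status when a step fails, which this README does not confirm.

```python
import subprocess
import sys

CLI = "beaver_cli.py"  # CLI entry point named in this README


def build_workflow(model_path, output_path):
    """Return the validate, generate, and run commands, in order."""
    py = sys.executable
    return [
        [py, CLI, "validate", "--input", model_path],
        [py, CLI, "generate", "--input", model_path,
         "--output", output_path, "--check-syntax"],
        [py, output_path],
    ]


def run_workflow(model_path, output_path):
    """Run each step in turn and stop at the first failing step.

    Assumption: a failed validation or generation yields a non-zero exit code.
    """
    for cmd in build_workflow(model_path, output_path):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `run_workflow("examples/linear.bvr", "my_pipeline.py")` from the repository root would then run the three steps in sequence.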
If you don't have a Kafka setup, Beaver provides one with 3 brokers, 3 controllers, and a Kafka UI provided by provectuslabs. To set it up:
- Go to the kafka_proj folder:
cd kafka_proj
- Run the Docker Compose file:
docker compose up -d
- The UI will be available at localhost:8080. The brokers you can connect to are at localhost:49092, localhost:39092 and localhost:29092.
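Before starting a pipeline, it can help to confirm that the broker ports listed above accept connections. The following is a minimal standard-library sketch; the helper names `parse_brokers`, `reachable`, and `broker_status` are hypothetical and not part of Beaver. An open TCP port only indicates that something is listening, not that the Kafka cluster is healthy.

```python
import socket

# Broker addresses listed in this README for the bundled Kafka cluster.
BROKERS = "localhost:29092,localhost:39092,localhost:49092"


def parse_brokers(bootstrap):
    """Split 'host:port,host:port' into a list of (host, port) tuples."""
    pairs = []
    for entry in bootstrap.split(","):
        host, _, port = entry.strip().rpartition(":")
        pairs.append((host, int(port)))
    return pairs


def reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def broker_status(bootstrap=BROKERS):
    """Map each 'host:port' string to whether it accepts TCP connections."""
    return {f"{h}:{p}": reachable(h, p) for h, p in parse_brokers(bootstrap)}
```

After `docker compose up -d`, calling `broker_status()` gives one entry per broker; any `False` value suggests that container is not up yet.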
A visual representation of the process that will be built is displayed below:
graph TD
A[User writes .bvr file] --> B[TextX parses .bvr file]
B --> C[Python code generation Jinja]
C --> D[Generated pipeline script with Quix + River]
D --> E[Kafka topics for input/output]
E --> F[Quix Streams processes live data]
F --> G[Model training & prediction River]
G --> H[Metrics & predictions published to Kafka]
G --> J[Model saved to pickle file]
G --> I[Live visualization in Dash dashboard]
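The training and prediction stage in the diagram follows the usual online-learning pattern: predict on each incoming record, score the prediction, then learn from the record. The sketch below illustrates that loop and the pickle step with a tiny hand-written regressor standing in for a River model. It is not River and not Beaver's generated code: `OnlineLinearRegression` and `run_stream` are hypothetical names, the data is synthetic, and only the method names `predict_one` and `learn_one` mirror River's convention.

```python
import pickle


class OnlineLinearRegression:
    """Tiny SGD regressor used as a stand-in for a River model (illustration only)."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.weights = {}
        self.bias = 0.0

    def predict_one(self, x):
        # x is a dict of feature name -> value, as in River's one-record API
        return self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def learn_one(self, x, y):
        # One gradient step on the squared error for a single record
        error = self.predict_one(x) - y
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v
        self.bias -= self.lr * error


def run_stream(model, stream):
    """Predict first, then learn; return the mean absolute error over the stream."""
    abs_error_sum, n = 0.0, 0
    for x, y in stream:
        y_pred = model.predict_one(x)
        abs_error_sum += abs(y - y_pred)
        n += 1
        model.learn_one(x, y)
    return abs_error_sum / n if n else 0.0


# Synthetic stream for y = 2x + 1 (made-up data, stands in for Kafka messages)
stream = [({"x": (i % 10) / 10}, 2.0 * ((i % 10) / 10) + 1.0) for i in range(200)]
model = OnlineLinearRegression()
mae = run_stream(model, stream)

# Round-trip through pickle, mirroring the diagram's model-saving step
restored = pickle.loads(pickle.dumps(model))
```

In a generated pipeline, the records would come from a Kafka topic through Quix Streams rather than from an in-memory list.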
If Beaver has been useful to you, and you would like to cite it in a scientific publication, please refer to the thesis:
@mastersthesis{kakandris2025beaver,
title={Design and implementation of a textual domain language to produce machine learning applications on data streams},
author={Kakandris, Iasonas},
year={2025},
school={Aristotle University of Thessaloniki}
}