Kaggle competition to complete: https://www.kaggle.com/competitions/asr-numbers-recognition-in-russian/overview
There are results from several runs on different dataset formats (i.e. different target-text representations of the same numbers):
Result of training with the digit representation (Audio -> 1234)

Result of training with the word representation (Audio -> "тысяча двести тридцать четыре", the number spelled out in Russian words)

Result of training with the token representation (Audio -> <1>|<200><30><4>)

The token representation achieves the best validation loss.
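The token targets can be produced deterministically from the integer label. Below is a minimal sketch of such a converter; the function name and the exact token inventory (e.g. a single token for Russian teens, which are one word each) are assumptions, and only the `|` separator after the thousands token follows the example above:

```python
def number_to_tokens(n: int) -> str:
    """Convert an integer into token targets like 1234 -> '<1>|<200><30><4>'."""
    parts = []
    if n >= 1000:
        # thousands prefix; multi-word thousand counts (e.g. 21) would need recursion
        parts.append(f"<{n // 1000}>|")
        n %= 1000
    if n >= 100:
        parts.append(f"<{(n // 100) * 100}>")
        n %= 100
    if 10 <= n <= 19:
        # 10-19 are single words in Russian, so emit one token
        parts.append(f"<{n}>")
        n = 0
    if n >= 10:
        parts.append(f"<{(n // 10) * 10}>")
        n %= 10
    if n:
        parts.append(f"<{n}>")
    return "".join(parts)
```

For instance, `number_to_tokens(1234)` yields `<1>|<200><30><4>`, matching the representation shown above.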
The project exposes training metrics to Prometheus via Pushgateway. The
docker-compose.yml file includes the pushgateway, prometheus and grafana
services. After training on each dataset version, the script pushes the latest
val_loss metric to the Pushgateway.
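The push can be done with the official prometheus_client library. This sketch assumes the default Pushgateway port 9091, a `dataset_version` label, and the job name `asr_training` (all assumptions; only the `val_loss` metric name comes from the setup above):

```python
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway


def push_val_loss(value: float, dataset_version: str,
                  gateway: str = "localhost:9091") -> None:
    """Push the latest val_loss for one dataset version to the Pushgateway."""
    registry = CollectorRegistry()
    gauge = Gauge(
        "val_loss",
        "Validation loss of the last training run",
        ["dataset_version"],  # assumed label; lets each run keep its own series
        registry=registry,
    )
    gauge.labels(dataset_version=dataset_version).set(value)
    # grouped under one job so each push overwrites the previous value
    push_to_gateway(gateway, job="asr_training", registry=registry)
```

Prometheus then scrapes the Pushgateway, so `val_loss` becomes queryable in Grafana.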
Grafana is provisioned with Prometheus as a data source and an alert rule that
checks the val_loss metric. Alerts are sent through a Telegram contact point.
Set the following environment variables before starting the stack:
export GF_TELEGRAM_BOT_TOKEN=your_bot_token
export GF_TELEGRAM_CHAT_ID=your_chat_id

Start the full stack with:

docker-compose up -d

Grafana will be available at http://localhost:3000 and Prometheus at http://localhost:9090.
Alerts will trigger when val_loss stays above 1 and will be sent to the
specified Telegram chat.
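For reference, the alert condition can be expressed as a Prometheus-style alerting rule. Grafana's provisioned alerts use their own YAML format, so the fragment below is only an equivalent sketch, and the 5m hold duration is an assumption:

```yaml
groups:
  - name: training_alerts
    rules:
      - alert: HighValidationLoss
        expr: val_loss > 1
        for: 5m          # assumed duration before the alert fires
        labels:
          severity: warning
        annotations:
          summary: "val_loss has stayed above 1"
```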
To collaborate on this project, please follow the GitHub workflow.
Please read the Conventional Commits specification to ensure that your commits are named properly.
This project supports orchestration using Apache Airflow with Docker Compose.
- Local Airflow deployment (via Docker)
- Pipeline with three steps: Data Processing → Model Training → Model Testing
- Integrated with DVC for data and artifact versioning
Make sure Docker and Docker Compose are installed.
docker-compose up -d

- Airflow UI will be available at: http://localhost:8080
- Default credentials:
admin/admin
If port 8080 is busy, change it in docker-compose.yml:
ports:
- "8081:8080"

Once Airflow is running:
- Go to the web UI.
- Find the DAG named mlops_pipeline.
- Toggle it on, then click Trigger DAG.
The DAG contains:
- process_data: runs data_download and data_prepare
- train_model: runs model training
All steps are executed via uv for dependency consistency and tracked with DVC.
- MLflow: http://localhost:5000
- MinIO: http://localhost:9000
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000
The repository exposes a FastAPI application that wraps the existing training
pipeline located in src/models and src/data. The service can download and
prepare data, perform inference using a trained Conformer model and report
metrics on the validation split.
Run the API locally with:

uvicorn src.api.main:app --reload

To start the service with Docker Compose:

docker-compose up api

The model checkpoint and validation dataset paths are configured in
service_config.yaml.
- POST /download_data: download the raw dataset from Kaggle.
- POST /prepare_data: generate dataset splits and vocabularies.
- POST /predict: return the transcription for an uploaded audio file.
- GET /metrics: compute the character error rate on the validation split.
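A minimal client call to the prediction endpoint might look like this; the port 8000, the upload field name `file`, and the response shape are assumptions, so check src/api/main.py for the actual definitions:

```python
import requests

API_URL = "http://localhost:8000"  # assumed host/port for the FastAPI service


def predict(audio_path: str, api_url: str = API_URL) -> dict:
    """Send one audio file to POST /predict and return the JSON response."""
    with open(audio_path, "rb") as f:
        resp = requests.post(f"{api_url}/predict", files={"file": f})
    resp.raise_for_status()
    return resp.json()


# Example (requires the service to be running):
# print(predict("data/val/0001.wav"))
```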
python scripts/query_api.py sends the validation set audio files to the API
and saves predictions to results.json.