Machine Learning Agent is an agentic system that automatically generates end-to-end machine learning notebooks from datasets.
It is built with Java (Spring AI) and Python (FastAPI), and integrates with Kaggle's API for dataset retrieval.
| Capability | Description |
|---|---|
| Dataset Acquisition | Automatically downloads and verifies datasets (via the Kaggle API or direct URLs) |
| Data Preprocessing | Handles cleaning, transformation, and feature processing |
| Model Training | Trains one or more ML models using the generated notebook |
| Performance Evaluation | Produces visual accuracy metrics (plots, confusion matrices, etc.) |
Two core agents drive the notebook lifecycle:
| Agent | Responsibility |
|---|---|
| NotebookCreatorAgent | Builds a machine learning notebook from scratch |
| NotebookUpdaterAgent | Revises or enhances an existing notebook |
Each step in notebook creation is validated before execution.
| Component | Role |
|---|---|
| CriticAgent | Reviews generated code before execution |
| EvaluatorAgent | Provides structured feedback when changes are required |
Workflow:
- The agent generates code for a step.
- CriticAgent reviews it:
  - If approved → the code is executed
  - If rejected → the feedback is applied and the code is regenerated
- This repeats for up to 3 correction attempts, after which the last revision is accepted
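The review loop above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: `generate` and `review` are hypothetical callables standing in for the notebook agent and the CriticAgent.

```python
MAX_ATTEMPTS = 3  # correction attempts before the last revision is accepted

def build_step(generate, review):
    """Generate code for one notebook step, with critic review.

    generate(feedback) returns a code string (feedback is None on the
    first call); review(code) returns (approved, feedback).
    """
    code = generate(None)
    for _ in range(MAX_ATTEMPTS):
        approved, feedback = review(code)
        if approved:
            return code  # approved → ready for execution
        code = generate(feedback)  # rejected → regenerate with feedback
    return code  # last revision accepted after 3 attempts
```

In this shape the critic never blocks progress indefinitely: after three rejections the most recent revision proceeds to execution, where Papermill-level error handling takes over.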
After execution, the notebook is tested using Papermill.
If an error occurs:
| Component | Role |
|---|---|
| ErrorHandlerAgent | Receives the exception, traceback, and failing code; attempts to fix issues automatically, up to 3 times; if unresolved → returns HTTP 500 with diagnostic messaging |
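The execute-and-repair cycle might look like the sketch below. It is an assumption-laden illustration: `execute` stands in for the Papermill run (e.g. a wrapper around `papermill.execute_notebook`) and `fix` for the ErrorHandlerAgent; neither name comes from the project's code.

```python
MAX_FIX_ATTEMPTS = 3  # automatic repair attempts before giving up

def run_with_repair(execute, fix, nb_path):
    """Execute a notebook, routing failures through an error-handler hook.

    execute(path) runs the notebook and raises on a failing cell;
    fix(error, path) returns the path of a repaired notebook.
    """
    for attempt in range(MAX_FIX_ATTEMPTS + 1):
        try:
            execute(nb_path)
            return nb_path
        except Exception as err:
            if attempt == MAX_FIX_ATTEMPTS:
                # unresolved → surfaced upstream as HTTP 500
                raise RuntimeError("unresolved after 3 fix attempts") from err
            nb_path = fix(err, nb_path)
```

In the FastAPI layer, the final `RuntimeError` would typically be translated into an HTTP 500 response carrying the diagnostic message.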
| Layer | Technology |
|---|---|
| Core Orchestration | Java + Spring AI |
| Execution & Notebook Manipulation | Python + FastAPI |
| Dataset Integration | Kaggle API |
| Notebook Validation | Papermill |
- Multi-model comparison support (e.g., RandomForest vs XGBoost vs NeuralNet)
- Notebook-to-Production pipeline export (FastAPI endpoint / Batch job)
- Support for SQL / API-based data sources (Snowflake, BigQuery, REST, etc.)
- LLM-based summary of results and insights section at the end of notebooks