This example demonstrates how to use devloop for a typical Python data science workflow, including Jupyter Lab development, model training, testing, and data preprocessing pipelines.
This project showcases a complete machine learning pipeline with:
- Data preprocessing that triggers when raw data changes
- Model training that triggers when source code or configs change
- Automated testing that runs when code is modified
- Jupyter Lab for interactive development and analysis
```
03-python-datascience/
├── .devloop.yaml              # Devloop configuration
├── requirements.txt           # Python dependencies
├── README.md                  # This file
├── Makefile                   # Build automation
├── src/                       # Source code
│   ├── train.py               # Model training script
│   └── data/
│       └── preprocess.py      # Data preprocessing pipeline
├── tests/                     # Test suite
│   ├── test_training.py       # Training pipeline tests
│   └── test_preprocessing.py  # Data preprocessing tests
├── notebooks/                 # Jupyter notebooks
│   ├── 01_data_exploration.ipynb
│   └── 02_model_evaluation.ipynb
├── configs/                   # Configuration files
│   └── model.yaml             # Model training configuration
├── data/                      # Data storage
│   ├── raw/                   # Raw data files
│   └── processed/             # Processed data files
├── models/                    # Trained models and metrics
└── logs/                      # Log files from devloop rules
```
- Python 3.8 or higher
- devloop installed (see the installation guide)
- Navigate to the example directory:

  ```bash
  cd examples/03-python-datascience
  ```

- Create a virtual environment (recommended):

  ```bash
  python -m venv venv  # or use python3
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
Start devloop to watch for file changes and automatically manage your development workflow:

```bash
devloop -c .devloop.yaml
```

This will start four concurrent processes:

- Jupyter Lab (`jupyter` rule) - Interactive development environment
- Model Training (`train` rule) - Automatic retraining when code/config changes
- Tests (`test` rule) - Continuous testing when code changes
- Data Pipeline (`pipeline` rule) - Data preprocessing when raw data changes
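Each of these processes corresponds to one rule in `.devloop.yaml`. As a sketch using the same rule schema shown later in this README (the shipped config may differ in detail), the `pipeline` rule could look like:

```yaml
rules:
  - name: "Data Pipeline"
    prefix: "pipeline"
    watch:
      - action: "include"
        patterns:
          - "data/raw/**"
    commands:
      - "python src/data/preprocess.py"
```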
You can also run individual components manually:

```bash
# Run data preprocessing
python src/data/preprocess.py

# Train model
python src/train.py --config configs/model.yaml

# Run tests
pytest tests/ -v

# Start Jupyter Lab
jupyter lab --no-browser --port=8888
```

When you first run devloop, it will:
- Generate sample datasets in `data/raw/`
- Process the data and save it to `data/processed/`
- Train an initial model and save it to `models/`
- Start Jupyter Lab on port 8888
- Run the test suite
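To make the preprocessing step concrete, here is a minimal stdlib-only sketch of what a script like `src/data/preprocess.py` might do; the example's actual script is more involved, and the cleaning rule below (dropping incomplete rows) is purely illustrative:

```python
import csv
from pathlib import Path

RAW_DIR = Path("data/raw")
PROCESSED_DIR = Path("data/processed")

def preprocess(raw_dir=RAW_DIR, processed_dir=PROCESSED_DIR):
    """Copy each raw CSV to the processed directory, dropping incomplete rows."""
    processed_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(raw_dir.glob("*.csv")):
        with src.open(newline="") as f:
            rows = list(csv.DictReader(f))
        # Illustrative cleaning step: drop rows with any empty field.
        rows = [row for row in rows if all(value != "" for value in row.values())]
        if not rows:
            continue
        dst = processed_dir / src.name
        with dst.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        written.append(dst)
    return written

if __name__ == "__main__" and RAW_DIR.exists():
    for path in preprocess():
        print(f"[pipeline] wrote {path}")
```

Because the script is idempotent and reads only from `data/raw/`, devloop can safely re-run it on every raw-data change.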
Modify Python source files (in `src/`):
- Triggers model retraining
- Runs test suite
- Updates Jupyter Lab environment
Modify configuration (in `configs/`):
- Triggers model retraining with new parameters
- Saves new model and metrics
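When the configuration changes, the training script rereads its parameters. One way a script like `src/train.py` might guard against typos in the config is to mirror the expected keys in a small dataclass; this helper is hypothetical and not part of the example:

```python
from dataclasses import dataclass, fields

@dataclass
class ModelParams:
    """Mirrors the model_params block of configs/model.yaml."""
    n_estimators: int = 100
    max_depth: int = 10
    random_state: int = 42

    @classmethod
    def from_dict(cls, raw):
        # Reject keys the training code does not understand, so a typo
        # in the config fails loudly instead of being silently ignored.
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"Unknown model_params: {sorted(unknown)}")
        return cls(**raw)
```

With this in place, a misspelled key such as `n_trees` raises immediately on the retrain that devloop triggers, rather than training with defaults.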
Modify raw data (in `data/raw/`):
- Triggers data preprocessing pipeline
- Updates processed datasets
- May trigger model retraining if using processed data
Modify Jupyter notebooks (in `notebooks/`):
- Restarts Jupyter Lab server
- Preserves notebook state and outputs
Each rule outputs logs with prefixes for easy identification:
- `[jupyter]` - Jupyter Lab server logs
- `[train]` - Model training progress and metrics
- `[test]` - Test execution results
- `[pipeline]` - Data preprocessing status
```yaml
watch:
  - action: "include"
    patterns:
      - "src/**/*.py"        # All Python files in src/
      - "configs/**/*.yaml"  # All YAML configs
  - action: "exclude"
    patterns:
      - "**/__pycache__/**"  # Ignore Python cache
      - "**/*.pyc"           # Ignore compiled Python files
```

```yaml
commands:
  - "echo 'Starting model training...'"
  - "python src/train.py --config configs/model.yaml"
workdir: "."  # Run commands from project root
```

```yaml
settings:
  prefix_logs: true
  prefix_max_length: 10
```

To add a new development task, edit `.devloop.yaml`:
```yaml
rules:
  - name: "Code Formatting"
    prefix: "format"
    watch:
      - action: "include"
        patterns:
          - "src/**/*.py"
    commands:
      - "black src/"
      - "flake8 src/"
```

Edit `configs/model.yaml` to change model training parameters:
```yaml
model_params:
  n_estimators: 200  # Increase the number of trees
  max_depth: 15      # Allow deeper trees
  random_state: 42   # Keep results reproducible
```

Add new packages to `requirements.txt` and reinstall:
```bash
echo "xgboost>=1.6.0" >> requirements.txt
pip install -r requirements.txt
```

- Start devloop: `devloop -c .devloop.yaml`
- Open Jupyter Lab: http://localhost:8888
- Open `notebooks/01_data_exploration.ipynb`
- Modify the notebook - Jupyter will restart automatically
- Changes to `src/` files are reflected immediately in notebooks
- Edit `src/train.py` to modify the training pipeline
- Save the file - devloop automatically retrains the model
- Check `models/metrics.yaml` for updated performance metrics
- View results in `notebooks/02_model_evaluation.ipynb`
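The exact layout of `models/metrics.yaml` is up to the training script; a stdlib-only sketch that writes a flat metrics dict as simple `key: value` lines (an assumed format, not necessarily what the example produces) could be:

```python
from pathlib import Path

def save_metrics(metrics, path="models/metrics.yaml"):
    """Write a flat metrics dict as `key: value` lines, creating parent dirs."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{key}: {value}" for key, value in sorted(metrics.items())]
    out.write_text("\n".join(lines) + "\n")
    return out
```

Writing metrics to a file rather than stdout is what lets the evaluation notebook pick up fresh numbers after every devloop-triggered retrain.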
- Write new tests in `tests/`
- Modify source code in `src/`
- Tests run automatically on every change
- Fix failures and see immediate feedback
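A new test in `tests/` can be as small as a plain pytest-style function; the helper below is illustrative and not a function the example actually ships:

```python
# tests/test_preprocessing.py (sketch)

def drop_incomplete(rows):
    """Illustrative helper: keep only rows with no empty or missing values."""
    return [row for row in rows
            if all(value not in ("", None) for value in row.values())]

def test_drop_incomplete_removes_rows_with_missing_values():
    rows = [{"x": 1, "y": 2}, {"x": 3, "y": ""}]
    assert drop_incomplete(rows) == [{"x": 1, "y": 2}]
```

The moment you save a file like this, devloop's `test` rule reruns `pytest tests/ -v` and the new assertion appears in the `[test]` log output.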
- Add new raw data files to `data/raw/`
- Preprocessing runs automatically
- Check `data/processed/` for updated datasets
- Model retraining may trigger if using processed data
This example works great with various development environments:
- Install the Python extension
- Use the integrated terminal to run `devloop`
- Edit files normally - devloop handles the rest
- Open project directory
- Use the terminal to run `devloop`
- Leverage PyCharm's debugging alongside the running processes
- Run devloop in a tmux/screen session
- Edit files as usual
- Check devloop output for immediate feedback
- Exclude unnecessary files from watching:

  ```yaml
  - action: "exclude"
    patterns:
      - "models/**"  # Don't watch model outputs
      - "logs/**"    # Don't watch log files
      - ".git/**"    # Don't watch git files
  ```

- Use specific patterns instead of `**/*`:

  ```yaml
  - action: "include"
    patterns:
      - "src/**/*.py"  # Only Python files in src/
  ```

- Optimize Jupyter startup for faster restarts:

  ```bash
  jupyter lab --no-browser --port=8888 --NotebookApp.token=''
  ```
Port 8888 already in use:

```bash
# Find and kill existing Jupyter processes
lsof -ti:8888 | xargs kill -9

# Or use a different port in .devloop.yaml
```

Module import errors:
- Ensure the virtual environment is activated
- Check that `src/` is on the Python path
- Verify all dependencies are installed
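If imports from `src/` fail under pytest, one common stdlib-only fix is a `conftest.py` at the project root that prepends `src/` to `sys.path`; this is a sketch (an editable install of a proper package is the more robust long-term option):

```python
# conftest.py (project root)
import sys
from pathlib import Path

# Make `import train` and friends resolvable from tests/ and notebooks.
SRC = Path(__file__).resolve().parent / "src"
if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))
```

pytest imports a root `conftest.py` before collecting tests, so this takes effect for every run devloop triggers.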
Model training fails:
- Check that `configs/model.yaml` exists
- Verify data files are present
- Review training logs for specific errors
Tests fail:
- Run tests manually first: `pytest tests/ -v`
- Check that test dependencies are installed
- Ensure test data is available
- Check devloop logs for error messages
- Run individual commands manually to isolate issues
- Verify file permissions and paths
- Check Python environment and dependencies
After exploring this example:
- Customize for your project: Adapt the structure and configuration
- Add more rules: Include linting, documentation generation, etc.
- Scale up: Use devloop's agent/gateway mode for multi-project setups
- Integrate CI/CD: Use similar patterns in your deployment pipeline
- Full-Stack Web Application - Multi-language development
- Microservices - Distributed development with gateway mode
- Docker Integration - Container-based development