30 changes: 30 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,30 @@
name: CI

on:
  push:
    branches: [main, dev/*]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"

      - name: Run tests
        run: pytest
16 changes: 16 additions & 0 deletions .gitignore
@@ -0,0 +1,16 @@
local_test/
scratch/
.venv/
__pycache__/
*.pyc
*.pyo
*.pyd
*.db
*.sqlite3
*.log
*.bak
*.tmp
*.swp
.DS_Store
.vscode/
*.egg-info/
97 changes: 97 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

SyMANTIC is a PyTorch-based symbolic regression library that discovers interpretable, parsimonious equations from data. It combines mutual information-based feature selection, adaptive feature expansion with mathematical operators, L0-sparse regression with quantile-based partitioning, and Pareto-optimal solution extraction.

Install: `pip install symantic`

## Build & Test Commands

```bash
pip install -e ".[dev]" # Install in editable mode with test dependencies
pytest # Run all tests
pytest tests/test_model.py # Run a specific test file
pytest -k "test_auto" # Run tests matching a pattern
```

## Package Structure

```text
symantic/ # Main package (renamed from src/)
├── model.py # SymanticModel — main entry point
├── results.py # FitResult dataclass (unified return type)
├── validation.py # Input validation functions
├── exceptions.py # FeatureSpaceLimitError, ValidationError
├── feature_expansion/
│ ├── nondimensional.py # Non-dimensional feature expansion
│ └── dimensional.py # Dimensional feature expansion (sympy units)
├── regression/
│ ├── l0_greedy.py # Greedy forward selection + OLS
│ ├── l0_greedy_dimensional.py # Dimensional-aware regression
│ └── screening.py # Dimensional screening helpers
├── pareto.py # Pareto frontier identification
├── losses/ # Loss functions (Phase 4 — planned)
└── dynamics/ # Dynamic problem support (Phase 5 — planned)
tests/ # pytest test suite
docs/scaling.md # Computational scaling guide
docs/implemented_changes.md # Changelog of all improvements
examples/ # Jupyter notebook examples
```

The old `src/` directory provides a backward-compatible shim with deprecation warnings.

## Architecture

Pipeline: **Feature Space Construction → Sparse Regression → Pareto Analysis**

Two parallel pathways — non-dimensional and dimensional (unit-aware via sympy):

- `symantic/model.py` — **SymanticModel**: orchestrates the pipeline. Routes to the dimensional or non-dimensional pathway depending on whether the `dimensionality` parameter is provided. Imports the feature expansion modules as `fcc` (non-dimensional) and `dfcc` (dimensional).
- `symantic/feature_expansion/nondimensional.py` — Feature expansion using operators (+, -, *, /, exp, sin, cos). When `n_expansion=None`, runs an adaptive auto-depth mode that keeps expanding until the `metrics` thresholds (RMSE and R2) are met.
- `symantic/feature_expansion/dimensional.py` — Dimensional feature expansion with sympy-based unit validation.
- `symantic/regression/l0_greedy.py` — Greedy forward selection + OLS via `torch.linalg.lstsq`. Uses SIS (Sure Independence Screening) with quantile-based partitioning by complexity.
- `symantic/regression/l0_greedy_dimensional.py` — Dimensional-aware regression that filters features for unit consistency.
- `symantic/pareto.py` — Pareto front identification with configurable utopia point.
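
The Pareto step at the end of the pipeline can be illustrated with a sort-based sweep. This is a minimal sketch under stated assumptions, not the code in `symantic/pareto.py`: the names `pareto_front` and `closest_to_utopia`, the `(complexity, rmse)` tuple layout, and the unscaled Euclidean distance to the utopia point are all hypothetical choices made for illustration. Both objectives are minimized.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (complexity, rmse); hypothetical layout


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points when minimizing both coordinates."""
    front: List[Point] = []
    best_rmse = float("inf")
    # Sorting by complexity (then rmse) lets a single sweep find the front,
    # so the total cost is dominated by the O(n log n) sort.
    for complexity, rmse in sorted(points):
        # Keep a point only if it strictly improves on the best rmse seen
        # among points of lower or equal complexity.
        if rmse < best_rmse:
            front.append((complexity, rmse))
            best_rmse = rmse
    return front


def closest_to_utopia(front: List[Point], utopia: Point = (0.0, 0.0)) -> Point:
    """Pick the front point nearest to an assumed utopia point (no scaling)."""
    return min(front, key=lambda p: math.dist(p, utopia))


candidates = [(1, 0.90), (2, 0.50), (2, 0.70), (3, 0.55), (4, 0.10)]
front = pareto_front(candidates)
print(front)  # [(1, 0.9), (2, 0.5), (4, 0.1)]
```

A real implementation would likely normalize complexity and error before measuring distance to the utopia point, since the two objectives live on different scales.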

**Namespace**: The `__init__.py` exports qualified names (`NonDimensionalRegressor`, `DimensionalRegressor`, etc.) to avoid collisions. The default unqualified `Regressor` maps to the non-dimensional implementation.

## Key Dependencies

torch, numpy, pandas, scipy (spearmanr), sklearn (mutual_info_regression), sympy, matplotlib

## Key API

```python
from symantic import SymanticModel

model = SymanticModel(
    df,                           # DataFrame: column 0 = target, columns 1+ = features
    operators=['+','-','*','/'],  # Operators for feature expansion
    n_expansion=None,             # None = adaptive auto-depth; integer = fixed depth (uses range(1, n))
    n_term=3,                     # Max terms in equation (sparsity)
    sis_features=20,              # Features to screen per iteration
    metrics=[0.06, 0.995],        # [RMSE_threshold, R2_threshold] for auto-depth stopping
    max_features=2000,            # Max features before stopping expansion (None = default per mode)
    device='cpu',                 # 'cpu' or 'cuda'
)
result = model.fit() # Returns FitResult
# Access via attributes:
result.rmse, result.equation, result.r2, result.pareto_front
# Or backward-compatible unpacking:
res, full_pareto = result # Auto-depth mode
rmse, equation, r2 = result # Fixed-depth mode
```
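
One way a dataclass can support both attribute access and the tuple-style unpacking shown above is to define `__iter__`. The following is a sketch only, not the actual `FitResult` in `symantic/results.py`: the class name `FitResultSketch`, the `auto_depth` flag, and the assumption that `res` is an `(rmse, equation, r2)` tuple are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List


@dataclass
class FitResultSketch:
    """Hypothetical stand-in for FitResult; not the library's definition."""

    rmse: float
    equation: str
    r2: float
    pareto_front: List[Any] = field(default_factory=list)
    auto_depth: bool = False  # assumed flag; the source does not confirm it

    def __iter__(self) -> Iterator[Any]:
        if self.auto_depth:
            # Mirrors `res, full_pareto = result`; the shape of `res`
            # is an assumption made for this sketch.
            yield (self.rmse, self.equation, self.r2)
            yield self.pareto_front
        else:
            # Mirrors `rmse, equation, r2 = result`.
            yield self.rmse
            yield self.equation
            yield self.r2


fixed = FitResultSketch(rmse=0.05, equation="y = 2*x1", r2=0.99)
rmse, equation, r2 = fixed

auto = FitResultSketch(0.05, "y = 2*x1", 0.99,
                       pareto_front=[(1, 0.05)], auto_depth=True)
res, full_pareto = auto
```

Defining `__iter__` keeps old call sites that unpack the return value working while new code reads named attributes.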

**Note on `n_expansion`**: When set to an integer `n`, the expansion loop is `range(1, n)`, so `n_expansion=2` runs one expansion level. `None` enables auto-depth.
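
The off-by-one in the note follows directly from how Python's `range` works. The helper `expansion_levels` below is a hypothetical name used only to illustrate the loop bounds; it is not part of the library.

```python
from typing import List


def expansion_levels(n_expansion: int) -> List[int]:
    """Levels visited by a loop of the form `for i in range(1, n_expansion)`."""
    return list(range(1, n_expansion))


for n in (1, 2, 3, 4):
    print(n, expansion_levels(n))
# 1 []
# 2 [1]
# 3 [1, 2]
# 4 [1, 2, 3]
```

Under this loop form, an integer `n_expansion` runs `n_expansion - 1` levels, and `n_expansion=1` iterates zero times.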

## Planned Improvements (Multi-Phase)

This project is undergoing restructuring per the plan in `.claude/plans/`:

- **Phase 2**: UX/docs — `FitResult` dataclass, configurable `max_features`, input validation, scaling docs (COMPLETE)
- **Phase 3**: Performance — O(n log n) Pareto, expand() memory fix, batched combinations, vectorized pair computation (COMPLETE)
- **Phase 4**: L1/L2/ElasticNet regularization + classification losses (cross-entropy, hinge, focal) (PLANNED)
- **Phase 5**: Dynamic problem placeholders (finite difference, Neural ODE gradient estimation) (PLANNED)