Named after the Greek river of forgetfulness, Lethe provides state-of-the-art machine unlearning algorithms with comprehensive evaluation and verification capabilities.
Lethe is a comprehensive Python library for machine unlearning: the process of selectively removing the influence of specific training data from trained machine learning models. With privacy regulations like the GDPR and growing concerns about data rights, machine unlearning has become essential for responsible AI deployment.
- **Multiple Unlearning Algorithms**: Naive retraining, gradient ascent, SISA, influence functions, and more
- **Comprehensive Evaluation**: Performance metrics, privacy verification, and utility assessment
- **Privacy Testing**: Membership inference attacks and privacy loss estimation
- **Production Ready**: Industry-standard APIs with proper error handling and logging
- **Command Line Interface**: Easy-to-use CLI for quick experiments and benchmarking
- **Framework Agnostic**: Works with scikit-learn, PyTorch, and TensorFlow models
- **Benchmarking Suite**: Compare different unlearning methods systematically
- **Extensive Testing**: Comprehensive test suite ensuring reliability
## Installation

```bash
pip install lethe-ml
```

Or with uv:

```bash
uv add lethe-ml
```

## Quick Start

```python
import lethe
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Create synthetic dataset
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_classes=3,
    n_informative=8,
    n_redundant=2,
    random_state=42,
)

# Train model
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X, y)

# Create data splits for unlearning
loader = lethe.DatasetLoader()
dataset = loader.load_from_arrays(X, y)
splitter = lethe.UnlearningDataSplitter()
data_split = splitter.create_unlearning_split(dataset, forget_ratio=0.1)

# Perform unlearning
result = lethe.unlearn(
    model=model,
    method='naive_retraining',
    forget_data=data_split.forget,
    retain_data=data_split.retain,
)

print(f"Unlearning completed in {result.execution_time:.4f}s")
print(f"Metrics: {result.metrics}")

# Evaluate unlearning quality
evaluator = lethe.UnlearningEvaluator(task_type="classification")
eval_result = evaluator.evaluate_unlearning(
    original_model=model,
    unlearned_model=result.unlearned_model,
    data_split=data_split,
)

# Verify privacy and security
verifier = lethe.UnlearningVerifier()
verify_result = verifier.verify_unlearning(
    original_model=model,
    unlearned_model=result.unlearned_model,
    data_split=data_split,
)

print(f"Unlearning Quality: {eval_result.unlearning_quality:.4f}")
print(f"Privacy Score: {verify_result.overall_score:.4f}")
```

## Supported Algorithms

| Algorithm | Method Name | Description | Use Case |
|---|---|---|---|
| Naive Retraining | `naive_retraining` | Retrain from scratch without forget data | Gold standard baseline |
| Gradient Ascent | `gradient_ascent` | Gradient ascent on forget data | Fast approximation |
| SISA | `sisa` | Sharded, Isolated, Sliced, and Aggregated training | Scalable deployment |
| Influence Functions | `influence_function` | First-order influence approximation | Theoretical foundation |
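As a reference point, naive retraining (the gold-standard baseline above) can be sketched in a few lines of plain scikit-learn. This is the concept only, not Lethe's implementation; all names below are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Naive retraining: drop the forget set, then fit a fresh model
# on the retain set only, so the forget data has zero influence.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rng = np.random.default_rng(0)
forget_idx = rng.choice(len(X), size=int(0.1 * len(X)), replace=False)
retain_mask = np.ones(len(X), dtype=bool)
retain_mask[forget_idx] = False

original = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)
unlearned = RandomForestClassifier(n_estimators=25, random_state=0).fit(
    X[retain_mask], y[retain_mask]
)

# The unlearned model never saw the 50 forgotten samples during training.
print(f"forgot {len(forget_idx)} samples, retained {int(retain_mask.sum())}")
```

Exact by construction, but cost scales with full retraining, which is why the approximate methods in the table exist.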
## Advanced Usage

```python
from lethe import UnlearningAlgorithmFactory, ExperimentConfig

# Configure experiment
config = ExperimentConfig(
    experiment_name="privacy_evaluation",
    forget_ratio=0.15,
    unlearning_method="gradient_ascent",
    save_results=True,
)

# Create a custom algorithm
algorithm = UnlearningAlgorithmFactory.create_algorithm("naive_retraining")

# Run unlearning
result = algorithm.unlearn(model, data_split.forget, data_split.retain)
```

Compare several methods systematically:

```python
methods = ['naive_retraining']  # extend with other methods as needed
results = {}
for method in methods:
    result = lethe.unlearn(model, method, data_split.forget, data_split.retain)
    results[method] = result
    print(f"{method}: {result.execution_time:.4f}s")
```

## Command Line Interface

```bash
# Show version
uv run python -m lethe --version

# Show help
uv run python -m lethe --help

# Demo coming soon - currently in development
```

## Testing

Run the comprehensive test suite:

```bash
# Install test dependencies
uv sync --dev

# Run all tests
uv run pytest tests/ -v

# Run with coverage
uv run pytest tests/ --cov=src/lethe --cov-report=html

# Run specific test file
uv run pytest tests/test_basic.py -v
```

## Requirements

- Python: 3.12+
- Core Dependencies:
  - pandas >= 2.3.2
  - pydantic >= 2.11.7
  - scikit-learn >= 1.7.1
  - seaborn >= 0.13.2
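The privacy-testing features are built around membership inference: checking whether a model's behavior reveals that a sample was in its training set. A toy sketch of the idea using only the scikit-learn dependency above (illustrative, not Lethe's verifier implementation):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# A model that memorized its training data tends to assign higher
# confidence to the true labels of members than of non-members.
X, y = make_classification(n_samples=600, n_features=10, random_state=1)
X_train, y_train = X[:400], y[:400]   # members
X_out, y_out = X[400:], y[400:]       # non-members

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def true_label_confidence(m, X, y):
    """Mean predicted probability of each sample's true class."""
    proba = m.predict_proba(X)
    return proba[np.arange(len(y)), y].mean()

member_conf = true_label_confidence(model, X_train, y_train)
nonmember_conf = true_label_confidence(model, X_out, y_out)

print(f"member confidence: {member_conf:.3f}, non-member: {nonmember_conf:.3f}")
```

A large gap suggests membership leaks; after successful unlearning, the gap on the forget set should shrink toward the non-member level.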
## Installing from Source

```bash
git clone https://github.com/Khushiyant/lethe.git
cd lethe
uv sync
uv run python -m lethe --help
```

## Working with Real Datasets

```python
# Load real dataset
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

import lethe

data = load_breast_cancer()

# Create Lethe dataset
dataset = lethe.Dataset(
    X=data.data,
    y=data.target,
    feature_names=data.feature_names.tolist(),
    target_names=data.target_names.tolist(),
)

# Perform privacy-preserving unlearning
# (sensitive_data and public_data are forget/retain splits prepared beforehand)
result = lethe.unlearn(
    model=LogisticRegression(),
    method='naive_retraining',
    forget_data=sensitive_data,
    retain_data=public_data,
)
```

For a one-call workflow, use the `quick_unlearn` convenience function:

```python
result = lethe.quick_unlearn(
    model=model,
    method='naive_retraining',
    forget_data=forget_data,
    retain_data=retain_data,
)

print(f"Unlearning completed in {result['unlearning_result'].execution_time:.2f}s")
print(f"Verification passed: {result['verification'].passed}")
```

## Benchmarks

Performance on standard datasets (scikit-learn `RandomForestClassifier`):
| Dataset | Method | Samples | Time (s) | Utility Retention | Privacy Score |
|---|---|---|---|---|---|
| Iris (150) | Naive | 15 forget | 0.02 | 98.5% | 0.95 |
| Wine (178) | Naive | 18 forget | 0.03 | 97.8% | 0.93 |
| Breast Cancer (569) | Naive | 57 forget | 0.08 | 97.1% | 0.91 |
*Benchmarks run on a MacBook Pro M1; times are approximate.*
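Utility retention in the table is read here as the unlearned model's accuracy expressed as a fraction of the original model's accuracy on held-out data. This is an assumed definition for illustration, not necessarily Lethe's exact metric:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Forget the first 10% of the training rows, retrain without them.
n_forget = int(0.1 * len(X_tr))
X_retain, y_retain = X_tr[n_forget:], y_tr[n_forget:]

original = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
unlearned = RandomForestClassifier(random_state=42).fit(X_retain, y_retain)

# Utility retention: unlearned accuracy as a fraction of original accuracy.
retention = unlearned.score(X_te, y_te) / original.score(X_te, y_te)
print(f"Utility retention: {retention:.1%}")
```

Values near 100% mean unlearning cost little predictive performance on data the model should still handle well.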
We welcome contributions! Please see our Contributing Guide for details.
```bash
git clone https://github.com/Khushiyant/lethe.git
cd lethe
uv sync --dev
uv run pytest
```

Then run the tests:

```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=lethe --cov-report=html

# Run specific test
uv run pytest tests/test_algorithms.py -v
```

## License

This project is licensed under the MIT License; see the LICENSE file for details.