A comprehensive tutorial and framework for applying physics-informed machine learning to photovoltaic materials discovery using Materials Project data.
This repository demonstrates how to:
- Extract photovoltaic-relevant properties from the Materials Project database
- Construct physics-based descriptors for solar cell materials
- Build interpretable ML models with physical constraints
- Accelerate materials screening while maintaining scientific rigor
Key Philosophy: Rather than treating ML as a black box, we incorporate fundamental photovoltaic physics (band structure, optical absorption, charge transport) into the learning framework.
from src.data_extraction import MaterialsProjectExtractor
from src.descriptors import PhotovoltaicDescriptors
from src.models import PhysicsInformedModel
# Extract materials data
extractor = MaterialsProjectExtractor(api_key="YOUR_API_KEY")
materials = extractor.get_photovoltaic_candidates(n_materials=100)
# Compute physics-based descriptors
descriptor_calc = PhotovoltaicDescriptors()
features = descriptor_calc.compute_all(materials)
# Train physics-informed model
model = PhysicsInformedModel(enforce_bandgap_constraint=True)
model.fit(features, target='efficiency')

- Materials Project Data Extraction (01_materials_project_data_extraction.ipynb)
  - API setup and authentication
  - Querying relevant materials for PV applications
  - Data cleaning and preprocessing
- Physics Descriptors for Photovoltaics (02_physics_descriptors_for_photovoltaics.ipynb)
  - Electronic band structure descriptors
  - Optical absorption features
  - Charge transport metrics
  - Stability indicators
- ML Models with Physics Constraints (03_ml_models_with_physics_constraints.ipynb)
  - Incorporating Shockley-Queisser limit
  - Band gap optimization for different architectures
  - Multi-objective optimization (efficiency, stability, cost)
- Interpretable Predictions (04_interpretable_predictions.ipynb)
  - Feature importance analysis
  - Physical interpretation of ML decisions
  - Uncertainty quantification
- Band gap (Eg): Optimal range 1.1-1.7 eV for single-junction cells
- Effective masses (m*): Affect charge carrier mobility
- Band alignment: Valence/conduction band positions
- Density of states: Near band edges
- Absorption coefficient: Direct vs indirect transitions
- Spectral matching: Solar spectrum overlap
- Theoretical efficiency: Based on detailed balance
- Formation energy: Thermodynamic stability
- Decomposition energy: Against competing phases
- Synthesizability score: Likelihood of experimental realization
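The "theoretical efficiency" descriptor listed above rests on detailed balance. The following is a minimal sketch of such a calculation, not the repository's implementation. It assumes a 5778 K blackbody sun diluted by the geometric factor (R_sun / 1 AU)^2, unit absorptance above the gap, purely radiative recombination, emission from the front face only, and the Boltzmann approximation for the cell's emission. All function and constant names here are hypothetical.

```python
import math

# Physical constants (SI units)
H = 6.62607015e-34    # Planck constant, J s
C = 2.99792458e8      # speed of light, m/s
K_B = 1.380649e-23    # Boltzmann constant, J/K
Q = 1.602176634e-19   # elementary charge, C

# Assumption: the sun is a 5778 K blackbody seen from 1 AU.
SUN_TEMPERATURE = 5778.0
DILUTION = (6.957e8 / 1.496e11) ** 2  # (solar radius / Earth-sun distance)^2


def bose_integral(x, tol=1e-12):
    """Integral of t^2 / (exp(t) - 1) from x to infinity, for x > 0.

    Uses the series sum over n >= 1 of
    exp(-n x) * (x^2 / n + 2 x / n^2 + 2 / n^3).
    """
    total, n = 0.0, 1
    while True:
        term = math.exp(-n * x) * (x * x / n + 2 * x / n**2 + 2 / n**3)
        total += term
        if term <= tol * total:
            return total
        n += 1


def photon_flux(bandgap_ev, temperature):
    """Blackbody photon flux above the gap, in photons / (m^2 s)."""
    kt = K_B * temperature
    x_gap = bandgap_ev * Q / kt
    return 2 * math.pi * kt**3 / (H**3 * C**2) * bose_integral(x_gap)


def detailed_balance_efficiency(bandgap_ev, cell_temperature=300.0, n_steps=2000):
    """Maximum electrical power divided by incident solar power."""
    sigma = 2 * math.pi**5 * K_B**4 / (15 * H**3 * C**2)  # Stefan-Boltzmann
    p_in = DILUTION * sigma * SUN_TEMPERATURE**4           # W / m^2
    absorbed = DILUTION * photon_flux(bandgap_ev, SUN_TEMPERATURE)
    emitted_dark = photon_flux(bandgap_ev, cell_temperature)
    kt_cell_ev = K_B * cell_temperature / Q

    # Grid search for the maximum power point between 0 V and Eg / q.
    best_power = 0.0
    for i in range(n_steps + 1):
        voltage = bandgap_ev * i / n_steps
        current = Q * (absorbed - emitted_dark * (math.exp(voltage / kt_cell_ev) - 1.0))
        best_power = max(best_power, current * voltage)
    return best_power / p_in


if __name__ == "__main__":
    for gap in (0.5, 1.3, 3.0):
        print(f"Eg = {gap:.1f} eV -> limit ~ {detailed_balance_efficiency(gap):.3f}")
```

Under these assumptions the limit comes out at roughly 30% for a 1.3 eV gap and falls off for much smaller or much larger gaps, which is consistent with the 1.1-1.7 eV window listed above. Using a tabulated AM1.5 spectrum instead of a blackbody would shift the numbers somewhat.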
# Clone the repository
git clone https://github.com/NabKh/Physics-Informed-ML-Solar-Cells.git
cd Physics-Informed-ML-Solar-Cells
# Create conda environment
conda env create -f environment.yml
conda activate piml-pv
# Or use pip
pip install -r requirements.txt

- Register at Materials Project
- Get your API key from the dashboard
- Set the environment variable:
export MP_API_KEY="your_key_here"
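Once the variable is exported, the key can be read in Python rather than hard-coded in a notebook. The helper below is a small sketch; `load_mp_api_key` is a hypothetical name and is not part of the repository.

```python
import os


def load_mp_api_key(env_var="MP_API_KEY"):
    """Return the API key stored in the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before running the notebooks."
        )
    return key
```

The result could then be passed on as `MaterialsProjectExtractor(api_key=load_mp_api_key())`, which keeps the key out of version control.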
Traditional ML approaches may learn spurious correlations and violate physical laws. Our approach:
- Constrains predictions to physically reasonable ranges (e.g., Eg > 0)
- Uses domain knowledge in feature engineering
- Ensures interpretability through physics-based features
- Incorporates physical laws (e.g., detailed balance limit)
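One common way to keep predictions inside a physical range, as in the first point above, is to pass the model's unconstrained output through a smooth transform. The sketch below is an illustration under that assumption and not necessarily how `PhysicsInformedModel` enforces its constraints. The function names and the `upper_limit` parameter are hypothetical.

```python
import math


def softplus(z):
    """Smooth map from any real number to a positive one.

    Stable form of log(1 + exp(z)); for extremely negative z the result
    underflows to 0.0 in floating point.
    """
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable logistic function with values between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def constrain_bandgap(raw_output):
    """Map an unconstrained model output to a band gap in eV with Eg > 0."""
    return softplus(raw_output)


def constrain_efficiency(raw_output, upper_limit):
    """Squash an unconstrained output into the interval (0, upper_limit).

    upper_limit is an assumed cap, for example a detailed-balance limit
    computed for the material's band gap.
    """
    return upper_limit * sigmoid(raw_output)
```

Because both transforms are differentiable, they can sit at the end of a gradient-trained model, whereas hard clipping would zero the gradient outside the allowed range.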
def theoretical_efficiency(bandgap, temperature=300):
    """
    Calculate theoretical efficiency limit based on detailed balance.
    Incorporates fundamental thermodynamic constraints.
    """
    # Implementation based on SQ limit
    pass

This repository serves as:
- Tutorial for students learning ML for materials science
- Template for researchers starting PV materials projects
- Best practices guide for physics-informed ML
Screen 50-100 materials in minutes instead of months of DFT calculations
Identify promising regions in composition-property space
Suggest unconventional materials for experimental validation
Demonstrate integration of physics and machine learning
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request
Nabil Khossossi
- Website: sustai-nabil.com
- Email: n.khossossi@differ.nl
- GitHub: @NabKh
MIT License - see LICENSE file for details
Note: This is an educational and research tool. For production applications, additional validation and testing are required.

