A comprehensive machine learning platform for predicting polymer blend properties using molecular descriptors and advanced simulation techniques. This system combines experimental data with computational simulation to create large-scale training datasets for robust property prediction across multiple polymer types and environmental conditions.
This system predicts 7 key polymer blend properties using molecular descriptors extracted from SMILES representations, combined with sophisticated data augmentation and machine learning techniques. The platform supports both single polymers and complex multi-component blends (up to 5 components) under various environmental conditions.
| Property | Unit | Environmental Parameters | Model Version | Description |
|---|---|---|---|---|
| WVTR | g/m²/day | Temperature, RH, Thickness | v4 | Water Vapor Transmission Rate |
| Tensile Strength | MPa | Thickness | v4 | Mechanical strength (MD & TD) |
| Elongation at Break | % | Thickness | v4 | Material flexibility |
| Cobb Value | g/m² | None | v2 | Water absorption capacity |
| OTR | cc/m²/day | Temperature, RH, Thickness | v2 | Oxygen Transmission Rate |
| Seal Strength | N/15mm | Thickness | v2 | Heat sealing performance |
| Home Compostability | % disintegration | Thickness | v5 | Biodegradation potential |
The system includes two complementary optimization systems that can be used at different stages of the pipeline to improve prediction accuracy:
- Purpose: Optimizes polymer-polymer compatibility (KI values) to minimize prediction errors
- When to Use:
- Before model training to improve simulation accuracy
- After model training to fine-tune compatibility parameters
- When prediction errors are high on validation blends
- Pipeline Stage: Simulation Layer - improves the underlying simulation rules
- What it Optimizes: KI values in
{property}_compatibility.yamlfiles - Input: Validation blends from
train/data/{property}/validationblends.csv - Output: Optimized KI values that improve MAE on validation data
- Purpose: Optimizes additive-polymer compatibility (KI values) to achieve target percentage changes
- When to Use:
- When you want specific additive effects (e.g., +20% WVTR increase)
- To calibrate additive behavior to match experimental observations
- When additive effects don't match expectations
- Pipeline Stage: Additive Integration Layer - fine-tunes additive effects
- What it Optimizes: KI values for additive-polymer pairs in compatibility files
- Input: Target percentage changes from
train/simulation/targets.csv - Output: Optimized KI values that achieve target additive effects
1. Experimental Data Collection
↓
2. Data Generation Pipeline
├── Simulation Rules (can be optimized with Blend Optimization)
├── Additive Integration (can be optimized with Additive Optimization)
└── Synthetic Data Generation
↓
3. Feature Extraction
↓
4. Model Training
↓
5. Prediction Engine
- Start with Blend Optimization to improve base simulation accuracy
- Then use Additive Optimization to fine-tune additive effects
- Retrain models with optimized parameters
- Validate on test data
- High prediction errors? → Use Blend Optimization
- Additive effects wrong? → Use Additive Optimization
- Both issues? → Use both systems in sequence
# Blend Optimization (improve simulation accuracy)
python simulate_optimize_blends.py --property wvtr --adaptive-lr
# Additive Optimization (calibrate additive effects)
python simulate_optimize_additives.py --property wvtr --tolerance 5.0
# Check results
git status # See what files were updated- Master Data: Real experimental measurements stored in
masterdata.csvfiles - Material Database: 80+ polymer materials with SMILES representations
- Property-Specific Datasets: Separate experimental data for each property
- Simulation Rules: Property-specific mathematical models for polymer blending
- Material Compatibility: Immiscibility rules and compatibility matrices
- Environmental Modeling: Temperature, humidity, and thickness effects
- Additive Integration: Advanced additive/filler system with UMM3 corrections
- Pairwise Interactions: Additive-polymer compatibility modeling
- Random Combinations: Generates thousands of polymer blend combinations
- Volume Fraction Sampling: Uses Dirichlet distribution for realistic compositions
- Environmental Variation: Tests across realistic temperature/humidity/thickness ranges
- Quality Control: UMM3 correction and validation systems
- Hybridization States: SP, SP2, SP3 for different elements (C, N, O, S, etc.)
- Ring Systems: Phenyls, cyclohexanes, cyclopentanes, thiophenes, heteroaromatic rings
- Functional Groups: 25+ chemical groups (esters, amides, alcohols, etc.)
- Structural Features: Branching factor, tree depth, chain lengths
- Carbon Classification: Primary, secondary, tertiary, quaternary carbons
- Volume Fraction Weighting: Combines individual polymer features using volume fractions
- Order Invariance: Sorts polymers by SMILES to ensure consistent featurization
- Environmental Integration: Adds temperature, humidity, thickness as features
- Algorithm: XGBoost gradient boosting models
- Preprocessing: Log transformations with property-specific offsets
- Feature Engineering: Categorical encoding and missing value handling
- Dual Property Support: Some properties predict multiple values (TS1+TS2, max_L+t0)
- Smart Data Splitting: "Last N" strategy for temporal validation
- Oversampling: Property-specific oversampling for rare cases
- Cross-Validation: Robust model performance evaluation
- Feature Selection: Automatic exclusion of SMILES and metadata columns
- Input Validation: Polymer/grade validation and volume fraction checking
- Feature Extraction: Real-time molecular descriptor calculation
- Model Loading: Dynamic model loading with version management
- Error Quantification: Uncertainty bounds and confidence intervals
- Curve Generation: Time-dependent curves for compostability and sealing profiles
- Environmental Scaling: Property-specific environmental parameter effects
- Multi-Property Prediction: Simultaneous prediction of multiple properties
The system includes a sophisticated additive/filler integration framework that allows for realistic modeling of how additives affect polymer blend properties. This system uses the UMM3 (Universal Material Modification Model) correction framework to apply both individual additive effects and pairwise additive-polymer interactions.
ingredients:
Glycerol: # Additive name
K_ts: -0.5 # Tensile strength effect (reduces strength)
K_wvtr: 0.3 # WVTR effect (increases permeability)
K_eab: 0.8 # Elongation effect (increases flexibility)
K_cobb: 0.0 # Cobb effect (no effect)
K_seal: -0.2 # Sealing effect (slightly reduces)
K_compost: 0.1 # Compostability effect (slightly improves)
K_otr: 0.2 # OTR effect (increases oxygen permeability)
G: 0 # Geometry factor (0 for additives)
type: "additive"
description: "Glycerol plasticizer - increases flexibility, reduces strength"
smiles: "C(C(CO)O)O" # SMILES representationProperty-specific compatibility files define how additives interact with different polymer families:
# wvtr_compatibility.yaml
"Glycerol-PLA": {KI: 0.3, description: "Glycerol-PLA: Moderate compatibility, plasticizer effect increases WVTR"}
"Glycerol-PBAT": {KI: 0.2, description: "Glycerol-PBAT: Good compatibility, plasticizer effect increases WVTR"}
"Glycerol-PCL": {KI: 0.1, description: "Glycerol-PCL: Excellent compatibility, plasticizer effect increases WVTR"}Material,Grade,SMILES,Type
Glycerol,Glycerol,C(C(CO)O)O,additive- Property-Specific Effects: Each additive has K_prop values for all 7 properties
- Transport Factors: K_ts, K_wvtr, K_eab, K_cobb, K_seal, K_compost, K_otr
- Realistic Ranges: Values typically between -1.0 to +1.0 for realistic effects
- Clipping Protection: System automatically clips extreme values to prevent overflow
- Additive-Polymer Compatibility: KI values define how well additives interact with different polymer families
- Property-Specific: Different KI values for each property (WVTR, TS, EAB, etc.)
- Compatibility Levels: Excellent (KI: 0.1), Good (KI: 0.2), Moderate (KI: 0.3), Poor (KI: 0.5+)
- Default Behavior: Missing pairs default to KI: 0.0 with warning
- Configurable Probability: Default 30% of simulated blends include additives
- Random Selection: System randomly selects which blends get additives
- Volume Fraction Sampling: Additive concentrations sampled from realistic ranges
- Individual Corrections: Apply K_prop values to base property values
- Pairwise Corrections: Apply KI values for additive-polymer interactions
- Clipping: Prevent mathematical overflow with extreme values
- Validation: Ensure physically realistic results
- Interactive Interface: Web-based exploration of additive effects
- Real-time Prediction: See immediate effects of adding Glycerol to blends
- Property Comparison: Compare predictions with and without additives
- Visual Feedback: Clear display of additive effects on all properties
# Launch the additive explorer
streamlit run simple_additive_app.py# Generate training data with additives enabled
python train/simulation/simulate.py --property wvtr --number 10000 --seed 42
# The system will:
# - Include additives in 30% of blends (default)
# - Apply UMM3 corrections for additive effects
# - Generate realistic additive-polymer interactions# Train models on additive-enhanced data
python train/training/train_unified_modular.py --property wvtr --input train/data/wvtr/polymerblends_for_ml_featurized.csv --output train/models/wvtr/vtest_additive/# Moderate plasticizer effects
Glycerol:
K_ts: -0.3 # Slight strength reduction
K_wvtr: 0.2 # Slight permeability increase
K_eab: 0.5 # Moderate flexibility increase
K_seal: -0.1 # Slight sealing reduction# For testing and demonstration
Glycerol:
K_ts: -5.0 # Massive strength reduction
K_wvtr: 5.0 # Massive permeability increase
K_eab: 5.0 # Massive flexibility increaseThe system includes an advanced blend optimization framework that automatically tunes polymer-polymer compatibility (KI) values to minimize prediction errors. This system uses gradient descent with blend-specific adaptive learning rates to optimize the entire KI vector simultaneously.
- Automatic Detection: Identifies all polymer families present in validation data
- Pair Generation: Creates KI values for all unique polymer-polymer pairs
- Validation-Based: Only optimizes pairs that exist in validation blends
- No Manual Configuration: System automatically determines what to optimize
- Personalized Learning: Each blend gets its own learning rate based on MAE proximity to 0
- Automatic Slowdown: Learning rate decreases as blend approaches target
- No Hard Limits: KI values can go wherever needed to minimize MAE
- Weighted Gradients: Each KI parameter's gradient is weighted by blend learning rates
- Finite Differences: Uses numerical differentiation to calculate gradients
- Convergence Detection: Stops when improvement becomes negligible
- Stability Controls: Gradient normalization and light clipping to prevent explosion
- Iterative Improvement: Continues until convergence or max iterations
# Optimize WVTR with adaptive learning rate
python simulate_optimize_blends.py --property wvtr --max-iterations 50 --base-lr 0.1 --adaptive-lr
# Optimize with fixed learning rate
python simulate_optimize_blends.py --property wvtr --max-iterations 50 --learning-rate 0.001 --no-adaptive-lr
# Optimize other properties
python simulate_optimize_blends.py --property ts --max-iterations 30 --base-lr 0.05 --adaptive-lr
python simulate_optimize_blends.py --property eab --max-iterations 30 --base-lr 0.05 --adaptive-lr# Custom iteration count and learning rate
python simulate_optimize_blends.py --property wvtr --max-iterations 100 --base-lr 0.2 --adaptive-lr
# Fixed learning rate with custom rate
python simulate_optimize_blends.py --property wvtr --max-iterations 50 --learning-rate 0.005 --no-adaptive-lr- Loads validation data from
train/data/{property}/validationblends.csv - Extracts polymer families and builds pair mapping
- Initializes KI vector (starts with existing YAML values or zeros)
- For each KI parameter, calculates finite difference gradient
- Tests small perturbation (ε = 1e-6) and measures MAE change
- Computes gradient = (MAE_plus - MAE_current) / ε
- Adaptive Mode: Calculates blend-specific learning rates based on individual MAE
- Fixed Mode: Uses single learning rate for all parameters
- Weighted Updates: Combines gradients weighted by learning rates
- Updates KI vector using gradient descent:
ki_new = ki_old - lr * gradient - Applies light clipping to prevent extreme values (-50 to +50)
- Stores results for analysis and YAML updates
- CSV Output:
blend_optimization_results_{property}.csv - Iteration Tracking: MAE, improvement, KI vector, learning rates per iteration
- Convergence Info: Final MAE, improvement, convergence status
- Automatic Updates: Updates
{property}_compatibility.yamlwith optimized KI values - Backup Safety: Original values preserved in git
- Validation: Ensures updated values are physically reasonable
- Robust: Less sensitive to learning rate choice
- Personalized: Each blend gets appropriate attention
- Stable: Consistent improvement across iterations
- Efficient: Automatically adjusts step size
- Simple: Straightforward gradient descent
- Tunable: Requires careful learning rate selection
- Fast: Can converge quickly with right parameters
- Sensitive: Performance depends heavily on learning rate choice
- KI Overrides: Optimized values take precedence over YAML defaults
- Fallback System: Falls back to YAML values for non-optimized pairs
- Hybrid Approach: Combines optimized and default values seamlessly
- Pre-Training: Run optimization before model training
- Post-Training: Run optimization after model training for fine-tuning
- Iterative: Can be run multiple times for continuous improvement
- New Properties: When optimizing for the first time
- Complex Blends: When dealing with many polymer families
- Robust Optimization: When you want consistent results
- Fine-Tuning: When you have a good starting point
- Quick Tests: When you want to test different parameters
- Custom Control: When you want precise control over step size
- Start with Adaptive: Use adaptive learning rate for initial optimization
- Analyze Results: Check convergence and final MAE
- Fine-Tune: Use fixed learning rate with smaller steps if needed
- Validate: Test updated models on validation data
- Iterate: Repeat if further improvement is needed
The system includes a dedicated additive optimization framework that tunes additive-polymer compatibility (KI) values to achieve specific target percentage changes in properties. This system uses representative blends and gradient descent to optimize additive effects across different polymer families.
- CSV-Driven Targets: Loads optimization targets from
train/simulation/targets.csv - Percentage-Based: Optimizes for specific percentage changes (e.g., +20% WVTR increase)
- Polymer-Specific: Different targets for different polymer families
- Additive-Specific: Different targets for different additives (e.g., Glycerol)
- Deterministic Blends: Creates consistent representative blends for each polymer family
- Volume Fraction Sampling: Uses realistic volume fractions (0.3-0.7 for primary polymer)
- Environmental Parameters: Tests across realistic temperature/humidity/thickness ranges
- Additive Integration: Includes additive at realistic concentrations (0.1-0.3)
- Individual Gradients: Each additive-polymer pair gets its own gradient
- Adaptive Learning: Learning rate adjusts based on individual target achievement
- Tolerance-Based: Stops when all targets are within specified tolerance
- Convergence Detection: Monitors individual and total errors
# Optimize Glycerol effects on WVTR
python simulate_optimize_additives.py --property wvtr --tolerance 5.0 --max-iterations 50
# Optimize Glycerol effects on Tensile Strength
python simulate_optimize_additives.py --property ts --tolerance 10.0 --max-iterations 30
# Optimize with custom parameters
python simulate_optimize_additives.py --property eab --tolerance 3.0 --max-iterations 100 --learning-rate 0.05# High precision optimization
python simulate_optimize_additives.py --property wvtr --tolerance 1.0 --max-iterations 200 --learning-rate 0.01
# Quick optimization for testing
python simulate_optimize_additives.py --property otr --tolerance 15.0 --max-iterations 20 --learning-rate 0.2Polymer Family,Additive,Target (average % increase on representative blends)
PLA,Glycerol,20.0
PBAT,Glycerol,15.0
PBS,Glycerol,25.0
PHAa,Glycerol,18.0
PHAs,Glycerol,22.0
PHBV,Glycerol,16.0- Positive Values: Increase in property (e.g., +20% WVTR increase)
- Negative Values: Decrease in property (e.g., -10% Tensile Strength decrease)
- Percentage-Based: Targets are percentage changes, not absolute values
- Representative Blends: Targets are achieved on consistent test blends
- Loads targets from
targets.csv - Validates target values and polymer-additive combinations
- Creates optimization vector for each target
- Generates deterministic blends for each polymer family
- Includes additive at realistic concentrations
- Tests across environmental parameter ranges
- Ensures consistent baseline for optimization
- Calculates individual gradients for each additive-polymer pair
- Uses finite differences to measure KI sensitivity
- Tracks individual percentage changes vs targets
- Computes total error across all targets
- Updates each KI value based on its individual gradient
- Uses adaptive learning rate based on target achievement
- Applies gradient clipping to prevent extreme values
- Monitors convergence and tolerance achievement
- Console Output: Real-time progress with individual target tracking
- Convergence Info: Final errors, target achievement, iteration count
- KI Updates: Shows KI value changes for each additive-polymer pair
- Automatic Updates: Updates
{property}_compatibility.yamlwith optimized KI values - Additive-Polymer Pairs: Updates specific additive-polymer compatibility values
- Validation: Ensures updated values are physically reasonable
- Optimized Values: Values from additive optimization take highest priority
- YAML Defaults: Falls back to YAML values for non-optimized pairs
- System Defaults: Uses 0.0 for completely missing pairs
- Individual Corrections: K_prop values from
ingredients.yaml - Pairwise Corrections: Optimized KI values from compatibility files
- Combined Effects: Both individual and pairwise effects applied together
- Target-Driven: Focuses on specific percentage changes
- Representative Testing: Uses consistent blends for reliable optimization
- Individual Tracking: Each additive-polymer pair optimized independently
- Tolerance-Based: Stops when targets are achieved within tolerance
- Fast Initial Progress: Large improvements in early iterations
- Fine-Tuning: Slower progress as targets are approached
- Tolerance Achievement: Stops when all targets within specified tolerance
- Robust: Handles multiple targets simultaneously
- Realistic Targets: Set achievable percentage changes based on literature
- Property-Specific: Different tolerances for different properties
- Additive-Specific: Consider additive type when setting targets
- Validation: Test targets on representative blends before optimization
- Start Conservative: Use higher tolerance (10-15%) for initial optimization
- Refine Targets: Lower tolerance (3-5%) for fine-tuning
- Validate Results: Test optimized values on validation blends
- Iterate: Repeat optimization if targets not achieved
- Set Targets: Define target percentage changes in
targets.csv - Run Optimization: Execute additive optimization for each property
- Validate Results: Test optimized KI values on validation data
- Update Models: Retrain models with optimized additive effects
- Test Performance: Evaluate model performance with new additive effects
Real Measurements → Master Data Files → Material Database
# Generate synthetic data for any property
python train/simulation/simulate.py --property wvtr --number 10000
# Generate data for all properties
python train/simulation/simulate.py --all --number 5000# Convert raw data to ML-ready features
python train/run_blend_featurization.py input.csv output.csv# Train models using modular pipeline
python train/training/train_unified_modular.py --property wvtr --input featurized_data.csv --output models/wvtr/v1# Generate comprehensive prediction database
python train/databasegeneration.py --max_materials 20 --random_samples 5- SP Hybridization: SP_C, SP_N
- SP2 Hybridization: SP2_C, SP2_N, SP2_O, SP2_S, SP2_B
- SP3 Hybridization: SP3_C, SP3_N, SP3_O, SP3_S, SP3_P, SP3_Si, SP3_B
- Halogens: SP3_F, SP3_Cl, SP3_Br, SP3_I
- Special: SP3D2_S
- Aromatic Rings: Phenyls, thiophenes, aromatic rings with N/O/S
- Aliphatic Rings: Cyclohexanes, cyclopentanes, cyclopentenes
- Heteroaromatic: Aromatic rings with nitrogen, oxygen, sulfur
- Mixed Systems: Aromatic rings with both N and O
- Carbonyl Groups: Carboxylic acids, esters, amides, ketones, aldehydes
- Nitrogen Groups: Amines (primary, secondary, tertiary, quaternary), imines, nitriles
- Sulfur Groups: Thiols, thioethers, sulfones, sulfoxides
- Special Groups: Imides, ureas, carbamates, azides, azo compounds
- Branching: Branching factor calculation
- Depth: Tree depth from terminal atoms
- Chains: Ethyl, propyl, butyl, long chain detection
- SMILES Parsing: Uses RDKit for molecular structure analysis
- Order Invariance: Consistent feature extraction regardless of input order
- Missing Value Handling: Robust handling of invalid SMILES
- Volume Weighting: Features weighted by polymer volume fractions in blends
- Blending Rule: Inverse rule of mixtures (1/Σ(ci/wi))
- Environmental Effects: Temperature (Arrhenius), humidity (power law), thickness (power law)
- Scaling: Dynamic thickness scaling based on polymer properties
- Blending Rule: Rule of mixtures with material type considerations
- Dual Properties: Machine direction (MD) and transverse direction (TD)
- Environmental Effects: Thickness-dependent scaling
- Blending Rule: Volume-weighted average with non-linear corrections
- Dual Properties: EAB1 and EAB2 predictions
- Material Compatibility: Immiscibility rules for certain polymer combinations
- Blending Rule: Simple rule of mixtures
- Environmental Independence: No temperature/humidity effects
- Material Specific: Intrinsic property of polymer composition
- Blending Rule: Similar to WVTR with oxygen-specific parameters
- Environmental Effects: Temperature and humidity scaling
- Thickness Scaling: Power law relationship
- Blending Rule: Volume-weighted average of individual polymer strengths
- Temperature Calculation: Rule of mixtures for melting temperatures
- Curve Generation: Cubic polynomial interpolation for temperature profiles
- Dual Properties: Maximum disintegration (max_L) and time to 50% (t0)
- Curve Generation: Sigmoid curves for disintegration, quintic curves for biodegradation
- Environmental Effects: Thickness-dependent degradation kinetics
- Log Transformations: Property-specific log scaling with offsets
- Zero Handling: Removal of zero targets for certain properties
- Missing Value Treatment: NaN handling for WVTR/OTR properties
- Feature Exclusion: Automatic removal of SMILES and metadata columns
- Last N Training: Specified blends always go to training set
- Last N Testing: Can override last N training for validation
- Automatic 80/20 Split: For remaining data
- Temporal Validation: Ensures realistic model evaluation
- Algorithm: XGBoost with property-specific hyperparameters
- Preprocessing: Categorical encoding and feature scaling
- Oversampling: Configurable oversampling for rare cases
- Dual Property Support: Separate models for multi-target properties
- Cross-Validation: K-fold validation for robust performance assessment
- Last N Analysis: Special validation on most recent blends
- Error Quantification: Model error bounds and uncertainty estimates
- Performance Metrics: MAE, RMSE, R² for each property
- WVTR: v4 (XGBoost)
- Tensile Strength: v4 (XGBoost, dual property)
- Elongation at Break: v4 (XGBoost, dual property)
- Cobb Value: v2 (XGBoost)
- OTR: v2 (XGBoost)
- Seal Strength: v2 (XGBoost)
- Home Compostability: v5 (XGBoost, dual property)
- High Accuracy: R² > 0.8 for most properties
- Robust Validation: Consistent performance across different blend types
- Error Bounds: Quantified uncertainty for all predictions
- Scalability: Handles up to 5-component blends efficiently
# WVTR prediction
python predict_blend_properties.py wvtr "PLA, 4032D, 0.5, PBAT, Ecoworld, 0.5" 25 60 100
# Tensile strength prediction
python predict_blend_properties.py ts "PLA, 4032D, 0.5, PBAT, Ecoworld, 0.5" 100
# All properties prediction
python predict_blend_properties.py all "PLA, 4032D, 0.5, PBAT, Ecoworld, 0.5" 25 60 100# Launch main Streamlit web app
streamlit run predict_blend_properties_app.py
# Launch additive explorer app
streamlit run simple_additive_app.pyfrom train.modules.prediction_engine import predict_blend_property
# WVTR prediction for PLA/PBAT blend
result = predict_blend_property(
property_type='wvtr',
polymers=[("PLA", "4032D", 0.5), ("PBAT", "Ecoworld", 0.5)],
available_env_params={'Temperature (C)': 25, 'RH (%)': 60, 'Thickness (um)': 100},
material_dict=material_dict
)
if result['success']:
print(f"WVTR: {result['prediction']:.2f} {result['unit']}")# Predict all properties simultaneously
properties = ['wvtr', 'ts', 'eab', 'cobb', 'otr', 'seal', 'compost']
results = []
for prop in properties:
result = predict_blend_property(prop, polymers, env_params, material_dict)
if result:
results.append(result)
# Display results
for result in results:
print(f"{result['name']}: {result['prediction']:.2f} {result['unit']}")# Predict properties with additives
polymers_with_additive = [
("PLA", "4032D", 0.4),
("PBAT", "Ecoworld", 0.4),
("Glycerol", "Glycerol", 0.2) # 20% Glycerol additive
]
# Predict with additive effects
result = predict_blend_property(
property_type='wvtr',
polymers=polymers_with_additive,
available_env_params={'Temperature (C)': 25, 'RH (%)': 60, 'Thickness (um)': 100},
material_dict=material_dict
)
if result['success']:
print(f"WVTR with Glycerol: {result['prediction']:.2f} {result['unit']}")
print(f"Additive effects applied via UMM3 corrections")Polymer-Blends-Model/
├── predict_blend_properties_app.py # Main Streamlit application
├── simple_additive_app.py # Additive explorer application
├── predict_blend_properties.py # Command-line interface
├── material-smiles-dictionary.csv # Polymer database
├── additives-fillers-dictionary.csv # Additives and fillers database
├── train/ # Training and simulation modules
│ ├── simulation/ # Data augmentation system
│ │ ├── simulate.py # Main simulation script
│ │ ├── simulation_common.py # Common simulation functions
│ │ ├── simulation_rules.py # Property-specific rules
│ │ ├── umm3_correction.py # UMM3 correction system
│ │ ├── config/ # Configuration files
│ │ │ ├── ingredients.yaml # Additive K_prop parameters
│ │ │ ├── ingredient_polymer.yaml # Additive-polymer compatibility
│ │ │ └── compatibility/ # Property-specific compatibility
│ │ │ ├── wvtr_compatibility.yaml
│ │ │ ├── ts_compatibility.yaml
│ │ │ ├── eab_compatibility.yaml
│ │ │ ├── cobb_compatibility.yaml
│ │ │ ├── otr_compatibility.yaml
│ │ │ ├── seal_compatibility.yaml
│ │ │ └── compost_compatibility.yaml
│ │ └── rules/ # Individual property rules
│ ├── modules/ # Core prediction modules
│ │ ├── feature_extractor.py # Molecular feature extraction
│ │ ├── blend_feature_extractor.py # Blend processing
│ │ ├── prediction_engine.py # Model prediction
│ │ ├── prediction_utils.py # Utility functions
│ │ └── error_calculator.py # Uncertainty quantification
│ ├── training/ # Model training pipeline
│ │ ├── train_unified_modular.py # Main training script
│ │ └── training_modules/ # Modular training components
│ ├── data/ # Training and validation data
│ │ ├── wvtr/ # WVTR datasets
│ │ ├── ts/ # Tensile strength datasets
│ │ ├── eab/ # Elongation datasets
│ │ ├── cobb/ # Cobb value datasets
│ │ ├── otr/ # OTR datasets
│ │ ├── seal/ # Seal strength datasets
│ │ └── eol/ # Compostability datasets
│ ├── models/ # Trained ML models
│ │ ├── wvtr/v4/ # WVTR models
│ │ ├── ts/v4/ # Tensile strength models
│ │ ├── eab/v4/ # Elongation models
│ │ ├── cobb/v2/ # Cobb value models
│ │ ├── otr/v2/ # OTR models
│ │ ├── seal/v2/ # Seal strength models
│ │ └── eol/v5/ # Compostability models
│ └── databasegeneration.py # Database generation script
├── train/modules_home/ # Compostability curve generation
│ ├── curve_generator.py # Curve generation functions
│ └── utils.py # Utility functions
├── train/modules_sealing/ # Sealing profile generation
│ ├── curve_generator.py # Sealing curve generation
│ └── utils.py # Utility functions
├── validation/ # Model validation results
└── test_results/ # Prediction outputs
pip install streamlit pandas numpy scikit-learn matplotlib seaborn rdkit scipy joblib xgboostmaterial-smiles-dictionary.csv: Polymer database with SMILES representationsmodelrrors.csv: Model error data for uncertainty quantification- Trained model files in
train/models/directories
- Install dependencies:
pip install -r requirements.txt - Launch main web app:
streamlit run predict_blend_properties_app.py - Launch additive explorer:
streamlit run simple_additive_app.py - Command line:
python predict_blend_properties.py all "PLA, 4032D, 0.5, PBAT, Ecoworld, 0.5"
# Generate 10,000 samples for WVTR
python train/simulation/simulate.py --property wvtr --number 10000
# Generate data for all properties
python train/simulation/simulate.py --all --number 5000# Convert to ML-ready features
python train/run_blend_featurization.py train/data/wvtr/polymerblends_for_ml.csv train/data/wvtr/polymerblends_for_ml_featurized.csv# Train WVTR model
python train/training/train_unified_modular.py --property wvtr --input train/data/wvtr/polymerblends_for_ml_featurized.csv --output train/models/wvtr/v1
# Train with custom parameters
python train/training/train_unified_modular.py --property ts --input train/data/ts/polymerblends_for_ml_featurized.csv --output train/models/ts/v1 --last_n_training 10 --last_n_testing 3# Generate comprehensive database
python train/databasegeneration.py --max_materials 20 --random_samples 5 --n_polymers 2 3 4- Disintegration Curves: Sigmoid functions for disintegration over time
- Biodegradation Curves: Quintic functions for biodegradation kinetics
- Time Profiles: Day-by-day predictions up to 400 days
- Thickness Scaling: Material thickness effects on degradation
- Temperature Profiles: Sealing strength vs temperature curves
- Cubic Interpolation: Smooth curves through boundary points
- Boundary Points: Initial sealing, polymer max, blend predicted, degradation
- Material Properties: Melt temperature and degradation temperature effects
- Model Errors: Property-specific error bounds
- Experimental Uncertainty: Standard deviations from experimental data
- Confidence Intervals: Upper and lower bounds for predictions
- Uncertainty Propagation: Error bounds for derived quantities
- Temperature Scaling: Arrhenius and power law relationships
- Humidity Effects: Moisture-dependent property changes
- Thickness Scaling: Dynamic and fixed thickness scaling models
- Material Interactions: Polymer-specific environmental responses
- Experimental Validation: All models trained on real experimental data
- Cross-Validation: Robust validation across different blend types
- Error Tracking: Comprehensive error analysis and reporting
- Reproducibility: Fixed random seeds and version control
- Material Compatibility: Immiscibility rules based on polymer chemistry
- Environmental Physics: Realistic temperature and humidity effects
- Blending Rules: Physically meaningful mathematical models
- Constraint Validation: Volume fraction and material property constraints
- Feature Importance: Detailed analysis of molecular descriptor contributions
- Blending Rules: Transparent mathematical models for property prediction
- Error Analysis: Clear understanding of model limitations
- Validation Results: Comprehensive performance metrics and visualizations
- High Accuracy: R² > 0.8 for most properties
- Robust Validation: Consistent performance across blend types
- Error Quantification: Bounded uncertainty for all predictions
- Scalability: Efficient prediction for complex multi-component blends
- Temporal Validation: Last N blends strategy for realistic evaluation
- Cross-Property Validation: Consistent performance across all properties
- Environmental Validation: Performance across temperature/humidity ranges
- Material Validation: Performance across different polymer types
- Add SMILES representation to
material-smiles-dictionary.csv - Validate SMILES syntax with RDKit
- Test predictions with the new material
- Update documentation
- Create property-specific simulation rules
- Add experimental data to
train/data/ - Configure training parameters
- Train and validate models
- Update prediction engine
- Modular Design: Clear separation of concerns
- Version Control: Model versioning and management
- Documentation: Comprehensive inline documentation
- Testing: Validation and error handling throughout
This project is for research and development purposes. Please ensure proper attribution when using the models and methodology.
- RDKit: Molecular structure analysis and SMILES processing
- XGBoost: Gradient boosting machine learning models
- Streamlit: Web application framework
- Experimental Data: Real polymer property measurements
- Scientific Community: Polymer science and materials engineering research
This comprehensive system represents a significant advancement in polymer blend property prediction, combining experimental data with computational simulation to create robust, accurate, and interpretable models for materials science applications.