Advanced Computational Framework for Contaminant Transport and Ecological Risk Assessment
Integrating numerical modeling with machine learning for environmental contamination analysis
- Overview
- Features
- Installation & Setup
- Quick Start
- Visualizations & Demos
- Project Architecture
- Programming Examples
- Configuration
- Results and Performance
- Scientific Background
- Research Team
- Citation & License
- Contact
This repository provides a computational framework that couples numerical modeling of contaminant transport with machine learning to predict ecological risk levels in aquatic environments. A 2D advection-diffusion model generates concentration fields, and four classifiers (Random Forest, SVM, Gradient Boosting, Logistic Regression) assign each sample a Low, Medium, or High risk level.
- 🌊 Numerical Simulation: 2D advection-diffusion equation solver with customizable boundary conditions
- 🤖 Machine Learning: Multi-algorithm risk classification (Random Forest, SVM, Gradient Boosting, Logistic Regression)
- ⚠️ Risk Assessment: Automated ecological risk level classification (Low, Medium, High)
- 🎨 Visualization: Comprehensive plotting, animations, and interactive dashboards
- 🔬 Scenario Analysis: 33 predefined hydrodynamic scenarios for comprehensive testing
| Field | Application | Use Case |
|---|---|---|
| Environmental Engineering 🌿 | Pollution Assessment | Contaminant plume modeling, risk zone identification |
| Aquatic Ecology 🐟 | Ecosystem Protection | Species impact assessment, habitat risk evaluation |
| Water Resources 💧 | Management Planning | Contamination source control, remediation strategies |
| Regulatory Compliance 📋 | Environmental Monitoring | Risk threshold validation, compliance reporting |
| Research & Development 🔬 | Scientific Studies | Model validation, methodology development |
- 2D Advection-Diffusion Solver: Finite difference implementation with adaptive time stepping
- Flexible Boundary Conditions: Dirichlet, Neumann, and mixed boundary types
- Source Term Modeling: Point and distributed contaminant sources with temporal control
- Physical Processes: Advection, diffusion, and first-order decay mechanisms
- Multi-Algorithm Support: Random Forest, SVM, Gradient Boosting, Logistic Regression
- Feature Engineering: 16 comprehensive features (8 fundamental + 8 derived)
- Hyperparameter Optimization: Automated GridSearchCV with cross-validation
- Performance Metrics: Accuracy, Precision, Recall, F1-Score (macro and weighted)
- Concentration Field Plots: Static and animated concentration distributions
- Risk Level Mapping: Color-coded risk classification visualization
- Model Comparison: Performance dashboards and confusion matrices
- Temporal Evolution: Video generation for concentration dynamics
- Feature Importance: Analysis of predictive variable significance
- Three-Level Classification: Low, Medium, and High ecological risk categories
- Threshold-Based: Configurable concentration thresholds for risk determination
- Spatial-Temporal Analysis: Risk evolution over space and time
- Predictive Modeling: ML-based risk prediction for new scenarios
| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.8+ | 3.9+ |
| RAM | 8 GB | 16 GB+ |
| CPU | 4 cores | 8+ cores |
| Storage | 2 GB | 10 GB+ (for datasets) |
| OS | Windows/Linux/macOS | Linux (optimal performance) |
# Core scientific computing
numpy >= 1.21.0 # Numerical computations
pandas >= 1.3.0 # Data manipulation and analysis
scipy >= 1.7.0 # Scientific algorithms
# Machine learning
scikit-learn >= 1.0.0 # ML algorithms and metrics
joblib >= 1.1.0 # Model serialization
# Visualization
matplotlib >= 3.5.0 # Scientific plotting
seaborn >= 0.11.0 # Statistical visualization
# Configuration and utilities
PyYAML >= 6.0 # Configuration file parsing
tqdm >= 4.62.0 # Progress bars

# Method 1: Direct installation
git clone https://github.com/gstinoco/contaminant-transport-ml.git
cd contaminant-transport-ml
pip install -r requirements.txt
# Method 2: Virtual environment (recommended)
python -m venv contaminant_env
source contaminant_env/bin/activate # On Windows: contaminant_env\Scripts\activate
pip install -r requirements.txt
# Method 3: Conda environment
conda create -n contaminant_ml python=3.9
conda activate contaminant_ml
pip install -r requirements.txt

# Test installation
python -c "import numpy, pandas, sklearn; print(':white_check_mark: Installation successful!')"
# Run quick demo
python main.py --help

# Run full analysis pipeline
python main.py --complete

# Run only simulations
python main.py --simulate
# Preprocess data for ML
python main.py --preprocess
# Train ML models
python main.py --train
# Generate visualizations
python main.py --visualize
# Create temporal videos
python main.py --create-videos

# Use fundamental features (8 features)
python main.py --complete --fundamental-features
# Use complete features (16 features) - default
python main.py --complete

# Complete workflow with fundamental features
python main.py --complete --fundamental-features
# Simulate specific scenario
python main.py --simulate --scenario baseline_left_center
# Simulate all scenarios
python main.py --simulate --all-scenarios
# Generate videos with custom frame rate
python main.py --create-videos --video-fps 15
# Generate specific number of snapshots
python main.py --create-snapshots --snapshots-count 6
# Train models without saving intermediate results
python main.py --train --no-save

# Baseline scenarios (standard conditions)
python main.py --simulate --scenario baseline_left_center
python main.py --simulate --scenario baseline_upper
python main.py --simulate --scenario baseline_lower
# High flow scenarios
python main.py --simulate --scenario high_flow_left_center
python main.py --simulate --scenario very_high_flow_upper
# High discharge scenarios
python main.py --simulate --scenario high_discharge_left_center
python main.py --simulate --scenario very_high_discharge_lower
# Low conditions scenarios
python main.py --simulate --scenario low_flow_left_center
python main.py --simulate --scenario very_low_discharge_upper

The framework generates plots and animations that show contaminant transport dynamics and the evolution of risk zones over simulated time.
Contaminant Transport Simulation - Baseline Scenario
Visualization of contaminant concentration spreading over time
Generated with: python main.py --create-videos
Ecological Risk Assessment - Dynamic Risk Zones
Real-time visualization of Low (🟢), Medium (🟡), and High (🔴) risk areas
Generated with: python main.py --create-videos
Concentration field evolution at key time points: t = 0 s, 100 s, 200 s, and 300 s.
# Generate videos for all scenarios (concentration and risk evolution)
python main.py --create-videos
# Generate videos with custom frame rate
python main.py --create-videos --video-fps 15
# First simulate specific scenario, then create videos
python main.py --simulate --scenario baseline_left_center
python main.py --create-videos

from src.visualization.visualization import ContaminantVisualizer
# Initialize visualizer
viz = ContaminantVisualizer('config/parameters.yaml')
# Create concentration field plot
viz.plot_concentration_field(concentration, x, y, time=120.0,
scenario_name="baseline_left_center")
# Generate temporal snapshots
snapshot_paths = viz.create_snapshots(concentration_history, x, y,
time_points, num_snapshots=4)
# Create simulation video (GIF)
viz.create_simulation_video(concentration_history, x, y, time_points,
scenario_name="baseline_left_center", fps=10)
# Create risk evolution video
viz.create_risk_evolution_video(concentration_history, x, y, time_points,
scenario_name="baseline_left_center", fps=10)

from src.ml_model.risk_classifier import RiskClassifier
# Initialize classifier
classifier = RiskClassifier('config/parameters.yaml')
# Generate performance dashboard
classifier.plot_all_model_metrics(model_results,
save_path="data/visualizations/ml_dashboard.png")
# Create detailed confusion matrix
classifier.plot_confusion_matrix_detailed(conf_matrix,
save_path="data/visualizations/confusion_matrix.png")
# Feature importance analysis
classifier.plot_feature_importance(feature_names, importance_scores,
save_path="data/visualizations/feature_importance.png")

| Feature | Description | Output Format |
|---|---|---|
| Concentration Evolution 🌊 | Time-lapse of contaminant spreading | GIF, MP4 |
| Risk Zone Mapping | Dynamic ecological risk assessment | GIF, MP4 |
| Temporal Snapshots 📷 | Key moments in simulation | PNG |
| ML Performance Dashboard 📈 | Comprehensive model comparison | PNG |
| Confusion Matrix Analysis 🎯 | Detailed classification metrics | PNG |
| Feature Importance 🔬 | ML feature ranking visualization | PNG |
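As a rough illustration of the static concentration plot listed above, the following sketch draws a filled contour map with plain matplotlib. It does not use `ContaminantVisualizer`; the Gaussian plume, the variable names, and the output filename are invented for the example, while the grid and plot settings (colormap, 20 contour levels, 12 × 8 figure, 300 dpi) are taken from `config/parameters.yaml`.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Grid matching the documented domain: 200 m x 10 m with dx = 5 m, dy = 0.25 m
x = np.arange(0.0, 200.0 + 5.0, 5.0)
y = np.arange(0.0, 10.0 + 0.25, 0.25)
X, Y = np.meshgrid(x, y)

# Hypothetical concentration field (mg/L): a Gaussian plume, for illustration only
concentration = 0.02 * np.exp(-((X - 50.0) ** 2 / (2 * 20.0**2)
                                + (Y - 5.0) ** 2 / (2 * 1.5**2)))

fig, ax = plt.subplots(figsize=(12, 8))
contours = ax.contourf(X, Y, concentration, levels=20, cmap="viridis")
fig.colorbar(contours, ax=ax, label="Concentration (mg/L)")
ax.set_xlabel("x (m)")
ax.set_ylabel("y (m)")
ax.set_title("Concentration field (synthetic example)")

output_path = "concentration_field.png"  # hypothetical output location
fig.savefig(output_path, dpi=300)
plt.close(fig)
```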
# In config/parameters.yaml
visualization:
figure_size: [12, 8] # Figure dimensions
dpi: 300 # Image resolution
colormap: 'viridis' # Color scheme
contour_levels: 20 # Contour detail level
animation:
fps: 10 # Video frame rate
max_frames: 100 # Maximum frames for optimization
dpi_video: 150 # Video resolution

📦 contaminant-transport-ml/
├── :snake: main.py # Main execution script and workflow orchestrator
│ ├── load_configuration() # YAML parameter loading
│ ├── run_simulations() # Numerical model execution
│ ├── train_models() # ML pipeline management
│ └── generate_visualizations() # Comprehensive plotting
│
├── :gear: requirements.txt # Python dependencies specification
│
├── :file_folder: config/ # Configuration management
│ └── parameters.yaml # Simulation and ML parameters
│
├── :building_construction: src/ # Core source code modules
│ ├── :ocean: numerical_model/
│ │ └── advection_diffusion.py # 2D transport simulation engine
│ │ ├── AdvectionDiffusionModel # Main solver class
│ │ ├── solve_transport() # Finite difference solver
│ │ └── apply_boundary_conditions() # Boundary condition handler
│ │
│ ├── :robot: ml_model/
│ │ ├── data_preprocessing.py # Data preparation and feature engineering
│ │ │ ├── DataPreprocessor # Feature extraction class
│ │ │ ├── extract_features() # Spatial-temporal feature extraction
│ │ │ └── prepare_datasets() # Train/test split management
│ │ │
│ │ └── risk_classifier.py # ML risk classification models
│ │ ├── RiskClassifier # Multi-algorithm classifier
│ │ ├── train_models() # Model training pipeline
│ │ └── evaluate_performance() # Metrics and validation
│ │
│ └── :art: visualization/
│ └── visualization.py # Plotting and animation tools
│ ├── ContaminantVisualizer # Main visualization class
│ ├── plot_model_comparison() # Performance dashboards
│ └── create_animations() # GIF generation
│
└── :file_cabinet: data/ # Generated datasets and results
├── processed/ # Preprocessed ML datasets
│ ├── X_train_complete.npy # Complete feature set (training)
│ ├── X_train_fundamental.npy # Fundamental features (training)
│ └── feature_names_*.txt # Feature documentation
│
├── results/ # Model outputs and metrics
│ ├── risk_classifier_model.pkl # Trained ML models
│ └── all_models_metrics_report.csv # Performance comparison
│
├── simulations/ # Numerical simulation results
│ ├── baseline_*/ # Standard flow conditions
│ ├── high_flow_*/ # High velocity scenarios
│ └── low_discharge_*/ # Low contamination scenarios
│
├── snapshots/ # Temporal concentration snapshots
├── videos/ # Animation files (GIF format)
├── visualizations/ # Static plots and figures
│ ├── confusion_matrix_detailed.png # Model performance analysis
│ ├── feature_importance.png # Feature ranking visualization
│ └── all_models_metrics_dashboard.png # Comprehensive metrics
└── article/
├── articulo_metodologia.tex # Research article
└── articulo_metodologia.pdf # Compiled article
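import yaml
from src.numerical_model.advection_diffusion import AdvectionDiffusionModel
# Load configuration and run simulation.
# The original snippet called an undefined load_config(); reading the YAML
# directly with PyYAML is assumed here to produce the same configuration object.
with open('config/parameters.yaml') as f:
    config = yaml.safe_load(f)
model = AdvectionDiffusionModel(config)
concentration_history = model.solve()

from src.ml_model.risk_classifier import RiskClassifier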
# Initialize and train classifier
classifier = RiskClassifier('config/parameters.yaml')
cv_scores = classifier.evaluate_models(X_train, y_train)
classifier.train_all_models(X_train, y_train)
results = classifier.evaluate_model(X_test, y_test)

from src.visualization.visualization import ContaminantVisualizer
# Create visualizations
viz = ContaminantVisualizer('config/parameters.yaml')
viz.plot_concentration_field(concentration, x, y)
viz.create_simulation_video(concentration_history, x, y, times)

The system is configured through config/parameters.yaml:
domain:
  length_x: 200.0 # Domain length (m)
  length_y: 10.0 # Domain width (m)
  dx: 5.0 # Grid spacing x (m)
  dy: 0.25 # Grid spacing y (m)
  dt: 0.05 # Time step (s)
  total_time: 300.0 # Simulation time (s)
physics:
  diffusion_coefficient: 0.1 # Diffusion (m²/s)
  advection_velocity:
    u: 0.5 # x-velocity (m/s)
    v: 0.1 # y-velocity (m/s)
  decay_rate: 0.001 # Decay rate (1/s)
risk_thresholds:
  low: 0.001 # Low risk threshold (mg/L)
  medium: 0.01 # Medium risk threshold (mg/L)

Complete feature set (16 features):

| Algorithm | Accuracy | Precision | Recall | F1-Score | Training Time |
|---|---|---|---|---|---|
| Gradient Boosting 🚀 | 0.9997 | 0.9992 | 0.9995 | 0.9994 | 8.7s |
| Random Forest 🌳 | 0.9968 | 0.9951 | 0.9893 | 0.9922 | 2.3s |
| SVM 🎯 | 0.9625 | 0.9358 | 0.9167 | 0.9256 | 15.2s |
| Logistic Regression 📈 | 0.8662 | 0.5779 | 0.6301 | 0.6029 | 0.8s |

Fundamental feature set (8 features):

| Algorithm | Accuracy | Precision | Recall | F1-Score | Training Time |
|---|---|---|---|---|---|
| Gradient Boosting 🚀 | 0.9893 | 0.9721 | 0.9774 | 0.9748 | 6.4s |
| Random Forest 🌳 | 0.9627 | 0.9447 | 0.8718 | 0.8999 | 1.8s |
| SVM 🎯 | 0.9394 | 0.9153 | 0.8439 | 0.8708 | 11.7s |
| Logistic Regression 📈 | 0.7435 | 0.4960 | 0.5407 | 0.5174 | 0.6s |
| Rank | Feature | Importance | Description |
|---|---|---|---|
| 1 | Maximum Concentration | 0.2847 | Peak contamination level in domain |
| 2 | Mean Concentration | 0.1923 | Average contamination across domain |
| 3 | Concentration Variance | 0.1456 | Spatial variability measure |
| 4 | Affected Area Ratio | 0.1234 | Proportion of contaminated region |
| 5 | Concentration Gradient | 0.0987 | Spatial concentration change rate |
| 6 | Distance to Source | 0.0654 | Proximity to contamination origin |
| 7 | Flow Velocity | 0.0543 | Advective transport strength |
| 8 | Diffusion Coefficient | 0.0356 | Dispersive transport parameter |
| Output Type | Quantity | Description |
|---|---|---|
| Simulation Data 🌊 | 198,000 fields | 33 scenarios × 6,000 time steps |
| ML Datasets 🤖 | 2 versions | Complete (16 features) + Fundamental (8 features) |
| Trained Models 🧠 | 8 models | 4 algorithms × 2 feature sets |
| Visualizations 🎨 | 3 dashboards | Performance metrics, confusion matrices, feature importance |
| Animations 🎥 | 66 GIF files | Concentration + risk evolution for each scenario (33 × 2) |
| Snapshots 📷 | 264 images | Detailed concentration field visualizations |
| Reports 📋 | 2 CSV files | Comprehensive metrics comparison |
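The quantities in this table follow from the configuration values. The short sketch below reproduces the arithmetic; the variable names are chosen for the example, and the snapshots-per-scenario figure is only what the reported totals imply, not a documented setting.

```python
# Values taken from config/parameters.yaml and the tables in this README
total_time = 300.0   # s
dt = 0.05            # s
n_scenarios = 33
n_algorithms = 4
n_feature_sets = 2

n_steps = round(total_time / dt)                      # time steps per scenario
n_fields = n_scenarios * n_steps                      # stored concentration fields
n_models = n_algorithms * n_feature_sets              # trained models
n_animations = n_scenarios * 2                        # concentration + risk video each
snapshots_per_scenario = 264 // n_scenarios           # implied by the 264 reported images

print(n_steps, n_fields, n_models, n_animations, snapshots_per_scenario)
```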
| Metric | Complete Features | Fundamental Features | Relative Difference |
|---|---|---|---|
| Best Accuracy | 99.97% (Gradient Boosting) | 98.93% (Gradient Boosting) | Complete +1.05% |
| Average F1-Score | 88.00% | 81.57% | Complete +7.88% |
| Training Time | 6.75s avg | 5.12s avg | Fundamental 24.1% faster |
| Model Size | 4.5 MB | 5.7 MB | Fundamental 26.7% larger |
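The averages and percentage differences in this comparison can be recomputed from the per-algorithm tables. The sketch below assumes the differences are relative changes (difference divided by the reference value); the helper name `relative_change` is introduced for the example.

```python
from statistics import mean

# Per-algorithm values copied from the two results tables
f1_complete = [0.9994, 0.9922, 0.9256, 0.6029]
f1_fundamental = [0.9748, 0.8999, 0.8708, 0.5174]
time_complete = [8.7, 2.3, 15.2, 0.8]        # seconds
time_fundamental = [6.4, 1.8, 11.7, 0.6]     # seconds

def relative_change(value, reference):
    """Percentage change of `value` with respect to `reference`."""
    return 100.0 * (value - reference) / reference

accuracy_gain = relative_change(0.9997, 0.9893)
f1_gain = relative_change(mean(f1_complete), mean(f1_fundamental))
time_change = relative_change(mean(time_fundamental), mean(time_complete))

print(f"Best accuracy: {accuracy_gain:+.2f}%")
print(f"Average F1-score: {f1_gain:+.2f}%")
print(f"Training time (fundamental vs. complete): {time_change:+.1f}%")
```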
- Dr. Gerardo Tinoco Guerrero 🇲🇽
- Dr. Francisco Javier Domínguez Mota 🇲🇽
- Dr. José Alberto Guzmán Torres 🇲🇽
- Undergraduate Student - María Goretti Fraga López 👩‍🎓
- M.Sc. Student - Christopher Nolan Magaña Barocio 👨‍💻
- M.Sc. Student - Jorge Luis González Figueroa 👨‍💻
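The contaminant transport process is governed by the 2D advection-diffusion equation:

$$
\frac{\partial C}{\partial t} + u \frac{\partial C}{\partial x} + v \frac{\partial C}{\partial y} = D \left( \frac{\partial^2 C}{\partial x^2} + \frac{\partial^2 C}{\partial y^2} \right) - \lambda C + S
$$

Where:
- C: Contaminant concentration (mg/L)
- u, v: Velocity components (m/s)
- D: Diffusion coefficient (m²/s)
- λ: Decay rate (1/s)
- S: Source term (mg/L/s)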
- Spatial Discretization: Central finite differences (second-order accuracy)
- Time Integration: Explicit Euler scheme with stability control
- Boundary Conditions: Mixed Dirichlet/Neumann with environmental relevance
- Stability: CFL condition enforcement for numerical stability
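The scheme listed above (central differences in space, explicit Euler in time) can be sketched as a single update function. This is a minimal illustration rather than the repository's solver: the names `stable_dt` and `ftcs_step`, the zero-gradient boundary treatment, the safety factor, and the source location are assumptions. The two limits in `stable_dt` are the standard bounds for forward-time, central-space schemes; the repository's own stability control may differ.

```python
import numpy as np

def stable_dt(dx, dy, u, v, D, safety=0.9):
    """Time step within the explicit central-difference stability limits (D > 0)."""
    dt_diffusive = 0.5 / (D * (1.0 / dx**2 + 1.0 / dy**2))
    speed_sq = u**2 + v**2
    dt_advective = 2.0 * D / speed_sq if speed_sq > 0 else np.inf
    return safety * min(dt_diffusive, dt_advective)

def ftcs_step(C, dt, dx, dy, u, v, D, decay, S):
    """Advance C (shape ny x nx) by one explicit Euler step."""
    Cp = np.pad(C, 1, mode="edge")  # ghost cells copy the edge: zero-gradient boundary
    dCdx = (Cp[1:-1, 2:] - Cp[1:-1, :-2]) / (2.0 * dx)
    dCdy = (Cp[2:, 1:-1] - Cp[:-2, 1:-1]) / (2.0 * dy)
    d2Cdx2 = (Cp[1:-1, 2:] - 2.0 * C + Cp[1:-1, :-2]) / dx**2
    d2Cdy2 = (Cp[2:, 1:-1] - 2.0 * C + Cp[:-2, 1:-1]) / dy**2
    rhs = -u * dCdx - v * dCdy + D * (d2Cdx2 + d2Cdy2) - decay * C + S
    return C + dt * rhs

# Parameters from config/parameters.yaml
dx, dy, dt = 5.0, 0.25, 0.05
u, v, D, decay = 0.5, 0.1, 0.1, 0.001

x = np.arange(0.0, 200.0 + dx, dx)
y = np.arange(0.0, 10.0 + dy, dy)
C = np.zeros((y.size, x.size))
S = np.zeros_like(C)
S[y.size // 2, 1] = 1.0  # hypothetical point source near the left boundary (mg/L/s)

if dt > stable_dt(dx, dy, u, v, D):
    raise ValueError("configured dt exceeds the stability limit")

for _ in range(200):  # 200 steps = 10 s of simulated time
    C = ftcs_step(C, dt, dx, dy, u, v, D, decay, S)

print(f"peak concentration after 10 s: {C.max():.4f} mg/L")
```

With the documented parameters the diffusive limit is the binding one, and the configured `dt` of 0.05 s sits well below it.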
| Stage | Process | Description |
|---|---|---|
| Feature Extraction | Spatial-temporal analysis | Extract statistical and geometric features from concentration fields |
| Data Preprocessing | Normalization & scaling | StandardScaler for feature standardization |
| Model Training | Multi-algorithm approach | Random Forest, SVM, Gradient Boosting, Logistic Regression |
| Hyperparameter Tuning | Grid search optimization | Cross-validation for optimal parameter selection |
| Performance Evaluation | Comprehensive metrics | Accuracy, precision, recall, F1-score, confusion matrices |
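The stages in this table map naturally onto a scikit-learn pipeline. The sketch below is one possible arrangement, not the code in `risk_classifier.py`: it uses synthetic data from `make_classification` in place of the simulation features, tunes only a Random Forest, and the parameter grid is invented for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 16-feature, 3-class risk dataset
X, y = make_classification(n_samples=600, n_features=16, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scaling lives inside the pipeline so each CV fold is standardized on its own training part
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Hypothetical grid; the repository's actual search space is not documented here
param_grid = {"clf__n_estimators": [50, 100], "clf__max_depth": [None, 10]}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1_macro")
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "f1_macro": f1_score(y_test, y_pred, average="macro"),
    "f1_weighted": f1_score(y_test, y_pred, average="weighted"),
}
print(search.best_params_)
print(metrics)
```

Keeping the scaler inside the pipeline avoids leaking test-fold statistics into training during cross-validation.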
Fundamental features (8):
- source_x, source_y: Source coordinates
- x_position, y_position: Observation point coordinates
- velocity_u, velocity_v: Flow field components
- source_strength: Emission intensity
- time_normalized: Temporal coordinate

Derived features (8):
- distance_to_source: Euclidean distance to source
- dx_from_source, dy_from_source: Displacement components
- travel_time_x, travel_time_y: Advective travel times
- peclet_x, peclet_y: Péclet numbers (advection/diffusion ratio)
- diffusion_coeff: Local diffusion coefficient
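The derived features can be computed from the fundamental ones plus the diffusion coefficient. The sketch below shows one plausible set of definitions; the exact formulas used in `data_preprocessing.py` are not stated in this README, so the travel-time and Péclet expressions (displacement over velocity, and |velocity × displacement| over diffusion) and the zero-denominator handling are assumptions.

```python
import math

def _safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator != 0 else 0.0

def derived_features(source_x, source_y, x_position, y_position,
                     velocity_u, velocity_v, diffusion_coeff):
    """Build the eight derived features for one observation point."""
    dx = x_position - source_x
    dy = y_position - source_y
    return {
        "distance_to_source": math.hypot(dx, dy),
        "dx_from_source": dx,
        "dy_from_source": dy,
        "travel_time_x": _safe_ratio(dx, velocity_u),
        "travel_time_y": _safe_ratio(dy, velocity_v),
        "peclet_x": _safe_ratio(abs(velocity_u * dx), diffusion_coeff),
        "peclet_y": _safe_ratio(abs(velocity_v * dy), diffusion_coeff),
        "diffusion_coeff": diffusion_coeff,
    }

# Example: source at (0, 5) m, observation point at (3, 9) m, documented flow parameters
features = derived_features(0.0, 5.0, 3.0, 9.0,
                            velocity_u=0.5, velocity_v=0.1, diffusion_coeff=0.1)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```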
| Risk Level | Concentration Range | Ecological Impact | Management Action |
|---|---|---|---|
| Low 🟢 | < 0.001 mg/L | Minimal ecosystem disruption | Routine monitoring |
| Medium 🟡 | 0.001 - 0.01 mg/L | Moderate impact on sensitive species | Enhanced surveillance |
| High 🔴 | > 0.01 mg/L | Significant ecological damage | Immediate intervention |
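The threshold rule in this table can be written as a small vectorized function. This is a sketch, with the function name `classify_risk` chosen for the example; the table leaves the exact boundary values open, so treating 0.001 and 0.01 mg/L as Medium is an assumption.

```python
import numpy as np

RISK_LABELS = np.array(["Low", "Medium", "High"])

def classify_risk(concentration, low=0.001, medium=0.01):
    """Map concentrations (mg/L) to risk labels using the two configured thresholds."""
    c = np.asarray(concentration, dtype=float)
    # 0 = below `low`, 1 = between `low` and `medium` (inclusive), 2 = above `medium`
    index = np.where(c > medium, 2, np.where(c >= low, 1, 0))
    return RISK_LABELS[index]

print(classify_risk([0.0005, 0.005, 0.05]))
```

Because the function accepts arrays, it can be applied to a whole concentration field at once to produce a risk map.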
If you use this software in your research, please cite:
@software{contaminant_transport_ml_2024,
title={Contaminant Transport Modeling with Machine Learning:
Advanced Computational Framework for Environmental Risk Assessment},
author={Tinoco-Guerrero, Gerardo and
Domínguez-Mota, Francisco Javier and
Guzmán-Torres, José Alberto},
year={2025},
institution={Universidad Michoacana de San Nicolás de Hidalgo},
organization={SIIIA MATH: Soluciones en ingeniería},
url={https://github.com/gstinoco/contaminant-transport-ml},
note={Advanced computational framework for contaminant transport and ecological risk assessment}
}

Primary Funding:
- 🏫 Universidad Michoacana de San Nicolás de Hidalgo (UMSNH)
- 🏫 Secretaría de Ciencia, Humanidades, Tecnología e Innovación (SECIHTI)
- 🏢 SIIIA MATH: Soluciones en ingeniería
This project is licensed under the MIT License - see the full license text below:
MIT License
Copyright (c) 2025 Gerardo Tinoco-Guerrero, Francisco Javier Domínguez-Mota, José Alberto Guzmán-Torres
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Academic Use: This software is developed for research and educational purposes. Commercial use requires explicit permission from the authors.
Primary Contact:
- Dr. Gerardo Tinoco Guerrero
- 📧 gerardo.tinoco@umich.mx
- 🏢 SIIIA MATH: Soluciones en ingeniería
- 🏛️ Universidad Michoacana de San Nicolás de Hidalgo
For technical questions and issues:
- GitHub Issues: Create an issue for bug reports or feature requests
- Email Support: Contact the research team directly for complex technical inquiries
- Academic Collaboration: Reach out for research partnerships and joint projects
- SIIIA MATH: Soluciones en ingeniería
- UMSNH: Universidad Michoacana de San Nicolás de Hidalgo
⭐ If this project helps your research, please consider giving it a star! ⭐
Advancing environmental science through computational innovation








