Advanced computational framework combining numerical modeling of contaminant transport with machine learning for environmental risk assessment. Features 2D advection-diffusion simulation, multi-algorithm ML classification, and comprehensive visualization tools.

gstinoco/Contaminant-Transport-ML

Contaminant Transport Modeling with Machine Learning 🌊

Advanced Computational Framework for Contaminant Transport and Ecological Risk Assessment

Integrating numerical modeling with machine learning for environmental contamination analysis

🔗 Quick Links

🚀 Quick Start · 📊 Results · 🎥 Visualizations · 👥 Team



🌟 Overview

This repository presents a computational framework that combines numerical modeling of contaminant transport with machine learning to predict ecological risk levels in aquatic environments. A 2D advection-diffusion model simulates concentration fields for a set of hydrodynamic scenarios, and four ML classifiers learn to predict the resulting Low, Medium, or High risk level from scenario and location features.

⚙️ Key Capabilities

  • 🌊 Numerical Simulation: 2D advection-diffusion equation solver with customizable boundary conditions
  • 🤖 Machine Learning: Multi-algorithm risk classification (Random Forest, SVM, Gradient Boosting, Logistic Regression)
  • ⚠️ Risk Assessment: Automated ecological risk level classification (Low, Medium, High)
  • 🎨 Visualization: Comprehensive plotting, animations, and performance dashboards
  • 🔬 Scenario Analysis: 33 predefined hydrodynamic scenarios for comprehensive testing

🔬 Applications

| Field | Application | Use Case |
|---|---|---|
| Environmental Engineering | 🌿 Pollution Assessment | Contaminant plume modeling, risk zone identification |
| Aquatic Ecology | 🐟 Ecosystem Protection | Species impact assessment, habitat risk evaluation |
| Water Resources | 💧 Management Planning | Contamination source control, remediation strategies |
| Regulatory Compliance | 📋 Environmental Monitoring | Risk threshold validation, compliance reporting |
| Research & Development | 🔬 Scientific Studies | Model validation, methodology development |

✨ Features

🧮 Numerical Modeling

  • 2D Advection-Diffusion Solver: Finite difference implementation with adaptive time stepping
  • Flexible Boundary Conditions: Dirichlet, Neumann, and mixed boundary types
  • Source Term Modeling: Point and distributed contaminant sources with temporal control
  • Physical Processes: Advection, diffusion, and first-order decay mechanisms

🤖 Machine Learning

  • Multi-Algorithm Support: Random Forest, SVM, Gradient Boosting, Logistic Regression
  • Feature Engineering: 16 comprehensive features (8 fundamental + 8 derived)
  • Hyperparameter Optimization: Automated GridSearchCV with cross-validation
  • Performance Metrics: Accuracy, Precision, Recall, F1-Score (macro and weighted)
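Macro and weighted averages differ only in how per-class scores are combined: macro takes the unweighted mean over the three risk classes, while weighted scales each class by its number of true samples. A short sketch using scikit-learn's `sklearn.metrics` functions on made-up labels (0 = Low, 1 = Medium, 2 = High); the labels are illustrative only and are not repository data:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support

# Hypothetical true and predicted risk levels (0 = Low, 1 = Medium, 2 = High)
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

accuracy = accuracy_score(y_true, y_pred)

# Macro: unweighted mean over classes; weighted: mean weighted by class support
macro_p, macro_r, macro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro"
)
weighted_p, weighted_r, weighted_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted"
)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

print(f"accuracy={accuracy:.3f}")
print(f"precision macro={macro_p:.3f} weighted={weighted_p:.3f}")
print(f"F1 macro={macro_f1:.3f} weighted={weighted_f1:.3f}")
print(cm)
```

Because the classes are imbalanced in this toy example, macro precision (about 0.81) and weighted precision (about 0.79) differ.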

📊 Visualization & Analysis

  • Concentration Field Plots: Static and animated concentration distributions
  • Risk Level Mapping: Color-coded risk classification visualization
  • Model Comparison: Performance dashboards and confusion matrices
  • Temporal Evolution: Video generation for concentration dynamics
  • Feature Importance: Analysis of predictive variable significance

🎯 Risk Assessment

  • Three-Level Classification: Low, Medium, and High ecological risk categories
  • Threshold-Based: Configurable concentration thresholds for risk determination
  • Spatial-Temporal Analysis: Risk evolution over space and time
  • Predictive Modeling: ML-based risk prediction for new scenarios

📦 Installation & Setup

💻 System Requirements

| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.8+ | 3.9+ |
| RAM | 8 GB | 16 GB+ |
| CPU | 4 cores | 8+ cores |
| Storage | 2 GB | 10 GB+ (for datasets) |
| OS | Windows/Linux/macOS | Linux (optimal performance) |

📦 Dependencies

# Core scientific computing
numpy >= 1.21.0          # Numerical computations
pandas >= 1.3.0          # Data manipulation and analysis
scipy >= 1.7.0           # Scientific algorithms

# Machine learning
scikit-learn >= 1.0.0    # ML algorithms and metrics
joblib >= 1.1.0          # Model serialization

# Visualization
matplotlib >= 3.5.0      # Scientific plotting
seaborn >= 0.11.0        # Statistical visualization

# Configuration and utilities
PyYAML >= 6.0            # Configuration file parsing
tqdm >= 4.62.0           # Progress bars

Quick Installation

# Method 1: Direct installation
git clone https://github.com/gstinoco/contaminant-transport-ml.git
cd contaminant-transport-ml
pip install -r requirements.txt

# Method 2: Virtual environment (recommended; run inside the cloned repository)
python -m venv contaminant_env
source contaminant_env/bin/activate  # On Windows: contaminant_env\Scripts\activate
pip install -r requirements.txt

# Method 3: Conda environment (run inside the cloned repository)
conda create -n contaminant_ml python=3.9
conda activate contaminant_ml
pip install -r requirements.txt

✅ Installation Verification

# Test installation (all required packages)
python -c "import numpy, pandas, scipy, sklearn, joblib, matplotlib, seaborn, yaml, tqdm; print('Installation successful!')"

# List the available command-line options
python main.py --help

🚀 Quick Start

⚡ Complete Workflow (Recommended)

# Run full analysis pipeline
python main.py --complete

🏗️ Individual Components

# Run only simulations
python main.py --simulate

# Preprocess data for ML
python main.py --preprocess

# Train ML models
python main.py --train

# Generate visualizations
python main.py --visualize

# Create temporal videos
python main.py --create-videos

⚙️ Feature Set Options

# Use fundamental features (8 features)
python main.py --complete --fundamental-features

# Use complete features (16 features) - default
python main.py --complete

🔧 Advanced Usage Examples

# Complete workflow with fundamental features
python main.py --complete --fundamental-features

# Simulate specific scenario
python main.py --simulate --scenario baseline_left_center

# Simulate all scenarios
python main.py --simulate --all-scenarios

# Generate videos with custom frame rate
python main.py --create-videos --video-fps 15

# Generate specific number of snapshots
python main.py --create-snapshots --snapshots-count 6

# Train models without saving intermediate results
python main.py --train --no-save

🌊 Available Scenarios

# Baseline scenarios (standard conditions)
python main.py --simulate --scenario baseline_left_center
python main.py --simulate --scenario baseline_upper
python main.py --simulate --scenario baseline_lower

# High flow scenarios
python main.py --simulate --scenario high_flow_left_center
python main.py --simulate --scenario very_high_flow_upper

# High discharge scenarios  
python main.py --simulate --scenario high_discharge_left_center
python main.py --simulate --scenario very_high_discharge_lower

# Low conditions scenarios
python main.py --simulate --scenario low_flow_left_center
python main.py --simulate --scenario very_low_discharge_upper

🎥 Visualizations & Demos

🌊 Simulation Videos

The framework generates animations that show how contaminant concentration and ecological risk levels evolve over the course of each simulation.

🌊 Concentration Field Evolution

Contaminant Transport Simulation - Baseline Scenario

Visualization of contaminant concentration spreading over time


Generated with: python main.py --create-videos

⚠️ Risk Level Evolution

Ecological Risk Assessment - Dynamic Risk Zones

Time-lapse view of Low (🟢), Medium (🟡), and High (🔴) risk areas


Generated with: python main.py --create-videos

📈 Performance Dashboards

Machine Learning Performance Dashboard


Detailed Confusion Matrix Analysis


Feature Importance Ranking


📷 Temporal Snapshots

Concentration field evolution at key time points

| Time | Concentration Field |
|---|---|
| t = 0 s | Snapshot 1 |
| t = 100 s | Snapshot 2 |
| t = 200 s | Snapshot 3 |
| t = 300 s | Snapshot 4 |

⚙️ How to Generate Visualizations

Create Simulation Videos

# Generate videos for all scenarios (concentration and risk evolution)
python main.py --create-videos

# Generate videos with custom frame rate
python main.py --create-videos --video-fps 15

# First simulate specific scenario, then create videos
python main.py --simulate --scenario baseline_left_center
python main.py --create-videos

Create Static Visualizations

from src.visualization.visualization import ContaminantVisualizer

# Initialize visualizer
viz = ContaminantVisualizer('config/parameters.yaml')

# Inputs are assumed to come from a completed simulation: `concentration` is one
# 2D field, `x` and `y` are the grid coordinates, `concentration_history` is the
# sequence of fields and `time_points` holds the matching simulation times (s)

# Create concentration field plot
viz.plot_concentration_field(concentration, x, y, time=120.0,
                             scenario_name="baseline_left_center")

# Generate temporal snapshots
snapshot_paths = viz.create_snapshots(concentration_history, x, y,
                                      time_points, num_snapshots=4)

# Create simulation video (GIF)
viz.create_simulation_video(concentration_history, x, y, time_points,
                            scenario_name="baseline_left_center", fps=10)

# Create risk evolution video
viz.create_risk_evolution_video(concentration_history, x, y, time_points,
                                scenario_name="baseline_left_center", fps=10)
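For readers who want to see roughly what a concentration-field plot involves without the repository code, here is a minimal stand-alone sketch using plain Matplotlib. It is not the implementation of `ContaminantVisualizer.plot_concentration_field`; the Gaussian field, the `plot_field` helper and the output file name are hypothetical, while the figure size, DPI, colormap and number of contour levels mirror the `visualization` settings shown under Customization Options.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_field(concentration, x, y, time, path, dpi=300):
    """Save a filled-contour plot of a 2D field indexed as [y, x] (hypothetical helper)."""
    X, Y = np.meshgrid(x, y)
    fig, ax = plt.subplots(figsize=(12, 8))
    filled = ax.contourf(X, Y, concentration, levels=20, cmap="viridis")
    fig.colorbar(filled, ax=ax, label="Concentration (mg/L)")
    ax.set_xlabel("x (m)")
    ax.set_ylabel("y (m)")
    ax.set_title(f"t = {time:.1f} s")
    fig.savefig(path, dpi=dpi)
    plt.close(fig)  # free the figure's memory


# Hypothetical Gaussian plume on the 200 m x 10 m default domain
x = np.linspace(0.0, 200.0, 41)
y = np.linspace(0.0, 10.0, 41)
X, Y = np.meshgrid(x, y)
concentration = 0.02 * np.exp(-((X - 60.0) ** 2 / 400.0 + (Y - 5.0) ** 2 / 2.0))

output_path = os.path.join(tempfile.mkdtemp(), "concentration_t120.png")
plot_field(concentration, x, y, time=120.0, path=output_path)
```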

Machine Learning Visualizations

from src.ml_model.risk_classifier import RiskClassifier

# Initialize classifier
classifier = RiskClassifier('config/parameters.yaml')

# `model_results`, `conf_matrix`, `feature_names` and `importance_scores` are
# assumed to come from a completed training run (python main.py --train)

# Generate performance dashboard
classifier.plot_all_model_metrics(model_results,
                                  save_path="data/visualizations/ml_dashboard.png")

# Create detailed confusion matrix
classifier.plot_confusion_matrix_detailed(conf_matrix,
                                          save_path="data/visualizations/confusion_matrix.png")

# Feature importance analysis
classifier.plot_feature_importance(feature_names, importance_scores,
                                   save_path="data/visualizations/feature_importance.png")

💡 Visualization Features

| Feature | Description | Output Format |
|---|---|---|
| Concentration Evolution 🌊 | Time-lapse of contaminant spreading | GIF, MP4 |
| Risk Zone Mapping ⚠️ | Dynamic ecological risk assessment | GIF, MP4 |
| Temporal Snapshots 📷 | Key moments in simulation | PNG |
| ML Performance Dashboard 📈 | Comprehensive model comparison | PNG |
| Confusion Matrix Analysis 🎯 | Detailed classification metrics | PNG |
| Feature Importance 🔬 | ML feature ranking visualization | PNG |

🎨 Customization Options

# In config/parameters.yaml
visualization:
  figure_size: [12, 8]          # Figure dimensions
  dpi: 300                      # Image resolution
  colormap: 'viridis'           # Color scheme
  contour_levels: 20            # Contour detail level
  
  animation:
    fps: 10                     # Video frame rate
    max_frames: 100             # Maximum frames for optimization
    dpi_video: 150              # Video resolution
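The `max_frames` and `fps` settings interact with the simulation length. The sketch below shows one plausible way to pick evenly spaced frames. It assumes that the solver stores every time step including t = 0 (6,001 states for the default 300 s run at Δt = 0.05 s) and that `max_frames` caps the number of frames; the repository's actual subsampling rule is not documented here.

```python
import numpy as np

# Assumptions: every time step is stored, including t = 0, and max_frames caps
# the number of frames written to an animation
n_states = 6001          # 6000 steps of 0.05 s over 300 s, plus the initial state
max_frames = 100         # visualization.animation.max_frames
fps = 10                 # visualization.animation.fps
total_time = 300.0       # domain.total_time (s)

n_frames = min(max_frames, n_states)
# Evenly spaced state indices that always include the first and last state
frame_indices = np.linspace(0, n_states - 1, num=n_frames).round().astype(int)

playback_seconds = n_frames / fps                       # length of the animation
simulated_seconds_per_frame = total_time / (n_frames - 1)

print(n_frames, frame_indices[0], frame_indices[-1], playback_seconds)  # 100 0 6000 10.0
```

Under these assumptions each frame advances about 3 s of simulated time, and the full 300 s run plays back in 10 s.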

📂 Project Architecture

Core Components

📦 contaminant-transport-ml/
├── 🐍 main.py                                # Main execution script and workflow orchestrator
│   ├── load_configuration()                # YAML parameter loading
│   ├── run_simulations()                   # Numerical model execution
│   ├── train_models()                      # ML pipeline management
│   └── generate_visualizations()           # Comprehensive plotting
│
├── ⚙️ requirements.txt                       # Python dependencies specification
│
├── 📁 config/                                # Configuration management
│   └── parameters.yaml                     # Simulation and ML parameters
│
├── 🏗️ src/                                   # Core source code modules
│   ├── 🌊 numerical_model/
│   │   └── advection_diffusion.py          # 2D transport simulation engine
│   │       ├── AdvectionDiffusionModel      # Main solver class
│   │       ├── solve_transport()            # Finite difference solver
│   │       └── apply_boundary_conditions()  # Boundary condition handler
│   │
│   ├── 🤖 ml_model/
│   │   ├── data_preprocessing.py           # Data preparation and feature engineering
│   │   │   ├── DataPreprocessor            # Feature extraction class
│   │   │   ├── extract_features()          # Spatial-temporal feature extraction
│   │   │   └── prepare_datasets()          # Train/test split management
│   │   │
│   │   └── risk_classifier.py              # ML risk classification models
│   │       ├── RiskClassifier              # Multi-algorithm classifier
│   │       ├── train_models()              # Model training pipeline
│   │       └── evaluate_performance()      # Metrics and validation
│   │
│   └── 🎨 visualization/
│       └── visualization.py                # Plotting and animation tools
│           ├── ContaminantVisualizer       # Main visualization class
│           ├── plot_model_comparison()     # Performance dashboards
│           └── create_animations()         # GIF generation
│
└── 🗄️ data/                                  # Generated datasets and results
    ├── processed/                          # Preprocessed ML datasets
    │   ├── X_train_complete.npy           # Complete feature set (training)
    │   ├── X_train_fundamental.npy        # Fundamental features (training)
    │   └── feature_names_*.txt            # Feature documentation
    │
    ├── results/                            # Model outputs and metrics
    │   ├── risk_classifier_model.pkl      # Trained ML models
    │   └── all_models_metrics_report.csv  # Performance comparison
    │
    ├── simulations/                        # Numerical simulation results
    │   ├── baseline_*/                     # Standard flow conditions
    │   ├── high_flow_*/                    # High velocity scenarios
    │   └── low_discharge_*/                # Low contamination scenarios
    │
    ├── snapshots/                          # Temporal concentration snapshots
    ├── videos/                             # Animation files (GIF format)
    ├── visualizations/                     # Static plots and figures
    │   ├── confusion_matrix_detailed.png   # Model performance analysis
    │   ├── feature_importance.png          # Feature ranking visualization
    │   └── all_models_metrics_dashboard.png # Comprehensive metrics
    └── article/
        ├── articulo_metodologia.tex        # Research article
        └── articulo_metodologia.pdf        # Compiled article

💡 Programming Examples

Basic Simulation

import yaml

from src.numerical_model.advection_diffusion import AdvectionDiffusionModel

# Load the YAML configuration into a dictionary and run the simulation
with open('config/parameters.yaml', encoding='utf-8') as f:
    config = yaml.safe_load(f)

model = AdvectionDiffusionModel(config)
concentration_history = model.solve()

Machine Learning Analysis

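from src.ml_model.risk_classifier import RiskClassifier

# X_train, y_train, X_test, y_test are the preprocessed arrays produced by
# `python main.py --preprocess` (stored under data/processed/)

# Initialize the classifier, compute cross-validation scores, train all models,
# then evaluate on the held-out test set
classifier = RiskClassifier('config/parameters.yaml')
cv_scores = classifier.evaluate_models(X_train, y_train)
classifier.train_all_models(X_train, y_train)
results = classifier.evaluate_model(X_test, y_test)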
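The `RiskClassifier` methods above wrap the repository's own training logic. As a rough, stand-alone illustration of the multi-algorithm comparison they describe, here is a scikit-learn sketch that cross-validates the same four algorithm families on synthetic three-class data with 16 features. The data set, default hyperparameters and fold count are assumptions for illustration, and the scores it prints say nothing about the repository's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 16-feature data set with three risk classes
X, y = make_classification(
    n_samples=300, n_features=16, n_informative=8, n_classes=3, random_state=0
)

# Scale-sensitive models (SVM, logistic regression) get a StandardScaler first
models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = {}
for name, model in models.items():
    # Macro F1 treats the three risk classes equally
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
    cv_scores[name] = scores.mean()
    print(f"{name:20s} macro-F1 = {scores.mean():.3f} +/- {scores.std():.3f}")
```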

Visualization

from src.visualization.visualization import ContaminantVisualizer

# `concentration`, `x`, `y`, `concentration_history` and `times` are assumed to
# come from a completed simulation
viz = ContaminantVisualizer('config/parameters.yaml')
viz.plot_concentration_field(concentration, x, y)
viz.create_simulation_video(concentration_history, x, y, times)

⚙️ Configuration

The system is configured through config/parameters.yaml:

Domain Parameters

domain:
  length_x: 200.0      # Domain length (m)
  length_y: 10.0       # Domain width (m)
  dx: 5.0              # Grid spacing x (m)
  dy: 0.25             # Grid spacing y (m)
  dt: 0.05             # Time step (s)
  total_time: 300.0    # Simulation time (s)
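As a quick consistency check, the grid size and number of time steps follow directly from these domain settings. The node counts below assume a grid that includes both boundary nodes (n = L/Δ + 1), which may not match the solver's exact layout. The step count does not depend on that assumption: 6,000 steps per scenario times 33 scenarios gives the 198,000 fields listed under Generated Outputs.

```python
# Domain settings copied from config/parameters.yaml
domain = {
    "length_x": 200.0,    # m
    "length_y": 10.0,     # m
    "dx": 5.0,            # m
    "dy": 0.25,           # m
    "dt": 0.05,           # s
    "total_time": 300.0,  # s
}

# Assumption: grid nodes include both boundaries, so n = L / spacing + 1
nx = round(domain["length_x"] / domain["dx"]) + 1
ny = round(domain["length_y"] / domain["dy"]) + 1

# round() guards against floating-point quotients such as 5999.999...
nt = round(domain["total_time"] / domain["dt"])

n_scenarios = 33
total_fields = n_scenarios * nt

print(nx, ny, nt, total_fields)  # 41 41 6000 198000
```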

Physical Parameters

physics:
  diffusion_coefficient: 0.1    # Diffusion (m²/s)
  advection_velocity:
    u: 0.5                      # x-velocity (m/s)
    v: 0.1                      # y-velocity (m/s)
  decay_rate: 0.001             # Decay rate (1/s)
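For an explicit scheme it is worth checking these physical parameters against the grid. The sketch computes the Courant, diffusion and cell Péclet numbers implied by the domain and physics values above. The limits it tests (diffusion numbers summing to at most 0.5, Courant numbers below 1) are common rules of thumb for explicit central-difference schemes, not necessarily the exact criterion the repository's solver enforces.

```python
# Grid and physics values copied from config/parameters.yaml
dx, dy, dt = 5.0, 0.25, 0.05   # m, m, s
u, v, D = 0.5, 0.1, 0.1        # m/s, m/s, m^2/s

# Courant numbers: fraction of a cell the flow crosses per time step
courant_x = u * dt / dx
courant_y = v * dt / dy

# Diffusion numbers: D * dt / spacing^2
diffusion_x = D * dt / dx**2
diffusion_y = D * dt / dy**2

# Cell Peclet numbers: advection relative to diffusion across one cell
cell_peclet_x = u * dx / D
cell_peclet_y = v * dy / D

stable_diffusion = diffusion_x + diffusion_y <= 0.5
stable_advection = max(courant_x, courant_y) < 1.0

print(f"Courant:     {courant_x:.4f}, {courant_y:.4f}")
print(f"Diffusion:   {diffusion_x:.4f}, {diffusion_y:.4f}")
print(f"Cell Peclet: {cell_peclet_x:.2f}, {cell_peclet_y:.2f}")
print("within rules of thumb:", stable_diffusion and stable_advection)
```

With the default values the Courant and diffusion numbers are well inside these limits (the largest is D·Δt/Δy² = 0.08). The x-direction cell Péclet number is 25; central differencing of the advection term is commonly reported to produce spurious oscillations near steep gradients once this exceeds 2, which is useful to keep in mind when changing `dx`.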

Risk Thresholds

risk_thresholds:
  low: 0.001           # Low risk threshold (mg/L)
  medium: 0.01         # Medium risk threshold (mg/L)

📈 Results and Performance

🏆 Model Performance (Complete Features)

| Algorithm | Accuracy | Precision | Recall | F1-Score | Training Time |
|---|---|---|---|---|---|
| Gradient Boosting 🚀 | 0.9997 | 0.9992 | 0.9995 | 0.9994 | 8.7 s |
| Random Forest 🌳 | 0.9968 | 0.9951 | 0.9893 | 0.9922 | 2.3 s |
| SVM 🎯 | 0.9625 | 0.9358 | 0.9167 | 0.9256 | 15.2 s |
| Logistic Regression 📈 | 0.8662 | 0.5779 | 0.6301 | 0.6029 | 0.8 s |

⚙️ Model Performance (Fundamental Features)

| Algorithm | Accuracy | Precision | Recall | F1-Score | Training Time |
|---|---|---|---|---|---|
| Gradient Boosting 🚀 | 0.9893 | 0.9721 | 0.9774 | 0.9748 | 6.4 s |
| Random Forest 🌳 | 0.9627 | 0.9447 | 0.8718 | 0.8999 | 1.8 s |
| SVM 🎯 | 0.9394 | 0.9153 | 0.8439 | 0.8708 | 11.7 s |
| Logistic Regression 📈 | 0.7435 | 0.4960 | 0.5407 | 0.5174 | 0.6 s |

🔬 Feature Importance Analysis

| Rank | Feature | Importance | Description |
|---|---|---|---|
| 1 | Maximum Concentration | 0.2847 | Peak contamination level in domain |
| 2 | Mean Concentration | 0.1923 | Average contamination across domain |
| 3 | Concentration Variance | 0.1456 | Spatial variability measure |
| 4 | Affected Area Ratio | 0.1234 | Proportion of contaminated region |
| 5 | Concentration Gradient | 0.0987 | Spatial concentration change rate |
| 6 | Distance to Source | 0.0654 | Proximity to contamination origin |
| 7 | Flow Velocity | 0.0543 | Advective transport strength |
| 8 | Diffusion Coefficient | 0.0356 | Dispersive transport parameter |
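Rankings like this are typically read from a tree ensemble's impurity-based importances, which scikit-learn normalizes to sum to 1. A minimal sketch on synthetic data, assuming a Random Forest and hypothetical feature names `f00` to `f15`; it does not reproduce the repository's feature set or its exact procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 16 features, three risk classes
X, y = make_classification(
    n_samples=500, n_features=16, n_informative=8, n_classes=3, random_state=0
)
feature_names = [f"f{i:02d}" for i in range(X.shape[1])]  # hypothetical names

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = forest.feature_importances_  # impurity-based, normalized to sum to 1

# Rank features from most to least important and show the top eight
order = np.argsort(importances)[::-1]
for rank, idx in enumerate(order[:8], start=1):
    print(f"{rank}. {feature_names[idx]}: {importances[idx]:.4f}")
```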

📊 Generated Outputs

| Output Type | Quantity | Description |
|---|---|---|
| Simulation Data 🌊 | 198,000 fields | 33 scenarios × 6,000 time steps |
| ML Datasets 🤖 | 2 versions | Complete (16 features) + Fundamental (8 features) |
| Trained Models 🧠 | 8 models | 4 algorithms × 2 feature sets |
| Visualizations 🎨 | 3 dashboards | Performance metrics, confusion matrices, feature importance |
| Animations 🎥 | 66 GIF files | Concentration + risk evolution for each scenario (33 × 2) |
| Snapshots 📷 | 264 images | Detailed concentration field visualizations |
| Reports 📋 | 2 CSV files | Comprehensive metrics comparison |

🎯 Performance Benchmarks

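| Metric | Complete Features | Fundamental Features | Difference |
|---|---|---|---|
| Best Accuracy | 99.97% (Gradient Boosting) | 98.93% (Gradient Boosting) | Complete +1.05% (relative) |
| Average F1-Score | 88.00% | 81.57% | Complete +7.88% (relative) |
| Average Training Time | 6.75 s | 5.12 s | Fundamental 24.1% faster |
| Model Size | 4.5 MB | 5.7 MB | Fundamental 26.7% larger |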
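The summary rows can be reproduced from the two per-algorithm tables. The sketch below recomputes them; every difference is a relative change (for example, 88.00 / 81.57 - 1 ≈ 7.88%), and the training-time and model-size figures compare the fundamental-feature values against the complete-feature ones.

```python
from statistics import mean

# Values copied from the two performance tables (complete vs. fundamental features)
accuracy = {"complete": [0.9997, 0.9968, 0.9625, 0.8662],
            "fundamental": [0.9893, 0.9627, 0.9394, 0.7435]}
f1 = {"complete": [0.9994, 0.9922, 0.9256, 0.6029],
      "fundamental": [0.9748, 0.8999, 0.8708, 0.5174]}
train_time_s = {"complete": [8.7, 2.3, 15.2, 0.8],
                "fundamental": [6.4, 1.8, 11.7, 0.6]}
model_size_mb = {"complete": 4.5, "fundamental": 5.7}


def relative_change(new, base):
    """Relative change of `new` with respect to `base`, in percent."""
    return 100.0 * (new / base - 1.0)


best_acc_gain = relative_change(max(accuracy["complete"]), max(accuracy["fundamental"]))
avg_f1 = {k: mean(v) for k, v in f1.items()}
f1_gain = relative_change(avg_f1["complete"], avg_f1["fundamental"])
avg_time = {k: mean(v) for k, v in train_time_s.items()}
time_saving = -relative_change(avg_time["fundamental"], avg_time["complete"])
size_increase = relative_change(model_size_mb["fundamental"], model_size_mb["complete"])

print(f"best accuracy: complete +{best_acc_gain:.2f}%")
print(f"average F1: {avg_f1['complete']:.4f} vs {avg_f1['fundamental']:.4f} (+{f1_gain:.2f}%)")
print(f"average training time: {avg_time['complete']:.3f} s vs {avg_time['fundamental']:.3f} s "
      f"(fundamental {time_saving:.1f}% faster)")
print(f"model size: fundamental {size_increase:.1f}% larger")
```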

🧑‍🔬 Research Team

👨‍🔬 Principal Investigators

Dr. Gerardo Tinoco Guerrero 🇲🇽

Dr. Francisco Javier Domínguez Mota 🇲🇽

Dr. José Alberto Guzmán Torres 🇲🇽

Research Focus Areas:

  • 🌊 Environmental contamination modeling
  • 🤖 Machine learning for risk assessment
  • 🧮 Numerical methods for PDEs
  • 📈 Data-driven environmental management

🎓 Students

Undergraduate Student - María Goretti Fraga López 👩‍🎓

  • 🏛️ Universidad Michoacana de San Nicolás de Hidalgo
  • 🔬 Numerical simulation of contaminant transport
  • 📧 1702174b@umich.mx

M.Sc. Student - Christopher Nolan Magaña Barocio 👨‍💻

  • 🏛️ Universidad Michoacana de San Nicolás de Hidalgo
  • 🔬 AI applications in risk assessment
  • 📧 1339846k@umich.mx

M.Sc. Student - Jorge Luis González Figueroa 👨‍💻

  • 🏛️ Universidad Michoacana de San Nicolás de Hidalgo
  • 🔬 AI applications in biological dynamics
  • 📧 1718717h@umich.mx

Research Contributions:

  • ⚙️ Numerical model development
  • 🤖 ML algorithm implementation
  • 🎨 Visualization and analysis tools
  • 🔬 Scientific validation and testing

📚 Scientific Background

🧮 Mathematical Foundation

The contaminant transport process is governed by the 2D advection-diffusion equation:

$$ \frac{\partial C}{\partial t} + u\frac{\partial C}{\partial x} + v \frac{\partial C}{\partial y} = D \left(\frac{\partial^2 C}{\partial x^2} + \frac{\partial^2 C}{\partial y^2}\right) - \lambda C + S $$

Where:

  • C: Contaminant concentration (mg/L)
  • u, v: Velocity components (m/s)
  • D: Diffusion coefficient (m²/s)
  • λ: Decay rate (1/s)
  • S: Source term (mg/L/s)

⚙️ Numerical Implementation

  • Spatial Discretization: Central finite differences (second-order accuracy)
  • Time Integration: Explicit Euler scheme with stability control
  • Boundary Conditions: Mixed Dirichlet/Neumann with environmental relevance
  • Stability: CFL condition enforcement for numerical stability
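The discretization described above is the classic forward-time, centred-space (FTCS) scheme. Below is a compact NumPy sketch of one explicit Euler step with second-order central differences for the governing equation. The array layout (`C[j, i]`, axis 0 along y), the function name and the choice to update only interior nodes (leaving boundary values to a separate boundary-condition routine) are assumptions for illustration. This is not the repository's `AdvectionDiffusionModel` code, and it performs no stability checking of its own.

```python
import numpy as np


def explicit_euler_step(C, S, u, v, D, lam, dx, dy, dt):
    """Advance C by one step of dC/dt = -u C_x - v C_y + D (C_xx + C_yy) - lam C + S.

    C and S are 2D arrays indexed as [j, i] (axis 0 = y, axis 1 = x).
    Only interior nodes are updated; boundary nodes are copied unchanged.
    """
    C_new = C.copy()
    c = C[1:-1, 1:-1]
    east, west = C[1:-1, 2:], C[1:-1, :-2]     # neighbours at i+1 and i-1
    north, south = C[2:, 1:-1], C[:-2, 1:-1]   # neighbours at j+1 and j-1

    # Central first derivatives (advection terms)
    dC_dx = (east - west) / (2.0 * dx)
    dC_dy = (north - south) / (2.0 * dy)
    # Central second derivatives (diffusion terms)
    d2C_dx2 = (east - 2.0 * c + west) / dx**2
    d2C_dy2 = (north - 2.0 * c + south) / dy**2

    rhs = -u * dC_dx - v * dC_dy + D * (d2C_dx2 + d2C_dy2) - lam * c + S[1:-1, 1:-1]
    C_new[1:-1, 1:-1] = c + dt * rhs
    return C_new


# Usage with the default configuration values and a hypothetical point source
C = np.zeros((41, 41))
S = np.zeros_like(C)
S[20, 4] = 1.0          # hypothetical source strength (mg/L/s) at one interior node
for _ in range(200):    # 200 steps of 0.05 s = 10 s of simulated time
    C = explicit_euler_step(C, S, u=0.5, v=0.1, D=0.1, lam=0.001, dx=5.0, dy=0.25, dt=0.05)
```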

🤖 Machine Learning Pipeline

| Stage | Process | Description |
|---|---|---|
| Feature Extraction | Spatial-temporal analysis | Extract statistical and geometric features from concentration fields |
| Data Preprocessing | Normalization & scaling | StandardScaler for feature standardization |
| Model Training | Multi-algorithm approach | Random Forest, SVM, Gradient Boosting, Logistic Regression |
| Hyperparameter Tuning | Grid search optimization | Cross-validation for optimal parameter selection |
| Performance Evaluation | Comprehensive metrics | Accuracy, precision, recall, F1-score, confusion matrices |
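The scaling and tuning stages can be chained so that `StandardScaler` is refit inside each cross-validation fold, which keeps validation-fold statistics out of the scaling. A minimal sketch with scikit-learn's `Pipeline` and `GridSearchCV`, shown for the SVM; the synthetic data and the parameter grid are illustrative assumptions, not the grids stored in `config/parameters.yaml`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 16 features, three risk classes
X, y = make_classification(
    n_samples=400, n_features=16, n_informative=8, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("scaler", StandardScaler()),   # refit on the training part of every fold
    ("clf", SVC()),
])

# Hypothetical grid; the "clf__" prefix routes each parameter to the SVC step
param_grid = {"clf__C": [0.1, 1.0, 10.0], "clf__gamma": ["scale", 0.01, 0.1]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1_macro")
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print(classification_report(y_test, search.predict(X_test),
                            target_names=["Low", "Medium", "High"]))
```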

🔬 Feature Engineering

Fundamental Features (8)

  • source_x, source_y: Source coordinates
  • x_position, y_position: Observation point coordinates
  • velocity_u, velocity_v: Flow field components
  • source_strength: Emission intensity
  • time_normalized: Temporal coordinate

Derived Features (8)

  • distance_to_source: Euclidean distance to source
  • dx_from_source, dy_from_source: Displacement components
  • travel_time_x, travel_time_y: Advective travel times
  • peclet_x, peclet_y: Péclet numbers (advection/diffusion ratio)
  • diffusion_coeff: Local diffusion coefficient
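The feature list names the derived quantities but not their formulas. The sketch below shows one plausible set of definitions. The travel-time formula (displacement divided by the matching velocity component), the use of the source displacement as the Péclet length scale, and the function name `derived_features` are all assumptions, not the repository's `extract_features()` logic.

```python
import math


def derived_features(source_x, source_y, x_position, y_position,
                     velocity_u, velocity_v, diffusion_coeff):
    """Hypothetical reconstruction of the eight derived features."""
    dx_from_source = x_position - source_x
    dy_from_source = y_position - source_y
    distance_to_source = math.hypot(dx_from_source, dy_from_source)

    # Advective travel time per axis; infinite when that velocity component is zero
    travel_time_x = abs(dx_from_source) / abs(velocity_u) if velocity_u else math.inf
    travel_time_y = abs(dy_from_source) / abs(velocity_v) if velocity_v else math.inf

    # Peclet number Pe = |velocity| * L / D, using the displacement as the length L
    peclet_x = abs(velocity_u) * abs(dx_from_source) / diffusion_coeff
    peclet_y = abs(velocity_v) * abs(dy_from_source) / diffusion_coeff

    return {
        "distance_to_source": distance_to_source,
        "dx_from_source": dx_from_source,
        "dy_from_source": dy_from_source,
        "travel_time_x": travel_time_x,
        "travel_time_y": travel_time_y,
        "peclet_x": peclet_x,
        "peclet_y": peclet_y,
        "diffusion_coeff": diffusion_coeff,
    }


# Observation point 50 m downstream and 0.5 m across from a source at (0, 5)
features = derived_features(0.0, 5.0, 50.0, 5.5, velocity_u=0.5, velocity_v=0.1,
                            diffusion_coeff=0.1)
```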

⚠️ Risk Assessment Framework

| Risk Level | Concentration Range | Ecological Impact | Management Action |
|---|---|---|---|
| Low 🟢 | < 0.001 mg/L | Minimal ecosystem disruption | Routine monitoring |
| Medium 🟡 | 0.001 to 0.01 mg/L | Moderate impact on sensitive species | Enhanced surveillance |
| High 🔴 | > 0.01 mg/L | Significant ecological damage | Immediate intervention |
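The risk table translates directly into a vectorized lookup. A sketch with NumPy's `np.digitize`, using the two thresholds from `risk_thresholds` in `config/parameters.yaml`. The table does not say how values exactly equal to a threshold are assigned, so the sketch assumes 0.001 mg/L counts as Medium and 0.01 mg/L counts as High.

```python
import numpy as np

LOW_THRESHOLD = 0.001     # mg/L, risk_thresholds.low
MEDIUM_THRESHOLD = 0.01   # mg/L, risk_thresholds.medium
LABELS = np.array(["Low", "Medium", "High"])


def risk_level(concentration):
    """Return 0 (Low), 1 (Medium) or 2 (High) for each concentration value."""
    c = np.asarray(concentration, dtype=float)
    # With right=False: c < 0.001 -> 0, 0.001 <= c < 0.01 -> 1, c >= 0.01 -> 2
    return np.digitize(c, [LOW_THRESHOLD, MEDIUM_THRESHOLD])


print(LABELS[risk_level([0.0005, 0.005, 0.05])])  # ['Low' 'Medium' 'High']
```

Because `risk_level` works element-wise, the same call can classify an entire 2D concentration field at once.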

📝 Citation

If you use this software in your research, please cite:

@software{contaminant_transport_ml_2025,
  title={Contaminant Transport Modeling with Machine Learning: 
         Advanced Computational Framework for Environmental Risk Assessment},
  author={Tinoco-Guerrero, Gerardo and 
          Domínguez-Mota, Francisco Javier and 
          Guzmán-Torres, José Alberto},
  year={2025},
  institution={Universidad Michoacana de San Nicolás de Hidalgo},
  organization={SIIIA MATH: Soluciones en ingeniería},
  url={https://github.com/gstinoco/contaminant-transport-ml},
  note={Advanced computational framework for contaminant transport and ecological risk assessment}
}

🏛️ Institutional Support

Primary Funding:

  • 🏫 Universidad Michoacana de San Nicolás de Hidalgo (UMSNH)
  • 🏫 Secretaría de Ciencia, Humanidades, Tecnología e Innovación (SECIHTI)
  • 🏢 SIIIA MATH: Soluciones en ingeniería

📑 License

This project is licensed under the MIT License - see the full license text below:

MIT License

Copyright (c) 2025 Gerardo Tinoco-Guerrero, Francisco Javier Domínguez-Mota, José Alberto Guzmán-Torres

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Academic Use: This software is developed for research and educational purposes. Commercial use requires explicit permission from the authors.


📧 Contact & Support

👥 Research Group Contact

Primary Contact:

  • Dr. Gerardo Tinoco Guerrero
    • 📧 gerardo.tinoco@umich.mx
    • 🏢 SIIIA MATH: Soluciones en ingeniería
    • 🏛️ Universidad Michoacana de San Nicolás de Hidalgo

❓ Technical Support

For technical questions and issues:

  1. GitHub Issues: Create an issue for bug reports or feature requests
  2. Email Support: Contact the research team directly for complex technical inquiries
  3. Academic Collaboration: Reach out for research partnerships and joint projects


⭐ If this project helps your research, please consider giving it a star! ⭐

Advancing environmental science through computational innovation
