Predict Every Batch. Prevent Every Failure.
Team Knights — Aditya Rana · Aryan Pratap Singh · Sandeep Kumar
"One system. Four modules. Seven predictions. Zero guesswork."
- Overview
- Key Features
- Project Structure
- Tech Stack
- Dataset
- Modules
- Quick Start
- Running the Pipeline
- API Reference
- Dashboard
- Results
- Documentation
Modern manufacturing plants generate massive amounts of process data every batch — yet most of it goes unanalyzed. Energy spikes, quality drops, and equipment failures continue to surprise operators, costing time, money, and carbon.
PRISM is an AI-driven manufacturing intelligence system built for pharmaceutical tablet manufacturing that predicts batch quality, yield, and energy consumption before production completes — while continuously monitoring equipment health through power and vibration patterns to catch failures before they happen.
Given process parameters (compression force, machine speed, drying temperature, etc.), PRISM:
- Predicts all quality, yield, and performance targets before the batch completes
- Monitors energy consumption patterns phase-by-phase to detect anomalies during the batch
- Explains every prediction using SHAP values so operators know why, not just what
- Tracks carbon footprint with adaptive targets aligned to regulatory requirements
Domain: Pharmaceutical tablet manufacturing
Dataset: 60 production batches (T001–T060) + 1 minute-by-minute sensor log
Target Accuracy: R² ≥ 0.90 across all primary quality targets (achieved: ≥ 0.95 on XGBoost)
Hackathon: National AI/ML Hackathon by AVEVA — Team Knights
| Feature | Description |
|---|---|
| 🎯 Multi-Target Prediction | Simultaneously predicts Hardness, Friability, Dissolution Rate, Content Uniformity, Disintegration Time, Tablet Weight, and Energy (kWh) |
| ⚡ Energy Pattern Analysis | Phase-wise power + vibration monitoring with Isolation Forest & LSTM Autoencoder anomaly detection |
| 🔍 SHAP Explainability | Per-prediction and global feature importance — operators understand why, not just what |
| 🌍 Carbon Footprint Tracker | CO₂e per batch using India CEA grid factor (0.716 kg/kWh) with adaptive target setting |
| 🎛️ What-If Optimizer | Real-time slider-based parameter explorer — predictions update in < 100ms |
| 📊 Batch Fingerprinting | Radar chart comparison of any two batches against the all-time best |
| 📉 CUSUM Drift Detection | Detects gradual quality/energy degradation across batches over time |
| 🧪 Composite Quality Score | Single 0–100 score blending all quality targets into one actionable metric |
| 📈 Benchmark Dashboard | Full model performance report (R², MAE, RMSE, MAPE) with anomaly detector metrics |
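The CUSUM drift detector listed above can be sketched as a one-sided cumulative-sum monitor on per-batch energy. This is a minimal illustration on synthetic data — the slack `k`, threshold `h`, and the simulated energy series are invented for the sketch, not the project's actual values:

```python
import numpy as np

def cusum_upward(values, target, k=0.5, h=5.0):
    """One-sided upward CUSUM: flag indices where cumulative drift above
    `target` exceeds h (in standard-deviation units); k is per-step slack."""
    sigma = np.std(values) or 1.0
    s, alarms = 0.0, []
    for i, x in enumerate(values):
        s = max(0.0, s + (x - target) / sigma - k)
        if s > h:
            alarms.append(i)
    return alarms

# synthetic energy-per-batch series: stable around 70 kWh for 30 batches,
# then a slow upward drift — the gradual degradation CUSUM is built to catch
rng = np.random.default_rng(4)
energy = np.concatenate([
    rng.normal(70, 1.0, 30),
    rng.normal(70, 1.0, 30) + np.linspace(0, 6, 30),
])
alarms = cusum_upward(energy, target=70.0)
```

A single out-of-spec batch will not trip the alarm; only a sustained shift accumulates past `h`, which is what distinguishes drift detection from simple thresholding.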
```
manufacturing-intelligence/
│
├── 📄 README.md
├── 📄 SETUP.md ← Step-by-step environment setup guide
├── 📄 PIPELINE.md ← Plain-English pipeline explanation
├── 📄 BENCHMARK.md ← Model benchmark report with metrics
│
├── 📂 data/
│ ├── raw/
│ │ ├── _h_batch_process_data.xlsx ← Sensor log (T001, 211 min × 11 cols)
│ │ └── _h_batch_production_data.xlsx ← Batch records (60 batches × 15 cols)
│ ├── processed/
│ │ ├── merged_dataset.csv ← Final ML input (60 × ~22 features)
│ │ ├── phase_features.csv ← Phase-aggregated sensor features
│ │ ├── batch_outcomes.csv ← Cleaned outcomes + derived targets
│ │ └── carbon_history.csv ← Per-batch CO₂e with adaptive targets
│ └── simulated/
│ └── simulated_sensors.csv ← Physics-based sensor data T001–T060
│
├── 📂 notebooks/ ← Core analysis (run in order)
│ ├── 01_EDA.ipynb ← Exploratory data analysis
│ ├── 02_feature_engineering.ipynb ← Simulation + feature extraction
│ ├── 03_multitarget_models.ipynb ← Model training + evaluation
│ ├── 04_anomaly_detection.ipynb ← Isolation Forest + LSTM Autoencoder
│ └── 05_explainability.ipynb ← SHAP analysis + plots
│
├── 📂 analysis/ ← Deep-dive analysis notebooks
│ ├── 01_data_profiling.ipynb ← Full stats, missing values, outliers
│ ├── 02_correlation_deep_dive.ipynb ← Pearson/Spearman/VIF analysis
│ ├── 03_phase_energy_analysis.ipynb ← Phase energy breakdown, CUSUM drift
│ ├── 04_model_comparison.ipynb ← CV scores, residuals, timing benchmarks
│ └── 05_business_impact.ipynb ← ROI, carbon savings, grid scenarios
│
├── 📂 src/
│ ├── config.py ← Constants, paths, thresholds
│ ├── preprocessing.py ← Load, validate, normalize
│ ├── simulate_sensors.py ← Physics-based T002–T060 simulation
│ ├── feature_engineering.py ← Phase aggregation, FFT, derived features
│ ├── multi_target_model.py ← XGBoost + RF + MLP + stacking ensemble
│ ├── anomaly_detector.py ← Isolation Forest + LSTM Autoencoder
│ ├── shap_explainer.py ← SHAP value computation + plots
│ ├── carbon_calculator.py ← CO₂e calculation + adaptive targets
│ ├── run_pipeline.py ← Master training script
│ └── utils.py ← Shared helpers
│
├── 📂 models/ ← Serialized trained models (after pipeline run)
│ ├── xgb_multitarget.pkl
│ ├── rf_multitarget.pkl
│ ├── mlp_model.keras
│ ├── stacking_meta.pkl
│ ├── isolation_forest.pkl
│ ├── lstm_autoencoder.keras
│ ├── scaler.pkl
│ ├── shap_values.pkl
│ ├── lstm_threshold.json
│ ├── lstm_norm_params.json
│ ├── evaluation_results.json ← Per-target R², MAE, RMSE, MAPE
│ └── pipeline_summary.json ← Full run summary
│
├── 📂 api/
│ ├── main.py ← FastAPI app + all route handlers
│ └── schemas.py ← Pydantic request/response models
│
├── 📂 dashboard/ ← Next.js web dashboard (React + TypeScript)
│ ├── package.json
│ ├── next.config.ts
│ ├── tsconfig.json
│ └── src/
│ ├── app/
│ │ ├── layout.tsx ← Root layout + navigation
│ │ ├── page.tsx ← Home redirect
│ │ ├── ClientLayout.tsx ← Tab-based navigation shell
│ │ └── globals.css
│ ├── components/
│ │ ├── MetricCard.tsx ← Reusable metric display card
│ │ ├── Slider.tsx ← Parameter input slider
│ │ └── tabs/
│ │ ├── PredictionsTab.tsx ← Tab 1: Quality & energy predictions
│ │ ├── EnergyTab.tsx ← Tab 2: Phase charts + anomaly alerts
│ │ ├── ComparisonTab.tsx ← Tab 3: Radar chart batch comparison
│ │ ├── CarbonTab.tsx ← Tab 4: CO₂e trends + targets
│ │ ├── WhatIfTab.tsx ← Tab 5: Real-time parameter explorer
│ │ └── BenchmarkTab.tsx ← Tab 6: Full model benchmark report
│ └── lib/
│
├── 📂 tests/
│ ├── test_preprocessing.py
│ ├── test_models.py
│ └── test_api.py
│
├── 📂 docs/
│ ├── ARCHITECTURE.md ← System design + diagrams
│ ├── IMPLEMENTATION_PLAN.md ← Build guide + code skeletons + timeline
│ └── PROJECT_DOCUMENTATION.md ← Strategy, research rationale, business impact
│
└── 📄 requirements.txt
```
| Layer | Technology | Purpose |
|---|---|---|
| Data Processing | pandas, numpy, openpyxl | Tabular data manipulation, Excel reading |
| ML — Gradient Boosting | xgboost 2.0 | Primary multi-output prediction model |
| ML — Ensemble | scikit-learn | Random Forest, Ridge meta-learner, Isolation Forest, scalers |
| Deep Learning | tensorflow 2.15 / keras | LSTM Autoencoder for sequential anomaly detection, MLP regression |
| Hyperparameter Tuning | optuna | Bayesian search over XGBoost hyperparameters (50 trials) |
| Explainability | shap | TreeExplainer for XGBoost; beeswarm, waterfall, bar plots |
| API Backend | fastapi, uvicorn, pydantic | REST endpoints; auto Swagger docs; < 100ms inference |
| Web Dashboard | Next.js 16, React 19, TypeScript | 6-tab interactive dashboard consuming the FastAPI backend |
| Charts | recharts, plotly | Time-series, radar, and bar charts |
| Serialization | joblib | Model persistence across sessions |
| Signal Processing | scipy | FFT analysis of vibration signals for motor health |
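The scipy FFT role can be illustrated on a synthetic minute-resolution vibration trace. The 211-sample length loosely mirrors the T001 sensor log, but the signal itself and its 20-minute periodic component are invented for this sketch:

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

# 211 samples taken 60 s apart, like the minute-by-minute sensor log;
# baseline vibration plus a slow periodic component and measurement noise
n, dt = 211, 60.0
t = np.arange(n) * dt
rng = np.random.default_rng(3)
signal = 4.5 + 0.8 * np.sin(2 * np.pi * t / (20 * 60)) + rng.normal(0, 0.1, n)

# remove the DC offset, then look at the magnitude spectrum
spectrum = np.abs(rfft(signal - signal.mean()))
freqs = rfftfreq(n, d=dt)               # frequencies in Hz

dominant_hz = freqs[np.argmax(spectrum)]
period_min = 1 / dominant_hz / 60       # strongest periodic component, in minutes
```

At one sample per minute, the recoverable periods sit in the minutes-to-hours range (process cycling, drift), which is what the feature engineering targets; true motor-shaft frequencies would need a much faster sampling rate.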
**Sensor log (`_h_batch_process_data.xlsx`)**
- 211 rows × 11 columns, 1 batch (T001), no missing values
- Captures: Temperature, Pressure, Humidity, Motor Speed, Compression Force, Flow Rate, Power Consumption (kW), Vibration (mm/s)
- Covers 8 sequential manufacturing phases over 211 minutes
| Phase | Energy Used | Key Signal |
|---|---|---|
| Compression | 38.69 kWh (50.4%) 🔴 | Highest energy — #1 optimization target |
| Milling | 9.00 kWh (11.7%) | Highest vibration (9.79 mm/s) |
| Drying | 10.09 kWh (13.1%) | Temperature + time sensitive |
| Others | 18.96 kWh (24.7%) | Lower priority |
**Batch production data (`_h_batch_production_data.xlsx`)**
- 60 rows × 15 columns (T001–T060), no missing values
- 8 input features: Granulation Time, Binder Amount, Drying Temp, Drying Time, Compression Force, Machine Speed, Lubricant Concentration, Moisture Content
- 6 output targets: Hardness, Friability, Dissolution Rate, Content Uniformity, Disintegration Time, Tablet Weight
Note: Feature-target correlations of 0.96–0.99 across most pairs — this dataset is highly structured and models reliably exceed R² = 0.93 on primary targets.
Ensemble of XGBoost + Random Forest → Ridge stacking meta-learner. Predicts 7 targets simultaneously from 8 process parameters + phase features. Uses 5-fold cross-validation and Optuna hyperparameter search. MLP is trained but excluded from the stacking ensemble due to overfitting on the 60-sample dataset.
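The stacking pattern can be sketched per target as follows. This is a minimal illustration on synthetic data — `GradientBoostingRegressor` stands in for XGBoost so the snippet needs only scikit-learn, and the synthetic features/target are invented; the real implementation lives in `src/multi_target_model.py`:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))                 # 60 batches x 8 process parameters
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=60)  # one synthetic target

base_models = {
    "gbr": GradientBoostingRegressor(random_state=0),  # stand-in for XGBoost
    "rf": RandomForestRegressor(n_estimators=200, random_state=0),
}

# 5-fold out-of-fold predictions keep the meta-learner honest: every base-model
# prediction it trains on was made by a fold that never saw that row
oof = np.column_stack(
    [cross_val_predict(m, X, y, cv=5) for m in base_models.values()]
)
meta = Ridge(alpha=1.0).fit(oof, y)          # per-target Ridge meta-learner

# refit base models on all data for inference
for m in base_models.values():
    m.fit(X, y)

def predict(X_new):
    stacked = np.column_stack([m.predict(X_new) for m in base_models.values()])
    return meta.predict(stacked)
```

One such meta-learner is fitted per target, which is how a single pipeline run yields all seven predictions.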
Two-layer anomaly detection:
- Isolation Forest — fast batch-level screening (~5ms)
- LSTM Autoencoder — deep sequential pattern analysis (~50ms)
Trained on physics-simulated sensor data for all 60 batches. Root cause attribution via domain-knowledge rule engine (bearing wear, motor overload, process drift).
shap.TreeExplainer on XGBoost models provides exact Shapley values per feature per prediction. Beeswarm plots for global insights; waterfall plots for per-batch explanations.
Converts predicted Energy_kWh to Carbon_kgCO2e using India CEA grid factor (0.716 kg/kWh). Adaptive target setting: dynamically adjusts goals based on best 10th-percentile operational performance vs regulatory floor. Supports India / EU / US / Renewable grid scenarios.
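The conversion and target logic reduce to a few lines. The history values and the regulatory floor below are illustrative placeholders, not project numbers; only the 0.716 kg/kWh factor comes from the source:

```python
import numpy as np

GRID_FACTOR_KG_PER_KWH = 0.716   # India CEA average grid emission factor

def batch_co2e(energy_kwh, grid_factor=GRID_FACTOR_KG_PER_KWH):
    """Convert batch energy consumption (kWh) to kg CO2-equivalent."""
    return energy_kwh * grid_factor

def adaptive_target(history_kwh, regulatory_floor_kwh=55.0):
    """Target = best 10th-percentile historical energy, clipped so it never
    undercuts the regulatory floor (floor value here is hypothetical)."""
    p10 = float(np.percentile(history_kwh, 10))
    return max(p10, regulatory_floor_kwh)

history = [72.4, 68.1, 75.0, 70.2, 66.8, 71.5, 69.9, 74.3, 67.4, 73.0]
target_kwh = adaptive_target(history)
co2e = batch_co2e(72.4)   # 72.4 kWh -> ~51.8 kg CO2e
```

Switching grid scenarios (EU / US / Renewable) only changes `grid_factor`; the rest of the logic is unchanged.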
Full guide: see `SETUP.md` for step-by-step instructions.
```bash
git clone https://github.com/your-team/manufacturing-intelligence.git
cd manufacturing-intelligence

python -m venv venv
venv\Scripts\Activate.ps1        # Windows PowerShell
# source venv/bin/activate       # macOS / Linux

pip install -r requirements.txt
```

Place the raw Excel files in `data/raw/`:

```
data/raw/
├── _h_batch_process_data.xlsx
└── _h_batch_production_data.xlsx
```
Train the models:

```bash
# Quick run (~2 min, no tuning):
python src/run_pipeline.py

# With Optuna XGBoost tuning (~7 min):
python src/run_pipeline.py --tune
```

Start the API server:

```bash
uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
```

API docs available at: http://localhost:8000/docs

Launch the dashboard:

```bash
cd dashboard
npm install   # first time only
npm run dev
```

Dashboard available at: http://localhost:3000
Run the tests:

```bash
pytest tests/ -v
```

`run_pipeline.py` executes every step end-to-end:

```bash
python src/run_pipeline.py
```

Or run the steps individually:

```bash
# Step 1: Load & validate raw data
python -c "from src.preprocessing import load_data, validate_data; load_data()"

# Step 2: Sensor simulation for T002–T060
python src/simulate_sensors.py

# Step 3: Feature engineering
python src/feature_engineering.py

# Step 4: Train multi-target prediction models
python src/multi_target_model.py

# Step 5: Train anomaly detection models
python src/anomaly_detector.py

# Step 6: Compute SHAP values + generate plots
python src/shap_explainer.py

# Step 7: Build carbon footprint history
python src/carbon_calculator.py
```

The pipeline writes the following artifacts:

| File | Description |
|---|---|
| `models/xgb_multitarget.pkl` | XGBoost multi-output model |
| `models/rf_multitarget.pkl` | Random Forest model |
| `models/mlp_model.keras` | MLP neural network (Keras format) |
| `models/stacking_meta.pkl` | Stacking ensemble bundle |
| `models/isolation_forest.pkl` | Isolation Forest anomaly detector |
| `models/lstm_autoencoder.keras` | LSTM Autoencoder (Keras format) |
| `models/scaler.pkl` | StandardScaler for feature normalization |
| `models/shap_values.pkl` | Pre-computed SHAP values |
| `models/lstm_threshold.json` | LSTM anomaly reconstruction threshold |
| `models/evaluation_results.json` | Per-target R², MAE, RMSE, MAPE |
| `models/pipeline_summary.json` | Full run summary |
Base URL: http://localhost:8000
Interactive docs: http://localhost:8000/docs
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/health` | Model load status |
| POST | `/api/predict` | Predict all quality targets + energy |
| POST | `/api/anomaly` | Detect energy anomalies for a batch |
| GET | `/api/explain/{batch_id}` | SHAP feature contributions |
| GET | `/api/carbon/{batch_id}` | CO₂e + adaptive target for a batch |
| GET | `/api/batches` | List all available batch IDs |
| GET | `/api/carbon_history` | Full carbon history (all batches) |
| GET | `/api/model_metrics` | Full benchmark: R², MAE, anomaly metrics |
```bash
curl -X POST "http://localhost:8000/api/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "granulation_time": 16,
    "binder_amount": 9.0,
    "drying_temp": 60,
    "drying_time": 29,
    "compression_force": 12.0,
    "machine_speed": 170,
    "lubricant_conc": 1.2,
    "moisture_content": 2.0
  }'
```

Response:

```json
{
  "hardness": 89.4,
  "friability": 0.81,
  "dissolution_rate": 90.7,
  "content_uniformity": 98.2,
  "disintegration_time": 8.3,
  "tablet_weight": 202.1,
  "energy_kwh": 72.4,
  "carbon_kg_co2e": 51.8,
  "composite_quality_score": 82.3
}
```

Check a batch for energy anomalies:

```bash
curl -X POST "http://localhost:8000/api/anomaly" \
  -H "Content-Type: application/json" \
  -d '{ "batch_id": "T045" }'
```

Get SHAP feature contributions for a batch and target:

```bash
curl "http://localhost:8000/api/explain/T023?target=Dissolution_Rate"
```

Get the carbon footprint for a batch under a grid scenario:

```bash
curl "http://localhost:8000/api/carbon/T045?grid=India"
```

Fetch the full benchmark data (regression R²/MAE/RMSE/MAPE per model & target, anomaly metrics):

```bash
curl "http://localhost:8000/api/model_metrics"
```
The Next.js web dashboard (http://localhost:3000) has 6 tabs, all consuming the FastAPI backend:
| Tab | Description |
|---|---|
| 🔮 Predictions | Enter 8 process parameters → get all quality predictions + Composite Quality Score |
| ⚡ Energy Monitor | Select any batch → view phase-wise power/vibration chart + anomaly score + root cause alerts |
| 📊 Batch Comparison | Compare any two batches side-by-side via normalized radar charts + delta table |
| 🌍 Carbon Footprint | Trend chart of CO₂e across all batches + adaptive target line + grid selector (India/EU/US/Renewable) |
| 🎛️ What-If Optimizer | Move sliders for any parameter → predictions update live in < 100ms |
| 📈 Benchmark | Full model performance report — R², MAE, RMSE, MAPE per model & target; anomaly detector metrics |
```bash
cd dashboard
npm install   # first time only
npm run dev   # http://localhost:3000
```

| Target | XGBoost R² | RF R² | Stacking R² |
|---|---|---|---|
| Hardness | 0.9895 | 0.9826 | 0.9896 |
| Friability | 0.9810 | 0.9530 | 0.9722 |
| Dissolution Rate | 0.9902 | 0.9727 | 0.9832 |
| Content Uniformity | 0.9926 | 0.9765 | 0.9919 |
| Disintegration Time | 0.9869 | 0.9733 | 0.9876 |
| Tablet Weight | 0.9327 | 0.9000 | 0.9571 |
| Energy kWh | 0.8094 | 0.8479 | 0.7799 |
| Overall Mean R² | 0.9546 | 0.9437 | 0.9516 |
Production model: Stacking Ensemble (XGBoost + RandomForest + per-target Ridge meta-learner, 5-fold OOF). MLP excluded — severely overfits on n=60 dataset (overall R² = –10.15).
| Model | Precision | Recall | F1 | AUC-ROC |
|---|---|---|---|---|
| Isolation Forest | 16.67% | 16.67% | 0.167 | 0.682 |
| LSTM Autoencoder | 10.00% | 100% | 0.182 | 0.324 |
Low precision is expected — severe class imbalance (6/60 anomalous). LSTM recall of 100% means zero missed anomalies. See `BENCHMARK.md` for the full analysis.
| Metric | Saving |
|---|---|
| Energy reduction (8–10% per batch) | ~4,350 kWh/year |
| Carbon reduction | ~3,100 kg CO₂e/year |
| Batch rejection prevention (est. 30 fewer/year) | ~₹15 lakh/year |
| Early anomaly detection | Prevents catastrophic equipment failure |
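The carbon figure in the table follows directly from the energy figure and the 0.716 kg/kWh grid factor used throughout:

```python
# sanity check: annual carbon saving implied by the annual energy saving
energy_saved_kwh_per_year = 4350   # from the table above
grid_factor = 0.716                # India CEA average, kg CO2e per kWh

carbon_saved_kg = energy_saved_kwh_per_year * grid_factor
print(round(carbon_saved_kg))      # 3115, reported above as ~3,100 kg CO2e/year
```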
| File | Contents |
|---|---|
| `SETUP.md` | Environment setup, data placement, running all services |
| `PIPELINE.md` | Plain-English explanation of every pipeline step |
| `BENCHMARK.md` | Full model performance benchmark with metric definitions |
| `docs/ARCHITECTURE.md` | System architecture diagrams, layer breakdowns, API schemas |
| `docs/IMPLEMENTATION_PLAN.md` | Dataset analysis, code skeletons, training strategy, timeline |
| `docs/PROJECT_DOCUMENTATION.md` | Problem statement, tech stack rationale, business impact, references |
- Federated Learning — train across multiple factories without sharing raw batch data
- Digital Twin — couple predictive models with physics-based tablet press simulation
- Reinforcement Learning — RL agent learns optimal parameter settings through reward signals
- Edge Deployment — quantize LSTM Autoencoder to ONNX for IIoT edge node deployment
- Real-Time IIoT Pipeline — Apache Kafka → stream processor → live inference → Grafana
- Real-Time Carbon API — integrate Electricity Maps API for marginal (not average) emission factors
- Sensor time-series data exists for T001 only; T002–T060 are physics-based simulations
- MLP overfit severely on n=60 dataset (R² = –10.15) and is excluded from production ensemble
- `Energy_kWh` is physics-derived (not directly measured); lower R² (0.78–0.85) is expected
- Anomaly precision/recall appear low due to extreme class imbalance (10% anomaly rate)
- Carbon calculation uses annual-average grid emission factor, not real-time marginal
- n=60 batches is small; production system requires 500+ batches for robust generalization
Built by Team Knights for the National AI/ML Hackathon by AVEVA
PRISM — Predictive Reliability & Intelligence for Smart Manufacturing