Parametric cost estimation for LA Basin natural gas pipeline projects.
ML regression model · Azure Static Web Apps · Python Azure Functions · BLS PPI inflation adjustment · AACE Class 5 compliant output
Most pipeline cost estimates at early project stages are gut-feel spreadsheets. This tool replaces that with a production-deployed ML regression model trained on 418 completed LA Basin pipeline projects, wrapped in a full-stack web application that generates AACE Class 5 compliant estimates in seconds.
It is actively used internally at a major Southern California natural gas utility for capital project budget planning and portfolio-level cost screening.
- Instant parametric estimates — diameter, length, dig count, project type, and city as inputs; project cost + contingency + total budget as output
- Cost escalation engine — compounds costs forward to any construction quarter (2025–2035) using a configurable inflation rate
- AACE Class 5 accuracy range — −50% / +100% bounds rendered on an interactive range track
- Full Basis of Estimate (BOE) report — exportable PDF with TIC cost breakdown, methodology, assumptions, and accuracy statement; AACE RP 34R-05 compliant
- Model trace panel — per-feature scaled values, coefficients, and contributions; full transparency into what is driving each estimate
- Historical dataset explorer — filterable, sortable paginated table of the 418 training-set records with city and project type filters
- Excel export — two-sheet workbook (Estimate Summary + Cost Breakdown) with model performance metadata
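The escalation and accuracy-range arithmetic behind the bullets above can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and simple annual compounding over fractional years is an assumption — the deployed engine's exact compounding convention may differ.

```python
def escalate(cost_2025: float, rate_pct: float, years_from_base: float) -> float:
    """Compound a 2025-dollar cost forward to the construction quarter.

    Assumes simple annual compounding over fractional years (an assumption;
    the deployed engine's convention may differ).
    """
    return cost_2025 * (1 + rate_pct / 100) ** years_from_base


def aace_class5_range(total_budget: float) -> dict:
    """AACE Class 5 accuracy band: -50% / +100% around the point estimate."""
    return {"low": 0.5 * total_budget,
            "estimate": total_budget,
            "high": 2.0 * total_budget}
```

The −50% / +100% multipliers reproduce the range track figures in the API response example further down.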
Browser (Azure Static Web App)
│
├── index.html + css/styles.css
└── js/
├── api.js → POST /api/predict, GET /api/health
├── main.js → input handling, escalation preview
├── ui.js → rendering layer (estimate cards, BOE, trace table)
├── data.js → dataset explorer
└── export.js → Excel (SheetJS) and PDF exports
Azure Functions (Python v2) — /api
├── predict/ → POST /api/predict
│ └── validates inputs → calls run_estimate() → returns JSON
└── health/ → GET /api/health
api/shared/
├── predictor.py → singleton model loader (Azure Blob Storage), run_estimate()
└── m5_transformer.py → sklearn transformer (OHE + feature engineering + scaler)
Azure Blob Storage
└── model-artifacts/
├── pipeline.joblib
└── model_metadata.json
train/
├── train.py → full training pipeline (BLS PPI fetch → Lasso selection → LR → joblib)
└── data/ProjectData_LA.xlsx
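The "validates inputs" step in predict/ can be sketched like this, using the field names from the request example later in this README. The exact checks and error format in the deployed function are assumptions here.

```python
# Sketch of the input-validation step in predict/__init__.py (illustrative;
# the deployed function's checks and error format may differ).
REQUIRED_FIELDS = {
    "pipe_diameter": (int, float),
    "pipe_length": (int, float),
    "num_digs": int,
    "project_type": str,
    "city": str,
}


def validate(payload: dict) -> list:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS
              if name not in payload]
    errors += [f"wrong type for {name}"
               for name, types in REQUIRED_FIELDS.items()
               if name in payload and not isinstance(payload[name], types)]
    return errors
```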
| Property | Value |
|---|---|
| Algorithm | Lasso feature selection → Linear Regression on log-transformed cost |
| Training set | 418 LA Basin pipeline projects · 19 cities · 4 districts |
| Test R² (log scale) | 0.888 |
| MAE | $44K (2025 dollars) |
| MAPE | 19.0% |
| CV R² (5-fold) | 0.899 ± 0.015 |
| Features selected | 24 of 37 candidates |
| Inflation adjustment | BLS PPI series WPUIP2311001 (Oil & Gas Field Machinery) — actuals normalised to 2025 dollars before training |
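The PPI normalisation in the table's last row scales each historical actual by the ratio of the 2025 index level to the index level in the project's year. A minimal sketch with made-up index values — the real levels come from the BLS public API at training time:

```python
# Illustrative PPI index levels by year; the real series (WPUIP2311001)
# is fetched from the BLS public API during training.
PPI = {2019: 180.0, 2021: 205.0, 2023: 240.0, 2025: 260.0}


def to_2025_dollars(actual_cost: float, project_year: int) -> float:
    """Normalise a historical actual to 2025 dollars via the PPI ratio."""
    return actual_cost * PPI[2025] / PPI[project_year]
```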
Feature engineering: log(diameter × length), diameter², digs × diameter, log(digs), log(length) — plus one-hot encoded project type, city, and district. Lasso with 5-fold CV selects the final feature set; a plain LinearRegression is then fit on the selected features for full interpretability.
Why log-transform? Pipeline costs span nearly two orders of magnitude. Log-transforming the target produces normally distributed residuals and better-behaved regression coefficients, with exp() used at inference to return dollar values.
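The fit-on-log, exp-at-inference pattern can be sketched on toy data (illustrative numbers, not the project dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy inputs [diameter, length]; costs span ~two orders of magnitude.
X = np.array([[4, 200], [8, 800], [12, 3000], [16, 9000]], dtype=float)
y = np.array([60_000, 300_000, 1_200_000, 4_800_000], dtype=float)

model = LinearRegression().fit(X, np.log(y))   # train on log cost
pred_dollars = np.exp(model.predict(X))        # exp() back to dollars at inference
```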
| Layer | Technology |
|---|---|
| Frontend hosting | Azure Static Web Apps |
| Backend compute | Azure Functions (Python v2) |
| Model storage | Azure Blob Storage |
| ML framework | scikit-learn 1.5.2 |
| Data | pandas, numpy |
| Model persistence | joblib |
| Inflation data | BLS Public API (no key required) |
| Excel export | SheetJS (browser-side) |
| PDF export | Browser print API |
| CI/CD | GitHub Actions → Azure SWA |
swa-estimator/
├── index.html
├── css/styles.css
├── js/
│ ├── api.js
│ ├── data.js
│ ├── dataset.json
│ ├── export.js
│ ├── main.js
│ └── ui.js
├── staticwebapp.config.json
├── api/
│ ├── host.json
│ ├── requirements.txt
│ ├── predict/__init__.py ← POST /api/predict
│ ├── health/__init__.py ← GET /api/health
│ └── shared/
│ ├── predictor.py
│ ├── m5_transformer.py
│ └── model/ ← populated from Azure Blob at cold-start
│ ├── pipeline.joblib
│ └── model_metadata.json
└── train/
├── train.py
└── data/ProjectData_LA.xlsx
# Install dependencies
npm install -g @azure/static-web-apps-cli
pip install -r api/requirements.txt
# Start local emulator (serves frontend + functions together)
swa start . --api-location api
# → http://localhost:4280

Model artifacts are downloaded from Azure Blob Storage at cold-start via `AZURE_STORAGE_CONNECTION_STRING`. For local development, set this in `api/local.settings.json`.
cd swa-estimator/
source api/.venv/bin/activate
python -m train.train

This will:
- Fetch current BLS PPI data (falls back to hardcoded values if offline)
- Normalise all historical actuals to 2025 dollars
- Engineer features and run LassoCV feature selection
- Fit a LinearRegression on the selected features
- Run smoke tests and print a summary
- Save `pipeline.joblib` and `model_metadata.json` to `api/shared/model/`
Then upload the artifacts to Azure Blob Storage for the Functions backend to consume.
1. Create the Azure Static Web App (Azure Portal → Static Web Apps → Create)
   - App location: `/`
   - API location: `api`
   - Output location: (leave blank)
2. Add the GitHub secret `AZURE_STATIC_WEB_APPS_API_TOKEN`
3. Set application settings in the Azure Portal (Functions → Configuration):
   - `AZURE_STORAGE_CONNECTION_STRING`
   - `BLOB_CONTAINER_NAME` = `model-artifacts`
   - `BLOB_MODEL_NAME` = `pipeline.joblib`
   - `BLOB_META_NAME` = `model_metadata.json`
4. Push to `main` — GitHub Actions handles the rest.
`GET /api/health` — returns model load status and a metadata summary.

`POST /api/predict`

Request:
{
"pipe_diameter": 8,
"pipe_length": 1356,
"num_digs": 7,
"project_type": "Pipe Replacement",
"city": "Glendale",
"contingency_pct": 10,
"construction_year": 2027,
"construction_quarter": 2
}

Response (abbreviated):
{
"costs": {
"base_cost_2025": 485000,
"project_cost": 524600,
"contingency_amt": 52460,
"total_budget": 577060
},
"escalation": {
"years_from_base": 2.25,
"escalation_factor": 1.0824,
"inflation_rate_pct": 4.0
},
"aace_range": { "low": 288530, "estimate": 577060, "high": 1154120 },
"trace": {
"feature_names": ["Digs × Diameter", "..."],
"contributions": [0.0312, "..."]
}
}

Singleton model loading — predictor.py uses a module-level _pipeline variable so the model is loaded once at Azure Functions cold-start and reused across all subsequent invocations. Avoids the ~2–3s joblib load penalty on every request.
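A minimal sketch of the pattern. The loader is parameterised here purely so the sketch is self-contained; the real module calls `joblib.load` directly.

```python
from typing import Any, Callable

_pipeline: Any = None  # module-level cache: survives across warm invocations


def get_pipeline(path: str, load: Callable[[str], Any]) -> Any:
    """Load the model once per worker process; every later call reuses it."""
    global _pipeline
    if _pipeline is None:       # only true on the first call after cold start
        _pipeline = load(path)  # the expensive deserialisation, paid once
    return _pipeline
```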
Blob Storage over bundled artifacts — The trained pipeline.joblib is not committed to the repo. It is uploaded to Azure Blob Storage post-training and pulled down to /tmp/model/ at cold-start. This keeps the scikit-learn version decoupled from the deployment process and makes model updates a blob upload, not a redeploy.
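The cold-start pull could look roughly like this with the `azure-storage-blob` SDK, using the container and blob names from the application settings above. The function name and cache-check logic are illustrative, not the repository's actual code.

```python
import os
from pathlib import Path

MODEL_DIR = Path("/tmp/model")


def ensure_model(container: str = "model-artifacts",
                 blob_name: str = "pipeline.joblib") -> Path:
    """Download the artifact on cold start; warm invocations hit the cache."""
    target = MODEL_DIR / blob_name
    if target.exists():                      # warm worker: already cached
        return target
    # Imported lazily so unit tests don't need the Azure SDK installed.
    from azure.storage.blob import BlobServiceClient
    client = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"])
    MODEL_DIR.mkdir(parents=True, exist_ok=True)
    data = client.get_blob_client(container, blob_name).download_blob().readall()
    target.write_bytes(data)
    return target
```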
Hardcoded trace coefficients — predictor.py maintains a parallel set of COEFS and FEATURE_LABELS for the explanation trace panel. This avoids introspecting joblib internals for a UI feature and makes the explanation layer robust to future model packaging changes.
Lasso then LinearRegression — LassoCV is used purely for feature selection (zero-out coefficients), not as the final estimator. A plain LinearRegression is then fit on the selected feature subset, giving clean, interpretable coefficients with no regularisation shrinkage bias in the final estimates.
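The two-stage fit can be sketched on synthetic data — LassoCV zeroes out uninformative columns, then an unregularised LinearRegression is refit on the survivors:

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # stand-in for the candidate feature matrix
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=200)

# Stage 1: LassoCV for selection only — features with non-zero coefs survive.
selector = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(selector.coef_)

# Stage 2: refit a plain LinearRegression on the selected subset, so the
# final coefficients carry no shrinkage bias.
final = LinearRegression().fit(X[:, selected], y)
```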
- Re-calibrate model with incoming project actuals as the dataset grows
- Upgrade to AACE Class 3 accuracy as project definition data becomes available
- Add confidence interval bands to the estimate output
- Extend city coverage beyond the current 19 LA Basin cities
- Role-based access control for internal deployment
Built and maintained by Priyank Rao — Data Scientist / ML Engineer
Portfolio · GitHub
This tool is an internal planning instrument. All estimates are AACE Class 5 parametric estimates suitable for screening and feasibility purposes only. Not for project authorisation.