Josiah

Synthetic Marketing Mix Model (MMM) data generator with known ground truth parameters. Built for testing and validating MMM implementations like PyMC Marketing.

Why

When building or evaluating an MMM, you need data where you already know the true effect of each channel. Josiah generates realistic datasets with configurable adstock, saturation, trend, seasonality, controls, and promotions, then exports the ground truth parameters alongside the data so you can measure how well your model recovers them.

Install

pip install -e .

Requires Python 3.9+.

Quick Start

Streamlit App

streamlit run app.py

The app has three pages:

Scenario Builder — configure single or batch scenarios, pick an engine, set scale presets, and tune channel/control/promo parameters.
Generate & Preview — run generation, inspect the output DataFrame and decomposition charts.
Export — download CSVs, Parquet files, or a ZIP bundle with ground truth JSON sidecars.

Python API

from josiah import ScenarioConfig, ChannelConfig, ControlConfig, PromoConfig, generate_single

config = ScenarioConfig(
    name="my_test",
    engine="pymc",
    start_date="2022-01-01",
    end_date="2024-12-31",
    frequency="W",
    intercept=5000.0,
    noise_std=100.0,
    trend_type="linear",
    trend_params={"slope": 0.5},
    seasonality_n_terms=2,
    channels=[
        ChannelConfig(name="facebook", alpha=0.7, l_max=8, lam=2.0, beta=800.0, spend_mean=3000.0, spend_std=500.0),
        ChannelConfig(name="google", alpha=0.5, l_max=4, lam=3.0, beta=1200.0, spend_mean=5000.0, spend_std=1000.0),
    ],
    controls=[
        ControlConfig(name="z1", gamma_shape=2.0, gamma_scale=1.0, coefficient=150.0),
    ],
    promos=[
        PromoConfig(name="black_friday", coefficient=500.0, n_occurrences=1, duration_days=3),
    ],
    seed=42,
)

df, ground_truth, decomp_df = generate_single(config)

Batch Generation

Generate multiple scenarios with randomized parameters:

from josiah import BatchConfig, generate_batch, run_batch

# Create randomized scenario configs
batch = BatchConfig(
    n_scenarios=10,
    engine="pymc",
    n_channels_range=(2, 5),
    beta_range=(200.0, 1500.0),
    intercept_range=(500.0, 2000.0),
    master_seed=42,
)
configs = generate_batch(batch)

# Generate all datasets
results = run_batch(configs)  # list of (df, ground_truth, decomp_df|None)

Export

from josiah import export_scenario, export_batch_to_zip

# Single scenario to files
export_scenario(df, ground_truth, path="output/", fmt="csv", decomp_df=decomp_df)

# Batch to ZIP (returns BytesIO)
zip_bytes = export_batch_to_zip(results, fmt="csv")
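
Because export_batch_to_zip returns an in-memory BytesIO, the caller still has to write the bundle to disk or inspect it. A minimal sketch using only the standard library follows. The stand-in buffer and the member names inside it are hypothetical placeholders for illustration, not Josiah's actual file layout.

```python
import io
import zipfile


def list_zip_members(zip_buffer):
    """Return the member names of a ZIP archive held in a BytesIO."""
    with zipfile.ZipFile(io.BytesIO(zip_buffer.getvalue())) as zf:
        return zf.namelist()


def save_zip(zip_buffer, path):
    """Write the in-memory archive to disk and return the number of bytes written."""
    data = zip_buffer.getvalue()
    with open(path, "wb") as fh:
        fh.write(data)
    return len(data)


# Stand-in for the BytesIO returned by export_batch_to_zip(results, fmt="csv").
# The member names below are hypothetical, chosen only for illustration.
zip_bytes = io.BytesIO()
with zipfile.ZipFile(zip_bytes, "w") as zf:
    zf.writestr("scenario_0.csv", "date,y\n2022-01-03,5000.0\n")
    zf.writestr("scenario_0_ground_truth.json", "{}")

members = list_zip_members(zip_bytes)
```

With a real batch, passing the returned buffer to save_zip(zip_bytes, "batch.zip") would persist the bundle.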

Engines

PyMC Engine (recommended)

Matches PyMC Marketing's formulas:

y = intercept + trend + seasonality + controls + channels + promos + noise

Where each channel contribution is:

beta * logistic_saturation(geometric_adstock(spend / max|spend|, alpha, l_max), lam)

Spend is normalized by max(abs(spend)) per channel before adstock and saturation are applied, matching the MaxAbsScaler preprocessing used by PyMC Marketing. The ground truth JSON includes channel_scales so you can denormalize.
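
The channel formula above can be sketched in plain Python. This is an illustrative reimplementation, not Josiah's source code. It assumes the geometric adstock uses weights alpha**i for lags i = 0 .. l_max - 1 and that the logistic saturation is (1 - exp(-lam * x)) / (1 + exp(-lam * x)), as in PyMC Marketing. Whether Josiah normalizes the adstock weights is not stated here, so normalize is exposed as a parameter and defaults to False as an assumption.

```python
import math


def geometric_adstock(x, alpha, l_max, normalize=False):
    """Carry-over effect: out[t] = sum over i < l_max of alpha**i * x[t - i]."""
    weights = [alpha ** i for i in range(l_max)]
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            if t - i >= 0:  # periods before the start of the series contribute nothing
                acc += w * x[t - i]
        out.append(acc)
    return out


def logistic_saturation(x, lam):
    """Diminishing returns: maps non-negative input into [0, 1)."""
    return [(1 - math.exp(-lam * v)) / (1 + math.exp(-lam * v)) for v in x]


def channel_contribution(spend, alpha, l_max, lam, beta):
    """Return (contribution per period, scale used to normalize spend)."""
    scale = max(abs(s) for s in spend) or 1.0  # avoid dividing by zero for all-zero spend
    scaled = [s / scale for s in spend]
    adstocked = geometric_adstock(scaled, alpha, l_max)
    saturated = logistic_saturation(adstocked, lam)
    return [beta * v for v in saturated], scale


# A single burst of spend followed by two silent periods.
contrib, scale = channel_contribution(
    [100.0, 0.0, 0.0], alpha=0.5, l_max=2, lam=2.0, beta=10.0
)
```

The returned scale plays the role of a channel_scales entry: multiplying normalized spend by it recovers the original units.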

Output columns: date, {channel}_spend, {control}, {promo}, y

Legacy Engine

Uses Hill CPM curves with exponential adstock, and generates data at daily frequency.

Output columns: date, {channel}_spend, {channel}_impressions, {channel}_cpm, {channel}_revenue, seasonality_revenue, total_revenue, revenue, y, is_preflight

Ground Truth

Every generated dataset includes a JSON sidecar with the true parameters used for generation. This lets you validate parameter recovery by comparing your fitted parameters against the known truth.

The PyMC ground truth includes: intercept, trend, seasonality coefficients, per-channel adstock/saturation/beta params, channel scales, control coefficients, promo coefficients, total ROAS, and the full formula string.
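
One way to compare fitted parameters against the ground truth is a per-parameter error report. The sketch below assumes both sides have already been flattened into name-to-value dicts. The key names and the fitted values are hypothetical placeholders, since the exact JSON layout is not described here; only the true values are taken from the Python API example above.

```python
def recovery_report(true_params, fitted_params):
    """Compare fitted values to known truth, parameter by parameter."""
    report = {}
    for name, true_value in true_params.items():
        if name not in fitted_params:
            continue  # skip parameters the model did not estimate
        fitted = fitted_params[name]
        error = fitted - true_value
        # Relative error is undefined when the true value is zero.
        rel_error = error / abs(true_value) if true_value != 0 else float("nan")
        report[name] = {
            "true": true_value,
            "fitted": fitted,
            "error": error,
            "rel_error": rel_error,
        }
    return report


# Hypothetical flattened parameters; the fitted values are made up for illustration.
true_params = {"facebook_beta": 800.0, "facebook_alpha": 0.7}
fitted_params = {"facebook_beta": 760.0, "facebook_alpha": 0.735}

report = recovery_report(true_params, fitted_params)
for name, row in report.items():
    print(f"{name}: true={row['true']}, fitted={row['fitted']}, rel_error={row['rel_error']:+.1%}")
```

For a Bayesian fit, the same comparison could be made against posterior means, or replaced by a check that the true value falls inside a credible interval.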

Available Channels

facebook, google, tiktok, pinterest, email, youtube, snapchat, linkedin, twitter, display

Available Promos

black_friday, cyber_monday, prime_day, summer_sale, holiday_sale, flash_sale, new_year_sale, back_to_school, valentines, spring_sale, labor_day, memorial_day

License

MIT
