
experience-rating


Experience modification factors, schedule rating, and NCD/bonus-malus systems for UK non-life insurance pricing. For teams whose experience rating logic lives in a spreadsheet no one fully understands.


The problem

Fleet motor is the clearest case. A 200-vehicle fleet has three years of claims history: 14 incidents, £320,000 in paid losses, £45,000 incurred but not yet settled. Market rating gives you the base. Experience rating answers what you actually want to know: how much should this account's own history move the price?

The maths is not hard, but the choices are. What credibility weight? What ballast? How do you cap a single catastrophic loss so it does not blow up the mod? These decisions are regularly buried in an Excel cell with no audit trail. This library makes them explicit and auditable.
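The calculation buried in that Excel cell is small enough to write out explicitly. A minimal sketch of a credibility-weighted mod in plain Python (not this library's API; `A` and `B` stand for the credibility weight and ballast discussed in the Quick start):

```python
def mod_factor(expected, actual, A=0.65, B=8_000.0, cap=2.0, floor=0.5):
    """Credibility-weighted experience mod: blend the account's own
    loss ratio with the market rate (1.0), then cap the total swing."""
    Z = A * expected / (expected + B)           # credibility given to own experience
    raw = Z * (actual / expected) + (1 - Z)     # weighted average: own vs market
    return min(max(raw, floor), cap)            # cap stops one large loss blowing up the mod

# Fleet expected to lose £100k that actually lost £125k -> mod ~1.15
print(round(mod_factor(100_000, 125_000), 3))
```

The cap is what answers the catastrophic-loss question: a £1m shock loss on a small account saturates at `cap` rather than producing an uninsurable mod.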

Fleet also has no NCD system to fall back on — unlike personal motor, where a bad risk eventually self-selects up to a worse NCD level, fleet pricing is largely prospective. The experience mod factor is doing the full job of distinguishing good accounts from bad.

Personal motor NCD is the secondary use case here. If your team is asking "what is the steady-state distribution of our book across NCD levels at 10% claim frequency?" or "at what claim amount should a 65% NCD customer absorb the loss rather than claim?", this library handles that too. But the core experience rating machinery was designed around commercial lines where NCD does not exist.


Blog post

Your NCD Threshold Advice Is Wrong at 65%


What this library does not do

It does not calibrate BM scales from data (that requires a GLM pipeline and historical claims). It does not model policyholder heterogeneity (see the credibility library for that). It does not optimise NCD system design - it analyses a system you have already specified.


Installation

uv add experience-rating

Requires Python 3.10+. Dependencies: polars, numpy, scipy.


Quick start

Fleet motor: experience modification factor

Fleet is the primary use case. No NCD scale exists — the mod factor is the full adjustment.

import polars as pl
from experience_rating import ExperienceModFactor
from experience_rating.experience_mod import CredibilityParams

# With A=0.65 and B=£8,000, a fleet with £8k expected losses has 32.5% sensitivity
# to its own experience (A * E/(E+B) = 0.65 * 8k/16k). At £80k expected losses
# sensitivity rises to ~59% — the ballast-to-expected ratio drives how much own
# experience matters; larger fleets are more heavily experience-rated.
params = CredibilityParams(credibility_weight=0.65, ballast=8_000.0)
emod = ExperienceModFactor(params)

fleet_accounts = pl.DataFrame({
    "risk_id": ["Alpha Logistics", "Beta Haulage", "Gamma Couriers"],
    "expected_losses": [25_000.0, 80_000.0, 12_000.0],
    "actual_losses":   [32_000.0, 65_000.0,  4_000.0],
})

result = emod.predict_batch(fleet_accounts, cap=2.0, floor=0.5)
print(result)
# Alpha: slightly above 1.0 (worse than expected, moderate size)
# Beta: below 1.0 (better than expected, high credibility)
# Gamma: well below 1.0 (much better than expected, small fleet — high ballast-to-expected ratio damps the result)

Schedule rating for commercial risks

Use this alongside the mod factor for discretionary underwriter adjustments.

from experience_rating import ScheduleRating

sr = ScheduleRating(max_total_debit=0.25, max_total_credit=0.25)
sr.add_factor("Premises",      min_credit=-0.10, max_debit=0.10, description="Premises condition")
sr.add_factor("Management",    min_credit=-0.07, max_debit=0.07, description="Management quality")
sr.add_factor("Risk_Controls", min_credit=-0.08, max_debit=0.08, description="Risk controls")

factor = sr.rate({"Premises": 0.05, "Management": -0.03, "Risk_Controls": 0.02})
print(f"Schedule rating factor: {factor:.4f}")  # 1.0400

Personal motor: NCD scale and stationary distribution

NCD is the secondary use case — for personal lines teams who need to model the BM system analytically.

from experience_rating import BonusMalusScale, BonusMalusSimulator

# Commonly used UK NCD scale: levels 0%-65%, step up on claim-free year,
# back two on one claim, back to zero on two or more claims.
scale = BonusMalusScale.from_uk_standard()

sim = BonusMalusSimulator(scale, claim_frequency=0.10)

# Analytical stationary distribution (left eigenvector of transition matrix)
dist = sim.stationary_distribution(method="analytical")
print(dist)

# Expected premium factor at steady state
epf = sim.expected_premium_factor()
print(f"Average NCD at steady state: {(1 - epf) * 100:.1f}%")

Personal motor: optimal claiming threshold

from experience_rating import ClaimThreshold

ct = ClaimThreshold(scale, discount_rate=0.05)

# Customer at 65% NCD paying £280/year after discount
# Over a 3-year horizon, should they claim a £450 repair?
threshold = ct.threshold(current_level=9, annual_premium=280.0, years_horizon=3)
print(f"Claim only if loss exceeds £{threshold:.0f}")

should = ct.should_claim(
    current_level=9, claim_amount=450, annual_premium=280.0, years_horizon=3
)
print("Claiming is rational" if should else "Better to pay out of pocket")
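The threshold logic can be sketched independently of the library: claiming is rational only when the loss exceeds the present value of the extra premiums paid over the horizon from dropping NCD levels. A toy calculation (hypothetical 4-level scale and factors, not the library's internals):

```python
# Toy 4-level scale (hypothetical factors): up one level if claim-free,
# back two levels on a claim, floored at level 0.
factors = [1.00, 0.80, 0.60, 0.35]         # premium factor by level 0..3
base_premium = 280.0 / 0.35                # gross premium implied by £280 at 65% NCD

def pv_premiums(start_level, years, rate=0.05):
    """Present value of premiums, assuming all future years are claim-free."""
    level, pv = start_level, 0.0
    for t in range(years):
        pv += base_premium * factors[level] / (1 + rate) ** t
        level = min(level + 1, len(factors) - 1)
    return pv

top = len(factors) - 1
# Extra premium from claiming now (drop two levels) vs staying at the top:
threshold = pv_premiums(max(top - 2, 0), 3) - pv_premiums(top, 3)
print(f"Claim only if loss exceeds £{threshold:.0f}")   # £550 on these toy numbers
```

The £450 repair in the example above would fall below a threshold of this shape, so paying out of pocket is the wrong call only if the true extra-premium cost is smaller than the loss.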

API reference

ExperienceModFactor

Method Description
from_exposure(actual, full_credibility, ballast, formula) Construct from exposure-based credibility
predict(expected_losses, actual_losses, cap, floor) Single-risk mod factor
predict_batch(df, expected_col, actual_col, cap, floor) Portfolio mod factors (Polars DataFrame)
sensitivity(expected_losses, actual_range, n_points) Mod vs actual loss curve

ScheduleRating

Method Description
add_factor(name, min_credit, max_debit, description) Register a rating factor (chainable)
rate(features) Multiplicative schedule factor for one risk
rate_batch(df) Schedule factors for a portfolio DataFrame
summary() Registered factors as a Polars DataFrame

BonusMalusScale

Method Description
from_uk_standard() Commonly used UK NCD scale: 10 levels (0%-65%)
from_dict(spec) Build from a dictionary specification
transition_matrix(claim_frequency) Row-stochastic transition matrix (Poisson claims)
summary() Polars DataFrame of level definitions

BonusMalusSimulator

Method Description
simulate(n_policyholders, n_years) Monte Carlo simulation of level flows
stationary_distribution(method) "analytical" (eigenvector) or "simulation"
expected_premium_factor(method) Probability-weighted average premium factor at steady state

ClaimThreshold

Method Description
threshold(current_level, annual_premium, years_horizon) Minimum loss amount that makes claiming rational
should_claim(current_level, claim_amount, annual_premium, years_horizon) Boolean claiming decision
threshold_curve(current_level, annual_premium, max_horizon) Threshold vs horizon DataFrame
full_analysis(annual_premium, years_horizon) Thresholds for every level in the scale

Custom BM scale

spec = {
    "levels": [
        {
            "index": 0, "name": "No NCD", "premium_factor": 1.00, "ncd_percent": 0,
            "transitions": {"claim_free_level": 1, "claim_levels": {"1": 0, "2": 0}}
        },
        {
            "index": 1, "name": "20% NCD", "premium_factor": 0.80, "ncd_percent": 20,
            "transitions": {"claim_free_level": 2, "claim_levels": {"1": 0, "2": 0}}
        },
        {
            "index": 2, "name": "40% NCD", "premium_factor": 0.60, "ncd_percent": 40,
            "transitions": {"claim_free_level": 2, "claim_levels": {"1": 1, "2": 0}}
        },
    ]
}
scale = BonusMalusScale.from_dict(spec)

Design notes

Why expose ballast directly rather than deriving it? Because the choice of ballast is a deliberate actuarial decision that affects which risks get charged more and which get discounted. Hiding it inside a calibration function obscures a regulatory-facing choice. For fleet, this matters — a low ballast means small fleets are fully experience-rated, which can produce volatile pricing. Most underwriters choose a ballast that gives 50% credibility at around £5,000-£10,000 expected losses.
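With the credibility form used in the Quick start comments (Z = A * E/(E+B)), credibility reaches exactly half its maximum when expected losses equal the ballast, which is what the 50%-at-£5k-£10k rule of thumb pins down. A quick arithmetic check (illustrative values, assuming that formula):

```python
A, B = 1.0, 8_000.0   # full-credibility weight and ballast (illustrative)

def credibility(E, A=A, B=B):
    """Credibility given to an account's own experience."""
    return A * E / (E + B)

# E = B gives exactly half the full-credibility weight:
print(credibility(8_000))                                   # 0.5
# A £2k fleet is heavily damped; an £80k fleet mostly stands on its own:
print(round(credibility(2_000), 3), round(credibility(80_000), 3))  # 0.2 0.909
```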

Why additive schedule rating (not multiplicative)? UK commercial practice is additive: factors are debits/credits expressed as percentage adjustments summed together. The aggregate cap is where you control total swing. Multiplicative schedule rating is used in some US lines but is not standard in UK admitted business.
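The additive mechanics are simple: sum the signed adjustments, clamp the total at the aggregate cap, apply once. A minimal sketch (not the library's implementation; it mirrors the Quick start example above):

```python
def schedule_factor(adjustments, max_debit=0.25, max_credit=0.25):
    """Additive UK-style schedule rating: debits/credits sum, capped in aggregate."""
    total = sum(adjustments.values())
    total = max(-max_credit, min(total, max_debit))   # aggregate swing control
    return 1.0 + total

f = schedule_factor({"Premises": 0.05, "Management": -0.03, "Risk_Controls": 0.02})
print(f"{f:.4f}")   # 1.0400, matching the Quick start
```

Note the cap applies to the sum, not per factor: three maximal debits of 0.10 still produce at most a 1.25 factor.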

Why eigenvector for stationary distribution? It is exact (no simulation noise) and fast. The simulation method exists as a sanity check - if the two disagree by more than a few percent, the transition matrix is probably not ergodic.
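The eigenvector calculation is standard Markov-chain machinery: the stationary distribution is the left eigenvector of the transition matrix for eigenvalue 1, normalised to sum to one. A self-contained numpy sketch on a toy 3-level chain (not the library's code; up one level if claim-free, back to level 0 on any claim):

```python
import numpy as np

p = 0.10   # annual claim probability
# Rows are current level, columns next level; each row sums to 1.
P = np.array([
    [p, 1 - p, 0.0],
    [p, 0.0,   1 - p],
    [p, 0.0,   1 - p],
])

# Left eigenvector of P for eigenvalue 1 = right eigenvector of P.T
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()                      # normalise (also fixes the eigenvector's sign)
print(np.round(pi, 4))              # [0.1  0.09 0.81]
```

For this chain the answer is checkable by hand: every state sends probability p back to level 0, so the stationary mass at level 0 is exactly p, and the rest cascades up the scale.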


Tests

uv add "experience-rating[dev]"
pytest

105 tests covering scale construction, transition matrix properties, stationary distribution (analytical vs simulation agreement), claiming thresholds, experience modification formula, and schedule rating bounds validation.


Performance

Benchmarked against a flat portfolio rate (every policyholder charged the portfolio mean frequency, no individual adjustment) on a synthetic motor portfolio with a known data-generating process: 10,000 policyholders, 4 years of claims history, 10% annual mean frequency, with true individual frequencies drawn from a Gamma distribution (shape=2, mean=10%) — producing realistic heterogeneity across good, average, and bad risks.
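The synthetic portfolio can be reproduced in a few lines (a sketch of the stated data-generating process, not the benchmark notebook itself; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
n, years, mean_freq, shape = 10_000, 4, 0.10, 2.0

# True individual frequencies: Gamma(shape=2, mean=0.10), so scale = mean/shape
true_freq = rng.gamma(shape, mean_freq / shape, size=n)

# Observed claim counts: independent Poisson draws per policyholder-year
claims = rng.poisson(true_freq[:, None], size=(n, years))

print(round(true_freq.mean(), 3))                     # ~0.10 by construction
print(round(true_freq.std() / true_freq.mean(), 2))   # CV ~0.71 for shape=2
```

A Gamma with shape 2 has coefficient of variation 1/√2 ≈ 0.71, which is why this DGP clears the CV > 0.5 heterogeneity bar by construction.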

The benchmark tests two components independently:

Experience modification factor: Credibility-weighted mod formula applied to 4 years of aggregate loss experience per policyholder, with cap=2.0 and floor=0.50.

NCD / bonus-malus system: 10,000 policyholders simulated through the commonly used UK motor NCD scale over 4 history years. The NCD level at year 5 is used as a premium predictor. This is a deliberately conservative test — the NCD level is a lossy encoding of history (level only, not raw claim counts), so some discrimination signal is discarded.

Method | Gini vs holdout claims | MSE vs DGP true frequency | Notes
Flat portfolio rate | baseline | baseline | No individual adjustment
NCD / BM system | expected +2 to +6 pp | expected 5–15% improvement | Lossy history encoding
Experience mod factor | expected +5 to +12 pp | expected 10–25% improvement | Full loss history retained

Gini and MSE figures are labelled "expected" because exact values depend on the random seed. The direction and ordering are consistent: experience mod outperforms NCD (because it uses full loss amounts rather than level transitions), and both outperform the flat rate whenever the portfolio has genuine individual risk heterogeneity (true frequency CV > 0.5, which this DGP produces by construction).
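The Gini here is the usual pricing discrimination measure: sort policyholders by predicted risk, track cumulative actual claims against cumulative exposure, and take twice the area between that Lorenz-style curve and the diagonal. A generic sketch (an assumed implementation; the benchmark notebook may compute it differently):

```python
import numpy as np

def gini(pred, actual):
    """Discrimination Gini: sort risks by prediction (ascending), measure how
    far the cumulative-claims curve sits below the diagonal."""
    order = np.argsort(pred)
    cum_claims = np.cumsum(actual[order]) / actual.sum()
    cum_exposure = np.arange(1, len(pred) + 1) / len(pred)
    return float(2 * np.mean(cum_exposure - cum_claims))

rng = np.random.default_rng(0)
true_freq = rng.gamma(2.0, 0.05, 10_000)    # heterogeneous true frequencies
claims = rng.poisson(true_freq)             # one year of observed counts

print(round(gini(true_freq, claims), 2))              # perfect-information predictor
print(round(gini(np.full(10_000, 0.1), claims), 2))   # flat rate: near zero
```

A flat rate has no ordering information, so its Gini sits at zero up to sampling noise; any predictor that correlates with true frequency scores above it.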

The NCD system's A/E ratio converges toward 1.0 within each risk quality tier (good/average/bad) more quickly than the flat rate, confirming that the NCD level discriminates underlying risk quality even though it is not designed for this purpose.

Both methods run in under 1 second for 100,000 risks. The computational case for flat rates over experience rating is nil once the data pipeline exists.

Run notebooks/benchmark.py on Databricks to reproduce.


Databricks Notebook

A ready-to-run Databricks notebook benchmarking this library against standard approaches is available in burning-cost-examples.

Related Burning Cost libraries

  • insurance-credibility - Bühlmann-Straub credibility weighting for scheme and affinity pricing. The experience mod factor here uses a simple credibility weight; insurance-credibility gives you the full structural parameter estimation (EPV, VHM, k) when you have panel data across multiple groups.
  • insurance-multilevel - Two-stage CatBoost + REML approach when individual risk factors and group factors need to be modelled jointly.

Model building

Library Description
shap-relativities Extract rating relativities from GBMs using SHAP
insurance-cv Walk-forward cross-validation respecting IBNR structure

Uncertainty quantification

Library Description
insurance-conformal Distribution-free prediction intervals for Tweedie models
bayesian-pricing Hierarchical Bayesian models for thin-data segments

Deployment and optimisation

Library Description
insurance-optimise Constrained rate change optimisation with FCA PS21/5 compliance
insurance-demand Conversion, retention, and price elasticity modelling

All libraries


Related Libraries

Library What it does
insurance-credibility Bühlmann-Straub group credibility and Bayesian experience rating at policy level
insurance-multilevel Two-stage CatBoost + REML random effects — applies the same credibility logic to broker and scheme factors

Licence

MIT
