
Commit de2d3ce

nwoodruff-co and claude authored
Enhance entity mapping with flexible aggregation methods and custom values (#184)
* Enhance entity mapping with flexible aggregation methods and custom values

  Add support for custom values and multiple aggregation methods to the entity mapping system, making it more flexible for complex analysis workflows.

  Features added:
  - values parameter: map custom value arrays instead of existing columns
  - Extended how parameter with new aggregation methods:
    * Person → Group: 'sum' (default), 'first'
    * Group → Person: 'project' (default), 'divide'
    * Group → Group: 'sum', 'first', 'project', 'divide'

  Refactoring:
  - Created base YearData class to eliminate code duplication
  - UKYearData and USYearData now inherit from the base class
  - Removed duplicate map_to_entity implementations

  Documentation:
  - Added comprehensive entity mapping section to core-concepts.md
  - Added examples to UK and US model documentation
  - Documented all aggregation methods with use cases

  All existing tests pass, confirming backward compatibility.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>

* Apply code formatting fixes

* Fix import sorting order

* Add Claude-friendly documentation and quick reference

  Add comprehensive guides for AI assistants to use policyengine.py:
  - .claude/policyengine-guide.md: detailed patterns and examples
  - .claude/quick-reference.md: quick lookup for common operations

  Includes:
  - 7 common workflow patterns (synthetic scenarios, parameter sweeps, reforms)
  - Minimal working examples for UK and US
  - Entity mapping examples with all aggregation methods
  - Critical fields reference
  - Common parameters cheat sheet
  - Troubleshooting guide

  These guides help AI assistants quickly understand and use the package for tax-benefit microsimulation analysis.

* Add get_parameter and get_variable methods to TaxBenefitModelVersion

  Add convenience methods to look up parameters and variables by name:
  - get_parameter(name): returns the Parameter object with that name
  - get_variable(name): returns the Variable object with that name
  - Both raise ValueError, with a helpful error message, if the name is not found

  Tests added (12 tests, all passing):
  - UK and US variable lookup tests
  - UK and US parameter lookup tests
  - Error handling tests for non-existent parameters/variables
  - Multiple parameter/variable lookup tests

  Usage:
      var = uk_latest.get_variable('income_tax')
      param = uk_latest.get_parameter('gov.hmrc.income_tax.allowances.personal_allowance.amount')

* Apply formatting to test_get_parameter_variable.py

---------

Co-authored-by: Claude <noreply@anthropic.com>
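
The lookup contract described for get_parameter/get_variable is simple: return the object with the given name, otherwise raise ValueError with a helpful message. A minimal stdlib sketch of that contract follows. Everything in it is hypothetical. `ModelVersionSketch` and `Named` are stand-ins, not policyengine classes. The "did you mean" hint built with `difflib.get_close_matches` is only one possible way to make the error helpful, and the library may not do it this way.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Named:
    # Hypothetical stand-in for a Parameter or Variable object
    name: str


@dataclass
class ModelVersionSketch:
    # Hypothetical container; the real TaxBenefitModelVersion may store these differently
    parameters: list = field(default_factory=list)
    variables: list = field(default_factory=list)

    def _lookup(self, items, name, kind):
        by_name = {item.name: item for item in items}
        if name in by_name:
            return by_name[name]
        # Suggest near-miss names so the error message is actionable
        hints = difflib.get_close_matches(name, list(by_name), n=3)
        hint_text = f" Did you mean: {', '.join(hints)}?" if hints else ""
        raise ValueError(f"{kind} '{name}' not found.{hint_text}")

    def get_parameter(self, name):
        return self._lookup(self.parameters, name, "Parameter")

    def get_variable(self, name):
        return self._lookup(self.variables, name, "Variable")


version = ModelVersionSketch(
    parameters=[Named("gov.hmrc.income_tax.allowances.personal_allowance.amount")],
    variables=[Named("income_tax"), Named("universal_credit")],
)
print(version.get_variable("income_tax").name)  # income_tax
```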
1 parent 2a7fe96 commit de2d3ce

14 files changed (+1481, -65 lines)


.claude/policyengine-guide.md

Lines changed: 568 additions & 0 deletions
Large diffs are not rendered by default.

.claude/quick-reference.md

Lines changed: 367 additions & 0 deletions
@@ -0,0 +1,367 @@
# PolicyEngine.py Quick Reference

## Imports cheat sheet

```python
# Core
from policyengine.core import Simulation, Policy, Parameter, ParameterValue

# UK
from policyengine.tax_benefit_models.uk import (
    PolicyEngineUKDataset,
    UKYearData,
    uk_latest,
)

# US
from policyengine.tax_benefit_models.us import (
    PolicyEngineUSDataset,
    USYearData,
    us_latest,
)

# Outputs
from policyengine.outputs.aggregate import Aggregate, AggregateType
from policyengine.outputs.change_aggregate import ChangeAggregate, ChangeAggregateType

# Utilities
from policyengine.utils.plotting import format_fig, COLORS
from microdf import MicroDataFrame
import pandas as pd
import numpy as np
```

## Minimal working example (UK)

```python
import pandas as pd
from microdf import MicroDataFrame
from policyengine.tax_benefit_models.uk import (
    PolicyEngineUKDataset, UKYearData, uk_latest
)
from policyengine.core import Simulation

# Person data
person_df = MicroDataFrame(pd.DataFrame({
    "person_id": [0],
    "person_household_id": [0],
    "person_benunit_id": [0],
    "age": [30],
    "employment_income": [30000],
    "person_weight": [1.0],
}), weights="person_weight")

# Household data
household_df = MicroDataFrame(pd.DataFrame({
    "household_id": [0],
    "region": ["LONDON"],
    "rent": [12000],
    "household_weight": [1.0],
}), weights="household_weight")

# Benunit data
benunit_df = MicroDataFrame(pd.DataFrame({
    "benunit_id": [0],
    "would_claim_uc": [True],
    "benunit_weight": [1.0],
}), weights="benunit_weight")

# Create dataset
dataset = PolicyEngineUKDataset(
    name="Example",
    filepath="./temp.h5",
    year=2026,
    data=UKYearData(person=person_df, household=household_df, benunit=benunit_df),
)

# Run simulation
sim = Simulation(dataset=dataset, tax_benefit_model_version=uk_latest)
sim.run()

# Get results
output = sim.output_dataset.data
print(output.household[["household_net_income"]])
```

## Minimal working example (US)

```python
import pandas as pd
from microdf import MicroDataFrame
from policyengine.tax_benefit_models.us import (
    PolicyEngineUSDataset, USYearData, us_latest
)
from policyengine.core import Simulation

# Person data (US requires more entity links)
person_df = MicroDataFrame(pd.DataFrame({
    "person_id": [0, 1],
    "person_household_id": [0, 0],
    "person_tax_unit_id": [0, 0],
    "person_spm_unit_id": [0, 0],
    "person_family_id": [0, 0],
    "person_marital_unit_id": [0, 0],
    "age": [35, 33],
    "employment_income": [60000, 40000],
    "person_weight": [1.0, 1.0],
}), weights="person_weight")

# Create minimal entity dataframes (one group of each type)
entities = {}
for entity in ["tax_unit", "spm_unit", "family", "marital_unit"]:
    entities[entity] = MicroDataFrame(pd.DataFrame({
        f"{entity}_id": [0],
        f"{entity}_weight": [1.0],
    }), weights=f"{entity}_weight")

household_df = MicroDataFrame(pd.DataFrame({
    "household_id": [0],
    "state_code": ["CA"],
    "household_weight": [1.0],
}), weights="household_weight")

# Create dataset
dataset = PolicyEngineUSDataset(
    name="Example",
    filepath="./temp.h5",
    year=2024,
    data=USYearData(
        person=person_df,
        tax_unit=entities["tax_unit"],
        spm_unit=entities["spm_unit"],
        family=entities["family"],
        marital_unit=entities["marital_unit"],
        household=household_df,
    ),
)

# Run simulation
sim = Simulation(dataset=dataset, tax_benefit_model_version=us_latest)
sim.run()

# Get results
print(sim.output_dataset.data.household[["household_net_income"]])
```
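
The four non-household entity tables must contain every ID that the person table points to. Below is a stdlib sketch that derives minimal `{entity}_id` / `{entity}_weight` columns from the person link columns, so a dataset with several groups stays consistent. Each resulting dict can be wrapped in `MicroDataFrame(pd.DataFrame(...), weights=f"{entity}_weight")`, as the example above does. The helper name and the default weight of 1.0 per group are illustrative assumptions. The household table still needs `state_code`, so it is built separately.

```python
# Non-household group entities used in the US example
GROUP_ENTITIES = ["tax_unit", "spm_unit", "family", "marital_unit"]


def minimal_entity_columns(person_columns, entity, weight=1.0):
    """Hypothetical helper: one row per distinct group ID referenced by persons."""
    links = person_columns[f"person_{entity}_id"]
    ids = sorted(set(links))
    return {f"{entity}_id": ids, f"{entity}_weight": [weight] * len(ids)}


# Three people: a couple sharing every group, plus a single adult in their own groups
person_columns = {
    "person_id": [0, 1, 2],
    "person_household_id": [0, 0, 1],
    "person_tax_unit_id": [0, 0, 1],
    "person_spm_unit_id": [0, 0, 1],
    "person_family_id": [0, 0, 1],
    "person_marital_unit_id": [0, 0, 1],
}

tables = {e: minimal_entity_columns(person_columns, e) for e in GROUP_ENTITIES}
print(tables["tax_unit"])  # {'tax_unit_id': [0, 1], 'tax_unit_weight': [1.0, 1.0]}
```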

## Common patterns

### Parameter sweep (vary one input)
```python
import numpy as np
import pandas as pd
from microdf import MicroDataFrame

n = 50
incomes = np.linspace(0, 100000, n)

# One person per household and benefit unit, so each row is an independent scenario
person_df = MicroDataFrame(pd.DataFrame({
    "person_id": range(n),
    "person_household_id": range(n),
    "person_benunit_id": range(n),
    "age": [30] * n,
    "employment_income": incomes,
    "person_weight": [1.0] * n,
}), weights="person_weight")

# Create matching household/benunit data with n rows
# ... then run a single simulation covering all scenarios
```
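
One way to fill in the elided household/benunit step is a plain-Python helper that returns matching column dicts for n one-person scenarios. It uses the same fields as the minimal UK example. Each dict can then be wrapped in `MicroDataFrame(pd.DataFrame(...), weights=...)` as above. The helper name `sweep_columns` and the fixed region/rent defaults are illustrative assumptions.

```python
def sweep_columns(incomes, region="LONDON", rent=12000):
    """Hypothetical helper: column dicts for one-person UK scenarios, one per income."""
    n = len(incomes)
    ids = list(range(n))  # scenario i uses person, household and benunit ID i
    person = {
        "person_id": ids,
        "person_household_id": ids,
        "person_benunit_id": ids,
        "age": [30] * n,
        "employment_income": list(incomes),
        "person_weight": [1.0] * n,
    }
    household = {
        "household_id": ids,
        "region": [region] * n,
        "rent": [rent] * n,
        "household_weight": [1.0] * n,
    }
    benunit = {
        "benunit_id": ids,
        "would_claim_uc": [True] * n,
        "benunit_weight": [1.0] * n,
    }
    return person, household, benunit


# An evenly spaced 50-point grid from 0 to 100,000 (equivalent to np.linspace(0, 100000, 50))
incomes = [i * 100000 / 49 for i in range(50)]
person, household, benunit = sweep_columns(incomes)
```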

### Policy reform
```python
import datetime
from policyengine.core import Policy, Parameter, ParameterValue, Simulation
from policyengine.tax_benefit_models.uk import uk_latest

parameter = Parameter(
    name="gov.hmrc.income_tax.allowances.personal_allowance.amount",
    tax_benefit_model_version=uk_latest,
    description="Personal allowance",
    data_type=float,
)

policy = Policy(
    name="Reform",
    description="Change PA",
    parameter_values=[ParameterValue(
        parameter=parameter,
        start_date=datetime.date(2026, 1, 1),
        end_date=datetime.date(2026, 12, 31),
        value=15000,
    )],
)

# Run with policy (dataset built as in the minimal UK example)
reform_sim = Simulation(dataset=dataset, tax_benefit_model_version=uk_latest, policy=policy)
reform_sim.run()
```

### Extract aggregate statistics
```python
from policyengine.outputs.aggregate import Aggregate, AggregateType

# Sum
total = Aggregate(
    simulation=sim,
    variable="universal_credit",
    entity="benunit",
    aggregate_type=AggregateType.SUM,
)
total.run()

# Mean
avg = Aggregate(
    simulation=sim,
    variable="household_net_income",
    entity="household",
    aggregate_type=AggregateType.MEAN,
)
avg.run()

# Count with filter
count = Aggregate(
    simulation=sim,
    variable="person_id",
    entity="person",
    aggregate_type=AggregateType.COUNT,
    filter_variable="age",
    filter_geq=65,  # Age >= 65
)
count.run()
```
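
As a mental model for what SUM, MEAN and COUNT with `filter_geq` compute, here is a plain-Python sketch over per-row values and weights. It assumes that aggregates are weighted by the entity's weight column and that the filter keeps rows whose filter variable is at least the threshold. Check the library source for the exact semantics. The function is hypothetical, and returning 0.0 for the mean of an empty selection is an arbitrary choice.

```python
def weighted_aggregate(values, weights, kind, filter_values=None, filter_geq=None):
    """Weighted SUM / MEAN / COUNT over one entity's rows (illustrative only)."""
    # Filter on a separate variable if given, otherwise on the values themselves
    keys = filter_values if filter_values is not None else values
    rows = list(zip(values, weights, keys))
    if filter_geq is not None:
        rows = [row for row in rows if row[2] >= filter_geq]
    total_weight = sum(w for _, w, _ in rows)
    weighted_sum = sum(v * w for v, w, _ in rows)
    if kind == "sum":
        return weighted_sum
    if kind == "mean":
        return weighted_sum / total_weight if total_weight else 0.0
    if kind == "count":
        return total_weight  # weighted number of rows passing the filter
    raise ValueError(f"unknown aggregate kind: {kind}")


ages = [30, 70, 80]
incomes = [30000, 0, 10000]
weights = [2.0, 1.0, 1.5]
print(weighted_aggregate(incomes, weights, "sum"))  # 75000.0
print(weighted_aggregate([0, 1, 2], weights, "count", filter_values=ages, filter_geq=65))  # 2.5
```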

### Compare baseline vs reform
```python
from policyengine.outputs.change_aggregate import ChangeAggregate, ChangeAggregateType

# baseline_sim and reform_sim: Simulations on the same dataset, run without and with the policy

# Count households whose net income rises by at least 1
winners = ChangeAggregate(
    baseline_simulation=baseline_sim,
    reform_simulation=reform_sim,
    variable="household_net_income",
    aggregate_type=ChangeAggregateType.COUNT,
    change_geq=1,
)
winners.run()

# Total change in household tax (the revenue effect)
revenue = ChangeAggregate(
    baseline_simulation=baseline_sim,
    reform_simulation=reform_sim,
    variable="household_tax",
    aggregate_type=ChangeAggregateType.SUM,
)
revenue.run()
```
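
The two comparisons above can be sketched in plain Python. The sketch assumes that the change is reform minus baseline per household, and that both the winner count and the total are weighted by household weight. The function name and data are hypothetical. Applied to `household_tax` instead of net income, the same total gives the revenue effect (positive means more tax collected).

```python
def change_aggregates(baseline, reform, weights, change_geq=1):
    """Weighted winner count and total change between two runs (illustrative only)."""
    changes = [r - b for b, r in zip(baseline, reform)]  # reform minus baseline, per household
    winners = sum(w for c, w in zip(changes, weights) if c >= change_geq)
    total_change = sum(c * w for c, w in zip(changes, weights))
    return winners, total_change


baseline_net_income = [20000, 30000, 45000]
reform_net_income = [20500, 30000, 44000]
household_weights = [1.0, 2.0, 1.0]
print(change_aggregates(baseline_net_income, reform_net_income, household_weights))  # (1.0, -500.0)
```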

### Entity mapping
```python
# Sum person income to household
household_income = dataset.data.map_to_entity(
    source_entity="person",
    target_entity="household",
    columns=["employment_income"],
    how="sum",
)

# Broadcast household rent to persons
person_rent = dataset.data.map_to_entity(
    source_entity="household",
    target_entity="person",
    columns=["rent"],
    how="project",
)

# Divide a household value equally among its members
per_person = dataset.data.map_to_entity(
    source_entity="household",
    target_entity="person",
    columns=["total_savings"],
    how="divide",
)

# Map custom values instead of existing columns
# (custom_array: values aligned with the source entity's rows, here person)
custom_totals = dataset.data.map_to_entity(
    source_entity="person",
    target_entity="household",
    values=custom_array,
    how="sum",
)
```
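
The stdlib sketch below makes the four `how` options used above concrete: `sum` and `first` for person → group, and `project` and `divide` for group → person. It works on plain lists and dicts rather than MicroDataFrames, ignores weights, and does not cover group → group mapping. The function names are hypothetical. `map_to_entity` itself may differ in details such as row ordering or which member `first` picks.

```python
from collections import defaultdict


def person_to_group(group_ids, values, how="sum"):
    """Aggregate person values to groups; group_ids[i] is person i's group."""
    result = {}
    for gid, value in zip(group_ids, values):
        if how == "sum":
            result[gid] = result.get(gid, 0) + value
        elif how == "first":
            result.setdefault(gid, value)  # keep the first member's value
        else:
            raise ValueError(f"unsupported how for person -> group: {how}")
    return result


def group_to_person(group_ids, group_values, how="project"):
    """Map group values (dict keyed by group ID) back onto persons."""
    sizes = defaultdict(int)
    for gid in group_ids:
        sizes[gid] += 1
    if how == "project":
        return [group_values[gid] for gid in group_ids]  # copy to every member
    if how == "divide":
        return [group_values[gid] / sizes[gid] for gid in group_ids]  # equal shares
    raise ValueError(f"unsupported how for group -> person: {how}")


person_household_id = [0, 0, 1]
income = [30000, 10000, 25000]
rent = {0: 12000, 1: 9000}
print(person_to_group(person_household_id, income))           # {0: 40000, 1: 25000}
print(person_to_group(person_household_id, income, "first"))  # {0: 30000, 1: 25000}
print(group_to_person(person_household_id, rent))             # [12000, 12000, 9000]
print(group_to_person(person_household_id, rent, "divide"))   # [6000.0, 6000.0, 9000.0]
```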

## Critical fields

### UK
- **Person**: `person_id`, `person_household_id`, `person_benunit_id`, `age`, `employment_income`, `person_weight`
- **Household**: `household_id`, `region`, `rent`, `household_weight`
- **Benunit**: `benunit_id`, `would_claim_uc`, `benunit_weight`

### US
- **Person**: `person_id`, `person_household_id`, `person_tax_unit_id`, `person_spm_unit_id`, `person_family_id`, `person_marital_unit_id`, `age`, `employment_income`, `person_weight`
- **Household**: `household_id`, `state_code`, `household_weight`
- **Other entities** (`tax_unit`, `spm_unit`, `family`, `marital_unit`): each needs `{entity}_id` and `{entity}_weight`

## Common UK regions
```python
["LONDON", "SOUTH_EAST", "SOUTH_WEST", "EAST_OF_ENGLAND",
 "WEST_MIDLANDS", "EAST_MIDLANDS", "YORKSHIRE",
 "NORTH_WEST", "NORTH_EAST", "WALES", "SCOTLAND", "NORTHERN_IRELAND"]
```

## Common US state codes
```python
["CA", "NY", "TX", "FL", "PA", "IL", "OH", "GA", "NC", "MI", ...]
```

## Aggregate filter options
```python
# Exact match
filter_eq=value

# Greater than or equal to
filter_geq=value

# Less than or equal to
filter_leq=value

# Quantile filtering (deciles)
quantile=10     # Split into 10 groups
quantile_eq=1   # Bottom (first) decile only
quantile_geq=9  # Top two deciles
quantile_leq=2  # Bottom two deciles
```
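
The sketch below shows how quantile groups can be formed. Each row gets a 1-based group from the cumulative weight share after sorting by value, and a `quantile_geq`-style filter then keeps the top groups. It assumes deciles are numbered 1 (bottom) to 10 (top), consistent with the comments above. The function is hypothetical, and the library may place bucket edges or handle ties differently.

```python
import math


def assign_quantiles(values, weights, quantile=10):
    """Return a 1-based quantile group per row, based on cumulative weight share."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # lowest value first
    total = sum(weights)
    groups = [0] * len(values)
    cumulative = 0.0
    for i in order:
        cumulative += weights[i]
        # Map the cumulative share (0, 1] onto groups 1..quantile
        groups[i] = min(quantile, max(1, math.ceil(cumulative * quantile / total)))
    return groups


incomes = [5, 1, 9, 3, 7, 2, 10, 4, 8, 6]
deciles = assign_quantiles(incomes, [1.0] * len(incomes))
top_two = [v for v, d in zip(incomes, deciles) if d >= 9]  # analogous to quantile_geq=9
print(top_two)  # [9, 10]
```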

## Common parameters

### UK
```
gov.hmrc.income_tax.allowances.personal_allowance.amount
gov.hmrc.income_tax.rates.uk[0]  # Basic rate
gov.hmrc.national_insurance.class_1.rates.main
gov.dwp.universal_credit.means_test.reduction_rate
gov.dwp.universal_credit.elements.child.first_child
gov.dwp.child_benefit.amount.first_child
```

### US
```
gov.irs.income.standard_deduction.single
gov.irs.income.standard_deduction.joint
gov.irs.credits.ctc.amount.base
gov.irs.credits.eitc.max[0]
gov.ssa.payroll.rate.employee
gov.usda.snap.normal_allotment.max[1]
```
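
Paths such as `gov.hmrc.income_tax.rates.uk[0]` combine a dotted node path with an optional bracket index. In these examples the index appears to select one bracket or entry (the comment labels `[0]` as the basic rate). A small stdlib parser can split such paths, for example to sanity-check a list of reform targets before building `Parameter` objects. The function name and the interpretation of the index are assumptions, not part of policyengine.

```python
import re

# One path segment, optionally followed by a [n] index, e.g. "uk[0]"
SEGMENT = re.compile(r"^(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?:\[(?P<index>\d+)\])?$")


def parse_parameter_path(path):
    """Hypothetical helper: split a path into (name, index-or-None) pairs."""
    parts = []
    for segment in path.split("."):
        match = SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"invalid segment {segment!r} in {path!r}")
        index = match.group("index")
        parts.append((match.group("name"), int(index) if index is not None else None))
    return parts


print(parse_parameter_path("gov.irs.credits.eitc.max[0]")[-1])  # ('max', 0)
```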

## Troubleshooting

| Issue | Solution |
|-------|----------|
| No UC calculated | Set `would_claim_uc=True` |
| Random UC spikes | Set `is_disabled_for_benefits=False`, `uc_limited_capability_for_WRA=False` |
| KeyError on column | Check the variable name in the docs; it may be defined at a different entity level |
| Empty results | Check that weights are positive and sum as expected, and verify ID linkages |
| Slow performance | Use the parameter sweep pattern (one simulation for N scenarios) |
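
For the "Empty results" row, a quick pre-flight check can catch dangling IDs and non-positive weight totals before any simulation runs. The helper below is hypothetical. It works on plain dicts of columns (for example, before they are wrapped in `MicroDataFrame`) and checks only these two problems, not variable names or entity levels.

```python
def check_linkages(person, groups):
    """Return a list of problems in person-to-group links and weights.

    person: column name -> list, including person_id, person_weight and
            a person_{entity}_id column for every entity in groups
    groups: entity name -> columns containing {entity}_id and {entity}_weight
    """
    problems = []
    for entity, columns in groups.items():
        known_ids = set(columns[f"{entity}_id"])
        for pid, gid in zip(person["person_id"], person[f"person_{entity}_id"]):
            if gid not in known_ids:
                problems.append(f"person {pid}: {entity} {gid} does not exist")
        if sum(columns[f"{entity}_weight"]) <= 0:
            problems.append(f"{entity}: weights sum to zero or less")
    if sum(person["person_weight"]) <= 0:
        problems.append("person: weights sum to zero or less")
    return problems


person = {
    "person_id": [0, 1],
    "person_household_id": [0, 1],  # household 1 is missing below
    "person_benunit_id": [0, 0],
    "person_weight": [1.0, 1.0],
}
groups = {
    "household": {"household_id": [0], "household_weight": [1.0]},
    "benunit": {"benunit_id": [0], "benunit_weight": [1.0]},
}
print(check_linkages(person, groups))  # ['person 1: household 1 does not exist']
```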

## Visualisation template
```python
from policyengine.utils.plotting import format_fig, COLORS
import plotly.graph_objects as go

# x_vals, y_vals: e.g. the swept incomes and the resulting household_net_income values
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_vals, y=y_vals, line=dict(color=COLORS["primary"])))
format_fig(fig, title="Title", xaxis_title="X", yaxis_title="Y")
fig.show()
```
