Optimise UK simulation run (-63% cold sim time) #251

Closed
nikhilwoodruff wants to merge 1 commit into main from optimise-uk-sim-performance

@nikhilwoodruff
Collaborator

Summary

Two changes to the UK model's run() method that together cut cold simulation time from 39.6s to 14.8s (-63%):

  • Convert MicroDataFrames to plain DataFrames before passing to UKSingleYearDataset. The data pipeline only needs numeric arrays for copying and uprating — MicroDataFrame.copy() triggers expensive O(N²) weight linking that's wasted here.
  • Monkey-patch apply_uprating to skip its defensive deep copy of the entire multi-year dataset. extend_single_year_dataset already copies each year individually, so the second copy is redundant.
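
A minimal sketch of the second change, using toy stand-ins rather than the real policyengine-uk code. `toy_library`, `_uprate_in_place` and `run_without_redundant_copy` are hypothetical names, and the toy `apply_uprating` and `extend_single_year_dataset` only echo the functions named above. It shows the general pattern of temporarily swapping in a copy-free function with `unittest.mock.patch.object`; the actual patch in this PR may look different.

```python
import copy
from types import SimpleNamespace
from unittest import mock


def _uprate_in_place(dataset, factor):
    """Scale every year's values by `factor`, mutating `dataset` directly."""
    for year in dataset:
        dataset[year] = [value * factor for value in dataset[year]]
    return dataset


def apply_uprating(dataset, factor):
    """Toy stand-in: defensively deep-copies the whole dataset, then uprates."""
    return _uprate_in_place(copy.deepcopy(dataset), factor)


# Hypothetical namespace playing the role of the library module being patched.
toy_library = SimpleNamespace(apply_uprating=apply_uprating)


def extend_single_year_dataset(base_year_values, years, factor):
    """Toy stand-in: copies the base year once per target year, then uprates."""
    dataset = {year: list(base_year_values) for year in years}
    return toy_library.apply_uprating(dataset, factor)


def run_without_redundant_copy(base_year_values, years, factor):
    """Swap in the copy-free function only for the duration of this call."""
    with mock.patch.object(toy_library, "apply_uprating", _uprate_in_place):
        return extend_single_year_dataset(base_year_values, years, factor)


base = [100.0, 200.0]
assert run_without_redundant_copy(base, [2025, 2026], 1.5) == \
    extend_single_year_dataset(base, [2025, 2026], 1.5)
assert base == [100.0, 200.0]  # the caller's data is still untouched
```

Using a context manager restores the original function on exit, so the patch cannot leak into code that still relies on the defensive copy.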

There's a companion microdf fix (PolicyEngine/microdf#281) that addresses the root cause of the O(N²) weight linking. This PR works independently of that fix but the two are complementary.
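
The first change follows a general pattern: drop a subclass back to its plain base type so that later copies bypass an expensive overridden `copy()`. The sketch below illustrates that pattern with a toy `dict` subclass. `LinkedTable` and `to_plain` are hypothetical names, and the nested loop only simulates a quadratic cost; it is not how microdf links weights.

```python
class LinkedTable(dict):
    """Toy stand-in for a weighted table whose copy() does extra linking work."""

    link_operations = 0  # counts the simulated linking steps across all copies

    def copy(self):
        duplicate = LinkedTable(super().copy())
        # Simulated quadratic re-linking: one step per pair of columns.
        for _ in duplicate:
            for _ in duplicate:
                LinkedTable.link_operations += 1
        return duplicate


def to_plain(table):
    """Drop the subclass so later copies use the cheap built-in dict.copy()."""
    return dict(table)


table = LinkedTable({"income": [1.0, 2.0], "weight": [10.0, 20.0], "age": [30, 40]})

plain_copy = to_plain(table).copy()
assert type(plain_copy) is dict
assert LinkedTable.link_operations == 0  # no linking work was triggered

table.copy()
assert LinkedTable.link_operations == 9  # 3 columns -> 3 * 3 simulated steps
```

The trade-off is that the plain copy loses whatever the subclass provided, which is acceptable only when downstream code needs the raw values alone, as the summary says is the case for copying and uprating.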

Benchmark results (3-run mean):

| Phase | Before | After | Change |
|---|---|---|---|
| Simulate (cold) | 39.61s | 14.77s | -62.7% |
| Wall total | 46.29s | 21.49s | -53.6% |
| Mean household income | £54,562 | £54,562 | identical |
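
The Change column can be re-derived from the Before and After timings as (after - before) / before. A short check, where `percent_change` is a hypothetical helper written for illustration only:

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Timings in seconds, taken from the benchmark results above.
benchmarks = {
    "Simulate (cold)": (39.61, 14.77),
    "Wall total": (46.29, 21.49),
}

for phase, (before, after) in benchmarks.items():
    print(f"{phase}: {percent_change(before, after):+.1f}%")
# Simulate (cold): -62.7%
# Wall total: -53.6%
```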

Test plan

  • All 110 policyengine.py tests pass
  • Benchmark confirms mean household income unchanged
  • Warm cache performance unaffected

Two changes to the UK model's run() method:

1. Convert MicroDataFrames to plain DataFrames before passing to
   UKSingleYearDataset. The data pipeline only needs numeric arrays for
   copying and uprating — MicroDataFrame.copy() triggers expensive O(N²)
   weight linking that's wasted here.

2. Monkey-patch apply_uprating to skip its defensive deep copy of the
   entire multi-year dataset. extend_single_year_dataset already copies
   each year individually, so the second copy is redundant.

Benchmarked: cold simulate dropped from 39.6s to 14.8s (-63%), wall
total from 46.3s to 21.5s (-54%). Mean household income unchanged
(£54,562). All 110 tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>
@nwoodruff-co
Contributor

Closing in favour of the upstream fixes: the redundant copy is now handled in PolicyEngine/policyengine-uk#1523 (merged), and the MicroDataFrame O(N²) overhead is fixed in PolicyEngine/microdf#281 (merged). No changes needed in policyengine.py.
