Performance: Optimization opportunities (35-45% throughput improvement) #149

@dshkol

Summary

Meta-issue tracking 13 performance optimization opportunities identified in a code audit.

Performance Items

| ID | Location | Issue | Expected Gain |
|-----|----------|-------|---------------|
| P1 | cansim.R:365-379 | fold_in_metadata repeated left_joins | 50-70% |
| P2 | cansim_metadata.R:98-111 | parse_metadata nested loops | 60-80% |
| P3 | cansim_parquet.R:675-715 | cached_tables repeated reads | 65-85% |
| P4 | cansim_metadata.R:127-145 | hierarchy O(n) cycle detection | 40-60% |
| P5 | cansim.R:156-162 | gsub loop in factor conversion | 30-50% |
| P6 | cansim_parquet.R:254-263 | field cache read miss | 70-90% |
| P7 | cansim_parquet.R:219-232 | csv2sqlite transform copies | 25-40% |
| P8 | cansim_vectors.R:20-24 | lapply to vapply | 30-45% |
| P9 | Multiple files | French string constants | 20-35% |
| P10 | cansim.R:64 | unnecessary as_tibble | 5-15% |
| P11 | cansim_metadata.R:123-124 | hash lookup for parents | 80-95% |
| P12 | cansim_vectors.R:244-251 | coordinate metadata loop | 35-50% |
| P13 | cansim.R:885,889,898 | lapply unlist chains | 20-30% |
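
To make two of the listed patterns concrete, here is a minimal sketch of what P11 (parent lookups) and P5 (gsub in factor conversion) could look like in isolation. It does not reproduce the actual cansim code: the `members` table, its column names, the helper function names, and the label-cleaning pattern are all hypothetical, chosen only to illustrate the general technique.

```r
# Hypothetical member table; column names are illustrative, not taken from cansim.
members <- data.frame(
  id        = c("1", "2", "3", "4"),
  parent_id = c(NA, "1", "2", "2"),
  name      = c("Canada", "Ontario", "Toronto", "Ottawa"),
  stringsAsFactors = FALSE
)

# P11-style, slow pattern: one linear scan of members$id per requested id.
parent_name_scan <- function(ids) {
  unlist(lapply(ids, function(i) {
    p <- members$parent_id[members$id == i]
    if (is.na(p)) NA_character_ else members$name[members$id == p]
  }))
}

# P11-style, faster pattern: resolve all ids with vectorized match() calls
# instead of scanning the table once per id.
parent_name_lookup <- function(ids) {
  parent <- members$parent_id[match(ids, members$id)]
  members$name[match(parent, members$id)]
}

# P5-style sketch: rewrite the factor levels once instead of running gsub
# over every element. The trailing "(code)" suffix is an assumed example.
clean_levels <- function(f) {
  levels(f) <- gsub("[[:space:]]+\\([0-9]+\\)$", "", levels(f))
  f
}

ids <- c("1", "3", "4")
stopifnot(identical(parent_name_scan(ids), parent_name_lookup(ids)))

f <- factor(c("Ontario (35)", "Quebec (24)", "Ontario (35)"))
stopifnot(identical(as.character(clean_levels(f)),
                    c("Ontario", "Quebec", "Ontario")))
```

Both rewrites do work once per distinct key or level rather than once per row. Whether they would deliver the gains estimated in the table depends on the real data sizes.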

Proposed Implementation Plan

  • PR 4: Hot Paths (P1, P2, P5, P13)
  • PR 5: Caching & I/O (P3, P6, P7, P10)
  • PR 6: Lookups & Vectorization (P4, P8, P9, P11, P12)

All performance PRs will include microbenchmark results.
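
As a rough illustration of what such a benchmark might look like, the sketch below compares a per-element `lapply` loop with a vectorized equivalent. The `slow` and `fast` functions and the `x` input are hypothetical stand-ins, not code from cansim. The sketch uses `microbenchmark::microbenchmark()` if the package is installed and falls back to base R `system.time()` otherwise.

```r
# Hypothetical input: vector identifiers such as "v1", "v2", ...
x <- sprintf("v%d", seq_len(1e4))

# Per-element version: one sub() call per identifier.
slow <- function(x) unlist(lapply(x, function(v) as.integer(sub("^v", "", v))))

# Vectorized version: a single sub() call over the whole vector.
fast <- function(x) as.integer(sub("^v", "", x))

# Check that both versions agree before comparing their timings.
stopifnot(identical(slow(x), fast(x)))

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(slow(x), fast(x), times = 20L))
} else {
  # Fallback: coarse timings over 20 repetitions each.
  print(system.time(for (i in 1:20) slow(x)))
  print(system.time(for (i in 1:20) fast(x)))
}
```

Asserting equivalence before timing keeps a benchmark from rewarding a faster but incorrect rewrite.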


Source: code audit, which estimates a potential 35-45% overall throughput improvement.
