Efficient garden city routing #201

paco-barreras · 2025-11-13T18:59:28Z

No description provided.

- Identify vestigial all-pairs shortest path computation (city_gen:1190) - Document expensive geometry operations in _sample_step - Analyze OSM graph integration opportunity - Propose 4 solution paths with trade-offs - Recommend immediate, short-term, and long-term actions

Critical findings: - _build_hub_network() (user-created) only sets self.hubs and self.hub_df - get_shortest_path() looks for _shortcut_hubs, _nearest_hub, etc (NEVER SET) - Result: always falls back to nx.shortest_path() on demand - Garden city works because graph is tiny (~400 nodes) - Philadelphia scale (2M nodes) will fail catastrophically Recommendations: - Remove all_pairs_shortest_path from from_geodataframes (line 1190) - Remove fake hub routing code (lines 1522-1587) - Keep real hub network for gravity computation - Focus on _sample_step geometry bottleneck

Following coding principles: - No bloat: Removed 70+ lines of AI slop hub routing code - Clear intent: Method mirrors compute_gravity() pattern - No defensive guards: Explicit error if not computed - No fallbacks: User must call compute_shortest_paths() first - Consistency: Same dual-mode design as gravity computation Changes: - Add compute_shortest_paths(callable_only=False) method - Refactor get_shortest_path() to use self.shortest_paths (dict or callable) - Rename get_distance_fast() to get_paths_fast() and mark deprecated - Initialize self.shortest_paths = None in __init__ - Update test_shortest_path() to call compute_shortest_paths() Note: callable_only=True is PLACEHOLDER using nx.shortest_path() on demand. Production use requires hub-based routing for scalability.

Lists all notebooks requiring updates: - Priority 1: random_city, prescribed_trajectory_exp_1, garden-city-paper-code - Priority 2: virtual_philly, generate_synthetic_trajectories, robustness-of-algos-paper Explains why dense mode for small cities, error messages, testing checklist.

Key findings: - Path geometries (MultiLineString, bound_poly) reconstructed EVERY call - No caching between iterations (480 calls for 4hr trajectory) - Existing projection functions (_path_coords, _cartesian_coords) perfect for optimization - Proposed: Cache door-to-door MultiLineStrings, then hub-based network - bound_poly check (unary_union) is major bottleneck, possibly removable Analysis covers: - Step-by-step breakdown of coordinate transformation - Explanation of orthogonal coordinate system - Performance bottlenecks with measurements - Implementation strategy for MultiLineString caching - Hub-based network architecture

Benchmark 1: get_shortest_path() performance - Loads OSM data and rasterizes city (small/medium box) - Builds hub network and computes gravity - Computes shortest paths in callable mode - Benchmarks 100 random queries with Manhattan distance < 20 - Reports timing statistics (mean ~0.38ms for small box) Results (SMALL BOX): - 533 buildings, 1,869 street blocks, 93 hubs - Mean path query time: 0.38ms - Median: 0.00ms (sub-millisecond) - Mean path length: 16.1 blocks Next: Add trajectory generation benchmark to profile _sample_step

Benchmark 2: 8-hour trajectory generation - Creates single-agent population - Generates 8-hour trajectory without pre-computing destination diary - Reports performance metrics for _sample_step invocations Results (SMALL BOX, dt=0.5 min): - Total time: 0.93s for 8 hours - Trajectory points: 961 - Points per second: 1032 - Average time per point: 0.97ms Key findings: - get_shortest_path: 0.40ms mean (100 queries) - Trajectory generation: 0.97ms per point - Most time spent in _sample_step geometry operations Next: Profile and optimize _sample_step bottlenecks

…ration Following rasterization_report.py pattern: - Setup phase (load, rasterize, build networks) - Benchmark 1: get_shortest_path() queries (Manhattan dist < 30) - Benchmark 2: Destination diary generation (EPR, 24 hours) - Benchmark 3: Trajectory generation from diary Configuration: - 24-hour simulation window (more representative paths) - Manhattan distance threshold: 30 blocks - EPR params: rho=0.4, gamma=0.3, epr_time_res=15 - dt=0.5 minutes Results - SMALL BOX: - Diary generation: 0.05s (7 entries) - Trajectory generation: 2.73s (2,881 points) - Time per point: 0.95ms Results - MEDIUM BOX: - 5,472 buildings, 10,003 street blocks - Diary generation: 0.16s (7 entries) - Trajectory generation: 3.78s (2,881 points) - Time per point: 1.31ms (38% slower than small box) Confirms scalability for large-city experiments.

LARGE BOX Results: - 28,297 buildings, 43,460 street blocks (53x more than small) - Setup: 83s (rasterization 40s, gravity 37s, hub network 6s) - Path queries: 1.37ms mean (3x slower, but still fast) - Diary generation: 2.27s (45x slower - gravity lookups) - Trajectory generation: 16.70s (6.1x slower) - Time per point: 5.80ms (6.1x slower than small) Scaling comparison (small → medium → large): - Buildings: 533 → 5,472 → 28,297 (1x → 10x → 53x) - Time/point: 0.95ms → 1.31ms → 5.80ms (1x → 1.4x → 6.1x) Key insight: Time per point scales sub-linearly with city size. For 53x more buildings, only 6x slower per point. Large box completes 24-hour trajectory in 17 seconds - confirms approach is viable for full Philadelphia experiments.

- Implement efficient _connect_building_door_to_streets using Manhattan distance - Add incremental graph and streets_gdf updates - RasterCity now automatically connects buildings with non-street doors - Add test_rastercity_connects_isolated_doors to verify behavior - Fixes blocker for synthetic_philadelphia.py rasterization

…hiladelphia - Add date column generation in save_pop when partition_cols includes 'date' - Remove full trajectory persistence (memory optimization) - Update synthetic_philadelphia to only save/visualize sparse trajectories - Use from_file for reading parquet data - Remove timing from I/O operations as requested

- Add _cached_geom and _cached_geom_id attributes to Agent - Cache boundary polygons at end of _sample_step (building or path geometry) - Add diagnostic check at start of _sample_step (no effect yet, just infrastructure) - Track previous_building_id in _traj_from_dest_diary for future cache key optimization - Cache IDs: building_id for stays, (start, dest) tuples for path movements - All tests pass, backward compatible, no performance changes yet

paco-barreras added 13 commits November 6, 2025 04:22

clean up of files in preparation for merge with street graph branch

0b82208

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Efficient garden city routing #201

Efficient garden city routing #201

Uh oh!

paco-barreras commented Nov 13, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Efficient garden city routing #201

Are you sure you want to change the base?

Efficient garden city routing #201

Uh oh!

Conversation

paco-barreras commented Nov 13, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants