Skip to content

Conversation

@paco-barreras
Copy link
Collaborator

No description provided.

- Identify vestigial all-pairs shortest path computation (city_gen:1190)
- Document expensive geometry operations in _sample_step
- Analyze OSM graph integration opportunity
- Propose 4 solution paths with trade-offs
- Recommend immediate, short-term, and long-term actions
Critical findings:
- _build_hub_network() (user-created) only sets self.hubs and self.hub_df
- get_shortest_path() looks for _shortcut_hubs, _nearest_hub, etc (NEVER SET)
- Result: always falls back to nx.shortest_path() on demand
- Garden city works because graph is tiny (~400 nodes)
- Philadelphia scale (2M nodes) will fail catastrophically

Recommendations:
- Remove all_pairs_shortest_path from from_geodataframes (line 1190)
- Remove fake hub routing code (lines 1522-1587)
- Keep real hub network for gravity computation
- Focus on _sample_step geometry bottleneck
Following coding principles:
- No bloat: Removed 70+ lines of AI slop hub routing code
- Clear intent: Method mirrors compute_gravity() pattern
- No defensive guards: Explicit error if not computed
- No fallbacks: User must call compute_shortest_paths() first
- Consistency: Same dual-mode design as gravity computation

Changes:
- Add compute_shortest_paths(callable_only=False) method
- Refactor get_shortest_path() to use self.shortest_paths (dict or callable)
- Rename get_distance_fast() to get_paths_fast() and mark deprecated
- Initialize self.shortest_paths = None in __init__
- Update test_shortest_path() to call compute_shortest_paths()

Note: callable_only=True is PLACEHOLDER using nx.shortest_path() on demand.
Production use requires hub-based routing for scalability.
Lists all notebooks requiring updates:
- Priority 1: random_city, prescribed_trajectory_exp_1, garden-city-paper-code
- Priority 2: virtual_philly, generate_synthetic_trajectories, robustness-of-algos-paper

Explains why dense mode for small cities, error messages, testing checklist.
Key findings:
- Path geometries (MultiLineString, bound_poly) reconstructed EVERY call
- No caching between iterations (480 calls for 4hr trajectory)
- Existing projection functions (_path_coords, _cartesian_coords) perfect for optimization
- Proposed: Cache door-to-door MultiLineStrings, then hub-based network
- bound_poly check (unary_union) is major bottleneck, possibly removable

Analysis covers:
- Step-by-step breakdown of coordinate transformation
- Explanation of orthogonal coordinate system
- Performance bottlenecks with measurements
- Implementation strategy for MultiLineString caching
- Hub-based network architecture
Benchmark 1: get_shortest_path() performance
- Loads OSM data and rasterizes city (small/medium box)
- Builds hub network and computes gravity
- Computes shortest paths in callable mode
- Benchmarks 100 random queries with Manhattan distance < 20
- Reports timing statistics (mean ~0.38ms for small box)

Results (SMALL BOX):
- 533 buildings, 1,869 street blocks, 93 hubs
- Mean path query time: 0.38ms
- Median: 0.00ms (sub-millisecond)
- Mean path length: 16.1 blocks

Next: Add trajectory generation benchmark to profile _sample_step
Benchmark 2: 8-hour trajectory generation
- Creates single-agent population
- Generates 8-hour trajectory without pre-computing destination diary
- Reports performance metrics for _sample_step invocations

Results (SMALL BOX, dt=0.5 min):
- Total time: 0.93s for 8 hours
- Trajectory points: 961
- Points per second: 1032
- Average time per point: 0.97ms

Key findings:
- get_shortest_path: 0.40ms mean (100 queries)
- Trajectory generation: 0.97ms per point
- Most time spent in _sample_step geometry operations

Next: Profile and optimize _sample_step bottlenecks
…ration

Following rasterization_report.py pattern:
- Setup phase (load, rasterize, build networks)
- Benchmark 1: get_shortest_path() queries (Manhattan dist < 30)
- Benchmark 2: Destination diary generation (EPR, 24 hours)
- Benchmark 3: Trajectory generation from diary

Configuration:
- 24-hour simulation window (more representative paths)
- Manhattan distance threshold: 30 blocks
- EPR params: rho=0.4, gamma=0.3, epr_time_res=15
- dt=0.5 minutes

Results - SMALL BOX:
- Diary generation: 0.05s (7 entries)
- Trajectory generation: 2.73s (2,881 points)
- Time per point: 0.95ms

Results - MEDIUM BOX:
- 5,472 buildings, 10,003 street blocks
- Diary generation: 0.16s (7 entries)
- Trajectory generation: 3.78s (2,881 points)
- Time per point: 1.31ms (38% slower than small box)

Confirms scalability for large-city experiments.
LARGE BOX Results:
- 28,297 buildings, 43,460 street blocks (53x more than small)
- Setup: 83s (rasterization 40s, gravity 37s, hub network 6s)
- Path queries: 1.37ms mean (3x slower, but still fast)
- Diary generation: 2.27s (45x slower - gravity lookups)
- Trajectory generation: 16.70s (6.1x slower)
- Time per point: 5.80ms (6.1x slower than small)

Scaling comparison (small → medium → large):
- Buildings: 533 → 5,472 → 28,297 (1x → 10x → 53x)
- Time/point: 0.95ms → 1.31ms → 5.80ms (1x → 1.4x → 6.1x)

Key insight: Time per point scales sub-linearly with city size.
For 53x more buildings, only 6x slower per point.

Large box completes 24-hour trajectory in 17 seconds - confirms
approach is viable for full Philadelphia experiments.
- Implement efficient _connect_building_door_to_streets using Manhattan distance
- Add incremental graph and streets_gdf updates
- RasterCity now automatically connects buildings with non-street doors
- Add test_rastercity_connects_isolated_doors to verify behavior
- Fixes blocker for synthetic_philadelphia.py rasterization
…hiladelphia

- Add date column generation in save_pop when partition_cols includes 'date'
- Remove full trajectory persistence (memory optimization)
- Update synthetic_philadelphia to only save/visualize sparse trajectories
- Use from_file for reading parquet data
- Remove timing from I/O operations as requested
- Add _cached_geom and _cached_geom_id attributes to Agent
- Cache boundary polygons at end of _sample_step (building or path geometry)
- Add diagnostic check at start of _sample_step (no effect yet, just infrastructure)
- Track previous_building_id in _traj_from_dest_diary for future cache key optimization
- Cache IDs: building_id for stays, (start, dest) tuples for path movements
- All tests pass, backward compatible, no performance changes yet
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants