Rasterization and street graph #191

paco-barreras · 2025-11-06T09:04:00Z

Relatively large overhaul to how we rasterize a city and build street graphs connecting rasterized buildings.

New persistence methods
hub-based proxy graph distance (for scalability)
New indexing for buildings and "door" logic assignment to avoid index overlaps
New tests for city generation and trajectory generation (all pass)

- Remove chunking logic for city-scale downloads (single query ~4x faster) - Fix CRS handling in download_osm_buildings/streets for polygon queries - Convert all chunk_miles parameters to chunk_km (metric units) - Add progress reporting for chunked downloads - Add gpkg files to gitignore - Move synthetic Philadelphia notebook to virtual_philadelphia directory

- Remove redundant column existence checks (we control the pipeline) - Remove empty list checks (data always exists after download/load) - Remove broad exception catching that masks real errors - Simplify plotting to only show streets - Add automatic data loading from persisted GeoPackage

- Integrate rasterization utilities into city_gen.py (RasterCityGenerator class) - Implement lazy street graph building with shortcut (hub) network for O(n) memory - Add manual_streets mode to City class for rasterized cities - Update building classification: fix residential/home mapping, add residential signal requirement for inference - Standardize chunking to kilometers (chunk_km) throughout map_utils - Add utility scripts: ab_category_audit.py, streets_benchmark.py, create_sandbox.py - Add comprehensive tests: test_rasterization_city_gen.py - Remove obsolete rasterization.py and RASTERIZATION_PLAN.md - Update .gitignore to exclude large GeoPackage files from research directory

…ibility - Rewrite create_sandbox.py to download OSM data directly for Old City bbox - Remove dependency on synthetic_philadelphia.py output - Update all scripts to use relative paths instead of __file__ for Jupyter notebook compatibility - Scripts now work when run as notebooks or from command line

OSMnx expects geographic coordinates (EPSG:4326) for polygon queries. The download functions handle CRS conversion internally.

…ppers, add GeoJSON/Parquet/Shapefile I/O; enable street consolidation/length filter; speed up rasterization; fix tests; update report

…nts in streets report

…ndbox containment check

…ensive try/except

…emove defensive column guessing

…n_report.ipynb to use building_type

…tently

…th lazy shortcut routing and boundary scaling/cropping

…ching; expand sandbox boundary by 2.5x; remove City.id_to_door_cell and use direct columns; simplify get_building()

…recomputed gravity in traj_gen; report gravity build in test_rasterization_report

…stance table and exponent in gravity; improve docstrings; normalize building_type in from_geopackage; tests pass

…e canonicalization; update traj_gen tests to normalize legacy fixture labels; docs clarified

…rkplace/retail/park) with rng.choice; enforce no fallbacks; City.from_geopackage: poi_cols naming

…_sample_horizontal_noise; return true burst_info; add tests incl. bursts tz/window check

… maps/traj_gen/city_gen tests passing

…ct schema and errors for missing building ids

…yields no path

…erCityGenerator; enforce 'building_type' in City.from_geodataframes; fix KeyError in rasterization report use-cases

- Replaced _select_hubs with quantile-based evenly-spaced hub selection - Pivoted hub_df from edge list to dense adjacency matrix for O(1) lookup - Vectorized gravity computation using numpy broadcasting and fancy indexing - Simplified traj_gen.py gravity query from 17 lines to 3 lines - Fixed diagonal not being zeroed in distance matrix - Added test_compute_gravity() with real data from sandbox fixture - Streamlined rasterization_report.py to focus on timing and object inspection

…cessary defensive checks, derive timestamps when extending diaries, early gravity check, and streamline initialization paths.

- Add test_resolve_overlaps to verify feature increases building count - Test confirms resolve_overlaps=True adds more buildings (34 → 53 in test) - Add resolve_overlaps=True to rasterization_report.py - Benchmark results show 40-54% more buildings across all box sizes: - Small: 347 → 535 (+54%) - Medium: 3,935 → 5,504 (+40%) - Large: 20,252 → 28,419 (+40%) - Gravity computation scales with N² as expected (more buildings = longer time) - Memory footprint remains tiny with lean structures (<3 MB for 28k buildings) - All tests pass

- Refactor compute_gravity Part 1 to use 10 chunks (default) - Process buildings in chunks to avoid dense N×N matrix allocation - Memory reduction: 3.2 GB → 323 MB peak (10x improvement) - Runtime improvement: 444s → 17s for 28k buildings (26x faster!) - Eliminates memory spikes and thrashing - All tests pass, backward compatible - Update synthetic_philadelphia.py with streamlined EPR pipeline: - Use resolve_overlaps=True and callable_only=True - Generate 10 agents with destination diaries for 1 week - Persist config and diaries to disk - Production-ready for 100k+ building simulations

…cture - Remove bloat and useless print statements - Clean FULL_CITY vs LARGE_BOX branching logic - Match exact timing structure from rasterization_report.py - Professional formatting with clear sections - Separate timing for each pipeline step - Clean config persistence - No unnecessary variables or defensive checks - Production-ready notebook

- Fixed to_timestamp() to handle scalar inputs (NumPy 2.0 compatibility) - Wraps scalars in pd.Series for processing, unwraps on return - Prevents ValueError from pd.api.types.is_datetime64_any_dtype() - Added test_to_timestamp_scalar() to verify scalar string handling - Improved consistency in traj_gen.py by using to_timestamp() instead of .timestamp() - Lines 620, 643, 771 now handle string datetime inputs robustly - Added warning in generate_dest_diary when last_ping is beyond end_time - Prevents silent generation of empty destination diaries - Provides actionable error message to user - Removed debug pdb.set_trace() from generate_dest_diary All tests pass (13 traj_gen + 26 filter tests)

…t_diary - Convert string datetime to pd.Timestamp in Agent.__init__ - Ensures last_ping['datetime'] is always a proper datetime object - Fixes AttributeError: 'str' object has no attribute 'hour' - Convert string datetime in generate_dest_diary when reading from destination_diary - Handles case where destination_diary contains string datetimes All tests pass (13 traj_gen tests)

- Tests 9 different initialization scenarios: 1. Default (no kwargs) - uses home centroid and current time 2. x, y coordinates 3. location (building ID) 4. datetime as string 5. datetime as pd.Timestamp 6. timestamp (Unix seconds) 7. trajectory (takes precedence) 8. trajectory + kwargs (warns, trajectory wins) 9. Combined x, y, and datetime string with timezone - Verifies datetime objects always have .hour attribute - Ensures trajectory takes precedence over kwargs with warning - Confirms all datetime strings are converted to pd.Timestamp All 14 traj_gen tests pass

Bug: RasterCity._rasterize was using door coordinates for building IDs, causing duplicates when multiple buildings shared the same door. Root Cause: Line 1942 used door directly instead of an internal building block adjacent to the door (as City.add_building does at lines 358-366). Fix: Apply same logic as City.add_building - use check_adjacent() to find an internal building block adjacent to the door, ensuring unique IDs since buildings cannot overlap. Testing: Added test_unique_building_ids() regression test. All 17 tests pass.

…t path Changed from O(n² × partial_BFS) to O(n × full_BFS) where n is number of hubs. Before: Called nx.shortest_path_length(G, origin, dest) for each hub pair After: Call nx.single_source_shortest_path_length(G, origin) once per origin, then query hub destinations via direct dictionary lookup For n=100 hubs: 10,000 BFS runs -> 100 BFS runs Trade-off: Loses early-stop benefit but eliminates redundant BFS traversals. Expected to be faster since hubs are scattered evenly across large graphs. All non-sandbox tests pass (4 tests fail due to deleted sandbox data).

Tests should not depend on data in examples/research/ folders. Changed: - Renamed _load_sandbox() to _load_fixture() - Updated to load from nomad/data/city_fixture.gpkg (proper test fixture) - Added helpful error message pointing to generate_fixture.py if missing - Removed dependency on experimental sandbox data The fixture is generated by nomad/data/generate_fixture.py which downloads a small OSM area, rotates it, and saves buildings/streets/boundary. All 17 tests pass.

Added pdb import at top of city_gen.py (following coding principles). Debug statements in compute_gravity_row (city_gen.py): - Check for unexpected zero distances before nearby door override - Print building IDs with zero distances and hub info - Set breakpoint if more than 1 zero distance found Debug statements in generate_dest_diary (traj_gen.py): - Print curr, probs shape, sum, inf/nan/zero counts - Set breakpoint if inf, nan, or zero sum detected - Print first 10 y.index and probs values for inspection These will help diagnose why gravity contains NaN values.

- Add rotation_deg, offset_x, offset_y attributes to base City class - Update RasterCity to pass rotation and document coordinate flow - Fix streets_gdf geometry to use garden city units (consistent with buildings) - Set CRS to None for internal gdfs (garden city units are abstract) - Implement to_file method with reverse_affine_transformation option - Add comprehensive docstrings explaining coordinate system transformations

- Test verifies to_file exports with and without reverse transformation - Checks garden city columns are dropped when reverse_affine_transformation=True - Verifies CRS is set correctly to Web Mercator - Validates geometry scaling by comparing areas

- Add blocks_to_mercator_gdf() and mercator_to_blocks_gdf() in map_utils.py - Delete vestigial to_geodataframes() method - Delete OLD to_file() method (lines 978-986) - Refactor NEW to_file() to use map_utils transforms (reduced from ~100 to ~40 lines) - Refactor save_geopackage() to use direct gdf access and add reverse_affine parameter - Rename reverse_affine_transformation to reverse_affine for consistency - Update all tests to use city.buildings_gdf/streets_gdf directly - No bloat: clean separation of concerns (transforms in map_utils, persistence in City) - All 19 tests pass

paco-barreras · 2025-11-06T09:04:58Z

Also squashed a bug related to newer versions of Numpy breaking some filters.py functions

…vectorize street_geom and path construction

- Cache path_ml and bound_poly to avoid redundant unary_union and MultiLineString computations - Check cache validity by verifying cached path ends at current destination building - Clear cache when agent arrives at destination - Use previous destination building geometry in bound_poly when departing from it - Vectorize street_geom and path construction using blocks_gdf directly - Remove unnecessary start_info variable and clamping logic - Simplify cache validation to only check destination match Performance improvements: - test_complete_workflow: 7.09s -> 4.43s (37% faster) - prescribed_trajectory_exp_1.py: 10.74s -> 6.29s (41% faster)

- Fixed _connect_building_door_to_streets() to use building_type instead of redundant 'kind' column - Added 5 door connection tests from routing branch (updated for building_type) - Added compute_shortest_paths() calls to tests to reflect proper usage - Added auto-initialization to get_shortest_path() for convenience - Removed bloat: eliminated unnecessary hasattr check

…tests - Remove type checking and defensive hasattr checks - Remove redundant ValueError check for empty buildings - Remove DEBUG VERIFY code - Initialize street_graph = None in __init__ - Add warning for auto-initialization of shortest_paths - Fix pd.concat to preserve GeoDataFrame geometry (FutureWarning fix) - Add tests for geometry containment and building_type/building_id consistency - Add verification prints to rasterization_report.py

…version - Replace notebook with full pipeline version from ROUTING branch - Includes trajectory generation, sparse sampling, reprojection, and visualization - Keep ROUTING branch configuration (medium box, smaller parameters) - Remove temporary report .md files (CACHING_PLAN, CODING_PRINCIPLES, FINAL_REVISED_PLAN) - Delete ROUTING branch notebook file

…tadata, remove analysis/report/plan markdown files

- Remove type hints from 5 functions (Principle #4) - Remove dead code: unused imports (pdb, cm), unused functions, unused variables - Fix broken reset_trajectory() cache nulling - Optimize O(n) → O(1) building lookups using indexed .loc[] (7 locations) - Remove defensive bloat: unnecessary .empty checks, hasattr checks - Remove bounds clamping (agents always kept in bounds by geometry checks) - Extract duplicate destination selection logic to helper function - Update test expectations: KeyError instead of IndexError for missing buildings - Add testing/environment notes to CODING_PRINCIPLES.md Results: 1491 → 1477 lines (-14 lines), all 14 tests pass Significant performance improvement from O(1) building lookups

paco-barreras added 30 commits October 30, 2025 07:13

Add comprehensive implementation plan for Philadelphia rasterization

820bb47

Fix create_sandbox.py: pass bbox in EPSG:4326 to OSMnx

03671e1

OSMnx expects geographic coordinates (EPSG:4326) for polygon queries. The download functions handle CRS conversion internally.

Refactor map_utils: simplify OSM downloads, remove chunking/cache wra…

469c63a

…ppers, add GeoJSON/Parquet/Shapefile I/O; enable street consolidation/length filter; speed up rasterization; fix tests; update report

Rewrite streets scalability report as clear narrative prose

d80c466

Add targeted code references to traj_gen, city_gen, map_utils, consta…

46a264d

…nts in streets report

Buildings download: explode+clip to boundary for bbox/polygon; add sa…

707f9d7

…ndbox containment check

Simplify buildings clip: explode then gpd.clip; remove helper and def…

505ef6a

…ensive try/except

Standardize on building_type column; fix residential->home mapping; r…

1fe5a6f

…emove defensive column guessing

Fix example scripts: download_poi_data_from_OSM and test_rasterizatio…

988161e

…n_report.ipynb to use building_type

Fix remaining example scripts and notebooks: use building_type consis…

559b19c

…tently

Fix download_poi_data_from_OSM.ipynb to use building_type

1721be2

Add door_paths_scalability.py: benchmark 100-to-all door distances wi…

301382d

…th lazy shortcut routing and boundary scaling/cropping

Scalability in test_rasterization_report: door-to-all queries with ca…

caf0cdf

…ching; expand sandbox boundary by 2.5x; remove City.id_to_door_cell and use direct columns; simplify get_building()

Add City.compute_gravity (manhattan-to-hub + hub distances) and use p…

3e70243

…recomputed gravity in traj_gen; report gravity build in test_rasterization_report

Enforce precomputed city.grav in traj_gen (no fallbacks); add door di…

3522069

…stance table and exponent in gravity; improve docstrings; normalize building_type in from_geopackage; tests pass

City.from_geopackage: use poi_cols mapping (no guessing); remove valu…

aa26521

…e canonicalization; update traj_gen tests to normalize legacy fixture labels; docs clarified

traj_gen: use building_type; fix initial visit_freqs seeding (home/wo…

c54806a

…rkplace/retail/park) with rng.choice; enforce no fallbacks; City.from_geopackage: poi_cols naming

traj_gen: split sampling into generate_ping_times/thin_traj_by_times/…

5b96369

…_sample_horizontal_noise; return true burst_info; add tests incl. bursts tz/window check

Rename rasterization report files to avoid pytest collection; confirm…

5151576

… maps/traj_gen/city_gen tests passing

traj_gen: remove blind defensive guards in _sample_step; enforce stri…

052a6a7

…ct schema and errors for missing building ids

traj_gen: remove path fallback; enforce error when get_shortest_path …

dc316f7

…yields no path

rasterization: standardize building schema to 'building_type' in Rast…

868aeb0

…erCityGenerator; enforce 'building_type' in City.from_geodataframes; fix KeyError in rasterization report use-cases

Agent diary generation: move visit_freqs init into Agent, remove unne…

68e1032

…cessary defensive checks, derive timestamps when extending diaries, early gravity check, and streamline initialization paths.

cleaned up docstring

7cdadae

Simplified logic to add buildings

10b05a9

paco-barreras added 15 commits November 5, 2025 17:28

updated notebooks for experiments

e8ed48e

cleanup of lagging files

c2bd6e6

paco-barreras requested review from Krvenkat12 and andresmondragont November 6, 2025 09:05

paco-barreras added 8 commits November 13, 2025 01:06

Refactor _sample_step: use previous building geometry in bound_poly, …

197af25

…vectorize street_geom and path construction

partial merge of sandbox files from efficient-garden-city-routing

ce180cf

formatting of date as string in save_pop

355247b

Fix trajectory rotation: store rotation_deg and rotation_origin in me…

5991272

…tadata, remove analysis/report/plan markdown files

paco-barreras force-pushed the rasterization-and-street-graph branch from 9fbb49d to 5991272 Compare November 14, 2025 06:09

paco-barreras closed this Nov 18, 2025

paco-barreras deleted the rasterization-and-street-graph branch November 18, 2025 13:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Rasterization and street graph #191

Rasterization and street graph #191

Uh oh!

paco-barreras commented Nov 6, 2025

Uh oh!

paco-barreras commented Nov 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Rasterization and street graph #191

Rasterization and street graph #191

Uh oh!

Conversation

paco-barreras commented Nov 6, 2025

Uh oh!

paco-barreras commented Nov 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants