This tutorial guides you through using PyCuSFM for Structure from Motion (SfM) reconstruction.
- Raw Data Requirements
- Command Line Interface
- Quick Start Guide
- Multi-track Input or Localization Mode
- KITTI Dataset Example
- Bundle Adjustment Runner
Your input data folder should contain:
- Image files: Camera images in supported formats (JPEG, PNG, etc.)
- `frames_meta.json`: Metadata file following the `KeyframesMetadataCollection` protobuf format
The frames_meta.json file is the core metadata file that follows the KeyframesMetadataCollection protobuf definition. It contains:
Required Top-Level Fields
- `keyframes_metadata`: Array of keyframe metadata objects
- `initial_pose_type`: Pose interpretation type (EGO_MOTION, ALIGNMENT, GPS_IMU, MAP_POSE, SESSION_EGO_MOTION)
- `camera_params_id_to_camera_params`: Map of camera parameter IDs to camera calibration data
- `camera_params_id_to_session_name`: Map of camera parameter IDs to session names
Optional Top-Level Fields
- `reference_latlngalt`: GPS reference point for absolute positioning
- `stereo_pair`: Array defining stereo camera pairs with baseline distances
- `detector_type`: Feature detector type (e.g., SIFT, ORB, ALIKED)
- `descriptor_type`: Feature descriptor type
- `vehicle_trajectory_files`: Relative paths to trajectory files
- `track_id_to_track_name`: Mapping of track IDs to track names
Keyframe Metadata Structure
Each entry in keyframes_metadata contains:
- `id`: Unique keyframe identifier (string)
- `camera_params_id`: Reference to camera parameters (string)
- `timestamp_microseconds`: Timestamp in microseconds (string)
- `image_name`: Relative path to image file (string)
- `camera_to_world`: 6DOF camera pose with axis-angle rotation and translation
  - `axis_angle`: Rotation as axis-angle representation with angle in degrees
    - `x`, `y`, `z`: Rotation axis components
    - `angle_degrees`: Rotation angle in degrees
  - `translation`: 3D position in world coordinates
    - `x`, `y`, `z`: Translation components in meters
- `synced_sample_id`: Synchronization ID for multi-camera setups (string)
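To illustrate how the `axis_angle` and `translation` fields combine, here is a small standalone Python sketch (not part of the PyCuSFM API) that converts one `keyframes_metadata` entry into a 4x4 `camera_to_world` matrix using Rodrigues' formula:

```python
import math

def axis_angle_to_matrix(x, y, z, angle_degrees):
    """Rodrigues' formula: 3x3 rotation matrix from a (possibly unnormalized)
    axis and an angle given in degrees, matching the axis_angle convention above."""
    n = math.sqrt(x * x + y * y + z * z)
    if n == 0.0:
        return [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    ux, uy, uz = x / n, y / n, z / n
    a = math.radians(angle_degrees)
    c, s = math.cos(a), math.sin(a)
    t = 1.0 - c
    return [
        [c + ux * ux * t, ux * uy * t - uz * s, ux * uz * t + uy * s],
        [uy * ux * t + uz * s, c + uy * uy * t, uy * uz * t - ux * s],
        [uz * ux * t - uy * s, uz * uy * t + ux * s, c + uz * uz * t],
    ]

def keyframe_to_4x4(keyframe):
    """Build a 4x4 camera_to_world matrix from one keyframes_metadata entry."""
    aa = keyframe["camera_to_world"]["axis_angle"]
    tr = keyframe["camera_to_world"]["translation"]
    R = axis_angle_to_matrix(aa["x"], aa["y"], aa["z"], aa["angle_degrees"])
    return [R[0] + [tr["x"]], R[1] + [tr["y"]], R[2] + [tr["z"]], [0, 0, 0, 1]]
```

The helper names are illustrative only; PyCuSFM performs this conversion internally.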
Camera Parameters Structure
Each camera in camera_params_id_to_camera_params includes:
- `sensor_meta_data`:
  - `sensor_id`: Unique sensor identifier
  - `sensor_type`: Type (typically "CAMERA")
  - `sensor_name`: Human-readable camera name
  - `frequency`: Frame rate in Hz
  - `sensor_to_vehicle_transform`: Transform from camera to vehicle coordinate frame
- `calibration_parameters`:
  - `image_width`, `image_height`: Image dimensions
  - `camera_matrix`: 3x3 intrinsic camera matrix
  - `distortion_coefficients`: Distortion parameters
  - `rectification_matrix`: Rectification matrix (for stereo cameras)
  - `projection_matrix`: Projection matrix
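The 3x3 `camera_matrix` is stored as a row-major 9-element array, so the focal lengths and principal point can be unpacked directly. A minimal sketch of pinhole projection using those fields (distortion ignored; `project_point` is a hypothetical helper, not a PyCuSFM API):

```python
def project_point(calib, point_cam):
    """Project a 3D point in camera coordinates to pixel coordinates using the
    row-major 3x3 camera_matrix from calibration_parameters (distortion ignored)."""
    k = calib["camera_matrix"]["data"]  # row-major: fx, 0, cx, 0, fy, cy, 0, 0, 1
    fx, cx, fy, cy = k[0], k[2], k[4], k[5]
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)
```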
Stereo Pair Configuration
For stereo camera setups, stereo_pair defines:
- `left_camera_param_id`: ID of left camera
- `right_camera_param_id`: ID of right camera
- `baseline_meters`: Baseline distance between cameras in meters
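For a rectified stereo pair, `baseline_meters` relates pixel disparity to metric depth via the standard relation depth = fx * baseline / disparity. This is textbook stereo geometry, not a PyCuSFM function, but it shows why the baseline must be accurate:

```python
def depth_from_disparity(fx_pixels, baseline_meters, disparity_pixels):
    """Metric depth of a point seen by a rectified stereo pair:
    depth = focal_length_x * baseline / disparity."""
    if disparity_pixels <= 0:
        raise ValueError("disparity must be positive")
    return fx_pixels * baseline_meters / disparity_pixels
```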
Initial Pose Types
The initial_pose_type field determines how PyCuSFM interprets the camera poses in camera_to_world:
- `EGO_MOTION`: Relative poses within a single track/sequence. Camera poses are relative to the first frame of the track.
- `SESSION_EGO_MOTION`: Relative poses within a session that may contain multiple tracks.
- `ALIGNMENT`: High-weight global pose constraints used for alignment between different sequences.
- `GPS_IMU`: Absolute position constraints from GPS/IMU sensors in global coordinates.
- `MAP_POSE`: Constant poses that don't change during optimization (e.g., previously mapped locations).
The most common use case is EGO_MOTION for sequential visual odometry data.
Example frames_meta.json
Here's a minimal example showing the required structure:
```json
{
  "keyframes_metadata": [
    {
      "id": "2356",
      "camera_params_id": "6",
      "timestamp_microseconds": "1707938736136246",
      "image_name": "right_stereo_camera_left/1707938736136246532.jpeg",
      "camera_to_world": {
        "axis_angle": {
          "x": 0.008544724537343526,
          "y": -0.7085686596387489,
          "z": 0.7055901375872029,
          "angle_degrees": 177.9004863048234
        },
        "translation": {
          "x": 4.607160850493191,
          "y": -0.0672254120438331,
          "z": 0.24059434306041566
        }
      },
      "synced_sample_id": "318"
    }
  ],
  "initial_pose_type": "EGO_MOTION",
  "camera_params_id_to_session_name": {
    "6": "0"
  },
  "camera_params_id_to_camera_params": {
    "6": {
      "sensor_meta_data": {
        "sensor_id": 6,
        "sensor_type": "CAMERA",
        "sensor_name": "right_stereo_camera_left",
        "frequency": 30,
        "sensor_to_vehicle_transform": {
          "axis_angle": { "x": 0, "y": 0, "z": 0, "angle_degrees": 0 },
          "translation": { "x": 0.093139, "y": -0.075002, "z": 0.34439 }
        }
      },
      "calibration_parameters": {
        "image_width": 1920,
        "image_height": 1200,
        "camera_matrix": {
          "data": [961.123, 0, 952.127, 0, 958.858, 591.744, 0, 0, 1]
        },
        "distortion_coefficients": {
          "data": [-0.173, 0.027, 0, 0, 0]
        }
      }
    }
  },
  "stereo_pair": [
    {
      "left_camera_param_id": "6",
      "right_camera_param_id": "7",
      "baseline_meters": 0.1499989761475276
    }
  ]
}
```
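A quick way to sanity-check a `frames_meta.json` before running the pipeline is to verify the required top-level fields and basic invariants. This validator is an illustrative sketch, not a PyCuSFM utility:

```python
import json

# Required top-level fields per the KeyframesMetadataCollection description above.
REQUIRED_FIELDS = (
    "keyframes_metadata",
    "initial_pose_type",
    "camera_params_id_to_camera_params",
    "camera_params_id_to_session_name",
)

def validate_frames_meta(path):
    """Load frames_meta.json and check required fields, camera references,
    and that timestamps are strings (as the format requires)."""
    with open(path) as f:
        meta = json.load(f)
    missing = [k for k in REQUIRED_FIELDS if k not in meta]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    cams = meta["camera_params_id_to_camera_params"]
    for kf in meta["keyframes_metadata"]:
        if kf["camera_params_id"] not in cams:
            raise ValueError(
                f"keyframe {kf['id']} references unknown camera {kf['camera_params_id']}")
        if not isinstance(kf["timestamp_microseconds"], str):
            raise ValueError(f"keyframe {kf['id']}: timestamps must be strings")
    return meta
```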
- Coordinate Systems:
  - `camera_to_world` transforms points from the camera coordinate frame to the world coordinate frame
  - `sensor_to_vehicle_transform` transforms from the camera to the vehicle coordinate frame
  - Rotations use axis-angle representation with the angle in degrees
- Image Paths: All image paths in `image_name` are relative to the directory containing `frames_meta.json`
- Timestamp Format: Timestamps must be in microseconds as strings (not integers)
- Optional Rolling Shutter Support:
  - Use `start_camera_to_world` for rolling shutter cameras (pose of the first row); `camera_to_world` becomes the pose of the last row
  - Include `camera_params_id_to_distorted_row_indices` for row distortion data
- Multi-Camera Synchronization: Use `synced_sample_id` to group frames captured simultaneously across different cameras
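As a concrete illustration of multi-camera synchronization, frames that share a `synced_sample_id` can be grouped into simultaneous samples. This small sketch (not a PyCuSFM API) operates on the `keyframes_metadata` array:

```python
from collections import defaultdict

def group_by_synced_sample(keyframes_metadata):
    """Group keyframe IDs captured at the same instant across cameras,
    keyed by synced_sample_id; entries without the field are skipped."""
    groups = defaultdict(list)
    for kf in keyframes_metadata:
        sid = kf.get("synced_sample_id")
        if sid is not None:
            groups[sid].append(kf["id"])
    return dict(groups)
```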
The example data in `data/r2b_galileo` demonstrates the expected format with 4 stereo camera pairs and multiple samples:
Directory Structure
```
├── frames_meta.json
├── back_stereo_camera_left/
│   ├── 1707938736136244532.jpeg
│   ├── 1707938736169573532.jpeg
│   └── ... (30 total images)
├── back_stereo_camera_right/
│   ├── 1707938736136244532.jpeg
│   ├── 1707938736169573532.jpeg
│   └── ... (30 total images)
├── front_stereo_camera_left/
│   ├── 1707938736136247532.jpeg
│   ├── 1707938736169575532.jpeg
│   └── ... (29 total images)
├── front_stereo_camera_right/
│   ├── 1707938736136247532.jpeg
│   ├── 1707938736169575532.jpeg
│   └── ... (29 total images)
├── left_stereo_camera_left/
│   ├── 1707938736169592532.jpeg
│   ├── 1707938736202921532.jpeg
│   └── ... (29 total images)
├── left_stereo_camera_right/
│   ├── 1707938736169592532.jpeg
│   ├── 1707938736202921532.jpeg
│   └── ... (29 total images)
├── right_stereo_camera_left/
│   ├── 1707938736136246532.jpeg
│   ├── 1707938736169575532.jpeg
│   └── ... (28 total images)
└── right_stereo_camera_right/
    ├── 1707938736136246532.jpeg
    ├── 1707938736169575532.jpeg
    └── ... (28 total images)
```
To convert a rosbag to the required mapping data format, follow the ISAAC Mapping ROS tutorial. This will help you:
- Set up the ISAAC Mapping ROS package
- Use the `rosbag_to_mapping_data` binary
- Provide initial pose estimates from a pose bag or TUM pose file
PyCuSFM provides the cusfm_cli command-line tool that runs the complete SfM pipeline:
```shell
cusfm_cli --input_dir <input_dir> --cusfm_base_dir <cusfm_base_dir>
```
Required Parameters:
- `--input_dir`: Path to your mapping data
- `--cusfm_base_dir`: Output directory for PyCuSFM results
General Options
- `--binary_dir <binary_dir>`: Specify path to cuSFM binary files
- `--config_dir <config_dir>`: Specify path to configuration files
- `--enable_debug`: Enable debug mode (saves intermediate results such as feature extraction and matching)
- `--av_data`: Use default parameters optimized for outdoor autonomous driving scenarios
- `--use_rsc`: Enable rolling shutter correction (use when data contains rolling shutter correction information)
Pipeline Control
Control which steps to run using `--steps_to_run`, or skip specific steps with the `--skip_*` options:
```shell
# Run specific steps
--steps_to_run feature_extractor vocab_generator pose_graph matcher mapper map_convertor

# Skip specific steps
--skip_cuvslam
--skip_feature_extractor
--skip_vocab_generator
--skip_pose_graph
--skip_matcher
--skip_mapper
--skip_map_convertor
```
Step-Specific Options
- `--mask_dir <mask_dir>`: Specify image mask regions (masked areas are excluded from feature extraction)
- `--feature_type=[aliked,sift,superpoint]` (default: aliked)
- `--multi_track_input`: Process multiple image sequences
Geometry-based Keyframe Selection:
- `--min_inter_frame_distance`: Minimum translational distance (in meters) between consecutive keyframes for geometry-based selection (default: 0.5)
- `--min_inter_frame_rotation_degrees`: Minimum rotational change (in degrees) between consecutive keyframes for geometry-based selection (default: 5)
These parameters filter keyframes based on geometric criteria: only frames whose translation or rotation relative to the previous keyframe exceeds the specified thresholds are selected.
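The selection logic can be sketched as a greedy filter. This is a simplified illustration, not cuSFM's implementation: each frame here is reduced to a 3D position plus a single heading angle in degrees, standing in for the full 6DOF pose:

```python
import math

def select_keyframes(frames, min_distance=0.5, min_rotation_degrees=5.0):
    """Greedy geometry-based keyframe selection: keep a frame only if it moved
    or rotated enough relative to the last kept keyframe.
    frames: list of ((x, y, z), heading_degrees) tuples."""
    if not frames:
        return []
    kept = [0]
    last_pos, last_heading = frames[0]
    for i, (pos, heading) in enumerate(frames[1:], start=1):
        dist = math.dist(pos, last_pos)
        rot = abs(heading - last_heading) % 360.0
        rot = min(rot, 360.0 - rot)  # shortest angular difference
        if dist >= min_distance or rot >= min_rotation_degrees:
            kept.append(i)
            last_pos, last_heading = pos, heading
    return kept
```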
- `--skip_vocab_generator --cuvgl_dir=<cuvgl_dir>`: Use a prebuilt vocabulary instead of generating a new one
- `--debug_interval`: Interval for saving debug matching images (default: 500)
- `--ba_frame_type=vehicle_rig`: Fix extrinsics but do not optimize them
- `--optimize_extrinsics --ba_frame_type=vehicle_rig`: Enable extrinsic parameter refinement during bundle adjustment
Converts poses and sparse 3D maps to COLMAP format for visualization.
CuSFM supports multiple models for feature extraction and matching. At runtime, the system requires `.engine` files. If these files don't exist, the system automatically loads `.onnx` files from the model folder, compiles them, and saves the resulting `.engine` files. This compilation can be time-consuming, but typically only needs to be performed once per device platform; subsequent runs use the previously saved `.engine` files for faster initialization.
Alternatively, you can use the Engine Exporter tool to pre-compile and save .engine files to the model folder before runtime. For detailed usage instructions of the Engine Exporter tool, please refer to the documentation here.
Follow these steps to run PyCuSFM with example data:
```shell
wget2 --max-threads=5 -r --no-parent --reject "index.html" --cut-dirs=5 -nH -P data/NRE1boxr0_2024_05_01-17_02_15/mapping_data https://pdx.s8k.io/v1/AUTH_team-osmo-ops/workflows/cuvslam_and_cusfm_benchmark-120/rosbag_to_mapping_data_conversion_0/
```
Data Details:
- Source: NRE1boxr0 rosbag
- Content: 8 cameras, 450 images per camera
- Initial poses: From cuVSLAM
```shell
cusfm_cli --input_dir data/NRE1boxr0_2024_05_01-17_02_15/mapping_data --cusfm_base_dir data/NRE1boxr0_2024_05_01-17_02_15/cusfm
```
```
├── cuvgl_map
│   ├── bow_index.pb
│   └── vocabulary
├── keyframes
│   ├── back_stereo_camera_left
│   │   └── 1714608155873290968.pb
│   ├── ...
│   └── frames_meta.json
├── kpmap
│   ├── keyframes
│   └── map_keypoints.pb
├── matches
├── pose_graph
├── output_poses
│   ├── 0
│   │   ├── camera_name-back_stereo_camera_left_pose_file.txt
│   │   └── ...
│   └── merged_pose_file.tum
└── sparse
    ├── cameras.txt
    ├── images.txt
    └── points3D.txt
```
- `kpmap/keyframes/frames_meta.json`: Optimized keyframe poses (PyCuSFM format)
- `output_poses/`: TUM format pose files
  - `merged_pose_file.tum`: Combined poses from all cameras
  - `0/`: Individual camera pose files
- `sparse/`: COLMAP sparse format for 3D point cloud and camera parameters
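The TUM trajectory format used for the output poses is a simple text format: one pose per line as `timestamp tx ty tz qx qy qz qw`, with `#` comment lines. A small reader for inspecting these files (an illustrative sketch, not a PyCuSFM utility):

```python
def read_tum_poses(path):
    """Parse a TUM-format trajectory file: one pose per line as
    'timestamp tx ty tz qx qy qz qw'; '#' comment lines are skipped."""
    poses = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            vals = line.split()
            if len(vals) != 8:
                raise ValueError(f"expected 8 fields, got {len(vals)}: {line}")
            t, tx, ty, tz, qx, qy, qz, qw = map(float, vals)
            poses.append({"timestamp": t,
                          "translation": (tx, ty, tz),
                          "quaternion": (qx, qy, qz, qw)})
    return poses
```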
CuSFM supports joint mapping from multiple track inputs. The key difference in localization mode is that some tracks serve as fixed reference maps, while other tracks adjust their poses relative to these reference tracks. Therefore, input tracks can be categorized into fixed tracks (which can be considered as the map) and floating tracks.
Multi-track input is ideal for:
- Localization: Use existing map tracks as reference for new data
- Map merging: Combine multiple mapping sessions
- Cross-validation: Compare results across different data collection runs
- Incremental mapping: Add new areas to existing maps
For Isaac data or data with stereo cameras, CuSFM can independently compute relative poses between tracks without requiring initial poses to be in the same coordinate system. This is achieved through stereo camera geometry and robust feature matching across tracks.
Basic command for Isaac data:
```shell
cusfm_cli \
    --input_dir <multi_track_data_dir> \
    --cusfm_base_dir <output_dir> \
    --multi_track_input \
    --anchor_track=track_folder_name0,track_folder_name1 \
    --use_cuvslam_slam_pose=False \
    --skip_pose_graph
```
For AV data or data without stereo cameras, CuSFM currently cannot independently compute relative poses between tracks, so initial poses must be provided in the same coordinate system. This limitation requires pre-aligned pose estimates.
Basic command for AV data:
```shell
cusfm_cli \
    --input_dir <multi_track_data_dir> \
    --cusfm_base_dir <output_dir> \
    --multi_track_input \
    --anchor_track=track_folder_name0,track_folder_name1 \
    --av_data
```
- `--multi_track_input`: Enables multi-track processing mode
- `--anchor_track`: Comma-separated list of track folder names to fix during mapping. These tracks serve as the reference coordinate system.
Multi-track data should be organized as follows:
```
multi_track_data_dir/
├── track_00/
│   ├── camera_1/
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   ├── camera_2/
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── frames_meta.json
├── track_01/
│   ├── camera_1/
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   ├── camera_2/
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── frames_meta.json
└── ...
```
Each track folder contains its own camera subdirectories and a frames_meta.json file with pose information for that track.
Data Requirements per Track:
- At least one camera subdirectory with images
- `frames_meta.json` with timestamp, pose, and image path information
- Consistent naming convention across tracks (camera folder names should match)
- For stereo setups: left/right camera pairs should be clearly identified
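The per-track requirements above can be checked before running multi-track mapping. This is a basic sanity-check sketch (not a PyCuSFM API) that verifies each track folder has a `frames_meta.json` and that camera subdirectory names match across tracks:

```python
import os

def check_track_consistency(multi_track_dir):
    """Verify each track folder contains frames_meta.json and that camera
    subdirectory names are identical across all tracks."""
    tracks = sorted(
        d for d in os.listdir(multi_track_dir)
        if os.path.isdir(os.path.join(multi_track_dir, d))
    )
    camera_sets = {}
    for track in tracks:
        track_path = os.path.join(multi_track_dir, track)
        if not os.path.isfile(os.path.join(track_path, "frames_meta.json")):
            raise ValueError(f"{track}: missing frames_meta.json")
        camera_sets[track] = {
            d for d in os.listdir(track_path)
            if os.path.isdir(os.path.join(track_path, d))
        }
    reference = camera_sets[tracks[0]]
    for track, cams in camera_sets.items():
        if cams != reference:
            raise ValueError(
                f"{track}: camera folders {sorted(cams)} differ from {sorted(reference)}")
    return tracks, sorted(reference)
```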
This section demonstrates how to run stereo visual odometry on the KITTI dataset.
1. Download KITTI Data
   - Visit the KITTI odometry dataset
   - Register for an account
   - Download `data_odometry_gray.zip` (22 GB)
2. Extract Data
   ```shell
   unzip data_odometry_gray.zip -d data/kitti
   ```
3. frames_meta.json File Setup

   KITTI datasets don't include the `frames_meta.json` file required by PyCuSFM. The provided script can generate the `frames_meta.json` file from the KITTI dataset:
   ```shell
   python data/kitti/get_framemeta_file_for_KITTI.py <dataset_dir> [--output-name OUTPUT_NAME]
   ```
   For other custom datasets, you can refer to this script as a reference to generate the required `frames_meta.json` file.
4. Expected Structure
   ```
   data/kitti/
   ├── 00
   │   ├── image_0
   │   │   ├── 000000.png
   │   │   └── ...
   │   ├── image_1
   │   │   ├── 000000.png
   │   │   └── ...
   │   └── frames_meta.json
   ├── 02
   │   └── ...
   └── config/
   ```
Execute the following command for KITTI sequence 00:
```shell
cusfm_cli --input_dir data/kitti/00 --cusfm_base_dir data/kitti/00_result --config_dir data/kitti/config
```
Note: The experiments in the cuSFM paper were conducted using pose graph optimization without data association. You can use the `--skip_data_association` flag to skip the data association step.
The bundle_adjustment_runner is a standalone tool that performs bundle adjustment optimization on COLMAP sparse reconstruction data using the Isaac Visual Mapping bundle adjustment infrastructure. This tool is useful when you have existing COLMAP sparse model files and want to refine them using the robust Isaac bundle adjustment solver.
```shell
cusfm_tool --binary_name bundle_adjustment_runner \
    --args "--colmap_sparse_dir /path/to/colmap/sparse --output_dir /path/to/output [options]"
```
- COLMAP Format Support: Reads standard COLMAP sparse model files (`images.txt`, `cameras.txt`, `points3D.txt`) in both text and binary formats
- Isaac Bundle Adjustment: Uses the same robust bundle adjustment infrastructure as the main cuSFM pipeline
- Configurable Optimization: Supports various loss functions, iteration limits, and solver options
- Output Formats: Saves optimized camera poses, 3D points, and camera parameters in readable formats
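For reference, COLMAP's text-format `cameras.txt` stores one camera per non-comment line as `CAMERA_ID MODEL WIDTH HEIGHT PARAMS[]`. A small standalone reader for inspecting such files (not part of the bundle adjustment runner):

```python
def read_colmap_cameras_txt(path):
    """Parse a COLMAP cameras.txt file into a dict keyed by camera ID.
    Each line: CAMERA_ID MODEL WIDTH HEIGHT PARAMS..."""
    cameras = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split()
            cam_id, model = int(parts[0]), parts[1]
            width, height = int(parts[2]), int(parts[3])
            params = [float(p) for p in parts[4:]]
            cameras[cam_id] = {"model": model, "width": width,
                               "height": height, "params": params}
    return cameras
```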
The bundle adjustment runner can be used as a standalone tool or integrated into the cuSFM pipeline:
- Standalone Usage: Directly optimize existing COLMAP sparse models
- Post-Processing: Refine results from the cuSFM pipeline's COLMAP output
- Comparison: Compare Isaac bundle adjustment results with COLMAP's native bundle adjuster
For detailed usage instructions and configuration options, see the auxiliary tools document.
