A minimal, vanilla implementation of FreeTimeGS built on gsplat for reconstructing dynamic scenes from multi-view video.
Key Features
- 4D Gaussian Primitives - Each Gaussian has position, velocity, time, and duration
- Temporal Motion Model - x(t) = x + v * (t - t_canonical)
- gsplat Backend - Efficient CUDA kernels for fast rendering
- Flexible Optimization - MCMC and DefaultStrategy densification
- Keyframe Processing - Smart sampling for large video sequences
Based on the paper:

FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction
Yifan Wang, Peishan Yang, Zhen Xu, Jiaming Sun, Zhanhua Zhang, Yong Chen, Hujun Bao, Sida Peng, Xiaowei Zhou
CVPR 2025
[Paper] [Project Page]
```
FreeTimeGsVanilla/
│
├── src/ # Core source code
│ ├── simple_trainer_freetime_4d_pure_relocation.py # Main 4D GS trainer
│ ├── combine_frames_fast_keyframes.py # Keyframe point cloud combiner
│ ├── viewer_4d.py # Interactive 4D Gaussian viewer
│ └── utils.py # Utility functions (KNN, colormap, etc.)
│
├── datasets/ # Data loading & processing
│ ├── __init__.py # Package exports
│ ├── FreeTime_dataset.py # Dataset class (COLMAP poses, images)
│ ├── normalize.py # Scene normalization utilities
│ ├── traj.py # Camera trajectory generation
│ └── read_write_model.py # COLMAP binary/text I/O
│
├── run_pipeline.sh # Full pipeline (combine + train)
├── run_small.sh # Quick training (4M points)
├── run_full.sh # Full training (15M points)
│
├── LICENSE # AGPL-3.0 license
└── README.md # This file
```
The training pipeline consists of two main steps:
1. Point Cloud Preparation (`src/combine_frames_fast_keyframes.py`):
   - Loads per-frame triangulated 3D points
   - Extracts keyframes at specified intervals
   - Estimates velocity using k-NN matching between consecutive keyframes
   - Outputs an NPZ file with positions, velocities, colors, and timestamps
2. 4D Gaussian Training (`src/simple_trainer_freetime_4d_pure_relocation.py`):
   - Initializes 4D Gaussians from the NPZ file
   - Trains with temporal parameters (position, velocity, time, duration)
   - Outputs PLY sequences and trajectory videos
Processing every single frame of a video is computationally expensive and often redundant, since adjacent frames are typically very similar. Instead, we use keyframes: frames sampled at regular intervals.
The `--keyframe-step` parameter controls the interval between keyframes:
- Step = 1: Use ALL frames (no skipping) - most accurate but slowest
- Step = 5: Use every 5th frame (0, 5, 10, 15, ...) - good balance
- Step = 10: Use every 10th frame - faster but less temporal detail
Example: for a 60-frame video with `--keyframe-step 5`:

```
Frames:    0 1 2 3 4 5 6 7 8 9 10 11 12 ... 55 56 57 58 59
Keyframes: *         *         *       ...  *
           0         5         10           55
```
This extracts 12 keyframes instead of 60 frames, reducing memory and computation by ~5x while preserving motion information.
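A quick check of the arithmetic (plain Python, illustrative only):

```python
frame_start, frame_end, keyframe_step = 0, 60, 5
keyframes = list(range(frame_start, frame_end, keyframe_step))
print(len(keyframes), keyframes[0], keyframes[-1])  # 12 keyframes, 0 through 55
```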
Velocity is computed between consecutive keyframes (not all frames):
```
v = (position_keyframe[t+step] - position_keyframe[t]) / step
```
This gives the average velocity over the keyframe interval.
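The k-NN matching step is sketched below, assuming SciPy is available; `estimate_velocity` and its arguments are illustrative names, not the script's actual API:

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_velocity(pts_t, pts_t_next, step, max_dist=0.5):
    """Match each point at keyframe t to its nearest neighbor at keyframe
    t+step; matches farther than max_dist are treated as unreliable."""
    dist, idx = cKDTree(pts_t_next).query(pts_t, k=1)
    has_velocity = dist < max_dist                    # reject bad matches
    velocities = np.zeros_like(pts_t, dtype=np.float32)
    velocities[has_velocity] = (
        pts_t_next[idx[has_velocity]] - pts_t[has_velocity]
    ) / step
    return velocities, has_velocity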
The NPZ file contains the initial 4D Gaussian data:
| Field | Shape | Description |
|---|---|---|
| `positions` | [N, 3] | 3D coordinates (x, y, z) |
| `velocities` | [N, 3] | Velocity vectors (vx, vy, vz) |
| `colors` | [N, 3] | RGB colors normalized to [0, 1] |
| `times` | [N, 1] | Normalized timestamps in [0, 1] |
| `durations` | [N, 1] | Temporal duration (visibility window) |
| `has_velocity` | [N] | Boolean mask for valid velocity estimates |
Metadata fields:
- `frame_start`, `frame_end`: Frame range
- `n_keyframes`: Number of keyframes used
- `keyframe_step`: Step between keyframes
- `mode`: Processing mode identifier
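To sanity-check a generated file (a quick sketch; the path is whatever you passed to `--output-path`, and the metadata fields above are assumed to be stored as regular NPZ entries):

```python
import numpy as np

data = np.load("keyframes.npz")
for key in ("positions", "velocities", "colors", "times", "durations", "has_velocity"):
    print(f"{key}: shape={data[key].shape}, dtype={data[key].dtype}")
print("keyframe_step:", data["keyframe_step"])
```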
To build a custom initialization NPZ, write the same fields yourself. A minimal single-frame sketch (all points at t = 0 with zero velocity; a real pipeline would combine all keyframes and estimate velocities):

```python
import numpy as np

# Your triangulated point cloud for one frame
points_frame_0 = np.load("points3d_frame000000.npy")  # [M, 3]
colors_frame_0 = np.load("colors_frame000000.npy")    # [M, 3], values 0-255

# Illustrative defaults: everything at t=0, no motion
n = points_frame_0.shape[0]
positions = points_frame_0.astype(np.float32)       # [N, 3]
velocities = np.zeros((n, 3), dtype=np.float32)     # [N, 3]
times = np.zeros((n, 1), dtype=np.float32)          # [N, 1], normalized to [0, 1]
durations = np.full((n, 1), 0.1, dtype=np.float32)  # [N, 1]
has_velocity = np.zeros(n, dtype=bool)              # [N]

# Combine and save
np.savez(
    "init_points.npz",
    positions=positions,
    velocities=velocities,
    colors=colors_frame_0.astype(np.float32) / 255.0,  # normalized to [0, 1]
    times=times,
    durations=durations,
    has_velocity=has_velocity,
)
```

The `src/combine_frames_fast_keyframes.py` script expects:
```
input_dir/
├── points3d_frame000000.npy # [M, 3] float32 - 3D positions
├── colors_frame000000.npy # [M, 3] float32 - RGB colors (0-255)
├── points3d_frame000001.npy
├── colors_frame000001.npy
├── ...
└── points3d_frameXXXXXX.npy
```
These are typically generated by triangulating matched features across camera views.
The trainer expects a COLMAP sparse reconstruction:
```
data_dir/
├── images/ # Or images_Nx/ for downsampled
│ ├── cam01_frame000000.jpg
│ └── ...
└── sparse/
└── 0/
├── cameras.bin
├── images.bin
└── points3D.bin
```
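For a quick sanity check of the sparse model, the bundled I/O module can be used, assuming it matches COLMAP's reference read_write_model.py (which provides a `read_model(path, ext)` helper):

```python
from datasets.read_write_model import read_model

# Load the binary sparse reconstruction and report its size
cameras, images, points3D = read_model("data_dir/sparse/0", ext=".bin")
print(f"{len(cameras)} cameras, {len(images)} images, {len(points3D)} 3D points")
```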
```bash
# Arguments, in order: input dir (per-frame NPY files), COLMAP data dir,
# output dir, start frame, end frame, keyframe step, GPU ID, config name
bash run_pipeline.sh \
    /path/to/triangulation/output \
    /path/to/colmap/data \
    /path/to/results \
    0 \
    61 \
    5 \
    0 \
    default_keyframe_small
```

Step 1: Combine keyframes
```bash
python src/combine_frames_fast_keyframes.py \
    --input-dir /path/to/triangulation/output \
    --output-path /path/to/keyframes.npz \
    --frame-start 0 \
    --frame-end 60 \
    --keyframe-step 5
```

Step 2: Train 4D Gaussians
```bash
CUDA_VISIBLE_DEVICES=0 python src/simple_trainer_freetime_4d_pure_relocation.py default_keyframe \
    --data-dir /path/to/colmap/data \
    --init-npz-path /path/to/keyframes.npz \
    --result-dir /path/to/results \
    --start-frame 0 \
    --end-frame 61 \
    --max-steps 30000
```

Available configs:

| Config | Points | Description |
|---|---|---|
| `default_keyframe` | ~15M | Full resolution, higher quality |
| `default_keyframe_small` | ~4M | Reduced points, faster training |
After training, you'll find:
```
results/
├── ckpts/
│ └── ckpt_30000.pt # Model checkpoint
├── videos/
│ ├── traj_4d_step30000.mp4 # RGB trajectory video
│ ├── traj_duration_step30000.mp4 # Duration heatmap
│ └── traj_velocity_step30000.mp4 # Velocity heatmap
├── ply_sequence_step30000/
│ ├── frame_000000.ply # Per-frame PLY exports
│ └── ...
└── tb/ # TensorBoard logs
```
An interactive viewer for visualizing trained 4D Gaussian Splatting models with temporal animation.
The viewer requires additional dependencies:
```bash
# Core dependencies
pip install torch torchvision   # PyTorch 2.0+

# Gaussian splatting backend
pip install gsplat              # or: pip install git+https://github.com/nerfstudio-project/gsplat.git

# Viewer dependencies
pip install viser nerfview numpy
```

Verify installation:
python -c "import viser; import nerfview; import gsplat; print('All dependencies installed!')"CUDA_VISIBLE_DEVICES=0 python src/viewer_4d.py \
--ckpt /path/to/results/ckpts/ckpt_30000.pt \
--port 8080 \
--total-frames 60 \
--temporal-threshold 0.05 \
--spatial-percentile 95Then open http://localhost:8080 in your browser.
The checkpoint file contains all trained 4D Gaussian parameters:
```python
checkpoint = {
    "splats": {
        "means": tensor[N, 3],       # Canonical 3D positions
        "scales": tensor[N, 3],      # Log-scale parameters
        "quats": tensor[N, 4],       # Rotation quaternions (wxyz)
        "opacities": tensor[N],      # Logit opacities
        "sh0": tensor[N, 1, 3],      # DC spherical harmonics
        "shN": tensor[N, K, 3],      # Higher-order SH coefficients
        # 4D temporal parameters:
        "times": tensor[N, 1],       # Canonical time (when Gaussian is most visible)
        "durations": tensor[N, 1],   # Log temporal duration (visibility window width)
        "velocities": tensor[N, 3],  # Linear velocity vectors
    },
    "step": int,                     # Training step
    ...
}
```
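Loading and inspecting a checkpoint follows directly from this layout (a minimal sketch):

```python
import torch

ckpt = torch.load("results/ckpts/ckpt_30000.pt", map_location="cpu")
splats = ckpt["splats"]
print("trained steps:", ckpt["step"])
for name in ("means", "velocities", "times", "durations"):
    print(name, tuple(splats[name].shape))
```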
Viewer command-line arguments:

| Argument | Default | Description |
|---|---|---|
| `--ckpt` | required | Path to trained checkpoint .pt file |
| `--port` | 8080 | HTTP port for the viewer |
| `--device` | cuda | Device to use (cuda, cuda:0, cuda:1, etc.) |
| `--total-frames` | 300 | Total number of frames in the sequence |
| `--temporal-threshold` | 0.01 | Minimum temporal opacity to render a Gaussian |
| `--spatial-percentile` | 95 | Percentile of points to keep (removes outliers) |
| `--no-spatial-filter` | False | Disable spatial filtering |
| `--no-precompute` | False | Disable precomputing visibility masks |
| `--sh-degree` | 3 | Spherical harmonics degree |
Controls which Gaussians are rendered at each frame based on their temporal opacity.
Each Gaussian has a temporal opacity computed as:
```
temporal_opacity(t) = exp(-0.5 * ((t - t_canonical) / duration)^2)
```
- Lower threshold (0.01): More Gaussians visible, smoother but slower
- Higher threshold (0.1): Fewer Gaussians, faster but may show gaps
Temporal opacity vs. time for a Gaussian centered at t = 0.5:

```
 1.0 |         ****
     |       *      *
 0.5 |      *        *
     |     *          *
0.05 |----*------------*----  <- threshold
     |   *              *
 0.0 +----------------------> time
     0.0       0.5       1.0

Gaussian visible when opacity > threshold
```
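In code, the per-frame visibility test is a couple of tensor ops (a sketch with stand-in tensors; note that durations are stored in log space in the checkpoint, hence the exp()):

```python
import torch

t, threshold = 0.5, 0.05
times = torch.rand(1_000_000, 1)                  # stand-in for splats["times"]
log_durations = torch.full((1_000_000, 1), -2.3)  # stand-in, log(0.1)

opacity = torch.exp(-0.5 * ((t - times) / log_durations.exp()) ** 2)
visible = (opacity > threshold).squeeze(-1)       # Boolean mask per Gaussian
print(f"{visible.float().mean():.1%} of Gaussians visible at t={t}")
```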
Removes outlier Gaussians that are far from the scene center.
- 95%: Keep Gaussians within the 95th percentile distance from center (removes 5% outliers)
- 99%: Keep more Gaussians (removes only 1% outliers)
- 100%: Keep all Gaussians (no spatial filtering)
This is useful when training produces "floater" artifacts far from the main scene.
Example with 5M Gaussians:

```
┌─────────────────────────────────────┐
│  ·      ·            ·       ·      │ <- outliers (removed)
│    ┌───────────────────────┐        │
│    │ * * * * * * * * * * * │        │
│    │ * * * * SCENE * * * * │        │ <- 95% kept
│    │ * * * * * * * * * * * │        │
│    └───────────────────────┘        │
│         ·            ·              │ <- outliers (removed)
└─────────────────────────────────────┘
```
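The filter itself reduces to a distance percentile (a sketch over random stand-in positions; the viewer's exact implementation may differ):

```python
import torch

means = torch.randn(1_000_000, 3)          # stand-in for splats["means"]
center = means.median(dim=0).values        # robust scene center
dist = (means - center).norm(dim=1)        # distance of every Gaussian
keep = dist <= torch.quantile(dist, 0.95)  # --spatial-percentile 95
print(f"kept {keep.sum().item()} of {len(keep)} Gaussians")
```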
Once the viewer is running, you can control it through the web interface:
Animation Panel:
- Frame Slider: Manually scrub through time
- Auto Play: Toggle automatic playback
- Play Speed (FPS): Control playback speed (1-60 FPS)
Visibility Filtering Panel:
- Temporal Opacity Threshold: Adjust visibility threshold in real-time
- Use Visibility Mask: Toggle efficient rendering on/off
Camera Controls (in browser):
- Left-click + drag: Rotate camera
- Right-click + drag: Pan camera
- Scroll: Zoom in/out
The viewer uses multi-level filtering for efficient rendering:
| Filter Stage | Purpose | Typical Reduction |
|---|---|---|
| Spatial filter | Remove outliers | 100% → 96% |
| Base opacity filter | Remove transparent Gaussians | 96% → 95% |
| Temporal filter | Only render temporally-visible | 95% → 8% |
Result: Only ~8% of Gaussians are rendered per frame, enabling interactive framerates with millions of Gaussians.
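The stages compose as plain boolean masks (an illustrative sketch with random stand-ins at the table's typical rates; the static stages are computed once, the temporal one per frame):

```python
import torch

N = 1_000_000
spatial_mask = torch.rand(N) < 0.96   # stage 1: outlier removal
opacity_mask = torch.rand(N) < 0.99   # stage 2: base opacity
temporal_mask = torch.rand(N) < 0.09  # stage 3: temporal visibility (per frame)

static_mask = spatial_mask & opacity_mask  # cached once
render_mask = static_mask & temporal_mask  # recomputed (or precomputed) per frame
print(f"rendering {render_mask.float().mean():.1%} of Gaussians this frame")
```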
Basic viewing:

```bash
python src/viewer_4d.py --ckpt results/ckpts/ckpt_30000.pt --total-frames 60
```

High-quality (show more Gaussians):

```bash
python src/viewer_4d.py \
    --ckpt results/ckpts/ckpt_30000.pt \
    --total-frames 60 \
    --temporal-threshold 0.01 \
    --spatial-percentile 99
```

Fast preview (fewer Gaussians):

```bash
python src/viewer_4d.py \
    --ckpt results/ckpts/ckpt_30000.pt \
    --total-frames 60 \
    --temporal-threshold 0.1 \
    --spatial-percentile 90
```

Debug mode (no filtering):

```bash
python src/viewer_4d.py \
    --ckpt results/ckpts/ckpt_30000.pt \
    --total-frames 60 \
    --no-spatial-filter \
    --temporal-threshold 0.0
```

Key parameters for `src/combine_frames_fast_keyframes.py`:

| Parameter | Default | Description |
|---|---|---|
| `--keyframe-step` | 5 | Frames between keyframes |
| `--max-velocity-distance` | 0.5 | Maximum k-NN match distance for velocity estimation |
| `--sample-ratio` | 1.0 | Point subsampling ratio |
Key parameters for `src/simple_trainer_freetime_4d_pure_relocation.py`:

| Parameter | Default | Description |
|---|---|---|
| `--max-steps` | 60000 | Training iterations |
| `--init-duration` | 0.1 | Initial temporal duration |
| `--velocity-lr-start` | 5e-3 | Initial velocity learning rate |
| `--velocity-lr-end` | 1e-4 | Final velocity learning rate |
| `--lambda-4d-reg` | 1e-3 | 4D regularization weight |
Each Gaussian has 8 learnable parameter groups:
- Position (x): [N, 3] - Canonical 3D position
- Time (t): [N, 1] - When the Gaussian is most visible
- Duration (s): [N, 1] - Temporal width
- Velocity (v): [N, 3] - Linear velocity
- Scale: [N, 3] - 3D scale
- Quaternion: [N, 4] - Rotation
- Opacity: [N] - Base opacity
- Spherical Harmonics: [N, K, 3] - View-dependent color
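For reference, here is a minimal construction of these groups (illustrative shapes and init values; K = 15 assumes SH degree 3, and this is not the trainer's exact code):

```python
import math
import torch

N, K = 100_000, 15  # K = (3 + 1)^2 - 1 higher-order SH coefficients
splats = torch.nn.ParameterDict({
    "means":      torch.nn.Parameter(torch.zeros(N, 3)),
    "times":      torch.nn.Parameter(torch.rand(N, 1)),
    "durations":  torch.nn.Parameter(torch.full((N, 1), math.log(0.1))),  # log space
    "velocities": torch.nn.Parameter(torch.zeros(N, 3)),
    "scales":     torch.nn.Parameter(torch.zeros(N, 3)),                  # log scale
    "quats":      torch.nn.Parameter(torch.tensor([1.0, 0.0, 0.0, 0.0]).repeat(N, 1)),
    "opacities":  torch.nn.Parameter(torch.zeros(N)),                     # logits
    "sh0":        torch.nn.Parameter(torch.zeros(N, 1, 3)),
    "shN":        torch.nn.Parameter(torch.zeros(N, K, 3)),
})
```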
Position at time t:

```
x(t) = x + v * (t - t_canonical)
```

Temporal opacity (Gaussian falloff):

```
opacity(t) = exp(-0.5 * ((t - t_canonical) / duration)^2)
```
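Putting both together, evaluating the scene at a query time t takes a few tensor ops (a sketch assuming the checkpoint layout above, with durations stored in log space):

```python
import torch

def eval_at_time(splats, t):
    """Return positions and temporal opacities of all Gaussians at time t."""
    dt = t - splats["times"]                                 # [N, 1]
    positions = splats["means"] + splats["velocities"] * dt  # x(t) = x + v * (t - t_c)
    temporal_opacity = torch.exp(-0.5 * (dt / splats["durations"].exp()) ** 2)
    return positions, temporal_opacity.squeeze(-1)
```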
If you find this work useful, please cite the original paper:
```bibtex
@InProceedings{Wang_2025_CVPR,
    author    = {Wang, Yifan and Yang, Peishan and Xu, Zhen and Sun, Jiaming and Zhang, Zhanhua and Chen, Yong and Bao, Hujun and Peng, Sida and Zhou, Xiaowei},
    title     = {FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {21750-21760}
}
```

This project is licensed under the GNU Affero General Public License v3.0; see the LICENSE file for details.
