FreeTimeGSVanilla

Gsplat-based 4D Gaussian Splatting for Dynamic Scenes

[FreeTimeGS Demo]

License: AGPL v3 · Python 3.10+

A minimal, vanilla implementation of FreeTimeGS built on gsplat for reconstructing dynamic scenes from multi-view video.

Key Features

  • 4D Gaussian Primitives - Each Gaussian has position, velocity, time, and duration
  • Temporal Motion Model - x(t) = x + v * (t - t_canonical)
  • gsplat Backend - Efficient CUDA kernels for fast rendering
  • Flexible Optimization - MCMC and DefaultStrategy densification
  • Keyframe Processing - Smart sampling for large video sequences

Based on the paper: FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction. Yifan Wang, Peishan Yang, Zhen Xu, Jiaming Sun, Zhanhua Zhang, Yong Chen, Hujun Bao, Sida Peng, Xiaowei Zhou. CVPR 2025. [Paper] [Project Page]


Repository Structure

FreeTimeGsVanilla/
│
├── src/                          # Core source code
│   ├── simple_trainer_freetime_4d_pure_relocation.py   # Main 4D GS trainer
│   ├── combine_frames_fast_keyframes.py                # Keyframe point cloud combiner
│   ├── viewer_4d.py                                    # Interactive 4D Gaussian viewer
│   └── utils.py                                        # Utility functions (KNN, colormap, etc.)
│
├── datasets/                     # Data loading & processing
│   ├── __init__.py               # Package exports
│   ├── FreeTime_dataset.py       # Dataset class (COLMAP poses, images)
│   ├── normalize.py              # Scene normalization utilities
│   ├── traj.py                   # Camera trajectory generation
│   └── read_write_model.py       # COLMAP binary/text I/O
│
├── run_pipeline.sh               # Full pipeline (combine + train)
├── run_small.sh                  # Quick training (4M points)
├── run_full.sh                   # Full training (15M points)
│
├── LICENSE                       # AGPL-3.0 license
└── README.md                     # This file

Pipeline Overview

The training pipeline consists of two main steps:

  1. Point Cloud Preparation (src/combine_frames_fast_keyframes.py):

    • Loads per-frame triangulated 3D points
    • Extracts keyframes at specified intervals
    • Estimates velocity using k-NN matching between consecutive keyframes
    • Outputs an NPZ file with positions, velocities, colors, and timestamps
  2. 4D Gaussian Training (src/simple_trainer_freetime_4d_pure_relocation.py):

    • Initializes 4D Gaussians from the NPZ file
    • Trains with temporal parameters (position, velocity, time, duration)
    • Outputs PLY sequences and trajectory videos

Keyframes vs All Frames (Stride/Step)

Why Keyframes?

Processing every frame of a video is computationally expensive and often redundant, since adjacent frames are typically very similar. Instead, we use keyframes: frames sampled at regular intervals.

Keyframe Step (Stride)

The --keyframe-step parameter controls how many frames to skip between keyframes:

  • Step = 1: Use ALL frames (no skipping) - most accurate but slowest
  • Step = 5: Use every 5th frame (0, 5, 10, 15, ...) - good balance
  • Step = 10: Use every 10th frame - faster but less temporal detail

Example: For a 60-frame video with --keyframe-step 5:

Frames:    0  1  2  3  4  5  6  7  8  9  10 11 12 ... 55 56 57 58 59
Keyframes: *              *              *              *
           0              5              10             55

This extracts 12 keyframes instead of 60 frames, reducing memory and computation by ~5x while preserving motion information.
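
As a minimal sketch, keyframe selection is just strided indexing (the endpoint values below mirror the example above):

frame_start, frame_end, keyframe_step = 0, 60, 5

keyframe_indices = list(range(frame_start, frame_end, keyframe_step))
print(keyframe_indices)       # [0, 5, 10, ..., 55]
print(len(keyframe_indices))  # 12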

Velocity Estimation

Velocity is computed between consecutive keyframes (not all frames):

v = (position_keyframe[t+step] - position_keyframe[t]) / step

This gives the average velocity over the keyframe interval.
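
For intuition, here is a minimal sketch of this nearest-neighbor matching using SciPy's cKDTree; the max_dist cutoff plays the role of --max-velocity-distance, though the exact logic in src/combine_frames_fast_keyframes.py may differ:

import numpy as np
from scipy.spatial import cKDTree

def estimate_velocities(pts_t, pts_t_next, step, max_dist=0.5):
    # Match each point in keyframe t to its nearest neighbor in keyframe t+step.
    tree = cKDTree(pts_t_next)
    dists, idx = tree.query(pts_t, k=1)
    # Average velocity over the keyframe interval, in position units per frame.
    velocities = (pts_t_next[idx] - pts_t) / step
    # Reject matches that are too far apart to be the same physical point.
    has_velocity = dists < max_dist
    velocities[~has_velocity] = 0.0
    return velocities.astype(np.float32), has_velocity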

NPZ File Format

The NPZ file contains the initial 4D Gaussian data:

Field         Shape   Description
positions     [N, 3]  3D coordinates (x, y, z)
velocities    [N, 3]  Velocity vectors (vx, vy, vz)
colors        [N, 3]  RGB colors normalized to [0, 1]
times         [N, 1]  Normalized timestamps in [0, 1]
durations     [N, 1]  Temporal duration (visibility window)
has_velocity  [N]     Boolean mask for valid velocity estimates

Metadata fields:

  • frame_start, frame_end: Frame range
  • n_keyframes: Number of keyframes used
  • keyframe_step: Step between keyframes
  • mode: Processing mode identifier

Example NPZ Creation

import numpy as np

# Triangulated point cloud for one frame (positions + RGB colors)
positions = np.load("points3d_frame000000.npy").astype(np.float32)  # [N, 3]
colors = np.load("colors_frame000000.npy").astype(np.float32)       # [N, 3], values 0-255

# Placeholder temporal fields for this sketch; the combiner script fills
# these in from keyframe matching.
N = positions.shape[0]
velocities = np.zeros((N, 3), dtype=np.float32)
times = np.zeros((N, 1), dtype=np.float32)          # all points at t = 0
durations = np.full((N, 1), 0.1, dtype=np.float32)  # initial temporal duration
has_velocity = np.zeros(N, dtype=bool)

# Combine and save
np.savez(
    "init_points.npz",
    positions=positions,       # [N, 3] float32
    velocities=velocities,     # [N, 3] float32
    colors=colors / 255.0,     # [N, 3] float32, normalized to [0, 1]
    times=times,               # [N, 1] float32, normalized to [0, 1]
    durations=durations,       # [N, 1] float32
    has_velocity=has_velocity  # [N] bool
)

Input Requirements

Per-Frame Point Cloud Files

The src/combine_frames_fast_keyframes.py script expects:

input_dir/
├── points3d_frame000000.npy   # [M, 3] float32 - 3D positions
├── colors_frame000000.npy     # [M, 3] float32 - RGB colors (0-255)
├── points3d_frame000001.npy
├── colors_frame000001.npy
├── ...
└── points3d_frameXXXXXX.npy

These are typically generated by triangulating matched features across camera views.

COLMAP Data

The trainer expects a COLMAP sparse reconstruction:

data_dir/
├── images/                    # Or images_Nx/ for downsampled
│   ├── cam01_frame000000.jpg
│   └── ...
└── sparse/
    └── 0/
        ├── cameras.bin
        ├── images.bin
        └── points3D.bin

Usage

Full Pipeline

# Arguments: input dir (per-frame NPY files), COLMAP data dir, output dir,
# start frame, end frame, keyframe step, GPU ID, config name
bash run_pipeline.sh \
    /path/to/triangulation/output \
    /path/to/colmap/data \
    /path/to/results \
    0 \
    61 \
    5 \
    0 \
    default_keyframe_small

Step by Step

Step 1: Combine keyframes

python src/combine_frames_fast_keyframes.py \
    --input-dir /path/to/triangulation/output \
    --output-path /path/to/keyframes.npz \
    --frame-start 0 \
    --frame-end 60 \
    --keyframe-step 5

Step 2: Train 4D Gaussians

CUDA_VISIBLE_DEVICES=0 python src/simple_trainer_freetime_4d_pure_relocation.py default_keyframe \
    --data-dir /path/to/colmap/data \
    --init-npz-path /path/to/keyframes.npz \
    --result-dir /path/to/results \
    --start-frame 0 \
    --end-frame 61 \
    --max-steps 30000

Available Configs

Config                  Points  Description
default_keyframe        ~15M    Full resolution, higher quality
default_keyframe_small  ~4M     Reduced points, faster training

Outputs

After training, you'll find:

results/
├── ckpts/
│   └── ckpt_30000.pt              # Model checkpoint
├── videos/
│   ├── traj_4d_step30000.mp4      # RGB trajectory video
│   ├── traj_duration_step30000.mp4    # Duration heatmap
│   └── traj_velocity_step30000.mp4    # Velocity heatmap
├── ply_sequence_step30000/
│   ├── frame_000000.ply           # Per-frame PLY exports
│   └── ...
└── tb/                            # TensorBoard logs

4D Viewer

An interactive viewer for visualizing trained 4D Gaussian Splatting models with temporal animation.

Installation

The viewer requires additional dependencies:

# Core dependencies
pip install torch torchvision  # PyTorch 2.0+

# Gaussian splatting backend
pip install gsplat  # or: pip install git+https://github.com/nerfstudio-project/gsplat.git

# Viewer dependencies
pip install viser nerfview numpy

Verify installation:

python -c "import viser; import nerfview; import gsplat; print('All dependencies installed!')"

Quick Start

CUDA_VISIBLE_DEVICES=0 python src/viewer_4d.py \
    --ckpt /path/to/results/ckpts/ckpt_30000.pt \
    --port 8080 \
    --total-frames 60 \
    --temporal-threshold 0.05 \
    --spatial-percentile 95

Then open http://localhost:8080 in your browser.

Checkpoint File Format (.pt)

The checkpoint file contains all trained 4D Gaussian parameters:

checkpoint = {
    "splats": {
        "means": tensor[N, 3],       # Canonical 3D positions
        "scales": tensor[N, 3],      # Log-scale parameters
        "quats": tensor[N, 4],       # Rotation quaternions (wxyz)
        "opacities": tensor[N],      # Logit opacities
        "sh0": tensor[N, 1, 3],      # DC spherical harmonics
        "shN": tensor[N, K, 3],      # Higher-order SH coefficients
        # 4D temporal parameters:
        "times": tensor[N, 1],       # Canonical time (when Gaussian is most visible)
        "durations": tensor[N, 1],   # Log temporal duration (visibility window width)
        "velocities": tensor[N, 3],  # Linear velocity vectors
    },
    "step": int,                     # Training step
    ...
}
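
A minimal sketch for loading and inspecting such a checkpoint (field names follow the schema above):

import torch

ckpt = torch.load("results/ckpts/ckpt_30000.pt", map_location="cpu")
splats = ckpt["splats"]
for name, tensor in splats.items():
    print(f"{name:12s} {tuple(tensor.shape)}")
print("trained for", ckpt["step"], "steps")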

Command Line Arguments

Argument              Default   Description
--ckpt                required  Path to trained checkpoint .pt file
--port                8080      HTTP port for the viewer
--device              cuda      Device to use (cuda, cuda:0, cuda:1, etc.)
--total-frames        300       Total number of frames in the sequence
--temporal-threshold  0.01      Minimum temporal opacity to render a Gaussian
--spatial-percentile  95        Percentile of points to keep (removes outliers)
--no-spatial-filter   False     Disable spatial filtering
--no-precompute       False     Disable precomputing visibility masks
--sh-degree           3         Spherical harmonics degree

Understanding Key Parameters

--temporal-threshold

Controls which Gaussians are rendered at each frame based on their temporal opacity.

Each Gaussian has a temporal opacity computed as:

temporal_opacity(t) = exp(-0.5 * ((t - t_canonical) / duration)^2)

  • Lower threshold (0.01): more Gaussians visible, smoother but slower
  • Higher threshold (0.1): fewer Gaussians, faster but may show gaps

Temporal opacity vs. time for a Gaussian centered at t = 0.5:

    1.0 |       ****
        |      *    *
    0.5 |     *      *
        |    *        *
  0.05 -|---*----------*--- threshold
        |  *            *
    0.0 +-------------------> time
        0.0    0.5    1.0
              ^
          Gaussian visible when opacity > threshold
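
In code, this visibility test might look like the following sketch; note that the checkpoint stores log durations, so they are assumed to have been converted to linear space first:

import torch

def temporal_visibility_mask(t, times, durations, threshold=0.05):
    # Gaussian falloff around each primitive's canonical time.
    temporal_opacity = torch.exp(-0.5 * ((t - times) / durations) ** 2)  # [N, 1]
    return (temporal_opacity > threshold).squeeze(-1)                    # [N] bool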

--spatial-percentile

Removes outlier Gaussians that are far from the scene center.

  • 95%: Keep Gaussians within the 95th percentile distance from center (removes 5% outliers)
  • 99%: Keep more Gaussians (removes only 1% outliers)
  • 100%: Keep all Gaussians (no spatial filtering)

This is useful when training produces "floater" artifacts far from the main scene.

Example with 5M Gaussians:
┌─────────────────────────────────────┐
│  · ·                            · · │  <- outliers (removed)
│      ┌───────────────────────┐      │
│      │  * * * * * * * * * *  │      │  <- 95% kept
│      │  * * * SCENE * * * *  │      │
│      │  * * * * * * * * * *  │      │
│      └───────────────────────┘      │
│  ·                              ·   │  <- outliers (removed)
└─────────────────────────────────────┘
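
A minimal sketch of this filter, assuming the scene center is taken as the per-axis median (the viewer's exact centering may differ):

import torch

def spatial_filter_mask(means, percentile=95.0):
    center = means.median(dim=0).values    # robust scene center, [3]
    dist = (means - center).norm(dim=-1)   # distance of each Gaussian, [N]
    cutoff = torch.quantile(dist, percentile / 100.0)
    return dist <= cutoff                  # [N] bool, True = keep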

Viewer UI Controls

Once the viewer is running, you can control it through the web interface:

Animation Panel:

  • Frame Slider: Manually scrub through time
  • Auto Play: Toggle automatic playback
  • Play Speed (FPS): Control playback speed (1-60 FPS)

Visibility Filtering Panel:

  • Temporal Opacity Threshold: Adjust visibility threshold in real-time
  • Use Visibility Mask: Toggle efficient rendering on/off

Camera Controls (in browser):

  • Left-click + drag: Rotate camera
  • Right-click + drag: Pan camera
  • Scroll: Zoom in/out

Efficiency: Visibility Masking

The viewer uses multi-level filtering for efficient rendering:

Filter Stage         Purpose                               Typical Reduction
Spatial filter       Remove outliers                       100% → 96%
Base opacity filter  Remove transparent Gaussians          96% → 95%
Temporal filter      Keep only temporally visible ones     95% → 8%

Result: Only ~8% of Gaussians are rendered per frame, enabling interactive framerates with millions of Gaussians.
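
Putting the stages together, here is a sketch of the per-frame mask precomputation that --no-precompute disables; the 0.005 base-opacity cutoff and the two helper functions from the sketches above are illustrative assumptions, not the viewer's exact code:

import torch

def precompute_frame_masks(splats, total_frames, threshold=0.05, percentile=95.0):
    means, times = splats["means"], splats["times"]
    durations = splats["durations"].exp()  # log -> linear space
    # Static filters: computed once for the whole sequence.
    static = spatial_filter_mask(means, percentile)
    static &= torch.sigmoid(splats["opacities"]) > 0.005
    # Temporal filter: one boolean mask per frame.
    return [
        static & temporal_visibility_mask(f / max(total_frames - 1, 1),
                                          times, durations, threshold)
        for f in range(total_frames)
    ]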

Example Usage

Basic viewing:

python src/viewer_4d.py --ckpt results/ckpts/ckpt_30000.pt --total-frames 60

High-quality (show more Gaussians):

python src/viewer_4d.py \
    --ckpt results/ckpts/ckpt_30000.pt \
    --total-frames 60 \
    --temporal-threshold 0.01 \
    --spatial-percentile 99

Fast preview (fewer Gaussians):

python src/viewer_4d.py \
    --ckpt results/ckpts/ckpt_30000.pt \
    --total-frames 60 \
    --temporal-threshold 0.1 \
    --spatial-percentile 90

Debug mode (no filtering):

python src/viewer_4d.py \
    --ckpt results/ckpts/ckpt_30000.pt \
    --total-frames 60 \
    --no-spatial-filter \
    --temporal-threshold 0.0

Key Parameters

Point Cloud Preparation

Parameter                Default  Description
--keyframe-step          5        Frames between keyframes
--max-velocity-distance  0.5      Max k-NN match distance
--sample-ratio           1.0      Point subsampling ratio

Training

Parameter            Default  Description
--max-steps          60000    Training iterations
--init-duration      0.1      Initial temporal duration
--velocity-lr-start  5e-3     Initial velocity learning rate
--velocity-lr-end    1e-4     Final velocity learning rate
--lambda-4d-reg      1e-3     4D regularization weight

4D Gaussian Parameters

Each Gaussian has 8 learnable parameter groups:

  1. Position (x): [N, 3] - Canonical 3D position
  2. Time (t): [N, 1] - When the Gaussian is most visible
  3. Duration (s): [N, 1] - Temporal width
  4. Velocity (v): [N, 3] - Linear velocity
  5. Scale: [N, 3] - 3D scale
  6. Quaternion: [N, 4] - Rotation
  7. Opacity: [N] - Base opacity
  8. Spherical Harmonics: [N, K, 3] - View-dependent color

Motion Model

Position at time t:

x(t) = x + v * (t - t_canonical)

Temporal opacity (Gaussian falloff):

opacity(t) = exp(-0.5 * ((t - t_canonical) / duration)^2)
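
Both formulas together, as a minimal PyTorch sketch (shapes follow the checkpoint schema, so [N, 1] times broadcast against [N, 3] means and velocities):

import torch

def eval_gaussians_at(t, means, velocities, times, durations):
    positions = means + velocities * (t - times)                         # x(t), [N, 3]
    temporal_opacity = torch.exp(-0.5 * ((t - times) / durations) ** 2)  # [N, 1]
    return positions, temporal_opacity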

Citation

If you find this work useful, please cite the original paper:

@InProceedings{Wang_2025_CVPR,
    author    = {Wang, Yifan and Yang, Peishan and Xu, Zhen and Sun, Jiaming and Zhang, Zhanhua and Chen, Yong and Bao, Hujun and Peng, Sida and Zhou, Xiaowei},
    title     = {FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {21750-21760}
}

License

This project is licensed under the GNU Affero General Public License v3.0 - see the LICENSE file for details.
