4 changes: 2 additions & 2 deletions .github/workflows/core_code_checks.yml
Original file line number Diff line number Diff line change
@@ -15,10 +15,10 @@ jobs:

steps:
- uses: actions/checkout@v3
- name: Set up Python 3.8.13
- name: Set up Python 3.11.13
uses: actions/setup-python@v4
with:
python-version: '3.8.13'
python-version: '3.11.13'
- uses: actions/cache@v3
with:
path: ${{ env.pythonLocation }}
26 changes: 26 additions & 0 deletions docs/developer_guides/pipelines/datamanagers.md
@@ -115,6 +115,32 @@ To train splatfacto with a large dataset that's unable to fit in memory, please
ns-train splatfacto --data {PROCESSED_DATA_DIR} --pipeline.datamanager.cache-images disk
```

Check out these flowcharts for more ways to customize training on large datasets!

```{image} imgs/DatamanagerGuide-LargeNeRF-light.png
:align: center
:class: only-light
:width: 600
```

```{image} imgs/DatamanagerGuide-LargeNeRF-dark.png
:align: center
:class: only-dark
:width: 600
```

```{image} imgs/DatamanagerGuide-Large3DGS-light.png
:align: center
:class: only-light
:width: 600
```

```{image} imgs/DatamanagerGuide-Large3DGS-dark.png
:align: center
:class: only-dark
:width: 600
```

## Migrating Your DataManager to the new DataManager
Many methods subclass a DataManager and add extra data to it. If you would like your custom datamanager to also support new parallel features, you can migrate any custom dataloading logic to the new `custom_ray_processor()` API. This function takes in a full training batch (either image or ray bundle) and allows the user to modify or add to it. Let's take a look at an example for the LERF method, which was built on Nerfstudio's VanillaDataManager. This API provides an interface to attach new information to the RayBundle (for ray based methods), Cameras object (for splatting based methods), or ground truth dictionary. It runs in a background process if disk caching is enabled, otherwise it runs in the main process.
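The pattern described above can be sketched as follows. This is an illustrative outline only: the class and hook names mirror the description, but the stand-in base class and the `clip_scales` field are assumptions, not the exact nerfstudio API.

```python
# Hypothetical sketch of migrating custom dataloading logic into a
# custom_ray_processor()-style hook. The base class here is a stand-in
# for a nerfstudio datamanager that calls this hook on every batch.

class ParallelFriendlyDataManager:
    """Stand-in base: calls custom_ray_processor on each full training batch."""

    def custom_ray_processor(self, ray_bundle, batch):
        # Default behavior: pass the training batch through unchanged.
        return ray_bundle, batch


class LERFStyleDataManager(ParallelFriendlyDataManager):
    """Attaches extra per-ray data (e.g. CLIP scales) to each batch."""

    def custom_ray_processor(self, ray_bundle, batch):
        # This hook runs in a background process when disk caching is
        # enabled, so everything added must be computed from the batch itself.
        batch["clip_scales"] = [1.0 for _ in ray_bundle]  # placeholder feature
        return ray_bundle, batch
```

Because the hook receives the complete batch, subclasses only override this one method instead of reimplementing the dataloading loop.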

(Four binary image files added under `docs/developer_guides/pipelines/imgs/` — the flowchart images referenced above — cannot be displayed in the diff view.)
2 changes: 1 addition & 1 deletion docs/developer_guides/viewer/index.md
@@ -2,7 +2,7 @@

> We have a real-time web viewer that requires no installation. It's available at [https://viewer.nerf.studio/](https://viewer.nerf.studio/), where you can connect to your training job.

The viewer is built on [Viser](https://github.com/brentyi/viser/tree/main/viser) using [ThreeJS](https://threejs.org/) and packaged into a [ReactJS](https://reactjs.org/) application. This client viewer application will connect via a websocket to a server running on your machine.
The viewer is built on [Viser](https://github.com/nerfstudio-project/viser) using [ThreeJS](https://threejs.org/) and packaged into a [ReactJS](https://reactjs.org/) application. This client viewer application will connect via a websocket to a server running on your machine.

```{toctree}
:titlesonly:
3 changes: 2 additions & 1 deletion docs/index.md
@@ -154,14 +154,15 @@ This documentation is organized into 3 parts:
- [SIGNeRF](nerfology/methods/signerf.md): Controlled Generative Editing of NeRF Scenes
- [K-Planes](nerfology/methods/kplanes.md): Unified 3D and 4D Radiance Fields
- [LERF](nerfology/methods/lerf.md): Language Embedded Radiance Fields
- [LiveScene](nerfology/methods/livescene.md): Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
- [Feature Splatting](nerfology/methods/feature_splatting.md): Gaussian Feature Splatting based on GSplats
- [Nerfbusters](nerfology/methods/nerfbusters.md): Removing Ghostly Artifacts from Casually Captured NeRFs
- [NeRFPlayer](nerfology/methods/nerfplayer.md): 4D Radiance Fields by Streaming Feature Channels
- [Tetra-NeRF](nerfology/methods/tetranerf.md): Representing Neural Radiance Fields Using Tetrahedra
- [PyNeRF](nerfology/methods/pynerf.md): Pyramidal Neural Radiance Fields
- [SeaThru-NeRF](nerfology/methods/seathru_nerf.md): Neural Radiance Field for subsea scenes
- [Zip-NeRF](nerfology/methods/zipnerf.md): Anti-Aliased Grid-Based Neural Radiance Fields
- [NeRFtoGSandBack](nerfology/methods/nerf2gs2nerf.md): Converting back and forth between NeRF and GS to get the best of both approaches.
- [NeRFtoGSandBack](nerfology/methods/nerf2gs2nerf.md): Converting back and forth between NeRF and GS to get the best of both approaches
- [OpenNeRF](nerfology/methods/opennerf.md): OpenSet 3D Neural Scene Segmentation

**Eager to contribute a method?** We'd love to see you use nerfstudio in implementing new (or even existing) methods! Please view our {ref}`guide<own_method_docs>` for more details about how to add to this list!
1 change: 1 addition & 0 deletions docs/nerfology/methods/index.md
@@ -34,6 +34,7 @@ The following methods are supported in nerfstudio:
SIGNeRF<signerf.md>
K-Planes<kplanes.md>
LERF<lerf.md>
LiveScene<livescene.md>
Feature-Splatting<feature_splatting.md>
Mip-NeRF<mipnerf.md>
NeRF<nerf.md>
101 changes: 101 additions & 0 deletions docs/nerfology/methods/livescene.md
@@ -0,0 +1,101 @@
# LiveScene

<h4>Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control</h4>

```{button-link} https://tavish9.github.io/livescene//
:color: primary
:outline:
Paper Website
```

```{button-link} https://github.com/Tavish9/livescene/
:color: primary
:outline:
Code
```

<video id="demo" muted autoplay playsinline loop controls width="100%">
<source id="mp4" src="https://tavish9.github.io/livescene//static/video/demo.mp4" type="video/mp4">
</video>

**The first scene-level language-embedded interactive radiance field, which efficiently reconstructs and controls complex physical scenes, enabling manipulation of multiple articulated objects and language-based interaction.**

## Installation

First install nerfstudio dependencies. Then run:

```bash
pip install git+https://github.com/Tavish9/livescene
```

## Running LiveScene

Details for running LiveScene (built with Nerfstudio!) can be found [here](https://github.com/Tavish9/livescene).
Once installed, run:

```bash
ns-train livescene --help
```

There is only one default configuration provided, but it can be run on different datasets.

The default configuration is:

| Method | Description | Memory | Quality |
| ----------- | ----------------------------------------------- | ------ | ------- |
| `livescene` | LiveScene with OpenCLIP ViT-B/16, used in paper | ~8 GB | Good |

There are two new dataparsers provided for LiveScene:

| Method | Description | Scene type |
| ---------------- | ------------------------------- | ----------------- |
| `livescene-sim` | OmniSim dataset for LiveScene | Synthetic dataset |
| `livescene-real` | InterReal dataset for LiveScene | Real dataset |

## Method

LiveScene proposes an efficient factorization that decomposes the interactive scene into multiple local deformable fields to separately reconstruct individual interactive objects, achieving the first accurate and independent control on multiple interactive objects in a complex scene. Moreover, LiveScene introduces an interaction-aware language embedding method that generates varying language embeddings to localize individual interactive objects under different interactive states, enabling arbitrary control of interactive objects using natural language.

### Overview

Given a camera view and a control variable $\boldsymbol{\kappa}$ for a specific interactive object, a series of 3D points is sampled in a local deformable field that models the motions of that object, and the object in a novel interactive motion state is then rendered via volume rendering. In addition, an interaction-aware language embedding is used to localize and control individual interactive objects with natural language.

<img id="livescene_pipeline" src="https://tavish9.github.io/livescene//static/image/pipeline.png" style="background-color:white;" width="100%">
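For reference, the rendering step is the standard volume-rendering quadrature (notation assumed here, with per-sample density $\sigma_i$, color $\mathbf{c}_i$, and spacing $\delta_i$ along the ray $\mathbf{r}$, predicted by the local deformable field conditioned on $\boldsymbol{\kappa}$):

$$
\hat{C}(\mathbf{r}, \boldsymbol{\kappa}) = \sum_{i} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i, \qquad T_i = \exp\Big(-\sum_{j<i} \sigma_j \delta_j\Big)
$$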

### Multi-scale Interaction Space Factorization

LiveScene maintains multiple local deformable fields $\left\{\mathcal{R}_1, \mathcal{R}_2, \cdots, \mathcal{R}_\alpha \right\}$, one for each interactive object in the 4D space, and projects high-dimensional interaction features into a compact multi-scale 4D space. During training, LiveScene introduces a feature repulsion loss to amplify the feature differences between distinct deformable scenes, which alleviates boundary ray sampling and feature storage conflicts.

<img id="livescene_factorization" src="https://tavish9.github.io/livescene//static/image/decompose.png" style="background-color:white;" width="100%">

### Interaction-Aware Language Embedding

LiveScene leverages the proposed multi-scale interaction space factorization to efficiently store language features in lightweight planes, indexing by maximum sampling probability instead of using the 3D fields of LERF. For any sampling point $\mathbf{p}$, it retrieves a local language feature group and performs bilinear interpolation over the surrounding CLIP features to obtain a language embedding that adapts to changes in the interaction variable.

<img id="livescene_language" src="https://tavish9.github.io/livescene//static/image/embeds.png" style="background-color:white;" width="100%">
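The bilinear interpolation step above can be sketched in a few lines. This is a generic illustration of interpolating a feature plane at a continuous query location, not the LiveScene implementation; the plane layout and function name are assumptions.

```python
import numpy as np

def interp_language_embedding(plane: np.ndarray, u: float, v: float) -> np.ndarray:
    """Bilinearly interpolate an (H, W, D) feature plane at continuous (u, v)."""
    h, w, _ = plane.shape
    # Clamp the query to the plane bounds.
    u = float(np.clip(u, 0, h - 1))
    v = float(np.clip(v, 0, w - 1))
    # Integer corners surrounding the query point.
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, h - 1), min(v0 + 1, w - 1)
    du, dv = u - u0, v - v0
    # Weighted combination of the four neighboring feature vectors.
    return ((1 - du) * (1 - dv) * plane[u0, v0]
            + (1 - du) * dv * plane[u0, v1]
            + du * (1 - dv) * plane[u1, v0]
            + du * dv * plane[u1, v1])
```

Querying the center of a 2x2 plane returns the average of its four corner features, which is how the embedding varies smoothly as the interaction variable moves between stored states.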

## Dataset

To our knowledge, existing view-synthesis datasets for interactive scene rendering are primarily limited to a few interactive objects, making it impractical to scale up to real scenarios involving multi-object interactions. To bridge this gap, we construct two scene-level, high-quality annotated datasets to advance research on reconstructing and understanding interactive scenes: OmniSim and InterReal, containing 28 subsets and 70 interactive objects with 2 million samples, providing RGBD images, camera trajectories, interactive object masks, prompt captions, and the corresponding object state quantities at each time step.

<video id="dataset" muted autoplay playsinline loop controls width="100%">
<source id="mp4" src="https://tavish9.github.io/livescene//static/video/livescene_dataset.mp4" type="video/mp4">
</video>

## Interaction

For more ways to interact with the viewer, see [here](https://github.com/Tavish9/livescene?tab=readme-ov-file#3-interact-with-viewer).

## BibTeX

If you find our work helpful for your research, please consider citing:

```none
@misc{livescene2024,
title={LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control},
  author={Delin Qu and Qizhi Chen and Pingrui Zhang and Xianqiang Gao and Bin Zhao and Zhigang Wang and Dong Wang and Xuelong Li},
year={2024},
eprint={2406.16038},
archivePrefix={arXiv},
}
```
6 changes: 3 additions & 3 deletions nerfstudio/cameras/camera_utils.py
@@ -172,7 +172,7 @@ def get_interpolated_poses(pose_a: NDArray, pose_b: NDArray, steps: int = 10) ->
quat_b = quaternion_from_matrix(pose_b[:3, :3])

ts = np.linspace(0, 1, steps)
quats = [quaternion_slerp(quat_a, quat_b, t) for t in ts]
quats = [quaternion_slerp(quat_a, quat_b, float(t)) for t in ts]
trans = [(1 - t) * pose_a[:3, 3] + t * pose_b[:3, 3] for t in ts]

poses_ab = []
@@ -199,7 +199,7 @@ def get_interpolated_k(
List of interpolated camera poses
"""
Ks: List[Float[Tensor, "3 3"]] = []
ts = np.linspace(0, 1, steps)
ts = torch.linspace(0, 1, steps, dtype=k_a.dtype, device=k_a.device)
for t in ts:
new_k = k_a * (1.0 - t) + k_b * t
Ks.append(new_k)
@@ -218,7 +218,7 @@ def get_interpolated_time(
steps: number of steps the interpolated pose path should contain
"""
times: List[Float[Tensor, "1"]] = []
ts = np.linspace(0, 1, steps)
ts = torch.linspace(0, 1, steps, dtype=time_a.dtype, device=time_a.device)
for t in ts:
new_t = time_a * (1.0 - t) + time_b * t
times.append(new_t)
31 changes: 31 additions & 0 deletions nerfstudio/cameras/cameras.py
@@ -1021,3 +1021,34 @@ def rescale_output_resolution(
self.width = torch.ceil(self.width * scaling_factor).to(torch.int64)
else:
raise ValueError("Scale rounding mode must be 'floor', 'round' or 'ceil'.")

def update_tiling_intrinsics(self, tiling_factor: int) -> None:
"""
Update camera intrinsics based on tiling_factor.
Must match tiling logic as defined in dataparser.

Args:
tiling_factor: Tiling factor to apply to the camera intrinsics.
"""
if tiling_factor == 1:
return

num_tiles = tiling_factor**2

# Compute tile sizes
base_tile_w, remainder_w = self.width // tiling_factor, self.width % tiling_factor
base_tile_h, remainder_h = self.height // tiling_factor, self.height % tiling_factor

tile_indices = torch.arange(len(self.cx), device=self.cx.device).unsqueeze(1) % num_tiles
row_indices, col_indices = tile_indices // tiling_factor, tile_indices % tiling_factor

x_offsets = col_indices * base_tile_w + torch.minimum(col_indices, remainder_w)
y_offsets = row_indices * base_tile_h + torch.minimum(row_indices, remainder_h)

# Adjust principal points
self.cx = self.cx - x_offsets
self.cy = self.cy - y_offsets

# Adjust height/width
self.width = base_tile_w + (col_indices < remainder_w).to(torch.int)
self.height = base_tile_h + (row_indices < remainder_h).to(torch.int)
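The offset arithmetic in `update_tiling_intrinsics` above can be checked with a toy computation: an image of width `W` is split into `tiling_factor` columns, the first `W % tiling_factor` of which are one pixel wider, and column `c` starts at `c * base + min(c, remainder)`. This is a plain-Python illustration of that formula, not the nerfstudio code.

```python
# Toy recomputation of the per-tile x offsets subtracted from cx above.
# Column c of the tiling starts at c*base + min(c, remainder), where the
# first `remainder` columns are one pixel wider than `base`.

def tile_x_offsets(width: int, tiling_factor: int) -> list:
    base, remainder = divmod(width, tiling_factor)
    return [c * base + min(c, remainder) for c in range(tiling_factor)]

# A width-10 image split 3 ways gives tiles of width 4, 3, 3,
# starting at x = 0, 4, 7 — the tile widths always sum to the image width.
```

The same formula applied along y yields the `y_offsets` used for `cy`, which is why the tiling here must match the dataparser's tiling exactly.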
1 change: 1 addition & 0 deletions nerfstudio/cameras/rays.py
@@ -136,6 +136,7 @@ def get_weights(self, densities: Float[Tensor, "*batch num_samples 1"]) -> Float
Weights for each sample
"""

assert self.deltas is not None, "Deltas must be set to compute weights"
delta_density = self.deltas * densities
alphas = 1 - torch.exp(-delta_density)

15 changes: 15 additions & 0 deletions nerfstudio/configs/external_methods.py
@@ -93,6 +93,21 @@ class ExternalMethod:
)
)

# LiveScene
external_methods.append(
ExternalMethod(
"""[bold yellow]LiveScene[/bold yellow]
For more information visit: https://docs.nerf.studio/nerfology/methods/livescene.html

To enable LiveScene, you must install it first by running:
[grey]pip install git+https://github.com/Tavish9/livescene[/grey]""",
configurations=[
("livescene", "LiveScene with OpenCLIP ViT-B/16, used in paper"),
],
pip_package="git+https://github.com/Tavish9/livescene",
)
)

# Feature Splatting
external_methods.append(
ExternalMethod(
2 changes: 1 addition & 1 deletion nerfstudio/configs/method_configs.py
@@ -219,7 +219,7 @@
max_num_iterations=30000,
mixed_precision=True,
pipeline=VanillaPipelineConfig(
datamanager=VanillaDataManagerConfig(
datamanager=ParallelDataManagerConfig(
_target=ParallelDataManager[DepthDataset],
dataparser=NerfstudioDataParserConfig(),
train_num_rays_per_batch=4096,
13 changes: 9 additions & 4 deletions nerfstudio/data/datamanagers/full_images_datamanager.py
@@ -26,6 +26,7 @@
from copy import deepcopy
from dataclasses import dataclass, field
from functools import cached_property
from itertools import islice
from pathlib import Path
from typing import Dict, ForwardRef, Generic, List, Literal, Optional, Tuple, Type, Union, cast, get_args, get_origin

@@ -45,7 +46,7 @@
from nerfstudio.data.datasets.base_dataset import InputDataset
from nerfstudio.data.utils.data_utils import identity_collate
from nerfstudio.data.utils.dataloaders import ImageBatchStream, _undistort_image
from nerfstudio.utils.misc import get_orig_class
from nerfstudio.utils.misc import get_dict_to_torch, get_orig_class
from nerfstudio.utils.rich_utils import CONSOLE


@@ -84,7 +85,7 @@ class FullImageDatamanagerConfig(DataManagerConfig):
dataloader_num_workers: int = 4
"""The number of workers performing the dataloading from either disk/RAM, which
includes collating, pixel sampling, unprojecting, ray generation etc."""
prefetch_factor: int = 4
prefetch_factor: Optional[int] = 4
"""The limit number of batches a worker will start loading once an iterator is created.
More details are described here: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader"""
cache_compressed_images: bool = False
@@ -356,9 +357,9 @@ def fixed_indices_eval_dataloader(self) -> List[Tuple[Cameras, Dict]]:
self.eval_imagebatch_stream,
batch_size=1,
num_workers=0,
collate_fn=identity_collate,
collate_fn=lambda x: x[0],
)
return [batch[0] for batch in dataloader]
return list(islice(dataloader, len(self.eval_dataset)))

image_indices = [i for i in range(len(self.eval_dataset))]
data = [d.copy() for d in self.cached_eval]
@@ -388,6 +389,8 @@ def next_train(self, step: int) -> Tuple[Cameras, Dict]:
self.train_count += 1
if self.config.cache_images == "disk":
camera, data = next(self.iter_train_image_dataloader)[0]
camera = camera.to(self.device)
data = get_dict_to_torch(data, self.device)
return camera, data

image_idx = self.train_unseen_cameras.pop(0)
@@ -414,6 +417,8 @@ def next_eval(self, step: int) -> Tuple[Cameras, Dict]:
self.eval_count += 1
if self.config.cache_images == "disk":
camera, data = next(self.iter_eval_image_dataloader)[0]
camera = camera.to(self.device)
data = get_dict_to_torch(data, self.device)
return camera, data

return self.next_eval_image(step=step)
8 changes: 6 additions & 2 deletions nerfstudio/data/datamanagers/parallel_datamanager.py
@@ -40,7 +40,7 @@
RayBatchStream,
variable_res_collate,
)
from nerfstudio.utils.misc import get_orig_class
from nerfstudio.utils.misc import get_dict_to_torch, get_orig_class
from nerfstudio.utils.rich_utils import CONSOLE


@@ -56,7 +56,7 @@ class ParallelDataManagerConfig(VanillaDataManagerConfig):
dataloader_num_workers: int = 4
"""The number of workers performing the dataloading from either disk/RAM, which
includes collating, pixel sampling, unprojecting, ray generation etc."""
prefetch_factor: int = 10
prefetch_factor: Optional[int] = 10
"""The limit number of batches a worker will start loading once an iterator is created.
More details are described here: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader"""
cache_compressed_images: bool = False
@@ -241,12 +241,16 @@ def next_train(self, step: int) -> Tuple[RayBundle, Dict]:
"""Returns the next batch of data from the train dataloader."""
self.train_count += 1
ray_bundle, batch = next(self.iter_train_raybundles)[0]
ray_bundle = ray_bundle.to(self.device)
batch = get_dict_to_torch(batch, self.device)
return ray_bundle, batch

def next_eval(self, step: int) -> Tuple[RayBundle, Dict]:
"""Returns the next batch of data from the eval dataloader."""
self.eval_count += 1
ray_bundle, batch = next(self.iter_train_raybundles)[0]
ray_bundle = ray_bundle.to(self.device)
batch = get_dict_to_torch(batch, self.device)
return ray_bundle, batch

def next_eval_image(self, step: int) -> Tuple[Cameras, Dict]: