4 changes: 2 additions & 2 deletions .github/workflows/core_code_checks.yml
Original file line number Diff line number Diff line change
@@ -15,10 +15,10 @@ jobs:

steps:
- uses: actions/checkout@v3
- name: Set up Python 3.8.13
- name: Set up Python 3.11.13
uses: actions/setup-python@v4
with:
python-version: '3.8.13'
python-version: '3.11.13'
- uses: actions/cache@v3
with:
path: ${{ env.pythonLocation }}
26 changes: 26 additions & 0 deletions docs/developer_guides/pipelines/datamanagers.md
@@ -115,6 +115,32 @@ To train splatfacto with a large dataset that's unable to fit in memory, please
ns-train splatfacto --data {PROCESSED_DATA_DIR} --pipeline.datamanager.cache-images disk
```

Check out these flowcharts for more ways to customize training on large datasets!

```{image} imgs/DatamanagerGuide-LargeNeRF-light.png
:align: center
:class: only-light
:width: 600
```

```{image} imgs/DatamanagerGuide-LargeNeRF-dark.png
:align: center
:class: only-dark
:width: 600
```

```{image} imgs/DatamanagerGuide-Large3DGS-light.png
:align: center
:class: only-light
:width: 600
```

```{image} imgs/DatamanagerGuide-Large3DGS-dark.png
:align: center
:class: only-dark
:width: 600
```

## Migrating Your DataManager to the new DataManager
Many methods subclass a DataManager and add extra data to it. If you would like your custom datamanager to also support new parallel features, you can migrate any custom dataloading logic to the new `custom_ray_processor()` API. This function takes in a full training batch (either image or ray bundle) and allows the user to modify or add to it. Let's take a look at an example for the LERF method, which was built on Nerfstudio's VanillaDataManager. This API provides an interface to attach new information to the RayBundle (for ray based methods), Cameras object (for splatting based methods), or ground truth dictionary. It runs in a background process if disk caching is enabled, otherwise it runs in the main process.
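The pattern described above can be sketched as follows. This is an illustrative outline only: the class and hook names mirror the description, but the stand-in base class and the `clip_scales` field are assumptions, not the exact nerfstudio API.

```python
# Hypothetical sketch of migrating custom dataloading logic into a
# custom_ray_processor()-style hook. The base class here is a stand-in
# for a nerfstudio datamanager that calls this hook on every batch.

class ParallelFriendlyDataManager:
    """Stand-in base: calls custom_ray_processor on each full training batch."""

    def custom_ray_processor(self, ray_bundle, batch):
        # Default behavior: pass the training batch through unchanged.
        return ray_bundle, batch


class LERFStyleDataManager(ParallelFriendlyDataManager):
    """Attaches extra per-ray data (e.g. CLIP scales) to each batch."""

    def custom_ray_processor(self, ray_bundle, batch):
        # This hook runs in a background process when disk caching is
        # enabled, so everything added must be computed from the batch itself.
        batch["clip_scales"] = [1.0 for _ in ray_bundle]  # placeholder feature
        return ray_bundle, batch
```

Because the hook receives the complete batch, subclasses only override this one method instead of reimplementing the dataloading loop.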

(Four binary image files added under `docs/developer_guides/pipelines/imgs/` — the flowchart images referenced above — cannot be displayed in the diff view.)
2 changes: 1 addition & 1 deletion docs/developer_guides/viewer/index.md
@@ -2,7 +2,7 @@

> We have a real-time web viewer that requires no installation. It's available at [https://viewer.nerf.studio/](https://viewer.nerf.studio/), where you can connect to your training job.

The viewer is built on [Viser](https://github.com/brentyi/viser/tree/main/viser) using [ThreeJS](https://threejs.org/) and packaged into a [ReactJS](https://reactjs.org/) application. This client viewer application will connect via a websocket to a server running on your machine.
The viewer is built on [Viser](https://github.com/nerfstudio-project/viser) using [ThreeJS](https://threejs.org/) and packaged into a [ReactJS](https://reactjs.org/) application. This client viewer application will connect via a websocket to a server running on your machine.

```{toctree}
:titlesonly:
3 changes: 2 additions & 1 deletion docs/index.md
@@ -154,14 +154,15 @@ This documentation is organized into 3 parts:
- [SIGNeRF](nerfology/methods/signerf.md): Controlled Generative Editing of NeRF Scenes
- [K-Planes](nerfology/methods/kplanes.md): Unified 3D and 4D Radiance Fields
- [LERF](nerfology/methods/lerf.md): Language Embedded Radiance Fields
- [LiveScene](nerfology/methods/livescene.md): Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
- [Feature Splatting](nerfology/methods/feature_splatting.md): Gaussian Feature Splatting based on GSplats
- [Nerfbusters](nerfology/methods/nerfbusters.md): Removing Ghostly Artifacts from Casually Captured NeRFs
- [NeRFPlayer](nerfology/methods/nerfplayer.md): 4D Radiance Fields by Streaming Feature Channels
- [Tetra-NeRF](nerfology/methods/tetranerf.md): Representing Neural Radiance Fields Using Tetrahedra
- [PyNeRF](nerfology/methods/pynerf.md): Pyramidal Neural Radiance Fields
- [SeaThru-NeRF](nerfology/methods/seathru_nerf.md): Neural Radiance Field for subsea scenes
- [Zip-NeRF](nerfology/methods/zipnerf.md): Anti-Aliased Grid-Based Neural Radiance Fields
- [NeRFtoGSandBack](nerfology/methods/nerf2gs2nerf.md): Converting back and forth between NeRF and GS to get the best of both approaches.
- [NeRFtoGSandBack](nerfology/methods/nerf2gs2nerf.md): Converting back and forth between NeRF and GS to get the best of both approaches
- [OpenNeRF](nerfology/methods/opennerf.md): OpenSet 3D Neural Scene Segmentation

**Eager to contribute a method?** We'd love to see you use nerfstudio in implementing new (or even existing) methods! Please view our {ref}`guide<own_method_docs>` for more details about how to add to this list!
1 change: 1 addition & 0 deletions docs/nerfology/methods/index.md
@@ -34,6 +34,7 @@ The following methods are supported in nerfstudio:
SIGNeRF<signerf.md>
K-Planes<kplanes.md>
LERF<lerf.md>
LiveScene<livescene.md>
Feature-Splatting<feature_splatting.md>
Mip-NeRF<mipnerf.md>
NeRF<nerf.md>
101 changes: 101 additions & 0 deletions docs/nerfology/methods/livescene.md
@@ -0,0 +1,101 @@
# LiveScene

<h4>Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control</h4>

```{button-link} https://tavish9.github.io/livescene//
:color: primary
:outline:
Paper Website
```

```{button-link} https://github.com/Tavish9/livescene/
:color: primary
:outline:
Code
```

<video id="demo" muted autoplay playsinline loop controls width="100%">
<source id="mp4" src="https://tavish9.github.io/livescene//static/video/demo.mp4" type="video/mp4">
</video>

**The first scene-level language-embedded interactive radiance field, which efficiently reconstructs and controls complex physical scenes, enabling manipulation of multiple articulated objects and language-based interaction.**

## Installation

First install nerfstudio dependencies. Then run:

```bash
pip install git+https://github.com/Tavish9/livescene
```

## Running LiveScene

Details for running LiveScene (built with Nerfstudio!) can be found [here](https://github.com/Tavish9/livescene).
Once installed, run:

```bash
ns-train livescene --help
```

There is only one default configuration provided, but it can be run on different datasets.

The default configuration is:

| Method | Description | Memory | Quality |
| ----------- | ----------------------------------------------- | ------ | ------- |
| `livescene` | LiveScene with OpenCLIP ViT-B/16, used in paper | ~8 GB | Good |

There are two new dataparsers provided for LiveScene:

| Method | Description | Scene type |
| ---------------- | ------------------------------- | ----------------- |
| `livescene-sim` | OmniSim dataset for LiveScene | Synthetic dataset |
| `livescene-real` | InterReal dataset for LiveScene | Real dataset |

## Method

LiveScene proposes an efficient factorization that decomposes the interactive scene into multiple local deformable fields to separately reconstruct individual interactive objects, achieving the first accurate and independent control on multiple interactive objects in a complex scene. Moreover, LiveScene introduces an interaction-aware language embedding method that generates varying language embeddings to localize individual interactive objects under different interactive states, enabling arbitrary control of interactive objects using natural language.

### Overview

Given a camera view and a control variable $\boldsymbol{\kappa}$ for a specific interactive object, a series of 3D points is sampled in a local deformable field that models the motions of that object, and the object in a novel interactive motion state is then rendered via volume rendering. In addition, an interaction-aware language embedding is used to localize and control individual interactive objects with natural language.

<img id="livescene_pipeline" src="https://tavish9.github.io/livescene//static/image/pipeline.png" style="background-color:white;" width="100%">
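For reference, the rendering step is the standard volume-rendering quadrature (notation assumed here, with per-sample density $\sigma_i$, color $\mathbf{c}_i$, and spacing $\delta_i$ along the ray $\mathbf{r}$, predicted by the local deformable field conditioned on $\boldsymbol{\kappa}$):

$$
\hat{C}(\mathbf{r}, \boldsymbol{\kappa}) = \sum_{i} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i, \qquad T_i = \exp\Big(-\sum_{j<i} \sigma_j \delta_j\Big)
$$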

### Multi-scale Interaction Space Factorization

LiveScene maintains multiple local deformable fields $\left\{\mathcal{R}_1, \mathcal{R}_2, \cdots, \mathcal{R}_\alpha \right\}$, one for each interactive object in the 4D space, and projects high-dimensional interaction features into a compact multi-scale 4D space. During training, LiveScene introduces a feature repulsion loss to amplify the feature differences between distinct deformable scenes, which alleviates boundary ray sampling and feature storage conflicts.

<img id="livescene_factorization" src="https://tavish9.github.io/livescene//static/image/decompose.png" style="background-color:white;" width="100%">

### Interaction-Aware Language Embedding

LiveScene leverages the proposed multi-scale interaction space factorization to efficiently store language features in lightweight planes, indexing by maximum sampling probability instead of using the 3D fields of LERF. For any sampling point $\mathbf{p}$, it retrieves a local language feature group and performs bilinear interpolation over the surrounding CLIP features to obtain a language embedding that adapts to changes in the interaction variable.

<img id="livescene_language" src="https://tavish9.github.io/livescene//static/image/embeds.png" style="background-color:white;" width="100%">
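The bilinear interpolation step above can be sketched in a few lines. This is a generic illustration of interpolating a feature plane at a continuous query location, not the LiveScene implementation; the plane layout and function name are assumptions.

```python
import numpy as np

def interp_language_embedding(plane: np.ndarray, u: float, v: float) -> np.ndarray:
    """Bilinearly interpolate an (H, W, D) feature plane at continuous (u, v)."""
    h, w, _ = plane.shape
    # Clamp the query to the plane bounds.
    u = float(np.clip(u, 0, h - 1))
    v = float(np.clip(v, 0, w - 1))
    # Integer corners surrounding the query point.
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, h - 1), min(v0 + 1, w - 1)
    du, dv = u - u0, v - v0
    # Weighted combination of the four neighboring feature vectors.
    return ((1 - du) * (1 - dv) * plane[u0, v0]
            + (1 - du) * dv * plane[u0, v1]
            + du * (1 - dv) * plane[u1, v0]
            + du * dv * plane[u1, v1])
```

Querying the center of a 2x2 plane returns the average of its four corner features, which is how the embedding varies smoothly as the interaction variable moves between stored states.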

## Dataset

To our knowledge, existing view-synthesis datasets for interactive scene rendering are primarily limited to a few interactive objects, making it impractical to scale up to real scenarios involving multi-object interactions. To bridge this gap, we construct two scene-level, high-quality annotated datasets to advance research on reconstructing and understanding interactive scenes: OmniSim and InterReal, containing 28 subsets and 70 interactive objects with 2 million samples, providing RGBD images, camera trajectories, interactive object masks, prompt captions, and the corresponding object state quantities at each time step.

<video id="dataset" muted autoplay playsinline loop controls width="100%">
<source id="mp4" src="https://tavish9.github.io/livescene//static/video/livescene_dataset.mp4" type="video/mp4">
</video>

## Interaction

For more ways to interact with the viewer, see [here](https://github.com/Tavish9/livescene?tab=readme-ov-file#3-interact-with-viewer).

## BibTeX

If you find our work helpful for your research, please consider citing:

```none
@misc{livescene2024,
title={LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control},
  author={Delin Qu and Qizhi Chen and Pingrui Zhang and Xianqiang Gao and Bin Zhao and Zhigang Wang and Dong Wang and Xuelong Li},
year={2024},
eprint={2406.16038},
archivePrefix={arXiv},
}
```
6 changes: 3 additions & 3 deletions nerfstudio/cameras/camera_utils.py
@@ -172,7 +172,7 @@ def get_interpolated_poses(pose_a: NDArray, pose_b: NDArray, steps: int = 10) ->
quat_b = quaternion_from_matrix(pose_b[:3, :3])

ts = np.linspace(0, 1, steps)
quats = [quaternion_slerp(quat_a, quat_b, t) for t in ts]
quats = [quaternion_slerp(quat_a, quat_b, float(t)) for t in ts]
trans = [(1 - t) * pose_a[:3, 3] + t * pose_b[:3, 3] for t in ts]

poses_ab = []
@@ -199,7 +199,7 @@ def get_interpolated_k(
List of interpolated camera poses
"""
Ks: List[Float[Tensor, "3 3"]] = []
ts = np.linspace(0, 1, steps)
ts = torch.linspace(0, 1, steps, dtype=k_a.dtype, device=k_a.device)
for t in ts:
new_k = k_a * (1.0 - t) + k_b * t
Ks.append(new_k)
@@ -218,7 +218,7 @@ def get_interpolated_time(
steps: number of steps the interpolated pose path should contain
"""
times: List[Float[Tensor, "1"]] = []
ts = np.linspace(0, 1, steps)
ts = torch.linspace(0, 1, steps, dtype=time_a.dtype, device=time_a.device)
for t in ts:
new_t = time_a * (1.0 - t) + time_b * t
times.append(new_t)
31 changes: 31 additions & 0 deletions nerfstudio/cameras/cameras.py
@@ -1021,3 +1021,34 @@ def rescale_output_resolution(
self.width = torch.ceil(self.width * scaling_factor).to(torch.int64)
else:
raise ValueError("Scale rounding mode must be 'floor', 'round' or 'ceil'.")

def update_tiling_intrinsics(self, tiling_factor: int) -> None:
"""
Update camera intrinsics based on tiling_factor.
Must match tiling logic as defined in dataparser.

Args:
tiling_factor: Tiling factor to apply to the camera intrinsics.
"""
if tiling_factor == 1:
return

num_tiles = tiling_factor**2

# Compute tile sizes
base_tile_w, remainder_w = self.width // tiling_factor, self.width % tiling_factor
base_tile_h, remainder_h = self.height // tiling_factor, self.height % tiling_factor

tile_indices = torch.arange(len(self.cx), device=self.cx.device).unsqueeze(1) % num_tiles
row_indices, col_indices = tile_indices // tiling_factor, tile_indices % tiling_factor

x_offsets = col_indices * base_tile_w + torch.minimum(col_indices, remainder_w)
y_offsets = row_indices * base_tile_h + torch.minimum(row_indices, remainder_h)

# Adjust principal points
self.cx = self.cx - x_offsets
self.cy = self.cy - y_offsets

# Adjust height/width
self.width = base_tile_w + (col_indices < remainder_w).to(torch.int)
self.height = base_tile_h + (row_indices < remainder_h).to(torch.int)
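The offset arithmetic in `update_tiling_intrinsics` above can be checked with a toy computation: an image of width `W` is split into `tiling_factor` columns, the first `W % tiling_factor` of which are one pixel wider, and column `c` starts at `c * base + min(c, remainder)`. This is a plain-Python illustration of that formula, not the nerfstudio code.

```python
# Toy recomputation of the per-tile x offsets subtracted from cx above.
# Column c of the tiling starts at c*base + min(c, remainder), where the
# first `remainder` columns are one pixel wider than `base`.

def tile_x_offsets(width: int, tiling_factor: int) -> list:
    base, remainder = divmod(width, tiling_factor)
    return [c * base + min(c, remainder) for c in range(tiling_factor)]

# A width-10 image split 3 ways gives tiles of width 4, 3, 3,
# starting at x = 0, 4, 7 — the tile widths always sum to the image width.
```

The same formula applied along y yields the `y_offsets` used for `cy`, which is why the tiling here must match the dataparser's tiling exactly.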
1 change: 1 addition & 0 deletions nerfstudio/cameras/rays.py
@@ -136,6 +136,7 @@ def get_weights(self, densities: Float[Tensor, "*batch num_samples 1"]) -> Float
Weights for each sample
"""

assert self.deltas is not None, "Deltas must be set to compute weights"
delta_density = self.deltas * densities
alphas = 1 - torch.exp(-delta_density)

15 changes: 15 additions & 0 deletions nerfstudio/configs/external_methods.py
@@ -93,6 +93,21 @@ class ExternalMethod:
)
)

# LiveScene
external_methods.append(
ExternalMethod(
"""[bold yellow]LiveScene[/bold yellow]
For more information visit: https://docs.nerf.studio/nerfology/methods/livescene.html

To enable LiveScene, you must install it first by running:
[grey]pip install git+https://github.com/Tavish9/livescene[/grey]""",
configurations=[
("livescene", "LiveScene with OpenCLIP ViT-B/16, used in paper"),
],
pip_package="git+https://github.com/Tavish9/livescene",
)
)

# Feature Splatting
external_methods.append(
ExternalMethod(
2 changes: 1 addition & 1 deletion nerfstudio/configs/method_configs.py
@@ -219,7 +219,7 @@
max_num_iterations=30000,
mixed_precision=True,
pipeline=VanillaPipelineConfig(
datamanager=VanillaDataManagerConfig(
datamanager=ParallelDataManagerConfig(
_target=ParallelDataManager[DepthDataset],
dataparser=NerfstudioDataParserConfig(),
train_num_rays_per_batch=4096,
13 changes: 9 additions & 4 deletions nerfstudio/data/datamanagers/full_images_datamanager.py
@@ -26,6 +26,7 @@
from copy import deepcopy
from dataclasses import dataclass, field
from functools import cached_property
from itertools import islice
from pathlib import Path
from typing import Dict, ForwardRef, Generic, List, Literal, Optional, Tuple, Type, Union, cast, get_args, get_origin

@@ -45,7 +46,7 @@
from nerfstudio.data.datasets.base_dataset import InputDataset
from nerfstudio.data.utils.data_utils import identity_collate
from nerfstudio.data.utils.dataloaders import ImageBatchStream, _undistort_image
from nerfstudio.utils.misc import get_orig_class
from nerfstudio.utils.misc import get_dict_to_torch, get_orig_class
from nerfstudio.utils.rich_utils import CONSOLE


@@ -84,7 +85,7 @@ class FullImageDatamanagerConfig(DataManagerConfig):
dataloader_num_workers: int = 4
"""The number of workers performing the dataloading from either disk/RAM, which
includes collating, pixel sampling, unprojecting, ray generation etc."""
prefetch_factor: int = 4
prefetch_factor: Optional[int] = 4
"""The limit number of batches a worker will start loading once an iterator is created.
More details are described here: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader"""
cache_compressed_images: bool = False
@@ -356,9 +357,9 @@ def fixed_indices_eval_dataloader(self) -> List[Tuple[Cameras, Dict]]:
self.eval_imagebatch_stream,
batch_size=1,
num_workers=0,
collate_fn=identity_collate,
collate_fn=lambda x: x[0],
)
return [batch[0] for batch in dataloader]
return list(islice(dataloader, len(self.eval_dataset)))

image_indices = [i for i in range(len(self.eval_dataset))]
data = [d.copy() for d in self.cached_eval]
@@ -388,6 +389,8 @@ def next_train(self, step: int) -> Tuple[Cameras, Dict]:
self.train_count += 1
if self.config.cache_images == "disk":
camera, data = next(self.iter_train_image_dataloader)[0]
camera = camera.to(self.device)
data = get_dict_to_torch(data, self.device)
return camera, data

image_idx = self.train_unseen_cameras.pop(0)
@@ -414,6 +417,8 @@ def next_eval(self, step: int) -> Tuple[Cameras, Dict]:
self.eval_count += 1
if self.config.cache_images == "disk":
camera, data = next(self.iter_eval_image_dataloader)[0]
camera = camera.to(self.device)
data = get_dict_to_torch(data, self.device)
return camera, data

return self.next_eval_image(step=step)
8 changes: 6 additions & 2 deletions nerfstudio/data/datamanagers/parallel_datamanager.py
@@ -40,7 +40,7 @@
RayBatchStream,
variable_res_collate,
)
from nerfstudio.utils.misc import get_orig_class
from nerfstudio.utils.misc import get_dict_to_torch, get_orig_class
from nerfstudio.utils.rich_utils import CONSOLE


@@ -56,7 +56,7 @@ class ParallelDataManagerConfig(VanillaDataManagerConfig):
dataloader_num_workers: int = 4
"""The number of workers performing the dataloading from either disk/RAM, which
includes collating, pixel sampling, unprojecting, ray generation etc."""
prefetch_factor: int = 10
prefetch_factor: Optional[int] = 10
"""The limit number of batches a worker will start loading once an iterator is created.
More details are described here: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader"""
cache_compressed_images: bool = False
@@ -241,12 +241,16 @@ def next_train(self, step: int) -> Tuple[RayBundle, Dict]:
"""Returns the next batch of data from the train dataloader."""
self.train_count += 1
ray_bundle, batch = next(self.iter_train_raybundles)[0]
ray_bundle = ray_bundle.to(self.device)
batch = get_dict_to_torch(batch, self.device)
return ray_bundle, batch

def next_eval(self, step: int) -> Tuple[RayBundle, Dict]:
"""Returns the next batch of data from the eval dataloader."""
self.eval_count += 1
ray_bundle, batch = next(self.iter_train_raybundles)[0]
ray_bundle = ray_bundle.to(self.device)
batch = get_dict_to_torch(batch, self.device)
return ray_bundle, batch

def next_eval_image(self, step: int) -> Tuple[Cameras, Dict]: