Implement load_and_transform_depth_data #134

@OlafBraakman

Description

Issues #122, #14, #69, and #121 report that the `load_and_transform_depth` function is not implemented.

I am raising this issue to implement the following data preprocessing steps in a PR, as they yield the reported 35% zero-shot classification accuracy for SUN-RGBD depth-only.

Important details for the scene classification task for SUNRGBD:

Scene subset:
The classification task only considers the following 19 classes:

```python
SCENES = ['bathroom', 'bedroom', 'classroom', 'computer_room', 'conference_room',
          'corridor', 'dining_area', 'dining_room', 'discussion_area',
          'furniture_store', 'home_office', 'kitchen', 'lab', 'lecture_theatre',
          'library', 'living_room', 'office', 'rest_space', 'study_space']
```

To reproduce the SUNRGBD results, one has to convert the raw depth data to standardized disparity in the following steps:

  1. Convert raw depth (`uint16`) to meters following the official SUN RGBD toolbox `read3dPoints.m`:

```python
import cv2
import numpy as np

# Undo the 3-bit rotation applied by the toolbox, convert mm -> m,
# and clamp depth at 8 m
depth = cv2.imread(depth_file, cv2.IMREAD_UNCHANGED)
depth = ((depth >> 3) | (depth << 13)).astype(np.float32) / 1000.0
depth[depth > 8] = 8
```
  2. Convert depth to disparity using the correct camera intrinsics, following the response of @imisra, with a different baseline for each camera. The focal length for each sample can be obtained from the `intrinsics.txt` file:

```python
from pathlib import Path  # optional, I just used pathlib


def get_baseline(path: str) -> float:
    # Per-sensor stereo baselines in meters
    if "kv1" in path:
        return 0.075
    elif "kv2" in path:
        return 0.075
    elif "realsense" in path:
        return 0.095
    elif "xtion" in path:
        return 0.095  # guessed based on a length of 18 cm for the ASUS Xtion v1
    else:
        raise ValueError(f"No baseline found for path: {path}")


# First value of intrinsics.txt is the focal length
focal_path = Path(depth_file).parents[1] / "intrinsics.txt"
focal_length = float(focal_path.read_text().strip().split()[0])
baseline = get_baseline(depth_file)
# Note: invalid (zero) depth pixels produce inf here; they are masked
# out again during normalization below
disparity = baseline * focal_length / depth
```
  3. Depth standardization: find the mean and std of the disparity values across the training split. I found these values with the `compute_depth_mean_std` implementation from RGBD-Seg's `dataset_base.py`.

This yields the following mean and std values:

```
mean: 24.82968
std: 14.40078
```
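For reference, a minimal sketch of how these statistics could be accumulated over the training split (the function name `compute_disparity_stats` and the zero-pixel masking are my assumptions; the actual values above came from the RGBD-Seg `compute_depth_mean_std` implementation):

```python
import numpy as np


def compute_disparity_stats(disparity_maps):
    """Hypothetical stand-in for RGBD-Seg's compute_depth_mean_std:
    accumulate mean/std over all valid (non-zero) disparity pixels."""
    total = 0.0
    sq_total = 0.0
    n = 0
    for disp in disparity_maps:
        valid = disp[disp > 0]  # ignore invalid zero pixels
        total += valid.sum(dtype=np.float64)
        sq_total += np.square(valid, dtype=np.float64).sum()
        n += valid.size
    mean = total / n
    std = np.sqrt(sq_total / n - mean ** 2)
    return mean, std
```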

These values can be used to normalize (depending on the raw or refined depth mode) as follows (based on `preprocessing.py` `Normalize`):

```python
if self._depth_mode == 'raw':
    depth_0 = depth == 0
    depth = torchvision.transforms.Normalize(
        mean=24.82968, std=14.40078)(depth)
    # set invalid values back to zero again
    depth[depth_0] = 0
else:
    depth = torchvision.transforms.Normalize(
        mean=24.82968, std=14.40078)(depth)
```

Evaluating over the test split using the above approach yields 35.2% depth-only accuracy.
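For the eventual PR, the three steps above could be combined into something like the following sketch. The function name, signature, and array-in/array-out design are my assumptions (file reading and the intrinsics/baseline lookup from step 2 are left out), and invalid zero-depth pixels are kept at zero as in the 'raw' mode above:

```python
import numpy as np


def transform_depth(raw: np.ndarray, focal_length: float, baseline: float,
                    mean: float = 24.82968, std: float = 14.40078) -> np.ndarray:
    """Hypothetical end-to-end transform for a loaded uint16 SUN RGBD depth map."""
    # Step 1: undo the toolbox 3-bit rotation, convert mm -> m, clamp at 8 m
    depth = ((raw >> 3) | (raw << 13)).astype(np.float32) / 1000.0
    depth[depth > 8] = 8
    # Step 2: depth -> disparity; keep invalid (zero-depth) pixels at zero
    valid = depth > 0
    disparity = np.zeros_like(depth)
    disparity[valid] = baseline * focal_length / depth[valid]
    # Step 3: standardize valid pixels only, leaving invalid pixels at zero
    disparity[valid] = (disparity[valid] - mean) / std
    return disparity
```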

TODO: Create a PR
