Implement load_and_transform_depth_data #134

@OlafBraakman

Description

Issues #122, #14, #69, and #121 report that the `load_and_transform_depth` function is not implemented.

I am raising this issue to implement the following data preprocessing steps in a PR, as they yield the reported 35% zero-shot classification accuracy for SUN-RGBD depth-only.

Important details for the scene classification task for SUNRGBD:

Scene subset:
The classification task only considers the following 19 classes:

```python
SCENES = ['bathroom', 'bedroom', 'classroom', 'computer_room', 'conference_room',
          'corridor', 'dining_area', 'dining_room', 'discussion_area',
          'furniture_store', 'home_office', 'kitchen', 'lab', 'lecture_theatre',
          'library', 'living_room', 'office', 'rest_space', 'study_space']
```

To reproduce the SUNRGBD results, one has to convert the raw depth data to standardized disparity in the following steps:

  1. Convert raw depth (`uint16`) to meters following the official SUN RGBD toolbox `read3dPoints.m`:

```python
import cv2
import numpy as np

# Undo the 3-bit rotation applied by the toolbox, convert mm -> m,
# and clamp depth at 8 m
depth = cv2.imread(depth_file, cv2.IMREAD_UNCHANGED)
depth = ((depth >> 3) | (depth << 13)).astype(np.float32) / 1000.0
depth[depth > 8] = 8
```
  2. Convert depth to disparity using the correct camera intrinsics, following the response of @imisra, with a different baseline for each camera. The focal length for each sample can be obtained from the `intrinsics.txt` file:

```python
from pathlib import Path  # optional, I just used pathlib


def get_baseline(path: str) -> float:
    # Per-sensor stereo baselines in meters
    if "kv1" in path:
        return 0.075
    elif "kv2" in path:
        return 0.075
    elif "realsense" in path:
        return 0.095
    elif "xtion" in path:
        return 0.095  # guessed based on a length of 18 cm for the ASUS Xtion v1
    else:
        raise ValueError(f"No baseline found for path: {path}")


# First value of intrinsics.txt is the focal length
focal_path = Path(depth_file).parents[1] / "intrinsics.txt"
focal_length = float(focal_path.read_text().strip().split()[0])
baseline = get_baseline(depth_file)
# Note: invalid (zero) depth pixels produce inf here; they are masked
# out again during normalization below
disparity = baseline * focal_length / depth
```
  3. Depth standardization: find the mean and std of the disparity values across the training split. I found these values with the `compute_depth_mean_std` implementation from RGBD-Seg's `dataset_base.py`.

This yields the following mean and std values:

```
mean: 24.82968
std: 14.40078
```
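For reference, a minimal sketch of how these statistics could be accumulated over the training split (the function name `compute_disparity_stats` and the zero-pixel masking are my assumptions; the actual values above came from the RGBD-Seg `compute_depth_mean_std` implementation):

```python
import numpy as np


def compute_disparity_stats(disparity_maps):
    """Hypothetical stand-in for RGBD-Seg's compute_depth_mean_std:
    accumulate mean/std over all valid (non-zero) disparity pixels."""
    total = 0.0
    sq_total = 0.0
    n = 0
    for disp in disparity_maps:
        valid = disp[disp > 0]  # ignore invalid zero pixels
        total += valid.sum(dtype=np.float64)
        sq_total += np.square(valid, dtype=np.float64).sum()
        n += valid.size
    mean = total / n
    std = np.sqrt(sq_total / n - mean ** 2)
    return mean, std
```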

These values can be used to normalize (depending on the raw or refined depth mode) as follows (based on `preprocessing.py` `Normalize`):

```python
if self._depth_mode == 'raw':
    depth_0 = depth == 0
    depth = torchvision.transforms.Normalize(
        mean=24.82968, std=14.40078)(depth)
    # set invalid values back to zero again
    depth[depth_0] = 0
else:
    depth = torchvision.transforms.Normalize(
        mean=24.82968, std=14.40078)(depth)
```

Evaluating over the test split using the above approach yields 35.2% depth-only accuracy.
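For the eventual PR, the three steps above could be combined into something like the following sketch. The function name, signature, and array-in/array-out design are my assumptions (file reading and the intrinsics/baseline lookup from step 2 are left out), and invalid zero-depth pixels are kept at zero as in the 'raw' mode above:

```python
import numpy as np


def transform_depth(raw: np.ndarray, focal_length: float, baseline: float,
                    mean: float = 24.82968, std: float = 14.40078) -> np.ndarray:
    """Hypothetical end-to-end transform for a loaded uint16 SUN RGBD depth map."""
    # Step 1: undo the toolbox 3-bit rotation, convert mm -> m, clamp at 8 m
    depth = ((raw >> 3) | (raw << 13)).astype(np.float32) / 1000.0
    depth[depth > 8] = 8
    # Step 2: depth -> disparity; keep invalid (zero-depth) pixels at zero
    valid = depth > 0
    disparity = np.zeros_like(depth)
    disparity[valid] = baseline * focal_length / depth[valid]
    # Step 3: standardize valid pixels only, leaving invalid pixels at zero
    disparity[valid] = (disparity[valid] - mean) / std
    return disparity
```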

TODO: Create a PR
