Description
Issues #122, #14, #69 and #121 report that the `load_and_transform_depth` function is not implemented.
I am raising this issue to implement the following data preprocessing steps in a PR, as they yield the reported 35% zero-shot classification accuracy for SUN RGBD depth-only.
Important details for the scene classification task for SUNRGBD:
Scene subset:
The classification task only considers the following classes:
SCENES = ['bathroom', 'bedroom', 'classroom', 'computer_room', 'conference_room', 'corridor', 'dining_area', 'dining_room', 'discussion_area', 'furniture_store', 'home_office', 'kitchen', 'lab', 'lecture_theatre', 'library', 'living_room', 'office', 'rest_space', 'study_space' ]
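For zero-shot evaluation, only samples annotated with one of these 19 scenes are kept. A minimal filter sketch (the `keep_sample` helper name is mine, assuming the scene annotation is available as a string label):

```python
# The 19-class scene subset used for SUN RGBD scene classification.
SCENES = {
    'bathroom', 'bedroom', 'classroom', 'computer_room', 'conference_room',
    'corridor', 'dining_area', 'dining_room', 'discussion_area',
    'furniture_store', 'home_office', 'kitchen', 'lab', 'lecture_theatre',
    'library', 'living_room', 'office', 'rest_space', 'study_space',
}

def keep_sample(scene_label: str) -> bool:
    # Drop any sample whose scene annotation is outside the 19-class subset.
    return scene_label in SCENES
```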
To reproduce the SUNRGBD results one has to convert the raw depth data to standardized disparity in the following steps:
- Convert raw depth (uint16) to meters following the official SUN RGBD toolbox (`read3dPoints.m`):

```python
import cv2
import numpy as np

depth = cv2.imread(depth_file, cv2.IMREAD_UNCHANGED)
# The PNGs store depth circularly bit-shifted by 3; undo the shift
# and convert millimeters to meters.
depth = ((depth >> 3) | (depth << 13)).astype(np.float32) / 1000.0
depth[depth > 8] = 8  # clip depth to 8 m
```
- Convert depth to disparity using the correct camera intrinsics, following the response of @imisra, with a different baseline for each camera. The focal length for each sample can be obtained from the `intrinsics.txt` file:
```python
from pathlib import Path  # optional, I just used pathlib

def get_baseline(path: str) -> float:
    # Sensor baselines in meters, keyed on the sensor name in the file path.
    if "kv1" in path:
        return 0.075
    elif "kv2" in path:
        return 0.075
    elif "realsense" in path:
        return 0.095
    elif "xtion" in path:
        return 0.095  # guessed based on a length of 18 cm for the ASUS Xtion v1
    else:
        raise Exception(f"No baseline found for path: {path}")

focal_path = Path(depth_file).parents[1] / "intrinsics.txt"
focal_length = float(focal_path.read_text().strip().split()[0])
baseline = get_baseline(depth_file)
disparity = baseline * focal_length / depth
```
- Depth standardization by finding the mean and std of the disparity values across the training split. I find these values with the `compute_depth_mean_std` implementation from RGBD-Seg's `dataset_base.py`. This yields the following mean and std values:
  - mean: 24.82968
  - std: 14.40078
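The accumulation behind those statistics can be sketched as follows (function and variable names are mine, not from RGBD-Seg; only valid, non-zero disparity pixels are counted):

```python
import numpy as np

def disparity_mean_std(disparity_maps):
    # Accumulate sum, sum of squares and pixel count over all valid
    # (non-zero) disparity pixels, then derive mean and std.
    total, total_sq, count = 0.0, 0.0, 0
    for d in disparity_maps:
        valid = d[d > 0].astype(np.float64)
        total += valid.sum()
        total_sq += np.square(valid).sum()
        count += valid.size
    mean = total / count
    std = np.sqrt(total_sq / count - mean ** 2)
    return mean, std
```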
These can be used to normalize (depending on the raw or refined depth mode) as follows (based on `Normalize` in `preprocessing.py`):
```python
import torchvision

if self._depth_mode == 'raw':
    depth_0 = depth == 0
    depth = torchvision.transforms.Normalize(
        mean=24.82968, std=14.40078)(depth)
    # set invalid values back to zero again
    depth[depth_0] = 0
else:
    depth = torchvision.transforms.Normalize(
        mean=24.82968, std=14.40078)(depth)
```
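Putting the steps together, a per-sample transform could look like the sketch below (array-in/array-out for clarity; the function name is mine, the bit-shift decode, 8 m clip, disparity conversion, mean/std and the raw-mode zeroing of invalid pixels are all as described above):

```python
import numpy as np

# Training-split disparity statistics from above.
DEPTH_MEAN, DEPTH_STD = 24.82968, 14.40078

def load_and_transform_depth_array(raw_depth, focal_length, baseline):
    # raw_depth: uint16 array as stored in the SUN RGBD depth PNGs.
    depth = ((raw_depth >> 3) | (raw_depth << 13)).astype(np.float32) / 1000.0
    depth = np.minimum(depth, 8.0)   # clip far depth to 8 m
    invalid = depth == 0             # missing depth readings
    disparity = np.zeros_like(depth)
    np.divide(baseline * focal_length, depth, out=disparity, where=~invalid)
    disparity = (disparity - DEPTH_MEAN) / DEPTH_STD  # standardize
    disparity[invalid] = 0.0         # 'raw' mode: keep invalid pixels at zero
    return disparity
```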
Evaluating over the test split using the above approach yields 35.2% depth-only accuracy.
TODO: Create a PR