Object-ID alignment mismatch between the public caption JSON files and the instance IDs produced by the official ScanNet++ toolbox preprocessing #9
Description
Hi, thanks for open-sourcing ExCap3D.
I am trying to reproduce the ScanNet++ captioning pipeline using the public code and the public ExCap3D caption release. I found what looks like an object-ID alignment mismatch between the public caption JSON files and the instance IDs produced by the official ScanNet++ toolbox preprocessing.
What I did:
- Prepared full .pth files using the official ScanNet++ toolbox:
  python -m semantic.prep.prepare_training_data semantic/configs/prepare_training_data.yml
- Sampled points with sample_pth.py
- Converted the sampled .pth files to Mask3D .npy files with datasets.preprocessing.scannetpp_pth_preprocessing.py
From the ScanNet++ toolbox code (semantic/transforms/mesh.py), vtx_instance_anno_id is assigned from:
instance_anno_id_multi[inst_mask, new_label_position] = instance['objectId']
sample['vtx_instance_anno_id'] = instance_anno_id
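For reference, here is a simplified sketch of what that assignment amounts to: every vertex ends up carrying the objectId of the segGroup that owns its segment. It assumes the ScanNet-style layout (per-vertex segIndices from segments.json, segGroups with "objectId" and "segments" in segments_anno.json). It is not the toolbox's code: it ignores the multi-label handling in mesh.py, and the ignore value below is a hypothetical placeholder.

```python
from typing import Dict, List


def vertex_object_ids(
    seg_indices: List[int],
    seg_groups: List[dict],
    ignore_id: int = -100,  # hypothetical background value, not taken from the toolbox
) -> List[int]:
    """Return, for each vertex, the objectId of the segGroup containing its segment.

    Simplified sketch: if a segment appears in several groups, the last group wins.
    Vertices whose segment is in no group get ignore_id.
    """
    seg_to_obj: Dict[int, int] = {}
    for group in seg_groups:
        for seg in group["segments"]:
            seg_to_obj[seg] = group["objectId"]
    return [seg_to_obj.get(seg, ignore_id) for seg in seg_indices]
```

For example, with groups {"objectId": 7, "segments": [0, 1]} and {"objectId": 8, "segments": [2]}, vertices with segment indices [0, 1, 2, 3] get [7, 7, 8, -100]. The point is that the IDs in the .pth/.npy files are objectIds, which is why I expected the caption keys to be objectIds too.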
I verified that:
- full_pth_data_normals and pth_data_sampled_normals have the same instance IDs
- the sampled .pth files and the final Mask3D .npy files also have the same instance IDs
However, the public ExCap3D caption JSON keys do not match these IDs.
Concrete example: scene 286b55a2bf
In segments_anno.json, relevant objectIds are:
toilet -> 51
sink -> 50
soap dispenser -> 49
door -> 63
cabinet -> 72
In the public ExCap3D caption JSON, the corresponding captions are keyed by:
13 -> toilet caption
14 -> sink caption
15 -> soap dispenser caption
32 -> door caption
33 -> cabinet caption
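One way to narrow down which ID the caption keys use is to count how many keys each candidate ID scheme would explain. The sketch below tries the segGroup "objectId" field, the segGroup "id" field (if present), and the group's position in the list; these candidates are my own guesses, not confirmed ExCap3D behavior.

```python
from typing import Dict, Iterable, List, Set, Union

IdLike = Union[int, str]  # JSON object keys arrive as strings


def probe_id_fields(seg_groups: List[dict], caption_keys: Iterable[IdLike]) -> Dict[str, int]:
    """Count caption keys matched by each hypothetical ID scheme."""
    keys: Set[int] = {int(k) for k in caption_keys}
    candidates: Dict[str, Set[int]] = {
        "objectId": {int(g["objectId"]) for g in seg_groups if "objectId" in g},
        "id": {int(g["id"]) for g in seg_groups if "id" in g},
        "list_index": set(range(len(seg_groups))),
    }
    return {name: len(keys & ids) for name, ids in candidates.items()}
```

Small integers collide easily, so a nonzero count is only weak evidence; the labels behind the matched IDs should also agree with the caption text.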
I also compared the instance-ID sets for this scene:
- PTH vs NPY: shared=23, pth_only=0, npy_only=0
- PTH vs caption JSON: shared=0, pth_only=23, caption_only=13
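A minimal sketch of this kind of set comparison follows. It drops negative IDs on the assumption that they mark unlabeled points; adjust the filter to whatever ignore value your data actually uses.

```python
from typing import Dict, Iterable, Union

IdLike = Union[int, str]  # caption JSON keys are strings, array entries are ints


def compare_ids(a: Iterable[IdLike], b: Iterable[IdLike]) -> Dict[str, int]:
    """Count IDs shared by two sources and IDs unique to each."""
    set_a = {int(x) for x in a if int(x) >= 0}  # assumption: negative = ignore
    set_b = {int(x) for x in b if int(x) >= 0}
    return {
        "shared": len(set_a & set_b),
        "a_only": len(set_a - set_b),
        "b_only": len(set_b - set_a),
    }
```

Passing the per-vertex instance-ID array from a .pth file as `a` and the caption JSON keys as `b` yields the shared / pth_only / caption_only triple reported above; duplicates in the per-vertex array collapse because only the sets are compared.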
Across the validation split, the mismatch is widespread:
- caption-only instance IDs: 191 / 1135
- suspicious overlap instance IDs: 218 / 944
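As a rough sketch of how a split-level caption-only count can be aggregated, the function below sums, over scenes, the caption keys that have no matching PTH instance ID, and divides by the total number of caption keys. Using the total caption keys as the denominator is an assumption for illustration; the "suspicious overlap" count is not reproduced here.

```python
from typing import Iterable, Mapping, Tuple, Union

IdLike = Union[int, str]


def caption_only_count(
    per_scene: Mapping[str, Tuple[Iterable[IdLike], Iterable[IdLike]]],
) -> Tuple[int, int]:
    """per_scene maps scene ID -> (pth_instance_ids, caption_keys).

    Returns (caption keys with no matching PTH ID, total caption keys).
    """
    caption_only = 0
    total = 0
    for pth_ids, caption_keys in per_scene.values():
        pth_set = {int(x) for x in pth_ids}
        keys = {int(k) for k in caption_keys}
        caption_only += len(keys - pth_set)
        total += len(keys)
    return caption_only, total
```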
My questions are:
1. Are the public ExCap3D caption JSON keys expected to match ScanNet++ segGroups[].objectId?
2. If not, what ID do the public caption JSON keys correspond to?
3. Is there an official mapping script or a specific ScanNet++ preprocessing version needed to align the public captions with the released code?
4. Are the internal scene_captions files used in the training scripts different from the public excap3d-release JSON files?
If helpful, I can attach the single-scene evidence for 286b55a2bf.
Thanks.