Object-ID alignment mismatch between the public caption JSON files and the instance IDs produced by the official ScanNet++ toolbox preprocessing

Hi, thanks for open-sourcing ExCap3D.

I am trying to reproduce the ScanNet++ captioning pipeline using the public code and the public ExCap3D caption release. I found what looks like an object-ID alignment mismatch between the public caption JSON files and the instance IDs produced by the official ScanNet++ toolbox preprocessing.

What I did:
1. Prepared full `.pth` files using the official ScanNet++ toolbox:
   `python -m semantic.prep.prepare_training_data semantic/configs/prepare_training_data.yml`
2. Sampled points with `sample_pth.py`
3. Converted sampled `.pth` to Mask3D `.npy` with `datasets.preprocessing.scannetpp_pth_preprocessing.py`

From the ScanNet++ toolbox code (`semantic/transforms/mesh.py`), `vtx_instance_anno_id` is assigned from:
```python
instance_anno_id_multi[inst_mask, new_label_position] = instance['objectId']
sample['vtx_instance_anno_id'] = instance_anno_id
```

I verified that:

- `full_pth_data_normals` and `pth_data_sampled_normals` have the same instance IDs
- the sampled `.pth` and the final Mask3D `.npy` also have the same instance IDs

However, the public ExCap3D caption JSON keys do not match these IDs.

Concrete example: scene `286b55a2bf`

In `segments_anno.json`, the relevant `objectId`s are:

- toilet -> 51
- sink -> 50
- soap dispenser -> 49
- door -> 63
- cabinet -> 72

In the public ExCap3D caption JSON, the corresponding captions are keyed by:

- 13 -> toilet caption
- 14 -> sink caption
- 15 -> soap dispenser caption
- 32 -> door caption
- 33 -> cabinet caption
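
For reference, a minimal sketch of how the `objectId` listing above can be reproduced from a parsed `segments_anno.json`. It assumes the file is a dict with a `segGroups` list whose entries carry `objectId` and a `label` field (the `label` field name is an assumption here); `object_ids_by_label` is a hypothetical helper, not part of the ScanNet++ toolbox.

```python
from collections import defaultdict


def object_ids_by_label(segments_anno, labels):
    """Return {label: sorted objectIds} for the requested labels.

    `segments_anno` is the parsed JSON dict, e.g. from
    json.load(open("segments_anno.json")).
    """
    wanted = set(labels)
    found = defaultdict(list)
    for group in segments_anno.get("segGroups", []):
        label = group.get("label")  # assumed field name
        if label in wanted:
            found[label].append(group["objectId"])
    return {label: sorted(ids) for label, ids in found.items()}


# Toy input mimicking the assumed structure (not real scene data).
toy_anno = {
    "segGroups": [
        {"objectId": 51, "label": "toilet"},
        {"objectId": 50, "label": "sink"},
        {"objectId": 7, "label": "wall"},
    ]
}
print(object_ids_by_label(toy_anno, ["toilet", "sink"]))
```

A label can map to several `objectId`s in a scene, which is why the helper returns a list per label rather than a single ID.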
I also checked:

- PTH vs NPY for this scene: shared=23, pth_only=0, npy_only=0
- PTH vs caption JSON for this scene: shared=0, pth_only=23, caption_only=13

On the validation split, this leads to many mismatches:

- caption-only inst ids: 191 / 1135
- suspicious overlap inst ids: 218 / 944
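
The shared / only-in-one-side counts above come from a plain set comparison. A minimal sketch, assuming the caption JSON is a dict keyed by instance ID (as strings, since JSON keys are strings) and that the per-vertex instance IDs have already been loaded from the `.pth` file; `compare_instance_ids` is a hypothetical helper name, and any filtering of unannotated IDs is left out.

```python
def compare_instance_ids(pth_ids, caption_keys):
    """Count IDs shared between, and unique to, the two sources.

    pth_ids: iterable of instance IDs from the .pth file (ints or numpy ints).
    caption_keys: iterable of caption JSON keys (strings or ints).
    """
    pth_set = {int(i) for i in pth_ids}
    cap_set = {int(k) for k in caption_keys}
    return {
        "shared": len(pth_set & cap_set),
        "pth_only": len(pth_set - cap_set),
        "caption_only": len(cap_set - pth_set),
    }


# Toy check using only the five objects listed for scene 286b55a2bf.
toy_pth_ids = [49, 50, 51, 63, 72]
toy_caption_keys = ["13", "14", "15", "32", "33"]
print(compare_instance_ids(toy_pth_ids, toy_caption_keys))
```

Note that a nonzero `shared` count does not by itself mean the IDs refer to the same object: two different numbering schemes can overlap numerically by coincidence.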
My questions are:

1. Are the public ExCap3D caption JSON keys expected to match ScanNet++ `segGroups[].objectId`?
2. If not, what ID do the public caption JSON keys correspond to?
3. Is there an official mapping script or a specific ScanNet++ preprocessing version needed to align the public captions with the released code?
4. Are the internal `scene_captions` files used in the training scripts different from the public `excap3d-release` JSON files?

If helpful, I can attach the single-scene evidence for `286b55a2bf`.

Thanks.
