Official implementation of SemGeo-AttentionNet for 3D visual saliency prediction.
- Hardware: NVIDIA GPU with CUDA support (A100/H100 recommended, 40GB+ VRAM)
- OS: Linux (Ubuntu 20.04/22.04)
- Python: 3.10
Note: Two separate conda environments are required to avoid dependency conflicts.
```bash
# Create and activate environment
conda env create -f geo_environment.yml
conda activate geo_env
```
```bash
# Initialize submodule
git submodule update --init --recursive

# Build CUDA operations
cd Pointcept/libs/pointops && pip install .
cd ../pointgroup_ops && pip install .
cd ../../..
```

```bash
# Create and activate environment
conda env create -f sem_environment.yml
conda activate sem_env
```

PyTorch3D is required for mesh rendering. Choose one method:
Option 1 - Conda (Recommended)

```bash
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install pytorch3d -c pytorch3d
```

Option 2 - From Source

```bash
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
```

Option 3 - Pre-built Wheels

```bash
pip install pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py310_cu118_pyt210/download.html
```

Troubleshooting: If you get `ModuleNotFoundError: No module named 'pytorch3d'`, see the official installation guide.
```
data/
├── meshes/                  # Input .obj files
├── gaze/                    # Gaze coordinates (ground truth alignment)
├── ground_truth_dense/      # Sparse GT labels (~20k points)
├── geometry_data/           # [Auto-generated] Preprocessed geometry
├── meshes_semantic/         # [Auto-generated] Semantic features
└── ground_truth_aligned/    # [Auto-generated] Aligned GT
```
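Before running the pipeline, it can help to verify that the three input folders exist. A minimal stdlib sketch, where the directory names come from the layout above but the `data` root and the `check_layout` helper are assumptions, not part of the repo:

```python
from pathlib import Path

# Folders that must exist before preprocessing (names from the layout above).
REQUIRED = ["meshes", "gaze", "ground_truth_dense"]

def check_layout(root="data"):
    """Return the list of required input folders missing under `root`."""
    root = Path(root)
    return [d for d in REQUIRED if not (root / d).is_dir()]
```

An empty return value means the inputs are in place; the auto-generated folders are created by the pipeline itself.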
Extract vertices and normals from meshes.
```bash
conda activate geo_env
python preprocess_geometry.py
```

Output: `geometry_data/*.pt` (N × 6: XYZ + normals)
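Each geometry tensor stores one row per point: XYZ coordinates followed by the normal. A dependency-free sketch of slicing such rows (in practice the `.pt` files would be loaded with `torch.load`; `split_geometry` is a hypothetical helper, not part of the repo):

```python
# Sketch: each row of a geometry_data/*.pt tensor is [x, y, z, nx, ny, nz].
# Plain lists are used here to stay dependency-free; a torch tensor would be
# sliced the same way (rows[:, :3] and rows[:, 3:]).
def split_geometry(rows):
    coords  = [r[:3] for r in rows]   # XYZ positions
    normals = [r[3:] for r in rows]   # per-point normals
    return coords, normals
```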
Render multi-view images and extract DINO + Stable Diffusion features.
```bash
conda activate sem_env
python sem_extractor.py --mesh_glob "meshes/*.obj" \
    --out_dir meshes_semantic \
    --num_views 100 \
    --H 512 --W 512
```

Output: `meshes_semantic/*.pt` (N × 2048: DINO 768-dim + SD 1280-dim)

Optional flags:
- `--use_normal_map`: Enable normal-map conditioning
- `--prompt_mode filename`: Derive the text prompt from the filename
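The 2048-dim per-vertex semantic feature is simply the 768-dim DINO descriptor concatenated with the 1280-dim Stable Diffusion descriptor (768 + 1280 = 2048). A minimal sketch — `fuse_semantic` is illustrative, not the repo's API:

```python
# Sketch: concatenate a 768-dim DINO feature and a 1280-dim SD feature into
# the 2048-dim per-vertex semantic descriptor described above.
def fuse_semantic(dino_feat, sd_feat):
    assert len(dino_feat) == 768 and len(sd_feat) == 1280
    return list(dino_feat) + list(sd_feat)  # 2048-dim result
```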
Map sparse labels to full mesh resolution.
```bash
conda activate geo_env
python align_gt.py
```

Output: `ground_truth_aligned/*.pt` (N)
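One common way to map ~20k sparse labels onto a full-resolution mesh is nearest-neighbor transfer: each mesh vertex takes the label of the closest labeled point. This is only an illustration of the idea — `align_gt.py` may use a different scheme:

```python
# Hedged sketch of sparse-to-dense label transfer via nearest neighbor.
# A real implementation would use a KD-tree (e.g. scipy.spatial.cKDTree)
# instead of this O(N*M) loop.
def nn_transfer(mesh_vertices, labeled_points, labels):
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    out = []
    for v in mesh_vertices:
        nearest = min(range(len(labeled_points)),
                      key=lambda i: dist2(v, labeled_points[i]))
        out.append(labels[nearest])
    return out
```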
```bash
conda activate geo_env
python train.py
```

Config: batch size 8, learning rate 1e-4, 100 epochs; loss = 10× KL divergence + 2× correlation coefficient
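The weighted loss above can be sketched in plain Python. Treating predicted and ground-truth saliency as distributions for the KL term, and penalizing low correlation via `1 - CC`, is a common convention in saliency work — the exact sign handling here is an assumption, not necessarily the repo's:

```python
import math

def kl_div(p, q, eps=1e-8):
    """KL divergence between two saliency maps, normalized to distributions."""
    p = [x / (sum(p) + eps) for x in p]
    q = [x / (sum(q) + eps) for x in q]
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def pearson_cc(a, b):
    """Pearson correlation coefficient between two saliency maps."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb + 1e-8)

def saliency_loss(pred, gt):
    # 10x KL term + 2x correlation term, per the config above.
    return 10.0 * kl_div(gt, pred) + 2.0 * (1.0 - pearson_cc(pred, gt))
```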
```bash
conda activate geo_env
python eval_and_vis.py
```

Output: `results_viz_normalized/*.ply` (colored point clouds)
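For reference, a colored point cloud like the ones `eval_and_vis.py` emits can be written as a minimal ASCII PLY, mapping normalized saliency to the red channel. This is a hypothetical sketch — the actual script's colormap and PLY options are not specified here:

```python
def write_ply(path, points, saliency):
    """Write an ASCII PLY where red encodes normalized saliency (blue = inverse)."""
    lo, hi = min(saliency), max(saliency)
    scale = (hi - lo) or 1.0  # avoid division by zero on constant saliency
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(points)}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write("property uchar red\nproperty uchar green\nproperty uchar blue\n")
        f.write("end_header\n")
        for (x, y, z), s in zip(points, saliency):
            r = int(255 * (s - lo) / scale)
            f.write(f"{x} {y} {z} {r} 0 {255 - r}\n")
```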
- Download SAL3D dataset
- Organize data as shown above
- Run steps 1-5
- Geometric features: Point Transformer V3 (PTv3)
- Semantic features: DINOv2 + Stable Diffusion via multi-view rendering
- Fusion: Cross-attention for per-vertex saliency
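The fusion step can be illustrated with a heavily simplified, single-head cross-attention in plain Python: a per-vertex geometric feature acts as the query over semantic key/value features. The real model uses learned projections and different dimensions — this toy version only shows the mechanism:

```python
import math

def cross_attention(query, keys, values):
    """Scaled dot-product attention of one query over keys/values (no projections)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]      # softmax over semantic features
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
```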
Built with Pointcept, DINOv2, Stable Diffusion, and PyTorch3D.
MIT License