sohampahari/SemGeoAttnNet

# Learning Human Visual Attention on 3D Surfaces through Geometry-Queried Semantic Priors

Official implementation of **SemGeo-AttentionNet** for 3D visual saliency prediction.

## Quick Start

### Requirements

- **Hardware:** NVIDIA GPU with CUDA support (A100/H100 recommended, 40 GB+ VRAM)
- **OS:** Linux (Ubuntu 20.04/22.04)
- **Python:** 3.10

> **Note:** Two separate conda environments are required to avoid dependency conflicts.

## Installation

### Environment 1: Geometric Features & Training

```bash
# Create and activate environment
conda env create -f geo_environment.yml
conda activate geo_env

# Initialize submodule
git submodule update --init --recursive

# Build CUDA operations
cd Pointcept/libs/pointops && pip install .
cd ../pointgroup_ops && pip install .
cd ../../..
```

### Environment 2: Semantic Features

```bash
# Create and activate environment
conda env create -f sem_environment.yml
conda activate sem_env
```

### Installing PyTorch3D

PyTorch3D is required for mesh rendering. Choose one method:

**Option 1 - Conda (Recommended)**

```bash
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install pytorch3d -c pytorch3d
```

**Option 2 - From Source**

```bash
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
```

**Option 3 - Pre-built Wheels**

```bash
pip install pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py310_cu118_pyt210/download.html
```

> **Troubleshooting:** If you get `ModuleNotFoundError: No module named 'pytorch3d'`, see the official installation guide.

## Data Structure

```
data/
├── meshes/                    # Input .obj files
├── gaze/                      # Gaze coordinates (ground truth alignment)
├── ground_truth_dense/        # Sparse GT labels (~20k points)
├── geometry_data/             # [Auto-generated] Preprocessed geometry
├── meshes_semantic/           # [Auto-generated] Semantic features
└── ground_truth_aligned/      # [Auto-generated] Aligned GT
```

## Usage

### 1. Preprocess Geometry

Extract vertices and normals from meshes.

```bash
conda activate geo_env
python preprocess_geometry.py
```

**Output:** `geometry_data/*.pt` (N × 6: XYZ + normals)
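The N × 6 layout can be illustrated with a small NumPy sketch (the real files are PyTorch tensors loaded via `torch.load`; NumPy and random toy data stand in here purely for illustration):

```python
import numpy as np

# Toy stand-in for one geometry_data/*.pt tensor: N x 6 = XYZ + unit normals.
rng = np.random.default_rng(0)
N = 5
xyz = rng.normal(size=(N, 3))                               # vertex positions
normals = rng.normal(size=(N, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)   # unit length
geom = np.concatenate([xyz, normals], axis=1)               # (N, 6)

# Splitting a loaded tensor back into its two halves:
points, nrm = geom[:, :3], geom[:, 3:]
```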

### 2. Extract Semantic Features

Render multi-view images and extract DINO + Stable Diffusion features.

```bash
conda activate sem_env
python sem_extractor.py --mesh_glob "meshes/*.obj" \
                        --out_dir meshes_semantic \
                        --num_views 100 \
                        --H 512 --W 512
```

**Output:** `meshes_semantic/*.pt` (N × 2048: DINO 768-dim + SD 1280-dim)

Optional flags:

- `--use_normal_map`: Enable normal map conditioning
- `--prompt_mode filename`: Derive text prompt from filename
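The 2048-dim per-vertex layout is a simple concatenation of the two feature sets. In this toy sketch, zeros stand in for the real extracted features, and the DINO-first column ordering is an assumption, not something the repo documents:

```python
import numpy as np

N = 4
dino = np.zeros((N, 768))       # DINO features per vertex (placeholder)
sd = np.zeros((N, 1280))        # Stable Diffusion features per vertex (placeholder)
sem_feats = np.concatenate([dino, sd], axis=1)   # (N, 2048)

# Splitting a loaded meshes_semantic/*.pt tensor back into modalities
# (assuming DINO occupies the first 768 columns):
dino_part, sd_part = sem_feats[:, :768], sem_feats[:, 768:]
```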

### 3. Align Ground Truth

Map sparse labels to full mesh resolution.

```bash
conda activate geo_env
python align_gt.py
```

**Output:** `ground_truth_aligned/*.pt` (N)
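The exact method `align_gt.py` uses is not documented here; one plausible sketch is nearest-neighbor label transfer from the sparse GT points to every mesh vertex (the helper name and the approach are both assumptions):

```python
import numpy as np

def align_labels(mesh_verts, sparse_pts, sparse_labels):
    # (V, S) pairwise squared distances; fine at this scale, but a real
    # implementation over ~20k sparse points vs. full meshes would use a KD-tree.
    d2 = ((mesh_verts[:, None, :] - sparse_pts[None, :, :]) ** 2).sum(-1)
    return sparse_labels[d2.argmin(axis=1)]   # one label per mesh vertex

verts = np.array([[0.0, 0, 0], [1, 0, 0], [0.9, 0, 0]])
sparse_pts = np.array([[0.0, 0, 0], [1, 0, 0]])
labels = np.array([0.2, 0.8])
aligned = align_labels(verts, sparse_pts, labels)   # [0.2, 0.8, 0.8]
```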

### 4. Train

```bash
conda activate geo_env
python train.py
```

**Config:** batch size 8, learning rate 1e-4, 100 epochs; loss = 10 × KL divergence + 2 × correlation coefficient
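A minimal NumPy sketch of this loss, assuming the KL term compares vertex-normalized saliency distributions and the correlation term enters as 1 − CC (since CC is maximized); the exact formulation lives in `train.py`:

```python
import numpy as np

def saliency_loss(pred, gt, eps=1e-8):
    # Normalize both maps into probability distributions over vertices.
    p = pred / (pred.sum() + eps)
    q = gt / (gt.sum() + eps)
    kl = np.sum(q * np.log((q + eps) / (p + eps)))   # KL(gt || pred), assumed direction
    cc = np.corrcoef(pred, gt)[0, 1]                 # Pearson correlation coefficient
    return 10.0 * kl + 2.0 * (1.0 - cc)              # documented 10x / 2x weighting

pred = np.array([0.1, 0.4, 0.3, 0.2])
gt = np.array([0.1, 0.4, 0.3, 0.2])
loss = saliency_loss(pred, gt)   # near zero for a perfect prediction
```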

### 5. Evaluate & Visualize

```bash
conda activate geo_env
python eval_and_vis.py
```

**Output:** `results_viz_normalized/*.ply` (colored point clouds)
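A colored point cloud of this kind can be written as plain ASCII PLY; this standalone writer approximates what `eval_and_vis.py` emits (the repo's actual color map is not specified, so a simple blue-to-red ramp is assumed):

```python
import numpy as np

def write_saliency_ply(path, points, saliency):
    # Map saliency to a blue -> red ramp (assumed color scheme).
    s = (saliency - saliency.min()) / (np.ptp(saliency) + 1e-8)
    red = (s * 255).astype(np.uint8)
    blue = ((1 - s) * 255).astype(np.uint8)
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(points)}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write("property uchar red\nproperty uchar green\nproperty uchar blue\n")
        f.write("end_header\n")
        for (x, y, z), r, b in zip(points, red, blue):
            f.write(f"{x} {y} {z} {r} 0 {b}\n")

rng = np.random.default_rng(0)
write_saliency_ply("saliency_demo.ply", rng.random((10, 3)), rng.random(10))
```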

Reproducing SAL3D Results

  1. Download SAL3D dataset
  2. Organize data as shown above
  3. Run steps 1-5

## Architecture

- **Geometric features:** Point Transformer V3 (PTv3)
- **Semantic features:** DINOv2 + Stable Diffusion via multi-view rendering
- **Fusion:** Cross-attention for per-vertex saliency
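The fusion step can be sketched as standard scaled dot-product cross-attention in which geometric features form the queries and semantic features the keys and values, matching the "geometry-queried" framing of the title. Dimensions and the random projections below are purely illustrative, not the model's actual learned weights:

```python
import numpy as np

def cross_attention(geo, sem, d=16, seed=0):
    # Random projections stand in for learned weight matrices.
    rng = np.random.default_rng(seed)
    Wq = rng.normal(size=(geo.shape[1], d))   # geometry -> queries
    Wk = rng.normal(size=(sem.shape[1], d))   # semantics -> keys
    Wv = rng.normal(size=(sem.shape[1], d))   # semantics -> values
    Q, K, V = geo @ Wq, sem @ Wk, sem @ Wv
    scores = Q @ K.T / np.sqrt(d)             # scaled dot-product scores
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)   # softmax over semantic tokens
    return attn @ V                           # fused per-vertex features

rng = np.random.default_rng(1)
fused = cross_attention(rng.random((5, 6)), rng.random((5, 32)))
```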

## Citation

## Acknowledgements

Built with Pointcept, DINOv2, Stable Diffusion, and PyTorch3D.

## License

MIT License
