9 changes: 9 additions & 0 deletions .gitignore
@@ -209,3 +209,12 @@ cython_debug/
marimo/_static/
marimo/_lsp/
__marimo__/

# model files
*.pt
*.pth
*.onnx
*.engine
*.ts

output/
151 changes: 134 additions & 17 deletions README.md
@@ -1,41 +1,150 @@
# Vector Perception ROS

ROS 2 perception stack for generalist robotics. This package provides vision-based perception capabilities (tracking, detection, semantic mapping) that wrap and extend the [vector_navigation_stack](../vector_navigation_stack/).

## Architecture

```
vector_robotics/
├── .venv/ # Shared Python virtual environment
├── vector_navigation_stack/ # Core autonomy (SLAM, planning, navigation)
└── vector_perception_ros/ # Perception layer (this package)
```

`vector_perception_ros` is a perception wrapper that integrates with `vector_navigation_stack`. Both packages should be built and sourced together.

## Requirements

- **Ubuntu 24.04** + **ROS 2 Jazzy** (or Ubuntu 22.04 + ROS 2 Humble)
- **Python 3.12**
- **NVIDIA GPU** with CUDA (required for EdgeTAM, YOLO-E, VLMs)
- **PyTorch** (installed separately, see below)

## Packages

- **track_anything** - EdgeTAM tracking + 3D segmentation with RGB-D
- **detect_anything** - YOLO-E detection node and utilities
- **semantic_mapping** - Semantic 3D mapping with VLM query hooks
- **sensor_coverage** - Room segmentation and coverage analysis
- **vlm** - Vision-Language Model interfaces (Qwen, Moondream)
- **vector_perception_utils** - Image and point cloud utilities

## Installation

### 1. System Dependencies

```bash
# ROS 2 and PCL
sudo apt update
sudo apt install -y \
  ros-$ROS_DISTRO-desktop-full \
  ros-$ROS_DISTRO-pcl-ros \
  ros-$ROS_DISTRO-backward-ros \
  libpcl-dev \
  git \
  cmake \
  libgoogle-glog-dev \
  libgflags-dev \
  libatlas-base-dev \
  libeigen3-dev \
  libsuitesparse-dev \
  nlohmann-json3-dev
```
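
Before moving on, it can be worth a quick sanity check that the ROS 2 environment resolves; `ros2 doctor` (part of the standard ros2cli tools) flags common setup problems:

```bash
# Sanity-check the ROS 2 install before building anything
source /opt/ros/$ROS_DISTRO/setup.bash
printenv ROS_DISTRO   # should print jazzy (or humble)
ros2 doctor           # reports misconfigured network, missing packages, etc.
```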

### 2. Python Environment Setup

We recommend using a **shared virtual environment** at the parent `vector_robotics/` level for both `vector_perception_ros` and `vector_navigation_stack`.

```bash
# Install uv (one time)
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

# Create shared venv at parent level
cd /path/to/vector_robotics
uv venv --python 3.12
source .venv/bin/activate
```

### 3. Install PyTorch (before `uv sync`)

PyTorch must be installed separately with CUDA support:

```bash
# With shared venv activated
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

Verify GPU access:
```bash
python -c "import torch; print(torch.cuda.is_available())"
```
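
If this prints `False`, check the driver and the PyTorch build separately:

```bash
# Driver-side check, independent of PyTorch
nvidia-smi

# PyTorch-side check: prints the CUDA version this build targets (None = CPU-only build)
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```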

### 4. Install Python Dependencies

```bash
cd /path/to/vector_robotics/vector_perception_ros
source ../.venv/bin/activate # Ensure shared venv is active

# Use --active to install into the currently active venv
uv sync --active
```

> **Note**: If you see a warning about `VIRTUAL_ENV` not matching the project path, use `uv sync --active` to target the shared parent environment.

### 5. Build vector_navigation_stack (if not already built)

See [vector_navigation_stack/README.md](../vector_navigation_stack/README.md) for full instructions.

```bash
cd /path/to/vector_robotics/vector_navigation_stack
source /opt/ros/$ROS_DISTRO/setup.bash
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release
```

### 6. Build vector_perception_ros

```bash
cd /path/to/vector_robotics/vector_perception_ros
source /opt/ros/$ROS_DISTRO/setup.bash
source ../vector_navigation_stack/install/setup.bash # Source nav stack first

colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release \
  --packages-skip arise_slam_mid360 arise_slam_mid360_msgs livox_ros_driver2
```

> **Note**: The `--packages-skip` flag excludes SLAM and Livox driver packages that are built separately or not needed for perception-only setups.
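
A quick way to confirm the workspace built what you expect (the grep pattern below just picks out packages named earlier in this README):

```bash
# List packages colcon found, then confirm they resolve after sourcing
colcon list
source install/setup.bash
ros2 pkg list | grep -E 'track_anything|detect_anything|semantic_mapping'
```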

## Usage

### Terminal Setup (every session)

```bash
# 1. Activate Python environment
source /path/to/vector_robotics/.venv/bin/activate

# 2. Source ROS 2
source /opt/ros/$ROS_DISTRO/setup.bash

# 3. Source navigation stack (provides base autonomy)
source /path/to/vector_robotics/vector_navigation_stack/install/setup.bash

# 4. Source perception stack
source /path/to/vector_robotics/vector_perception_ros/install/setup.bash

# 5. (Optional) Set API keys for cloud VLMs
export ALIBABA_API_KEY=your_api_key_here # Required for Qwen VLM
```

> **Tip**: Add the API key export to your `~/.bashrc` or use a `.env` file to avoid setting it every session.
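
One way to cut the per-terminal ceremony is a small wrapper script; the sketch below just bundles the steps above (the `env.sh` name and paths are placeholders, adjust to your checkout):

```bash
#!/usr/bin/env bash
# env.sh -- hypothetical convenience wrapper; assumes ROS_DISTRO is set (e.g. jazzy)
source /path/to/vector_robotics/.venv/bin/activate
source /opt/ros/$ROS_DISTRO/setup.bash
source /path/to/vector_robotics/vector_navigation_stack/install/setup.bash
source /path/to/vector_robotics/vector_perception_ros/install/setup.bash
```

Then `source env.sh` replaces steps 1-4 in each new terminal.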

### Quick Tests

```bash
# Test EdgeTAM with webcam
python -m track_anything.test_edge_tam

# Run 3D tracking node
ros2 launch track_anything track_3d.launch.py
```
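
If the launch seems to start but nothing happens, the usual ros2cli checks help narrow things down (exact topic names depend on the launch configuration):

```bash
# Confirm the node came up and is publishing
ros2 node list
ros2 topic list | grep -i track   # topic names vary with the launch file
```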

@@ -100,14 +209,22 @@ for det in detections:

## Troubleshooting

**ModuleNotFoundError**: Ensure venv is activated: `source /path/to/vector_robotics/.venv/bin/activate`

**`uv sync` warning about VIRTUAL_ENV**: Use `uv sync --active` to install into the shared parent venv.

**No camera info**: Check camera is running: `ros2 topic list | grep camera_info`

**PyTorch not found / No CUDA**: Install PyTorch manually with CUDA support (see Installation step 3).

**Performance issues**: EdgeTAM and VLMs require a GPU. Check: `nvidia-smi`

**Build errors with SLAM packages**: Use `--packages-skip arise_slam_mid360 arise_slam_mid360_msgs livox_ros_driver2` if you don't need these.

## Documentation

See package READMEs:
- [track_anything/README.md](track_anything/README.md)
- [semantic_mapping/README.md](semantic_mapping/README.md)
- [vlm/README.md](vlm/README.md)
- [vector_perception_utils/README.md](vector_perception_utils/README.md)
54 changes: 54 additions & 0 deletions detect_anything/CMakeLists.txt
@@ -0,0 +1,54 @@
cmake_minimum_required(VERSION 3.10)
project(detect_anything)

find_package(ament_cmake REQUIRED)
find_package(rclpy REQUIRED)
find_package(sensor_msgs REQUIRED)
find_package(std_msgs REQUIRED)
find_package(vision_msgs REQUIRED)
find_package(cv_bridge REQUIRED)
find_package(rosidl_default_generators REQUIRED)
find_package(Python3 REQUIRED COMPONENTS Interpreter)

rosidl_generate_interfaces(${PROJECT_NAME}
  "msg/DetectionResult.msg"
  DEPENDENCIES std_msgs sensor_msgs
)

set(PYTHON_INSTALL_DIR "lib/python${Python3_VERSION_MAJOR}.${Python3_VERSION_MINOR}/site-packages")

install(
  DIRECTORY detect_anything
  DESTINATION ${PYTHON_INSTALL_DIR}
)

install(
  PROGRAMS
    scripts/detection_node
  DESTINATION lib/${PROJECT_NAME}
)

install(
  FILES resource/detect_anything
  DESTINATION share/${PROJECT_NAME}/resource
)

install(
  DIRECTORY config
  DESTINATION share/${PROJECT_NAME}
)

if(EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/models")
  install(
    DIRECTORY models
    DESTINATION share/${PROJECT_NAME}
  )
endif()

set(ENV_HOOK "${CMAKE_CURRENT_SOURCE_DIR}/env-hooks/venv_pythonpath.sh.in")
set(ENV_HOOK_OUT "${CMAKE_CURRENT_BINARY_DIR}/ament_cmake_environment_hooks/venv_pythonpath.sh")
configure_file(${ENV_HOOK} ${ENV_HOOK_OUT} @ONLY)
ament_environment_hooks(${ENV_HOOK_OUT})

ament_export_dependencies(rosidl_default_runtime)
ament_package()
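
The `venv_pythonpath.sh.in` template referenced above isn't part of this diff; hooks like this typically just put the shared venv's `site-packages` on `PYTHONPATH` whenever the workspace is sourced. A purely illustrative sketch (the relative path and Python version are assumptions):

```bash
# venv_pythonpath.sh.in -- illustrative only, not the template from this PR
# configure_file(... @ONLY) substitutes @CMAKE_CURRENT_SOURCE_DIR@ at configure time
export PYTHONPATH="@CMAKE_CURRENT_SOURCE_DIR@/../.venv/lib/python3.12/site-packages:$PYTHONPATH"
```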
28 changes: 28 additions & 0 deletions detect_anything/README.md
@@ -0,0 +1,28 @@
# detect_anything

YOLO-E detection node that publishes `DetectionResult` messages with cropped masks, along with an annotated overlay topic.

## What’s inside
- `detect_anything/detection.py`: detection results container and Ultralytics parsing helpers.
- `detect_anything/yoloe.py`: YOLO-E wrapper with prompt support and basic filtering.
- `detect_anything/detection_node.py`: ROS 2 node wiring the detector to `DetectionResult`.
- `msg/DetectionResult.msg`: compressed image + cropped mask array.

## Quick start
```bash
source /path/to/vector_robotics/.venv/bin/activate
source /opt/ros/$ROS_DISTRO/setup.bash
colcon build --packages-select detect_anything
source install/setup.bash

ros2 run detect_anything detection_node \
  --ros-args -p model_path:=/path/to/yoloe/models \
  -p model_name:=yoloe-11s-seg-pf.pt \
  -p conf:=0.6 \
  -p max_area_ratio:=0.3 \
  -p image_topic:=/camera/image
```

Topics:
- Publishes `/detection_result` (`detect_anything/DetectionResult`) and `/annotated_image_detection` (`sensor_msgs/Image`).
- Subscribes to `/camera/image` (or `/camera/image/compressed` if `use_compressed:=true`).
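
To verify the node end to end, the standard topic tools work against the outputs listed above:

```bash
# Check the overlay publish rate and peek at detection messages
ros2 topic hz /annotated_image_detection
ros2 topic echo /detection_result --no-arr   # --no-arr suppresses large array fields
```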
46 changes: 46 additions & 0 deletions detect_anything/config/objects.yaml
@@ -0,0 +1,46 @@
# Simple list of object names for prompting YOLO-E
- chair
- desk
- tv_monitor
- sofa
- unknown
- printer
- coffee machine
- refrigerator
- trash can
- shoe
- sink
- table
- oven
- bed
- painting
- bulletin board
- plant
- vase
- cabinet
- shelf
- book
- cup
- sculpture
- keyboard
- mouse
- clock
- phone
- toilet
- bathtub
- microwave oven
- pan
- suitcase
- light
- curtain
- whiteboard
- shower knob
- bottle
- water dispenser
- vending machine
- laptop
- bag
- locker
- picture
- cardboard
- extinguisher
1 change: 1 addition & 0 deletions detect_anything/detect_anything/__init__.py
@@ -0,0 +1 @@
"""detect_anything package."""