9 changes: 9 additions & 0 deletions .gitignore
@@ -209,3 +209,12 @@ cython_debug/
marimo/_static/
marimo/_lsp/
__marimo__/

# model files
*.pt
*.pth
*.onnx
*.engine
*.ts

output/
151 changes: 134 additions & 17 deletions README.md
@@ -1,41 +1,150 @@
# Vector Perception ROS

ROS 2 perception stack for generalist robotics. This package provides vision-based perception capabilities (tracking, detection, semantic mapping) that wrap and extend the [vector_navigation_stack](../vector_navigation_stack/).

## Architecture

```
vector_robotics/
├── .venv/ # Shared Python virtual environment
├── vector_navigation_stack/ # Core autonomy (SLAM, planning, navigation)
└── vector_perception_ros/ # Perception layer (this package)
```

`vector_perception_ros` is a perception wrapper that integrates with `vector_navigation_stack`. Both packages should be built and sourced together.

## Requirements

- **Ubuntu 24.04** + **ROS 2 Jazzy** (or Ubuntu 22.04 + ROS 2 Humble)
- **Python 3.12**
- **NVIDIA GPU** with CUDA (required for EdgeTAM, YOLO-E, VLMs)
- **PyTorch** (installed separately, see below)

## Packages

- **track_anything** - EdgeTAM tracking + 3D segmentation with RGB-D
- **detect_anything** - YOLO-E detection node and utilities
- **semantic_mapping** - Semantic 3D mapping with VLM query hooks
- **sensor_coverage** - Room segmentation and coverage analysis
- **vlm** - Vision-Language Model interfaces (Qwen, Moondream)
- **vector_perception_utils** - Image and point cloud utilities

## Installation

### 1. System Dependencies

```bash
# ROS 2 and PCL
sudo apt update
sudo apt install -y \
  ros-$ROS_DISTRO-desktop-full \
  ros-$ROS_DISTRO-pcl-ros \
  ros-$ROS_DISTRO-backward-ros \
  libpcl-dev \
  git \
  cmake \
  libgoogle-glog-dev \
  libgflags-dev \
  libatlas-base-dev \
  libeigen3-dev \
  libsuitesparse-dev \
  nlohmann-json3-dev
```
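
Before moving on, it can be worth a quick sanity check that the ROS 2 environment resolves; `ros2 doctor` (part of the standard ros2cli tools) flags common setup problems:

```bash
# Sanity-check the ROS 2 install before building anything
source /opt/ros/$ROS_DISTRO/setup.bash
printenv ROS_DISTRO   # should print jazzy (or humble)
ros2 doctor           # reports misconfigured network, missing packages, etc.
```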

### 2. Python Environment Setup

We recommend using a **shared virtual environment** at the parent `vector_robotics/` level for both `vector_perception_ros` and `vector_navigation_stack`.

```bash
# Install uv (one time)
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

# Create shared venv at parent level
cd /path/to/vector_robotics
uv venv --python 3.12
source .venv/bin/activate
```

### 3. Install PyTorch (before `uv sync`)

PyTorch must be installed separately with CUDA support:

```bash
# With shared venv activated
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

Verify GPU access:
```bash
python -c "import torch; print(torch.cuda.is_available())"
```
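
If this prints `False`, check the driver and the PyTorch build separately:

```bash
# Driver-side check, independent of PyTorch
nvidia-smi

# PyTorch-side check: prints the CUDA version this build targets (None = CPU-only build)
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```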

### 4. Install Python Dependencies

```bash
cd /path/to/vector_robotics/vector_perception_ros
source ../.venv/bin/activate # Ensure shared venv is active

# Use --active to install into the currently active venv
uv sync --active
```

> **Note**: If you see a warning about `VIRTUAL_ENV` not matching the project path, use `uv sync --active` to target the shared parent environment.

### 5. Build vector_navigation_stack (if not already built)

See [vector_navigation_stack/README.md](../vector_navigation_stack/README.md) for full instructions.

```bash
cd /path/to/vector_robotics/vector_navigation_stack
source /opt/ros/$ROS_DISTRO/setup.bash
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release
```

### 6. Build vector_perception_ros

```bash
cd /path/to/vector_robotics/vector_perception_ros
source /opt/ros/$ROS_DISTRO/setup.bash
source ../vector_navigation_stack/install/setup.bash # Source nav stack first

colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release \
  --packages-skip arise_slam_mid360 arise_slam_mid360_msgs livox_ros_driver2
```

> **Note**: The `--packages-skip` flag excludes SLAM and Livox driver packages that are built separately or not needed for perception-only setups.
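
A quick way to confirm the workspace built what you expect (the grep pattern below just picks out packages named earlier in this README):

```bash
# List packages colcon found, then confirm they resolve after sourcing
colcon list
source install/setup.bash
ros2 pkg list | grep -E 'track_anything|detect_anything|semantic_mapping'
```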

## Usage

### Terminal Setup (every session)

```bash
# 1. Activate Python environment
source /path/to/vector_robotics/.venv/bin/activate

# 2. Source ROS 2
source /opt/ros/$ROS_DISTRO/setup.bash

# 3. Source navigation stack (provides base autonomy)
source /path/to/vector_robotics/vector_navigation_stack/install/setup.bash

# 4. Source perception stack
source /path/to/vector_robotics/vector_perception_ros/install/setup.bash

# 5. (Optional) Set API keys for cloud VLMs
export ALIBABA_API_KEY=your_api_key_here # Required for Qwen VLM
```

> **Tip**: Add the API key export to your `~/.bashrc` or use a `.env` file to avoid setting it every session.
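
One way to cut the per-terminal ceremony is a small wrapper script; the sketch below just bundles the steps above (the `env.sh` name and paths are placeholders, adjust to your checkout):

```bash
#!/usr/bin/env bash
# env.sh -- hypothetical convenience wrapper; assumes ROS_DISTRO is set (e.g. jazzy)
source /path/to/vector_robotics/.venv/bin/activate
source /opt/ros/$ROS_DISTRO/setup.bash
source /path/to/vector_robotics/vector_navigation_stack/install/setup.bash
source /path/to/vector_robotics/vector_perception_ros/install/setup.bash
```

Then `source env.sh` replaces steps 1-4 in each new terminal.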

### Quick Tests

```bash
# Test EdgeTAM with webcam
python -m track_anything.test_edge_tam

# Run 3D tracking node
ros2 launch track_anything track_3d.launch.py
```
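
If the launch seems to start but nothing happens, the usual ros2cli checks help narrow things down (exact topic names depend on the launch configuration):

```bash
# Confirm the node came up and is publishing
ros2 node list
ros2 topic list | grep -i track   # topic names vary with the launch file
```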

@@ -100,14 +209,22 @@ for det in detections:

## Troubleshooting

**ModuleNotFoundError**: Ensure venv is activated: `source /path/to/vector_robotics/.venv/bin/activate`

**`uv sync` warning about VIRTUAL_ENV**: Use `uv sync --active` to install into the shared parent venv.

**No camera info**: Check camera is running: `ros2 topic list | grep camera_info`

**PyTorch not found / No CUDA**: Install PyTorch manually with CUDA support (see Installation step 3).

**Performance issues**: EdgeTAM and VLMs require a GPU. Check: `nvidia-smi`

**Build errors with SLAM packages**: Use `--packages-skip arise_slam_mid360 arise_slam_mid360_msgs livox_ros_driver2` if you don't need these.

## Documentation

See package READMEs:
- [track_anything/README.md](track_anything/README.md)
- [semantic_mapping/README.md](semantic_mapping/README.md)
- [vlm/README.md](vlm/README.md)
- [vector_perception_utils/README.md](vector_perception_utils/README.md)
54 changes: 54 additions & 0 deletions detect_anything/CMakeLists.txt
@@ -0,0 +1,54 @@
cmake_minimum_required(VERSION 3.10)
project(detect_anything)

find_package(ament_cmake REQUIRED)
find_package(rclpy REQUIRED)
find_package(sensor_msgs REQUIRED)
find_package(std_msgs REQUIRED)
find_package(vision_msgs REQUIRED)
find_package(cv_bridge REQUIRED)
find_package(rosidl_default_generators REQUIRED)
find_package(Python3 REQUIRED COMPONENTS Interpreter)

rosidl_generate_interfaces(${PROJECT_NAME}
  "msg/DetectionResult.msg"
  DEPENDENCIES std_msgs sensor_msgs
)

set(PYTHON_INSTALL_DIR "lib/python${Python3_VERSION_MAJOR}.${Python3_VERSION_MINOR}/site-packages")

install(
  DIRECTORY detect_anything
  DESTINATION ${PYTHON_INSTALL_DIR}
)

install(
  PROGRAMS
    scripts/detection_node
  DESTINATION lib/${PROJECT_NAME}
)

install(
  FILES resource/detect_anything
  DESTINATION share/${PROJECT_NAME}/resource
)

install(
  DIRECTORY config
  DESTINATION share/${PROJECT_NAME}
)

if(EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/models")
  install(
    DIRECTORY models
    DESTINATION share/${PROJECT_NAME}
  )
endif()

set(ENV_HOOK "${CMAKE_CURRENT_SOURCE_DIR}/env-hooks/venv_pythonpath.sh.in")
set(ENV_HOOK_OUT "${CMAKE_CURRENT_BINARY_DIR}/ament_cmake_environment_hooks/venv_pythonpath.sh")
configure_file(${ENV_HOOK} ${ENV_HOOK_OUT} @ONLY)
ament_environment_hooks(${ENV_HOOK_OUT})

ament_export_dependencies(rosidl_default_runtime)
ament_package()
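
The `venv_pythonpath.sh.in` template referenced above isn't part of this diff; hooks like this typically just put the shared venv's `site-packages` on `PYTHONPATH` whenever the workspace is sourced. A purely illustrative sketch (the relative path and Python version are assumptions):

```bash
# venv_pythonpath.sh.in -- illustrative only, not the template from this PR
# configure_file(... @ONLY) substitutes @CMAKE_CURRENT_SOURCE_DIR@ at configure time
export PYTHONPATH="@CMAKE_CURRENT_SOURCE_DIR@/../.venv/lib/python3.12/site-packages:$PYTHONPATH"
```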
28 changes: 28 additions & 0 deletions detect_anything/README.md
@@ -0,0 +1,28 @@
# detect_anything

YOLO-E detection node that publishes `DetectionResult` messages with cropped masks, along with an annotated overlay topic.

## What’s inside
- `detect_anything/detection.py`: detection results container and Ultralytics parsing helpers.
- `detect_anything/yoloe.py`: YOLO-E wrapper with prompt support and basic filtering.
- `detect_anything/detection_node.py`: ROS 2 node wiring the detector to `DetectionResult`.
- `msg/DetectionResult.msg`: compressed image + cropped mask array.

## Quick start
```bash
source /path/to/vector_robotics/.venv/bin/activate
source /opt/ros/$ROS_DISTRO/setup.bash
colcon build --packages-select detect_anything
source install/setup.bash

ros2 run detect_anything detection_node \
  --ros-args -p model_path:=/path/to/yoloe/models \
  -p model_name:=yoloe-11s-seg-pf.pt \
  -p conf:=0.6 \
  -p max_area_ratio:=0.3 \
  -p image_topic:=/camera/image
```

Topics:
- Publishes `/detection_result` (`detect_anything/DetectionResult`) and `/annotated_image_detection` (`sensor_msgs/Image`).
- Subscribes to `/camera/image` (or `/camera/image/compressed` if `use_compressed:=true`).
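
To verify the node end to end, the standard topic tools work against the outputs listed above:

```bash
# Check the overlay publish rate and peek at detection messages
ros2 topic hz /annotated_image_detection
ros2 topic echo /detection_result --no-arr   # --no-arr suppresses large array fields
```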
46 changes: 46 additions & 0 deletions detect_anything/config/objects.yaml
@@ -0,0 +1,46 @@
# Simple list of object names for prompting YOLO-E
- chair
- desk
- tv_monitor
- sofa
- unknown
- printer
- coffee machine
- refrigerator
- trash can
- shoe
- sink
- table
- oven
- bed
- painting
- bulletin board
- plant
- vase
- cabinet
- shelf
- book
- cup
- sculpture
- keyboard
- mouse
- clock
- phone
- toilet
- bathtub
- microwave oven
- pan
- suitcase
- light
- curtain
- whiteboard
- shower knob
- bottle
- water dispenser
- vending machine
- laptop
- bag
- locker
- picture
- cardboard
- extinguisher
1 change: 1 addition & 0 deletions detect_anything/detect_anything/__init__.py
@@ -0,0 +1 @@
"""detect_anything package."""