A comprehensive robotic task planning and execution system that combines computer vision, AI-powered task decomposition, and ROS-based robot control for automated block manipulation tasks.
This project implements an intelligent robotic system capable of:
- Computer Vision Block Tracking: Real-time detection and tracking of colored blocks using OpenCV
- AI-Powered Task Planning: Using Google's Gemini AI to decompose high-level tasks into executable robot actions
- ROS-Based Robot Control: Coordinated robot arm control for pick-and-place operations
- 3D Point Cloud Processing: Depth perception and spatial understanding for precise manipulation
The system consists of several interconnected modules:
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Task Input    │───▶│    Gemini AI    │───▶│  Task Executor  │
│     (User)      │    │  Decomposition  │    │   (ROS Node)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                      │
                                ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Block Tracker  │───▶│   Point Cloud   │───▶│ Robot Controller│
│    (OpenCV)     │    │   Transformer   │    │   (ROS Node)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
- gemini-api.py: Interfaces with Google's Gemini AI to decompose high-level tasks into robot-executable subtasks
- config.py: Manages API keys and configuration settings
- task_executor.py: ROS node that receives and executes task plans
- block_tracking.py: Standalone OpenCV-based block detection and tracking
- ros_block_tracker.py: ROS-integrated version of block tracking
- centroid_tracker.py: Object tracking algorithm for maintaining block IDs across frames
- listener.py: Camera data processing and point cloud generation
- point_cloud_transformer.py: Coordinate frame transformations for robot integration
- robot_controller.py: High-level robot arm control and pick-and-place operations
- Detects colored blocks: red, orange, yellow, green, blue, and purple
- Robust HSV color space filtering with morphological operations
- Real-time tracking with unique ID assignment
- Uses Gemini 2.0 Flash Thinking model for task planning
- Converts natural language commands into structured robot actions
- Supports "Pick" and "Place" primitive skills
- Generates JSON-formatted task plans
- Centroid-based object tracking with persistence
- Movement detection and visualization
- Trail visualization for motion analysis
- Robust handling of occlusions and temporary disappearances
- Full ROS ecosystem integration
- Point cloud processing and transformation
- Robot arm pose control
- Real-time sensor data processing
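The centroid-based tracking with persistence listed above can be illustrated with a small, dependency-free sketch. It is not the code in centroid_tracker.py: the class name SimpleCentroidTracker, the greedy nearest-neighbour matching, and the max_disappeared and max_distance parameters are assumptions chosen for illustration. The idea is that each detection is matched to the closest known block, a block that goes unseen keeps its ID for a few frames (covering brief occlusions), and unmatched detections get new IDs.

```python
import math
from collections import OrderedDict


class SimpleCentroidTracker:
    """Illustrative tracker: matches detections to known IDs by nearest centroid."""

    def __init__(self, max_disappeared=5, max_distance=50.0):
        self.next_id = 0
        self.objects = OrderedDict()      # object ID -> (x, y) centroid
        self.disappeared = OrderedDict()  # object ID -> consecutive missed frames
        self.max_disappeared = max_disappeared  # hypothetical default
        self.max_distance = max_distance        # hypothetical default, in pixels

    def _register(self, centroid):
        self.objects[self.next_id] = centroid
        self.disappeared[self.next_id] = 0
        self.next_id += 1

    def _deregister(self, object_id):
        del self.objects[object_id]
        del self.disappeared[object_id]

    def update(self, centroids):
        """Update tracks with this frame's centroids; return ID -> centroid."""
        # Build every (distance, ID, detection index) pair and match greedily,
        # closest pairs first.
        pairs = sorted(
            (math.dist(pos, c), oid, i)
            for oid, pos in self.objects.items()
            for i, c in enumerate(centroids)
        )
        matched_ids, matched_dets = set(), set()
        for dist, oid, i in pairs:
            if dist > self.max_distance:
                break  # remaining pairs are even farther apart
            if oid in matched_ids or i in matched_dets:
                continue
            self.objects[oid] = centroids[i]
            self.disappeared[oid] = 0
            matched_ids.add(oid)
            matched_dets.add(i)

        # Unmatched tracks: tolerate short gaps, then drop the ID.
        for oid in list(self.objects):
            if oid not in matched_ids:
                self.disappeared[oid] += 1
                if self.disappeared[oid] > self.max_disappeared:
                    self._deregister(oid)

        # Unmatched detections become new tracks.
        for i, c in enumerate(centroids):
            if i not in matched_dets:
                self._register(c)
        return dict(self.objects)
```

Calling update once per frame with the detected block centroids returns the current ID-to-position mapping. Greedy matching keeps the sketch short; an optimal assignment method would be a reasonable alternative when blocks sit close together.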
# Python dependencies
pip install opencv-python numpy imutils scipy google-genai python-dotenv
# ROS dependencies (if using ROS)
sudo apt-get install ros-noetic-cv-bridge ros-noetic-tf2-ros
pip install rospkg
- Create a .env file in the project root:
GEMINI_API_KEY=your_gemini_api_key_here
- Install the required Python packages:
pip install -r requirements.txt
# Standalone block tracking with webcam
python block_tracking.py
# With video file
python block_tracking.py --video path/to/video.mp4
# Interactive task planning
python gemini-api.py
# Enter task: "Stack the red block on top of the blue block"
# Terminal 1: Start ROS core
roscore
# Terminal 2: Start camera listener
python listener.py
# Terminal 3: Start block tracker
python ros_block_tracker.py
# Terminal 4: Start point cloud transformer
python point_cloud_transformer.py
# Terminal 5: Start robot controller
python robot_controller.py
# Terminal 6: Start task executor
python task_executor.py
- Color Ranges: HSV thresholds for each color in block_tracking.py
- Block Size: MIN_BLOCK_AREA and MAX_BLOCK_AREA for size filtering
- Movement Threshold: movement_threshold for motion detection
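To make the vision settings above concrete, here is a minimal sketch of how HSV color ranges and the block-area limits might be expressed and applied. All numeric values are hypothetical placeholders, not the thresholds in block_tracking.py, and the helper names classify_hsv, is_block_sized, and rgb_to_opencv_hsv are invented for illustration. Ranges follow OpenCV's 8-bit HSV convention (H from 0 to 179, S and V from 0 to 255), where red needs two ranges because its hue wraps around zero.

```python
import colorsys

# Hypothetical thresholds for illustration only; real values depend on the
# camera and lighting. Each color maps to one or more (lower, upper) HSV ranges.
COLOR_RANGES = {
    "red": [((0, 100, 100), (10, 255, 255)), ((170, 100, 100), (179, 255, 255))],
    "green": [((40, 100, 100), (80, 255, 255))],
    "blue": [((100, 100, 100), (130, 255, 255))],
}
MIN_BLOCK_AREA = 500    # pixels; smaller blobs are treated as noise (assumed value)
MAX_BLOCK_AREA = 50000  # pixels; larger blobs are treated as background (assumed value)


def rgb_to_opencv_hsv(r, g, b):
    """Convert 8-bit RGB to HSV scaled the way OpenCV scales 8-bit images."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return (round(h * 180) % 180, round(s * 255), round(v * 255))


def classify_hsv(h, s, v):
    """Return the first color whose range contains the pixel, else None."""
    for name, ranges in COLOR_RANGES.items():
        for lower, upper in ranges:
            if all(lo <= x <= hi for x, lo, hi in zip((h, s, v), lower, upper)):
                return name
    return None


def is_block_sized(area):
    """Keep only blobs whose pixel area falls inside the configured limits."""
    return MIN_BLOCK_AREA <= area <= MAX_BLOCK_AREA
```

In an OpenCV pipeline the same ranges would typically be passed to cv2.inRange to build a mask per color, with cv2.contourArea supplying the area that is_block_sized checks.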
- Gripper Control: gripper_open and gripper_close positions
- Pick/Place Heights: pick_height and place_height above surfaces
- Movement Delays: Timing for robot arm movements
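One way these robot parameters could fit together is sketched below. This is an assumption about how a pick or place might be sequenced, not the logic in robot_controller.py: RobotConfig, pick_sequence, and place_sequence are hypothetical names, the default values are placeholders, and pick_height and place_height are interpreted here as the clearance above the target at which the arm hovers before descending.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobotConfig:
    # All defaults are hypothetical placeholders, not the project's values.
    gripper_open: float = 0.08   # gripper opening, meters
    gripper_close: float = 0.02  # gripper opening when grasping, meters
    pick_height: float = 0.1     # hover clearance above the block, meters
    place_height: float = 0.1    # hover clearance above the target, meters
    move_delay: float = 1.0      # seconds to wait after each motion


def pick_sequence(block_xyz, cfg):
    """Return ordered (action, value) steps for picking up a block."""
    x, y, z = block_xyz
    return [
        ("gripper", cfg.gripper_open),
        ("move", (x, y, z + cfg.pick_height)),  # hover above the block
        ("move", (x, y, z)),                    # descend to grasp height
        ("gripper", cfg.gripper_close),
        ("move", (x, y, z + cfg.pick_height)),  # lift clear
    ]


def place_sequence(target_xyz, cfg):
    """Return ordered (action, value) steps for releasing a held block."""
    x, y, z = target_xyz
    return [
        ("move", (x, y, z + cfg.place_height)),  # hover above the target
        ("move", (x, y, z)),                     # lower the block
        ("gripper", cfg.gripper_open),
        ("move", (x, y, z + cfg.place_height)),  # retreat upward
    ]
```

A controller would walk such a list in order, sending each step to the arm or gripper and sleeping for move_delay between steps. Keeping the sequence as plain data makes it easy to inspect before anything moves.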
- Model: Gemini 2.0 Flash Thinking (configurable in gemini-api.py)
- Skills: Currently supports "Pick" and "Place" (extensible)
- Output Format: JSON array of subtask objects
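Because the executor acts on whatever the model returns, it can help to check a plan against this format before running it. The following is a minimal sketch under that assumption; parse_task_plan and SUPPORTED_SKILLS are hypothetical names and do not come from gemini-api.py or task_executor.py.

```python
import json

# Skills the README documents as supported; extend this set alongside new skills.
SUPPORTED_SKILLS = {"Pick", "Place"}


def parse_task_plan(raw):
    """Parse a JSON task plan and verify it is an array of subtask objects."""
    plan = json.loads(raw)
    if not isinstance(plan, list):
        raise ValueError("task plan must be a JSON array")
    for i, step in enumerate(plan):
        if not isinstance(step, dict):
            raise ValueError(f"step {i} is not an object")
        missing = {"subtask", "skill"} - step.keys()
        if missing:
            raise ValueError(f"step {i} is missing {sorted(missing)}")
        if step["skill"] not in SUPPORTED_SKILLS:
            raise ValueError(f"step {i} uses unsupported skill {step['skill']!r}")
    return plan
```

Rejecting a malformed plan up front means the robot never starts a sequence it cannot finish.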
[
{
"subtask": "Pick the red block",
"skill": "Pick"
},
{
"subtask": "Place the red block on top of the blue block",
"skill": "Place"
}
]
- Real-time block positions and IDs
- Movement status and trails
- Color classification results
- Camera not detected: Check camera permissions and device connections
- Color detection issues: Adjust HSV ranges in the color_ranges dictionary
- ROS connection errors: Ensure ROS core is running and topics are published
- API key errors: Verify the Gemini API key in the .env file
- Reduce frame resolution for faster processing
- Adjust block size thresholds for your environment
- Use GPU acceleration for OpenCV operations
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
- Google Gemini AI for task planning capabilities
- OpenCV community for computer vision tools
- ROS community for robotics framework
- FRI (Friendly Robotics Initiative) for project inspiration