This document explains how to use the OpenArmX robot with the LeRobot framework for VLA data collection, ACT training, and inference.
## Success Tips

🚨 The 3 most important things in this document:

1) The startup order must be followed strictly, step by step.
2) W/H/FPS must be exactly the same across camera publishing, collection, and inference.
3) Before collection, update the dataset parameters in `config/vla_collect.env`.
## Contents

1. Device Roles
2. General Prerequisites
3. VLA Data Collection Workflow (IPC)
   - 3.1 Manual Startup Order (must follow order)
   - 3.2 GUI One-Click Startup (recommended)
   - 3.3 Quick Topic Check (optional)
   - 3.4 Recording UI Key Guide
   - 3.5 Data Collection Command Parameters
4. ACT Training Workflow (User High-Performance PC/Server)
   - 4.1 Notes Before Training
   - 4.2 Download ACT Dependency Models
   - 4.3 ACT Training Commands
5. VLA Inference Workflow (IPC + Inference Machine)
   - 5.1 Inference Prerequisites
   - 5.2 Startup Order
   - 5.3 Inference Command Parameters
6. ROS_DOMAIN_ID Configuration for Two-Machine Collaboration
   - 6.1 Check Current Configuration
   - 6.2 Set Both Machines to the Same Value (e.g., 77, if inconsistent)
   - 6.3 Verify Again
7. Camera Parameter Configuration Reference
   - 7.1 Which Parameters Need to Be Modified in Commands
   - 7.2 Available Resolution/FPS Combinations (D405 / D435)
   - 7.3 Three-Camera Bandwidth Limit and Recommended Settings
## 1. Device Roles

- IPC (provided by us): robot CAN control, Pico VR teleoperation, three-camera publishing, LeRobot data collection.
- User machine (self-configured): model training and inference (can work together with the IPC).
## 2. General Prerequisites

- The IPC workspace has been built, and `source ~/openarmx_ws/install/setup.bash` works properly.
- The robot can start normally and can be teleoperated through VR.
- Two-machine communication is required for the inference scenario in Section 5; configure the ROS_DOMAIN_ID according to Section 6.

⚠️ If the prerequisites in Section 2 are not satisfied, the following steps will likely fail.

## 3. VLA Data Collection Workflow (IPC)

### 3.1 Manual Startup Order (must follow order)

🚨 Do not skip steps or start modules in parallel out of order.
Start the real robot:

```bash
cd ~/openarmx_ws
source install/setup.bash
ros2 launch openarmx_bringup openarmx.bimanual.launch.py \
  control_mode:=mit \
  robot_controller:=forward_position_controller \
  use_fake_hardware:=false
```

Start the Pico VR bridge:

```bash
cd ~/openarmx_ws
source install/setup.bash
ros2 run openarmx_teleop_bridge_vr_pico openarmx_teleop_bridge_vr_pico_node
```

Start VR teleoperation:

```bash
cd ~/openarmx_ws
source install/setup.bash
ros2 launch openarmx_teleop_vr_pico teleop_vr_pico.launch.py
```

Start camera publishing. First replace the camera model and serial numbers in the command with your own device parameters:
- Supported camera models: `D435`, `D405`
- `cam_left_*` / `cam_right_*` / `cam_head_*` correspond to the left-hand, right-hand, and head camera respectively
- Query serial numbers:

```bash
rs-enumerate-devices | grep "Serial Number"
```
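If you prefer the bare serial strings without the surrounding label text, a small filter over the same command works. `extract_serials` is a hypothetical helper name, and the exact `Serial Number` line format is an assumption about your librealsense version:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce rs-enumerate-devices output (read on stdin)
# to bare serial numbers, skipping "Asic Serial Number" lines.
extract_serials() {
  awk -F':' '/^[[:space:]]*Serial Number/ {
    gsub(/^[[:space:]]+|[[:space:]]+$/, "", $2); print $2
  }'
}

# On a machine with cameras attached (assumed output format):
#   rs-enumerate-devices | extract_serials
```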
With the standard IPC + standard docking station, the stable upper limit for three cameras is 640x480 @ 30fps.
For camera parameter selection and available combinations, see Section 7 "Camera Parameter Configuration Reference".
💡 Recommended default: run through the full pipeline with `424x240 @ 30fps` first, then gradually increase the resolution.
```bash
cd ~/openarmx_ws
source install/setup.bash
W=424; H=240; FPS=30
ros2 launch openarmx_lerobot camera_publisher.launch.py \
  width:=$W height:=$H fps:=$FPS \
  cam_left_serial:=218622270388 cam_left_type:=D405 \
  cam_right_serial:=218622274446 cam_right_type:=D405 \
  cam_head_serial:=335522070220 cam_head_type:=D435
```

You need to modify these parameters based on your actual devices:

- `W/H/FPS`: unified resolution and frame rate for all three cameras (example: `424x240@30`).
- `cam_left_serial` / `cam_right_serial` / `cam_head_serial`: replace with your three camera serial numbers.
- `cam_left_type` / `cam_right_type` / `cam_head_type`: set to the actual camera model, `D405` or `D435`.
If you need to tune exposure parameters for all three cameras at startup, you can use the following example:
```bash
cd ~/openarmx_ws
source install/setup.bash
W=424; H=240; FPS=30
ros2 launch openarmx_lerobot camera_publisher.launch.py \
  width:=$W height:=$H fps:=$FPS \
  cam_left_serial:=218622270388 cam_left_type:=D405 \
  cam_right_serial:=218622274446 cam_right_type:=D405 \
  cam_head_serial:=335522070220 cam_head_type:=D435 \
  cam_left_color_auto_exposure:=true \
  cam_left_color_exposure:=10000 \
  cam_left_color_gain:=32 \
  cam_right_color_auto_exposure:=true \
  cam_right_color_exposure:=10000 \
  cam_right_color_gain:=32 \
  cam_head_color_auto_exposure:=true \
  cam_head_color_exposure:=10000 \
  cam_head_color_gain:=16
```

Common adjustable color parameters:

- `cam_*_color_auto_exposure`: color auto exposure, values `true`/`false`/`unset`
- `cam_*_color_exposure`: color manual exposure, range `1..10000`
- `cam_*_color_gain`: color manual gain, range `0..128`
- `cam_*_color_auto_white_balance`: color auto white balance, values `true`/`false`/`unset`
- `cam_*_color_white_balance`: color manual white balance, range `2800..6500`
- `cam_*_color_brightness`: brightness, range `-64..64`
- `cam_*_color_contrast`: contrast, range `0..100`
- `cam_*_color_saturation`: saturation, range `0..100`
- `cam_*_color_sharpness`: sharpness, range `0..100`
Notes:

- `cam_left_*` / `cam_right_*` / `cam_head_*` apply to the left-hand, right-hand, and head camera respectively
- `unset` means do not force-set this parameter; keep the default driver behavior
- If only `cam_*_color_exposure` or `cam_*_color_gain` is specified, the launch file automatically adds `cam_*_color_auto_exposure:=false`
- If only `cam_*_color_white_balance` is specified, the launch file automatically adds `cam_*_color_auto_white_balance:=false`
Enter the LeRobot environment first, then run the recording command:
- `W/H/FPS` configures the camera resolution and frame rate during data collection (for example: `W=640; H=480; FPS=30`).
- Here, `W/H/FPS` must be exactly the same as `width/height/fps` in `camera_publisher.launch.py`.
- After changing W/H/FPS in the camera publisher, update W/H/FPS in the collection command accordingly; otherwise the format mismatch will cause errors.

🚨 Key constraint: collection `W/H/FPS` = camera publish `width/height/fps`.
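One simple way to keep this constraint satisfied is to define W/H/FPS once in a small env file and source it in every terminal instead of retyping the numbers. This is only a sketch: the file name `vla_wh_fps.env` is an example, not something the repository ships (the shipped equivalent is `config/vla_collect.env`):

```shell
# Write the shared values once (example file name; pick any path you like).
cat > "$HOME/vla_wh_fps.env" <<'EOF'
W=424
H=240
FPS=30
EOF

# In each terminal, before camera_publisher.launch.py or lerobot-record:
source "$HOME/vla_wh_fps.env"
echo "Using ${W}x${H} @ ${FPS}fps"
```

Changing the resolution then means editing one file rather than hunting through every command.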
```bash
lerobot-env
W=424; H=240; FPS=30
HF_HUB_OFFLINE=1 lerobot-record \
  --robot.type=openarmx_follower_ros2 \
  --robot.cameras="{cam_left: {type: ros2, image_topic: /cam_left/color/image, depth_topic: /cam_left/depth/image, use_depth: true, width: $W, height: $H, fps: $FPS}, cam_right: {type: ros2, image_topic: /cam_right/color/image, depth_topic: /cam_right/depth/image, use_depth: true, width: $W, height: $H, fps: $FPS}, cam_head: {type: ros2, image_topic: /cam_head/color/image, depth_topic: /cam_head/depth/image, use_depth: true, width: $W, height: $H, fps: $FPS}}" \
  --teleop.type=openarmx_leader_ros2 \
  --dataset.repo_id=local/your_dataset_name \
  --dataset.single_task="task_name_you_perform" \
  --dataset.num_episodes=total_number_of_episodes \
  --dataset.episode_time_s=duration_per_episode_seconds \
  --dataset.reset_time_s=interval_after_each_episode \
  --dataset.push_to_hub=false \
  --display_data=true
```

Example:
```bash
lerobot-env
W=424; H=240; FPS=30
HF_HUB_OFFLINE=1 lerobot-record \
  --robot.type=openarmx_follower_ros2 \
  --robot.cameras="{cam_left: {type: ros2, image_topic: /cam_left/color/image, depth_topic: /cam_left/depth/image, use_depth: true, width: $W, height: $H, fps: $FPS}, cam_right: {type: ros2, image_topic: /cam_right/color/image, depth_topic: /cam_right/depth/image, use_depth: true, width: $W, height: $H, fps: $FPS}, cam_head: {type: ros2, image_topic: /cam_head/color/image, depth_topic: /cam_head/depth/image, use_depth: true, width: $W, height: $H, fps: $FPS}}" \
  --teleop.type=openarmx_leader_ros2 \
  --dataset.repo_id=local/take_box \
  --dataset.single_task="take box" \
  --dataset.num_episodes=70 \
  --dataset.episode_time_s=180 \
  --dataset.reset_time_s=5 \
  --dataset.push_to_hub=false \
  --display_data=true
```

### 3.2 GUI One-Click Startup (recommended)

If you want to automatically launch multiple terminal windows in a fixed order, as in Section 3.1, you can use the one-click startup script in this repository:

- `scripts/vla_collect_gui.sh`: GUI multi-terminal one-click startup script
- ⚠️ `config/vla_collect.env`: one-click startup config file, centrally storing robot/camera/data-collection parameters. Make parameter changes here first
- `scripts/README_GUI_EN.md`: standalone guide for one-click startup
It is recommended to enter the repository directory first:

```bash
cd /home/openarmx/openarmx_ws/src/openarmx_vla
```

Run a pre-check before startup:

```bash
bash scripts/vla_collect_gui.sh check
```

The pre-check validates:

- Whether `WORKSPACE_DIR/install/setup.bash` exists
- Whether the current session is a graphical desktop (`DISPLAY`/`WAYLAND_DISPLAY`)
- Whether the terminal command specified by `GUI_TERMINAL` is available (default `gnome-terminal`)
- Whether `ros2` is available
- In `collect` mode, whether `LEROBOT_ENV_CMD` (default `lerobot-env`) is found in an interactive shell
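Roughly, these checks boil down to a few shell tests. The sketch below mirrors that logic with a hypothetical `precheck` function; the real script's internals may differ:

```shell
# Sketch of the pre-check logic; precheck is a hypothetical name, and
# vla_collect_gui.sh may implement these tests differently.
precheck() {  # usage: precheck WORKSPACE_DIR TERMINAL_CMD
  local status=0
  [ -f "$1/install/setup.bash" ]            || { echo "workspace not built: $1"; status=1; }
  [ -n "${DISPLAY:-}${WAYLAND_DISPLAY:-}" ] || { echo "no graphical session"; status=1; }
  command -v "$2" >/dev/null                || { echo "terminal cmd missing: $2"; status=1; }
  command -v ros2 >/dev/null                || { echo "ros2 not on PATH"; status=1; }
  return $status
}

# Example: precheck "$HOME/openarmx_ws" gnome-terminal
```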
Common startup modes:
```bash
# 1. Start only real robot + Pico Bridge + VR Teleop
bash scripts/vla_collect_gui.sh base

# 2. Start robot base stack + camera publishing
bash scripts/vla_collect_gui.sh base_camera

# 3. Start robot base stack + camera publishing + LeRobot data collection
# ⚠️ Note: before each one-click collection run, update DATASET_REPO_ID in config/vla_collect.env
bash scripts/vla_collect_gui.sh collect

# Close all terminals started by this script
bash scripts/vla_collect_gui.sh stop
```

⚠️ Always check `DATASET_REPO_ID` before `collect` to avoid writing to the wrong dataset directory.
Mode mapping:
- `base`: equivalent to manually running "real robot + Pico Bridge + VR teleop"
- `base_camera`: starts three-camera publishing on top of `base`
- `collect`: starts LeRobot data collection on top of `base_camera`
- `stop`: precisely closes the windows launched by this script via a state file; no error even if some windows were closed manually
Script startup behavior:
- Pops up multiple terminal windows in sequence and starts each module with configured delays
- Each window remains open after command execution for on-site troubleshooting
- If windows from the previous run are still active, it prompts you to run `bash scripts/vla_collect_gui.sh stop` first
- In `collect` mode, it runs `LEROBOT_ENV_CMD` first in the recording terminal, then runs `lerobot-record`
- The default config path is `config/vla_collect.env`; you can also temporarily switch with `VLA_CONFIG_FILE=/your_path.env bash scripts/vla_collect_gui.sh collect`
Usually, you only need to modify these items in `config/vla_collect.env`:

- Base path and GUI parameters: workspace path, terminal command, window state file path
- Robot base parameters: `CONTROL_MODE`, `ROBOT_CONTROLLER`, `USE_FAKE_HARDWARE`
- VR teleoperation parameters: Pico/Teleop control rate, grasp threshold, topic names, etc.
- Camera parameters: `W/H/FPS`, the three camera serial numbers, camera models, exposure/gain/white balance, etc.
- Data collection parameters: dataset name, task description, episode count, episode duration, reset time, whether to display data, etc.

Special attention:

- `W/H/FPS` is used by both camera publishing and `lerobot-record`; keep it consistent with the actual camera output
- `CAM_LEFT_TYPE` / `CAM_RIGHT_TYPE` / `CAM_HEAD_TYPE` must be set to the actual devices, `D405` or `D435`
- `CAM_LEFT_SERIAL` / `CAM_RIGHT_SERIAL` / `CAM_HEAD_SERIAL` must be replaced with your camera serial numbers
- If using `collect` mode, make sure dataset parameters such as `DATASET_REPO_ID` and `DATASET_SINGLE_TASK` are updated for your task

🚨 The 4 items above are frequent failure points; check them one by one before each collection run.
For a more complete one-click startup guide, refer to `scripts/README_GUI_EN.md`.
### 3.3 Quick Topic Check (optional)

```bash
ros2 topic list | grep cam
ros2 topic list | grep joint_states
ros2 topic list | grep forward_position_controller/commands
```

Expected at minimum:

- Camera topics: `/cam_left/color/image`, `/cam_right/color/image`, `/cam_head/color/image`
- Joint state: `/joint_states`
- Teleop outputs: `/left_forward_position_controller/commands`, `/right_forward_position_controller/commands`

If the above conditions are met, data collection can proceed.
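The check above can also be scripted. `check_topics` below is a hypothetical helper (not part of the repository) that reads `ros2 topic list` output on stdin and reports anything missing:

```shell
# Hypothetical helper: verify the expected topics appear in the output of
# `ros2 topic list` (read on stdin). Exits 0 only when all are present.
check_topics() {
  local topics missing=0 t
  topics="$(cat)"
  for t in /cam_left/color/image /cam_right/color/image /cam_head/color/image \
           /joint_states \
           /left_forward_position_controller/commands \
           /right_forward_position_controller/commands; do
    if ! grep -qx "$t" <<<"$topics"; then
      echo "MISSING: $t"
      missing=1
    fi
  done
  return $missing
}

# On the IPC:
#   ros2 topic list | check_topics && echo "ready to record"
```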
### 3.4 Recording UI Key Guide

- `→` (Right Arrow): end and save the current episode, then enter the reset stage.
- `←` (Left Arrow): discard the current episode and re-record.
- `Esc`: stop recording and exit, then save the dataset.
### 3.5 Data Collection Command Parameters

Common parameters:

- `HF_HUB_OFFLINE=1`: enable Hugging Face Hub offline mode.
- `--robot.type=openarmx_follower_ros2`: specify the target robot type being controlled.
- `--teleop.type=openarmx_leader_ros2`: specify the teleoperation device type.
- `--dataset.repo_id=local/xxx`: dataset storage identifier (path under `~/.cache/huggingface/lerobot/local/`).
- `--dataset.single_task`: task description.
- `--dataset.num_episodes`: total number of episodes.
- `--dataset.episode_time_s`: maximum duration per episode (seconds).
- `--dataset.reset_time_s`: reset wait time between episodes (seconds).
- `--dataset.push_to_hub`: whether to upload to the Hugging Face Hub.
- `--display_data`: whether to display real-time data.

Other parameters:

- `--dataset.root`: custom dataset save path.
- `--dataset.fps`: limit the recording frame rate.
- `--dataset.video`: whether to encode images as video.
- `--dataset.vcodec`: video codec (default `libsvtav1`).
- `--dataset.video_encoding_batch_size`: number of episodes per batch video encoding.
- `--dataset.private`: set the dataset private when uploading to the Hub.
- `--dataset.tags`: Hub dataset tags.
- `--dataset.num_image_writer_processes`: number of image writer processes.
- `--dataset.num_image_writer_threads_per_camera`: number of writer threads per camera.
- `--dataset.rename_map`: rename observation keys.
## 4. ACT Training Workflow (User High-Performance PC/Server)

### 4.1 Notes Before Training

ACT training is recommended on a user-provided high-performance PC or server (a dedicated GPU is recommended). Before training, complete the LeRobot environment installation on that machine and download the required model files. This document uses ACT as an example; for more environment setup and model training tutorials, see the official docs: http://docs.openarmx.com/.

### 4.2 Download ACT Dependency Models

Run in a LeRobot environment terminal:
```bash
mkdir -p ~/.cache/torch/hub/checkpoints

# Enter the LeRobot environment and install dependencies
lerobot-env
wget https://mirrors.tuna.tsinghua.edu.cn/pytorch/models/resnet18-f37072fd.pth \
  -O ~/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
```

### 4.3 ACT Training Commands

Single-GPU training:

```bash
lerobot-env
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
lerobot-train \
  --dataset.repo_id=local/your_dataset_name \
  --dataset.root=absolute_path_to_your_dataset \
  --policy.type=act \
  --policy.push_to_hub=false \
  --output_dir=outputs/your_trained_model_name \
  --batch_size=batch_size_per_training_step \
  --steps=total_training_steps \
  --log_freq=log_every_n_steps \
  --save_freq=save_every_n_steps
```

Multi-GPU training with `torchrun`:

```bash
lerobot-env
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
torchrun --nproc_per_node=number_of_your_gpus \
  "$(which lerobot-train)" \
  --dataset.repo_id=local/your_dataset_name \
  --dataset.root=absolute_path_to_your_dataset \
  --policy.type=act \
  --policy.push_to_hub=false \
  --output_dir=outputs/your_trained_model_name \
  --batch_size=batch_size_per_training_step \
  --steps=total_training_steps \
  --log_freq=log_every_n_steps \
  --save_freq=save_every_n_steps
```

After training, record the exported `pretrained_model` path under `output_dir`; it is used for inference in Section 5.
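If you are unsure where the exported model landed, a `find` over `outputs/` usually locates it. The `checkpoints/<step>/pretrained_model` layout shown in the comment is an assumption based on the example path in Section 5; your lerobot-train version may lay out `outputs/` differently:

```shell
# List pretrained_model directories under outputs/; with zero-padded step
# numbers (e.g. 045000), lexical sort puts the newest checkpoint last.
find outputs/ -type d -name pretrained_model 2>/dev/null | sort | tail -n 1
```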
## 5. VLA Inference Workflow (IPC + Inference Machine)

Using ACT as an example, this section explains how to load a trained model for online inference.

In this workflow, two-machine communication between the IPC and the user machine is required; complete the ROS_DOMAIN_ID configuration in Section 6 before inference.

🚨 If two-machine communication is not configured before inference, the linkage will almost certainly fail.
### 5.1 Inference Prerequisites

- Training is completed and the model path is available (usually the `pretrained_model` directory).
- The IPC can start the robot and cameras normally.
- Inference is usually run on another user machine, so two-machine communication is required; complete the `ROS_DOMAIN_ID` configuration in Section 6 first.
### 5.2 Startup Order

Start the real robot on the IPC:

```bash
cd ~/openarmx_ws
source install/setup.bash
ros2 launch openarmx_bringup openarmx.bimanual.launch.py \
  control_mode:=mit \
  robot_controller:=forward_position_controller \
  use_fake_hardware:=false
```

Start camera publishing. Modify W/H/FPS and the three camera serials/types according to Section 7 "Camera Parameter Configuration Reference":
```bash
cd ~/openarmx_ws
source install/setup.bash
W=424; H=240; FPS=30
ros2 launch openarmx_lerobot camera_publisher.launch.py \
  width:=$W height:=$H fps:=$FPS \
  cam_left_serial:=218622270388 cam_left_type:=D405 \
  cam_right_serial:=218622274446 cam_right_type:=D405 \
  cam_head_serial:=335522070220 cam_head_type:=D435
```

If you need fixed exposure for all three cameras before inference, you can append the same style of three-camera exposure settings as in Section 3:

- Left hand: `cam_left_color_auto_exposure:=false cam_left_color_exposure:=400 cam_left_color_gain:=32`
- Right hand: `cam_right_color_auto_exposure:=false cam_right_color_exposure:=400 cam_right_color_gain:=32`
- Head: `cam_head_color_auto_exposure:=false cam_head_color_exposure:=300 cam_head_color_gain:=16`
- `W/H/FPS` in the inference command must be exactly the same as `width/height/fps` of the current camera publishing node.
- During inference, the camera format (resolution/frame rate) should match the data collection format (recommended: the same format as the training data for this model).

🚨 Key constraint: inference `W/H/FPS` = collection `W/H/FPS` = camera publish `width/height/fps`.
Run inference in the LeRobot environment:

```bash
lerobot-env
W=424; H=240; FPS=30
HF_HUB_OFFLINE=1 lerobot-record \
  --robot.type=openarmx_follower_ros2 \
  --robot.cameras="{cam_left: {type: ros2, image_topic: /cam_left/color/image, depth_topic: /cam_left/depth/image, use_depth: true, width: $W, height: $H, fps: $FPS}, cam_right: {type: ros2, image_topic: /cam_right/color/image, depth_topic: /cam_right/depth/image, use_depth: true, width: $W, height: $H, fps: $FPS}, cam_head: {type: ros2, image_topic: /cam_head/color/image, depth_topic: /cam_head/depth/image, use_depth: true, width: $W, height: $H, fps: $FPS}}" \
  --robot.skip_send_action=false \
  --dataset.repo_id="local/inference_result_model_name" \
  --dataset.single_task="your_task_name" \
  --dataset.num_episodes=number_of_inference_runs \
  --dataset.push_to_hub=false \
  --display_data=true \
  --policy.path="path_to_your_trained_model"
```

Example:
```bash
lerobot-env
W=424; H=240; FPS=30
HF_HUB_OFFLINE=1 lerobot-record \
  --robot.type=openarmx_follower_ros2 \
  --robot.cameras="{cam_left: {type: ros2, image_topic: /cam_left/color/image, depth_topic: /cam_left/depth/image, use_depth: true, width: $W, height: $H, fps: $FPS}, cam_right: {type: ros2, image_topic: /cam_right/color/image, depth_topic: /cam_right/depth/image, use_depth: true, width: $W, height: $H, fps: $FPS}, cam_head: {type: ros2, image_topic: /cam_head/color/image, depth_topic: /cam_head/depth/image, use_depth: true, width: $W, height: $H, fps: $FPS}}" \
  --robot.skip_send_action=false \
  --dataset.repo_id=local/eval_take_box \
  --dataset.single_task="take the box" \
  --dataset.num_episodes=10 \
  --dataset.push_to_hub=false \
  --display_data=true \
  --policy.path="/home/i4090/openarmx_vla/src/VLA/OUTPUTS/045000/pretrained_model"
```

### 5.3 Inference Command Parameters

- `HF_HUB_OFFLINE=1`: offline mode; do not fetch online resources.
- `--robot.type=openarmx_follower_ros2`: target robot type for inference action publishing.
- `--robot.skip_send_action=false`: `false` means send real actions; `true` means validate the pipeline only, without moving the robot.
- `--dataset.repo_id="local/inference_result_model_name"`: identifier for inference result storage.
- `--dataset.single_task="your_task_name"`: inference task name (metadata).
- `--dataset.num_episodes=number_of_inference_runs`: number of inference episodes.
- `--dataset.push_to_hub=false`: do not upload to the Hugging Face Hub.
- `--display_data=true`: display inference process data.
- `--policy.path="path_to_your_trained_model"`: local model path.
## 6. ROS_DOMAIN_ID Configuration for Two-Machine Collaboration

When the IPC and the user machine need cross-machine communication, `ROS_DOMAIN_ID` must be identical on both machines (recommended: the same value, e.g. 77).

🚨 If `ROS_DOMAIN_ID` is inconsistent between the two machines, cross-machine topic discovery will fail.

### 6.1 Check Current Configuration

Run on both machines separately:

```bash
echo $ROS_DOMAIN_ID
```

### 6.2 Set Both Machines to the Same Value (e.g., 77, if inconsistent)

Run on both machines (set DOMAIN_ID to a shared value first, here 77):

```bash
DOMAIN_ID=77
grep -q '^export ROS_DOMAIN_ID=' ~/.bashrc \
  && sed -i "s/^export ROS_DOMAIN_ID=.*/export ROS_DOMAIN_ID=${DOMAIN_ID}/" ~/.bashrc \
  || echo "export ROS_DOMAIN_ID=${DOMAIN_ID}" >> ~/.bashrc
source ~/.bashrc
```

### 6.3 Verify Again

```bash
echo $ROS_DOMAIN_ID
```

Both machines should print the same value.
## 7. Camera Parameter Configuration Reference

### 7.1 Which Parameters Need to Be Modified in Commands

When using `camera_publisher.launch.py`, usually you only need to modify these parameters:

- `W/H/FPS`: resolution and frame rate.
- `cam_left_serial` / `cam_right_serial` / `cam_head_serial`: serial numbers of the three cameras.
- `cam_left_type` / `cam_right_type` / `cam_head_type`: camera model (`D405` or `D435`) for each camera.

In the parameter names below, `*` is not a literal character but a placeholder and must be replaced with a specific camera prefix:

- Use `cam_left` for the left-hand camera
- Use `cam_right` for the right-hand camera
- Use `cam_head` for the head camera

For example, `cam_*_color_exposure` should actually be `cam_left_color_exposure`, `cam_right_color_exposure`, or `cam_head_color_exposure`.

Adjustable color parameters:

- `cam_*_color_auto_exposure`: color auto exposure, values `true`/`false`/`unset`.
- `cam_*_color_exposure`: color manual exposure, range `1..10000`.
- `cam_*_color_gain`: color manual gain, range `0..128`.
- `cam_*_color_auto_white_balance`: color auto white balance, values `true`/`false`/`unset`.
- `cam_*_color_white_balance`: color manual white balance, range `2800..6500`.
- `cam_*_color_brightness`: brightness, range `-64..64`.
- `cam_*_color_contrast`: contrast, range `0..100`.
- `cam_*_color_saturation`: saturation, range `0..100`.
- `cam_*_color_sharpness`: sharpness, range `0..100`.
The same W/H/FPS must also be set in the `lerobot-record` commands (collection and inference) and must stay consistent:

- `W/H/FPS` in `lerobot-record` = `width/height/fps` in `camera_publisher.launch.py`.
- Inference `W/H/FPS` = data collection `W/H/FPS` (recommended to match the training data format used by the model).

🚨 Consistency here is the core stability constraint of the entire pipeline.

Serial number query command:

```bash
rs-enumerate-devices | grep "Serial Number"
```

### 7.2 Available Resolution/FPS Combinations (D405 / D435)

`camera_publisher.launch.py` has built-in validation; only the following combinations are valid.

D405:
| Resolution | Supported FPS |
|---|---|
| 1280 x 720 | 5, 15, 30 |
| 848 x 480 | 5, 15, 30, 60, 90 |
| 640 x 480 | 5, 15, 30, 60, 90 |
| 640 x 360 | 5, 15, 30, 60, 90 |
| 480 x 270 | 5, 15, 30, 60, 90 |
| 424 x 240 | 5, 15, 30, 60, 90 |
D435:

| Resolution | Supported FPS |
|---|---|
| 1920 x 1080 | 6, 15, 30 |
| 1280 x 720 | 6, 15, 30 |
| 848 x 480 | 6, 15, 30, 60, 90 |
| 640 x 480 | 6, 15, 30, 60, 90 |
| 640 x 360 | 6, 15, 30, 60, 90 |
| 480 x 270 | 6, 15, 30, 60, 90 |
| 424 x 240 | 6, 15, 30, 60, 90 |
### 7.3 Three-Camera Bandwidth Limit and Recommended Settings

- With a standard IPC + standard docking station, the stable upper limit for three cameras is `640x480 @ 30fps`.
- Default recommended setting: `424x240 @ 30fps` (lower bandwidth usage, more stable).
- If you need higher image quality, prioritize lowering the frame rate or reducing the number of concurrent cameras.