Phuc Duong, Kunwoo Min, Jeffrey Wei
CPSC 5800: Introduction to Computer Vision
Yale University, Department of Computer Science
Vision-Language-Action (VLA) policies often require task- and domain-specific fine-tuning to achieve strong manipulation performance, yet many pipelines lack a simple mechanism for continual improvement from deployment-time interaction. We propose Filtered Behavior Cloning (FBC), a lightweight self-training recipe that executes a pretrained policy, filters its rollouts to retain only successful episodes, and fine-tunes on these self-generated demonstrations using parameter-efficient LoRA updates. Using the Pi0.5-LIBERO checkpoint and evaluating on LIBERO-90, FBC yields measurable gains in overall success rate and improves performance on a majority of non-trivial tasks under a constrained rollout budget. Our results suggest that success-filtered self-training is a practical and scalable primitive for refining large VLA policies, motivating future work that scales up self-refinement and adds safeguards against over-specialization under repeated self-training.
Our dataset and fine-tuned model checkpoint are available here. The full report and results are available here.
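At a high level, FBC is a simple loop: roll out the pretrained policy, keep only the successful episodes, and fine-tune on them with LoRA. The sketch below is purely illustrative; every function name is a hypothetical placeholder (this is not the openpi API), and the concrete pipeline is the sequence of scripts described in the sections that follow.

```python
# Illustrative sketch of Filtered Behavior Cloning (FBC). All helpers
# are hypothetical placeholders; the real pipeline is run via the
# evaluation, postprocessing, and training scripts documented below.

def run_rollouts(policy, tasks, trials_per_task):
    """Execute the policy and return a list of (episode, success) pairs."""
    return [policy.rollout(task)  # assumed to return (episode, success)
            for task in tasks
            for _ in range(trials_per_task)]

def filtered_behavior_cloning(policy, tasks, trials_per_task=10):
    rollouts = run_rollouts(policy, tasks, trials_per_task)
    # Keep only the successful episodes (the "filtered" part of FBC).
    demos = [episode for episode, success in rollouts if success]
    # Fine-tune the policy on its own successful rollouts using
    # parameter-efficient LoRA updates.
    policy.finetune_lora(demos)
    return policy
```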
Dependencies
To install the relevant requirements, first install uv to manage Python dependencies (see the uv installation instructions). We used Python 3.10.12.

Once uv is installed, set up and activate a virtual environment:

```bash
uv venv
source .venv/bin/activate
```

Clone the submodules:

```bash
git submodule update --init --recursive
```

Then, install the requirements:

```bash
uv pip install -r requirements.txt
```

Install LIBERO Package
The editable install doesn't work properly with uv, so use a `.pth` file workaround similar to the one below:

```bash
# Create a path file that adds LIBERO to the Python path
# (adjust python3.10 to match your virtual environment's Python version)
echo "/path/to/fbc/third_party/libero" > .venv/lib/python3.10/site-packages/libero_path.pth
```

Or, using our absolute path:

```bash
echo "/lambda/nfs/home-phd/fbc/third_party/libero" > .venv/lib/python3.10/site-packages/libero_path.pth
```
Training Environment

We ran our code on an Ubuntu 22.04 Linux system and trained our model on a single NVIDIA H100 GPU.
Repository Changes

This repository is forked from the openpi repository. We made the following additions:
- `examples/libero/main.py` - Added the ability to save rollout metadata. Previously, only the outcome video was saved; we now also save the `LeRobotDataset` metadata so the data can be reconstructed into a new dataset, which we later filter by success for fine-tuning. We also added a log file that records the results of each task: how many trials were run, whether each trial succeeded, and the metadata for each episode (see the illustrative record after this list).
- `examples/libero/postprocess.py` - Added a postprocessing script that keeps only successful trials and saves them as a new `LeRobotDataset`.
- `src/openpi/training/config.py` - Added `pi05_libero_success_lora` for LoRA fine-tuning on our success-only `LeRobotDataset`.
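For illustration, each trial in the rollouts log can be thought of as a small record like the one below. All field names here are hypothetical placeholders; the actual schema is defined in `examples/libero/main.py`.

```python
# Hypothetical per-trial log record (illustrative only; see
# examples/libero/main.py for the actual schema).
trial_record = {
    "task_suite": "libero_90",   # task suite being evaluated
    "task_id": 37,               # index of the task within the suite
    "trial": 3,                  # trial number for this task
    "success": True,             # used later to filter the dataset
    "episode_index": 412,        # index into the saved LeRobotDataset
}
```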
Baseline Evaluation

To run the baseline evaluation on the pre-trained `pi05_libero` policy, follow the instructions below.
- Start the server to serve the pre-trained LIBERO pi05 policy:

```bash
python scripts/serve_policy.py --env LIBERO
```

- Run the evaluation on a specific task suite (e.g., `libero_spatial`, `libero_10`, `libero_90`). For our project, we used `libero_90` and ran 10 trials per task. For example, to run 2 trials per task on `libero_10`:

```bash
python examples/libero/main.py \
  --args.task-suite-name libero_10 \
  --args.num-trials-per-task 2
```

A full list of configurable arguments can be found in `examples/libero/main.py`.
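You can also list the available options from the command line; openpi's example scripts build their CLIs with a standard argument parser, so `--help` should print them (assuming the script supports it):

```bash
python examples/libero/main.py --help
```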
We postprocessed the data from our baseline run by filtering the original dataset down to success-only trials.
Running success-only filtering
```bash
python post_process.py \
  --args.rollouts_log rollouts_log \
  --args.input_dataset input_dataset \
  --args.output_dataset output_dataset
```
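Conceptually, the filtering step reads the rollouts log, keeps the episode indices of successful trials, and writes only those episodes into the output dataset. The sketch below illustrates the idea; the field names (`success`, `episode_index`) are hypothetical placeholders, and the real schema and dataset-writing logic live in the postprocessing script.

```python
import json

def successful_episode_indices(rollouts_log_path: str) -> list[int]:
    """Collect the episode indices of successful trials.

    Assumes one JSON record per line with hypothetical "success" and
    "episode_index" fields (illustrative only; see the postprocessing
    script for the actual schema).
    """
    keep = []
    with open(rollouts_log_path) as f:
        for line in f:
            record = json.loads(line)
            if record["success"]:
                keep.append(record["episode_index"])
    return keep

# The retained episodes are then written out as a new LeRobotDataset,
# which becomes the fine-tuning data for pi05_libero_success_lora.
```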
Fine-Tuning

For custom fine-tuning, add a new training config in `src/openpi/training/config.py` (see the existing examples in that file). For this project, our configuration uses LoRA under the name `pi05_libero_success_lora`. After adding a configuration, follow the instructions below to fine-tune on a dataset.
```bash
HF_LEROBOT_HOME=<path_to_dataset> \
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 \
uv run scripts/train.py pi05_libero_success_lora \
  --exp-name=my_libero_finetune \
  --overwrite
```

Notes:
- `HF_LEROBOT_HOME` must point to a Hugging Face path containing the fine-tuning dataset, or directly to a local path that contains the dataset. For example, a local path could be `/home/ubuntu/home-phd/soar/examples/libero/data/dataset`.
- `XLA_PYTHON_CLIENT_MEM_FRACTION` specifies the fraction of GPU memory the program is allowed to use.
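After training, the fine-tuned checkpoint can be served and re-evaluated with the same evaluation scripts used for the baseline. The command below follows the checkpoint-serving pattern from the upstream openpi repository; verify the exact flags against `scripts/serve_policy.py`, and note that the checkpoint directory and step number (here `29999`) depend on your training run:

```bash
uv run scripts/serve_policy.py policy:checkpoint \
  --policy.config=pi05_libero_success_lora \
  --policy.dir=checkpoints/pi05_libero_success_lora/my_libero_finetune/29999
```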
Our fine-tuned model checkpoint, the full set of rollouts, the filtered successful rollouts, and the metadata JSON files can be found here.