Kushal Kedia*, Prithwish Dan*, Angela Chao, Maximus A. Pace, Sanjiban Choudhury (*Equal Contribution), Cornell University
All datasets are loaded in the code via the HuggingFace API.
Datasets can be found at: https://huggingface.co/datasets/prithwishdan/RHyME
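The training scripts fetch the data automatically, but if you want to pre-download or inspect the datasets, here is a minimal sketch using the huggingface-cli tool (the local directory below is just a placeholder):

```bash
# Optional: pre-download the RHyME datasets from the HuggingFace Hub.
# The training scripts also fetch them automatically via the HuggingFace API.
huggingface-cli download prithwishdan/RHyME --repo-type dataset --local-dir ./data/RHyME
```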
Follow these steps to install RHyME:
- Create and activate the conda environment, then install the package:
cd rhyme
conda env create -f environment.yml
conda activate rhyme
pip install -e .
- Before running any scripts, set "base_dev_dir" to your working directory for the codebase. You can either write this value directly into the config files under ./config/simulation, or override it on the command line when running a script (see the sketch below).
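For the command-line route, a minimal sketch, assuming the configs accept Hydra-style key=value overrides (the path below is a placeholder for your own checkout):

```bash
# Point base_dev_dir at your local checkout (Hydra-style override; path is a placeholder)
python scripts/train_vision_encoder.py base_dev_dir=/path/to/rhyme
```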
RHyME consists of three steps:
Run the following script to pretrain the visual encoder. By default, we use the Sphere-Easy dataset.
python scripts/train_vision_encoder.py
For different demonstrator types, use these configs:
python scripts/train_vision_encoder.py --config-name=easy_pretrain_hf
python scripts/train_vision_encoder.py --config-name=medium_pretrain_hf
python scripts/train_vision_encoder.py --config-name=hard_pretrain_hf
Compute sequence-level distance metrics and generate pairings from the unpaired datasets using the pre-trained visual encoder:
bash scripts/automatic_pairing.sh --pretrain_model_name <name> --checkpoint <num> --cross_embodiment <type>
Required parameters:
- --pretrain_model_name: Folder name of the vision encoder in ./experiment/pretrain
- --checkpoint: Checkpoint number
- --cross_embodiment: Dataset type (sphere-easy, sphere-medium, sphere-hard)
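For example, a pairing run might look like the following; the encoder folder name and checkpoint number are placeholders for a hypothetical pretraining run, so substitute your own:

```bash
# Hypothetical example: generate pairings for the Sphere-Easy demonstrator
# (folder name and checkpoint number are placeholders from your own pretraining run)
bash scripts/automatic_pairing.sh \
    --pretrain_model_name easy_pretrain_hf \
    --checkpoint 100 \
    --cross_embodiment sphere-easy
```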
Train the conditional diffusion policy to translate imagined demonstrator videos into robot actions:
python scripts/train_diffusion_policy.py \
    --pretrain_model_name <name> \
    --pretrain_ckpt <num> \
    --eval_cfg.demo_type <type>
Required parameters:
- pretrain_model_name: Folder name of the vision encoder in ./experiment/pretrain
- pretrain_ckpt: Checkpoint number
- eval_cfg.demo_type: Dataset type (sphere-easy, sphere-medium, sphere-hard)
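As with the pairing step, a concrete invocation might look like this; the values shown are placeholders, not required names:

```bash
# Hypothetical example: train the policy against the Sphere-Easy demonstrator
# (encoder folder name and checkpoint number are placeholders from your own runs)
python scripts/train_diffusion_policy.py \
    --pretrain_model_name easy_pretrain_hf \
    --pretrain_ckpt 100 \
    --eval_cfg.demo_type sphere-easy
```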
Evaluate policy on unseen demonstrator videos:
python scripts/eval_checkpoint.py
Key parameters:
- pretrain_model_name: Folder name of the vision encoder in ./experiment/pretrain
- pretrain_ckpt: Checkpoint number
- eval_cfg.demo_type: Specifies which demonstrator to evaluate on
- policy_name: Folder name of the diffusion policy in ./experiment/diffusion_bc/kitchen
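A minimal sketch of an evaluation run, assuming the parameters above are passed as Hydra-style key=value overrides (every value below is a placeholder):

```bash
# Hypothetical example: evaluate a trained policy on unseen Sphere-Easy videos
# (assumes Hydra-style overrides; all values are placeholders)
python scripts/eval_checkpoint.py \
    pretrain_model_name=easy_pretrain_hf \
    pretrain_ckpt=100 \
    eval_cfg.demo_type=sphere-easy \
    policy_name=sphere_easy_policy
```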
@article{
kedia2024one,
title={One-shot imitation under mismatched execution},
author={Kedia, Kushal and Dan, Prithwish and Chao, Angela and Pace, Maximus Adrian and Choudhury, Sanjiban},
journal={arXiv preprint arXiv:2409.06615},
year={2024}
}
- Much of the training pipeline is adapted from XSkill.
- The diffusion policy implementation is adapted from Diffusion Policy.
- Many useful utilities are adapted from XIRL.

