BeingBeyond/Rethink_VLA
Code Repository for the paper "Rethinking Visual-Language-Action Model Scaling: Alignment, Mixture, and Regularization"

1. Pre-trained Models

You can download the pre-trained checkpoints from Hugging Face:
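The Hugging Face link above is the source of truth for the checkpoint location. As a sketch, checkpoints hosted on the Hub can typically be fetched with the `huggingface-cli` tool; the repo id below is a placeholder, not the actual model id:

```shell
# Hypothetical: fill in the real repo id from the Hugging Face link above.
REPO_ID="<org>/<checkpoint-repo>"
LOCAL_DIR="checkpoints"
# Requires: pip install -U "huggingface_hub[cli]"
DOWNLOAD_CMD="huggingface-cli download $REPO_ID --local-dir $LOCAL_DIR"
# Printed rather than executed here, since REPO_ID is a placeholder.
echo "$DOWNLOAD_CMD"
```

Once `REPO_ID` is filled in, running the printed command downloads the checkpoint files into `checkpoints/`.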

2. Pre-training

To run the pre-training stage, use the following scripts:

# Pre-training
bash shell/pretrain-M1-240k.sh

# Stage 2 Pre-training
bash shell/pretrain-M1-240k-stage2.sh

3. Post-training

LIBERO Benchmark

First, download the required LIBERO datasets:

Preprocess the data:

python src/data_postprocessor/libero.py

Run the post-training script:

bash shell/relative-post-libero-full-eef_relative-5shot.sh
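The LIBERO steps above can be chained in a small driver script. This is a sketch only: the command strings are taken from this README, and actual execution is left commented out until the LIBERO datasets have been downloaded.

```shell
set -e
# LIBERO post-training pipeline, in order (commands from this README).
steps=(
  "python src/data_postprocessor/libero.py"                     # preprocess
  "bash shell/relative-post-libero-full-eef_relative-5shot.sh"  # post-train
)
for step in "${steps[@]}"; do
  echo "step: $step"
  # eval "$step"   # uncomment once the LIBERO datasets are in place
done
```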

RoboCasa Benchmark

First, download the RoboCasa dataset:

Preprocess the data:

python src/data_postprocessor/robocasa_human.py

Run the post-training script:

bash shell/relative-post-robocasa-full-eef_relative.sh

4. Evaluation

LIBERO Evaluation

Ensure the LIBERO environment is installed.

Run the evaluation script:

bash shell/eval-libero-relative.sh

RoboCasa Evaluation

Ensure the RoboCasa environment is installed.

Run the evaluation script:

bash shell/eval-robocasa-relative.sh
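Both evaluations follow the same pattern, so they can be looped over. A sketch, with script names taken from this README; actual execution is commented out because it requires the corresponding simulator environment to be installed:

```shell
# Evaluation scripts for each benchmark (names from this README).
eval_scripts=(
  "shell/eval-libero-relative.sh"    # requires the LIBERO environment
  "shell/eval-robocasa-relative.sh"  # requires the RoboCasa environment
)
for script in "${eval_scripts[@]}"; do
  echo "would run: bash $script"
  # bash "$script"   # uncomment once the matching environment is installed
done
```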

Acknowledgments

We thank the authors of the following projects for their contributions to the robotics and machine learning communities:

  • BeingH0.5: VLA framework
  • InternVL: Vision-Language model backbone
  • Bagel: Training framework
  • Qwen: Language model
  • LIBERO: Benchmark for lifelong robot learning
  • RoboCasa: Large-scale simulation benchmark for everyday tasks
