Songze Li, Zun Wang, Gengze Zhou, Jialu Li, Xiangyu Zeng, Limin Wang, Yu Qiao, Qi Wu, Mohit Bansal, Yi Wang
Goal-oriented language-guided navigation requires robust exploration capabilities for agents to navigate to specified goals in unknown environments without step-by-step instructions. Existing methods tend to rely exclusively on shortest-path trajectories, which lack the effective exploration priors needed to maximize success rates. To address these challenges, we present SID, a goal-oriented language-guided navigation learning approach with Self-Improving Demonstrations. Specifically, SID trains an initial agent on shortest-path data sampled from environments and then leverages this agent to generate novel exploration trajectories. These rollouts provide demonstrations with stronger exploration signals for training a better agent, which in turn produces higher-quality demonstrations for the next round of training. We show that this iterative self-improving pipeline readily scales to new environments, and the resulting demonstrations transfer across a variety of language-guided navigation tasks, raising the performance ceiling on diverse goal-oriented navigation benchmarks. Extensive experiments demonstrate that SID significantly boosts the exploration capabilities and generalization of navigation agents. The resulting agent achieves new state-of-the-art performance on goal-oriented language-guided navigation tasks, including REVERIE and SOON, notably achieving a 50.9% success rate on the unseen validation split of SOON, surpassing the prior leading approach by 13.9%.
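For intuition, below is a minimal sketch of this self-improving loop. Every name in it is an illustrative placeholder passed in as a parameter, not this repo's actual API.

```python
from typing import Callable, List, Tuple

def self_improving_demonstrations(
    train_agent: Callable,           # trains an agent on a demonstration set
    rollout_trajectories: Callable,  # rolls an agent out in the environments
    is_successful: Callable,         # did this rollout reach its goal?
    shortest_path_data: List,
    envs: List,
    num_rounds: int = 3,
) -> Tuple[object, List]:
    # Round 0: bootstrap an agent from shortest-path supervision only.
    agent = train_agent(shortest_path_data)
    demos = shortest_path_data
    for _ in range(num_rounds):
        # The current agent explores and produces novel rollouts; successful
        # ones carry exploration behavior that shortest paths lack.
        rollouts = rollout_trajectories(agent, envs)
        demos = [traj for traj in rollouts if is_successful(traj)]
        # Retrain on the stronger demonstrations for the next round.
        agent = train_agent(demos)
    return agent, demos
```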
[2025-09-30] We release the paper for SID-VLN.
[2025-09-22] We release the code and data for SID-VLN.
We test under the following environment:
- Python 3.8.10
- PyTorch 2.0.0
- CUDA Version 11.7
- Install the Matterport3D simulator: follow the detailed instructions here. We use the latest version instead of v0.1. Here are simplified instructions:
```shell
git clone git@github.com:peteanderson80/Matterport3DSimulator.git
cd Matterport3DSimulator
git submodule update --init --recursive
sudo apt-get install libjsoncpp-dev libepoxy-dev libglm-dev libosmesa6 libosmesa6-dev libglew-dev libopencv-dev
mkdir build && cd build
cmake -DEGL_RENDERING=ON ..
make -j8
```
After successful installation, run:
```shell
cp your_path/Matterport3DSimulator/build/MatterSim.cpython-38-x86_64-linux-gnu.so your_conda_path/envs/sidvln/lib/python3.8/MatterSim.cpython-38-x86_64-linux-gnu.so
export PYTHONPATH=your_path/SIDVLN/mapnav:$PYTHONPATH
export PYTHONPATH=your_path/Matterport3DSimulator/build:$PYTHONPATH
```
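To verify the simulator is usable from the sidvln environment, a quick rendering-free check along these lines should suffice. The nav-graph path and the scan/viewpoint IDs below are placeholders to adapt to your setup.

```python
import math
import MatterSim  # resolved via the copied .so / PYTHONPATH exports above

sim = MatterSim.Simulator()
sim.setRenderingEnabled(False)        # skip EGL rendering for a pure API check
sim.setNavGraphPath("datasets/REVERIE/connectivity")  # assumed location
sim.setCameraResolution(640, 480)
sim.setCameraVFOV(math.radians(60))
sim.setDiscretizedViewingAngles(True)
sim.initialize()
# Placeholders: substitute a real scan/viewpoint ID from your connectivity files.
sim.newEpisode(["SCAN_ID"], ["VIEWPOINT_ID"], [0.0], [0.0])
state = sim.getState()[0]
print(state.scanId, state.location.viewpointId)
```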
- Install requirements:
```shell
conda create --name sidvln python=3.8.10
conda activate sidvln
cd SID-VLN
pip install -r requirements.txt
```
We release our final pretrained models and data here. Details:
Connectivity:
- Connectivity of the navigation graphs.
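As a quick look at the format, one scan's navigation graph can be loaded as below. The per-node fields follow the standard Matterport3D connectivity schema; verify them against the downloaded files, and replace the placeholder scan ID.

```python
import json

scan = "SCAN_ID"  # placeholder: use a real scan name from scans.txt
with open(f"datasets/REVERIE/connectivity/{scan}_connectivity.json") as f:
    nodes = json.load(f)  # one dict per viewpoint in this scan
# "image_id" and "unobstructed" are standard Matterport3D connectivity fields.
print(len(nodes), "viewpoints; first:", nodes[0]["image_id"])
```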
Data:
- `scan_round0_860scan.jsonl` – Image goal navigation trajectories in 800 HM3D environments.
- `sid_lang_goal.jsonl` – Final detailed-caption goal navigation trajectories for pretraining and REVERIE augmentation.
- `img_goal_val*.json` – Image goal navigation validation seen and unseen splits.
- `cap_goal_val*.json` – Caption goal navigation validation seen and unseen splits.
- `scanvp_candview_relangles_with_hm3d_gibson.json` – Candidate views and relative angles per scan and viewpoint in HM3D environments.
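The exact record schema of these .jsonl files is easiest to learn from the files themselves; a minimal inspection sketch (the path assumes the directory layout shown further below):

```python
import json

# Each line of a .jsonl file is one trajectory record. Print the keys of the
# first record to discover the actual schema before writing a loader.
with open("datasets/REVERIE/annotations/sid_lang_goal.jsonl") as f:
    record = json.loads(next(f))
print(sorted(record.keys()))
```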
Features:
- `siglip_base.hdf5` – SigLIP features on MP3D and HM3D environments.
- `dinov2_base.hdf5` – DINOv2 features on MP3D and HM3D environments.
- `obj.avg.top3.min80_vit_base_patch16_224_imagenet.hdf5` – Object features for REVERIE.
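A minimal sketch for inspecting these HDF5 feature files. The key layout (commonly one feature matrix per scan-viewpoint key in VLN codebases) is an assumption to confirm against the released files:

```python
import h5py

with h5py.File("datasets/REVERIE/features/siglip_base.hdf5", "r") as f:
    first_key = next(iter(f.keys()))
    # Typically one (num_views, feature_dim) array per key; verify the shape.
    print(first_key, f[first_key].shape)
```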
HM3D_cap:
- Generated detailed-style captions for target images in HM3D and MP3D environments.
Model:
- `model_step_124000.pt` – The final pretrained model for downstream VLN finetuning.
- `img_goal_best_val_unseen` – The image goal navigation agent used to generate trajectories that serve as high-quality demonstrations of exploration strategies.
- `model_LXRT.pth` – The pretrained LXMERT model for initializing DUET.
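To inspect a released checkpoint before finetuning, something like the following works; whether the file is a bare state dict or a wrapped dict is an assumption to check:

```python
import torch

# Loading on CPU avoids needing a GPU just to look at the checkpoint.
ckpt = torch.load("datasets/ckpts/model_step_124000.pt", map_location="cpu")
keys = list(ckpt.keys()) if isinstance(ckpt, dict) else []
print(keys[:10])  # peek at the top-level layout
```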
The data folder should follow this structure:
```shell
datasets/
├── ckpts/
│   ├── model_LXRT.pth
│   ├── img_goal_best_val_unseen
│   └── model_step_124000.pt
├── REVERIE/
│   ├── annotations/
│   │   ├── scan_round0_860scan.jsonl
│   │   ├── sid_lang_goal.jsonl
│   │   ├── img_goal_val*.json
│   │   ├── cap_goal_val*.json
│   │   └── scanvp_candview_relangles_with_hm3d_gibson.json
│   ├── connectivity/
│   │   ├── scanname_connectivity.json
│   │   └── scans.txt
│   └── features/
│       ├── siglip_base.hdf5
│       ├── dinov2_base.hdf5
│       └── obj.avg.top3.min80_vit_base_patch16_224_imagenet.hdf5
└── SOON/
```
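A small sanity check that the expected files are in place; the paths mirror the tree above, so adjust the root if your datasets folder lives elsewhere:

```python
import os

expected = [
    "datasets/ckpts/model_LXRT.pth",
    "datasets/ckpts/model_step_124000.pt",
    "datasets/REVERIE/annotations/sid_lang_goal.jsonl",
    "datasets/REVERIE/features/siglip_base.hdf5",
]
for path in expected:
    # Flag anything missing before launching a multi-GPU run.
    print(("OK  " if os.path.exists(path) else "MISS") + " " + path)
```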
- Multi-Round SID Pre-training

We use 8 NVIDIA A800 GPUs for pre-training agents on image goal navigation.
```shell
cd pretrain
bash run_img_goal.sh
```
- SID Fine-tuning & Trajectory Generation

We use 8 NVIDIA A800 GPUs for fine-tuning agents and generating trajectories for the next round of training.
```shell
cd mapnav
bash scripts/run_img_goal.sh
```
- Language Goal Pre-training

We use 8 NVIDIA A800 GPUs for pre-training language goal navigation agents.
```shell
bash run_lang_goal.sh
```
- Downstream VLN Task Fine-tuning

We use one NVIDIA A800 GPU for finetuning our agent on downstream VLN tasks. The concrete configurations are provided in the scripts.
```shell
bash run_lang_goal.sh
```
Please feel free to open an issue if you encounter any problems or have questions about SID-VLN.
If you find our work useful in your research, please consider starring 🌟 this repo and citing the following paper:
@article{li2025learning,
title={Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale},
author={Li, Songze and Wang, Zun and Zhou, Gengze and Li, Jialu and Zeng, Xiangyu and Wang, Limin and Qiao, Yu and Wu, Qi and Bansal, Mohit and Wang, Yi},
journal={arXiv preprint arXiv:2509.24910},
year={2025}
}
We thank the developers of DUET, SRDF, and InternVL for their public code releases.