
Commit a91edcb

yangyuqiang authored and HanqingWangAI committed
[docs] update data and checkpoint paths
1 parent d1e393e commit a91edcb

File tree

4 files changed (+177, -66 lines)


source/en/user_guide/internnav/quick_start/installation.md

Lines changed: 130 additions & 53 deletions
@@ -14,17 +14,9 @@
1414
# Installation Guide
1515

1616
😄 Don’t worry — both [Quick Installation](#quick-installation) and [Dataset Preparation](#dataset-preparation) are beginner-friendly.
17-
18-
19-
<!-- > 💡NOTE \
20-
> 🙋 **[First-time users:](#-lightweight-installation-recommended-for-beginners)** Skip GenManip for now — it requires installing NVIDIA [⚙️ Isaac Sim](#), which can be complex.
21-
Start with **CALVIN** or **SimplerEnv** for easy setup and full training/eval support.\
22-
> 🧠 **[Advanced users:](#-full-installation-advanced-users)** Feel free to use all benchmarks, including **GenManip** with Isaac Sim support. -->
23-
24-
<!-- > For 🙋**first-time** users, we recommend skipping the GenManip benchmark, as it requires installing NVIDIA [⚙️ Isaac Sim](#) for simulation (which can be complex).
25-
Instead, start with **CALVIN** or **SimplerEnv** — both are easy to set up and fully support training and evaluation. -->
26-
27-
<!-- This guide provides comprehensive instructions for installing and setting up the InternManip robot manipulation learning suite. Please read through the following prerequisites carefully before proceeding with the installation. -->
17+
```
18+
A detailed technical report will be released in about two weeks.
19+
```
2820

2921
## Prerequisites
3022

@@ -176,6 +168,10 @@ We provide a flexible installation tool for users who want to use InternNav for
176168

177169

178170
## Quick Installation
171+
Clone the InternNav repository:
172+
```bash
173+
git clone https://github.com/InternRobotics/InternNav.git --recursive
174+
```
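If you cloned without `--recursive`, the submodules can still be fetched afterwards:

```bash
# Pull any submodules that were skipped during the initial clone
git submodule update --init --recursive
```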
179175

180176
Our toolchain provides two Python environment solutions to accommodate different usage scenarios with the InternNav-N1 series model:
181177

@@ -187,43 +183,39 @@ Choose the environment that best fits your specific needs to optimize your exper
187183
### Isaac Sim Environment
188184
#### Prerequisite
189185
- Ubuntu 20.04, 22.04
190-
- Conda
191186
- Python 3.10.16 (3.10.* should be ok)
192187
- NVIDIA Omniverse Isaac Sim 4.5.0
193188
- NVIDIA GPU (RTX 2070 or higher)
194189
- NVIDIA GPU Driver (recommended version 535.216.01+)
195190
- PyTorch 2.5.1, 2.6.0 (recommended)
196191
- CUDA 11.8, 12.4 (recommended)
197-
- Docker (Optional)
198-
- NVIDIA Container Toolkit (Optional)
199192

200193
Before proceeding with the installation, ensure that you have [Isaac Sim 4.5.0](https://docs.isaacsim.omniverse.nvidia.com/4.5.0/installation/install_workstation.html) and [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) installed.
201194

202-
To help you get started quickly, we've prepared a Docker image pre-configured with Isaac Sim 4.5 and InternUtopia. You can pull the image and run evaluations in the container using the following command:
195+
<!-- To help you get started quickly, we've prepared a Docker image pre-configured with Isaac Sim 4.5 and InternUtopia. You can pull the image and run evaluations in the container using the following command:
203196
```bash
204197
docker pull registry.cn-hangzhou.aliyuncs.com/internutopia/internutopia:2.2.0
205198
docker run -it --name internutopia-container registry.cn-hangzhou.aliyuncs.com/internutopia/internutopia:2.2.0
206-
```
199+
``` -->
207200
#### Conda installation
208201
```bash
209-
$ conda create -n <env> python=3.10 libxcb=1.14
202+
conda create -n <env> python=3.10 libxcb=1.14
210203

211204
# Install InternUtopia through pip (2.1.1 and 2.2.0 recommended).
212-
$ conda activate <env>
213-
$ pip install internutopia
205+
conda activate <env>
206+
pip install internutopia
214207

215208
# Configure the conda environment.
216-
$ python -m internutopia.setup_conda_pypi
217-
$ conda deactivate && conda activate <env>
209+
python -m internutopia.setup_conda_pypi
210+
conda deactivate && conda activate <env>
218211
```
219212
For InternUtopia installation, you can find more detailed [docs](https://internrobotics.github.io/user_guide/internutopia/get_started/installation.html) in [InternUtopia](https://github.com/InternRobotics/InternUtopia?tab=readme-ov-file).
220213
```bash
221214
# Install PyTorch based on your CUDA version
222-
$ pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
215+
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
223216

224217
# Install other deps
225-
$ pip install -r isaac_requirements.txt
226-
218+
pip install -r requirements/isaac_requirements.txt
227219
```
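As a quick sanity check of the Isaac Sim environment (a minimal sketch, assuming the conda environment created above is active):

```bash
# Confirm that InternUtopia imports and that PyTorch sees the GPU
python -c "import torch, internutopia; print(torch.__version__, torch.cuda.is_available())"
```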
228220

229221

@@ -237,46 +229,81 @@ If you need to train or evaluate models on [Habitat](#optional-habitat-environme
237229
- CUDA 12.4
238230
- GPU: NVIDIA A100 or higher (optional for VLA training)
239231

232+
```bash
233+
conda create -n <env> python=3.9
234+
conda activate <env>
235+
```
236+
Install habitat-sim and habitat-lab:
240237
```bash
241238
conda install habitat-sim==0.2.4 withbullet headless -c conda-forge -c aihabitat
242239
git clone --branch v0.2.4 https://github.com/facebookresearch/habitat-lab.git
243240
cd habitat-lab
244241
pip install -e habitat-lab # install habitat_lab
245242
pip install -e habitat-baselines # install habitat_baselines
246-
pip install -r habitat_requirements.txt
243+
```
244+
Install pytorch and other requirements:
245+
```bash
246+
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
247+
pip install -r requirements/habitat_requirements.txt
247248
```
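To verify the Habitat stack (a minimal sketch, assuming the habitat conda environment is active):

```bash
# Confirm habitat-sim and habitat-lab import and check the PyTorch CUDA build
python -c "import habitat_sim, habitat, torch; print(torch.__version__, torch.cuda.is_available())"
```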
248249

249250

250251
## Verification
251252

252-
Please download our latest pretrained [checkpoint](https://huggingface.co/InternRobotics/InternVLA-N1) of InternVLA-N1 and run the following script to inference with visualization results. Move the checkpoint to the `checkpoints` directory. Download the VLN-CE dataset from [huggingface](). The final folder structure should look like this:
253+
### Data/Checkpoints Preparation
254+
To get started, we need to prepare the data and checkpoints.
255+
1. **InternVLA-N1 pretrained Checkpoints**
256+
Please download our latest pretrained [checkpoint](https://huggingface.co/InternRobotics/InternVLA-N1) of InternVLA-N1 and move it to the `checkpoints` directory; the scripts below run inference with it and produce visualization results (a download sketch is given after the folder layout below).
257+
2. **DepthAnything v2 Checkpoints**
258+
Please download the DepthAnything v2 pretrained [checkpoint](https://huggingface.co/Ashoka74/Placement/resolve/main/depth_anything_v2_vits.pth) and move it to the `checkpoints` directory.
259+
3. **Matterport3D Scenes**
260+
Download the MP3D scenes from the [official project page](https://niessner.github.io/Matterport/) and place them under `data/scene_datasets/mp3d`.
261+
4. **VLN-CE Episodes**
262+
- [r2r](https://drive.google.com/file/d/18DCrNcpxESnps1IbXVjXSbGLDzcSOqzD/view) (rename R2R_VLNCE_v1/ -> r2r/)
263+
- [rxr](https://drive.google.com/file/d/145xzLjxBaNTbVgBfQ8e9EsBAV8W-SM0t/view) (rename RxR_VLNCE_v0/ -> rxr/)
264+
- [envdrop](https://drive.google.com/file/d/1fo8F4NKgZDH-bPSdVU3cONAkt5EW-tyr/view) (rename R2R_VLNCE_v1-3_preprocessed/envdrop/ -> envdrop/)
265+
The final folder structure should look like this:
253266

254267
```bash
255268
InternNav/
256-
|-- data/
257-
| |-- datasets
258-
|-- vln
259-
|-- vln_datasets
260-
|-- scene_datasets
261-
|-- hm3d
262-
|-- mp3d
263-
264-
|-- src/
265-
| |-- ...
266-
267-
|-- checkpoints/
268-
| |-- InternVLA-N1/
269-
| | |-- model-00001-of-00004.safetensors
270-
| | |-- config.json
271-
| | |-- ...
272-
| |-- InternVLA-N1-S2
273-
| | |-- model-00001-of-00004.safetensors
274-
| | |-- config.json
275-
| | |-- ...
269+
├── data/
270+
│ ├── datasets/
271+
│ │ ├── r2r/
272+
│ │ │ ├── train/
273+
│ │ │ ├── val_seen/
274+
│ │ │ ├── val_unseen/
275+
│ │ ├── rxr/
276+
│ │ │ ├── train/
277+
│ │ │ ├── val_seen/
278+
│ │ │ ├── val_unseen/
279+
│ │ ├── envdrop/
280+
│ │ │ ├── train/
281+
│ │ │ ├── val_seen/
282+
│ │ │ ├── val_unseen/
283+
│ ├── scene_datasets/
284+
│ │ ├── mp3d
285+
│ │ │ ├── 17DRP5sb8fy/
286+
│ │ │ ├── 1LXtFkjw3qL/
287+
│ │ │ └── ...
288+
├── src/
289+
│ ├── ...
290+
291+
├── checkpoints/
292+
│ ├── InternVLA-N1/
293+
│ │ ├── model-00001-of-00004.safetensors
294+
│ │ ├── config.json
295+
│ │ ├── ...
296+
│ ├── InternVLA-N1-S2
297+
│ │ ├── model-00001-of-00004.safetensors
298+
│ │ ├── config.json
299+
│ │ ├── ...
300+
│ ├── depth_anything_v2_vits.pth
276301
```
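One way to fetch the checkpoints listed above is the Hugging Face CLI (a sketch, assuming `huggingface-cli` is installed; the InternVLA-N1-S2 weights follow the same pattern once you know their repository id):

```bash
# InternVLA-N1 weights into checkpoints/
huggingface-cli download InternRobotics/InternVLA-N1 --local-dir checkpoints/InternVLA-N1

# DepthAnything v2 (small) checkpoint into checkpoints/
wget -P checkpoints/ https://huggingface.co/Ashoka74/Placement/resolve/main/depth_anything_v2_vits.pth
```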
302+
### Gradio demo
277303

278-
Replace the 'model_path' variable in 'vln_ray_backend.py' with the path of InternVLA-N1 checkpoint.
304+
Currently the Gradio demo is only available in the **habitat** environment. Replace the `model_path` variable in `vln_ray_backend.py` with the path of the InternVLA-N1 checkpoint.
279305
```bash
306+
conda activate <habitat-env>
280307
srun -p {partition_name} --cpus-per-task 16 --gres gpu:1 python3 scripts/eval/vln_ray_backend.py
281308
```
282309
Find the IP address of the node allocated by Slurm, then change the `BACKEND_URL` in the Gradio client (`navigation_ui.py`) to the server's IP address and start the Gradio client.
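For example (a sketch; the exact port and arguments depend on how `vln_ray_backend.py` and `navigation_ui.py` are configured in your setup):

```bash
# Find the node Slurm allocated to the backend job
squeue -u $USER

# After editing BACKEND_URL in navigation_ui.py to point at that node, launch the client
python navigation_ui.py
```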
@@ -294,8 +321,11 @@ Click the 'Start Navigation Simulation' button to send a VLN request to the back
294321

295322

296323

297-
## Dataset Preparation
298-
We also prepare high-quality data for trainning system1/system2. To set up the trainning dataset, please follow the steps below:
324+
## InternData-N1 Dataset Preparation
325+
```
326+
Due to network throttling on HuggingFace, InternData-N1 has not been fully uploaded yet. Please check back in a few days.
327+
```
328+
We also provide high-quality data for **training** System 1/System 2 and for **evaluation** in the Isaac Sim environment. To set up the dataset, please follow the steps below (a download sketch follows the folder layout):
299329

300330
1. Download Datasets
301331
- Download the [InternData-N1](https://huggingface.co/datasets/InternRobotics/InternData-N1) for:
@@ -327,13 +357,60 @@ data/
327357
│ │ └── val_unseen.json.gz
328358
├── └── traj_data/
329359
│ └── mp3d/
330-
│ └── trajectory_0/
331-
│ ├── data/
332-
│ ├── meta/
333-
│ └── videos/
360+
│ └── 17DRP5sb8fy/
361+
│ └── 1LXtFkjw3qL/
362+
│ └── ...
334363
├── vln_ce/
335364
│ ├── raw_data/
336365
│ └── traj_data/
337366
└── vln_n1/
338367
└── traj_data/
339368
```
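Once InternData-N1 is fully uploaded, one way to pull it is with the Hugging Face CLI (a sketch, assuming `huggingface-cli` is installed; you may need to rearrange the downloaded folders to match the layout above):

```bash
# Download the InternData-N1 dataset repository into data/
huggingface-cli download InternRobotics/InternData-N1 --repo-type dataset --local-dir data/
```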
369+
370+
If you want to evaluate in the Habitat environment and have finished the data preparation described [above](#datacheckpoints-preparation), the final data structure should look like this:
371+
```bash
372+
data/
373+
├── scene_data/
374+
│ ├── mp3d_pe/
375+
│ │ ├── 17DRP5sb8fy/
376+
│ │ ├── 1LXtFkjw3qL/
377+
│ │ └── ...
378+
│ ├── mp3d_ce/
379+
│ └── mp3d_n1/
380+
├── vln_pe/
381+
│ ├── raw_data/
382+
│ │ ├── train/
383+
│ │ ├── val_seen/
384+
│ │ │ └── val_seen.json.gz
385+
│ │ └── val_unseen/
386+
│ │ └── val_unseen.json.gz
387+
├── └── traj_data/
388+
│ └── mp3d/
389+
│ └── 17DRP5sb8fy/
390+
│ └── 1LXtFkjw3qL/
391+
│ └── ...
392+
393+
├── vln_ce/
394+
│ ├── raw_data/
395+
│ └── traj_data/
396+
├── vln_n1/
397+
│ └── traj_data/
398+
├── datasets/
│ ├── r2r/
│ │ ├── train/
│ │ ├── val_seen/
│ │ └── val_unseen/
│ ├── rxr/
│ │ ├── train/
│ │ ├── val_seen/
│ │ └── val_unseen/
│ ├── envdrop/
│ │ ├── train/
│ │ ├── val_seen/
│ │ └── val_unseen/
├── scene_datasets/
│ ├── mp3d/
│ │ ├── 17DRP5sb8fy/
│ │ ├── 1LXtFkjw3qL/
│ │ └── ...
```
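If the raw MP3D scans already live elsewhere on disk, a symlink avoids keeping a second copy under `data/scene_datasets/` (a sketch; point the source at your actual MP3D download):

```bash
# Reuse an existing MP3D download instead of duplicating the scans
mkdir -p data/scene_datasets
ln -s /path/to/your/mp3d data/scene_datasets/mp3d
```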

source/en/user_guide/internnav/quick_start/train_eval.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ This document presents how to train and evaluate models for different systems wi
66
## Whole-system
77

88
### Evaluation
9-
Before evaluation, we should download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments). Model weights of InternVLA-N1 can be downloaded from [InternVLA-N1](https://huggingface.co/InternRobotics/InternVLA-N1).
9+
Before evaluation, download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments) and move them to the `data/` directory. Model weights of InternVLA-N1 can be downloaded from [InternVLA-N1](https://huggingface.co/InternRobotics/InternVLA-N1).
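For instance (a sketch, assuming the Hugging Face CLI is installed; the exact sub-folder under `data/` should match however the Embodiments assets are organized):

```bash
# Robot/embodiment assets used for whole-system evaluation
huggingface-cli download InternRobotics/Embodiments --repo-type dataset --local-dir data/Embodiments
```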
1010

1111
#### Evaluation on isaac sim
1212
The main architecture of the whole-system evaluation adopts a client-server model. In the client, we specify the corresponding configuration (*.cfg), which includes settings such as the scenarios to be evaluated, robots, models, and parallelization parameters. The client sends requests to the server, which then submits tasks to the Ray distributed framework based on the corresponding cfg file, enabling the entire evaluation process to run.
@@ -150,7 +150,7 @@ data/
150150

151151
### Training
152152

153-
Currently, we only support training of small VLN models (CMA, RDP, Seq2Seq) in this repo. For the trainning of LLM-based VLN (Navid, StreamVLN, etc), please refer to [StreamVLN](https://github.com/OpenRobotLab/StreamVLN) for training details.
153+
Currently, we only support training of small VLN models (CMA, RDP, Seq2Seq) in this repo. For the training of LLM-based VLN (Navid, StreamVLN, etc), please refer to [StreamVLN](https://github.com/OpenRobotLab/StreamVLN) for training details.
154154

155155
```bash
156156
# train cma model

source/en/user_guide/internnav/tutorials/model.md

Lines changed: 44 additions & 10 deletions
@@ -1,20 +1,20 @@
11
# Model
22

3-
This tutorial introduces the structure and implementation of both System 1 (navdp) and System 2 (rdp) policy models in the InterNav framework.
3+
This tutorial introduces the structure and implementation of both System 1 (navdp) and System 2 (rdp) policy models in the InternNav framework.
44

55
---
66

77
## System 1: Navdp
88

99
<!-- navdp content start -->
1010

11-
This tutorial introduces the structure and implementation of the navdp policy model in the InterNav framework, helping you understand and customize each module.
11+
This tutorial introduces the structure and implementation of the navdp policy model in the InternNav framework, helping you understand and customize each module.
1212

1313
---
1414

1515
### Model Structure Overview
1616

17-
The navdp policy model in InterNav mainly consists of the following parts:
17+
The navdp policy model in InternNav mainly consists of the following parts:
1818

1919
- **RGBD Encoder (NavDP_RGBD_Backbone)**: Extracts multi-frame RGB+Depth features.
2020
- **Goal Point/Image Encoder**: Encodes goal point or goal image information.
@@ -98,8 +98,6 @@ def forward(self, goal_point, goal_image, input_images, input_depths, output_act
9898
return action_pred, value_pred
9999
```
100100

101-
---
102-
103101
### Key Code Snippets
104102

105103
#### Load Model
@@ -116,14 +114,50 @@ To customize the backbone, decoder, or heads, refer to `navdp_policy.py` and `na
116114
---
117115

118116
### Reference
119-
- [navdp_policy.py](../../internnav/model/basemodel/navdp/navdp_policy.py)
120-
- [navdp_backbone.py](../../internnav/model/encoder/navdp_backbone.py)
121-
- [navdp.py config](../../scripts/train/configs/navdp.py)
117+
- [diffusion policy](https://github.com/real-stanford/diffusion_policy)
122118

123119
<!-- navdp content end -->
124120

125121
---
126122

127-
## System 2: InternVLA-N1-S2
123+
## Dual System: InternVLA-N1
124+
This tutorial provides a detailed guide to the InternVLA-N1 policy model within the InternNav framework.
125+
126+
1. Qwen2.5-VL Backbone
127+
The System 2 model is built on Qwen2.5-VL, a state-of-the-art vision-language model:
128+
129+
```python
130+
class InternVLAN1ForCausalLM(Qwen2_5_VLForConditionalGeneration, InternVLAN1MetaForCausalLM):
    config_class = InternVLAN1ModelConfig

    def __init__(self, config):
        Qwen2_5_VLForConditionalGeneration.__init__(self, config)
        config.model_type = "internvla_n1"

        self.model = InternVLAN1Model(config)
        self.rope_deltas = None
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
        self.post_init()
141+
```
142+
Qwen2.5-VL supports multi-turn conversations, image understanding, and text generation. We finetune the Qwen2.5-VL model on our self-collected navigation dataset.
143+
144+
2. Latent Queries
145+
Our model learns a set of latent queries that query the latent representations of Qwen2.5-VL; these are used to model the trajectory context.
146+
```python
147+
self.latent_queries = nn.Parameter(torch.randn(1, config.n_query, config.hidden_size))
148+
```
149+
150+
3. NavDP Integration
151+
Embeds the System 1 (NavDP) policy for low-level trajectory generation:
152+
153+
```python
154+
def build_navdp(navdp_cfg):
    navdp = NavDP_Policy_DPT_CriticSum_DAT(navdp_pretrained=navdp_cfg.navdp_pretrained)
    navdp.load_model()
    return navdp
158+
```
159+
NavDP converts high-level waypoints from the language model to continuous action sequences.
160+
161+
162+
### Reference
163+
[Qwen2.5-VL Documentation](https://lmdeploy.readthedocs.io/en/latest/multi_modal/qwen2_5_vl.html)

source/en/user_guide/internnav/tutorials/training.md

Lines changed: 1 addition & 1 deletion
@@ -116,4 +116,4 @@ For customizing the model structure or dataset format, see [model.md](./model.md
116116

117117
## System 2: InternVLA-N1-S2
118118

119-
*TODO
119+
Currently we don't support the training of InternVLA-N1-S2 in this repository.
