Commit 11166cb

Committed by HanqingWangAI, yangyuqiang, wangyukai, Elaine6107, and 刘雨

[docs] update internnav

Co-authored-by: yangyuqiang <yangyuqiang@pjlab.org.cn>
Co-authored-by: hanqing <409082492@qq.com>
Co-authored-by: wangyukai <wangyukai@pjlab.org.cn>
Co-authored-by: longyilin <yilinlong137@163.com>
Co-authored-by: zengyiming <zengyiming>
Co-authored-by: 刘雨 <liuyu2@pjlab.org.cn>
1 parent 5024d50 commit 11166cb

File tree

7 files changed: +66 −96 lines
Two binary image files changed (199 KB and 341 KB); diffs for binary files are not shown.

source/en/user_guide/internnav/quick_start/installation.md

Lines changed: 35 additions & 75 deletions
@@ -256,35 +256,32 @@ To get started, we need to prepare the data and checkpoints.
    Please download our latest pretrained [checkpoint](https://huggingface.co/InternRobotics/InternVLA-N1) of InternVLA-N1 and run the following script to inference with visualization results. Move the checkpoint to the `checkpoints` directory.
 2. **DepthAnything v2 Checkpoints**
    Please download the depthanything v2 pretrained [checkpoint](https://huggingface.co/Ashoka74/Placement/resolve/main/depth_anything_v2_vits.pth). Move the checkpoint to the `checkpoints` directory.
-3. **Matterport3D Scenes**
-   Download the MP3D scenes from [official project pages](https://niessner.github.io/Matterport/) and place them under `data/scene_datasets/mp3d`.
-4. **VLN-CE Episodes**
-   - [r2r](https://drive.google.com/file/d/18DCrNcpxESnps1IbXVjXSbGLDzcSOqzD/view) (rename R2R_VLNCE_v1/ -> r2r/)
-   - [rxr](https://drive.google.com/file/d/145xzLjxBaNTbVgBfQ8e9EsBAV8W-SM0t/view) (rename RxR_VLNCE_v0/ -> rxr/)
-   - [envdrop](https://drive.google.com/file/d/1fo8F4NKgZDH-bPSdVU3cONAkt5EW-tyr/view) (rename R2R_VLNCE_v1-3_preprocessed/envdrop/ -> envdrop/)
+3. **InternData-N1 VLN-CE Episodes**
+   Download the [InternData-N1](https://huggingface.co/datasets/InternRobotics/InternData-N1) episodes for `vln-ce`. Extract them into the `data/vln_ce/` directory.
+4. **Scene-N1**
+   Download the [SceneData-N1](https://huggingface.co/datasets/InternRobotics/Scene-N1) scenes for `mp3d_ce`. Extract them into the `data/scene_data/` directory.
+
 The final folder structure should look like this:
 
 ```bash
 InternNav/
 ├── data/
-│   ├── datasets/
-│   │   ├── r2r/
-│   │   │   ├── train/
-│   │   │   ├── val_seen/
-│   │   │   ├── val_unseen/
-│   │   ├── rxr/
-│   │   │   ├── train/
-│   │   │   ├── val_seen/
-│   │   │   ├── val_unseen/
-│   │   ├── envdrop/
-│   │   │   ├── train/
-│   │   │   ├── val_seen/
-│   │   │   ├── val_unseen/
-│   ├── scene_datasets/
-│   │   ├── mp3d
-│   │   │   ├──17DRP5sb8fy/
-│   │   │   ├── 1LXtFkjw3qL/
-│   │   │   └── ...
+│   ├── vln_ce/
+│   │   ├── raw_data/
+│   │   │   ├── r2r/
+│   │   │   │   ├── train/
+│   │   │   │   ├── val_seen/
+│   │   │   │   │   └── val_seen.json.gz
+│   │   │   │   └── val_unseen/
+│   │   │   │       └── val_unseen.json.gz
+│   │   └── traj_data/
+│   ├── scene_data/
+│   │   ├── mp3d_ce/
+│   │   │   ├── mp3d/
+│   │   │   │   ├── 17DRP5sb8fy/
+│   │   │   │   ├── 1LXtFkjw3qL/
+│   │   │   │   └── ...
+
 ├── src/
 │   ├── ...
 
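The preparation steps in the hunk above can be scripted. Below is a minimal sketch, assuming the standard `huggingface-cli` download flow; the local directory layout matches the tree shown above, but any archive or subfolder names inside the dataset repos are assumptions:

```bash
# Sketch only: fetch checkpoints and datasets into the layout shown above.
pip install -U "huggingface_hub[cli]"
mkdir -p checkpoints data/vln_ce data/scene_data

# InternVLA-N1 checkpoint and DepthAnything v2 checkpoint
huggingface-cli download InternRobotics/InternVLA-N1 --local-dir checkpoints/InternVLA-N1
wget -P checkpoints https://huggingface.co/Ashoka74/Placement/resolve/main/depth_anything_v2_vits.pth

# VLN-CE episodes (InternData-N1) and scenes (Scene-N1)
huggingface-cli download InternRobotics/InternData-N1 --repo-type dataset --local-dir data/vln_ce
huggingface-cli download InternRobotics/Scene-N1 --repo-type dataset --local-dir data/scene_data
```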
@@ -308,12 +305,12 @@ srun -p {partition_name} --cpus-per-task 16 --gres gpu:1 python3 scripts/eval/vl
 ```
 Find the IP address of the node allocated by Slurm. Then change the BACKEND_URL in the gradio client (navigation_ui.py) to the server's IP address. Start the gradio.
 ```bash
-python navigation_ui.py
+python scripts/eval/navigation_ui.py
 ```
-Note that it's better to run the Gradio client on a machine with a graphical user interface (GUI) but ensure there is proper network connectivity between the client and the server. Then open a browser and enter the Gradio address (such as http://0.0.0.0:5700). We can see the interface as shown below.
+Note that it's better to run the Gradio client on a machine with a graphical user interface (GUI), but ensure there is proper network connectivity between the client and the server. Download the Gradio scene assets from [huggingface](https://huggingface.co/datasets/InternRobotics/Scene-N1) and extract them into the `scene_assets` directory of the client. Then open a browser and enter the Gradio address (such as http://0.0.0.0:5700). We can see the interface as shown below.
 ![img.png](../../../_static/image/gradio_interface.jpg)
 
-Click the 'Start Navigation Simulation' button to send a VLN request to the backend. The backend will submit a task to ray server and simulate the VLN task with InternVLA-N1 models. Wait about 3 minutes, the VLN task will be finished and return a result video. We can see the result video in the gradio like this.
+Click the 'Start Navigation Simulation' button to send a VLN request to the backend. The backend will submit a task to the Ray server and simulate the VLN task with InternVLA-N1 models. Wait about 1 minute, and the VLN task will finish and return a result video. We can see the result video in Gradio like this.
 ![img.png](../../../_static/image/gradio_result.jpg)
 
 
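As a usage sketch of the client-side step in the hunk above: the snippet below assumes BACKEND_URL is a plain string assignment in navigation_ui.py, and the server IP and port are placeholders.

```bash
# Sketch: point the Gradio client at the backend node, then launch it.
SERVER_IP=10.0.0.42   # placeholder: IP of the node Slurm allocated to the server
# Rewrite the BACKEND_URL constant (8001 is a placeholder port).
sed -i "s|^BACKEND_URL = .*|BACKEND_URL = \"http://${SERVER_IP}:8001\"|" scripts/eval/navigation_ui.py
python scripts/eval/navigation_ui.py
```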
@@ -343,10 +340,14 @@ After downloading, organize the datasets into the following structure:
 data/
 ├── scene_data/
 │   ├── mp3d_pe/
-│   │   ├──17DRP5sb8fy/
+│   │   ├── 17DRP5sb8fy/
 │   │   ├── 1LXtFkjw3qL/
 │   │   └── ...
 │   ├── mp3d_ce/
+│   │   ├── mp3d/
+│   │   │   ├── 17DRP5sb8fy/
+│   │   │   ├── 1LXtFkjw3qL/
+│   │   │   └── ...
 │   └── mp3d_n1/
 ├── vln_pe/
 │   ├── raw_data/
@@ -362,55 +363,14 @@ data/
 │   └── ...
 ├── vln_ce/
 │   ├── raw_data/
+│   │   ├── r2r/
+│   │   │   ├── train/
+│   │   │   ├── val_seen/
+│   │   │   │   └── val_seen.json.gz
+│   │   │   └── val_unseen/
+│   │   │       └── val_unseen.json.gz
 │   └── traj_data/
 └── vln_n1/
     └── traj_data/
 ```
 
-If you want to evaluate on habitat environment and finish the data preparation mentioned [above](#DataCheckpoints-Preparation), the final data structure should look like this:
-```bash
-data/
-├── scene_data/
-│   ├── mp3d_pe/
-│   │   ├──17DRP5sb8fy/
-│   │   ├── 1LXtFkjw3qL/
-│   │   └── ...
-│   ├── mp3d_ce/
-│   └── mp3d_n1/
-├── vln_pe/
-│   ├── raw_data/
-│   │   ├── train/
-│   │   ├── val_seen/
-│   │   │   └── val_seen.json.gz
-│   │   └── val_unseen/
-│   │       └── val_unseen.json.gz
-├── └── traj_data/
-│   └── mp3d/
-│       └── 17DRP5sb8fy/
-│       └── 1LXtFkjw3qL/
-│       └── ...
-
-├── vln_ce/
-│   ├── raw_data/
-│   └── traj_data/
-└── vln_n1/
-│   └── traj_data/
-├── datasets/
-│   ├── r2r/
-│   ├── ├── train/
-│   ├── ├── val_seen/
-│   ├── ├── val_unseen/
-│   ├── rxr/
-│   ├── ├── train/
-│   ├── ├── val_seen/
-│   ├── ├── val_unseen/
-│   ├── envdrop/
-│   ├── ├── train/
-│   ├── ├── val_seen/
-│   ├── ├── val_unseen/
-├── scene_datasets
-│   ├── mp3d
-│   │   ├──17DRP5sb8fy/
-│   │   ├── 1LXtFkjw3qL/
-│   │   └── ...
-```

source/en/user_guide/internnav/quick_start/train_eval.md

Lines changed: 13 additions & 3 deletions
@@ -28,7 +28,7 @@ Finally, start the client:
 INTERNUTOPIA_ASSETS_PATH=/path/to/InternUTopiaAssets MESA_GL_VERSION_OVERRIDE=4.6 python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_cfg.py
 ```
 
-The evaluation results will be saved in the `eval_results.log` file in the output_dir of the config file. The whole evaluation process takes about 3 hours at RTX4090 platform.
+The evaluation results will be saved in the `eval_results.log` file in the output_dir of the config file. The whole evaluation process takes about 10 hours on an RTX 4090 platform.
 
 
 #### Evaluation on habitat
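For the evaluation command in the hunk above, a typical session might look like the following sketch; `output_dir` stands in for whatever path is set in `h1_internvla_n1_cfg.py`:

```bash
# Sketch: run the evaluation client in the background and follow the results log.
INTERNUTOPIA_ASSETS_PATH=/path/to/InternUTopiaAssets MESA_GL_VERSION_OVERRIDE=4.6 \
  python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_cfg.py &
tail -f output_dir/eval_results.log   # placeholder: use the output_dir from the config
```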
@@ -49,7 +49,7 @@ For multi-gpu inference, currently we only support inference on SLURM.
 
 ### Training
 
-Download the training data from [Hugging Face](https://huggingface.co/datasets/InternRobotics/InternData-N1/), and extract them into the `data/datasets/` directory.
+Download the training data from [Hugging Face](https://huggingface.co/datasets/InternRobotics/InternData-N1/), and organize it in the form described in [installation](./installation.md).
 
 ```bash
 ./scripts/train/start_train.sh --name "$NAME" --model-name navdp
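A concrete invocation of the training script above could look like this (the run name is arbitrary):

```bash
# Sketch: launch NavDP training under an explicit run name.
NAME=navdp_baseline   # placeholder run name
./scripts/train/start_train.sh --name "$NAME" --model-name navdp
```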
@@ -123,10 +123,14 @@ The final folder structure should look like this:
 data/
 ├── scene_data/
 │   ├── mp3d_pe/
-│   │   ├──17DRP5sb8fy/
+│   │   ├── 17DRP5sb8fy/
 │   │   ├── 1LXtFkjw3qL/
 │   │   └── ...
 │   ├── mp3d_ce/
+│   │   ├── mp3d/
+│   │   │   ├── 17DRP5sb8fy/
+│   │   │   ├── 1LXtFkjw3qL/
+│   │   │   └── ...
 │   └── mp3d_n1/
 ├── vln_pe/
 │   ├── raw_data/
@@ -143,6 +147,12 @@ data/
 │   └── videos/
 ├── vln_ce/
 │   ├── raw_data/
+│   │   ├── r2r/
+│   │   │   ├── train/
+│   │   │   ├── val_seen/
+│   │   │   │   └── val_seen.json.gz
+│   │   │   └── val_unseen/
+│   │   │       └── val_unseen.json.gz
 │   └── traj_data/
 └── vln_n1/
     └── traj_data/

source/en/user_guide/internnav/tutorials/dataset.md

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,7 @@ You’ll learn:
 
 - 📁 [How to structure the dataset](#dataset-format)
 - 🔁 [How to convert popular datasets like VLN-CE](#convert-to-lerobotdataset)
-- 🎮 [How to collect your own demonstrations in GRUtopia](#collect-demonstration-dataset-in-grutopia)
+- 🎮 [How to collect your own demonstrations in InternUtopia](#collect-demonstration-dataset-in-internutopia)
 
 
 These steps ensure compatibility with our training and evaluation framework across all supported benchmarks.
@@ -428,6 +428,6 @@ InternNav adopts the [LeRobot](https://github.com/huggingface/lerobot) format fo
 
 
 
-## Collect Demonstration Dataset in GRUtopia
+## Collect Demonstration Dataset in InternUtopia
 
-Support for collecting demos via GRUtopia simulation is coming soon — stay tuned!
+Support for collecting demos via InternUtopia simulation is coming soon — stay tuned!

source/en/user_guide/internnav/tutorials/model.md

Lines changed: 6 additions & 6 deletions
@@ -1,20 +1,20 @@
 # Model
 
-This tutorial introduces the structure and implementation of both System 1 (navdp) and System 2 (rdp) policy models in the internNav framework.
+This tutorial introduces the structure and implementation of both the System 1 (NavDP) and whole-system (InternVLA-N1) policy models in the InternNav framework.
 
 ---
 
-## System 1: Navdp
+## System 1: NavDP
 
-<!-- navdp content start -->
+<!-- NavDP content start -->
 
-This tutorial introduces the structure and implementation of the navdp policy model in the internNav framework, helping you understand and customize each module.
+This tutorial introduces the structure and implementation of the NavDP policy model in the InternNav framework, helping you understand and customize each module.
 
 ---
 
 ### Model Structure Overview
 
-The navdp policy model in internNav mainly consists of the following parts:
+The NavDP policy model in InternNav mainly consists of the following parts:
 
 - **RGBD Encoder (NavDP_RGBD_Backbone)**: Extracts multi-frame RGB+Depth features.
 - **Goal Point/Image Encoder**: Encodes goal point or goal image information.
@@ -116,7 +116,7 @@ To customize the backbone, decoder, or heads, refer to `navdp_policy.py` and `na
 ### Reference
 - [diffusion policy](https://github.com/real-stanford/diffusion_policy)
 
-<!-- navdp content end -->
+<!-- NavDP content end -->
 
 ---
 
source/en/user_guide/internnav/tutorials/training.md

Lines changed: 9 additions & 9 deletions
@@ -1,22 +1,22 @@
 # Training
 
-This tutorial provides a detailed guide for training both System 1 (navdp) and System 2 (rdp) policy models within the InterNav framework.
+This tutorial provides a detailed guide for training both the System 1 (NavDP) and whole-system (InternVLA-N1-S2) policy models within the InternNav framework.
 
 ---
 
-## System 1: Navdp
+## System 1: NavDP
 
-<!-- navdp content start -->
+<!-- NavDP content start -->
 
-This tutorial provides a detailed guide for training the navdp policy model within the InterNav framework. It covers the **training workflow**, **configuration and parameters**, **command-line usage**, and **troubleshooting**.
+This tutorial provides a detailed guide for training the NavDP policy model within the InternNav framework. It covers the **training workflow**, **configuration and parameters**, **command-line usage**, and **troubleshooting**.
 
 ---
 
 ### Overview of the Training Process
 
-The navdp training process in InterNav includes the following steps:
+The NavDP training process in InternNav includes the following steps:
 
-1. **Model Initialization**: Load navdp configuration and initialize model structure and parameters.
+1. **Model Initialization**: Load the NavDP configuration and initialize the model structure and parameters.
 2. **Dataset Loading**: Configure dataset paths and preprocessing, build the DataLoader.
 3. **Training Parameter Setup**: Set batch size, learning rate, optimizer, and other hyperparameters.
 4. **Distributed Training Environment Initialization**: Multi-GPU training is supported out of the box.
@@ -32,7 +32,7 @@ Ensure you have installed InterNav and its dependencies, and have access to a mu
 
 #### 2. Configuration Check
 
-The navdp training configuration file is located at:
+The NavDP training configuration file is located at:
 
 ```bash
 InternNav/scripts/train/configs/navdp.py
@@ -72,7 +72,7 @@ torchrun \
 
 ### Training Parameters and Configuration
 
-The main training parameters for navdp are set in `scripts/train/configs/navdp.py`. Common parameters include:
+The main training parameters for NavDP are set in `scripts/train/configs/navdp.py`. Common parameters include:
 
 | Parameter | Description | Example |
 |-------------------|---------------------------|---------|
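The hunk context shows the trainer is launched with `torchrun`; a single-node multi-GPU invocation might look like the sketch below. The entry-point path and flag values are assumptions inferred from the config path above, not taken from the repository:

```bash
# Sketch: 8-GPU NavDP training launch (entry-point path is an assumption).
torchrun \
  --nproc_per_node=8 \
  --master_port=29500 \
  scripts/train/train.py \
  --config scripts/train/configs/navdp.py
```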
@@ -110,7 +110,7 @@ For more parameters, see the comments in the configuration file.
 
 For customizing the model structure or dataset format, see [model.md](./model.md) and [dataset.md](./dataset.md).
 
-<!-- navdp content end -->
+<!-- NavDP content end -->
 
 ---
 