Commit 11166cb

Committed by HanqingWangAI, yangyuqiang, wangyukai, Elaine6107, and 刘雨

[docs] update internnav

Co-authored-by: yangyuqiang <yangyuqiang@pjlab.org.cn>
Co-authored-by: hanqing <409082492@qq.com>
Co-authored-by: wangyukai <wangyukai@pjlab.org.cn>
Co-authored-by: longyilin <yilinlong137@163.com>
Co-authored-by: zengyiming <zengyiming>
Co-authored-by: 刘雨 <liuyu2@pjlab.org.cn>
1 parent 5024d50 commit 11166cb

File tree

7 files changed: +66 −96 lines
Two binary image files changed (199 KB and 341 KB); diffs for binary files are not shown.

source/en/user_guide/internnav/quick_start/installation.md

Lines changed: 35 additions & 75 deletions
@@ -256,35 +256,32 @@ To get started, we need to prepare the data and checkpoints.
    Please download our latest pretrained [checkpoint](https://huggingface.co/InternRobotics/InternVLA-N1) of InternVLA-N1 and run the following script to inference with visualization results. Move the checkpoint to the `checkpoints` directory.
 2. **DepthAnything v2 Checkpoints**
    Please download the depthanything v2 pretrained [checkpoint](https://huggingface.co/Ashoka74/Placement/resolve/main/depth_anything_v2_vits.pth). Move the checkpoint to the `checkpoints` directory.
-3. **Matterport3D Scenes**
-   Download the MP3D scenes from [official project pages](https://niessner.github.io/Matterport/) and place them under `data/scene_datasets/mp3d`.
-4. **VLN-CE Episodes**
-   - [r2r](https://drive.google.com/file/d/18DCrNcpxESnps1IbXVjXSbGLDzcSOqzD/view) (rename R2R_VLNCE_v1/ -> r2r/)
-   - [rxr](https://drive.google.com/file/d/145xzLjxBaNTbVgBfQ8e9EsBAV8W-SM0t/view) (rename RxR_VLNCE_v0/ -> rxr/)
-   - [envdrop](https://drive.google.com/file/d/1fo8F4NKgZDH-bPSdVU3cONAkt5EW-tyr/view) (rename R2R_VLNCE_v1-3_preprocessed/envdrop/ -> envdrop/)
+3. **InternData-N1 VLN-CE Episodes**
+   Download the [InternData-N1](https://huggingface.co/datasets/InternRobotics/InternData-N1) episodes for `vln-ce`. Extract them into the `data/vln_ce/` directory.
+4. **Scene-N1**
+   Download the [SceneData-N1](https://huggingface.co/datasets/InternRobotics/Scene-N1) scenes for `mp3d_ce`. Extract them into the `data/scene_data/` directory.
+
 The final folder structure should look like this:
 
 ```bash
 InternNav/
 ├── data/
-│   ├── datasets/
-│   │   ├── r2r/
-│   │   │   ├── train/
-│   │   │   ├── val_seen/
-│   │   │   ├── val_unseen/
-│   │   ├── rxr/
-│   │   │   ├── train/
-│   │   │   ├── val_seen/
-│   │   │   ├── val_unseen/
-│   │   ├── envdrop/
-│   │   │   ├── train/
-│   │   │   ├── val_seen/
-│   │   │   ├── val_unseen/
-│   ├── scene_datasets/
-│   │   ├── mp3d
-│   │   │   ├──17DRP5sb8fy/
-│   │   │   ├── 1LXtFkjw3qL/
-│   │   │   └── ...
+│   ├── vln_ce/
+│   │   ├── raw_data/
+│   │   │   ├── r2r/
+│   │   │   │   ├── train/
+│   │   │   │   ├── val_seen/
+│   │   │   │   │   └── val_seen.json.gz
+│   │   │   │   └── val_unseen/
+│   │   │   │       └── val_unseen.json.gz
+│   │   └── traj_data/
+│   ├── scene_data/
+│   │   ├── mp3d_ce/
+│   │   │   ├── mp3d/
+│   │   │   │   ├── 17DRP5sb8fy/
+│   │   │   │   ├── 1LXtFkjw3qL/
+│   │   │   │   └── ...
+
 ├── src/
 │   ├── ...
 
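The preparation steps in the hunk above can be scripted. Below is a minimal sketch, assuming the standard `huggingface-cli` download flow; the local directory layout matches the tree shown above, but any archive or subfolder names inside the dataset repos are assumptions:

```bash
# Sketch only: fetch checkpoints and datasets into the layout shown above.
pip install -U "huggingface_hub[cli]"
mkdir -p checkpoints data/vln_ce data/scene_data

# InternVLA-N1 checkpoint and DepthAnything v2 checkpoint
huggingface-cli download InternRobotics/InternVLA-N1 --local-dir checkpoints/InternVLA-N1
wget -P checkpoints https://huggingface.co/Ashoka74/Placement/resolve/main/depth_anything_v2_vits.pth

# VLN-CE episodes (InternData-N1) and scenes (Scene-N1)
huggingface-cli download InternRobotics/InternData-N1 --repo-type dataset --local-dir data/vln_ce
huggingface-cli download InternRobotics/Scene-N1 --repo-type dataset --local-dir data/scene_data
```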
@@ -308,12 +305,12 @@ srun -p {partition_name} --cpus-per-task 16 --gres gpu:1 python3 scripts/eval/vl
 ```
 Find the IP address of the node allocated by Slurm. Then change the BACKEND_URL in the gradio client (navigation_ui.py) to the server's IP address. Start the gradio.
 ```bash
-python navigation_ui.py
+python scripts/eval/navigation_ui.py
 ```
-Note that it's better to run the Gradio client on a machine with a graphical user interface (GUI) but ensure there is proper network connectivity between the client and the server. Then open a browser and enter the Gradio address (such as http://0.0.0.0:5700). We can see the interface as shown below.
+Note that it's better to run the Gradio client on a machine with a graphical user interface (GUI), but ensure there is proper network connectivity between the client and the server. Download the Gradio scene assets from [huggingface](https://huggingface.co/datasets/InternRobotics/Scene-N1) and extract them into the `scene_assets` directory of the client. Then open a browser and enter the Gradio address (such as http://0.0.0.0:5700). We can see the interface as shown below.
 ![img.png](../../../_static/image/gradio_interface.jpg)
 
-Click the 'Start Navigation Simulation' button to send a VLN request to the backend. The backend will submit a task to ray server and simulate the VLN task with InternVLA-N1 models. Wait about 3 minutes, the VLN task will be finished and return a result video. We can see the result video in the gradio like this.
+Click the 'Start Navigation Simulation' button to send a VLN request to the backend. The backend will submit a task to the Ray server and simulate the VLN task with InternVLA-N1 models. Wait about 1 minute, and the VLN task will finish and return a result video. We can see the result video in Gradio like this.
 ![img.png](../../../_static/image/gradio_result.jpg)
 
 
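As a usage sketch of the client-side step in the hunk above: the snippet below assumes BACKEND_URL is a plain string assignment in navigation_ui.py, and the server IP and port are placeholders.

```bash
# Sketch: point the Gradio client at the backend node, then launch it.
SERVER_IP=10.0.0.42   # placeholder: IP of the node Slurm allocated to the server
# Rewrite the BACKEND_URL constant (8001 is a placeholder port).
sed -i "s|^BACKEND_URL = .*|BACKEND_URL = \"http://${SERVER_IP}:8001\"|" scripts/eval/navigation_ui.py
python scripts/eval/navigation_ui.py
```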
@@ -343,10 +340,14 @@ After downloading, organize the datasets into the following structure:
 data/
 ├── scene_data/
 │   ├── mp3d_pe/
-│   │   ├──17DRP5sb8fy/
+│   │   ├── 17DRP5sb8fy/
 │   │   ├── 1LXtFkjw3qL/
 │   │   └── ...
 │   ├── mp3d_ce/
+│   │   ├── mp3d/
+│   │   │   ├── 17DRP5sb8fy/
+│   │   │   ├── 1LXtFkjw3qL/
+│   │   │   └── ...
 │   └── mp3d_n1/
 ├── vln_pe/
 │   ├── raw_data/
@@ -362,55 +363,14 @@ data/
 │   └── ...
 ├── vln_ce/
 │   ├── raw_data/
+│   │   ├── r2r/
+│   │   │   ├── train/
+│   │   │   ├── val_seen/
+│   │   │   │   └── val_seen.json.gz
+│   │   │   └── val_unseen/
+│   │   │       └── val_unseen.json.gz
 │   └── traj_data/
 └── vln_n1/
     └── traj_data/
 ```
 
-If you want to evaluate on habitat environment and finish the data preparation mentioned [above](#DataCheckpoints-Preparation), the final data structure should look like this:
-```bash
-data/
-├── scene_data/
-│   ├── mp3d_pe/
-│   │   ├──17DRP5sb8fy/
-│   │   ├── 1LXtFkjw3qL/
-│   │   └── ...
-│   ├── mp3d_ce/
-│   └── mp3d_n1/
-├── vln_pe/
-│   ├── raw_data/
-│   │   ├── train/
-│   │   ├── val_seen/
-│   │   │   └── val_seen.json.gz
-│   │   └── val_unseen/
-│   │       └── val_unseen.json.gz
-├── └── traj_data/
-│   └── mp3d/
-│       └── 17DRP5sb8fy/
-│       └── 1LXtFkjw3qL/
-│       └── ...
-
-├── vln_ce/
-│   ├── raw_data/
-│   └── traj_data/
-└── vln_n1/
-│   └── traj_data/
-├── datasets/
-│   ├── r2r/
-│   ├── ├── train/
-│   ├── ├── val_seen/
-│   ├── ├── val_unseen/
-│   ├── rxr/
-│   ├── ├── train/
-│   ├── ├── val_seen/
-│   ├── ├── val_unseen/
-│   ├── envdrop/
-│   ├── ├── train/
-│   ├── ├── val_seen/
-│   ├── ├── val_unseen/
-├── scene_datasets
-│   ├── mp3d
-│   │   ├──17DRP5sb8fy/
-│   │   ├── 1LXtFkjw3qL/
-│   │   └── ...
-```

source/en/user_guide/internnav/quick_start/train_eval.md

Lines changed: 13 additions & 3 deletions
@@ -28,7 +28,7 @@ Finally, start the client:
 INTERNUTOPIA_ASSETS_PATH=/path/to/InternUTopiaAssets MESA_GL_VERSION_OVERRIDE=4.6 python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_cfg.py
 ```
 
-The evaluation results will be saved in the `eval_results.log` file in the output_dir of the config file. The whole evaluation process takes about 3 hours at RTX4090 platform.
+The evaluation results will be saved in the `eval_results.log` file in the output_dir of the config file. The whole evaluation process takes about 10 hours on an RTX 4090 platform.
 
 
 #### Evaluation on habitat
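For the evaluation command in the hunk above, a typical session might look like the following sketch; `output_dir` stands in for whatever path is set in `h1_internvla_n1_cfg.py`:

```bash
# Sketch: run the evaluation client in the background and follow the results log.
INTERNUTOPIA_ASSETS_PATH=/path/to/InternUTopiaAssets MESA_GL_VERSION_OVERRIDE=4.6 \
  python scripts/eval/eval.py --config scripts/eval/configs/h1_internvla_n1_cfg.py &
tail -f output_dir/eval_results.log   # placeholder: use the output_dir from the config
```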
@@ -49,7 +49,7 @@ For multi-gpu inference, currently we only support inference on SLURM.
 
 ### Training
 
-Download the training data from [Hugging Face](https://huggingface.co/datasets/InternRobotics/InternData-N1/), and extract them into the `data/datasets/` directory.
+Download the training data from [Hugging Face](https://huggingface.co/datasets/InternRobotics/InternData-N1/), and organize it in the form described in [installation](./installation.md).
 
 ```bash
 ./scripts/train/start_train.sh --name "$NAME" --model-name navdp
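A concrete invocation of the training script above could look like this (the run name is arbitrary):

```bash
# Sketch: launch NavDP training under an explicit run name.
NAME=navdp_baseline   # placeholder run name
./scripts/train/start_train.sh --name "$NAME" --model-name navdp
```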
@@ -123,10 +123,14 @@ The final folder structure should look like this:
 data/
 ├── scene_data/
 │   ├── mp3d_pe/
-│   │   ├──17DRP5sb8fy/
+│   │   ├── 17DRP5sb8fy/
 │   │   ├── 1LXtFkjw3qL/
 │   │   └── ...
 │   ├── mp3d_ce/
+│   │   ├── mp3d/
+│   │   │   ├── 17DRP5sb8fy/
+│   │   │   ├── 1LXtFkjw3qL/
+│   │   │   └── ...
 │   └── mp3d_n1/
 ├── vln_pe/
 │   ├── raw_data/
@@ -143,6 +147,12 @@ data/
 │   └── videos/
 ├── vln_ce/
 │   ├── raw_data/
+│   │   ├── r2r/
+│   │   │   ├── train/
+│   │   │   ├── val_seen/
+│   │   │   │   └── val_seen.json.gz
+│   │   │   └── val_unseen/
+│   │   │       └── val_unseen.json.gz
 │   └── traj_data/
 └── vln_n1/
     └── traj_data/

source/en/user_guide/internnav/tutorials/dataset.md

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,7 @@ You’ll learn:
 
 - 📁 [How to structure the dataset](#dataset-format)
 - 🔁 [How to convert popular datasets like VLN-CE](#convert-to-lerobotdataset)
-- 🎮 [How to collect your own demonstrations in GRUtopia](#collect-demonstration-dataset-in-grutopia)
+- 🎮 [How to collect your own demonstrations in InternUtopia](#collect-demonstration-dataset-in-internutopia)
 
 
 These steps ensure compatibility with our training and evaluation framework across all supported benchmarks.
@@ -428,6 +428,6 @@ InternNav adopts the [LeRobot](https://github.com/huggingface/lerobot) format fo
 
 
 
-## Collect Demonstration Dataset in GRUtopia
+## Collect Demonstration Dataset in InternUtopia
 
-Support for collecting demos via GRUtopia simulation is coming soon — stay tuned!
+Support for collecting demos via InternUtopia simulation is coming soon — stay tuned!

source/en/user_guide/internnav/tutorials/model.md

Lines changed: 6 additions & 6 deletions
@@ -1,20 +1,20 @@
 # Model
 
-This tutorial introduces the structure and implementation of both System 1 (navdp) and System 2 (rdp) policy models in the internNav framework.
+This tutorial introduces the structure and implementation of both the System 1 (NavDP) and whole-system (InternVLA-N1) policy models in the InternNav framework.
 
 ---
 
-## System 1: Navdp
+## System 1: NavDP
 
-<!-- navdp content start -->
+<!-- NavDP content start -->
 
-This tutorial introduces the structure and implementation of the navdp policy model in the internNav framework, helping you understand and customize each module.
+This tutorial introduces the structure and implementation of the NavDP policy model in the InternNav framework, helping you understand and customize each module.
 
 ---
 
 ### Model Structure Overview
 
-The navdp policy model in internNav mainly consists of the following parts:
+The NavDP policy model in InternNav mainly consists of the following parts:
 
 - **RGBD Encoder (NavDP_RGBD_Backbone)**: Extracts multi-frame RGB+Depth features.
 - **Goal Point/Image Encoder**: Encodes goal point or goal image information.
@@ -116,7 +116,7 @@ To customize the backbone, decoder, or heads, refer to `navdp_policy.py` and `na
 ### Reference
 - [diffusion policy](https://github.com/real-stanford/diffusion_policy)
 
-<!-- navdp content end -->
+<!-- NavDP content end -->
 
 ---
 
source/en/user_guide/internnav/tutorials/training.md

Lines changed: 9 additions & 9 deletions
@@ -1,22 +1,22 @@
 # Training
 
-This tutorial provides a detailed guide for training both System 1 (navdp) and System 2 (rdp) policy models within the InterNav framework.
+This tutorial provides a detailed guide for training both the System 1 (NavDP) and whole-system (InternVLA-N1-S2) policy models within the InternNav framework.
 
 ---
 
-## System 1: Navdp
+## System 1: NavDP
 
-<!-- navdp content start -->
+<!-- NavDP content start -->
 
-This tutorial provides a detailed guide for training the navdp policy model within the InterNav framework. It covers the **training workflow**, **configuration and parameters**, **command-line usage**, and **troubleshooting**.
+This tutorial provides a detailed guide for training the NavDP policy model within the InternNav framework. It covers the **training workflow**, **configuration and parameters**, **command-line usage**, and **troubleshooting**.
 
 ---
 
 ### Overview of the Training Process
 
-The navdp training process in InterNav includes the following steps:
+The NavDP training process in InternNav includes the following steps:
 
-1. **Model Initialization**: Load navdp configuration and initialize model structure and parameters.
+1. **Model Initialization**: Load the NavDP configuration and initialize the model structure and parameters.
 2. **Dataset Loading**: Configure dataset paths and preprocessing, build the DataLoader.
 3. **Training Parameter Setup**: Set batch size, learning rate, optimizer, and other hyperparameters.
 4. **Distributed Training Environment Initialization**: Multi-GPU training is supported out of the box.
@@ -32,7 +32,7 @@ Ensure you have installed InterNav and its dependencies, and have access to a mu
 
 #### 2. Configuration Check
 
-The navdp training configuration file is located at:
+The NavDP training configuration file is located at:
 
 ```bash
 InternNav/scripts/train/configs/navdp.py
@@ -72,7 +72,7 @@ torchrun \
 
 ### Training Parameters and Configuration
 
-The main training parameters for navdp are set in `scripts/train/configs/navdp.py`. Common parameters include:
+The main training parameters for NavDP are set in `scripts/train/configs/navdp.py`. Common parameters include:
 
 | Parameter | Description | Example |
 |-------------------|---------------------------|---------|
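The hunk context shows the trainer is launched with `torchrun`; a single-node multi-GPU invocation might look like the sketch below. The entry-point path and flag values are assumptions inferred from the config path above, not taken from the repository:

```bash
# Sketch: 8-GPU NavDP training launch (entry-point path is an assumption).
torchrun \
  --nproc_per_node=8 \
  --master_port=29500 \
  scripts/train/train.py \
  --config scripts/train/configs/navdp.py
```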
@@ -110,7 +110,7 @@ For more parameters, see the comments in the configuration file.
 
 For customizing the model structure or dataset format, see [model.md](./model.md) and [dataset.md](./dataset.md).
 
-<!-- navdp content end -->
+<!-- NavDP content end -->
 
 ---
 