
Commit a91edcb

yangyuqiang authored and HanqingWangAI committed
[docs] update data and checkpoint paths
1 parent d1e393e commit a91edcb

File tree

4 files changed (+177, -66 lines)


source/en/user_guide/internnav/quick_start/installation.md

Lines changed: 130 additions & 53 deletions
@@ -14,17 +14,9 @@
1414
# Installation Guide
1515

1616
😄 Don’t worry — both [Quick Installation](#quick-installation) and [Dataset Preparation](#dataset-preparation) are beginner-friendly.
17-
18-
19-
<!-- > 💡NOTE \
20-
> 🙋 **[First-time users:](#-lightweight-installation-recommended-for-beginners)** Skip GenManip for now — it requires installing NVIDIA [⚙️ Isaac Sim](#), which can be complex.
21-
Start with **CALVIN** or **SimplerEnv** for easy setup and full training/eval support.\
22-
> 🧠 **[Advanced users:](#-full-installation-advanced-users)** Feel free to use all benchmarks, including **GenManip** with Isaac Sim support. -->
23-
24-
<!-- > For 🙋**first-time** users, we recommend skipping the GenManip benchmark, as it requires installing NVIDIA [⚙️ Isaac Sim](#) for simulation (which can be complex).
25-
Instead, start with **CALVIN** or **SimplerEnv** — both are easy to set up and fully support training and evaluation. -->
26-
27-
<!-- This guide provides comprehensive instructions for installing and setting up the InternManip robot manipulation learning suite. Please read through the following prerequisites carefully before proceeding with the installation. -->
17+
```
18+
A detailed technical report will be released in about two weeks.
19+
```
2820

2921
## Prerequisites
3022

@@ -176,6 +168,10 @@ We provide a flexible installation tool for users who want to use InternNav for
176168

177169

178170
## Quick Installation
171+
Clone the InternNav repository:
172+
```bash
173+
git clone https://github.com/InternRobotics/InternNav.git --recursive
174+
```
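If you cloned without `--recursive`, the submodules can still be fetched afterwards:

```bash
# Pull any submodules that were skipped during the initial clone
git submodule update --init --recursive
```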
179175

180176
Our toolchain provides two Python environment solutions to accommodate different usage scenarios with the InternNav-N1 series model:
181177

@@ -187,43 +183,39 @@ Choose the environment that best fits your specific needs to optimize your exper
187183
### Isaac Sim Environment
188184
#### Prerequisite
189185
- Ubuntu 20.04, 22.04
190-
- Conda
191186
- Python 3.10.16 (3.10.* should be ok)
192187
- NVIDIA Omniverse Isaac Sim 4.5.0
193188
- NVIDIA GPU (RTX 2070 or higher)
194189
- NVIDIA GPU Driver (recommended version 535.216.01+)
195190
- PyTorch 2.5.1, 2.6.0 (recommended)
196191
- CUDA 11.8, 12.4 (recommended)
197-
- Docker (Optional)
198-
- NVIDIA Container Toolkit (Optional)
199192

200193
Before proceeding with the installation, ensure that you have [Isaac Sim 4.5.0](https://docs.isaacsim.omniverse.nvidia.com/4.5.0/installation/install_workstation.html) and [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) installed.
201194

202-
To help you get started quickly, we've prepared a Docker image pre-configured with Isaac Sim 4.5 and InternUtopia. You can pull the image and run evaluations in the container using the following command:
195+
<!-- To help you get started quickly, we've prepared a Docker image pre-configured with Isaac Sim 4.5 and InternUtopia. You can pull the image and run evaluations in the container using the following command:
203196
```bash
204197
docker pull registry.cn-hangzhou.aliyuncs.com/internutopia/internutopia:2.2.0
205198
docker run -it --name internutopia-container registry.cn-hangzhou.aliyuncs.com/internutopia/internutopia:2.2.0
206-
```
199+
``` -->
207200
#### Conda installation
208201
```bash
209-
$ conda create -n <env> python=3.10 libxcb=1.14
202+
conda create -n <env> python=3.10 libxcb=1.14
210203

211204
# Install InternUtopia through pip (2.1.1 and 2.2.0 recommended).
212-
$ conda activate <env>
213-
$ pip install internutopia
205+
conda activate <env>
206+
pip install internutopia
214207

215208
# Configure the conda environment.
216-
$ python -m internutopia.setup_conda_pypi
217-
$ conda deactivate && conda activate <env>
209+
python -m internutopia.setup_conda_pypi
210+
conda deactivate && conda activate <env>
218211
```
219212
For InternUtopia installation, you can find more detailed [docs](https://internrobotics.github.io/user_guide/internutopia/get_started/installation.html) in [InternUtopia](https://github.com/InternRobotics/InternUtopia?tab=readme-ov-file).
220213
```bash
221214
# Install PyTorch based on your CUDA version
222-
$ pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
215+
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
223216

224217
# Install other deps
225-
$ pip install -r isaac_requirements.txt
226-
218+
pip install -r requirements/isaac_requirements.txt
227219
```
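As a quick sanity check of the Isaac Sim environment (a minimal sketch, assuming the conda environment created above is active):

```bash
# Confirm that InternUtopia imports and that PyTorch sees the GPU
python -c "import torch, internutopia; print(torch.__version__, torch.cuda.is_available())"
```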
228220

229221

@@ -237,46 +229,81 @@ If you need to train or evaluate models on [Habitat](#optional-habitat-environme
237229
- CUDA 12.4
238230
- GPU: NVIDIA A100 or higher (optional for VLA training)
239231

232+
```bash
233+
conda create -n <env> python=3.9
234+
conda activate <env>
235+
```
236+
Install habitat-sim and habitat-lab:
240237
```bash
241238
conda install habitat-sim==0.2.4 withbullet headless -c conda-forge -c aihabitat
242239
git clone --branch v0.2.4 https://github.com/facebookresearch/habitat-lab.git
243240
cd habitat-lab
244241
pip install -e habitat-lab # install habitat_lab
245242
pip install -e habitat-baselines # install habitat_baselines
246-
pip install -r habitat_requirements.txt
243+
```
244+
Install pytorch and other requirements:
245+
```bash
246+
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
247+
pip install -r requirements/habitat_requirements.txt
247248
```
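To verify the Habitat stack (a minimal sketch, assuming the habitat conda environment is active):

```bash
# Confirm habitat-sim and habitat-lab import and check the PyTorch CUDA build
python -c "import habitat_sim, habitat, torch; print(torch.__version__, torch.cuda.is_available())"
```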
248249

249250

250251
## Verification
251252

252-
Please download our latest pretrained [checkpoint](https://huggingface.co/InternRobotics/InternVLA-N1) of InternVLA-N1 and run the following script to inference with visualization results. Move the checkpoint to the `checkpoints` directory. Download the VLN-CE dataset from [huggingface](). The final folder structure should look like this:
253+
### Data/Checkpoints Preparation
254+
To get started, we need to prepare the data and checkpoints.
255+
1. **InternVLA-N1 pretrained Checkpoints**
256+
Please download our latest pretrained [checkpoint](https://huggingface.co/InternRobotics/InternVLA-N1) of InternVLA-N1 and move it to the `checkpoints` directory; the scripts below run inference with it and produce visualization results (a download sketch is given after the folder layout below).
257+
2. **DepthAnything v2 Checkpoints**
258+
Please download the DepthAnything v2 pretrained [checkpoint](https://huggingface.co/Ashoka74/Placement/resolve/main/depth_anything_v2_vits.pth) and move it to the `checkpoints` directory.
259+
3. **Matterport3D Scenes**
260+
Download the MP3D scenes from the [official project page](https://niessner.github.io/Matterport/) and place them under `data/scene_datasets/mp3d`.
261+
4. **VLN-CE Episodes**
262+
- [r2r](https://drive.google.com/file/d/18DCrNcpxESnps1IbXVjXSbGLDzcSOqzD/view) (rename R2R_VLNCE_v1/ -> r2r/)
263+
- [rxr](https://drive.google.com/file/d/145xzLjxBaNTbVgBfQ8e9EsBAV8W-SM0t/view) (rename RxR_VLNCE_v0/ -> rxr/)
264+
- [envdrop](https://drive.google.com/file/d/1fo8F4NKgZDH-bPSdVU3cONAkt5EW-tyr/view) (rename R2R_VLNCE_v1-3_preprocessed/envdrop/ -> envdrop/)
265+
The final folder structure should look like this:
253266

254267
```bash
255268
InternNav/
256-
|-- data/
257-
| |-- datasets
258-
|-- vln
259-
|-- vln_datasets
260-
|-- scene_datasets
261-
|-- hm3d
262-
|-- mp3d
263-
264-
|-- src/
265-
| |-- ...
266-
267-
|-- checkpoints/
268-
| |-- InternVLA-N1/
269-
| | |-- model-00001-of-00004.safetensors
270-
| | |-- config.json
271-
| | |-- ...
272-
| |-- InternVLA-N1-S2
273-
| | |-- model-00001-of-00004.safetensors
274-
| | |-- config.json
275-
| | |-- ...
269+
├── data/
270+
│ ├── datasets/
271+
│ │ ├── r2r/
272+
│ │ │ ├── train/
273+
│ │ │ ├── val_seen/
274+
│ │ │ ├── val_unseen/
275+
│ │ ├── rxr/
276+
│ │ │ ├── train/
277+
│ │ │ ├── val_seen/
278+
│ │ │ ├── val_unseen/
279+
│ │ ├── envdrop/
280+
│ │ │ ├── train/
281+
│ │ │ ├── val_seen/
282+
│ │ │ ├── val_unseen/
283+
│ ├── scene_datasets/
284+
│ │ ├── mp3d
285+
│ │ │ ├── 17DRP5sb8fy/
286+
│ │ │ ├── 1LXtFkjw3qL/
287+
│ │ │ └── ...
288+
├── src/
289+
│ ├── ...
290+
291+
├── checkpoints/
292+
│ ├── InternVLA-N1/
293+
│ │ ├── model-00001-of-00004.safetensors
294+
│ │ ├── config.json
295+
│ │ ├── ...
296+
│ ├── InternVLA-N1-S2
297+
│ │ ├── model-00001-of-00004.safetensors
298+
│ │ ├── config.json
299+
│ │ ├── ...
300+
│ ├── depth_anything_v2_vits.pth
276301
```
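One way to fetch the checkpoints listed above is the Hugging Face CLI (a sketch, assuming `huggingface-cli` is installed; the InternVLA-N1-S2 weights follow the same pattern once you know their repository id):

```bash
# InternVLA-N1 weights into checkpoints/
huggingface-cli download InternRobotics/InternVLA-N1 --local-dir checkpoints/InternVLA-N1

# DepthAnything v2 (small) checkpoint into checkpoints/
wget -P checkpoints/ https://huggingface.co/Ashoka74/Placement/resolve/main/depth_anything_v2_vits.pth
```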
302+
### Gradio demo
277303

278-
Replace the 'model_path' variable in 'vln_ray_backend.py' with the path of InternVLA-N1 checkpoint.
304+
Currently the Gradio demo is only available in the **habitat** environment. Replace the `model_path` variable in `vln_ray_backend.py` with the path of the InternVLA-N1 checkpoint.
279305
```bash
306+
conda activate <habitat-env>
280307
srun -p {partition_name} --cpus-per-task 16 --gres gpu:1 python3 scripts/eval/vln_ray_backend.py
281308
```
282309
Find the IP address of the node allocated by Slurm, then change the `BACKEND_URL` in the Gradio client (`navigation_ui.py`) to the server's IP address and start the Gradio client.
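For example (a sketch; the exact port and arguments depend on how `vln_ray_backend.py` and `navigation_ui.py` are configured in your setup):

```bash
# Find the node Slurm allocated to the backend job
squeue -u $USER

# After editing BACKEND_URL in navigation_ui.py to point at that node, launch the client
python navigation_ui.py
```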
@@ -294,8 +321,11 @@ Click the 'Start Navigation Simulation' button to send a VLN request to the back
294321

295322

296323

297-
## Dataset Preparation
298-
We also prepare high-quality data for trainning system1/system2. To set up the trainning dataset, please follow the steps below:
324+
## InternData-N1 Dataset Preparation
325+
```
326+
Due to network throttling on HuggingFace, InternData-N1 has not been fully uploaded yet. Please check back in a few days.
327+
```
328+
We also provide high-quality data for **training** System 1/System 2 and for **evaluation** in the Isaac Sim environment. To set up the dataset, please follow the steps below (a download sketch follows the folder layout):
299329

300330
1. Download Datasets
301331
- Download the [InternData-N1](https://huggingface.co/datasets/InternRobotics/InternData-N1) for:
@@ -327,13 +357,60 @@ data/
327357
│ │ └── val_unseen.json.gz
328358
├── └── traj_data/
329359
│ └── mp3d/
330-
│ └── trajectory_0/
331-
│ ├── data/
332-
│ ├── meta/
333-
│ └── videos/
360+
│ └── 17DRP5sb8fy/
361+
│ └── 1LXtFkjw3qL/
362+
│ └── ...
334363
├── vln_ce/
335364
│ ├── raw_data/
336365
│ └── traj_data/
337366
└── vln_n1/
338367
└── traj_data/
339368
```
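Once InternData-N1 is fully uploaded, one way to pull it is with the Hugging Face CLI (a sketch, assuming `huggingface-cli` is installed; you may need to rearrange the downloaded folders to match the layout above):

```bash
# Download the InternData-N1 dataset repository into data/
huggingface-cli download InternRobotics/InternData-N1 --repo-type dataset --local-dir data/
```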
369+
370+
If you want to evaluate in the Habitat environment and have finished the data preparation described [above](#datacheckpoints-preparation), the final data structure should look like this:
371+
```bash
372+
data/
373+
├── scene_data/
374+
│ ├── mp3d_pe/
375+
│ │ ├── 17DRP5sb8fy/
376+
│ │ ├── 1LXtFkjw3qL/
377+
│ │ └── ...
378+
│ ├── mp3d_ce/
379+
│ └── mp3d_n1/
380+
├── vln_pe/
381+
│ ├── raw_data/
382+
│ │ ├── train/
383+
│ │ ├── val_seen/
384+
│ │ │ └── val_seen.json.gz
385+
│ │ └── val_unseen/
386+
│ │ └── val_unseen.json.gz
387+
├── └── traj_data/
388+
│ └── mp3d/
389+
│ └── 17DRP5sb8fy/
390+
│ └── 1LXtFkjw3qL/
391+
│ └── ...
392+
393+
├── vln_ce/
394+
│ ├── raw_data/
395+
│ └── traj_data/
396+
├── vln_n1/
397+
│ └── traj_data/
398+
├── datasets/
│ ├── r2r/
│ │ ├── train/
│ │ ├── val_seen/
│ │ └── val_unseen/
│ ├── rxr/
│ │ ├── train/
│ │ ├── val_seen/
│ │ └── val_unseen/
│ ├── envdrop/
│ │ ├── train/
│ │ ├── val_seen/
│ │ └── val_unseen/
├── scene_datasets/
│ ├── mp3d/
│ │ ├── 17DRP5sb8fy/
│ │ ├── 1LXtFkjw3qL/
│ │ └── ...
```
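If the raw MP3D scans already live elsewhere on disk, a symlink avoids keeping a second copy under `data/scene_datasets/` (a sketch; point the source at your actual MP3D download):

```bash
# Reuse an existing MP3D download instead of duplicating the scans
mkdir -p data/scene_datasets
ln -s /path/to/your/mp3d data/scene_datasets/mp3d
```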

source/en/user_guide/internnav/quick_start/train_eval.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ This document presents how to train and evaluate models for different systems wi
66
## Whole-system
77

88
### Evaluation
9-
Before evaluation, we should download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments). Model weights of InternVLA-N1 can be downloaded from [InternVLA-N1](https://huggingface.co/InternRobotics/InternVLA-N1).
9+
Before evaluation, download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments) and move them to the `data/` directory. Model weights of InternVLA-N1 can be downloaded from [InternVLA-N1](https://huggingface.co/InternRobotics/InternVLA-N1).
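For instance (a sketch, assuming the Hugging Face CLI is installed; the exact sub-folder under `data/` should match however the Embodiments assets are organized):

```bash
# Robot/embodiment assets used for whole-system evaluation
huggingface-cli download InternRobotics/Embodiments --repo-type dataset --local-dir data/Embodiments
```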
1010

1111
#### Evaluation on isaac sim
1212
The main architecture of the whole-system evaluation adopts a client-server model. In the client, we specify the corresponding configuration (*.cfg), which includes settings such as the scenarios to be evaluated, robots, models, and parallelization parameters. The client sends requests to the server, which then submits tasks to the Ray distributed framework based on the corresponding cfg file, enabling the entire evaluation process to run.
@@ -150,7 +150,7 @@ data/
150150

151151
### Training
152152

153-
Currently, we only support training of small VLN models (CMA, RDP, Seq2Seq) in this repo. For the trainning of LLM-based VLN (Navid, StreamVLN, etc), please refer to [StreamVLN](https://github.com/OpenRobotLab/StreamVLN) for training details.
153+
Currently, we only support training of small VLN models (CMA, RDP, Seq2Seq) in this repo. For the training of LLM-based VLN (Navid, StreamVLN, etc), please refer to [StreamVLN](https://github.com/OpenRobotLab/StreamVLN) for training details.
154154

155155
```bash
156156
# train cma model

source/en/user_guide/internnav/tutorials/model.md

Lines changed: 44 additions & 10 deletions
@@ -1,20 +1,20 @@
11
# Model
22

3-
This tutorial introduces the structure and implementation of both System 1 (navdp) and System 2 (rdp) policy models in the InterNav framework.
3+
This tutorial introduces the structure and implementation of both System 1 (navdp) and System 2 (rdp) policy models in the InternNav framework.
44

55
---
66

77
## System 1: Navdp
88

99
<!-- navdp content start -->
1010

11-
This tutorial introduces the structure and implementation of the navdp policy model in the InterNav framework, helping you understand and customize each module.
11+
This tutorial introduces the structure and implementation of the navdp policy model in the InternNav framework, helping you understand and customize each module.
1212

1313
---
1414

1515
### Model Structure Overview
1616

17-
The navdp policy model in InterNav mainly consists of the following parts:
17+
The navdp policy model in InternNav mainly consists of the following parts:
1818

1919
- **RGBD Encoder (NavDP_RGBD_Backbone)**: Extracts multi-frame RGB+Depth features.
2020
- **Goal Point/Image Encoder**: Encodes goal point or goal image information.
@@ -98,8 +98,6 @@ def forward(self, goal_point, goal_image, input_images, input_depths, output_act
9898
return action_pred, value_pred
9999
```
100100

101-
---
102-
103101
### Key Code Snippets
104102

105103
#### Load Model
@@ -116,14 +114,50 @@ To customize the backbone, decoder, or heads, refer to `navdp_policy.py` and `na
116114
---
117115

118116
### Reference
119-
- [navdp_policy.py](../../internnav/model/basemodel/navdp/navdp_policy.py)
120-
- [navdp_backbone.py](../../internnav/model/encoder/navdp_backbone.py)
121-
- [navdp.py config](../../scripts/train/configs/navdp.py)
117+
- [diffusion policy](https://github.com/real-stanford/diffusion_policy)
122118

123119
<!-- navdp content end -->
124120

125121
---
126122

127-
## System 2: InternVLA-N1-S2
123+
## Dual System: InternVLA-N1
124+
This tutorial provides a detailed guide to the InternVLA-N1 policy model within the InternNav framework.
125+
126+
1. Qwen2.5-VL Backbone
127+
The System 2 model is built on Qwen2.5-VL, a state-of-the-art vision-language model:
128+
129+
```python
130+
class InternVLAN1ForCausalLM(Qwen2_5_VLForConditionalGeneration, InternVLAN1MetaForCausalLM):
    config_class = InternVLAN1ModelConfig

    def __init__(self, config):
        Qwen2_5_VLForConditionalGeneration.__init__(self, config)
        config.model_type = "internvla_n1"

        self.model = InternVLAN1Model(config)
        self.rope_deltas = None
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
        self.post_init()
141+
```
142+
Qwen2.5-VL supports multi-turn conversations, image understanding, and text generation. We finetune the Qwen2.5-VL model on our self-collected navigation dataset.
143+
144+
2. Latent Queries
145+
Our model learns a set of latent queries that query the latent representations of Qwen2.5-VL; these are used to model the trajectory context.
146+
```python
147+
self.latent_queries = nn.Parameter(torch.randn(1, config.n_query, config.hidden_size))
148+
```
149+
150+
3. NavDP Integration
151+
Embeds the System 1 (NavDP) policy for low-level trajectory generation:
152+
153+
```python
154+
def build_navdp(navdp_cfg):
    navdp = NavDP_Policy_DPT_CriticSum_DAT(navdp_pretrained=navdp_cfg.navdp_pretrained)
    navdp.load_model()
    return navdp
158+
```
159+
NavDP converts high-level waypoints from the language model to continuous action sequences.
160+
161+
162+
### Reference
163+
[Qwen2.5-VL Documentation](https://lmdeploy.readthedocs.io/en/latest/multi_modal/qwen2_5_vl.html)

source/en/user_guide/internnav/tutorials/training.md

Lines changed: 1 addition & 1 deletion
@@ -116,4 +116,4 @@ For customizing the model structure or dataset format, see [model.md](./model.md
116116

117117
## System 2: InternVLA-N1-S2
118118

119-
*TODO
119+
Currently we don't support the training of InternVLA-N1-S2 in this repository.
