😄 Don’t worry — both [Quick Installation](#quick-installation) and [Dataset Preparation](#dataset-preparation) are beginner-friendly.

```
A detailed technical report will be released in about two weeks.
```
## Prerequisites
Our toolchain provides two Python environment solutions to accommodate different usage scenarios with the InternNav-N1 series model:
Choose the environment that best fits your specific needs to optimize your experience.

### Isaac Sim Environment

#### Prerequisites

- Ubuntu 20.04, 22.04
- Conda
- Python 3.10.16 (3.10.* should be ok)
- NVIDIA Omniverse Isaac Sim 4.5.0
- NVIDIA GPU (RTX 2070 or higher)
- NVIDIA GPU Driver (recommended version 535.216.01+)
- PyTorch 2.5.1, 2.6.0 (recommended)
- CUDA 11.8, 12.4 (recommended)
- Docker (optional)
- NVIDIA Container Toolkit (optional)

Before proceeding with the installation, ensure that you have [Isaac Sim 4.5.0](https://docs.isaacsim.omniverse.nvidia.com/4.5.0/installation/install_workstation.html) and [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) installed.
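
Before creating the environment, it can help to confirm that the driver, CUDA, and Conda versions visible in your shell match the prerequisites above. A minimal check on a standard Linux setup (assuming `nvcc` is on your PATH; skip that line if you rely on conda-provided CUDA):

```bash
# Driver version and visible GPUs (535.216.01+ recommended).
nvidia-smi

# CUDA toolkit version (11.8 or 12.4 recommended).
nvcc --version

# Conda must be available for the installation steps below.
conda --version
```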

<!-- To help you get started quickly, we've prepared a Docker image pre-configured with Isaac Sim 4.5 and InternUtopia. You can pull the image and run evaluations in the container using the following command:

docker run -it --name internutopia-container registry.cn-hangzhou.aliyuncs.com/internutopia/internutopia:2.2.0
-->

#### Conda installation

```bash
conda create -n <env> python=3.10 libxcb=1.14

# Install InternUtopia through pip (2.1.1 and 2.2.0 recommended).
conda activate <env>
pip install internutopia

# Configure the conda environment.
python -m internutopia.setup_conda_pypi
conda deactivate && conda activate <env>
```

For InternUtopia installation, you can find more detailed [docs](https://internrobotics.github.io/user_guide/internutopia/get_started/installation.html) in [InternUtopia](https://github.com/InternRobotics/InternUtopia?tab=readme-ov-file).
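
After the steps above, a quick sanity check can confirm that the environment is usable. This is only a sketch and assumes the `<env>` environment created earlier is active:

```bash
conda activate <env>

# InternUtopia should import cleanly once setup_conda_pypi has been run.
python -c "import internutopia; print('internutopia OK')"

# PyTorch should see a CUDA-capable GPU for Isaac Sim workloads.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```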

### Data/Checkpoints Preparation

To get started, we need to prepare the data and checkpoints (a command-line sketch follows the list).

1. **InternVLA-N1 pretrained checkpoints**

   Please download our latest pretrained [checkpoint](https://huggingface.co/InternRobotics/InternVLA-N1) of InternVLA-N1 and move it to the `checkpoints` directory, then run the following script to perform inference with visualization results.

2. **DepthAnything V2 checkpoints**

   Please download the DepthAnything V2 pretrained [checkpoint](https://huggingface.co/Ashoka74/Placement/resolve/main/depth_anything_v2_vits.pth) and move it to the `checkpoints` directory.

3. **Matterport3D scenes**

   Download the MP3D scenes from the [official project page](https://niessner.github.io/Matterport/) and place them under `data/scene_datasets/mp3d`.
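
For convenience, the checkpoint downloads and the MP3D target directory can be prepared from the command line. The sketch below assumes `huggingface-cli` and `wget` are installed; the subdirectory names under `checkpoints/` are illustrative, since the instructions above only require the files to end up in `checkpoints/`:

```bash
mkdir -p checkpoints data/scene_datasets/mp3d

# 1. InternVLA-N1 pretrained checkpoint (downloads the full model repo).
huggingface-cli download InternRobotics/InternVLA-N1 --local-dir checkpoints/InternVLA-N1

# 2. DepthAnything V2 (ViT-S) checkpoint.
wget -P checkpoints https://huggingface.co/Ashoka74/Placement/resolve/main/depth_anything_v2_vits.pth

# 3. MP3D scenes must be requested via the official project page (see above),
#    then extracted into data/scene_datasets/mp3d/.
```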
Currently the Gradio demo is only available in the **habitat** environment. Replace the `model_path` variable in `vln_ray_backend.py` with the path of the InternVLA-N1 checkpoint.

Find the IP address of the node allocated by Slurm, then set `BACKEND_URL` in the Gradio client (`navigation_ui.py`) to the server's IP address and start the Gradio app (see the sketch below).
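
Concretely, the two edits amount to something like the sketch below; the checkpoint path, IP address, and port are placeholders to replace with your own values:

```python
# vln_ray_backend.py (server side): point the backend at your checkpoint.
model_path = "/path/to/checkpoints/InternVLA-N1"

# navigation_ui.py (Gradio client): point the client at the backend node.
# Running `hostname -I` on the Slurm-allocated node prints its IP address.
BACKEND_URL = "http://<server-ip>:<port>"
```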

Click the 'Start Navigation Simulation' button to send a VLN request to the backend.

## InternData-N1 Dataset Preparation

```
Due to network throttling restrictions on HuggingFace, InternData-N1 has not been fully uploaded yet. Please wait patiently for several days.
```

We also prepare high-quality data for **training** system1/system2 and **evaluation** in the Isaac Sim environment. To set up the dataset, please follow the steps below:

1. Download Datasets
- Download the [InternData-N1](https://huggingface.co/datasets/InternRobotics/InternData-N1) for:

```
data/
│   │   └── val_unseen.json.gz
│   └── traj_data/
│       └── mp3d/
│           ├── 17DRP5sb8fy/
│           ├── 1LXtFkjw3qL/
│           └── ...
├── vln_ce/
│   ├── raw_data/
│   └── traj_data/
└── vln_n1/
    └── traj_data/
```
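
Once the upload is complete, one way to fetch the dataset from the command line is sketched below; it assumes `huggingface-cli` is installed, and the local target directory is an assumption you may need to adjust so that it matches the layout above:

```bash
# Download the InternData-N1 dataset repo into data/.
huggingface-cli download InternRobotics/InternData-N1 \
    --repo-type dataset \
    --local-dir data/
```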
If you want to evaluate in the habitat environment and have finished the data preparation mentioned [above](#DataCheckpoints-Preparation), the final data structure should look like this:

---

**`source/en/user_guide/internnav/quick_start/train_eval.md`**

This document presents how to train and evaluate models for different systems.

## Whole-system
### Evaluation

Before evaluation, download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments) and move them to the `data/` directory. Model weights of InternVLA-N1 can be downloaded from [InternVLA-N1](https://huggingface.co/InternRobotics/InternVLA-N1).

#### Evaluation on Isaac Sim

The main architecture of the whole-system evaluation adopts a client-server model. In the client, we specify the corresponding configuration (*.cfg), which includes settings such as the scenarios to be evaluated, robots, models, and parallelization parameters. The client sends requests to the server, which then submits tasks to the Ray distributed framework based on the corresponding cfg file, enabling the entire evaluation process to run.
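
The field names below are purely illustrative and are not the repo's actual cfg schema; they only show the kinds of settings such a client-side cfg bundles together before the server hands the job to Ray:

```python
# Hypothetical illustration of a client-side evaluation cfg; the real field
# names live in the repo's *.cfg files.
eval_cfg = dict(
    scene_ids=["17DRP5sb8fy", "1LXtFkjw3qL"],  # scenarios to evaluate
    robot="h1",                                # robot/embodiment asset (placeholder name)
    model="InternVLA-N1",                      # policy served for evaluation
    model_path="checkpoints/InternVLA-N1",     # placeholder checkpoint path
    num_workers=4,                             # Ray parallelism
    episodes_per_worker=50,                    # workload split across workers
)
```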
### Training

Currently, we only support training of small VLN models (CMA, RDP, Seq2Seq) in this repo. For the training of LLM-based VLN models (Navid, StreamVLN, etc.), please refer to [StreamVLN](https://github.com/OpenRobotLab/StreamVLN) for training details.

---

**`source/en/user_guide/internnav/tutorials/model.md`**

# Model

This tutorial introduces the structure and implementation of both the System 1 (navdp) and System 2 (rdp) policy models in the InternNav framework.

---
## System 1: Navdp
<!-- navdp content start -->

This tutorial introduces the structure and implementation of the navdp policy model in the InternNav framework, helping you understand and customize each module.

---
### Model Structure Overview

The navdp policy model in InternNav mainly consists of the following parts:

Qwen2.5-VL supports multi-turn conversations, image understanding, and text generation. We finetune the Qwen2.5-VL model on our self-collected navigation dataset.
2. Latent Queries
Our model learns a set of latent queries to query the latent vector of Qwen2.5-VL, which is used to model trajectory context; the sketch below illustrates the idea.
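
To make the latent-query idea concrete, here is a minimal PyTorch sketch. It is not the repo's implementation; the query count, hidden size, and single cross-attention layer are assumptions. A fixed set of learned query vectors cross-attends to the VLM's hidden states and returns a compact trajectory-context embedding:

```python
import torch
import torch.nn as nn

class LatentQueryPooler(nn.Module):
    """Learned latent queries that cross-attend to Qwen2.5-VL hidden states.

    Hypothetical sketch: the real model's query count, dimensions, and number
    of attention layers may differ.
    """

    def __init__(self, hidden_dim: int = 2048, num_queries: int = 16, num_heads: int = 8):
        super().__init__()
        # One learned embedding per latent query, shared across all samples.
        self.latent_queries = nn.Parameter(torch.randn(num_queries, hidden_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, vlm_hidden_states: torch.Tensor) -> torch.Tensor:
        # vlm_hidden_states: (batch, seq_len, hidden_dim) from the VLM's last layer.
        batch = vlm_hidden_states.size(0)
        queries = self.latent_queries.unsqueeze(0).expand(batch, -1, -1)
        # The queries attend over the VLM token sequence;
        # the output shape is (batch, num_queries, hidden_dim).
        context, _ = self.cross_attn(queries, vlm_hidden_states, vlm_hidden_states)
        return self.norm(context)


# Example: pool 16 trajectory-context vectors from a dummy VLM output.
dummy_hidden = torch.randn(2, 512, 2048)
pooled = LatentQueryPooler()(dummy_hidden)
print(pooled.shape)  # torch.Size([2, 16, 2048])
```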