- InternUtopia: A simulation platform for versatile Embodied AI research and development.
- InternManip: An all-in-one robot manipulation learning suite (5 pretrained models, 3 benchmarks, and more coming soon).
- InternNav: An open platform for building generalized navigation foundation models (with 6 mainstream benchmarks and 10+ baselines).
- InternHumanoid: A versatile, all-in-one toolbox for whole-body humanoid robot control.
- InternSR: An open-source toolbox for vision-based embodied spatial intelligence.
Humanoids/Legged Robots
- Datasets:
- InternData-H1: The largest open-source 3D human motion dataset with text annotations, including 2.5k hours and 1.9M episodes.
- Models and Research:
- UniHSI: Unified Human-Scene Interaction via Prompted Chain-of-Contacts
- HIMLoco: Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response
- 🏆HoST [Best Systems Paper Finalist at RSS 2025]: Learning Humanoid Standing-up Control across Diverse Postures
- HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit
Manipulation
- Datasets:
- InternData-A1: A hybrid synthetic-real manipulation dataset integrating 5 heterogeneous robots, 15 skills, and 200+ scenes, emphasizing multi-robot collaboration under dynamic scenarios.
- InternData-M1: A large-scale synthetic dataset for generalizable pick-and-place over 80K objects, with open-ended instructions covering object recognition, spatial and commonsense reasoning, and long-horizon tasks.
- Models and Research:
- InternVLA-A1: Unifying Understanding, Generation, and Action for Robotic Manipulation
- InternVLA-M1: A Spatially Grounded Foundation Model for Generalist Robots
- F1-VLA: Visual Foresight Generation for Planning-Based Control
- VLAC: A Generalist Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning
- Seer: Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
- RoboSplat: Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation
- GenManip: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
Navigation
- Datasets:
- InternData-N1: A high-quality navigation dataset with the most diverse scenes and extensive randomization across embodiments and viewpoints, including 3k+ scenes and 830k VLN samples.
- Models and Research:
- InternVLA-N1: An Open Dual-System Vision-Language Navigation Foundation Model with Learned Latent Plans
- NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance
- StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
- VLN-PE: A Holistic Study of Physical and Visual Disparities in Vision-and-Language Navigation
AIGC for Embodied AI
- Datasets:
- OmniWorld: A large-scale, multi-domain, multi-modal dataset that enables significant performance improvements in 4D reconstruction and video generation.
- Models and Research:
- MeshCoder: Generating Structured 3D Object Blender Code from Point Clouds
- Infinite-Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation
- Aether: Geometric-Aware Unified World Modeling
3D Vision and Embodied Perception
- EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
- 🏆PointLLM [Best Paper Candidate at ECCV 2024]: Empowering Large Language Models to Understand Point Clouds
- MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
- OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
3D Assets for Embodied AI
- InternScenes: A large-scale interactive indoor scene dataset with realistic layouts, comprising 40,000 diverse scenes and 1.96M 3D objects.