AIR-Embodied: An Efficient Active 3DGS-based Interaction and Reconstruction Framework with Embodied Large Language Model

QZH-00/AIR-Embodied

AIR-Embodied

The replication code for the experiments has been open-sourced; the full system will be released once the paper is accepted.

Recent advancements in 3D reconstruction and neural rendering have enabled the creation of high-quality digital assets, yet existing methods struggle to generalize across varying object shapes, textures, and occlusions. While Next Best View (NBV) planning and learning-based approaches offer partial solutions, they are often limited by predefined criteria and fail to handle occlusions effectively. We present AIR-Embodied, a novel framework that integrates embodied AI agents with large-scale pretrained multi-modal language models (MLLMs) to improve active reconstruction. AIR-Embodied uses a three-stage process: understanding the current reconstruction state via multi-modal prompts, planning tasks with viewpoint selection and interactive actions, and employing closed-loop reasoning to ensure accurate execution. The agent dynamically refines its actions based on discrepancies between planned and actual outcomes. Experimental evaluations in virtual and real-world environments show that AIR-Embodied significantly improves reconstruction efficiency and quality, providing a robust solution to key challenges in active 3D reconstruction.
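The three-stage process above can be sketched as a simple closed loop. This is a minimal illustration only, not the paper's implementation: `query_mllm`, `execute`, and the fixed set of eight viewpoints are hypothetical stand-ins for the real multi-modal model, robot execution, and 3DGS quality metrics.

```python
from dataclasses import dataclass, field

@dataclass
class ReconstructionState:
    covered_views: set = field(default_factory=set)  # viewpoints captured so far
    quality: float = 0.0                             # proxy for reconstruction quality

NUM_VIEWS = 8  # assumed discrete viewpoint budget for this toy example

def query_mllm(prompt: str, state: ReconstructionState):
    """Hypothetical stand-in for prompting an MLLM with the current state.

    Here it simply proposes the first uncovered viewpoint; the real system
    reasons over multi-modal prompts (renders, coverage maps, task context).
    """
    candidates = [v for v in range(NUM_VIEWS) if v not in state.covered_views]
    return candidates[0] if candidates else None

def execute(view: int, state: ReconstructionState) -> None:
    """Stand-in for moving the agent, capturing images, and updating the 3DGS model."""
    state.covered_views.add(view)
    state.quality = len(state.covered_views) / NUM_VIEWS

def active_reconstruction(state: ReconstructionState,
                          target_quality: float = 1.0,
                          max_steps: int = 20) -> ReconstructionState:
    """Understand -> plan -> execute-and-verify loop."""
    for _ in range(max_steps):
        if state.quality >= target_quality:
            break
        # Stages 1-2: understand the current state and plan the next action.
        action = query_mllm("Which viewpoint or interaction next?", state)
        if action is None:
            break
        expected_quality = state.quality + 1 / NUM_VIEWS
        # Stage 3: execute, then compare the outcome against the plan.
        execute(action, state)
        if abs(state.quality - expected_quality) > 1e-6:
            # Discrepancy between planned and actual outcome: re-plan next iteration.
            continue
    return state

final = active_reconstruction(ReconstructionState())
print(final.quality)  # 1.0 once all toy viewpoints are covered
```

The closed-loop check in the last stage mirrors the paper's idea of refining actions when execution deviates from the plan; here the "discrepancy" branch is trivial because the toy executor is deterministic.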
