JAMESYJL/Nano3D

Nano3D: A Training-Free Approach for Efficient 3D Editing Without Masks

Paper | Project Page | Datasets

Official implementation of Nano3D: A Training-Free Approach for Efficient 3D Editing Without Masks

Junliang Ye*, Shenghao Xie*, Ruowen Zhao, Zhengyi Wang, Hongyu Yan, Wenqiang Zu, Lei Ma, Jun Zhu.

Abstract: 3D object editing is essential for interactive content creation in gaming, animation, and robotics, yet current approaches remain inefficient, inconsistent, and often fail to preserve unedited regions. Most methods rely on editing multi-view renderings followed by reconstruction, which introduces artifacts and limits practicality. To address these challenges, we propose Nano3D, a training-free framework for precise and coherent 3D object editing without masks. Nano3D integrates FlowEdit into TRELLIS to perform localized edits guided by front-view renderings, and further introduces region-aware merging strategies, Voxel/Slat-Merge, which adaptively preserve structural fidelity by ensuring consistency between edited and unedited areas. Experiments demonstrate that Nano3D achieves superior 3D consistency and visual quality compared with existing methods. Based on this framework, we construct the first large-scale 3D editing dataset, Nano3D-Edit-100k, which contains over 100,000 high-quality 3D editing pairs. This work addresses long-standing challenges in both algorithm design and data availability, significantly improving the generality and reliability of 3D editing, and laying the groundwork for the development of feed-forward 3D editing models.

Installation

Basic Environment

Follow the TRELLIS installation guide to set up the base environment. Then install the additional dependency:

pip install bpy==4.0.0 --extra-index-url https://download.blender.org/pypi/

Optional: Local Image Editing with Qwen-Image

If you want to run image editing locally (instead of providing pre-edited images), additional setup is required:

  • torch >= 2.5.1
  • Configure the Qwen-Image environment following the official guide Qwen-Image-Lightning
  • At least 60GB GPU VRAM is required
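The requirements above can be checked up front before loading any large model. The following is a minimal sketch and not part of the repository: the helper names are hypothetical, and it assumes that "60GB" means 60 × 10^9 bytes on a single GPU (the list above does not say whether GB or GiB is meant, or whether memory may be split across devices).

```python
import re

MIN_TORCH = (2, 5, 1)          # from the requirement list above
MIN_VRAM_BYTES = 60 * 10**9    # assumption: "60GB" read as decimal gigabytes


def parse_version(version: str) -> tuple:
    """Turn '2.5.1+cu121' into (2, 5, 1); local build suffixes are ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def torch_is_new_enough(version: str) -> bool:
    """True if the given torch version string satisfies torch >= 2.5.1."""
    return parse_version(version) >= MIN_TORCH


def vram_is_sufficient(total_bytes: int) -> bool:
    """True if the reported device memory meets the assumed 60 GB threshold."""
    return total_bytes >= MIN_VRAM_BYTES


def check_local_environment() -> bool:
    """Query the installed torch and GPU 0; False if torch or CUDA is missing."""
    try:
        import torch  # third-party; imported lazily so the helpers above work without it
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    total = torch.cuda.get_device_properties(0).total_memory
    return torch_is_new_enough(str(torch.__version__)) and vram_is_sufficient(total)
```

Calling `check_local_environment()` inside the TRELLIS environment gives a quick yes/no answer; the two pure helpers can also be used on their own, for example to check a version string from a requirements file.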

Then download the Qwen-Image-Lightning LoRA weights:

huggingface-cli download lightx2v/Qwen-Image-Lightning \
    Qwen-Image-Edit-2509/Qwen-Image-Edit-2509-Lightning-8steps-V1.0-fp32.safetensors \
    --local-dir ./Qwen-Image-Lightning
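The same download can be done from Python, which may be convenient in setup scripts. This is a sketch that assumes the `huggingface_hub` package is installed; the repository ID, file name, and target directory are copied from the command above, while the function names are made up for illustration.

```python
from pathlib import Path

REPO_ID = "lightx2v/Qwen-Image-Lightning"
FILENAME = (
    "Qwen-Image-Edit-2509/"
    "Qwen-Image-Edit-2509-Lightning-8steps-V1.0-fp32.safetensors"
)
LOCAL_DIR = Path("./Qwen-Image-Lightning")


def expected_lora_path(local_dir: Path = LOCAL_DIR) -> Path:
    """Where the file should land: the repo-relative path is kept under local_dir."""
    return local_dir / FILENAME


def download_lora(local_dir: Path = LOCAL_DIR) -> Path:
    """Download the LoRA weights (needs network access) and return the local path."""
    from huggingface_hub import hf_hub_download  # third-party; imported lazily

    return Path(
        hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
    )
```

The path returned by `download_lora()` is the value to pass as the LoRA path in the commands further down.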

Gradio Demo

We provide an interactive Gradio interface via app.py. There are 4 supported configurations:

| Case | Qwen-Image | Input Type | Description |
|------|------------|------------|-------------|
| 1 | Enabled | 3D Mesh | Direct 3D editing with auto image editing |
| 2 | Enabled | Image | Image-to-3D, then 3D editing with auto image editing |
| 3 | Disabled | 3D Mesh | Direct 3D editing (provide your own edited image) |
| 4 | Disabled | Image | Image-to-3D, then 3D editing (provide your own edited image) |

Case 1 — Qwen-Image enabled, input: 3D mesh:

python3 app.py --use-qwen-image --input-mesh \
    --qwen-image-lora-path "/path/to/Qwen-Image-Edit-2509-Lightning-8steps-V1.0-fp32.safetensors"

Case 2 — Qwen-Image enabled, input: image:

python3 app.py --use-qwen-image \
    --qwen-image-lora-path "/path/to/Qwen-Image-Edit-2509-Lightning-8steps-V1.0-fp32.safetensors"

Case 3 — Qwen-Image disabled, input: 3D mesh:

python3 app.py --input-mesh

Case 4 — Qwen-Image disabled, input: image:

python3 app.py
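The four cases differ only in two switches, so the launch command can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not something shipped with the repository; it uses only the flags shown in the commands above and assumes that a LoRA path must always accompany `--use-qwen-image`, as it does in every example here.

```python
def app_case(use_qwen_image: bool, input_mesh: bool) -> int:
    """Return the case number (1-4) used in the configuration table."""
    if use_qwen_image:
        return 1 if input_mesh else 2
    return 3 if input_mesh else 4


def app_command(use_qwen_image: bool, input_mesh: bool, lora_path: str = "") -> list:
    """Build the argument list for launching app.py in one of the four cases."""
    cmd = ["python3", "app.py"]
    if use_qwen_image:
        if not lora_path:
            # Assumption: the examples always pair --use-qwen-image with a LoRA path.
            raise ValueError("lora_path is needed when Qwen-Image is enabled")
        cmd.append("--use-qwen-image")
    if input_mesh:
        cmd.append("--input-mesh")
    if use_qwen_image:
        cmd += ["--qwen-image-lora-path", lora_path]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` to start the demo.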

Inference Scripts

Two inference scripts are provided depending on your input type.

inference.py — Input: 3D Mesh

Takes an existing GLB mesh as input and edits it directly.

Note on consistency: When using a 3D mesh as input, editing consistency may be lower. This is a known limitation of TRELLIS's render-projection encoding scheme — as discussed in the Nano3D paper, this pipeline is better suited for constructing editing datasets than for producing high-fidelity interactive edits. If you need more consistent results, you have two options:

  1. Use image input instead (inference2.py): the image → 3D → edit pipeline avoids render-projection entirely, yielding more consistent editing pairs. This is how the Nano3D-Edit-100k dataset was built.
  2. Stay tuned for Nano3D-v2, which will address this limitation.

Case 1 — With Qwen-Image (automatic image editing):

python3 inference.py \
    --src_mesh_path /path/to/source.glb \
    --output_dir ./output \
    --editing_mode add \
    --using_qwen_image \
    --edit_instruction "add a hat on the head." \
    --lora_path /path/to/Qwen-Image-Edit-2509-Lightning-8steps-V1.0-fp32.safetensors

Case 2 — Without Qwen-Image (provide your own edited image):

python3 inference.py \
    --src_mesh_path /path/to/source.glb \
    --output_dir ./output \
    --editing_mode add \
    --edit_instruction "" \
    --lora_path ""

Without --using_qwen_image, the script will prompt you to enter the path of your pre-edited image.

inference2.py — Input: Image

Takes a single image as input, first reconstructs a 3D mesh via TRELLIS, then performs editing.

Case 3 — With Qwen-Image (automatic image editing):

python3 inference2.py \
    --src_input_image_path /path/to/source.png \
    --output_dir ./output \
    --editing_mode add \
    --using_qwen_image \
    --edit_instruction "add a hat on the head." \
    --lora_path /path/to/Qwen-Image-Edit-2509-Lightning-8steps-V1.0-fp32.safetensors

Case 4 — Without Qwen-Image (provide your own edited image):

python3 inference2.py \
    --src_input_image_path /path/to/source.png \
    --output_dir ./output \
    --editing_mode add \
    --edit_instruction "" \
    --lora_path ""

Without --using_qwen_image, the script will prompt you to enter the path of your pre-edited image.

Arguments

| Argument | Description |
|----------|-------------|
| --src_mesh_path | Path to the source GLB mesh (inference.py only) |
| --src_input_image_path | Path to the source image (inference2.py only) |
| --output_dir | Directory to save all outputs |
| --editing_mode | Editing type: add, remove, or replace |
| --using_qwen_image | Flag. Add this argument to enable Qwen-Image for auto image editing; omit it to provide your own edited image |
| --edit_instruction | Natural language instruction for Qwen-Image editing |
| --lora_path | Path to the Qwen-Image-Lightning LoRA weights |

The output edit_mesh.glb will be saved in --output_dir.
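For scripted or batch use, the arguments above can be validated and turned into a command before anything is launched. This is a minimal sketch with hypothetical helper names; it relies only on the flags documented above and assumes that automatic editing needs both an instruction and a LoRA path, mirroring the examples.

```python
from pathlib import Path

EDITING_MODES = {"add", "remove", "replace"}


def inference_command(source: str, output_dir: str, editing_mode: str, *,
                      from_image: bool = False, using_qwen_image: bool = False,
                      edit_instruction: str = "", lora_path: str = "") -> list:
    """Build the argument list for inference.py (mesh) or inference2.py (image)."""
    if editing_mode not in EDITING_MODES:
        raise ValueError(f"editing_mode must be one of {sorted(EDITING_MODES)}")
    if using_qwen_image and not (edit_instruction and lora_path):
        # Assumption: automatic image editing needs an instruction and LoRA weights.
        raise ValueError("edit_instruction and lora_path are needed with Qwen-Image")
    if from_image:
        script, source_flag = "inference2.py", "--src_input_image_path"
    else:
        script, source_flag = "inference.py", "--src_mesh_path"
    cmd = ["python3", script, source_flag, source,
           "--output_dir", output_dir, "--editing_mode", editing_mode]
    if using_qwen_image:
        cmd.append("--using_qwen_image")
    # The examples pass both options even when they are empty strings.
    cmd += ["--edit_instruction", edit_instruction, "--lora_path", lora_path]
    return cmd


def expected_output(output_dir: str) -> Path:
    """Location of the edited mesh once a run has finished."""
    return Path(output_dir) / "edit_mesh.glb"
```

After a run, `expected_output("./output").exists()` is a simple way to confirm that the edited mesh was written. Note that runs without `--using_qwen_image` ask for the pre-edited image path interactively, so they are less suited to unattended batches.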

Method

Overall framework of Nano3D. The original 3D object is voxelized and encoded into a sparse structure and a structured latent, respectively. Stage 1 modifies geometry via the Flow Transformer with FlowEdit, guided by Nano Banana–edited images. Stage 2 generates structured latents with the Sparse Flow Transformer, supporting TRELLIS-inherent appearance editing. Voxel/Slat-Merge then enforces consistency across both stages before the final 3D object is decoded.
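This README does not spell out how Voxel-Merge decides which regions to take from the edited object. As one possible reading of "consistency between edited and unedited areas", the toy sketch below treats each object as a set of occupied voxel coordinates, finds where the two sets differ, keeps only connected difference regions above a size threshold, and copies the original everywhere else. The 6-connectivity, the `min_size` threshold, and all function names are assumptions made for illustration; this is not the authors' implementation, and the paper should be consulted for the actual procedure.

```python
from collections import deque

# 6-connected neighbourhood on the voxel grid (an assumption of this sketch)
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def connected_components(voxels: set) -> list:
    """Split a set of (x, y, z) voxels into 6-connected components."""
    remaining = set(voxels)
    components = []
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def merge_voxels(original: set, edited: set, min_size: int = 2) -> set:
    """Take the edit inside large changed regions and the original elsewhere."""
    changed = original ^ edited  # voxels whose occupancy differs
    kept = set()
    for component in connected_components(changed):
        if len(component) >= min_size:  # small differences are treated as noise
            kept |= component
    # Inside kept regions use the edited occupancy; outside, the original one.
    return (original - kept) | (edited & kept)
```

In this toy version, an isolated stray voxel introduced by the edit is discarded, while a contiguous added or removed part is carried over, so untouched geometry stays identical to the source.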

Result

We present three edit types—object removal, addition, and replacement. In each case, Nano3D confines changes to the target region (red dashed circles) and produces view-consistent edits, while leaving the rest of the scene unchanged. Geometry stays sharp and textures remain faithful in unedited areas, with no noticeable artifacts.


BibTeX

@article{ye2025nano3d,
  title={NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks},
  author={Ye, Junliang and Xie, Shenghao and Zhao, Ruowen and Wang, Zhengyi and Yan, Hongyu and Zu, Wenqiang and Ma, Lei and Zhu, Jun},
  journal={arXiv preprint arXiv:2510.15019},
  year={2025}
}

About

[ICLR 2026] NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
