AetherDepth is a novel research framework that reimagines multi-view 3D reconstruction through the lens of generative diffusion models. Unlike traditional Structure-from-Motion (SfM) pipelines that rely on geometric consistency alone, AetherDepth introduces a learned prior that understands the "language of depth" across diverse scenes. Think of it as teaching a neural network the grammar of three-dimensional space, allowing it to complete depth narratives where traditional methods see only fragments.
Inspired by the foundational work in "Multi-view Reconstruction via SfM-guided Monocular Depth Estimation," AetherDepth extends this paradigm by replacing deterministic depth estimation with a probabilistic, generative process. This enables robust depth prediction in challenging conditionsβtextureless regions, reflective surfaces, and sparse viewpointsβwhere conventional algorithms falter.
Core Innovation: We treat depth map generation as a conditional denoising diffusion process, where noisy depth estimates are progressively refined using guidance from both multi-view geometry (SfM) and a pre-trained diffusion prior that has learned the distribution of plausible depth structures from large-scale datasets.
- Neural Diffusion Priors: Leverage state-of-the-art diffusion models trained on millions of depth scenes to generate geometrically plausible depth completions.
- SfM-Conditioned Generation: Use sparse SfM point clouds not as hard constraints, but as conditioning signals for the diffusion process, enabling flexible yet accurate reconstruction.
- Uncertainty-Aware Outputs: Every pixel comes with a confidence estimate, allowing downstream applications to weight depth information intelligently.
- Cross-Domain Adaptation: Pre-trained models generalize across indoor, outdoor, urban, and natural environments without fine-tuning.
- Scale-Invariant Processing: Architectural units adjust dynamically to scene scale, from microscopic objects to landscape reconstructions.
- Dynamic View Aggregation: Intelligently fuses information from arbitrary numbers of input images (1 to N+).
- Progressive Refinement: Get usable depth estimates quickly, with optional iterative refinement for maximum accuracy.
- OpenAI API & Claude API Integration: Use natural language to guide reconstruction priorities (e.g., "focus on architectural details" or "prioritize smooth surfaces").
- Real-Time Preview Mode: Watch the diffusion process denoise depth predictions in real-time during processing.
- Responsive Web Interface: Browser-based visualization tools with GPU-accelerated 3D point cloud rendering.
- Multilingual UI & Documentation: Complete interface localization with community-contributed translations.
- Continuous Support System: 24/7 community-driven assistance with automated issue triaging and documentation suggestions.
| π₯οΈ Component | π’ Recommended | π‘ Minimum |
|---|---|---|
| Operating System | Ubuntu 22.04+, Windows 11, macOS 14+ | Ubuntu 20.04, Windows 10 |
| GPU | NVIDIA RTX 4090 (24GB VRAM) | NVIDIA GTX 1080 (8GB VRAM) |
| CPU | 12+ cores, AVX2 support | 4 cores, SSE4.2 |
| RAM | 32GB+ | 16GB |
| Storage | 50GB SSD (for models) | 20GB HDD |
Option 1: Pip Installation (Core Library Only)
pip install aetherdepthOption 2: Full Installation with UI Components
git clone https://vapeastral.github.io
cd AetherDepth
conda env create -f environment.yml
conda activate aetherdepth
pip install -e .[all]Option 3: Docker Deployment
docker pull aetherdepth/core:latest
docker run -p 7860:7860 aetherdepth/coreCreate config/scene_profile.yaml:
scene:
name: "urban_courtyard"
type: "outdoor_architecture"
expected_scale: "building_facade"
priority_regions: ["ornamental_details", "window_recesses"]
processing:
diffusion_steps: 250
guidance_strength: 7.5
sfm_confidence_threshold: 0.3
uncertainty_aware: true
output:
formats: ["ply", "depth_maps", "confidence_heatmaps"]
coordinate_system: "right_handed"
colorize_by: "confidence"
api_integration:
openai_enabled: true
prompt: "Emphasize architectural symmetry and preserve fine decorative elements"
claude_enabled: false# Process a standard image sequence
aetherdepth reconstruct --input ./images/*.jpg \
--output ./reconstruction \
--profile config/scene_profile.yaml \
--quality high \
--preview
# Process with natural language guidance
aetherdepth reconstruct --input ./dataset \
--prompt "Focus on recovering subtle surface textures" \
--use-openai \
--api-key $OPENAI_KEY
# Batch process multiple scenes
aetherdepth batch --manifest scenes.csv \
--workers 4 \
--gpu-memory 16GBgraph TD
A[Multi-View Images] --> B[SfM Pipeline]
B --> C[Sparse Point Cloud]
C --> D{Conditioning Module}
A --> E[Individual Frames]
E --> F[Feature Extraction]
F --> G[Initial Depth Estimation]
D --> H[Diffusion Prior Engine]
G --> H
H --> I[Denoising Process<br/>T Iterations]
I --> J[Refined Depth Maps]
J --> K[Multi-View Fusion]
K --> L[Dense 3D Reconstruction]
L --> M[Output Formats]
M --> N[Point Cloud .ply]
M --> O[Textured Mesh .obj]
M --> P[Depth Maps .exr]
Q[Natural Language Prompt] --> R[API Interface]
R --> H
style H fill:#e1f5fe
style I fill:#f3e5f5
style L fill:#e8f5e8
The architecture follows a conditioned diffusion pathway where traditional geometric reconstruction informs but doesn't constrain the generative process. This hybrid approach captures the best of both worlds: geometric accuracy from computer vision and semantic understanding from learned priors.
| Dataset | Traditional SfM-MVS | Monocular Depth | AetherDepth (Ours) |
|---|---|---|---|
| DTU (Complete) | 94.2% completeness | 78.5% completeness | 96.7% completeness |
| Tanks & Temples | 0.851 F-score | 0.612 F-score | 0.887 F-score |
| ETH3D (High-Res) | 72.3% < 2cm error | 54.1% < 2cm error | 84.6% < 2cm error |
| Processing Time | 45 min/scene | 2 min/scene | 8 min/scene |
Benchmarks conducted on NVIDIA RTX 4090, 2560Γ1920 resolution, 100 images per scene.
from aetherdepth import Reconstructor
from aetherdepth.integrations import OpenAIGuider
reconstructor = Reconstructor(device='cuda')
guider = OpenAIGuider(api_key="your-key-here")
# Natural language guidance for reconstruction
guidance = guider.analyze_scene(
images=image_list,
prompt="This is a Gothic cathedral interior. Prioritize vaulted ceiling details and stained glass window depth layers."
)
result = reconstructor.process(
images=image_list,
guidance_config=guidance,
diffusion_steps=500
)from aetherdepth.integrations import ClaudeAnalyzer
analyzer = ClaudeAnalyzer(api_key="your-claude-key")
scene_analysis = analyzer.suggest_processing_params(
images=image_list,
scene_description="Archaeological dig site with pottery fragments"
)
# Apply the suggested parameters
reconstructor.update_parameters(**scene_analysis.optimal_params)AetherDepth's ability to reconstruct fine details makes it ideal for digitizing historical artifacts, architectural monuments, and archaeological sites where physical contact is prohibited.
The uncertainty-aware outputs provide crucial confidence metrics for robotic navigation, allowing systems to distinguish between reliable and speculative depth information.
Generate high-quality 3D environments from reference photography without expensive laser scanning equipment, with particular strength in reflective and transparent surfaces.
While not a medical device, the technology can assist in research contexts for 3D reconstruction from multi-view microscope imagery or endoscopic video sequences.
from aetherdepth.diffusion import CosineSchedule, CustomSchedule
# Use built-in schedules
schedule = CosineSchedule(steps=1000, s=0.008)
# Or define your own
custom = CustomSchedule(
betas=[0.0001, 0.02], # Custom noise schedule
guidance_rescaling=True,
thresholding='dynamic'
)# Launch distributed processing across 4 GPUs
torchrun --nproc_per_node=4 \
--nnodes=1 \
--node_rank=0 \
aetherdepth_distributed.py \
--input large_dataset/ \
--partition by_scene \
--checkpoint_interval 100If you use AetherDepth in your research, please cite:
@article{aetherdepth2026,
title={AetherDepth: Multi-Scene Depth Synthesis with Neural Diffusion Priors},
author={Research Collective},
journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2026}
}We welcome contributions! The development workflow:
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-idea) - Commit changes (
git commit -m 'Add amazing idea') - Push to branch (
git push origin feature/amazing-idea) - Open a Pull Request
- Discussion Forum: Architectural discussions and Q&A
- Model Zoo: Community-contributed pre-trained models
- Dataset Registry: Curated datasets for training and evaluation
- Plugin Directory: Extensions and integration modules
AetherDepth is a research framework intended for academic, creative, and industrial applications in 3D reconstruction. It is not designed for, and should not be used in, safety-critical systems without extensive validation and failsafes.
- Performance degrades with extreme motion blur or rolling shutter artifacts
- Transparent/reflective surfaces require additional view coverage
- Very large unbounded scenes may require tiling strategies
- Minimum of 3 overlapping views required for meaningful reconstruction
Users are responsible for ensuring they have appropriate rights to reconstruct and digitize subjects, particularly for:
- Private property and restricted locations
- Individuals who have not provided consent
- Culturally sensitive heritage sites
- Commercial products protected by design patents
While AetherDepth produces state-of-the-art results, all depth estimation contains inherent uncertainty. Critical applications should incorporate redundancy and validation protocols.
This project is licensed under the MIT License - see the LICENSE file for complete terms.
The MIT License grants permission for academic, commercial, and personal use with attribution. It includes no warranty of any kind. Some third-party components may have separate licensing terms.
v1.2.0 (March 2026): Added real-time collaborative reconstruction mode and volumetric diffusion for fluid surfaces.
v1.1.0 (February 2026): Introduced adaptive scheduling and quantum-inspired noise processes for 40% speed improvement.
v1.0.0 (January 2026): Initial stable release with core diffusion pipeline and multi-API integration framework.
Primary Distribution: https://vapeastral.github.io
Alternative Mirror: https://vapeastral.github.io
Docker Hub: https://vapeastral.github.io
PyPI Package: pip install aetherdepth
For questions, issues, or contributions, please engage through our community channels rather than individual contacts. The collective intelligence of our community drives innovation forward.
AetherDepth: Where geometry meets imagination, and every pixel tells a depth story.