DeepTech Autonomous Agent for Optical Physics & Energy Optimization. Powered by Gemini 3 Pro & Google GenAI SDK.
Spatial Engine is a multimodal AI agent designed to act as a Senior Optical Physicist. Unlike standard chatbots, it combines Generative AI's vision capabilities with a deterministic physics engine to audit rooms, calculate lighting deficits, and project energy ROI.
The agent does not "guess" math. It delegates calculations to a rigorous Python engine.
- Illuminance Calculation: Uses the Inverse Square Law ($E = I/d^2$) and Beam Angle geometry to calculate exact Lux levels at specific points.
- Health Compliance (ISO/SanPiN): Automatically checks if lighting levels meet health standards for offices (500 Lux), living rooms, etc., and warns of safety deficits.
- Unit Tested: All physics formulas are covered by `unittest` to ensure 100% reliability.
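As a rough illustration of the inverse-square calculation described above, the sketch below converts a lamp's luminous flux and beam angle into on-axis intensity, and then into illuminance at a given distance. The function names, the uniform-cone assumption, and the default 500 lux target are illustrative assumptions; the project's actual `physics_engine.py` may be organized differently.

```python
import math


def luminous_intensity(lumens: float, beam_angle_deg: float) -> float:
    """Approximate on-axis intensity (candela), assuming the flux is spread
    uniformly over a cone with the given full beam angle."""
    if not 0 < beam_angle_deg <= 360:
        raise ValueError("beam angle must be in (0, 360] degrees")
    half_angle = math.radians(beam_angle_deg) / 2
    solid_angle = 2 * math.pi * (1 - math.cos(half_angle))  # steradians
    return lumens / solid_angle


def illuminance(intensity_cd: float, distance_m: float) -> float:
    """Inverse square law, E = I / d^2 (lux), for a surface facing the source."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return intensity_cd / distance_m ** 2


def lux_deficit(measured_lux: float, required_lux: float = 500.0) -> float:
    """Shortfall against a target level (500 lux is the office figure cited above)."""
    return max(0.0, required_lux - measured_lux)


if __name__ == "__main__":
    # Hypothetical lamp: 800 lm spotlight with a 60 degree beam, 2 m from the desk.
    intensity = luminous_intensity(lumens=800, beam_angle_deg=60)
    lux = illuminance(intensity, distance_m=2.0)
    print(f"{lux:.0f} lux at 2 m, deficit {lux_deficit(lux):.0f} lux")
```

Keeping the three steps as separate pure functions makes each formula easy to cover with its own unit test.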
The agent connects physics to the real economy.
- Live Market Search: Finds real-world products (prices, specs) and local electricity rates (USD/kWh) via Google Search.
- ROI & Energy Calculator: Computes financial savings (USD) and CO2 reduction when switching lighting technologies (e.g., Incandescent to LED).
- Search Verification: "Trust but Verify" logic. The agent reads product specs to ensure a lamp is truly "dimmable" or "smart" before recommending it.
- Fallback Resilience: Continues working offline using averaged market data if the internet connection fails.
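The ROI arithmetic behind the calculator can be sketched as follows. This is a minimal example, not the project's implementation: the function names are hypothetical, and the electricity rate and CO2 emission factor are caller-supplied inputs (the values in the demo are placeholders, not real market data).

```python
import math


def annual_savings(old_watts: float, new_watts: float, hours_per_day: float,
                   rate_usd_per_kwh: float, kg_co2_per_kwh: float) -> dict:
    """Yearly energy, cost, and CO2 saved by replacing one lamp with another."""
    saved_kwh = (old_watts - new_watts) * hours_per_day * 365 / 1000
    return {
        "kwh": saved_kwh,
        "usd": saved_kwh * rate_usd_per_kwh,
        "co2_kg": saved_kwh * kg_co2_per_kwh,
    }


def payback_years(lamp_price_usd: float, annual_usd_saved: float) -> float:
    """Years until the purchase price is recovered; infinite if nothing is saved."""
    if annual_usd_saved <= 0:
        return math.inf
    return lamp_price_usd / annual_usd_saved


if __name__ == "__main__":
    # Placeholder inputs: 60 W incandescent replaced by a 9 W LED, 4 hours per day.
    savings = annual_savings(60, 9, 4, rate_usd_per_kwh=0.15, kg_co2_per_kwh=0.4)
    print(savings, payback_years(5.0, savings["usd"]))
```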
The agent can "see" and audit a room from a single photograph using Gemini Vision.
- 3x3 Grid Analysis: Mentally divides the image into sectors to pinpoint features (e.g., "Window in Sector 3").
- Material Detection: Analyzes wall textures (Concrete vs. Paint) to estimate Albedo (reflection coefficients).
- Shadow Detection: Identifies under-lit zones requiring optimization.
- Scale Estimation: Uses Reference Object Inference (e.g., comparing room width to standard door frames) to estimate floor area without user input.
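To make the grid and scale ideas concrete, here is a small sketch of both. It assumes sectors are numbered row by row from the top-left corner (so Sector 3 is the top-right cell) and that the reference object sits at roughly the same depth as the thing being measured. Both assumptions, the function names, and the 0.9 m door width in the demo are illustrative and not taken from the project.

```python
def sector_of(x: float, y: float, width: float, height: float, grid: int = 3) -> int:
    """Map a pixel coordinate to a sector number, counted row by row from the
    top-left cell (1) to the bottom-right cell (grid * grid)."""
    if not (0 <= x <= width and 0 <= y <= height):
        raise ValueError("point lies outside the image")
    col = min(int(x / width * grid), grid - 1)   # clamp the right edge into the last column
    row = min(int(y / height * grid), grid - 1)  # clamp the bottom edge into the last row
    return row * grid + col + 1


def estimate_length_m(target_px: float, reference_px: float, reference_m: float) -> float:
    """Scale a pixel measurement using a reference object of known real size.
    Only meaningful when both lie at a similar depth in the photo."""
    if reference_px <= 0:
        raise ValueError("reference size in pixels must be positive")
    return target_px / reference_px * reference_m


if __name__ == "__main__":
    print(sector_of(290, 10, width=300, height=300))
    # Hypothetical door frame: 150 px wide in the photo, assumed 0.9 m in reality.
    print(estimate_length_m(target_px=1000, reference_px=150, reference_m=0.9))
```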
The agent possesses a "Short-term Memory" via the SpatialState class.
- Persistence: It remembers room geometry and light sources across multiple reasoning steps.
- Layering: Can combine visual data (from a photo) with technical data (from a PDF) into a single simulation model.
- PDF Analysis: Capable of reading datasheets and blueprints to extract technical specifications (Lumens, Watts, CRI).
- Simulation: Can "virtually install" a lamp found in a catalog into the scanned room to predict the final Lux level.
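A state object of this kind might look like the sketch below. Only the class name `SpatialState` comes from the project; the fields, the `merge` and `install` methods, and the simplified average-illuminance estimate (total lumens divided by floor area, with no utilization or maintenance factors) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LightSource:
    name: str
    lumens: float
    watts: float


@dataclass
class SpatialState:
    floor_area_m2: Optional[float] = None
    albedo: Optional[float] = None
    sources: List[LightSource] = field(default_factory=list)

    def merge(self, **observed) -> None:
        """Layer new observations (photo, PDF) over stored ones; None is ignored."""
        for key, value in observed.items():
            if not hasattr(self, key):
                raise AttributeError(f"unknown field: {key}")
            if value is not None:
                setattr(self, key, value)

    def install(self, source: LightSource) -> None:
        """'Virtually install' a lamp so later estimates include it."""
        self.sources.append(source)

    def average_lux(self) -> float:
        """Crude estimate: total lumens spread evenly over the floor area."""
        if not self.floor_area_m2:
            raise ValueError("floor area is not known yet")
        return sum(s.lumens for s in self.sources) / self.floor_area_m2


if __name__ == "__main__":
    state = SpatialState()
    state.merge(floor_area_m2=20.0, albedo=0.5)          # e.g. from the photo audit
    state.install(LightSource("catalog lamp", 5000, 40))  # e.g. from a datasheet
    print(state.average_lux())
```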
The agent acts as a certified engineer, not just a salesperson.
- Knowledge Base (RAG): Consults internal standards (Zigbee, Matter, Philips Hue) to ensure hardware compatibility.
- Config Generator: Automatically generates JSON configuration files for Home Assistant/HomeKit based on the designed lighting scenes.
- Tool Use: Autonomous Function Calling (The agent decides when to calculate, when to search, and when to read standards).
- Streaming CLI: Real-time "Thinking" logs showing Tool Calls and arguments in the terminal.
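As an illustration of the config generation step, the sketch below emits a JSON document with one entry per lighting scene. The layout loosely follows the name/entities shape used for Home Assistant scenes, but it has not been validated against Home Assistant or HomeKit; the entity ID, preset values, and function name are all hypothetical.

```python
import json
from typing import Dict, List

# Hypothetical presets; real values would come from the designed lighting scenes.
SCENE_PRESETS: Dict[str, dict] = {
    "Focus": {"brightness": 255, "color_temp_kelvin": 5000},
    "Relax": {"brightness": 120, "color_temp_kelvin": 2700},
    "Movie": {"brightness": 30, "color_temp_kelvin": 2200},
}


def build_scene_config(entity_ids: List[str],
                       presets: Dict[str, dict] = SCENE_PRESETS) -> str:
    """Return a JSON string with one scene per preset, applied to every light."""
    scenes = []
    for name, settings in presets.items():
        entities = {eid: {"state": "on", **settings} for eid in entity_ids}
        scenes.append({"name": name, "entities": entities})
    return json.dumps({"scene": scenes}, indent=2)


if __name__ == "__main__":
    print(build_scene_config(["light.desk_lamp"]))
```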
| Component | Technologies |
|---|---|
| Frontend | React 19, Vite, TailwindCSS, TypeScript |
| Backend | Python 3.12, FastAPI, Uvicorn |
| AI Core | Google GenAI SDK, Gemini 3 Pro, Gemini Live API |
| Infrastructure | Docker, Google Cloud Run, UV (Package Manager) |
- Gemini Live API: Real-time multimodal interaction.
- Live Persona: Customized voice and personality for the Live API.
- Cloud Run Ready: Fully configured for serverless deployment on Google Cloud.
For detailed documentation on system design and data flows, see ARCHITECTURE.md.
```mermaid
graph TD
    User[User] -->|Interactions| FE["Frontend (React/Vite)"]
    subgraph "Client Side"
        FE -->|REST| GAI[Google Gemini API]
        FE -->|WebSocket/RTP| Live[Gemini Multimodal Live API]
    end
    subgraph "Server Side (Python/FastAPI)"
        FE -->|HTTP Requests| BE[Backend API]
        BE -->|Calculations| PE[Physics Engine]
        BE -->|Market/Search| MA[Market Agent]
        BE -->|Generation| RG[Report Generator]
        subgraph "Agent Core"
            AC[Agent Runtime] -->|Tools| PE
            AC -->|Tools| MA
            AC -->|Tools| KB["Knowledge Base (RAG)"]
            AC -->|State| SS[Spatial State]
        end
        BE -->|Invokes| AC
    end
    MA -->|Search| Web[Google Search]
    RG -->|Outputs| PDF["PDF/HTML Reports"]
```
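The "Agent Runtime to Tools" edges in the diagram can be pictured as a name-based dispatch: the model emits a tool call (a name plus arguments), and the runtime routes it to a Python function and logs it. The sketch below shows that pattern in plain Python without the Google GenAI SDK; the registry, the decorator, and both stub tools are hypothetical stand-ins rather than the project's actual code.

```python
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the runtime can look it up by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def calculate_illuminance(intensity_cd: float, distance_m: float) -> float:
    # Stand-in for the Physics Engine.
    return intensity_cd / distance_m ** 2


@tool
def lookup_standard(topic: str) -> str:
    # Stand-in for the Knowledge Base (RAG).
    return f"(stub) no entry for {topic}"


def dispatch(call: dict) -> dict:
    """Route one tool call of the form {'name': ..., 'args': {...}}."""
    name, args = call["name"], call.get("args", {})
    if name not in TOOLS:
        return {"name": name, "error": "unknown tool"}
    print(f"[tool call] {name}({args})")  # simple streaming-style log line
    return {"name": name, "result": TOOLS[name](**args)}


if __name__ == "__main__":
    print(dispatch({"name": "calculate_illuminance",
                    "args": {"intensity_cd": 1000, "distance_m": 2}}))
```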
```
spatial-engine/
├── backend/                     # FastAPI Backend
│   ├── main.py                  # API Entry Points
│   ├── report_generator.py      # HTML Report Logic
│   └── pdf_generator.py         # PDF Export Logic
├── frontend/                    # React Frontend (Vite)
│   ├── src/
│   │   ├── components/          # UI Components (VisionAudit, EconomicEngine, etc.)
│   │   └── App.tsx              # Main UI Layout
├── my_agent/                    # The AI Core
│   ├── agent.py                 # The "Brain"
│   ├── market_agent.py          # The "Hands"
│   ├── physics_engine.py        # The "Core"
│   └── spatial_state.py         # The "Memory"
├── data/
│   └── smart_home_standards.md  # RAG Knowledge Base
├── tests/                       # Unit Tests
├── .env                         # Configuration
├── pyproject.toml               # Python Dependencies
└── README.md                    # Documentation
```
- Python 3.12+
- `uv` (modern Python package manager)
- Google Gemini API Key

- Clone & Sync:

  ```bash
  git clone https://github.com/vero-code/spatial-engine.git
  cd spatial-engine
  uv sync
  ```

- Configure Environment: Create a `.env` file:

  ```
  # for backend
  GOOGLE_API_KEY=your_gemini_key_here
  # for frontend
  VITE_GEMINI_API_KEY=your_gemini_key_here
  ```

- Run the Agent:

  ```bash
  # Start the Backend
  uv run uvicorn backend.main:app --reload

  # Start the Frontend (in a new terminal)
  npm run dev --prefix frontend
  ```

- Run Tests:

  ```bash
  # Verify physics engine integrity
  uv run python -m unittest discover tests
  ```

Status: Fully Operational. 100% Test Coverage.
- Infrastructure: Environment setup (`uv`), Project structure, Basic ADK integration.
- Physics Engine: Deterministic calculations for Illuminance ($E = I/d^2$) and Energy ROI.
- Reliability: Pydantic typing for tools, `unittest` suite coverage, Chain of Thought logging.
- Persona: Senior Optical Engineer system prompt configuration.
Status: Implemented. Agent "sees" geometry and materials, "reads", and "remembers".
- Multimodality: Binary File Handler for image uploads.
- Visual Analysis: 3x3 Grid decomposition, Shadow Detection, Material/Albedo identification.
- Spatial Reasoning: Scale estimation via Reference Object Inference (no user input needed).
- Advanced Features: PDF Parser for blueprints, Persistent Spatial State class.
Status: Implemented. Connecting Physics to Economics, Standards & Safety.
- Market Agent: Multi-threaded Google Search for products and electricity rates.
- Search Verification: Agent verifies technical specs (e.g., `is_dimmable`, `protocol`) before recommending to ensure compatibility.
- Health Checks: ISO/SanPiN compliance tool (Pass/Fail verdicts for Lux levels).
- Smart Standards (RAG): Knowledge Base for Zigbee/Matter/Hue compatibility.
- Config Generator: JSON output for Home Assistant scenes (Focus/Relax/Movie).
- Robustness: Fallback Mode logic for offline operation.
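The combination of concurrent lookups, spec verification, and fallback mode listed above can be sketched with the standard library's `concurrent.futures`. Everything search-related here is a stub: `search_products` returns canned data, `search_rate` deliberately raises to simulate a lost connection, and the fallback rate is a placeholder, not a real averaged market value. Only the field names `is_dimmable` and `protocol` come from the project description.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def search_products(query: str) -> List[dict]:
    # Stub standing in for a live product search.
    return [
        {"name": "Lamp A", "is_dimmable": True, "protocol": "zigbee"},
        {"name": "Lamp B", "is_dimmable": False, "protocol": "wifi"},
    ]


def search_rate(region: str) -> float:
    # Stub that simulates a failed network lookup.
    raise ConnectionError("offline")


def gather_market_data(query: str, region: str, fallback_rate: float = 0.15) -> dict:
    """Run both lookups in parallel; fall back to offline values on failure."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        products_future = pool.submit(search_products, query)
        rate_future = pool.submit(search_rate, region)
        try:
            products = products_future.result()
        except OSError:
            products = []
        try:
            rate, live = rate_future.result(), True
        except OSError:  # ConnectionError and TimeoutError are subclasses
            rate, live = fallback_rate, False
    return {"products": products, "rate_usd_per_kwh": rate, "rate_is_live": live}


def verify_specs(products: List[dict],
                 protocols: Sequence[str] = ("zigbee", "matter")) -> List[dict]:
    """Keep only lamps whose parsed specs confirm dimming and a supported protocol."""
    return [p for p in products
            if p.get("is_dimmable") is True and p.get("protocol") in protocols]


if __name__ == "__main__":
    data = gather_market_data("dimmable LED bulb", "US")
    print(data["rate_is_live"], [p["name"] for p in verify_specs(data["products"])])
```

Recording whether the rate is live or a fallback lets a downstream report say which figures came from offline averages.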
Status: Fully Operational. Generative UI and Reporting live.
- Visualization: Heatmaps for Vision Audit and Physics Engine.
- Reporting: HTML and PDF report generation.
- Generative UI: Interactive React Frontend with Budget Slider and real-time updates.
Goal: Polish and Submission.
- Gemini Live API: Real-time active reasoning.
- Documentation: Architecture diagrams, Demo video script, Final submission text.
- Optimization: Latency reduction, Error handling, End-to-End testing.
- Hardware: Gemini 3 reasoning with Nano Banana Pro.
- Synthesized Video: Gemini Live API to synthesize live video for real-time recommendations.
- Voice Chat: Bi-directional voice recognition in the chat interface.
Built for the Gemini 3 Hackathon.



