Your mind, rendered in 3D. Live voice. Live camera. Live screen.
Turn everything you hear, see, and say into a living, walkable 3D world of memory.
We forget things. All the time. Not in some big philosophical way. In the most basic, embarrassing way. What did you eat yesterday? What was that person's name at the conference? What did your manager say two Fridays back?
The problem isn't capture. We have more capture tools than ever. The problem is retrieval. Our memories are flat, unsearchable, disconnected from each other.
Rayan fixes retrieval.
[Image: Full palace view - walkable 3D rooms with glowing artifact holograms, crystal orbs, and framed screenshots on the walls]
Rayan is a voice-first AI memory system that transforms your knowledge into an explorable 3D Memory Palace. Two persistent Gemini Live voice agents run simultaneously in the background:
- CaptureAgent - silently co-listens to your microphone and screen in real time, autonomously extracting key concepts and placing them as 3D artifacts inside themed rooms - no manual input required.
- RecallAgent - a persistent voice companion inside your 3D palace. Speak naturally, walk through your memories, and get instant, hallucination-free answers grounded exclusively in what you've actually captured.
The palace is not a metaphor. It's a fully rendered Three.js 3D environment you navigate in first-person - rooms, walls, glowing objects, and all.
[GIF: Side-by-side of Capture mode extracting a concept from a lecture → artifact appearing in 3D palace in real time]
Two-way audio streaming with Gemini Live API (gemini-live-2.5-flash-native-audio). The CaptureAgent listens passively alongside you - in meetings, lectures, podcasts, or conversations - and silently distills what matters.
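To make the streaming granularity concrete, here is a dependency-free sketch of how raw browser audio could be sliced before each send to a Live session. The 16 kHz / 16-bit mono format matches what Gemini Live expects for input audio; the 20 ms chunk size is an illustrative choice, not a Live API requirement, and `pcm_chunks` is a hypothetical helper, not the project's actual code:

```python
SAMPLE_RATE = 16000      # Hz - Gemini Live input audio rate
BYTES_PER_SAMPLE = 2     # 16-bit mono PCM
CHUNK_MS = 20            # milliseconds per chunk (illustrative)

def pcm_chunks(pcm: bytes) -> list[bytes]:
    """Slice a raw PCM buffer into small chunks suitable for streaming."""
    step = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000  # 640 bytes
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]

# One second of silence -> fifty 640-byte chunks
chunks = pcm_chunks(b"\x00" * SAMPLE_RATE * BYTES_PER_SAMPLE)
print(len(chunks))  # 50
```

Each chunk would then be forwarded to the session as a PCM blob (`audio/pcm;rate=16000`).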
Your memories are not a list. They are a fully navigable 3D environment built with Three.js and React Three Fiber. Walk room to room in first-person, approach artifacts, and let spatial memory do what it was built to do.
[Image: First-person view inside a room - glowing hologram artifacts, crystal orbs, framed image artifacts on the walls]
Captured knowledge is typed and visually rendered as one of 16 distinct 3D objects:
| Type | 3D Visual |
|---|---|
| Concept | Glowing hologram panel |
| Quote | Speech bubble |
| Formula | Crystal orb |
| Screenshot | Framed image on the wall |
| Person | Character model |
| Book | 3D book |
| Code | Terminal display |
| ...and 9 more | Unique 3D models per type |
Every Recall answer is grounded by text-embedding-005 - 768-dimensional cosine similarity search over your stored memories. The system prompt enforces citation. Rayan cannot invent information that isn't in your palace.
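A minimal, dependency-free sketch of that grounding math (the artifact dicts, toy 3-dim vectors, and helper names here are illustrative stand-ins for the real 768-dim `text-embedding-005` vectors, not the project's actual code):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_memories(query_emb, artifacts, k: int = 8):
    """Rank stored artifacts by similarity to the query embedding."""
    scored = [(cosine_similarity(query_emb, a["embedding"]), a) for a in artifacts]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [a for _, a in scored[:k]]

# Toy 3-dim example standing in for real 768-dim embeddings
artifacts = [
    {"summary": "Fourier transform", "embedding": [1.0, 0.0, 0.0]},
    {"summary": "French cuisine",    "embedding": [0.0, 1.0, 0.0]},
]
best = top_k_memories([0.9, 0.1, 0.0], artifacts, k=1)
print(best[0]["summary"])  # Fourier transform
```

The top-k results are what gets injected into the system prompt as grounding context.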
Both the Capture and Recall agents have access to Gemini's built-in `google_search` tool - no external API key required. When a user asks about something not yet in their palace, the agent queries the live web mid-session and injects verified facts directly into its spoken response. This is native Gemini grounding, not a wrapper around a third-party search API.
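Enabling this is pure configuration in the google-genai SDK. A config-only sketch (the wiring is an assumption based on the SDK's public types, not the project's actual source):

```python
from google.genai import types

# Built-in Google Search grounding for a Live session - no API key needed
live_config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    tools=[types.Tool(google_search=types.GoogleSearch())],
)
```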
Both agents use enable_affective_dialog=True. Rayan modulates its vocal tone, pacing, and empathy in real time based on how you sound - matching your energy when you're excited, staying quiet when you're focused.
When the CaptureAgent sees a compelling slide or diagram on your screen, it autonomously calls take_screenshot, uploads to Cloud Storage, and places it as a framed visual artifact directly on your palace wall.
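A sketch of how such a tool could be declared for the Live session with the google-genai SDK. The `caption` parameter is a hypothetical example; the project's actual declaration may differ:

```python
from google.genai import types

# Hypothetical declaration of the take_screenshot tool the CaptureAgent calls
take_screenshot = types.FunctionDeclaration(
    name="take_screenshot",
    description="Capture the current screen frame and store it as a framed wall artifact.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={
            "caption": types.Schema(
                type=types.Type.STRING,
                description="Short label shown under the framed artifact",
            ),
        },
    ),
)
```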
Ask Rayan to `synthesize_room` and gemini-2.5-flash-image generates a creative visual summary of all memories in that room - a styled, beautiful mind map image rendered live on the 3D wall. Each synthesis is unique to the room's theme: warm parchment for a Library, holographic panels for a Lab, painterly brushstrokes for a Gallery. It's not a diagram - it's a work of art that captures the shape of your knowledge.
[Image: A synthesized AI mind map rendered on the wall of a 3D palace room]
New captures are cosine-compared against everything saved this session. Near-duplicates (≥ 0.90 similarity) are merged, not duplicated - your palace stays clean.
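In pure Python, the merge rule might look like this sketch (keeping the longer summary on merge is an illustrative policy, not necessarily the project's; `similarity` is any cosine-similarity function):

```python
DEDUP_THRESHOLD = 0.90  # near-duplicates at or above this are merged

def upsert_artifact(new, session_artifacts, similarity):
    """Merge `new` into an existing session artifact if cosine similarity
    >= 0.90, otherwise append it as a fresh artifact."""
    for existing in session_artifacts:
        if similarity(new["embedding"], existing["embedding"]) >= DEDUP_THRESHOLD:
            # Illustrative merge policy: keep the more detailed summary
            existing["summary"] = max(existing["summary"], new["summary"], key=len)
            return existing
    session_artifacts.append(new)
    return new
```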
The RecallAgent handles mid-sentence interruptions gracefully via Gemini Live's built-in VAD and the interrupted server event. Natural conversation, not a rigid Q&A.
The Memory Architect (gemini-2.5-flash) automatically categorizes and clusters captured concepts into themed rooms - no manual organization needed. Your palace structures itself.
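A sketch of the categorization step (`client` is duck-typed here to stand in for a google-genai `Client`; the prompt wording and function shape are illustrative, not the project's actual Memory Architect):

```python
def categorize(client, concept: str, existing_rooms: list[str]) -> str:
    """Ask gemini-2.5-flash which themed room a new concept belongs to,
    reusing an existing room when one fits. `client` is a google-genai
    Client (duck-typed so this sketch has no hard SDK dependency)."""
    prompt = (
        f"Concept: {concept}\n"
        f"Existing rooms: {', '.join(existing_rooms) or '(none)'}\n"
        "Reply with exactly one room name, reusing an existing room when it fits."
    )
    response = client.models.generate_content(
        model="gemini-2.5-flash", contents=prompt
    )
    return response.text.strip()
```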
Screen data is analyzed in real time and only explicitly chosen screenshots are stored. No passive recording of your screen. You control what enters your palace.
The entire GCP stack - Cloud Run, Firestore, Cloud Storage, Vertex AI Vector Search, Firebase Hosting, service accounts, IAM - is provisioned by a single terraform apply.
- GCP project with billing enabled
- APIs enabled: Cloud Run, Firestore, Cloud Storage, Vertex AI, Firebase
- Tools: `gcloud` CLI, `terraform >= 1.9`, `node >= 18`, `python 3.11`, `firebase-tools`
- Authenticated: `gcloud auth application-default login`
```bash
git clone <repo-url>
cd rayan
export PROJECT_ID=your-gcp-project-id
gcloud config set project $PROJECT_ID

cd infrastructure/terraform
terraform init
terraform apply \
  -var="project_id=$PROJECT_ID" \
  -var="backend_image=gcr.io/$PROJECT_ID/rayan-backend:latest"
```

This provisions Cloud Run, Firestore, Cloud Storage, Vertex AI Vector Search, service accounts, and IAM - everything.
```bash
cd backend
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

cat > .env << EOF
GOOGLE_CLOUD_PROJECT=$PROJECT_ID
MEDIA_BUCKET=rayan-media-$PROJECT_ID
EOF

# Local dev
uvicorn main:app --reload --port 8000

# Deploy to Cloud Run
gcloud builds submit --tag gcr.io/$PROJECT_ID/rayan-backend .
gcloud run deploy rayan-backend \
  --image gcr.io/$PROJECT_ID/rayan-backend \
  --region us-central1 \
  --allow-unauthenticated \
  --session-affinity \
  --set-env-vars GOOGLE_CLOUD_PROJECT=$PROJECT_ID,MEDIA_BUCKET=rayan-media-$PROJECT_ID
```

```bash
cd frontend
npm install

BACKEND_URL=$(gcloud run services describe rayan-backend \
  --region us-central1 --format='value(status.url)')

cat > .env.local << EOF
VITE_WS_URL=wss://${BACKEND_URL#https://}/ws
VITE_API_URL=$BACKEND_URL
EOF

npm run build
firebase deploy --only hosting
```

Navigate to your Firebase Hosting URL, start a Capture session, and speak - your 3D palace builds itself.
Rayan isn't a notes app. It's a persistent second brain you can walk through and talk to. Here are real ways it fits into your life:
- Lecture companion - run Capture in the background during any lecture or online course; your palace auto-fills with typed, searchable concepts as you listen
- Textbook synthesis - read aloud or screen-share; Rayan extracts and clusters key ideas into rooms by topic
- Exam prep - walk your palace before a test and ask Recall to quiz you on any room
- Research rabbit holes - capture browser tabs, articles, and papers across a session; Recall surfaces connections between them
- Language learning - capture vocabulary in context; Rayan stores definitions and example sentences as typed artifacts
- Spaced repetition - revisit your palace daily; spatial + voice recall is more durable than flashcards
- Meeting memory - run Capture during any meeting; action items, decisions, and names are auto-extracted
- Client onboarding - capture everything discussed in the first week; Recall knows your client's context as well as you do
- Engineering context - capture technical decisions and architecture discussions; Recall answers "why did we do it this way?" months later
- One-on-ones - build a room per direct report; Recall surfaces what you discussed last time before every meeting
- Competitive research - capture competitor analysis sessions; your palace clusters insights by company automatically
- Legal / compliance - capture meeting notes and decisions with a traceable, grounded memory chain
- Writing research - capture sources, quotes, and ideas; Recall helps you cite and cross-reference while you write
- Worldbuilding - capture lore, character decisions, and plot threads; Recall keeps your fictional world consistent
- Brainstorming - capture every idea in a session; Recall finds patterns and connections across your messy ideation
- Podcast prep - capture notes and research across multiple sessions; walk your palace before recording
- Travel planning - capture recommendations, itineraries, and research; Recall answers "what was that restaurant someone mentioned?"
- Medical - capture doctor conversations and health research; Recall gives you grounded answers from your own notes
- Relationships - capture birthdays, preferences, and conversations; Recall makes sure you remember what matters to people
- Learning a skill - capture instructional content across weeks; your palace builds a structured curriculum automatically
- Podcast and book ideas - capture everything you consume; Recall surfaces relevant memories when you need them
- Mid-conversation memory saves - Recall can save new memories during a voice conversation without leaving the palace
- Cross-session context - everything persists; your palace from six months ago is fully searchable today
- Room synthesis - generate an AI mind map of any room on demand to visualize the shape of your knowledge
- Real-time palace updates - as Capture runs, new 3D artifacts appear in your palace live, no refresh needed
[Image: Composite showing Capture mode (left) and Recall voice session inside the 3D palace (right)]
```
┌─────────────────────────────────────────────┐
│               User's Browser                │
│     React + Three.js (Firebase Hosting)     │
│      Capture UI / 3D Palace / Voice UI      │
└──────────────────────┬──────────────────────┘
                       │ WebSocket /ws/{userId}
┌──────────────────────▼──────────────────────┐
│             Cloud Run - FastAPI             │
│                                             │
│   ┌─────────────────────────────────────┐   │
│   │ CaptureAgent                        │   │
│   │ gemini-live-2.5-flash-native-audio  │   │
│   │ enable_affective_dialog=True        │   │
│   └──────────────────┬──────────────────┘   │
│   ┌──────────────────▼──────────────────┐   │
│   │ RecallAgent                         │   │
│   │ gemini-live-2.5-flash-native-audio  │   │
│   │ enable_affective_dialog=True        │   │
│   └──────────────────┬──────────────────┘   │
│   ┌──────────────────▼──────────────────┐   │
│   │ Memory Architect                    │   │
│   │ gemini-2.5-flash                    │   │
│   │ categorize + cluster rooms          │   │
│   └─────────────────────────────────────┘   │
└──────────────────────┬──────────────────────┘
                       │
┌──────────────────────▼──────────────────────┐
│                Google Cloud                 │
│                                             │
│   ┌─────────────────────────────────────┐   │
│   │ Firestore                           │   │
│   │ users/{id}/rooms/                   │   │
│   │ users/{id}/rooms/artifacts          │   │
│   │ (embeddings stored inline)          │   │
│   └─────────────────────────────────────┘   │
│   ┌─────────────────────────────────────┐   │
│   │ Vertex AI text-embedding-005        │   │
│   │ Semantic search grounding           │   │
│   │ 768-dim cosine similarity           │   │
│   └─────────────────────────────────────┘   │
│   ┌─────────────────────────────────────┐   │
│   │ Cloud Storage                       │   │
│   │ Screenshots / AI mind map images    │   │
│   └─────────────────────────────────────┘   │
└─────────────────────────────────────────────┘
```
```mermaid
graph TD
    Browser["Browser\nReact + Three.js\nFirebase Hosting"]
    subgraph CloudRun["Cloud Run - FastAPI Backend"]
        WS["WebSocket Handler\n/ws/{userId}"]
        CA["CaptureAgent\ngemini-live-2.5-flash-native-audio\nenable_affective_dialog=true"]
        RA["RecallAgent\ngemini-live-2.5-flash-native-audio\nenable_affective_dialog=true"]
        MA["Memory Architect\ngemini-2.5-flash\ncategorize + cluster rooms"]
        SS["Semantic Search\ntext-embedding-005\n768-dim cosine grounding"]
    end
    subgraph GCP["Google Cloud"]
        FS["Firestore\nrooms / artifacts / embeddings"]
        GCS["Cloud Storage\nscreenshots / mind maps"]
        VAI["Vertex AI\ntext-embedding-005"]
    end
    Browser <-->|"WebSocket\naudio chunks, video frames\npalace_update events"| WS
    WS --> CA
    WS --> RA
    CA --> MA
    RA --> SS
    SS --> VAI
    VAI --> SS
    MA --> FS
    MA --> GCS
    FS --> SS
    CA --> GCS
```
CA --> GCS
Every Recall session is semantically grounded before Rayan speaks a single word:
- On session start, `_retrieve_context()` embeds your current artifact summary via `text-embedding-005`
- It runs cosine similarity search across every stored artifact embedding in Firestore
- The top-8 most semantically relevant memories are injected into the live system prompt under `MEMORIES:`
- The system prompt enforces: "ONLY use information from the provided MEMORIES section. NEVER hallucinate or invent information. Cite which artifact/room the information comes from."
- On every room navigation and artifact highlight, `update_context()` re-runs the search and injects fresh memories mid-conversation via `send_client_content` - no reconnection needed
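The injected section might be assembled like this sketch (the field names and exact formatting are illustrative, not the project's actual prompt builder):

```python
def build_memories_block(memories: list[dict]) -> str:
    """Format the top-k retrieved artifacts into the MEMORIES: section
    that grounds the RecallAgent's system prompt."""
    lines = ["MEMORIES:"]
    for m in memories:
        # Cite room and artifact type so the agent can attribute answers
        lines.append(f"- [{m['room']}/{m['type']}] {m['summary']}")
    lines.append(
        "ONLY use information from the provided MEMORIES section. "
        "NEVER hallucinate or invent information."
    )
    return "\n".join(lines)
```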
Both CaptureAgent and RecallAgent use `enable_affective_dialog=True`:

```python
config = genai_types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    enable_affective_dialog=True,  # Rayan adapts to your emotional tone
    system_instruction=system_prompt,
    ...
)
```

Gemini naturally adjusts its vocal tone, pacing, and empathy based on your emotional cues. When you sound excited, Rayan matches that energy. When you sound tired, it stays quieter. This makes Rayan feel like a genuine presence, not a tool.
| Layer | Technology |
|---|---|
| Frontend | TypeScript 5, React 18, Three.js, @react-three/fiber |
| 3D Engine | Three.js - first-person navigation, 16 artifact types, procedural rooms |
| Backend | Python 3.11, FastAPI, WebSockets |
| AI - Live Agents | Gemini Live API (gemini-live-2.5-flash-native-audio) |
| AI - Categorization | gemini-2.5-flash |
| AI - Creative Synthesis | gemini-2.5-flash-image - generates styled mind map images to visually summarize room memories |
| AI - Semantic Grounding | Vertex AI text-embedding-005 (768-dim cosine similarity) |
| SDK | Google GenAI SDK (google-genai), Google ADK (google-adk) |
| Database | Cloud Firestore |
| Storage | Cloud Storage (screenshots + mind maps) |
| Hosting | Cloud Run (backend, session affinity), Firebase Hosting (frontend) |
| Infrastructure | Terraform (infrastructure/terraform/) - single apply |
```
rayan/
├── backend/
│   ├── app/
│   │   ├── agents/
│   │   │   ├── capture_agent.py      # Gemini Live capture session
│   │   │   ├── recall_agent.py       # Gemini Live recall session
│   │   │   ├── memory_architect.py   # Categorization + room clustering
│   │   │   └── tools/tools.py        # Tool declarations for both agents
│   │   ├── services/
│   │   │   ├── search_service.py     # Semantic search (Vertex AI embeddings)
│   │   │   ├── embedding_service.py  # text-embedding-005 via Vertex AI
│   │   │   ├── synthesis_service.py  # AI mind map generation
│   │   │   ├── room_service.py       # Room CRUD
│   │   │   └── artifact_service.py   # Artifact CRUD
│   │   ├── websocket/
│   │   │   └── handlers.py           # WebSocket event router
│   │   └── core/
│   │       └── gemini.py             # GenAI client (Vertex AI backend)
│   ├── requirements.txt
│   └── main.py
├── frontend/
│   └── src/
│       ├── components/               # React components (Palace, Capture, Voice)
│       ├── hooks/                    # useCapture, useWS, useAmbientMusic
│       └── pages/PalacePage.tsx
├── infrastructure/
│   └── terraform/
│       └── main.tf                   # Full GCP infrastructure as code
└── README.md
```
| Variable | Description |
|---|---|
| `GOOGLE_CLOUD_PROJECT` | GCP project ID |
| `MEDIA_BUCKET` | Cloud Storage bucket for screenshots and mind maps |
| `GOOGLE_APPLICATION_CREDENTIALS` | Path to service account JSON (local dev only; Cloud Run uses attached SA) |
| (no key needed) | Web search uses Gemini's built-in `google_search` grounding tool via Vertex AI - no Custom Search API required |
```bash
# Backend
cd backend && source venv/bin/activate
uvicorn main:app --reload --port 8000
ruff check . && ruff format .

# Frontend
cd frontend
npm run dev
npm run lint
npm test
```

| Field | Value |
|---|---|
| Challenge | Gemini Live Agent Challenge - #GeminiLiveAgentChallenge |
| Category | Live Agents |
| Mandatory tech | Gemini Live API (gemini-live-2.5-flash-native-audio), Google GenAI SDK, Google ADK, Cloud Run |
| Image synthesis | gemini-2.5-flash-image - creative visual summaries of room memories rendered as 3D palace wall art |
| Google Cloud services | Cloud Run, Firestore, Cloud Storage, Vertex AI (embeddings + Vector Search), Firebase Hosting |
| Infrastructure | Terraform - fully automated, single terraform apply |
| Developer | g.dev/yelnady |
Built with the Gemini Live API - Google GenAI SDK - Google ADK - Cloud Run - Vertex AI - Terraform
A 3D memory palace that listens, remembers, and speaks back.