CapyNodes follows a hybrid architecture combining a robust Django backend with real-time capabilities via Django Channels and a managed database layer using Supabase (PostgreSQL).
- Framework: Django (Python 3.x)
- Real-time: Django Channels + Redis (Channel Layer)
- Database: PostgreSQL (via Supabase)
- Task Processing: Synchronous application code with planned background workers for long-running AI evaluations.
The system utilizes PostgreSQL JSONB fields extensively to handle the dynamic nature of React Flow diagrams and LLM responses.
| Model | Primary Purpose | Key Fields |
|---|---|---|
| Question | Stores system design problems. | constraints (JSON), ideal_solution (JSON) |
| Submission | User diagram submissions. | diagram (JSONB), evaluation_result (JSONB), score |
| CollaborationSession | Manages multi-user sync. | id (UUID/String), diagram_state (JSONB) |
| LLMCallLog | Observability for AI calls. | prompt_tokens, completion_tokens, latency_ms |
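As a concrete illustration of the JSONB payloads in the table above, a `Submission.diagram` might hold a React Flow-style export like the following (the exact node/edge keys are assumptions, not the project's schema):

```python
# Hypothetical shape of a Submission.diagram JSONB payload (React Flow-style).
# Node and edge keys here are illustrative assumptions.
submission_diagram = {
    "nodes": [
        {"id": "lb-1", "type": "load_balancer", "position": {"x": 120, "y": 80}},
        {"id": "db-1", "type": "postgres", "position": {"x": 400, "y": 200}},
    ],
    "edges": [
        {"id": "e1", "source": "lb-1", "target": "db-1"},
    ],
}

# JSONB lets the backend store this dict directly (e.g. via Django's
# JSONField on PostgreSQL) without a fixed schema for node types.
```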
The collaboration system enables multiple users to edit the same diagram simultaneously with low-latency updates and cursor tracking.
Implemented via `CollaborationConsumer` in `consumers.py`.
- Connection: Auth-guarded WebSockets. Users join a room identified by `collab_{session_id}`.
- State Synchronization (`state_update`):
  - When a user modifies the diagram, the frontend sends the updated state.
  - The backend broadcasts this state to all other participants in the room.
  - The state is persisted to the database asynchronously via `database_sync_to_async`.
- Cursor Tracking (`cursor_update`):
  - Rapidly broadcasts `(x, y)` coordinates to all participants.
  - These updates are not persisted to the database to minimize overhead and latency.
To prevent feedback loops and ensure responsiveness:
- Immediate Broadcast: The backend broadcasts updates before (or while) persisting to the DB.
- Sender Filtering: Consumers ignore their own broadcast messages based on `channel_name` filtering.
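A minimal sketch of the sender-filtering rule, assuming each consumer tags outgoing group events with its own `channel_name` (the helper and event-field names here are illustrative, not the actual `CollaborationConsumer` API):

```python
# Each consumer tags its broadcasts with its own channel_name; on receive,
# it drops events carrying that same tag to avoid echoing its own update.
def should_forward(event: dict, own_channel_name: str) -> bool:
    """True if this group event originated from a different participant."""
    return event.get("sender_channel") != own_channel_name


# Example group event for a diagram edit (shape is an assumption):
update_event = {
    "type": "state_update",
    "diagram": {"nodes": [], "edges": []},
    "sender_channel": "specific.abc123",  # set by the sending consumer
}
```

In the real consumer such a check would sit at the top of the `state_update` handler, so the sending socket never receives its own echo while everyone else gets the broadcast immediately.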
The "Judge" system is a multi-stage pipeline designed to provide high-accuracy, consistent scoring of system design diagrams.
The engine uses a three-stage approach to balance speed and accuracy:
- Stage 1: Rule-Based Validation (rule_engine.py)
- Detects structural issues (orphaned nodes, missing connections).
- Checks for anti-patterns (e.g., single point of failure).
- Fast, deterministic scoring (approx. 30% weight).
- Stage 2: LLM Chain-of-Thought (CoT) (llm_evaluator.py)
- Primary Model: Qwen 32B (via Groq) for low-latency reasoning.
- Process: The model first summarizes the architecture, identifies strengths/weaknesses, and then assigns scores.
- Weight: Approx. 70%.
- Stage 3: Score Aggregation (score_aggregator.py)
- Normalizes scores across dimensions (Scalability, Performance, Security, etc.).
- Applies difficulty multipliers based on the question level.
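The three stages above could combine roughly as follows; the ~30/70 weights come from the text, while the function signature, the 0-100 scale, and the multiplier handling are assumptions:

```python
# Illustrative aggregation of the pipeline's stage scores.
# RULE_WEIGHT / LLM_WEIGHT follow the approximate weights stated above;
# everything else (names, 0-100 scale, capping) is an assumption.
RULE_WEIGHT = 0.3
LLM_WEIGHT = 0.7


def aggregate_score(rule_score: float, llm_scores: dict[str, float],
                    difficulty_multiplier: float = 1.0) -> float:
    """Combine stage scores (0-100 scale) into a final weighted score."""
    llm_avg = sum(llm_scores.values()) / len(llm_scores)  # across dimensions
    combined = RULE_WEIGHT * rule_score + LLM_WEIGHT * llm_avg
    return round(min(combined * difficulty_multiplier, 100.0), 2)


final = aggregate_score(
    rule_score=80,
    llm_scores={"scalability": 70, "performance": 75, "security": 65},
    difficulty_multiplier=1.1,
)
```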
To ensure evaluations are stable:
- Graph Normalization: UI-only fields (positions, styles) are stripped. Keys are sorted deterministically.
- Hashing: A SHA256 hash is generated from the normalized JSON.
- Caching: Future submissions with the same hash can bypass re-evaluation if the prompt/model version hasn't changed.
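A sketch of the normalization-and-hash step under these rules; the UI-only field names stripped here (`position`, `style`, etc.) are assumptions about the React Flow payload, and folding the prompt/model version into the hash key is one way to satisfy the cache-invalidation condition:

```python
# Deterministic hashing of a diagram for evaluation caching.
# UI_ONLY_FIELDS is an assumed list of React Flow presentation keys.
import hashlib
import json

UI_ONLY_FIELDS = {"position", "style", "selected", "dragging"}


def normalize_diagram(diagram: dict) -> dict:
    """Drop UI-only fields so visually different but equivalent graphs match."""
    nodes = [
        {k: v for k, v in node.items() if k not in UI_ONLY_FIELDS}
        for node in diagram.get("nodes", [])
    ]
    return {"nodes": nodes, "edges": diagram.get("edges", [])}


def diagram_hash(diagram: dict, prompt_version: str) -> str:
    """SHA256 over sorted-key JSON; the prompt/model version is part of the key."""
    canonical = json.dumps(normalize_diagram(diagram), sort_keys=True)
    return hashlib.sha256(f"{prompt_version}:{canonical}".encode()).hexdigest()
```

Two submissions that differ only in node positions then produce the same hash, while bumping the prompt version changes every hash and forces re-evaluation.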
A dedicated observability service monitors the health and accuracy of the AI judge.
Provides real-time visibility into:
- Quality Metrics: Trends in average fidelity and logical correctness.
- Operational Metrics: P50/P95 latency of Groq/Qwen calls and daily token costs.
- Error Tracking: Visibility into LLM API failures or decoding errors.
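For the latency metrics, P50/P95 can be derived from the `latency_ms` values recorded in `LLMCallLog`; this in-memory sketch uses the standard library (a production dashboard would more likely aggregate in SQL):

```python
# Compute P50/P95 latency from logged LLM call durations.
# Pulling all rows into memory is a simplification for illustration.
import statistics


def latency_percentiles(latencies_ms: list[float]) -> dict[str, float]:
    """Return the 50th and 95th percentile of observed latencies."""
    cuts = statistics.quantiles(latencies_ms, n=100)  # 99 cut points
    return {"p50": cuts[49], "p95": cuts[94]}


# 100 synthetic calls between 100 ms and 199 ms:
sample = [float(ms) for ms in range(100, 200)]
stats = latency_percentiles(sample)
```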
The system maintains a "Golden Set" of curated diagrams with known ideal scores.
- Regression Testing: After any prompt change, the regression suite compares new scores against the baseline.
- Hallucination Monitoring: Judges like Gemini 3 Pro/Flash are used offline to evaluate the primary judge (Qwen) for hallucinations or scoring variance.
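The regression step can be sketched as a simple drift check against the golden baseline; the tolerance value, diagram IDs, and data shapes below are assumptions, not the project's actual suite:

```python
# Flag golden-set diagrams whose score drifted beyond a tolerance
# after a prompt change. Tolerance and IDs are illustrative.
def regression_failures(baseline: dict[str, float],
                        new_scores: dict[str, float],
                        tolerance: float = 5.0) -> list[str]:
    """Return IDs of golden diagrams whose score moved more than tolerance."""
    return [
        diagram_id
        for diagram_id, expected in baseline.items()
        if abs(new_scores.get(diagram_id, 0.0) - expected) > tolerance
    ]


baseline = {"url-shortener": 85.0, "chat-app": 72.0}
after_prompt_change = {"url-shortener": 84.0, "chat-app": 60.0}
drifted = regression_failures(baseline, after_prompt_change)
```

Any non-empty result would block the prompt change until the drift is explained or the baseline is deliberately re-scored.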