[Feature] Migrate to pgvector + graph-backed relationship model for AI-ready data layer #91
Open
Labels
component:api, component:database, priority:high, status:ready, type:feature
Description
User Story
As a backend developer or AI engineer, I want the Elder data layer to use pgvector for embedding storage and a graph-aware relationship model so that discovery data can be queried semantically, relationships traversed efficiently, and the backend easily consumed by AI agents via MCP.
Background
Currently Elder stores infrastructure relationships (entity→networking→services→dependencies) in flat relational tables with explicit FK joins. This works for tabular queries but is:
- Inefficient for multi-hop relationship traversal (e.g., "which workloads share this VPC?")
- Incompatible with semantic/vector search
- Not AI-agent-friendly — no native MCP surface for LLM consumption
Proposed Changes
1. pgvector Extension
- Enable pgvector in the PostgreSQL deployment (Helm + Kustomize)
- Add embedding vector(1536) columns to key tables: entities, networking_resources, services, identities
- Store embeddings generated from resource metadata (name, tags, type, provider, region)
- Add an ivfflat or hnsw index per embedding column for ANN search
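As a rough sketch of what the schema changes above could look like, the helper below only generates the DDL strings and does not execute them. The index naming scheme, the choice of the cosine operator class (vector_cosine_ops), and the helper names vector_ddl and all_ddl are assumptions for illustration, not part of this issue.

```python
from typing import List

EMBEDDING_DIM = 1536  # dimension stated in this issue
TABLES = ["entities", "networking_resources", "services", "identities"]


def vector_ddl(table: str, dim: int = EMBEDDING_DIM, index_type: str = "hnsw") -> List[str]:
    """Return DDL adding an embedding column and an ANN index to one table."""
    if index_type not in ("hnsw", "ivfflat"):
        raise ValueError(f"unsupported index type: {index_type}")
    # Hypothetical naming scheme; adjust to the project's conventions.
    index_name = f"ix_{table}_embedding_{index_type}"
    return [
        f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS embedding vector({dim})",
        # Assumption: cosine distance is the metric, hence vector_cosine_ops.
        f"CREATE INDEX IF NOT EXISTS {index_name} ON {table} "
        f"USING {index_type} (embedding vector_cosine_ops)",
    ]


def all_ddl(index_type: str = "hnsw") -> List[str]:
    """Return the extension statement followed by per-table DDL."""
    statements = ["CREATE EXTENSION IF NOT EXISTS vector"]
    for table in TABLES:
        statements.extend(vector_ddl(table, index_type=index_type))
    return statements


if __name__ == "__main__":
    for statement in all_ddl():
        print(statement + ";")
```

In an Alembic migration these statements would typically be passed to op.execute() one at a time; an ivfflat index is usually built after the backfill, since it derives its clusters from existing rows.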
2. Graph Relationship Layer
- Add a relationships table: (id, src_id, src_type, dst_id, dst_type, rel_type, weight, metadata jsonb)
- Populate from existing network_entity_mappings and dependencies tables during migration
- Use recursive CTEs (or pgvector cosine distance + graph walk) for multi-hop traversal
- Expose graph queries via a RelationshipService in apps/api/services/
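The recursive-CTE traversal could be sketched as follows. To keep the example self-contained it runs against an in-memory SQLite database as a stand-in for PostgreSQL, stores metadata as text rather than jsonb, and uses made-up sample rows; the traverse function here is a free function for illustration, not the final RelationshipService method.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; both support WITH RECURSIVE.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relationships ("
    "id INTEGER PRIMARY KEY, src_id TEXT, src_type TEXT, "
    "dst_id TEXT, dst_type TEXT, rel_type TEXT, weight REAL, metadata TEXT)"
)
# Hypothetical sample data: a VPC containing two services, one with dependencies.
conn.executemany(
    "INSERT INTO relationships "
    "(src_id, src_type, dst_id, dst_type, rel_type, weight, metadata) "
    "VALUES (?, ?, ?, ?, ?, 1.0, '{}')",
    [
        ("vpc-1", "network", "svc-a", "service", "contains"),
        ("vpc-1", "network", "svc-b", "service", "contains"),
        ("svc-a", "service", "db-1", "service", "depends_on"),
        ("db-1", "service", "disk-1", "storage", "uses"),
    ],
)

# Walk outgoing edges from the start node, stopping at the requested depth.
# UNION (not UNION ALL) drops duplicate rows, and the depth bound guarantees
# termination even if the graph contains cycles.
TRAVERSE_SQL = """
WITH RECURSIVE walk(node_id, depth) AS (
    SELECT ?, 0
    UNION
    SELECT r.dst_id, w.depth + 1
    FROM relationships AS r
    JOIN walk AS w ON r.src_id = w.node_id
    WHERE w.depth < ?
)
SELECT node_id, MIN(depth) AS depth
FROM walk
GROUP BY node_id
ORDER BY depth, node_id
"""


def traverse(src_id: str, depth: int = 2) -> list:
    """Return reachable nodes with their shortest hop count from src_id."""
    rows = conn.execute(TRAVERSE_SQL, (src_id, depth)).fetchall()
    return [{"id": node_id, "depth": hops} for node_id, hops in rows]


if __name__ == "__main__":
    print(traverse("vpc-1", depth=2))
```

With the sample rows, a depth-2 walk from vpc-1 reaches svc-a, svc-b, and db-1 but not disk-1, which is three hops away. On PostgreSQL the placeholders would follow the driver's style, and the seed value may need an explicit cast.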
3. Migration Path
- Alembic migration to add vector columns + relationships table
- Backfill script to seed relationships from existing junction tables
- Keep existing tables intact (additive migration, no breaking changes)
- New DB_ENABLE_VECTOR=true env var to gate pgvector usage (graceful fallback if extension absent)
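One possible shape for the DB_ENABLE_VECTOR gate is sketched below. The set of accepted truthy strings, the default of "disabled" when the variable is unset, and the function names are assumptions; the extension check itself is left to the caller, for example by running the query in EXTENSION_CHECK_SQL.

```python
import os
from typing import Mapping, Optional

# Query a caller could run to detect pgvector; returns a row if installed.
EXTENSION_CHECK_SQL = "SELECT 1 FROM pg_extension WHERE extname = 'vector'"

# Assumption: these spellings count as "enabled"; anything else is disabled.
_TRUTHY = {"1", "true", "yes", "on"}


def vector_requested(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if DB_ENABLE_VECTOR asks for vector features."""
    env = os.environ if env is None else env
    return env.get("DB_ENABLE_VECTOR", "false").strip().lower() in _TRUTHY


def vector_enabled(extension_installed: bool,
                   env: Optional[Mapping[str, str]] = None) -> bool:
    """Enable vector features only when requested AND the extension exists.

    Returning False instead of raising is the graceful fallback: callers
    can skip semantic search and keep serving relational queries.
    """
    return vector_requested(env) and extension_installed


if __name__ == "__main__":
    print(vector_enabled(True, {"DB_ENABLE_VECTOR": "true"}))
    print(vector_enabled(False, {"DB_ENABLE_VECTOR": "true"}))
```

Passing the environment as a mapping keeps the check easy to unit test without mutating os.environ.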
4. AI Agent Readiness
- RelationshipService.search_similar(embedding, k=10): semantic nearest-neighbor lookup
- RelationshipService.traverse(src_id, depth=2): graph walk returning subgraph
- Both methods return structured JSON suitable for MCP tool responses
- Pairs with the dedicated MCP server (see companion issue)
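To illustrate the intended behaviour of search_similar, here is a brute-force, in-memory sketch that ranks stored vectors by cosine distance and returns JSON. The store argument is a hypothetical stand-in for the database, and the response shape ("results", "id", "distance") is an assumption; the real method would delegate ranking to pgvector, roughly as in the untested PGVECTOR_SQL string.

```python
import json
import math
from typing import Dict, Sequence

# Untested sketch of the equivalent pgvector query; <=> is pgvector's
# cosine distance operator. Table and column names follow this issue.
PGVECTOR_SQL = (
    "SELECT id, embedding <=> %s::vector AS distance "
    "FROM entities ORDER BY distance LIMIT %s"
)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def search_similar(embedding: Sequence[float],
                   store: Dict[str, Sequence[float]],
                   k: int = 10) -> str:
    """Return the k nearest stored ids as a JSON string, closest first."""
    ranked = sorted(
        (cosine_distance(embedding, vector), key) for key, vector in store.items()
    )[:k]
    return json.dumps(
        {"results": [{"id": key, "distance": round(dist, 6)} for dist, key in ranked]}
    )


if __name__ == "__main__":
    demo = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(search_similar([1.0, 0.0], demo, k=2))
```

A brute-force version like this can also serve as the reference implementation in the integration test, checking that the indexed query returns the same ordering on a small data set.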
Acceptance Criteria
- pgvector extension enabled in postgres Helm chart and Kustomize overlay
- relationships table created via Alembic migration
- Embedding columns added to entities, networking_resources, services, identities
- ivfflat/hnsw index created on each embedding column
- RelationshipService implemented with search_similar() and traverse() methods
- Backfill migration populates relationships from existing junction tables
- DB_ENABLE_VECTOR env var controls vector feature activation
- Unit tests for RelationshipService (≥90% coverage)
- Integration test: embed → store → search_similar returns correct results
- No regression on existing discovery endpoints
- Linting passes (flake8, mypy --strict, black)
- Security scan passes
Notes
- pgvector version: 0.7.x (supports HNSW)
- Embedding model: defer to caller (openai, local ollama, or WaddleAI) — service accepts pre-computed vectors
- Companion issue: MCP server for relationship/info lookups
- Reference: https://github.com/pgvector/pgvector