Skip to content

[Feature] Migrate to pgvector + graph-backed relationship model for AI-ready data layer #91

@PenguinzTech

Description

@PenguinzTech

User Story

As a backend developer or AI engineer, I want the Elder data layer to use pgvector for embedding storage and a graph-aware relationship model so that discovery data can be queried semantically, relationships traversed efficiently, and the backend easily consumed by AI agents via MCP.

Background

Currently Elder stores infrastructure relationships (entity→networking→services→dependencies) in flat relational tables with explicit FK joins. This works for tabular queries but is:

  • Inefficient for multi-hop relationship traversal (e.g., "which workloads share this VPC?")
  • Incompatible with semantic/vector search
  • Not AI-agent-friendly — no native MCP surface for LLM consumption

Proposed Changes

1. pgvector Extension

  • Enable pgvector in the PostgreSQL deployment (Helm + Kustomize)
  • Add embedding vector(1536) columns to key tables: entities, networking_resources, services, identities
  • Store embeddings generated from resource metadata (name, tags, type, provider, region)
  • Add ivfflat or hnsw index per embedding column for ANN search

2. Graph Relationship Layer

  • Add a relationships table: (id, src_id, src_type, dst_id, dst_type, rel_type, weight, metadata jsonb)
  • Populate from existing network_entity_mappings and dependencies tables during migration
  • Use recursive CTEs (or pgvector cosine distance + graph walk) for multi-hop traversal
  • Expose graph queries via a RelationshipService in apps/api/services/

3. Migration Path

  • Alembic migration to add vector columns + relationships table
  • Backfill script to seed relationships from existing junction tables
  • Keep existing tables intact (additive migration, no breaking changes)
  • New DB_ENABLE_VECTOR=true env var to gate pgvector usage (graceful fallback if extension absent)

4. AI Agent Readiness

  • RelationshipService.search_similar(embedding, k=10) — semantic nearest-neighbor lookup
  • RelationshipService.traverse(src_id, depth=2) — graph walk returning subgraph
  • Both methods return structured JSON suitable for MCP tool responses
  • Pairs with the dedicated MCP server (see companion issue)

Acceptance Criteria

  • pgvector extension enabled in postgres Helm chart and Kustomize overlay
  • relationships table created via Alembic migration
  • Embedding columns added to entities, networking_resources, services, identities
  • ivfflat/hnsw index created on each embedding column
  • RelationshipService implemented with search_similar() and traverse() methods
  • Backfill migration populates relationships from existing junction tables
  • DB_ENABLE_VECTOR env var controls vector feature activation
  • Unit tests for RelationshipService (≥90% coverage)
  • Integration test: embed → store → search_similar returns correct results
  • No regression on existing discovery endpoints
  • Linting passes (flake8, mypy --strict, black)
  • Security scan passes

Notes

  • pgvector version: 0.7.x (supports HNSW)
  • Embedding model: defer to caller (openai, local ollama, or WaddleAI) — service accepts pre-computed vectors
  • Companion issue: MCP server for relationship/info lookups
  • Reference: https://github.com/pgvector/pgvector

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions