Overview
This issue tracks planned improvements and next steps for ProxySQL's GenAI, MCP (Model Context Protocol), and RAG (Retrieval-Augmented Generation) subsystems. The current architecture provides a solid foundation — this roadmap outlines actionable work to bring each component to full production readiness.
Current Architecture Summary
The GenAI/MCP/RAG system is built around four main components, including the AI_Features_Manager and the MCP server with its endpoint map (e.g. /mcp/rag).
Configuration Variables
The system exposes ~48 genai-* variables and ~15 mcp-* variables via the admin interface, covering:
MCP server settings (port, SSL, per-endpoint auth)
Hybrid routing preferences (local vs cloud, budget limits)
Roadmap Items
1. Complete LLM Semantic Cache
Priority: High | Effort: Medium
The LLM_Bridge has a semantic cache design where similar queries can return cached LLM responses instead of making redundant API calls. The infrastructure exists (vector DB via sqlite-vec, similarity threshold configuration, cache statistics counters) but the core operations need implementation:
check_cache() — vector similarity lookup against previous queries
store_in_cache() — store query embedding + response after successful LLM call
clear_cache() — cache eviction (by age, size, or manual)
get_cache_stats() — expose hit rate, size, avg lookup time
The genai-llm_cache_enabled and genai-llm_cache_similarity_threshold variables are already defined and wired through the admin interface.
Acceptance criteria:
Cache reduces duplicate LLM calls by >80% for repeated/similar queries
Cache hit/miss metrics visible in stats_genai_* tables
Cache can be cleared at runtime via admin command
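The four cache operations above could look roughly like the following sketch. This is an illustrative in-memory Python version, not the real C++/sqlite-vec implementation; the method names mirror the list above, while the eviction policy and default threshold are assumptions:

```python
import math
import time

class SemanticCache:
    """Illustrative semantic cache; the real design stores vectors in sqlite-vec."""

    def __init__(self, similarity_threshold=0.92, max_entries=1000):
        self.threshold = similarity_threshold   # maps to genai-llm_cache_similarity_threshold
        self.max_entries = max_entries
        self.entries = []                       # list of (embedding, response, created_at)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def check_cache(self, embedding):
        # Vector similarity lookup against previous queries.
        best, best_sim = None, 0.0
        for emb, response, _ in self.entries:
            sim = self._cosine(embedding, emb)
            if sim > best_sim:
                best, best_sim = response, sim
        if best is not None and best_sim >= self.threshold:
            self.hits += 1
            return best
        self.misses += 1
        return None

    def store_in_cache(self, embedding, response):
        # Store query embedding + response after a successful LLM call.
        if len(self.entries) >= self.max_entries:
            self.entries.pop(0)                 # size-based eviction: drop oldest
        self.entries.append((embedding, response, time.time()))

    def clear_cache(self, max_age_s=None):
        # Manual or age-based eviction.
        if max_age_s is None:
            self.entries.clear()
        else:
            cutoff = time.time() - max_age_s
            self.entries = [e for e in self.entries if e[2] >= cutoff]

    def get_cache_stats(self):
        total = self.hits + self.misses
        return {"size": len(self.entries),
                "hit_rate": self.hits / total if total else 0.0}
```

A production version would replace the linear scan with a sqlite-vec k-nearest-neighbour query and track lookup latency alongside hit rate.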
2. Configurable Embedding Dimensions
Priority: High | Effort: Low
The vector dimension is currently set to 1536 (matching OpenAI text-embedding-3-small). Since ProxySQL supports multiple providers including Ollama and local models, the dimension should adapt to the model in use:
Ollama/llama models: typically 4096
OpenAI text-embedding-3-small: 1536
OpenAI text-embedding-3-large: 3072
Cohere: 1024
Various others
Proposed approach:
Auto-detect dimension from the first embedding response
Or allow explicit override via genai-vector_dimension (variable already exists, just needs validation)
Ensure sqlite-vec schema adapts when dimension changes
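The precedence between explicit override and auto-detection could be sketched as follows. This is an assumption about how the pieces would fit together, not existing code; the known-model table below is illustrative and only the dimensions already listed above are included:

```python
class EmbeddingDimensionConfig:
    """Sketch of dimension resolution; genai-vector_dimension is the real variable."""

    KNOWN_DIMS = {
        "text-embedding-3-small": 1536,
        "text-embedding-3-large": 3072,
    }

    def __init__(self, override=None):
        self.override = override   # explicit genai-vector_dimension, if set
        self.detected = None

    def resolve(self, first_embedding=None, model_name=None):
        # Precedence: explicit override > auto-detected > known model default.
        if self.override is not None:
            if self.override <= 0:
                raise ValueError("vector dimension must be positive")
            return self.override
        if first_embedding is not None:
            # Auto-detect from the first embedding response; the sqlite-vec
            # schema would need to be (re)created with this dimension.
            self.detected = len(first_embedding)
            return self.detected
        if model_name in self.KNOWN_DIMS:
            return self.KNOWN_DIMS[model_name]
        raise ValueError("cannot determine embedding dimension")
```

Whichever path wins, changing the resolved dimension after vectors exist implies rebuilding the sqlite-vec table, so the value should be validated once and then pinned.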
3. Anomaly Detection — Embedding-Based Threat Matching
Priority: Medium | Effort: Medium
The Anomaly_Detector currently has a solid multi-stage pipeline:
SQL injection pattern detection (regex)
Suspicious keyword detection
Query normalization and fingerprinting
Per-user rate limiting
Statistical outlier detection
The planned embedding-based threat similarity stage (comparing query embeddings against a known-threats vector database) would add ML-based detection for novel attack patterns that bypass regex rules.
Implementation steps:
Build a threat embeddings database from known SQL injection payloads
On each query, compute embedding and check cosine similarity against threat DB
Integrate with the existing genai-anomaly_similarity_threshold variable
Track false positive rate to tune thresholds
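The per-query check in the steps above reduces to a nearest-neighbour similarity test against the threat database. A minimal sketch (plain Python lists standing in for the vector DB; the threshold would come from genai-anomaly_similarity_threshold):

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def matches_known_threat(query_embedding, threat_db, threshold):
    """Return (matched, best_similarity) against a known-threats vector DB.

    threat_db is a list of embeddings built from known SQL injection payloads.
    """
    best = 0.0
    for threat_emb in threat_db:
        best = max(best, cosine(query_embedding, threat_emb))
    return best >= threshold, best
```

Logging `best_similarity` for every decision, not just blocks, is what makes the false-positive tuning in the last step possible.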
4. Circuit Breaker for External Services
Priority: Medium | Effort: Low
When external services (embedding endpoint, LLM provider, rerank service) are unavailable, every request currently waits for the full timeout before failing. Under load, this causes cascading latency.
Proposed approach:
Track consecutive failures per service endpoint
After N failures (configurable), enter "open" state — fast-fail for a cooldown period
After cooldown, allow one probe request ("half-open" state)
If probe succeeds, return to normal; if fails, extend cooldown
Expose circuit breaker state in stats tables
5. Observability Enhancements
Priority: Medium | Effort: Low
Add a stats_genai_summary view (or populate existing counters) showing:
| Metric | Source |
| --- | --- |
| LLM cache hit rate | llm_cache_hits / llm_total_requests |
| Avg LLM response time | llm_total_response_time_ms / llm_total_requests |
| Local vs cloud model distribution | llm_local_model_calls vs llm_cloud_model_calls |
| Daily cloud spend | daily_cloud_spend_usd |
| Anomaly block rate | anomaly_blocked_queries / anomaly_total_checks |
| RAG search latency (p50/p99) | From ToolUsageStats |
| Embedding service health | Circuit breaker state |
The AI_Features_Manager already has atomic counters for most of these — they just need to be exposed via admin tables and/or Prometheus metrics.
6. Schema-Aware NL2SQL
Priority: Medium | Effort: High
The ai.nl2sql_convert MCP tool converts natural language to SQL via the LLM_Bridge. Its accuracy depends heavily on schema context. ProxySQL is uniquely positioned to provide this because it already knows:
All backend servers (mysql_servers, pgsql_servers)
Schema information via monitor (if enabled)
Query patterns via stats_mysql_query_digest
Table relationships via observed JOINs
Proposed enhancement:
Pull live schema (tables, columns, types) from monitored backends
Include top query patterns as examples in the LLM prompt
Cache schema snapshots to avoid repeated introspection
Support multi-database context (user specifies which hostgroup/schema)
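Assembling the schema context into the LLM prompt could look like the sketch below. The function name, the schema snapshot shape, and the prompt layout are all assumptions for illustration; only the data sources (monitored backends, stats_mysql_query_digest) come from the proposal above:

```python
def build_nl2sql_prompt(question, schema, example_queries, max_examples=3):
    """Assemble an NL2SQL prompt from a cached schema snapshot.

    schema: {table_name: [(column, type), ...]} pulled from monitored backends.
    example_queries: top patterns, e.g. digests from stats_mysql_query_digest.
    """
    lines = ["You translate natural language to SQL.", "", "Schema:"]
    for table, cols in schema.items():
        col_list = ", ".join(f"{c} {t}" for c, t in cols)
        lines.append(f"  {table}({col_list})")
    if example_queries:
        lines.append("")
        lines.append("Representative queries:")
        for q in example_queries[:max_examples]:   # cap prompt growth
            lines.append(f"  {q}")
    lines += ["", f"Question: {question}", "SQL:"]
    return "\n".join(lines)
```

Because the snapshot is a plain dict, it caches naturally, which is exactly the "avoid repeated introspection" point above.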
7. RAG Timeout Tuning
Priority: Low | Effort: Low
The default genai-rag_timeout_ms=2000 (2 seconds) may be tight for hybrid searches that involve FTS + vector similarity + Reciprocal Rank Fusion merging, especially with large document collections.
Actions:
Benchmark RAG search latency at various collection sizes (1K, 10K, 100K documents)
Adjust default timeout based on findings
Consider adding per-operation timeouts (FTS timeout vs vector timeout) rather than a single global timeout
Document recommended timeout values for different deployment sizes
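For reference when benchmarking, the Reciprocal Rank Fusion merge mentioned above is cheap relative to the FTS and vector searches themselves. A minimal sketch (k=60 is the conventional RRF constant, assumed here rather than taken from the codebase):

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked result lists (e.g. FTS and vector search) with RRF.

    Each input list contains doc ids, best match first. Each document scores
    1 / (k + rank) per list it appears in; higher total score ranks first.
    """
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Since the merge is linear in the number of candidates, per-operation timeouts should target the two underlying searches, not the fusion step.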
8. Multi-Model Routing Enhancements
Priority: Low | Effort: Medium
The LLM_Bridge currently selects between local and cloud models based on a simple latency heuristic (prefer local if <500ms). The genai-daily_budget_usd and genai-max_cloud_requests_per_hour variables exist for cost control.
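The current decision logic, combined with the existing cost-control variables, amounts to a small routing function. A sketch under the stated assumptions (argument names mirror genai-daily_budget_usd and genai-max_cloud_requests_per_hour; falling back to local when limits are hit is an assumption, the actual policy may differ):

```python
def choose_model(local_latency_ms, daily_spend_usd, daily_budget_usd,
                 cloud_requests_this_hour, max_cloud_requests_per_hour,
                 local_latency_limit_ms=500):
    """Pick 'local' or 'cloud' for the next LLM request."""
    if local_latency_ms is not None and local_latency_ms < local_latency_limit_ms:
        return "local"   # simple latency heuristic: prefer local if fast enough
    if daily_spend_usd >= daily_budget_usd:
        return "local"   # daily budget exhausted: never route to cloud
    if cloud_requests_this_hour >= max_cloud_requests_per_hour:
        return "local"   # hourly cloud cap reached
    return "cloud"
```

Richer routing (per-task quality tiers, token-cost estimation before dispatch) would slot in between the budget checks and the final cloud decision.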
9. Test Coverage Expansion
Priority: Medium | Effort: Medium
Gaps to address:
RAG search quality tests (precision/recall at various k values)
Multi-provider failover tests (what happens when OpenAI is down?)
Load tests for MCP server under concurrent connections
Tests that verify GenAI features don't break standard proxy behavior (regression)
Some tests currently require external API keys or a GenAI-enabled build. The CI infrastructure should support both modes: with mock endpoints for unit testing, and with real endpoints for integration testing.
10. Local Embedding Support
Priority: Low | Effort: High
Currently, all embedding operations require an external HTTP service (genai-embedding_uri). The sqlite-rembed extension is already linked into the build — it could potentially provide local embedding computation without an external service dependency.
Investigation needed:
Can sqlite-rembed generate embeddings with acceptable quality for the RAG use case?
What models does it support and what are the dimension outputs?
What's the performance profile (latency, memory) compared to an external service?
Could this be an opt-in fallback when no external embedding service is configured?
Related Issues