
Design: extract GenAI/MCP/RAG stack as a ProxySQL plugin (Plan #2 master doc) #5617

@renecannao


Extract GenAI/MCP/RAG stack as a ProxySQL plugin — design master document

This issue is the master design doc for Plan #2, the follow-on to #5616 (Plan #1, sqlite-rembed removal). Plan #2 extracts ProxySQL's GenAI/MCP/RAG stack out of core into a plugin under plugins/genai/, using the v1 plugin ABI introduced on the ProtocolX branch (#5593, see doc/PLUGIN_API.md).

Plan #2 is too large for a single PR or a single implementation plan. It decomposes into eight sequential sub-plans (Phases A–H), each shippable on its own. This document captures the vision, the architectural facts, the phase breakdown, and the open decisions — so the implementation work can be picked up with full context and without re-deriving anything.

Related work: #5616 (Plan #1), #5593 (ProtocolX / v1 plugin ABI), doc/PLUGIN_API.md, plugins/mysqlx/ reference plugin.


1. Why do this

Today the AI/MCP tier (PROXYSQLGENAI=1, v4.0.x) is built from the same source tree as the stable tier (v3.0.x), selected via compile-time feature flags. That has been fine so far, but it has real costs:

  • Binary bloat. Stable-tier users pay for ~25K lines of GenAI code that never runs (even though much is #ifdef-compiled out, the deps — vector storage, etc. — bleed into the shape of the tree).
  • Ifdef pollution in core files. MySQL_Session.cpp, MySQL_Thread.cpp, PgSQL_Thread.cpp, ProxySQL_Admin.cpp, Admin_Handler.cpp, ProxySQL_Admin_Tables_Definitions.h, proxysql.h, cpp.h, and others have #ifdef PROXYSQLGENAI sites — total count around 40-50 sites across ~12 files. Each one is a small cognitive tax.
  • ~970 lines of GenAI logic inside MySQL_Session.cpp itself (the handler___...___detect_ai_anomaly() block, lines ~3634-4605 pre-refactor). This is a pre-existing layering smell independent of plugin-ification: GenAI-specific anomaly detection code lives inside a core session class, not in Anomaly_Detector.cpp where it belongs.
  • No independent shippability. A user who wants the MCP control surface (expose ProxySQL to their LLM tooling) must take the whole AI tier including outbound LLM calls, vector storage, and anomaly detection. A user who wants just anomaly detection must take MCP and RAG too. The tiers don't match the product journeys.
  • Product-tier packaging is compile-time. If AI tier becomes a plugin, the three ProxySQL tiers (stable / innovative / AI-MCP) become a packaging concern instead of a compile-time concern. proxysql-ai = proxysql + genai.so. Much cleaner.
  • Forcing function for the plugin ABI. The v1 plugin ABI on ProtocolX has exactly one consumer right now (plugins/mysqlx/). The second consumer is a much stronger test of whether the ABI is actually good. GenAI is the right second consumer because its needs (hot-path hooks, admin tables, admin commands, plugin-owned threads) cover most of what any future plugin will want.

Plan #2 is also a prerequisite for a possible future Plan #3 that would split the GenAI plugin into three plugins (mcp.so / genai.so / genai-mcp-tools.so) matching the three user journeys. Plan #2 lays the file-level groundwork so Plan #3 is a git mv plus an inter-plugin service-registry design — not a rewrite.


2. State of the tree after Plan #1

Summary of what exists today (on top of #5616):

Dedicated GenAI files under lib/+include/ (all guarded by #ifdef PROXYSQLGENAI):

| Subsystem | Files | Approx lines |
|---|---|---|
| AI core (cyclic) | GenAI_Thread, AI_Features_Manager, LLM_Bridge, LLM_Clients, Anomaly_Detector, AI_Vector_Storage | ~6.5K |
| MCP protocol | MCP_Thread, MCP_Endpoint, ProxySQL_MCP_Server, MCP_Tool_Handler | ~1.3K |
| MCP tool handlers (AI-free) | Admin_Tool_Handler, Config_Tool_Handler, Cache_Tool_Handler, Stats_Tool_Handler, Observe_Tool_Handler, Query_Tool_Handler, MySQL_Tool_Handler | ~? |
| MCP tool handlers (AI-backed) | AI_Tool_Handler, RAG_Tool_Handler | ~? |
| Schema discovery helpers | Static_Harvester, PgSQL_Static_Harvester, Discovery_Schema, MySQL_Catalog, MySQL_FTS | ~? |

Total dedicated surface: ~30 files, on the order of 15-25K lines depending on how you count headers.

Core files with #ifdef PROXYSQLGENAI touchpoints:

| File | Sites | Nature |
|---|---|---|
| lib/ProxySQL_Admin.cpp | 18 | Admin table + command registration |
| lib/Admin_Handler.cpp | 14 | Command dispatch |
| lib/MySQL_Session.cpp | 8 | 2 hot-path, 1 extern block, ~970-line anomaly detector body |
| lib/MySQL_Thread.cpp | ~7 | Lifecycle init, config defaults, global access |
| lib/PgSQL_Thread.cpp | a few | Similar to MySQL_Thread, less intrusive |
| lib/ProxySQL_Admin_Stats.cpp | a few | Stats table registration |
| lib/Admin_Bootstrap.cpp | a few | Bootstrap sequencing |
| lib/Admin_FlushVariables.cpp | a few | Variable flush |
| Headers: proxysql.h, cpp.h, proxysql_admin.h, ProxySQL_Admin_Tables_Definitions.h | a few | Forward declarations, table DDL macros |

Total ifdef pollution: ~40-50 sites across ~12 files. Of those, only two are in the actual MySQL client-query hot path (both in MySQL_Session.cpp).


3. Design snapshot: the include-graph

I traced the #include relationships between GenAI files. Three facts are load-bearing:

Fact 1 — AI core is cyclic, not splittable

GenAI_Thread ↔ AI_Features_Manager ↔ LLM_Bridge ↔ LLM_Clients
                      ↑
                      ↓
              Anomaly_Detector

These five files are mutually dependent through their #includes, so they form one cohesive unit. Splitting them is not in scope for Plan #2.

Fact 2 — Seven of nine tool handlers are AI-free

MCP_Thread ─► ProxySQL_MCP_Server ─┬─► MCP_Tool_Handler
                                   ├─► Admin_Tool_Handler     (AI-free)
                                   ├─► Config_Tool_Handler    (AI-free)
                                   ├─► Cache_Tool_Handler     (AI-free)
                                   ├─► Stats_Tool_Handler     (AI-free)
                                   ├─► Observe_Tool_Handler   (AI-free)
                                   ├─► Query_Tool_Handler     (AI-free)
                                   ├─► MySQL_Tool_Handler     (AI-free)
                                   ├─► AI_Tool_Handler   ──► LLM_Bridge, Anomaly_Detector, AI_Features_Manager
                                   └─► RAG_Tool_Handler  ──► AI_Features_Manager, Discovery_Schema, GenAI_Thread, LLM_Bridge

The MCP protocol surface (the seven AI-free tool handlers + MCP_Thread + the schema discovery helpers) does not depend on any AI core file. The only thing that welds MCP to AI is this pair of includes in lib/ProxySQL_MCP_Server.cpp:

#include "AI_Tool_Handler.h"
#include "RAG_Tool_Handler.h"

That's it. If you remove those includes (by making the MCP server's tool list runtime-registered instead of compile-time #included), MCP and AI core become cleanly separable.

Fact 3 — Hot-path coupling is narrow

lib/MySQL_Session.cpp has only two hot-path intrusions:

  • Per-query anomaly detection, around line ~5348:
    #ifdef PROXYSQLGENAI
    if (GloAI && GloAI->get_anomaly_detector()) {
        if (handler___status_WAITING_CLIENT_DATA___STATE_SLEEP___MYSQL_COM_QUERY_detect_ai_anomaly()) {
            handler_ret = -1;
            return handler_ret;
        }
    }
    #endif
  • GENAI: query-prefix interception, around line ~7163 (intercepts queries starting with GENAI: before normal routing).

Both become generic plugin-ABI services. Each call site is guarded by a null check, so when no plugin is loaded the cost is a single pointer comparison.
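A minimal sketch of that null-guarded call-site pattern, assuming the hook is a plain C-style function pointer stored in a core-owned slot. The names `pre_query_cb_t`, `g_pre_query_hook`, `register_pre_query_hook`, and `on_client_query` are hypothetical stand-ins, not the Phase A API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical hook type: opaque session handle plus the SQL text.
// A non-zero return value asks the core to reject the query.
using pre_query_cb_t = int (*)(void* session, const char* sql, std::size_t len);

// Core-owned slot; it stays null unless a plugin registers a hook.
static pre_query_cb_t g_pre_query_hook = nullptr;

// Service the core would expose to plugins during init().
void register_pre_query_hook(pre_query_cb_t cb) { g_pre_query_hook = cb; }

// Call site in the session's query path, with no #ifdef. When no plugin is
// loaded this is a single pointer comparison.
// Returns true if the query should be rejected.
bool on_client_query(void* session, const char* sql) {
    assert(sql != nullptr);
    if (g_pre_query_hook == nullptr) return false;
    return g_pre_query_hook(session, sql, std::strlen(sql)) != 0;
}
```

Registering only during plugin init(), before worker threads serve traffic, keeps the plain pointer safe to read without locking; if hooks could change at runtime, the slot would need to be a `std::atomic<pre_query_cb_t>`.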

The ~970-line detect_ai_anomaly implementation block at lines ~3634-4605 is the actual body of the detector. It needs to move out of MySQL_Session.cpp into Anomaly_Detector.cpp proper, with a session-context accessor — this is Phase B.


4. Target architecture (Plan #2 endpoint)

After Plan #2 ships, the tree looks like:

plugins/genai/
├── Makefile
├── README.md
├── include/
│   └── (headers shared among subdirs, via narrow public interfaces only)
├── src/
│   ├── plugin_entry.cpp          # descriptor, init/start/stop, service wiring
│   ├── mcp/                      # future mcp.so candidate
│   │   ├── mcp_thread.cpp
│   │   ├── mcp_endpoint.cpp
│   │   ├── proxysql_mcp_server.cpp
│   │   └── handlers/
│   │       ├── admin_tool_handler.cpp
│   │       ├── config_tool_handler.cpp
│   │       ├── cache_tool_handler.cpp
│   │       ├── stats_tool_handler.cpp
│   │       ├── observe_tool_handler.cpp
│   │       ├── query_tool_handler.cpp
│   │       ├── mysql_tool_handler.cpp
│   │       ├── static_harvester.cpp
│   │       ├── pgsql_static_harvester.cpp
│   │       ├── discovery_schema.cpp
│   │       ├── mysql_catalog.cpp
│   │       └── mysql_fts.cpp
│   ├── ai_core/                  # future genai.so candidate
│   │   ├── genai_thread.cpp
│   │   ├── ai_features_manager.cpp
│   │   ├── llm_bridge.cpp
│   │   ├── llm_clients.cpp
│   │   ├── anomaly_detector.cpp
│   │   └── ai_vector_storage.cpp
│   └── bridge/                   # future genai-mcp-tools.so candidate
│       ├── ai_tool_handler.cpp
│       └── rag_tool_handler.cpp

Two internal invariants enforced on day one (even though it's still one .so):

  1. No cross-subdirectory #include except through narrow published headers. bridge/ is the only subdir allowed to include from both mcp/ and ai_core/. mcp/ must not include from ai_core/. ai_core/ must not include from mcp/. Enforcement is a lint check in CI (grep at CI time, fail on violation).
  2. ProxySQL_MCP_Server asks a ToolHandlerRegistry for its tool set rather than #includeing handler headers. Even in one .so, the MCP server iterates a runtime-registered list of tools. bridge/ registers AI_Tool_Handler and RAG_Tool_Handler into that list during plugin init().
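A sketch of invariant 2, assuming a deliberately small interface: the MCP server depends only on the registry, and `bridge/` adds its handlers during init(). `ToolHandlerRegistry` is the name used above; `ToolHandler`, `register_handler`, `find`, and `names` are hypothetical, and the real MCP_Tool_Handler interface is richer:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal tool interface (stand-in for MCP_Tool_Handler).
struct ToolHandler {
    virtual ~ToolHandler() = default;
    virtual std::string name() const = 0;
    virtual std::string invoke(const std::string& args_json) = 0;
};

// The MCP server queries this at runtime instead of #including handler headers.
class ToolHandlerRegistry {
public:
    // Takes ownership. Returns false (and drops the handler) if the name is taken.
    bool register_handler(std::unique_ptr<ToolHandler> h) {
        assert(h != nullptr);
        std::string key = h->name();
        return handlers_.emplace(std::move(key), std::move(h)).second;
    }

    // Returns nullptr for unknown tools.
    ToolHandler* find(const std::string& name) const {
        auto it = handlers_.find(name);
        return it == handlers_.end() ? nullptr : it->second.get();
    }

    // Tool names in sorted order (std::map keeps keys ordered).
    std::vector<std::string> names() const {
        std::vector<std::string> out;
        for (const auto& kv : handlers_) out.push_back(kv.first);
        return out;
    }

private:
    std::map<std::string, std::unique_ptr<ToolHandler>> handlers_;
};
```

Under this shape, bridge/'s init() would call `register_handler` for AI_Tool_Handler and RAG_Tool_Handler, and ProxySQL_MCP_Server would iterate `names()` and `find()` without including either header. Rejecting duplicate names keeps two subdirectories from silently shadowing each other's tools.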

Both invariants are the exact pattern that will become inter-plugin service registration if Plan #3 (physical split) ever happens. Getting them right now means Plan #3 is a git mv plus a small ABI extension, not a refactor.

In the core, after Plan #2:

  • The PROXYSQL31, PROXYSQLFFTO, and PROXYSQLTSDB compile-time flags still exist, because FFTO and TSDB stay in core for now (see Section 7.1 for why). PROXYSQLGENAI=1 becomes either a no-op or an alias for "also build the plugins/genai/ plugin"; this is TBD, see Section 7.2.
  • All #ifdef PROXYSQLGENAI sites in core files are deleted. Admin tables/commands are registered by the plugin's init(). Hot-path hooks are plugin-registered null-check call sites. Direct global access (GloAI, GloGATH, GloMCPServer) is gone — the plugin owns its state.
  • The ~970-line anomaly detector block is gone from MySQL_Session.cpp. Anomaly_Detector.cpp inside the plugin has it.
  • ProxySQL stable tier build: make → src/proxysql (slightly smaller, no GenAI bleed).
  • ProxySQL AI tier build: make ai-tier or equivalent → src/proxysql + plugins/genai/genai.so. Package ships both.
  • AI tier TAP tests load genai.so via plugins = ("path/to/genai.so") in the test proxysql.cnf.

5. Phase breakdown

Eight phases. Each is a shippable PR with its own implementation plan, reviewable in isolation. They execute sequentially with the dependencies shown.

Phase A — ABI extension: hot-path hook services

Prerequisite for: all later phases
Depends on: nothing (purely additive to ProxySQL_Plugin.h)
Approximate size: ~150 lines, 3-4 files
Risk level: low

Extends include/ProxySQL_Plugin.h with:

  • register_mysql_pre_query_hook(proxysql_plugin_pre_query_cb cb) — callback invoked by MySQL_Session.cpp for every client query before routing. Plugin's callback receives an opaque session handle and the SQL; returns non-zero to reject the query (the anomaly detector case).
  • register_mysql_query_prefix_hook(const char* prefix, proxysql_plugin_pre_query_cb cb) — callback invoked when a query starts with the given prefix (the GENAI: interception case). The prefix is stripped before the callback sees the SQL.
  • Possibly: register_pgsql_pre_query_hook(...) + register_pgsql_query_prefix_hook(...) mirrors. Defer until actually needed.
  • Possibly: session-context accessor services (get session username, schema, client addr, etc.) used by the anomaly detector after it's moved out. See Section 7.3.
  • Possibly: get_pgsql_servers_snapshot / get_pgsql_users_snapshot callbacks to pair with the existing MySQL ones. Additive, cheap.
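A sketch of how the core might dispatch prefix hooks, stripping the prefix before the callback sees the SQL. It assumes a case-sensitive match at the very start of the query (no leading-whitespace or comment skipping), and every name here is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical callback: receives the SQL with the prefix already stripped.
using prefix_cb_t = int (*)(void* session, const char* sql, std::size_t len);

struct PrefixHook {
    std::string prefix;
    prefix_cb_t cb;
};

// Filled by plugins during init(); read-only afterwards.
static std::vector<PrefixHook> g_prefix_hooks;

void register_query_prefix_hook(const char* prefix, prefix_cb_t cb) {
    assert(prefix != nullptr && cb != nullptr);
    g_prefix_hooks.push_back({prefix, cb});
}

// Returns true if some hook claimed the query; *rc receives its result.
// The callback gets pointer + length and must not assume a NUL terminator.
bool dispatch_prefix_hooks(void* session, std::string_view sql, int* rc) {
    for (const auto& h : g_prefix_hooks) {
        if (sql.substr(0, h.prefix.size()) == h.prefix) {  // case-sensitive
            std::string_view rest = sql.substr(h.prefix.size());
            *rc = h.cb(session, rest.data(), rest.size());
            return true;
        }
    }
    return false;  // not claimed: normal routing continues
}
```

Passing pointer plus length avoids allocating a stripped copy on the dispatch path; whether the real hook should also trim the whitespace after `GENAI:` is a detail for Phase A's plan.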

Core-side wiring: in lib/MySQL_Session.cpp, replace the two #ifdef PROXYSQLGENAI hot-path blocks with unconditional null-checked calls through the registered hooks. If no plugin is loaded, both call sites are single null-pointer checks — zero-cost equivalent to today's disabled-feature path.

Crucially: this phase adds no behavior and removes no code. It just extends the ABI. A fresh PROXYSQLGENAI=1 build must behave identically before and after this phase lands.

This phase can ship to v3.0 even if no plugin ever uses it. Because the change is purely additive, existing plugins such as mysqlx (#5593), which doesn't need the new hooks, keep working unchanged.

Phase B — Anomaly detector extraction (pre-plugin-ification cleanup)

Depends on: Phase A (for the session-context accessor services, if any)
Approximate size: ~1000 lines moved, small refactor
Risk level: medium (touches a hot-path code block)

Lift the ~970-line handler___status_WAITING_CLIENT_DATA___STATE_SLEEP___MYSQL_COM_QUERY_detect_ai_anomaly() block from lib/MySQL_Session.cpp (lines ~3634-4605) into lib/Anomaly_Detector.cpp proper. Define a small session-context struct that carries just the fields the detector needs (current SQL, schema, user, client addr, QPO result, etc.). MySQL_Session's participation shrinks to a single thin forwarder.

This phase is worth doing independent of plugin-ification. The detector code being inside MySQL_Session.cpp is a pre-existing layering smell. Fixing it is its own improvement. It remains #ifdef PROXYSQLGENAI-guarded at the end of Phase B — still in core, still built with GENAI flag. Phase D moves the resulting clean Anomaly_Detector.cpp into the plugin.

The benefit of doing this as its own phase: if Phases C-H stall, the code-quality improvement from Phase B still ships.

Phase C — plugins/genai/ skeleton

Depends on: Phase A
Approximate size: ~200 lines new, no functional change
Risk level: low

Create plugins/genai/ with:

  • Subdirectory layout (src/mcp/, src/ai_core/, src/bridge/, include/)
  • A minimal plugin_entry.cpp with a proxysql_plugin_descriptor_v1 struct that exports empty init/start/stop callbacks
  • A Makefile that compiles plugin_entry.cpp into plugins/genai/genai.so and links nothing from core (per the plugin ABI's no-libproxysql.a-linking rule)
  • A CI lint check enforcing the no-cross-subdirectory-include invariant (grep-based)
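The plan calls for a grep-based check; for illustration, here is the same rule as a small C++17 function. It assumes headers are always included with their subdirectory prefix (for example `#include "ai_core/llm_bridge.h"`), does not model the allow-list of narrow published headers, and would miss relative paths such as `../ai_core/`. `forbidden_includes` is a hypothetical name:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Scans every regular file under `dir` and returns "path:line: text" for each
// #include that names a header under `forbidden_prefix` (e.g. "ai_core/").
std::vector<std::string> forbidden_includes(const fs::path& dir,
                                            const std::string& forbidden_prefix) {
    assert(!forbidden_prefix.empty());
    std::vector<std::string> hits;
    if (!fs::exists(dir)) return hits;
    for (const auto& entry : fs::recursive_directory_iterator(dir)) {
        if (!entry.is_regular_file()) continue;
        std::ifstream in(entry.path());
        std::string line;
        for (int lineno = 1; std::getline(in, line); ++lineno) {
            auto pos = line.find("#include");
            if (pos == std::string::npos) continue;
            // Anchor on the opening quote or bracket so "genai_mcp/" does not match "mcp/".
            if (line.find("\"" + forbidden_prefix, pos) != std::string::npos ||
                line.find("<" + forbidden_prefix, pos) != std::string::npos) {
                hits.push_back(entry.path().string() + ":" +
                               std::to_string(lineno) + ": " + line);
            }
        }
    }
    return hits;
}
```

CI would run it twice, `forbidden_includes("plugins/genai/src/mcp", "ai_core/")` and `forbidden_includes("plugins/genai/src/ai_core", "mcp/")`, and fail the job if either result is non-empty.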

The resulting genai.so is an empty stub: init returns true, start returns true, stop returns true, status_json returns {"name":"genai","state":"stub"}. It can be loaded via plugins = in the ProxySQL config and does nothing.

This phase introduces the plugin as a build target and test target. It doesn't move any code yet.

Phase D — File moves part 1: ai_core/

Depends on: Phase A, Phase B, Phase C
Approximate size: ~7K lines moved, ~15-20 #ifdef sites unwound
Risk level: medium (first real code motion across the core↔plugin boundary)

Move GenAI_Thread, AI_Features_Manager, LLM_Bridge, LLM_Clients, Anomaly_Detector, AI_Vector_Storage from lib/+include/ to plugins/genai/src/ai_core/+plugins/genai/include/ai_core/.

  • Replace direct global access (GloAI, GloGATH) with plugin-owned state held in a plugin context struct.
  • Delete #ifdef PROXYSQLGENAI inside the moved files — they're unconditional within the plugin.
  • Delete the extern GloAI / extern GloGATH declarations from core files that used them. Delete the corresponding #ifdef PROXYSQLGENAI blocks in MySQL_Session.cpp, MySQL_Thread.cpp, PgSQL_Thread.cpp that referenced them.
  • Plugin's init() registers the pre-query hook from Phase A with a callback that forwards to Anomaly_Detector.
  • Plugin's init() registers the GENAI: query-prefix hook from Phase A for the GENAI: command path.
  • Plugin's start() spins up GenAI_Thread + whatever else needs threads.
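A minimal sketch of plugin-owned state replacing globals such as GloAI / GloGATH, with start() spawning a worker and stop() joining it. `GenAIPluginContext`, `plugin_start`, and `plugin_stop` are hypothetical, and the worker loop is only a stand-in for GenAI_Thread:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical plugin-owned state. The core never sees this type.
struct GenAIPluginContext {
    std::atomic<bool> running{false};
    std::atomic<unsigned long> ticks{0};  // stand-in for real worker state
    std::thread worker;                   // stand-in for GenAI_Thread

    // Lifecycle contract: stop() must have joined the worker before teardown.
    ~GenAIPluginContext() { assert(!worker.joinable()); }
};

// Returns false if already running.
bool plugin_start(GenAIPluginContext& ctx) {
    if (ctx.running.exchange(true)) return false;
    ctx.worker = std::thread([&ctx] {
        while (ctx.running.load()) {
            ctx.ticks.fetch_add(1);
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });
    return true;
}

// Returns false if not running; otherwise signals the worker and joins it.
bool plugin_stop(GenAIPluginContext& ctx) {
    if (!ctx.running.exchange(false)) return false;
    if (ctx.worker.joinable()) ctx.worker.join();
    return true;
}
```

Making start() and stop() idempotent means a repeated lifecycle call from the core is harmless, and the destructor assertion catches a context torn down while its thread is still running.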

After this phase, the AI core is fully plugin-resident. Stable-tier builds no longer compile any of it. make (stable) is slightly smaller.

Phase E — File moves part 2: mcp/

Depends on: Phase D
Approximate size: ~15K lines moved (biggest phase)
Risk level: high (lots of files, and the ProxySQL_MCP_Server → AI_Tool_Handler / RAG_Tool_Handler weld needs breaking)

Move MCP_Thread, MCP_Endpoint, ProxySQL_MCP_Server, MCP_Tool_Handler, plus the 7 AI-free tool handlers (Admin_, Config_, Cache_, Stats_, Observe_, Query_, MySQL_Tool_Handler), plus the schema-discovery helpers (Static_Harvester, PgSQL_Static_Harvester, Discovery_Schema, MySQL_Catalog, MySQL_FTS), into plugins/genai/src/mcp/.

Break the ProxySQL_MCP_Server → AI_Tool_Handler / RAG_Tool_Handler include weld. Introduce a ToolHandlerRegistry inside the plugin that tools register themselves with at init() time. ProxySQL_MCP_Server iterates the registry at runtime instead of #includeing handler headers. At the end of this phase, the two AI tool handlers are still in the plugin but now registered via the runtime registry — they haven't moved yet. They'll move to bridge/ in Phase F.

After this phase, the MCP protocol surface is fully plugin-resident but still shipped in the same .so as the AI core.

Phase F — File moves part 3: bridge/

Depends on: Phase E
Approximate size: ~1K lines moved, minor restructuring
Risk level: low

Move AI_Tool_Handler and RAG_Tool_Handler from wherever they landed in Phase E into plugins/genai/src/bridge/. Adjust their headers so they include from both mcp/ and ai_core/ (the only subdir allowed to).

Verify the cross-directory include lint check still passes. Delete the mandatory-include lines in ProxySQL_MCP_Server.cpp — the registry system now handles tool discovery.

After this phase, the three internal subdirectories match their long-term plugin split boundaries exactly. Moving to three physical plugins becomes a git mv + inter-plugin service lookup, deferrable to Plan #3.

Phase G — Admin plumbing: delete GenAI #ifdefs from core

Depends on: Phase D (so the AI core is plugin-resident)
Approximate size: ~40 #ifdef sites unwound across ~8 files
Risk level: medium-high (touches ProxySQL_Admin.cpp which is load-bearing)

Convert every #ifdef PROXYSQLGENAI site in core to use the plugin's registration services:

  • lib/ProxySQL_Admin.cpp (18 sites) — remove GenAI-specific table registration. Tables are now registered by the plugin via services->register_table(...) in init().
  • lib/Admin_Handler.cpp (14 sites) — remove GenAI-specific command dispatch. Commands are registered by the plugin via services->register_command(...) in init().
  • lib/Admin_Bootstrap.cpp, lib/Admin_FlushVariables.cpp, lib/ProxySQL_Admin_Stats.cpp — smaller, similar pattern.
  • Header cleanup: include/proxysql.h, include/cpp.h, include/proxysql_admin.h, include/ProxySQL_Admin_Tables_Definitions.h — delete GenAI forward declarations and GenAI table DDL macros.
  • Delete #ifdef PROXYSQLGENAI blocks from MySQL_Thread.cpp, PgSQL_Thread.cpp that referenced plugin internals or config defaults.
  • Anything that's pure lifecycle (thread start/stop) moves entirely to the plugin's start()/stop() callbacks.

Risk area: admin command alias handling. PLUGIN_API.md says alias resolution (TO RUN → TO RUNTIME) happens in Admin_Handler.cpp and plugins only see canonical forms. The GenAI admin commands may have their own aliases that currently live in Admin_Handler.cpp. Those either stay in core (core knows about the plugin's alias vocabulary, a small knowledge leak) or move to the plugin (the plugin registers both canonical and alias forms). See Section 7.4.

After this phase, grep -rn PROXYSQLGENAI lib/ include/ src/ returns no matches. Any remaining references in the tree should be confined to plugins/genai/, and even those should go away, since the moved files are compiled unconditionally within the plugin (see Phase D).

Phase H — Build system, packaging, test migration

Depends on: Phase G
Approximate size: ~200 lines changed, packaging rework
Risk level: medium (packaging is its own beast)

Clean up:

  • Top-level Makefile: teach make to build plugins/genai/genai.so when an AI_TIER=1 (or similar) flag is set. Stable-tier make doesn't build it.
  • Packaging: .deb / .rpm packages for the AI tier ship proxysql + genai.so + a proxysql.cnf with plugins = ("/usr/lib/proxysql/genai.so").
  • Decide: remove the PROXYSQLGENAI=1 compile flag entirely, or keep it as a no-op alias for users who have it in their build scripts. See Section 7.2 and Open Question 5.
  • TAP tests: AI-tier tests load genai.so via the plugins = config line. test/infra/ sets the config accordingly for GenAI test runs.
  • CI: AI-tier CI jobs build both the core binary and the plugin, produce the combined package, and run the AI-tier TAP subset against it.
  • Documentation: update CLAUDE.md, doc/GENAI.md, and the top-level README to reflect the plugin packaging model.

After this phase, the AI/MCP product tier is genuinely a separate plugin, distributed alongside the core binary, dynamically loaded at startup, and testable in isolation. make with no extra flags builds the stable tier and nothing else — no FFTO, no TSDB, no GenAI.


6. Phases at a glance

| Phase | Description | Dep | ~Size | Risk | Ships value if stopped here |
|---|---|---|---|---|---|
| A | ABI extension: hot-path hooks | none | 150 lines | low | yes, reusable for other plugins |
| B | Anomaly detector extraction | A | 1000 lines refactored | medium | yes, code-quality win independent of plugin work |
| C | plugins/genai/ skeleton | A | 200 lines | low | no (stub plugin) |
| D | Move AI core into plugin | A, B, C | 7K lines | medium | partial (stable tier drops AI core) |
| E | Move MCP into plugin | D | 15K lines | high | partial (MCP + non-AI tool handlers plugin-resident) |
| F | Move AI tool handlers to bridge/ | E | 1K lines | low | yes (enables future 3-way split) |
| G | Delete GenAI #ifdefs from core | D | 40 sites | medium-high | yes (core is GenAI-free) |
| H | Build/packaging/test migration | G | 200 lines | medium | yes (product-tier story is plugin-native) |

Ordering note: B and C can be done in either order; both depend only on A. D depends on both. Phase F depends on E but is a small cleanup (could be folded into E if preferred — they're intimately related).


7. Key design decisions that need a fresh-mind call

Things I have opinions on but want confirmed before execution starts.

7.1 FFTO and TSDB — plugin-ize in this plan, or defer?

Today, PROXYSQLGENAI=1 implies PROXYSQL31=1, which in turn implies PROXYSQLFFTO=1 and PROXYSQLTSDB=1. The innovative tier (PROXYSQL31=1) is built from FFTO + TSDB.

Proposed (default, my recommendation): keep FFTO and TSDB as compile-time features in core. Plan #2 only moves GenAI. The innovative tier (v3.1.x) continues to be a compile-time product. PROXYSQLFFTO and PROXYSQLTSDB remain #ifdefs in core files — they are much smaller surfaces than GenAI and don't have the same bleed.

Alternative: plugin-ize FFTO and TSDB in the same effort. This roughly doubles the scope, and the plugin API would need to accommodate two more sets of lifecycle hooks. Probably not worth it in one plan.

7.2 Fate of the PROXYSQLGENAI=1 compile flag

Plan #2 moves the GenAI code out of core. At the end of Phase H, the flag either:

Option A (cleanest): delete the flag entirely. make PROXYSQLGENAI=1 becomes an error. The AI tier is built via AI_TIER=1 or make ai-tier or similar, which produces core + plugin.

Option B (conservative): keep the flag as a no-op alias for AI_TIER=1. Users with existing build scripts that pass PROXYSQLGENAI=1 don't break. After a grace period, deprecate.

Option C (minimal): leave PROXYSQLGENAI=1 alone and have it mean "build + include the plugin". The downside is the name is wrong (there's no GenAI in core anymore), but nothing breaks.

Recommendation: Option B. Don't break build scripts, but don't pretend GenAI is a core feature.

7.3 Shape of the session-context accessor for the anomaly detector

When the ~970-line detector block moves out of MySQL_Session.cpp, it needs some way to read session state. Three options:

Option A — opaque handle + accessor callbacks in the plugin services struct:

const char* (*session_get_current_sql)(void* session_handle);
const char* (*session_get_username)(void* session_handle);
const char* (*session_get_schemaname)(void* session_handle);
const char* (*session_get_client_addr)(void* session_handle);
...

Pro: ABI-safe, no C++ classes cross the plugin boundary.
Con: Verbose. Every piece of state the detector needs requires a new service callback. Currently it touches a LOT of session state.

Option B — copy a context struct by value:

struct PreQueryContext {
    const char* sql;
    size_t sql_len;
    const char* username;
    const char* schemaname;
    const char* client_addr;
    uint64_t client_flags;
    ... (as many as the detector needs)
};

Core allocates and fills the struct before calling the hook; plugin reads it.
Pro: Single ABI contract, single copy cost.
Con: If the detector needs new fields later, the struct grows — still additive since plugins can check a struct size field.
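A sketch of Option B with the struct-size field mentioned above, assuming the size is the first member and new fields are only ever appended. `struct_size`, `CTX_HAS_FIELD`, and `make_context` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical v1 layout. struct_size comes first and records how many bytes
// the core actually filled; later versions may only append fields.
struct PreQueryContext {
    std::uint32_t struct_size;
    const char* sql;
    std::size_t sql_len;
    const char* username;
    const char* schemaname;
    const char* client_addr;
    std::uint64_t client_flags;
};

// Plugin-side check: a field is safe to read only if the core's struct
// extends past its end.
#define CTX_HAS_FIELD(ctx, field) \
    ((ctx)->struct_size >= offsetof(PreQueryContext, field) + sizeof((ctx)->field))

// Core side: fill the struct by value right before invoking the hook.
PreQueryContext make_context(const char* sql, const char* user, const char* schema,
                             const char* addr, std::uint64_t flags) {
    assert(sql != nullptr);
    PreQueryContext c{};
    c.struct_size = static_cast<std::uint32_t>(sizeof(PreQueryContext));
    c.sql = sql;
    c.sql_len = std::strlen(sql);
    c.username = user;
    c.schemaname = schema;
    c.client_addr = addr;
    c.client_flags = flags;
    return c;
}
```

A plugin built against a newer header can then run on an older core: it checks `CTX_HAS_FIELD` before reading any field added after v1 and falls back when the core didn't fill it.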

Option C — session as thin C-struct pointer, fields directly readable:
Avoid this. It makes the ABI fragile because the struct layout becomes part of the contract.

Recommendation: Option B for the obvious fields, with a small set of Option A accessor callbacks for anything the detector needs occasionally that doesn't belong in the hot-path context struct. Details to be worked out in Phase A's detailed plan.

7.4 Admin command alias handling

PLUGIN_API.md says alias resolution happens in Admin_Handler.cpp (core) and plugins see only canonical command forms. Today, GenAI admin commands like LOAD MYSQL LLM_SERVERS TO RUNTIME may have aliases (TO RUN, FROM MEM, etc.) that live in Admin_Handler.cpp's alias vectors.

Two options:

Option A — keep alias vectors in core: Core knows about the plugin's alias vocabulary. A small knowledge leak from plugin to core. Follows the current PLUGIN_API.md direction.

Option B — plugin registers both canonical and alias forms: Plugin's init() does services->register_command("LOAD MYSQL LLM_SERVERS TO RUNTIME", cb); services->register_command("LOAD MYSQL LLM_SERVERS TO RUN", cb);. Core has no GenAI knowledge. Cleaner but duplicates every command N times where N = number of aliases.

Recommendation: Option A for now (matches the docs), but this should be re-examined if and when Plan #3 splits the plugin. If MCP, GenAI, and bridge become three separate plugins, core can't know about all their alias vocabularies — so a service for "plugin registers an alias for one of its own commands" becomes necessary.
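A sketch of the alias service anticipated for Plan #3, where a plugin registers an alias that points at one of its already-registered canonical commands, so each callback is registered once rather than N times. `AdminCommandTable`, `register_command`, `register_alias`, and `dispatch` are hypothetical and not the PLUGIN_API.md interface:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// The callback receives the canonical command text.
using command_cb = std::function<int(const std::string& canonical)>;

class AdminCommandTable {
public:
    void register_command(const std::string& canonical, command_cb cb) {
        assert(cb);
        commands_[canonical] = std::move(cb);
    }

    // An alias may only point at an already-registered canonical command.
    bool register_alias(const std::string& alias, const std::string& canonical) {
        if (commands_.count(canonical) == 0) return false;
        aliases_[alias] = canonical;
        return true;
    }

    // Resolves an alias (if any), then runs the callback; -1 if unknown.
    int dispatch(const std::string& cmd) const {
        auto a = aliases_.find(cmd);
        const std::string& canonical = (a == aliases_.end()) ? cmd : a->second;
        auto c = commands_.find(canonical);
        if (c == commands_.end()) return -1;
        return c->second(canonical);
    }

private:
    std::map<std::string, command_cb> commands_;
    std::map<std::string, std::string> aliases_;
};
```

Exact string matching is assumed here; any case or whitespace normalization would have to happen before this lookup.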

7.5 Plugin-owned scheduler vs core scheduler

Today GenAI_Thread and MCP_Thread have their own event loops (they spin up their own libev or similar). They don't piggy-back on the core scheduler.

Nothing to decide here — the plugin's start() callback is the natural place to spin up these threads. The plugin owns their lifecycle and cleans them up in stop().

Note: SQLite3_Server and ClickHouse_Server are a different story — they piggy-back on Admin's scheduler tick and wake-pipe. Those wouldn't apply here.

7.6 Metrics / Prometheus integration

GenAI publishes metrics (LLM call counts, latency, anomaly hit rate, etc.) via ProxySQL's Prometheus exporter. The plugin ABI v1 doesn't have a metrics registration service.

Two options:

Option A — plugin owns a stats_db table. The core Prometheus exporter already reads from stats_db for built-in metrics. Plugins can register a stats table via the existing services->register_table(stats_db, ...) callback, and the core exporter will pick it up automatically (verify this — see ProxySQL_Admin_Stats.cpp). No ABI change needed.

Option B — add a register_metric(name, getter_cb) service. Lower-overhead for hot metrics; more flexible.

Recommendation: Option A unless verification shows the core exporter doesn't iterate plugin-registered tables. Check in Phase D.
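For comparison, a sketch of what Option B's `register_metric(name, getter_cb)` service could look like, rendering the basic `name value` lines of the Prometheus text format. `MetricRegistry` and the metric name used in the test are hypothetical:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>

using metric_getter = std::function<double()>;

class MetricRegistry {
public:
    // Returns false if the name is already registered.
    bool register_metric(const std::string& name, metric_getter getter) {
        assert(getter);
        return getters_.emplace(name, std::move(getter)).second;
    }

    // One "name value" line per metric, sorted by name; getters run at scrape time.
    std::string render() const {
        std::ostringstream out;
        for (const auto& kv : getters_) out << kv.first << ' ' << kv.second() << '\n';
        return out.str();
    }

private:
    std::map<std::string, metric_getter> getters_;
};
```

Because values are read through getters at scrape time, hot counters can stay as plugin-side variables with no stats_db write on the query path; that is the overhead advantage Option B trades against the extra ABI surface.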


8. Open questions (require input before execution)

  1. Branch point for Plan #2. Currently a worktree exists at .worktrees/genai-plugin based on the Plan #1 branch (v3.0-remove-sqlite-rembed). Alternative: rebranch from v3.0 directly, and rebase onto v3.0 after #5616 merges. Either works. The Plan #1 branch base is slightly more convenient because the plan narrative assumes Rust is gone.

  2. Phase A detailed plan first, or all 8 detailed plans written upfront? My default is: write Phase A's detailed implementation plan now, execute it, learn from it, then write Phase B's. Each phase's plan benefits from the experience of the previous phase. Alternative: write all 8 detailed plans upfront — higher upfront cost, but gives a complete roadmap. Default: phase-by-phase.

  3. MCP rules integration (#5616 note). I saw ADMIN_SQLITE_TABLE_MCP_QUERY_RULES macro-redefinition warnings in the build log during Plan #1 verification. These predate Plan #1 and are MCP-rules related. Need to check whether they affect any of Plan #2's file moves or touch the MCP plugin files in non-obvious ways.

  4. How aggressive should Phase G be about #ifdef PROXYSQLGENAI removal? After Phase G, the goal is zero #ifdef PROXYSQLGENAI sites in core. But some may exist as defensive guards that don't strictly need the plugin (e.g., config variable defaults). The detailed Phase G plan should enumerate every site and decide per-site whether to delete, convert to runtime-check, or keep.

  5. What's the deprecation story for existing PROXYSQLGENAI=1 users? (See decision 7.2.) Is there an external user base that explicitly builds with PROXYSQLGENAI=1 and depends on it being a compile-time flag? If yes, Option B (keep as alias); if no, Option A (delete).

  6. Test infrastructure. test/infra/ has Docker environments. Does any existing AI-tier test assume the plugin is compiled into the binary rather than loaded at runtime? If yes, those tests need updating — in which phase? Probably Phase H.


9. Explicit non-goals for Plan #2

Things that sound related but are out of scope:


10. Size and effort estimate (rough)

I deliberately avoid time estimates, but here is the relative complexity of each phase:

| Phase | Relative complexity | Why |
|---|---|---|
| A | 1× | baseline, small additive ABI change |
| B | 2× | refactor of 1000-line hot-path block |
| C | 1× | skeleton |
| D | 4× | first real code motion, ~7K lines, ~15-20 ifdef removals |
| E | 6× | biggest phase, tool-handler registry redesign |
| F | 1× | small cleanup after E |
| G | 3× | 40 ifdef sites, admin-heavy |
| H | 2× | build system + packaging + tests |

Total: ~20× the "A baseline". Phase A + B together are ~3×, which is roughly the size of Plan #1 (9 commits). So expect ~6-8 PRs of Plan #1 size to land the whole thing.
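The arithmetic behind those totals, summing the multipliers from the table and taking Plan #1 as roughly the A + B weight:

```latex
\text{Total} = 1 + 2 + 1 + 4 + 6 + 1 + 3 + 2 = 20\times,
\qquad
\frac{20}{1 + 2} \approx 6.7 \ \text{Plan-\#1-sized PRs}
```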


11. Current state


12. References


Next action: none from me — this is here to be re-read with a fresh mind. When ready, the implementation entry point is Phase A's detailed plan (writing-plans skill → docs/superpowers/plans/...-plan2a-plugin-abi-extension.md), dispatched via subagent-driven-development or inline execution.
