Design: extract GenAI/MCP/RAG stack as a ProxySQL plugin (Plan #2 master doc)

# Extract GenAI/MCP/RAG stack as a ProxySQL plugin — design master document

This issue is the master design doc for **Plan #2**, the follow-on to #5616 (Plan #1, sqlite-rembed removal). Plan #2 extracts ProxySQL's GenAI/MCP/RAG stack out of core into a plugin under `plugins/genai/`, using the v1 plugin ABI introduced on the `ProtocolX` branch (#5593, see `doc/PLUGIN_API.md`).

Plan #2 is too large for a single PR or a single implementation plan. It decomposes into **eight sequential sub-plans (Phases A–H)**, each shippable on its own. This document captures the vision, the architectural facts, the phase breakdown, and the open decisions — so the implementation work can be picked up with full context and without re-deriving anything.

Related work: #5616 (Plan #1), #5593 (ProtocolX / v1 plugin ABI), `doc/PLUGIN_API.md`, `plugins/mysqlx/` reference plugin.

---

## 1. Why do this

Today the AI/MCP tier (`PROXYSQLGENAI=1`, v4.0.x) is built from the same source tree as the stable tier (v3.0.x) via compile-time feature flags. That has been fine so far, but it has real costs:

- **Binary bloat.** Stable-tier users pay for ~25K lines of GenAI code that never runs (even though much is `#ifdef`-compiled out, the deps — vector storage, etc. — bleed into the shape of the tree).
- **Ifdef pollution in core files.** `MySQL_Session.cpp`, `MySQL_Thread.cpp`, `PgSQL_Thread.cpp`, `ProxySQL_Admin.cpp`, `Admin_Handler.cpp`, `ProxySQL_Admin_Tables_Definitions.h`, `proxysql.h`, `cpp.h`, and others have `#ifdef PROXYSQLGENAI` sites — total count around 40-50 sites across ~12 files. Each one is a small cognitive tax.
- **~970 lines of GenAI logic inside `MySQL_Session.cpp` itself** (the `handler___...___detect_ai_anomaly()` block, lines ~3634-4605 pre-refactor). This is a pre-existing layering smell independent of plugin-ification: GenAI-specific anomaly detection code lives inside a core session class, not in `Anomaly_Detector.cpp` where it belongs.
- **No independent shippability.** A user who wants the MCP control surface (expose ProxySQL to their LLM tooling) must take the whole AI tier including outbound LLM calls, vector storage, and anomaly detection. A user who wants just anomaly detection must take MCP and RAG too. The tiers don't match the product journeys.
- **Product-tier packaging is compile-time.** If AI tier becomes a plugin, the three ProxySQL tiers (stable / innovative / AI-MCP) become a **packaging** concern instead of a compile-time concern. `proxysql-ai` = `proxysql` + `genai.so`. Much cleaner.
- **Forcing-function for the plugin ABI.** The v1 plugin ABI on `ProtocolX` has exactly one consumer right now (`plugins/mysqlx/`). The second consumer is a much stronger test of whether the ABI is actually good. GenAI is the right second consumer because its needs (hot-path hooks, admin tables, admin commands, plugin-owned threads) cover most of what any future plugin will want.

Plan #2 is also **a prerequisite for a possible future Plan #3** that would split the GenAI plugin into three plugins (`mcp.so` / `genai.so` / `genai-mcp-tools.so`) matching the three user journeys. Plan #2 lays the file-level groundwork so Plan #3 is a `git mv` plus an inter-plugin service-registry design — not a rewrite.

---

## 2. State of the tree after Plan #1

Summary of what exists today (on top of #5616):

**Dedicated GenAI files** under `lib/`+`include/` (all guarded by `#ifdef PROXYSQLGENAI`):

| Subsystem | Files | Approx lines |
|---|---|---|
| AI core (cyclic) | `GenAI_Thread`, `AI_Features_Manager`, `LLM_Bridge`, `LLM_Clients`, `Anomaly_Detector`, `AI_Vector_Storage` | ~6.5K |
| MCP protocol | `MCP_Thread`, `MCP_Endpoint`, `ProxySQL_MCP_Server`, `MCP_Tool_Handler` | ~1.3K |
| MCP tool handlers (AI-free) | `Admin_Tool_Handler`, `Config_Tool_Handler`, `Cache_Tool_Handler`, `Stats_Tool_Handler`, `Observe_Tool_Handler`, `Query_Tool_Handler`, `MySQL_Tool_Handler` | ~? |
| MCP tool handlers (AI-backed) | `AI_Tool_Handler`, `RAG_Tool_Handler` | ~? |
| Schema discovery helpers | `Static_Harvester`, `PgSQL_Static_Harvester`, `Discovery_Schema`, `MySQL_Catalog`, `MySQL_FTS` | ~? |

Total dedicated surface: ~30 files, on the order of 15-25K lines depending on how you count headers.

**Core files with `#ifdef PROXYSQLGENAI` touchpoints:**

| File | Sites | Nature |
|---|---|---|
| `lib/ProxySQL_Admin.cpp` | 18 | Admin table + command registration |
| `lib/Admin_Handler.cpp` | 14 | Command dispatch |
| `lib/MySQL_Session.cpp` | 8 | **2 hot-path**, 1 extern block, ~970-line anomaly detector body |
| `lib/MySQL_Thread.cpp` | ~7 | Lifecycle init, config defaults, global access |
| `lib/PgSQL_Thread.cpp` | a few | Similar to MySQL_Thread, less intrusive |
| `lib/ProxySQL_Admin_Stats.cpp` | a few | Stats table registration |
| `lib/Admin_Bootstrap.cpp` | a few | Bootstrap sequencing |
| `lib/Admin_FlushVariables.cpp` | a few | Variable flush |
| Headers: `proxysql.h`, `cpp.h`, `proxysql_admin.h`, `ProxySQL_Admin_Tables_Definitions.h` | a few | Forward declarations, table DDL macros |

**Total ifdef pollution**: ~40-50 sites across ~12 files. Of those, only **two** are in the actual MySQL client-query hot path (both in `MySQL_Session.cpp`).

---

## 3. Design snapshot: the include-graph

I traced the `#include` relationships between GenAI files. Three facts are load-bearing:

**Fact 1 — AI core is cyclic, not splittable**

```
GenAI_Thread ↔ AI_Features_Manager ↔ LLM_Bridge ↔ LLM_Clients
                      ↑
                      ↓
              Anomaly_Detector
```

All five include each other. They're one cohesive unit. Splitting them is not in scope for Plan #2.

**Fact 2 — Seven of nine tool handlers are AI-free**

```
MCP_Thread ─► ProxySQL_MCP_Server ─┬─► MCP_Tool_Handler
                                   ├─► Admin_Tool_Handler     (AI-free)
                                   ├─► Config_Tool_Handler    (AI-free)
                                   ├─► Cache_Tool_Handler     (AI-free)
                                   ├─► Stats_Tool_Handler     (AI-free)
                                   ├─► Observe_Tool_Handler   (AI-free)
                                   ├─► Query_Tool_Handler     (AI-free)
                                   ├─► MySQL_Tool_Handler     (AI-free)
                                   ├─► AI_Tool_Handler   ──► LLM_Bridge, Anomaly_Detector, AI_Features_Manager
                                   └─► RAG_Tool_Handler  ──► AI_Features_Manager, Discovery_Schema, GenAI_Thread, LLM_Bridge
```

The MCP protocol surface (the seven AI-free tool handlers + `MCP_Thread` + the schema discovery helpers) does not depend on any AI core file. The only thing that welds MCP to AI is this pair of includes in `lib/ProxySQL_MCP_Server.cpp`:

```cpp
#include "AI_Tool_Handler.h"
#include "RAG_Tool_Handler.h"
```

That's it. If you break that include (by making the MCP server's tool list runtime-registered instead of compile-time `#include`d), MCP and AI core become cleanly separable.

**Fact 3 — Hot-path coupling is narrow**

`lib/MySQL_Session.cpp` has only **two** hot-path intrusions:

- **Per-query anomaly detection**, around line 5348:
  ```cpp
  #ifdef PROXYSQLGENAI
  if (GloAI && GloAI->get_anomaly_detector()) {
      if (handler___status_WAITING_CLIENT_DATA___STATE_SLEEP___MYSQL_COM_QUERY_detect_ai_anomaly()) {
          handler_ret = -1;
          return handler_ret;
      }
  }
  #endif
  ```
- **`GENAI:` query-prefix interception**, around line 7163 (intercepts queries starting with `GENAI: ` before normal routing).

Both become generic plugin-ABI services. Each call site is null-guarded, so it costs a single pointer check when no plugin is loaded.
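A minimal sketch of what such a null-guarded call site could look like. The names `pre_query_cb`, `g_pre_query_hook`, and `run_pre_query_hook` are hypothetical and are not taken from `include/ProxySQL_Plugin.h`; the point is only that the no-plugin path reduces to one pointer comparison.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical callback type: returns non-zero to reject the query.
using pre_query_cb = int (*)(void* session, const char* sql, std::size_t len);

// Hypothetical hook slot owned by core; stays nullptr when no plugin registers.
static pre_query_cb g_pre_query_hook = nullptr;

// Core-side call site. Returns true when the query must be rejected.
inline bool run_pre_query_hook(void* session, const char* sql, std::size_t len) {
    assert(sql != nullptr);
    if (g_pre_query_hook == nullptr) {
        return false;  // no plugin loaded: fall through to normal routing
    }
    return g_pre_query_hook(session, sql, len) != 0;
}

// Example plugin callback (illustration only): rejects queries longer than 16 bytes.
static int reject_long_queries(void*, const char*, std::size_t len) {
    return len > 16 ? 1 : 0;
}
```

In the multi-threaded core the slot would presumably need to be atomic, or be written only before worker threads start; that detail is left out of the sketch.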

The ~970-line `detect_ai_anomaly` implementation block at lines ~3634-4605 is the actual body of the detector. It needs to move out of `MySQL_Session.cpp` into `Anomaly_Detector.cpp` proper, with a session-context accessor — this is Phase B.

---

## 4. Target architecture (Plan #2 endpoint)

After Plan #2 ships, the tree looks like:

```
plugins/genai/
├── Makefile
├── README.md
├── include/
│   └── (headers shared among subdirs, via narrow public interfaces only)
├── src/
│   ├── plugin_entry.cpp          # descriptor, init/start/stop, service wiring
│   ├── mcp/                      # future mcp.so candidate
│   │   ├── mcp_thread.cpp
│   │   ├── mcp_endpoint.cpp
│   │   ├── proxysql_mcp_server.cpp
│   │   └── handlers/
│   │       ├── admin_tool_handler.cpp
│   │       ├── config_tool_handler.cpp
│   │       ├── cache_tool_handler.cpp
│   │       ├── stats_tool_handler.cpp
│   │       ├── observe_tool_handler.cpp
│   │       ├── query_tool_handler.cpp
│   │       ├── mysql_tool_handler.cpp
│   │       ├── static_harvester.cpp
│   │       ├── pgsql_static_harvester.cpp
│   │       ├── discovery_schema.cpp
│   │       ├── mysql_catalog.cpp
│   │       └── mysql_fts.cpp
│   ├── ai_core/                  # future genai.so candidate
│   │   ├── genai_thread.cpp
│   │   ├── ai_features_manager.cpp
│   │   ├── llm_bridge.cpp
│   │   ├── llm_clients.cpp
│   │   ├── anomaly_detector.cpp
│   │   └── ai_vector_storage.cpp
│   └── bridge/                   # future genai-mcp-tools.so candidate
│       ├── ai_tool_handler.cpp
│       └── rag_tool_handler.cpp
```

**Two internal invariants enforced on day one** (even though it's still one `.so`):

1. **No cross-subdirectory `#include` except through narrow published headers.** `bridge/` is the only subdir allowed to include from both `mcp/` and `ai_core/`. `mcp/` must not include from `ai_core/`. `ai_core/` must not include from `mcp/`. Enforcement is a lint check in CI (grep at CI time, fail on violation).
2. **`ProxySQL_MCP_Server` asks a `ToolHandlerRegistry` for its tool set rather than `#include`ing handler headers.** Even in one `.so`, the MCP server iterates a runtime-registered list of tools. `bridge/` registers `AI_Tool_Handler` and `RAG_Tool_Handler` into that list during plugin `init()`.

Both invariants are **the exact pattern that will become inter-plugin service registration** if Plan #3 (physical split) ever happens. Getting them right now means Plan #3 is a `git mv` plus a small ABI extension, not a refactor.
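To make invariant 2 concrete, here is a rough sketch of a runtime tool registry. The `ToolHandlerRegistry` interface has not been designed yet, so every method name, the `ToolFn` signature, and the tool names `stats` and `rag` are assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical handler shape: takes a request string, returns a reply string.
using ToolFn = std::function<std::string(const std::string&)>;

// Hypothetical registry; the real interface is still to be designed.
class ToolHandlerRegistry {
public:
    // Returns false if a tool with this name is already registered.
    bool register_tool(const std::string& name, ToolFn fn) {
        return tools_.emplace(name, std::move(fn)).second;
    }
    // Tool names in sorted order, as the server might advertise them.
    std::vector<std::string> list() const {
        std::vector<std::string> names;
        for (const auto& kv : tools_) names.push_back(kv.first);
        return names;
    }
    // Dispatch by name; unknown tools produce an error string.
    std::string call(const std::string& name, const std::string& req) const {
        auto it = tools_.find(name);
        return it == tools_.end() ? "error: unknown tool" : it->second(req);
    }
private:
    std::map<std::string, ToolFn> tools_;
};

// mcp/ registers its AI-free handlers; bridge/ adds the AI-backed ones in init().
inline void register_mcp_tools(ToolHandlerRegistry& r) {
    r.register_tool("stats", [](const std::string&) { return std::string("stats ok"); });
}
inline void register_bridge_tools(ToolHandlerRegistry& r) {
    r.register_tool("rag", [](const std::string& q) { return "rag: " + q; });
}
```

With this shape the server code only ever sees `ToolHandlerRegistry`, so it needs no handler headers, and leaving out `register_bridge_tools` yields an MCP surface with no AI-backed tools.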

In the core, after Plan #2:

- The `PROXYSQLGENAI`, `PROXYSQL31`, `PROXYSQLFFTO`, `PROXYSQLTSDB` compile-time flags still exist for FFTO and TSDB (which stay in core for now, see decision 7.1 for why). `PROXYSQLGENAI=1` becomes either a no-op or an alias for "also build the `plugins/genai/` plugin" (**TBD, see decision 7.2 and Open Question 5**).
- All `#ifdef PROXYSQLGENAI` sites in core files are **deleted**. Admin tables/commands are registered by the plugin's `init()`. Hot-path hooks are plugin-registered null-check call sites. Direct global access (`GloAI`, `GloGATH`, `GloMCPServer`) is gone — the plugin owns its state.
- The ~970-line anomaly detector block is gone from `MySQL_Session.cpp`. `Anomaly_Detector.cpp` inside the plugin has it.
- ProxySQL stable tier build: `make` → `src/proxysql` (slightly smaller, no GenAI bleed).
- ProxySQL AI tier build: `make ai-tier` or equivalent → `src/proxysql` + `plugins/genai/genai.so`. Package ships both.
- AI tier TAP tests load `genai.so` via `plugins = ("path/to/genai.so")` in the test `proxysql.cnf`.

---

## 5. Phase breakdown

Eight phases. Each is a shippable PR with its own implementation plan, reviewable in isolation. They execute sequentially with the dependencies shown.

### Phase A — ABI extension: hot-path hook services

**Prerequisite for:** all later phases
**Depends on:** nothing (purely additive to `ProxySQL_Plugin.h`)
**Approximate size:** ~150 lines, 3-4 files
**Risk level:** low

Extends `include/ProxySQL_Plugin.h` with:

- `register_mysql_pre_query_hook(proxysql_plugin_pre_query_cb cb)` — callback invoked by `MySQL_Session.cpp` for every client query before routing. Plugin's callback receives an opaque session handle and the SQL; returns non-zero to reject the query (the anomaly detector case).
- `register_mysql_query_prefix_hook(const char* prefix, proxysql_plugin_pre_query_cb cb)` — callback invoked when a query starts with the given prefix (the `GENAI:` interception case). The prefix is stripped before the callback sees the SQL.
- Possibly: `register_pgsql_pre_query_hook(...)` + `register_pgsql_query_prefix_hook(...)` mirrors. Defer until actually needed.
- Possibly: session-context accessor services (get session username, schema, client addr, etc.) used by the anomaly detector after it's moved out. **See decision 7.3.**
- Possibly: `get_pgsql_servers_snapshot` / `get_pgsql_users_snapshot` callbacks to pair with the existing MySQL ones. Additive, cheap.

Core-side wiring: in `lib/MySQL_Session.cpp`, replace the two `#ifdef PROXYSQLGENAI` hot-path blocks with unconditional null-checked calls through the registered hooks. If no plugin is loaded, both call sites are single null-pointer checks — zero-cost equivalent to today's disabled-feature path.
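The prefix hook described above could be dispatched roughly as follows. This is a sketch under assumptions: `query_cb`, `PrefixHook`, `register_prefix_hook`, and `dispatch_prefix_hook` are hypothetical names, and a single global slot stands in for whatever storage the real ABI ends up using.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical callback type shared by both hooks.
using query_cb = int (*)(void* session, const char* sql, std::size_t len);

// Hypothetical storage for one registered prefix hook.
struct PrefixHook {
    std::string prefix;
    query_cb cb = nullptr;
};
static PrefixHook g_prefix_hook;

// Stand-in for the registration service a plugin would call from init().
inline void register_prefix_hook(const char* prefix, query_cb cb) {
    g_prefix_hook.prefix = prefix;
    g_prefix_hook.cb = cb;
}

// Core-side dispatch: returns true when the hook consumed the query.
// The callback sees the SQL with the prefix already stripped.
inline bool dispatch_prefix_hook(void* session, const char* sql, std::size_t len) {
    const PrefixHook& h = g_prefix_hook;
    if (h.cb == nullptr) return false;  // no plugin registered
    const std::size_t plen = h.prefix.size();
    if (len < plen || std::memcmp(sql, h.prefix.data(), plen) != 0) return false;
    h.cb(session, sql + plen, len - plen);
    return true;
}

// Example callback (illustration only): records what it was handed.
static std::string g_last_seen;
static int remember_sql(void*, const char* sql, std::size_t len) {
    g_last_seen.assign(sql, len);
    return 0;
}
```

Registering `remember_sql` for the prefix `GENAI: ` and dispatching `GENAI: hi` hands the callback just `hi`, while a plain `SELECT 1` is not intercepted.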

**Crucially:** this phase changes no observable behavior and moves no GenAI logic. It extends the ABI and routes the two hot-path call sites through it. A fresh `PROXYSQLGENAI=1` build must behave identically before and after this phase lands.

This phase can ship to `v3.0` even if no plugin ever uses it. The mysqlx plugin (#5593) doesn't need it, so backward compatibility is preserved.

### Phase B — Anomaly detector extraction (pre-plugin-ification cleanup)

**Depends on:** Phase A (for the session-context accessor services, if any)
**Approximate size:** ~1000 lines moved, small refactor
**Risk level:** medium (touches a hot-path code block)

Lift the ~970-line `handler___status_WAITING_CLIENT_DATA___STATE_SLEEP___MYSQL_COM_QUERY_detect_ai_anomaly()` block from `lib/MySQL_Session.cpp` (lines ~3634-4605) into `lib/Anomaly_Detector.cpp` proper. Define a small session-context struct that carries just the fields the detector needs (current SQL, schema, user, client addr, QPO result, etc.). `MySQL_Session`'s participation shrinks to a single thin forwarder.

**This phase is worth doing independent of plugin-ification.** The detector code being inside `MySQL_Session.cpp` is a pre-existing layering smell. Fixing it is its own improvement. It remains `#ifdef PROXYSQLGENAI`-guarded at the end of Phase B — still in core, still built with GENAI flag. Phase D moves the resulting clean `Anomaly_Detector.cpp` into the plugin.

The benefit of doing this as its own phase: if Phases C-H drag, the code-quality improvement from Phase B still ships.

### Phase C — `plugins/genai/` skeleton

**Depends on:** Phase A
**Approximate size:** ~200 lines new, no functional change
**Risk level:** low

Create `plugins/genai/` with:

- Subdirectory layout (`src/mcp/`, `src/ai_core/`, `src/bridge/`, `include/`)
- A minimal `plugin_entry.cpp` with a `proxysql_plugin_descriptor_v1` struct that exports empty `init`/`start`/`stop` callbacks
- A `Makefile` that compiles `plugin_entry.cpp` into `plugins/genai/genai.so` and links nothing from core (per the plugin ABI's no-`libproxysql.a`-linking rule)
- A CI lint check enforcing the no-cross-subdirectory-include invariant (grep-based)

The resulting `genai.so` is an empty stub: `init` returns true, `start` returns true, `stop` returns true, `status_json` returns `{"name":"genai","state":"stub"}`. It can be loaded via `plugins = ` in the ProxySQL config and does nothing.
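A sketch of what the stub could look like. The struct below is a hypothetical stand-in, not the real `proxysql_plugin_descriptor_v1` (whose layout is defined in `include/ProxySQL_Plugin.h`); the field names, callback signatures, and the exported symbol `demo_genai_descriptor` are all assumptions.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the real descriptor struct.
struct demo_plugin_descriptor {
    const char* name;
    bool (*init)(void* services);
    bool (*start)();
    bool (*stop)();
    const char* (*status_json)();
};

// Empty lifecycle callbacks: the stub loads and does nothing.
static bool stub_init(void*) { return true; }
static bool stub_start() { return true; }
static bool stub_stop() { return true; }
static const char* stub_status_json() {
    return "{\"name\":\"genai\",\"state\":\"stub\"}";
}

// C linkage so a loader can look the symbol up without C++ name mangling.
extern "C" const demo_plugin_descriptor* demo_genai_descriptor() {
    static const demo_plugin_descriptor d = {
        "genai", stub_init, stub_start, stub_stop, stub_status_json};
    return &d;
}
```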

This phase introduces the plugin as a build target and test target. It doesn't move any code yet.

### Phase D — File moves part 1: `ai_core/`

**Depends on:** Phase A, Phase B, Phase C
**Approximate size:** ~7K lines moved, ~15-20 `#ifdef` sites unwound
**Risk level:** medium (first real code motion across the core↔plugin boundary)

Move `GenAI_Thread`, `AI_Features_Manager`, `LLM_Bridge`, `LLM_Clients`, `Anomaly_Detector`, `AI_Vector_Storage` from `lib/`+`include/` to `plugins/genai/src/ai_core/`+`plugins/genai/include/ai_core/`.

- Replace direct global access (`GloAI`, `GloGATH`) with plugin-owned state held in a plugin context struct.
- Delete `#ifdef PROXYSQLGENAI` **inside** the moved files — they're unconditional within the plugin.
- Delete the `extern GloAI` / `extern GloGATH` declarations from core files that used them. Delete the corresponding `#ifdef PROXYSQLGENAI` blocks in `MySQL_Session.cpp`, `MySQL_Thread.cpp`, `PgSQL_Thread.cpp` that referenced them.
- Plugin's `init()` registers the pre-query hook from Phase A with a callback that forwards to `Anomaly_Detector`.
- Plugin's `init()` registers the `GENAI:` query-prefix hook from Phase A for the `GENAI:` command path.
- Plugin's `start()` spins up `GenAI_Thread` + whatever else needs threads.
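
One possible shape for the plugin-owned state that replaces `GloAI` and `GloGATH`, sketched with placeholder types. `AnomalyDetector`, `GenAiThread`, `GenAiPluginContext`, and the `plugin_*` functions are hypothetical stand-ins for the moved classes and lifecycle callbacks, not their real interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Placeholder types standing in for the moved classes.
struct AnomalyDetector {
    int checks = 0;
    bool is_anomalous(const char*) { ++checks; return false; }
};
struct GenAiThread { bool running = false; };

// Everything formerly reachable through core globals lives in one context.
struct GenAiPluginContext {
    std::unique_ptr<AnomalyDetector> detector;
    std::unique_ptr<GenAiThread> thread;
};

static std::unique_ptr<GenAiPluginContext> g_ctx;  // private to the plugin .so

inline bool plugin_init() {
    g_ctx = std::make_unique<GenAiPluginContext>();
    g_ctx->detector = std::make_unique<AnomalyDetector>();
    return true;
}
inline bool plugin_start() {
    g_ctx->thread = std::make_unique<GenAiThread>();
    g_ctx->thread->running = true;  // a real plugin would spawn the thread here
    return true;
}
inline bool plugin_stop() {
    g_ctx.reset();  // releases detector and thread state together
    return true;
}

// Pre-query callback: forwards to the detector, non-zero means reject.
inline int on_pre_query(void*, const char* sql, std::size_t) {
    return (g_ctx && g_ctx->detector && g_ctx->detector->is_anomalous(sql)) ? 1 : 0;
}
```

Because core never names `g_ctx`, nothing in `lib/` needs an `extern` declaration for it, which is what allows the `#ifdef` blocks around the old globals to be deleted.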

After this phase, the AI core is fully plugin-resident. Stable-tier builds no longer compile any of it, so the stable-tier binary is slightly smaller.

### Phase E — File moves part 2: `mcp/`

**Depends on:** Phase D
**Approximate size:** ~15K lines moved (biggest phase)
**Risk level:** high (lots of files, the `ProxySQL_MCP_Server` → `AI_Tool_Handler` / `RAG_Tool_Handler` weld needs breaking)

Move `MCP_Thread`, `MCP_Endpoint`, `ProxySQL_MCP_Server`, `MCP_Tool_Handler`, plus the 7 AI-free tool handlers (`Admin_`, `Config_`, `Cache_`, `Stats_`, `Observe_`, `Query_`, `MySQL_Tool_Handler`), plus the schema-discovery helpers (`Static_Harvester`, `PgSQL_Static_Harvester`, `Discovery_Schema`, `MySQL_Catalog`, `MySQL_FTS`), into `plugins/genai/src/mcp/`.

**Break the `ProxySQL_MCP_Server → AI_Tool_Handler / RAG_Tool_Handler` include weld.** Introduce a `ToolHandlerRegistry` inside the plugin that tools register themselves with at `init()` time. `ProxySQL_MCP_Server` iterates the registry at runtime instead of `#include`ing handler headers. At the end of this phase, the two AI tool handlers are **still** in the plugin but now registered via the runtime registry — they haven't moved yet. They'll move to `bridge/` in Phase F.

After this phase, the MCP protocol surface is fully plugin-resident but still shipped in the same `.so` as the AI core.

### Phase F — File moves part 3: `bridge/`

**Depends on:** Phase E
**Approximate size:** ~1K lines moved, minor restructuring
**Risk level:** low

Move `AI_Tool_Handler` and `RAG_Tool_Handler` from wherever they landed in Phase E into `plugins/genai/src/bridge/`. Adjust their headers so they include from both `mcp/` and `ai_core/` (the only subdir allowed to).

Verify the cross-directory include lint check still passes. Delete any handler `#include` lines still left in `ProxySQL_MCP_Server.cpp`; the registry introduced in Phase E now handles tool discovery.

After this phase, the three internal subdirectories match their long-term plugin split boundaries exactly. Moving to three physical plugins becomes a `git mv` + inter-plugin service lookup, deferrable to Plan #3.

### Phase G — Admin plumbing: delete GenAI `#ifdef`s from core

**Depends on:** Phase D (so the AI core is plugin-resident)
**Approximate size:** ~40 `#ifdef` sites unwound across ~8 files
**Risk level:** medium-high (touches `ProxySQL_Admin.cpp` which is load-bearing)

Convert every `#ifdef PROXYSQLGENAI` site in core to use the plugin's registration services:

- `lib/ProxySQL_Admin.cpp` (18 sites) — remove GenAI-specific table registration. Tables are now registered by the plugin via `services->register_table(...)` in `init()`.
- `lib/Admin_Handler.cpp` (14 sites) — remove GenAI-specific command dispatch. Commands are registered by the plugin via `services->register_command(...)` in `init()`.
- `lib/Admin_Bootstrap.cpp`, `lib/Admin_FlushVariables.cpp`, `lib/ProxySQL_Admin_Stats.cpp` — smaller, similar pattern.
- Header cleanup: `include/proxysql.h`, `include/cpp.h`, `include/proxysql_admin.h`, `include/ProxySQL_Admin_Tables_Definitions.h` — delete GenAI forward declarations and GenAI table DDL macros.
- Delete `#ifdef PROXYSQLGENAI` blocks from `MySQL_Thread.cpp`, `PgSQL_Thread.cpp` that referenced plugin internals or config defaults.
- Anything that's pure lifecycle (thread start/stop) moves entirely to the plugin's `start()`/`stop()` callbacks.

**Risk area:** admin command alias handling. `PLUGIN_API.md` says alias resolution (`TO RUN` → `TO RUNTIME`) happens in `Admin_Handler.cpp` and plugins only see canonical forms. The GenAI admin commands may have their own aliases that currently live in `Admin_Handler.cpp`. Those either stay in core (core knows about the plugin's alias vocabulary, a small knowledge leak) or move to the plugin (the plugin registers both canonical and alias forms). **See decision 7.4.**

After this phase, `grep -rn PROXYSQLGENAI lib/ include/ src/` returns no matches. Any remaining occurrences would be under `plugins/genai/`, and those should be gone too because the moved files are unconditional (see the note in Phase D).

### Phase H — Build system, packaging, test migration

**Depends on:** Phase G
**Approximate size:** ~200 lines changed, packaging rework
**Risk level:** medium (packaging is its own beast)

Clean up:

- Top-level `Makefile`: teach `make` to build `plugins/genai/genai.so` when an `AI_TIER=1` (or similar) flag is set. Stable-tier `make` doesn't build it.
- Packaging: `.deb` / `.rpm` packages for the AI tier ship `proxysql` + `genai.so` + a `proxysql.cnf` with `plugins = ("/usr/lib/proxysql/genai.so")`.
- Decide: **remove `PROXYSQLGENAI=1` compile flag entirely, or keep it as an alias** for users who have it in their build scripts. **See decision 7.2 and Open Question 5.**
- TAP tests: AI-tier tests load `genai.so` via the `plugins =` config line. `test/infra/` sets the config accordingly for GenAI test runs.
- CI: AI-tier CI jobs build both the core binary and the plugin, produce the combined package, and run the AI-tier TAP subset against it.
- Documentation: update `CLAUDE.md`, `doc/GENAI.md`, and the top-level README to reflect the plugin packaging model.

After this phase, the AI/MCP product tier is genuinely a separate plugin, distributed alongside the core binary, dynamically loaded at startup, and testable in isolation. `make` with no extra flags builds the stable tier and nothing else — no FFTO, no TSDB, no GenAI.

---

## 6. Phases at a glance

| Phase | Description | Dep | ~Size | Risk | Ships value if stopped here |
|---|---|---|---|---|---|
| **A** | ABI extension: hot-path hooks | — | 150 lines | low | yes, reusable for other plugins |
| **B** | Anomaly detector extraction | A | 1000 lines refactored | medium | **yes, code-quality win independent of plugin work** |
| **C** | `plugins/genai/` skeleton | A | 200 lines | low | no (stub plugin) |
| **D** | Move AI core into plugin | A, B, C | 7K lines | medium | partial (stable tier drops AI core) |
| **E** | Move MCP into plugin | D | 15K lines | high | partial (MCP + non-AI tool handlers plugin-resident) |
| **F** | Move AI tool handlers to `bridge/` | E | 1K lines | low | yes (enables future 3-way split) |
| **G** | Delete GenAI `#ifdef`s from core | D | 40 sites | medium-high | yes (core is GenAI-free) |
| **H** | Build/packaging/test migration | G | 200 lines | medium | yes (product-tier story is plugin-native) |

**Ordering note:** B and C can be done in either order; both depend only on A. D depends on both. Phase F depends on E but is a small cleanup (could be folded into E if preferred — they're intimately related).

---

## 7. Key design decisions that need a fresh-mind call

Things I have opinions on but want confirmed before execution starts.

### 7.1 FFTO and TSDB — plugin-ize in this plan, or defer?

Today, `PROXYSQLGENAI=1` implies `PROXYSQL31=1` implies `PROXYSQLFFTO=1` and `PROXYSQLTSDB=1`. The innovative tier (`PROXYSQL31=1`) is built from FFTO + TSDB.

**Proposed (default, my recommendation):** keep FFTO and TSDB as compile-time features in core. Plan #2 only moves GenAI. The innovative tier (v3.1.x) continues to be a compile-time product. `PROXYSQLFFTO` and `PROXYSQLTSDB` remain `#ifdef`s in core files — they are much smaller surfaces than GenAI and don't have the same bleed.

**Alternative:** plugin-ize FFTO and TSDB in the same effort. This roughly doubles the scope and the new plugin API surface needs to accommodate two more sets of lifecycle hooks. Probably not worth it in one plan.

### 7.2 Fate of the `PROXYSQLGENAI=1` compile flag

Plan #2 moves the GenAI code out of core. At the end of Phase H, the flag either:

**Option A (cleanest):** delete the flag entirely. `make PROXYSQLGENAI=1` becomes an error. The AI tier is built via `AI_TIER=1` or `make ai-tier` or similar, which produces core + plugin.

**Option B (conservative):** keep the flag as an alias for `AI_TIER=1`. Users with existing build scripts that pass `PROXYSQLGENAI=1` don't break. After a grace period, deprecate.

**Option C (minimal):** leave `PROXYSQLGENAI=1` alone and have it mean "build + include the plugin". The downside is the name is wrong (there's no GenAI in core anymore), but nothing breaks.

**Recommendation:** Option B. Don't break build scripts, but don't pretend GenAI is a core feature.

### 7.3 Shape of the session-context accessor for the anomaly detector

When the ~970-line detector block moves out of `MySQL_Session.cpp`, it needs some way to read session state. Three options:

**Option A — opaque handle + accessor callbacks in the plugin services struct:**
```cpp
const char* (*session_get_current_sql)(void* session_handle);
const char* (*session_get_username)(void* session_handle);
const char* (*session_get_schemaname)(void* session_handle);
const char* (*session_get_client_addr)(void* session_handle);
// ... one accessor per additional piece of session state
```
**Pro:** ABI-safe, no C++ classes cross the plugin boundary.
**Con:** Verbose. Every piece of state the detector needs requires a new service callback. Currently it touches a LOT of session state.

**Option B — copy a context struct by value:**
```cpp
#include <cstddef>   // size_t
#include <cstdint>   // uint64_t

struct PreQueryContext {
    const char* sql;
    size_t sql_len;
    const char* username;
    const char* schemaname;
    const char* client_addr;
    uint64_t client_flags;
    // ... plus as many further fields as the detector needs
};
```
Core allocates and fills the struct before calling the hook; plugin reads it.
**Pro:** Single ABI contract, single copy cost.
**Con:** If the detector needs new fields later, the struct grows. That stays additive only if the struct carries a size field that plugins can check, which the sketch above does not yet include.
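
The size-field check mentioned under Option B could work roughly like this. Both struct layouts, the `struct_size` member, and the helper `client_addr_or` are assumptions for illustration; the sketch shows a plugin built against a newer layout reading safely from an older core.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical first layout; core sets struct_size = sizeof of what it filled.
struct PreQueryContextV1 {
    std::uint32_t struct_size;
    const char* sql;
    std::size_t sql_len;
    const char* username;
};

// Hypothetical later layout: fields are only ever appended.
struct PreQueryContextV2 {
    std::uint32_t struct_size;
    const char* sql;
    std::size_t sql_len;
    const char* username;
    const char* client_addr;  // added after V1
};

// Plugin side: read client_addr only if core actually filled it in.
inline const char* client_addr_or(const PreQueryContextV2* ctx, const char* fallback) {
    const std::size_t needed =
        offsetof(PreQueryContextV2, client_addr) + sizeof(ctx->client_addr);
    return ctx->struct_size >= needed ? ctx->client_addr : fallback;
}
```

The rule this relies on is that fields are never reordered or removed, only appended, so a field's presence can be decided from `struct_size` alone.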

**Option C — session as thin C-struct pointer, fields directly readable:**
Avoid this. It makes the ABI fragile because the struct layout becomes part of the contract.

**Recommendation:** Option B for the obvious fields, with a small set of Option A accessor callbacks for anything the detector needs occasionally that doesn't belong in the hot-path context struct. Details to be worked out in Phase A's detailed plan.

### 7.4 Admin command alias handling

`PLUGIN_API.md` says alias resolution happens in `Admin_Handler.cpp` (core) and plugins see only canonical command forms. Today, GenAI admin commands like `LOAD MYSQL LLM_SERVERS TO RUNTIME` may have aliases (`TO RUN`, `FROM MEM`, etc.) that live in Admin_Handler.cpp's alias vectors.

Two options:

**Option A — keep alias vectors in core:** Core knows about the plugin's alias vocabulary. A small knowledge leak from plugin to core. Follows the current `PLUGIN_API.md` direction.

**Option B — plugin registers both canonical and alias forms:** Plugin's `init()` does `services->register_command("LOAD MYSQL LLM_SERVERS TO RUNTIME", cb); services->register_command("LOAD MYSQL LLM_SERVERS TO RUN", cb);`. Core has no GenAI knowledge. Cleaner but duplicates every command N times where N = number of aliases.
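
For a sense of what Option B costs in plugin code, a small helper can generate the duplicated registrations. `CommandTable`, `command_cb`, and `register_with_aliases` are hypothetical; the real `register_command` signature is whatever `PLUGIN_API.md` specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical command callback type.
using command_cb = int (*)();

// Hypothetical stand-in for the command part of the services table.
struct CommandTable {
    std::map<std::string, command_cb> by_text;
    // Returns false if this exact command text is already registered.
    bool register_command(const std::string& text, command_cb cb) {
        return by_text.emplace(text, cb).second;
    }
};

// Registers one entry per suffix (canonical and aliases) for the same callback.
// Returns how many entries were newly added.
inline std::size_t register_with_aliases(CommandTable& t, const std::string& stem,
                                         const std::vector<std::string>& suffixes,
                                         command_cb cb) {
    std::size_t added = 0;
    for (const auto& s : suffixes) {
        if (t.register_command(stem + " " + s, cb)) ++added;
    }
    return added;
}

// Example callback (illustration only).
static int load_llm_servers() { return 0; }
```

A call such as `register_with_aliases(t, "LOAD MYSQL LLM_SERVERS", {"TO RUNTIME", "TO RUN"}, load_llm_servers)` produces two table entries pointing at the same callback, which is the N-fold duplication the option describes.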

**Recommendation:** Option A for now (matches the docs), but this should be re-examined if and when Plan #3 splits the plugin. If MCP, GenAI, and bridge become three separate plugins, core can't know about all their alias vocabularies — so a service for "plugin registers an alias for one of its own commands" becomes necessary.

### 7.5 Plugin-owned scheduler vs core scheduler

Today `GenAI_Thread` and `MCP_Thread` have their own event loops (they spin up their own libev or similar). They don't piggy-back on the core scheduler.

**Nothing to decide here** — the plugin's `start()` callback is the natural place to spin up these threads. The plugin owns their lifecycle and cleans them up in `stop()`.

Note: `SQLite3_Server` and `ClickHouse_Server` are a different story — they piggy-back on Admin's scheduler tick and wake-pipe. Those wouldn't apply here.

### 7.6 Metrics / Prometheus integration

GenAI publishes metrics (LLM call counts, latency, anomaly hit rate, etc.) via ProxySQL's Prometheus exporter. The plugin ABI v1 doesn't have a metrics registration service.

Two options:

**Option A — plugin owns a `stats_db` table.** The core Prometheus exporter already reads from `stats_db` for built-in metrics. Plugins can register a stats table via the existing `services->register_table(stats_db, ...)` callback, and the core exporter will pick it up automatically (verify this — see `ProxySQL_Admin_Stats.cpp`). No ABI change needed.

**Option B — add a `register_metric(name, getter_cb)` service.** Lower-overhead for hot metrics; more flexible.
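
If Option B were chosen, the service might look something like the sketch below. Nothing here exists in ABI v1: `MetricRegistry`, `metric_getter`, and the metric name `genai_llm_calls_total` are invented for illustration, and the output line only imitates the `name value` shape of Prometheus text exposition.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical getter type: core polls it at scrape time.
using metric_getter = std::uint64_t (*)();

// Hypothetical core-side registry behind a register_metric service.
struct MetricRegistry {
    std::map<std::string, metric_getter> getters;
    // Returns false if the metric name is already taken.
    bool register_metric(const std::string& name, metric_getter g) {
        return getters.emplace(name, g).second;
    }
    // Renders one "<name> <value>" line per registered metric.
    std::string scrape() const {
        std::string out;
        for (const auto& kv : getters) {
            out += kv.first + " " + std::to_string(kv.second()) + "\n";
        }
        return out;
    }
};

// Plugin side: a hot counter updated without locks, read via the getter.
static std::atomic<std::uint64_t> g_llm_calls{0};
static std::uint64_t get_llm_calls() {
    return g_llm_calls.load(std::memory_order_relaxed);
}
```

The hot path only pays for an atomic increment; all formatting work happens when the exporter scrapes, which is the lower-overhead property the option claims.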

**Recommendation:** Option A unless verification shows the core exporter doesn't iterate plugin-registered tables. Check in Phase D.

---

## 8. Open questions (require input before execution)

1. **Branch point for Plan #2.** Currently a worktree exists at `.worktrees/genai-plugin` based on the Plan #1 branch (`v3.0-remove-sqlite-rembed`). Alternative: rebranch from `v3.0` directly, and rebase onto `v3.0` after #5616 merges. Either works. The Plan #1 branch base is slightly more convenient because the plan-narrative assumes Rust is gone.

2. **Phase A detailed plan first, or all 8 detailed plans written upfront?** My default is: write Phase A's detailed implementation plan now, execute it, learn from it, then write Phase B's. Each phase's plan benefits from the experience of the previous phase. Alternative: write all 8 detailed plans upfront — higher upfront cost, but gives a complete roadmap. **Default: phase-by-phase.**

3. **MCP rules integration (#5616 note).** I saw `ADMIN_SQLITE_TABLE_MCP_QUERY_RULES` macro-redefinition warnings in the build log during Plan #1 verification. These predate Plan #1. They're MCP-rules related. Need to check whether they affect any of Plan #2's file moves or touch the MCP plugin files in non-obvious ways.

4. **How aggressive should Phase G be about `#ifdef PROXYSQLGENAI` removal?** After Phase G, the goal is zero `#ifdef PROXYSQLGENAI` sites in core. But some may exist as defensive guards that don't strictly need the plugin (e.g., config variable defaults). The detailed Phase G plan should enumerate every site and decide per-site whether to delete, convert to runtime-check, or keep.

5. **What's the deprecation story for existing `PROXYSQLGENAI=1` users?** (See decision 7.2.) Is there an external user base that explicitly builds with `PROXYSQLGENAI=1` and depends on it being a compile-time flag? If yes, Option B (keep as alias); if no, Option A (delete).

6. **Test infrastructure.** `test/infra/` has Docker environments. Does any existing AI-tier test assume the plugin is compiled into the binary rather than loaded at runtime? If yes, those tests need updating — in which phase? Probably Phase H.

---

## 9. Explicit non-goals for Plan #2

Things that sound related but are **out of scope**:

- **MySQL/PgSQL as plugins.** This is a ~10x bigger effort. It requires a full service-registry redesign, Admin wire-protocol extraction, and `src/main.cpp` startup-ordering rework. Not part of Plan #2.
- **ClickHouse_Server / SQLite3_Server as plugins.** Different seam (they ride on core's `MySQL_Thread` via a MySQL-wire-interface service that doesn't exist yet). Would need its own plan. Can be done in parallel with Plan #2 or after, same ABI extensions but different services.
- **Inter-plugin service registry.** Needed only if the GenAI plugin physically splits into three `.so`s (Plan #3). Plan #2 ends with one `.so` and enforces the module boundary via directory layout + lint, not inter-plugin calls.
- **Removing FFTO/TSDB `#ifdef`s from core.** Innovative tier stays compile-time. See decision 7.1.
- **Fixing the pre-existing `Makefile` default-goal bug** (flagged in PR #5616). Unrelated, separate small PR.

---

## 10. Size and effort estimate (rough)

I deliberately avoid time estimates, but phase-weighted complexity:

| Phase | Relative complexity |
|---|---|
| A | 1× (baseline — small ABI additive change) |
| B | 2× (refactor of 1000-line hot-path block) |
| C | 1× (skeleton) |
| D | 4× (first real code motion, ~7K lines, ~15-20 ifdef removals) |
| E | 6× (biggest phase, tool-handler registry redesign) |
| F | 1× (small cleanup after E) |
| G | 3× (40 ifdef sites, admin-heavy) |
| H | 2× (build system + packaging + tests) |

Total: ~20× the "A baseline". Phase A + B together are ~3×, which is roughly the size of Plan #1 (9 commits). So expect ~6-8 PRs of Plan #1 size to land the whole thing.
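
The arithmetic behind that estimate can be checked with a few lines. This is only a restatement of the table above; the variable names are illustrative, and the weights are the relative-complexity multipliers from the table, not measured effort.

```python
# Relative complexity per phase, copied from the table above.
weights = {"A": 1, "B": 2, "C": 1, "D": 4, "E": 6, "F": 1, "G": 3, "H": 2}

# Whole plan, in units of the Phase A baseline.
total = sum(weights.values())

# Phases A + B together, taken as roughly one "Plan #1-sized" unit of work.
plan1_size = weights["A"] + weights["B"]

# Number of Plan #1-sized PRs needed to cover the whole plan.
pr_equivalents = total / plan1_size

print(total, plan1_size, round(pr_equivalents, 1))
```

The ratio comes out a little under 7, which is where the ~6-8 PR range comes from once some slack is allowed in either direction.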

---

## 11. Current state

- **Worktree:** `/home/rene/aa/ab/proxysql/.worktrees/genai-plugin` exists, on branch `v3.0-genai-plugin` at `6cfcb2b40`, branched from `v3.0-remove-sqlite-rembed` (Plan #1).
- **Plan document location:** nothing written to the repo yet. Once Phase A's detailed implementation plan is ready, it goes to `docs/superpowers/plans/2026-04-XX-plan2a-plugin-abi-extension.md` on the `v3.0-genai-plugin` branch.
- **PR #5616 (Plan #1):** OPEN, targeting `v3.0`. Plan #2 depends on its merge — rebase as needed after it lands.

---

## 12. References

- **PR #5616** — Plan #1: sqlite-rembed and Rust toolchain removal.
- **PR #5593 / branch `ProtocolX`** — v1 plugin ABI introduction. Key files:
  - `doc/PLUGIN_API.md` — authoritative ABI spec
  - `include/ProxySQL_Plugin.h` — ABI header
  - `plugins/mysqlx/` — reference plugin implementation
- `lib/MySQL_Session.cpp:3634-4605` — the anomaly detector block that needs extraction (Phase B).
- `lib/MySQL_Session.cpp:5348-5355` — hot-path seam 1 (per-query anomaly check).
- `lib/MySQL_Session.cpp:7163-7182` — hot-path seam 2 (`GENAI:` prefix interception).
- `lib/ProxySQL_MCP_Server.cpp` — the include weld that couples MCP to AI (line ~17-18 of the included tool handlers).
- All ~30 files in `lib/` and `include/` guarded by `#ifdef PROXYSQLGENAI` (see Section 2 table).

---

**Next action:** none from me — this is here to be re-read with a fresh mind. When ready, the implementation entry point is Phase A's detailed plan (writing-plans skill → `docs/superpowers/plans/...-plan2a-plugin-abi-extension.md`), dispatched via subagent-driven-development or inline execution.


