From 1709a228ba5b1ae3b8cb48847d8bdc96274d0144 Mon Sep 17 00:00:00 2001
From: konard
Date: Fri, 20 Mar 2026 00:25:58 +0000
Subject: [PATCH 1/3] Initial commit with task details

Adding .gitkeep for PR creation (default mode).
This file will be removed when the task is complete.

Issue: https://github.com/xlabtg/teleton-agent/issues/87
---
 .gitkeep | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/.gitkeep b/.gitkeep
index 092d6af2..75b77880 100644
--- a/.gitkeep
+++ b/.gitkeep
@@ -1 +1,2 @@
-# .gitkeep file auto-generated at 2026-03-18T18:52:42.191Z for PR creation at branch issue-43-e5f7241e2b5e for issue https://github.com/xlabtg/teleton-agent/issues/43
\ No newline at end of file
+# .gitkeep file auto-generated at 2026-03-18T18:52:42.191Z for PR creation at branch issue-43-e5f7241e2b5e for issue https://github.com/xlabtg/teleton-agent/issues/43
+# Updated: 2026-03-20T00:25:58.685Z
\ No newline at end of file

From e5c57c4296b10d9d80be6d4be3f3832134bc5f16 Mon Sep 17 00:00:00 2001
From: konard
Date: Fri, 20 Mar 2026 00:36:46 +0000
Subject: [PATCH 2/3] docs: add 21 V2 architecture templates for next-gen agent capabilities

Create professional detailed templates covering 9 blocks of V2 architecture:
- Memory System (semantic vectors, associative graph, prioritization)
- Predictive Intelligence (prediction engine, caching, anomaly detection)
- Multi-Agent System (registry, delegation, pipelines, self-correction)
- Time Intelligence (temporal context, predictive scheduling)
- Security Layer (zero-trust execution, audit trail)
- Integrations (unified layer, webhooks & event bus)
- Generative UI (dynamic dashboard, AI widget generator)
- Self-Improvement (feedback learning, adaptive prompting)
- Agent Network (cross-agent communication protocol)

Each template follows the existing project format with Current State, Problem, What to Implement, Implementation Steps, and Notes sections.

Closes #87

Co-Authored-By: Claude Opus 4.6
---
 improvements/README.md | 74 ++++++++++
 improvements/v2-01-semantic-vector-memory.md | 88 ++++++++++++
 .../v2-02-associative-memory-graph.md | 87 ++++++++++++
 .../v2-03-memory-prioritization-engine.md | 85 ++++++++++++
 improvements/v2-04-prediction-engine.md | 83 +++++++++++
 improvements/v2-05-predictive-caching.md | 80 +++++++++++
 improvements/v2-06-anomaly-detection.md | 92 ++++++++++++
 improvements/v2-07-agent-registry.md | 96 +++++++++++++
 improvements/v2-08-task-delegation.md | 87 ++++++++++++
 improvements/v2-09-pipeline-execution.md | 112 +++++++++++++++
 improvements/v2-10-self-correcting-loop.md | 89 ++++++++++++
 improvements/v2-11-temporal-context.md | 80 +++++++++++
 improvements/v2-12-predictive-scheduling.md | 91 ++++++++++++
 improvements/v2-13-zero-trust-execution.md | 115 +++++++++++++++
 improvements/v2-14-audit-trail.md | 103 ++++++++++++++
 .../v2-15-unified-integration-layer.md | 110 +++++++++++++++
 improvements/v2-16-webhooks-event-bus.md | 108 +++++++++++++++
 improvements/v2-17-dynamic-dashboard.md | 100 +++++++++++++
 improvements/v2-18-ai-widget-generator.md | 96 +++++++++++++
 improvements/v2-19-feedback-learning.md | 114 +++++++++++++++
 improvements/v2-20-adaptive-prompting.md | 114 +++++++++++++++
 improvements/v2-21-multi-agent-network.md | 131 ++++++++++++++++++
 22 files changed, 2135 insertions(+)
 create mode 100644 improvements/v2-01-semantic-vector-memory.md
 create mode 100644 improvements/v2-02-associative-memory-graph.md
 create mode 100644 improvements/v2-03-memory-prioritization-engine.md
 create mode 100644 improvements/v2-04-prediction-engine.md
 create mode 100644 improvements/v2-05-predictive-caching.md
 create mode 100644 improvements/v2-06-anomaly-detection.md
 create mode 100644 improvements/v2-07-agent-registry.md
 create mode 100644 improvements/v2-08-task-delegation.md
 create mode 100644 improvements/v2-09-pipeline-execution.md
 create mode 100644 improvements/v2-10-self-correcting-loop.md
 create mode 100644 improvements/v2-11-temporal-context.md
 create mode 100644 improvements/v2-12-predictive-scheduling.md
 create mode 100644 improvements/v2-13-zero-trust-execution.md
 create mode 100644 improvements/v2-14-audit-trail.md
 create mode 100644 improvements/v2-15-unified-integration-layer.md
 create mode 100644 improvements/v2-16-webhooks-event-bus.md
 create mode 100644 improvements/v2-17-dynamic-dashboard.md
 create mode 100644 improvements/v2-18-ai-widget-generator.md
 create mode 100644 improvements/v2-19-feedback-learning.md
 create mode 100644 improvements/v2-20-adaptive-prompting.md
 create mode 100644 improvements/v2-21-multi-agent-network.md

diff --git a/improvements/README.md b/improvements/README.md
index 5e78699b..304a61d6 100644
--- a/improvements/README.md
+++ b/improvements/README.md
@@ -63,6 +63,80 @@ These files are intended to be used as the basis for creating individual GitHub
 | Dark/Light Theme Toggle | Done |
 | Keyboard Shortcuts (Ctrl+S) | Done |
 
+---
+
+## V2 Architecture — Next-Gen Agent Capabilities
+
+Professional detailed templates for the V2 architecture: memory, multi-agent, predictive systems, self-learning, security, integrations, and adaptive UI.
+
+### Block 1 — Memory System (Foundation)
+
+| # | File | Area | Complexity |
+|---|------|------|-----------|
+| V2-01 | [v2-01-semantic-vector-memory.md](v2-01-semantic-vector-memory.md) | Semantic Vector Memory with Embeddings | Medium |
+| V2-02 | [v2-02-associative-memory-graph.md](v2-02-associative-memory-graph.md) | Associative Graph-Based Memory | High |
+| V2-03 | [v2-03-memory-prioritization-engine.md](v2-03-memory-prioritization-engine.md) | Importance-Based Memory Retention | Medium |
+
+### Block 2 — Predictive Intelligence
+
+| # | File | Area | Complexity |
+|---|------|------|-----------|
+| V2-04 | [v2-04-prediction-engine.md](v2-04-prediction-engine.md) | Prediction Engine for Next User Actions | High |
+| V2-05 | [v2-05-predictive-caching.md](v2-05-predictive-caching.md) | Predictive Caching Layer | Medium |
+| V2-06 | [v2-06-anomaly-detection.md](v2-06-anomaly-detection.md) | Anomaly Detection for Unusual Behavior | Medium |
+
+### Block 3 — Multi-Agent System
+
+| # | File | Area | Complexity |
+|---|------|------|-----------|
+| V2-07 | [v2-07-agent-registry.md](v2-07-agent-registry.md) | Agent Registry and Roles | Very High |
+| V2-08 | [v2-08-task-delegation.md](v2-08-task-delegation.md) | Automatic Task Delegation System | Very High |
+| V2-09 | [v2-09-pipeline-execution.md](v2-09-pipeline-execution.md) | Pipeline-Based Task Execution | Very High |
+| V2-10 | [v2-10-self-correcting-loop.md](v2-10-self-correcting-loop.md) | Self-Correcting Execution Loop | High |
+
+### Block 4 — Time Intelligence
+
+| # | File | Area | Complexity |
+|---|------|------|-----------|
+| V2-11 | [v2-11-temporal-context.md](v2-11-temporal-context.md) | Time-Aware Context System | Medium |
+| V2-12 | [v2-12-predictive-scheduling.md](v2-12-predictive-scheduling.md) | Smart Task Scheduling | High |
+
+### Block 5 — Security Layer
+
+| # | File | Area | Complexity |
+|---|------|------|-----------|
+| V2-13 | [v2-13-zero-trust-execution.md](v2-13-zero-trust-execution.md) | Zero-Trust Validation for Actions | High |
+| V2-14 | [v2-14-audit-trail.md](v2-14-audit-trail.md) | Full Audit Logs for Agent Decisions | High |
+
+### Block 6 — Integrations
+
+| # | File | Area | Complexity |
+|---|------|------|-----------|
+| V2-15 | [v2-15-unified-integration-layer.md](v2-15-unified-integration-layer.md) | Unified API Layer for External Services | High |
+| V2-16 | [v2-16-webhooks-event-bus.md](v2-16-webhooks-event-bus.md) | Event-Driven Architecture | High |
+
+### Block 7 — Generative UI
+
+| # | File | Area | Complexity |
+|---|------|------|-----------|
+| V2-17 | [v2-17-dynamic-dashboard.md](v2-17-dynamic-dashboard.md) | Dynamic Dashboard Generation | High |
+| V2-18 | [v2-18-ai-widget-generator.md](v2-18-ai-widget-generator.md) | Auto-Generated Widgets Based on Usage | High |
+
+### Block 8 — Self-Improvement
+
+| # | File | Area | Complexity |
+|---|------|------|-----------|
+| V2-19 | [v2-19-feedback-learning.md](v2-19-feedback-learning.md) | Feedback-Based Learning Loop | High |
+| V2-20 | [v2-20-adaptive-prompting.md](v2-20-adaptive-prompting.md) | Dynamic Prompt Optimization | Very High |
+
+### Block 9 — Agent Network (Advanced / Optional)
+
+| # | File | Area | Complexity |
+|---|------|------|-----------|
+| V2-21 | [v2-21-multi-agent-network.md](v2-21-multi-agent-network.md) | Cross-Agent Communication Protocol | Very High |
+
+---
+
 ## Complexity Legend
 
 - **Low** — 1-2 days, minimal backend changes
diff --git a/improvements/v2-01-semantic-vector-memory.md b/improvements/v2-01-semantic-vector-memory.md
new file mode 100644
index 00000000..a09ea084
--- /dev/null
+++ b/improvements/v2-01-semantic-vector-memory.md
@@ -0,0 +1,88 @@
+# Semantic Vector Memory
+
+## Current State
+
+The agent uses a SQLite-backed memory system (`src/memory/`) for storing conversation context. The existing `Memory.tsx` page provides basic memory browsing. Embeddings support exists via `sqlite-vec` (upgraded to `^0.1.7` stable in PR #86), but there is no semantic search API exposed to users or to the agent itself for retrieval-augmented generation.
+
+## Problem
+
+- Memory retrieval is keyword-based or by exact ID — no "search by meaning"
+- The agent cannot recall contextually similar past conversations or tool results
+- Users cannot search memory using natural language queries
+- Relevant past context is lost unless explicitly referenced
+- No way to surface related tasks or outcomes from prior sessions
+
+## What to Implement
+
+### 1. Vector Storage Layer
+- **Backend**: Extend SQLite with `sqlite-vec` to store embedding vectors alongside memory entries
+- **Schema**: `memory_vectors (id, memory_id FK, embedding BLOB, model TEXT, created_at)`
+- **Embedding models**: Support OpenAI `text-embedding-3-small` and local alternatives (e.g., `@xenova/transformers`)
+- **Storage targets**:
+  - Conversation messages (user + assistant turns)
+  - Task descriptions and outcomes
+  - Tool invocation results (summarized)
+
+### 2. Semantic Search API
+- **Endpoint**: `GET /api/memory/search?q=&limit=10&threshold=0.7`
+- **Flow**:
+  1. Embed the query string
+  2. Perform cosine similarity search against stored vectors
+  3. Return ranked results with similarity scores
+- **Endpoint**: `GET /api/memory/related/:id` — find memories semantically related to a given memory entry
+
+### 3. Agent Context Integration
+- **Auto-retrieval**: Before each LLM call, retrieve top-K relevant memories based on the current conversation context
+- **Injection**: Append retrieved memories as a "relevant context" section in the system prompt
+- **Configurable**: Enable/disable via `config.yaml` → `memory.semantic_search.enabled: true`
+- **Token budget**: Configurable max tokens for injected context (default: 1000)
+
+### 4. Memory Indexing Pipeline
+- **On-write**: When a new memory entry is created, compute and store its embedding asynchronously
+- **Batch reindex**: `POST /api/memory/reindex` — recompute all embeddings (for model changes)
+- **Progress tracking**: Reindex job with status endpoint `GET /api/memory/reindex/status`
+
+### 5. Web UI Enhancements
+- **Location**: Enhance existing `Memory.tsx` page
+- **Features**:
+  - Semantic search bar with natural language input
+  - "Similar memories" sidebar when viewing a memory entry
+  - Visual similarity scores on search results
+
+### Backend Architecture
+- `src/memory/vector-store.ts` — vector storage and retrieval using `sqlite-vec`
+- `src/memory/embeddings.ts` — embedding computation (provider-agnostic)
+- `src/memory/semantic-search.ts` — search orchestration, ranking, filtering
+- `src/webui/routes/memory.ts` — extend with search endpoints
+
+### Implementation Steps
+
+1. Create `vector-store.ts` with `sqlite-vec` integration for insert/query
+2. Create `embeddings.ts` with provider abstraction (OpenAI, local)
+3. Add `memory_vectors` table migration
+4. Create semantic search service with cosine similarity ranking
+5. Add `/api/memory/search` and `/api/memory/related/:id` endpoints
+6. Integrate auto-retrieval into `src/agent/runtime.ts` before LLM calls
+7. Add reindex pipeline with job status tracking
+8. Enhance `Memory.tsx` with semantic search UI
+9. Add configuration options to `config.yaml`
+
+### Files to Modify
+- `src/memory/` — new files for vector store, embeddings, semantic search
+- `src/webui/routes/memory.ts` — add search endpoints
+- `src/agent/runtime.ts` — integrate semantic context retrieval
+- `web/src/pages/Memory.tsx` — add search UI
+- `web/src/lib/api.ts` — add memory search API calls
+- `config.example.yaml` — add semantic search config section
+
+### Acceptance Criteria
+- Search by meaning, not keywords — "what did we discuss about performance?" returns relevant results
+- API: `/api/memory/search?q=...` returns ranked results with similarity scores
+- Works with existing agent context pipeline
+- Configurable embedding provider and token budget
+
+### Notes
+- **Medium complexity** — `sqlite-vec` is already a dependency, main work is the search/indexing pipeline
+- Embedding computation adds latency; run asynchronously and cache aggressively
+- Consider chunking long texts before embedding (max ~512 tokens per chunk)
+- Rate-limit embedding API calls to avoid cost spikes during bulk reindex
diff --git a/improvements/v2-02-associative-memory-graph.md b/improvements/v2-02-associative-memory-graph.md
new file mode 100644
index 00000000..4c507e61
--- /dev/null
+++ b/improvements/v2-02-associative-memory-graph.md
@@ -0,0 +1,87 @@
+# Associative Memory Graph
+
+## Current State
+
+Memory entries are stored as flat rows in SQLite. Relationships between entities (tasks, tools, conversations, outcomes) are implicit — buried in conversation text rather than explicitly modeled. There is no way to traverse connections like "which tools were used for task X?" or "what conversations led to outcome Y?"
+
+## Problem
+
+- No explicit relationships between memory entities
+- Cannot answer "what tools were used when we discussed topic X?"
+- Cannot trace decision chains: task → tool → outcome → follow-up
+- Related context is scattered across separate memory entries with no links
+- The agent cannot reason about connections between past interactions
+
+## What to Implement
+
+### 1. Graph Schema
+- **Nodes**: Entities extracted from agent interactions
+  - `conversations` — individual sessions/threads
+  - `tasks` — user-requested tasks and their outcomes
+  - `tools` — tool invocations with parameters and results
+  - `topics` — extracted topics/themes from conversations
+  - `entities` — named entities (people, projects, URLs, etc.)
+- **Edges**: Typed relationships
+  - `conversation → USED_TOOL → tool`
+  - `task → PRODUCED → outcome`
+  - `conversation → ABOUT → topic`
+  - `task → RELATED_TO → task`
+  - `entity → MENTIONED_IN → conversation`
+- **Storage**: SQLite tables `graph_nodes (id, type, label, metadata JSON, created_at)` and `graph_edges (id, source_id, target_id, relation, weight, created_at)`
+
+### 2. Entity Extraction Pipeline
+- **On each agent turn**: Extract entities and relationships using LLM-based extraction
+- **Extraction prompt**: Structured output requesting entities, types, and relationships
+- **Fallback**: Regex-based extraction for common patterns (URLs, @mentions, dates)
+- **Deduplication**: Fuzzy matching to avoid duplicate nodes for the same entity
+
+### 3. Graph Query API
+- `GET /api/memory/graph/nodes?type=tool&q=search` — list/search nodes
+- `GET /api/memory/graph/node/:id/related?depth=2` — traverse relationships up to N hops
+- `GET /api/memory/graph/path?from=:id&to=:id` — find shortest path between nodes
+- `GET /api/memory/graph/context?task_id=:id` — get full context graph for a task
+
+### 4. Agent Context Enrichment
+- When processing a new message, query the graph for related context
+- Combine with semantic vector search (v2-01) for hybrid retrieval
+- Provide the agent with structured relationship context, not just raw text
+
+### 5. Graph Visualization UI
+- **Location**: New tab on `Memory.tsx` page — "Knowledge Graph"
+- **Library**: [react-force-graph](https://github.com/vasturiano/react-force-graph) or D3.js force layout
+- **Features**:
+  - Interactive node-link diagram
+  - Filter by node type and relationship type
+  - Click node to see details and connected entities
+  - Search and highlight paths
+
+### Backend Architecture
+- `src/memory/graph-store.ts` — CRUD for nodes and edges
+- `src/memory/entity-extractor.ts` — LLM-based entity/relationship extraction
+- `src/memory/graph-query.ts` — traversal and path-finding algorithms
+- `src/webui/routes/graph.ts` — API endpoints
+
+### Implementation Steps
+
+1. Design and create `graph_nodes` and `graph_edges` SQLite tables
+2. Implement `graph-store.ts` with node/edge CRUD operations
+3. Implement `entity-extractor.ts` with LLM-based extraction
+4. Hook extraction into `src/agent/runtime.ts` post-response pipeline
+5. Implement graph query service with traversal algorithms
+6. Add API endpoints for graph queries
+7. Create graph visualization component in `web/src/components/`
+8. Add "Knowledge Graph" tab to `Memory.tsx`
+
+### Files to Modify
+- `src/memory/` — new files for graph store, extraction, queries
+- `src/agent/runtime.ts` — hook entity extraction into post-response pipeline
+- `src/webui/routes/` — add graph API routes
+- `web/src/pages/Memory.tsx` — add graph visualization tab
+- `web/package.json` — add graph visualization library
+
+### Notes
+- **High complexity** — requires entity extraction pipeline and graph algorithms
+- Entity extraction via LLM adds cost per message; consider batch extraction or extracting only on "interesting" turns
+- Graph size will grow; implement pagination and limit traversal depth
+- Consider using the graph to enhance the semantic search from v2-01 (hybrid retrieval)
+- Start with simple relationship types; extend as patterns emerge from real usage
diff --git a/improvements/v2-03-memory-prioritization-engine.md b/improvements/v2-03-memory-prioritization-engine.md
new file mode 100644
index 00000000..d75336c4
--- /dev/null
+++ b/improvements/v2-03-memory-prioritization-engine.md
@@ -0,0 +1,85 @@
+# Memory Prioritization Engine
+
+## Current State
+
+All memory entries have equal weight. There is no mechanism to determine which memories are important and should be retained vs. which are stale or irrelevant. Over time, the memory store grows unboundedly, making retrieval slower and context injection noisier.
+
+## Problem
+
+- All memories are treated equally regardless of relevance or freshness
+- No automatic cleanup of stale or low-value data
+- Context injection (v2-01) has no quality signal for ranking
+- Storage grows without bounds, degrading search performance
+- No way to distinguish between a critical decision and a casual remark
+
+## What to Implement
+
+### 1. Importance Scoring Model
+- **Scoring dimensions**:
+  - **Recency**: Exponential decay — recent memories score higher
+  - **Frequency**: How often a memory or its entities are referenced
+  - **Impact**: Did this memory lead to a successful task outcome?
+  - **Explicit markers**: User-flagged memories ("remember this")
+  - **Semantic centrality**: How connected is this node in the knowledge graph (v2-02)?
+- **Composite score**: Weighted combination of all dimensions, normalized to 0.0–1.0
+- **Formula**: `score = w1*recency + w2*frequency + w3*impact + w4*explicit + w5*centrality`
+- **Configurable weights**: Via `config.yaml` → `memory.prioritization.weights`
+
+### 2. Scoring Pipeline
+- **On-access**: Bump frequency counter when a memory is retrieved or referenced
+- **On-outcome**: When a task completes successfully, boost scores of memories used in its context
+- **Periodic**: Background job (configurable interval, default: 1 hour) recalculates composite scores
+- **Storage**: `memory_scores (memory_id, score, recency, frequency, impact, explicit, centrality, updated_at)`
+
+### 3. Auto-Cleanup Service
+- **Retention policy**: Configurable in `config.yaml`
+  - `memory.retention.min_score: 0.1` — memories below this threshold are candidates for cleanup
+  - `memory.retention.max_age_days: 90` — hard limit regardless of score
+  - `memory.retention.max_entries: 10000` — cap total entries
+- **Cleanup flow**: Score → rank → archive (move to `memory_archive` table) → delete after archive period
+- **Protection**: Never delete user-flagged or explicitly marked memories
+- **Endpoint**: `POST /api/memory/cleanup` — trigger manual cleanup with dry-run option
+
+### 4. Priority-Aware Retrieval
+- Integrate scores into semantic search (v2-01) as a ranking boost
+- `GET /api/memory/search?q=...&min_score=0.3` — filter by minimum importance
+- Context injection uses score to allocate token budget: high-score memories get more space
+
+### 5. Memory Dashboard
+- **Location**: Enhance `Memory.tsx` page
+- **Features**:
+  - Score distribution chart (histogram)
+  - "At risk" memories list (approaching cleanup threshold)
+  - Manual score adjustment (pin / unpin)
+  - Cleanup history log
+  - Storage usage stats
+
+### Backend Architecture
+- `src/memory/scoring.ts` — score calculation and update logic
+- `src/memory/retention.ts` — cleanup policy evaluation and execution
+- `src/memory/scheduler.ts` — periodic scoring and cleanup jobs
+
+### Implementation Steps
+
+1. Design `memory_scores` and `memory_archive` tables
+2. Implement scoring model with configurable weights
+3. Create scoring pipeline (on-access, on-outcome, periodic)
+4. Implement retention policy engine with dry-run support
+5. Integrate scores into semantic search ranking
+6. Add API endpoints for cleanup and score management
+7. Build memory dashboard UI components
+8. Add configuration options to `config.yaml`
+
+### Files to Modify
+- `src/memory/` — new files for scoring, retention, scheduler
+- `src/memory/semantic-search.ts` — integrate score-based ranking
+- `src/webui/routes/memory.ts` — add score/cleanup endpoints
+- `web/src/pages/Memory.tsx` — add score visualization and management
+- `config.example.yaml` — add prioritization and retention config
+
+### Notes
+- **Medium complexity** — scoring model is straightforward; scheduling and retention need careful testing
+- Cleanup is destructive — archive before deleting, and always support dry-run
+- Score recalculation on large memory stores may be slow; use incremental updates where possible
+- The scoring weights will need tuning based on real usage patterns
+- This feature depends on v2-01 (semantic search) and benefits from v2-02 (graph centrality)
diff --git a/improvements/v2-04-prediction-engine.md b/improvements/v2-04-prediction-engine.md
new file mode 100644
index 00000000..40f91a38
--- /dev/null
+++ b/improvements/v2-04-prediction-engine.md
@@ -0,0 +1,83 @@
+# Prediction Engine
+
+## Current State
+
+The agent operates reactively — it waits for user input and then processes it. There is no analysis of user behavior patterns, no prediction of likely next actions, and no proactive suggestions. The existing analytics service (`src/services/analytics.ts`) records request metrics and costs but does not analyze patterns.
+
+## Problem
+
+- Agent is purely reactive — never anticipates user needs
+- Repeated interaction patterns are not recognized or optimized
+- Users must explicitly request everything, even routine tasks
+- No learning from historical command sequences
+- Missed opportunities for proactive assistance
+
+## What to Implement
+
+### 1. Behavior Pattern Analyzer
+- **Data source**: Session history, tool invocations, message patterns from `request_metrics` and memory
+- **Pattern types**:
+  - **Sequential patterns**: User typically does A → B → C (e.g., check status → run tests → deploy)
+  - **Temporal patterns**: User does X every Monday morning, Y at end of day
+  - **Contextual patterns**: When discussing topic T, user usually needs tool Z
+- **Storage**: `behavior_patterns (id, pattern_type, pattern JSON, confidence, frequency, last_seen, created_at)`
+
+### 2. Prediction Model
+- **Approach**: Lightweight Markov chain + frequency analysis (no heavy ML required)
+  - Build transition probability matrix from action sequences
+  - Weight by recency and frequency
+- **Predictions**:
+  - Next likely command/request
+  - Tools likely to be needed
+  - Related topics the user might ask about
+- **Confidence threshold**: Only surface predictions above configurable confidence (default: 0.6)
+
+### 3. Suggestions API
+- `GET /api/predictions/next` — next predicted actions for current session context
+- `GET /api/predictions/tools` — tools likely needed based on current conversation
+- `GET /api/predictions/topics` — related topics the user might explore
+- **Response format**: `[{ action: string, confidence: number, reason: string }]`
+
+### 4. Proactive Agent Behavior
+- **Pre-load tools**: When prediction confidence is high, pre-initialize likely tools
+- **Suggestion injection**: Optionally append "You might also want to..." suggestions
+- **Configurable**: `config.yaml` → `predictions.enabled: true`, `predictions.proactive_suggestions: false`
+
+### 5. Prediction UI
+- **Location**: Dashboard widget or sidebar panel
+- **Features**:
+  - "Suggested next actions" card
+  - Confidence indicators (progress bars)
+  - One-click action execution from suggestions
+  - "Not helpful" feedback to improve predictions
+
+### Backend Architecture
+- `src/services/predictions.ts` — pattern analysis and prediction engine
+- `src/services/behavior-tracker.ts` — action sequence recording
+- `src/webui/routes/predictions.ts` — API endpoints
+
+### Implementation Steps
+
+1. Create behavior tracking middleware in agent runtime
+2. Design pattern storage schema
+3. Implement Markov chain-based prediction model
+4. Build pattern analyzer for sequential, temporal, and contextual patterns
+5. Create predictions API endpoints
+6. Integrate pre-loading for high-confidence tool predictions
+7. Build suggestion UI widget
+8. Add configuration options
+
+### Files to Modify
+- `src/services/` — new prediction and behavior tracking services
+- `src/agent/runtime.ts` — add behavior tracking hooks
+- `src/webui/routes/` — add prediction endpoints
+- `web/src/components/` — add suggestion widget
+- `web/src/pages/Dashboard.tsx` — integrate prediction widget
+- `config.example.yaml` — add prediction config
+
+### Notes
+- **High complexity** — pattern analysis requires significant data and tuning
+- Start with simple sequential pattern matching before adding temporal/contextual
+- Predictions improve with usage — initial period will have low confidence
+- Be careful with proactive suggestions: annoying suggestions are worse than no suggestions
+- Privacy consideration: behavior patterns may be sensitive; respect data retention settings from v2-03
diff --git a/improvements/v2-05-predictive-caching.md b/improvements/v2-05-predictive-caching.md
new file mode 100644
index 00000000..7de05062
--- /dev/null
+++ b/improvements/v2-05-predictive-caching.md
@@ -0,0 +1,80 @@
+# Predictive Caching Layer
+
+## Current State
+
+The agent loads tools, prompts, and resources on-demand. Each request goes through the full initialization pipeline: load configuration, initialize tools, prepare context. There is no caching layer for frequently accessed resources, and no pre-loading based on predicted needs.
+
+## Problem
+
+- Cold-start latency for tool initialization on each request
+- Repeated loading of the same resources across sessions
+- No benefit from predictable usage patterns
+- LLM prompt construction happens from scratch each time
+- Response time suffers under load or with many tools enabled
+
+## What to Implement
+
+### 1. Resource Cache Layer
+- **Cached resources**:
+  - Tool configurations and schemas
+  - Compiled prompt templates (soul files)
+  - Embedding vectors for common queries
+  - API responses with TTL (external service results)
+- **Cache backend**: In-memory LRU cache with configurable max size
+- **Cache key**: Hash of resource identifier + relevant configuration
+- **TTL**: Configurable per resource type (default: tools 5min, prompts 1min, embeddings 30min)
+
+### 2. Predictive Pre-loading
+- **Integration with prediction engine** (v2-04): Pre-load tools and prompts predicted to be needed
+- **Session-start pre-load**: When a session starts, pre-load resources based on:
+  - User's most frequently used tools
+  - Time-of-day patterns
+  - Current conversation context
+- **Background loading**: Pre-load happens asynchronously, never blocks the main request path
+
+### 3. Cache Management API
+- `GET /api/cache/stats` — cache hit/miss rates, size, entries
+- `POST /api/cache/invalidate?key=...` — invalidate specific entries
+- `POST /api/cache/warm` — trigger pre-loading for current context
+- `DELETE /api/cache` — clear entire cache
+
+### 4. Smart Invalidation
+- **Config change detection**: Invalidate cached tools/prompts when config or soul files change
+- **File watcher**: Use existing `plugin-watcher.ts` pattern for change detection
+- **Version stamping**: Each cache entry carries a version; stale versions auto-invalidate
+
+### 5. Performance Monitoring
+- **Metrics**: Cache hit rate, average latency reduction, memory usage
+- **Integration**: Feed metrics into existing analytics service (`src/services/analytics.ts`)
+- **Dashboard widget**: Cache performance stats on the Dashboard
+
+### Backend Architecture
+- `src/services/cache.ts` — generic LRU cache with TTL support
+- `src/services/preloader.ts` — predictive pre-loading orchestration
+- `src/webui/routes/cache.ts` — cache management endpoints
+
+### Implementation Steps
+
+1. Implement generic LRU cache with TTL in `src/services/cache.ts`
+2. Wrap tool loading with cache layer
+3. Wrap prompt/soul file loading with cache layer
+4. Add cache invalidation on file/config changes
+5. Integrate with prediction engine for pre-loading
+6. Add session-start pre-loading based on usage patterns
+7. Create cache management API endpoints
+8. Add cache metrics to analytics dashboard
+
+### Files to Modify
+- `src/services/` — new cache and preloader services
+- `src/agent/tools/` — wrap tool loading with cache
+- `src/soul/` — wrap soul file loading with cache
+- `src/agent/runtime.ts` — integrate pre-loading on session start
+- `src/webui/routes/` — add cache management endpoints
+- `web/src/pages/Dashboard.tsx` — add cache stats widget
+
+### Notes
+- **Medium complexity** — LRU cache is straightforward; predictive pre-loading requires v2-04
+- Memory limits are important — unbounded caching on a resource-constrained system is dangerous
+- Start with simple TTL-based caching before adding predictive pre-loading
+- Cache invalidation is the hard part — err on the side of invalidating too often
+- Monitor memory usage and adjust cache size dynamically based on available resources
diff --git a/improvements/v2-06-anomaly-detection.md b/improvements/v2-06-anomaly-detection.md
new file mode 100644
index 00000000..cfffbf09
--- /dev/null
+++ b/improvements/v2-06-anomaly-detection.md
@@ -0,0 +1,92 @@
+# Anomaly Detection
+
+## Current State
+
+The agent has basic error handling and retry logic in `src/agent/runtime.ts`. The analytics service tracks request metrics. The security center (`Security.tsx`) provides audit logging and rate limits. However, there is no automated detection of unusual behavior patterns, failure spikes, or security anomalies.
+ +## Problem + +- No automated alerting when error rates spike +- Cannot detect unusual usage patterns (potential abuse or compromised accounts) +- Failed tool executions go unnoticed unless manually reviewed +- Cost anomalies (unexpected token usage spikes) are not flagged +- No baseline of "normal" behavior to compare against + +## What to Implement + +### 1. Baseline Profiling +- **Metrics tracked**: + - Requests per hour/day (volume) + - Error rate (errors / total requests) + - Average response latency + - Token usage per request + - Tool invocation distribution + - Cost per hour/day +- **Baseline calculation**: Rolling 7-day moving average with standard deviation +- **Storage**: `anomaly_baselines (metric, mean, stddev, sample_count, period, updated_at)` + +### 2. Anomaly Detection Engine +- **Algorithm**: Z-score based detection with configurable sensitivity + - Anomaly if `|current - mean| > threshold * stddev` + - Default threshold: 2.5 standard deviations +- **Detection types**: + - **Volume spike**: Sudden increase in request count + - **Error burst**: Error rate exceeds baseline + - **Latency degradation**: Response times significantly slower + - **Cost spike**: Token/cost usage jumps unexpectedly + - **Behavioral anomaly**: Unusual tool usage pattern or new unseen patterns +- **Configurable**: `config.yaml` → `anomaly_detection.enabled: true`, `anomaly_detection.sensitivity: 2.5` + +### 3. Alert System +- **Alert channels**: + - In-app notifications (via existing notification center from PR #34) + - Telegram message to admin chat + - Webhook to external URL (Slack, PagerDuty, etc.) +- **Alert format**: Type, severity (warning/critical), metric name, current value, expected range, timestamp +- **Deduplication**: Same anomaly type is not re-alerted within a configurable cooldown (default: 15 minutes) + +### 4. 
Anomaly API +- `GET /api/anomalies?period=24h&severity=critical` — list detected anomalies +- `GET /api/anomalies/baselines` — current baseline values for all metrics +- `POST /api/anomalies/:id/acknowledge` — mark anomaly as reviewed +- `GET /api/anomalies/stats` — detection statistics + +### 5. Anomaly Dashboard +- **Location**: New section on Analytics page or standalone "Monitoring" page +- **Features**: + - Timeline of detected anomalies with severity color coding + - Baseline vs actual charts for each metric + - Alert configuration panel + - Acknowledged vs unacknowledged anomaly management + +### Backend Architecture +- `src/services/anomaly-detector.ts` — baseline calculation and anomaly detection +- `src/services/alerting.ts` — multi-channel alert dispatch +- `src/webui/routes/anomalies.ts` — API endpoints + +### Implementation Steps + +1. Design `anomaly_baselines` and `anomaly_events` tables +2. Implement baseline profiling with rolling statistics +3. Implement Z-score anomaly detection engine +4. Create alert dispatch system with deduplication +5. Integrate with existing notification center +6. Add Telegram and webhook alert channels +7. Create anomaly API endpoints +8. Build anomaly dashboard UI +9. 
Add configuration options + +### Files to Modify +- `src/services/` — new anomaly detection and alerting services +- `src/services/analytics.ts` — feed metrics into anomaly detector +- `src/agent/runtime.ts` — report metrics to detector on each request +- `src/webui/routes/` — add anomaly endpoints +- `web/src/pages/Analytics.tsx` — add anomaly section +- `config.example.yaml` — add anomaly detection config + +### Notes +- **Medium complexity** — Z-score detection is simple; multi-channel alerting is the complex part +- Needs a warm-up period (7+ days of data) before baselines are meaningful +- False positives are annoying — start with high thresholds and let users tune down +- Consider time-of-day and day-of-week normalization (e.g., separate weekend and weekday baselines) +- Integrate with the Security Center for security-specific anomalies diff --git a/improvements/v2-07-agent-registry.md b/improvements/v2-07-agent-registry.md new file mode 100644 index 00000000..83dd0287 --- /dev/null +++ b/improvements/v2-07-agent-registry.md @@ -0,0 +1,96 @@ +# Agent Registry + +## Current State + +The existing `19-multi-agent.md` template describes multi-agent support at a high level. The system currently runs a single agent instance with one configuration. `AgentControl` in the WebUI starts/stops this single agent. The agent runtime (`src/agent/runtime.ts`) processes messages for one bot instance with one set of tools, hooks, and soul configuration. + +## Problem + +- Only one agent type exists — no specialization +- Cannot define different agent roles (research, code, content) +- No centralized catalog of agent capabilities +- Cannot compose teams of agents with different skill sets +- Adding a new agent type requires manual configuration duplication + +## What to Implement + +### 1. 
Agent Type Definitions +- **Built-in agent archetypes**: + - `ResearchAgent` — web search, information gathering, summarization + - `CodeAgent` — code generation, review, debugging, testing + - `ContentAgent` — writing, editing, translation, formatting + - `OrchestratorAgent` — delegates to other agents, aggregates results + - `MonitorAgent` — system monitoring, health checks, alerting +- **Custom agent types**: Users can define their own via configuration + +### 2. Agent Registry Service +- **Storage**: `agent_registry (id, name, type, description, config JSON, soul_template, tools JSON, status, created_at, updated_at)` +- **Config per agent**: + - Soul/system prompt template + - Allowed tools list + - Hook rules + - LLM provider and model + - Temperature and other inference parameters + - Resource limits (max tokens, max tool calls per turn) + +### 3. Registry API +- `GET /api/agents` — list all registered agents with status +- `POST /api/agents` — register a new agent from archetype or custom config +- `GET /api/agents/:id` — get agent details and config +- `PUT /api/agents/:id` — update agent configuration +- `DELETE /api/agents/:id` — deregister agent +- `POST /api/agents/:id/clone` — duplicate agent with new name +- `GET /api/agents/archetypes` — list built-in archetypes with descriptions + +### 4. Agent Lifecycle Management +- `POST /api/agents/:id/start` — start agent instance +- `POST /api/agents/:id/stop` — stop agent instance +- `GET /api/agents/:id/status` — health and runtime status +- **Process isolation**: Each agent runs in its own worker/subprocess +- **Resource limits**: Configurable per agent (memory, CPU time, concurrent requests) + +### 5. 
Agent Management UI +- **Location**: New "Agents" page in WebUI navigation +- **Features**: + - Agent catalog with archetype cards + - "Create Agent" wizard — choose archetype → customize config → deploy + - Per-agent dashboard: status, metrics, recent activity + - Agent switcher in sidebar for quick navigation + - Clone, edit, delete actions per agent + +### Backend Architecture +- `src/agent/registry.ts` — agent type definitions and registry CRUD +- `src/agent/agent-manager.ts` — lifecycle management (start/stop/health) +- `src/agent/worker.ts` — isolated agent process wrapper +- `src/webui/routes/agents.ts` — API endpoints + +### Implementation Steps + +1. Define agent archetype schemas (soul templates, tool lists, default configs) +2. Create `agent_registry` table migration +3. Implement registry service with CRUD operations +4. Implement agent process manager with worker isolation +5. Create archetype templates for built-in agent types +6. Build agent management API endpoints +7. Create Agents page UI with catalog and management features +8. Add agent switcher to sidebar navigation +9. 
Refactor existing single-agent code to work through registry + +### Files to Modify +- `src/agent/` — new registry, manager, and worker files +- `src/webui/routes/` — add agents endpoints +- `web/src/pages/` — new `Agents.tsx` page +- `web/src/components/` — agent cards, wizard, switcher components +- `web/src/App.tsx` — add agents route +- `config.example.yaml` — add agent registry config section + +### Relationship to Existing Work +- Extends concepts from `19-multi-agent.md` with concrete archetype definitions +- The existing multi-agent template focused on running multiple instances; this focuses on defining and managing agent types + +### Notes +- **Very High complexity** — requires significant architectural refactoring of single-agent assumption +- Start with the registry and archetypes (no process isolation) — let users configure different agent profiles +- Process isolation (step 4) can be deferred to a follow-up iteration +- Consider backward compatibility: existing single-agent config should auto-register as the default agent +- This is a prerequisite for v2-08 (Task Delegation) and v2-09 (Pipeline Execution) diff --git a/improvements/v2-08-task-delegation.md b/improvements/v2-08-task-delegation.md new file mode 100644 index 00000000..49b40078 --- /dev/null +++ b/improvements/v2-08-task-delegation.md @@ -0,0 +1,87 @@ +# Task Delegation Engine + +## Current State + +The Tasks page (`web/src/pages/Tasks.tsx`) shows tasks with statuses (pending, in_progress, done, failed, cancelled) but tasks are processed by the single agent instance. There is no mechanism to split complex tasks into subtasks or route them to specialized agents. Task assignment is implicit — whatever the agent is asked, it handles directly. 
+ +## Problem + +- Complex tasks are handled monolithically by one agent +- No task decomposition into smaller, manageable subtasks +- Cannot route specialized work to the best-suited agent +- No parallel task execution across multiple agents +- Failed subtasks require restarting the entire task +- No visibility into task decomposition and delegation flow + +## What to Implement + +### 1. Task Decomposition +- **Automatic splitting**: LLM-based analysis breaks complex tasks into subtasks +- **Decomposition prompt**: Structured output with subtask descriptions, dependencies, and required skills +- **Manual override**: Users can manually decompose tasks in the UI +- **Depth limit**: Maximum 3 levels of nesting (task → subtask → sub-subtask) + +### 2. Agent Matching +- **Skill-based routing**: Match subtask requirements to agent capabilities (from v2-07 registry) +- **Matching criteria**: + - Required tools (subtask needs web search → route to ResearchAgent) + - Domain expertise (code review → CodeAgent) + - Availability (agent not currently overloaded) + - Historical performance (which agent type has best success rate for similar tasks?) +- **Fallback**: If no specialist matches, route to OrchestratorAgent or default agent + +### 3. Delegation Execution +- **Flow**: Parent task → decompose → match agents → delegate subtasks → collect results → synthesize +- **Parallel execution**: Independent subtasks run concurrently on different agents +- **Sequential execution**: Dependent subtasks wait for prerequisites +- **Result aggregation**: Orchestrator agent collects and synthesizes subtask results into a coherent response +- **Error handling**: Failed subtask → retry with same agent → retry with different agent → escalate to user + +### 4. 
Delegation API +- `POST /api/tasks/:id/decompose` — trigger decomposition of a task +- `GET /api/tasks/:id/subtasks` — list subtasks and their assignments +- `POST /api/tasks/:id/delegate` — manually assign a task to a specific agent +- `GET /api/tasks/:id/tree` — full task tree with status at each level +- `POST /api/tasks/:id/subtasks/:subtask_id/retry` — retry a failed subtask + +### 5. Delegation UI +- **Location**: Enhance `Tasks.tsx` page +- **Features**: + - Task tree visualization (collapsible hierarchy) + - Agent assignment badges on each subtask + - Status indicators: pending → delegated → in_progress → done/failed + - Manual re-assignment drag-and-drop + - Delegation timeline showing execution order + +### Backend Architecture +- `src/agent/delegation/decomposer.ts` — LLM-based task decomposition +- `src/agent/delegation/matcher.ts` — agent-to-task matching +- `src/agent/delegation/executor.ts` — delegation orchestration and result collection +- `src/webui/routes/delegation.ts` — API endpoints + +### Implementation Steps + +1. Design subtask schema: `subtasks (id, parent_id, task_id, description, agent_id, status, result, created_at)` +2. Implement task decomposer using structured LLM output +3. Implement agent matching algorithm based on capabilities and availability +4. Build delegation executor with parallel/sequential support +5. Implement result aggregation and synthesis +6. Create delegation API endpoints +7. Build task tree UI components +8. Integrate with agent registry (v2-07) for agent selection +9. 
Add error handling and retry logic + +### Files to Modify +- `src/agent/delegation/` — new directory for delegation engine +- `src/agent/runtime.ts` — hook delegation into task processing pipeline +- `src/webui/routes/` — add delegation endpoints +- `web/src/pages/Tasks.tsx` — add task tree and delegation UI +- `web/src/components/` — task tree, agent badge components + +### Notes +- **Very High complexity** — multi-agent coordination is architecturally challenging +- Requires v2-07 (Agent Registry) to be implemented first +- Start simple: manual delegation before automatic decomposition +- LLM-based decomposition adds cost — consider caching decomposition patterns +- Race conditions: multiple agents writing results simultaneously need proper locking +- Consider a message queue (in-process or Redis) for reliable task distribution diff --git a/improvements/v2-09-pipeline-execution.md b/improvements/v2-09-pipeline-execution.md new file mode 100644 index 00000000..8f9f1df5 --- /dev/null +++ b/improvements/v2-09-pipeline-execution.md @@ -0,0 +1,112 @@ +# Pipeline Execution + +## Current State + +The workflow automation template (`20-workflow-automation.md`) describes visual workflow building with triggers, conditions, and actions. The current agent processes tasks as isolated units — there is no concept of chained execution where the output of one step feeds into the next, and no dependency resolution between steps. + +## Problem + +- Cannot chain multiple agent actions into a sequential pipeline +- No dependency resolution between tasks +- Output of one tool cannot automatically feed into another +- Complex workflows require manual intervention at each step +- No way to define reusable multi-step processes + +## What to Implement + +### 1. 
Pipeline Definition +- **Pipeline schema**: + ```yaml + name: "research-and-summarize" + steps: + - id: search + agent: ResearchAgent + action: "Search for {topic}" + output: search_results + - id: analyze + agent: CodeAgent + action: "Analyze {search_results}" + depends_on: [search] + output: analysis + - id: summarize + agent: ContentAgent + action: "Create summary from {analysis}" + depends_on: [analyze] + output: final_report + ``` +- **Storage**: `pipelines (id, name, description, steps JSON, enabled, created_at, updated_at)` +- **Variable passing**: Step outputs are available as `{variable_name}` in subsequent steps + +### 2. Dependency Resolution +- **DAG validation**: Steps form a Directed Acyclic Graph — detect cycles at definition time +- **Topological sort**: Determine execution order based on `depends_on` declarations +- **Parallel branches**: Steps with no dependencies between them execute concurrently +- **Fan-in**: Steps can depend on multiple predecessors (wait for all to complete) + +### 3. Pipeline Execution Engine +- **Executor**: Walk the DAG, dispatching steps to appropriate agents +- **State machine per step**: `pending → running → completed | failed | skipped` +- **Context propagation**: Each step receives the accumulated context from all predecessor outputs +- **Error strategies**: + - `fail_fast` — stop pipeline on first failure (default) + - `continue` — skip failed step, continue with available data + - `retry` — retry failed step N times before failing +- **Timeout**: Per-step and per-pipeline configurable timeouts + +### 4. 
Pipeline API +- `GET /api/pipelines` — list all pipeline definitions +- `POST /api/pipelines` — create a new pipeline +- `PUT /api/pipelines/:id` — update pipeline definition +- `DELETE /api/pipelines/:id` — delete pipeline +- `POST /api/pipelines/:id/run` — trigger a pipeline execution +- `GET /api/pipelines/:id/runs` — list execution history +- `GET /api/pipelines/:id/runs/:runId` — detailed run status with per-step results +- `POST /api/pipelines/:id/runs/:runId/cancel` — cancel a running pipeline + +### 5. Pipeline Builder UI +- **Location**: New "Pipelines" page or tab within existing Workflows +- **Features**: + - Visual pipeline builder with step cards connected by arrows + - Drag-and-drop step ordering + - Step configuration panel (agent, action, variables, error strategy) + - Dependency line drawing between steps + - Pipeline run history with per-step status timeline + - Real-time execution monitoring with live step status updates + +### Backend Architecture +- `src/services/pipeline/definition.ts` — pipeline CRUD and validation +- `src/services/pipeline/resolver.ts` — DAG validation and topological sort +- `src/services/pipeline/executor.ts` — execution engine with state machine +- `src/webui/routes/pipelines.ts` — API endpoints + +### Implementation Steps + +1. Design pipeline and pipeline_runs table schemas +2. Implement pipeline definition service with DAG validation +3. Implement dependency resolver with topological sort +4. Build pipeline executor with state machine per step +5. Implement variable passing and context propagation +6. Add error handling strategies (fail_fast, continue, retry) +7. Create pipeline API endpoints +8. Build pipeline builder UI with visual editor +9. 
Add real-time execution monitoring via WebSocket + +### Files to Modify +- `src/services/pipeline/` — new directory for pipeline engine +- `src/webui/routes/` — add pipeline endpoints +- `web/src/pages/` — new `Pipelines.tsx` page +- `web/src/components/` — pipeline builder, step cards, run viewer +- `web/src/App.tsx` — add pipelines route + +### Relationship to Existing Work +- Extends the workflow automation concept from `20-workflow-automation.md` +- Depends on v2-07 (Agent Registry) for agent routing +- Complements v2-08 (Task Delegation) — delegation is automatic, pipelines are user-defined + +### Notes +- **Very High complexity** — DAG execution engine with state management is non-trivial +- Start with linear (sequential-only) pipelines before adding parallel branches +- Visual builder can use react-flow (already proposed in `20-workflow-automation.md`) +- Pipeline runs should be durable — survive agent restarts +- Consider max pipeline size (e.g., 20 steps) to prevent abuse +- Log all step inputs/outputs for debugging and audit diff --git a/improvements/v2-10-self-correcting-loop.md b/improvements/v2-10-self-correcting-loop.md new file mode 100644 index 00000000..bdf080a2 --- /dev/null +++ b/improvements/v2-10-self-correcting-loop.md @@ -0,0 +1,89 @@ +# Self-Correcting Execution Loop + +## Current State + +The agent runtime (`src/agent/runtime.ts`) has basic retry logic for server errors (overloaded, internal server error, api_error, rate limits). However, retries are blind — the same request is repeated without analysis of why it failed. There is no mechanism for the agent to evaluate its own output quality, detect mistakes, and iteratively improve. 
+ +## Problem + +- Retry logic is blind — same request repeated without adjustment +- Agent cannot detect when its output is wrong or low quality +- No self-evaluation or reflection step after generating a response +- Failed tool calls are retried identically, not adapted +- No iterative improvement loop for complex tasks +- Users must manually identify and correct agent mistakes + +## What to Implement + +### 1. Output Evaluation +- **Self-critique prompt**: After generating a response, optionally run a second LLM call to evaluate quality +- **Evaluation criteria**: + - Completeness: Does the response address all parts of the request? + - Correctness: Are facts, code, and reasoning accurate? + - Tool usage: Were the right tools used? Did they return expected results? + - Formatting: Does the output match the expected format? +- **Score**: 0.0–1.0 quality score with specific feedback + +### 2. Correction Loop +- **Flow**: Generate → Evaluate → (if score < threshold) → Reflect → Regenerate +- **Reflection step**: Analyze what went wrong and create an explicit correction plan +- **Max iterations**: Configurable limit (default: 3) to prevent infinite loops +- **Escalation**: If max iterations reached without acceptable quality, flag for human review +- **Configurable**: `config.yaml` → `self_correction.enabled: true`, `self_correction.threshold: 0.7` + +### 3. Tool Error Recovery +- **Error classification**: Categorize tool failures (auth error, timeout, invalid input, resource not found) +- **Recovery strategies per error type**: + - Auth error → refresh credentials and retry + - Timeout → retry with longer timeout or simpler parameters + - Invalid input → analyze error message, adjust parameters + - Resource not found → try alternative resources or inform user +- **Parameter adaptation**: Modify tool call parameters based on error feedback + +### 4. 
Learning from Corrections +- **Correction log**: Store each correction cycle `(original, evaluation, corrected, improvement_delta)` +- **Pattern detection**: Identify recurring mistakes for the prediction engine (v2-04) +- **Prompt improvement**: Feed correction patterns into adaptive prompting (v2-20) +- **Storage**: `correction_logs (id, task_id, iteration, original_output, evaluation, corrected_output, score_delta, created_at)` + +### 5. Correction Monitoring UI +- **Location**: Expandable section in session/task detail views +- **Features**: + - Correction iteration timeline (attempt 1 → evaluation → attempt 2 → ...) + - Side-by-side diff of original vs corrected output + - Quality score trend per iteration + - Tool error recovery log + - "Skip correction" manual override button + +### Backend Architecture +- `src/agent/self-correction/evaluator.ts` — output quality evaluation +- `src/agent/self-correction/reflector.ts` — mistake analysis and correction planning +- `src/agent/self-correction/recovery.ts` — tool error recovery strategies +- `src/agent/self-correction/logger.ts` — correction log storage and analysis + +### Implementation Steps + +1. Implement output evaluator with structured LLM critique +2. Build correction loop with configurable iterations and threshold +3. Implement reflection step that produces explicit correction instructions +4. Add tool error classification and recovery strategies +5. Create correction logging and storage +6. Integrate correction loop into `src/agent/runtime.ts` +7. Build correction monitoring UI components +8. 
Add configuration options for thresholds and limits + +### Files to Modify +- `src/agent/self-correction/` — new directory for correction engine +- `src/agent/runtime.ts` — integrate correction loop after response generation +- `src/webui/routes/` — add correction log endpoints +- `web/src/pages/Sessions.tsx` — add correction detail view +- `web/src/pages/Tasks.tsx` — add correction indicators +- `config.example.yaml` — add self-correction config + +### Notes +- **High complexity** — self-evaluation via LLM doubles (or triples) the cost per request +- Make correction optional and off-by-default for cost-sensitive deployments +- The evaluation LLM call should use a smaller/cheaper model if possible +- Avoid correction loops on simple queries — only activate for complex tasks +- Track correction rate as a metric: high correction rate suggests systemic prompt issues +- This feature synergizes with v2-19 (Feedback Learning) and v2-20 (Adaptive Prompting) diff --git a/improvements/v2-11-temporal-context.md b/improvements/v2-11-temporal-context.md new file mode 100644 index 00000000..6f3a41f8 --- /dev/null +++ b/improvements/v2-11-temporal-context.md @@ -0,0 +1,80 @@ +# Temporal Context Engine + +## Current State + +The agent treats all context as atemporal — no distinction between information from today vs. months ago. Session history is ordered by timestamp but the agent does not adapt its behavior based on time-of-day, day-of-week, or temporal patterns. The existing analytics records timestamps but does not use them for context adaptation. + +## Problem + +- Agent behavior is identical at 9am on Monday and 11pm on Saturday +- No awareness of temporal patterns in user behavior +- Stale context is weighted the same as fresh context +- Cannot reason about "last time this happened" or "this usually happens on Fridays" +- Time-sensitive information (deadlines, schedules) is not treated differently + +## What to Implement + +### 1. 
Temporal Metadata +- **Enrich all stored data** with temporal dimensions: + - Absolute timestamp (already exists) + - Day of week, hour of day (derived) + - Relative time markers ("morning", "evening", "weekend", "weekday") + - Session context: beginning/middle/end of conversation +- **Storage**: Add temporal columns to existing tables or a `temporal_metadata` overlay table + +### 2. Time Pattern Analysis +- **Patterns to detect**: + - **Daily patterns**: User asks about X mostly in the morning + - **Weekly patterns**: Reporting tasks happen on Mondays, deployments on Thursdays + - **Recurring events**: "Check status" happens every day at 10am + - **Seasonal/periodic**: End-of-month tasks, quarterly reviews +- **Storage**: `time_patterns (id, pattern_type, description, schedule_cron, confidence, last_seen, created_at)` + +### 3. Context Time-Weighting +- **Freshness scoring**: Recent context weighted higher in retrieval +- **Temporal relevance**: When it's Monday morning, boost context related to Monday-morning patterns +- **Decay function**: Configurable decay curve (exponential, linear, step) +- **Integration**: Feed temporal weights into memory prioritization (v2-03) and semantic search (v2-01) + +### 4. Time-Aware Agent Behavior +- **Greeting adaptation**: "Good morning" vs "Good evening" based on user timezone +- **Proactive reminders**: "It's Monday — would you like the weekly status report?" +- **Context pre-loading**: Load relevant context based on current time patterns +- **Deadline awareness**: Flag time-sensitive information in memory + +### 5. 
Temporal Context API +- `GET /api/context/temporal?time=now` — get current temporal context and active patterns +- `GET /api/context/patterns` — list detected time patterns +- `PUT /api/context/patterns/:id` — adjust pattern (user feedback) +- `GET /api/context/timeline?from=...&to=...` — activity timeline for a period + +### Backend Architecture +- `src/services/temporal-context.ts` — temporal metadata enrichment and pattern detection +- `src/services/time-patterns.ts` — pattern analysis and storage +- `src/webui/routes/temporal.ts` — API endpoints + +### Implementation Steps + +1. Add temporal metadata enrichment to data storage pipeline +2. Implement time pattern detection algorithm +3. Build temporal weighting for context retrieval +4. Integrate with memory prioritization and semantic search +5. Implement time-aware agent behavior (greetings, reminders) +6. Create temporal context API endpoints +7. Build time pattern UI on Analytics or Memory page +8. Add timezone configuration support + +### Files to Modify +- `src/services/` — new temporal context and time pattern services +- `src/memory/` — integrate temporal weighting into retrieval +- `src/agent/runtime.ts` — inject temporal context into agent processing +- `src/webui/routes/` — add temporal endpoints +- `web/src/pages/Analytics.tsx` — add temporal patterns section +- `config.example.yaml` — add timezone and temporal config + +### Notes +- **Medium complexity** — pattern detection is straightforward; integration touches many systems +- User timezone must be configurable (not assumed from server timezone) +- Pattern detection needs minimum 2 weeks of data to be meaningful +- Be careful with proactive behavior — users may find unsolicited reminders annoying +- This feature enhances v2-03 (Memory Prioritization) and v2-04 (Prediction Engine) diff --git a/improvements/v2-12-predictive-scheduling.md b/improvements/v2-12-predictive-scheduling.md new file mode 100644 index 00000000..0e66c54b --- /dev/null +++ 
b/improvements/v2-12-predictive-scheduling.md @@ -0,0 +1,91 @@ +# Predictive Scheduling + +## Current State + +Tasks in the system are created and executed on-demand. The heartbeat mechanism (`HEARTBEAT.md` soul file) provides basic periodic behavior, but there is no intelligent task scheduling. Users must manually initiate all tasks, and there is no understanding of optimal timing or workload distribution. + +## Problem + +- All tasks are manual — no automatic scheduling based on patterns +- No workload balancing across time (tasks pile up during peak hours) +- Recurring tasks must be manually triggered each time +- No awareness of optimal execution windows (low-load periods) +- Cannot schedule tasks based on predicted user needs + +## What to Implement + +### 1. Smart Task Scheduler +- **Scheduling modes**: + - **Cron-based**: Traditional cron expressions for fixed schedules + - **Pattern-based**: Schedule based on detected time patterns from v2-11 + - **Predictive**: Automatically schedule tasks the user is predicted to request (v2-04) + - **Adaptive**: Shift non-urgent tasks to low-load windows +- **Storage**: `scheduled_tasks (id, name, description, schedule_type, schedule_config JSON, agent_id, next_run, last_run, status, created_at)` + +### 2. Workload Optimization +- **Load analysis**: Track agent utilization over time (requests per minute, queue depth) +- **Off-peak detection**: Identify periods with low activity +- **Task shifting**: Non-urgent scheduled tasks automatically move to off-peak windows +- **Priority levels**: Critical (run at exact time), Normal (±1 hour flexibility), Low (any off-peak) + +### 3. 
Recurring Task Templates +- **Pre-built templates**: + - Daily status summary + - Weekly activity report + - Database cleanup and optimization + - Health check and monitoring + - Log rotation and archival +- **Custom templates**: Users define their own recurring tasks with parameters +- **Template parameters**: Variable substitution (dates, counts, dynamic values) + +### 4. Scheduling API +- `GET /api/schedule` — list all scheduled tasks with next run times +- `POST /api/schedule` — create a new scheduled task +- `PUT /api/schedule/:id` — update schedule +- `DELETE /api/schedule/:id` — remove scheduled task +- `POST /api/schedule/:id/run-now` — trigger immediate execution +- `GET /api/schedule/:id/history` — execution history for a scheduled task +- `GET /api/schedule/calendar?month=2026-03` — calendar view of upcoming tasks + +### 5. Scheduling UI +- **Location**: New "Schedule" tab on Tasks page or standalone page +- **Features**: + - Calendar view showing scheduled task distribution + - Task scheduling wizard with template selection + - Cron expression builder (visual, no manual cron syntax needed) + - Workload heatmap showing busy vs. free periods + - Drag-and-drop rescheduling on calendar + - Execution history with success/failure indicators + +### Backend Architecture +- `src/services/scheduler.ts` — scheduling engine with cron and adaptive modes +- `src/services/workload.ts` — load analysis and off-peak detection +- `src/webui/routes/schedule.ts` — API endpoints + +### Implementation Steps + +1. Design scheduled_tasks and schedule_history tables +2. Implement cron-based scheduler using existing timer infrastructure +3. Implement workload analysis and off-peak detection +4. Build adaptive scheduling that shifts tasks to low-load windows +5. Create recurring task templates +6. Integrate with prediction engine (v2-04) for predictive scheduling +7. Create scheduling API endpoints +8. Build calendar UI and scheduling wizard +9. 
Add schedule monitoring and alerting + +### Files to Modify +- `src/services/` — new scheduler and workload services +- `src/agent/runtime.ts` — integrate scheduled task execution +- `src/webui/routes/` — add schedule endpoints +- `web/src/pages/Tasks.tsx` — add schedule tab +- `web/src/components/` — calendar, cron builder, schedule wizard +- `config.example.yaml` — add scheduling config + +### Notes +- **High complexity** — adaptive scheduling and workload optimization require careful tuning +- Start with simple cron-based scheduling, then add adaptive/predictive features +- Cron expression builder UI avoids the need for users to learn cron syntax +- Be conservative with predictive scheduling — wrong predictions waste resources +- Scheduled tasks should respect the same security and audit trail as manual tasks +- Depends on v2-04 (Prediction Engine) and v2-11 (Temporal Context) for advanced features diff --git a/improvements/v2-13-zero-trust-execution.md b/improvements/v2-13-zero-trust-execution.md new file mode 100644 index 00000000..6ecb3301 --- /dev/null +++ b/improvements/v2-13-zero-trust-execution.md @@ -0,0 +1,115 @@ +# Zero-Trust Execution Layer + +## Current State + +The Security Center (`Security.tsx`, `src/services/security.ts`) provides audit logging, rate limits, and an IP allowlist. The tool execution layer (`src/agent/tools/exec/module.ts`) has an `allowlist` scope system mapping to permission levels. However, there is no validation chain that verifies each action before execution, and the allowlist was recently found to have a privilege escalation bug (fixed in PR #86). 
+ +## Problem + +- Tool execution trusts the agent's decisions without independent validation +- No pre-execution safety checks for potentially dangerous actions +- Sensitive operations (file writes, API calls, database changes) lack approval gates +- No sandboxing for untrusted or experimental tool executions +- Privilege escalation risks when adding new tools or modifying permissions +- No formal policy engine for defining what actions are allowed under what conditions + +## What to Implement + +### 1. Action Validation Pipeline +- **Pre-execution validation**: Every tool call passes through a validation chain before execution +- **Validation steps**: + 1. **Permission check**: Does this agent have permission to use this tool? + 2. **Parameter validation**: Are the parameters within allowed ranges? + 3. **Rate check**: Has this tool been called too frequently? + 4. **Risk assessment**: Is this a high-risk action requiring additional approval? + 5. **Policy evaluation**: Does this action comply with defined policies? +- **Validation result**: `allow | deny | require_approval` + +### 2. Policy Engine +- **Policy definition format** (YAML): + ```yaml + policies: + - name: "no-destructive-file-ops" + match: + tool: "exec" + params: + command: { pattern: "rm -rf|dd if=|mkfs" } + action: deny + reason: "Destructive file operations are blocked" + - name: "api-calls-require-approval" + match: + tool: "http_request" + params: + method: { in: ["POST", "PUT", "DELETE"] } + action: require_approval + ``` +- **Storage**: `security_policies (id, name, match JSON, action, reason, enabled, priority, created_at)` +- **Evaluation**: Policies evaluated in priority order; first match wins + +### 3. 
Approval Gates +- **For `require_approval` actions**: + - Push notification to admin via Telegram + - WebUI approval queue with accept/reject buttons + - Configurable auto-approve timeout (default: never auto-approve) +- **Approval log**: Every approval/rejection stored in audit trail +- **Delegation**: Specific users can be designated as approvers for specific action types + +### 4. Execution Sandboxing +- **Sandbox modes**: + - `unrestricted` — full access (current behavior, for trusted tools) + - `sandboxed` — limited filesystem access, no network, resource limits + - `dry-run` — execute but discard results (for testing) +- **Implementation**: Use Node.js `vm` module or subprocess with restricted permissions +- **Per-tool configuration**: Each tool can specify its required sandbox level + +### 5. Security Policy UI +- **Location**: Enhance existing Security Center page +- **Features**: + - Policy editor with syntax highlighting for YAML + - Policy testing: "What would happen if tool X is called with params Y?" + - Approval queue with real-time notifications + - Validation log: recent allow/deny decisions with reasons + - Policy templates for common security scenarios + +### 6. 
Zero-Trust API +- `GET /api/security/policies` — list all policies +- `POST /api/security/policies` — create policy +- `PUT /api/security/policies/:id` — update policy +- `POST /api/security/policies/evaluate` — test a hypothetical action against policies +- `GET /api/security/approvals` — pending approval queue +- `POST /api/security/approvals/:id/approve` — approve an action +- `POST /api/security/approvals/:id/reject` — reject an action +- `GET /api/security/validation-log` — recent validation decisions + +### Backend Architecture +- `src/services/policy-engine.ts` — policy evaluation engine +- `src/services/approval-gate.ts` — approval queue and notification +- `src/services/sandbox.ts` — execution sandboxing +- `src/agent/tools/validation.ts` — pre-execution validation pipeline + +### Implementation Steps + +1. Design policy schema and validation pipeline architecture +2. Implement policy engine with pattern matching and priority ordering +3. Create pre-execution validation middleware for tool calls +4. Implement approval gate with notification dispatch +5. Build execution sandbox using subprocess isolation +6. Create security policy API endpoints +7. Build policy editor and approval queue UI +8. Add policy templates for common scenarios +9. 
Integrate validation pipeline into `src/agent/tools/` execution path + +### Files to Modify +- `src/services/` — new policy engine, approval gate, sandbox services +- `src/agent/tools/` — add validation middleware before tool execution +- `src/webui/routes/` — add security policy endpoints +- `web/src/pages/Security.tsx` — add policy editor and approval queue +- `config.example.yaml` — add security policy config + +### Notes +- **High complexity** — policy engine and sandboxing are architecturally significant +- Start with the validation pipeline and basic allow/deny policies before adding approval gates +- Sandboxing in Node.js has limitations — `vm` module is not a security boundary; consider subprocess isolation +- Policy evaluation must be fast — it runs on every tool call +- Default policy should be permissive (allow) to avoid breaking existing functionality +- This is a foundation for enterprise-grade security requirements diff --git a/improvements/v2-14-audit-trail.md b/improvements/v2-14-audit-trail.md new file mode 100644 index 00000000..ff886336 --- /dev/null +++ b/improvements/v2-14-audit-trail.md @@ -0,0 +1,103 @@ +# Audit Trail System + +## Current State + +The Security Center (`src/services/audit.ts`) records admin mutations (configuration changes, security settings updates). The analytics service tracks request metrics. However, there is no comprehensive audit trail that captures all agent decisions, tool invocations, and their outcomes in a tamper-evident, queryable format suitable for compliance and forensic analysis. 
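To make "tamper-evident" concrete, here is a minimal sketch of hash chaining, assuming a simplified in-memory event shape. The field names, the `GENESIS` starting value, and the helpers `appendEvent` and `findBrokenLink` are assumptions for illustration, not the project's schema or API. Each checksum covers the previous checksum plus the event data, so changing any stored event invalidates the chain from that point on.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape for illustration; field names are assumptions.
interface AuditEvent {
  eventType: string;
  payload: string; // serialized JSON
  checksum: string; // SHA-256 over the previous checksum and this event's data
}

// Assumed starting value used as the "previous checksum" of the first event.
const GENESIS = "0".repeat(64);

// JSON-encoding the tuple avoids ambiguity between field boundaries.
function computeChecksum(previous: string, eventType: string, payload: string): string {
  return createHash("sha256")
    .update(JSON.stringify([previous, eventType, payload]))
    .digest("hex");
}

// Returns a new chain with the event appended and linked to its predecessor.
function appendEvent(chain: AuditEvent[], eventType: string, payload: string): AuditEvent[] {
  const previous = chain.length > 0 ? chain[chain.length - 1].checksum : GENESIS;
  const checksum = computeChecksum(previous, eventType, payload);
  return [...chain, { eventType, payload, checksum }];
}

// Returns the index of the first event whose checksum does not match,
// or -1 if the whole chain verifies.
function findBrokenLink(chain: AuditEvent[]): number {
  let previous = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const event = chain[i];
    if (computeChecksum(previous, event.eventType, event.payload) !== event.checksum) {
      return i;
    }
    previous = event.checksum;
  }
  return -1;
}
```

A chain like this detects edits to stored events but not wholesale replacement of the log by someone who can recompute every checksum, which is presumably why an audit design would also want signed exports or an externally stored anchor.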
+ +## Problem + +- Audit logging only covers admin mutations, not agent decisions +- No record of why the agent chose a specific tool or approach +- Cannot reconstruct the agent's decision chain for a given session +- No tamper-evidence or integrity verification for audit records +- Audit data is not structured for compliance reporting +- No export format for external audit systems (SIEM, compliance tools) + +## What to Implement + +### 1. Comprehensive Event Capture +- **Events to audit**: + - `agent.decision` — agent chose to use tool X (with reasoning) + - `tool.invoke` — tool called with specific parameters + - `tool.result` — tool returned result (success/failure, duration) + - `llm.request` — LLM API call (model, tokens, cost) + - `llm.response` — LLM response (truncated content, token counts) + - `config.change` — any configuration modification + - `security.validation` — policy evaluation result (from v2-13) + - `user.action` — user-initiated actions via WebUI + - `session.lifecycle` — session start, end, timeout +- **Event schema**: `audit_events (id, event_type, actor, session_id, payload JSON, parent_event_id, checksum, created_at)` + +### 2. Decision Chain Tracking +- **Causal linking**: Each event references its parent event (`parent_event_id`) +- **Chain reconstruction**: Given an outcome, trace back through the entire decision chain +- **Visualization**: Render decision trees showing agent reasoning flow +- **Example chain**: user_message → agent.decision(use_search) → tool.invoke(web_search) → tool.result → agent.decision(summarize) → llm.request → llm.response → user_response + +### 3. 
Integrity Verification +- **Hash chaining**: Each event includes a SHA-256 checksum of `(previous_checksum + event_data)` +- **Verification endpoint**: `POST /api/audit/verify` — verify integrity of audit chain for a time range +- **Tamper detection**: Any modification to historical events breaks the hash chain +- **Export signing**: Exported audit data includes digital signatures + +### 4. Compliance Reporting +- **Pre-built reports**: + - Daily activity summary + - Security events report (access, policy violations) + - Cost and resource usage report + - Tool usage and performance report +- **Export formats**: JSON, CSV, PDF +- **Filtering**: By event type, time range, session, actor, severity +- **Retention policy**: Configurable retention period (default: 90 days, compliance mode: 7 years) + +### 5. Audit UI +- **Location**: Enhance Security Center with dedicated "Audit Trail" tab +- **Features**: + - Searchable, filterable event timeline + - Decision chain visualization (tree view) + - Integrity status indicator (verified / chain broken) + - Report generator with export options + - Real-time event stream (via WebSocket) + - Compliance dashboard with retention status + +### 6. Audit API +- `GET /api/audit/events?type=...&from=...&to=...&session=...` — query events +- `GET /api/audit/chain/:event_id` — get full decision chain for an event +- `POST /api/audit/verify?from=...&to=...` — verify integrity +- `GET /api/audit/reports/:type?period=...&format=json` — generate report +- `POST /api/audit/export` — export audit data with signing + +### Backend Architecture +- `src/services/audit-trail.ts` — comprehensive event capture and storage +- `src/services/audit-integrity.ts` — hash chaining and verification +- `src/services/audit-reports.ts` — report generation and export +- `src/webui/routes/audit.ts` — API endpoints + +### Implementation Steps + +1. Design `audit_events` table with hash chaining support +2. 
Implement event capture middleware for all agent operations +3. Add hash chaining for integrity verification +4. Hook event capture into agent runtime, tool execution, and LLM calls +5. Implement decision chain reconstruction +6. Build compliance report generator +7. Create audit export with signing +8. Build audit UI with timeline, chain viewer, and reports +9. Add retention policy management + +### Files to Modify +- `src/services/` — new audit trail, integrity, and reporting services +- `src/services/audit.ts` — extend existing audit with comprehensive events +- `src/agent/runtime.ts` — add audit hooks for decisions and LLM calls +- `src/agent/tools/` — add audit hooks for tool invocations +- `src/webui/routes/` — add audit endpoints +- `web/src/pages/Security.tsx` — add Audit Trail tab +- `config.example.yaml` — add audit retention and export config + +### Notes +- **High complexity** — comprehensive event capture touches every part of the system +- Hash chaining adds minimal overhead but provides strong tamper evidence +- Audit data grows fast — implement log rotation and archival from the start +- Truncate large payloads (LLM responses, tool results) to keep storage manageable +- Consider shipping events to external log aggregation (ELK, Splunk) via webhook +- This feature complements v2-13 (Zero-Trust) by providing the audit evidence for policy decisions diff --git a/improvements/v2-15-unified-integration-layer.md b/improvements/v2-15-unified-integration-layer.md new file mode 100644 index 00000000..04b1ccc7 --- /dev/null +++ b/improvements/v2-15-unified-integration-layer.md @@ -0,0 +1,110 @@ +# Unified Integration Layer + +## Current State + +The agent integrates with external services through individual tool implementations and the MCP (Model Context Protocol) server system. The `Mcp.tsx` page manages MCP servers, and `21-api-webhooks.md` describes API and webhook management. 
However, each integration is implemented independently with no shared abstraction for authentication, error handling, rate limiting, or configuration. + +## Problem + +- Each external integration is built from scratch with its own patterns +- No shared authentication management (OAuth, API keys, tokens) +- No unified error handling across integrations +- Rate limiting is per-integration, not coordinated +- Adding a new integration requires significant boilerplate +- No integration health monitoring or status dashboard +- Cannot compose integrations (e.g., "when Slack message → create Jira ticket") + +## What to Implement + +### 1. Integration Abstraction Layer +- **Base interface**: All integrations implement a common interface + ```typescript + interface Integration { + id: string; + name: string; + type: 'api' | 'webhook' | 'oauth' | 'mcp'; + auth: AuthConfig; + healthCheck(): Promise<unknown>; + execute(action: string, params: Record<string, unknown>): Promise<unknown>; + } + ``` +- **Built-in integrations**: Telegram (existing), Slack, GitHub, Jira, Notion, Google Workspace, email (SMTP) +- **Custom integrations**: Users define their own via HTTP endpoint configuration + +### 2. Authentication Management +- **Supported auth types**: API key, OAuth 2.0, JWT, Basic Auth, custom header +- **Credential storage**: Encrypted in SQLite (extend existing security service) +- **OAuth flow**: Built-in OAuth 2.0 authorization code flow with token refresh +- **Credential rotation**: Auto-refresh expiring tokens, notify on rotation failure +- **Storage**: `integration_credentials (id, integration_id, auth_type, credentials_encrypted, expires_at, created_at)` + +### 3. Unified Rate Limiting +- **Per-integration limits**: Configurable requests per minute/hour +- **Global limit**: Total outbound requests across all integrations +- **Backpressure**: Queue requests when approaching limits, reject when exceeded +- **Rate limit sharing**: Coordinate limits across agent instances (for multi-agent, v2-07) + +### 4. 
Integration Registry +- **Storage**: `integrations (id, name, type, config JSON, auth_id FK, status, health_check_url, created_at)` +- **Discovery**: Pre-built integration catalog with setup wizards +- **Health monitoring**: Periodic health checks with status tracking + +### 5. Integration Management UI +- **Location**: New "Integrations" page or enhance existing MCP page +- **Features**: + - Integration catalog with one-click setup + - OAuth authorization flow (redirect and callback) + - Per-integration health status and metrics + - Credential management (rotate, revoke) + - Integration testing panel ("send test request") + - Usage statistics per integration + +### 6. Integration API +- `GET /api/integrations` — list all configured integrations with status +- `POST /api/integrations` — add new integration +- `PUT /api/integrations/:id` — update integration config +- `DELETE /api/integrations/:id` — remove integration +- `GET /api/integrations/:id/health` — check health +- `POST /api/integrations/:id/test` — send test request +- `POST /api/integrations/:id/execute` — execute an integration action +- `GET /api/integrations/catalog` — list available integration types + +### Backend Architecture +- `src/services/integrations/base.ts` — abstract integration interface +- `src/services/integrations/registry.ts` — integration CRUD and discovery +- `src/services/integrations/auth.ts` — authentication management and OAuth +- `src/services/integrations/rate-limiter.ts` — unified rate limiting +- `src/services/integrations/providers/` — individual integration implementations + +### Implementation Steps + +1. Design integration abstraction interface +2. Implement integration registry with CRUD +3. Build authentication management with encrypted credential storage +4. Implement unified rate limiter +5. Create built-in integration providers (Slack, GitHub, etc.) +6. Add health monitoring with periodic checks +7. Build integration management API endpoints +8. 
Create integration catalog UI with setup wizards +9. Add OAuth 2.0 flow support + +### Files to Modify +- `src/services/integrations/` — new directory for integration layer +- `src/services/security.ts` — extend with credential encryption +- `src/webui/routes/` — add integration endpoints +- `web/src/pages/` — new `Integrations.tsx` page +- `web/src/App.tsx` — add integrations route +- `config.example.yaml` — add integration config section + +### Relationship to Existing Work +- Extends concepts from `21-api-webhooks.md` +- Subsumes MCP server management into a broader integration framework +- Provides infrastructure for v2-16 (Webhooks & Event Bus) + +### Notes +- **High complexity** — OAuth flows and credential management require careful implementation +- Start with API key-based integrations before adding OAuth +- Credential encryption must be robust — use `crypto.createCipheriv` with proper key management +- Each integration provider can be added incrementally +- Integration health checks should be non-blocking and have short timeouts +- Consider a plugin architecture so community can contribute integration providers diff --git a/improvements/v2-16-webhooks-event-bus.md b/improvements/v2-16-webhooks-event-bus.md new file mode 100644 index 00000000..884700ea --- /dev/null +++ b/improvements/v2-16-webhooks-event-bus.md @@ -0,0 +1,108 @@ +# Webhooks & Event Bus + +## Current State + +The existing `21-api-webhooks.md` template describes API and webhook management at a UI level. The agent currently has no internal event bus — components communicate through direct function calls. Hooks (`src/agent/hooks/`) provide a rule-based system for intercepting agent events, but there is no pub/sub mechanism for decoupled event handling or external webhook dispatch. 
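The pub/sub mechanism that the paragraph above says is missing could look roughly like the following in-process sketch. The `EventBus` class, its method names, and the `BusEvent` shape are assumptions for illustration and do not exist in the repository. For brevity the sketch invokes handlers synchronously and only isolates their failures; a real bus would probably defer delivery so that publishers are never blocked by slow subscribers.

```typescript
// Hypothetical event shape; field names are assumptions for this sketch.
interface BusEvent {
  type: string;
  payload: unknown;
  timestamp: Date;
  source: string;
  correlationId: string;
}

type Handler = (event: BusEvent) => void | Promise<void>;
type ErrorSink = (error: unknown, event: BusEvent) => void;

class EventBus {
  private handlers = new Map<string, Set<Handler>>();
  private onError: ErrorSink;

  constructor(onError: ErrorSink) {
    this.onError = onError;
  }

  // Registers a handler for one event type and returns an unsubscribe function.
  subscribe(type: string, handler: Handler): () => void {
    const set = this.handlers.get(type) ?? new Set<Handler>();
    set.add(handler);
    this.handlers.set(type, set);
    return () => {
      set.delete(handler);
    };
  }

  // Notifies every subscriber of the event's type and returns how many were
  // notified. A handler that throws or rejects is reported to onError and
  // does not prevent the remaining handlers from running.
  publish(event: BusEvent): number {
    const set = this.handlers.get(event.type);
    if (!set) return 0;
    let notified = 0;
    for (const handler of Array.from(set)) {
      notified += 1;
      try {
        Promise.resolve(handler(event)).catch((err) => this.onError(err, event));
      } catch (err) {
        this.onError(err, event);
      }
    }
    return notified;
  }
}
```

Routing failures to a single `onError` sink keeps publishers decoupled from subscriber bugs, and the same sink would be a natural place to feed a dead-letter queue.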
+ +## Problem + +- No internal event system for decoupled communication between components +- Hooks are tightly coupled to the agent runtime +- Cannot trigger external actions based on internal events +- No webhook delivery system for notifying external services +- Cannot react to external events without polling +- No event replay or dead-letter queue for failed deliveries + +## What to Implement + +### 1. Internal Event Bus +- **Architecture**: In-process pub/sub event bus +- **Event types**: Typed events for all significant system actions + - `agent.message.received`, `agent.message.sent` + - `tool.executed`, `tool.failed` + - `session.started`, `session.ended` + - `config.changed` + - `security.alert` + - `schedule.triggered` + - `anomaly.detected` +- **Subscribers**: Any service can subscribe to event types +- **Async delivery**: Events dispatched asynchronously to avoid blocking +- **Event schema**: `{ type: string, payload: any, timestamp: Date, source: string, correlationId: string }` + +### 2. Webhook Delivery System +- **Webhook registration**: Configure URL + event types to subscribe to +- **Delivery guarantees**: At-least-once delivery with retry +- **Retry policy**: Exponential backoff (1s, 5s, 30s, 5min) with configurable max retries (default: 5) +- **Payload signing**: HMAC-SHA256 signature in `X-Webhook-Signature` header +- **Dead-letter queue**: Failed deliveries after max retries stored for manual review +- **Storage**: `webhooks (id, url, events JSON, secret, active, created_at)` and `webhook_deliveries (id, webhook_id, event_type, payload, status, attempts, last_attempt, created_at)` + +### 3. Incoming Webhooks +- **Receiver**: `POST /api/webhooks/incoming/:id` — receive events from external services +- **Verification**: Validate incoming signatures (per-webhook secret) +- **Mapping**: Map incoming payloads to internal events +- **Use cases**: GitHub push events → trigger build task, Slack message → forward to agent + +### 4. 
Event Replay & Monitoring +- **Event log**: Store recent events for debugging and replay +- **Replay**: `POST /api/events/:id/replay` — re-dispatch a past event +- **Monitoring**: Real-time event stream via WebSocket for the UI + +### 5. Event Bus API +- `GET /api/webhooks` — list registered webhooks +- `POST /api/webhooks` — register a new webhook +- `PUT /api/webhooks/:id` — update webhook configuration +- `DELETE /api/webhooks/:id` — remove webhook +- `POST /api/webhooks/:id/test` — send test event +- `GET /api/webhooks/:id/deliveries` — delivery history with status +- `POST /api/webhooks/:id/deliveries/:did/retry` — retry a failed delivery +- `GET /api/events?type=...&from=...&to=...` — query event log +- `GET /api/events/stream` — WebSocket real-time event stream + +### 6. Event Bus UI +- **Location**: Enhance existing or new "Events" section +- **Features**: + - Webhook management (register, edit, test, delete) + - Delivery log with status indicators (delivered, retrying, failed) + - Real-time event stream viewer + - Dead-letter queue with retry actions + - Event type catalog with payload documentation + +### Backend Architecture +- `src/services/event-bus.ts` — internal pub/sub event bus +- `src/services/webhook-dispatcher.ts` — outgoing webhook delivery with retry +- `src/services/webhook-receiver.ts` — incoming webhook handler +- `src/webui/routes/events.ts` — API endpoints + +### Implementation Steps + +1. Implement internal event bus with typed events and async delivery +2. Instrument existing services to emit events (runtime, tools, sessions) +3. Build webhook dispatcher with retry and signing +4. Implement incoming webhook receiver with verification +5. Create dead-letter queue and event log storage +6. Add event replay support +7. Build WebSocket-based real-time event stream +8. Create webhook management API endpoints +9. 
Build event bus UI with delivery monitoring + +### Files to Modify +- `src/services/` — new event bus, webhook dispatcher, and receiver services +- `src/agent/runtime.ts` — emit events for agent actions +- `src/agent/tools/` — emit events for tool execution +- `src/webui/routes/` — add event and webhook endpoints +- `web/src/pages/` — new Events page or enhance existing pages +- `config.example.yaml` — add webhook and event bus config + +### Relationship to Existing Work +- Extends `21-api-webhooks.md` with full event-driven architecture +- Complements existing hooks system — hooks are agent-level, event bus is system-level +- Provides infrastructure for v2-15 (Integration Layer) to react to events + +### Notes +- **High complexity** — reliable webhook delivery with retry is non-trivial +- Start with the internal event bus first, add webhook delivery second +- Webhook secrets must be stored encrypted (use existing security service) +- Event log storage needs rotation — don't store events forever +- Consider using Server-Sent Events (SSE) as a simpler alternative to WebSocket for the stream +- Rate-limit outgoing webhooks to prevent accidental DDoS of external services diff --git a/improvements/v2-17-dynamic-dashboard.md b/improvements/v2-17-dynamic-dashboard.md new file mode 100644 index 00000000..255162f0 --- /dev/null +++ b/improvements/v2-17-dynamic-dashboard.md @@ -0,0 +1,100 @@ +# Dynamic Dashboard Engine + +## Current State + +The Dashboard (`Dashboard.tsx`) has customizable widgets (PR #36) with drag-and-drop layout powered by `react-grid-layout`. Widgets include stat cards, charts, quick actions, notifications, health check, and live logs. The widget set is fixed at compile time — adding a new widget requires code changes, a build, and deployment. 
+ +## Problem + +- Widget catalog is static — new widgets require code changes +- Dashboard layout is the same for all use cases +- Cannot create task-specific or role-specific dashboards +- No ability to generate widgets from agent data dynamically +- Cannot share dashboard configurations between users +- Widgets cannot display arbitrary agent-generated content + +## What to Implement + +### 1. Widget Plugin System +- **Widget definition schema**: + ```typescript + interface WidgetDefinition { + id: string; + name: string; + description: string; + category: 'metrics' | 'status' | 'content' | 'action' | 'custom'; + dataSource: { type: 'api' | 'websocket' | 'static'; endpoint?: string; refreshInterval?: number }; + renderer: 'chart' | 'table' | 'text' | 'markdown' | 'custom'; + defaultSize: { w: number; h: number }; + configSchema: JSONSchema; // user-configurable parameters + } + ``` +- **Dynamic loading**: Widgets defined as data, not hard-coded components +- **Built-in renderers**: Chart (Recharts), Table, Text/Markdown, KPI card, List + +### 2. Dashboard Profiles +- **Multiple dashboards**: Users can create multiple named dashboards +- **Storage**: `dashboards (id, name, description, widgets JSON, layout JSON, is_default, created_at)` +- **Switching**: Dashboard selector in the top bar +- **Templates**: Pre-built dashboard profiles (Operations, Development, Security, Analytics) +- **Sharing**: Export/import dashboard configurations as JSON + +### 3. AI-Generated Widgets +- **Agent-driven**: The agent can create widgets dynamically based on conversation +- **Example**: "Show me a chart of today's errors" → agent creates a widget with error data +- **Temporary widgets**: Session-scoped widgets that disappear when the session ends +- **Pinning**: User can pin a temporary widget to make it permanent + +### 4. 
Dashboard API +- `GET /api/dashboards` — list all dashboard profiles +- `POST /api/dashboards` — create a new dashboard +- `PUT /api/dashboards/:id` — update dashboard (layout, widgets) +- `DELETE /api/dashboards/:id` — remove dashboard +- `GET /api/dashboards/:id/widgets` — list widgets in a dashboard +- `POST /api/dashboards/:id/widgets` — add widget to dashboard +- `PUT /api/dashboards/:id/widgets/:wid` — update widget config +- `DELETE /api/dashboards/:id/widgets/:wid` — remove widget +- `POST /api/dashboards/:id/export` — export dashboard config +- `POST /api/dashboards/import` — import dashboard config + +### 5. Dashboard Builder UI +- **Location**: Enhanced Dashboard page with "Edit" mode +- **Features**: + - Widget marketplace/catalog browser + - Dashboard profile switcher and manager + - Widget configuration panel (data source, refresh rate, appearance) + - Dashboard templates gallery + - Export/import buttons + - "Create widget from conversation" action + +### Backend Architecture +- `src/services/dashboard.ts` — dashboard CRUD and widget management +- `src/services/widget-registry.ts` — widget type registry and validation +- `src/webui/routes/dashboards.ts` — API endpoints + +### Implementation Steps + +1. Design dashboard and widget schemas +2. Refactor existing widget system to use data-driven definitions +3. Implement built-in widget renderers (chart, table, text, KPI) +4. Create dashboard profile management +5. Build widget plugin system with dynamic loading +6. Implement dashboard templates +7. Add AI-generated widget support +8. Create dashboard management API endpoints +9. 
Build dashboard builder UI with catalog and profiles + +### Files to Modify +- `src/services/` — new dashboard and widget registry services +- `src/webui/routes/` — add dashboard endpoints +- `web/src/pages/Dashboard.tsx` — refactor to support dynamic widgets and profiles +- `web/src/components/widgets/` — refactor to data-driven rendering +- `web/src/components/` — dashboard builder, widget catalog components + +### Notes +- **High complexity** — refactoring the existing widget system while maintaining backward compatibility +- Start by extracting existing widgets into the plugin format, then add new capabilities +- AI-generated widgets need careful sandboxing — user data should not be exposed in untrusted widgets +- Dashboard export/import reuses patterns from the existing config export (PR #82) +- Consider a max widget count per dashboard (e.g., 20) for performance +- Widget data sources must respect the same security policies as regular API endpoints diff --git a/improvements/v2-18-ai-widget-generator.md b/improvements/v2-18-ai-widget-generator.md new file mode 100644 index 00000000..00f52e81 --- /dev/null +++ b/improvements/v2-18-ai-widget-generator.md @@ -0,0 +1,96 @@ +# AI Widget Generator + +## Current State + +Dashboard widgets are pre-defined components built at development time. The existing widget registry (`web/src/components/widgets/`) contains fixed implementations for stats, charts, logs, and actions. Users cannot create custom visualizations without modifying the source code. + +## Problem + +- Creating new widgets requires developer effort +- Users cannot visualize arbitrary data from agent interactions +- No way to quickly prototype a visualization for ad-hoc analysis +- Custom reporting needs are unmet without code changes +- Agent-generated insights have no dedicated display mechanism + +## What to Implement + +### 1. 
Natural Language Widget Creation +- **User flow**: "Create a widget showing tool usage by hour" → AI generates widget definition +- **AI pipeline**: + 1. Parse natural language request + 2. Identify data source (which API endpoint, which metrics) + 3. Choose appropriate visualization (chart type, layout) + 4. Generate widget configuration + 5. Preview and refine +- **Supported outputs**: Line chart, bar chart, pie chart, table, KPI card, list, markdown text + +### 2. Widget Configuration Generator +- **Input**: Natural language description + optional data sample +- **Output**: Complete widget definition (data source, renderer, styling, refresh interval) +- **Templates**: Pre-built generation templates for common patterns: + - "Show me X over time" → line chart + - "Compare X across categories" → bar chart + - "What percentage of X is Y" → pie chart + - "List recent X" → table with sorting + - "Current value of X" → KPI card + +### 3. Data Source Auto-Detection +- **API catalog**: The generator knows all available API endpoints and their response schemas +- **Schema matching**: Match user's data request to the best API endpoint +- **Data transformation**: Generate data mapping functions (JSON path, aggregations, filters) +- **Fallback**: If no existing API matches, suggest creating a custom endpoint + +### 4. Interactive Refinement +- **Preview**: Show generated widget with real data before saving +- **Refinement prompts**: "Make it a bar chart instead", "Add the last 30 days", "Group by week" +- **Style adjustment**: "Use blue colors", "Make it larger", "Add a title" +- **Undo/redo**: Step back through refinement history + +### 5. 
Widget Generator UI +- **Location**: Accessible from Dashboard edit mode and via command palette (Cmd+K) +- **Features**: + - Natural language input field with autocomplete suggestions + - Live preview panel showing generated widget + - Refinement chat (conversational widget editing) + - Save to dashboard with one click + - Template gallery for common widget types + - Recently generated widgets list + +### 6. Widget Generator API +- `POST /api/widgets/generate` — generate widget from natural language description +- `POST /api/widgets/refine` — refine an existing generated widget +- `GET /api/widgets/templates` — list generation templates +- `GET /api/widgets/data-sources` — list available data sources with schemas +- `POST /api/widgets/preview` — preview widget with real data + +### Backend Architecture +- `src/services/widget-generator.ts` — AI-powered widget generation +- `src/services/data-source-catalog.ts` — API endpoint catalog and schema registry +- `src/webui/routes/widget-generator.ts` — API endpoints + +### Implementation Steps + +1. Build API endpoint catalog with response schemas +2. Implement widget generation prompt templates +3. Build natural language → widget definition pipeline +4. Create data source auto-detection and matching +5. Implement interactive refinement loop +6. Build live preview with real data fetching +7. Create widget generator API endpoints +8. Build generator UI with input, preview, and refinement +9. 
Integrate with dashboard profiles (v2-17) + +### Files to Modify +- `src/services/` — new widget generator and data source catalog +- `src/webui/routes/` — add generator endpoints +- `web/src/components/` — widget generator UI, preview panel, refinement chat +- `web/src/pages/Dashboard.tsx` — add "Generate Widget" action button + +### Notes +- **High complexity** — reliable natural language → visualization pipeline requires careful prompting +- Depends on v2-17 (Dynamic Dashboard) for the widget plugin system +- Widget generation uses LLM calls — adds cost per generation request +- Generated widgets should be validated against the widget schema before saving +- Consider caching generated definitions for similar requests +- Start with simple chart types; expand renderer support over time +- Privacy: generated widgets should only access data the user is authorized to see diff --git a/improvements/v2-19-feedback-learning.md b/improvements/v2-19-feedback-learning.md new file mode 100644 index 00000000..2a212788 --- /dev/null +++ b/improvements/v2-19-feedback-learning.md @@ -0,0 +1,114 @@ +# Feedback Learning System + +## Current State + +The agent has no mechanism to learn from user feedback. When a user corrects the agent, provides positive reinforcement, or expresses dissatisfaction, this information is not captured or used to improve future interactions. The self-correcting loop (v2-10) handles within-session corrections, but there is no cross-session learning. + +## Problem + +- Agent makes the same mistakes repeatedly across sessions +- Positive feedback is not reinforced — good patterns are not strengthened +- No mechanism to capture implicit feedback (user edits agent output, retries with different phrasing) +- Cannot adapt behavior based on accumulated user preferences +- No feedback data for prompt optimization or model fine-tuning + +## What to Implement + +### 1. 
Feedback Capture +- **Explicit feedback**: + - Thumbs up/down buttons on agent responses + - Text feedback field ("What could be better?") + - Rating scale (1-5) for response quality + - "This was helpful" / "This was not helpful" quick actions +- **Implicit feedback signals**: + - User rephrases the same question → previous response was unclear + - User immediately asks a follow-up correction → response had errors + - User accepts output without modification → response was good + - User copies agent output → high value response + - Response time before next message → processing/satisfaction indicator +- **Storage**: `feedback (id, session_id, message_id, type, rating, text, implicit_signals JSON, created_at)` + +### 2. Feedback Analysis Engine +- **Pattern extraction**: Identify recurring feedback themes + - "Agent is too verbose" → reduce response length + - "Code examples don't work" → improve code generation prompts + - "Wrong tool selection" → adjust tool selection heuristics +- **Sentiment tracking**: Aggregate satisfaction over time +- **Topic-feedback correlation**: Which topics get the most negative feedback? +- **Agent-feedback correlation**: If multi-agent (v2-07), which agent types perform best? + +### 3. Learning Application +- **Prompt adjustment**: Feed feedback patterns into system prompt modifications + - Negative patterns → add explicit instructions to avoid + - Positive patterns → reinforce in prompts +- **Preference model**: Build user preference profile + - Response length preference (concise vs. detailed) + - Code style preference (commented vs. clean) + - Interaction style (formal vs. casual) +- **Tool selection bias**: Adjust tool selection weights based on success feedback +- **Memory integration**: Store learned preferences in memory system (v2-01) + +### 4. Feedback Loop Metrics +- **Tracked metrics**: + - Overall satisfaction score (rolling average) + - Improvement trend (is the agent getting better?) 
+ - Feedback coverage (what % of responses get feedback?) + - Top improvement opportunities (most common negative themes) +- **Alerting**: Notify if satisfaction drops below threshold + +### 5. Feedback UI +- **Location**: Inline on every agent response + dedicated Feedback page +- **Inline features**: + - Thumbs up/down (minimal friction) + - Expandable text feedback field + - Quick-tag options (too long, too short, wrong, helpful) +- **Feedback dashboard**: + - Satisfaction trend chart + - Feedback theme word cloud or list + - Most improved / most problematic areas + - User preference profile summary + - Export feedback data for external analysis + +### 6. Feedback API +- `POST /api/feedback` — submit feedback for a response +- `GET /api/feedback?session=...&from=...&to=...` — query feedback history +- `GET /api/feedback/analytics` — feedback statistics and trends +- `GET /api/feedback/themes` — extracted feedback themes +- `GET /api/feedback/preferences` — current learned user preferences +- `PUT /api/feedback/preferences` — manually adjust preferences + +### Backend Architecture +- `src/services/feedback/capture.ts` — feedback collection and implicit signal detection +- `src/services/feedback/analyzer.ts` — pattern extraction and sentiment analysis +- `src/services/feedback/learner.ts` — preference model and prompt adjustment +- `src/webui/routes/feedback.ts` — API endpoints + +### Implementation Steps + +1. Design feedback storage schema +2. Implement explicit feedback capture (thumbs up/down, text, rating) +3. Implement implicit feedback signal detection +4. Build feedback analysis engine with pattern extraction +5. Create user preference model +6. Implement prompt adjustment based on feedback patterns +7. Integrate tool selection bias from feedback +8. Create feedback API endpoints +9. Build inline feedback UI and feedback dashboard +10. 
Add satisfaction alerting + +### Files to Modify +- `src/services/feedback/` — new directory for feedback system +- `src/agent/runtime.ts` — integrate feedback-based prompt adjustments +- `src/webui/routes/` — add feedback endpoints +- `web/src/components/` — inline feedback buttons, feedback dashboard +- `web/src/pages/` — new feedback analytics page or section +- `config.example.yaml` — add feedback system config + +### Notes +- **High complexity** — implicit feedback detection and preference modeling are nuanced +- Start with explicit feedback (thumbs up/down) — it's simple and immediately valuable +- Implicit feedback signals need careful calibration to avoid false positives +- Feedback-driven prompt changes should be conservative — small, incremental adjustments +- Store raw feedback for potential future model fine-tuning +- Privacy: feedback data may contain sensitive information; respect retention policies +- This feature synergizes with v2-10 (Self-Correcting Loop) and v2-20 (Adaptive Prompting) diff --git a/improvements/v2-20-adaptive-prompting.md b/improvements/v2-20-adaptive-prompting.md new file mode 100644 index 00000000..1387c342 --- /dev/null +++ b/improvements/v2-20-adaptive-prompting.md @@ -0,0 +1,114 @@ +# Adaptive Prompting Engine + +## Current State + +The Soul Editor allows manual editing of system prompts stored as markdown files. Prompts are static — once written, they don't change based on performance or user feedback. The existing template system (PR #42) provides starting points, but there is no optimization loop that improves prompts over time. + +## Problem + +- Prompts are static and manually maintained +- No data-driven optimization of prompt effectiveness +- A/B testing of different prompts requires manual setup +- Cannot adapt prompts to individual user communication styles +- Performance variations across prompt versions are not measured +- Soul file changes are trial-and-error with no metrics + +## What to Implement + +### 1. 
Prompt Variant Management +- **Variant system**: Multiple versions of each prompt section, tracked with metrics +- **Storage**: `prompt_variants (id, section, version, content, active, metrics JSON, created_at)` +- **Sections**: System prompt can be split into independently optimizable sections: + - Persona / role definition + - Instructions / guidelines + - Tool usage guidance + - Response format rules + - Safety guardrails +- **Activation**: One active variant per section, rest are candidates + +### 2. A/B Testing Framework +- **Experiment definition**: Compare variant A vs. B for a section over N interactions +- **Traffic splitting**: Configurable percentage split (e.g., 80/20 existing/new) +- **Metrics tracked per variant**: + - User satisfaction (from v2-19 feedback) + - Task success rate + - Response quality score (from v2-10 self-evaluation) + - Token usage efficiency + - Error rate +- **Statistical significance**: Minimum sample size before declaring a winner +- **Auto-promotion**: Winning variant automatically becomes the active version + +### 3. AI-Powered Prompt Optimization +- **Optimization pipeline**: + 1. Collect performance metrics for current prompt + 2. Analyze failure patterns and low-scoring responses + 3. Generate improved prompt variant using LLM meta-prompting + 4. Validate variant against test cases + 5. Deploy as A/B test candidate +- **Meta-prompting**: Use an LLM to analyze prompt weaknesses and suggest improvements +- **Guard rails**: Generated variants must pass safety validation before deployment + +### 4. 
Context-Adaptive Prompts +- **Dynamic sections**: Prompt sections that change based on context: + - User experience level → adjust explanation depth + - Conversation topic → activate domain-specific instructions + - Time of day → adjust formality level + - Feedback history → avoid known user pet peeves +- **Template variables**: `{user_preference_style}`, `{current_context}`, `{active_tools}` +- **Integration**: Pull context from memory (v2-01), feedback (v2-19), temporal engine (v2-11) + +### 5. Prompt Optimization UI +- **Location**: Enhance Soul Editor page +- **Features**: + - Variant manager: list, create, activate, deactivate variants + - A/B test dashboard: experiment status, metrics comparison, significance indicators + - Performance history per section: how each section has improved over time + - AI optimization panel: "Suggest improvement" button with preview and deploy + - Prompt diff viewer: compare variants side-by-side + - Test case manager: define inputs and expected outputs for validation + +### 6. 
Adaptive Prompting API +- `GET /api/prompts/sections` — list prompt sections with active variants +- `GET /api/prompts/sections/:section/variants` — list variants for a section +- `POST /api/prompts/sections/:section/variants` — create new variant +- `PUT /api/prompts/sections/:section/variants/:id/activate` — activate a variant +- `POST /api/prompts/experiments` — create A/B test experiment +- `GET /api/prompts/experiments/:id` — experiment status and metrics +- `POST /api/prompts/optimize` — trigger AI optimization for a section +- `GET /api/prompts/performance` — overall prompt performance metrics + +### Backend Architecture +- `src/services/prompts/variant-manager.ts` — variant CRUD and activation +- `src/services/prompts/ab-testing.ts` — experiment management and traffic splitting +- `src/services/prompts/optimizer.ts` — AI-powered prompt generation and validation +- `src/services/prompts/context-adapter.ts` — dynamic context injection +- `src/webui/routes/prompts.ts` — API endpoints + +### Implementation Steps + +1. Design prompt variant and experiment schemas +2. Implement variant manager with activation logic +3. Build A/B testing framework with traffic splitting +4. Integrate metric collection from feedback and self-evaluation systems +5. Implement statistical significance calculation +6. Build AI-powered prompt optimization pipeline +7. Implement context-adaptive prompt assembly +8. Create prompt optimization API endpoints +9. Build Soul Editor UI enhancements (variant manager, A/B dashboard) +10. 
Add auto-promotion logic for winning variants + +### Files to Modify +- `src/services/prompts/` — new directory for prompt optimization +- `src/soul/` — integrate variant system with soul file loading +- `src/agent/runtime.ts` — use adaptive prompt assembly +- `src/webui/routes/` — add prompt optimization endpoints +- `web/src/pages/Soul.tsx` — add variant manager and optimization UI +- `config.example.yaml` — add prompt optimization config + +### Notes +- **Very High complexity** — A/B testing infrastructure and AI optimization are substantial +- Start with manual variant management before adding AI optimization +- A/B testing requires sufficient traffic volume — may not be viable for low-usage deployments +- Safety guardrails are critical — AI-generated prompts must be reviewed before production use +- Prompt optimization is a feedback loop: feedback → analysis → variant → test → promote +- Depends on v2-19 (Feedback Learning) for quality metrics and v2-10 (Self-Correcting) for evaluation diff --git a/improvements/v2-21-multi-agent-network.md b/improvements/v2-21-multi-agent-network.md new file mode 100644 index 00000000..1ecbfc13 --- /dev/null +++ b/improvements/v2-21-multi-agent-network.md @@ -0,0 +1,131 @@ +# Multi-Agent Network Protocol + +## Current State + +The agent operates as an isolated instance. Even with the proposed multi-agent support (v2-07), agents within the same deployment would communicate through shared databases and in-process messaging. There is no protocol for agents running on different machines or in different deployments to discover each other, negotiate capabilities, and collaborate on tasks. 
+ +## Problem + +- Agents are isolated — no cross-instance communication +- Cannot distribute work across multiple deployments +- No agent discovery mechanism for distributed environments +- No standardized protocol for agent-to-agent messaging +- Cannot form agent teams across organizational boundaries +- No trust model for inter-agent communication + +## What to Implement + +### 1. Agent Discovery Protocol +- **Registry service**: Central or distributed agent registry +- **Agent advertisement**: Each agent publishes its capabilities, availability, and endpoint + ```json + { + "agentId": "agent-001", + "name": "ResearchBot", + "capabilities": ["web-search", "summarization", "translation"], + "endpoint": "https://agent-001.example.com/api/agent-network", + "status": "available", + "load": 0.3, + "publicKey": "..." + } + ``` +- **Discovery modes**: + - **Central registry**: All agents register with a known registry server + - **Peer-to-peer**: Agents discover each other via broadcast or known peer lists + - **DNS-based**: Agent endpoints published as DNS SRV records + +### 2. Inter-Agent Messaging Protocol +- **Message format** (JSON over HTTPS): + ```json + { + "type": "task_request | task_response | capability_query | heartbeat | negotiation", + "from": "agent-001", + "to": "agent-002", + "correlationId": "uuid", + "payload": { ... }, + "signature": "...", + "timestamp": "ISO-8601" + } + ``` +- **Message types**: + - `capability_query` — "Can you handle task type X?" + - `task_request` — "Please execute this task" + - `task_response` — "Here are the results" + - `heartbeat` — "I'm alive and available" + - `negotiation` — capability and terms negotiation +- **Transport**: HTTPS REST + optional WebSocket for streaming + +### 3.
Trust and Security +- **Authentication**: Mutual TLS or signed messages (Ed25519) +- **Authorization**: Capability-based — agents only accept tasks matching their published capabilities +- **Trust levels**: + - `trusted` — full access, share all results + - `verified` — authenticated but limited data sharing + - `untrusted` — minimal interaction, sandboxed execution +- **Allowlist/blocklist**: Configurable per-agent access control +- **Audit**: All inter-agent messages logged in audit trail (v2-14) + +### 4. Task Coordination +- **Distributed task delegation**: Orchestrator agent delegates subtasks to remote agents +- **Load balancing**: Route tasks to least-loaded capable agent +- **Failover**: If an agent goes offline, reassign its pending tasks +- **Result aggregation**: Collect and merge results from multiple remote agents +- **Timeout**: Per-task timeout with configurable escalation + +### 5. Network Management UI +- **Location**: New "Network" page in WebUI +- **Features**: + - Network topology visualization (connected agents graph) + - Agent status dashboard (online/offline, load, capabilities) + - Message flow monitor (real-time inter-agent traffic) + - Trust management (configure per-agent trust levels) + - Network health indicators (latency, error rates) + - Manual agent registration and removal + +### 6. 
Network API +- `GET /api/network/agents` — list known agents in the network +- `POST /api/network/agents` — register a remote agent +- `DELETE /api/network/agents/:id` — remove agent from network +- `GET /api/network/agents/:id/capabilities` — query agent capabilities +- `POST /api/network/agents/:id/tasks` — send task to remote agent +- `GET /api/network/status` — network health overview +- `PUT /api/network/agents/:id/trust` — set trust level for an agent +- `GET /api/network/messages?from=...&to=...` — message log + +### Backend Architecture +- `src/services/network/discovery.ts` — agent discovery and registration +- `src/services/network/messenger.ts` — inter-agent messaging with signing +- `src/services/network/trust.ts` — trust model and access control +- `src/services/network/coordinator.ts` — distributed task coordination +- `src/webui/routes/network.ts` — API endpoints + +### Implementation Steps + +1. Define agent network protocol specification (message formats, discovery, auth) +2. Implement agent discovery service with central registry mode +3. Build inter-agent messenger with message signing +4. Implement trust model with authentication and authorization +5. Build distributed task coordinator with load balancing +6. Implement failover and task reassignment +7. Create network management API endpoints +8. Build network topology UI with monitoring +9. Add peer-to-peer discovery mode +10. 
Write protocol documentation + +### Files to Modify +- `src/services/network/` — new directory for network protocol +- `src/agent/runtime.ts` — integrate with network for remote task handling +- `src/webui/routes/` — add network endpoints +- `web/src/pages/` — new `Network.tsx` page +- `web/src/App.tsx` — add network route +- `config.example.yaml` — add network config (registry URL, keys, trust defaults) + +### Notes +- **Very High complexity** — distributed systems with security is the most challenging feature in this epic +- This is an **advanced/optional** feature — implement only after the single-instance multi-agent system is stable +- Start with the central registry mode; add P2P later +- Security is critical — inter-agent communication over the internet must be encrypted and authenticated +- Consider using an existing protocol (e.g., ActivityPub, Matrix) as a foundation rather than building from scratch +- Network partition handling: agents must function independently when disconnected +- Rate limiting: prevent a rogue agent from flooding the network +- Depends on v2-07 (Agent Registry), v2-08 (Task Delegation), and v2-14 (Audit Trail) From f71e1059cd1859c6391edd1eac30e3d4b5cc3210 Mon Sep 17 00:00:00 2001 From: konard Date: Fri, 20 Mar 2026 00:37:46 +0000 Subject: [PATCH 3/3] Revert "Initial commit with task details" This reverts commit 1709a228ba5b1ae3b8cb48847d8bdc96274d0144. 
--- .gitkeep | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/.gitkeep b/.gitkeep index 75b77880..092d6af2 100644 --- a/.gitkeep +++ b/.gitkeep @@ -1,2 +1 @@ -# .gitkeep file auto-generated at 2026-03-18T18:52:42.191Z for PR creation at branch issue-43-e5f7241e2b5e for issue https://github.com/xlabtg/teleton-agent/issues/43 -# Updated: 2026-03-20T00:25:58.685Z \ No newline at end of file +# .gitkeep file auto-generated at 2026-03-18T18:52:42.191Z for PR creation at branch issue-43-e5f7241e2b5e for issue https://github.com/xlabtg/teleton-agent/issues/43 \ No newline at end of file