Conversation
Implements LLM-based intent classification and tool execution for the Responses API:
- Intent classifier: routes user messages to 'chat' or 'web_search'
- Query extractor: extracts clean search queries from natural language
- Tool execution framework with web_search (mock implementation)
- Stream-first architecture: tools persist through storage channels
- Synchronization: oneshot barrier for tool persistence before prompt rebuild
- Context builder: automatically includes tool_call and tool_output from DB
- SSE events: tool_call.created and tool_output.created for client streaming
- Security: sensitive data (queries, user content) only logged at TRACE level

New modules:
- prompts.rs: Intent classification and query extraction prompts
- tools.rs: Tool execution registry and web search (to be replaced with real API)

Phase flow:
1. Validate input
2. Build context and check billing
3. Persist user message
4. Create dual streams (client + storage)
5. Optional: Classify intent and execute tools
6. Setup completion processor (rebuilds prompt if tools executed)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
… tool persistence

Replace the PersistTools message variant (containing a non-cloneable oneshot sender) with immediate tool persistence and a dedicated oneshot acknowledgment channel.
- Remove the PersistTools variant from the StorageMessage enum
- Make StorageMessage fully cloneable (derive Clone instead of a manual impl)
- Storage now persists tool_call and tool_output immediately upon receipt
- Storage sends a oneshot acknowledgment after tool_output is persisted
- Handler waits for the acknowledgment via rx_tool_ack before continuing
- Simplifies the architecture: tools persist on arrival, no accumulation needed

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
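The ack handshake described in the commit can be sketched with std channels standing in for tokio::sync::oneshot (the channel names mirror the commit; the bodies are illustrative, not the actual handler code):

```rust
use std::sync::mpsc;
use std::thread;

// Storage-side: persist tool_call/tool_output, then acknowledge.
// Handler-side: block on the ack before rebuilding the prompt, so the
// rebuild is guaranteed to see the persisted tool rows.
fn persist_then_rebuild() -> &'static str {
    let (tx_tool_ack, rx_tool_ack) = mpsc::channel::<()>();

    let storage = thread::spawn(move || {
        // ... persist tool_call and tool_output to the database here ...
        tx_tool_ack.send(()).expect("handler dropped the ack receiver");
    });

    // Handler waits here until the tool output is durable.
    rx_tool_ack.recv().expect("storage task died before acking");
    storage.join().unwrap();
    "prompt rebuilt after ack"
}
```

In the real async code the same shape holds, with `tokio::sync::oneshot` and `.await` in place of the blocking `recv()`.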
Replaces the placeholder web search with the production-ready Kagi Search API. Adds kagi-api-rust as a git submodule and integrates secure API key management following existing patterns. Search results include direct answers, weather, infoboxes, web results, and news, with proper formatting for LLM consumption.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Implement the OpenAI Responses API pattern where the web_search tool must be explicitly enabled in the request's tools array. Classification and tool execution now only run when the client specifies {type: 'web_search'} in tools.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Removes the third-party submodule dependency and implements a lightweight ~200-line Kagi client with connection pooling and proper timeout handling.

Benefits:
- Full control over code running in the secure enclave
- No submodule complexity for Nix builds
- Connection pooling (100 idle connections) for high-volume usage
- Fast timeouts (10s request, 5s connect) for interactive workflows
- Client initialized once at startup and reused across all requests

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
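As a sketch, the pooling and timeout figures above could live in a small config struct that the client constructor feeds into its HTTP builder (the struct and field names here are hypothetical, not the actual client code):

```rust
use std::time::Duration;

/// Connection settings for the in-house Kagi client, using the values
/// from the commit above. In the real client these would map onto the
/// HTTP library's builder (request timeout, connect timeout, idle pool size).
struct KagiClientConfig {
    request_timeout: Duration,   // whole-request budget for interactive use
    connect_timeout: Duration,   // fail fast if Kagi is unreachable
    max_idle_connections: usize, // pooled keep-alive connections for high volume
}

impl Default for KagiClientConfig {
    fn default() -> Self {
        Self {
            request_timeout: Duration::from_secs(10),
            connect_timeout: Duration::from_secs(5),
            max_idle_connections: 100,
        }
    }
}
```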
…stamp ordering

Previously, the assistant placeholder was created in Phase 3 (early), causing it to have an earlier created_at timestamp than any tools executed in Phase 5. This resulted in incorrect message ordering when querying by created_at: the assistant would appear BEFORE its tools in the conversation.

Now the assistant placeholder is created in Phase 6, after tools complete, ensuring the correct semantic order: user → tool_call → tool_output → assistant.

The assistant message only needs to exist before the storage task UPDATEs it (when streaming completes), not during the entire lifecycle.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Move phases 4-6 (channels, tools, completion) inside the SSE stream and execute them in a background orchestrator task. This allows the event loop to start immediately, enabling real-time streaming of tool events and assistant content without blocking delays.

Key changes:
- Spawn an orchestrator task for tool execution and completion setup
- Start the event loop immediately after channel creation
- Add an AssistantMessageStarting signal to coordinate assistant UI timing
- Delay output_item.added/content_part.added until completion is ready

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Add complete deployment documentation and infrastructure setup for the Kagi Search API integration, including vsock proxy configuration, API key encryption/storage, and entrypoint traffic forwarding.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
The KagiClient now automatically adds the "Bot " prefix to API keys if it is not already present. This allows storing clean tokens in environment variables while maintaining proper authorization header formatting.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
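The prefix normalization plausibly reduces to a few lines like these (a sketch; the actual method name inside KagiClient may differ):

```rust
/// Ensure the Kagi authorization token carries the required "Bot " prefix,
/// so clean tokens can be stored in environment variables while the
/// Authorization header stays correctly formatted.
fn normalize_api_key(key: &str) -> String {
    if key.starts_with("Bot ") {
        key.to_string() // already prefixed; leave untouched
    } else {
        format!("Bot {key}")
    }
}
```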
Make Kagi Search properly optional throughout the codebase:
- Remove startup panic if Kagi client initialization fails
- Skip web search classification if Kagi client unavailable
- Application continues running without web search feature
Improve tool call argument handling:
- Add error logging for JSON parse/serialization failures
- Use safe fallbacks ("{}" string) instead of empty strings
- Skip malformed tool calls rather than corrupting conversation
- Match OpenAI format: arguments field is JSON string, not object
Changes ensure no panics occur and errors are handled gracefully
while maintaining data integrity in the conversation flow.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
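A minimal sketch of the safe-fallback rule for tool-call arguments (the helper name is hypothetical; the real code deals with serde_json values, but the fallback is the same "{}" string, since OpenAI's format requires `arguments` to be a JSON string rather than an object):

```rust
/// Return the serialized arguments, or the safe "{}" fallback when
/// serialization failed, logging the error instead of panicking.
fn arguments_or_empty(serialized: Result<String, String>) -> String {
    match serialized {
        Ok(json) => json,
        Err(e) => {
            eprintln!("failed to serialize tool arguments: {e}; using empty object");
            "{}".to_string() // valid JSON string, never an empty string
        }
    }
}
```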
Add validation to check whether tool_choice is set to "none" before executing web search tools. This prevents data leakage to external services (Kagi) when users explicitly disable tool usage.

Changes:
- Add an is_tool_choice_allowed() helper that returns false only when tool_choice is explicitly "none"
- Update the tool execution condition to check tool_choice first
- Maintain existing behavior for all other tool_choice values (including when not set)

This fix ensures compliance with the OpenAI API contract by respecting the tool_choice parameter before making external tool calls.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
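Based on that description, the helper plausibly reduces to a one-liner (a sketch; the real signature may take the request's typed tool_choice value rather than a string):

```rust
/// Tools are allowed unless the client explicitly set tool_choice to "none".
/// An unset tool_choice (None) and every other value keep existing behavior.
fn is_tool_choice_allowed(tool_choice: Option<&str>) -> bool {
    tool_choice != Some("none")
}
```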
- Fix AssistantMessageStarting race by sending the signal before the processor spawns
- Add cancellation support to the orchestrator with tokio::select
- Optimize memory usage with Arc for prompt_messages (reduces clones)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
When the storage channel closes during tool execution (a catastrophic failure):
- Send an error event to the client immediately
- Abort the orchestrator to prevent wasted LLM API calls
- Treat it as a cancellation using the existing cleanup infrastructure
- Add comprehensive comments explaining the failure scenarios

This prevents:
- The client hanging indefinitely waiting for completion
- Wasted API credits calling the LLM when data can't be persisted
- Continuing with broken storage infrastructure

A storage channel failure indicates serious issues (the task died or the buffer is full) requiring immediate investigation.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Instead of tracking pending_tool_call_db_id in memory, we now look up the tool_call by UUID when persisting tool_output. This is more reliable since the DB is the source of truth, and it eliminates stale-state issues.

Changes:
- Add a ToolCall::get_by_uuid() model method
- Add get_tool_call_by_uuid() to the DBConnection trait
- Remove pending_tool_call_db_id from the storage task
- Use db.get_tool_call_by_uuid() when persisting tool outputs

The UUID lookup is efficient (UNIQUE index), and tool_calls are always persisted before tool_outputs, so the lookup will succeed in normal operation.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Add a user_id parameter to get_tool_call_by_uuid() to prevent unauthorized access to tool calls. Without this check, an attacker who learns a tool call UUID could access tool arguments that may contain sensitive data.

Changes:
- Add a user_id parameter to ToolCall::get_by_uuid()
- Add a user_id filter to the database query
- Update the DBConnection trait and implementation
- Pass user_id from the storage task to the lookup

This ensures tool calls can only be accessed by their owner.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
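The ownership rule above can be illustrated with an in-memory map standing in for the database (the real code adds a user_id filter to the SQL query; this toy lookup only demonstrates the access check):

```rust
use std::collections::HashMap;

#[derive(Clone, PartialEq, Debug)]
struct ToolCall {
    user_id: u64,
    arguments: String,
}

/// Return the tool call only if it belongs to the requesting user; a known
/// UUID is not enough on its own to read another user's tool arguments.
fn get_tool_call_by_uuid(
    db: &HashMap<String, ToolCall>,
    uuid: &str,
    user_id: u64,
) -> Option<ToolCall> {
    db.get(uuid).filter(|tc| tc.user_id == user_id).cloned()
}
```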
…ification

Add an OpenSecretAdapter implementing the DSRs Adapter trait to bridge our custom LLM infrastructure (billing, routing, auth) with DSRs's structured prompting framework. Create IntentClassifier and QueryExtractor modules following DSPy patterns for cleaner, more maintainable code.

Changes:
- Implement OpenSecretAdapter: a custom Adapter that delegates formatting/parsing to ChatAdapter but uses our OpenSecretLM for actual LLM calls, bypassing the passed Arc<LM> parameter (a necessary workaround until DSRs supports LM as a trait)
- Create dspy_modules.rs: DSPy-style wrapper modules (IntentClassifier, QueryExtractor) that encapsulate signatures and provide clean domain APIs (classify(), extract()) while internally calling our custom adapter
- Refactor classify_and_execute_tools: replace manual LM calls and message construction with clean module interfaces, reducing code by ~40 lines while improving readability and maintainability
- Add dummy LM creation: create Arc<LM> instances to satisfy DSRs API type requirements, though our adapter ignores them in favor of OpenSecretLM

Architecture notes: This is the closest we can get to idiomatic DSPy without upstream changes to DSRs. We're not using Predict.forward_with_config() because it's hardcoded to ChatAdapter. Instead, we call our OpenSecretAdapter directly, which implements the same Adapter trait but uses our infrastructure.

This approach enables:
- Structured prompting with type-safe signatures
- Clean, testable module interfaces
- Reuse of existing billing/routing/auth infrastructure
- A future migration path when DSRs supports trait-based LMs

Limitations:
- Cannot use DSRs optimizers (GEPA, COPRO) yet - requires Predict modules
- Bypasses the Predict abstraction by calling adapter.call() directly
- Creates unused dummy LM instances to satisfy the type system

Related: Will propose making LM a trait in DSRs upstream to enable proper integration with custom infrastructure while using standard Predict modules.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Greptile Overview
Summary
This PR implements DSPy-style modules for tool use classification, migrating from manual prompt construction to a more structured approach using the DSRs library.
Key Changes
- New DSPy Integration: Added the dspy-rs dependency from the main branch and created a custom OpenSecretAdapter that bridges DSPy's API with OpenSecret's existing completions infrastructure
- Intent Classification Module: Replaced manual prompt-based classification with an IntentClassifier DSPy module that determines whether the user needs web search or chat
- Query Extraction Module: Added a QueryExtractor DSPy module to extract clean search queries from natural language
- Maintained Infrastructure: Preserved existing billing, routing, retry logic, and auth handling by integrating DSPy calls through the centralized completions API
- Backward Compatibility: Marked legacy prompt templates as deprecated but kept them in place
Architecture
The implementation uses a custom adapter pattern where:
- A DSPy Signature defines the input/output structure for classification and extraction
- OpenSecretAdapter implements the DSPy Adapter trait but ignores the provided LM parameter
- Instead, it creates an OpenSecretLM that calls OpenSecret's existing get_chat_completion_response()
- This ensures all LLM calls go through centralized billing, routing, and error handling
Notable Design Decisions
- Uses dummy LM instances to satisfy DSPy's API requirements, but the adapter ignores them
- Defaults to "chat" mode if classification fails (best-effort approach)
- Falls back to original user message if query extraction fails
- Uses fast model (llama-3.3-70b) with temperature=0 for deterministic classification
Confidence Score: 3/5
- Safe to merge with moderate risk - dependency on main branch and fallback behaviors need monitoring
- Score reflects the dependency stability risk (using main branch instead of pinned version), potential overhead from repeated dummy LM initialization, and silent fallback behaviors that could mask failures. The core logic is sound and integrates well with existing infrastructure, but production monitoring will be important to catch classification failures and unexpected DSPy API changes.
- Pay close attention to Cargo.toml (unstable dependency), dspy_modules.rs (repeated async initialization overhead), and handlers.rs (silent error fallbacks)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| Cargo.toml | 4/5 | Added dspy-rs dependency from main branch and anyhow error handling library |
| src/web/responses/dspy_adapter.rs | 3/5 | New custom DSPy adapter integrating with existing completions API infrastructure for billing and routing |
| src/web/responses/dspy_modules.rs | 4/5 | DSPy-style modules for intent classification and query extraction using OpenSecretAdapter |
| src/web/responses/handlers.rs | 3/5 | Replaced manual prompt construction with DSPy modules for intent classification and query extraction |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant Handler as handlers.rs
    participant Classifier as IntentClassifier
    participant Extractor as QueryExtractor
    participant Adapter as OpenSecretAdapter
    participant LM as OpenSecretLM
    participant API as Completions API
    participant Tools as tools::execute_tool
    Client->>Handler: POST /v1/responses (user message)
    Handler->>Handler: validate_and_normalize_input()
    Handler->>Handler: build_context_and_check_billing()
    Handler->>Handler: persist_request_data()
    Handler->>Classifier: classify(message)
    Classifier->>Adapter: call(dummy_lm, signature, input)
    Adapter->>Adapter: format(signature, inputs)
    Adapter->>LM: call(messages)
    LM->>API: get_chat_completion_response()
    API-->>LM: FullResponse
    LM-->>Adapter: LMResponse
    Adapter->>Adapter: parse_response()
    Adapter-->>Classifier: Prediction{intent}
    Classifier-->>Handler: "web_search" or "chat"
    alt intent == "web_search"
        Handler->>Extractor: extract(message)
        Extractor->>Adapter: call(dummy_lm, signature, input)
        Adapter->>LM: call(messages)
        LM->>API: get_chat_completion_response()
        API-->>LM: FullResponse
        LM-->>Adapter: LMResponse
        Adapter-->>Extractor: Prediction{search_query}
        Extractor-->>Handler: search query string
        Handler->>Handler: send ToolCall to streams
        Handler->>Tools: execute_tool("web_search", args)
        Tools-->>Handler: tool output
        Handler->>Handler: send ToolOutput to streams
    end
    Handler->>API: get_chat_completion_response() (final response)
    API-->>Handler: CompletionChunk stream
    Handler-->>Client: SSE events with assistant response
```
6 files reviewed, 7 comments
```toml
# Using main branch since v0.6.0 tag doesn't exist yet
dspy-rs = { git = "https://github.com/krypticmouse/DSRs", branch = "main" }
```
logic: dependency points to main branch instead of a specific commit or version tag
Using main branch creates instability - any upstream changes will break your build without warning. Pin to specific commit hash or wait for v0.6.0 release.
```suggestion
# DSPy for structured prompting and optimization
dspy-rs = { git = "https://github.com/krypticmouse/DSRs", rev = "47ca636086403344b91fc4a90108f99a27101ae6" }
```
```rust
// We need a real LM instance to satisfy the API, but our adapter ignores it
let dummy_lm = Arc::new(
    LM::builder()
        .api_key("dummy_key".into())
        .build()
        .await,
);
```
style: dummy LM created with hardcoded "dummy_key" on every classifier instantiation
The adapter ignores this LM, but creating it with .await suggests it may perform initialization work (network calls, validation, etc.). Consider using lazy_static or caching to avoid repeated initialization overhead.
Also, if the LM builder validates the API key format, "dummy_key" might fail in future versions.
```rust
// We need a real LM instance to satisfy the API, but our adapter ignores it
let dummy_lm = Arc::new(
    LM::builder()
        .api_key("dummy_key".into())
        .build()
        .await,
);
```
style: duplicate dummy LM initialization logic
Same concern as IntentClassifier - repeated .await calls may have overhead. Consider extracting to a shared get_dummy_lm() function or using a static instance.
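One way to act on this suggestion, sketched with a stand-in type (the real LM::builder() is async, so production code would likely reach for tokio::sync::OnceCell rather than std's OnceLock):

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for dspy-rs's `LM` type; only the caching pattern
// is the point here.
struct Lm {
    api_key: String,
}

static DUMMY_LM: OnceLock<Lm> = OnceLock::new();

/// Build the placeholder LM once and share the same instance across every
/// IntentClassifier/QueryExtractor instantiation, instead of rebuilding it
/// on each call.
fn get_dummy_lm() -> &'static Lm {
    DUMMY_LM.get_or_init(|| Lm {
        api_key: "dummy_key".to_string(),
    })
}
```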
```rust
    prompt_tokens: usage_json
        .get("prompt_tokens")
        .and_then(|v| v.as_i64())
        .unwrap_or(0) as u32,
    completion_tokens: usage_json
        .get("completion_tokens")
        .and_then(|v| v.as_i64())
        .unwrap_or(0) as u32,
    total_tokens: usage_json
        .get("total_tokens")
        .and_then(|v| v.as_i64())
        .unwrap_or(0) as u32,
    reasoning_tokens: Some(0),
    }
} else {
    LmUsage::default()
};
```
style: token count conversions use as u32 without overflow checks
If token counts exceed u32::MAX (4.2B), this will silently wrap. While unlikely for current models, consider using .try_into() with proper error handling or document the assumption that token counts fit in u32.
DSPy v0.6.0 uses u32 for tokens, but OpenAI API returns i64, creating a type mismatch that could cause issues with future high-context models.
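A saturating conversion along the lines the comment suggests (a sketch of one possible fix):

```rust
/// Convert an i64 token count from the OpenAI-style usage payload into the
/// u32 that DSRs expects, saturating at the bounds instead of silently
/// wrapping with `as u32`.
fn tokens_to_u32(raw: i64) -> u32 {
    u32::try_from(raw).unwrap_or(if raw < 0 { 0 } else { u32::MAX })
}
```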
```rust
let classifier = IntentClassifier::new(state.clone(), user.clone()).await;

let intent = match classifier.classify(&user_text).await {
    Ok(intent) => {
        debug!("Classified intent: {}", intent);
        intent
    }
    Err(e) => {
        // Best effort - if classification fails, default to chat
        warn!("DSPy classification failed (defaulting to chat): {:?}", e);
        "chat".to_string()
    }
};
```
style: classification failure defaults to "chat" silently
If the DSPy call fails (network, quota, model error), the system defaults to chat mode without surfacing the error to the user. This means users won't get web search even when they need it. Consider logging at error! level (currently warn!) or adding metrics to track classification failures.
```rust
let search_query = match extractor.extract(&user_text).await {
    Ok(query) => {
        debug!("Search query extracted successfully");
        query
    }
    Err(e) => {
        warn!("DSPy query extraction failed, using original message: {:?}", e);
        user_text.clone()
    }
};
```
style: query extraction failure falls back to original user_text
If extraction fails, the entire user message (potentially multi-sentence) becomes the search query. This could result in poor search results. Consider either failing the search or extracting key terms with a simpler heuristic as backup.
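A simple stopword-style heuristic of the kind suggested might look like this (illustrative only; the word list and tokenization would need tuning before use as a real backup):

```rust
/// Naive keyword fallback for when LLM query extraction fails: strip
/// punctuation, lowercase, and drop common filler words, keeping only
/// likely search terms.
fn heuristic_query(user_text: &str) -> String {
    const STOPWORDS: &[&str] = &[
        "the", "a", "an", "is", "are", "was", "can", "you", "please",
        "tell", "me", "about", "what", "whats", "of", "to", "in", "for",
    ];
    user_text
        .split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|w| !w.is_empty() && !STOPWORDS.contains(&w.as_str()))
        .collect::<Vec<_>>()
        .join(" ")
}
```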
```rust
    &self,
    _lm: Arc<LM>, // Ignored - we use our own LM
```
style: ignored lm parameter with comment explanation
This design creates tight coupling between DSPy's API expectations and OpenSecret's infrastructure. If DSPy changes its adapter API to require using the provided LM, this will break. Consider contributing upstream to make the LM parameter optional or adding integration tests to catch API changes early.
Force-pushed from 47ef77d to 73cb442 (Compare)
No description provided.