
Tool use dsrs #106

Open

AnthonyRonning wants to merge 20 commits into master from tool-use-dsrs

Conversation

@AnthonyRonning
Contributor

No description provided.

AnthonyRonning and others added 20 commits October 14, 2025 21:00
Implements LLM-based intent classification and tool execution for the Responses API:
- Intent classifier: routes user messages to 'chat' or 'web_search'
- Query extractor: extracts clean search queries from natural language
- Tool execution framework with web_search (mock implementation)
- Stream-first architecture: tools persist through storage channels
- Synchronization: oneshot barrier for tool persistence before prompt rebuild
- Context builder: automatically includes tool_call and tool_output from DB
- SSE events: tool_call.created and tool_output.created for client streaming
- Security: sensitive data (queries, user content) only logged at TRACE level

New modules:
- prompts.rs: Intent classification and query extraction prompts
- tools.rs: Tool execution registry and web search (to be replaced with real API)

Phase flow:
1. Validate input
2. Build context and check billing
3. Persist user message
4. Create dual streams (client + storage)
5. Optional: Classify intent and execute tools
6. Setup completion processor (rebuilds prompt if tools executed)
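A minimal sketch of the routing step in phase 5, assuming the classifier LLM returns a short label string (the enum and function names are illustrative, not the PR's actual API):

```rust
// Hypothetical sketch: normalize the classifier's raw LLM output into a
// closed enum so downstream code never branches on free-form text.
#[derive(Debug, PartialEq)]
enum Intent {
    Chat,
    WebSearch,
}

fn parse_intent(raw: &str) -> Intent {
    match raw.trim().to_ascii_lowercase().as_str() {
        "web_search" => Intent::WebSearch,
        // Anything unexpected falls back to plain chat (best effort).
        _ => Intent::Chat,
    }
}

fn main() {
    assert_eq!(parse_intent(" WEB_SEARCH \n"), Intent::WebSearch);
    assert_eq!(parse_intent("chat"), Intent::Chat);
    assert_eq!(parse_intent("garbage"), Intent::Chat);
}
```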

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
… tool persistence

Replace PersistTools message variant (containing non-cloneable oneshot sender) with
immediate tool persistence and dedicated oneshot acknowledgment channel.

- Remove PersistTools variant from StorageMessage enum
- Make StorageMessage fully cloneable (derive Clone instead of manual impl)
- Storage now persists tool_call and tool_output immediately upon receipt
- Storage sends oneshot acknowledgment after tool_output is persisted
- Handler waits for acknowledgment via rx_tool_ack before continuing
- Simplifies architecture: tools persist on arrival, no accumulation needed
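A minimal sketch of the acknowledgment barrier described above, using tokio channels; the message type and channel names are illustrative stand-ins for the PR's actual types:

```rust
use tokio::sync::{mpsc, oneshot};

// Fully cloneable storage message (single variant shown for brevity).
#[derive(Clone, Debug)]
enum StorageMessage {
    ToolOutput { content: String },
}

#[tokio::main]
async fn main() {
    let (tx_storage, mut rx_storage) = mpsc::channel::<StorageMessage>(16);
    let (tx_tool_ack, rx_tool_ack) = oneshot::channel::<()>();

    // Storage task: persist immediately on receipt, then acknowledge.
    tokio::spawn(async move {
        let mut ack = Some(tx_tool_ack);
        while let Some(msg) = rx_storage.recv().await {
            let StorageMessage::ToolOutput { content } = msg;
            println!("persisting tool_output: {content}"); // stand-in for a DB write
            if let Some(ack) = ack.take() {
                let _ = ack.send(()); // signal that the prompt may be rebuilt
            }
        }
    });

    tx_storage
        .send(StorageMessage::ToolOutput { content: "results...".into() })
        .await
        .unwrap();

    // Handler waits for the acknowledgment before continuing.
    rx_tool_ack.await.unwrap();
    println!("tool_output persisted; safe to rebuild prompt");
}
```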

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Replaces placeholder web search with production-ready Kagi Search API.
Adds kagi-api-rust as git submodule and integrates secure API key
management following existing patterns. Search results include direct
answers, weather, infoboxes, web results, and news with proper
formatting for LLM consumption.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Implement the OpenAI Responses API pattern where the web_search tool must be explicitly enabled in the request's tools array. Classification and tool execution now run only when the client specifies {type: 'web_search'} in tools.
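A sketch of the gating check with a hypothetical helper (the PR's actual request types will differ):

```rust
use serde_json::{json, Value};

// Classification only runs when the request's tools array contains an
// entry whose "type" is "web_search".
fn web_search_enabled(tools: Option<&Vec<Value>>) -> bool {
    tools
        .map(|ts| {
            ts.iter()
                .any(|t| t.get("type").and_then(Value::as_str) == Some("web_search"))
        })
        .unwrap_or(false)
}

fn main() {
    let tools = vec![json!({ "type": "web_search" })];
    assert!(web_search_enabled(Some(&tools)));
    assert!(!web_search_enabled(None));
}
```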

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Removes third-party submodule dependency and implements a lightweight
~200 line Kagi client with connection pooling and proper timeout handling.

Benefits:
- Full control over code running in secure enclave
- No submodule complexity for Nix builds
- Connection pooling (100 idle connections) for high-volume usage
- Fast timeouts (10s request, 5s connect) for interactive workflows
- Client initialized once at startup and reused across all requests
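One plausible mapping of those numbers onto a reqwest client builder; a sketch under that assumption, not the PR's actual client code:

```rust
use std::time::Duration;

fn build_kagi_http_client() -> reqwest::Result<reqwest::Client> {
    reqwest::Client::builder()
        .pool_max_idle_per_host(100)             // keep up to 100 idle connections
        .timeout(Duration::from_secs(10))        // overall per-request timeout
        .connect_timeout(Duration::from_secs(5)) // TCP connect timeout
        .build()
}
```

Building this once at startup and cloning the `Client` (which is internally reference-counted) matches the "initialized once, reused across all requests" behavior described above.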

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…stamp ordering

Previously, the assistant placeholder was created in Phase 3 (early), causing
it to have an earlier created_at timestamp than any tools executed in Phase 5.
This resulted in incorrect message ordering when querying by created_at:
assistant would appear BEFORE its tools in the conversation.

Now the assistant placeholder is created in Phase 6, after tools complete,
ensuring the correct semantic order: user → tool_call → tool_output → assistant.

The assistant message only needs to exist before the storage task UPDATEs it
(when streaming completes), not during the entire lifecycle.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Move phases 4-6 (channels, tools, completion) inside SSE stream and execute
in background orchestrator task. This allows event loop to start immediately,
enabling real-time streaming of tool events and assistant content without
blocking delays.

Key changes:
- Spawn orchestrator task for tool execution and completion setup
- Start event loop immediately after channel creation
- Add AssistantMessageStarting signal to coordinate assistant UI timing
- Delay output_item.added/content_part.added until completion is ready
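A rough sketch of the resulting shape, assuming tokio and an illustrative event type: the event loop consumes from the channel immediately while the orchestrator produces tool events and the completion in the background:

```rust
use tokio::sync::mpsc;

#[derive(Debug)]
enum Event {
    ToolCallCreated(String),
    AssistantMessageStarting,
    ContentDelta(String),
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<Event>(64);

    // Background orchestrator: tools first, then the assistant completion.
    tokio::spawn(async move {
        tx.send(Event::ToolCallCreated("web_search".into())).await.ok();
        // ... execute tool, persist output, rebuild prompt ...
        tx.send(Event::AssistantMessageStarting).await.ok();
        tx.send(Event::ContentDelta("Hello".into())).await.ok();
    });

    // Event loop starts immediately; nothing blocks before the first event.
    while let Some(ev) = rx.recv().await {
        println!("SSE: {ev:?}"); // stand-in for writing an SSE frame
    }
}
```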

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Add complete deployment documentation and infrastructure setup for Kagi Search API integration including vsock proxy configuration, API key encryption/storage, and entrypoint traffic forwarding.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
The KagiClient now automatically adds the "Bot " prefix to API keys
if not already present. This allows storing clean tokens in environment
variables while maintaining proper authorization header formatting.
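The normalization reduces to a one-line check; `normalize_kagi_key` is a hypothetical helper name:

```rust
fn normalize_kagi_key(key: &str) -> String {
    if key.starts_with("Bot ") {
        key.to_string()
    } else {
        format!("Bot {key}") // add the prefix Kagi's Authorization header expects
    }
}

fn main() {
    assert_eq!(normalize_kagi_key("abc123"), "Bot abc123");
    assert_eq!(normalize_kagi_key("Bot abc123"), "Bot abc123"); // idempotent
}
```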

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Make Kagi Search properly optional throughout the codebase:
- Remove startup panic if Kagi client initialization fails
- Skip web search classification if Kagi client unavailable
- Application continues running without web search feature

Improve tool call argument handling:
- Add error logging for JSON parse/serialization failures
- Use safe fallbacks ("{}" string) instead of empty strings
- Skip malformed tool calls rather than corrupting conversation
- Match OpenAI format: arguments field is JSON string, not object

Changes ensure no panics occur and errors are handled gracefully
while maintaining data integrity in the conversation flow.
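A sketch of the fallback, assuming serde_json; arguments are stored as a JSON string per the OpenAI format, and a serialization failure degrades to "{}" rather than corrupting the row (function name is illustrative):

```rust
use serde_json::json;

fn arguments_as_string(args: &serde_json::Value) -> String {
    serde_json::to_string(args).unwrap_or_else(|e| {
        eprintln!("failed to serialize tool arguments: {e}"); // stand-in for error logging
        "{}".to_string() // safe fallback instead of an empty string
    })
}

fn main() {
    let args = json!({ "query": "rust tokio oneshot" });
    assert_eq!(arguments_as_string(&args), r#"{"query":"rust tokio oneshot"}"#);
}
```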

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add validation to check if tool_choice is set to "none" before
executing web search tools. This prevents data leakage to external
services (Kagi) when users explicitly disable tool usage.

Changes:
- Add is_tool_choice_allowed() helper that returns false only when
  tool_choice is explicitly "none"
- Update tool execution condition to check tool_choice first
- Maintain existing behavior for all other tool_choice values
  (including when not set)
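A sketch matching those semantics (the request type here is illustrative): only an explicit "none" blocks tool execution; unset or any other value keeps the existing behavior.

```rust
use serde_json::Value;

fn is_tool_choice_allowed(tool_choice: Option<&Value>) -> bool {
    // false only for an explicit "none"; true for "auto", objects, or unset
    !matches!(tool_choice.and_then(Value::as_str), Some("none"))
}

fn main() {
    assert!(!is_tool_choice_allowed(Some(&Value::String("none".into()))));
    assert!(is_tool_choice_allowed(Some(&Value::String("auto".into()))));
    assert!(is_tool_choice_allowed(None));
}
```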

This fix ensures compliance with the OpenAI API contract by respecting
the tool_choice parameter before making external tool calls.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix AssistantMessageStarting race by sending signal before processor spawn
- Add cancellation support to orchestrator with tokio::select
- Optimize memory usage with Arc for prompt_messages (reduces clones)

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
When storage channel closes during tool execution (catastrophic failure):
- Send error event to client immediately
- Abort orchestrator to prevent wasted LLM API calls
- Treat as cancellation using existing cleanup infrastructure
- Add comprehensive comments explaining failure scenarios

This prevents:
- Client hanging indefinitely waiting for completion
- Wasted API credits calling LLM when data can't be persisted
- Continuing with broken storage infrastructure

Storage channel failure indicates serious issues (task died or buffer
full) requiring immediate investigation.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Instead of tracking pending_tool_call_db_id in memory, we now look up
the tool_call by UUID when persisting tool_output. This is more reliable
since the DB is the source of truth and eliminates stale state issues.

Changes:
- Add ToolCall::get_by_uuid() model method
- Add get_tool_call_by_uuid() to DBConnection trait
- Remove pending_tool_call_db_id from storage task
- Use db.get_tool_call_by_uuid() when persisting tool outputs

The UUID lookup is efficient (UNIQUE index) and tool_calls are always
persisted before tool_outputs, so the lookup will succeed in normal
operation.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Add user_id parameter to get_tool_call_by_uuid() to prevent unauthorized
access to tool calls. Without this check, an attacker who learns a tool
call UUID could access tool arguments that may contain sensitive data.

Changes:
- Add user_id parameter to ToolCall::get_by_uuid()
- Add user_id filter to the database query
- Update DBConnection trait and implementation
- Pass user_id from storage task to the lookup

This ensures tool calls can only be accessed by their owner.
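A sketch of the scoped lookup's shape; the trait and types are illustrative and the PR's actual DBConnection differs in detail. Filtering by both uuid and user_id means a leaked UUID alone is not enough to read a row:

```rust
use uuid::Uuid;

pub struct ToolCall { /* fields elided */ }

#[allow(async_fn_in_trait)]
pub trait DBConnection {
    // Underlying query shape: WHERE uuid = $1 AND user_id = $2
    // (uuid carries a UNIQUE index, so the lookup stays cheap).
    async fn get_tool_call_by_uuid(
        &self,
        tool_call_uuid: Uuid,
        user_id: Uuid, // ownership check happens in the query itself
    ) -> Result<Option<ToolCall>, String>;
}
```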

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
…ification

Add OpenSecretAdapter implementing DSRs Adapter trait to bridge our custom
LLM infrastructure (billing, routing, auth) with DSRs's structured prompting
framework. Create IntentClassifier and QueryExtractor modules following DSPy
patterns for cleaner, more maintainable code.

Changes:
- Implement OpenSecretAdapter: Custom Adapter that delegates formatting/parsing
  to ChatAdapter but uses our OpenSecretLM for actual LLM calls, bypassing the
  passed Arc<LM> parameter (necessary workaround until DSRs supports LM as trait)

- Create dspy_modules.rs: DSPy-style wrapper modules (IntentClassifier,
  QueryExtractor) that encapsulate signatures and provide clean domain APIs
  (classify(), extract()) while internally calling our custom adapter

- Refactor classify_and_execute_tools: Replace manual LM calls and message
  construction with clean module interfaces, reducing code by ~40 lines while
  improving readability and maintainability

- Add dummy LM creation: Create Arc<LM> instances to satisfy DSRs API type
  requirements, though our adapter ignores them in favor of OpenSecretLM

Architecture notes:
This is the closest we can get to idiomatic DSPy without upstream changes to
DSRs. We're not using Predict.forward_with_config() because it's hardcoded to
ChatAdapter. Instead, we call our OpenSecretAdapter directly, which implements
the same Adapter trait but uses our infrastructure.

This approach enables:
- Structured prompting with type-safe signatures
- Clean, testable module interfaces
- Reuse of existing billing/routing/auth infrastructure
- Future migration path when DSRs supports trait-based LMs

Limitations:
- Cannot use DSRs optimizers (GEPA, COPRO) yet - requires Predict modules
- Bypasses Predict abstraction by calling adapter.call() directly
- Creates unused dummy LM instances to satisfy type system

Related: Will propose making LM a trait in DSRs upstream to enable proper
integration with custom infrastructure while using standard Predict modules.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@coderabbitai

coderabbitai bot commented Oct 17, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.


@greptile-apps greptile-apps bot left a comment


Greptile Overview

Summary

This PR implements DSPy-style modules for tool use classification, migrating from manual prompt construction to a more structured approach using the DSRs library.

Key Changes

  • New DSPy Integration: Added dspy-rs dependency from main branch and created custom OpenSecretAdapter that bridges DSPy's API with OpenSecret's existing completions infrastructure
  • Intent Classification Module: Replaced manual prompt-based classification with IntentClassifier DSPy module that determines if user needs web search or chat
  • Query Extraction Module: Added QueryExtractor DSPy module to extract clean search queries from natural language
  • Maintained Infrastructure: Preserved existing billing, routing, retry logic, and auth handling by integrating DSPy calls through the centralized completions API
  • Backward Compatibility: Marked legacy prompt templates as deprecated but kept them in place

Architecture

The implementation uses a custom adapter pattern where:

  1. DSPy Signature defines input/output structure for classification and extraction
  2. OpenSecretAdapter implements the DSPy Adapter trait but ignores the provided LM parameter
  3. Instead, it creates OpenSecretLM that calls OpenSecret's existing get_chat_completion_response()
  4. This ensures all LLM calls go through centralized billing, routing, and error handling

Notable Design Decisions

  • Uses dummy LM instances to satisfy DSPy's API requirements, but the adapter ignores them
  • Defaults to "chat" mode if classification fails (best-effort approach)
  • Falls back to original user message if query extraction fails
  • Uses fast model (llama-3.3-70b) with temperature=0 for deterministic classification

Confidence Score: 3/5

  • Safe to merge with moderate risk - dependency on main branch and fallback behaviors need monitoring
  • Score reflects the dependency stability risk (using main branch instead of pinned version), potential overhead from repeated dummy LM initialization, and silent fallback behaviors that could mask failures. The core logic is sound and integrates well with existing infrastructure, but production monitoring will be important to catch classification failures and unexpected DSPy API changes.
  • Pay close attention to Cargo.toml (unstable dependency), dspy_modules.rs (repeated async initialization overhead), and handlers.rs (silent error fallbacks)

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| Cargo.toml | 4/5 | Added dspy-rs dependency from main branch and anyhow error handling library |
| src/web/responses/dspy_adapter.rs | 3/5 | New custom DSPy adapter integrating with existing completions API infrastructure for billing and routing |
| src/web/responses/dspy_modules.rs | 4/5 | DSPy-style modules for intent classification and query extraction using OpenSecretAdapter |
| src/web/responses/handlers.rs | 3/5 | Replaced manual prompt construction with DSPy modules for intent classification and query extraction |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Handler as handlers.rs
    participant Classifier as IntentClassifier
    participant Extractor as QueryExtractor
    participant Adapter as OpenSecretAdapter
    participant LM as OpenSecretLM
    participant API as Completions API
    participant Tools as tools::execute_tool
    
    Client->>Handler: POST /v1/responses (user message)
    Handler->>Handler: validate_and_normalize_input()
    Handler->>Handler: build_context_and_check_billing()
    Handler->>Handler: persist_request_data()
    
    Handler->>Classifier: classify(message)
    Classifier->>Adapter: call(dummy_lm, signature, input)
    Adapter->>Adapter: format(signature, inputs)
    Adapter->>LM: call(messages)
    LM->>API: get_chat_completion_response()
    API-->>LM: FullResponse
    LM-->>Adapter: LMResponse
    Adapter->>Adapter: parse_response()
    Adapter-->>Classifier: Prediction{intent}
    Classifier-->>Handler: "web_search" or "chat"
    
    alt intent == "web_search"
        Handler->>Extractor: extract(message)
        Extractor->>Adapter: call(dummy_lm, signature, input)
        Adapter->>LM: call(messages)
        LM->>API: get_chat_completion_response()
        API-->>LM: FullResponse
        LM-->>Adapter: LMResponse
        Adapter-->>Extractor: Prediction{search_query}
        Extractor-->>Handler: search query string
        
        Handler->>Handler: send ToolCall to streams
        Handler->>Tools: execute_tool("web_search", args)
        Tools-->>Handler: tool output
        Handler->>Handler: send ToolOutput to streams
    end
    
    Handler->>API: get_chat_completion_response() (final response)
    API-->>Handler: CompletionChunk stream
    Handler-->>Client: SSE events with assistant response
```

6 files reviewed, 7 comments


Comment on lines +81 to +82
```toml
# Using main branch since v0.6.0 tag doesn't exist yet
dspy-rs = { git = "https://github.com/krypticmouse/DSRs", branch = "main" }
```

logic: dependency points to main branch instead of a specific commit or version tag

Using main branch creates instability - any upstream changes will break your build without warning. Pin to specific commit hash or wait for v0.6.0 release.

Suggested change:

```diff
-# Using main branch since v0.6.0 tag doesn't exist yet
-dspy-rs = { git = "https://github.com/krypticmouse/DSRs", branch = "main" }
+# DSPy for structured prompting and optimization
+dspy-rs = { git = "https://github.com/krypticmouse/DSRs", rev = "47ca636086403344b91fc4a90108f99a27101ae6" }
```

Comment on lines +48 to +54
```rust
// We need a real LM instance to satisfy the API, but our adapter ignores it
let dummy_lm = Arc::new(
    LM::builder()
        .api_key("dummy_key".into())
        .build()
        .await,
);
```

style: dummy LM created with hardcoded "dummy_key" on every classifier instantiation

The adapter ignores this LM, but creating it with .await suggests it may perform initialization work (network calls, validation, etc.). Consider using lazy_static or caching to avoid repeated initialization overhead.

Also, if the LM builder validates the API key format, "dummy_key" might fail in future versions.
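One way to amortize the initialization, sketched with tokio::sync::OnceCell; `LM` and `build_dummy_lm` here stand in for the real DSRs type and builder call shown above:

```rust
use std::sync::Arc;
use tokio::sync::OnceCell;

struct LM; // stand-in for the DSRs LM type

async fn build_dummy_lm() -> Arc<LM> {
    Arc::new(LM) // the real builder's async work would run here, at most once
}

static DUMMY_LM: OnceCell<Arc<LM>> = OnceCell::const_new();

async fn get_dummy_lm() -> Arc<LM> {
    DUMMY_LM.get_or_init(build_dummy_lm).await.clone()
}
```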


Comment on lines +136 to +142
```rust
// We need a real LM instance to satisfy the API, but our adapter ignores it
let dummy_lm = Arc::new(
    LM::builder()
        .api_key("dummy_key".into())
        .build()
        .await,
);
```

style: duplicate dummy LM initialization logic

Same concern as IntentClassifier - repeated .await calls may have overhead. Consider extracting to a shared get_dummy_lm() function or using a static instance.


Comment on lines +105 to +121
```rust
    prompt_tokens: usage_json
        .get("prompt_tokens")
        .and_then(|v| v.as_i64())
        .unwrap_or(0) as u32,
    completion_tokens: usage_json
        .get("completion_tokens")
        .and_then(|v| v.as_i64())
        .unwrap_or(0) as u32,
    total_tokens: usage_json
        .get("total_tokens")
        .and_then(|v| v.as_i64())
        .unwrap_or(0) as u32,
    reasoning_tokens: Some(0),
    }
} else {
    LmUsage::default()
};
```

style: token count conversions use as u32 without overflow checks

If token counts exceed u32::MAX (4.2B), this will silently wrap. While unlikely for current models, consider using .try_into() with proper error handling or document the assumption that token counts fit in u32.

DSPy v0.6.0 uses u32 for tokens, but OpenAI API returns i64, creating a type mismatch that could cause issues with future high-context models.
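A sketch of an overflow-safe conversion (helper name hypothetical): negative or out-of-range counts clamp to 0 instead of silently wrapping via `as u32`.

```rust
use serde_json::Value;

fn tokens_u32(usage_json: &Value, field: &str) -> u32 {
    usage_json
        .get(field)
        .and_then(Value::as_i64)
        .and_then(|n| u32::try_from(n).ok()) // None on negative or > u32::MAX
        .unwrap_or(0)
}
```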


Comment on lines +1208 to 1220
```diff
     let classifier = IntentClassifier::new(state.clone(), user.clone()).await;

     let intent = match classifier.classify(&user_text).await {
         Ok(intent) => {
             debug!("Classified intent: {}", intent);
             intent
         }
         Err(e) => {
             // Best effort - if classification fails, default to chat
-            warn!("Classification failed (defaulting to chat): {:?}", e);
+            warn!("DSPy classification failed (defaulting to chat): {:?}", e);
             "chat".to_string()
         }
     };
```

style: classification failure defaults to "chat" silently

If the DSPy call fails (network, quota, model error), the system defaults to chat mode without surfacing the error to the user. This means users won't get web search even when they need it. Consider logging at error! level (currently warn!) or adding metrics to track classification failures.


Comment on lines +1229 to 1239
```diff
     let search_query = match extractor.extract(&user_text).await {
         Ok(query) => {
-            trace!("Extracted search query: {}", query);
+            debug!("Search query extracted successfully");
             query
         }
         Err(e) => {
-            warn!("Query extraction failed, using original message: {:?}", e);
+            warn!("DSPy query extraction failed, using original message: {:?}", e);
             user_text.clone()
         }
     };
```

style: query extraction failure falls back to original user_text

If extraction fails, the entire user message (potentially multi-sentence) becomes the search query. This could result in poor search results. Consider either failing the search or extracting key terms with a simpler heuristic as backup.


Comment on lines +196 to +197
```rust
    &self,
    _lm: Arc<LM>, // Ignored - we use our own LM
```

style: ignored lm parameter with comment explanation

This design creates tight coupling between DSPy's API expectations and OpenSecret's infrastructure. If DSPy changes its adapter API to require using the provided LM, this will break. Consider contributing upstream to make the LM parameter optional or adding integration tests to catch API changes early.


@AnthonyRonning force-pushed the feat/tool-use-classifier-search branch 5 times, most recently from 47ef77d to 73cb442 on October 21, 2025 at 15:53
Base automatically changed from feat/tool-use-classifier-search to master October 21, 2025 16:01