Conversation
Introduce an alternative learning path that queries an LLM (OpenAI or Anthropic) to synthesize catamorphism encoders, enabled via --llm-learn=on or HOICE_USE_LLM_LEARN=1. The LLM receives the CHC problem, current encoders, and spurious CEX, then proposes encoders in s-expression format. Failed proposals are fed back as conversation context for retry (up to 5 attempts), with automatic fallback to template-based learning.

- Add ureq, serde, serde_json dependencies
- Add use_llm_learn config flag (env var + CLI)
- Refactor catamorphism_parser: extract parse_catamorphism_str
- Create llm_learn.rs with provider abstraction, prompt construction, response parsing, SMT validation, and retry loop
- Wire up conditional dispatch in absadt.rs handle_cex

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The rebase onto main merged the idx_arg and use_llm_learn clap args together, losing the closing paren and the .arg( separator between them. Also suppress the unused-variable warning for _last_response in llm_learn.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
This PR introduces an experimental LLM-based learning path for synthesizing catamorphism encoders in hoice, a CHC (Constrained Horn Clause) solver. Instead of relying solely on template-based synthesis, users can now enable LLM-assisted encoder learning via the --llm-learn=on CLI flag or HOICE_USE_LLM_LEARN=1 environment variable. The implementation abstracts over OpenAI and Anthropic providers, sends the CHC problem and spurious counterexamples to the LLM, parses s-expression responses, validates them via SMT, and retries up to 5 times before falling back to the traditional template-based approach.
Changes:
- Add configuration flag and CLI argument for enabling LLM-based learning
- Introduce llm_learn.rs module with provider abstraction, prompt construction, response parsing, SMT validation, and retry logic
- Refactor catamorphism parser to extract parse_catamorphism_str for reuse in LLM response parsing
- Add dependencies: ureq (HTTP client), serde, serde_json (JSON handling)
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| src/common/config.rs | Adds use_llm_learn configuration field with environment variable and CLI flag support |
| src/absadt/llm_learn.rs | New module implementing LLM provider abstraction, prompt engineering, response parsing, and validation logic |
| src/absadt/catamorphism_parser.rs | Refactors parse_catamorphism to extract parse_catamorphism_str for string-based parsing |
| src/absadt.rs | Adds conditional dispatch to llm_learn::work when use_llm_learn flag is enabled |
| Cargo.toml | Adds ureq, serde, and serde_json dependencies for HTTP and JSON handling |
| Cargo.lock | Updates dependency graph with new transitive dependencies |
```rust
let mut conversation = vec![
    Message {
        role: "system".into(),
        content: build_system_prompt(),
    },
    Message {
        role: "user".into(),
        content: format!(
            "{}\n\n{}",
            build_chc_context(instance, encs),
            build_cex_feedback(cex, None)
        ),
    },
];
```
The conversation history accumulates all LLM responses and user feedback messages across retry attempts. For each retry, the entire conversation history is sent to the LLM API, which could become expensive both in terms of API costs and latency. Consider implementing a sliding window or summarization strategy for the conversation history, especially for long-running retry loops, to reduce token usage and improve performance.
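One way to bound token usage is the sliding-window strategy the comment suggests: pin the system prompt and the initial problem statement, and keep only the most recent retry exchanges. A minimal stdlib-only sketch (the `Message` struct mirrors the one in the diff above; `truncate_history` and its pinning policy are hypothetical, not part of the PR):

```rust
// Hypothetical sliding-window truncation for the retry loop's conversation
// history. Assumes the first two messages are the system prompt and the
// initial user context, which must always be retained.
#[derive(Clone, Debug)]
struct Message {
    role: String,
    content: String,
}

/// Keep the pinned prefix plus only the `window` most recent messages.
fn truncate_history(conversation: &[Message], window: usize) -> Vec<Message> {
    // The system prompt and initial CHC context are pinned.
    let pinned = conversation.len().min(2);
    // Start of the retained tail; never overlaps the pinned prefix.
    let tail_start = conversation.len().saturating_sub(window).max(pinned);
    let mut out = conversation[..pinned].to_vec();
    out.extend_from_slice(&conversation[tail_start..]);
    out
}
```

With a window of 2, a six-message history shrinks to four messages (two pinned plus the last two), so each retry sends a bounded prompt regardless of how many attempts have failed.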
src/absadt/llm_learn.rs
```rust
let resp = ureq::post(&url)
    .set("Authorization", &format!("Bearer {}", self.api_key))
    .set("Content-Type", "application/json")
    .send_string(&body.to_string())
    .map_err(|e| {
        Error::from_kind(crate::errors::ErrorKind::Msg(format!(
            "OpenAI API request failed: {}",
            e
        )))
    })?;
```
The HTTP requests to LLM providers do not specify any request timeout. If the LLM API becomes unresponsive, this could cause the entire hoice process to hang indefinitely. Consider adding a timeout to the ureq HTTP requests to ensure the system can recover from unresponsive API calls.
src/absadt/llm_learn.rs
```rust
    },
];

let mut _last_response: Option<String> = None;
```
The variable _last_response is declared and assigned values on lines 492, 519, and 535, but it is never read anywhere in the function. This appears to be dead code. Consider removing the variable or, if it was intended for debugging or logging purposes, adding the appropriate usage.
```rust
"model": self.model,
"max_tokens": 4096,
"system": system_text,
"messages": msgs,
```
The OpenAI provider sets a temperature of 0.7 in its request body, but the Anthropic provider does not include a temperature parameter. For consistency and reproducibility across different LLM providers, consider either adding a temperature parameter to the Anthropic request (Anthropic's default is 1.0) or making the temperature configurable via environment variable for both providers.
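A hedged sketch of the configurable-temperature option the comment suggests. The HOICE_LLM_TEMPERATURE variable name and the helper are hypothetical, not part of the PR; both providers would call this instead of hard-coding 0.7:

```rust
// Hypothetical: read the sampling temperature from an environment variable,
// falling back to a per-provider default. Values outside a sane range are
// rejected so a typo cannot silently produce degenerate sampling.
fn llm_temperature(default: f64) -> f64 {
    std::env::var("HOICE_LLM_TEMPERATURE")
        .ok()
        .and_then(|s| s.parse::<f64>().ok())
        .filter(|t| (0.0..=2.0).contains(t))
        .unwrap_or(default)
}
```

Both request bodies would then use `"temperature": llm_temperature(0.7)`, keeping the two providers consistent by construction.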
Suggested change:

```diff
 "messages": msgs,
+"temperature": 0.7,
```
```rust
// Feed error back and retry
conversation.push(Message {
    role: "assistant".into(),
    content: format!("(error: {})", e),
```
When an LLM API request fails, the error is fed back to the conversation as "(error: {})" with the full error message. This could expose sensitive information such as API endpoint details, authentication errors, or internal system details in the conversation history which is then sent back to the LLM provider. Consider sanitizing or redacting sensitive error information before including it in the conversation feedback.
Suggested change:

```diff
-// Feed error back and retry
+// Feed error back and retry (without exposing internal error details)
 conversation.push(Message {
     role: "assistant".into(),
-    content: format!("(error: {})", e),
+    content: "(error: API request failed)".into(),
```
```rust
resp_body["content"][0]["text"]
    .as_str()
    .map(|s| s.to_string())
    .ok_or_else(|| {
        Error::from_kind(crate::errors::ErrorKind::Msg(format!(
            "Unexpected Anthropic response format: {}",
            resp_body
```
Similar to the OpenAI provider, the full Anthropic response body is included in error messages when the response format is unexpected. This could log excessive data or expose internal API structures. Consider truncating or limiting the response body in error messages.
Suggested change:

```diff
+// Create a truncated snippet of the response body for error reporting to avoid
+// logging the full Anthropic response, which may be large or contain sensitive data.
+let resp_snippet = serde_json::to_string(&resp_body)
+    .unwrap_or_else(|_| "<unserializable Anthropic response>".to_string());
+let resp_snippet: String = resp_snippet.chars().take(1000).collect();
 resp_body["content"][0]["text"]
     .as_str()
     .map(|s| s.to_string())
     .ok_or_else(|| {
         Error::from_kind(crate::errors::ErrorKind::Msg(format!(
-            "Unexpected Anthropic response format: {}",
-            resp_body
+            "Unexpected Anthropic response format (truncated): {}",
+            resp_snippet
```
src/absadt/llm_learn.rs
```rust
let resp = ureq::post(url)
    .set("x-api-key", &self.api_key)
    .set("anthropic-version", "2023-06-01")
    .set("Content-Type", "application/json")
    .send_string(&body.to_string())
    .map_err(|e| {
        Error::from_kind(crate::errors::ErrorKind::Msg(format!(
            "Anthropic API request failed: {}",
            e
        )))
    })?;
```
Similar to OpenAI, the Anthropic HTTP request also lacks a timeout. This could cause the hoice process to hang if the Anthropic API becomes unresponsive. Consider adding a timeout to ensure graceful degradation and fallback to template-based learning.
Address review findings for robustness of untrusted LLM output handling:

- Wrap parser calls in catch_unwind to prevent panics from malformed LLM responses from crashing the process (the catamorphism parser has unwrap!/assert! paths that are fine for file input but not LLM output)
- Add structural validation of encoder proposals: verify constructor coverage, parameter counts, and result expression counts before the SMT check, to prevent panics in downstream encoder use (enc.rs:223)
- Fall back to template learning when no API key is set, instead of propagating the error
- Add HTTP timeouts (30s connect, 120s read/write) via ureq::AgentBuilder
- Improve the multi-ADT prompt: list all required datatypes and constructors explicitly; update the format description to show a multi-datatype example
- Log CHC dump failures instead of silently ignoring them
- Fix OPENAI_BASE_URL handling: strip a trailing /v1 to avoid doubling the path
- Reduce prompt bloat: truncate the previous attempt in CEX feedback, since the full response is already in the conversation history
- Remove the dead _last_response variable
- Add tests for panic catching and URL normalization

Not addressed (by design):

- Fixing unwrap/assert inside catamorphism_parser.rs: the existing code is used by the file-based path, and catch_unwind is safer than risking regressions
- extract_sexp brittleness: an acceptable heuristic, and the retry loop covers it
- Prompt size truncation: over-engineering for current usage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
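The catch_unwind shielding described above can be sketched as follows. This is a stdlib-only illustration, not the code from llm_learn.rs: `parse_untrusted` stands in for parse_catamorphism_str, whose unwrap!/assert! paths may panic on malformed input.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

// Stand-in for the real parser: panics on input that fails an assert!,
// just as the catamorphism parser can on malformed LLM output.
fn parse_untrusted(input: &str) -> usize {
    assert!(input.starts_with('('), "expected an s-expression");
    input.len()
}

// Convert a potential panic into a recoverable error, so a bad LLM
// response triggers the retry loop instead of aborting the process.
fn parse_llm_response(input: &str) -> Result<usize, String> {
    catch_unwind(AssertUnwindSafe(|| parse_untrusted(input)))
        .map_err(|_| "parser panicked on malformed LLM output".to_string())
}
```

The design note in the commit message is the key point: wrapping the call site is strictly safer than rewriting the parser's internal unwrap!/assert! paths, which the file-based path still relies on.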
The test duplicated inline string logic rather than exercising the actual OpenAiProvider::new() code path, so it proved nothing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
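The OPENAI_BASE_URL normalization that OpenAiProvider::new() performs can be sketched like this (helper name and exact trimming rules are illustrative; the PR only states that a trailing /v1 is stripped to avoid a doubled path):

```rust
// Hypothetical sketch: normalize a user-supplied base URL so the provider
// can always append "/v1/..." without producing ".../v1/v1/...".
fn normalize_base_url(base: &str) -> String {
    // Drop any trailing slashes first, then a trailing "/v1" segment.
    let base = base.trim_end_matches('/');
    base.strip_suffix("/v1").unwrap_or(base).to_string()
}
```

A test exercising this through the provider constructor, rather than re-deriving the string logic inline, is what the fix above restores.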
Each run creates /tmp/catalia-llm-<pid>/ and saves per-attempt files:

- attempt-N-input.txt — full conversation sent to the LLM
- attempt-N-output.txt — raw LLM response
- attempt-N-error.txt — error message (on request failure)

The log directory path is printed at startup via log_info.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
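The per-attempt logging described above amounts to a small filesystem helper. A stdlib-only sketch (field and method names are illustrative; only the directory and file naming scheme come from the commit message):

```rust
use std::fs;
use std::path::{Path, PathBuf};

// Hypothetical logger writing one file per LLM attempt, mirroring the
// /tmp/catalia-llm-<pid>/attempt-N-{input,output,error}.txt layout.
struct QueryLogger {
    dir: PathBuf,
}

impl QueryLogger {
    fn new(base: &Path) -> std::io::Result<Self> {
        let dir = base.join(format!("catalia-llm-{}", std::process::id()));
        fs::create_dir_all(&dir)?;
        Ok(QueryLogger { dir })
    }

    /// `kind` is "input", "output", or "error"; returns the file path written.
    fn log(&self, attempt: usize, kind: &str, text: &str) -> std::io::Result<PathBuf> {
        let path = self.dir.join(format!("attempt-{}-{}.txt", attempt, kind));
        fs::write(&path, text)?;
        Ok(path)
    }
}
```

Keeping the raw conversation and response on disk makes failed attempts replayable when debugging prompt-format issues like the parenthesization bug fixed later in this thread.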
- Replace endpoint /v1/chat/completions with /v1/responses
- Replace request field "messages" with "input" (same role/content shape)
- Update response parsing: choices[0].message.content -> output[0].content[0].text
- Update default model from gpt-5-nano to gpt-5-mini

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Responses API rejects role:system inside the input array. System messages must be passed as a top-level "instructions" string instead, which is the correct format for /v1/responses. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The parser expects each constructor body wrapped in an extra pair of
parentheses: ("Name" ((params...) expr1 expr2 ...)), but the prompt
was showing the flat incorrect form ("Name" (params...) expr1 expr2 ...).
Updated the format description, added an explicit IMPORTANT note with
ASCII annotation, and fixed both examples accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously logs were always written to a temp directory. Now logging is opt-in: set --llm-log-dir <path> (or HOICE_LLM_LOG_DIR env var) to enable it. When no directory is configured, QueryLogger is a no-op. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Six new tests covering:

- A 1-param length encoding parsed from a bare s-expression
- A 2-param (length + sum) encoding, verifying n_params and term counts
- A valid response wrapped in LLM-style markdown fences
- An encoding using ite expressions
- An error on a mismatched datatype name
- An error when the response contains no s-expression

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The prompt below should at least empower a proprietary thinking model such as GPT 5.4 or Claude Sonnet 4.6 without any further context. Feel free to tweak further.
Introduce an alternative learning path that queries an LLM (OpenAI or Anthropic) to synthesize catamorphism encoders, enabled via --llm-learn=on or HOICE_USE_LLM_LEARN=1. The LLM receives the CHC problem, current encoders, and spurious CEX, then proposes encoders in s-expression format. Failed proposals are fed back as conversation context for retry (up to 5 attempts), with automatic fallback to template-based learning.