
Add a learner using LLMs #22

Open
moratorium08 wants to merge 16 commits into main from feat/llm-learn

Conversation

@moratorium08
Collaborator

Introduce an alternative learning path that queries an LLM (OpenAI or Anthropic) to synthesize catamorphism encoders, enabled via --llm-learn=on or HOICE_USE_LLM_LEARN=1. The LLM receives the CHC problem, the current encoders, and the spurious counterexample (CEX), then proposes encoders in s-expression format. Failed proposals are fed back as conversation context for retry (up to 5 attempts), with automatic fallback to template-based learning.

  • Add ureq, serde, serde_json dependencies
  • Add use_llm_learn config flag (env var + CLI)
  • Refactor catamorphism_parser: extract parse_catamorphism_str
  • Create llm_learn.rs with provider abstraction, prompt construction, response parsing, SMT validation, and retry loop
  • Wire up conditional dispatch in absadt.rs handle_cex
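The conditional dispatch in the last bullet might look roughly like the sketch below. This is a hedged simplification: `Config`, the string-based signatures, and the placeholder bodies are stand-ins; the real `handle_cex` in absadt.rs works on hoice's instance, encoder, and counterexample types.

```rust
// Simplified stand-in for hoice's configuration.
struct Config {
    use_llm_learn: bool,
}

// Placeholder for llm_learn::work: query the LLM, validate, retry up to 5 times.
fn llm_learn_work(problem: &str) -> Result<String, String> {
    Err(format!("no API key available for {}", problem))
}

// Placeholder for the existing template-based learner.
fn template_learn(problem: &str) -> String {
    format!("template encoder for {}", problem)
}

// Sketch of the dispatch: try the LLM path when enabled, otherwise (or on
// failure) fall back to template-based learning.
fn handle_cex(conf: &Config, problem: &str) -> String {
    if conf.use_llm_learn {
        if let Ok(enc) = llm_learn_work(problem) {
            return enc;
        }
        // All LLM attempts failed: fall through to templates.
    }
    template_learn(problem)
}
```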

Copilot AI review requested due to automatic review settings February 26, 2026 17:03
The rebase onto main merged the idx_arg and use_llm_learn clap args
together, losing the closing paren and .arg( separator between them.
Also suppress unused variable warning for _last_response in llm_learn.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@moratorium08 changed the title from "Feat/llm learn" to "Add a learner using LLMs" on Feb 26, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces an experimental LLM-based learning path for synthesizing catamorphism encoders in hoice, a CHC (Constrained Horn Clause) solver. Instead of relying solely on template-based synthesis, users can now enable LLM-assisted encoder learning via the --llm-learn=on CLI flag or HOICE_USE_LLM_LEARN=1 environment variable. The implementation abstracts over OpenAI and Anthropic providers, sends the CHC problem and spurious counterexamples to the LLM, parses s-expression responses, validates them via SMT, and retries up to 5 times before falling back to the traditional template-based approach.

Changes:

  • Add configuration flag and CLI argument for enabling LLM-based learning
  • Introduce llm_learn.rs module with provider abstraction, prompt construction, response parsing, SMT validation, and retry logic
  • Refactor catamorphism parser to extract parse_catamorphism_str for reuse in LLM response parsing
  • Add dependencies: ureq (HTTP client), serde, serde_json (JSON handling)

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 14 comments.

Show a summary per file

| File | Description |
| --- | --- |
| src/common/config.rs | Adds use_llm_learn configuration field with environment variable and CLI flag support |
| src/absadt/llm_learn.rs | New module implementing LLM provider abstraction, prompt engineering, response parsing, and validation logic |
| src/absadt/catamorphism_parser.rs | Refactors parse_catamorphism to extract parse_catamorphism_str for string-based parsing |
| src/absadt.rs | Adds conditional dispatch to llm_learn::work when use_llm_learn flag is enabled |
| Cargo.toml | Adds ureq, serde, and serde_json dependencies for HTTP and JSON handling |
| Cargo.lock | Updates dependency graph with new transitive dependencies |


Comment on lines +435 to +448
```rust
let mut conversation = vec![
    Message {
        role: "system".into(),
        content: build_system_prompt(),
    },
    Message {
        role: "user".into(),
        content: format!(
            "{}\n\n{}",
            build_chc_context(instance, encs),
            build_cex_feedback(cex, None)
        ),
    },
];
```
Copilot AI Feb 26, 2026

The conversation history accumulates all LLM responses and user feedback messages across retry attempts. For each retry, the entire conversation history is sent to the LLM API, which could become expensive both in terms of API costs and latency. Consider implementing a sliding window or summarization strategy for the conversation history, especially for long-running retry loops, to reduce token usage and improve performance.
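One way to bound that cost is a sliding window over the retry conversation. The sketch below is a hedged illustration: it assumes the first two messages hold the system prompt and the initial CHC problem (as in the excerpt above) and keeps only the last few retry exchanges beyond that; `Message` mirrors the struct in the PR.

```rust
#[derive(Clone)]
struct Message {
    role: String,
    content: String,
}

// Keep the system prompt and initial problem statement (first two messages),
// plus only the most recent `window` messages of retry feedback.
fn trim_conversation(conversation: &[Message], window: usize) -> Vec<Message> {
    let head = conversation.len().min(2);
    let tail_start = conversation.len().saturating_sub(window).max(head);
    let mut out = conversation[..head].to_vec();
    out.extend_from_slice(&conversation[tail_start..]);
    out
}
```

A summarization strategy would replace the dropped middle with a single message recapping earlier failed attempts; the window variant is simpler and keeps token usage linear in the window size rather than in the number of retries.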

Comment on lines +73 to +82
```rust
let resp = ureq::post(&url)
    .set("Authorization", &format!("Bearer {}", self.api_key))
    .set("Content-Type", "application/json")
    .send_string(&body.to_string())
    .map_err(|e| {
        Error::from_kind(crate::errors::ErrorKind::Msg(format!(
            "OpenAI API request failed: {}",
            e
        )))
    })?;
```
Copilot AI Feb 26, 2026

The HTTP requests to LLM providers do not specify any request timeout. If the LLM API becomes unresponsive, this could cause the entire hoice process to hang indefinitely. Consider adding a timeout to the ureq HTTP requests to ensure the system can recover from unresponsive API calls.

```rust
    },
];

let mut _last_response: Option<String> = None;
```
Copilot AI Feb 26, 2026

The variable last_response is declared and assigned values in lines 492, 519, and 535, but it is never read or used anywhere in the function. This appears to be dead code. Consider removing this variable or, if it was intended for debugging or logging purposes, add the appropriate usage.

```rust
"model": self.model,
"max_tokens": 4096,
"system": system_text,
"messages": msgs,
```
Copilot AI Feb 26, 2026

The OpenAI provider sets a temperature of 0.7 in its request body, but the Anthropic provider does not include a temperature parameter. For consistency and reproducibility across different LLM providers, consider either adding a temperature parameter to the Anthropic request (Anthropic's default is 1.0) or making the temperature configurable via environment variable for both providers.

Suggested change:

```diff
-            "messages": msgs,
+            "messages": msgs,
+            "temperature": 0.7,
```
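The "configurable via environment variable" option could look like the sketch below. This is an illustration, not the PR's code: `HOICE_LLM_TEMPERATURE` is a hypothetical variable name, and the raw value would come from `std::env::var("HOICE_LLM_TEMPERATURE").ok()` at the call site.

```rust
// Resolve the sampling temperature shared by both providers.
// `raw` is the environment variable's value, if set.
fn llm_temperature(raw: Option<&str>) -> f64 {
    raw.and_then(|s| s.parse::<f64>().ok())
        // Fall back to the 0.7 the OpenAI provider currently hard-codes.
        .unwrap_or(0.7)
}
```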

Comment on lines +460 to +463
```rust
// Feed error back and retry
conversation.push(Message {
    role: "assistant".into(),
    content: format!("(error: {})", e),
```
Copilot AI Feb 26, 2026

When an LLM API request fails, the error is fed back to the conversation as "(error: {})" with the full error message. This could expose sensitive information such as API endpoint details, authentication errors, or internal system details in the conversation history which is then sent back to the LLM provider. Consider sanitizing or redacting sensitive error information before including it in the conversation feedback.

Suggested change:

```diff
-            // Feed error back and retry
-            conversation.push(Message {
-                role: "assistant".into(),
-                content: format!("(error: {})", e),
+            // Feed error back and retry (without exposing internal error details)
+            conversation.push(Message {
+                role: "assistant".into(),
+                content: "(error: API request failed)".into(),
```
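A middle ground between the full error and a fixed string is to map the raw error onto a small set of generic categories before it enters the conversation. The function below is a hedged sketch (the matching heuristics and category names are invented for illustration), but it keeps URLs, status bodies, and key material out of the feedback while still telling the LLM roughly what went wrong.

```rust
// Reduce a transport error to a coarse, non-sensitive category before
// feeding it back into the retry conversation.
fn sanitize_api_error(raw: &str) -> String {
    let category = if raw.contains("timed out") || raw.contains("timeout") {
        "request timed out"
    } else if raw.contains("401") || raw.contains("403") {
        "authentication failed"
    } else {
        // Anything unrecognized becomes a generic message.
        "API request failed"
    };
    format!("(error: {})", category)
}
```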

Comment on lines +176 to +182
```rust
resp_body["content"][0]["text"]
    .as_str()
    .map(|s| s.to_string())
    .ok_or_else(|| {
        Error::from_kind(crate::errors::ErrorKind::Msg(format!(
            "Unexpected Anthropic response format: {}",
            resp_body
```
Copilot AI Feb 26, 2026

Similar to the OpenAI provider, the full Anthropic response body is included in error messages when the response format is unexpected. This could log excessive data or expose internal API structures. Consider truncating or limiting the response body in error messages.

Suggested change:

```diff
-        resp_body["content"][0]["text"]
-            .as_str()
-            .map(|s| s.to_string())
-            .ok_or_else(|| {
-                Error::from_kind(crate::errors::ErrorKind::Msg(format!(
-                    "Unexpected Anthropic response format: {}",
-                    resp_body
+        // Create a truncated snippet of the response body for error reporting to avoid
+        // logging the full Anthropic response, which may be large or contain sensitive data.
+        let resp_snippet = serde_json::to_string(&resp_body)
+            .unwrap_or_else(|_| "<unserializable Anthropic response>".to_string());
+        let resp_snippet: String = resp_snippet.chars().take(1000).collect();
+        resp_body["content"][0]["text"]
+            .as_str()
+            .map(|s| s.to_string())
+            .ok_or_else(|| {
+                Error::from_kind(crate::errors::ErrorKind::Msg(format!(
+                    "Unexpected Anthropic response format (truncated): {}",
+                    resp_snippet
```

Comment on lines +157 to +167
```rust
let resp = ureq::post(url)
    .set("x-api-key", &self.api_key)
    .set("anthropic-version", "2023-06-01")
    .set("Content-Type", "application/json")
    .send_string(&body.to_string())
    .map_err(|e| {
        Error::from_kind(crate::errors::ErrorKind::Msg(format!(
            "Anthropic API request failed: {}",
            e
        )))
    })?;
```
Copilot AI Feb 26, 2026

Similar to OpenAI, the Anthropic HTTP request also lacks a timeout. This could cause the hoice process to hang if the Anthropic API becomes unresponsive. Consider adding a timeout to ensure graceful degradation and fallback to template-based learning.

claude and others added 14 commits February 26, 2026 21:24
Address review findings for robustness of untrusted LLM output handling:

- Wrap parser calls in catch_unwind to prevent panics from malformed
  LLM responses from crashing the process (the catamorphism parser has
  unwrap!/assert! paths that are fine for file input but not LLM output)
- Add structural validation of encoder proposals: verify constructor
  coverage, parameter counts, and result expression counts before SMT
  check to prevent panics in downstream encoder use (enc.rs:223)
- Fallback to template learning when no API key is set instead of
  propagating the error
- Add HTTP timeouts (30s connect, 120s read/write) via ureq::AgentBuilder
- Improve multi-ADT prompt: list all required datatypes and constructors
  explicitly, update format description to show multi-datatype example
- Log CHC dump failures instead of silently ignoring them
- Fix OPENAI_BASE_URL handling: strip trailing /v1 to avoid double path
- Reduce prompt bloat: truncate previous attempt in CEX feedback since
  full response is already in conversation history
- Remove dead _last_response variable
- Add tests for panic catching and URL normalization

Not addressed (by design):
- Fixing unwrap/assert inside catamorphism_parser.rs: existing code used
  by file-based path, catch_unwind is safer than risking regressions
- extract_sexp brittleness: acceptable heuristic, retry loop covers it
- Prompt size truncation: over-engineering for current usage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test duplicated inline string logic rather than exercising the
actual OpenAiProvider::new() code path, so it proved nothing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each run creates /tmp/catalia-llm-<pid>/ and saves per-attempt files:
  attempt-N-input.txt  — full conversation sent to the LLM
  attempt-N-output.txt — raw LLM response
  attempt-N-error.txt  — error message (on request failure)

The log directory path is printed at startup via log_info.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace endpoint /v1/chat/completions with /v1/responses
- Replace request field "messages" with "input" (same role/content shape)
- Update response parsing: choices[0].message.content -> output[0].content[0].text
- Update default model from gpt-5-nano to gpt-5-mini

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Responses API rejects role:system inside the input array.
System messages must be passed as a top-level "instructions" string
instead, which is the correct format for /v1/responses.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The parser expects each constructor body wrapped in an extra pair of
parentheses: ("Name" ((params...) expr1 expr2 ...)), but the prompt
was showing the flat incorrect form ("Name" (params...) expr1 expr2 ...).
Updated the format description, added an explicit IMPORTANT note with
ASCII annotation, and fixed both examples accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously logs were always written to a temp directory. Now logging is
opt-in: set --llm-log-dir <path> (or HOICE_LLM_LOG_DIR env var) to
enable it. When no directory is configured, QueryLogger is a no-op.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
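The opt-in, no-op-when-unconfigured shape described in this commit might be sketched as follows. This is an illustrative simplification: the real QueryLogger also writes the files, while here only the path computation is shown, following the attempt-N-*.txt naming scheme from the earlier commit.

```rust
use std::path::PathBuf;

// Logger that is a no-op unless a log directory was configured
// (via --llm-log-dir or HOICE_LLM_LOG_DIR).
struct QueryLogger {
    dir: Option<PathBuf>,
}

impl QueryLogger {
    fn from_config(dir: Option<String>) -> Self {
        QueryLogger { dir: dir.map(PathBuf::from) }
    }

    /// Path a record would be written to, or None when logging is disabled.
    fn attempt_path(&self, attempt: usize, kind: &str) -> Option<PathBuf> {
        self.dir
            .as_ref()
            .map(|d| d.join(format!("attempt-{}-{}.txt", attempt, kind)))
    }
}
```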
Six new tests covering:
- 1-param length encoding parsed from a bare s-expression
- 2-param (length + sum) encoding, verifying n_params and term counts
- Valid response wrapped in LLM-style markdown fences
- Encoding using ite expressions
- Error on mismatched datatype name
- Error when response contains no s-expression

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
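The "markdown fences" case those tests cover amounts to a stripping step like the sketch below (a hedged reconstruction, not the PR's actual parser): if the LLM wrapped its answer in triple-backtick fences, return only the fenced body; otherwise return the trimmed input unchanged.

```rust
// Strip an optional ```...``` markdown fence (with an optional language tag
// on the opening line) from an LLM response.
fn strip_markdown_fences(resp: &str) -> String {
    let fence = "`".repeat(3); // the triple-backtick marker
    let f = fence.as_str();
    let trimmed = resp.trim();
    if let Some(rest) = trimmed.strip_prefix(f) {
        // Drop the rest of the opening fence line (e.g. a "lisp" tag).
        let body = rest.split_once('\n').map(|(_, b)| b).unwrap_or("");
        if let Some((inner, _)) = body.rsplit_once(f) {
            return inner.trim().to_string();
        }
    }
    trimmed.to_string()
}
```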
@minchaowu

minchaowu commented Mar 15, 2026

The prompt below should be enough on its own to get a proprietary thinking model such as GPT 5.4 or Claude Sonnet 4.6 started, without any further context. Feel free to tweak it further.

You are solving a **Constrained Horn Clause (CHC)** problem over **algebraic data types (ADTs)** and integer arithmetic. I want a **catamorphism-based solution** in the spirit of Katsura et al., *Automated Catamorphism Synthesis for Solving Constrained Horn Clauses over Algebraic Data Types*.

Your task is to analyze the given SMT-LIB HORN instance and do the following.

## What to produce

1. Decide whether the CHCs are **SAT** or **UNSAT**.
2. If **SAT**, produce:
   - an **intuitive / intended model** for the predicates if possible,
   - a suitable **catamorphism** from each relevant ADT to integers or tuples of integers,
   - the corresponding **range / admissibility predicate** for the image of the catamorphism,
   - an **abstract integer-domain model** for the predicates after applying the catamorphism,
   - the induced **original ADT-domain model** obtained by lifting back through the catamorphism.
3. If **UNSAT**, explain the contradiction in the **original CHCs**, not only in the abstraction.

---

## High-level overview of the paper's approach / pipeline

The core problem is that CHCs over ADTs often require models that are naturally stated using inductive functions such as list length, list sum, alternating sum, or numeric value of Peano naturals. Standard CHC solvers are often much better at solving CHCs over **integers** than over ADTs directly.

The paper's key idea is therefore:

1. **Choose a catamorphism** `C` from the ADT to integers or tuples of integers.
   - A catamorphism is a fold-like summary map defined by one structure map per constructor.
   - Example patterns:
     - `Nat`: `Z -> 0`, `S(x) -> x + 1`
     - `List`: `nil -> 0`, `cons(x,l) -> 1 + l` (length)
     - `List`: `nil -> 0`, `cons(x,l) -> x + l` (sum)
     - `List`: `nil -> 0`, `cons(x,l) -> x - l` (alternating sum)
     - tuple-valued folds when one scalar is insufficient.

2. **Augment the original CHCs with an admissibility predicate** `P_delta` for each ADT sort.
   - This predicate represents “being in the image / being a valid abstractable value”.
   - This matters because abstraction without range restrictions is often too coarse and can introduce spurious counterexamples.
   - In the abstract system, every universally quantified ADT variable is replaced by an integer variable together with a condition such as `P_delta(x)`.

3. **Abstract the CHCs into integer CHCs** by replacing each constructor application with the corresponding structure map of the catamorphism.
   - ADT equality is abstracted soundly by equality of catamorphism images.
   - The result is a CHC system over integer arithmetic, often much easier to solve.

4. **Solve the abstract CHCs**.
   - If the abstract system is SAT, then by soundness the original CHCs are SAT.
   - A model for the original predicates can be obtained by composing the abstract model with the catamorphism.

5. **If the abstract system is UNSAT**, check whether the UNSAT proof corresponds to a real contradiction in the original CHCs.
   - Extract a candidate counterexample / proof obligation from the abstract UNSAT derivation.
   - If that counterexample is genuinely satisfiable in the original ADT world, conclude **UNSAT**.
   - Otherwise it is **spurious**, meaning the current catamorphism was too weak.

6. **Refine the catamorphism**.
   - Search for a better catamorphism using templates, usually linear or affine folds first.
   - The paper uses a **counterexample-guided template synthesis** loop:
     - abstract,
     - solve,
     - extract spurious counterexample if needed,
     - strengthen the catamorphism so the same spurious proof becomes impossible,
     - repeat.

7. **Prefer simple catamorphisms first**, then richer tuple-valued ones if needed.
   - The paper uses a sequence of templates of increasing expressiveness.
   - In practice, many examples are solved by a small scalar or 2D/3D fold.

In short, the pipeline is:

**guess / synthesize fold -> add range predicate -> abstract to integer CHCs -> solve -> if spurious UNSAT, refine fold -> otherwise lift model back**.
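Step 1's "fold-like summary map defined by one structure map per constructor" can be made concrete with a small sketch in Rust (illustrative only; `length`, `sum`, and `alt_sum` correspond to the `List` patterns listed above):

```rust
// A list ADT and a generic catamorphism over it: `nil_val` is the structure
// map for nil, `cons` the structure map for cons, applied to the head and
// the already-folded tail.
enum List {
    Nil,
    Cons(i64, Box<List>),
}

fn cata(l: &List, nil_val: i64, cons: &dyn Fn(i64, i64) -> i64) -> i64 {
    match l {
        List::Nil => nil_val,
        List::Cons(x, tail) => cons(*x, cata(tail, nil_val, cons)),
    }
}

// The three scalar folds from the example patterns above.
fn length(l: &List) -> i64 { cata(l, 0, &|_x, t| 1 + t) }
fn sum(l: &List) -> i64 { cata(l, 0, &|x, t| x + t) }
fn alt_sum(l: &List) -> i64 { cata(l, 0, &|x, t| x - t) }
```

A tuple-valued catamorphism is the same construction with the accumulator widened to a tuple, one component per scalar measure.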

---

## Important requirement: output the catamorphism in Catalia-readable format

When you present the catamorphism, do **not** only describe it informally. Also output it in the **Catalia-style readable S-expression format** below.

The format should look like this:

- The whole catamorphism is a list of ADT-sort entries.
- Each ADT-sort entry has the form:
  - `( "SortName" ...constructor-entries... )`
- Each constructor entry has the form:
  - `( "CtorName" ( (arg1 arg2 ...) body1 body2 ... ) )`
- The argument list contains:
  - first, ordinary non-recursive constructor arguments,
  - then, for recursive ADT arguments, the already-catamorphically-mapped components.
- A scalar catamorphism returns one body expression.
- A tuple-valued catamorphism returns multiple body expressions, one per component.
- Add short comments `;; ...` to explain the meaning of tuple components when helpful.

Use exactly this style when possible.

### Example 1

```lisp
(
  ;; Bool_265 |-> (0/1)
  ( "Bool_265"
    ( "false_265"
      ( ()
        0
      )
    )
    ( "true_265"
      ( ()
        1
      )
    )
  )

  ;; list_194 |-> (ok, first, emp)
  ;; nil_220 |-> (1, 0, 1)
  ;; cons_194(x, t) : x |-> x,  t |-> (ok, first, emp)
  ;;                |-> ( ok * (ite (= emp 1) 1 (ite (<= x first) 1 0)), x, 0 )
  ( "list_194"
    ( "cons_194"
      ( (x ok first emp)
        (* ok (ite (= emp 1) 1 (ite (<= x first) 1 0)))
        x
        0
      )
    )
    ( "nil_220"
      ( ()
        1
        0
        1
      )
    )
  )

  ;; pair_82 |-> (b, ok, first, emp)
  ;; pair_83(b,l): b |-> b, l |-> (ok,first,emp)
  ( "pair_82"
    ( "pair_83"
      ( (b ok first emp)
        b
        ok
        first
        emp
      )
    )
  )
)
```

### Example 2

```lisp
(
  ( "Nat_0"
    ( "S_0"
      ( (x)
        (+ x 1)
      )
    )
    ( "Z_0"
      ( ()
        0
      )
    )
  )

  ;; list_0 |-> int (length)
  ( "list_0"
    ( "cons_0"
      ( (x y)
        (+ y 1)
      )
    )
    ( "nil_0"
      ( ()
        0
      )
    )
  )
)
```

When solving the actual problem, after your normal mathematical explanation, include a **Catalia-style catamorphism block** in exactly this spirit.

---

## Key ideas you should actively use while solving

Follow the methodology below, not just generic CHC reasoning.

### A. First infer the semantic role of each predicate

Try to understand what each predicate is tracking:
- arithmetic relation between arguments,
- structural size,
- length of a list,
- sum of elements,
- alternating sum,
- parity,
- relation between two accumulators,
- generated family of ADT values,
- safety property in the final `false` clause.

Many CHC predicates correspond to a fold-derived invariant.

### B. Prefer simple folds first

Start with simple, interpretable catamorphisms:
- natural numbers -> numeric value / size,
- lists -> length,
- lists -> sum,
- lists -> alternating sum,
- tuples such as `(length, sum)`, `(sum_even_minus_odd, length)`, etc.

If one scalar summary is not enough, try a **tuple-valued catamorphism**.

### C. Use linear / affine templates first

A very common successful pattern is a linear catamorphism:
- `C(nil) = d`
- `C(cons(x,l)) = a*C(l) + b*x + c`

or tuple combinations of these.

For naturals:
- `C(Z) = a`
- `C(S(n)) = b*C(n) + c`

Try low-complexity coefficients first.
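The linear-template search can be sketched as a brute-force enumeration of small coefficients, checked against observed (list, value) samples. This is a toy illustration of the idea, not Catalia's actual synthesis loop; the coefficient range and sample-based check are simplifications.

```rust
// Evaluate the linear template C(nil) = d, C(cons(x, l)) = a*C(l) + b*x + c.
fn eval_template(xs: &[i64], (a, b, c, d): (i64, i64, i64, i64)) -> i64 {
    // Fold from the tail outward, as a catamorphism does.
    xs.iter().rev().fold(d, |acc, &x| a * acc + b * x + c)
}

// Return the first low-complexity coefficient tuple consistent with all samples.
fn find_template(samples: &[(&[i64], i64)]) -> Option<(i64, i64, i64, i64)> {
    let range = -1..=1; // low-complexity coefficients first
    for a in range.clone() {
        for b in range.clone() {
            for c in range.clone() {
                for d in range.clone() {
                    let t = (a, b, c, d);
                    if samples.iter().all(|(xs, v)| eval_template(xs, t) == *v) {
                        return Some(t);
                    }
                }
            }
        }
    }
    None
}
```

For length samples such as `[] -> 0`, `[5] -> 1`, `[5, 7] -> 2`, this finds the length fold `(a, b, c, d) = (1, 0, 1, 0)`, i.e. `C(nil) = 0`, `C(cons(x, l)) = C(l) + 1`.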

### D. Always think about the range predicate

Do **not** forget the image restriction / admissibility predicate.

Without it, the abstraction may be too coarse and may yield false contradictions or incorrect arithmetic states that no concrete ADT term can realize.

When the image is all integers, the range predicate may simplify to `true`; when the image is restricted, spell that out explicitly.

### E. Clause-by-clause validation matters

Once you propose a catamorphism and abstract model, check **every Horn clause** carefully.

For SAT cases, verify that each abstract clause is satisfied by your abstract predicate definitions.

For UNSAT cases, identify the concrete contradiction in the original clauses.

### F. Lift the abstract model back explicitly

If the abstract model defines e.g.
- `P_abs(u,v,w) := u + v = w`

and the ADT arguments are abstracted by `C`, then the original model should be written as
- `P(x,y,z) := P_abs(C(x), C(y), C(z))`

or equivalently with `C(x), C(y), ...` substituted directly.

---

## Heuristics that often work on these tasks

Use these aggressively.

- For Peano naturals, the right catamorphism is often simply the numeric value / size.
- For list-processing CHCs, the hidden invariant is often one of:
  - length,
  - sum,
  - alternating sum,
  - parity,
  - a pair/triple of such measures.
- Generator predicates often characterize a family of ADTs with a very simple abstract image.
- Safety clauses often directly reveal the needed invariant:
  - `false <- ...` frequently says exactly what arithmetic relation must hold.
- Recurrences like
  - `P(cons(x,l), x+n, m) <- P(l, m, n)`
  often suggest alternating-sum or accumulator-swap structure.
- If the abstraction becomes UNSAT too early, suspect that the catamorphism is too weak rather than the original problem being UNSAT.
- Equality on ADTs becomes equality of summaries only as a **sound abstraction**; do not silently treat it as full equivalence.

---

## More expectations

When solving the instance, follow this style:

1. First try to give the **best / intuitive / intended model**.
2. Then try to produce a **non-recursive / HoIce-friendly** formulation by over-approximation.
3. Prefer **simple CHC / catamorphism solutions** when possible.
4. If several catamorphisms are plausible, try the smallest one first, then refine only if needed.
5. Be explicit about the **range predicate** and the **lifting step**.
6. When possible, give formulas that can actually be checked by tools such as HoIce.
7. Reason carefully clause-by-clause; do not stop at vague intuition.

---

## Required output format

Use exactly this structure.

### 1. Verdict
State `SAT` or `UNSAT`.

### 2. High-level idea
Briefly explain the invariant and why a catamorphism is appropriate.

### 3. Chosen catamorphism
Define the catamorphism precisely for each constructor.
Also include a **Catalia-style catamorphism block**.
If tuple-valued, clearly name each component.

### 4. Range / admissibility predicate
Define the image predicate `P_delta`.
If it simplifies to `true`, say so and justify briefly.

### 5. Abstract model
Give explicit definitions of the predicates in the abstract integer domain.

### 6. Clause check
Check each Horn clause against the abstract model.

### 7. Lifted original model
Translate the abstract model back to the original ADT domain via the catamorphism.

### 8. If UNSAT
If the verdict is `UNSAT`, explain the contradiction in the original CHCs.

---

## Additional instructions

- Be mathematically explicit.
- Do not give only intuition; give actual formulas.
- If one catamorphism fails, refine it rather than giving up.
- If the intended model is recursive but the non-recursive one is easier to verify, provide both.
- Prefer concise but rigorous reasoning.

I will now give you the SMT-LIB HORN instance. Solve it using the above methodology.
