Skip to content

feat(jolt-eval): invariant and objective framework for AI-assisted development#1406

Open
moodlezoup wants to merge 87 commits intomainfrom
feat/invariants-and-objectives
Open

feat(jolt-eval): invariant and objective framework for AI-assisted development#1406
moodlezoup wants to merge 87 commits intomainfrom
feat/invariants-and-objectives

Conversation

@moodlezoup
Copy link
Copy Markdown
Collaborator

@moodlezoup moodlezoup commented Apr 4, 2026

Summary

  • jolt-eval crate: mechanically checkable invariants and objectives for the Jolt zkVM, designed for AI agent productivity and minimizing human verification surface
  • Invariant framework: #[invariant(Test, Fuzz, RedTeam)] macro synthesizes tests, fuzz targets, and AI red-team harnesses from a single invariant definition
  • Objective system: unified Objective trait for both static-analysis (LLOC, cognitive complexity, Halstead bugs) and Criterion-benchmarked performance measurements
  • ObjectiveFunction struct: const-constructible struct combining objectives into a scalar for the optimizer to minimize; registered via ObjectiveFunction::all() / by_name()
  • Type-safe measurement keys: OptimizationObjective enum replaces string-keyed HashMaps; three nested enums (StaticAnalysisObjective, PerformanceObjective, OptimizationObjective) with discriminant-based Hash/Eq
  • AI red-team binary (redteam): runs Claude in an isolated worktree to find invariant violations via structured JSON counterexamples
  • AI optimize binary (optimize): iterative optimization loop — agent proposes changes, diff is applied, objectives re-measured, invariants checked, accepted/reverted with git branch + per-iteration commits
  • CLI features: --list, --test (built-in e2e sort targets), --verbose (shows prompts/responses), mutual exclusivity via clap
  • E2e sort tests: naive_sort optimization target, candidate_sort red-team target with subtle bug (correct for ≤16 elements, skips last element for larger arrays), both MockAgent and #[ignore] real-agent tests

Test plan

  • cargo nextest run -p jolt-eval — 87 tests pass, 2 skipped (real-agent #[ignore] tests)
  • cargo clippy -p jolt-eval --all-targets — zero warnings
  • cargo run --release -p jolt-eval --bin optimize -- --list — lists 5 objective functions
  • cargo run --release -p jolt-eval --bin redteam -- --list — lists red-teamable invariants + e2e targets
  • cargo run --release -p jolt-eval --bin optimize -- --test --verbose — e2e sort optimization with real Claude agent
  • cargo run --release -p jolt-eval --bin redteam -- --test --verbose — e2e sort red-team with real Claude agent
  • cargo bench -p jolt-eval — Criterion benchmarks run

🤖 Generated with Claude Code

moodlezoup and others added 30 commits March 31, 2026 18:38
…ement

Introduces a framework for mechanically verifiable invariants and
measurable objectives, enabling automated testing, fuzzing, AI red-teaming,
and performance optimization of the Jolt zkVM.

Invariants (binary pass/fail):
- Soundness: mutated proofs must be rejected by the verifier
- Verifier completeness: honest proofs must be accepted
- Prover completeness: prover must not panic on valid inputs
- Determinism: same inputs produce byte-identical proofs
- Serialization roundtrip: serialize/deserialize preserves proof bytes
- ZK consistency: prove+verify succeeds in current compilation mode

Objectives (scalar measurements):
- Peak RSS, prover time, proof size, verifier time
- Guest cycle count, inline instruction lengths, wrapping cost
- auto_optimize harness for AI-driven optimization loops

Synthesis infrastructure:
- Test generation from seed corpus + random Arbitrary inputs
- Fuzz target code generation for libfuzzer_sys
- Red-team harness with git worktree isolation
- #[invariant(targets = [Test, Fuzz, RedTeam])] proc macro

CLI binaries: check-invariants, measure-objectives, redteam

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runs randomized fuzz inputs against all fuzzable invariants (or a
specific one via --invariant). Supports --iterations, --duration
time limits, --input-size, and --list to enumerate targets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tion

Replaces the placeholder invoke_agent callback with a real
implementation that:

1. Creates a detached git worktree for agent isolation
2. Builds a structured prompt with the invariant description,
   instructions, and any previous failed attempts
3. Invokes `claude -p` in the worktree with configurable model
   and max-turns
4. Captures the agent's analysis as the approach description
5. Cleans up the worktree after each iteration
6. Runs intensive fuzz checks (configurable via --num-fuzz)
   after each agent attempt to mechanically verify findings

Also adds num_fuzz_per_iteration to RedTeamConfig so auto_redteam
runs random fuzz inputs (not just seed corpus) after each attempt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the library-level auto_optimize function (which delegated
agent invocation to the caller) with a standalone `optimize` binary
that drives the full loop:

1. Measures baseline objectives
2. Creates an isolated git worktree per iteration
3. Invokes `claude -p` with a prompt describing the objectives,
   current best measurements, and past attempts
4. Captures the agent's diff and applies it to the real repo
5. Re-measures objectives and checks all invariants
6. Commits on improvement, reverts otherwise

Supports --objectives (comma-separated filter), --hint (extra context
for the agent), --iterations, --model, and --max-turns.

Keeps OptimizationAttempt as a public type in objective/mod.rs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t backends

Introduces `agent::AgentHarness` -- a trait abstracting over how an AI
coding agent is invoked.  Implementors control their own isolation
strategy (worktrees, containers, API calls, etc.), so a multi-agent
Codex harness, a remote API agent, or a parallel fan-out agent can all
plug into the same red-team and optimize loops.

Concrete changes:
- New `src/agent.rs` with `AgentHarness` trait, `AgentResponse`/
  `AgentError` types, `ClaudeCodeAgent` implementation, and worktree /
  diff-apply utilities.
- `auto_redteam` now takes `&dyn AgentHarness` + `repo_dir` instead
  of an `FnMut` callback.  Prompt construction stays inside the
  library.
- `RedTeamConfig` trimmed to orchestration-only fields (iterations,
  fuzz count); model/max-turns live on the agent.
- Both `bin/redteam.rs` and `bin/optimize.rs` construct a
  `ClaudeCodeAgent` from CLI args and pass it through the trait.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds `MockAgent` to the library with three constructors:
- `always_ok(text)` -- repeats a fixed success response
- `always_err(msg)` -- repeats a fixed error
- `from_responses(vec)` -- drains a queue, then repeats the last entry

MockAgent records every prompt it receives, accessible via
`recorded_prompts()`.

21 new tests covering:
- MockAgent basics: ok, err, prompt recording, repeat behavior,
  response queuing, diff passthrough
- auto_redteam integration: no-violation path, immediate violation,
  fuzz-triggered violation, agent error handling, zero iterations,
  mixed ok/err responses
- Prompt construction: invariant description inclusion, failed attempt
  accumulation across iterations
- Trait extensibility: object safety, Arc wrapping, custom
  `FirstSuccessHarness` multi-agent implementation plugging directly
  into auto_redteam

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extracts the optimization loop from bin/optimize.rs into a testable
library function `objective::optimize::auto_optimize` that takes:
- `&dyn AgentHarness` for the AI agent
- `&mut dyn OptimizeEnv` for side effects (measure, check invariants,
  apply diff, accept/reject)

`OptimizeEnv` trait lets tests swap in a `MockOptimizeEnv` with
controllable measurement sequences, invariant pass/fail schedules,
and recorded side effects.

14 new optimization tests covering:
- Accept on improvement (Minimize and Maximize directions)
- Reject on regression
- Reject when invariants fail despite improvement
- Multi-iteration progressive improvement with mixed accept/reject
- Early stop when agent produces no diff or errors
- Zero iterations
- Multiple objectives (any-improves-triggers-accept)
- Prompt content: measurements, hint, past attempts
- Diff application recording
- Invariant failure mid-sequence

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… single source of truth

Adds `BUILTIN_INVARIANT_NAMES` const and `names()`/`names_for_target()`
methods to `SynthesisRegistry`. All four CLI binaries now derive their
`--list` output and error messages from these instead of duplicating
name strings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…registries

Each invariant and objective module now calls `inventory::submit!` with
an entry struct (`InvariantEntry` / `ObjectiveEntry`) containing the
name, metadata, and a factory function.  At runtime, `inventory::iter`
discovers all registered entries automatically.

Key changes:
- `InvariantEntry` (name, targets, build fn) + `inventory::collect!`
  in invariant/mod.rs; each of the 6 invariant files submits one.
- `ObjectiveEntry` (name, direction, build fn) + `inventory::collect!`
  in objective/mod.rs; each of the 7 objective files submits one.
- `SynthesisRegistry::from_inventory(test_case, inputs)` replaces
  manual `register_invariants` functions in all binaries.
- `build_objectives_from_inventory(setup, inputs)` and `measure_dyn`
  replace manual `build_objectives` functions.
- `invariant_names()` replaces `BUILTIN_INVARIANT_NAMES` const -- names
  are derived from the inventory, not hardcoded.
- All 5 CLI binaries simplified: no more per-binary registration
  boilerplate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
auto_redteam now asks the agent to produce a concrete counterexample
as a JSON code block, deserializes it into the invariant's Input type
via serde, and runs Invariant::check on that specific input.

Key changes:

- Invariant::Input gains Serialize + DeserializeOwned bounds;
  Invariant::Setup gains 'static.  All 6 built-in Input types derive
  Serialize + Deserialize.

- DynInvariant gains three methods:
    input_json_example() — serializes a seed-corpus item for the prompt
    dyn_setup()          — type-erased setup (Box<dyn Any>, call once)
    check_json_input()   — deserialize JSON → Input, run check()

- CheckJsonResult enum distinguishes Pass / Violation / BadInput
  (deserialization failure) so parse errors are never mistaken for
  real counterexamples.

- auto_redteam builds a prompt that includes the JSON schema example
  and asks the agent to end with a ```json code block.  The loop
  extracts JSON via extract_json(), feeds it through check_json_input,
  and returns Violation with the input_json on success.

- RedTeamResult::Violation now carries input_json alongside approach
  and error.

- RedTeamConfig trimmed: num_fuzz_per_iteration removed (the loop
  checks the single agent-produced input, not random fuzz).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The full text belongs in FailedAttempt.approach so callers can decide
how to display it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Give ConstantObjective a configurable name and direction so multiple
instances don't collide in the HashMap.  test_measure_objectives now
passes two objectives and asserts both values appear in the result.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d red-team output

ClaudeCodeAgent::invoke_structured now passes --output-format json and
--json-schema to the Claude CLI, so the response is guaranteed to
conform to the schema.  The CLI envelope's `structured_output` field
is extracted directly — no fragile regex/extract_json parsing needed
for agents that support it.

Key changes:

- AgentHarness gains invoke_structured(repo_dir, prompt, schema) with
  a default that falls back to invoke().
- ClaudeCodeAgent overrides invoke_structured: passes the schema via
  --json-schema, parses the CLI JSON envelope, extracts
  structured_output (or result as fallback).
- Invariant::Input gains a JsonSchema bound (schemars 0.8); all 6
  Input types derive JsonSchema.
- DynInvariant gains input_json_schema() → serde_json::Value, backed
  by schemars::schema_for! in the blanket impl.
- auto_redteam builds an envelope schema
  {"analysis": string, "counterexample": <Input schema>} and calls
  invoke_structured.  Response is parsed as the envelope first; if
  that fails (agent doesn't support structured output), falls back
  to extract_json on free-form text.
- Prompt updated: tells the agent to respond with {analysis,
  counterexample} JSON instead of a code block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds fuzz/Cargo.toml and one fuzz target per built-in invariant, each
a single-line call to the new `fuzz_invariant(name, data)` library
function.  Adding a new fuzz target is one file:

    #![no_main]
    use libfuzzer_sys::fuzz_target;
    fuzz_target!(|data: &[u8]| {
        jolt_eval::invariant::synthesis::fuzz::fuzz_invariant("my_invariant", data);
    });

Run with:
    cd jolt-eval && JOLT_FUZZ_ELF=path/to/guest.elf cargo fuzz run soundness

fuzz_invariant() loads the ELF from JOLT_FUZZ_ELF, builds the
inventory-registered invariant and its setup once (LazyLock), then
interprets each fuzz input as raw JSON fed through check_json_input.

Also:
- Removes the _fuzz_check codegen from the #[invariant] proc macro
  (redundant now that fuzz targets call the library function directly)
- Removes generate_fuzz_target() string-template codegen
- Adds SynthesisRegistry::into_invariants() for ownership transfer
- Tightens Invariant::Setup to Send + Sync + 'static (needed for
  the LazyLock cache in fuzz_invariant)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tional

Introduces a `guests` module with a fixed catalog of known guest
programs (muldiv, fibonacci, sha2, sha3, collatz, alloc), each with
its memory config, max trace length, and default inputs extracted from
the #[jolt::provable(...)] attributes.  `GuestSpec::compile()` invokes
`host::Program::new(package).build()` to cross-compile the guest to
RISC-V automatically.

All 5 CLI binaries now accept `--guest <name>` (compiles automatically)
as the primary interface, with `--elf <path>` preserved as a fallback
for custom ELFs.  The shared `guests::resolve_test_case()` helper
handles both paths.

`InvariantEntry::build` and `ObjectiveEntry::build` now take
`Option<Arc<TestCase>>` / `Option<&SharedSetup>` respectively, with a
`needs_guest: bool` flag.  `SynthesisRegistry::from_inventory` and
`build_objectives_from_inventory` skip entries that need a guest when
`None` is passed.  This lays the groundwork for invariants/objectives
that operate on lower-level primitives (e.g. polynomial binding) and
don't require a compiled guest program.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ynomial fuzz targets

Remove the 6 generic proof-level invariants (soundness, completeness, determinism,
serialization_roundtrip, zk_consistency) that required a guest ELF and were slow to
exercise. Replace with 2 focused, guest-free invariants that fuzz
GruenSplitEqPolynomial::bind against DensePolynomial reference implementations for
both LowToHigh and HighToLow binding orders.

Also makes JOLT_FUZZ_ELF optional so guest-free invariants work without it, uses
DEFAULT_MAX_*_ADVICE_SIZE constants in guest configs, and adds the invariant/objective
design spec.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace inventory::submit!/collect!/iter with explicit arrays in
registered_invariants() and registered_objectives(). Each module's
entry is now listed directly in its parent mod.rs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The fuzz binary was a random-input runner largely duplicating
check-invariants --num-random. The libfuzzer scaffolding in
jolt-eval/fuzz/ is kept for cargo-fuzz usage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Invariants enum

Remove the DynInvariant trait, SynthesisRegistry, and InvariantEntry
in favor of a JoltInvariants enum with match-based method dispatch.

auto_redteam is now generic over I: Invariant, eliminating the need
for type erasure in the red-team path and keeping tests simple.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t macro

Replace the check-invariants binary with macro-generated #[test]
functions. The #[invariant] macro now unconditionally generates
seed_corpus and random_inputs tests, with iteration count
configurable via JOLT_RANDOM_ITERS env var (default 10).

Removed the unused redteam description generation from the macro.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ntation

Replace fuzz_invariant() function with a fuzz_invariant!() declarative
macro that generates the full fuzz_target! expansion with OnceLock-based
setup caching. Each fuzz target is now 3 lines.

Change SplitEqBindInput to store w/rs challenge vectors directly as
Vec<u128> with a manual Arbitrary impl, removing the seed-based
indirection. Remove now-dead type-erasure methods (dyn_setup,
check_json_input, check_arbitrary_input, CheckJsonResult) from
JoltInvariants.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ro-driven

Move Invariant::targets() to a separate InvariantTargets trait with a
default empty-set implementation. The #[invariant] macro now accepts
target lists (e.g. #[invariant(Test, Fuzz)]) and generates the
InvariantTargets impl and conditionally emits #[test] blocks only when
Test is listed. The fuzz_invariant! macro asserts that the invariant
includes SynthesisTarget::Fuzz at init time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- agent/mod.rs: AgentHarness trait, types, apply_diff, truncate
- agent/claude.rs: ClaudeCodeAgent, worktree helpers
- agent/mock.rs: MockAgent for testing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Don't pass --verbose with --output-format json — it turns stdout into
a JSONL event stream that can't be parsed as a single result object.
Also parse the JSON envelope even on non-zero exit codes, since Claude
may still produce structured_output on soft errors like max_turns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Also fix envelope parsing fallback: when extract_json finds the full
envelope JSON from markdown, try parsing it as an envelope first to
correctly extract the counterexample field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The soundness invariant lets a red-team agent produce a unified diff
against guest-sandbox/, which is applied in-place (with RAII revert),
compiled via jolt build, proved honestly, then verified against the
agent's dishonest (output, panic) claim. The verifier must reject.

Adds TestCase::verify_with_claims to override output bytes and panic
flag independently. Patch filtering drops hunks with '..' paths to
prevent sandbox escapes. Includes filter_patch unit tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change Invariant::check to return Result<(), CheckError> where
CheckError has Violation and InvalidInput variants. InvalidInput is
silently skipped in fuzz targets, macro-generated tests, and
run_checks; recorded as a failed attempt in redteam.

Also refactor guests.rs into guests/ directory (mod.rs + catalog.rs)
and add GuestMemoryConfig to SoundnessInput with validation limits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add guest-sandbox to workspace members so jolt build can find it.
Fix guest to return u32 (Vec<u8> lacks Serialize in no_std).
Wrap Program::build in catch_unwind to handle compilation panics.
Add tests for memory config validation, patch filtering, and the
full compile→prove→verify path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

Warning

This PR has more than 500 changed lines and does not include a spec.

Large features and architectural changes benefit from a spec-driven workflow.
See CONTRIBUTING.md for details on how to create a spec.

If this PR is a bug fix, refactor, or doesn't warrant a spec, feel free to ignore this message.

@github-actions github-actions bot added the no-spec PR has no spec file label Apr 4, 2026
moodlezoup and others added 9 commits April 3, 2026 23:12
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
False positive — libfuzzer-sys is used via the fuzz_invariant! macro
which expands to libfuzzer_sys::fuzz_target!.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each OptimizationObjective now declares diff_paths() specifying which
file paths an optimizer may modify. ObjectiveFunction::diff_scope()
computes the union of all input objectives' paths as DiffScope::Include.

- Static analysis + binding objectives: jolt-core/
- NaiveSortTime: jolt-eval/src/sort_targets.rs
- Remove diff_scope from OptimizeConfig (now derived automatically)
- Fix unwrap() → unwrap_or(f64::INFINITY) in const ObjectiveFunctions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add description() to StaticAnalysisObjective, PerformanceObjective, OptimizationObjective
- Optimize prompt: add "What you are optimizing" section with descriptions and units
- Optimize prompt: show units alongside measurement values
- Optimize prompt: include first 500 chars of each past attempt's diff
- Optimize prompt: add anti-repetition steering after previous attempts
- Optimize prompt: derive targeted reading guidance from objective's diff_paths
- Redteam prompt: always include JSON schema (not just when seed example exists)
- Redteam prompt: truncate approach text to 200 chars in previous attempts
- Redteam prompt: show "Iteration {current}/{total}" counter

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add description() with default (returns name()) to the Objective trait.
Each concrete type now owns its description, and the enums delegate
instead of hardcoding strings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the manual match statement in the redteam binary with a
red_team() method on JoltInvariants that uses the dispatch! macro,
consistent with name(), description(), targets(), and run_checks().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…normalization

Objectives have different units and magnitudes (e.g. LLOC ~5500 lines
vs cognitive complexity ~4.0), making weighted combinations unwieldy.

- Add baseline() to StaticAnalysisObjective, PerformanceObjective,
  and OptimizationObjective returning a reference scale
- Add normalized(obj, measurements) free function: value / baseline,
  yielding a dimensionless ratio where 1.0 ≈ current typical value
- Add test demonstrating a balanced 50/50 weighted composite that
  treats a 10% improvement in any objective equally

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace hardcoded baseline() values with dynamic baselines captured
from the first env.measure() call at the start of auto_optimize.

- ObjectiveFunction.evaluate now takes (measurements, baselines)
- normalized(obj, measurements, baselines) divides by the baseline
  measurement instead of a hardcoded constant
- Remove baseline() from all objective enums
- auto_optimize passes initial measurements as baselines to evaluate
- Single-objective functions ignore baselines (use |m, _|)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new jolt-eval crate that provides mechanically checkable invariants and measurable objectives for AI-assisted red-teaming and iterative optimization of the Jolt zkVM codebase, with supporting CLIs and benchmark/fuzz synthesis tooling.

Changes:

  • Adds an invariant framework (incl. #[invariant(...)] proc macro) plus red-team and fuzz harnesses.
  • Adds an objective framework spanning static analysis and Criterion-backed performance measurement, plus an optimization loop (optimize).
  • Adds end-to-end sort targets/tests and a sync_targets.sh helper to keep fuzz/bench targets in sync.

Reviewed changes

Copilot reviewed 50 out of 55 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
jolt-eval/sync_targets.sh Generates/removes fuzz targets and syncs bench entries in Cargo manifests.
jolt-eval/src/sort_targets.rs Adds naive and intentionally-buggy sort functions as e2e targets.
jolt-eval/src/sort_e2e.rs Implements red-team + optimize e2e harnesses and tests around sorting.
jolt-eval/src/objective/synthesis.rs Adds bench_objective! macro to generate Criterion harnesses.
jolt-eval/src/objective/performance/prover_time.rs Adds a prover-time objective implementation for guest programs.
jolt-eval/src/objective/performance/mod.rs Exposes performance objective modules.
jolt-eval/src/objective/performance/binding.rs Adds polynomial binding performance objectives and keys.
jolt-eval/src/objective/optimize.rs Implements iterative AI optimization loop with git accept/reject behavior.
jolt-eval/src/objective/objective_fn/mod.rs Adds registered ObjectiveFunctions and diff scoping.
jolt-eval/src/objective/mod.rs Defines Objective traits and type-safe objective keys/enums.
jolt-eval/src/objective/code_quality/mod.rs Exposes static-analysis objective modules.
jolt-eval/src/objective/code_quality/lloc.rs Implements LLOC static analysis objective via rust-code-analysis.
jolt-eval/src/objective/code_quality/halstead_bugs.rs Implements Halstead “bugs” static analysis objective.
jolt-eval/src/objective/code_quality/cognitive.rs Implements cognitive complexity static analysis objective.
jolt-eval/src/lib.rs Adds crate root exports and module wiring.
jolt-eval/src/invariant/synthesis/redteam.rs Implements structured JSON red-team loop and prompt construction.
jolt-eval/src/invariant/synthesis/mod.rs Exposes invariant synthesis submodules.
jolt-eval/src/invariant/synthesis/fuzz.rs Adds fuzz_invariant! macro to create libFuzzer targets.
jolt-eval/src/invariant/split_eq_bind.rs Adds split-eq binding invariants with Test/Fuzz/RedTeam targets.
jolt-eval/src/invariant/soundness.rs Adds a soundness red-team invariant using sandbox patching + prove/verify.
jolt-eval/src/invariant/mod.rs Defines core invariant traits, error types, and invariant registry/dispatch.
jolt-eval/src/invariant/macro_tests.rs Adds invariants used to test macro synthesis behavior.
jolt-eval/src/guests/sha2_chain.rs Adds SHA-2 chain guest config for prover-time benchmarking.
jolt-eval/src/guests/secp256k1_ecdsa.rs Adds secp256k1 ECDSA verify guest config for benchmarking.
jolt-eval/src/guests/mod.rs Adds guest plumbing for proving/verifying and guest config trait.
jolt-eval/src/guests/fibonacci.rs Adds Fibonacci guest config for benchmarking.
jolt-eval/src/agent/tests.rs Adds extensive tests for agent harness + optimize/redteam loops.
jolt-eval/src/agent/mod.rs Adds agent harness trait, diff scoping, and diff application helper.
jolt-eval/src/agent/mock.rs Adds a mock agent for deterministic testing.
jolt-eval/src/agent/claude.rs Adds a Claude Code CLI-backed agent using isolated git worktrees.
jolt-eval/README.md Documents the jolt-eval concepts, invariants, objectives, and CLIs.
jolt-eval/macros/src/lib.rs Implements #[invariant(...)] proc macro (targets + optional tests).
jolt-eval/macros/Cargo.toml Declares the proc-macro crate and dependencies.
jolt-eval/guest-sandbox/src/main.rs Adds guest sandbox entrypoint.
jolt-eval/guest-sandbox/src/lib.rs Adds default sandbox provable function for soundness red-teaming.
jolt-eval/guest-sandbox/README.md Documents the sandbox and patch filtering safety model.
jolt-eval/guest-sandbox/Cargo.toml Declares the sandbox guest package.
jolt-eval/guest-sandbox/.gitignore Ignores sandbox build output.
jolt-eval/fuzz/fuzz_targets/split_eq_bind_low_high.rs Adds fuzz target for split-eq bind Low→High invariant.
jolt-eval/fuzz/fuzz_targets/split_eq_bind_high_low.rs Adds fuzz target for split-eq bind High→Low invariant.
jolt-eval/fuzz/Cargo.toml Adds fuzz workspace crate and bin targets for fuzzing invariants.
jolt-eval/fuzz/.gitignore Ignores fuzz build artifacts/corpus.
jolt-eval/Cargo.toml Declares jolt-eval crate deps, bins, and bench targets.
jolt-eval/bin/redteam.rs Adds redteam CLI for AI red-team sessions.
jolt-eval/bin/optimize.rs Adds optimize CLI for AI optimization runs against objective functions.
jolt-eval/bin/measure_objectives.rs Adds measure-objectives CLI for static + perf objective measurement.
jolt-eval/benches/prover_time_sha2_chain.rs Adds Criterion bench harness for prover time (SHA2 chain).
jolt-eval/benches/prover_time_secp256k1_ecdsa_verify.rs Adds Criterion bench harness for prover time (ECDSA verify).
jolt-eval/benches/prover_time_fibonacci.rs Adds Criterion bench harness for prover time (Fibonacci).
jolt-eval/benches/bind_parallel_low_to_high.rs Adds Criterion bench harness for binding (Low→High).
jolt-eval/benches/bind_parallel_high_to_low.rs Adds Criterion bench harness for binding (High→Low).
Cargo.toml Adds new workspace members and excludes fuzz sub-workspace.
.gitignore Ignores .omc/.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

moodlezoup and others added 3 commits April 4, 2026 13:35
…tterns

- Delete all section separator comments (// ===, // ---, // ──)
- Simplify StaticAnalysisObjective::diff_paths (all variants return
  the same value, no match needed)
- Extract read_criterion_estimate into objective::performance with
  baseline parameter, removing duplicate in both binaries
- Replace manual diff_paths dedup in prompt builder with
  objective.diff_scope() call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The SortOptimizeEnv previously used an in-process function pointer
swap to simulate optimization (can't recompile at runtime). Now it
shells out to `cargo run --bin optimize -- --measure`, which
recompiles the modified sort_targets.rs and measures the actual
modified code.

- Add --measure flag to optimize binary: measures objective inputs
  and prints JSON to stdout
- Register MINIMIZE_NAIVE_SORT_TIME as a proper ObjectiveFunction
- Add measure_perf() helper for per-variant perf measurement
  (Criterion for bindings, direct timing for NaiveSortTime)
- SortOptimizeEnv::measure() shells out to the binary
- SortOptimizeEnv::apply_diff() just patches the file (no fn swap)
- Replace mock tests with MockEnv (predetermined measurements)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the NaiveSortTime fieldless variant and SortOptimizeEnv
subprocess hack with a standard Criterion-based performance objective:

- Add NaiveSortObjective in sort_targets.rs implementing Objective
  with setup()/run() — same pattern as BindLowToHighObjective
- Add benches/naive_sort_time.rs using bench_objective! macro
- NaiveSortTime now wraps NaiveSortObjective (data-carrying variant)
- Include NaiveSortTime in PerformanceObjective::all()
- Delete SortOptimizeEnv, run_optimize_test, and mock optimize tests
  from sort_e2e.rs (redundant with agent/tests.rs)
- Rewrite --test in optimize binary to use RealEnv directly
- RealEnv::measure() handles naive_sort via cargo bench like all
  other perf objectives — no special-casing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@socket-security
Copy link
Copy Markdown

socket-security bot commented Apr 4, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addedschemars@​0.8.2291100100100100
Addedenumset@​1.1.1010010093100100
Addednum-derive@​0.3.310010093100100
Addedrust-code-analysis@​0.0.249910010010070

View full report

moodlezoup and others added 11 commits April 4, 2026 14:34
…--measure

Move NaiveSortObjective from sort_targets.rs (which the optimizer agent
can modify) to objective/performance/naive_sort.rs (which it cannot).
This prevents the agent from cheating by modifying the objective itself.

Also remove --measure flag from optimize binary — no longer needed now
that the sort objective is a proper Criterion benchmark measured via
RealEnv like all other perf objectives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sage

The error reported input/output/stack<=MAX_STACK_SIZE but input and
output have their own limits (MAX_INPUT_SIZE, MAX_OUTPUT_SIZE).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix ObjectiveFunction example: update evaluate closure signature to
  |m, _b| and add missing `normalized` to the use statement
- Clarify that prover_time_* benchmarks are standalone Criterion targets
  not tracked by optimize/measure-objectives binaries
- Update guest-sandbox/README to reflect actual behavior: hash guest,
  in-place patching with PatchGuard git-checkout revert
- Add git clean -fd after git checkout . in both rejection branches of
  auto_optimize so agent-added untracked files don't contaminate subsequent iterations
- Capture stderr in apply_diff and include it in the AgentError message

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The harness applies patches in-place with a PatchGuard RAII guard (not
copying to a temp dir), and the default guest is a wrapping hash (not
an identity function).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ctory

Each auto-optimize iteration now writes diff.patch, response.md,
measurements.json, and status.json to jolt-eval/optimize-history/{objective}/{attempt-N}/.
The baseline is persisted to attempt-0/baseline/. The prompt's previous-attempts
section is slimmed down to one line per attempt pointing at the directory.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tory

Mirror the optimize-history pattern for the redteam loop:

- Write approach.md and failure_reason.txt to
  jolt-eval/redteam-history/{invariant}/attempt-{N}/
- Add path: Option<String> to FailedAttempt
- Slim the prompt to show description + failure reason inline,
  with a pointer to the attempt directory for the full approach
- Add redteam-history/ to .gitignore

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code quality objectives now store target_dir (e.g. "jolt-core/src")
instead of a leaked root PathBuf. The repo root is computed at runtime
via env!("CARGO_MANIFEST_DIR").parent(). This eliminates all Box::leak
calls from the code quality objectives and removes the root parameter
from StaticAnalysisObjective::all() and OptimizationObjective::all().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allows computed descriptions that include target_dir, e.g.:
"Total logical lines of code in jolt-core/src/"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Required by the simple form of bench_objective! which calls
<Type>::default().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-spec PR has no spec file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants