feat(jolt-eval): invariant and objective framework for AI-assisted development by moodlezoup · Pull Request #1406 · a16z/jolt

moodlezoup · 2026-04-04T03:09:55Z

Summary

jolt-eval crate: mechanically checkable invariants and objectives for the Jolt zkVM, designed for AI agent productivity and minimizing human verification surface
Invariant framework: #[invariant(Test, Fuzz, RedTeam)] macro synthesizes tests, fuzz targets, and AI red-team harnesses from a single invariant definition
Objective system: unified Objective trait for both static-analysis (LLOC, cognitive complexity, Halstead bugs) and Criterion-benchmarked performance measurements
ObjectiveFunction struct: const-constructible struct combining objectives into a scalar for the optimizer to minimize; registered via ObjectiveFunction::all() / by_name()
Type-safe measurement keys: OptimizationObjective enum replaces string-keyed HashMaps; three nested enums (StaticAnalysisObjective, PerformanceObjective, OptimizationObjective) with discriminant-based Hash/Eq
AI red-team binary (redteam): runs Claude in an isolated worktree to find invariant violations via structured JSON counterexamples
AI optimize binary (optimize): iterative optimization loop — agent proposes changes, diff is applied, objectives re-measured, invariants checked, accepted/reverted with git branch + per-iteration commits
CLI features: --list, --test (built-in e2e sort targets), --verbose (shows prompts/responses), mutual exclusivity via clap
E2e sort tests: naive_sort optimization target, candidate_sort red-team target with subtle bug (correct for ≤16 elements, skips last element for larger arrays), both MockAgent and #[ignore] real-agent tests

Test plan

cargo nextest run -p jolt-eval — 87 tests pass, 2 skipped (real-agent #[ignore] tests)
cargo clippy -p jolt-eval --all-targets — zero warnings
cargo run --release -p jolt-eval --bin optimize -- --list — lists 5 objective functions
cargo run --release -p jolt-eval --bin redteam -- --list — lists red-teamable invariants + e2e targets
cargo run --release -p jolt-eval --bin optimize -- --test --verbose — e2e sort optimization with real Claude agent
cargo run --release -p jolt-eval --bin redteam -- --test --verbose — e2e sort red-team with real Claude agent
cargo bench -p jolt-eval — Criterion benchmarks run

🤖 Generated with Claude Code

…ement Introduces a framework for mechanically verifiable invariants and measurable objectives, enabling automated testing, fuzzing, AI red-teaming, and performance optimization of the Jolt zkVM. Invariants (binary pass/fail): - Soundness: mutated proofs must be rejected by the verifier - Verifier completeness: honest proofs must be accepted - Prover completeness: prover must not panic on valid inputs - Determinism: same inputs produce byte-identical proofs - Serialization roundtrip: serialize/deserialize preserves proof bytes - ZK consistency: prove+verify succeeds in current compilation mode Objectives (scalar measurements): - Peak RSS, prover time, proof size, verifier time - Guest cycle count, inline instruction lengths, wrapping cost - auto_optimize harness for AI-driven optimization loops Synthesis infrastructure: - Test generation from seed corpus + random Arbitrary inputs - Fuzz target code generation for libfuzzer_sys - Red-team harness with git worktree isolation - #[invariant(targets = [Test, Fuzz, RedTeam])] proc macro CLI binaries: check-invariants, measure-objectives, redteam Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Runs randomized fuzz inputs against all fuzzable invariants (or a specific one via --invariant). Supports --iterations, --duration time limits, --input-size, and --list to enumerate targets. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…tion Replaces the placeholder invoke_agent callback with a real implementation that: 1. Creates a detached git worktree for agent isolation 2. Builds a structured prompt with the invariant description, instructions, and any previous failed attempts 3. Invokes `claude -p` in the worktree with configurable model and max-turns 4. Captures the agent's analysis as the approach description 5. Cleans up the worktree after each iteration 6. Runs intensive fuzz checks (configurable via --num-fuzz) after each agent attempt to mechanically verify findings Also adds num_fuzz_per_iteration to RedTeamConfig so auto_redteam runs random fuzz inputs (not just seed corpus) after each attempt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replaces the library-level auto_optimize function (which delegated agent invocation to the caller) with a standalone `optimize` binary that drives the full loop: 1. Measures baseline objectives 2. Creates an isolated git worktree per iteration 3. Invokes `claude -p` with a prompt describing the objectives, current best measurements, and past attempts 4. Captures the agent's diff and applies it to the real repo 5. Re-measures objectives and checks all invariants 6. Commits on improvement, reverts otherwise Supports --objectives (comma-separated filter), --hint (extra context for the agent), --iterations, --model, and --max-turns. Keeps OptimizationAttempt as a public type in objective/mod.rs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…t backends Introduces `agent::AgentHarness` -- a trait abstracting over how an AI coding agent is invoked. Implementors control their own isolation strategy (worktrees, containers, API calls, etc.), so a multi-agent Codex harness, a remote API agent, or a parallel fan-out agent can all plug into the same red-team and optimize loops. Concrete changes: - New `src/agent.rs` with `AgentHarness` trait, `AgentResponse`/ `AgentError` types, `ClaudeCodeAgent` implementation, and worktree / diff-apply utilities. - `auto_redteam` now takes `&dyn AgentHarness` + `repo_dir` instead of an `FnMut` callback. Prompt construction stays inside the library. - `RedTeamConfig` trimmed to orchestration-only fields (iterations, fuzz count); model/max-turns live on the agent. - Both `bin/redteam.rs` and `bin/optimize.rs` construct a `ClaudeCodeAgent` from CLI args and pass it through the trait. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds `MockAgent` to the library with three constructors: - `always_ok(text)` -- repeats a fixed success response - `always_err(msg)` -- repeats a fixed error - `from_responses(vec)` -- drains a queue, then repeats the last entry MockAgent records every prompt it receives, accessible via `recorded_prompts()`. 21 new tests covering: - MockAgent basics: ok, err, prompt recording, repeat behavior, response queuing, diff passthrough - auto_redteam integration: no-violation path, immediate violation, fuzz-triggered violation, agent error handling, zero iterations, mixed ok/err responses - Prompt construction: invariant description inclusion, failed attempt accumulation across iterations - Trait extensibility: object safety, Arc wrapping, custom `FirstSuccessHarness` multi-agent implementation plugging directly into auto_redteam Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extracts the optimization loop from bin/optimize.rs into a testable library function `objective::optimize::auto_optimize` that takes: - `&dyn AgentHarness` for the AI agent - `&mut dyn OptimizeEnv` for side effects (measure, check invariants, apply diff, accept/reject) `OptimizeEnv` trait lets tests swap in a `MockOptimizeEnv` with controllable measurement sequences, invariant pass/fail schedules, and recorded side effects. 14 new optimization tests covering: - Accept on improvement (Minimize and Maximize directions) - Reject on regression - Reject when invariants fail despite improvement - Multi-iteration progressive improvement with mixed accept/reject - Early stop when agent produces no diff or errors - Zero iterations - Multiple objectives (any-improves-triggers-accept) - Prompt content: measurements, hint, past attempts - Diff application recording - Invariant failure mid-sequence Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… single source of truth Adds `BUILTIN_INVARIANT_NAMES` const and `names()`/`names_for_target()` methods to `SynthesisRegistry`. All four CLI binaries now derive their `--list` output and error messages from these instead of duplicating name strings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…registries Each invariant and objective module now calls `inventory::submit!` with an entry struct (`InvariantEntry` / `ObjectiveEntry`) containing the name, metadata, and a factory function. At runtime, `inventory::iter` discovers all registered entries automatically. Key changes: - `InvariantEntry` (name, targets, build fn) + `inventory::collect!` in invariant/mod.rs; each of the 6 invariant files submits one. - `ObjectiveEntry` (name, direction, build fn) + `inventory::collect!` in objective/mod.rs; each of the 7 objective files submits one. - `SynthesisRegistry::from_inventory(test_case, inputs)` replaces manual `register_invariants` functions in all binaries. - `build_objectives_from_inventory(setup, inputs)` and `measure_dyn` replace manual `build_objectives` functions. - `invariant_names()` replaces `BUILTIN_INVARIANT_NAMES` const -- names are derived from the inventory, not hardcoded. - All 5 CLI binaries simplified: no more per-binary registration boilerplate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

auto_redteam now asks the agent to produce a concrete counterexample as a JSON code block, deserializes it into the invariant's Input type via serde, and runs Invariant::check on that specific input. Key changes: - Invariant::Input gains Serialize + DeserializeOwned bounds; Invariant::Setup gains 'static. All 6 built-in Input types derive Serialize + Deserialize. - DynInvariant gains three methods: input_json_example() — serializes a seed-corpus item for the prompt dyn_setup() — type-erased setup (Box<dyn Any>, call once) check_json_input() — deserialize JSON → Input, run check() - CheckJsonResult enum distinguishes Pass / Violation / BadInput (deserialization failure) so parse errors are never mistaken for real counterexamples. - auto_redteam builds a prompt that includes the JSON schema example and asks the agent to end with a ```json code block. The loop extracts JSON via extract_json(), feeds it through check_json_input, and returns Violation with the input_json on success. - RedTeamResult::Violation now carries input_json alongside approach and error. - RedTeamConfig trimmed: num_fuzz_per_iteration removed (the loop checks the single agent-produced input, not random fuzz). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The full text belongs in FailedAttempt.approach so callers can decide how to display it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Give ConstantObjective a configurable name and direction so multiple instances don't collide in the HashMap. test_measure_objectives now passes two objectives and asserts both values appear in the result. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…d red-team output ClaudeCodeAgent::invoke_structured now passes --output-format json and --json-schema to the Claude CLI, so the response is guaranteed to conform to the schema. The CLI envelope's `structured_output` field is extracted directly — no fragile regex/extract_json parsing needed for agents that support it. Key changes: - AgentHarness gains invoke_structured(repo_dir, prompt, schema) with a default that falls back to invoke(). - ClaudeCodeAgent overrides invoke_structured: passes the schema via --json-schema, parses the CLI JSON envelope, extracts structured_output (or result as fallback). - Invariant::Input gains a JsonSchema bound (schemars 0.8); all 6 Input types derive JsonSchema. - DynInvariant gains input_json_schema() → serde_json::Value, backed by schemars::schema_for! in the blanket impl. - auto_redteam builds an envelope schema {"analysis": string, "counterexample": <Input schema>} and calls invoke_structured. Response is parsed as the envelope first; if that fails (agent doesn't support structured output), falls back to extract_json on free-form text. - Prompt updated: tells the agent to respond with {analysis, counterexample} JSON instead of a code block. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds fuzz/Cargo.toml and one fuzz target per built-in invariant, each a single-line call to the new `fuzz_invariant(name, data)` library function. Adding a new fuzz target is one file: #![no_main] use libfuzzer_sys::fuzz_target; fuzz_target!(|data: &[u8]| { jolt_eval::invariant::synthesis::fuzz::fuzz_invariant("my_invariant", data); }); Run with: cd jolt-eval && JOLT_FUZZ_ELF=path/to/guest.elf cargo fuzz run soundness fuzz_invariant() loads the ELF from JOLT_FUZZ_ELF, builds the inventory-registered invariant and its setup once (LazyLock), then interprets each fuzz input as raw JSON fed through check_json_input. Also: - Removes the _fuzz_check codegen from the #[invariant] proc macro (redundant now that fuzz targets call the library function directly) - Removes generate_fuzz_target() string-template codegen - Adds SynthesisRegistry::into_invariants() for ownership transfer - Tightens Invariant::Setup to Send + Sync + 'static (needed for the LazyLock cache in fuzz_invariant) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…tional Introduces a `guests` module with a fixed catalog of known guest programs (muldiv, fibonacci, sha2, sha3, collatz, alloc), each with its memory config, max trace length, and default inputs extracted from the #[jolt::provable(...)] attributes. `GuestSpec::compile()` invokes `host::Program::new(package).build()` to cross-compile the guest to RISC-V automatically. All 5 CLI binaries now accept `--guest <name>` (compiles automatically) as the primary interface, with `--elf <path>` preserved as a fallback for custom ELFs. The shared `guests::resolve_test_case()` helper handles both paths. `InvariantEntry::build` and `ObjectiveEntry::build` now take `Option<Arc<TestCase>>` / `Option<&SharedSetup>` respectively, with a `needs_guest: bool` flag. `SynthesisRegistry::from_inventory` and `build_objectives_from_inventory` skip entries that need a guest when `None` is passed. This lays the groundwork for invariants/objectives that operate on lower-level primitives (e.g. polynomial binding) and don't require a compiled guest program. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ynomial fuzz targets Remove the 6 generic proof-level invariants (soundness, completeness, determinism, serialization_roundtrip, zk_consistency) that required a guest ELF and were slow to exercise. Replace with 2 focused, guest-free invariants that fuzz GruenSplitEqPolynomial::bind against DensePolynomial reference implementations for both LowToHigh and HighToLow binding orders. Also makes JOLT_FUZZ_ELF optional so guest-free invariants work without it, uses DEFAULT_MAX_*_ADVICE_SIZE constants in guest configs, and adds the invariant/objective design spec. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace inventory::submit!/collect!/iter with explicit arrays in registered_invariants() and registered_objectives(). Each module's entry is now listed directly in its parent mod.rs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The fuzz binary was a random-input runner largely duplicating check-invariants --num-random. The libfuzzer scaffolding in jolt-eval/fuzz/ is kept for cargo-fuzz usage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…Invariants enum Remove the DynInvariant trait, SynthesisRegistry, and InvariantEntry in favor of a JoltInvariants enum with match-based method dispatch. auto_redteam is now generic over I: Invariant, eliminating the need for type erasure in the red-team path and keeping tests simple. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…t macro Replace the check-invariants binary with macro-generated #[test] functions. The #[invariant] macro now unconditionally generates seed_corpus and random_inputs tests, with iteration count configurable via JOLT_RANDOM_ITERS env var (default 10). Removed the unused redteam description generation from the macro. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ntation Replace fuzz_invariant() function with a fuzz_invariant!() declarative macro that generates the full fuzz_target! expansion with OnceLock-based setup caching. Each fuzz target is now 3 lines. Change SplitEqBindInput to store w/rs challenge vectors directly as Vec<u128> with a manual Arbitrary impl, removing the seed-based indirection. Remove now-dead type-erasure methods (dyn_setup, check_json_input, check_arbitrary_input, CheckJsonResult) from JoltInvariants. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ro-driven Move Invariant::targets() to a separate InvariantTargets trait with a default empty-set implementation. The #[invariant] macro now accepts target lists (e.g. #[invariant(Test, Fuzz)]) and generates the InvariantTargets impl and conditionally emits #[test] blocks only when Test is listed. The fuzz_invariant! macro asserts that the invariant includes SynthesisTarget::Fuzz at init time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- agent/mod.rs: AgentHarness trait, types, apply_diff, truncate - agent/claude.rs: ClaudeCodeAgent, worktree helpers - agent/mock.rs: MockAgent for testing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Don't pass --verbose with --output-format json — it turns stdout into a JSONL event stream that can't be parsed as a single result object. Also parse the JSON envelope even on non-zero exit codes, since Claude may still produce structured_output on soft errors like max_turns. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Also fix envelope parsing fallback: when extract_json finds the full envelope JSON from markdown, try parsing it as an envelope first to correctly extract the counterexample field. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The soundness invariant lets a red-team agent produce a unified diff against guest-sandbox/, which is applied in-place (with RAII revert), compiled via jolt build, proved honestly, then verified against the agent's dishonest (output, panic) claim. The verifier must reject. Adds TestCase::verify_with_claims to override output bytes and panic flag independently. Patch filtering drops hunks with '..' paths to prevent sandbox escapes. Includes filter_patch unit tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Change Invariant::check to return Result<(), CheckError> where CheckError has Violation and InvalidInput variants. InvalidInput is silently skipped in fuzz targets, macro-generated tests, and run_checks; recorded as a failed attempt in redteam. Also refactor guests.rs into guests/ directory (mod.rs + catalog.rs) and add GuestMemoryConfig to SoundnessInput with validation limits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add guest-sandbox to workspace members so jolt build can find it. Fix guest to return u32 (Vec<u8> lacks Serialize in no_std). Wrap Program::build in catch_unwind to handle compilation panics. Add tests for memory config validation, patch filtering, and the full compile→prove→verify path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-04T03:11:50Z

Warning

This PR has more than 500 changed lines and does not include a spec.

Large features and architectural changes benefit from a spec-driven workflow.
See CONTRIBUTING.md for details on how to create a spec.

If this PR is a bug fix, refactor, or doesn't warrant a spec, feel free to ignore this message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

False positive — libfuzzer-sys is used via the fuzz_invariant! macro which expands to libfuzzer_sys::fuzz_target!. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Each OptimizationObjective now declares diff_paths() specifying which file paths an optimizer may modify. ObjectiveFunction::diff_scope() computes the union of all input objectives' paths as DiffScope::Include. - Static analysis + binding objectives: jolt-core/ - NaiveSortTime: jolt-eval/src/sort_targets.rs - Remove diff_scope from OptimizeConfig (now derived automatically) - Fix unwrap() → unwrap_or(f64::INFINITY) in const ObjectiveFunctions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add description() to StaticAnalysisObjective, PerformanceObjective, OptimizationObjective - Optimize prompt: add "What you are optimizing" section with descriptions and units - Optimize prompt: show units alongside measurement values - Optimize prompt: include first 500 chars of each past attempt's diff - Optimize prompt: add anti-repetition steering after previous attempts - Optimize prompt: derive targeted reading guidance from objective's diff_paths - Redteam prompt: always include JSON schema (not just when seed example exists) - Redteam prompt: truncate approach text to 200 chars in previous attempts - Redteam prompt: show "Iteration {current}/{total}" counter Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add description() with default (returns name()) to the Objective trait. Each concrete type now owns its description, and the enums delegate instead of hardcoding strings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace the manual match statement in the redteam binary with a red_team() method on JoltInvariants that uses the dispatch! macro, consistent with name(), description(), targets(), and run_checks(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…normalization Objectives have different units and magnitudes (e.g. LLOC ~5500 lines vs cognitive complexity ~4.0), making weighted combinations unwieldy. - Add baseline() to StaticAnalysisObjective, PerformanceObjective, and OptimizationObjective returning a reference scale - Add normalized(obj, measurements) free function: value / baseline, yielding a dimensionless ratio where 1.0 ≈ current typical value - Add test demonstrating a balanced 50/50 weighted composite that treats a 10% improvement in any objective equally Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace hardcoded baseline() values with dynamic baselines captured from the first env.measure() call at the start of auto_optimize. - ObjectiveFunction.evaluate now takes (measurements, baselines) - normalized(obj, measurements, baselines) divides by the baseline measurement instead of a hardcoded constant - Remove baseline() from all objective enums - auto_optimize passes initial measurements as baselines to evaluate - Single-objective functions ignore baselines (use |m, _|) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR introduces a new jolt-eval crate that provides mechanically checkable invariants and measurable objectives for AI-assisted red-teaming and iterative optimization of the Jolt zkVM codebase, with supporting CLIs and benchmark/fuzz synthesis tooling.

Changes:

Adds an invariant framework (incl. #[invariant(...)] proc macro) plus red-team and fuzz harnesses.
Adds an objective framework spanning static analysis and Criterion-backed performance measurement, plus an optimization loop (optimize).
Adds end-to-end sort targets/tests and a sync_targets.sh helper to keep fuzz/bench targets in sync.

Reviewed changes

Copilot reviewed 50 out of 55 changed files in this pull request and generated 15 comments.

Show a summary per file

File	Description
jolt-eval/sync_targets.sh	Generates/removes fuzz targets and syncs bench entries in Cargo manifests.
jolt-eval/src/sort_targets.rs	Adds naive and intentionally-buggy sort functions as e2e targets.
jolt-eval/src/sort_e2e.rs	Implements red-team + optimize e2e harnesses and tests around sorting.
jolt-eval/src/objective/synthesis.rs	Adds `bench_objective!` macro to generate Criterion harnesses.
jolt-eval/src/objective/performance/prover_time.rs	Adds a prover-time objective implementation for guest programs.
jolt-eval/src/objective/performance/mod.rs	Exposes performance objective modules.
jolt-eval/src/objective/performance/binding.rs	Adds polynomial binding performance objectives and keys.
jolt-eval/src/objective/optimize.rs	Implements iterative AI optimization loop with git accept/reject behavior.
jolt-eval/src/objective/objective_fn/mod.rs	Adds registered `ObjectiveFunction`s and diff scoping.
jolt-eval/src/objective/mod.rs	Defines Objective traits and type-safe objective keys/enums.
jolt-eval/src/objective/code_quality/mod.rs	Exposes static-analysis objective modules.
jolt-eval/src/objective/code_quality/lloc.rs	Implements LLOC static analysis objective via `rust-code-analysis`.
jolt-eval/src/objective/code_quality/halstead_bugs.rs	Implements Halstead “bugs” static analysis objective.
jolt-eval/src/objective/code_quality/cognitive.rs	Implements cognitive complexity static analysis objective.
jolt-eval/src/lib.rs	Adds crate root exports and module wiring.
jolt-eval/src/invariant/synthesis/redteam.rs	Implements structured JSON red-team loop and prompt construction.
jolt-eval/src/invariant/synthesis/mod.rs	Exposes invariant synthesis submodules.
jolt-eval/src/invariant/synthesis/fuzz.rs	Adds `fuzz_invariant!` macro to create libFuzzer targets.
jolt-eval/src/invariant/split_eq_bind.rs	Adds split-eq binding invariants with Test/Fuzz/RedTeam targets.
jolt-eval/src/invariant/soundness.rs	Adds a soundness red-team invariant using sandbox patching + prove/verify.
jolt-eval/src/invariant/mod.rs	Defines core invariant traits, error types, and invariant registry/dispatch.
jolt-eval/src/invariant/macro_tests.rs	Adds invariants used to test macro synthesis behavior.
jolt-eval/src/guests/sha2_chain.rs	Adds SHA-2 chain guest config for prover-time benchmarking.
jolt-eval/src/guests/secp256k1_ecdsa.rs	Adds secp256k1 ECDSA verify guest config for benchmarking.
jolt-eval/src/guests/mod.rs	Adds guest plumbing for proving/verifying and guest config trait.
jolt-eval/src/guests/fibonacci.rs	Adds Fibonacci guest config for benchmarking.
jolt-eval/src/agent/tests.rs	Adds extensive tests for agent harness + optimize/redteam loops.
jolt-eval/src/agent/mod.rs	Adds agent harness trait, diff scoping, and diff application helper.
jolt-eval/src/agent/mock.rs	Adds a mock agent for deterministic testing.
jolt-eval/src/agent/claude.rs	Adds a Claude Code CLI-backed agent using isolated git worktrees.
jolt-eval/README.md	Documents the jolt-eval concepts, invariants, objectives, and CLIs.
jolt-eval/macros/src/lib.rs	Implements `#[invariant(...)]` proc macro (targets + optional tests).
jolt-eval/macros/Cargo.toml	Declares the proc-macro crate and dependencies.
jolt-eval/guest-sandbox/src/main.rs	Adds guest sandbox entrypoint.
jolt-eval/guest-sandbox/src/lib.rs	Adds default sandbox provable function for soundness red-teaming.
jolt-eval/guest-sandbox/README.md	Documents the sandbox and patch filtering safety model.
jolt-eval/guest-sandbox/Cargo.toml	Declares the sandbox guest package.
jolt-eval/guest-sandbox/.gitignore	Ignores sandbox build output.
jolt-eval/fuzz/fuzz_targets/split_eq_bind_low_high.rs	Adds fuzz target for split-eq bind Low→High invariant.
jolt-eval/fuzz/fuzz_targets/split_eq_bind_high_low.rs	Adds fuzz target for split-eq bind High→Low invariant.
jolt-eval/fuzz/Cargo.toml	Adds fuzz workspace crate and bin targets for fuzzing invariants.
jolt-eval/fuzz/.gitignore	Ignores fuzz build artifacts/corpus.
jolt-eval/Cargo.toml	Declares jolt-eval crate deps, bins, and bench targets.
jolt-eval/bin/redteam.rs	Adds `redteam` CLI for AI red-team sessions.
jolt-eval/bin/optimize.rs	Adds `optimize` CLI for AI optimization runs against objective functions.
jolt-eval/bin/measure_objectives.rs	Adds `measure-objectives` CLI for static + perf objective measurement.
jolt-eval/benches/prover_time_sha2_chain.rs	Adds Criterion bench harness for prover time (SHA2 chain).
jolt-eval/benches/prover_time_secp256k1_ecdsa_verify.rs	Adds Criterion bench harness for prover time (ECDSA verify).
jolt-eval/benches/prover_time_fibonacci.rs	Adds Criterion bench harness for prover time (Fibonacci).
jolt-eval/benches/bind_parallel_low_to_high.rs	Adds Criterion bench harness for binding (Low→High).
jolt-eval/benches/bind_parallel_high_to_low.rs	Adds Criterion bench harness for binding (High→Low).
Cargo.toml	Adds new workspace members and excludes fuzz sub-workspace.
.gitignore	Ignores `.omc/`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

jolt-eval/src/sort_e2e.rs

jolt-eval/src/objective/performance/prover_time.rs

jolt-eval/src/objective/code_quality/lloc.rs

jolt-eval/src/objective/code_quality/cognitive.rs

jolt-eval/src/objective/mod.rs

jolt-eval/src/objective/optimize.rs

jolt-eval/sync_targets.sh

jolt-eval/src/agent/mod.rs

…tterns - Delete all section separator comments (// ===, // ---, // ──) - Simplify StaticAnalysisObjective::diff_paths (all variants return the same value, no match needed) - Extract read_criterion_estimate into objective::performance with baseline parameter, removing duplicate in both binaries - Replace manual diff_paths dedup in prompt builder with objective.diff_scope() call Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The SortOptimizeEnv previously used an in-process function pointer swap to simulate optimization (can't recompile at runtime). Now it shells out to `cargo run --bin optimize -- --measure`, which recompiles the modified sort_targets.rs and measures the actual modified code. - Add --measure flag to optimize binary: measures objective inputs and prints JSON to stdout - Register MINIMIZE_NAIVE_SORT_TIME as a proper ObjectiveFunction - Add measure_perf() helper for per-variant perf measurement (Criterion for bindings, direct timing for NaiveSortTime) - SortOptimizeEnv::measure() shells out to the binary - SortOptimizeEnv::apply_diff() just patches the file (no fn swap) - Replace mock tests with MockEnv (predetermined measurements) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace the NaiveSortTime fieldless variant and SortOptimizeEnv subprocess hack with a standard Criterion-based performance objective: - Add NaiveSortObjective in sort_targets.rs implementing Objective with setup()/run() — same pattern as BindLowToHighObjective - Add benches/naive_sort_time.rs using bench_objective! macro - NaiveSortTime now wraps NaiveSortObjective (data-carrying variant) - Include NaiveSortTime in PerformanceObjective::all() - Delete SortOptimizeEnv, run_optimize_test, and mock optimize tests from sort_e2e.rs (redundant with agent/tests.rs) - Rewrite --test in optimize binary to use RealEnv directly - RealEnv::measure() handles naive_sort via cargo bench like all other perf objectives — no special-casing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

socket-security · 2026-04-04T18:19:02Z

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff	Package	Supply Chain Security	Vulnerability	Quality	Maintenance	License
	schemars@0.8.22
	enumset@1.1.10
	num-derive@0.3.3
	rust-code-analysis@0.0.24

View full report

…--measure Move NaiveSortObjective from sort_targets.rs (which the optimizer agent can modify) to objective/performance/naive_sort.rs (which it cannot). This prevents the agent from cheating by modifying the objective itself. Also remove --measure flag from optimize binary — no longer needed now that the sort objective is a proper Criterion benchmark measured via RealEnv like all other perf objectives. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…sage The error reported input/output/stack<=MAX_STACK_SIZE but input and output have their own limits (MAX_INPUT_SIZE, MAX_OUTPUT_SIZE). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix ObjectiveFunction example: update evaluate closure signature to |m, _b| and add missing `normalized` to the use statement - Clarify that prover_time_* benchmarks are standalone Criterion targets not tracked by optimize/measure-objectives binaries - Update guest-sandbox/README to reflect actual behavior: hash guest, in-place patching with PatchGuard git-checkout revert - Add git clean -fd after git checkout . in both rejection branches of auto_optimize so agent-added untracked files don't contaminate subsequent iterations - Capture stderr in apply_diff and include it in the AgentError message Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The harness applies patches in-place with a PatchGuard RAII guard (not copying to a temp dir), and the default guest is a wrapping hash (not an identity function). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ctory Each auto-optimize iteration now writes diff.patch, response.md, measurements.json, and status.json to jolt-eval/optimize-history/{objective}/{attempt-N}/. The baseline is persisted to attempt-0/baseline/. The prompt's previous-attempts section is slimmed down to one line per attempt pointing at the directory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…tory Mirror the optimize-history pattern for the redteam loop: - Write approach.md and failure_reason.txt to jolt-eval/redteam-history/{invariant}/attempt-{N}/ - Add path: Option<String> to FailedAttempt - Slim the prompt to show description + failure reason inline, with a pointer to the attempt directory for the full approach - Add redteam-history/ to .gitignore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Code quality objectives now store target_dir (e.g. "jolt-core/src") instead of a leaked root PathBuf. The repo root is computed at runtime via env!("CARGO_MANIFEST_DIR").parent(). This eliminates all Box::leak calls from the code quality objectives and removes the root parameter from StaticAnalysisObjective::all() and OptimizationObjective::all(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Allows computed descriptions that include target_dir, e.g.: "Total logical lines of code in jolt-core/src/" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Required by the simple form of bench_objective! which calls <Type>::default(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

moodlezoup and others added 30 commits March 31, 2026 18:38

docs: add README for jolt-eval crate

ea99ac8

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cleanup(jolt-eval): remove truncation of agent responses in auto_redteam

9b52b86

The full text belongs in FailedAttempt.approach so callers can decide how to display it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs(jolt-eval): add Fuzzing section to README, remove Crate structure

5cd0271

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor(jolt-eval): split agent.rs into agent/ directory

291f98a

- agent/mod.rs: AgentHarness trait, types, apply_diff, truncate - agent/claude.rs: ClaudeCodeAgent, worktree helpers - agent/mock.rs: MockAgent for testing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot added the no-spec PR has no spec file label Apr 4, 2026

moodlezoup and others added 9 commits April 3, 2026 23:12

fix(jolt-eval): typo unparseable → unparsable

87df638

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(jolt-eval): ignore libfuzzer-sys in cargo-machete

18ecdc2

False positive — libfuzzer-sys is used via the fuzz_invariant! macro which expands to libfuzzer_sys::fuzz_target!. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

delete spec

ad59fea

moodlezoup requested review from Copilot and sagar-a16z April 4, 2026 17:23

Copilot started reviewing on behalf of moodlezoup April 4, 2026 17:24 View session

Copilot AI reviewed Apr 4, 2026

View reviewed changes

moodlezoup and others added 3 commits April 4, 2026 13:35

moodlezoup and others added 11 commits April 4, 2026 14:34

fix(jolt-eval): correct limit names in soundness validation error mes…

ab555b5

…sage The error reported input/output/stack<=MAX_STACK_SIZE but input and output have their own limits (MAX_INPUT_SIZE, MAX_OUTPUT_SIZE). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Clean up unnecessary fields in OptimizationAttempt

3ae9531

refactor(jolt-eval): Objective::description returns String

bef916d

Allows computed descriptions that include target_dir, e.g.: "Total logical lines of code in jolt-core/src/" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Manual cleanup

ef9887c

fix(jolt-eval): restore Default derive on perf objective structs

3f84c42

Required by the simple form of bench_objective! which calls <Type>::default(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(jolt-eval): invariant and objective framework for AI-assisted development#1406

feat(jolt-eval): invariant and objective framework for AI-assisted development#1406
moodlezoup wants to merge 87 commits intomainfrom
feat/invariants-and-objectives

moodlezoup commented Apr 4, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

socket-security bot commented Apr 4, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

moodlezoup commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

socket-security bot commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

moodlezoup commented Apr 4, 2026 •

edited

Loading

socket-security bot commented Apr 4, 2026 •

edited

Loading