Md Rakib Hossain Misu*, Iris Ma, Cristina V. Lopes
This repository contains the implementation of our paper's methodology
VeriAct: Beyond Verifiability -- Agentic Synthesis of Correct and Complete Formal Specifications
- baselines — implementation/execution scripts of classical (Daikon, Houdini) vs. prompt-based (SpecGen, AutoSpec, FormalBench) approaches.
- optimizer — uses structured OpenJML feedback to iteratively refine prompts.
- spec_harness — evaluates correctness and completeness of verifier-accepted specifications beyond syntactic verification.
- veriact — an agentic loop that combines code execution, OpenJML verification, and Spec-Harness feedback to synthesize formal specifications.
- benchmarks — Two normalized benchmarks are used across experiments.
For full details, see the paper.
- Python >= 3.10
- OpenJML — must be installed and available in
PATHasopenjml - Check
openjml --version
git clone https://github.com/Mondego/VeriAct.git
cd VeriAct
uv sync --group all
source .venv/bin/activateCreate a .env file in config` with the keys for the models you intend to use:
API Keys
OPENAI_API_KEY=... # GPT-4o
ANTHROPIC_API_KEY=... # Claude models
GOOGLE_API_KEY=... # Gemini models
DEEPSEEK_API_KEY=... # DeepSeek API
MISTRAL_API_KEY=... # Mistral API
VLLM_API_KEY=... # Local vLLM server
VLLM_API_BASE=... # e.g. http://localhost:8000/v1Note:
Run Daikon
Note: Install Daikon and configure required jars in the
PATH
python -m baselines.daikon.run \
--name <experiment_name> \
--input <path/to/benchmark.json> \
--output <output_dir> \
--openjml_timeout 300 \
--daikon_timeout 600 \
--threads 4 \
--verboseRun Houdini
Note: java version:
1.6.0_21requires to run Houdini
python -m baselines.houdini.run \
--name <experiment_name> \
--input <path/to/benchmark.json> \
--output <output_dir> \
--openjml_timeout 300 \
--threads 4 \
--verboseRun SpecGen
python -m baselines.specgen.run \
--name <experiment_name> \
--input <path/to/benchmark.json> \
--output <output_dir> \
--model gpt-4o \
--temperature 0.7 \
--prompt_type zero_shot \
--max_iterations 10 \
--openjml_timeout 300 \
--threads 4 \
--verboseNote:
prompt_type:zero_shot|two_shot|four_shot
Run AutoSpec
python -m baselines.autospec.run \
--name <experiment_name> \
--input <path/to/benchmark.json> \
--output <output_dir> \
--model gpt-4o \
--temperature 0.7 \
--prompt_type zero_shot \
--max_iterations 10 \
--openjml_timeout 300 \
--threads 4 \
--verbose \
--simplifyNote:
prompt_type:zero_shot|two_shot|four_shot
Run FormalBench
python -m baselines.formalbench.run \
--name <experiment_name> \
--input <path/to/benchmark.json> \
--output <output_dir> \
--model gpt-4o \
--temperature 0.7 \
--prompt_type zero_shot \
--max_iterations 5 \
--openjml_timeout 300 \
--threads 4 \
--verboseNote:
prompt_type:zero_shot|two_shot|zs_cot|fs_cot|fs_ltm
Run Optimizer
python -m optimizer.prompt_optimizer.py \
--formalbench_path benchmarks/formalbench/fb.json \
--specgenbench_path benchmarks/specgenbench/sgb.json \
--best_seed zero \
--model openai/gpt-4o \
--reflection_model openai/gpt-4o \
--log_dir optimizer_logs \
--output_dir optimizer_results \
--openjml_output_dir openjml_outputNote:
--best_seed:zero|cot|ltm
Run Spec-Harness
Note: Note
Run VeriAct
# Sequential (one task at a time)
python -m veriact.run_single.py \
--benchmark benchmarks/specgenbench/sgb.json \
--model gpt-4o \
--output-dir veriact_output \
--max-steps 12 \
--planning_interval 3
# Parallel
python -m veriact.run_batch.py \
--benchmark benchmarks/specgenbench/sgb.json \
--model gpt-4o \
--threads 4 \
--output-dir veriact_output \
--max-steps 12 \
--planning_interval 3Note: Note
├── baselines # All baselines implementation
│ ├── houdini # Classic, template based approach
│ ├── daikon # Classic, execution trace based approach
│ ├── specgen # Prompt based, spec mutation
│ ├── autospec # Prompt based, program decomposing and static analysis
│ ├── formalbench # Prompt based, advance prompting with repair guidance
├── benchmarks # Normalized benchamrk datasets
│ └── specgenbench # 120 tasks from Leetcode and JML examples
│ ├── formalbench # 662 tasks from FormalBench and MBJP
├── config # API keys in .env file
├── optimizer # Prompt optimizer implementation
├── spec_harness # Spec-Harness implementation
├── veriact # VeriAct agent implementation
├── scripts # CLI run scripts for all baselines, veriact
├── pyproject.toml
├── README.md
@misc{mrhmisu/veriact2026,
title={VeriAct: Beyond Verifiability Agentic Synthesis of Correct and Complete Formal Specifications},
author={Md Rakib Hossain, Iris Ma, and Cristina V. Lopes},
year={2026},
eprint={2604.00280},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/pdf/2604.00280},
}If you have any questions or find any issues, please contact us at mdrh@uci.edu
This repository is licensed under GNU General Public License v3.0