VeriAct: Beyond Verifiability

VeriAct: Beyond Verifiability -- Agentic Synthesis of Correct and Complete Formal Specifications

Md Rakib Hossain Misu*, Iris Ma, Cristina V. Lopes

✨ Overview

This repository contains the implementation of the methodology described in our paper:

VeriAct: Beyond Verifiability -- Agentic Synthesis of Correct and Complete Formal Specifications

💎 Key Components

baselines — implementation/execution scripts of classical (Daikon, Houdini) vs. prompt-based (SpecGen, AutoSpec, FormalBench) approaches.
optimizer — uses structured OpenJML feedback to iteratively refine prompts.
spec_harness — evaluates correctness and completeness of verifier-accepted specifications beyond syntactic verification.
veriact — an agentic loop that combines code execution, OpenJML verification, and Spec-Harness feedback to synthesize formal specifications.
benchmarks — Two normalized benchmarks are used across experiments.
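
The veriact entry above describes a loop that proposes a specification, verifies it, and feeds the results back into the next attempt. The following is a minimal sketch of that control flow, not the repository's implementation. The callables `propose`, `verify`, and `assess` and the `Feedback` type are hypothetical names introduced here for illustration; the real agent also executes code and plans between steps.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Feedback:
    ok: bool            # True when the check passed
    message: str = ""   # diagnostic text passed to the next proposal


def synthesize(
    code: str,
    propose: Callable[[str, str], str],      # hypothetical: (code, feedback) -> spec
    verify: Callable[[str, str], Feedback],  # hypothetical stand-in for an OpenJML run
    assess: Callable[[str, str], Feedback],  # hypothetical stand-in for Spec-Harness
    max_steps: int = 12,
) -> Optional[str]:
    """Return the first spec that passes both checks, or None if the budget runs out."""
    feedback = ""
    for _ in range(max_steps):
        spec = propose(code, feedback)
        verdict = verify(code, spec)
        if not verdict.ok:
            # Verification failed: retry with the verifier's diagnostics.
            feedback = verdict.message
            continue
        quality = assess(code, spec)
        if quality.ok:
            return spec
        # Verified but judged incorrect or incomplete: retry with that feedback.
        feedback = quality.message
    return None
```

The point of the second check is that a verifier-accepted specification is not returned until it also passes the correctness and completeness assessment.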

For full details, see the paper.

🚀 Getting Started

💻 Prerequisites

Python >= 3.10
OpenJML — must be installed and available in PATH as openjml
Check the installation with: openjml --version
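
The two prerequisites above can also be checked from Python before launching a long run. This is a small sketch using only the standard library; the function name `check_prerequisites` is an assumption for illustration and is not part of the repository.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10), tool="openjml"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python >= {min_python[0]}.{min_python[1]} is required")
    if shutil.which(tool) is None:
        problems.append(f"'{tool}' was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```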

⏳ Install

git clone https://github.com/Mondego/VeriAct.git
cd VeriAct
uv sync --group all 
source .venv/bin/activate

🔑 API Keys

Create a .env file in the config directory with the keys for the models you intend to use:

OPENAI_API_KEY=...       # GPT-4o 
ANTHROPIC_API_KEY=...    # Claude models
GOOGLE_API_KEY=...       # Gemini models
DEEPSEEK_API_KEY=...     # DeepSeek API
MISTRAL_API_KEY=...      # Mistral API
VLLM_API_KEY=...         # Local vLLM server
VLLM_API_BASE=...        # e.g. http://localhost:8000/v1
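
To catch a missing or placeholder key before a run starts, the file can be parsed and checked with a few lines of Python. This is a sketch under stated assumptions: `parse_env` and `missing_keys` are hypothetical helpers, not repository functions, and the parser handles only the simple `KEY=value  # comment` layout shown above.

```python
import re
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks, comment lines, and trailing '# ...' notes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop an inline comment only when '#' follows whitespace.
        value = re.sub(r"\s+#.*$", "", value).strip()
        values[key.strip()] = value
    return values


def missing_keys(values, required):
    """Return the required keys that are absent, empty, or still the '...' placeholder."""
    return [key for key in required if values.get(key, "") in ("", "...")]


if __name__ == "__main__":
    env_file = Path("config/.env")  # location named in this README
    if env_file.exists():
        print(missing_keys(parse_env(env_file.read_text()), ["OPENAI_API_KEY"]))
```

Only the keys for the providers you actually use need to pass this check.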

▶️ Run Command

Run Daikon

Note: Install Daikon and make its required jars available on the PATH.

python -m baselines.daikon.run \
    --name <experiment_name> \
    --input <path/to/benchmark.json> \
    --output <output_dir> \
    --openjml_timeout 300 \
    --daikon_timeout 600 \
    --threads 4 \
    --verbose

Run Houdini

Note: Houdini requires Java version 1.6.0_21.

python -m baselines.houdini.run \
    --name <experiment_name> \
    --input <path/to/benchmark.json> \
    --output <output_dir> \
    --openjml_timeout 300 \
    --threads 4 \
    --verbose

Run SpecGen

python -m baselines.specgen.run \
    --name <experiment_name> \
    --input <path/to/benchmark.json> \
    --output <output_dir> \
    --model gpt-4o \
    --temperature 0.7 \
    --prompt_type zero_shot \
    --max_iterations 10 \
    --openjml_timeout 300 \
    --threads 4 \
    --verbose

Note: prompt_type: zero_shot | two_shot | four_shot
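
To compare the three prompt types, the SpecGen command above can be generated once per value. The sketch below reuses only the flags shown in this README; the experiment names and output directory layout are assumptions chosen for illustration, and nothing is executed unless `run=True` is passed.

```python
import subprocess
import sys

PROMPT_TYPES = ["zero_shot", "two_shot", "four_shot"]


def specgen_command(name, benchmark, output_dir, prompt_type, model="gpt-4o"):
    """Build the SpecGen invocation shown above as an argument list."""
    if prompt_type not in PROMPT_TYPES:
        raise ValueError(f"unknown prompt_type: {prompt_type}")
    return [
        sys.executable, "-m", "baselines.specgen.run",
        "--name", name,
        "--input", benchmark,
        "--output", output_dir,
        "--model", model,
        "--temperature", "0.7",
        "--prompt_type", prompt_type,
        "--max_iterations", "10",
        "--openjml_timeout", "300",
        "--threads", "4",
        "--verbose",
    ]


def sweep(benchmark, output_root, run=False):
    """Build one command per prompt type; execute them only when run=True."""
    commands = [
        specgen_command(f"specgen_{pt}", benchmark, f"{output_root}/{pt}", pt)
        for pt in PROMPT_TYPES
    ]
    if run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `sweep(<path/to/benchmark.json>, <output_dir>, run=True)` from the repository root, with the virtual environment active, would run the three configurations one after another.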

Run AutoSpec

python -m baselines.autospec.run \
    --name <experiment_name> \
    --input <path/to/benchmark.json> \
    --output <output_dir> \
    --model gpt-4o \
    --temperature 0.7 \
    --prompt_type zero_shot \
    --max_iterations 10 \
    --openjml_timeout 300 \
    --threads 4 \
    --verbose \
    --simplify

Note: prompt_type: zero_shot | two_shot | four_shot

Run FormalBench

python -m baselines.formalbench.run \
    --name <experiment_name> \
    --input <path/to/benchmark.json> \
    --output <output_dir> \
    --model gpt-4o \
    --temperature 0.7 \
    --prompt_type zero_shot \
    --max_iterations 5 \
    --openjml_timeout 300 \
    --threads 4 \
    --verbose

Note: prompt_type: zero_shot | two_shot | zs_cot | fs_cot | fs_ltm

Run Optimizer

python -m optimizer.prompt_optimizer \
    --formalbench_path benchmarks/formalbench/fb.json \
    --specgenbench_path benchmarks/specgenbench/sgb.json \
    --best_seed zero \
    --model openai/gpt-4o \
    --reflection_model openai/gpt-4o \
    --log_dir optimizer_logs \
    --output_dir optimizer_results \
    --openjml_output_dir openjml_output

Note: --best_seed: zero | cot | ltm

Run Spec-Harness

Run VeriAct

# Sequential (one task at a time)
python -m veriact.run_single \
    --benchmark benchmarks/specgenbench/sgb.json \
    --model gpt-4o \
    --output-dir veriact_output \
    --max-steps 12 \
    --planning_interval 3

# Parallel
python -m veriact.run_batch \
    --benchmark benchmarks/specgenbench/sgb.json \
    --model gpt-4o \
    --threads 4 \
    --output-dir veriact_output \
    --max-steps 12 \
    --planning_interval 3

📁 Repository Structure

├── baselines               # Implementations of all baselines
│   ├── houdini             # Classical, template-based approach
│   ├── daikon              # Classical, execution-trace-based approach
│   ├── specgen             # Prompt-based, spec mutation
│   ├── autospec            # Prompt-based, program decomposition and static analysis
│   └── formalbench         # Prompt-based, advanced prompting with repair guidance
├── benchmarks              # Normalized benchmark datasets
│   ├── specgenbench        # 120 tasks from LeetCode and JML examples
│   └── formalbench         # 662 tasks from FormalBench and MBJP
├── config                  # API keys in .env file
├── optimizer               # Prompt optimizer implementation
├── spec_harness            # Spec-Harness implementation
├── veriact                 # VeriAct agent implementation
├── scripts                 # CLI run scripts for all baselines and VeriAct
├── pyproject.toml
└── README.md

📝 Citation

@misc{mrhmisu/veriact2026,
      title={VeriAct: Beyond Verifiability -- Agentic Synthesis of Correct and Complete Formal Specifications},
      author={Md Rakib Hossain Misu and Iris Ma and Cristina V. Lopes},
      year={2026},
      eprint={2604.00280},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/pdf/2604.00280},
}

📧 Contact

If you have any questions or find any issues, please contact us at mdrh@uci.edu

📄 License

This repository is licensed under the GNU General Public License v3.0.
