VeriAct: Beyond Verifiability


VeriAct: Beyond Verifiability -- Agentic Synthesis of Correct and Complete Formal Specifications

Md Rakib Hossain Misu*, Iris Ma, Cristina V. Lopes

✨ Overview

This repository contains the implementation of the methodology described in our paper:

VeriAct: Beyond Verifiability -- Agentic Synthesis of Correct and Complete Formal Specifications

💎 Key Components

  • baselines — implementation and execution scripts for the classical (Daikon, Houdini) and prompt-based (SpecGen, AutoSpec, FormalBench) baselines.
  • optimizer — uses structured OpenJML feedback to iteratively refine prompts.
  • spec_harness — evaluates correctness and completeness of verifier-accepted specifications beyond syntactic verification.
  • veriact — an agentic loop that combines code execution, OpenJML verification, and Spec-Harness feedback to synthesize formal specifications.
  • benchmarks — the two normalized benchmark datasets used across all experiments.

For full details, see the paper.
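At a high level, the agentic loop can be pictured as follows. This is a simplified sketch, not the repository's actual code: `propose_spec`, `run_openjml`, and `spec_harness_feedback` are illustrative stand-in names for the three kinds of feedback the loop combines.

```python
# Simplified sketch of the agentic specification-synthesis loop.
# All helper functions are illustrative stubs, NOT VeriAct's real API.

def propose_spec(code, feedback):
    """Stub: ask an LLM for a JML specification, given code and prior feedback."""
    return "//@ ensures \\result >= 0;"

def run_openjml(code, spec):
    """Stub: verify the annotated program; return (verified, error_messages)."""
    return True, []

def spec_harness_feedback(code, spec):
    """Stub: assess correctness/completeness beyond verifier acceptance."""
    return {"correct": True, "complete": True}

def synthesize(code, max_steps=12):
    feedback = None
    for _ in range(max_steps):
        spec = propose_spec(code, feedback)
        verified, errors = run_openjml(code, spec)
        if not verified:
            feedback = errors          # refine using verifier diagnostics
            continue
        report = spec_harness_feedback(code, spec)
        if report["correct"] and report["complete"]:
            return spec                # accepted beyond mere verifiability
        feedback = report              # refine toward completeness
    return None                        # budget exhausted
```

The key point of the design is the second check: a specification that OpenJML accepts is only returned once the harness also judges it correct and complete.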

🚀 Getting Started

💻 Prerequisites

  • Python >= 3.10
  • uv — used by the install step below to sync dependencies
  • OpenJML — must be installed and available in PATH as openjml (verify with openjml --version)
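If you prefer to verify the OpenJML installation from Python, a small stdlib-only sanity check (the helper below is hypothetical, not part of the repository):

```python
import shutil
import subprocess

def check_openjml():
    """Return the installed OpenJML version string, or None if not on PATH."""
    if shutil.which("openjml") is None:
        return None
    # Mirrors the `openjml --version` check suggested above.
    out = subprocess.run(["openjml", "--version"],
                         capture_output=True, text=True, timeout=60)
    return out.stdout.strip()

version = check_openjml()
print(version if version else "openjml not found on PATH -- install it first")
```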

⏳ Install

git clone https://github.com/Mondego/VeriAct.git
cd VeriAct
uv sync --group all 
source .venv/bin/activate

🔑 API Keys

Create a `.env` file in `config/` with the keys for the models you intend to use:

# API keys
OPENAI_API_KEY=...       # GPT-4o 
ANTHROPIC_API_KEY=...    # Claude models
GOOGLE_API_KEY=...       # Gemini models
DEEPSEEK_API_KEY=...     # DeepSeek API
MISTRAL_API_KEY=...      # Mistral API
VLLM_API_KEY=...         # Local vLLM server
VLLM_API_BASE=...        # e.g. http://localhost:8000/v1


▶️ Run Command

Run Daikon

Note: Daikon must be installed, with its required JARs available on the PATH

python -m baselines.daikon.run \
    --name <experiment_name> \
    --input <path/to/benchmark.json> \
    --output <output_dir> \
    --openjml_timeout 300 \
    --daikon_timeout 600 \
    --threads 4 \
    --verbose
Run Houdini

Note: Houdini requires Java version 1.6.0_21

python -m baselines.houdini.run \
    --name <experiment_name> \
    --input <path/to/benchmark.json> \
    --output <output_dir> \
    --openjml_timeout 300 \
    --threads 4 \
    --verbose
Run SpecGen
python -m baselines.specgen.run \
    --name <experiment_name> \
    --input <path/to/benchmark.json> \
    --output <output_dir> \
    --model gpt-4o \
    --temperature 0.7 \
    --prompt_type zero_shot \
    --max_iterations 10 \
    --openjml_timeout 300 \
    --threads 4 \
    --verbose

Note: prompt_type: zero_shot | two_shot | four_shot

Run AutoSpec
python -m baselines.autospec.run \
    --name <experiment_name> \
    --input <path/to/benchmark.json> \
    --output <output_dir> \
    --model gpt-4o \
    --temperature 0.7 \
    --prompt_type zero_shot \
    --max_iterations 10 \
    --openjml_timeout 300 \
    --threads 4 \
    --verbose \
    --simplify

Note: prompt_type: zero_shot | two_shot | four_shot

Run FormalBench
python -m baselines.formalbench.run \
    --name <experiment_name> \
    --input <path/to/benchmark.json> \
    --output <output_dir> \
    --model gpt-4o \
    --temperature 0.7 \
    --prompt_type zero_shot \
    --max_iterations 5 \
    --openjml_timeout 300 \
    --threads 4 \
    --verbose

Note: prompt_type: zero_shot | two_shot | zs_cot | fs_cot | fs_ltm

Run Optimizer
python -m optimizer.prompt_optimizer \
    --formalbench_path benchmarks/formalbench/fb.json \
    --specgenbench_path benchmarks/specgenbench/sgb.json \
    --best_seed zero \
    --model openai/gpt-4o \
    --reflection_model openai/gpt-4o \
    --log_dir optimizer_logs \
    --output_dir optimizer_results \
    --openjml_output_dir openjml_output

Note: --best_seed: zero | cot | ltm

Run Spec-Harness

Note: see the `spec_harness` directory for the harness implementation

Run VeriAct
# Sequential (one task at a time)
python -m veriact.run_single \
    --benchmark benchmarks/specgenbench/sgb.json \
    --model gpt-4o \
    --output-dir veriact_output \
    --max-steps 12 \
    --planning_interval 3

# Parallel
python -m veriact.run_batch \
    --benchmark benchmarks/specgenbench/sgb.json \
    --model gpt-4o \
    --threads 4 \
    --output-dir veriact_output \
    --max-steps 12 \
    --planning_interval 3


📁 Repository Structure

├── baselines               # All baseline implementations
│   ├── houdini             # Classic, template-based approach
│   ├── daikon              # Classic, execution-trace-based approach
│   ├── specgen             # Prompt-based, spec mutation
│   ├── autospec            # Prompt-based, program decomposition and static analysis
│   └── formalbench         # Prompt-based, advanced prompting with repair guidance
├── benchmarks              # Normalized benchmark datasets
│   ├── specgenbench        # 120 tasks from LeetCode and JML examples
│   └── formalbench         # 662 tasks from FormalBench and MBJP
├── config                  # API keys in .env file
├── optimizer               # Prompt optimizer implementation
├── spec_harness            # Spec-Harness implementation
├── veriact                 # VeriAct agent implementation
├── scripts                 # CLI run scripts for all baselines and VeriAct
├── pyproject.toml
└── README.md

📝 Citation

@misc{mrhmisu/veriact2026,
      title={VeriAct: Beyond Verifiability -- Agentic Synthesis of Correct and Complete Formal Specifications},
      author={Md Rakib Hossain Misu and Iris Ma and Cristina V. Lopes},
      year={2026},
      eprint={2604.00280},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/pdf/2604.00280},
}

📧 Contact

If you have any questions or find any issues, please contact us at mdrh@uci.edu.

📄 License

This repository is licensed under the GNU General Public License v3.0.
