A comprehensive toolkit for automated bug localization and repair, featuring enhanced MultiSWE-Bench evaluation capabilities.
- Localization: Automatically identify bug locations in code
- Retrieval: Find relevant code context for bug repair
- Repair: Generate patches to fix identified bugs
- Flexible Model Support: Use any LLM model via custom API endpoints
- Local Dataset Support: Work with local SWE-bench datasets
- Prediction Format Converter: Convert model predictions to evaluation format
- Comprehensive Testing: Support for multiple programming languages
- Flexible Configuration: Easy-to-use configuration system
Agentless universe/
├── Agentless/                  # Main Agentless framework
│   ├── agentless/
│   │   ├── fl/                 # Fault localization
│   │   ├── repair/             # Bug repair
│   │   ├── test/               # Testing utilities
│   │   └── util/               # Utilities
│   └── classification/         # Classification tools
├── multi-swe-bench/            # MultiSWE-Bench evaluation
│   ├── multi_swe_bench/
│   │   ├── harness/            # Evaluation harness
│   │   ├── collect/            # Data collection
│   │   └── utils/              # Utilities
│   └── docs/                   # Documentation
└── convert_preds.py            # Prediction format converter
- Python 3.8+
- Git
- Conda (recommended)
This project uses two separate environments for different purposes:
# Create conda environment for Agentless
conda create -n agentless python=3.10
conda activate agentless
# Install Agentless dependencies
pip install -r Agentless/requirements.txt
# Create conda environment for MultiSWE-Bench
conda create -n multiswebench python=3.10
conda activate multiswebench
# Install MultiSWE-Bench dependencies
pip install -r multi-swe-bench/requirements.txt
Environment: Use the agentless environment
# Activate Agentless environment
conda activate agentless
cd Agentless
# Set up environment
export PYTHONPATH=$PYTHONPATH:$(pwd)
export OPENAI_API_KEY="your-api-key-here"
# Create results directory
mkdir -p results
The Agentless framework follows a 3-stage localization process, followed by repair and validation:
Step 1.1: LLM-based File Localization
python agentless/fl/localize.py --file_level \
--output_folder results/file_level \
--local_dataset /path/to/local_dataset.jsonl \
--model your-model-name \
--backend openai \
--num_threads 10 \
--skip_existing
Step 1.2: Identify Irrelevant Folders
python agentless/fl/localize.py --file_level \
--irrelevant \
--output_folder results/file_level_irrelevant \
--local_dataset /path/to/local_dataset.jsonl \
--model your-model-name \
--backend openai \
--num_threads 10 \
--skip_existing
Step 1.3: Embedding-based Retrieval
python agentless/fl/retrieve.py --index_type simple \
--filter_type given_files \
--filter_file results/file_level_irrelevant/loc_outputs.jsonl \
--output_folder results/retrieval_embedding \
--persist_dir embedding/swe-bench_simple \
--local_dataset /path/to/local_dataset.jsonl \
--model your-model-name \
--backend openai \
--num_threads 10
Step 1.4: Combine LLM and Retrieval Results
python agentless/fl/combine.py --retrieval_loc_file results/retrieval_embedding/retrieve_locs.jsonl \
--model_loc_file results/file_level/loc_outputs.jsonl \
--top_n 3 \
--output_folder results/file_level_combined
Step 2: Localize Related Elements
python agentless/fl/localize.py --related_level \
--output_folder results/related_elements \
--top_n 3 \
--compress_assign \
--compress \
--start_file results/file_level_combined/combined_locs.jsonl \
--local_dataset /path/to/local_dataset.jsonl \
--model your-model-name \
--backend openai \
--num_threads 10 \
--skip_existing
Step 3.1: Generate Edit Location Samples
python agentless/fl/localize.py --fine_grain_line_level \
--output_folder results/edit_location_samples \
--top_n 3 \
--compress \
--temperature 0.8 \
--num_samples 4 \
--start_file results/related_elements/loc_outputs.jsonl \
--local_dataset /path/to/local_dataset.jsonl \
--model your-model-name \
--backend openai \
--num_threads 10 \
--skip_existing
Step 3.2: Separate Individual Edit Location Sets
python agentless/fl/localize.py --merge \
--output_folder results/edit_location_individual \
--top_n 3 \
--num_samples 4 \
--start_file results/edit_location_samples/loc_outputs.jsonl
Step 4: Repair
Generate patches using the edit locations:
python agentless/repair/repair.py --loc_file results/edit_location_individual/loc_merged_0-0_outputs.jsonl \
--output_folder results/repair_sample_1 \
--local_dataset /path/to/local_dataset.jsonl \
--model your-model-name \
--backend openai \
--loc_interval \
--top_n=3 \
--context_window=10 \
--max_samples 10 \
--cot \
--diff_format \
--gen_and_process \
--num_threads 2
Repeat for all 4 edit location sets:
# For samples 1-4
for i in {1..4}; do
python agentless/repair/repair.py --loc_file results/edit_location_individual/loc_merged_$((i-1))-$((i-1))_outputs.jsonl \
--output_folder results/repair_sample_$i \
--local_dataset /path/to/local_dataset.jsonl \
--model your-model-name \
--backend openai \
--loc_interval \
--top_n=3 \
--context_window=10 \
--max_samples 10 \
--cot \
--diff_format \
--gen_and_process \
--num_threads 2
done
Step 5.1: Generate Regression Tests
python agentless/test/run_regression_tests.py --run_id generate_regression_tests \
--output_file results/passing_tests.jsonl
Step 5.2: Select Regression Tests
python agentless/test/select_regression_tests.py --passing_tests results/passing_tests.jsonl \
--output_folder results/select_regression
Step 5.3: Run Regression Tests on Patches
folder=results/repair_sample_1
for num in {0..9..1}; do
run_id_prefix=$(basename $folder)
python agentless/test/run_regression_tests.py --regression_tests results/select_regression/output.jsonl \
--predictions_path="${folder}/output_${num}_processed.jsonl" \
--run_id="${run_id_prefix}_regression_${num}" \
--num_workers 10
done
Step 5.4: Generate Reproduction Tests
python agentless/test/generate_reproduction_tests.py --max_samples 40 \
--output_folder results/reproduction_test_samples \
--local_dataset /path/to/local_dataset.jsonl \
--model your-model-name \
--backend openai \
--num_threads 10
Step 5.5: Execute Reproduction Tests
for st in {0..36..4}; do
en=$((st + 3))
echo "Processing ${st} to ${en}"
for num in $(seq $st $en); do
echo "Processing ${num}"
python agentless/test/run_reproduction_tests.py --run_id="reproduction_test_generation_filter_sample_${num}" \
--test_jsonl="results/reproduction_test_samples/output_${num}_processed_reproduction_test.jsonl" \
--num_workers 6 \
--testing
done &
done
Step 5.6: Select Final Reproduction Tests
python agentless/test/generate_reproduction_tests.py --max_samples 40 \
--output_folder results/reproduction_test_samples \
--output_file reproduction_tests.jsonl \
--select
Step 5.7: Evaluate Patches on Reproduction Tests
folder=results/repair_sample_1
for num in {0..9..1}; do
run_id_prefix=$(basename $folder)
python agentless/test/run_reproduction_tests.py --test_jsonl results/reproduction_test_samples/reproduction_tests.jsonl \
--predictions_path="${folder}/output_${num}_processed.jsonl" \
--run_id="${run_id_prefix}_reproduction_${num}" \
--num_workers 10
done
Step 5.8: Final Patch Selection
python agentless/repair/rerank.py --patch_folder results/repair_sample_1/,results/repair_sample_2/,results/repair_sample_3/,results/repair_sample_4/ \
--num_samples 40 \
--deduplicate \
--regression \
--reproduction
For testing with a single target:
# Localization
python agentless/fl/localize.py --output_folder results/localization \
--local_dataset /path/to/local_dataset.jsonl \
--model your-model-name \
--backend openai \
--target_id your-target-id
# Retrieval
python agentless/fl/retrieve.py --output_folder results/retrieval \
--local_dataset /path/to/local_dataset.jsonl \
--model your-model-name \
--backend openai
# Repair
python agentless/repair/repair.py --loc_file results/localization/loc_outputs.jsonl \
--output_folder results/repair \
--local_dataset /path/to/local_dataset.jsonl \
--model your-model-name \
--backend openai \
--target_id your-target-id
Environment: Use the multiswebench environment
# Activate MultiSWE-Bench environment
conda activate multiswebench
# This can be run in either environment
python convert_preds.py input_predictions.jsonl output_patches.jsonl
cd multi-swe-bench
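As an illustrative sketch only (not the shipped `convert_preds.py`; the field names `instance_id`, `model_patch`, and `model_name_or_path` follow the common SWE-bench prediction convention and are assumptions here), a converter of this kind reads prediction JSONL and re-emits evaluation-format records:

```python
import json

def convert_preds(input_path: str, output_path: str) -> int:
    """Re-emit prediction JSONL as evaluation-format JSONL.

    Assumes each input line carries at least `instance_id` and
    `model_patch`; field names are illustrative, not authoritative.
    Returns the number of records written.
    """
    count = 0
    with open(input_path) as fin, open(output_path, "w") as fout:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines
            pred = json.loads(line)
            record = {
                "instance_id": pred["instance_id"],
                "model_patch": pred["model_patch"],
                "model_name_or_path": pred.get("model_name_or_path", "unknown"),
            }
            fout.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Consult the actual script's source for the fields it expects and emits.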
python -m multi_swe_bench.harness.run_evaluation \
--config config_example.json
This project uses two separate conda environments to avoid dependency conflicts:
- agentless: For running Agentless bug repair tasks
- multiswebench: For running MultiSWE-Bench evaluation tasks
# For Agentless tasks
conda activate agentless
# For MultiSWE-Bench evaluation
conda activate multiswebench
All scripts support flexible model configuration:
- Model: Any model name supported by your backend
- Backend: openai, deepseek, anthropic, or custom endpoints
- API Endpoints: Custom base URLs and API keys
export OPENAI_API_BASE="https://your-api-endpoint.com/v1"
export OPENAI_API_KEY="your-api-key"
- SWE-bench: Standard SWE-bench dataset format
- Predictions: Model prediction format with instance_id and model_patch
- Local Datasets: JSONL format for local dataset files
- Patches: Git diff format patches
- Evaluations: JSON reports with success/failure metrics
- Logs: Detailed execution logs
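A local dataset file holds one JSON object per line. The sketch below is illustrative: `instance_id` is the field this README names, while `repo`, `base_commit`, and `problem_statement` follow the usual SWE-bench convention and may differ in your dataset:

```python
import json

# Illustrative record: field names beyond instance_id are assumptions
# based on the SWE-bench dataset convention.
record = {
    "instance_id": "example__project-123",
    "repo": "example/project",
    "base_commit": "abc1234",
    "problem_statement": "Calling foo() with an empty list raises TypeError ...",
}

# Write one JSONL line, then read the dataset back.
with open("local_dataset.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")

with open("local_dataset.jsonl") as f:
    instances = [json.loads(line) for line in f if line.strip()]
```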
- ✅ No Model Restrictions: Use any LLM model
- ✅ Custom API Support: Support for custom API endpoints
- ✅ Local Dataset Support: Work with local datasets
- ✅ Flexible Backend: Support for multiple API providers
- ✅ Standalone Converter: Independent prediction format converter
- ✅ Comprehensive Documentation: Clear usage instructions
- ✅ Example Configurations: Ready-to-use configuration files
To measure the cost of running Agentless, use the provided cost analysis utility:
# Calculate cost for any step's output
python dev/util/cost.py --output_file results/step_name/output.jsonl
# Include embedding costs
python dev/util/cost.py --output_file results/step_name/output.jsonl --embedding_cost
This will output the dollar cost and token usage for each step.
- loc_outputs.jsonl: Contains localization results with file paths and edit locations
- output.jsonl: Contains generated patches and repair trajectories
- all_preds.jsonl: Final selected patches ready for evaluation
- *_test_results.jsonl: Test execution results for validation
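A quick sanity check over these JSONL outputs can be sketched as follows (assuming only the `instance_id` and `model_patch` fields described above):

```python
import json

def summarize_preds(path: str) -> dict:
    """Count predictions in a JSONL file and how many carry a non-empty patch."""
    total = non_empty = 0
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            pred = json.loads(line)
            total += 1
            if pred.get("model_patch", "").strip():
                non_empty += 1
    return {"total": total, "non_empty_patches": non_empty}
```

Running this on all_preds.jsonl before evaluation catches empty or missing patches early.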
results/
├── file_level/                # Stage 1.1: LLM file localization
├── file_level_irrelevant/     # Stage 1.2: Irrelevant folder identification
├── retrieval_embedding/       # Stage 1.3: Embedding-based retrieval
├── file_level_combined/       # Stage 1.4: Combined file locations
├── related_elements/          # Stage 2: Related element localization
├── edit_location_samples/     # Stage 3.1: Edit location samples
├── edit_location_individual/  # Stage 3.2: Individual edit location sets
├── repair_sample_1-4/         # Stage 4: Repair results (4 samples)
├── passing_tests.jsonl        # Stage 5.1: Generated regression tests
├── select_regression/         # Stage 5.2: Selected regression tests
├── reproduction_test_samples/ # Stage 5.4-5.6: Reproduction test generation
└── all_preds.jsonl            # Final output: Selected patches
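To see at a glance which pipeline stages have produced output, a small convenience helper (directory names taken from the layout above; this is a sketch, not part of the toolkit) can be used:

```python
import os

# Stage directories from the results layout above.
STAGE_DIRS = [
    "file_level",
    "file_level_irrelevant",
    "retrieval_embedding",
    "file_level_combined",
    "related_elements",
    "edit_location_samples",
    "edit_location_individual",
]

def missing_stages(results_dir: str = "results") -> list:
    """Return the stage directories not yet present under results_dir."""
    return [d for d in STAGE_DIRS
            if not os.path.isdir(os.path.join(results_dir, d))]
```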
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Original Agentless framework
- MultiSWE-Bench evaluation framework
- SWE-bench dataset contributors