```shell
pip install -e .
```

We provide an example input file `data/examples/dataset.jsonl` and a run config `data/examples/config.json` to help you quickly walk through the launch process.
Before getting started, please set your `TAVILY_API_KEY` environment variable. We use Tavily to provide search-engine support for the LLM.
```shell
export TAVILY_API_KEY=...
```

We use LiteLLM for maximum LLM API compatibility and to enable custom API deployments for agentic training. Export your LLM API key, e.g. `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`:
```shell
export OPENAI_API_KEY=...
```

If your LLM provider uses user-identity login for API access, modify `launch/launch/utilities/llm.py` accordingly.
Start the repo launch process:

```shell
launch data/examples/config.json
# equivalently: python -m launch.run --config-path data/examples/config.json
```

For help running RepoLaunch on Windows, see `Development-Windows.md`.
For the input data used to set up the environment, we require the following fields:
| Field | Description |
|---|---|
| `instance_id` | Unique identifier of the instance |
| `repo` | Full name of the repository, like `{user_name}/{project_name}` |
| `base_commit` | Commit to check out |
| `language` | Main language of the repo |
| `created_at` | (Optional) Creation time of the instance; used to support time-aware environment setup, especially useful for Python |
| `hints` | (Optional) Any hints you want to give the agent for setting up the repo, such as GitHub check-run info |
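For reference, a single line of the input JSONL might look like the record below. This is a hypothetical example: the repo name, commit hash, and hint text are placeholders, not values from the bundled example dataset.

```python
import json

# A hypothetical input record; repo name and commit hash are placeholders.
record = {
    "instance_id": "demo__project-1",
    "repo": "demo-user/demo-project",
    "base_commit": "0123456789abcdef0123456789abcdef01234567",
    "language": "python",
    # Optional fields:
    "created_at": "2023-01-15T00:00:00Z",
    "hints": "Tests run with pytest; see .github/workflows/ci.yml",
}

# Each line of dataset.jsonl is one such JSON object.
line = json.dumps(record)
print(line)
```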
RepoLaunch is a two-step process. The first step sets up the repo: installing dependencies, building the repo, and finding test cases to verify the build. The following configs are required.
| Field | Type | Description |
|---|---|---|
| `print_to_console` | boolean | Whether to print logs to the console |
| `model_config` | dict | All arguments for LiteLLM response completion, e.g. `{"model": "openai/gpt-5.4", ...}`. The `model` field should follow the formats in the LiteLLM documentation, usually `{provider_name}/{model_name}`. Other completion arguments such as `base_url`, `temperature`, and `top_p` also go here. |
| `workspace_root` | string | Workspace folder for one run |
| `dataset` | string | Path to the dataset file |
| `instance_id` | string | Specific instance ID to run; `null` to run all instances in the dataset |
| `first_N_repos` | integer | Limit processing to the first N repos (`-1` for all repos) |
| `max_workers` | integer | Number of parallel workers for processing |
| `overwrite` | boolean | Whether to overwrite existing results (`false` skips existing repos) |
| `os` | string | Which Docker image OS to build on (default `linux`; can be `windows`) |
| `max_trials` | integer | How many rounds of the setup-verify loop the agent can attempt (default 1) |
| `max_steps_setup` | integer | How many steps the agent can attempt to set up the environment (default 20) |
| `max_steps_verify` | integer | How many steps the agent can attempt to verify the setup (default 20) |
| `cmd_timeout` | integer | Time limit in minutes for each of the LLM's shell commands (default 30). Suggested: 60 for Linux and 90 for Windows. |
| `image_prefix` | string | Prefix of the output image in the format `{namespace}/{dockerhub_repo}` (defaults to `repolaunch/dev`) |
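Putting these fields together, a first-step config might look like the sketch below. The model name, paths, and parameter values are illustrative, not required values; adjust them for your own run.

```python
import json

# An illustrative run config for the first (setup) step.
# Model name, paths, and values are examples, not required settings.
config = {
    "print_to_console": True,
    "model_config": {"model": "openai/gpt-5.4", "temperature": 0.0},
    "workspace_root": "data/test1",
    "dataset": "data/examples/dataset.jsonl",
    "instance_id": None,   # null -> run all instances in the dataset
    "first_N_repos": -1,   # -1 -> all repos
    "max_workers": 4,
    "overwrite": False,
    "os": "linux",
    "max_trials": 1,
    "max_steps_setup": 20,
    "max_steps_verify": 20,
    "cmd_timeout": 60,
    "image_prefix": "repolaunch/dev",
}

# json.dump converts Python None/True/False to JSON null/true/false.
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```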
RepoLaunch also provides an optional second step to:
- Organize the commands to rebuild the repo after edits to the source code;
- Organize the commands to test the repo with verbose testcase-status output, and write a Python script that parses that output into a clean testcase-status mapping in JSON format: `{"testcase1": "pass", "testcase2": "fail", "testcase3": "skip"}`;
- Make a best effort to find the command to run a single testcase separately.
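As a sketch of what such a parse script might look like, here is an illustrative parser for pytest-style verbose output. This is a hypothetical example of the script's shape, not the script the agent actually generates; real logs and parsers vary per repo and test framework.

```python
import json
import re

def parse_test_log(log: str) -> dict[str, str]:
    """Parse pytest-style verbose lines like
    'tests/test_foo.py::test_bar PASSED' into a testcase-status mapping."""
    status_map = {"PASSED": "pass", "FAILED": "fail", "SKIPPED": "skip", "ERROR": "fail"}
    result = {}
    for raw in log.splitlines():
        m = re.match(r"^(\S+::\S+)\s+(PASSED|FAILED|SKIPPED|ERROR)\b", raw.strip())
        if m:
            result[m.group(1)] = status_map[m.group(2)]
    return result

# Hypothetical verbose test output:
log = """\
tests/test_foo.py::test_add PASSED
tests/test_foo.py::test_div FAILED
tests/test_bar.py::test_io SKIPPED
"""
print(json.dumps(parse_test_log(log)))
```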
The configs required for this step:
| Field | Type | Description |
|---|---|---|
| `mode` | dict | Defaults to `{"setup": true, "organize": false}`. Set to `{"setup": true, "organize": true}` to run the two steps together, or to `{"setup": false, "organize": true}` to run the second step separately AFTER the first step is DONE. |
| `max_steps_organize` | integer | How many steps the agent can attempt to organize the commands (default 20) |
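The `mode` switch for an organize-only re-run can be sketched as follows, assuming the setup step has already completed in the same workspace (Python `True`/`False` serialize to JSON `true`/`false`):

```python
import json

# Sketch: flip `mode` to run only the organize step after setup is done.
# All other config fields stay as they were for the setup run.
config = {"mode": {"setup": True, "organize": False}}   # first-step default
config["mode"] = {"setup": False, "organize": True}     # organize-only re-run
config["max_steps_organize"] = 20
print(json.dumps(config["mode"]))
```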
The per-instance output will be saved in `{workspace_root}/playground/{instance_id}/result.json`.
LLM API logs (input/output/token count/cost) will be saved in `{workspace_root}/playground/{instance_id}/llm/`.
| Field | Description |
|---|---|
| `instance_id` | Unique identifier of the instance |
| `base_image` | Docker base image |
| `docker_image` | Committed image |
| `setup_commands` | Records of the shell commands used to set up the environment |
| `test_commands` | Records of the shell commands used to run the tests with verbose output |
| `duration` | Time taken to run the process (in minutes) |
| `completed` | Boolean indicating whether the execution completed successfully |
| `exception` | Error message, or `null` if no exception occurred |
A summary will be saved to `{workspace_root}/setup.jsonl`.
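For example, you can scan the summary file to list which instances completed successfully. The sketch below uses two in-memory sample lines in place of reading the real file; the instance IDs are hypothetical.

```python
import json

# Sketch: scanning setup.jsonl for successfully completed instances.
# Two sample summary lines stand in for the real file contents here.
sample_lines = [
    json.dumps({"instance_id": "demo-1", "completed": True, "exception": None}),
    json.dumps({"instance_id": "demo-2", "completed": False, "exception": "build failed"}),
]
# In practice, read the lines from {workspace_root}/setup.jsonl instead.
completed = [json.loads(l)["instance_id"] for l in sample_lines if json.loads(l)["completed"]]
print(completed)
```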
The `setup_commands` and `test_commands` from the first step can be noisy, containing failed commands and exploratory commands; this is why we designed the second step. The second step's output adds these fields:
| Field | Description |
|---|---|
| `organize_duration` | Time taken to run the process (in minutes) |
| `organize_completed` | Boolean indicating whether the organization attempt completed successfully |
| `rebuild_commands` | Minimal commands to rebuild the repo instance |
| `test_commands` | Clean test commands |
| `parse` | Python script to parse the test output into a testcase-status mapping |
| `test_status` | Parsed testcase-status mapping in JSON |
| `pertest_command` | Command to run a specified testcase; may not exist |
A summary will be saved to `{workspace_root}/organize.jsonl`.
Use the Python API to work with launched instances:

```python
from launch.api import LaunchedInstance
import json
from typing import Literal

# load an instance from organize.jsonl
with open("..../organize.jsonl") as f:
    instance_list = [json.loads(i) for i in f]
instance_dict = instance_list[0]

# Object-oriented API
instance: LaunchedInstance = LaunchedInstance(instance_dict, "linux")  # or "windows" for a Windows image

###### To use the test-log parser to get current test statuses ######
success, build_log = instance.build(verbose=False)
log: str = instance.test()
status: dict[str, Literal["pass", "fail", "skip"]] = instance.parse_test_log(log)
# Equivalently:
status = instance.build_test_parse(verbose=True)
print(status)
# {"testcase1": "pass", "testcase2": "fail", "testcase3": "skip"}
del instance  # to release the docker container

###### To evaluate the effect of a new diff patch ######
instance = LaunchedInstance(instance_dict, "linux")
# load your diff_patch
# for example, for a SWE-bench format instance:
diff_patch = instance_dict["test_patch"]
instance.apply_patch(diff_patch, verbose=True)
after_patch_status: dict[str, Literal["pass", "fail", "skip"]] = instance.build_test_parse(verbose=True)

###### Other utilities ######
# if you need to save the changes
success, log = instance.git_commit(your_message)
instance.commit_to_image(image_name="experiment", tag="1")
# to send a custom bash command into the docker container:
res = instance.container.send_command(your_command)
print(res.metadata.exit_code, res.output, sep="\n")
del instance  # to release the docker container
```

To collect results across the workspace and upload the Docker images:

```shell
python -m launch.scripts.collect \
    --workspace data/test1 --step setup  # or organize
docker login
python -m launch.scripts.upload_docker \
    --dataset data/test1/organize.jsonl \
    --clear_after_push 0  # 0 for false and 1 for true
```