```shell
pip install -e .
```

We provide an example input file `data/examples/dataset.jsonl` and a run config `data/examples/config.json` to help you quickly walk through the launch process.
Before getting started, please set your `TAVILY_API_KEY` environment variable. We use Tavily to provide search-engine support for the LLM.
```shell
export TAVILY_API_KEY=...
```

We use LiteLLM for maximum LLM API compatibility and to enable custom API deployments for agentic training. Export your LLM API key, e.g. `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`:
```shell
export OPENAI_API_KEY=...
```

If your LLM provider uses user-identity login for API access, modify `launch/launch/utilities/llm.py` accordingly.
Start the repo launch process:

```shell
launch data/examples/config.json
# equivalently: python -m launch.run --config-path data/examples/config.json
```

For help running RepoLaunch on Windows, see `Development-Windows.md`.
For the input data used to set up the environment, we require the following fields:
| Field | Description |
|---|---|
| `instance_id` | Unique identifier of the instance |
| `repo` | Full name of the repository, like `{user_name}/{project_name}` |
| `base_commit` | Commit to check out |
| `language` | Main language of the repo |
| `created_at` | (Optional) Creation time of the instance; used to support time-aware environment setup, especially useful for Python |
| `hints` | (Optional) Any hints you want to give the agent for setting up the repo, such as GitHub check-run info |
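For reference, a single line of the input JSONL might look like the record below. This is a hypothetical example: the repo name, commit hash, and hint text are placeholders, not values from the bundled example dataset.

```python
import json

# A hypothetical input record; repo name and commit hash are placeholders.
record = {
    "instance_id": "demo__project-1",
    "repo": "demo-user/demo-project",
    "base_commit": "0123456789abcdef0123456789abcdef01234567",
    "language": "python",
    # Optional fields:
    "created_at": "2023-01-15T00:00:00Z",
    "hints": "Tests run with pytest; see .github/workflows/ci.yml",
}

# Each line of dataset.jsonl is one such JSON object.
line = json.dumps(record)
print(line)
```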
RepoLaunch is a two-step process. The first step sets up the repo: installing dependencies, building the repo, and finding test cases to verify the build. The following configs are required.
| Field | Type | Description |
|---|---|---|
| `print_to_console` | boolean | Whether to print logs to the console |
| `model_config` | dict | All arguments for LiteLLM response completion, e.g. `{"model": "openai/gpt-5.4", ...}`. The `model` field should follow the formats in the LiteLLM documentation, usually `{provider_name}/{model_name}`. Other completion arguments such as `base_url`, `temperature`, and `top_p` also go here. |
| `workspace_root` | string | Workspace folder for one run |
| `dataset` | string | Path to the dataset file |
| `instance_id` | string | Specific instance ID to run; `null` to run all instances in the dataset |
| `first_N_repos` | integer | Limit processing to the first N repos (`-1` for all repos) |
| `max_workers` | integer | Number of parallel workers for processing |
| `overwrite` | boolean | Whether to overwrite existing results (`false` skips existing repos) |
| `os` | string | Which Docker image OS to build on (default `linux`; can be `windows`) |
| `max_trials` | integer | How many rounds of the setup-verify loop the agent can attempt (default 1) |
| `max_steps_setup` | integer | How many steps the agent can attempt to set up the environment (default 20) |
| `max_steps_verify` | integer | How many steps the agent can attempt to verify the setup (default 20) |
| `cmd_timeout` | integer | Time limit in minutes for each of the LLM's shell commands (default 30). Suggested: 60 for Linux and 90 for Windows. |
| `image_prefix` | string | Prefix of the output image in the format `{namespace}/{dockerhub_repo}` (defaults to `repolaunch/dev`) |
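Putting these fields together, a first-step config might look like the sketch below. The model name, paths, and parameter values are illustrative, not required values; adjust them for your own run.

```python
import json

# An illustrative run config for the first (setup) step.
# Model name, paths, and values are examples, not required settings.
config = {
    "print_to_console": True,
    "model_config": {"model": "openai/gpt-5.4", "temperature": 0.0},
    "workspace_root": "data/test1",
    "dataset": "data/examples/dataset.jsonl",
    "instance_id": None,   # null -> run all instances in the dataset
    "first_N_repos": -1,   # -1 -> all repos
    "max_workers": 4,
    "overwrite": False,
    "os": "linux",
    "max_trials": 1,
    "max_steps_setup": 20,
    "max_steps_verify": 20,
    "cmd_timeout": 60,
    "image_prefix": "repolaunch/dev",
}

# json.dump converts Python None/True/False to JSON null/true/false.
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```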
RepoLaunch also provides an optional second step to:
- Organize the commands to rebuild the repo after edits to the source code;
- Organize the commands to test the repo with verbose testcase-status output, and write a Python script that parses that output into a clean testcase-status mapping in JSON format: `{"testcase1": "pass", "testcase2": "fail", "testcase3": "skip"}`;
- Make a best effort to find the command to run a single testcase separately.
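As a sketch of what such a parse script might look like, here is an illustrative parser for pytest-style verbose output. This is a hypothetical example of the script's shape, not the script the agent actually generates; real logs and parsers vary per repo and test framework.

```python
import json
import re

def parse_test_log(log: str) -> dict[str, str]:
    """Parse pytest-style verbose lines like
    'tests/test_foo.py::test_bar PASSED' into a testcase-status mapping."""
    status_map = {"PASSED": "pass", "FAILED": "fail", "SKIPPED": "skip", "ERROR": "fail"}
    result = {}
    for raw in log.splitlines():
        m = re.match(r"^(\S+::\S+)\s+(PASSED|FAILED|SKIPPED|ERROR)\b", raw.strip())
        if m:
            result[m.group(1)] = status_map[m.group(2)]
    return result

# Hypothetical verbose test output:
log = """\
tests/test_foo.py::test_add PASSED
tests/test_foo.py::test_div FAILED
tests/test_bar.py::test_io SKIPPED
"""
print(json.dumps(parse_test_log(log)))
```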
The configs required for this step:
| Field | Type | Description |
|---|---|---|
| `mode` | dict | Defaults to `{"setup": true, "organize": false}`. Set to `{"setup": true, "organize": true}` to run the two steps together, or to `{"setup": false, "organize": true}` to run the second step separately AFTER the first step is DONE. |
| `max_steps_organize` | integer | How many steps the agent can attempt to organize the commands (default 20) |
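The `mode` switch for an organize-only re-run can be sketched as follows, assuming the setup step has already completed in the same workspace (Python `True`/`False` serialize to JSON `true`/`false`):

```python
import json

# Sketch: flip `mode` to run only the organize step after setup is done.
# All other config fields stay as they were for the setup run.
config = {"mode": {"setup": True, "organize": False}}   # first-step default
config["mode"] = {"setup": False, "organize": True}     # organize-only re-run
config["max_steps_organize"] = 20
print(json.dumps(config["mode"]))
```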
The per-instance output will be saved in `{workspace_root}/playground/{instance_id}/result.json`.
LLM API logs (input/output/token count/cost) will be saved in `{workspace_root}/playground/{instance_id}/llm/`.
| Field | Description |
|---|---|
| `instance_id` | Unique identifier of the instance |
| `base_image` | Docker base image |
| `docker_image` | Committed image |
| `setup_commands` | Records of the shell commands used to set up the environment |
| `test_commands` | Records of the shell commands used to run the tests with verbose output |
| `duration` | Time taken to run the process (in minutes) |
| `completed` | Boolean indicating whether the execution completed successfully |
| `exception` | Error message, or `null` if no exception occurred |
A summary will be saved to `{workspace_root}/setup.jsonl`.
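For example, you can scan the summary file to list which instances completed successfully. The sketch below uses two in-memory sample lines in place of reading the real file; the instance IDs are hypothetical.

```python
import json

# Sketch: scanning setup.jsonl for successfully completed instances.
# Two sample summary lines stand in for the real file contents here.
sample_lines = [
    json.dumps({"instance_id": "demo-1", "completed": True, "exception": None}),
    json.dumps({"instance_id": "demo-2", "completed": False, "exception": "build failed"}),
]
# In practice, read the lines from {workspace_root}/setup.jsonl instead.
completed = [json.loads(l)["instance_id"] for l in sample_lines if json.loads(l)["completed"]]
print(completed)
```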
The `setup_commands` and `test_commands` from the first step can be noisy, containing failed commands and exploratory commands; this is why we designed the second step. The second step's output adds these fields:
| Field | Description |
|---|---|
| `organize_duration` | Time taken to run the process (in minutes) |
| `organize_completed` | Boolean indicating whether the organization attempt completed successfully |
| `rebuild_commands` | Minimal commands to rebuild the repo instance |
| `test_commands` | Clean test commands |
| `parse` | Python script to parse the test output into a testcase-status mapping |
| `test_status` | Parsed testcase-status mapping in JSON |
| `pertest_command` | Command to run a specified testcase; may not exist |
A summary will be saved to `{workspace_root}/organize.jsonl`.
Use the Python API to work with launched instances:

```python
from launch.api import LaunchedInstance
import json
from typing import Literal

# load an instance from organize.jsonl
with open("..../organize.jsonl") as f:
    instance_list = [json.loads(i) for i in f]
instance_dict = instance_list[0]

# Object-oriented API
instance: LaunchedInstance = LaunchedInstance(instance_dict, "linux")  # or "windows" for a Windows image

###### To use the test-log parser to get current test statuses ######
success, build_log = instance.build(verbose=False)
log: str = instance.test()
status: dict[str, Literal["pass", "fail", "skip"]] = instance.parse_test_log(log)
# Equivalently:
status = instance.build_test_parse(verbose=True)
print(status)
# {"testcase1": "pass", "testcase2": "fail", "testcase3": "skip"}
del instance  # to release the docker container

###### To evaluate the effect of a new diff patch ######
instance = LaunchedInstance(instance_dict, "linux")
# load your diff_patch
# for example, for a SWE-bench format instance:
diff_patch = instance_dict["test_patch"]
instance.apply_patch(diff_patch, verbose=True)
after_patch_status: dict[str, Literal["pass", "fail", "skip"]] = instance.build_test_parse(verbose=True)

###### Other utilities ######
# if you need to save the changes
success, log = instance.git_commit(your_message)
instance.commit_to_image(image_name="experiment", tag="1")
# to send a custom bash command into the docker container:
res = instance.container.send_command(your_command)
print(res.metadata.exit_code, res.output, sep="\n")
del instance  # to release the docker container
```

To collect results across the workspace and upload the Docker images:

```shell
python -m launch.scripts.collect \
    --workspace data/test1 --step setup  # or organize
docker login
python -m launch.scripts.upload_docker \
    --dataset data/test1/organize.jsonl \
    --clear_after_push 0  # 0 for false and 1 for true
```