Support Claude Code as LitAgent #332
base: main
Pull request overview
This PR adds support for using Claude Code as a LitAgent, enabling trace collection from agent execution on SWE-bench tasks for training purposes. The implementation includes a Docker-based runtime for executing Claude Code in containers, utilities for evaluation and logging, and integration with the LiteLLM proxy so that a variety of LLM servers can be used as the backend.
Key Changes:
- New `CodingAgent` class that implements the LitAgent interface for SWE-bench task solving
- Docker runtime management with support for both Linux and Windows containers
- Custom adapters and callbacks for extracting token IDs and logprobs from LLM responses
- SWE-bench evaluation harness integration
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| examples/cc/README.md | Documentation for setting up and running Claude Code agent |
| examples/cc/cc_agent.py | Main agent implementation with rollout execution and dataset handling |
| examples/cc/swe_debug.jsonl | Sample dataset with 5 SWE-bench instances for testing |
| examples/cc/utils/claude_code_controller.py | Controller for managing Claude Code execution in Docker containers |
| examples/cc/utils/custom_adapter.py | Adapter for converting LLM proxy traces to augmented triplets with logprobs |
| examples/cc/utils/custom_callbacks.py | Custom LiteLLM callback to request logprobs from vLLM |
| examples/cc/utils/docker_runtime.py | Docker container runtime with persistent bash sessions and file operations |
| examples/cc/utils/evaluation.py | SWE-bench evaluation utilities for running and grading predictions |
| examples/cc/utils/logger.py | Simple logging utility for recording container execution logs |
| examples/cc/utils/handle_hook.template.sh | Shell script template for handling Claude Code hooks |
| examples/cc/utils/settings.template.json | Configuration template for Claude Code hook settings |
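The logprob callback and adapter rows above come down to two steps: asking the OpenAI-compatible endpoint for per-token logprobs, then reading them back out of the response. A minimal sketch, assuming the standard OpenAI chat-completions `logprobs`/`top_logprobs` request fields and response layout (which vLLM's OpenAI-compatible server also accepts). `request_logprobs` and `extract_token_logprobs` are hypothetical names; this does not reproduce `custom_callbacks.py` or `custom_adapter.py`, and it does not cover token IDs.

```python
from typing import Any


def request_logprobs(data: dict[str, Any], top_logprobs: int = 0) -> dict[str, Any]:
    """Return a copy of an OpenAI-style chat request that asks for token logprobs.

    Hypothetical helper: it only sets the standard `logprobs` / `top_logprobs`
    chat-completion parameters.
    """
    patched = dict(data)  # copy so the caller's request dict is not mutated
    patched["logprobs"] = True
    if top_logprobs > 0:
        patched["top_logprobs"] = top_logprobs
    return patched


def extract_token_logprobs(response: dict[str, Any]) -> list[tuple[str, float]]:
    """Pull (token, logprob) pairs from the first choice of a chat response."""
    logprobs = response["choices"][0].get("logprobs") or {}
    return [(item["token"], item["logprob"]) for item in logprobs.get("content") or []]
```

Returning a copy keeps the hook side-effect free, so the same request can be logged or retried unchanged.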
Comments suppressed due to low confidence (1)
examples/cc/utils/evaluation.py:113
- Except block directly handles BaseException.
except:
```
The bug description is:
{description}
=================================================
You task is to fix the bug with the following steps:
```
Copilot AI, Nov 24, 2025
Typo in the user prompt: "You task" should be "Your task".
```diff
-You task is to fix the bug with the following steps:
+Your task is to fix the bug with the following steps:
```
```python
# 3. obtain rewards (evaluation result)
```
Copilot AI, Nov 24, 2025
The exception handler on lines 101-102 catches all exceptions but doesn't re-raise or return a proper error value. The code continues to line 105 where it tries to use prediction which may not be defined if the exception occurred before it was assigned. This could lead to an UnboundLocalError.
```diff
-# 3. obtain rewards (evaluation result)
+return 0.0
```
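One way to act on this comment is to return a default reward as soon as the agent run fails, so nothing downstream touches an unassigned `prediction`. A minimal sketch, assuming a zero reward is the desired value for a failed rollout; `rollout_reward`, `generate_prediction`, and `evaluate` are hypothetical stand-ins for the corresponding code in `cc_agent.py`.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def rollout_reward(
    generate_prediction: Callable[[], dict],
    evaluate: Callable[[dict], float],
) -> float:
    """Return the evaluation reward, or 0.0 if the agent run fails.

    `generate_prediction` and `evaluate` are hypothetical stand-ins for the
    agent run and the SWE-bench grading step.
    """
    try:
        prediction = generate_prediction()
    except Exception:
        # Log the full traceback instead of silently continuing; without this
        # early return, `prediction` would be unbound below.
        logger.exception("Agent run failed; assigning zero reward")
        return 0.0
    # 3. obtain rewards (evaluation result)
    return evaluate(prediction)
```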
```python
try:
    # link the image build dir in the log dir
    image_build_link.symlink_to(build_dir.absolute(), target_is_directory=True)
except:
```
Copilot AI, Nov 24, 2025
The bare except: clause (line 113) catches all exceptions without logging which specific exception was caught. This makes debugging difficult. Consider catching specific exceptions or at least logging the exception details.
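A sketch of the narrower handler, assuming the only expected failures are an already-existing link and other `OSError`s (missing permissions, filesystems without symlink support). `link_build_dir` is a hypothetical helper name.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def link_build_dir(image_build_link: Path, build_dir: Path) -> bool:
    """Symlink build_dir into the log dir; return True if a symlink exists afterwards."""
    try:
        # link the image build dir in the log dir
        image_build_link.symlink_to(build_dir.absolute(), target_is_directory=True)
    except FileExistsError:
        # A previous run already created the link (or a file with that name).
        logger.debug("Build link %s already exists", image_build_link)
    except OSError as exc:
        # e.g. missing permissions, or symlinks unsupported on this filesystem
        logger.warning("Could not link %s -> %s: %s", image_build_link, build_dir, exc)
        return False
    return image_build_link.is_symlink()
```

Unlike a bare `except:`, this lets `KeyboardInterrupt` and unexpected errors propagate, and every handled failure leaves a log line.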
```python
copy_to_container(container, patch_file, PurePosixPath(DOCKER_PATCH))

# Attempt to apply patch to container (TODO: FIX THIS)
applied_patch = False
```
Copilot AI, Nov 24, 2025
The bare except: clause (line 137) catches all exceptions without logging which specific exception was caught. This makes debugging difficult. Consider catching specific exceptions or at least logging the exception details.
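For the `TODO: FIX THIS` patch step, one possible pattern is to try several apply commands in order and log each failure instead of swallowing it. A sketch under assumptions: the command list and the `run` callable (execute a shell command in the container, return `(exit_code, output)`) are hypothetical and not taken from the PR.

```python
import logging
from typing import Callable, Sequence, Tuple

logger = logging.getLogger(__name__)

# Assumed fallback order; not taken from the PR.
APPLY_CMDS: Sequence[str] = (
    "git apply --verbose",
    "git apply --verbose --reject",
    "patch --batch --fuzz=5 -p1 -i",
)


def apply_patch(
    run: Callable[[str], Tuple[int, str]],
    patch_path: str,
    cmds: Sequence[str] = APPLY_CMDS,
) -> bool:
    """Try each apply command in turn; return True on the first success.

    `run` is a hypothetical callable that executes a shell command inside the
    container and returns (exit_code, output).
    """
    for cmd in cmds:
        try:
            exit_code, output = run(f"{cmd} {patch_path}")
        except (OSError, RuntimeError) as exc:
            # Record the concrete failure instead of swallowing it.
            logger.warning("Running %r failed: %s", cmd, exc)
            continue
        if exit_code == 0:
            return True
        # Keep only the tail of the output so logs stay readable.
        logger.info("%r exited with %d: %s", cmd, exit_code, output[-500:])
    return False
```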
```python
# print(f"Unexpected error in _stream_output: {e}")
break
```
Copilot AI, Nov 24, 2025
The `except Exception as e:` clause on line 318 is too broad and swallows every exception without taking any action. The commented-out print statement suggests this was intentional, but it makes debugging difficult. Consider logging the exception or handling specific exception types.
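A sketch of an output-reader loop that stops on the errors it expects and logs them instead of discarding them, assuming the container's output is exposed as a text stream. `stream_output` and `start_output_thread` are hypothetical stand-ins for the PR's `_stream_output`.

```python
import logging
import threading
from typing import IO, Callable

logger = logging.getLogger(__name__)


def stream_output(stream: IO[str], on_line: Callable[[str], None]) -> None:
    """Forward lines from `stream` to `on_line` until EOF or a read error."""
    while True:
        try:
            line = stream.readline()
        except (OSError, ValueError) as exc:
            # ValueError: reading from a closed file; OSError: broken pipe/socket.
            logger.debug("Stopping output stream: %r", exc)
            break
        if not line:  # EOF
            break
        on_line(line)


def start_output_thread(stream: IO[str], on_line: Callable[[str], None]) -> threading.Thread:
    """Run stream_output in a daemon thread so it never blocks interpreter exit."""
    thread = threading.Thread(target=stream_output, args=(stream, on_line), daemon=True)
    thread.start()
    return thread
```

Exceptions raised by `on_line` are deliberately not caught, so bugs in the consumer surface instead of silently ending the stream.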
```python
# url = f'https://github.com/{instance["repo"]}.git'
# base_commit = instance["base_commit"]

# if platform == "windows":
#     res: CommandResult = session.send_command(
#         r'git clone {url} "C:\testbed"; cd "C:\testbed"; git reset --hard {base}'.format(
#             url=url, base=base_commit
#         )
#     )
# else:
#     res: CommandResult = session.send_command(
#         f"git clone {url} /testbed && cd /testbed && git reset --hard {base_commit}"
#     )
```
Copilot AI, Nov 24, 2025
This comment appears to contain commented-out code.
```diff
-# url = f'https://github.com/{instance["repo"]}.git'
-# base_commit = instance["base_commit"]
-# if platform == "windows":
-#     res: CommandResult = session.send_command(
-#         r'git clone {url} "C:\testbed"; cd "C:\testbed"; git reset --hard {base}'.format(
-#             url=url, base=base_commit
-#         )
-#     )
-# else:
-#     res: CommandResult = session.send_command(
-#         f"git clone {url} /testbed && cd /testbed && git reset --hard {base_commit}"
-#     )
```
```python
# url = f'https://github.com/{instance["repo"]}.git'
# base_commit = instance["base_commit"]

# if platform == "windows":
#     res: CommandResult = session.send_command(
#         r'git clone {url} "C:\testbed"; cd "C:\testbed"; git reset --hard {base}'.format(
#             url=url, base=base_commit
#         )
#     )
# else:
#     res: CommandResult = session.send_command(
#         f"git clone {url} /testbed && cd /testbed && git reset --hard {base_commit}"
#     )
```
Copilot AI, Nov 24, 2025
This comment appears to contain commented-out code.
```diff
-# url = f'https://github.com/{instance["repo"]}.git'
-# base_commit = instance["base_commit"]
-# if platform == "windows":
-#     res: CommandResult = session.send_command(
-#         r'git clone {url} "C:\testbed"; cd "C:\testbed"; git reset --hard {base}'.format(
-#             url=url, base=base_commit
-#         )
-#     )
-# else:
-#     res: CommandResult = session.send_command(
-#         f"git clone {url} /testbed && cd /testbed && git reset --hard {base_commit}"
-#     )
```
```python
for match in CMD_OUTPUT_METADATA_PS1_REGEX.finditer(output):
    scope = match.group(1).strip()
    try:
        d = json.loads(scope)  # Try to parse as JSON
```
Copilot AI, Nov 24, 2025
This assignment to 'd' is unnecessary as it is redefined before this value is used.
```diff
-d = json.loads(scope)  # Try to parse as JSON
+json.loads(scope)  # Try to parse as JSON
```
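If only the validity check matters, the parse result can be discarded as the suggestion does. A minimal sketch of that filtering loop; the `###PS1JSON###`/`###PS1END###` markers are an assumption, and the PR's `CMD_OUTPUT_METADATA_PS1_REGEX` may use a different format. `valid_ps1_blocks` is a hypothetical name.

```python
import json
import re

# Assumed marker format; the PR's CMD_OUTPUT_METADATA_PS1_REGEX may differ.
PS1_REGEX = re.compile(r"###PS1JSON###\s*(.*?)\s*###PS1END###", re.DOTALL)


def valid_ps1_blocks(output: str) -> list[str]:
    """Return the raw text of every PS1 metadata block that is valid JSON."""
    blocks = []
    for match in PS1_REGEX.finditer(output):
        scope = match.group(1).strip()
        try:
            json.loads(scope)  # Validate only; the parsed value is not needed here
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        blocks.append(scope)
    return blocks
```

If a later step needs the parsed metadata, it is cheaper to keep the `json.loads` result than to parse the same block twice.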
```diff
@@ -0,0 +1,125 @@
+import os
```
Copilot AI, Nov 24, 2025
Import of 'os' is not used.
```diff
-import os
```
```python
"""
try:
    metadata = json.loads(match.group(1))
except:
```
Copilot AI, Nov 24, 2025
The bare `except:` block also catches `BaseException` subclasses such as `KeyboardInterrupt` and `SystemExit`.
```diff
-except:
+except (json.JSONDecodeError, TypeError):
```
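The `TypeError` in the suggested tuple matters when the capture group is optional: `match.group(1)` is then `None`, and `json.loads(None)` raises `TypeError` rather than `JSONDecodeError`. A sketch illustrating both cases, using a hypothetical `<meta>(.+?)?</meta>` pattern and a hypothetical `parse_metadata` helper.

```python
import json
import re
from typing import Any, Optional

# Hypothetical pattern with an optional group: group(1) is None for "<meta></meta>".
METADATA_RE = re.compile(r"<meta>(.+?)?</meta>", re.DOTALL)


def parse_metadata(match: "re.Match[str]") -> Optional[dict[str, Any]]:
    """Parse the captured JSON object, or return None if it is missing or malformed."""
    try:
        metadata = json.loads(match.group(1))
    except (json.JSONDecodeError, TypeError):
        # JSONDecodeError: malformed JSON; TypeError: group(1) is None.
        return None
    return metadata if isinstance(metadata, dict) else None
```

Unlike the bare `except:`, this still lets `KeyboardInterrupt` and `SystemExit` propagate.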
```python
self._start_output_thread()
self._clear_initial_prompt()
if self.platform == "windows":
    self.send_command(
```
We don't need to support Windows currently
```diff
@@ -0,0 +1,605 @@
+"""
```
This file looks like a general utility. Is there a third-party library we can leverage directly, such as pydocker?
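If "pydocker" refers to the Docker SDK for Python (`pip install docker`), a one-shot command runner could look like the sketch below. It uses `docker.from_env`, `client.containers.run`, `Container.exec_run`, and `Container.remove`; `bash_command` and `run_in_container` are hypothetical names, and the image and the `/testbed` working directory are assumptions.

```python
from typing import List


def bash_command(cmd: str) -> List[str]:
    """Wrap a shell snippet so `exec_run` runs it through a bash login shell."""
    return ["bash", "-lc", cmd]


def run_in_container(image: str, cmd: str, workdir: str = "/testbed") -> tuple[int, str]:
    """Start `image`, run one command in it, and remove the container.

    Requires the Docker SDK for Python and a running Docker daemon; the import
    is lazy so `bash_command` works without the package installed.
    """
    import docker  # Docker SDK for Python

    client = docker.from_env()
    # Keep the container alive with a long-running no-op so we can exec into it.
    container = client.containers.run(image, "sleep infinity", detach=True)
    try:
        result = container.exec_run(bash_command(cmd), workdir=workdir)
        return result.exit_code, result.output.decode("utf-8", errors="replace")
    finally:
        container.remove(force=True)
```

One trade-off to weigh: each `exec_run` call starts a fresh process, so shell state (working directory changes, exported variables, activated environments) does not carry over between calls the way it does in the PR's persistent bash session. Preserving that would need extra work on top of the SDK.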
```python
import os


def logger(run_id, instance_id, text):
```
You can use `agl.setup_logging` now.
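The signature of `agl.setup_logging` is not shown in this PR, so this sketch sticks to the standard library: a `logging.LoggerAdapter` that tags each record with the run and instance IDs that the custom `logger(run_id, instance_id, text)` helper takes as arguments. `get_instance_logger` is a hypothetical name; handler and format configuration would presumably be left to `agl.setup_logging` (or `logging.basicConfig`).

```python
import logging


def get_instance_logger(run_id: str, instance_id: str) -> logging.LoggerAdapter:
    """Return a logger that attaches run_id and instance_id to every record."""
    base = logging.getLogger(f"cc_agent.{run_id}.{instance_id}")
    # The adapter injects these keys as `extra`, so formatters can use
    # %(run_id)s and %(instance_id)s.
    return logging.LoggerAdapter(base, {"run_id": run_id, "instance_id": instance_id})
```

Call sites then become `log = get_instance_logger(run_id, instance_id)` followed by `log.info(text)`, and log files or formats are configured once rather than inside each call.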
No description provided.