Support Claude Code as LitAgent #332

zxgx · 2025-11-24T04:23:42Z

No description provided.

Copilot

Pull request overview

This PR adds support for using Claude Code as a LitAgent, enabling trace collection from agent execution on SWE-bench tasks for training purposes. The implementation includes a Docker-based runtime for executing Claude Code in containers, utilities for evaluation and logging, and integration with LiteLLM proxy for versatile LLM server support.

Key Changes:

New CodingAgent class that implements the LitAgent interface for SWE-bench task solving
Docker runtime management with support for both Linux and Windows containers
Custom adapters and callbacks for extracting token IDs and logprobs from LLM responses
SWE-bench evaluation harness integration

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 12 comments.

Show a summary per file

File	Description
examples/cc/README.md	Documentation for setting up and running Claude Code agent
examples/cc/cc_agent.py	Main agent implementation with rollout execution and dataset handling
examples/cc/swe_debug.jsonl	Sample dataset with 5 SWE-bench instances for testing
examples/cc/utils/claude_code_controller.py	Controller for managing Claude Code execution in Docker containers
examples/cc/utils/custom_adapter.py	Adapter for converting LLM proxy traces to augmented triplets with logprobs
examples/cc/utils/custom_callbacks.py	Custom LiteLLM callback to request logprobs from vLLM
examples/cc/utils/docker_runtime.py	Docker container runtime with persistent bash sessions and file operations
examples/cc/utils/evaluation.py	SWE-bench evaluation utilities for running and grading predictions
examples/cc/utils/logger.py	Simple logging utility for recording container execution logs
examples/cc/utils/handle_hook.template.sh	Shell script template for handling Claude Code hooks
examples/cc/utils/settings.template.json	Configuration template for Claude Code hook settings

Comments suppressed due to low confidence (1)

examples/cc/utils/evaluation.py:113

Except block directly handles BaseException.

            except:

Copilot · 2025-11-24T04:26:48Z

examples/claude_code/claude_code_controller.py

+The bug description is:
+{description}
+=================================================
+You task is to fix the bug with the following steps:


Typo in the user prompt: "You task" should be "Your task".

Suggested change

You task is to fix the bug with the following steps:

Your task is to fix the bug with the following steps:

Copilot · 2025-11-24T04:26:48Z

examples/claude_code/cc_agent.py

+
+        # 3. obtain rewards (evaluation result)


The exception handler on lines 101-102 catches all exceptions but neither re-raises nor returns a proper error value. Execution continues to line 105, which uses prediction; that name may be undefined if the exception occurred before it was assigned, which would raise an UnboundLocalError.

Suggested change

# 3. obtain rewards (evaluation result)

return 0.0
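
A minimal sketch of the failure mode and the suggested fix. The names run_agent, evaluate, and rollout are hypothetical stand-ins and are not taken from the PR; the point is only that returning early from the handler keeps the reward step from reading an unbound name.

```python
import logging

logger = logging.getLogger(__name__)


def run_agent(task: dict) -> str:
    """Hypothetical stand-in for the agent call; raises on a bad task."""
    if "problem" not in task:
        raise ValueError("task has no problem statement")
    return f"patch for {task['problem']}"


def evaluate(prediction: str) -> float:
    """Hypothetical stand-in for the evaluation step."""
    return 1.0 if prediction else 0.0


def rollout(task: dict) -> float:
    try:
        prediction = run_agent(task)
    except Exception:
        # Log the traceback and return a neutral reward instead of
        # falling through to code that reads `prediction`.
        logger.exception("rollout failed for task %r", task)
        return 0.0

    # `prediction` is guaranteed to be bound here.
    return evaluate(prediction)
```

Without the early return, a failure inside run_agent would leave prediction unassigned and the final line would raise UnboundLocalError.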

Copilot · 2025-11-24T04:26:48Z

examples/claude_code/swebench_utils/evaluation.py

+            try:
+                # link the image build dir in the log dir
+                image_build_link.symlink_to(build_dir.absolute(), target_is_directory=True)
+            except:


The bare except: clause (line 113) catches all exceptions without logging which specific exception was caught. This makes debugging difficult. Consider catching specific exceptions or at least logging the exception details.
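
One possible shape for this, sketched under assumptions: the helper name link_build_dir is hypothetical, and OSError is chosen because FileExistsError and PermissionError are both subclasses of it. The PR may need a different exception set.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def link_build_dir(link: Path, build_dir: Path) -> bool:
    """Create `link` pointing at `build_dir`; return True on success.

    Hypothetical helper, not part of the PR.
    """
    try:
        link.symlink_to(build_dir.absolute(), target_is_directory=True)
    except OSError as exc:
        # Record which error occurred rather than silently discarding it.
        logger.warning("could not link %s -> %s: %s", link, build_dir, exc)
        return False
    return True
```

Catching OSError rather than using a bare except also lets KeyboardInterrupt and SystemExit propagate normally.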

Copilot · 2025-11-24T04:26:49Z

examples/claude_code/swebench_utils/evaluation.py

+        copy_to_container(container, patch_file, PurePosixPath(DOCKER_PATCH))
+
+        # Attempt to apply patch to container (TODO: FIX THIS)
+        applied_patch = False


The bare except: clause (line 137) catches all exceptions without logging which specific exception was caught. This makes debugging difficult. Consider catching specific exceptions or at least logging the exception details.

Copilot · 2025-11-24T04:26:49Z

examples/claude_code/swebench_utils/docker_runtime.py

+                # print(f"Unexpected error in _stream_output: {e}")
+                break


The except Exception as e: clause on line 318 is too broad and swallows every exception without taking any action. The commented-out print statement suggests this was intentional, but it makes debugging difficult. Consider logging the exception or handling specific exception types.

Copilot · 2025-11-24T04:26:50Z

examples/cc/swebench_utils/docker_runtime.py

+        # url = f'https://github.com/{instance["repo"]}.git'
+        # base_commit = instance["base_commit"]
+
+        # if platform == "windows":
+        #     res: CommandResult = session.send_command(
+        #        r'git clone {url} "C:\testbed"; cd "C:\testbed"; git reset --hard {base}'.format(
+        #            url=url, base=base_commit
+        #    )
+        # )
+        # else:
+        #    res: CommandResult = session.send_command(
+        #        f"git clone {url} /testbed && cd /testbed && git reset --hard {base_commit}"
+        #    )
+


This comment appears to contain commented-out code.

Suggested change

# url = f'https://github.com/{instance["repo"]}.git'
# base_commit = instance["base_commit"]
# if platform == "windows":
#     res: CommandResult = session.send_command(
#        r'git clone {url} "C:\testbed"; cd "C:\testbed"; git reset --hard {base}'.format(
#            url=url, base=base_commit
#    )
# )
# else:
#    res: CommandResult = session.send_command(
#        f"git clone {url} /testbed && cd /testbed && git reset --hard {base_commit}"
#    )

Copilot · 2025-11-24T04:26:50Z

examples/claude_code/swebench_utils/docker_runtime.py

+        for match in CMD_OUTPUT_METADATA_PS1_REGEX.finditer(output):
+            scope = match.group(1).strip()
+            try:
+                d = json.loads(scope)  # Try to parse as JSON


This assignment to 'd' is unnecessary as it is redefined before this value is used.

Suggested change

d = json.loads(scope) # Try to parse as JSON

json.loads(scope) # Try to parse as JSON

Copilot · 2025-11-24T04:26:51Z

examples/cc/utils/claude_code_controller.py

@@ -0,0 +1,125 @@
+import os


Import of 'os' is not used.

Suggested change

import os

Copilot · 2025-11-24T04:26:51Z

examples/claude_code/swebench_utils/docker_runtime.py

+        """
+        try:
+            metadata = json.loads(match.group(1))
+        except:


Except block directly handles BaseException.

Suggested change

except:

except (json.JSONDecodeError, TypeError):
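
A short sketch of the narrowed handler in context. The marker format in METADATA_RE and the function name parse_metadata are assumptions made for illustration; the PR's actual regex is not shown in this excerpt.

```python
import json
import re
from typing import Optional

# Hypothetical marker format, used only for this illustration.
METADATA_RE = re.compile(r"###META###(.*?)###END###", re.DOTALL)


def parse_metadata(output: str) -> Optional[dict]:
    """Return the JSON object embedded in `output`, or None if absent or invalid."""
    match = METADATA_RE.search(output)
    if match is None:
        return None
    try:
        metadata = json.loads(match.group(1))
    except (json.JSONDecodeError, TypeError):
        # Only malformed or non-string payloads are treated as "no metadata";
        # KeyboardInterrupt and SystemExit still propagate.
        return None
    # Reject valid JSON that is not an object (e.g. a list or a number).
    return metadata if isinstance(metadata, dict) else None
```

With the bare except, pressing Ctrl-C during parsing would be swallowed; the narrowed tuple limits the handler to the two errors json.loads is expected to raise here.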

ultmaster · 2025-11-24T04:48:19Z

examples/cc/swebench_utils/docker_runtime.py

+        self._start_output_thread()
+        self._clear_initial_prompt()
+        if self.platform == "windows":
+            self.send_command(


We don't need to support Windows currently

ultmaster · 2025-11-24T04:48:53Z

examples/cc/swebench_utils/docker_runtime.py

@@ -0,0 +1,605 @@
+"""


This file looks like a general-purpose util. Is there a third-party lib we can leverage directly, such as pydocker?

ultmaster · 2025-11-24T04:49:18Z

examples/claude_code/swebench_utils/logger.py

+import os
+
+
+def logger(run_id, instance_id, text):


You can use agl.setup_logging now.

zxgx added 2 commits November 24, 2025 04:22

support claude code as agl LitAgent

036ab3d

add debug swe-bench dataset

d83c3a6

Copilot AI review requested due to automatic review settings November 24, 2025 04:23

Copilot started reviewing on behalf of zxgx November 24, 2025 04:24

Copilot finished reviewing on behalf of zxgx November 24, 2025 04:25

Copilot AI reviewed Nov 24, 2025

ultmaster reviewed Nov 24, 2025

zxgx added 4 commits November 24, 2025 06:11

reorganize

82f05de

simplify docker runtime

bdaad91

polish license header, docstring, readme

d922701

rename cc to claude_code

9cb7485
