Contributor

@zxgx zxgx commented Nov 24, 2025

No description provided.

Copilot AI review requested due to automatic review settings November 24, 2025 04:23
Copilot finished reviewing on behalf of zxgx November 24, 2025 04:25
Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for using Claude Code as a LitAgent, enabling trace collection from agent execution on SWE-bench tasks for training purposes. The implementation includes a Docker-based runtime for executing Claude Code in containers, utilities for evaluation and logging, and integration with LiteLLM proxy for versatile LLM server support.

Key Changes:

  • New CodingAgent class that implements the LitAgent interface for SWE-bench task solving
  • Docker runtime management with support for both Linux and Windows containers
  • Custom adapters and callbacks for extracting token IDs and logprobs from LLM responses
  • SWE-bench evaluation harness integration

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| examples/cc/README.md | Documentation for setting up and running Claude Code agent |
| examples/cc/cc_agent.py | Main agent implementation with rollout execution and dataset handling |
| examples/cc/swe_debug.jsonl | Sample dataset with 5 SWE-bench instances for testing |
| examples/cc/utils/claude_code_controller.py | Controller for managing Claude Code execution in Docker containers |
| examples/cc/utils/custom_adapter.py | Adapter for converting LLM proxy traces to augmented triplets with logprobs |
| examples/cc/utils/custom_callbacks.py | Custom LiteLLM callback to request logprobs from vLLM |
| examples/cc/utils/docker_runtime.py | Docker container runtime with persistent bash sessions and file operations |
| examples/cc/utils/evaluation.py | SWE-bench evaluation utilities for running and grading predictions |
| examples/cc/utils/logger.py | Simple logging utility for recording container execution logs |
| examples/cc/utils/handle_hook.template.sh | Shell script template for handling Claude Code hooks |
| examples/cc/utils/settings.template.json | Configuration template for Claude Code hook settings |

Comments suppressed due to low confidence (1)

examples/cc/utils/evaluation.py:113

  • Except block directly handles BaseException.
            except:

The bug description is:
{description}
=================================================
You task is to fix the bug with the following steps:
Copilot AI Nov 24, 2025

Typo in the user prompt: "You task" should be "Your task".

Suggested change
You task is to fix the bug with the following steps:
Your task is to fix the bug with the following steps:

Copilot uses AI. Check for mistakes.
Comment on lines +103 to +104

# 3. obtain rewards (evaluation result)
Copilot AI Nov 24, 2025

The exception handler on lines 101-102 catches all exceptions but doesn't re-raise or return a proper error value. The code continues to line 105 where it tries to use prediction which may not be defined if the exception occurred before it was assigned. This could lead to an UnboundLocalError.

Suggested change
# 3. obtain rewards (evaluation result)
return 0.0
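
A minimal sketch of the failure mode described in this comment, assuming hypothetical `run_agent` and `evaluate` callables (neither name comes from the PR). Returning a default reward from the handler means the later code never reads an unassigned `prediction`:

```python
import logging

logger = logging.getLogger(__name__)


def rollout(run_agent, evaluate):
    """Return a reward, falling back to 0.0 when the agent run fails.

    `run_agent` and `evaluate` are hypothetical stand-ins for the PR's
    agent execution and SWE-bench grading steps.
    """
    try:
        prediction = run_agent()
    except Exception:
        # Log the traceback and return early so `prediction` is never
        # read while unassigned (which would raise UnboundLocalError).
        logger.exception("agent run failed; assigning zero reward")
        return 0.0

    # Only reached when `prediction` was assigned.
    return evaluate(prediction)
```

Whether 0.0 is the right fallback reward is a design decision for the PR author; re-raising is the alternative if a failed rollout should not be scored at all.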

try:
    # link the image build dir in the log dir
    image_build_link.symlink_to(build_dir.absolute(), target_is_directory=True)
except:
Copilot AI Nov 24, 2025

The bare except: clause (line 113) catches all exceptions without logging which specific exception was caught. This makes debugging difficult. Consider catching specific exceptions or at least logging the exception details.
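
One way to act on this, sketched under the assumption that the symlink failures worth tolerating are `OSError` subclasses (for example `FileExistsError` or `PermissionError`). The helper name `link_build_dir` is hypothetical and not part of the PR:

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def link_build_dir(link: Path, build_dir: Path) -> bool:
    """Try to symlink `link` -> `build_dir`; report success as a bool."""
    try:
        link.symlink_to(build_dir.absolute(), target_is_directory=True)
        return True
    except OSError as exc:
        # FileExistsError and PermissionError are both OSError subclasses.
        logger.warning("could not link %s -> %s: %r", link, build_dir, exc)
        return False


# Usage on a POSIX system: the second call fails because the link exists.
with tempfile.TemporaryDirectory() as tmp:
    build = Path(tmp) / "build"
    build.mkdir()
    first = link_build_dir(Path(tmp) / "link", build)
    second = link_build_dir(Path(tmp) / "link", build)
```

Unlike a bare `except:`, this lets `KeyboardInterrupt` and `SystemExit` propagate and records which error occurred.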

copy_to_container(container, patch_file, PurePosixPath(DOCKER_PATCH))

# Attempt to apply patch to container (TODO: FIX THIS)
applied_patch = False
Copilot AI Nov 24, 2025

The bare except: clause (line 137) catches all exceptions without logging which specific exception was caught. This makes debugging difficult. Consider catching specific exceptions or at least logging the exception details.

Comment on lines +318 to +319
# print(f"Unexpected error in _stream_output: {e}")
break
Copilot AI Nov 24, 2025

The except Exception as e: clause on line 318 is too broad and swallows every exception without taking any action. The commented-out print statement suggests this was intentional, but it makes debugging difficult. Consider logging the exception or handling specific exception types.
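
A minimal sketch of a reader loop that logs before stopping, assuming the stream exposes a file-like `readline()`. The function below is illustrative only; the PR's `_stream_output` body is not shown in this thread:

```python
import io
import logging

logger = logging.getLogger(__name__)


def stream_output(stream, sink):
    """Copy lines from `stream` into `sink` until EOF or a read error."""
    while True:
        try:
            line = stream.readline()
        except (OSError, ValueError):
            # ValueError covers reads on an already-closed file object.
            logger.exception("stream read failed; stopping reader")
            break
        if not line:
            break  # EOF
        sink.append(line)


# Usage with an in-memory stream standing in for container output.
collected = []
stream_output(io.StringIO("a\nb\n"), collected)
```

`logger.exception` records the traceback at ERROR level, so the loop can still exit quietly without hiding the cause.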

Comment on lines 591 to 604
# url = f'https://github.com/{instance["repo"]}.git'
# base_commit = instance["base_commit"]

# if platform == "windows":
# res: CommandResult = session.send_command(
# r'git clone {url} "C:\testbed"; cd "C:\testbed"; git reset --hard {base}'.format(
# url=url, base=base_commit
# )
# )
# else:
# res: CommandResult = session.send_command(
# f"git clone {url} /testbed && cd /testbed && git reset --hard {base_commit}"
# )

Copilot AI Nov 24, 2025

This comment appears to contain commented-out code.

Suggested change
# url = f'https://github.com/{instance["repo"]}.git'
# base_commit = instance["base_commit"]
# if platform == "windows":
# res: CommandResult = session.send_command(
# r'git clone {url} "C:\testbed"; cd "C:\testbed"; git reset --hard {base}'.format(
# url=url, base=base_commit
# )
# )
# else:
# res: CommandResult = session.send_command(
# f"git clone {url} /testbed && cd /testbed && git reset --hard {base_commit}"
# )

for match in CMD_OUTPUT_METADATA_PS1_REGEX.finditer(output):
    scope = match.group(1).strip()
    try:
        d = json.loads(scope) # Try to parse as JSON
Copilot AI Nov 24, 2025

This assignment to 'd' is unnecessary as it is redefined before this value is used.

Suggested change
d = json.loads(scope) # Try to parse as JSON
json.loads(scope) # Try to parse as JSON
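
A sketch of the validate-only pattern this suggestion implies, where `json.loads` is called purely to reject non-JSON matches. The regex below is a hypothetical placeholder, since the PR's `CMD_OUTPUT_METADATA_PS1_REGEX` is not shown in this thread:

```python
import json
import re

# Hypothetical pattern for illustration only.
METADATA_REGEX = re.compile(r"###PS1JSON###(.*?)###PS1END###", re.DOTALL)


def valid_metadata_matches(output: str) -> list:
    """Keep only the matches whose captured group parses as JSON."""
    matches = []
    for match in METADATA_REGEX.finditer(output):
        try:
            json.loads(match.group(1).strip())  # validation only; result unused
        except json.JSONDecodeError:
            continue
        matches.append(match)
    return matches
```

If the parsed value is needed later anyway, storing it alongside the match would avoid decoding the same text twice.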

@@ -0,0 +1,125 @@
import os
Copilot AI Nov 24, 2025

Import of 'os' is not used.

Suggested change
import os

"""
try:
    metadata = json.loads(match.group(1))
except:
Copilot AI Nov 24, 2025

Except block directly handles BaseException.

Suggested change
except:
except (json.JSONDecodeError, TypeError):
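
For context, a small self-contained sketch of what the narrowed handler covers. The helper name `parse_metadata` and the dict-only return are assumptions for illustration, not the PR's code:

```python
import json


def parse_metadata(raw):
    """Return the decoded dict, or None when `raw` is not a JSON object."""
    try:
        metadata = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # JSONDecodeError: malformed text. TypeError: `raw` is None or
        # another non-string type (e.g. an unmatched regex group).
        return None
    return metadata if isinstance(metadata, dict) else None
```

With the narrowed clause, `KeyboardInterrupt` and `SystemExit` are no longer intercepted, which a bare `except:` would do.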

self._start_output_thread()
self._clear_initial_prompt()
if self.platform == "windows":
    self.send_command(
Contributor

We don't need to support Windows currently.

@@ -0,0 +1,605 @@
"""
Contributor

This file looks like a general utility. Is there a third-party library we can leverage directly, such as pydocker?

import os


def logger(run_id, instance_id, text):
Contributor

You can use agl.setup_logging now.

2 participants