
Use linguafranca Open Responses requests across ARES #97

Open
joshgreaves wants to merge 14 commits into main from plan/linguafranca-migration

Conversation

@joshgreaves
Contributor

@joshgreaves joshgreaves commented Mar 13, 2026

User description

Summary

  • make linguafranca Open Responses the canonical internal request shape across queue mediation, environments, agents, mocks, and the main chat-compatible client
  • replace the handwritten Chat/Responses/Anthropic request conversion internals with thin linguafranca-backed edge adapters while preserving legacy converter behavior
  • update example-facing integrations, add a lightweight Twenty Questions smoke example, and make built-in preset registration idempotent for repeated imports

Testing

  • uv run pytest src/ares/llms/open_responses_test.py src/ares/llms/request_test.py src/ares/llms/openai_chat_converter_test.py src/ares/llms/openai_responses_converter_test.py src/ares/llms/anthropic_converter_test.py src/ares/testing/mock_llm_test.py
  • uv run pytest src/ares/code_agents/terminus2/terminus2_agent_test.py
  • uv run pyright src/ares/llms/open_responses.py src/ares/llms/openai_chat_converter.py src/ares/llms/openai_responses_converter.py src/ares/llms/anthropic_converter.py src/ares/llms/chat_completions_compatible.py src/ares/llms/llm_clients.py src/ares/llms/queue_mediated_client.py src/ares/environments/code_env.py src/ares/environments/twenty_questions.py src/ares/code_agents/code_agent_base.py src/ares/code_agents/mini_swe_agent.py src/ares/code_agents/terminus2/terminus2_agent.py src/ares/testing/mock_llm.py src/ares/testing/mock_llm_test.py src/ares/presets.py examples/utils.py examples/00_twenty_questions_smoke.py
  • uv run ruff check
  • uv run python -m examples.00_twenty_questions_smoke
  • uv run python -m py_compile examples/00_twenty_questions_smoke.py examples/utils.py examples/01_sequential_eval_with_local_llm.py examples/02_sequential_eval_with_api.py examples/03_parallel_eval_with_api.py examples/04_rl_training_with_skyrl.py examples/05_tinker_train.py examples/07_mech_interp_hooked_transformer.py

Generated description

Below is a concise technical summary of the changes proposed in this PR:
Adopt linguafranca open_responses builders and InferenceResult responses as the canonical payloads exchanged between queue-mediated clients, adapters, code agents, and environments so every RL observation/action shares one typed representation. Update docs, contributor guidance, presets, and smoke examples to describe the refreshed Open Responses loop and keep builtin presets and tooling aligned with the new dependency.

Testing & Tooling: Refresh testing helpers to consume the canonical requests, validate the new inputs, and ensure the mocking utilities expose the new open_responses inputs/responses for deterministic suites.
Modified files (2)
  • src/ares/testing/mock_llm.py
  • src/ares/testing/mock_llm_test.py
Latest Contributors (2)
User | Commit | Date
joshua.greaves@gmail.com | Add an LLMResponse mod... | January 29, 2026
sarvanithin | Add test primitives fo... | January 22, 2026
Other: Other files
Modified files (6)
  • src/ares/llms/__init__.py
  • src/ares/llms/anthropic_converter_test.py
  • src/ares/llms/openai_chat_converter_test.py
  • src/ares/llms/openai_responses_converter_test.py
  • src/ares/llms/request.py
  • src/ares/llms/request_test.py
Latest Contributors (2)
User | Commit | Date
sarvanithin | Refactor LLMRequest co... | February 13, 2026
joshua.greaves@gmail.com | Massively simplify the... | January 29, 2026
Docs, Guides & Presets: Refresh documentation, CONTRIBUTING guidance, README, and registry assets so every reference to observations/actions mentions Open Responses requests and InferenceResult; add the dependency on martian-linguafranca; and make preset registration idempotent while exercising the Twenty Questions smoke flow in tests. Mention the new behavior in the top-level package docstring and the CLI-related docs so contributors know which helpers to import.
Modified files (10)
  • CLAUDE.md
  • CONTRIBUTING.md
  • README.md
  • docs/source/core-concepts.rst
  • docs/source/how-it-works.rst
  • docs/source/index.rst
  • pyproject.toml
  • src/ares/__init__.py
  • src/ares/presets.py
  • src/ares/presets_test.py
Latest Contributors (2)
User | Commit | Date
joshua.greaves@gmail.com | Add a transformers cli... | February 24, 2026
Narmeen07 | Add mechanistic interp... | February 19, 2026
Agents & Environments Flow: Switch environments, code agents, instrumentation, and examples over to the new flow so observations returned from CodeEnvironment, TwentyQuestionsEnvironment, and the SkyRLLib/Tinker adapters are still consumable while code agents keep recording assistants’ utterances with open_responses messages and response.extract_text_content. Harden container helpers and dashboards to understand the new telemetry, and exercise the transformation in the Twenty Questions smoke example.
Modified files (11)
  • examples/04_rl_training_with_skyrl.py
  • examples/05_tinker_train.py
  • examples/utils.py
  • src/ares/code_agents/code_agent_base.py
  • src/ares/code_agents/mini_swe_agent.py
  • src/ares/code_agents/terminus2/terminus2_agent.py
  • src/ares/code_agents/terminus2/terminus2_agent_test.py
  • src/ares/containers/docker.py
  • src/ares/contrib/eval_visualizer.py
  • src/ares/environments/code_env.py
  • src/ares/environments/twenty_questions.py
Latest Contributors (2)
User | Commit | Date
ryan@withmartian.com | Add SkyRL Integration ... | February 03, 2026
joshua.greaves@gmail.com | Add an LLMResponse mod... | January 29, 2026
Canonical Open Responses Stack: Adopt the open_responses helpers, InferenceResult, and corresponding request/response plumbing to replace the legacy chat/Responses/Claude converters, keeping queue mediation, LLM clients, contrib adapters, and their tests aligned with the canonical payload shape. Include the new martian-linguafranca dependency so adapters such as chat_completions_compatible, llama_cpp, hooked_transformer_client, and transformers_client can all call open_responses.to_chat_completions_kwargs and return the unified InferenceResult, while retiring the previous converter modules.
Modified files (14)
  • src/ares/contrib/llama_cpp.py
  • src/ares/contrib/mech_interp/hooked_transformer_client.py
  • src/ares/contrib/transformers_client.py
  • src/ares/contrib/transformers_client_test.py
  • src/ares/llms/anthropic_converter.py
  • src/ares/llms/chat_completions_compatible.py
  • src/ares/llms/llm_clients.py
  • src/ares/llms/open_responses.py
  • src/ares/llms/open_responses_test.py
  • src/ares/llms/openai_chat_converter.py
  • src/ares/llms/openai_responses_converter.py
  • src/ares/llms/queue_mediated_client.py
  • src/ares/llms/queue_mediated_client_test.py
  • src/ares/llms/response.py
Latest Contributors (2)
User | Commit | Date
joshua.greaves@gmail.com | Add a transformers cli... | February 24, 2026
Narmeen07 | Add mechanistic interp... | February 19, 2026

Summary by CodeRabbit

  • New Features

    • Standardized request/response handling using canonical "Open Responses" types from the Linguafranca library.
  • Bug Fixes

    • Added runtime guards to container operations to error early if a container isn’t started.
  • Documentation

    • Updated docs, examples, and contribution guidance to reflect the new request/response model and usage patterns.
  • Refactor

    • Broad API and internal type migration to the new canonical request/response shape.
  • Tests

    • Updated and added tests to cover the new request/response helpers and queue-mediated flows.
  • Dependencies

    • Added martian-linguafranca>=0.1.5.

Make Open Responses the canonical internal request shape so queue mediation, environments, agents, and local clients share one path. Keep legacy request adapters at the edges through thin linguafranca-backed converters while preserving existing behavior.
@coderabbitai

coderabbitai bot commented Mar 13, 2026

📝 Walkthrough

Walkthrough

The PR replaces the internal ARES LLM request/response abstractions with linguafranca "Open Responses" types, introduces an InferenceResult wrapper for responses and cost, removes legacy converters, adds open_responses helpers, and updates clients, agents, environments, examples, and tests to the new types and APIs.

Changes

Cohort / File(s) Summary
Documentation & Guides
CLAUDE.md, CONTRIBUTING.md, README.md, docs/source/core-concepts.rst, docs/source/how-it-works.rst, docs/source/index.rst, src/ares/__init__.py
Swap terminology/signatures to use lft.OpenResponsesRequest observations and InferenceResult actions; update import guidance to prefer from ares.llms import open_responses and open_responses.make_request(...).
Core LLM API & Types
src/ares/llms/__init__.py, src/ares/llms/llm_clients.py, src/ares/llms/response.py
Replace legacy LLMResponse/TextData/Usage with InferenceResult wrapping lft.OpenResponsesResponse; update LLMClient protocol to accept lft.OpenResponsesRequest and return InferenceResult; expose extract_text_content/make_response.
New helpers: Open Responses
src/ares/llms/open_responses.py, src/ares/llms/open_responses_test.py
Add open_responses module: builders (user_message, make_request, etc.), conversions to Chat Completions kwargs, message/content extraction, JSON normalization, and tests validating behavior and tool/function handling.
Removed legacy request & converters
src/ares/llms/request.py, src/ares/llms/anthropic_converter.py, src/ares/llms/openai_chat_converter.py, src/ares/llms/openai_responses_converter.py and their tests
Delete the old internal LLMRequest abstraction and provider-specific converter modules and associated tests — functionality superseded by linguafranca types + open_responses.
LLM Client implementations & adapters
src/ares/llms/chat_completions_compatible.py, src/ares/llms/queue_mediated_client.py, src/ares/llms/queue_mediated_client_test.py, src/ares/contrib/llama_cpp.py, src/ares/contrib/mech_interp/hooked_transformer_client.py, src/ares/contrib/transformers_client.py, src/ares/contrib/transformers_client_test.py
Update clients to accept lft.OpenResponsesRequest and return InferenceResult; convert request/response handling to use open_responses.to_chat_completions_kwargs(...) and response.make_response(...); adjust batching and rendering logic in transformers client and tests.
Code agents & tests
src/ares/code_agents/..., src/ares/code_agents/terminus2/terminus2_agent_test.py
Migrate agents to build requests via open_responses.make_request(...), store linguafranca message types, accept/return InferenceResult, and extract text with extract_text_content(...); update tests to mock/wrap responses via make_response(...).
Environments & adapters
src/ares/environments/code_env.py, src/ares/environments/twenty_questions.py, examples/04_rl_training_with_skyrl.py, examples/05_tinker_train.py, examples/utils.py
Retype environments/adapters to Environment[InferenceResult, lft.OpenResponsesRequest, ...]; change observation/action construction and message extraction to use open_responses helpers and extract_text_content.
Testing, mocks & utils
src/ares/testing/mock_llm.py, src/ares/testing/mock_llm_test.py, src/ares/llms/queue_mediated_client_test.py, src/ares/llms/open_responses_test.py
Update MockLLM to record OpenResponsesRequest and return InferenceResult; update tests to construct canonical requests via open_responses and assert using response.extract_text_content(...).
Contrib / Visualization / Containers
src/ares/contrib/eval_visualizer.py, src/ares/containers/docker.py
Retype TrackedEnvironment/dashboard to new request/response types; add guard checks in DockerContainer.* methods to raise when container not started.
Presets & registry tests
src/ares/presets.py, src/ares/presets_test.py
Prevent duplicate preset registration by tracking existing names; add test ensuring TwentyQuestions preset uses Open Responses observations.
Project config
pyproject.toml
Add runtime dependency: martian-linguafranca>=0.1.5.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Agent
participant OpenResponses as open_responses
participant LLMClient
participant ExternalModel as Model/API
participant ResponseLib as response.make_response
participant Environment
Agent->>OpenResponses: make_request(...) (instructions/messages)
OpenResponses->>LLMClient: lft.OpenResponsesRequest
LLMClient->>ExternalModel: external API call (chat completions / local model)
ExternalModel-->>LLMClient: external response payload
LLMClient->>ResponseLib: make_response(content, tokens...)
ResponseLib-->>LLMClient: lft.OpenResponsesResponse
LLMClient-->>Agent: InferenceResult(response, cost)
Agent->>Environment: step(InferenceResult)
Environment-->>Agent: TimeStep(observation: lft.OpenResponsesRequest,...)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 I hopped from old requests to linguafranca's light,
Built tidy make_request paths and wrapped responses tight.
An InferenceResult pouch holds tokens and cost,
The rabbit approves—no converters lost! 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.37%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'Use linguafranca Open Responses requests across ARES' accurately and specifically describes the main objective of the changeset: adopting linguafranca's canonical Open Responses types as the core request/response abstraction throughout the ARES codebase.


Address audit findings by preserving embedded assistant tool calls, restoring preset re-registration after registry clears, and rejecting unknown external request parameters in strict mode. Align docs and helpers with the canonical Open Responses request path and keep local batching permissive where the previous behavior expected it.
Unify the canonical chat serialization path with the tested converter cleanup, make preset registration idempotent without registry coupling, and add direct regression coverage for queue-mediated requests and preset-based observations. Update public docs and examples to reflect Open Responses as the runtime request model.

_LOGGER = logging.getLogger(__name__)

MODEL_STUB = "__ARES_MODEL_UNSET__"

Does this need error handling?

Contributor Author


What do you mean?


@andrewtran117 andrewtran117 left a comment


Left a small comment on error handling for the model stub; otherwise looks good, pending the test files passing.

joshgreaves and others added 9 commits March 26, 2026 13:57
…ctly

Delete the ares.llms.request module and all legacy conversion functions
(from_legacy_request, to_legacy_request). Converters now take/return
lft.OpenResponsesRequest directly, simplifying the codebase.

- Remove request.py and request_test.py (legacy LLMRequest dataclass)
- Update open_responses.py: remove legacy conversion helpers, use lft types
- Update converters (anthropic, openai_chat, openai_responses) to use
  lft.OpenResponsesRequest input/output
- Update all tests to use open_responses.make_request() instead of LLMRequest
- Update type annotations across agents, environments, and clients

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…types

Migrate from custom response types (LLMResponse, TextData, Usage) to a thin
wrapper that holds the linguafranca OpenResponsesResponse directly. This
simplifies the codebase by using canonical types throughout.

Key changes:
- Rename LLMResponse to InferenceResult with fields: response, cost
- Add extract_text_content() utility function for text extraction
- Add make_response() helper for creating test responses
- Update all code agents, environments, and tests to use new types
- Fix docker.py runtime checks for optional container access

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These converters were dead code - only used in their own tests.
The actual conversion happens via open_responses.to_chat_completions_kwargs()
which calls linguafranca directly.

Deleted:
- openai_chat_converter.py
- anthropic_converter.py
- openai_responses_converter.py
- Their corresponding test files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
if not self.preset_name:
raise ValueError("preset_name must be provided in extras or kwargs")
self.env: ares.Environment[llms.LLMResponse, llms.LLMRequest, float, float] | None = None
self.env: ares.Environment[llms.InferenceResult, lft.OpenResponsesRequest, float, float] | None = None
Contributor Author


Q for @rsmith49
There is a little bit of asymmetry here, wdyt of it? Inputs to step are LLM responses (ARES type, though it wraps linguafranca), outputs of step are LLM requests (linguafranca type).

Is this confusing? We could alias linguafranca types so people can use ARES aliases instead, but I'm not sure if that's even more confusing.

Contributor


Yeah, I agree it feels a little off. The right approach is probably to wrap lft.OpenResponsesRequest with an ARES-specific wrapper as well to future proof in case there are other top level fields we need (like cost for the response), and have it be a single attribute dataclass?

If we want to do this approach long-term, I think aliasing the type within ARES makes sense for now
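That wrapper option could look roughly like the following minimal sketch, where OpenResponsesRequest is a stand-in dataclass rather than the real linguafranca type:

```python
import dataclasses
from typing import Any


# Stand-in for lft.OpenResponsesRequest (the real type lives in linguafranca).
@dataclasses.dataclass(frozen=True)
class OpenResponsesRequest:
    model: str
    messages: list[Any]


# Single-attribute wrapper, mirroring how InferenceResult wraps the response.
# Extra top-level fields (cost-like metadata, request IDs, etc.) could be
# added later without changing call sites that pass the wrapper around.
@dataclasses.dataclass(frozen=True)
class InferenceRequest:
    request: OpenResponsesRequest


req = InferenceRequest(OpenResponsesRequest(model="stub-model", messages=[]))
print(req.request.model)  # → stub-model
```

The alias approach that the thread ultimately settled on skips the extra attribute hop, at the cost of the future-proofing the wrapper would buy.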

Contributor Author


I agree, I ended up using an alias to the type called InferenceRequest
Now there's a slightly weird matchup between InferenceRequest and InferenceResult. I think this is better, though Result is still a little odd; it feels like we can't use Response.

pass


def _render_content_to_text(content: object, *, context: str) -> str:
Contributor Author


I don't love all these new formatting fns, but this is contrib so we can come back to this if we think it's worth it.

@joshgreaves joshgreaves requested a review from rsmith49 March 27, 2026 23:14
@joshgreaves joshgreaves marked this pull request as ready for review March 27, 2026 23:14
Comment on lines 49 to 56
import ares
from ares import containers
from ares import llms
from ares.llms import open_responses
import chz
import frozendict
from linguafranca import types as lft
import numpy as np
Contributor


Should we move from ares.llms import open_responses and from linguafranca import types as lft after the third-party imports so all third-party imports appear before ARES/local imports per CLAUDE.md and CONTRIBUTING.md?

Finding type: AI Coding Guidelines | Severity: 🟢 Low



Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
examples/05_tinker_train.py around lines 49-56, the local ARES imports (import ares,
from ares import containers, from ares import llms, from ares.llms import
open_responses) are placed among third-party imports. Reorder the imports so all
third-party imports (chz, frozendict, from linguafranca import types as lft, numpy,
tinker, from tinker_cookbook import cli_utils) appear together first, then add a blank
line, and place the ARES/local imports after them. Preserve existing import names and
aliases and ensure the file follows stdlib → third-party → local grouping with a
blank line separating groups.

Comment on lines +168 to +179
def with_model(request: lft.OpenResponsesRequest, model: str) -> lft.OpenResponsesRequest:
"""Return a copy of the request with the model field replaced.

Args:
request: The original request.
model: The new model identifier.

Returns:
A new request with the updated model field.
"""
return dataclasses.replace(request, model=model)

Contributor


open_responses.ensure_request is referenced by tests but not defined, so it raises AttributeError; should we add an ensure_request that returns the canonical request unchanged?
def ensure_request(req): return req

Finding type: Logical Bugs | Severity: 🔴 High



Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
src/ares/llms/open_responses.py around lines 168 to 183, add a new top-level function
named ensure_request that accepts an OpenResponsesRequest and returns it unchanged.
Implement it as: def ensure_request(request: lft.OpenResponsesRequest) ->
lft.OpenResponsesRequest: """Return the canonical request unchanged; provided for
callers/tests that expect this helper.""" return request. Keep typing and a short
docstring so it is importable and documented.

Comment on lines +266 to +272
def to_jsonable(value: Any) -> _JSONABLE:
# TODO: Replace this with frfr.
# The issue is that OpenResponsesRequest has enum values, which frfr doesn't handle correctly yet.
# It won't, in general, either. So we should make sure OpenResposnesRequest enums are StrEnums where appropriate.
if dataclasses.is_dataclass(value):
return {field.name: to_jsonable(getattr(value, field.name)) for field in dataclasses.fields(value)}
if isinstance(value, enum.Enum):
Contributor


Comment before to_jsonable says OpenResposnesRequest, should we replace it with OpenResponsesRequest?

Finding type: Naming and Typos | Severity: 🟢 Low



Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
src/ares/llms/open_responses.py around lines 266 to 272, the comment above the
to_jsonable function mistakenly spells 'OpenResponsesRequest' as 'OpenResposnesRequest'.
Edit the comment to correct the typo to 'OpenResponsesRequest' (leave the rest of the
comment text unchanged) so the terminology matches the rest of the module.
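For reference, a self-contained sketch of the recursion that to_jsonable performs; the stand-in Role/Msg types below are illustrative, and the real module additionally normalizes the linguafranca request types:

```python
import dataclasses
import enum
from typing import Any


def to_jsonable(value: Any):
    """Recursively convert dataclasses/enums/containers to JSON-ready values."""
    # Dataclass *instances* become dicts of their fields (skip dataclass types).
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        return {
            f.name: to_jsonable(getattr(value, f.name))
            for f in dataclasses.fields(value)
        }
    if isinstance(value, enum.Enum):
        return value.value
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    if isinstance(value, dict):
        return {k: to_jsonable(v) for k, v in value.items()}
    return value  # str / int / float / bool / None pass through


class Role(enum.Enum):
    USER = "user"


@dataclasses.dataclass
class Msg:
    role: Role
    content: str


print(to_jsonable(Msg(Role.USER, "hi")))  # → {'role': 'user', 'content': 'hi'}
```

The enum branch is why the TODO above matters: value.value only round-trips cleanly when the enums are string-valued, hence the suggestion to prefer StrEnums.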

Comment on lines +236 to 239
observation = open_responses.make_request(
[open_responses.user_message(initial_prompt)],
instructions=self._system_prompt,
)
Contributor


The branches of step() duplicate the open_responses.make_request([open_responses.user_message(...)], instructions=self._system_prompt) observation construction; should we extract it into a _make_observation_prompt(text: str) helper?

Finding type: Code Dedup and Conventions | Severity: 🟢 Low


Want Baz to fix this for you? Activate Fixer

Comment on lines 7 to 13

import ares
from ares import llms
from ares.llms import open_responses
from linguafranca import types as lft
import tqdm

Contributor


ares imports sit before third-party tqdm with no separating blank line — should we move ares into the final local group separated by a blank line per CLAUDE.md?

Finding type: AI Coding Guidelines | Severity: 🟢 Low



Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In examples/utils.py
around lines 7-13, the import group ordering is wrong: local ARES imports (import ares,
from ares import llms, from ares.llms import open_responses, from linguafranca import
types as lft) appear before the third-party import tqdm and there is no blank line
separating third-party and local imports. Reorder the imports into conventional groups
(stdlib first, then third-party, then local), place from ares... and from
linguafranca... into the local group after tqdm, and add a single blank line between the
third-party group and the local ARES imports. Also ensure imports within each group are
sorted consistently (e.g., alphabetically).

Comment on lines 103 to 107
) -> containers.ExecResult:
if self._container is None:
raise RuntimeError("Container not started. Call start() first.")

# Use default_workdir if workdir is not explicitly provided
Contributor


The self._container is None start guard is duplicated in exec_run, upload_files, and download_files — should we extract it into an _assert_started() helper?

Finding type: Code Dedup and Conventions | Severity: 🟢 Low
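A minimal sketch of that helper; DockerContainerSketch below is a simplified stand-in for the real DockerContainer:

```python
class DockerContainerSketch:
    """Stand-in for DockerContainer with the guard factored out."""

    def __init__(self) -> None:
        self._container = None  # populated by start() in the real class

    def _assert_started(self) -> None:
        if self._container is None:
            raise RuntimeError("Container not started. Call start() first.")

    def exec_run(self, cmd: str) -> str:
        self._assert_started()  # replaces the duplicated inline None check
        return f"ran: {cmd}"

    def upload_files(self, paths: list[str]) -> None:
        self._assert_started()


try:
    DockerContainerSketch().exec_run("ls")
except RuntimeError as err:
    print(err)  # → Container not started. Call start() first.
```

Each public method then opens with one self-documenting call instead of a repeated conditional, and the error message stays consistent across methods.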


Want Baz to fix this for you? Activate Fixer

Contributor


Yeah #47 addressed this - would be great to address merge conflicts on that and get it merged at some point

Contributor


Thanks for the reminder—#47 has landed but the docker container guard conflict still needs resolving, so please rebase onto the latest main and address the duplication before we can merge.

Comment on lines +1 to +12
"""LLM client interfaces and data types.

Canonical request builders and request types live in :mod:`ares.llms.open_responses`.

Prefer ``from ares.llms import open_responses`` to access request types and builders
rather than importing individual type aliases from this package.
"""

# Request types
# Client protocol
from ares.llms.chat_completions_compatible import ChatCompletionCompatibleLLMClient
from ares.llms.llm_clients import LLMClient
from ares.llms.request import AssistantMessage
from ares.llms.request import LLMRequest
from ares.llms.request import Message
from ares.llms.request import ToolCallMessage
from ares.llms.request import ToolCallResponseMessage
from ares.llms.request import UserMessage

# Response types
from ares.llms.response import LLMResponse
from ares.llms.response import TextData
from ares.llms.response import Usage
from ares.llms.response import InferenceResult
from ares.llms.response import extract_text_content
Contributor


from ares.llms import LLMRequest breaks after removing src/ares/llms/request.py; should we reintroduce thin shims/aliases in ares.llms to re-export the new types or provide a documented migration path and review the removed file for shim locations?

Finding type: Breaking Changes | Severity: 🔴 High



Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
src/ares/llms/__init__.py around lines 1-21, the package no longer re-exports the
previous request/message/tool dataclasses (e.g. LLMRequest, AssistantMessage,
ToolCallMessage, UserMessage) and this breaks downstream `from ares.llms import ...`
imports. Restore backwards-compatible thin shims by re-importing and re-exporting those
types (either from src/ares/llms/request.py if it still exists, or from the new
open_responses module if types were moved) and add them back into __all__. Also add
short deprecation notices (warnings.warn) or docstrings pointing to the new canonical
API (ares.llms.open_responses) so consumers have a migration path. Finally, review
src/ares/llms/request.py to ensure the correct original type names are exported and
include a unit test that imports the legacy names to validate compatibility.
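One possible shape for such a shim, sketched with a stand-in type; the real shim would live in src/ares/llms/__init__.py, and a fuller version could use a PEP 562 module-level __getattr__ so the warning fires only on access to the legacy name:

```python
import warnings


class OpenResponsesRequest:
    """Stand-in for lft.OpenResponsesRequest."""


# Thin compatibility alias so `from ares.llms import LLMRequest` keeps working.
LLMRequest = OpenResponsesRequest


def make_legacy_request(*args, **kwargs):
    """Hypothetical deprecated constructor shim: warn, then build the new type."""
    warnings.warn(
        "LLMRequest is deprecated; build requests with "
        "ares.llms.open_responses instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return OpenResponsesRequest(*args, **kwargs)


print(LLMRequest is OpenResponsesRequest)  # → True
```

Pairing the alias with a unit test that imports the legacy names (as the prompt above suggests) keeps the migration path honest.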

Comment on lines 49 to 54
import ares
from ares import llms
from ares.llms import open_responses
import hydra
from linguafranca import types as lft
import omegaconf
Contributor


ARES imports appear before third-party imports; should we reorder to stdlib → third-party → local/ARES with blank lines and ARES last per CLAUDE.md?

Finding type: AI Coding Guidelines | Severity: 🟢 Low



Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
examples/04_rl_training_with_skyrl.py around lines 49 to 54, the ARES imports (import
ares; from ares import llms; from ares.llms import open_responses) are placed before
third-party imports, violating the project's import ordering. Reorder imports to follow:
stdlib imports first (unchanged), then a single blank line, then all third-party imports
(hydra, linguafranca/types, omegaconf, ray, skyrl_gym) grouped together, then a single
blank line, and finally the ARES/local imports grouped together (import ares; from ares
import llms; from ares.llms import open_responses). Ensure there is exactly one blank
line separating each group and update any import lines or spacing accordingly.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/ares/contrib/mech_interp/hooked_transformer_client.py (2)

74-91: ⚠️ Potential issue | 🟠 Major

Validate max_output_tokens before subtracting it from n_ctx.

Values <= 0 or >= self.model.cfg.n_ctx make max_new_tokens invalid or drive apply_chat_template(..., max_length=...) to a non-positive value. Right now that becomes a runtime failure instead of a clean input error.

🛠️ Proposed guard
-        max_output_tokens = max_output_tokens or request.max_output_tokens or self.max_new_tokens
+        if max_output_tokens is None:
+            max_output_tokens = request.max_output_tokens
+        if max_output_tokens is None:
+            max_output_tokens = self.max_new_tokens
+        if max_output_tokens <= 0:
+            raise ValueError("max_output_tokens must be positive")
+        if max_output_tokens >= self.model.cfg.n_ctx:
+            raise ValueError("max_output_tokens must be smaller than the model context window")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/contrib/mech_interp/hooked_transformer_client.py` around lines 74 -
91, Validate resolved max_output_tokens (from max_output_tokens or
request.max_output_tokens or self.max_new_tokens) before using it to compute
max_length for apply_chat_template: ensure it is >0 and < self.model.cfg.n_ctx,
and if not raise a clear ValueError (or return a clean input error) indicating
the invalid max_output_tokens and the allowed range. Do this check immediately
after computing max_output_tokens and before computing
max_length/self.model.tokenizer.apply_chat_template; reference the symbols
max_output_tokens, request.max_output_tokens, self.max_new_tokens,
self.model.cfg.n_ctx, and apply_chat_template so the guard is easy to locate and
update.

97-106: ⚠️ Potential issue | 🟠 Major

top_p is silently ignored in this client.

The canonical request now carries sampling controls, but this adapter only forwards temperature. That means the same OpenResponsesRequest behaves differently here than it does in TransformersLLMClient. Either pass request.top_p through to generation or reject it explicitly instead of dropping it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/contrib/mech_interp/hooked_transformer_client.py` around lines 97 -
106, The client currently only forwards request.temperature into gen_kwargs (see
gen_kwargs and the temperature handling block), silently dropping request.top_p;
update the adapter in HookedTransformerClient (hooked_transformer_client.py) to
also forward request.top_p into gen_kwargs when not None, or explicitly raise an
error if top_p is unsupported. Locate the generation kwargs construction
(gen_kwargs) and either add gen_kwargs["top_p"] = request.top_p when
request.top_p is not None, or add a guard that raises a clear exception
mentioning top_p if you intend not to support it.
src/ares/contrib/eval_visualizer.py (1)

91-121: ⚠️ Potential issue | 🟡 Minor

Widen the dashboard wrapper's observation types to accept CodeEnvironment.

Lines 93, 112, 121 (constructor, reset(), step() return types) and line 702 (wrap() parameter type) currently expect lft.OpenResponsesRequest, but CodeEnvironment is defined with observation type lft.OpenResponsesRequest | None on terminal timesteps. This causes static type checking to reject CodeEnvironment as an argument to wrap() or the TrackedEnvironment constructor, despite the docstring claiming to support "all ARES code agent environments."

Suggested fix
-        env: base.Environment[response.InferenceResult, lft.OpenResponsesRequest, RewardType, DiscountType],
+        env: base.Environment[
+            response.InferenceResult, lft.OpenResponsesRequest | None, RewardType, DiscountType
+        ],
@@
-    async def reset(self) -> base.TimeStep[lft.OpenResponsesRequest, RewardType, DiscountType]:
+    async def reset(self) -> base.TimeStep[lft.OpenResponsesRequest | None, RewardType, DiscountType]:
@@
-    ) -> base.TimeStep[lft.OpenResponsesRequest, RewardType, DiscountType]:
+    ) -> base.TimeStep[lft.OpenResponsesRequest | None, RewardType, DiscountType]:
@@
-        env: base.Environment[response.InferenceResult, lft.OpenResponsesRequest, RewardType, DiscountType],
+        env: base.Environment[
+            response.InferenceResult, lft.OpenResponsesRequest | None, RewardType, DiscountType
+        ],

Rerun pyright to confirm the fix resolves the type incompatibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/contrib/eval_visualizer.py` around lines 91 - 121, The type
annotations for observations in TrackedEnvironment need to accept
CodeEnvironment terminal timesteps that use None; update the constructor
signature, reset() return type, step() return type, and the wrap() parameter to
use lft.OpenResponsesRequest | None (or Optional[lft.OpenResponsesRequest])
instead of just lft.OpenResponsesRequest so CodeEnvironment is accepted; modify
the annotations on __init__, reset, step, and the wrap(...) parameter
accordingly and rerun pyright to confirm the type error is resolved.
examples/05_tinker_train.py (1)

124-138: ⚠️ Potential issue | 🟡 Minor

Widen the env observation type to accept None.

Line 126 declares the observation type as lft.OpenResponsesRequest, but the environment's step() method returns TimeStep with lft.OpenResponsesRequest | None on terminal transitions. The _get_tinker_observation() method already handles the None case correctly (line 139), and CodeEnvironment—the main environment this adapter wraps—declares observation type with | None. The type annotation must match the actual behavior.

Suggested fix
-        env: ares.Environment[llms.InferenceResult, lft.OpenResponsesRequest, float, float],
+        env: ares.Environment[llms.InferenceResult, lft.OpenResponsesRequest | None, float, float],
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/05_tinker_train.py` around lines 124 - 138, The constructor's env
annotation is too narrow: change the env type parameter in __init__ to accept
observations that may be None (i.e., use lft.OpenResponsesRequest | None) so it
matches the TimeStep passed into _get_tinker_observation; update the signature
where env is typed (the __init__ method) to
ares.Environment[llms.InferenceResult, lft.OpenResponsesRequest | None, float,
float] so the type matches the actual environment behavior handled by
_get_tinker_observation.
🧹 Nitpick comments (4)
src/ares/containers/docker.py (1)

104-106: Consider centralizing the startup guard and documenting the raised RuntimeError.

The same check/message is repeated in three methods. A small helper (e.g., _require_started_container()) would reduce duplication and improve type narrowing. Also worth updating the Container protocol/docstrings to mention this precondition explicitly for callers.

Also applies to: 135-136, 165-166

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/containers/docker.py` around lines 104 - 106, Extract the repeated
guard "if self._container is None: raise RuntimeError('Container not started.
Call start() first.')" into a small private helper method, e.g.,
_require_started_container(self), and replace the three occurrences with calls
to that helper to reduce duplication and enable type narrowing of
self._container; update the Container protocol/docstrings to document that
methods accessing self._container require the container to be started and that
they raise RuntimeError otherwise (mention _require_started_container as the
enforcement point).
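A minimal sketch of the suggested helper, assuming a simplified class shape (the real class wraps a Docker SDK container handle in `self._container`; the method names here are illustrative):

```python
# Illustrative sketch of the _require_started_container guard. Returning the
# narrowed handle lets callers avoid re-checking for None themselves.
class DockerContainer:
    def __init__(self) -> None:
        self._container: object | None = None

    def _require_started_container(self) -> object:
        """Return the running container, raising if start() was never called."""
        if self._container is None:
            raise RuntimeError("Container not started. Call start() first.")
        return self._container

    def exec_run(self, cmd: str) -> str:
        container = self._require_started_container()
        return f"exec {cmd} on {container}"
```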
README.md (1)

18-18: Minor terminology inconsistency.

"Open Responses requests" and "LLMResponses" are mixed terminology styles. Given the migration, consider using consistent naming like "Open Responses requests" and "inference results" (or "LLM responses" to match the code comment style).

📝 Suggested wording
-ARES treats Open Responses requests as observations and LLMResponses as actions within the environment, so you can focus on training just the LLM - not the Code Agent surrounding it. The interface is entirely async, and supports scaling up to hundreds or thousands of parallel environments easily - check out [example 3](https://github.com/withmartian/ares/tree/main/examples/03_parallel_eval_with_api.py) to run this yourself.
+ARES treats Open Responses requests as observations and LLM responses as actions within the environment, so you can focus on training just the LLM - not the Code Agent surrounding it. The interface is entirely async, and supports scaling up to hundreds or thousands of parallel environments easily - check out [example 3](https://github.com/withmartian/ares/tree/main/examples/03_parallel_eval_with_api.py) to run this yourself.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 18, The README line uses mixed terminology ("Open
Responses requests" vs "LLMResponses"); update the sentence to use consistent
naming (e.g., keep "Open Responses requests" and change "LLMResponses" to "LLM
responses" or "inference results") so the phrasing matches other comments and
code; locate the exact sentence containing "Open Responses requests" and
"LLMResponses" and replace "LLMResponses" with the chosen consistent term across
the README to maintain style uniformity.
examples/04_rl_training_with_skyrl.py (1)

109-110: Consider strict=False for robustness in training scenarios.

Using strict=True in to_chat_messages will raise ValueError if observations contain non-text content (images, tool calls, function outputs). While this provides fail-fast behavior, it could cause training crashes if any environment produces such observations.

Consider using strict=False to match the pattern in examples/utils.py, or document that this training integration assumes text-only observations.

Also applies to: 135-135

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/04_rl_training_with_skyrl.py` around lines 109 - 110, The training
code uses open_responses.to_chat_messages(ts.observation, strict=True) which
will raise ValueError on non-text observations; change strict=True to
strict=False in the two uses of to_chat_messages (the calls on ts.observation)
so the function tolerates images/tool outputs during training and matches the
pattern in examples/utils.py, or alternatively add a clear comment documenting
the assumption of text-only observations if you intentionally want to keep
strict=True.
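The strict/lenient trade-off can be sketched with a stand-in converter (`to_chat_messages` below is an invented illustration, not the real `ares` helper):

```python
# Illustrative stand-in: strict mode fails fast on non-text content,
# lenient mode silently skips it so a training run keeps going.
def to_chat_messages(items: list, strict: bool = True) -> list[dict]:
    messages = []
    for item in items:
        if isinstance(item, str):
            messages.append({"role": "user", "content": item})
        elif strict:
            raise ValueError(f"non-text content: {item!r}")
        # lenient mode: non-text items (images, tool outputs) are dropped
    return messages
```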
docs/source/core-concepts.rst (1)

160-170: Code example missing import for open_responses module.

The example uses open_responses.make_request() and open_responses.user_message() but doesn't show the required import statement. Consider adding the import for clarity:

from ares.llms import open_responses

This would help users understand where these helpers come from when following the documentation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/core-concepts.rst` around lines 160 - 170, The example uses
open_responses.make_request and open_responses.user_message but omits the
import; add an import for the open_responses helper (e.g., import open_responses
from its module) at the top of the snippet so symbols like
open_responses.make_request and open_responses.user_message are defined when
demonstrating the async run(self, task: str) -> None method and its calls to
self._llm_client and self.parse_commands.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CLAUDE.md`:
- Around line 284-291: The docs reference non-existent types (llms.TextData,
llms.Usage, response.TextData, response.Usage); update CLAUDE.md to use the
actual exported symbols: replace TextData usages with InferenceResult or
extract_text_content/make_response where appropriate, and replace Usage with
lft.Usage (from linguafranca). Also prefer referencing the package exports:
ChatCompletionCompatibleLLMClient, InferenceResult, LLMClient,
extract_text_content, make_response from ares.llms/__init__.py instead of the
removed types; search for any occurrences of TextData/Usage and swap them to the
correct symbols and examples accordingly.

In `@src/ares/contrib/llama_cpp.py`:
- Around line 101-109: The current conversion in the LlamaCpp adapter discards
tool call data by using chat_completion.choices[0].message.content and
response.make_response, so preserve tool calls by converting the entire chat
message (including message.tool_calls) into the library's
Response/InferenceResult format instead of only text; update the logic around
chat_completion.choices[0].message to detect and map message.tool_calls into the
equivalent fields on the Response (or build a Response from message content plus
tool_call structure) and return that in response.InferenceResult (mirror the
same pattern for ChatCompletionCompatibleLLMClient to avoid the same loss); add
a unit test that supplies a chat_completion with message.tool_calls to assert
the resulting InferenceResult contains the tool call payload.

In `@src/ares/llms/chat_completions_compatible.py`:
- Around line 77-85: The Chat Completions handling currently only uses
resp.choices[0].message.content and drops any
resp.choices[0].message.tool_calls; update the code that builds the inference
response (around response.make_response and response.InferenceResult) to
preserve tool invocations by detecting resp.choices[0].message.tool_calls and
passing them through into the Open Responses format: either extend
response.make_response to accept a tool_calls (or tools) parameter and pass
resp.choices[0].message.tool_calls, or add a small converter that constructs the
Open Responses-style response object containing both content and tool_calls
before creating lf_response; keep existing fields (model, input_tokens,
output_tokens, response_id, cost) unchanged.

In `@src/ares/llms/open_responses_test.py`:
- Around line 78-81: The test is calling a missing helper
open_responses.ensure_request which raises AttributeError; either reintroduce
ensure_request in src/ares/llms/open_responses.py (implement it to accept a
request produced by make_request and return the canonical request object) or
change the test to use the real normalization entry point (e.g., call the
existing request factory/normalizer such as open_responses.make_request or the
module's public normalize function) so the test asserts the canonical request is
returned; update references to ensure_request, make_request, and user_message
accordingly to keep the test exercising the supported API.

---

Outside diff comments:
In `@examples/05_tinker_train.py`:
- Around line 124-138: The constructor's env annotation is too narrow: change
the env type parameter in __init__ to accept observations that may be None
(i.e., use lft.OpenResponsesRequest | None) so it matches the TimeStep passed
into _get_tinker_observation; update the signature where env is typed (the
__init__ method) to ares.Environment[llms.InferenceResult,
lft.OpenResponsesRequest | None, float, float] so the type matches the actual
environment behavior handled by _get_tinker_observation.

In `@src/ares/contrib/eval_visualizer.py`:
- Around line 91-121: The type annotations for observations in
TrackedEnvironment need to accept CodeEnvironment terminal timesteps that use
None; update the constructor signature, reset() return type, step() return type,
and the wrap() parameter to use lft.OpenResponsesRequest | None (or
Optional[lft.OpenResponsesRequest]) instead of just lft.OpenResponsesRequest so
CodeEnvironment is accepted; modify the annotations on __init__, reset, step,
and the wrap(...) parameter accordingly and rerun pyright to confirm the type
error is resolved.

In `@src/ares/contrib/mech_interp/hooked_transformer_client.py`:
- Around line 74-91: Validate resolved max_output_tokens (from max_output_tokens
or request.max_output_tokens or self.max_new_tokens) before using it to compute
max_length for apply_chat_template: ensure it is >0 and < self.model.cfg.n_ctx,
and if not raise a clear ValueError (or return a clean input error) indicating
the invalid max_output_tokens and the allowed range. Do this check immediately
after computing max_output_tokens and before computing
max_length/self.model.tokenizer.apply_chat_template; reference the symbols
max_output_tokens, request.max_output_tokens, self.max_new_tokens,
self.model.cfg.n_ctx, and apply_chat_template so the guard is easy to locate and
update.
- Around line 97-106: The client currently only forwards request.temperature
into gen_kwargs (see gen_kwargs and the temperature handling block), silently
dropping request.top_p; update the adapter in HookedTransformerClient
(hooked_transformer_client.py) to also forward request.top_p into gen_kwargs
when not None, or explicitly raise an error if top_p is unsupported. Locate the
generation kwargs construction (gen_kwargs) and either add gen_kwargs["top_p"] =
request.top_p when request.top_p is not None, or add a guard that raises a clear
exception mentioning top_p if you intend not to support it.

---

Nitpick comments:
In `@docs/source/core-concepts.rst`:
- Around line 160-170: The example uses open_responses.make_request and
open_responses.user_message but omits the import; add an import for the
open_responses helper (e.g., import open_responses from its module) at the top
of the snippet so symbols like open_responses.make_request and
open_responses.user_message are defined when demonstrating the async run(self,
task: str) -> None method and its calls to self._llm_client and
self.parse_commands.

In `@examples/04_rl_training_with_skyrl.py`:
- Around line 109-110: The training code uses
open_responses.to_chat_messages(ts.observation, strict=True) which will raise
ValueError on non-text observations; change strict=True to strict=False in the
two uses of to_chat_messages (the calls on ts.observation) so the function
tolerates images/tool outputs during training and matches the pattern in
examples/utils.py, or alternatively add a clear comment documenting the
assumption of text-only observations if you intentionally want to keep
strict=True.

In `@README.md`:
- Line 18: The README line uses mixed terminology ("Open Responses requests" vs
"LLMResponses"); update the sentence to use consistent naming (e.g., keep "Open
Responses requests" and change "LLMResponses" to "LLM responses" or "inference
results") so the phrasing matches other comments and code; locate the exact
sentence containing "Open Responses requests" and "LLMResponses" and replace
"LLMResponses" with the chosen consistent term across the README to maintain
style uniformity.

In `@src/ares/containers/docker.py`:
- Around line 104-106: Extract the repeated guard "if self._container is None:
raise RuntimeError('Container not started. Call start() first.')" into a small
private helper method, e.g., _require_started_container(self), and replace the
three occurrences with calls to that helper to reduce duplication and enable
type narrowing of self._container; update the Container protocol/docstrings to
document that methods accessing self._container require the container to be
started and that they raise RuntimeError otherwise (mention
_require_started_container as the enforcement point).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9305d7ad-0b27-4bc1-97a4-3d0a28c2128f

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and 134ef0f.

📒 Files selected for processing (43)
  • CLAUDE.md
  • CONTRIBUTING.md
  • README.md
  • docs/source/core-concepts.rst
  • docs/source/how-it-works.rst
  • docs/source/index.rst
  • examples/04_rl_training_with_skyrl.py
  • examples/05_tinker_train.py
  • examples/utils.py
  • pyproject.toml
  • src/ares/__init__.py
  • src/ares/code_agents/code_agent_base.py
  • src/ares/code_agents/mini_swe_agent.py
  • src/ares/code_agents/terminus2/terminus2_agent.py
  • src/ares/code_agents/terminus2/terminus2_agent_test.py
  • src/ares/containers/docker.py
  • src/ares/contrib/eval_visualizer.py
  • src/ares/contrib/llama_cpp.py
  • src/ares/contrib/mech_interp/hooked_transformer_client.py
  • src/ares/contrib/transformers_client.py
  • src/ares/contrib/transformers_client_test.py
  • src/ares/environments/code_env.py
  • src/ares/environments/twenty_questions.py
  • src/ares/llms/__init__.py
  • src/ares/llms/anthropic_converter.py
  • src/ares/llms/anthropic_converter_test.py
  • src/ares/llms/chat_completions_compatible.py
  • src/ares/llms/llm_clients.py
  • src/ares/llms/open_responses.py
  • src/ares/llms/open_responses_test.py
  • src/ares/llms/openai_chat_converter.py
  • src/ares/llms/openai_chat_converter_test.py
  • src/ares/llms/openai_responses_converter.py
  • src/ares/llms/openai_responses_converter_test.py
  • src/ares/llms/queue_mediated_client.py
  • src/ares/llms/queue_mediated_client_test.py
  • src/ares/llms/request.py
  • src/ares/llms/request_test.py
  • src/ares/llms/response.py
  • src/ares/presets.py
  • src/ares/presets_test.py
  • src/ares/testing/mock_llm.py
  • src/ares/testing/mock_llm_test.py
💤 Files with no reviewable changes (8)
  • src/ares/llms/request_test.py
  • src/ares/llms/openai_responses_converter_test.py
  • src/ares/llms/anthropic_converter_test.py
  • src/ares/llms/openai_chat_converter.py
  • src/ares/llms/request.py
  • src/ares/llms/openai_chat_converter_test.py
  • src/ares/llms/openai_responses_converter.py
  • src/ares/llms/anthropic_converter.py

Comment on lines 101 to +109
content = chat_completion.choices[0].message.content or ""
usage = response.Usage(
prompt_tokens=chat_completion.usage.prompt_tokens if chat_completion.usage else 0,
generated_tokens=chat_completion.usage.completion_tokens if chat_completion.usage else 0,
lf_response = response.make_response(
content,
model=self.model_name,
input_tokens=chat_completion.usage.prompt_tokens if chat_completion.usage else 0,
output_tokens=chat_completion.usage.completion_tokens if chat_completion.usage else 0,
response_id=chat_completion.id,
)
return response.LLMResponse(data=[response.TextData(content=content)], cost=0.0, usage=usage)
return response.InferenceResult(response=lf_response, cost=0.0)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the llama_cpp.py file around lines 101-109
cat -n src/ares/contrib/llama_cpp.py | sed -n '95,115p'

Repository: withmartian/ares

Length of output: 1190


🏁 Script executed:

# Also check the overall structure of the file
wc -l src/ares/contrib/llama_cpp.py

Repository: withmartian/ares

Length of output: 93


🏁 Script executed:

# Look at the response.py to understand make_response
cat -n src/ares/llms/response.py | head -120

Repository: withmartian/ares

Length of output: 3685


🏁 Script executed:

# Search for other Chat Completions clients that might handle tool_calls
fd -t f -name "*.py" src/ares/contrib | head -20

Repository: withmartian/ares

Length of output: 290


🏁 Script executed:

# Search for other LLM client implementations that might be Chat Completions-based
find src/ares/contrib -name "*.py" -type f

Repository: withmartian/ares

Length of output: 432


🏁 Script executed:

# Search for tool_calls handling in the codebase
rg -A 5 -B 5 "tool_calls" src/ares --type py | head -100

Repository: withmartian/ares

Length of output: 7538


🏁 Script executed:

# Search for other uses of response.make_response to see how it's used
rg "response\.make_response\|make_response" src/ares --type py -B 3 -A 3

Repository: withmartian/ares

Length of output: 42


🏁 Script executed:

# Look for open_responses.to_chat_completions_kwargs mentioned in the review scratchpad
rg "to_chat_completions_kwargs" src/ares --type py -B 3 -A 10

Repository: withmartian/ares

Length of output: 10653


🏁 Script executed:

# Look at how ChatCompletionCompatibleLLMClient handles responses
cat -n src/ares/llms/chat_completions_compatible.py

Repository: withmartian/ares

Length of output: 3664


🏁 Script executed:

# Search for response conversion functions from chat completions to Open Responses format
rg "from_chat_completion\|ChatCompletion" src/ares/llms --type py -B 2 -A 5 | head -150

Repository: withmartian/ares

Length of output: 42


Preserve tool calls in LlamaCpp response conversion.

This adapter discards message.tool_calls by extracting only message.content before calling response.make_response(), which is a text-only helper. If the model returns tool calls (which the request-side explicitly supports via to_chat_completions_kwargs()), they are silently lost and downstream code sees only plain text. This breaks tool-calling workflows.

Note that ChatCompletionCompatibleLLMClient has the same limitation, suggesting the issue is broader than just this client—both need response-side tool call support. Consider building a proper response converter that maps OpenAI chat completion messages (including tool_calls) back to Open Responses format, rather than relying on the text-only make_response() helper. Add a test case with tool call responses to prevent regression.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/contrib/llama_cpp.py` around lines 101 - 109, The current conversion
in the LlamaCpp adapter discards tool call data by using
chat_completion.choices[0].message.content and response.make_response, so
preserve tool calls by converting the entire chat message (including
message.tool_calls) into the library's Response/InferenceResult format instead
of only text; update the logic around chat_completion.choices[0].message to
detect and map message.tool_calls into the equivalent fields on the Response (or
build a Response from message content plus tool_call structure) and return that
in response.InferenceResult (mirror the same pattern for
ChatCompletionCompatibleLLMClient to avoid the same loss); add a unit test that
supplies a chat_completion with message.tool_calls to assert the resulting
InferenceResult contains the tool call payload.
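The shape of the fix can be sketched as below. This is only an illustration of extracting both parts of an OpenAI-style chat completion message; `extract_message_parts` is an invented name, and a real fix would map these parts into linguafranca response items rather than a plain dict:

```python
# Illustrative sketch: pull both text and tool calls out of a chat
# completion message instead of keeping message.content alone.
def extract_message_parts(message: dict) -> dict:
    parts = {"text": message.get("content") or "", "tool_calls": []}
    for call in message.get("tool_calls") or []:
        parts["tool_calls"].append(
            {
                "id": call["id"],
                "name": call["function"]["name"],
                "arguments": call["function"]["arguments"],
            }
        )
    return parts
```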

Comment on lines +78 to +81
def test_ensure_request_accepts_canonical_request():
request = open_responses.make_request([open_responses.user_message("Hello")])
result = open_responses.ensure_request(request)
assert result is request

⚠️ Potential issue | 🔴 Critical

This test calls an API that isn't present.

open_responses.ensure_request(...) doesn't exist, and CI is already failing here with AttributeError. Either restore the helper in src/ares/llms/open_responses.py or update this test to exercise the supported request-normalization entry point.

🧰 Tools
🪛 GitHub Actions: unit-tests

[error] 80-80: pytest failure (exit code 1): AttributeError: module 'ares.llms.open_responses' has no attribute 'ensure_request' in test_ensure_request_accepts_canonical_request

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/llms/open_responses_test.py` around lines 78 - 81, The test is
calling a missing helper open_responses.ensure_request which raises
AttributeError; either reintroduce ensure_request in
src/ares/llms/open_responses.py (implement it to accept a request produced by
make_request and return the canonical request object) or change the test to use
the real normalization entry point (e.g., call the existing request
factory/normalizer such as open_responses.make_request or the module's public
normalize function) so the test asserts the canonical request is returned;
update references to ensure_request, make_request, and user_message accordingly
to keep the test exercising the supported API.
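If the helper is reintroduced, it could look like the following. This is a hypothetical reconstruction using stand-in types — `OpenResponsesRequest` here is a placeholder for the linguafranca type, not the real class:

```python
# Stand-in for the linguafranca request type, for illustration only.
class OpenResponsesRequest:
    def __init__(self, messages: list) -> None:
        self.messages = messages


def make_request(messages: list) -> OpenResponsesRequest:
    return OpenResponsesRequest(messages)


def ensure_request(value) -> OpenResponsesRequest:
    """Return canonical requests unchanged; normalize raw message lists."""
    if isinstance(value, OpenResponsesRequest):
        return value
    return make_request(value)
```

The identity check (`result is request`) in the failing test is exactly the property the first branch preserves.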

Contributor

@rsmith49 rsmith49 left a comment


Great work Josh 🎉 This was a huge effort, and I think it will unlock a lot of functionality for us!

A lot of comments (sorry); one of the most important, IMO, is making sure our OpenResponses -> transformers chat template conversion handles tool calls in the best way, since that will affect training a lot.

| │ │ from code agent via │ │
└─────────────────────────────────┼──│ QueueMediatedLLMClient │ │
LLMRequest (observation) │ └──────────────────┬─────────────┘ │
Open Responses observation │ └──────────────────┬─────────────┘ │
Contributor

nit: "OpenResponsesRequest (observation)" instead to show the type -> RL concept

Open Responses observation │ └──────────────────┬─────────────┘ │
│ ^ │ │
LLMRequest │ │ LLMResponse
Open Responses │ │ InferenceResult
Contributor

Same nit as above

* **Treats LLM responses as actions** - Your trainable agent/policy provides responses

Crucially, the **CodeAgent is part of the environment**, not what you're training. Your training loop optimizes an agent/policy that produces better ``LLMResponse`` outputs given ``LLMRequest`` observations.
Crucially, the **CodeAgent is part of the environment**, not what you're training. Your training loop optimizes an agent/policy that produces better ``InferenceResult`` outputs given canonical Open Responses observations.
Contributor

"canonical" is confusing here, I would remove

The ``LLMClient`` abstraction serves two purposes:

1. **Observations = LLM Requests**: In the RL loop, ``timestep.observation`` is an ``LLMRequest`` containing the messages the code agent wants to send to the LLM. This is the "state" your policy observes.
1. **Observations = Open Responses requests**: In the RL loop, ``timestep.observation`` is a canonical Open Responses request containing what the code agent wants to send to the LLM. This is the "state" your policy observes.
Contributor

Same "canonical" comment

cost: float

This simple interface wraps OpenAI-style chat completion APIs. The ``messages`` field follows the OpenAI format with ``role`` (system/user/assistant) and ``content``.
ARES uses linguafranca's ``OpenResponsesRequest`` as the canonical request type for observations and client inputs. Edge adapters convert to Chat/Responses/Anthropic formats only when needed.
Contributor

I would change this to a note like

ARES leverages linguafranca for request and response types - we use the OpenResponsesRequest as our base request object returned from the environment for observations, and encourage users to use linguafranca.convert_* methods for translating between different provider and local formats.

@@ -1,383 +0,0 @@
"""Converter for Anthropic Messages API format.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love all the red 🙌 🚀

@@ -1,38 +1,99 @@
"""LLM response model."""
"""LLM response model wrapping linguafranca types."""
Contributor

Worth moving all this into open_responses.py?


# Twenty Questions — lightweight, no Docker needed.
registry.register_preset("20q", TwentyQuestionsSpec())
if "20q" not in existing_preset_names:
Contributor

This and the above logic is just masking an error - IMO we either make register_preset allow overwriting existing presets, or we make sure that _register_default_presets is only called once (although the duplicate check in the list of harbor presets is worth keeping so we don't accidentally break if they add duplicates to their registry)
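Either direction from this comment fits in a few lines. The sketch below shows the overwrite-tolerant option; the `PresetRegistry` class and `overwrite` flag are illustrative stand-ins, not ARES's actual API:

```python
class PresetRegistry:
    """Toy registry showing overwrite-tolerant registration."""

    def __init__(self) -> None:
        self._presets: dict[str, object] = {}

    def register_preset(self, name: str, spec: object, *, overwrite: bool = False) -> None:
        # Surface duplicate registrations instead of silently skipping them,
        # unless the caller explicitly opts into overwriting.
        if name in self._presets and not overwrite:
            raise ValueError(f"preset {name!r} is already registered")
        self._presets[name] = spec


registry = PresetRegistry()
registry.register_preset("20q", object())
registry.register_preset("20q", object(), overwrite=True)  # repeated import stays safe
```

With this shape, `_register_default_presets` can run more than once without masking genuine duplicate-name bugs elsewhere.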

"harbor>=0.1.32",
"httpx>=0.28.1",
"jinja2>=3.1.6",
"martian-linguafranca>=0.1.5",

🚀

It is a modern [gym](https://github.com/Farama-Foundation/Gymnasium): the environment layer powering RL research.

ARES treats LLMRequests as observations and LLMResponses as actions within the environment, so you can focus on training just the LLM - not the Code Agent surrounding it. The interface is entirely async, and supports scaling up to hundreds or thousands of parallel environments easily - check out [example 3](https://github.com/withmartian/ares/tree/main/examples/03_parallel_eval_with_api.py) to run this yourself.
ARES treats Open Responses requests as observations and LLMResponses as actions within the environment, so you can focus on training just the LLM - not the Code Agent surrounding it. The interface is entirely async, and supports scaling up to hundreds or thousands of parallel environments easily - check out [example 3](https://github.com/withmartian/ares/tree/main/examples/03_parallel_eval_with_api.py) to run this yourself.

Should give an actual class name - same comment as way above


@Jhoysbou Jhoysbou left a comment


The usage of linguafranca looks good to me

_LOGGER.warning("%s warning for %s: %s", context, warning.field, warning.message)


def to_jsonable(value: Any) -> _JSONABLE:

LF has a similar built-in function. If we adopt StrEnum, you can actually delete this and rely on LF's implementation.

from dataclasses import asdict, is_dataclass
from typing import Any


def _to_jsonable(value: object) -> Any:
    if is_dataclass(value) and not isinstance(value, type):
        return asdict(value)
    if isinstance(value, dict):
        return value
    if hasattr(value, "model_dump"):
        try:
            return value.model_dump(mode="json")
        except TypeError:
            return value.model_dump()
    raise TypeError(f"unsupported payload type: {type(value)!r}")

This function is used in convert_request, convert_response, and the stream conversation classes. Consequently, you can simplify to_chat_completions_kwargs as well.

@mwilliammyers
Collaborator

@coderabbitai full review

@coderabbitai

coderabbitai bot commented Apr 7, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (3)
README.md (1)

18-18: ⚠️ Potential issue | 🟡 Minor

Replace the legacy LLMResponses name.

LLMResponses reads like a concrete type, but the new action wrapper is ares.llms.InferenceResult. Using the removed name here will send readers toward a symbol that no longer exists.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 18, Update the README text to stop referencing the removed
symbol `LLMResponses` and instead point readers to the new action wrapper
`ares.llms.InferenceResult`; locate the sentence mentioning `LLMResponses` and
replace that token with `ares.llms.InferenceResult`, and confirm any linked
examples or docs (e.g., example 3) use the new symbol name so readers aren’t
directed to a non-existent type.
CLAUDE.md (1)

284-291: ⚠️ Potential issue | 🟡 Minor

Documentation still references non-existent types.

Despite being marked as addressed in prior commits, lines 285 and 289 still reference TextData and Usage which don't exist in the updated codebase. Based on the PR's migration, these should reference the actual exports:

  • llms.InferenceResult, llms.extract_text_content, llms.make_response
  • response.InferenceResult, response.extract_text_content, response.make_response
📝 Proposed fix
 - **External consumers** (examples, docs):
   - ✅ Good: `import ares` → use `ares.make(...)`
   - ✅ Good: `from ares.llms import open_responses` → use `open_responses.make_request(...)`
-  - ✅ Good: `from ares import llms` → use `llms.TextData`, `llms.Usage`
-  - ❌ Avoid: `from ares.llms import OpenResponsesRequest, TextData`
+  - ✅ Good: `from ares import llms` → use `llms.InferenceResult`, `llms.extract_text_content`
+  - ❌ Avoid: `from ares.llms import InferenceResult, extract_text_content`
 - **Internal code**:
   - ✅ Good: `from ares.llms import open_responses` → use `open_responses.make_request(...)`
-  - ✅ Good: `from ares.llms import response` → use `response.TextData`, `response.Usage`
+  - ✅ Good: `from ares.llms import response` → use `response.InferenceResult`, `response.extract_text_content`
   - ❌ Avoid: `from ares.llms.open_responses import Request`
-  - ❌ Avoid: `from ares.llms.response import TextData, Usage`
+  - ❌ Avoid: `from ares.llms.response import InferenceResult, extract_text_content`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` around lines 284 - 291, Update the documentation references that
still mention TextData and Usage to use the new exported symbols: replace any
occurrences of llms.TextData and llms.Usage with llms.InferenceResult,
llms.extract_text_content and llms.make_response (as appropriate), and similarly
replace response.TextData and response.Usage with response.InferenceResult,
response.extract_text_content and response.make_response; ensure examples and
import guidance reference llms.InferenceResult / llms.extract_text_content /
llms.make_response (and response.* counterparts) rather than the removed
TextData/Usage types.
src/ares/llms/open_responses_test.py (1)

78-81: ⚠️ Potential issue | 🔴 Critical

CI is failing: ensure_request does not exist.

The test calls open_responses.ensure_request() but this function is not defined in the open_responses module, causing AttributeError in CI. Either implement ensure_request in src/ares/llms/open_responses.py or remove/update this test.

🔧 Option 1: Remove the test if ensure_request is not needed
-def test_ensure_request_accepts_canonical_request():
-    request = open_responses.make_request([open_responses.user_message("Hello")])
-    result = open_responses.ensure_request(request)
-    assert result is request
🔧 Option 2: Add ensure_request to open_responses.py
def ensure_request(request: lft.OpenResponsesRequest) -> lft.OpenResponsesRequest:
    """Return the request as-is (identity function for canonical requests)."""
    return request
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/llms/open_responses_test.py` around lines 78 - 81, The test fails
because open_responses.ensure_request is missing; add an identity helper in the
module (implement ensure_request) with the signature matching the expected type
(e.g., def ensure_request(request: lft.OpenResponsesRequest) ->
lft.OpenResponsesRequest) that simply returns the passed request, so tests
calling open_responses.ensure_request(), open_responses.make_request(), and
open_responses.user_message() will work; place the function in
src/ares/llms/open_responses.py near related helpers and ensure it uses the same
lft/OpenResponsesRequest type import as the module.
🧹 Nitpick comments (1)
src/ares/contrib/transformers_client.py (1)

87-167: Consider adding a parameter for native tool call support.

The flattening approach for function_call and function_call_output items works for models without tool call support, but transformers has first-class tool call support via apply_chat_template for compatible models. The current implementation always flattens tool interactions to plain text, which loses semantic structure for models that could handle it natively.

Consider adding a model_supports_tools: bool parameter to branch between the flattened approach (for simpler models) and the native transformers tool format (for capable models). This could improve performance for models that have been trained with tool-calling capabilities.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/contrib/transformers_client.py` around lines 87 - 167, Add a boolean
toggle so _render_request_to_chat_messages can preserve native tool-call
structures for transformers models that support them: modify
_render_request_to_chat_messages to accept model_supports_tools: bool = False
and, when True, avoid flattening items of type "function_call" and
"function_call_output" into text and instead translate them into the native
tool-call message shapes expected by apply_chat_template (preserving fields like
name, call_id, arguments, output) while still using
_render_content_to_text/_render_value_to_text for any nested content; keep the
existing flattened behavior as the default for backward compatibility and update
callers to pass model_supports_tools when they know the model supports tool
calls.
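As a sketch of what that branching could look like, the function below keeps tool calls native when `model_supports_tools` is set and flattens them to text otherwise. The message shapes follow the common OpenAI-style tool-call convention and are assumptions, not the actual `transformers_client` code:

```python
from typing import Any


def render_items(items: list[dict[str, Any]], model_supports_tools: bool = False) -> list[dict[str, Any]]:
    """Translate Open Responses items into chat messages, optionally keeping native tool calls."""
    messages: list[dict[str, Any]] = []
    for item in items:
        kind = item.get("type")
        if kind == "function_call":
            if model_supports_tools:
                # Preserve name, call_id, and arguments as a native tool call.
                messages.append({
                    "role": "assistant",
                    "tool_calls": [{
                        "id": item["call_id"],
                        "type": "function",
                        "function": {"name": item["name"], "arguments": item["arguments"]},
                    }],
                })
            else:
                # Flatten to plain text for models without tool support.
                messages.append({
                    "role": "assistant",
                    "content": f"[tool call] {item['name']}({item['arguments']})",
                })
        elif kind == "function_call_output":
            if model_supports_tools:
                messages.append({
                    "role": "tool",
                    "tool_call_id": item["call_id"],
                    "content": item["output"],
                })
            else:
                messages.append({"role": "user", "content": f"[tool output] {item['output']}"})
        else:
            messages.append({"role": item.get("role", "user"), "content": item.get("content", "")})
    return messages
```

The flattened path stays the default, so existing callers are unaffected until they opt in.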

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f19fd66b-963b-413f-9d15-cd7bb3c1b385

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and 134ef0f.

📒 Files selected for processing (43)
  • CLAUDE.md
  • CONTRIBUTING.md
  • README.md
  • docs/source/core-concepts.rst
  • docs/source/how-it-works.rst
  • docs/source/index.rst
  • examples/04_rl_training_with_skyrl.py
  • examples/05_tinker_train.py
  • examples/utils.py
  • pyproject.toml
  • src/ares/__init__.py
  • src/ares/code_agents/code_agent_base.py
  • src/ares/code_agents/mini_swe_agent.py
  • src/ares/code_agents/terminus2/terminus2_agent.py
  • src/ares/code_agents/terminus2/terminus2_agent_test.py
  • src/ares/containers/docker.py
  • src/ares/contrib/eval_visualizer.py
  • src/ares/contrib/llama_cpp.py
  • src/ares/contrib/mech_interp/hooked_transformer_client.py
  • src/ares/contrib/transformers_client.py
  • src/ares/contrib/transformers_client_test.py
  • src/ares/environments/code_env.py
  • src/ares/environments/twenty_questions.py
  • src/ares/llms/__init__.py
  • src/ares/llms/anthropic_converter.py
  • src/ares/llms/anthropic_converter_test.py
  • src/ares/llms/chat_completions_compatible.py
  • src/ares/llms/llm_clients.py
  • src/ares/llms/open_responses.py
  • src/ares/llms/open_responses_test.py
  • src/ares/llms/openai_chat_converter.py
  • src/ares/llms/openai_chat_converter_test.py
  • src/ares/llms/openai_responses_converter.py
  • src/ares/llms/openai_responses_converter_test.py
  • src/ares/llms/queue_mediated_client.py
  • src/ares/llms/queue_mediated_client_test.py
  • src/ares/llms/request.py
  • src/ares/llms/request_test.py
  • src/ares/llms/response.py
  • src/ares/presets.py
  • src/ares/presets_test.py
  • src/ares/testing/mock_llm.py
  • src/ares/testing/mock_llm_test.py
💤 Files with no reviewable changes (8)
  • src/ares/llms/request.py
  • src/ares/llms/request_test.py
  • src/ares/llms/anthropic_converter_test.py
  • src/ares/llms/openai_responses_converter_test.py
  • src/ares/llms/openai_chat_converter_test.py
  • src/ares/llms/openai_responses_converter.py
  • src/ares/llms/openai_chat_converter.py
  • src/ares/llms/anthropic_converter.py

Comment on lines +115 to +116
req = open_responses.make_request([open_responses.user_message("Hello")])
resp = response.InferenceResult(response=response.make_response("Hello!"), cost=0.0)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the signature of make_response to see if tokens are required
ast-grep --pattern $'def make_response($$$) -> $_:
    $$$'

Repository: withmartian/ares

Length of output: 3038


🏁 Script executed:

#!/bin/bash
# First, let's examine the actual CONTRIBUTING.md at lines 115-116
head -120 CONTRIBUTING.md | tail -10

Repository: withmartian/ares

Length of output: 423


🏁 Script executed:

#!/bin/bash
# Search for make_response usage patterns in the codebase to see how it's typically called
rg "make_response\(" --max-count 20 -B 1 -A 1

Repository: withmartian/ares

Length of output: 3674


Consider explicitly showing token parameters in the documentation example.

While response.make_response("Hello!") is syntactically valid (both input_tokens and output_tokens default to 0), the example in CONTRIBUTING.md would be more instructive if it explicitly shows these parameters, consistent with how tests use the API:

resp = response.InferenceResult(
    response=response.make_response("Hello!", input_tokens=10, output_tokens=5),
    cost=0.0
)

This makes the full API signature clearer for readers learning the framework.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CONTRIBUTING.md` around lines 115 - 116, Update the CONTRIBUTING.md example
to explicitly pass token counts to the response factory so readers see the full
API signature: when constructing response.InferenceResult, call
response.make_response with input_tokens and output_tokens (e.g.,
response.make_response("Hello!", input_tokens=..., output_tokens=...)) and keep
cost=0.0; this clarifies the parameters used by response.make_response and
matches how tests exercise the API.

Comment on lines +33 to +37
messages = open_responses.to_chat_messages(observation, strict=False)
if len(messages) > 0:
observation_content = messages[-1].get("content", "")
else:
observation_content = str(observation.system_prompt) or "(no messages)"
observation_content = str(observation.instructions) or "(no messages)"

⚠️ Potential issue | 🟡 Minor

Potential "None" string if instructions is absent.

When messages is empty and observation.instructions is None, str(observation.instructions) evaluates to the literal string "None" rather than falling back to "(no messages)".

🛡️ Proposed fix
         if len(messages) > 0:
             observation_content = messages[-1].get("content", "")
         else:
-            observation_content = str(observation.instructions) or "(no messages)"
+            observation_content = observation.instructions or "(no messages)"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/utils.py` around lines 33 - 37, The fallback branch can produce the
literal "None" when observation.instructions is None; update the else branch
that sets observation_content (after open_responses.to_chat_messages) to use a
proper truthy check instead of str(...) — e.g., use observation.instructions if
it is truthy (or use getattr(observation, "instructions", None)) and otherwise
set observation_content to "(no messages)"; reference observation_content,
messages, open_responses.to_chat_messages, and observation.instructions when
making the change.
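The pitfall in miniature: `str(None)` produces the truthy string `"None"`, so the `or` fallback never fires on the buggy line but does on the fixed one:

```python
instructions = None

buggy = str(instructions) or "(no messages)"  # str(None) == "None", which is truthy
fixed = instructions or "(no messages)"       # None is falsy, so the fallback applies

print(buggy)  # None   (the literal four-character string)
print(fixed)  # (no messages)
```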

Comment on lines +104 to +106
if self._container is None:
raise RuntimeError("Container not started. Call start() first.")


⚠️ Potential issue | 🟠 Major

Make the new pre-start failure part of the shared container contract.

src/ares/testing/mock_container.py:45-76 still allows these calls before start(), so tests can stay green while real Docker now raises. If this precondition is intentional, mirror it in the mock and expose a shared ContainerNotStartedError instead of a bare RuntimeError so callers can handle it consistently.

As per coding guidelines: src/**/*.py: Implement custom exception hierarchies to distinguish terminating vs non-terminating errors.

Also applies to: 135-136, 165-166

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/containers/docker.py` around lines 104 - 106, Create a shared
ContainerNotStartedError and use it instead of RuntimeError where container
access is guarded (replace the RuntimeError in src/ares/containers/docker.py
checks around self._container and the similar guards at the other sites), and
update src/ares/testing/mock_container.py to raise the same
ContainerNotStartedError so tests and real Docker behavior share the contract;
add the new exception to the module/public API where other callers import
container classes so callers can catch ContainerNotStartedError explicitly.
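A minimal sketch of the shared contract this comment asks for; the exception names match the suggestion, but the surrounding `DockerContainer` shape is illustrative rather than the real `src/ares/containers/docker.py` class:

```python
class ContainerError(Exception):
    """Base class for container-related failures."""


class ContainerNotStartedError(ContainerError):
    """Raised when a container method is called before start()."""


class DockerContainer:
    def __init__(self) -> None:
        self._container: object | None = None

    def start(self) -> None:
        self._container = object()  # stand-in for the real Docker handle

    def exec(self, cmd: str) -> str:
        # Same guard as the diff above, but with a catchable, shared exception
        # that a mock container can raise identically.
        if self._container is None:
            raise ContainerNotStartedError("Container not started. Call start() first.")
        return f"ran: {cmd}"
```

Callers can then `except ContainerNotStartedError` uniformly across the real and mock containers instead of matching a bare `RuntimeError`.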

Comment on lines +3 to +11
import pytest

from linguafranca import types as lft

import ares
from ares import presets
from ares import registry
from ares.llms import open_responses
from ares.llms import response

⚠️ Potential issue | 🟡 Minor

Fix import sorting to pass ruff check.

The CI pipeline is failing with I001: import block is un-sorted or un-formatted. Rearrange imports to comply with Google-style isort configuration.

🔧 Proposed fix
-import pytest
-
-from linguafranca import types as lft
-
 import ares
 from ares import presets
 from ares import registry
 from ares.llms import open_responses
 from ares.llms import response
+from linguafranca import types as lft
+import pytest

Note: The exact ordering depends on your ruff/isort configuration. Run uv run ruff check --fix to auto-correct.

🧰 Tools
🪛 GitHub Actions: ruff

[error] 3-11: ruff check failed: import block is un-sorted or un-formatted. Rule: I001. Help: Organize imports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/presets_test.py` around lines 3 - 11, The import block in
presets_test.py is not sorted per the project's isort/Ruff rules; reorder the
imports (pytest, standard library first if any, then third-party, then local
package imports) so they match Google-style/isort order and groupings —
specifically adjust the lines importing pytest, linguafranca.types as lft, ares,
from ares import presets, registry, and the ares.llms imports (open_responses,
response); run `ruff check --fix` or `uv run ruff check --fix` to automatically
apply the correct ordering and ensure the import block passes I001.


5 participants