Use linguafranca Open Responses requests across ARES #97
joshgreaves wants to merge 14 commits into main from
Conversation
Make Open Responses the canonical internal request shape so queue mediation, environments, agents, and local clients share one path. Keep legacy request adapters at the edges through thin linguafranca-backed converters while preserving existing behavior.
📝 Walkthrough

The PR replaces the internal ARES LLM request/response abstractions with linguafranca "Open Responses" types, introduces an …

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Address audit findings by preserving embedded assistant tool calls, restoring preset re-registration after registry clears, and rejecting unknown external request parameters in strict mode. Align docs and helpers with the canonical Open Responses request path and keep local batching permissive where the previous behavior expected it.
Unify the canonical chat serialization path with the tested converter cleanup, make preset registration idempotent without registry coupling, and add direct regression coverage for queue-mediated requests and preset-based observations. Update public docs and examples to reflect Open Responses as the runtime request model.
_LOGGER = logging.getLogger(__name__)

MODEL_STUB = "__ARES_MODEL_UNSET__"
What do you mean?
andrewtran117
left a comment
Left a small comment on error handling for the model stub; otherwise looks good pending the test files passing.
…stency, tighten public API
…ctly

Delete the ares.llms.request module and all legacy conversion functions (from_legacy_request, to_legacy_request). Converters now take/return lft.OpenResponsesRequest directly, simplifying the codebase.

- Remove request.py and request_test.py (legacy LLMRequest dataclass)
- Update open_responses.py: remove legacy conversion helpers, use lft types
- Update converters (anthropic, openai_chat, openai_responses) to use lft.OpenResponsesRequest input/output
- Update all tests to use open_responses.make_request() instead of LLMRequest
- Update type annotations across agents, environments, and clients

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…types

Migrate from custom response types (LLMResponse, TextData, Usage) to a thin wrapper that holds the linguafranca OpenResponsesResponse directly. This simplifies the codebase by using canonical types throughout.

Key changes:
- Rename LLMResponse to InferenceResult with fields: response, cost
- Add extract_text_content() utility function for text extraction
- Add make_response() helper for creating test responses
- Update all code agents, environments, and tests to use new types
- Fix docker.py runtime checks for optional container access

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These converters were dead code - only used in their own tests. The actual conversion happens via open_responses.to_chat_completions_kwargs(), which calls linguafranca directly.

Deleted:
- openai_chat_converter.py
- anthropic_converter.py
- openai_responses_converter.py
- Their corresponding test files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
     if not self.preset_name:
         raise ValueError("preset_name must be provided in extras or kwargs")
-    self.env: ares.Environment[llms.LLMResponse, llms.LLMRequest, float, float] | None = None
+    self.env: ares.Environment[llms.InferenceResult, lft.OpenResponsesRequest, float, float] | None = None
Q for @rsmith49
There is a little bit of asymmetry here, wdyt of it? Inputs to step are LLM responses (ARES type, though it wraps linguafranca), outputs of step are LLM requests (linguafranca type).
Is this confusing? We could alias linguafranca types so people can use ARES aliases instead, but I'm not sure if that's even more confusing.
Yeah, I agree it feels a little off. The right approach is probably to wrap lft.OpenResponsesRequest with an ARES-specific wrapper as well to future proof in case there are other top level fields we need (like cost for the response), and have it be a single attribute dataclass?
If we want to do this approach long-term, I think aliasing the type within ARES makes sense for now
I agree, I ended up using an alias to the type called InferenceRequest
Now there's a slightly weird matchup, InferenceRequest and InferenceResult. I think this is better, but agree Result is weird, but it feels like we can't do Response
    pass


def _render_content_to_text(content: object, *, context: str) -> str:
I don't love all these new formatting fns, but this is contrib so we can come back to this if we think it's worth it.
import ares
from ares import containers
from ares import llms
from ares.llms import open_responses
import chz
import frozendict
from linguafranca import types as lft
import numpy as np
Should we move from ares.llms import open_responses and from linguafranca import types as lft after the third-party imports so all third-party imports appear before ARES/local imports per CLAUDE.md and CONTRIBUTING.md?
Finding type: AI Coding Guidelines | Severity: 🟢 Low
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
examples/05_tinker_train.py around lines 49-56, the local ARES imports (import ares,
from ares import containers, from ares import llms, from ares.llms import
open_responses) are placed among third-party imports. Reorder the imports so all
third-party imports (chz, frozendict, from linguafranca import types as lft, numpy,
tinker, from tinker_cookbook import cli_utils) appear together first, then add a blank
line, and place the ARES/local imports after them. Preserve existing import names and
aliases and ensure the file follows stdlib → third-party → local grouping with a
blank line separating groups.
def with_model(request: lft.OpenResponsesRequest, model: str) -> lft.OpenResponsesRequest:
    """Return a copy of the request with the model field replaced.

    Args:
        request: The original request.
        model: The new model identifier.

    Returns:
        A new request with the updated model field.
    """
    return dataclasses.replace(request, model=model)
open_responses.ensure_request is referenced by tests but not defined and raises AttributeError, should we add ensure_request that returns the canonical request unchanged?
def ensure_request(req): return req
Finding type: Logical Bugs | Severity: 🔴 High
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
src/ares/llms/open_responses.py around lines 168 to 183, add a new top-level function
named ensure_request that accepts an OpenResponsesRequest and returns it unchanged.
Implement it as: def ensure_request(request: lft.OpenResponsesRequest) ->
lft.OpenResponsesRequest: """Return the canonical request unchanged; provided for
callers/tests that expect this helper.""" return request. Keep typing and a short
docstring so it is importable and documented.
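A minimal sketch of the suggested helper, under the assumption that requests produced by `make_request()` are already canonical, so the function is just the identity:

```python
def ensure_request(request):
    """Return the canonical request unchanged.

    Requests built via make_request() are already canonical Open Responses
    requests, so this is a no-op normalization entry point kept for
    callers and tests that expect it.
    """
    return request
```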
def to_jsonable(value: Any) -> _JSONABLE:
    # TODO: Replace this with frfr.
    # The issue is that OpenResponsesRequest has enum values, which frfr doesn't handle correctly yet.
    # It won't, in general, either. So we should make sure OpenResposnesRequest enums are StrEnums where appropriate.
    if dataclasses.is_dataclass(value):
        return {field.name: to_jsonable(getattr(value, field.name)) for field in dataclasses.fields(value)}
    if isinstance(value, enum.Enum):
Comment before to_jsonable says OpenResposnesRequest, should we replace it with OpenResponsesRequest?
Finding type: Naming and Typos | Severity: 🟢 Low
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
src/ares/llms/open_responses.py around lines 266 to 272, the comment above the
to_jsonable function mistakenly spells 'OpenResponsesRequest' as 'OpenResposnesRequest'.
Edit the comment to correct the typo to 'OpenResponsesRequest' (leave the rest of the
comment text unchanged) so the terminology matches the rest of the module.
observation = open_responses.make_request(
    [open_responses.user_message(initial_prompt)],
    instructions=self._system_prompt,
)
The branches of step() duplicate the open_responses.make_request([open_responses.user_message(...)], instructions=self._system_prompt) observation construction, should we extract it into _make_observation_prompt(text: str)?
Finding type: Code Dedup and Conventions | Severity: 🟢 Low
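The suggested extraction could look like the sketch below. The `open_responses` helpers are stubbed with minimal stand-ins here (the real signatures live in ARES), and `Agent` is a hypothetical host class.

```python
# Minimal stand-ins for open_responses.user_message / make_request,
# just to illustrate the extraction.
def user_message(text):
    return {"role": "user", "content": text}


def make_request(messages, instructions=None):
    return {"input": messages, "instructions": instructions}


class Agent:
    def __init__(self, system_prompt):
        self._system_prompt = system_prompt

    def _make_observation_prompt(self, text):
        # Single construction point shared by both step() branches.
        return make_request([user_message(text)], instructions=self._system_prompt)
```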
import ares
from ares import llms
from ares.llms import open_responses
from linguafranca import types as lft
import tqdm
ares imports sit before third-party tqdm with no separating blank line — should we move ares into the final local group separated by a blank line per CLAUDE.md?
Finding type: AI Coding Guidelines | Severity: 🟢 Low
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In examples/utils.py
around lines 7-13, the import group ordering is wrong: local ARES imports (import ares,
from ares import llms, from ares.llms import open_responses, from linguafranca import
types as lft) appear before the third-party import tqdm and there is no blank line
separating third-party and local imports. Reorder the imports into conventional groups
(stdlib first, then third-party, then local), place from ares... and from
linguafranca... into the local group after tqdm, and add a single blank line between the
third-party group and the local ARES imports. Also ensure imports within each group are
sorted consistently (e.g., alphabetically).
) -> containers.ExecResult:
    if self._container is None:
        raise RuntimeError("Container not started. Call start() first.")

    # Use default_workdir if workdir is not explicitly provided
The `self._container is None` start guard is duplicated in exec_run, upload_files, and download_files — should we extract it into an `_assert_started()` helper?
Finding type: Code Dedup and Conventions | Severity: 🟢 Low
Yeah #47 addressed this - would be great to address merge conflicts on that and get it merged at some point
Thanks for the reminder—#47 has landed but the docker container guard conflict still needs resolving, so please rebase onto the latest main and address the duplication before we can merge.
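The shared-guard idea from this thread can be sketched as below. The class and docker handle are stubs for illustration; only the guard pattern itself is the point.

```python
class DockerContainer:
    """Minimal sketch of the shared start guard; the docker handle is stubbed."""

    def __init__(self):
        self._container = None

    def start(self):
        self._container = object()  # stand-in for the real docker container

    def _require_started_container(self):
        # One place for the check; returning the handle also narrows the
        # Optional for callers under static type checking.
        if self._container is None:
            raise RuntimeError("Container not started. Call start() first.")
        return self._container

    def exec_run(self, cmd):
        self._require_started_container()
        return 0  # placeholder exit code

    def upload_files(self, paths):
        self._require_started_container()

    def download_files(self, paths):
        self._require_started_container()
```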
| """LLM client interfaces and data types. | ||
|
|
||
| Canonical request builders and request types live in :mod:`ares.llms.open_responses`. | ||
|
|
||
| Prefer ``from ares.llms import open_responses`` to access request types and builders | ||
| rather than importing individual type aliases from this package. | ||
| """ | ||
|
|
||
| # Request types | ||
| # Client protocol | ||
| from ares.llms.chat_completions_compatible import ChatCompletionCompatibleLLMClient | ||
| from ares.llms.llm_clients import LLMClient | ||
| from ares.llms.request import AssistantMessage | ||
| from ares.llms.request import LLMRequest | ||
| from ares.llms.request import Message | ||
| from ares.llms.request import ToolCallMessage | ||
| from ares.llms.request import ToolCallResponseMessage | ||
| from ares.llms.request import UserMessage | ||
|
|
||
| # Response types | ||
| from ares.llms.response import LLMResponse | ||
| from ares.llms.response import TextData | ||
| from ares.llms.response import Usage | ||
| from ares.llms.response import InferenceResult | ||
| from ares.llms.response import extract_text_content |
from ares.llms import LLMRequest breaks after removing src/ares/llms/request.py; should we reintroduce thin shims/aliases in ares.llms to re-export the new types or provide a documented migration path and review the removed file for shim locations?
Finding type: Breaking Changes | Severity: 🔴 High
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
src/ares/llms/__init__.py around lines 1-21, the package no longer re-exports the
previous request/message/tool dataclasses (e.g. LLMRequest, AssistantMessage,
ToolCallMessage, UserMessage) and this breaks downstream `from ares.llms import ...`
imports. Restore backwards-compatible thin shims by re-importing and re-exporting those
types (either from src/ares/llms/request.py if it still exists, or from the new
open_responses module if types were moved) and add them back into __all__. Also add
short deprecation notices (warnings.warn) or docstrings pointing to the new canonical
API (ares.llms.open_responses) so consumers have a migration path. Finally, review
src/ares/llms/request.py to ensure the correct original type names are exported and
include a unit test that imports the legacy names to validate compatibility.
import ares
from ares import llms
from ares.llms import open_responses
import hydra
from linguafranca import types as lft
import omegaconf
ARES imports appear before third-party imports; should we reorder to stdlib → third-party → local/ARES with blank lines and ARES last per CLAUDE.md?
Finding type: AI Coding Guidelines | Severity: 🟢 Low
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
examples/04_rl_training_with_skyrl.py around lines 49 to 54, the ARES imports (import
ares; from ares import llms; from ares.llms import open_responses) are placed before
third-party imports, violating the project's import ordering. Reorder imports to follow:
stdlib imports first (unchanged), then a single blank line, then all third-party imports
(hydra, linguafranca/types, omegaconf, ray, skyrl_gym) grouped together, then a single
blank line, and finally the ARES/local imports grouped together (import ares; from ares
import llms; from ares.llms import open_responses). Ensure there is exactly one blank
line separating each group and update any import lines or spacing accordingly.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/ares/contrib/mech_interp/hooked_transformer_client.py (2)
74-91: ⚠️ Potential issue | 🟠 Major

Validate max_output_tokens before subtracting it from n_ctx.

Values <= 0 or >= self.model.cfg.n_ctx make max_new_tokens invalid or drive apply_chat_template(..., max_length=...) to a non-positive value. Right now that becomes a runtime failure instead of a clean input error.

🛠️ Proposed guard

-    max_output_tokens = max_output_tokens or request.max_output_tokens or self.max_new_tokens
+    if max_output_tokens is None:
+        max_output_tokens = request.max_output_tokens
+    if max_output_tokens is None:
+        max_output_tokens = self.max_new_tokens
+    if max_output_tokens <= 0:
+        raise ValueError("max_output_tokens must be positive")
+    if max_output_tokens >= self.model.cfg.n_ctx:
+        raise ValueError("max_output_tokens must be smaller than the model context window")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/contrib/mech_interp/hooked_transformer_client.py` around lines 74-91, validate the resolved max_output_tokens (from max_output_tokens or request.max_output_tokens or self.max_new_tokens) before using it to compute max_length for apply_chat_template: ensure it is > 0 and < self.model.cfg.n_ctx, and if not raise a clear ValueError (or return a clean input error) indicating the invalid max_output_tokens and the allowed range. Do this check immediately after computing max_output_tokens and before computing max_length / self.model.tokenizer.apply_chat_template; reference the symbols max_output_tokens, request.max_output_tokens, self.max_new_tokens, self.model.cfg.n_ctx, and apply_chat_template so the guard is easy to locate and update.
97-106: ⚠️ Potential issue | 🟠 Major

top_p is silently ignored in this client.

The canonical request now carries sampling controls, but this adapter only forwards temperature. That means the same OpenResponsesRequest behaves differently here than it does in TransformersLLMClient. Either pass request.top_p through to generation or reject it explicitly instead of dropping it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/contrib/mech_interp/hooked_transformer_client.py` around lines 97-106, the client currently only forwards request.temperature into gen_kwargs (see gen_kwargs and the temperature handling block), silently dropping request.top_p; update the adapter in HookedTransformerClient (hooked_transformer_client.py) to also forward request.top_p into gen_kwargs when not None, or explicitly raise an error if top_p is unsupported. Locate the generation kwargs construction (gen_kwargs) and either add gen_kwargs["top_p"] = request.top_p when request.top_p is not None, or add a guard that raises a clear exception mentioning top_p if you intend not to support it.

src/ares/contrib/eval_visualizer.py (1)
91-121: ⚠️ Potential issue | 🟡 Minor

Widen the dashboard wrapper's observation types to accept CodeEnvironment.

Lines 93, 112, and 121 (the constructor, reset(), and step() return types) and line 702 (the wrap() parameter type) currently expect lft.OpenResponsesRequest, but CodeEnvironment is defined with observation type lft.OpenResponsesRequest | None on terminal timesteps. This causes static type checking to reject CodeEnvironment as an argument to wrap() or the TrackedEnvironment constructor, despite the docstring claiming to support "all ARES code agent environments."

Suggested fix

-        env: base.Environment[response.InferenceResult, lft.OpenResponsesRequest, RewardType, DiscountType],
+        env: base.Environment[
+            response.InferenceResult, lft.OpenResponsesRequest | None, RewardType, DiscountType
+        ],
@@
-    async def reset(self) -> base.TimeStep[lft.OpenResponsesRequest, RewardType, DiscountType]:
+    async def reset(self) -> base.TimeStep[lft.OpenResponsesRequest | None, RewardType, DiscountType]:
@@
-    ) -> base.TimeStep[lft.OpenResponsesRequest, RewardType, DiscountType]:
+    ) -> base.TimeStep[lft.OpenResponsesRequest | None, RewardType, DiscountType]:
@@
-        env: base.Environment[response.InferenceResult, lft.OpenResponsesRequest, RewardType, DiscountType],
+        env: base.Environment[
+            response.InferenceResult, lft.OpenResponsesRequest | None, RewardType, DiscountType
+        ],

Rerun pyright to confirm the fix resolves the type incompatibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/contrib/eval_visualizer.py` around lines 91-121, the type annotations for observations in TrackedEnvironment need to accept CodeEnvironment terminal timesteps that use None; update the constructor signature, reset() return type, step() return type, and the wrap() parameter to use lft.OpenResponsesRequest | None (or Optional[lft.OpenResponsesRequest]) instead of just lft.OpenResponsesRequest so CodeEnvironment is accepted; modify the annotations on __init__, reset, step, and the wrap(...) parameter accordingly and rerun pyright to confirm the type error is resolved.

examples/05_tinker_train.py (1)
124-138: ⚠️ Potential issue | 🟡 Minor

Widen the env observation type to accept None.

Line 126 declares the observation type as lft.OpenResponsesRequest, but the environment's step() method returns TimeStep with lft.OpenResponsesRequest | None on terminal transitions. The _get_tinker_observation() method already handles the None case correctly (line 139), and CodeEnvironment (the main environment this adapter wraps) declares its observation type with | None. The type annotation must match the actual behavior.

Suggested fix

-        env: ares.Environment[llms.InferenceResult, lft.OpenResponsesRequest, float, float],
+        env: ares.Environment[llms.InferenceResult, lft.OpenResponsesRequest | None, float, float],

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/05_tinker_train.py` around lines 124-138, the constructor's env annotation is too narrow: change the env type parameter in __init__ to accept observations that may be None (i.e., use lft.OpenResponsesRequest | None) so it matches the TimeStep passed into _get_tinker_observation; update the signature where env is typed (the __init__ method) to ares.Environment[llms.InferenceResult, lft.OpenResponsesRequest | None, float, float] so the type matches the actual environment behavior handled by _get_tinker_observation.
🧹 Nitpick comments (4)
src/ares/containers/docker.py (1)

104-106: Consider centralizing the startup guard and documenting the raised RuntimeError.

The same check/message is repeated in three methods. A small helper (e.g., _require_started_container()) would reduce duplication and improve type narrowing. Also worth updating the Container protocol/docstrings to mention this precondition explicitly for callers.

Also applies to: 135-136, 165-166

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/containers/docker.py` around lines 104-106, extract the repeated guard "if self._container is None: raise RuntimeError('Container not started. Call start() first.')" into a small private helper method, e.g., _require_started_container(self), and replace the three occurrences with calls to that helper to reduce duplication and enable type narrowing of self._container; update the Container protocol/docstrings to document that methods accessing self._container require the container to be started and that they raise RuntimeError otherwise (mention _require_started_container as the enforcement point).

README.md (1)
18-18: Minor terminology inconsistency.

"Open Responses requests" and "LLMResponses" are mixed terminology styles. Given the migration, consider using consistent naming like "Open Responses requests" and "inference results" (or "LLM responses" to match the code comment style).

📝 Suggested wording

-ARES treats Open Responses requests as observations and LLMResponses as actions within the environment, so you can focus on training just the LLM - not the Code Agent surrounding it. The interface is entirely async, and supports scaling up to hundreds or thousands of parallel environments easily - check out [example 3](https://github.com/withmartian/ares/tree/main/examples/03_parallel_eval_with_api.py) to run this yourself.
+ARES treats Open Responses requests as observations and LLM responses as actions within the environment, so you can focus on training just the LLM - not the Code Agent surrounding it. The interface is entirely async, and supports scaling up to hundreds or thousands of parallel environments easily - check out [example 3](https://github.com/withmartian/ares/tree/main/examples/03_parallel_eval_with_api.py) to run this yourself.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` at line 18, the README line uses mixed terminology ("Open Responses requests" vs "LLMResponses"); update the sentence to use consistent naming (e.g., keep "Open Responses requests" and change "LLMResponses" to "LLM responses" or "inference results") so the phrasing matches other comments and code; locate the exact sentence containing "Open Responses requests" and "LLMResponses" and replace "LLMResponses" with the chosen consistent term across the README to maintain style uniformity.

examples/04_rl_training_with_skyrl.py (1)
109-110: Consider strict=False for robustness in training scenarios.

Using strict=True in to_chat_messages will raise ValueError if observations contain non-text content (images, tool calls, function outputs). While this provides fail-fast behavior, it could cause training crashes if any environment produces such observations.

Consider using strict=False to match the pattern in examples/utils.py, or document that this training integration assumes text-only observations.

Also applies to: 135-135

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/04_rl_training_with_skyrl.py` around lines 109-110, the training code uses open_responses.to_chat_messages(ts.observation, strict=True) which will raise ValueError on non-text observations; change strict=True to strict=False in the two uses of to_chat_messages (the calls on ts.observation) so the function tolerates images/tool outputs during training and matches the pattern in examples/utils.py, or alternatively add a clear comment documenting the assumption of text-only observations if you intentionally want to keep strict=True.

docs/source/core-concepts.rst (1)
160-170: Code example missing import for the open_responses module.

The example uses open_responses.make_request() and open_responses.user_message() but doesn't show the required import statement. Consider adding the import for clarity:

from ares.llms import open_responses

This would help users understand where these helpers come from when following the documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/source/core-concepts.rst` around lines 160 - 170, The example uses open_responses.make_request and open_responses.user_message but omits the import; add an import for the open_responses helper (e.g., import open_responses from its module) at the top of the snippet so symbols like open_responses.make_request and open_responses.user_message are defined when demonstrating the async run(self, task: str) -> None method and its calls to self._llm_client and self.parse_commands.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CLAUDE.md`:
- Around line 284-291: The docs reference non-existent types (llms.TextData,
llms.Usage, response.TextData, response.Usage); update CLAUDE.md to use the
actual exported symbols: replace TextData usages with InferenceResult or
extract_text_content/make_response where appropriate, and replace Usage with
lft.Usage (from linguafranca). Also prefer referencing the package exports:
ChatCompletionCompatibleLLMClient, InferenceResult, LLMClient,
extract_text_content, make_response from ares.llms/__init__.py instead of the
removed types; search for any occurrences of TextData/Usage and swap them to the
correct symbols and examples accordingly.
In `@src/ares/contrib/llama_cpp.py`:
- Around line 101-109: The current conversion in the LlamaCpp adapter discards
tool call data by using chat_completion.choices[0].message.content and
response.make_response, so preserve tool calls by converting the entire chat
message (including message.tool_calls) into the library's
Response/InferenceResult format instead of only text; update the logic around
chat_completion.choices[0].message to detect and map message.tool_calls into the
equivalent fields on the Response (or build a Response from message content plus
tool_call structure) and return that in response.InferenceResult (mirror the
same pattern for ChatCompletionCompatibleLLMClient to avoid the same loss); add
a unit test that supplies a chat_completion with message.tool_calls to assert
the resulting InferenceResult contains the tool call payload.
In `@src/ares/llms/chat_completions_compatible.py`:
- Around line 77-85: The Chat Completions handling currently only uses
resp.choices[0].message.content and drops any
resp.choices[0].message.tool_calls; update the code that builds the inference
response (around response.make_response and response.InferenceResult) to
preserve tool invocations by detecting resp.choices[0].message.tool_calls and
passing them through into the Open Responses format: either extend
response.make_response to accept a tool_calls (or tools) parameter and pass
resp.choices[0].message.tool_calls, or add a small converter that constructs the
Open Responses-style response object containing both content and tool_calls
before creating lf_response; keep existing fields (model, input_tokens,
output_tokens, response_id, cost) unchanged.
In `@src/ares/llms/open_responses_test.py`:
- Around line 78-81: The test is calling a missing helper
open_responses.ensure_request which raises AttributeError; either reintroduce
ensure_request in src/ares/llms/open_responses.py (implement it to accept a
request produced by make_request and return the canonical request object) or
change the test to use the real normalization entry point (e.g., call the
existing request factory/normalizer such as open_responses.make_request or the
module's public normalize function) so the test asserts the canonical request is
returned; update references to ensure_request, make_request, and user_message
accordingly to keep the test exercising the supported API.
---
Outside diff comments:
In `@examples/05_tinker_train.py`:
- Around line 124-138: The constructor's env annotation is too narrow: change
the env type parameter in __init__ to accept observations that may be None
(i.e., use lft.OpenResponsesRequest | None) so it matches the TimeStep passed
into _get_tinker_observation; update the signature where env is typed (the
__init__ method) to ares.Environment[llms.InferenceResult,
lft.OpenResponsesRequest | None, float, float] so the type matches the actual
environment behavior handled by _get_tinker_observation.
In `@src/ares/contrib/eval_visualizer.py`:
- Around line 91-121: The type annotations for observations in
TrackedEnvironment need to accept CodeEnvironment terminal timesteps that use
None; update the constructor signature, reset() return type, step() return type,
and the wrap() parameter to use lft.OpenResponsesRequest | None (or
Optional[lft.OpenResponsesRequest]) instead of just lft.OpenResponsesRequest so
CodeEnvironment is accepted; modify the annotations on __init__, reset, step,
and the wrap(...) parameter accordingly and rerun pyright to confirm the type
error is resolved.
In `@src/ares/contrib/mech_interp/hooked_transformer_client.py`:
- Around line 74-91: Validate resolved max_output_tokens (from max_output_tokens
or request.max_output_tokens or self.max_new_tokens) before using it to compute
max_length for apply_chat_template: ensure it is >0 and < self.model.cfg.n_ctx,
and if not raise a clear ValueError (or return a clean input error) indicating
the invalid max_output_tokens and the allowed range. Do this check immediately
after computing max_output_tokens and before computing
max_length/self.model.tokenizer.apply_chat_template; reference the symbols
max_output_tokens, request.max_output_tokens, self.max_new_tokens,
self.model.cfg.n_ctx, and apply_chat_template so the guard is easy to locate and
update.
- Around line 97-106: The client currently only forwards request.temperature
into gen_kwargs (see gen_kwargs and the temperature handling block), silently
dropping request.top_p; update the adapter in HookedTransformerClient
(hooked_transformer_client.py) to also forward request.top_p into gen_kwargs
when not None, or explicitly raise an error if top_p is unsupported. Locate the
generation kwargs construction (gen_kwargs) and either add gen_kwargs["top_p"] =
request.top_p when request.top_p is not None, or add a guard that raises a clear
exception mentioning top_p if you intend not to support it.
---
Nitpick comments:
In `@docs/source/core-concepts.rst`:
- Around line 160-170: The example uses open_responses.make_request and
open_responses.user_message but omits the import; add an import for the
open_responses helper (e.g., import open_responses from its module) at the top
of the snippet so symbols like open_responses.make_request and
open_responses.user_message are defined when demonstrating the async run(self,
task: str) -> None method and its calls to self._llm_client and
self.parse_commands.
In `@examples/04_rl_training_with_skyrl.py`:
- Around line 109-110: The training code uses
open_responses.to_chat_messages(ts.observation, strict=True) which will raise
ValueError on non-text observations; change strict=True to strict=False in the
two uses of to_chat_messages (the calls on ts.observation) so the function
tolerates images/tool outputs during training and matches the pattern in
examples/utils.py, or alternatively add a clear comment documenting the
assumption of text-only observations if you intentionally want to keep
strict=True.
In `@README.md`:
- Line 18: The README line uses mixed terminology ("Open Responses requests" vs
"LLMResponses"); update the sentence to use consistent naming (e.g., keep "Open
Responses requests" and change "LLMResponses" to "LLM responses" or "inference
results") so the phrasing matches other comments and code; locate the exact
sentence containing "Open Responses requests" and "LLMResponses" and replace
"LLMResponses" with the chosen consistent term across the README to maintain
style uniformity.
In `@src/ares/containers/docker.py`:
- Around line 104-106: Extract the repeated guard "if self._container is None:
raise RuntimeError('Container not started. Call start() first.')" into a small
private helper method, e.g., _require_started_container(self), and replace the
three occurrences with calls to that helper to reduce duplication and enable
type narrowing of self._container; update the Container protocol/docstrings to
document that methods accessing self._container require the container to be
started and that they raise RuntimeError otherwise (mention
_require_started_container as the enforcement point).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9305d7ad-0b27-4bc1-97a4-3d0a28c2128f
📒 Files selected for processing (43)
- CLAUDE.md
- CONTRIBUTING.md
- README.md
- docs/source/core-concepts.rst
- docs/source/how-it-works.rst
- docs/source/index.rst
- examples/04_rl_training_with_skyrl.py
- examples/05_tinker_train.py
- examples/utils.py
- pyproject.toml
- src/ares/__init__.py
- src/ares/code_agents/code_agent_base.py
- src/ares/code_agents/mini_swe_agent.py
- src/ares/code_agents/terminus2/terminus2_agent.py
- src/ares/code_agents/terminus2/terminus2_agent_test.py
- src/ares/containers/docker.py
- src/ares/contrib/eval_visualizer.py
- src/ares/contrib/llama_cpp.py
- src/ares/contrib/mech_interp/hooked_transformer_client.py
- src/ares/contrib/transformers_client.py
- src/ares/contrib/transformers_client_test.py
- src/ares/environments/code_env.py
- src/ares/environments/twenty_questions.py
- src/ares/llms/__init__.py
- src/ares/llms/anthropic_converter.py
- src/ares/llms/anthropic_converter_test.py
- src/ares/llms/chat_completions_compatible.py
- src/ares/llms/llm_clients.py
- src/ares/llms/open_responses.py
- src/ares/llms/open_responses_test.py
- src/ares/llms/openai_chat_converter.py
- src/ares/llms/openai_chat_converter_test.py
- src/ares/llms/openai_responses_converter.py
- src/ares/llms/openai_responses_converter_test.py
- src/ares/llms/queue_mediated_client.py
- src/ares/llms/queue_mediated_client_test.py
- src/ares/llms/request.py
- src/ares/llms/request_test.py
- src/ares/llms/response.py
- src/ares/presets.py
- src/ares/presets_test.py
- src/ares/testing/mock_llm.py
- src/ares/testing/mock_llm_test.py
💤 Files with no reviewable changes (8)
- src/ares/llms/request_test.py
- src/ares/llms/openai_responses_converter_test.py
- src/ares/llms/anthropic_converter_test.py
- src/ares/llms/openai_chat_converter.py
- src/ares/llms/request.py
- src/ares/llms/openai_chat_converter_test.py
- src/ares/llms/openai_responses_converter.py
- src/ares/llms/anthropic_converter.py
| content = chat_completion.choices[0].message.content or "" | ||
| usage = response.Usage( | ||
| prompt_tokens=chat_completion.usage.prompt_tokens if chat_completion.usage else 0, | ||
| generated_tokens=chat_completion.usage.completion_tokens if chat_completion.usage else 0, | ||
| lf_response = response.make_response( | ||
| content, | ||
| model=self.model_name, | ||
| input_tokens=chat_completion.usage.prompt_tokens if chat_completion.usage else 0, | ||
| output_tokens=chat_completion.usage.completion_tokens if chat_completion.usage else 0, | ||
| response_id=chat_completion.id, | ||
| ) | ||
| return response.LLMResponse(data=[response.TextData(content=content)], cost=0.0, usage=usage) | ||
| return response.InferenceResult(response=lf_response, cost=0.0) |
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the llama_cpp.py file around lines 101-109
cat -n src/ares/contrib/llama_cpp.py | sed -n '95,115p'

Repository: withmartian/ares
Length of output: 1190
🏁 Script executed:
# Also check the overall structure of the file
wc -l src/ares/contrib/llama_cpp.py

Repository: withmartian/ares
Length of output: 93
🏁 Script executed:
# Look at the response.py to understand make_response
cat -n src/ares/llms/response.py | head -120

Repository: withmartian/ares
Length of output: 3685
🏁 Script executed:
# Search for other Chat Completions clients that might handle tool_calls
fd -t f -name "*.py" src/ares/contrib | head -20

Repository: withmartian/ares
Length of output: 290
🏁 Script executed:
# Search for other LLM client implementations that might be Chat Completions-based
find src/ares/contrib -name "*.py" -type f

Repository: withmartian/ares
Length of output: 432
🏁 Script executed:
# Search for tool_calls handling in the codebase
rg -A 5 -B 5 "tool_calls" src/ares --type py | head -100

Repository: withmartian/ares
Length of output: 7538
🏁 Script executed:
# Search for other uses of response.make_response to see how it's used
rg "response\.make_response\|make_response" src/ares --type py -B 3 -A 3

Repository: withmartian/ares
Length of output: 42
🏁 Script executed:
# Look for open_responses.to_chat_completions_kwargs mentioned in the review scratchpad
rg "to_chat_completions_kwargs" src/ares --type py -B 3 -A 10

Repository: withmartian/ares
Length of output: 10653
🏁 Script executed:
# Look at how ChatCompletionCompatibleLLMClient handles responses
cat -n src/ares/llms/chat_completions_compatible.py

Repository: withmartian/ares
Length of output: 3664
🏁 Script executed:
# Search for response conversion functions from chat completions to Open Responses format
rg "from_chat_completion\|ChatCompletion" src/ares/llms --type py -B 2 -A 5 | head -150

Repository: withmartian/ares
Length of output: 42
Preserve tool calls in LlamaCpp response conversion.
This adapter discards message.tool_calls by extracting only message.content before calling response.make_response(), which is a text-only helper. If the model returns tool calls (which the request-side explicitly supports via to_chat_completions_kwargs()), they are silently lost and downstream code sees only plain text. This breaks tool-calling workflows.
Note that ChatCompletionCompatibleLLMClient has the same limitation, suggesting the issue is broader than just this client—both need response-side tool call support. Consider building a proper response converter that maps OpenAI chat completion messages (including tool_calls) back to Open Responses format, rather than relying on the text-only make_response() helper. Add a test case with tool call responses to prevent regression.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ares/contrib/llama_cpp.py` around lines 101 - 109, The current conversion
in the LlamaCpp adapter discards tool call data by using
chat_completion.choices[0].message.content and response.make_response, so
preserve tool calls by converting the entire chat message (including
message.tool_calls) into the library's Response/InferenceResult format instead
of only text; update the logic around chat_completion.choices[0].message to
detect and map message.tool_calls into the equivalent fields on the Response (or
build a Response from message content plus tool_call structure) and return that
in response.InferenceResult (mirror the same pattern for
ChatCompletionCompatibleLLMClient to avoid the same loss); add a unit test that
supplies a chat_completion with message.tool_calls to assert the resulting
InferenceResult contains the tool call payload.
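The response-side converter this comment asks for could look like the following sketch. The input follows the OpenAI Chat Completions assistant-message shape; the output item dicts only approximate Open Responses items, since the exact linguafranca field names are an assumption here:

```python
# Hypothetical converter: map an OpenAI-style assistant message, including
# tool_calls, to a list of Open-Responses-like output items instead of
# keeping only the text content.
def chat_message_to_output_items(message: dict) -> list[dict]:
    items: list[dict] = []
    if message.get("content"):
        items.append(
            {"type": "message", "role": "assistant", "content": message["content"]}
        )
    for call in message.get("tool_calls") or []:
        items.append(
            {
                "type": "function_call",
                "call_id": call["id"],
                "name": call["function"]["name"],
                "arguments": call["function"]["arguments"],
            }
        )
    return items
```

A test feeding a message with `tool_calls` through this converter would catch the regression both clients currently share.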
| def test_ensure_request_accepts_canonical_request(): | ||
| request = open_responses.make_request([open_responses.user_message("Hello")]) | ||
| result = open_responses.ensure_request(request) | ||
| assert result is request |
This test calls an API that isn't present.
open_responses.ensure_request(...) doesn't exist, and CI is already failing here with AttributeError. Either restore the helper in src/ares/llms/open_responses.py or update this test to exercise the supported request-normalization entry point.
🧰 Tools
🪛 GitHub Actions: unit-tests
[error] 80-80: pytest failure (exit code 1): AttributeError: module 'ares.llms.open_responses' has no attribute 'ensure_request' in test_ensure_request_accepts_canonical_request
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ares/llms/open_responses_test.py` around lines 78 - 81, The test is
calling a missing helper open_responses.ensure_request which raises
AttributeError; either reintroduce ensure_request in
src/ares/llms/open_responses.py (implement it to accept a request produced by
make_request and return the canonical request object) or change the test to use
the real normalization entry point (e.g., call the existing request
factory/normalizer such as open_responses.make_request or the module's public
normalize function) so the test asserts the canonical request is returned;
update references to ensure_request, make_request, and user_message accordingly
to keep the test exercising the supported API.
rsmith49
left a comment
Great work Josh 🎉 This was a huge effort, and I think it will unlock a lot of functionality for us!
A lot of comments (sorry), one of the most important IMO is making sure our OpenResponses -> transformers chat templates conversion handles tool calls in the best way, since that will affect training a lot.
| | │ │ from code agent via │ │ | ||
| └─────────────────────────────────┼──│ QueueMediatedLLMClient │ │ | ||
| LLMRequest (observation) │ └──────────────────┬─────────────┘ │ | ||
| Open Responses observation │ └──────────────────┬─────────────┘ │ |
nit: "OpenResponsesRequest (observation)" instead to show the type -> RL concept
| Open Responses observation │ └──────────────────┬─────────────┘ │ | ||
| │ ^ │ │ | ||
| │ LLMRequest │ │ LLMResponse │ | ||
| │ Open Responses │ │ InferenceResult│ |
| * **Treats LLM responses as actions** - Your trainable agent/policy provides responses | ||
|
|
||
| Crucially, the **CodeAgent is part of the environment**, not what you're training. Your training loop optimizes an agent/policy that produces better ``LLMResponse`` outputs given ``LLMRequest`` observations. | ||
| Crucially, the **CodeAgent is part of the environment**, not what you're training. Your training loop optimizes an agent/policy that produces better ``InferenceResult`` outputs given canonical Open Responses observations. |
"canonical" is confusing here, I would remove
| The ``LLMClient`` abstraction serves two purposes: | ||
|
|
||
| 1. **Observations = LLM Requests**: In the RL loop, ``timestep.observation`` is an ``LLMRequest`` containing the messages the code agent wants to send to the LLM. This is the "state" your policy observes. | ||
| 1. **Observations = Open Responses requests**: In the RL loop, ``timestep.observation`` is a canonical Open Responses request containing what the code agent wants to send to the LLM. This is the "state" your policy observes. |
| cost: float | ||
|
|
||
| This simple interface wraps OpenAI-style chat completion APIs. The ``messages`` field follows the OpenAI format with ``role`` (system/user/assistant) and ``content``. | ||
| ARES uses linguafranca's ``OpenResponsesRequest`` as the canonical request type for observations and client inputs. Edge adapters convert to Chat/Responses/Anthropic formats only when needed. |
I would change this to a note like
ARES leverages linguafranca for request and response types - we use the
`OpenResponsesRequest` as our base request object returned from the environment for observations, and encourage users to use `linguafranca.convert_*` methods for translating between different provider and local formats.
| @@ -1,383 +0,0 @@ | |||
| """Converter for Anthropic Messages API format. | |||
| @@ -1,38 +1,99 @@ | |||
| """LLM response model.""" | |||
| """LLM response model wrapping linguafranca types.""" | |||
Worth moving all this into open_responses.py?
|
|
||
| # Twenty Questions — lightweight, no Docker needed. | ||
| registry.register_preset("20q", TwentyQuestionsSpec()) | ||
| if "20q" not in existing_preset_names: |
This and the above logic are just masking an error - IMO we either make register_preset allow overwriting existing presets, or we make sure that _register_default_presets is only called once (although the duplicate check in the list of harbor presets is worth keeping so we don't accidentally break if they add duplicates to their registry)
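One way to make registration explicit rather than silently skipped - a sketch against a hypothetical registry, not the real `ares` one:

```python
# Hypothetical registry illustrating an explicit overwrite policy for
# register_preset, so duplicate registration fails loudly unless allowed.
class PresetRegistry:
    def __init__(self) -> None:
        self._presets: dict[str, object] = {}

    def register_preset(
        self, name: str, spec: object, *, overwrite: bool = False
    ) -> None:
        if name in self._presets and not overwrite:
            raise ValueError(
                f"Preset {name!r} is already registered; pass overwrite=True to replace it."
            )
        self._presets[name] = spec
```

With this shape, `_register_default_presets` can pass `overwrite=True` deliberately instead of pre-checking `existing_preset_names`.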
| "harbor>=0.1.32", | ||
| "httpx>=0.28.1", | ||
| "jinja2>=3.1.6", | ||
| "martian-linguafranca>=0.1.5", |
| It is a modern [gym](https://github.com/Farama-Foundation/Gymnasium): the environment layer powering RL research. | ||
|
|
||
| ARES treats LLMRequests as observations and LLMResponses as actions within the environment, so you can focus on training just the LLM - not the Code Agent surrounding it. The interface is entirely async, and supports scaling up to hundreds or thousands of parallel environments easily - check out [example 3](https://github.com/withmartian/ares/tree/main/examples/03_parallel_eval_with_api.py) to run this yourself. | ||
| ARES treats Open Responses requests as observations and LLMResponses as actions within the environment, so you can focus on training just the LLM - not the Code Agent surrounding it. The interface is entirely async, and supports scaling up to hundreds or thousands of parallel environments easily - check out [example 3](https://github.com/withmartian/ares/tree/main/examples/03_parallel_eval_with_api.py) to run this yourself. |
Should give an actual class name - same comment as way above
Jhoysbou
left a comment
The usage of linguafranca looks good to me
| _LOGGER.warning("%s warning for %s: %s", context, warning.field, warning.message) | ||
|
|
||
|
|
||
| def to_jsonable(value: Any) -> _JSONABLE: |
LF has a similar built-in function. If we adopt StrEnum, you can actually delete this and rely on LF's implementation.
def _to_jsonable(value: object) -> Any:
    if is_dataclass(value) and not isinstance(value, type):
        return asdict(value)
    if isinstance(value, dict):
        return value
    if hasattr(value, "model_dump"):
        try:
            return value.model_dump(mode="json")
        except TypeError:
            return value.model_dump()
    raise TypeError(f"unsupported payload type: {type(value)!r}")

This function is used in `convert_request`, `convert_response`, and the stream conversation classes. Consequently, you can simplify `to_chat_completions_kwargs` as well.
|
@coderabbitai full review

✅ Actions performed

Full review triggered.
Actionable comments posted: 4
♻️ Duplicate comments (3)
README.md (1)
18-18: ⚠️ Potential issue | 🟡 Minor

Replace the legacy `LLMResponses` name.

`LLMResponses` reads like a concrete type, but the new action wrapper is `ares.llms.InferenceResult`. Using the removed name here will send readers toward a symbol that no longer exists.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` at line 18, Update the README text to stop referencing the removed symbol `LLMResponses` and instead point readers to the new action wrapper `ares.llms.InferenceResult`; locate the sentence mentioning `LLMResponses` and replace that token with `ares.llms.InferenceResult`, and confirm any linked examples or docs (e.g., example 3) use the new symbol name so readers aren't directed to a non-existent type.

CLAUDE.md (1)
284-291: ⚠️ Potential issue | 🟡 Minor

Documentation still references non-existent types.

Despite being marked as addressed in prior commits, lines 285 and 289 still reference `TextData` and `Usage`, which don't exist in the updated codebase. Based on the PR's migration, these should reference the actual exports:

- `llms.InferenceResult`, `llms.extract_text_content`, `llms.make_response`
- `response.InferenceResult`, `response.extract_text_content`, `response.make_response`

📝 Proposed fix
  - **External consumers** (examples, docs):
    - ✅ Good: `import ares` → use `ares.make(...)`
    - ✅ Good: `from ares.llms import open_responses` → use `open_responses.make_request(...)`
-   - ✅ Good: `from ares import llms` → use `llms.TextData`, `llms.Usage`
-   - ❌ Avoid: `from ares.llms import OpenResponsesRequest, TextData`
+   - ✅ Good: `from ares import llms` → use `llms.InferenceResult`, `llms.extract_text_content`
+   - ❌ Avoid: `from ares.llms import InferenceResult, extract_text_content`
  - **Internal code**:
    - ✅ Good: `from ares.llms import open_responses` → use `open_responses.make_request(...)`
-   - ✅ Good: `from ares.llms import response` → use `response.TextData`, `response.Usage`
+   - ✅ Good: `from ares.llms import response` → use `response.InferenceResult`, `response.extract_text_content`
    - ❌ Avoid: `from ares.llms.open_responses import Request`
-   - ❌ Avoid: `from ares.llms.response import TextData, Usage`
+   - ❌ Avoid: `from ares.llms.response import InferenceResult, extract_text_content`
Verify each finding against the current code and only fix it if needed. In `@CLAUDE.md` around lines 284 - 291, Update the documentation references that still mention TextData and Usage to use the new exported symbols: replace any occurrences of llms.TextData and llms.Usage with llms.InferenceResult, llms.extract_text_content and llms.make_response (as appropriate), and similarly replace response.TextData and response.Usage with response.InferenceResult, response.extract_text_content and response.make_response; ensure examples and import guidance reference llms.InferenceResult / llms.extract_text_content / llms.make_response (and response.* counterparts) rather than the removed TextData/Usage types.

src/ares/llms/open_responses_test.py (1)
78-81: ⚠️ Potential issue | 🔴 Critical

CI is failing: `ensure_request` does not exist.

The test calls `open_responses.ensure_request()`, but this function is not defined in the `open_responses` module, causing `AttributeError` in CI. Either implement `ensure_request` in `src/ares/llms/open_responses.py` or remove/update this test.

🔧 Option 1: Remove the test if ensure_request is not needed

-def test_ensure_request_accepts_canonical_request():
-    request = open_responses.make_request([open_responses.user_message("Hello")])
-    result = open_responses.ensure_request(request)
-    assert result is request

🔧 Option 2: Add ensure_request to open_responses.py

def ensure_request(request: lft.OpenResponsesRequest) -> lft.OpenResponsesRequest:
    """Return the request as-is (identity function for canonical requests)."""
    return request

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/llms/open_responses_test.py` around lines 78 - 81, The test fails because open_responses.ensure_request is missing; add an identity helper in the module (implement ensure_request) with the signature matching the expected type (e.g., def ensure_request(request: lft.OpenResponsesRequest) -> lft.OpenResponsesRequest) that simply returns the passed request, so tests calling open_responses.ensure_request(), open_responses.make_request(), and open_responses.user_message() will work; place the function in src/ares/llms/open_responses.py near related helpers and ensure it uses the same lft/OpenResponsesRequest type import as the module.
🧹 Nitpick comments (1)
src/ares/contrib/transformers_client.py (1)
87-167: Consider adding a parameter for native tool call support.

The flattening approach for `function_call` and `function_call_output` items works for models without tool call support, but transformers has first-class tool call support via `apply_chat_template` for compatible models. The current implementation always flattens tool interactions to plain text, which loses semantic structure for models that could handle it natively.

Consider adding a `model_supports_tools: bool` parameter to branch between the flattened approach (for simpler models) and the native transformers tool format (for capable models). This could improve performance for models that have been trained with tool-calling capabilities.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/contrib/transformers_client.py` around lines 87 - 167, Add a boolean toggle so _render_request_to_chat_messages can preserve native tool-call structures for transformers models that support them: modify _render_request_to_chat_messages to accept model_supports_tools: bool = False and, when True, avoid flattening items of type "function_call" and "function_call_output" into text and instead translate them into the native tool-call message shapes expected by apply_chat_template (preserving fields like name, call_id, arguments, output) while still using _render_content_to_text/_render_value_to_text for any nested content; keep the existing flattened behavior as the default for backward compatibility and update callers to pass model_supports_tools when they know the model supports tool calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CONTRIBUTING.md`:
- Around line 115-116: Update the CONTRIBUTING.md example to explicitly pass
token counts to the response factory so readers see the full API signature: when
constructing response.InferenceResult, call response.make_response with
input_tokens and output_tokens (e.g., response.make_response("Hello!",
input_tokens=..., output_tokens=...)) and keep cost=0.0; this clarifies the
parameters used by response.make_response and matches how tests exercise the
API.
In `@examples/utils.py`:
- Around line 33-37: The fallback branch can produce the literal "None" when
observation.instructions is None; update the else branch that sets
observation_content (after open_responses.to_chat_messages) to use a proper
truthy check instead of str(...) — e.g., use observation.instructions if it is
truthy (or use getattr(observation, "instructions", None)) and otherwise set
observation_content to "(no messages)"; reference observation_content, messages,
open_responses.to_chat_messages, and observation.instructions when making the
change.
In `@src/ares/containers/docker.py`:
- Around line 104-106: Create a shared ContainerNotStartedError and use it
instead of RuntimeError where container access is guarded (replace the
RuntimeError in src/ares/containers/docker.py checks around self._container and
the similar guards at the other sites), and update
src/ares/testing/mock_container.py to raise the same ContainerNotStartedError so
tests and real Docker behavior share the contract; add the new exception to the
module/public API where other callers import container classes so callers can
catch ContainerNotStartedError explicitly.
In `@src/ares/presets_test.py`:
- Around line 3-11: The import block in presets_test.py is not sorted per the
project's isort/Ruff rules; reorder the imports (pytest, standard library first
if any, then third-party, then local package imports) so they match
Google-style/isort order and groupings — specifically adjust the lines importing
pytest, linguafranca.types as lft, ares, from ares import presets, registry, and
the ares.llms imports (open_responses, response); run `ruff check --fix` or `uv
run ruff check --fix` to automatically apply the correct ordering and ensure the
import block passes I001.
---
Duplicate comments:
In `@CLAUDE.md`:
- Around line 284-291: Update the documentation references that still mention
TextData and Usage to use the new exported symbols: replace any occurrences of
llms.TextData and llms.Usage with llms.InferenceResult,
llms.extract_text_content and llms.make_response (as appropriate), and similarly
replace response.TextData and response.Usage with response.InferenceResult,
response.extract_text_content and response.make_response; ensure examples and
import guidance reference llms.InferenceResult / llms.extract_text_content /
llms.make_response (and response.* counterparts) rather than the removed
TextData/Usage types.
In `@README.md`:
- Line 18: Update the README text to stop referencing the removed symbol
`LLMResponses` and instead point readers to the new action wrapper
`ares.llms.InferenceResult`; locate the sentence mentioning `LLMResponses` and
replace that token with `ares.llms.InferenceResult`, and confirm any linked
examples or docs (e.g., example 3) use the new symbol name so readers aren’t
directed to a non-existent type.
In `@src/ares/llms/open_responses_test.py`:
- Around line 78-81: The test fails because open_responses.ensure_request is
missing; add an identity helper in the module (implement ensure_request) with
the signature matching the expected type (e.g., def ensure_request(request:
lft.OpenResponsesRequest) -> lft.OpenResponsesRequest) that simply returns the
passed request, so tests calling open_responses.ensure_request(),
open_responses.make_request(), and open_responses.user_message() will work;
place the function in src/ares/llms/open_responses.py near related helpers and
ensure it uses the same lft/OpenResponsesRequest type import as the module.
---
Nitpick comments:
In `@src/ares/contrib/transformers_client.py`:
- Around line 87-167: Add a boolean toggle so _render_request_to_chat_messages
can preserve native tool-call structures for transformers models that support
them: modify _render_request_to_chat_messages to accept model_supports_tools:
bool = False and, when True, avoid flattening items of type "function_call" and
"function_call_output" into text and instead translate them into the native
tool-call message shapes expected by apply_chat_template (preserving fields like
name, call_id, arguments, output) while still using
_render_content_to_text/_render_value_to_text for any nested content; keep the
existing flattened behavior as the default for backward compatibility and update
callers to pass model_supports_tools when they know the model supports tool
calls.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f19fd66b-963b-413f-9d15-cd7bb3c1b385
📒 Files selected for processing (43)
- CLAUDE.md
- CONTRIBUTING.md
- README.md
- docs/source/core-concepts.rst
- docs/source/how-it-works.rst
- docs/source/index.rst
- examples/04_rl_training_with_skyrl.py
- examples/05_tinker_train.py
- examples/utils.py
- pyproject.toml
- src/ares/__init__.py
- src/ares/code_agents/code_agent_base.py
- src/ares/code_agents/mini_swe_agent.py
- src/ares/code_agents/terminus2/terminus2_agent.py
- src/ares/code_agents/terminus2/terminus2_agent_test.py
- src/ares/containers/docker.py
- src/ares/contrib/eval_visualizer.py
- src/ares/contrib/llama_cpp.py
- src/ares/contrib/mech_interp/hooked_transformer_client.py
- src/ares/contrib/transformers_client.py
- src/ares/contrib/transformers_client_test.py
- src/ares/environments/code_env.py
- src/ares/environments/twenty_questions.py
- src/ares/llms/__init__.py
- src/ares/llms/anthropic_converter.py
- src/ares/llms/anthropic_converter_test.py
- src/ares/llms/chat_completions_compatible.py
- src/ares/llms/llm_clients.py
- src/ares/llms/open_responses.py
- src/ares/llms/open_responses_test.py
- src/ares/llms/openai_chat_converter.py
- src/ares/llms/openai_chat_converter_test.py
- src/ares/llms/openai_responses_converter.py
- src/ares/llms/openai_responses_converter_test.py
- src/ares/llms/queue_mediated_client.py
- src/ares/llms/queue_mediated_client_test.py
- src/ares/llms/request.py
- src/ares/llms/request_test.py
- src/ares/llms/response.py
- src/ares/presets.py
- src/ares/presets_test.py
- src/ares/testing/mock_llm.py
- src/ares/testing/mock_llm_test.py
💤 Files with no reviewable changes (8)
- src/ares/llms/request.py
- src/ares/llms/request_test.py
- src/ares/llms/anthropic_converter_test.py
- src/ares/llms/openai_responses_converter_test.py
- src/ares/llms/openai_chat_converter_test.py
- src/ares/llms/openai_responses_converter.py
- src/ares/llms/openai_chat_converter.py
- src/ares/llms/anthropic_converter.py
| req = open_responses.make_request([open_responses.user_message("Hello")]) | ||
| resp = response.InferenceResult(response=response.make_response("Hello!"), cost=0.0) |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the signature of make_response to see if tokens are required
ast-grep --pattern $'def make_response($$$) -> $_:
$$$'

Repository: withmartian/ares
Length of output: 3038
🏁 Script executed:
#!/bin/bash
# First, let's examine the actual CONTRIBUTING.md at lines 115-116
head -120 CONTRIBUTING.md | tail -10

Repository: withmartian/ares
Length of output: 423
🏁 Script executed:
#!/bin/bash
# Search for make_response usage patterns in the codebase to see how it's typically called
rg "make_response\(" --max-count 20 -B 1 -A 1

Repository: withmartian/ares
Length of output: 3674
Consider explicitly showing token parameters in the documentation example.
While response.make_response("Hello!") is syntactically valid (both input_tokens and output_tokens default to 0), the example in CONTRIBUTING.md would be more instructive if it explicitly shows these parameters, consistent with how tests use the API:
resp = response.InferenceResult(
    response=response.make_response("Hello!", input_tokens=10, output_tokens=5),
    cost=0.0,
)

This makes the full API signature clearer for readers learning the framework.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@CONTRIBUTING.md` around lines 115 - 116, Update the CONTRIBUTING.md example
to explicitly pass token counts to the response factory so readers see the full
API signature: when constructing response.InferenceResult, call
response.make_response with input_tokens and output_tokens (e.g.,
response.make_response("Hello!", input_tokens=..., output_tokens=...)) and keep
cost=0.0; this clarifies the parameters used by response.make_response and
matches how tests exercise the API.
| messages = open_responses.to_chat_messages(observation, strict=False) | ||
| if len(messages) > 0: | ||
| observation_content = messages[-1].get("content", "") | ||
| else: | ||
| observation_content = str(observation.system_prompt) or "(no messages)" | ||
| observation_content = str(observation.instructions) or "(no messages)" |
Potential "None" string if instructions is absent.
When messages is empty and observation.instructions is None, str(observation.instructions) evaluates to the literal string "None" rather than falling back to "(no messages)".
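The failure mode is easy to reproduce in isolation: `str(None)` yields the truthy four-character string `"None"`, so the `or` fallback never fires.

```python
instructions = None

# Buggy pattern: str(None) == "None" is truthy, so the fallback is dead code.
broken = str(instructions) or "(no messages)"

# Fixed pattern: None itself is falsy, so `or` selects the fallback.
fixed = instructions or "(no messages)"

print(broken)  # → None  (the literal string "None")
print(fixed)   # → (no messages)
```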
🛡️ Proposed fix
if len(messages) > 0:
observation_content = messages[-1].get("content", "")
else:
- observation_content = str(observation.instructions) or "(no messages)"
+    observation_content = observation.instructions or "(no messages)"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/utils.py` around lines 33 - 37, The fallback branch can produce the
literal "None" when observation.instructions is None; update the else branch
that sets observation_content (after open_responses.to_chat_messages) to use a
proper truthy check instead of str(...) — e.g., use observation.instructions if
it is truthy (or use getattr(observation, "instructions", None)) and otherwise
set observation_content to "(no messages)"; reference observation_content,
messages, open_responses.to_chat_messages, and observation.instructions when
making the change.
| if self._container is None: | ||
| raise RuntimeError("Container not started. Call start() first.") | ||
|
|
Make the new pre-start failure part of the shared container contract.
src/ares/testing/mock_container.py:45-76 still allows these calls before start(), so tests can stay green while real Docker now raises. If this precondition is intentional, mirror it in the mock and expose a shared ContainerNotStartedError instead of a bare RuntimeError so callers can handle it consistently.
As per coding guidelines: src/**/*.py: Implement custom exception hierarchies to distinguish terminating vs non-terminating errors.
Also applies to: 135-136, 165-166
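A minimal sketch of what the shared contract could look like — class and attribute names here are illustrative assumptions, not the actual ARES container API:

```python
class ContainerError(RuntimeError):
    """Base class for container lifecycle failures."""


class ContainerNotStartedError(ContainerError):
    """Raised when a container operation is attempted before start()."""


class Container:
    """Minimal stand-in showing the guard shared by real and mock backends."""

    def __init__(self) -> None:
        self._container = None

    def start(self) -> None:
        # A real backend would launch the Docker container here.
        self._container = object()

    def exec(self, cmd: str) -> str:
        if self._container is None:
            raise ContainerNotStartedError("Container not started. Call start() first.")
        return f"ran: {cmd}"
```

Because the new type subclasses `RuntimeError`, callers currently catching `RuntimeError` keep working, while tests against both the Docker and mock backends can assert on `ContainerNotStartedError` specifically.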
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ares/containers/docker.py` around lines 104 - 106, Create a shared
ContainerNotStartedError and use it instead of RuntimeError where container
access is guarded (replace the RuntimeError in src/ares/containers/docker.py
checks around self._container and the similar guards at the other sites), and
update src/ares/testing/mock_container.py to raise the same
ContainerNotStartedError so tests and real Docker behavior share the contract;
add the new exception to the module/public API where other callers import
container classes so callers can catch ContainerNotStartedError explicitly.
| import pytest | ||
|
|
||
| from linguafranca import types as lft | ||
|
|
||
| import ares | ||
| from ares import presets | ||
| from ares import registry | ||
| from ares.llms import open_responses | ||
| from ares.llms import response |
Fix import sorting to pass ruff check.
The CI pipeline is failing with I001: import block is un-sorted or un-formatted. Rearrange imports to comply with Google-style isort configuration.
🔧 Proposed fix
-import pytest
-
-from linguafranca import types as lft
-
import ares
from ares import presets
from ares import registry
from ares.llms import open_responses
from ares.llms import response
+from linguafranca import types as lft
+import pytest

Note: The exact ordering depends on your ruff/isort configuration. Run `uv run ruff check --fix` to auto-correct.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import pytest | |
| from linguafranca import types as lft | |
| import ares | |
| from ares import presets | |
| from ares import registry | |
| from ares.llms import open_responses | |
| from ares.llms import response | |
| import pytest | |
| from linguafranca import types as lft | |
| import ares | |
| from ares import presets | |
| from ares import registry | |
| from ares.llms import open_responses | |
| from ares.llms import response |
🧰 Tools
🪛 GitHub Actions: ruff
[error] 3-11: ruff check failed: import block is un-sorted or un-formatted. Rule: I001. Help: Organize imports.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ares/presets_test.py` around lines 3 - 11, The import block in
presets_test.py is not sorted per the project's isort/Ruff rules; reorder the
imports (pytest, standard library first if any, then third-party, then local
package imports) so they match Google-style/isort order and groupings —
specifically adjust the lines importing pytest, linguafranca.types as lft, ares,
from ares import presets, registry, and the ares.llms imports (open_responses,
response); run `ruff check --fix` or `uv run ruff check --fix` to automatically
apply the correct ordering and ensure the import block passes I001.
User description
Summary
Testing
Generated description
Below is a concise technical summary of the changes proposed in this PR:
Adopt linguafranca `open_responses` builders and `InferenceResult` responses as the canonical payloads exchanged between queue-mediated clients, adapters, code agents, and environments so every RL observation/action shares one typed representation. Update docs, contributor guidance, presets, and smoke examples to describe the refreshed Open Responses loop and keep builtin presets and tooling aligned with the new dependency. Use `open_responses` inputs/responses for deterministic suites.
Latest Contributors(2)
Modified files (6)
Latest Contributors(2)
Update the README and registry assets so every reference to observations/actions mentions Open Responses requests and `InferenceResult`, add the dependency on `martian-linguafranca`, and make preset registration idempotent while exercising the Twenty Questions smoke flow in tests. Mention the new behavior in the top-level package docstring and the CLI-affiliated docs so contributors know which helpers to import.
Latest Contributors(2)
Ensure `CodeEnvironment`, `TwentyQuestionsEnvironment`, and the SkyRLLib/Tinker adapters are still consumable while code agents keep recording assistants’ utterances with `open_responses` messages and `response.extract_text_content`. Harden container helpers and dashboards to understand the new telemetry, and exercise the transformation in the Twenty Questions smoke example.
Latest Contributors(2)
Introduce `open_responses` helpers, `InferenceResult`, and the corresponding request/response plumbing to replace the legacy chat/Responses/Claude converters, keeping queue mediation, LLM clients, contrib adapters, and their tests aligned with the canonical payload shape. Include the new `martian-linguafranca` dependency so adapters such as `chat_completions_compatible`, `llama_cpp`, `hooked_transformer_client`, and `transformers_client` can all call `open_responses.to_chat_completions_kwargs` and return the unified `InferenceResult`, while retiring the previous converter modules.
Latest Contributors(2)
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Refactor
Tests
Dependencies