Use linguafranca Open Responses requests across ARES #97
joshgreaves wants to merge 14 commits into main from
Conversation
Make Open Responses the canonical internal request shape so queue mediation, environments, agents, and local clients share one path. Keep legacy request adapters at the edges through thin linguafranca-backed converters while preserving existing behavior.
📝 Walkthrough

The PR replaces the internal ARES LLM request/response abstractions with linguafranca "Open Responses" types, introduces an …

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Address audit findings by preserving embedded assistant tool calls, restoring preset re-registration after registry clears, and rejecting unknown external request parameters in strict mode. Align docs and helpers with the canonical Open Responses request path and keep local batching permissive where the previous behavior expected it.
Unify the canonical chat serialization path with the tested converter cleanup, make preset registration idempotent without registry coupling, and add direct regression coverage for queue-mediated requests and preset-based observations. Update public docs and examples to reflect Open Responses as the runtime request model.
_LOGGER = logging.getLogger(__name__)

MODEL_STUB = "__ARES_MODEL_UNSET__"
What do you mean?
andrewtran117
left a comment
Left a small comment on error handling for the model stub; otherwise looks good pending the test files passing.
…stency, tighten public API
…ctly

Delete the ares.llms.request module and all legacy conversion functions (from_legacy_request, to_legacy_request). Converters now take/return lft.OpenResponsesRequest directly, simplifying the codebase.

- Remove request.py and request_test.py (legacy LLMRequest dataclass)
- Update open_responses.py: remove legacy conversion helpers, use lft types
- Update converters (anthropic, openai_chat, openai_responses) to use lft.OpenResponsesRequest input/output
- Update all tests to use open_responses.make_request() instead of LLMRequest
- Update type annotations across agents, environments, and clients

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…types

Migrate from custom response types (LLMResponse, TextData, Usage) to a thin wrapper that holds the linguafranca OpenResponsesResponse directly. This simplifies the codebase by using canonical types throughout.

Key changes:
- Rename LLMResponse to InferenceResult with fields: response, cost
- Add extract_text_content() utility function for text extraction
- Add make_response() helper for creating test responses
- Update all code agents, environments, and tests to use new types
- Fix docker.py runtime checks for optional container access

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These converters were dead code - only used in their own tests. The actual conversion happens via open_responses.to_chat_completions_kwargs(), which calls linguafranca directly.

Deleted:
- openai_chat_converter.py
- anthropic_converter.py
- openai_responses_converter.py
- Their corresponding test files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
     if not self.preset_name:
         raise ValueError("preset_name must be provided in extras or kwargs")
-    self.env: ares.Environment[llms.LLMResponse, llms.LLMRequest, float, float] | None = None
+    self.env: ares.Environment[llms.InferenceResult, lft.OpenResponsesRequest, float, float] | None = None
Q for @rsmith49
There is a little bit of asymmetry here, wdyt of it? Inputs to step are LLM responses (ARES type, though it wraps linguafranca), outputs of step are LLM requests (linguafranca type).
Is this confusing? We could alias linguafranca types so people can use ARES aliases instead, but I'm not sure if that's even more confusing.
Yeah, I agree it feels a little off. The right approach is probably to wrap lft.OpenResponsesRequest with an ARES-specific wrapper as well to future proof in case there are other top level fields we need (like cost for the response), and have it be a single attribute dataclass?
If we want to do this approach long-term, I think aliasing the type within ARES makes sense for now
I agree, I ended up using an alias to the type called InferenceRequest
Now there's a slightly weird matchup, InferenceRequest and InferenceResult. I think this is better, but agree Result is weird, but it feels like we can't do Response
    pass


def _render_content_to_text(content: object, *, context: str) -> str:
I don't love all these new formatting fns, but this is contrib so we can come back to this if we think it's worth it.
import ares
from ares import containers
from ares import llms
from ares.llms import open_responses
import chz
import frozendict
from linguafranca import types as lft
import numpy as np
Should we move from ares.llms import open_responses and from linguafranca import types as lft after the third-party imports so all third-party imports appear before ARES/local imports per CLAUDE.md and CONTRIBUTING.md?
Finding type: AI Coding Guidelines | Severity: 🟢 Low
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
examples/05_tinker_train.py around lines 49-56, the local ARES imports (import ares,
from ares import containers, from ares import llms, from ares.llms import
open_responses) are placed among third-party imports. Reorder the imports so all
third-party imports (chz, frozendict, from linguafranca import types as lft, numpy,
tinker, from tinker_cookbook import cli_utils) appear together first, then add a blank
line, and place the ARES/local imports after them. Preserve existing import names and
aliases and ensure the file follows stdlib → third-party → local grouping with a
blank line separating groups.
def with_model(request: lft.OpenResponsesRequest, model: str) -> lft.OpenResponsesRequest:
    """Return a copy of the request with the model field replaced.

    Args:
        request: The original request.
        model: The new model identifier.

    Returns:
        A new request with the updated model field.
    """
    return dataclasses.replace(request, model=model)
open_responses.ensure_request is referenced by tests but not defined and raises AttributeError, should we add ensure_request that returns the canonical request unchanged?
def ensure_request(req): return req
Finding type: Logical Bugs | Severity: 🔴 High
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
src/ares/llms/open_responses.py around lines 168 to 183, add a new top-level function
named ensure_request that accepts an OpenResponsesRequest and returns it unchanged.
Implement it as: def ensure_request(request: lft.OpenResponsesRequest) ->
lft.OpenResponsesRequest: """Return the canonical request unchanged; provided for
callers/tests that expect this helper.""" return request. Keep typing and a short
docstring so it is importable and documented.
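A minimal sketch of the suggested helper, under the assumption that requests produced by `make_request()` are already canonical, so the function is just the identity:

```python
def ensure_request(request):
    """Return the canonical request unchanged.

    Requests built via make_request() are already canonical Open Responses
    requests, so this is a no-op normalization entry point kept for
    callers and tests that expect it.
    """
    return request
```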
def to_jsonable(value: Any) -> _JSONABLE:
    # TODO: Replace this with frfr.
    # The issue is that OpenResponsesRequest has enum values, which frfr doesn't handle correctly yet.
    # It won't, in general, either. So we should make sure OpenResposnesRequest enums are StrEnums where appropriate.
    if dataclasses.is_dataclass(value):
        return {field.name: to_jsonable(getattr(value, field.name)) for field in dataclasses.fields(value)}
    if isinstance(value, enum.Enum):
Comment before to_jsonable says OpenResposnesRequest, should we replace it with OpenResponsesRequest?
Finding type: Naming and Typos | Severity: 🟢 Low
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
src/ares/llms/open_responses.py around lines 266 to 272, the comment above the
to_jsonable function mistakenly spells 'OpenResponsesRequest' as 'OpenResposnesRequest'.
Edit the comment to correct the typo to 'OpenResponsesRequest' (leave the rest of the
comment text unchanged) so the terminology matches the rest of the module.
observation = open_responses.make_request(
    [open_responses.user_message(initial_prompt)],
    instructions=self._system_prompt,
)
The branches of step() duplicate the open_responses.make_request([open_responses.user_message(...)], instructions=self._system_prompt) observation construction, should we extract it into _make_observation_prompt(text: str)?
Finding type: Code Dedup and Conventions | Severity: 🟢 Low
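The suggested extraction could look like the sketch below. The `open_responses` helpers are stubbed with minimal stand-ins here (the real signatures live in ARES), and `Agent` is a hypothetical host class.

```python
# Minimal stand-ins for open_responses.user_message / make_request,
# just to illustrate the extraction.
def user_message(text):
    return {"role": "user", "content": text}


def make_request(messages, instructions=None):
    return {"input": messages, "instructions": instructions}


class Agent:
    def __init__(self, system_prompt):
        self._system_prompt = system_prompt

    def _make_observation_prompt(self, text):
        # Single construction point shared by both step() branches.
        return make_request([user_message(text)], instructions=self._system_prompt)
```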
import ares
from ares import llms
from ares.llms import open_responses
from linguafranca import types as lft
import tqdm
ares imports sit before third-party tqdm with no separating blank line — should we move ares into the final local group separated by a blank line per CLAUDE.md?
Finding type: AI Coding Guidelines | Severity: 🟢 Low
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In examples/utils.py
around lines 7-13, the import group ordering is wrong: local ARES imports (import ares,
from ares import llms, from ares.llms import open_responses, from linguafranca import
types as lft) appear before the third-party import tqdm and there is no blank line
separating third-party and local imports. Reorder the imports into conventional groups
(stdlib first, then third-party, then local), place from ares... and from
linguafranca... into the local group after tqdm, and add a single blank line between the
third-party group and the local ARES imports. Also ensure imports within each group are
sorted consistently (e.g., alphabetically).
) -> containers.ExecResult:
    if self._container is None:
        raise RuntimeError("Container not started. Call start() first.")

    # Use default_workdir if workdir is not explicitly provided
The `self._container is None` start guard is duplicated in exec_run, upload_files, and download_files — should we extract it into an `_assert_started()` helper?
Finding type: Code Dedup and Conventions | Severity: 🟢 Low
Yeah #47 addressed this - would be great to address merge conflicts on that and get it merged at some point
Thanks for the reminder—#47 has landed but the docker container guard conflict still needs resolving, so please rebase onto the latest main and address the duplication before we can merge.
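The shared-guard idea from this thread can be sketched as below. The class and docker handle are stubs for illustration; only the guard pattern itself is the point.

```python
class DockerContainer:
    """Minimal sketch of the shared start guard; the docker handle is stubbed."""

    def __init__(self):
        self._container = None

    def start(self):
        self._container = object()  # stand-in for the real docker container

    def _require_started_container(self):
        # One place for the check; returning the handle also narrows the
        # Optional for callers under static type checking.
        if self._container is None:
            raise RuntimeError("Container not started. Call start() first.")
        return self._container

    def exec_run(self, cmd):
        self._require_started_container()
        return 0  # placeholder exit code

    def upload_files(self, paths):
        self._require_started_container()

    def download_files(self, paths):
        self._require_started_container()
```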
| """LLM client interfaces and data types. | ||
|
|
||
| Canonical request builders and request types live in :mod:`ares.llms.open_responses`. | ||
|
|
||
| Prefer ``from ares.llms import open_responses`` to access request types and builders | ||
| rather than importing individual type aliases from this package. | ||
| """ | ||
|
|
||
| # Request types | ||
| # Client protocol | ||
| from ares.llms.chat_completions_compatible import ChatCompletionCompatibleLLMClient | ||
| from ares.llms.llm_clients import LLMClient | ||
| from ares.llms.request import AssistantMessage | ||
| from ares.llms.request import LLMRequest | ||
| from ares.llms.request import Message | ||
| from ares.llms.request import ToolCallMessage | ||
| from ares.llms.request import ToolCallResponseMessage | ||
| from ares.llms.request import UserMessage | ||
|
|
||
| # Response types | ||
| from ares.llms.response import LLMResponse | ||
| from ares.llms.response import TextData | ||
| from ares.llms.response import Usage | ||
| from ares.llms.response import InferenceResult | ||
| from ares.llms.response import extract_text_content |
from ares.llms import LLMRequest breaks after removing src/ares/llms/request.py; should we reintroduce thin shims/aliases in ares.llms to re-export the new types or provide a documented migration path and review the removed file for shim locations?
Finding type: Breaking Changes | Severity: 🔴 High
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
src/ares/llms/__init__.py around lines 1-21, the package no longer re-exports the
previous request/message/tool dataclasses (e.g. LLMRequest, AssistantMessage,
ToolCallMessage, UserMessage) and this breaks downstream `from ares.llms import ...`
imports. Restore backwards-compatible thin shims by re-importing and re-exporting those
types (either from src/ares/llms/request.py if it still exists, or from the new
open_responses module if types were moved) and add them back into __all__. Also add
short deprecation notices (warnings.warn) or docstrings pointing to the new canonical
API (ares.llms.open_responses) so consumers have a migration path. Finally, review
src/ares/llms/request.py to ensure the correct original type names are exported and
include a unit test that imports the legacy names to validate compatibility.
import ares
from ares import llms
from ares.llms import open_responses
import hydra
from linguafranca import types as lft
import omegaconf
ARES imports appear before third-party imports; should we reorder to stdlib → third-party → local/ARES with blank lines and ARES last per CLAUDE.md?
Finding type: AI Coding Guidelines | Severity: 🟢 Low
Prompt for AI Agents:
Before applying, verify this suggestion against the current code. In
examples/04_rl_training_with_skyrl.py around lines 49 to 54, the ARES imports (import
ares; from ares import llms; from ares.llms import open_responses) are placed before
third-party imports, violating the project's import ordering. Reorder imports to follow:
stdlib imports first (unchanged), then a single blank line, then all third-party imports
(hydra, linguafranca/types, omegaconf, ray, skyrl_gym) grouped together, then a single
blank line, and finally the ARES/local imports grouped together (import ares; from ares
import llms; from ares.llms import open_responses). Ensure there is exactly one blank
line separating each group and update any import lines or spacing accordingly.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/ares/contrib/mech_interp/hooked_transformer_client.py (2)
74-91: ⚠️ Potential issue | 🟠 Major

Validate max_output_tokens before subtracting it from n_ctx.

Values <= 0 or >= self.model.cfg.n_ctx make max_new_tokens invalid or drive apply_chat_template(..., max_length=...) to a non-positive value. Right now that becomes a runtime failure instead of a clean input error.

🛠️ Proposed guard

-    max_output_tokens = max_output_tokens or request.max_output_tokens or self.max_new_tokens
+    if max_output_tokens is None:
+        max_output_tokens = request.max_output_tokens
+    if max_output_tokens is None:
+        max_output_tokens = self.max_new_tokens
+    if max_output_tokens <= 0:
+        raise ValueError("max_output_tokens must be positive")
+    if max_output_tokens >= self.model.cfg.n_ctx:
+        raise ValueError("max_output_tokens must be smaller than the model context window")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/contrib/mech_interp/hooked_transformer_client.py` around lines 74-91, validate the resolved max_output_tokens (from max_output_tokens or request.max_output_tokens or self.max_new_tokens) before using it to compute max_length for apply_chat_template: ensure it is > 0 and < self.model.cfg.n_ctx, and if not raise a clear ValueError (or return a clean input error) indicating the invalid max_output_tokens and the allowed range. Do this check immediately after computing max_output_tokens and before computing max_length / self.model.tokenizer.apply_chat_template; reference the symbols max_output_tokens, request.max_output_tokens, self.max_new_tokens, self.model.cfg.n_ctx, and apply_chat_template so the guard is easy to locate and update.
97-106: ⚠️ Potential issue | 🟠 Major

top_p is silently ignored in this client.

The canonical request now carries sampling controls, but this adapter only forwards temperature. That means the same OpenResponsesRequest behaves differently here than it does in TransformersLLMClient. Either pass request.top_p through to generation or reject it explicitly instead of dropping it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/contrib/mech_interp/hooked_transformer_client.py` around lines 97-106, the client currently only forwards request.temperature into gen_kwargs (see gen_kwargs and the temperature handling block), silently dropping request.top_p; update the adapter in HookedTransformerClient (hooked_transformer_client.py) to also forward request.top_p into gen_kwargs when not None, or explicitly raise an error if top_p is unsupported. Locate the generation kwargs construction (gen_kwargs) and either add gen_kwargs["top_p"] = request.top_p when request.top_p is not None, or add a guard that raises a clear exception mentioning top_p if you intend not to support it.

src/ares/contrib/eval_visualizer.py (1)
91-121: ⚠️ Potential issue | 🟡 Minor

Widen the dashboard wrapper's observation types to accept CodeEnvironment.

Lines 93, 112, and 121 (the constructor, reset(), and step() return types) and line 702 (the wrap() parameter type) currently expect lft.OpenResponsesRequest, but CodeEnvironment is defined with observation type lft.OpenResponsesRequest | None on terminal timesteps. This causes static type checking to reject CodeEnvironment as an argument to wrap() or the TrackedEnvironment constructor, despite the docstring claiming to support "all ARES code agent environments."

Suggested fix

-        env: base.Environment[response.InferenceResult, lft.OpenResponsesRequest, RewardType, DiscountType],
+        env: base.Environment[
+            response.InferenceResult, lft.OpenResponsesRequest | None, RewardType, DiscountType
+        ],
@@
-    async def reset(self) -> base.TimeStep[lft.OpenResponsesRequest, RewardType, DiscountType]:
+    async def reset(self) -> base.TimeStep[lft.OpenResponsesRequest | None, RewardType, DiscountType]:
@@
-    ) -> base.TimeStep[lft.OpenResponsesRequest, RewardType, DiscountType]:
+    ) -> base.TimeStep[lft.OpenResponsesRequest | None, RewardType, DiscountType]:
@@
-        env: base.Environment[response.InferenceResult, lft.OpenResponsesRequest, RewardType, DiscountType],
+        env: base.Environment[
+            response.InferenceResult, lft.OpenResponsesRequest | None, RewardType, DiscountType
+        ],

Rerun pyright to confirm the fix resolves the type incompatibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/contrib/eval_visualizer.py` around lines 91-121, the type annotations for observations in TrackedEnvironment need to accept CodeEnvironment terminal timesteps that use None; update the constructor signature, reset() return type, step() return type, and the wrap() parameter to use lft.OpenResponsesRequest | None (or Optional[lft.OpenResponsesRequest]) instead of just lft.OpenResponsesRequest so CodeEnvironment is accepted; modify the annotations on __init__, reset, step, and the wrap(...) parameter accordingly and rerun pyright to confirm the type error is resolved.

examples/05_tinker_train.py (1)
124-138: ⚠️ Potential issue | 🟡 Minor

Widen the env observation type to accept None.

Line 126 declares the observation type as lft.OpenResponsesRequest, but the environment's step() method returns TimeStep with lft.OpenResponsesRequest | None on terminal transitions. The _get_tinker_observation() method already handles the None case correctly (line 139), and CodeEnvironment (the main environment this adapter wraps) declares its observation type with | None. The type annotation must match the actual behavior.

Suggested fix

-        env: ares.Environment[llms.InferenceResult, lft.OpenResponsesRequest, float, float],
+        env: ares.Environment[llms.InferenceResult, lft.OpenResponsesRequest | None, float, float],

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/05_tinker_train.py` around lines 124-138, the constructor's env annotation is too narrow: change the env type parameter in __init__ to accept observations that may be None (i.e., use lft.OpenResponsesRequest | None) so it matches the TimeStep passed into _get_tinker_observation; update the signature where env is typed (the __init__ method) to ares.Environment[llms.InferenceResult, lft.OpenResponsesRequest | None, float, float] so the type matches the actual environment behavior handled by _get_tinker_observation.
🧹 Nitpick comments (4)
src/ares/containers/docker.py (1)

104-106: Consider centralizing the startup guard and documenting the raised RuntimeError.

The same check/message is repeated in three methods. A small helper (e.g., _require_started_container()) would reduce duplication and improve type narrowing. Also worth updating the Container protocol/docstrings to mention this precondition explicitly for callers.

Also applies to: 135-136, 165-166

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/containers/docker.py` around lines 104-106, extract the repeated guard "if self._container is None: raise RuntimeError('Container not started. Call start() first.')" into a small private helper method, e.g., _require_started_container(self), and replace the three occurrences with calls to that helper to reduce duplication and enable type narrowing of self._container; update the Container protocol/docstrings to document that methods accessing self._container require the container to be started and that they raise RuntimeError otherwise (mention _require_started_container as the enforcement point).

README.md (1)
18-18: Minor terminology inconsistency.

"Open Responses requests" and "LLMResponses" are mixed terminology styles. Given the migration, consider using consistent naming like "Open Responses requests" and "inference results" (or "LLM responses" to match the code comment style).

📝 Suggested wording

-ARES treats Open Responses requests as observations and LLMResponses as actions within the environment, so you can focus on training just the LLM - not the Code Agent surrounding it. The interface is entirely async, and supports scaling up to hundreds or thousands of parallel environments easily - check out [example 3](https://github.com/withmartian/ares/tree/main/examples/03_parallel_eval_with_api.py) to run this yourself.
+ARES treats Open Responses requests as observations and LLM responses as actions within the environment, so you can focus on training just the LLM - not the Code Agent surrounding it. The interface is entirely async, and supports scaling up to hundreds or thousands of parallel environments easily - check out [example 3](https://github.com/withmartian/ares/tree/main/examples/03_parallel_eval_with_api.py) to run this yourself.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` at line 18, the README line uses mixed terminology ("Open Responses requests" vs "LLMResponses"); update the sentence to use consistent naming (e.g., keep "Open Responses requests" and change "LLMResponses" to "LLM responses" or "inference results") so the phrasing matches other comments and code; locate the exact sentence containing "Open Responses requests" and "LLMResponses" and replace "LLMResponses" with the chosen consistent term across the README to maintain style uniformity.

examples/04_rl_training_with_skyrl.py (1)
109-110: Consider strict=False for robustness in training scenarios.

Using strict=True in to_chat_messages will raise ValueError if observations contain non-text content (images, tool calls, function outputs). While this provides fail-fast behavior, it could cause training crashes if any environment produces such observations.

Consider using strict=False to match the pattern in examples/utils.py, or document that this training integration assumes text-only observations.

Also applies to: 135-135

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/04_rl_training_with_skyrl.py` around lines 109-110, the training code uses open_responses.to_chat_messages(ts.observation, strict=True) which will raise ValueError on non-text observations; change strict=True to strict=False in the two uses of to_chat_messages (the calls on ts.observation) so the function tolerates images/tool outputs during training and matches the pattern in examples/utils.py, or alternatively add a clear comment documenting the assumption of text-only observations if you intentionally want to keep strict=True.

docs/source/core-concepts.rst (1)
160-170: Code example missing import for the open_responses module.

The example uses open_responses.make_request() and open_responses.user_message() but doesn't show the required import statement. Consider adding the import for clarity:

from ares.llms import open_responses

This would help users understand where these helpers come from when following the documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/source/core-concepts.rst` around lines 160 - 170, The example uses open_responses.make_request and open_responses.user_message but omits the import; add an import for the open_responses helper (e.g., import open_responses from its module) at the top of the snippet so symbols like open_responses.make_request and open_responses.user_message are defined when demonstrating the async run(self, task: str) -> None method and its calls to self._llm_client and self.parse_commands.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CLAUDE.md`:
- Around line 284-291: The docs reference non-existent types (llms.TextData,
llms.Usage, response.TextData, response.Usage); update CLAUDE.md to use the
actual exported symbols: replace TextData usages with InferenceResult or
extract_text_content/make_response where appropriate, and replace Usage with
lft.Usage (from linguafranca). Also prefer referencing the package exports:
ChatCompletionCompatibleLLMClient, InferenceResult, LLMClient,
extract_text_content, make_response from ares.llms/__init__.py instead of the
removed types; search for any occurrences of TextData/Usage and swap them to the
correct symbols and examples accordingly.
In `@src/ares/contrib/llama_cpp.py`:
- Around line 101-109: The current conversion in the LlamaCpp adapter discards
tool call data by using chat_completion.choices[0].message.content and
response.make_response, so preserve tool calls by converting the entire chat
message (including message.tool_calls) into the library's
Response/InferenceResult format instead of only text; update the logic around
chat_completion.choices[0].message to detect and map message.tool_calls into the
equivalent fields on the Response (or build a Response from message content plus
tool_call structure) and return that in response.InferenceResult (mirror the
same pattern for ChatCompletionCompatibleLLMClient to avoid the same loss); add
a unit test that supplies a chat_completion with message.tool_calls to assert
the resulting InferenceResult contains the tool call payload.
In `@src/ares/llms/chat_completions_compatible.py`:
- Around line 77-85: The Chat Completions handling currently only uses
resp.choices[0].message.content and drops any
resp.choices[0].message.tool_calls; update the code that builds the inference
response (around response.make_response and response.InferenceResult) to
preserve tool invocations by detecting resp.choices[0].message.tool_calls and
passing them through into the Open Responses format: either extend
response.make_response to accept a tool_calls (or tools) parameter and pass
resp.choices[0].message.tool_calls, or add a small converter that constructs the
Open Responses-style response object containing both content and tool_calls
before creating lf_response; keep existing fields (model, input_tokens,
output_tokens, response_id, cost) unchanged.
In `@src/ares/llms/open_responses_test.py`:
- Around line 78-81: The test is calling a missing helper
open_responses.ensure_request which raises AttributeError; either reintroduce
ensure_request in src/ares/llms/open_responses.py (implement it to accept a
request produced by make_request and return the canonical request object) or
change the test to use the real normalization entry point (e.g., call the
existing request factory/normalizer such as open_responses.make_request or the
module's public normalize function) so the test asserts the canonical request is
returned; update references to ensure_request, make_request, and user_message
accordingly to keep the test exercising the supported API.
---
Outside diff comments:
In `@examples/05_tinker_train.py`:
- Around line 124-138: The constructor's env annotation is too narrow: change
the env type parameter in __init__ to accept observations that may be None
(i.e., use lft.OpenResponsesRequest | None) so it matches the TimeStep passed
into _get_tinker_observation; update the signature where env is typed (the
__init__ method) to ares.Environment[llms.InferenceResult,
lft.OpenResponsesRequest | None, float, float] so the type matches the actual
environment behavior handled by _get_tinker_observation.
In `@src/ares/contrib/eval_visualizer.py`:
- Around line 91-121: The type annotations for observations in
TrackedEnvironment need to accept CodeEnvironment terminal timesteps that use
None; update the constructor signature, reset() return type, step() return type,
and the wrap() parameter to use lft.OpenResponsesRequest | None (or
Optional[lft.OpenResponsesRequest]) instead of just lft.OpenResponsesRequest so
CodeEnvironment is accepted; modify the annotations on __init__, reset, step,
and the wrap(...) parameter accordingly and rerun pyright to confirm the type
error is resolved.
In `@src/ares/contrib/mech_interp/hooked_transformer_client.py`:
- Around line 74-91: Validate resolved max_output_tokens (from max_output_tokens
or request.max_output_tokens or self.max_new_tokens) before using it to compute
max_length for apply_chat_template: ensure it is >0 and < self.model.cfg.n_ctx,
and if not raise a clear ValueError (or return a clean input error) indicating
the invalid max_output_tokens and the allowed range. Do this check immediately
after computing max_output_tokens and before computing
max_length/self.model.tokenizer.apply_chat_template; reference the symbols
max_output_tokens, request.max_output_tokens, self.max_new_tokens,
self.model.cfg.n_ctx, and apply_chat_template so the guard is easy to locate and
update.
- Around line 97-106: The client currently only forwards request.temperature
into gen_kwargs (see gen_kwargs and the temperature handling block), silently
dropping request.top_p; update the adapter in HookedTransformerClient
(hooked_transformer_client.py) to also forward request.top_p into gen_kwargs
when not None, or explicitly raise an error if top_p is unsupported. Locate the
generation kwargs construction (gen_kwargs) and either add gen_kwargs["top_p"] =
request.top_p when request.top_p is not None, or add a guard that raises a clear
exception mentioning top_p if you intend not to support it.
---
Nitpick comments:
In `@docs/source/core-concepts.rst`:
- Around line 160-170: The example uses open_responses.make_request and
open_responses.user_message but omits the import; add an import for the
open_responses helper (e.g., import open_responses from its module) at the top
of the snippet so symbols like open_responses.make_request and
open_responses.user_message are defined when demonstrating the async run(self,
task: str) -> None method and its calls to self._llm_client and
self.parse_commands.
In `@examples/04_rl_training_with_skyrl.py`:
- Around line 109-110: The training code uses
open_responses.to_chat_messages(ts.observation, strict=True) which will raise
ValueError on non-text observations; change strict=True to strict=False in the
two uses of to_chat_messages (the calls on ts.observation) so the function
tolerates images/tool outputs during training and matches the pattern in
examples/utils.py, or alternatively add a clear comment documenting the
assumption of text-only observations if you intentionally want to keep
strict=True.
In `@README.md`:
- Line 18: The README line uses mixed terminology ("Open Responses requests" vs
"LLMResponses"); update the sentence to use consistent naming (e.g., keep "Open
Responses requests" and change "LLMResponses" to "LLM responses" or "inference
results") so the phrasing matches other comments and code; locate the exact
sentence containing "Open Responses requests" and "LLMResponses" and replace
"LLMResponses" with the chosen consistent term across the README to maintain
style uniformity.
In `@src/ares/containers/docker.py`:
- Around line 104-106: Extract the repeated guard "if self._container is None:
raise RuntimeError('Container not started. Call start() first.')" into a small
private helper method, e.g., _require_started_container(self), and replace the
three occurrences with calls to that helper to reduce duplication and enable
type narrowing of self._container; update the Container protocol/docstrings to
document that methods accessing self._container require the container to be
started and that they raise RuntimeError otherwise (mention
_require_started_container as the enforcement point).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9305d7ad-0b27-4bc1-97a4-3d0a28c2128f
📒 Files selected for processing (43)
- CLAUDE.md
- CONTRIBUTING.md
- README.md
- docs/source/core-concepts.rst
- docs/source/how-it-works.rst
- docs/source/index.rst
- examples/04_rl_training_with_skyrl.py
- examples/05_tinker_train.py
- examples/utils.py
- pyproject.toml
- src/ares/__init__.py
- src/ares/code_agents/code_agent_base.py
- src/ares/code_agents/mini_swe_agent.py
- src/ares/code_agents/terminus2/terminus2_agent.py
- src/ares/code_agents/terminus2/terminus2_agent_test.py
- src/ares/containers/docker.py
- src/ares/contrib/eval_visualizer.py
- src/ares/contrib/llama_cpp.py
- src/ares/contrib/mech_interp/hooked_transformer_client.py
- src/ares/contrib/transformers_client.py
- src/ares/contrib/transformers_client_test.py
- src/ares/environments/code_env.py
- src/ares/environments/twenty_questions.py
- src/ares/llms/__init__.py
- src/ares/llms/anthropic_converter.py
- src/ares/llms/anthropic_converter_test.py
- src/ares/llms/chat_completions_compatible.py
- src/ares/llms/llm_clients.py
- src/ares/llms/open_responses.py
- src/ares/llms/open_responses_test.py
- src/ares/llms/openai_chat_converter.py
- src/ares/llms/openai_chat_converter_test.py
- src/ares/llms/openai_responses_converter.py
- src/ares/llms/openai_responses_converter_test.py
- src/ares/llms/queue_mediated_client.py
- src/ares/llms/queue_mediated_client_test.py
- src/ares/llms/request.py
- src/ares/llms/request_test.py
- src/ares/llms/response.py
- src/ares/presets.py
- src/ares/presets_test.py
- src/ares/testing/mock_llm.py
- src/ares/testing/mock_llm_test.py
💤 Files with no reviewable changes (8)
- src/ares/llms/request_test.py
- src/ares/llms/openai_responses_converter_test.py
- src/ares/llms/anthropic_converter_test.py
- src/ares/llms/openai_chat_converter.py
- src/ares/llms/request.py
- src/ares/llms/openai_chat_converter_test.py
- src/ares/llms/openai_responses_converter.py
- src/ares/llms/anthropic_converter.py
| content = chat_completion.choices[0].message.content or "" | ||
| usage = response.Usage( | ||
| prompt_tokens=chat_completion.usage.prompt_tokens if chat_completion.usage else 0, | ||
| generated_tokens=chat_completion.usage.completion_tokens if chat_completion.usage else 0, | ||
| lf_response = response.make_response( | ||
| content, | ||
| model=self.model_name, | ||
| input_tokens=chat_completion.usage.prompt_tokens if chat_completion.usage else 0, | ||
| output_tokens=chat_completion.usage.completion_tokens if chat_completion.usage else 0, | ||
| response_id=chat_completion.id, | ||
| ) | ||
| return response.LLMResponse(data=[response.TextData(content=content)], cost=0.0, usage=usage) | ||
| return response.InferenceResult(response=lf_response, cost=0.0) |
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the llama_cpp.py file around lines 101-109
cat -n src/ares/contrib/llama_cpp.py | sed -n '95,115p'

Repository: withmartian/ares
Length of output: 1190
🏁 Script executed:
# Also check the overall structure of the file
wc -l src/ares/contrib/llama_cpp.py

Repository: withmartian/ares
Length of output: 93
🏁 Script executed:
# Look at the response.py to understand make_response
cat -n src/ares/llms/response.py | head -120

Repository: withmartian/ares
Length of output: 3685
🏁 Script executed:
# Search for other Chat Completions clients that might handle tool_calls
fd -t f -name "*.py" src/ares/contrib | head -20

Repository: withmartian/ares
Length of output: 290
🏁 Script executed:
# Search for other LLM client implementations that might be Chat Completions-based
find src/ares/contrib -name "*.py" -type f

Repository: withmartian/ares
Length of output: 432
🏁 Script executed:
# Search for tool_calls handling in the codebase
rg -A 5 -B 5 "tool_calls" src/ares --type py | head -100

Repository: withmartian/ares
Length of output: 7538
🏁 Script executed:
# Search for other uses of response.make_response to see how it's used
rg "response\.make_response\|make_response" src/ares --type py -B 3 -A 3

Repository: withmartian/ares
Length of output: 42
🏁 Script executed:
# Look for open_responses.to_chat_completions_kwargs mentioned in the review scratchpad
rg "to_chat_completions_kwargs" src/ares --type py -B 3 -A 10

Repository: withmartian/ares
Length of output: 10653
🏁 Script executed:
# Look at how ChatCompletionCompatibleLLMClient handles responses
cat -n src/ares/llms/chat_completions_compatible.py

Repository: withmartian/ares
Length of output: 3664
🏁 Script executed:
# Search for response conversion functions from chat completions to Open Responses format
rg "from_chat_completion\|ChatCompletion" src/ares/llms --type py -B 2 -A 5 | head -150

Repository: withmartian/ares
Length of output: 42
Preserve tool calls in LlamaCpp response conversion.
This adapter discards message.tool_calls by extracting only message.content before calling response.make_response(), which is a text-only helper. If the model returns tool calls (which the request-side explicitly supports via to_chat_completions_kwargs()), they are silently lost and downstream code sees only plain text. This breaks tool-calling workflows.
Note that ChatCompletionCompatibleLLMClient has the same limitation, suggesting the issue is broader than just this client—both need response-side tool call support. Consider building a proper response converter that maps OpenAI chat completion messages (including tool_calls) back to Open Responses format, rather than relying on the text-only make_response() helper. Add a test case with tool call responses to prevent regression.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ares/contrib/llama_cpp.py` around lines 101 - 109, The current conversion
in the LlamaCpp adapter discards tool call data by using
chat_completion.choices[0].message.content and response.make_response, so
preserve tool calls by converting the entire chat message (including
message.tool_calls) into the library's Response/InferenceResult format instead
of only text; update the logic around chat_completion.choices[0].message to
detect and map message.tool_calls into the equivalent fields on the Response (or
build a Response from message content plus tool_call structure) and return that
in response.InferenceResult (mirror the same pattern for
ChatCompletionCompatibleLLMClient to avoid the same loss); add a unit test that
supplies a chat_completion with message.tool_calls to assert the resulting
InferenceResult contains the tool call payload.
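The response-side converter this comment asks for could look like the following sketch. The input follows the OpenAI Chat Completions assistant-message shape; the output item dicts only approximate Open Responses items, since the exact linguafranca field names are an assumption here:

```python
# Hypothetical converter: map an OpenAI-style assistant message, including
# tool_calls, to a list of Open-Responses-like output items instead of
# keeping only the text content.
def chat_message_to_output_items(message: dict) -> list[dict]:
    items: list[dict] = []
    if message.get("content"):
        items.append(
            {"type": "message", "role": "assistant", "content": message["content"]}
        )
    for call in message.get("tool_calls") or []:
        items.append(
            {
                "type": "function_call",
                "call_id": call["id"],
                "name": call["function"]["name"],
                "arguments": call["function"]["arguments"],
            }
        )
    return items
```

A test feeding a message with `tool_calls` through this converter would catch the regression both clients currently share.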
| def test_ensure_request_accepts_canonical_request(): | ||
| request = open_responses.make_request([open_responses.user_message("Hello")]) | ||
| result = open_responses.ensure_request(request) | ||
| assert result is request |
This test calls an API that isn't present.
open_responses.ensure_request(...) doesn't exist, and CI is already failing here with AttributeError. Either restore the helper in src/ares/llms/open_responses.py or update this test to exercise the supported request-normalization entry point.
🧰 Tools
🪛 GitHub Actions: unit-tests
[error] 80-80: pytest failure (exit code 1): AttributeError: module 'ares.llms.open_responses' has no attribute 'ensure_request' in test_ensure_request_accepts_canonical_request
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ares/llms/open_responses_test.py` around lines 78 - 81, The test is
calling a missing helper open_responses.ensure_request which raises
AttributeError; either reintroduce ensure_request in
src/ares/llms/open_responses.py (implement it to accept a request produced by
make_request and return the canonical request object) or change the test to use
the real normalization entry point (e.g., call the existing request
factory/normalizer such as open_responses.make_request or the module's public
normalize function) so the test asserts the canonical request is returned;
update references to ensure_request, make_request, and user_message accordingly
to keep the test exercising the supported API.
rsmith49
left a comment
Great work Josh 🎉 This was a huge effort, and I think it will unlock a lot of functionality for us!
A lot of comments (sorry), one of the most important IMO is making sure our OpenResponses -> transformers chat templates conversion handles tool calls in the best way, since that will affect training a lot.
| | │ │ from code agent via │ │ | ||
| └─────────────────────────────────┼──│ QueueMediatedLLMClient │ │ | ||
| LLMRequest (observation) │ └──────────────────┬─────────────┘ │ | ||
| Open Responses observation │ └──────────────────┬─────────────┘ │ |
nit: "OpenResponsesRequest (observation)" instead to show the type -> RL concept
| Open Responses observation │ └──────────────────┬─────────────┘ │ | ||
| │ ^ │ │ | ||
| │ LLMRequest │ │ LLMResponse │ | ||
| │ Open Responses │ │ InferenceResult│ |
| * **Treats LLM responses as actions** - Your trainable agent/policy provides responses | ||
|
|
||
| Crucially, the **CodeAgent is part of the environment**, not what you're training. Your training loop optimizes an agent/policy that produces better ``LLMResponse`` outputs given ``LLMRequest`` observations. | ||
| Crucially, the **CodeAgent is part of the environment**, not what you're training. Your training loop optimizes an agent/policy that produces better ``InferenceResult`` outputs given canonical Open Responses observations. |
"canonical" is confusing here, I would remove
| The ``LLMClient`` abstraction serves two purposes: | ||
|
|
||
| 1. **Observations = LLM Requests**: In the RL loop, ``timestep.observation`` is an ``LLMRequest`` containing the messages the code agent wants to send to the LLM. This is the "state" your policy observes. | ||
| 1. **Observations = Open Responses requests**: In the RL loop, ``timestep.observation`` is a canonical Open Responses request containing what the code agent wants to send to the LLM. This is the "state" your policy observes. |
| cost: float | ||
|
|
||
| This simple interface wraps OpenAI-style chat completion APIs. The ``messages`` field follows the OpenAI format with ``role`` (system/user/assistant) and ``content``. | ||
| ARES uses linguafranca's ``OpenResponsesRequest`` as the canonical request type for observations and client inputs. Edge adapters convert to Chat/Responses/Anthropic formats only when needed. |
I would change this to a note like
ARES leverages linguafranca for request and response types - we use the
`OpenResponsesRequest` as our base request object returned from the environment for observations, and encourage users to use `linguafranca.convert_*` methods for translating between different provider and local formats.
| @@ -1,383 +0,0 @@ | |||
| """Converter for Anthropic Messages API format. | |||
| @@ -1,38 +1,99 @@ | |||
| """LLM response model.""" | |||
| """LLM response model wrapping linguafranca types.""" | |||
Worth moving all this into open_responses.py?
|
|
||
| # Twenty Questions — lightweight, no Docker needed. | ||
| registry.register_preset("20q", TwentyQuestionsSpec()) | ||
| if "20q" not in existing_preset_names: |
This and the above logic are just masking an error - IMO we either make register_preset allow overwriting existing presets, or we make sure that _register_default_presets is only called once (although the duplicate check in the list of harbor presets is worth keeping so we don't accidentally break if they add duplicates to their registry)
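One way to make registration explicit rather than silently skipped - a sketch against a hypothetical registry, not the real `ares` one:

```python
# Hypothetical registry illustrating an explicit overwrite policy for
# register_preset, so duplicate registration fails loudly unless allowed.
class PresetRegistry:
    def __init__(self) -> None:
        self._presets: dict[str, object] = {}

    def register_preset(
        self, name: str, spec: object, *, overwrite: bool = False
    ) -> None:
        if name in self._presets and not overwrite:
            raise ValueError(
                f"Preset {name!r} is already registered; pass overwrite=True to replace it."
            )
        self._presets[name] = spec
```

With this shape, `_register_default_presets` can pass `overwrite=True` deliberately instead of pre-checking `existing_preset_names`.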
| "harbor>=0.1.32", | ||
| "httpx>=0.28.1", | ||
| "jinja2>=3.1.6", | ||
| "martian-linguafranca>=0.1.5", |
| It is a modern [gym](https://github.com/Farama-Foundation/Gymnasium): the environment layer powering RL research. | ||
|
|
||
| ARES treats LLMRequests as observations and LLMResponses as actions within the environment, so you can focus on training just the LLM - not the Code Agent surrounding it. The interface is entirely async, and supports scaling up to hundreds or thousands of parallel environments easily - check out [example 3](https://github.com/withmartian/ares/tree/main/examples/03_parallel_eval_with_api.py) to run this yourself. | ||
| ARES treats Open Responses requests as observations and LLMResponses as actions within the environment, so you can focus on training just the LLM - not the Code Agent surrounding it. The interface is entirely async, and supports scaling up to hundreds or thousands of parallel environments easily - check out [example 3](https://github.com/withmartian/ares/tree/main/examples/03_parallel_eval_with_api.py) to run this yourself. |
Should give an actual class name - same comment as way above
Jhoysbou
left a comment
The usage of linguafranca looks good to me
| _LOGGER.warning("%s warning for %s: %s", context, warning.field, warning.message) | ||
|
|
||
|
|
||
| def to_jsonable(value: Any) -> _JSONABLE: |
LF has a similar built-in function. If we adopt StrEnum, you can actually delete this and rely on LF's implementation.
def _to_jsonable(value: object) -> Any:
    if is_dataclass(value) and not isinstance(value, type):
        return asdict(value)
    if isinstance(value, dict):
        return value
    if hasattr(value, "model_dump"):
        try:
            return value.model_dump(mode="json")
        except TypeError:
            return value.model_dump()
    raise TypeError(f"unsupported payload type: {type(value)!r}")

This function is used in `convert_request`, `convert_response`, and the stream conversation classes. Consequently, you can simplify `to_chat_completions_kwargs` as well.
|
@coderabbitai full review

✅ Actions performed

Full review triggered.
Actionable comments posted: 4
♻️ Duplicate comments (3)
README.md (1)
18-18: ⚠️ Potential issue | 🟡 Minor

Replace the legacy `LLMResponses` name.

`LLMResponses` reads like a concrete type, but the new action wrapper is `ares.llms.InferenceResult`. Using the removed name here will send readers toward a symbol that no longer exists.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` at line 18, Update the README text to stop referencing the removed symbol `LLMResponses` and instead point readers to the new action wrapper `ares.llms.InferenceResult`; locate the sentence mentioning `LLMResponses` and replace that token with `ares.llms.InferenceResult`, and confirm any linked examples or docs (e.g., example 3) use the new symbol name so readers aren't directed to a non-existent type.

CLAUDE.md (1)
284-291: ⚠️ Potential issue | 🟡 Minor

Documentation still references non-existent types.

Despite being marked as addressed in prior commits, lines 285 and 289 still reference `TextData` and `Usage`, which don't exist in the updated codebase. Based on the PR's migration, these should reference the actual exports:

- `llms.InferenceResult`, `llms.extract_text_content`, `llms.make_response`
- `response.InferenceResult`, `response.extract_text_content`, `response.make_response`

📝 Proposed fix
  - **External consumers** (examples, docs):
    - ✅ Good: `import ares` → use `ares.make(...)`
    - ✅ Good: `from ares.llms import open_responses` → use `open_responses.make_request(...)`
-   - ✅ Good: `from ares import llms` → use `llms.TextData`, `llms.Usage`
-   - ❌ Avoid: `from ares.llms import OpenResponsesRequest, TextData`
+   - ✅ Good: `from ares import llms` → use `llms.InferenceResult`, `llms.extract_text_content`
+   - ❌ Avoid: `from ares.llms import InferenceResult, extract_text_content`
  - **Internal code**:
    - ✅ Good: `from ares.llms import open_responses` → use `open_responses.make_request(...)`
-   - ✅ Good: `from ares.llms import response` → use `response.TextData`, `response.Usage`
+   - ✅ Good: `from ares.llms import response` → use `response.InferenceResult`, `response.extract_text_content`
    - ❌ Avoid: `from ares.llms.open_responses import Request`
-   - ❌ Avoid: `from ares.llms.response import TextData, Usage`
+   - ❌ Avoid: `from ares.llms.response import InferenceResult, extract_text_content`
Verify each finding against the current code and only fix it if needed. In `@CLAUDE.md` around lines 284 - 291, Update the documentation references that still mention TextData and Usage to use the new exported symbols: replace any occurrences of llms.TextData and llms.Usage with llms.InferenceResult, llms.extract_text_content and llms.make_response (as appropriate), and similarly replace response.TextData and response.Usage with response.InferenceResult, response.extract_text_content and response.make_response; ensure examples and import guidance reference llms.InferenceResult / llms.extract_text_content / llms.make_response (and response.* counterparts) rather than the removed TextData/Usage types.

src/ares/llms/open_responses_test.py (1)
78-81: ⚠️ Potential issue | 🔴 Critical

CI is failing: `ensure_request` does not exist.

The test calls `open_responses.ensure_request()`, but this function is not defined in the `open_responses` module, causing `AttributeError` in CI. Either implement `ensure_request` in `src/ares/llms/open_responses.py` or remove/update this test.

🔧 Option 1: Remove the test if ensure_request is not needed

-def test_ensure_request_accepts_canonical_request():
-    request = open_responses.make_request([open_responses.user_message("Hello")])
-    result = open_responses.ensure_request(request)
-    assert result is request

🔧 Option 2: Add ensure_request to open_responses.py

def ensure_request(request: lft.OpenResponsesRequest) -> lft.OpenResponsesRequest:
    """Return the request as-is (identity function for canonical requests)."""
    return request

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/llms/open_responses_test.py` around lines 78 - 81, The test fails because open_responses.ensure_request is missing; add an identity helper in the module (implement ensure_request) with the signature matching the expected type (e.g., def ensure_request(request: lft.OpenResponsesRequest) -> lft.OpenResponsesRequest) that simply returns the passed request, so tests calling open_responses.ensure_request(), open_responses.make_request(), and open_responses.user_message() will work; place the function in src/ares/llms/open_responses.py near related helpers and ensure it uses the same lft/OpenResponsesRequest type import as the module.
🧹 Nitpick comments (1)
src/ares/contrib/transformers_client.py (1)
87-167: Consider adding a parameter for native tool call support.

The flattening approach for `function_call` and `function_call_output` items works for models without tool call support, but transformers has first-class tool call support via `apply_chat_template` for compatible models. The current implementation always flattens tool interactions to plain text, which loses semantic structure for models that could handle it natively.

Consider adding a `model_supports_tools: bool` parameter to branch between the flattened approach (for simpler models) and the native transformers tool format (for capable models). This could improve performance for models that have been trained with tool-calling capabilities.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ares/contrib/transformers_client.py` around lines 87 - 167, Add a boolean toggle so _render_request_to_chat_messages can preserve native tool-call structures for transformers models that support them: modify _render_request_to_chat_messages to accept model_supports_tools: bool = False and, when True, avoid flattening items of type "function_call" and "function_call_output" into text and instead translate them into the native tool-call message shapes expected by apply_chat_template (preserving fields like name, call_id, arguments, output) while still using _render_content_to_text/_render_value_to_text for any nested content; keep the existing flattened behavior as the default for backward compatibility and update callers to pass model_supports_tools when they know the model supports tool calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CONTRIBUTING.md`:
- Around line 115-116: Update the CONTRIBUTING.md example to explicitly pass
token counts to the response factory so readers see the full API signature: when
constructing response.InferenceResult, call response.make_response with
input_tokens and output_tokens (e.g., response.make_response("Hello!",
input_tokens=..., output_tokens=...)) and keep cost=0.0; this clarifies the
parameters used by response.make_response and matches how tests exercise the
API.
In `@examples/utils.py`:
- Around line 33-37: The fallback branch can produce the literal "None" when
observation.instructions is None; update the else branch that sets
observation_content (after open_responses.to_chat_messages) to use a proper
truthy check instead of str(...) — e.g., use observation.instructions if it is
truthy (or use getattr(observation, "instructions", None)) and otherwise set
observation_content to "(no messages)"; reference observation_content, messages,
open_responses.to_chat_messages, and observation.instructions when making the
change.
In `@src/ares/containers/docker.py`:
- Around line 104-106: Create a shared ContainerNotStartedError and use it
instead of RuntimeError where container access is guarded (replace the
RuntimeError in src/ares/containers/docker.py checks around self._container and
the similar guards at the other sites), and update
src/ares/testing/mock_container.py to raise the same ContainerNotStartedError so
tests and real Docker behavior share the contract; add the new exception to the
module/public API where other callers import container classes so callers can
catch ContainerNotStartedError explicitly.
In `@src/ares/presets_test.py`:
- Around line 3-11: The import block in presets_test.py is not sorted per the
project's isort/Ruff rules; reorder the imports (pytest, standard library first
if any, then third-party, then local package imports) so they match
Google-style/isort order and groupings — specifically adjust the lines importing
pytest, linguafranca.types as lft, ares, from ares import presets, registry, and
the ares.llms imports (open_responses, response); run `ruff check --fix` or `uv
run ruff check --fix` to automatically apply the correct ordering and ensure the
import block passes I001.
---
Duplicate comments:
In `@CLAUDE.md`:
- Around line 284-291: Update the documentation references that still mention
TextData and Usage to use the new exported symbols: replace any occurrences of
llms.TextData and llms.Usage with llms.InferenceResult,
llms.extract_text_content and llms.make_response (as appropriate), and similarly
replace response.TextData and response.Usage with response.InferenceResult,
response.extract_text_content and response.make_response; ensure examples and
import guidance reference llms.InferenceResult / llms.extract_text_content /
llms.make_response (and response.* counterparts) rather than the removed
TextData/Usage types.
In `@README.md`:
- Line 18: Update the README text to stop referencing the removed symbol
`LLMResponses` and instead point readers to the new action wrapper
`ares.llms.InferenceResult`; locate the sentence mentioning `LLMResponses` and
replace that token with `ares.llms.InferenceResult`, and confirm any linked
examples or docs (e.g., example 3) use the new symbol name so readers aren’t
directed to a non-existent type.
In `@src/ares/llms/open_responses_test.py`:
- Around line 78-81: The test fails because open_responses.ensure_request is
missing; add an identity helper in the module (implement ensure_request) with
the signature matching the expected type (e.g., def ensure_request(request:
lft.OpenResponsesRequest) -> lft.OpenResponsesRequest) that simply returns the
passed request, so tests calling open_responses.ensure_request(),
open_responses.make_request(), and open_responses.user_message() will work;
place the function in src/ares/llms/open_responses.py near related helpers and
ensure it uses the same lft/OpenResponsesRequest type import as the module.
---
Nitpick comments:
In `@src/ares/contrib/transformers_client.py`:
- Around line 87-167: Add a boolean toggle so _render_request_to_chat_messages
can preserve native tool-call structures for transformers models that support
them: modify _render_request_to_chat_messages to accept model_supports_tools:
bool = False and, when True, avoid flattening items of type "function_call" and
"function_call_output" into text and instead translate them into the native
tool-call message shapes expected by apply_chat_template (preserving fields like
name, call_id, arguments, output) while still using
_render_content_to_text/_render_value_to_text for any nested content; keep the
existing flattened behavior as the default for backward compatibility and update
callers to pass model_supports_tools when they know the model supports tool
calls.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f19fd66b-963b-413f-9d15-cd7bb3c1b385
📒 Files selected for processing (43)
- CLAUDE.md
- CONTRIBUTING.md
- README.md
- docs/source/core-concepts.rst
- docs/source/how-it-works.rst
- docs/source/index.rst
- examples/04_rl_training_with_skyrl.py
- examples/05_tinker_train.py
- examples/utils.py
- pyproject.toml
- src/ares/__init__.py
- src/ares/code_agents/code_agent_base.py
- src/ares/code_agents/mini_swe_agent.py
- src/ares/code_agents/terminus2/terminus2_agent.py
- src/ares/code_agents/terminus2/terminus2_agent_test.py
- src/ares/containers/docker.py
- src/ares/contrib/eval_visualizer.py
- src/ares/contrib/llama_cpp.py
- src/ares/contrib/mech_interp/hooked_transformer_client.py
- src/ares/contrib/transformers_client.py
- src/ares/contrib/transformers_client_test.py
- src/ares/environments/code_env.py
- src/ares/environments/twenty_questions.py
- src/ares/llms/__init__.py
- src/ares/llms/anthropic_converter.py
- src/ares/llms/anthropic_converter_test.py
- src/ares/llms/chat_completions_compatible.py
- src/ares/llms/llm_clients.py
- src/ares/llms/open_responses.py
- src/ares/llms/open_responses_test.py
- src/ares/llms/openai_chat_converter.py
- src/ares/llms/openai_chat_converter_test.py
- src/ares/llms/openai_responses_converter.py
- src/ares/llms/openai_responses_converter_test.py
- src/ares/llms/queue_mediated_client.py
- src/ares/llms/queue_mediated_client_test.py
- src/ares/llms/request.py
- src/ares/llms/request_test.py
- src/ares/llms/response.py
- src/ares/presets.py
- src/ares/presets_test.py
- src/ares/testing/mock_llm.py
- src/ares/testing/mock_llm_test.py
💤 Files with no reviewable changes (8)
- src/ares/llms/request.py
- src/ares/llms/request_test.py
- src/ares/llms/anthropic_converter_test.py
- src/ares/llms/openai_responses_converter_test.py
- src/ares/llms/openai_chat_converter_test.py
- src/ares/llms/openai_responses_converter.py
- src/ares/llms/openai_chat_converter.py
- src/ares/llms/anthropic_converter.py
| req = open_responses.make_request([open_responses.user_message("Hello")]) | ||
| resp = response.InferenceResult(response=response.make_response("Hello!"), cost=0.0) |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the signature of make_response to see if tokens are required
ast-grep --pattern $'def make_response($$$) -> $_:
$$$'

Repository: withmartian/ares
Length of output: 3038
🏁 Script executed:
#!/bin/bash
# First, let's examine the actual CONTRIBUTING.md at lines 115-116
head -120 CONTRIBUTING.md | tail -10

Repository: withmartian/ares
Length of output: 423
🏁 Script executed:
#!/bin/bash
# Search for make_response usage patterns in the codebase to see how it's typically called
rg "make_response\(" --max-count 20 -B 1 -A 1

Repository: withmartian/ares
Length of output: 3674
Consider explicitly showing token parameters in the documentation example.
While response.make_response("Hello!") is syntactically valid (both input_tokens and output_tokens default to 0), the example in CONTRIBUTING.md would be more instructive if it explicitly shows these parameters, consistent with how tests use the API:
resp = response.InferenceResult(
    response=response.make_response("Hello!", input_tokens=10, output_tokens=5),
    cost=0.0,
)

This makes the full API signature clearer for readers learning the framework.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@CONTRIBUTING.md` around lines 115 - 116, Update the CONTRIBUTING.md example
to explicitly pass token counts to the response factory so readers see the full
API signature: when constructing response.InferenceResult, call
response.make_response with input_tokens and output_tokens (e.g.,
response.make_response("Hello!", input_tokens=..., output_tokens=...)) and keep
cost=0.0; this clarifies the parameters used by response.make_response and
matches how tests exercise the API.
| messages = open_responses.to_chat_messages(observation, strict=False) | ||
| if len(messages) > 0: | ||
| observation_content = messages[-1].get("content", "") | ||
| else: | ||
| observation_content = str(observation.system_prompt) or "(no messages)" | ||
| observation_content = str(observation.instructions) or "(no messages)" |
Potential "None" string if instructions is absent.
When messages is empty and observation.instructions is None, str(observation.instructions) evaluates to the literal string "None" rather than falling back to "(no messages)".
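The failure mode is easy to reproduce in isolation: `str(None)` yields the truthy four-character string `"None"`, so the `or` fallback never fires.

```python
instructions = None

# Buggy pattern: str(None) == "None" is truthy, so the fallback is dead code.
broken = str(instructions) or "(no messages)"

# Fixed pattern: None itself is falsy, so `or` selects the fallback.
fixed = instructions or "(no messages)"

print(broken)  # → None  (the literal string "None")
print(fixed)   # → (no messages)
```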
🛡️ Proposed fix
if len(messages) > 0:
observation_content = messages[-1].get("content", "")
else:
- observation_content = str(observation.instructions) or "(no messages)"
+    observation_content = observation.instructions or "(no messages)"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/utils.py` around lines 33 - 37, The fallback branch can produce the
literal "None" when observation.instructions is None; update the else branch
that sets observation_content (after open_responses.to_chat_messages) to use a
proper truthy check instead of str(...) — e.g., use observation.instructions if
it is truthy (or use getattr(observation, "instructions", None)) and otherwise
set observation_content to "(no messages)"; reference observation_content,
messages, open_responses.to_chat_messages, and observation.instructions when
making the change.
| if self._container is None: | ||
| raise RuntimeError("Container not started. Call start() first.") | ||
|
|
Make the new pre-start failure part of the shared container contract.
src/ares/testing/mock_container.py:45-76 still allows these calls before start(), so tests can stay green while real Docker now raises. If this precondition is intentional, mirror it in the mock and expose a shared ContainerNotStartedError instead of a bare RuntimeError so callers can handle it consistently.
As per coding guidelines: src/**/*.py: Implement custom exception hierarchies to distinguish terminating vs non-terminating errors.
Also applies to: 135-136, 165-166
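A minimal sketch of what the shared contract could look like — class and attribute names here are illustrative assumptions, not the actual ARES container API:

```python
class ContainerError(RuntimeError):
    """Base class for container lifecycle failures."""


class ContainerNotStartedError(ContainerError):
    """Raised when a container operation is attempted before start()."""


class Container:
    """Minimal stand-in showing the guard shared by real and mock backends."""

    def __init__(self) -> None:
        self._container = None

    def start(self) -> None:
        # A real backend would launch the Docker container here.
        self._container = object()

    def exec(self, cmd: str) -> str:
        if self._container is None:
            raise ContainerNotStartedError("Container not started. Call start() first.")
        return f"ran: {cmd}"
```

Because the new type subclasses `RuntimeError`, callers currently catching `RuntimeError` keep working, while tests against both the Docker and mock backends can assert on `ContainerNotStartedError` specifically.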
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ares/containers/docker.py` around lines 104 - 106, Create a shared
ContainerNotStartedError and use it instead of RuntimeError where container
access is guarded (replace the RuntimeError in src/ares/containers/docker.py
checks around self._container and the similar guards at the other sites), and
update src/ares/testing/mock_container.py to raise the same
ContainerNotStartedError so tests and real Docker behavior share the contract;
add the new exception to the module/public API where other callers import
container classes so callers can catch ContainerNotStartedError explicitly.
| import pytest | ||
|
|
||
| from linguafranca import types as lft | ||
|
|
||
| import ares | ||
| from ares import presets | ||
| from ares import registry | ||
| from ares.llms import open_responses | ||
| from ares.llms import response |
Fix import sorting to pass ruff check.
The CI pipeline is failing with I001: import block is un-sorted or un-formatted. Rearrange imports to comply with Google-style isort configuration.
🔧 Proposed fix
-import pytest
-
-from linguafranca import types as lft
-
import ares
from ares import presets
from ares import registry
from ares.llms import open_responses
from ares.llms import response
+from linguafranca import types as lft
+import pytest

Note: The exact ordering depends on your ruff/isort configuration. Run `uv run ruff check --fix` to auto-correct.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import pytest | |
| from linguafranca import types as lft | |
| import ares | |
| from ares import presets | |
| from ares import registry | |
| from ares.llms import open_responses | |
| from ares.llms import response | |
| import pytest | |
| from linguafranca import types as lft | |
| import ares | |
| from ares import presets | |
| from ares import registry | |
| from ares.llms import open_responses | |
| from ares.llms import response |
🧰 Tools
🪛 GitHub Actions: ruff
[error] 3-11: ruff check failed: import block is un-sorted or un-formatted. Rule: I001. Help: Organize imports.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ares/presets_test.py` around lines 3 - 11, The import block in
presets_test.py is not sorted per the project's isort/Ruff rules; reorder the
imports (pytest, standard library first if any, then third-party, then local
package imports) so they match Google-style/isort order and groupings —
specifically adjust the lines importing pytest, linguafranca.types as lft, ares,
from ares import presets, registry, and the ares.llms imports (open_responses,
response); run `ruff check --fix` or `uv run ruff check --fix` to automatically
apply the correct ordering and ensure the import block passes I001.
User description
Summary
Testing
Generated description
Below is a concise technical summary of the changes proposed in this PR:
Adopt linguafranca `open_responses` builders and `InferenceResult` responses as the canonical payloads exchanged between queue-mediated clients, adapters, code agents, and environments so every RL observation/action shares one typed representation. Update docs, contributor guidance, presets, and smoke examples to describe the refreshed Open Responses loop and keep builtin presets and tooling aligned with the new dependency. Use `open_responses` inputs/responses for deterministic suites.
Latest Contributors(2)
Modified files (6)
Latest Contributors(2)
Update the README and registry assets so every reference to observations/actions mentions Open Responses requests and `InferenceResult`, add the dependency on `martian-linguafranca`, and make preset registration idempotent while exercising the Twenty Questions smoke flow in tests. Mention the new behavior in the top-level package docstring and the CLI-affiliated docs so contributors know which helpers to import.
Latest Contributors(2)
Ensure `CodeEnvironment`, `TwentyQuestionsEnvironment`, and the SkyRLLib/Tinker adapters are still consumable while code agents keep recording assistants’ utterances with `open_responses` messages and `response.extract_text_content`. Harden container helpers and dashboards to understand the new telemetry, and exercise the transformation in the Twenty Questions smoke example.
Latest Contributors(2)
Introduce `open_responses` helpers, `InferenceResult`, and the corresponding request/response plumbing to replace the legacy chat/Responses/Claude converters, keeping queue mediation, LLM clients, contrib adapters, and their tests aligned with the canonical payload shape. Include the new `martian-linguafranca` dependency so adapters such as `chat_completions_compatible`, `llama_cpp`, `hooked_transformer_client`, and `transformers_client` can all call `open_responses.to_chat_completions_kwargs` and return the unified `InferenceResult`, while retiring the previous converter modules.
Latest Contributors(2)
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Refactor
Tests
Dependencies