Add EnvironmentCheckpoint and CheckpointableEnvironment protocol by joshgreaves · Pull Request #96 · withmartian/ares

joshgreaves · 2026-03-13T21:02:43Z

User description

Summary

Adds EnvironmentCheckpoint[EnvType] -- a generic checkpoint object that holds closures for restore and release. restore() creates a brand new, independent environment instance at the checkpointed state.
Adds CheckpointableEnvironment protocol extending Environment with a single checkpoint() method returning EnvironmentCheckpoint[Self]
Adds _CheckpointJanitor for atexit cleanup of unreleased checkpoints (mirrors the existing _Janitor pattern in code_env.py)
Exports CheckpointableEnvironment and EnvironmentCheckpoint from the top-level ares package
Fixes pre-existing duplicate preset registration bug in presets.py

Design

The environment protocol gains one method: checkpoint(). Everything else lives on the checkpoint object:

checkpoint = await env.checkpoint()       # capture state
env, ts = await checkpoint.restore()      # create new env at that state
await checkpoint.release()                # free resources (Docker images, etc.)

The checkpoint takes closures from the environment -- no internal protocol methods are imposed. The environment is free to implement restore/release however it wants. See local/analyses/go-explore/04-go-explore-sketch.py for the full design rationale.

Context

Part of Go-Explore support (Wave 1). This PR is independent of the other two Wave 1 PRs and can be reviewed/merged in parallel:

PR: SnapshotableContainer protocol (go-explore/snapshotable-container)
PR: Agent state serialization (go-explore/agent-state-serialization)

Tests

11 new tests covering restore, independent env creation, release idempotency, restore-after-release errors, and janitor register/unregister/cleanup/exception handling.

Generated description

Below is a concise technical summary of the changes proposed in this PR:
Introduce EnvironmentCheckpoint to capture restorable snapshots and extend CheckpointableEnvironment so environments can expose checkpoint/restore/release flows along with a _CheckpointJanitor for cleanup; export these new APIs at the package level. Add focused tests covering restore, release, idempotency, and janitor cleanup to ensure the new checkpoint semantics work and register with the existing infrastructure.

Topic Details

Checkpoint Flow

Describe how EnvironmentCheckpoint, CheckpointableEnvironment, and _CheckpointJanitor capture state, restore new environments, release resources, and are covered by new async tests for checkpoint behavior and cleanup.

Modified files (3)

src/ares/__init__.py
src/ares/environments/base.py
src/ares/environments/checkpoint_test.py

Latest Contributors(2)

User	Commit	Date
Narmeen07	Add-mechanistic-interp...	February 19, 2026
joshua.greaves@gmail.com	Massively-simplify-the...	January 29, 2026

Preset Dedup

Prevent duplicate Harbor dataset presets by tracking seen IDs before registering each combination of dataset and agent.

Modified files (1)

src/ares/presets.py

Latest Contributors(2)

User	Commit	Date
Narmeen07	Add-mechanistic-interp...	February 19, 2026
ryan@withmartian.com	Removed-repeated-indiv...	January 29, 2026

This pull request is reviewed by Baz. Review like a pro on (Baz).

Summary by CodeRabbit

New Features
- Environment checkpointing: snapshot and restore independent environment copies, explicit release/is-released semantics, and automatic cleanup of unreleased checkpoints at program exit; checkpointing types now accessible from the package root.
Chores
- Deduplicated dataset registrations during preset initialization to avoid repeated registrations.
Tests
- Added tests covering checkpoint restore/release semantics and janitor cleanup behavior.

coderabbitai · 2026-03-13T21:02:54Z

📝 Walkthrough

Walkthrough

Adds EnvironmentCheckpoint and CheckpointableEnvironment types, a _CheckpointJanitor with atexit cleanup, exposes the new types at package root, adds tests for checkpoint lifecycle and janitor behavior, and prevents duplicate Harbor dataset registrations in presets.

Changes

Cohort / File(s)	Summary
Package exports `src/ares/__init__.py`	Exports `CheckpointableEnvironment` and `EnvironmentCheckpoint` from the environments package at the top-level `ares` package.
Checkpointing core `src/ares/environments/base.py`	Adds `EnvironmentCheckpoint[EnvType]` (restore/release/is_released, step_count, timestep), `CheckpointableEnvironment[...]` protocol with `checkpoint()`, and `_CheckpointJanitor` with global `_CHECKPOINT_JANITOR` and atexit `_cleanup`.
Checkpointing tests `src/ares/environments/checkpoint_test.py`	New pytest suite verifying restore semantics, independence of restored envs, release idempotency and error-on-restore-after-release, plus janitor register/unregister and cleanup behavior (including exception suppression).
Presets deduplication `src/ares/presets.py`	Avoids duplicate Harbor dataset registration by computing `ds_id` once per dataset and tracking seen IDs to skip repeats; logs debug when skipping.

Sequence Diagram

sequenceDiagram
    participant User
    participant Env as Environment
    participant CP as EnvironmentCheckpoint
    participant Janitor as _CheckpointJanitor

    User->>Env: await checkpoint()
    Env->>CP: create(restore_fn, release_fn, timestep, step_count)
    CP->>Janitor: register(self)

    User->>CP: await restore()
    CP->>CP: call restore_fn -> new_env
    CP-->>User: return (new_env, timestep)

    User->>CP: await release()
    CP->>CP: call release_fn (optional)
    CP->>Janitor: unregister(self)

    Note over Janitor: at exit -> _cleanup()
    Janitor->>CP: call release_fn_sync for remaining checkpoints

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibbled a snapshot, tucked it tight,
A checkpoint saved for a rainy night.
Restore brings new clover, fresh and clear,
Release rings the bell — janitor near. 🌿

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 56.41% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely describes the main additions: EnvironmentCheckpoint and CheckpointableEnvironment protocol, matching the primary changes in the changeset.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch go-explore/checkpoint-protocol

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (3)

src/ares/environments/checkpoint_test.py (1)

25-33: Consider releasing checkpoints in tests to avoid polluting the global janitor.

Each EnvironmentCheckpoint created in tests auto-registers with _CHECKPOINT_JANITOR. While this doesn't cause test failures (entries are cleared at interpreter exit), it could affect test isolation if the janitor's state is inspected or if tests run in the same process accumulate many entries.
📝 Example cleanup pattern
checkpoint = base.EnvironmentCheckpoint(...)
try:
    env, ts = await checkpoint.restore()
    assert env is mock_env
    assert ts is simple_timestep
finally:
    await checkpoint.release()
Alternatively, use a pytest fixture that auto-releases checkpoints after tests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/checkpoint_test.py` around lines 25 - 33, The test
creates an EnvironmentCheckpoint which auto-registers with _CHECKPOINT_JANITOR;
update the test to explicitly release that checkpoint to avoid polluting global
janitor state by ensuring you call await checkpoint.release() after using
checkpoint.restore(). Wrap the restore/assert block in a try/finally (or use a
pytest fixture) so that await checkpoint.release() runs even on assertion
failures; reference the EnvironmentCheckpoint instance named checkpoint and its
restore() and release() methods.

src/ares/environments/base.py (2)

270-271: Minor: Consider using EnvironmentCheckpoint[Any] for the type annotation.

The bare EnvironmentCheckpoint without a type parameter will work at runtime but loses type information. Using EnvironmentCheckpoint[Any] or EnvironmentCheckpoint[object] would be more explicit.

📝 Suggested annotation

+from typing import Any
+
 class _CheckpointJanitor:
     ...
     def __init__(self):
-        self._checkpoints: dict[int, EnvironmentCheckpoint] = {}
+        self._checkpoints: dict[int, EnvironmentCheckpoint[Any]] = {}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/base.py` around lines 270 - 271, The type annotation
for the _checkpoints attribute in __init__ should be made explicit by using a
parameterized EnvironmentCheckpoint to preserve generic type info; change the
annotation from dict[int, EnvironmentCheckpoint] to dict[int,
EnvironmentCheckpoint[Any]] (or EnvironmentCheckpoint[object]) and add the
necessary typing import (Any) so the _checkpoints: dict[int,
EnvironmentCheckpoint[Any]] declaration in __init__ reflects the generic
parameter.

215-227: Consider ensuring unregister is called even if release_fn raises.

If _release_fn() raises an exception, the checkpoint is marked as released (_released = True) but unregister() is never called. This leaves a stale entry in the janitor that:

Won't be cleaned up at exit (since _release_fn_sync may not match the failed state)
Holds a reference to the checkpoint preventing garbage collection

🛠️ Proposed fix using try/finally

     async def release(self) -> None:
         """Release resources held by this checkpoint.

         For environments backed by containers, this deletes the snapshot image.
         For simple environments, this is a no-op. Safe to call multiple times.
         After release(), restore() will raise RuntimeError.
         """
         if self._released:
             return
         self._released = True
-        if self._release_fn is not None:
-            await self._release_fn()
-        _CHECKPOINT_JANITOR.unregister(self)
+        try:
+            if self._release_fn is not None:
+                await self._release_fn()
+        finally:
+            _CHECKPOINT_JANITOR.unregister(self)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/base.py` around lines 215 - 227, The release() method
must always call _CHECKPOINT_JANITOR.unregister(self) even if await
self._release_fn() raises; wrap the await in a try/finally so unregister is
executed in the finally block, and re-raise any exception from _release_fn so
behavior stays the same. Concretely, keep setting self._released = True, then if
self._release_fn is not None call it inside try: await self._release_fn()
finally: _CHECKPOINT_JANITOR.unregister(self); if there is no _release_fn still
call _CHECKPOINT_JANITOR.unregister(self) as before.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/ares/environments/base.py`:
- Around line 270-271: The type annotation for the _checkpoints attribute in
__init__ should be made explicit by using a parameterized EnvironmentCheckpoint
to preserve generic type info; change the annotation from dict[int,
EnvironmentCheckpoint] to dict[int, EnvironmentCheckpoint[Any]] (or
EnvironmentCheckpoint[object]) and add the necessary typing import (Any) so the
_checkpoints: dict[int, EnvironmentCheckpoint[Any]] declaration in __init__
reflects the generic parameter.
- Around line 215-227: The release() method must always call
_CHECKPOINT_JANITOR.unregister(self) even if await self._release_fn() raises;
wrap the await in a try/finally so unregister is executed in the finally block,
and re-raise any exception from _release_fn so behavior stays the same.
Concretely, keep setting self._released = True, then if self._release_fn is not
None call it inside try: await self._release_fn() finally:
_CHECKPOINT_JANITOR.unregister(self); if there is no _release_fn still call
_CHECKPOINT_JANITOR.unregister(self) as before.

In `@src/ares/environments/checkpoint_test.py`:
- Around line 25-33: The test creates an EnvironmentCheckpoint which
auto-registers with _CHECKPOINT_JANITOR; update the test to explicitly release
that checkpoint to avoid polluting global janitor state by ensuring you call
await checkpoint.release() after using checkpoint.restore(). Wrap the
restore/assert block in a try/finally (or use a pytest fixture) so that await
checkpoint.release() runs even on assertion failures; reference the
EnvironmentCheckpoint instance named checkpoint and its restore() and release()
methods.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b1614895-b5bf-48fe-833f-0269681b9e56

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and f142329.

📒 Files selected for processing (4)

src/ares/__init__.py
src/ares/environments/base.py
src/ares/environments/checkpoint_test.py
src/ares/presets.py

baz-reviewer · 2026-03-17T16:40:06Z

src/ares/environments/base.py

+        restore_fn: Callable[[], Awaitable[EnvType]],
+        release_fn: Callable[[], Awaitable[None]] | None = None,
+        release_fn_sync: Callable[[], None] | None = None,
+        step_count: int,
+        timestep: TimeStep,
+    ):
+        self._restore_fn = restore_fn
+        self._release_fn = release_fn
+        self._release_fn_sync = release_fn_sync
+        self.step_count = step_count
+        self.timestep = timestep
+        self._released = False


The atexit janitor only calls checkpoint._release_fn_sync and never awaits the async release_fn, so async-only cleanup can be skipped on exit — should we run the async release_fn in a temporary event loop or provide a sync fallback?

_{Finding type: Logical Bugs | Severity: 🔴 High}

Want Baz to fix this for you? Activate Fixer

Other fix methods

Prompt for AI Agents:

In src/ares/environments/base.py around lines 181 to 226, the _CheckpointJanitor._cleanup currently only calls checkpoint._release_fn_sync and therefore skips async _release_fn, leaving resources unreleased at process exit. Modify _cleanup so that for each checkpoint: if _release_fn_sync is present, call it as before; otherwise if _release_fn is present, run it synchronously by creating a temporary asyncio event loop and calling loop.run_until_complete(_release_fn()), catching and logging exceptions. If creating/running a loop fails because an event loop is already running, fall back to running the coroutine in a separate Thread with its own event loop and join it, also catching and logging exceptions. Keep the existing warning logging and ensure the _checkpoints dict is cleared afterward.

baz-reviewer · 2026-03-17T16:40:06Z

src/ares/environments/base.py

+    async def release(self) -> None:
+        """Release resources held by this checkpoint.
+
+        For environments backed by containers, this deletes the snapshot image.
+        For simple environments, this is a no-op. Safe to call multiple times.
+        After release(), restore() will raise RuntimeError.
+        """
+        if self._released:
+            return
+        self._released = True
+        if self._release_fn is not None:
+            await self._release_fn()


release() flips _released to True before awaiting _release_fn so failures leave the checkpoint marked and prevent retries, should we postpone marking _released and unregistering the janitor until after _release_fn succeeds?

self._released = True; await self._release_fn() => await self._release_fn(); self._released = True

_{Finding type: Logical Bugs | Severity: 🔴 High}

Want Baz to fix this for you? Activate Fixer

Other fix methods

Prompt for AI Agents:

In src/ares/environments/base.py around lines 215-228, the release() method currently sets self._released = True before awaiting self._release_fn, which causes the checkpoint to be marked released and unregistered incorrectly if the async release raises. Change release() so that it (1) does not set self._released until after await self._release_fn() completes successfully, (2) only calls _CHECKPOINT_JANITOR.unregister(self) after a successful release, and (3) keeps the checkpoint registered and _released False if the release raises (propagate the exception so callers can retry). Also preserve the no-op path: if _release_fn is None, still set _released = True and unregister. Ensure no broad exception swallowing; let failures surface so cleanup can be retried.

baz-reviewer · 2026-03-17T16:40:06Z

src/ares/environments/base.py

+        if self._checkpoints:
+            _LOGGER.info("Cleaning up %d unreleased checkpoints at exit.", len(self._checkpoints))
+        for cp in list(self._checkpoints.values()):
+            if cp._release_fn_sync is not None:
+                try:
+                    cp._release_fn_sync()


Cleanup logs lack checkpoint IDs despite CLAUDE.md requiring object IDs for async lifecycle events — should we include id(cp) in those log lines?

_{Finding type: AI Coding Guidelines | Severity: 🟢 Low}

Want Baz to fix this for you? Activate Fixer

Other fix methods

Prompt for AI Agents:

In src/ares/environments/base.py around lines 284 to 289, the _CheckpointJanitor._cleanup method currently logs only the count and a generic warning but does not include checkpoint identifiers. Update the info log at line ~285 to include the IDs of the registered checkpoints (e.g., build a list of ids from self._checkpoints values and log it alongside the count), and change the warning at line ~291 to include the specific checkpoint id (id(cp)) when a release fails while keeping exc_info=True. Keep formatting consistent with existing _LOGGER calls and avoid exposing sensitive data beyond Python object ids.

gowthamkishore3 · 2026-03-17T16:56:26Z

@coderabbitai full review

gowthamkishore3 · 2026-03-17T17:24:45Z

@coderabbitai full review

coderabbitai · 2026-03-17T17:24:53Z

✅ Actions performed

Full review triggered.

coderabbitai

🧹 Nitpick comments (2)

src/ares/environments/base.py (2)
270-271: Consider thread safety for _checkpoints dict.

If checkpoints are created or released concurrently from multiple threads, the _checkpoints dict could face race conditions. While the GIL provides some protection and typical async usage is single-threaded, consider using a threading.Lock if multi-threaded checkpoint management is anticipated.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/base.py` around lines 270 - 271, The _checkpoints dict
initialized in __init__ can be mutated from multiple threads; add a
threading.Lock (e.g., self._checkpoints_lock = threading.RLock()) in __init__
and use it to guard all accesses and mutations of self._checkpoints (wrap
reads/writes in a with self._checkpoints_lock: block) inside methods that
create, release, or read checkpoints (e.g., create_checkpoint,
release_checkpoint, get_checkpoint or any other method referencing _checkpoints)
to prevent race conditions; ensure the lock is used consistently and prefer
RLock if reentrant locking may occur.
222-227: Exception in release_fn leaves checkpoint registered with janitor.

If release_fn raises an exception, the checkpoint is marked as released (_released = True at line 224) but unregister() at line 227 is never called. This leaves a stale entry in the janitor, though it's benign since _cleanup() only calls release_fn_sync (not release_fn).

Consider whether to wrap the release_fn call in try/finally to ensure unregistration:
♻️ Optional: Ensure unregistration even if release_fn fails
     async def release(self) -> None:
         if self._released:
             return
         self._released = True
-        if self._release_fn is not None:
-            await self._release_fn()
-        _CHECKPOINT_JANITOR.unregister(self)
+        try:
+            if self._release_fn is not None:
+                await self._release_fn()
+        finally:
+            _CHECKPOINT_JANITOR.unregister(self)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/base.py` around lines 222 - 227, The release path
should always unregister the checkpoint from _CHECKPOINT_JANITOR even if the
async release function raises: in the method containing self._released, wrap the
await self._release_fn() call in a try/finally (or equivalent) so that
_CHECKPOINT_JANITOR.unregister(self) is executed in the finally block; keep
setting self._released = True as appropriate (before or after calling
_release_fn per desired semantics) but ensure unregister is unconditional to
avoid leaving stale janitor entries (refer to the attributes _released,
_release_fn and the janitor call _CHECKPOINT_JANITOR.unregister(self)).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/ares/environments/base.py`:
- Around line 270-271: The _checkpoints dict initialized in __init__ can be
mutated from multiple threads; add a threading.Lock (e.g.,
self._checkpoints_lock = threading.RLock()) in __init__ and use it to guard all
accesses and mutations of self._checkpoints (wrap reads/writes in a with
self._checkpoints_lock: block) inside methods that create, release, or read
checkpoints (e.g., create_checkpoint, release_checkpoint, get_checkpoint or any
other method referencing _checkpoints) to prevent race conditions; ensure the
lock is used consistently and prefer RLock if reentrant locking may occur.
- Around line 222-227: The release path should always unregister the checkpoint
from _CHECKPOINT_JANITOR even if the async release function raises: in the
method containing self._released, wrap the await self._release_fn() call in a
try/finally (or equivalent) so that _CHECKPOINT_JANITOR.unregister(self) is
executed in the finally block; keep setting self._released = True as appropriate
(before or after calling _release_fn per desired semantics) but ensure
unregister is unconditional to avoid leaving stale janitor entries (refer to the
attributes _released, _release_fn and the janitor call
_CHECKPOINT_JANITOR.unregister(self)).

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ecd827c3-9bdb-46dc-a578-6cd520f3d0a7

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and f142329.

📒 Files selected for processing (4)

src/ares/__init__.py
src/ares/environments/base.py
src/ares/environments/checkpoint_test.py
src/ares/presets.py

mwilliammyers · 2026-04-07T23:50:53Z

@coderabbitai full review

coderabbitai · 2026-04-07T23:51:00Z

✅ Actions performed

Full review triggered.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

src/ares/environments/base.py (1)

282-293: Keep checkpoint state consistent after janitor cleanup.

_cleanup() can run release_fn_sync, but it does not flip _released. After manual cleanup, a checkpoint may still allow restore() even though underlying resources were already cleaned.

Proposed fix

     def _cleanup(self) -> None:
         """Clean up all registered checkpoints at exit."""
@@
         for cp in list(self._checkpoints.values()):
             if cp._release_fn_sync is not None:
                 try:
                     cp._release_fn_sync()
                 except Exception:
                     _LOGGER.warning("Failed to clean up checkpoint.", exc_info=True)
+            cp._released = True
         self._checkpoints.clear()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/base.py` around lines 282 - 293, _cleanup currently
calls each checkpoint's cp._release_fn_sync but never flips cp._released,
leaving checkpoints usable via restore() after resources were cleaned; update
_cleanup to set cp._released = True for each checkpoint after invoking
cp._release_fn_sync (do this in a finally block so the flag is set even if the
release function raises) and keep the existing logging/exception handling around
cp._release_fn_sync and the subsequent self._checkpoints.clear() call.

src/ares/environments/checkpoint_test.py (1)

56-117: Add a regression test for release_fn failure semantics.

There is no test asserting state after release_fn raises. Adding one will lock lifecycle behavior and catch release-ordering regressions.

Suggested test case

 class TestEnvironmentCheckpoint:
@@
     async def test_release_is_idempotent(self, simple_timestep):
@@
         assert call_count == 1
+
+    `@pytest.mark.asyncio`
+    async def test_release_failure_does_not_mark_released(self, simple_timestep):
+        async def restore_fn():
+            return object()
+
+        async def release_fn():
+            raise RuntimeError("release failed")
+
+        checkpoint = base.EnvironmentCheckpoint(
+            restore_fn=restore_fn,
+            release_fn=release_fn,
+            step_count=0,
+            timestep=simple_timestep,
+        )
+
+        with pytest.raises(RuntimeError, match="release failed"):
+            await checkpoint.release()
+
+        assert not checkpoint.is_released

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/checkpoint_test.py` around lines 56 - 117, Add an async
test (e.g., test_release_when_release_fn_raises) that uses EnvironmentCheckpoint
with a release_fn that raises an exception; await checkpoint.release() expecting
that exception to propagate, then assert checkpoint.is_released is True, assert
await checkpoint.restore() raises RuntimeError (or raises when called) and
assert calling checkpoint.release() again does not re-run the failing release_fn
or raise (idempotence). Reference EnvironmentCheckpoint, release(), release_fn,
restore(), and is_released when adding the test.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ares/environments/base.py`:
- Around line 222-227: The method currently sets self._released before awaiting
the potentially-failing async cleanup, which prevents retries if release fails;
change the order so you await self._release_fn() first (if _release_fn is not
None), only set self._released = True and call
_CHECKPOINT_JANITOR.unregister(self) after the await completes successfully, and
let exceptions propagate (or re-raise) so the checkpoint remains not released on
failure; update the release() method accordingly to move the _released
assignment and unregister call to after the awaited _release_fn() completes.

In `@src/ares/environments/checkpoint_test.py`:
- Around line 25-229: Tests leak state because EnvironmentCheckpoint
auto-registers with the module-level janitor; add an autouse pytest fixture that
resets the global janitor before each test by creating a fresh
base._CheckpointJanitor() and assigning it to the module-level janitor used by
EnvironmentCheckpoint (then restore/cleanup after the test) so each test runs
with an empty janitor and no shared _checkpoints; reference
EnvironmentCheckpoint and _CheckpointJanitor to locate where to wire the
fixture.

---

Nitpick comments:
In `@src/ares/environments/base.py`:
- Around line 282-293: _cleanup currently calls each checkpoint's
cp._release_fn_sync but never flips cp._released, leaving checkpoints usable via
restore() after resources were cleaned; update _cleanup to set cp._released =
True for each checkpoint after invoking cp._release_fn_sync (do this in a
finally block so the flag is set even if the release function raises) and keep
the existing logging/exception handling around cp._release_fn_sync and the
subsequent self._checkpoints.clear() call.

In `@src/ares/environments/checkpoint_test.py`:
- Around line 56-117: Add an async test (e.g.,
test_release_when_release_fn_raises) that uses EnvironmentCheckpoint with a
release_fn that raises an exception; await checkpoint.release() expecting that
exception to propagate, then assert checkpoint.is_released is True, assert await
checkpoint.restore() raises RuntimeError (or raises when called) and assert
calling checkpoint.release() again does not re-run the failing release_fn or
raise (idempotence). Reference EnvironmentCheckpoint, release(), release_fn,
restore(), and is_released when adding the test.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d9be1ea3-1ed9-483f-9488-417827581c61

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and f142329.

📒 Files selected for processing (4)

src/ares/__init__.py
src/ares/environments/base.py
src/ares/environments/checkpoint_test.py
src/ares/presets.py

coderabbitai · 2026-04-07T23:55:01Z

src/ares/environments/base.py

+        if self._released:
+            return
+        self._released = True
+        if self._release_fn is not None:
+            await self._release_fn()
+        _CHECKPOINT_JANITOR.unregister(self)


⚠️ Potential issue | 🟠 Major

Do not mark checkpoints released before async release succeeds.

On Line 224, _released is set before awaiting release_fn (Line 226). If release_fn raises, the checkpoint becomes permanently “released” even though cleanup failed, so release() cannot be retried and restore() is blocked.

Proposed fix

async def release(self) -> None: """Release resources held by this checkpoint. @@ if self._released: return - self._released = True - if self._release_fn is not None: - await self._release_fn() - _CHECKPOINT_JANITOR.unregister(self) + if self._release_fn is not None: + await self._release_fn() + self._released = True + _CHECKPOINT_JANITOR.unregister(self)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if self._released:

return

self._released = True

if self._release_fn is not None:

await self._release_fn()

_CHECKPOINT_JANITOR.unregister(self)

if self._released:

return

if self._release_fn is not None:

await self._release_fn()

self._released = True

_CHECKPOINT_JANITOR.unregister(self)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/ares/environments/base.py` around lines 222 - 227, The method currently sets self._released before awaiting the potentially-failing async cleanup, which prevents retries if release fails; change the order so you await self._release_fn() first (if _release_fn is not None), only set self._released = True and call _CHECKPOINT_JANITOR.unregister(self) after the await completes successfully, and let exceptions propagate (or re-raise) so the checkpoint remains not released on failure; update the release() method accordingly to move the _released assignment and unregister call to after the awaited _release_fn() completes.

coderabbitai · 2026-04-07T23:55:01Z

src/ares/environments/checkpoint_test.py

+        checkpoint = base.EnvironmentCheckpoint(
+            restore_fn=restore_fn,
+            step_count=5,
+            timestep=simple_timestep,
+        )
+
+        env, ts = await checkpoint.restore()
+        assert env is mock_env
+        assert ts is simple_timestep
+
+    @pytest.mark.asyncio
+    async def test_restore_creates_independent_envs(self, simple_timestep):
+        """Test that multiple restores create independent environments."""
+        call_count = 0
+
+        async def restore_fn():
+            nonlocal call_count
+            call_count += 1
+            return {"id": call_count}
+
+        checkpoint = base.EnvironmentCheckpoint(
+            restore_fn=restore_fn,
+            step_count=0,
+            timestep=simple_timestep,
+        )
+
+        env_a, _ = await checkpoint.restore()
+        env_b, _ = await checkpoint.restore()
+        assert env_a != env_b
+        assert call_count == 2
+
+    @pytest.mark.asyncio
+    async def test_release_calls_release_fn(self, simple_timestep):
+        """Test that release() calls the release function."""
+        released = False
+
+        async def release_fn():
+            nonlocal released
+            released = True
+
+        async def restore_fn():
+            return object()
+
+        checkpoint = base.EnvironmentCheckpoint(
+            restore_fn=restore_fn,
+            release_fn=release_fn,
+            step_count=0,
+            timestep=simple_timestep,
+        )
+
+        await checkpoint.release()
+        assert released
+        assert checkpoint.is_released
+
+    @pytest.mark.asyncio
+    async def test_release_is_idempotent(self, simple_timestep):
+        """Test that release() can be called multiple times safely."""
+        call_count = 0
+
+        async def release_fn():
+            nonlocal call_count
+            call_count += 1
+
+        async def restore_fn():
+            return object()
+
+        checkpoint = base.EnvironmentCheckpoint(
+            restore_fn=restore_fn,
+            release_fn=release_fn,
+            step_count=0,
+            timestep=simple_timestep,
+        )
+
+        await checkpoint.release()
+        await checkpoint.release()
+        assert call_count == 1
+
+    @pytest.mark.asyncio
+    async def test_restore_after_release_raises(self, simple_timestep):
+        """Test that restore() raises after release()."""
+
+        async def restore_fn():
+            return object()
+
+        checkpoint = base.EnvironmentCheckpoint(
+            restore_fn=restore_fn,
+            step_count=0,
+            timestep=simple_timestep,
+        )
+
+        await checkpoint.release()
+        with pytest.raises(RuntimeError, match="released"):
+            await checkpoint.restore()
+
+    @pytest.mark.asyncio
+    async def test_no_release_fn_is_ok(self, simple_timestep):
+        """Test that checkpoints without release_fn work fine."""
+
+        async def restore_fn():
+            return object()
+
+        checkpoint = base.EnvironmentCheckpoint(
+            restore_fn=restore_fn,
+            step_count=0,
+            timestep=simple_timestep,
+        )
+
+        # Release should be a no-op
+        await checkpoint.release()
+        assert checkpoint.is_released
+
+    def test_step_count_accessible(self, simple_timestep):
+        """Test that step_count is publicly accessible."""
+
+        async def restore_fn():
+            return object()
+
+        checkpoint = base.EnvironmentCheckpoint(
+            restore_fn=restore_fn,
+            step_count=42,
+            timestep=simple_timestep,
+        )
+
+        assert checkpoint.step_count == 42
+
+    def test_timestep_accessible(self, simple_timestep):
+        """Test that timestep is publicly accessible."""
+
+        async def restore_fn():
+            return object()
+
+        checkpoint = base.EnvironmentCheckpoint(
+            restore_fn=restore_fn,
+            step_count=0,
+            timestep=simple_timestep,
+        )
+
+        assert checkpoint.timestep is simple_timestep
+
+
+class TestCheckpointJanitor:
+    """Tests for _CheckpointJanitor."""
+
+    def test_register_and_unregister(self, simple_timestep):
+        """Test that register/unregister tracks checkpoints."""
+        janitor = base._CheckpointJanitor()
+
+        async def restore_fn():
+            return object()
+
+        checkpoint = base.EnvironmentCheckpoint(
+            restore_fn=restore_fn,
+            step_count=0,
+            timestep=simple_timestep,
+        )
+
+        # Checkpoint auto-registers with the global janitor, but
+        # we test the local janitor instance here.
+        janitor.register(checkpoint)
+        assert id(checkpoint) in janitor._checkpoints
+
+        janitor.unregister(checkpoint)
+        assert id(checkpoint) not in janitor._checkpoints
+
+    def test_cleanup_calls_sync_release(self, simple_timestep):
+        """Test that janitor cleanup calls release_fn_sync."""
+        janitor = base._CheckpointJanitor()
+        released = False
+
+        def release_sync():
+            nonlocal released
+            released = True
+
+        async def restore_fn():
+            return object()
+
+        checkpoint = base.EnvironmentCheckpoint(
+            restore_fn=restore_fn,
+            release_fn_sync=release_sync,
+            step_count=0,
+            timestep=simple_timestep,
+        )
+
+        janitor.register(checkpoint)
+        janitor._cleanup()
+
+        assert released
+        assert len(janitor._checkpoints) == 0
+
+    def test_cleanup_handles_exceptions(self, simple_timestep):
+        """Test that janitor cleanup handles exceptions gracefully."""
+        janitor = base._CheckpointJanitor()
+
+        def release_sync():
+            raise RuntimeError("cleanup failed")
+
+        async def restore_fn():
+            return object()
+
+        checkpoint = base.EnvironmentCheckpoint(
+            restore_fn=restore_fn,
+            release_fn_sync=release_sync,
+            step_count=0,
+            timestep=simple_timestep,
+        )


⚠️ Potential issue | 🟡 Minor

Tests currently leak checkpoint state into the global janitor.

Each EnvironmentCheckpoint auto-registers globally, and many tests leave those entries behind. That introduces shared mutable state across test cases.

Proposed fixture to isolate janitor state

import pytest from ares.environments import base +@pytest.fixture(autouse=True) +def _isolate_global_checkpoint_janitor(): + base._CHECKPOINT_JANITOR._checkpoints.clear() + yield + base._CHECKPOINT_JANITOR._checkpoints.clear() + `@pytest.fixture` def simple_timestep() -> base.TimeStep:

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/ares/environments/checkpoint_test.py` around lines 25 - 229, Tests leak state because EnvironmentCheckpoint auto-registers with the module-level janitor; add an autouse pytest fixture that resets the global janitor before each test by creating a fresh base._CheckpointJanitor() and assigning it to the module-level janitor used by EnvironmentCheckpoint (then restore/cleanup after the test) so each test runs with an empty janitor and no shared _checkpoints; reference EnvironmentCheckpoint and _CheckpointJanitor to locate where to wire the fixture.

Add EnvironmentCheckpoint and CheckpointableEnvironment protocol

f142329

mwilliammyers marked this pull request as ready for review March 17, 2026 16:33

coderabbitai bot reviewed Mar 17, 2026

View reviewed changes

baz-reviewer bot reviewed Mar 17, 2026

View reviewed changes

coderabbitai bot reviewed Mar 17, 2026

View reviewed changes

coderabbitai bot reviewed Apr 7, 2026

View reviewed changes

Conversation

joshgreaves commented Mar 13, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

Summary

Design

Context

Tests

Generated description

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

baz-reviewer bot Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer bot Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

baz-reviewer bot Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

gowthamkishore3 commented Mar 17, 2026

Uh oh!

gowthamkishore3 commented Mar 17, 2026

Uh oh!

coderabbitai bot commented Mar 17, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

mwilliammyers commented Apr 7, 2026

Uh oh!

coderabbitai bot commented Apr 7, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

joshgreaves commented Mar 13, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 13, 2026 •

edited

Loading