Skip to content

Add EnvironmentCheckpoint and CheckpointableEnvironment protocol#96

Open
joshgreaves wants to merge 1 commit intomainfrom
go-explore/checkpoint-protocol
Open

Add EnvironmentCheckpoint and CheckpointableEnvironment protocol#96
joshgreaves wants to merge 1 commit intomainfrom
go-explore/checkpoint-protocol

Conversation

@joshgreaves
Copy link
Copy Markdown
Contributor

@joshgreaves joshgreaves commented Mar 13, 2026

User description

Summary

  • Adds EnvironmentCheckpoint[EnvType] -- a generic checkpoint object that holds closures for restore and release. restore() creates a brand new, independent environment instance at the checkpointed state.
  • Adds CheckpointableEnvironment protocol extending Environment with a single checkpoint() method returning EnvironmentCheckpoint[Self]
  • Adds _CheckpointJanitor for atexit cleanup of unreleased checkpoints (mirrors the existing _Janitor pattern in code_env.py)
  • Exports CheckpointableEnvironment and EnvironmentCheckpoint from the top-level ares package
  • Fixes pre-existing duplicate preset registration bug in presets.py

Design

The environment protocol gains one method: checkpoint(). Everything else lives on the checkpoint object:

checkpoint = await env.checkpoint()       # capture state
env, ts = await checkpoint.restore()      # create new env at that state
await checkpoint.release()                # free resources (Docker images, etc.)

The checkpoint takes closures from the environment -- no internal protocol methods are imposed. The environment is free to implement restore/release however it wants. See local/analyses/go-explore/04-go-explore-sketch.py for the full design rationale.

Context

Part of Go-Explore support (Wave 1). This PR is independent of the other two Wave 1 PRs and can be reviewed/merged in parallel:

  • PR: SnapshotableContainer protocol (go-explore/snapshotable-container)
  • PR: Agent state serialization (go-explore/agent-state-serialization)

Tests

11 new tests covering restore, independent env creation, release idempotency, restore-after-release errors, and janitor register/unregister/cleanup/exception handling.


Generated description

Below is a concise technical summary of the changes proposed in this PR:
Introduce EnvironmentCheckpoint to capture restorable snapshots and extend CheckpointableEnvironment so environments can expose checkpoint/restore/release flows along with a _CheckpointJanitor for cleanup; export these new APIs at the package level. Add focused tests covering restore, release, idempotency, and janitor cleanup to ensure the new checkpoint semantics work and register with the existing infrastructure.

TopicDetails
Checkpoint Flow Describe how EnvironmentCheckpoint, CheckpointableEnvironment, and _CheckpointJanitor capture state, restore new environments, release resources, and are covered by new async tests for checkpoint behavior and cleanup.
Modified files (3)
  • src/ares/__init__.py
  • src/ares/environments/base.py
  • src/ares/environments/checkpoint_test.py
Latest Contributors(2)
UserCommitDate
Narmeen07Add-mechanistic-interp...February 19, 2026
joshua.greaves@gmail.comMassively-simplify-the...January 29, 2026
Preset Dedup Prevent duplicate Harbor dataset presets by tracking seen IDs before registering each combination of dataset and agent.
Modified files (1)
  • src/ares/presets.py
Latest Contributors(2)
UserCommitDate
Narmeen07Add-mechanistic-interp...February 19, 2026
ryan@withmartian.comRemoved-repeated-indiv...January 29, 2026
This pull request is reviewed by Baz. Review like a pro on (Baz).

Summary by CodeRabbit

  • New Features

    • Environment checkpointing: snapshot and restore independent environment copies, explicit release/is-released semantics, and automatic cleanup of unreleased checkpoints at program exit; checkpointing types now accessible from the package root.
  • Chores

    • Deduplicated dataset registrations during preset initialization to avoid repeated registrations.
  • Tests

    • Added tests covering checkpoint restore/release semantics and janitor cleanup behavior.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 13, 2026

📝 Walkthrough

Walkthrough

Adds EnvironmentCheckpoint and CheckpointableEnvironment types, a _CheckpointJanitor with atexit cleanup, exposes the new types at package root, adds tests for checkpoint lifecycle and janitor behavior, and prevents duplicate Harbor dataset registrations in presets.

Changes

Cohort / File(s) Summary
Package exports
src/ares/__init__.py
Exports CheckpointableEnvironment and EnvironmentCheckpoint from the environments package at the top-level ares package.
Checkpointing core
src/ares/environments/base.py
Adds EnvironmentCheckpoint[EnvType] (restore/release/is_released, step_count, timestep), CheckpointableEnvironment[...] protocol with checkpoint(), and _CheckpointJanitor with global _CHECKPOINT_JANITOR and atexit _cleanup.
Checkpointing tests
src/ares/environments/checkpoint_test.py
New pytest suite verifying restore semantics, independence of restored envs, release idempotency and error-on-restore-after-release, plus janitor register/unregister and cleanup behavior (including exception suppression).
Presets deduplication
src/ares/presets.py
Avoids duplicate Harbor dataset registration by computing ds_id once per dataset and tracking seen IDs to skip repeats; logs debug when skipping.

Sequence Diagram

sequenceDiagram
    participant User
    participant Env as Environment
    participant CP as EnvironmentCheckpoint
    participant Janitor as _CheckpointJanitor

    User->>Env: await checkpoint()
    Env->>CP: create(restore_fn, release_fn, timestep, step_count)
    CP->>Janitor: register(self)

    User->>CP: await restore()
    CP->>CP: call restore_fn -> new_env
    CP-->>User: return (new_env, timestep)

    User->>CP: await release()
    CP->>CP: call release_fn (optional)
    CP->>Janitor: unregister(self)

    Note over Janitor: at exit -> _cleanup()
    Janitor->>CP: call release_fn_sync for remaining checkpoints
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibbled a snapshot, tucked it tight,
A checkpoint saved for a rainy night.
Restore brings new clover, fresh and clear,
Release rings the bell — janitor near. 🌿

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 56.41% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main additions: EnvironmentCheckpoint and CheckpointableEnvironment protocol, matching the primary changes in the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch go-explore/checkpoint-protocol

Comment @coderabbitai help to get the list of available commands and usage tips.

@mwilliammyers mwilliammyers marked this pull request as ready for review March 17, 2026 16:33
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
src/ares/environments/checkpoint_test.py (1)

25-33: Consider releasing checkpoints in tests to avoid polluting the global janitor.

Each EnvironmentCheckpoint created in tests auto-registers with _CHECKPOINT_JANITOR. While this doesn't cause test failures (entries are cleared at interpreter exit), it could affect test isolation if the janitor's state is inspected or if tests run in the same process accumulate many entries.

📝 Example cleanup pattern
checkpoint = base.EnvironmentCheckpoint(...)
try:
    env, ts = await checkpoint.restore()
    assert env is mock_env
    assert ts is simple_timestep
finally:
    await checkpoint.release()

Alternatively, use a pytest fixture that auto-releases checkpoints after tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/checkpoint_test.py` around lines 25 - 33, The test
creates an EnvironmentCheckpoint which auto-registers with _CHECKPOINT_JANITOR;
update the test to explicitly release that checkpoint to avoid polluting global
janitor state by ensuring you call await checkpoint.release() after using
checkpoint.restore(). Wrap the restore/assert block in a try/finally (or use a
pytest fixture) so that await checkpoint.release() runs even on assertion
failures; reference the EnvironmentCheckpoint instance named checkpoint and its
restore() and release() methods.
src/ares/environments/base.py (2)

270-271: Minor: Consider using EnvironmentCheckpoint[Any] for the type annotation.

The bare EnvironmentCheckpoint without a type parameter will work at runtime but loses type information. Using EnvironmentCheckpoint[Any] or EnvironmentCheckpoint[object] would be more explicit.

📝 Suggested annotation
+from typing import Any
+
 class _CheckpointJanitor:
     ...
     def __init__(self):
-        self._checkpoints: dict[int, EnvironmentCheckpoint] = {}
+        self._checkpoints: dict[int, EnvironmentCheckpoint[Any]] = {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/base.py` around lines 270 - 271, The type annotation
for the _checkpoints attribute in __init__ should be made explicit by using a
parameterized EnvironmentCheckpoint to preserve generic type info; change the
annotation from dict[int, EnvironmentCheckpoint] to dict[int,
EnvironmentCheckpoint[Any]] (or EnvironmentCheckpoint[object]) and add the
necessary typing import (Any) so the _checkpoints: dict[int,
EnvironmentCheckpoint[Any]] declaration in __init__ reflects the generic
parameter.

215-227: Consider ensuring unregister is called even if release_fn raises.

If _release_fn() raises an exception, the checkpoint is marked as released (_released = True) but unregister() is never called. This leaves a stale entry in the janitor that:

  1. Won't be cleaned up at exit (since _release_fn_sync may not match the failed state)
  2. Holds a reference to the checkpoint preventing garbage collection
🛠️ Proposed fix using try/finally
     async def release(self) -> None:
         """Release resources held by this checkpoint.

         For environments backed by containers, this deletes the snapshot image.
         For simple environments, this is a no-op. Safe to call multiple times.
         After release(), restore() will raise RuntimeError.
         """
         if self._released:
             return
         self._released = True
-        if self._release_fn is not None:
-            await self._release_fn()
-        _CHECKPOINT_JANITOR.unregister(self)
+        try:
+            if self._release_fn is not None:
+                await self._release_fn()
+        finally:
+            _CHECKPOINT_JANITOR.unregister(self)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/base.py` around lines 215 - 227, The release() method
must always call _CHECKPOINT_JANITOR.unregister(self) even if await
self._release_fn() raises; wrap the await in a try/finally so unregister is
executed in the finally block, and re-raise any exception from _release_fn so
behavior stays the same. Concretely, keep setting self._released = True, then if
self._release_fn is not None call it inside try: await self._release_fn()
finally: _CHECKPOINT_JANITOR.unregister(self); if there is no _release_fn still
call _CHECKPOINT_JANITOR.unregister(self) as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/ares/environments/base.py`:
- Around line 270-271: The type annotation for the _checkpoints attribute in
__init__ should be made explicit by using a parameterized EnvironmentCheckpoint
to preserve generic type info; change the annotation from dict[int,
EnvironmentCheckpoint] to dict[int, EnvironmentCheckpoint[Any]] (or
EnvironmentCheckpoint[object]) and add the necessary typing import (Any) so the
_checkpoints: dict[int, EnvironmentCheckpoint[Any]] declaration in __init__
reflects the generic parameter.
- Around line 215-227: The release() method must always call
_CHECKPOINT_JANITOR.unregister(self) even if await self._release_fn() raises;
wrap the await in a try/finally so unregister is executed in the finally block,
and re-raise any exception from _release_fn so behavior stays the same.
Concretely, keep setting self._released = True, then if self._release_fn is not
None call it inside try: await self._release_fn() finally:
_CHECKPOINT_JANITOR.unregister(self); if there is no _release_fn still call
_CHECKPOINT_JANITOR.unregister(self) as before.

In `@src/ares/environments/checkpoint_test.py`:
- Around line 25-33: The test creates an EnvironmentCheckpoint which
auto-registers with _CHECKPOINT_JANITOR; update the test to explicitly release
that checkpoint to avoid polluting global janitor state by ensuring you call
await checkpoint.release() after using checkpoint.restore(). Wrap the
restore/assert block in a try/finally (or use a pytest fixture) so that await
checkpoint.release() runs even on assertion failures; reference the
EnvironmentCheckpoint instance named checkpoint and its restore() and release()
methods.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b1614895-b5bf-48fe-833f-0269681b9e56

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and f142329.

📒 Files selected for processing (4)
  • src/ares/__init__.py
  • src/ares/environments/base.py
  • src/ares/environments/checkpoint_test.py
  • src/ares/presets.py

Comment on lines +181 to +192
restore_fn: Callable[[], Awaitable[EnvType]],
release_fn: Callable[[], Awaitable[None]] | None = None,
release_fn_sync: Callable[[], None] | None = None,
step_count: int,
timestep: TimeStep,
):
self._restore_fn = restore_fn
self._release_fn = release_fn
self._release_fn_sync = release_fn_sync
self.step_count = step_count
self.timestep = timestep
self._released = False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The atexit janitor only calls checkpoint._release_fn_sync and never awaits the async release_fn, so async-only cleanup can be skipped on exit — should we run the async release_fn in a temporary event loop or provide a sync fallback?

Finding type: Logical Bugs | Severity: 🔴 High


Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents:

In src/ares/environments/base.py around lines 181 to 226, the
_CheckpointJanitor._cleanup currently only calls checkpoint._release_fn_sync and
therefore skips async _release_fn, leaving resources unreleased at process exit. Modify
_cleanup so that for each checkpoint: if _release_fn_sync is present, call it as before;
otherwise if _release_fn is present, run it synchronously by creating a temporary
asyncio event loop and calling loop.run_until_complete(_release_fn()), catching and
logging exceptions. If creating/running a loop fails because an event loop is already
running, fall back to running the coroutine in a separate Thread with its own event loop
and join it, also catching and logging exceptions. Keep the existing warning logging and
ensure the _checkpoints dict is cleared afterward.

Comment on lines +215 to +226
async def release(self) -> None:
"""Release resources held by this checkpoint.

For environments backed by containers, this deletes the snapshot image.
For simple environments, this is a no-op. Safe to call multiple times.
After release(), restore() will raise RuntimeError.
"""
if self._released:
return
self._released = True
if self._release_fn is not None:
await self._release_fn()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

release() flips _released to True before awaiting _release_fn so failures leave the checkpoint marked and prevent retries, should we postpone marking _released and unregistering the janitor until after _release_fn succeeds?

self._released = True; await self._release_fn() => await self._release_fn(); self._released = True

Finding type: Logical Bugs | Severity: 🔴 High


Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents:

In src/ares/environments/base.py around lines 215-228, the release() method currently
sets self._released = True before awaiting self._release_fn, which causes the checkpoint
to be marked released and unregistered incorrectly if the async release raises. Change
release() so that it (1) does not set self._released until after await
self._release_fn() completes successfully, (2) only calls
_CHECKPOINT_JANITOR.unregister(self) after a successful release, and (3) keeps the
checkpoint registered and _released False if the release raises (propagate the exception
so callers can retry). Also preserve the no-op path: if _release_fn is None, still set
_released = True and unregister. Ensure no broad exception swallowing; let failures
surface so cleanup can be retried.

Comment on lines +284 to +289
if self._checkpoints:
_LOGGER.info("Cleaning up %d unreleased checkpoints at exit.", len(self._checkpoints))
for cp in list(self._checkpoints.values()):
if cp._release_fn_sync is not None:
try:
cp._release_fn_sync()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleanup logs lack checkpoint IDs despite CLAUDE.md requiring object IDs for async lifecycle events — should we include id(cp) in those log lines?

Finding type: AI Coding Guidelines | Severity: 🟢 Low


Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents:

In src/ares/environments/base.py around lines 284 to 289, the
_CheckpointJanitor._cleanup method currently logs only the count and a generic warning
but does not include checkpoint identifiers. Update the info log at line ~285 to include
the IDs of the registered checkpoints (e.g., build a list of ids from self._checkpoints
values and log it alongside the count), and change the warning at line ~291 to include
the specific checkpoint id (id(cp)) when a release fails while keeping exc_info=True.
Keep formatting consistent with existing _LOGGER calls and avoid exposing sensitive data
beyond Python object ids.

@gowthamkishore3
Copy link
Copy Markdown

@coderabbitai full review

1 similar comment
@gowthamkishore3
Copy link
Copy Markdown

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 17, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/ares/environments/base.py (2)

270-271: Consider thread safety for _checkpoints dict.

If checkpoints are created or released concurrently from multiple threads, the _checkpoints dict could face race conditions. While the GIL provides some protection and typical async usage is single-threaded, consider using a threading.Lock if multi-threaded checkpoint management is anticipated.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/base.py` around lines 270 - 271, The _checkpoints dict
initialized in __init__ can be mutated from multiple threads; add a
threading.Lock (e.g., self._checkpoints_lock = threading.RLock()) in __init__
and use it to guard all accesses and mutations of self._checkpoints (wrap
reads/writes in a with self._checkpoints_lock: block) inside methods that
create, release, or read checkpoints (e.g., create_checkpoint,
release_checkpoint, get_checkpoint or any other method referencing _checkpoints)
to prevent race conditions; ensure the lock is used consistently and prefer
RLock if reentrant locking may occur.

222-227: Exception in release_fn leaves checkpoint registered with janitor.

If release_fn raises an exception, the checkpoint is marked as released (_released = True at line 224) but unregister() at line 227 is never called. This leaves a stale entry in the janitor, though it's benign since _cleanup() only calls release_fn_sync (not release_fn).

Consider whether to wrap the release_fn call in try/finally to ensure unregistration:

♻️ Optional: Ensure unregistration even if release_fn fails
     async def release(self) -> None:
         if self._released:
             return
         self._released = True
-        if self._release_fn is not None:
-            await self._release_fn()
-        _CHECKPOINT_JANITOR.unregister(self)
+        try:
+            if self._release_fn is not None:
+                await self._release_fn()
+        finally:
+            _CHECKPOINT_JANITOR.unregister(self)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/base.py` around lines 222 - 227, The release path
should always unregister the checkpoint from _CHECKPOINT_JANITOR even if the
async release function raises: in the method containing self._released, wrap the
await self._release_fn() call in a try/finally (or equivalent) so that
_CHECKPOINT_JANITOR.unregister(self) is executed in the finally block; keep
setting self._released = True as appropriate (before or after calling
_release_fn per desired semantics) but ensure unregister is unconditional to
avoid leaving stale janitor entries (refer to the attributes _released,
_release_fn and the janitor call _CHECKPOINT_JANITOR.unregister(self)).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/ares/environments/base.py`:
- Around line 270-271: The _checkpoints dict initialized in __init__ can be
mutated from multiple threads; add a threading.Lock (e.g.,
self._checkpoints_lock = threading.RLock()) in __init__ and use it to guard all
accesses and mutations of self._checkpoints (wrap reads/writes in a with
self._checkpoints_lock: block) inside methods that create, release, or read
checkpoints (e.g., create_checkpoint, release_checkpoint, get_checkpoint or any
other method referencing _checkpoints) to prevent race conditions; ensure the
lock is used consistently and prefer RLock if reentrant locking may occur.
- Around line 222-227: The release path should always unregister the checkpoint
from _CHECKPOINT_JANITOR even if the async release function raises: in the
method containing self._released, wrap the await self._release_fn() call in a
try/finally (or equivalent) so that _CHECKPOINT_JANITOR.unregister(self) is
executed in the finally block; keep setting self._released = True as appropriate
(before or after calling _release_fn per desired semantics) but ensure
unregister is unconditional to avoid leaving stale janitor entries (refer to the
attributes _released, _release_fn and the janitor call
_CHECKPOINT_JANITOR.unregister(self)).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ecd827c3-9bdb-46dc-a578-6cd520f3d0a7

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and f142329.

📒 Files selected for processing (4)
  • src/ares/__init__.py
  • src/ares/environments/base.py
  • src/ares/environments/checkpoint_test.py
  • src/ares/presets.py

@mwilliammyers
Copy link
Copy Markdown
Collaborator

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 7, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/ares/environments/base.py (1)

282-293: Keep checkpoint state consistent after janitor cleanup.

_cleanup() can run release_fn_sync, but it does not flip _released. After manual cleanup, a checkpoint may still allow restore() even though underlying resources were already cleaned.

Proposed fix
     def _cleanup(self) -> None:
         """Clean up all registered checkpoints at exit."""
@@
         for cp in list(self._checkpoints.values()):
             if cp._release_fn_sync is not None:
                 try:
                     cp._release_fn_sync()
                 except Exception:
                     _LOGGER.warning("Failed to clean up checkpoint.", exc_info=True)
+            cp._released = True
         self._checkpoints.clear()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/base.py` around lines 282 - 293, _cleanup currently
calls each checkpoint's cp._release_fn_sync but never flips cp._released,
leaving checkpoints usable via restore() after resources were cleaned; update
_cleanup to set cp._released = True for each checkpoint after invoking
cp._release_fn_sync (do this in a finally block so the flag is set even if the
release function raises) and keep the existing logging/exception handling around
cp._release_fn_sync and the subsequent self._checkpoints.clear() call.
src/ares/environments/checkpoint_test.py (1)

56-117: Add a regression test for release_fn failure semantics.

There is no test asserting state after release_fn raises. Adding one will lock lifecycle behavior and catch release-ordering regressions.

Suggested test case
 class TestEnvironmentCheckpoint:
@@
     async def test_release_is_idempotent(self, simple_timestep):
@@
         assert call_count == 1
+
+    `@pytest.mark.asyncio`
+    async def test_release_failure_does_not_mark_released(self, simple_timestep):
+        async def restore_fn():
+            return object()
+
+        async def release_fn():
+            raise RuntimeError("release failed")
+
+        checkpoint = base.EnvironmentCheckpoint(
+            restore_fn=restore_fn,
+            release_fn=release_fn,
+            step_count=0,
+            timestep=simple_timestep,
+        )
+
+        with pytest.raises(RuntimeError, match="release failed"):
+            await checkpoint.release()
+
+        assert not checkpoint.is_released
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/checkpoint_test.py` around lines 56 - 117, Add an async
test (e.g., test_release_when_release_fn_raises) that uses EnvironmentCheckpoint
with a release_fn that raises an exception; await checkpoint.release() expecting
that exception to propagate, then assert checkpoint.is_released is True, assert
await checkpoint.restore() raises RuntimeError (or raises when called) and
assert calling checkpoint.release() again does not re-run the failing release_fn
or raise (idempotence). Reference EnvironmentCheckpoint, release(), release_fn,
restore(), and is_released when adding the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ares/environments/base.py`:
- Around line 222-227: The method currently sets self._released before awaiting
the potentially-failing async cleanup, which prevents retries if release fails;
change the order so you await self._release_fn() first (if _release_fn is not
None), only set self._released = True and call
_CHECKPOINT_JANITOR.unregister(self) after the await completes successfully, and
let exceptions propagate (or re-raise) so the checkpoint remains not released on
failure; update the release() method accordingly to move the _released
assignment and unregister call to after the awaited _release_fn() completes.

In `@src/ares/environments/checkpoint_test.py`:
- Around line 25-229: Tests leak state because EnvironmentCheckpoint
auto-registers with the module-level janitor; add an autouse pytest fixture that
resets the global janitor before each test by creating a fresh
base._CheckpointJanitor() and assigning it to the module-level janitor used by
EnvironmentCheckpoint (then restore/cleanup after the test) so each test runs
with an empty janitor and no shared _checkpoints; reference
EnvironmentCheckpoint and _CheckpointJanitor to locate where to wire the
fixture.

---

Nitpick comments:
In `@src/ares/environments/base.py`:
- Around line 282-293: _cleanup currently calls each checkpoint's
cp._release_fn_sync but never flips cp._released, leaving checkpoints usable via
restore() after resources were cleaned; update _cleanup to set cp._released =
True for each checkpoint after invoking cp._release_fn_sync (do this in a
finally block so the flag is set even if the release function raises) and keep
the existing logging/exception handling around cp._release_fn_sync and the
subsequent self._checkpoints.clear() call.

In `@src/ares/environments/checkpoint_test.py`:
- Around line 56-117: Add an async test (e.g.,
test_release_when_release_fn_raises) that uses EnvironmentCheckpoint with a
release_fn that raises an exception; await checkpoint.release() expecting that
exception to propagate, then assert checkpoint.is_released is True, assert await
checkpoint.restore() raises RuntimeError (or raises when called) and assert
calling checkpoint.release() again does not re-run the failing release_fn or
raise (idempotence). Reference EnvironmentCheckpoint, release(), release_fn,
restore(), and is_released when adding the test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d9be1ea3-1ed9-483f-9488-417827581c61

📥 Commits

Reviewing files that changed from the base of the PR and between c804aa2 and f142329.

📒 Files selected for processing (4)
  • src/ares/__init__.py
  • src/ares/environments/base.py
  • src/ares/environments/checkpoint_test.py
  • src/ares/presets.py

Comment on lines +222 to +227
if self._released:
return
self._released = True
if self._release_fn is not None:
await self._release_fn()
_CHECKPOINT_JANITOR.unregister(self)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not mark checkpoints released before async release succeeds.

On Line 224, _released is set before awaiting release_fn (Line 226). If release_fn raises, the checkpoint becomes permanently “released” even though cleanup failed, so release() cannot be retried and restore() is blocked.

Proposed fix
     async def release(self) -> None:
         """Release resources held by this checkpoint.
@@
         if self._released:
             return
-        self._released = True
-        if self._release_fn is not None:
-            await self._release_fn()
-        _CHECKPOINT_JANITOR.unregister(self)
+        if self._release_fn is not None:
+            await self._release_fn()
+        self._released = True
+        _CHECKPOINT_JANITOR.unregister(self)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if self._released:
return
self._released = True
if self._release_fn is not None:
await self._release_fn()
_CHECKPOINT_JANITOR.unregister(self)
if self._released:
return
if self._release_fn is not None:
await self._release_fn()
self._released = True
_CHECKPOINT_JANITOR.unregister(self)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/base.py` around lines 222 - 227, The method currently
sets self._released before awaiting the potentially-failing async cleanup, which
prevents retries if release fails; change the order so you await
self._release_fn() first (if _release_fn is not None), only set self._released =
True and call _CHECKPOINT_JANITOR.unregister(self) after the await completes
successfully, and let exceptions propagate (or re-raise) so the checkpoint
remains not released on failure; update the release() method accordingly to move
the _released assignment and unregister call to after the awaited _release_fn()
completes.

Comment on lines +25 to +229
checkpoint = base.EnvironmentCheckpoint(
restore_fn=restore_fn,
step_count=5,
timestep=simple_timestep,
)

env, ts = await checkpoint.restore()
assert env is mock_env
assert ts is simple_timestep

@pytest.mark.asyncio
async def test_restore_creates_independent_envs(self, simple_timestep):
"""Test that multiple restores create independent environments."""
call_count = 0

async def restore_fn():
nonlocal call_count
call_count += 1
return {"id": call_count}

checkpoint = base.EnvironmentCheckpoint(
restore_fn=restore_fn,
step_count=0,
timestep=simple_timestep,
)

env_a, _ = await checkpoint.restore()
env_b, _ = await checkpoint.restore()
assert env_a != env_b
assert call_count == 2

@pytest.mark.asyncio
async def test_release_calls_release_fn(self, simple_timestep):
"""Test that release() calls the release function."""
released = False

async def release_fn():
nonlocal released
released = True

async def restore_fn():
return object()

checkpoint = base.EnvironmentCheckpoint(
restore_fn=restore_fn,
release_fn=release_fn,
step_count=0,
timestep=simple_timestep,
)

await checkpoint.release()
assert released
assert checkpoint.is_released

@pytest.mark.asyncio
async def test_release_is_idempotent(self, simple_timestep):
"""Test that release() can be called multiple times safely."""
call_count = 0

async def release_fn():
nonlocal call_count
call_count += 1

async def restore_fn():
return object()

checkpoint = base.EnvironmentCheckpoint(
restore_fn=restore_fn,
release_fn=release_fn,
step_count=0,
timestep=simple_timestep,
)

await checkpoint.release()
await checkpoint.release()
assert call_count == 1

@pytest.mark.asyncio
async def test_restore_after_release_raises(self, simple_timestep):
"""Test that restore() raises after release()."""

async def restore_fn():
return object()

checkpoint = base.EnvironmentCheckpoint(
restore_fn=restore_fn,
step_count=0,
timestep=simple_timestep,
)

await checkpoint.release()
with pytest.raises(RuntimeError, match="released"):
await checkpoint.restore()

@pytest.mark.asyncio
async def test_no_release_fn_is_ok(self, simple_timestep):
"""Test that checkpoints without release_fn work fine."""

async def restore_fn():
return object()

checkpoint = base.EnvironmentCheckpoint(
restore_fn=restore_fn,
step_count=0,
timestep=simple_timestep,
)

# Release should be a no-op
await checkpoint.release()
assert checkpoint.is_released

def test_step_count_accessible(self, simple_timestep):
"""Test that step_count is publicly accessible."""

async def restore_fn():
return object()

checkpoint = base.EnvironmentCheckpoint(
restore_fn=restore_fn,
step_count=42,
timestep=simple_timestep,
)

assert checkpoint.step_count == 42

def test_timestep_accessible(self, simple_timestep):
"""Test that timestep is publicly accessible."""

async def restore_fn():
return object()

checkpoint = base.EnvironmentCheckpoint(
restore_fn=restore_fn,
step_count=0,
timestep=simple_timestep,
)

assert checkpoint.timestep is simple_timestep


class TestCheckpointJanitor:
"""Tests for _CheckpointJanitor."""

def test_register_and_unregister(self, simple_timestep):
"""Test that register/unregister tracks checkpoints."""
janitor = base._CheckpointJanitor()

async def restore_fn():
return object()

checkpoint = base.EnvironmentCheckpoint(
restore_fn=restore_fn,
step_count=0,
timestep=simple_timestep,
)

# Checkpoint auto-registers with the global janitor, but
# we test the local janitor instance here.
janitor.register(checkpoint)
assert id(checkpoint) in janitor._checkpoints

janitor.unregister(checkpoint)
assert id(checkpoint) not in janitor._checkpoints

def test_cleanup_calls_sync_release(self, simple_timestep):
"""Test that janitor cleanup calls release_fn_sync."""
janitor = base._CheckpointJanitor()
released = False

def release_sync():
nonlocal released
released = True

async def restore_fn():
return object()

checkpoint = base.EnvironmentCheckpoint(
restore_fn=restore_fn,
release_fn_sync=release_sync,
step_count=0,
timestep=simple_timestep,
)

janitor.register(checkpoint)
janitor._cleanup()

assert released
assert len(janitor._checkpoints) == 0

def test_cleanup_handles_exceptions(self, simple_timestep):
"""Test that janitor cleanup handles exceptions gracefully."""
janitor = base._CheckpointJanitor()

def release_sync():
raise RuntimeError("cleanup failed")

async def restore_fn():
return object()

checkpoint = base.EnvironmentCheckpoint(
restore_fn=restore_fn,
release_fn_sync=release_sync,
step_count=0,
timestep=simple_timestep,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Tests currently leak checkpoint state into the global janitor.

Each EnvironmentCheckpoint auto-registers globally, and many tests leave those entries behind. That introduces shared mutable state across test cases.

Proposed fixture to isolate janitor state
 import pytest
 
 from ares.environments import base
 
+@pytest.fixture(autouse=True)
+def _isolate_global_checkpoint_janitor():
+    base._CHECKPOINT_JANITOR._checkpoints.clear()
+    yield
+    base._CHECKPOINT_JANITOR._checkpoints.clear()
+
 
 `@pytest.fixture`
 def simple_timestep() -> base.TimeStep:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ares/environments/checkpoint_test.py` around lines 25 - 229, Tests leak
state because EnvironmentCheckpoint auto-registers with the module-level
janitor; add an autouse pytest fixture that resets the global janitor before
each test by creating a fresh base._CheckpointJanitor() and assigning it to the
module-level janitor used by EnvironmentCheckpoint (then restore/cleanup after
the test) so each test runs with an empty janitor and no shared _checkpoints;
reference EnvironmentCheckpoint and _CheckpointJanitor to locate where to wire
the fixture.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants