improve: Stale container cleanup, fail-fast parent selection, post-compilation smoke test #10
Three robustness improvements to generate_loop.py:
1. Stale container cleanup: At loop start, scans for and removes orphaned hyperagents-* containers from previous crashed runs. Prevents resource accumulation.
2. Fail-fast for deterministic errors: In parent selection retry logic, distinguishes Python script errors (SyntaxError, ImportError, etc.) from transient container failures. Deterministic errors fail immediately instead of burning 10 container lifecycles.
3. Lightweight smoke test: After compilation check passes, verifies TaskAgent is instantiable and forward() has the right signature. Catches runtime errors that import-only checks miss, at negligible cost (no LLM calls).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
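The smoke test in item 3 could be sketched roughly as follows. This is a minimal in-process illustration, not the PR's run_smoke_test(): the helper name smoke_check, the no-argument constructor, and the expected parameter name "inputs" are all assumptions made for the example, and the container execution and timeout are left out.

```python
import inspect


def smoke_check(agent_cls, expected_params=("inputs",)):
    """Return (ok, message) without ever calling forward().

    expected_params is a hypothetical signature; adjust it to the real one.
    """
    try:
        # Assumption: the agent can be built with no arguments and
        # construction itself triggers no LLM calls.
        agent = agent_cls()
    except Exception as exc:
        return False, f"instantiation failed: {exc!r}"

    forward = getattr(agent, "forward", None)
    if not callable(forward):
        return False, "forward() is missing or not callable"

    # For a bound method, inspect.signature() already omits `self`.
    params = tuple(inspect.signature(forward).parameters)
    if params != tuple(expected_params):
        return False, f"forward() takes {params}, expected {tuple(expected_params)}"
    return True, "ok"


class GoodAgent:  # stand-in for TaskAgent
    def forward(self, inputs):
        return inputs


class BadAgent:  # imports cleanly, but forward() has the wrong shape
    def forward(self):
        return None


print(smoke_check(GoodAgent))  # (True, 'ok')
print(smoke_check(BadAgent))
```

A check of this kind only inspects the class; it never executes forward(), which is why it stays cheap compared with a full evaluation.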
Hi @Ryuketsukami! Thank you for your pull request and welcome to our community.
Action Required
In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
Process
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged accordingly.
If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
Problem
Three robustness gaps in generate_loop.py:
1. Orphaned containers accumulate from crashed runs
If the process is killed between container creation and the finally block, orphaned hyperagents-* containers persist. Over multiple crashed runs, these consume Docker resources (memory, storage layers, network namespaces).
2. Parent selection retries 10 times on deterministic errors
select_next_parent_container() catches all exceptions uniformly and retries up to 10 times. If the evolved select_next_parent.py has a syntax error, every retry fails identically, burning 10 Docker container lifecycles (each with container creation, archive copy, patch application, teardown).
3. Compilation check misses runtime errors
The sole pre-evaluation gate is python -c "from task_agent import TaskAgent", which is only an import check. A mutation that imports cleanly but crashes in forward() wastes expensive evaluation compute. Research (CMU S3D-25-101) confirms: "non-syntactic mistakes manifest only in specific test cases."
Solution
Container cleanup: At generate_loop() start, scans for and force-removes any containers matching hyperagents-*. Wrapped in try/except so failures never block.
Fail-fast: After exec_run in parent selection, checks if the error indicates a Python script failure (SyntaxError, ImportError, NameError, AttributeError, TypeError, IndentationError, traceback). Deterministic errors fail immediately; infrastructure errors still retry.
Smoke test: New run_smoke_test() in gl_utils.py runs a 60-second-timeout check inside the container: imports TaskAgent, instantiates it (no LLM calls), verifies forward() signature. Called after compilation check. If it fails, evaluation is skipped.
Tests
Added tests/test_smoke_test.py (6 tests):
Question for maintainers
For the fail-fast logic: is there a scenario where a Python SyntaxError in select_next_parent.py should be retried? The current implementation treats all script-level errors as deterministic (non-retryable).
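To make the question concrete, here is a minimal sketch of the fail-fast classification described under Solution. It is an illustration under assumptions, not the PR's code: the names is_deterministic, run_with_retries, and DeterministicScriptError are hypothetical, the attempt callable stands in for the exec_run call and is assumed to return an (exit_code, output) pair with text output, and the "traceback" check is assumed to be a substring match on Python's "Traceback" header.

```python
import time

# Error names listed in the PR description; matching is a plain substring test.
DETERMINISTIC_MARKERS = (
    "SyntaxError", "IndentationError", "ImportError",
    "NameError", "AttributeError", "TypeError", "Traceback",
)


class DeterministicScriptError(RuntimeError):
    """Raised when retrying cannot change the outcome."""


def is_deterministic(output: str) -> bool:
    """True if the output looks like a Python script failure."""
    return any(marker in output for marker in DETERMINISTIC_MARKERS)


def run_with_retries(attempt, max_retries=10, delay=0.0):
    """Call attempt() until it succeeds, fails deterministically, or runs out.

    attempt() -> (exit_code, output) is a hypothetical stand-in for exec_run.
    """
    last_output = ""
    for _ in range(max_retries):
        exit_code, last_output = attempt()
        if exit_code == 0:
            return last_output
        if is_deterministic(last_output):
            # Script-level error: fail immediately instead of retrying.
            raise DeterministicScriptError(last_output)
        time.sleep(delay)  # infrastructure error: wait, then retry
    raise RuntimeError(f"gave up after {max_retries} attempts: {last_output}")
```

One consequence of matching on any traceback is that a transient failure raised from inside the script itself (for example, a network error surfacing as a Python exception) would also be classified as non-retryable under this sketch. This may be a case worth considering when answering the question above.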