Skip to content

test: expand agent harness coverage#513

Open
senamakel wants to merge 12 commits intotinyhumansai:mainfrom
senamakel:tests/improve-coverage
Open

test: expand agent harness coverage#513
senamakel wants to merge 12 commits intotinyhumansai:mainfrom
senamakel:tests/improve-coverage

Conversation

@senamakel
Copy link
Copy Markdown
Member

@senamakel senamakel commented Apr 12, 2026

Summary

  • Add focused unit coverage for src/openhuman/agent/harness/session/turn.rs, including transcript resume, full tool-cycle turns, max-iteration failure, hook dispatch, and inline tool-result budgeting.
  • Add targeted unit coverage for src/openhuman/agent/harness/session/runtime.rs, including accessors, history reset, error sanitization, overlap detection, and native tool-call persistence helpers.
  • Add targeted unit coverage for src/openhuman/agent/harness/tool_loop.rs, including provider failures, tool execution failures, supervised auto-approval, default iteration fallback, and native tool result persistence.
  • Fold in rustfmt/doc-comment cleanup required for the agent module so repository formatting gates pass on push.

Problem

  • The openhuman::agent tree still had large untested sections in the harness hot path, especially around turn execution, runtime helpers, and the legacy tool loop.
  • That left error handling, transcript reuse, approval behavior, and iteration-limit branches under-covered, which makes future changes in the agent runtime harder to review safely.

Solution

  • Added deterministic unit fixtures that drive the harness through real multi-step tool turns rather than only helper-level assertions.
  • Added explicit coverage for failure branches that were previously unhit: provider errors, tool execution errors, unknown tools, iteration exhaustion, and inline tool-result truncation.
  • Exercised transcript resume and post-turn hook behavior directly in turn.rs so the session lifecycle coverage now includes more of the real runtime path.
  • Included the rustfmt-required doc/comment cleanup in adjacent agent files so the branch passes the repository push gate without introducing functional changes there.

Submission Checklist

  • Unit testscargo test --lib 'openhuman::agent::' and targeted harness test modules
  • E2E / integration — N/A, this PR is unit-test coverage only and does not change user-visible flow behavior
  • N/A — E2E not applicable for this coverage-only change
  • Doc comments — rustfmt/doc-comment cleanup included where required by the repo formatting gate
  • Inline comments — N/A, no new production logic paths were introduced

Impact

  • Runtime impact is limited to tests plus non-functional formatting/doc-comment cleanup in adjacent agent files.
  • No product logic changes were required in this pass.
  • Coverage for src/openhuman/agent increased, with the main runtime-heavy harness files materially improved:
    • session/turn.rs to 87.81%
    • tool_loop.rs to 89.52%
    • session/runtime.rs to 89.33%

Related

  • Issue(s): none referenced
  • Follow-up PR(s)/TODOs: push remaining openhuman::agent coverage via memory_loader, triage/escalation, interrupt, and triage/routing

Summary by CodeRabbit

  • New Features

    • Extended tool-call tracking with richer metadata (arguments, success, duration, output summaries) and richer subagent run outcomes (task/agent id, iterations, elapsed, mode).
    • Agent/AgentBuilder exposed at the top-level module for easier construction.
  • Documentation

    • Expanded and clarified docs for agent lifecycle, dispatching, tool formats, error cases, and harness/subagent behaviors.
  • Tests

    • Large suite of new unit/integration tests covering runtime, tool loop, memory, multimodal, triage, channels, hooks, learning, and utils.

- Introduced a suite of tests for the `fire_hooks` function, ensuring that all hooks are dispatched even if one fails, enhancing reliability in the hook execution flow.
- Added tests for the `sanitize_tool_output` function to validate the correct mapping of success and failure messages for various tool outputs.
- Implemented a mock memory structure to facilitate testing of memory context building, ensuring that working memory is prioritized and deduplication occurs correctly.
- Enhanced the `build_memory_context` function tests to verify filtering and truncation of memory entries, improving the robustness of memory management in the system.
- Overall, these tests aim to strengthen the codebase by ensuring that critical functionalities related to hooks and memory context are thoroughly validated.
…mpt options

- Updated the `provider_alias_and_route_selection_round_trip` test to dynamically resolve the first provider from the registry, ensuring accurate alias resolution.
- Added new tests for `DumpPromptOptions` and `ComposioStubTools`, validating default settings and expected tool names.
- Introduced tests for rendering main agent dumps, ensuring tool instructions and skill counts are correctly included in the output.
- Enhanced prompt handling tests to cover cache boundary extraction and subagent render options, improving overall test coverage and reliability.
- Added new tests for `AgentError` variants to validate string formatting and error recovery from `anyhow`.
- Improved formatting consistency in existing tests for better readability.
- Enhanced the `sanitize_tool_output` test to ensure accurate mapping of success and failure messages.
- Updated memory loader tests to enforce minimum limits and budget constraints, ensuring robust memory management.
- Overall, these changes aim to strengthen the test suite by improving coverage and clarity in error handling scenarios.
…ering

- Introduced new tests to validate the filtering of tools based on named scopes and disallowed tools, ensuring correct tool selection in debug dumps.
- Added tests for rendering subagent dumps, including handling file prompt fallbacks and missing files without panicking.
- Enhanced the workspace prompt handling to prefer custom prompt locations, improving the flexibility and reliability of agent prompt rendering.
- Overall, these additions strengthen the test suite by covering critical functionalities related to tool filtering and prompt rendering.
…modal helpers

- Introduced new tests for the memory loader to validate behavior when the header exceeds budget and when recall fails, ensuring robust memory management.
- Added tests for multimodal helpers, covering image marker counting, payload extraction, and MIME type normalization, enhancing the reliability of multimodal interactions.
- Overall, these additions strengthen the test suite by improving coverage and ensuring correct functionality in memory handling and multimodal processing.
- Introduced new tests for the `run_tool_call_loop` function, validating the rejection of vision markers for non-vision providers and ensuring correct streaming of final text chunks.
- Added tests to verify that CLI-only tools are blocked in prompt mode and that native tool results are persisted as tool messages.
- Enhanced the `Agent` class with tests for error handling and event publishing during single runs, ensuring robust agent behavior in various scenarios.
- Overall, these additions strengthen the test suite by improving coverage and reliability in tool execution and agent interactions.
- Added new tests for the `run_tool_call_loop` function, including scenarios for auto-approving supervised tools on non-CLI channels and handling unknown tools with default max iterations.
- Introduced `ErrorResultTool` and `FailingTool` to simulate error conditions during tool execution, improving coverage of error handling in the agent.
- Updated the `ScriptedProvider` to return results wrapped in `anyhow::Result`, ensuring consistent error handling across test cases.
- Overall, these enhancements strengthen the test suite by validating tool execution paths and agent behavior under various conditions.
- Enhanced formatting in various test cases for better clarity and consistency, including the use of line breaks and indentation.
- Updated assertions to improve readability by aligning them with Rust's idiomatic style.
- Overall, these changes aim to strengthen the test suite by making it more maintainable and easier to understand.
- Added comprehensive documentation to the `mod.rs` file, detailing the agent domain's purpose, key components, and their functionalities.
- Improved clarity on how LLMs interact with the system, manage conversation history, and handle autonomous behaviors.
- This update aims to enhance understanding and maintainability of the agent domain within the OpenHuman project.
- Enhanced comments and documentation in `dispatcher.rs`, `error.rs`, `hooks.rs`, and `harness/mod.rs` for better clarity and consistency.
- Adjusted formatting in the `TurnContext` and `ToolCallRecord` structs to improve readability and understanding of their purpose and functionality.
- Overall, these changes aim to strengthen the documentation and maintainability of the agent domain within the OpenHuman project.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 12, 2026

📝 Walkthrough

Walkthrough

Adds extensive test coverage across agent, channels, context, triage, learning, and runtime modules; expands dispatcher ToolDispatcher interface and tool-call metadata (tool_call_id); augments TurnContext/ToolCallRecord with learning fields; and updates documentation for agent bus, harness, and subagent semantics.

Changes

Cohort / File(s) Summary
Dispatcher & Agent Bus
src/openhuman/agent/dispatcher.rs, src/openhuman/agent/bus.rs
Expanded ToolDispatcher trait with new methods (parse_response, format_results, prompt_instructions, to_provider_messages, should_send_tool_specs); added tool_call_id: Option<String> to ParsedToolCall and ToolExecutionResult; rewrote bus docs to clarify owned/Arc/trait-object/channel semantics.
Turn Context & Hooks
src/openhuman/agent/hooks.rs
Added persisted turn metadata fields to TurnContext (tool_calls, turn_duration_ms, session_id, iteration_count) and extended ToolCallRecord with name, arguments, success, output_summary, duration_ms; added tests for sanitization and hook dispatch.
Agent Public API & Docs
src/openhuman/agent/mod.rs, src/openhuman/agent/harness/*, src/openhuman/agent/harness/definition.rs, src/openhuman/agent/harness/session/*, src/openhuman/agent/harness/subagent_runner.rs
Added module-level docs and pub use exports for Agent/AgentBuilder; rewrote harness/module docs and many public method docstrings; SubagentRunOutcome struct expanded with task_id, agent_id, iterations, elapsed, mode (fields exposed).
Extensive Tests — Agent Core & Tool Loop
src/openhuman/agent/harness/memory_context.rs, src/openhuman/agent/harness/tool_loop.rs, src/openhuman/agent/harness/session/turn.rs, src/openhuman/agent/harness/session/runtime.rs, src/openhuman/agent/harness/session/types.rs
Added comprehensive #[cfg(test)] suites covering memory context building, tool-call loop behaviors (streaming, approvals, scopes), session/turn lifecycle (history trimming, fork context, persistence, telemetry), and runtime event publishing.
Memory, Multimodal & Prompt Tests
src/openhuman/agent/memory_loader.rs, src/openhuman/agent/multimodal.rs, src/openhuman/agent/pformat.rs, src/openhuman/context/prompt.rs, src/openhuman/context/debug_dump.rs
Added many tests validating memory loader budgets/filters, multimodal image handling and MIME detection, p-format parsing docs updated, prompt rendering, cache-boundary behavior, and workspace file sync/injection.
Triage & Routing Tests
src/openhuman/agent/triage/evaluator.rs, src/openhuman/agent/triage/escalation.rs, src/openhuman/agent/triage/events.rs, src/openhuman/agent/triage/routing.rs
Updated triage/evaluator docs; added async tests for run_triage, apply_decision, event publishing helpers, cache TTL/refresh behavior, and remote provider fallback assertions.
Learning & User Profile Tests
src/openhuman/learning/prompt_sections.rs, src/openhuman/learning/reflection.rs, src/openhuman/learning/tool_tracker.rs, src/openhuman/learning/user_profile.rs
Added test doubles and suites for learned-context rendering, reflection hook behavior (prompt building, counters, persistence), tool stats tracking (error-pattern eviction), and user preference extraction/storage.
Channels & Routes Tests
src/openhuman/channels/bus.rs, src/openhuman/channels/commands.rs, src/openhuman/channels/context.rs, src/openhuman/channels/routes.rs
Added tests for channel subscriber metadata, health classification, timeout/history compaction, memory-context filtering, runtime command parsing, route selection persistence, and cached model preview loading.
Schemas & Runtime Tests
src/openhuman/agent/schemas.rs, src/openhuman/agent/host_runtime.rs
Added schema controller tests and RPC handler exercises; added runtime capability and command-building tests for native and Docker runtimes.
Error types & Small docs
src/openhuman/agent/error.rs, src/openhuman/agent/pformat.rs
Expanded AgentError documentation and added unit tests for Display/from-anyhow behavior; small doc clarifications for p-format rendering/parsing.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~40 minutes

Possibly related PRs

Suggested reviewers

  • YellowSnnowmann

Poem

🐰 "I hopped through tests both far and near,

Tool calls numbered, errors clear,
Arcs and traits all put in place,
Coverage blooms across the space—
A carrot for CI, now cheer!"

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'test: expand agent harness coverage' directly and accurately describes the main change—adding focused unit tests to increase coverage across agent harness hot paths without modifying runtime behavior.
Docstring Coverage ✅ Passed Docstring coverage is 90.42% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

…vior

- Introduced new tests for the `DefaultMemoryLoader`, validating error propagation during primary recall failures and ensuring correct context emission when working memory is present.
- Added tests to verify behavior when the working memory section exceeds budget constraints, enhancing memory management robustness.
- Overall, these additions strengthen the test suite by improving coverage and reliability in memory handling and agent interactions.
…tests

- Reformatted the `load_context` calls in memory loader tests for better readability by chaining method calls.
- Enhanced documentation comments in the `pformat.rs` file to clarify the purpose and functionality of functions.
- Improved consistency in spacing and formatting across various sections of the codebase, including the `interrupt.rs` and `builder.rs` files.
- Overall, these changes aim to enhance code clarity and maintainability within the test suite and related modules.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/openhuman/agent/bus.rs (1)

361-373: ⚠️ Potential issue | 🟠 Major

Guard global-handler registration test with BUS_HANDLER_LOCK to avoid race conditions.

Line 367 calls register_agent_handlers() directly without acquiring the lock used by other bus tests (e.g., mock_agent_run_turn()). This can race with tests that install handler stubs, causing nondeterministic test outcomes.

Suggested fix
 #[tokio::test]
 async fn register_agent_handlers_exposes_run_turn_on_global_registry() {
     // Read-only smoke test: prove the production registration path
     // actually puts `agent.run_turn` on the global registry. Does
     // NOT dispatch — dispatching from this test would race with any
     // other test that installs a handler override (e.g. the channel
     // dispatch integration tests in `runtime_dispatch.rs`).
-    register_agent_handlers();
+    let _guard = use_real_agent_handler().await;
     let registry = crate::core::event_bus::native_registry()
         .expect("native registry should be initialized after register_agent_handlers");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/bus.rs` around lines 361 - 373, Wrap the test's call to
register_agent_handlers() with the same global test lock used elsewhere to
prevent races: acquire BUS_HANDLER_LOCK at the start of the async test (before
calling register_agent_handlers()) and hold it until after the registry
assertion completes, then release it; this mirrors other tests like
mock_agent_run_turn() and ensures the AGENT_RUN_TURN_METHOD registration check
runs under the BUS_HANDLER_LOCK to avoid nondeterministic handler-install races.
src/openhuman/agent/triage/routing.rs (1)

391-446: ⚠️ Potential issue | 🟠 Major

Serialize cache-mutating tests with a lock to prevent race-induced flakiness at line 422.

The test at line 407 (decide_with_cache_respects_ttl_window) primes the cache with CacheState::Degraded and expects it to be returned within the TTL window. However, line 422 accepts either Degraded or Remote, which masks a race where a concurrent test clears the cache between setup and assertion. Since tests run in parallel by default and access the shared static DECISION_CACHE, the assertion should be strict: assert_eq!(state, CacheState::Degraded).

Fix: Add a test-scoped Mutex lock to serialize the three tokio tests that access DECISION_CACHE:

✅ Deterministic test fix
 #[cfg(test)]
 mod tests {
     use super::*;
+    static CACHE_TEST_LOCK: tokio::sync::Mutex<()> = tokio::sync::Mutex::const_new(());

     #[tokio::test]
     async fn mark_degraded_forces_remote_on_next_resolve() {
+        let _guard = CACHE_TEST_LOCK.lock().await;
         clear_cache().await;
         mark_degraded().await;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/triage/routing.rs` around lines 391 - 446, The three
tests that mutate DECISION_CACHE (mark_degraded_forces_remote_on_next_resolve,
decide_with_cache_respects_ttl_window,
cache_snapshot_returns_none_when_empty_and_refreshes_expired_entries) can race;
serialize them by introducing a test-scoped async mutex and acquiring it at the
start of each test to hold for the test duration, and tighten the assertion in
decide_with_cache_respects_ttl_window to assert_eq!(state, CacheState::Degraded)
(replace the current permissive matches! check). Use a single shared symbol like
TEST_DECISION_CACHE_LOCK (a tokio::sync::Mutex<()> created with
once_cell::sync::Lazy or once_cell::sync::OnceLock) and in each test do let
_guard = TEST_DECISION_CACHE_LOCK.lock().await before calling clear_cache /
priming DECISION_CACHE / decide_with_cache so the cache setup and assertion
cannot be interleaved by other tests.
🧹 Nitpick comments (6)
src/openhuman/channels/commands.rs (1)

307-308: Simplify redundant default initialization in test setup.

Config::default() already provides default channels_config, so the explicit reassignment can be removed.

Proposed simplification
-        let mut config = Config::default();
-        config.channels_config = crate::openhuman::config::ChannelsConfig::default();
+        let config = Config::default();
         doctor_channels(config).await.unwrap();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/channels/commands.rs` around lines 307 - 308, The test creates
a Config via Config::default() then immediately overwrites
config.channels_config with ChannelsConfig::default(), which is redundant
because Config::default() already sets channels_config; remove the explicit
reassignment line (the assignment to config.channels_config =
crate::openhuman::config::ChannelsConfig::default()) and rely on
Config::default() to provide the default ChannelsConfig in the test setup for
functions using Config.
src/openhuman/agent/multimodal.rs (2)

629-632: Use a tempdir-derived missing path for platform-safe determinism.

Line 629 currently uses a hardcoded absolute path; generating a missing path from tempfile::tempdir() is more reliable across CI environments.

Refactor suggestion
-        let err = normalize_local_image("/definitely/missing.png", 1024)
+        let temp = tempfile::tempdir().unwrap();
+        let missing = temp.path().join("missing.png");
+        let err = normalize_local_image(missing.to_string_lossy().as_ref(), 1024)
             .await
             .expect_err("missing local file should fail");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/multimodal.rs` around lines 629 - 632, The test uses a
hardcoded absolute path; replace it with a path derived from a
tempfile::tempdir() so the "missing" file is platform-safe and deterministic:
create a TempDir via tempfile::tempdir(), join a non-existent filename (e.g.,
temp_dir.path().join("definitely_missing.png")) and pass that path to
normalize_local_image(...) in the test; keep the expect_err assertion and the
check that err.to_string() contains "not found or unreadable" so the intent of
normalize_local_image(...) is preserved.

612-614: Avoid exact full error-string equality in tests.

Line 612-Line 614 asserts the entire formatted error text, which is brittle to harmless wording changes.

Refactor suggestion
-        assert_eq!(
-            validate_mime("x", "text/plain").unwrap_err().to_string(),
-            "multimodal image MIME type is not allowed for 'x': text/plain"
-        );
+        let err = validate_mime("x", "text/plain").unwrap_err().to_string();
+        assert!(err.contains("MIME type is not allowed"));
+        assert!(err.contains("text/plain"));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/multimodal.rs` around lines 612 - 614, The test currently
asserts exact equality of the formatted error string from validate_mime("x",
"text/plain"), which is brittle; change the assertion to check for a stable
substring or error kind instead — for example call let err = validate_mime("x",
"text/plain").unwrap_err().to_string(); then assert!(err.contains("multimodal
image MIME type is not allowed") && err.contains("'x'")) or, if the error type
is exposed, match on the error variant from validate_mime instead of comparing
the full Display output; this keeps the test robust while still verifying the
expected failure.
src/openhuman/context/debug_dump.rs (1)

754-785: Consider using RAII for test workspace cleanup.

Several tests create temporary workspaces but use let _ = std::fs::remove_dir_all(workspace); for cleanup, which won't execute if the test panics. Consider using the tempfile crate's TempDir for automatic cleanup, or at minimum document that manual cleanup may leave directories on panic.

♻️ Example with tempfile crate
+use tempfile::TempDir;
+
 #[test]
 fn render_main_agent_dump_includes_tool_instructions_and_skill_count() {
-    let workspace =
-        std::env::temp_dir().join(format!("openhuman_debug_main_{}", uuid::Uuid::new_v4()));
-    std::fs::create_dir_all(&workspace).unwrap();
+    let temp_dir = TempDir::new().unwrap();
+    let workspace = temp_dir.path();
     std::fs::write(workspace.join("SOUL.md"), "# Soul\nContext").unwrap();
     // ... rest of test
-    let _ = std::fs::remove_dir_all(workspace);
+    // TempDir automatically cleans up on drop
 }

Also applies to: 817-858, 860-902, 904-961

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/context/debug_dump.rs` around lines 754 - 785, Tests like
render_main_agent_dump_includes_tool_instructions_and_skill_count create
temporary workspace dirs and call std::fs::remove_dir_all at the end, which
won't run on panic; update these tests to use tempfile::TempDir (or similar RAII
temp-dir helper) so the directory is created via TempDir::new() and files are
written to temp_dir.path(), removing the manual std::fs::remove_dir_all calls;
apply this change to the test above and the other tests referencing the same
pattern (those that call render_main_agent_dump, use StubTool and ToolCategory,
and manually remove the workspace).
src/openhuman/agent/bus.rs (1)

30-41: Remove duplicated rustdoc block for AgentTurnRequest.

The same doc text appears twice consecutively, which adds noise and makes maintenance harder.

✂️ Suggested cleanup
-/// Full owned payload for a single agentic turn executed through the bus.
-///
-/// All fields are either owned values, [`Arc`]s, or channel handles — the
-/// bus carries them by value without touching serialization. Consumers can
-/// therefore pass trait objects (`Arc<dyn Provider>`, tool trait-object
-/// registries) and streaming senders (`on_delta`) through unchanged.
 /// Full owned payload for a single agentic turn executed through the bus.
 ///
 /// All fields are either owned values, [`Arc`]s, or channel handles — the
 /// bus carries them by value without touching serialization. Consumers can
 /// therefore pass trait objects (`Arc<dyn Provider>`, tool trait-object
 /// registries) and streaming senders (`on_delta`) through unchanged.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/bus.rs` around lines 30 - 41, The rustdoc comment for
AgentTurnRequest is duplicated; remove the redundant copy so only a single doc
block precedes the AgentTurnRequest definition, ensuring the remaining doc text
is unchanged and whitespace/formatting around the struct/type declaration is
preserved.
src/openhuman/agent/harness/session/runtime.rs (1)

491-492: Avoid fixed sleep for async event assertions.

At Line 491, a fixed 20ms delay can be flaky under CI load. Prefer waiting on a bounded condition (timeout + loop) before asserting captured events.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/session/runtime.rs` around lines 491 - 492,
Replace the fixed sleep+lock with a bounded wait loop using
tokio::time::timeout: repeatedly acquire the events Mutex (events.lock().await)
and check the desired condition on captured (e.g., non-empty or expected length)
until it is satisfied or the timeout elapses, returning a test failure or panic
when timeout occurs; use tokio::time::Duration for the overall timeout and a
short async yield (or await on the Mutex again) inside the loop to avoid
busy-waiting so the assertion is robust under CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/agent/harness/mod.rs`:
- Around line 2-5: Normalize the module-level doc comment spacing in the harness
header: remove the empty doc-comment line after the first //! and ensure each
leading //! has a single space then the sentence (no trailing spaces), and apply
the same fix to the duplicated header region (lines referenced in the review as
the block and lines 13-14); update the module-level doc comment in
src/openhuman/agent/harness/mod.rs so all //! lines are consistently formatted
to satisfy Prettier --check.

In `@src/openhuman/agent/memory_loader.rs`:
- Around line 226-230: The test name
new_enforces_minimum_limit_and_zero_budget_disables_output is misleading because
the assertion on DefaultMemoryLoader::new(0, 0.4).with_max_chars(0) and
load_context(&MockMemory, "hello") expects the memory context to be present;
rename the test function to reflect that zero budget still produces output
(e.g., new_enforces_minimum_limit_and_zero_budget_still_outputs_context) so the
name matches the behavior verified by load_context and the assert on "[Memory
context]".

In `@src/openhuman/agent/triage/escalation.rs`:
- Around line 226-227: Replace the timing-dependent
sleep(Duration::from_millis(20)).await calls with condition-based waits: instead
of sleeping, poll the actual condition you care about (e.g. a shared state,
queue length, or an async flag) inside a timeout using tokio::time::timeout and
either a small awaitable backoff (tokio::task::yield_now().await) or await a
synchronization primitive (tokio::sync::Notify, watch, or Barrier) that your
code under test signals; update the test(s) in escalation.rs that call
sleep(...) to wait for the concrete predicate (wrapped in timeout) so assertions
become deterministic and CI-safe.

In `@src/openhuman/agent/triage/events.rs`:
- Line 89: Replace the brittle fixed wait call
sleep(Duration::from_millis(20)).await with a bounded wait-until loop that polls
the event capture predicate and fails on timeout: use tokio::time::timeout (or
Instant+deadline) and repeatedly check the captured events/count/content in a
small sleep loop until the predicate is met or the timeout elapses, then assert
the expected events; this removes the race and makes the test deterministic
(locate the exact call sleep(Duration::from_millis(20)).await in the test and
replace it with the timeout+predicate loop around the captured events).

---

Outside diff comments:
In `@src/openhuman/agent/bus.rs`:
- Around line 361-373: Wrap the test's call to register_agent_handlers() with
the same global test lock used elsewhere to prevent races: acquire
BUS_HANDLER_LOCK at the start of the async test (before calling
register_agent_handlers()) and hold it until after the registry assertion
completes, then release it; this mirrors other tests like mock_agent_run_turn()
and ensures the AGENT_RUN_TURN_METHOD registration check runs under the
BUS_HANDLER_LOCK to avoid nondeterministic handler-install races.

In `@src/openhuman/agent/triage/routing.rs`:
- Around line 391-446: The three tests that mutate DECISION_CACHE
(mark_degraded_forces_remote_on_next_resolve,
decide_with_cache_respects_ttl_window,
cache_snapshot_returns_none_when_empty_and_refreshes_expired_entries) can race;
serialize them by introducing a test-scoped async mutex and acquiring it at the
start of each test to hold for the test duration, and tighten the assertion in
decide_with_cache_respects_ttl_window to assert_eq!(state, CacheState::Degraded)
(replace the current permissive matches! check). Use a single shared symbol like
TEST_DECISION_CACHE_LOCK (a tokio::sync::Mutex<()> created with
once_cell::sync::Lazy or once_cell::sync::OnceLock) and in each test do let
_guard = TEST_DECISION_CACHE_LOCK.lock().await before calling clear_cache /
priming DECISION_CACHE / decide_with_cache so the cache setup and assertion
cannot be interleaved by other tests.

---

Nitpick comments:
In `@src/openhuman/agent/bus.rs`:
- Around line 30-41: The rustdoc comment for AgentTurnRequest is duplicated;
remove the redundant copy so only a single doc block precedes the
AgentTurnRequest definition, ensuring the remaining doc text is unchanged and
whitespace/formatting around the struct/type declaration is preserved.

In `@src/openhuman/agent/harness/session/runtime.rs`:
- Around line 491-492: Replace the fixed sleep+lock with a bounded wait loop
using tokio::time::timeout: repeatedly acquire the events Mutex
(events.lock().await) and check the desired condition on captured (e.g.,
non-empty or expected length) until it is satisfied or the timeout elapses,
returning a test failure or panic when timeout occurs; use tokio::time::Duration
for the overall timeout and a short async yield (or await on the Mutex again)
inside the loop to avoid busy-waiting so the assertion is robust under CI.

In `@src/openhuman/agent/multimodal.rs`:
- Around line 629-632: The test uses a hardcoded absolute path; replace it with
a path derived from a tempfile::tempdir() so the "missing" file is platform-safe
and deterministic: create a TempDir via tempfile::tempdir(), join a non-existent
filename (e.g., temp_dir.path().join("definitely_missing.png")) and pass that
path to normalize_local_image(...) in the test; keep the expect_err assertion
and the check that err.to_string() contains "not found or unreadable" so the
intent of normalize_local_image(...) is preserved.
- Around line 612-614: The test currently asserts exact equality of the
formatted error string from validate_mime("x", "text/plain"), which is brittle;
change the assertion to check for a stable substring or error kind instead — for
example call let err = validate_mime("x",
"text/plain").unwrap_err().to_string(); then assert!(err.contains("multimodal
image MIME type is not allowed") && err.contains("'x'")) or, if the error type
is exposed, match on the error variant from validate_mime instead of comparing
the full Display output; this keeps the test robust while still verifying the
expected failure.

In `@src/openhuman/channels/commands.rs`:
- Around line 307-308: The test creates a Config via Config::default() then
immediately overwrites config.channels_config with ChannelsConfig::default(),
which is redundant because Config::default() already sets channels_config;
remove the explicit reassignment line (the assignment to config.channels_config
= crate::openhuman::config::ChannelsConfig::default()) and rely on
Config::default() to provide the default ChannelsConfig in the test setup for
functions using Config.

In `@src/openhuman/context/debug_dump.rs`:
- Around line 754-785: Tests like
render_main_agent_dump_includes_tool_instructions_and_skill_count create
temporary workspace dirs and call std::fs::remove_dir_all at the end, which
won't run on panic; update these tests to use tempfile::TempDir (or similar RAII
temp-dir helper) so the directory is created via TempDir::new() and files are
written to temp_dir.path(), removing the manual std::fs::remove_dir_all calls;
apply this change to the test above and the other tests referencing the same
pattern (those that call render_main_agent_dump, use StubTool and ToolCategory,
and manually remove the workspace).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eee59033-9afc-445a-b18d-26dbe8420cc6

📥 Commits

Reviewing files that changed from the base of the PR and between ace0006 and f9696c4.

📒 Files selected for processing (28)
  • src/openhuman/agent/bus.rs
  • src/openhuman/agent/dispatcher.rs
  • src/openhuman/agent/error.rs
  • src/openhuman/agent/harness/memory_context.rs
  • src/openhuman/agent/harness/mod.rs
  • src/openhuman/agent/harness/session/runtime.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/session/types.rs
  • src/openhuman/agent/harness/tool_loop.rs
  • src/openhuman/agent/hooks.rs
  • src/openhuman/agent/host_runtime.rs
  • src/openhuman/agent/memory_loader.rs
  • src/openhuman/agent/mod.rs
  • src/openhuman/agent/multimodal.rs
  • src/openhuman/agent/schemas.rs
  • src/openhuman/agent/triage/escalation.rs
  • src/openhuman/agent/triage/events.rs
  • src/openhuman/agent/triage/routing.rs
  • src/openhuman/channels/bus.rs
  • src/openhuman/channels/commands.rs
  • src/openhuman/channels/context.rs
  • src/openhuman/channels/routes.rs
  • src/openhuman/context/debug_dump.rs
  • src/openhuman/context/prompt.rs
  • src/openhuman/learning/prompt_sections.rs
  • src/openhuman/learning/reflection.rs
  • src/openhuman/learning/tool_tracker.rs
  • src/openhuman/learning/user_profile.rs

Comment on lines +226 to +230
async fn new_enforces_minimum_limit_and_zero_budget_disables_output() {
let loader = DefaultMemoryLoader::new(0, 0.4).with_max_chars(0);
let context = loader.load_context(&MockMemory, "hello").await.unwrap();
assert!(context.contains("[Memory context]"));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Test name contradicts asserted behavior.

At Line 226, the name says “zero budget disables output”, but Line 229 asserts memory context is present. Please rename to match actual semantics and avoid confusion.

Suggested rename
-    async fn new_enforces_minimum_limit_and_zero_budget_disables_output() {
+    async fn new_enforces_minimum_limit_and_zero_budget_is_unlimited() {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async fn new_enforces_minimum_limit_and_zero_budget_disables_output() {
let loader = DefaultMemoryLoader::new(0, 0.4).with_max_chars(0);
let context = loader.load_context(&MockMemory, "hello").await.unwrap();
assert!(context.contains("[Memory context]"));
}
async fn new_enforces_minimum_limit_and_zero_budget_is_unlimited() {
let loader = DefaultMemoryLoader::new(0, 0.4).with_max_chars(0);
let context = loader.load_context(&MockMemory, "hello").await.unwrap();
assert!(context.contains("[Memory context]"));
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/memory_loader.rs` around lines 226 - 230, The test name
new_enforces_minimum_limit_and_zero_budget_disables_output is misleading because
the assertion on DefaultMemoryLoader::new(0, 0.4).with_max_chars(0) and
load_context(&MockMemory, "hello") expects the memory context to be present;
rename the test function to reflect that zero budget still produces output
(e.g., new_enforces_minimum_limit_and_zero_budget_still_outputs_context) so the
name matches the behavior verified by load_context and the assert on "[Memory
context]".

Comment on lines +226 to +227
sleep(Duration::from_millis(20)).await;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Replace fixed sleeps with condition-based waits in async tests.

Using sleep(Duration::from_millis(20)) creates timing-dependent assertions and can intermittently fail in slower CI runs.

🛠️ Suggested deterministic pattern
-    use tokio::time::{sleep, Duration};
+    use tokio::time::{timeout, Duration};

@@
-        sleep(Duration::from_millis(20)).await;
+        timeout(Duration::from_secs(1), async {
+            loop {
+                if seen.lock().await.iter().any(|event| matches!(
+                    event,
+                    DomainEvent::TriggerEvaluated { decision, external_id, .. }
+                        if decision == "drop" && external_id == "esc-1"
+                )) {
+                    break;
+                }
+                tokio::task::yield_now().await;
+            }
+        })
+        .await
+        .expect("timed out waiting for evaluated event");

@@
-        sleep(Duration::from_millis(20)).await;
+        timeout(Duration::from_secs(1), async {
+            loop {
+                if seen.lock().await.iter().any(|event| matches!(
+                    event,
+                    DomainEvent::TriggerEvaluated { decision, external_id, .. }
+                        if decision == "acknowledge" && external_id == "esc-1"
+                )) {
+                    break;
+                }
+                tokio::task::yield_now().await;
+            }
+        })
+        .await
+        .expect("timed out waiting for evaluated event");

Also applies to: 261-262

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/triage/escalation.rs` around lines 226 - 227, Replace the
timing-dependent sleep(Duration::from_millis(20)).await calls with
condition-based waits: instead of sleeping, poll the actual condition you care
about (e.g. a shared state, queue length, or an async flag) inside a timeout
using tokio::time::timeout and either a small awaitable backoff
(tokio::task::yield_now().await) or await a synchronization primitive
(tokio::sync::Notify, watch, or Barrier) that your code under test signals;
update the test(s) in escalation.rs that call sleep(...) to wait for the
concrete predicate (wrapped in timeout) so assertions become deterministic and
CI-safe.

publish_escalated(&envelope, "trigger_reactor");
publish_failed(&envelope, "boom");

sleep(Duration::from_millis(20)).await;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid fixed-delay waiting in the event-bus test.

Line 89 uses a fixed 20ms sleep, which can race with scheduler timing and intermittently fail on busy runners. Prefer a timeout + predicate loop on captured event count/content.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/triage/events.rs` at line 89, Replace the brittle fixed
wait call sleep(Duration::from_millis(20)).await with a bounded wait-until loop
that polls the event capture predicate and fails on timeout: use
tokio::time::timeout (or Instant+deadline) and repeatedly check the captured
events/count/content in a small sleep loop until the predicate is met or the
timeout elapses, then assert the expected events; this removes the race and
makes the test deterministic (locate the exact call
sleep(Duration::from_millis(20)).await in the test and replace it with the
timeout+predicate loop around the captured events).

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
src/openhuman/agent/memory_loader.rs (1)

226-230: ⚠️ Potential issue | 🟡 Minor

Rename this test to match the asserted behavior.

Line 226 says “zero budget disables output”, but Line 229 asserts output is present. This makes the test intent misleading.

Proposed rename
-    async fn new_enforces_minimum_limit_and_zero_budget_disables_output() {
+    async fn new_enforces_minimum_limit_and_zero_budget_is_unlimited() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/memory_loader.rs` around lines 226 - 230, The test
function new_enforces_minimum_limit_and_zero_budget_disables_output is misnamed
because it asserts that output is present; rename it to reflect the asserted
behavior (e.g., new_enforces_minimum_limit_and_zero_budget_allows_output) and
keep the existing test body that constructs DefaultMemoryLoader::new(0,
0.4).with_max_chars(0), calls load_context(&MockMemory, "hello"), and asserts
context.contains("[Memory context]").
src/openhuman/agent/triage/escalation.rs (1)

187-187: ⚠️ Potential issue | 🟡 Minor

Replace fixed sleeps with condition-based waits

These sleeps are timing-dependent and can intermittently fail in CI under load. Please wait on a concrete predicate under timeout instead.

🛠️ Deterministic wait pattern
-    use tokio::time::{sleep, Duration};
+    use tokio::time::{timeout, Duration};

@@
-        sleep(Duration::from_millis(20)).await;
+        timeout(Duration::from_secs(1), async {
+            loop {
+                if seen.lock().await.iter().any(|event| matches!(
+                    event,
+                    DomainEvent::TriggerEvaluated { decision, external_id, .. }
+                        if decision == "drop" && external_id == "esc-1"
+                )) {
+                    break;
+                }
+                tokio::task::yield_now().await;
+            }
+        })
+        .await
+        .expect("timed out waiting for evaluated event");

Also applies to: 243-243, 278-278, 321-321, 362-362

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/triage/escalation.rs` at line 187, The code imports and
uses fixed sleeps (tokio::time::sleep, Duration) in escalation.rs which makes
tests flaky; replace those sleeps with a condition-based wait under a timeout.
Add a small helper (e.g., wait_for_condition or wait_for_predicate) that accepts
an async predicate, a total timeout, and a poll interval, and implement it using
tokio::time::timeout(...) around a loop that checks the predicate and awaits a
short sleep between checks; then replace direct sleep(...) calls in the
functions referencing escalation logic (look for uses of sleep and Duration in
escalation.rs) with calls to this helper so tests wait for the concrete
predicate rather than a fixed delay. Ensure the helper returns a clear
Result/timeout error so callers can handle failures.
🧹 Nitpick comments (2)
src/openhuman/agent/harness/session/turn.rs (2)

1171-1179: Guard scripted provider queue exhaustion to avoid panic-based test failures.

remove(0) panics when the scripted response queue is empty, which makes failures less diagnosable than returning a typed error.

Proposed fix
         async fn chat(
             &self,
             request: ChatRequest<'_>,
             _model: &str,
             _temperature: f64,
         ) -> Result<ChatResponse> {
             self.requests.lock().await.push(request.messages.to_vec());
-            self.responses.lock().await.remove(0)
+            let mut responses = self.responses.lock().await;
+            if responses.is_empty() {
+                return Err(anyhow::anyhow!(
+                    "SequenceProvider response script exhausted"
+                ));
+            }
+            responses.remove(0)
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/session/turn.rs` around lines 1171 - 1179, The
chat method currently does self.responses.lock().await.remove(0) which will
panic if the scripted response queue is empty; change it to guard against empty
responses and return a typed error instead. Inside chat (function name chat,
types ChatRequest/ChatResponse, fields requests/responses), after pushing the
request, acquire the responses lock and check if responses.is_empty(); if empty,
return an Err with a clear error (e.g., an anyhow/eyre or crate-specific error)
indicating the scripted response queue is exhausted; otherwise pop or remove the
first element and return it as Ok. Ensure no unwrap/remove(0) is left to avoid
panic-based test failures.

1124-1124: Use a bounded wait instead of fixed sleep for async event assertions.

The fixed 20ms delay can be flaky under load. Prefer waiting until expected events arrive (with timeout).

Proposed fix
-    use tokio::time::{sleep, timeout, Duration};
+    use tokio::time::{timeout, Duration};
@@
-        sleep(Duration::from_millis(20)).await;
+        timeout(Duration::from_secs(1), async {
+            loop {
+                if events.lock().await.len() >= 2 {
+                    break;
+                }
+                tokio::task::yield_now().await;
+            }
+        })
+        .await
+        .expect("expected ToolExecutionStarted and ToolExecutionCompleted events");
         let captured = events.lock().await;

Also applies to: 1453-1455

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/harness/session/turn.rs` at line 1124, Replace the fixed
sleep(Duration::from_millis(20)).await used to wait for async events with a
bounded wait that uses tokio::time::timeout and actively awaits the expected
event or condition; specifically, where you currently call sleep(...) use
timeout(Duration::from_millis(<sane-ms>), async { loop { if
check_event_or_stream_has_expected_item() { break } yield to the executor (e.g.,
tokio::task::yield_now().await) } }).await and handle the Err(timeout) case as a
test failure; apply this change for all occurrences that import or use sleep,
timeout, Duration (including the spots referenced by the review: the usage at
the import line and the other occurrences around the second instance).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/openhuman/agent/triage/escalation.rs`:
- Around line 333-335: The test is pattern-matching a non-existent field `error`
on DomainEvent::TriggerEscalationFailed; update the match arms to use the actual
field name `reason` instead (e.g. DomainEvent::TriggerEscalationFailed {
external_id, reason } if external_id == "esc-1" &&
reason.contains("missing-agent")), and apply the same change for the other
occurrence where TriggerEscalationFailed is matched.
- Around line 189-197: The tests share a fixed external_id ("esc-1" and
similarly "esc-ack", "esc-react-fail", "esc-escalate-fail") causing cross-test
collisions; change the envelope() helper(s) to generate a unique external_id per
invocation (for example by appending a short random value or UUID) when calling
TriggerEnvelope::from_composio so each test publishes/asserts against an
isolated id; apply the same change to the other envelope helpers used in the
tests (the ones that currently return "esc-ack", "esc-react-fail",
"esc-escalate-fail") to prevent parallel-test interference.

---

Duplicate comments:
In `@src/openhuman/agent/memory_loader.rs`:
- Around line 226-230: The test function
new_enforces_minimum_limit_and_zero_budget_disables_output is misnamed because
it asserts that output is present; rename it to reflect the asserted behavior
(e.g., new_enforces_minimum_limit_and_zero_budget_allows_output) and keep the
existing test body that constructs DefaultMemoryLoader::new(0,
0.4).with_max_chars(0), calls load_context(&MockMemory, "hello"), and asserts
context.contains("[Memory context]").

In `@src/openhuman/agent/triage/escalation.rs`:
- Line 187: The code imports and uses fixed sleeps (tokio::time::sleep,
Duration) in escalation.rs which makes tests flaky; replace those sleeps with a
condition-based wait under a timeout. Add a small helper (e.g.,
wait_for_condition or wait_for_predicate) that accepts an async predicate, a
total timeout, and a poll interval, and implement it using
tokio::time::timeout(...) around a loop that checks the predicate and awaits a
short sleep between checks; then replace direct sleep(...) calls in the
functions referencing escalation logic (look for uses of sleep and Duration in
escalation.rs) with calls to this helper so tests wait for the concrete
predicate rather than a fixed delay. Ensure the helper returns a clear
Result/timeout error so callers can handle failures.

---

Nitpick comments:
In `@src/openhuman/agent/harness/session/turn.rs`:
- Around line 1171-1179: The chat method currently does
self.responses.lock().await.remove(0) which will panic if the scripted response
queue is empty; change it to guard against empty responses and return a typed
error instead. Inside chat (function name chat, types ChatRequest/ChatResponse,
fields requests/responses), after pushing the request, acquire the responses
lock and check if responses.is_empty(); if empty, return an Err with a clear
error (e.g., an anyhow/eyre or crate-specific error) indicating the scripted
response queue is exhausted; otherwise pop or remove the first element and
return it as Ok. Ensure no unwrap/remove(0) is left to avoid panic-based test
failures.
- Line 1124: Replace the fixed sleep(Duration::from_millis(20)).await used to
wait for async events with a bounded wait that uses tokio::time::timeout and
actively awaits the expected event or condition; specifically, where you
currently call sleep(...) use timeout(Duration::from_millis(<sane-ms>), async {
loop { if check_event_or_stream_has_expected_item() { break } yield to the
executor (e.g., tokio::task::yield_now().await) } }).await and handle the
Err(timeout) case as a test failure; apply this change for all occurrences that
import or use sleep, timeout, Duration (including the spots referenced by the
review: the usage at the import line and the other occurrences around the second
instance).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4aa53ed4-7cf4-45ec-ae86-16bf485770ff

📥 Commits

Reviewing files that changed from the base of the PR and between f9696c4 and 677510c.

📒 Files selected for processing (12)
  • src/openhuman/agent/harness/definition.rs
  • src/openhuman/agent/harness/interrupt.rs
  • src/openhuman/agent/harness/mod.rs
  • src/openhuman/agent/harness/session/builder.rs
  • src/openhuman/agent/harness/session/runtime.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/subagent_runner.rs
  • src/openhuman/agent/memory_loader.rs
  • src/openhuman/agent/pformat.rs
  • src/openhuman/agent/triage/escalation.rs
  • src/openhuman/agent/triage/evaluator.rs
  • src/openhuman/agent/triage/mod.rs
✅ Files skipped from review due to trivial changes (7)
  • src/openhuman/agent/harness/interrupt.rs
  • src/openhuman/agent/triage/mod.rs
  • src/openhuman/agent/pformat.rs
  • src/openhuman/agent/harness/mod.rs
  • src/openhuman/agent/harness/definition.rs
  • src/openhuman/agent/triage/evaluator.rs
  • src/openhuman/agent/harness/session/builder.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/agent/harness/session/runtime.rs

Comment on lines +189 to +197
fn envelope() -> TriggerEnvelope {
TriggerEnvelope::from_composio(
"gmail",
"GMAIL_NEW_GMAIL_MESSAGE",
"triage-escalation",
"esc-1",
json!({ "subject": "hello" }),
)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid cross-test event collisions from shared external_id

All tests publish/assert on the same external_id ("esc-1") against a global event bus. With parallel test execution, events from one test can satisfy/fail assertions in another.

🛠️ Proposed isolation fix
-    fn envelope() -> TriggerEnvelope {
+    fn envelope(external_id: &str) -> TriggerEnvelope {
         TriggerEnvelope::from_composio(
             "gmail",
             "GMAIL_NEW_GMAIL_MESSAGE",
             "triage-escalation",
-            "esc-1",
+            external_id,
             json!({ "subject": "hello" }),
         )
     }
@@
-        apply_decision(run(TriageAction::Drop), &envelope())
+        let external_id = "esc-drop";
+        apply_decision(run(TriageAction::Drop), &envelope(external_id))
@@
-            } if decision == "drop" && external_id == "esc-1"
+            } if decision == "drop" && external_id == external_id

(Apply the same pattern to the other tests: esc-ack, esc-react-fail, esc-escalate-fail.)

Also applies to: 240-240, 275-275, 313-316, 354-357

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/triage/escalation.rs` around lines 189 - 197, The tests
share a fixed external_id ("esc-1" and similarly "esc-ack", "esc-react-fail",
"esc-escalate-fail") causing cross-test collisions; change the envelope()
helper(s) to generate a unique external_id per invocation (for example by
appending a short random value or UUID) when calling
TriggerEnvelope::from_composio so each test publishes/asserts against an
isolated id; apply the same change to the other envelope helpers used in the
tests (the ones that currently return "esc-ack", "esc-react-fail",
"esc-escalate-fail") to prevent parallel-test interference.

Comment on lines +333 to +335
DomainEvent::TriggerEscalationFailed { external_id, error }
if external_id == "esc-1" && error.contains("missing-agent")
)));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix TriggerEscalationFailed field match (errorreason)

These assertions pattern-match a non-existent field and will fail to compile against the current DomainEvent::TriggerEscalationFailed shape.

🛠️ Proposed fix
-            DomainEvent::TriggerEscalationFailed { external_id, error }
-                if external_id == "esc-1" && error.contains("missing-agent")
+            DomainEvent::TriggerEscalationFailed {
+                external_id,
+                reason,
+                ..
+            } if external_id == "esc-1" && reason.contains("missing-agent")
-            DomainEvent::TriggerEscalationFailed { external_id, error }
-                if external_id == "esc-1" && error.contains("missing-agent")
+            DomainEvent::TriggerEscalationFailed {
+                external_id,
+                reason,
+                ..
+            } if external_id == "esc-1" && reason.contains("missing-agent")

Also applies to: 374-376

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/openhuman/agent/triage/escalation.rs` around lines 333 - 335, The test is
pattern-matching a non-existent field `error` on
DomainEvent::TriggerEscalationFailed; update the match arms to use the actual
field name `reason` instead (e.g. DomainEvent::TriggerEscalationFailed {
external_id, reason } if external_id == "esc-1" &&
reason.contains("missing-agent")), and apply the same change for the other
occurrence where TriggerEscalationFailed is matched.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant