test: expand agent harness coverage#513
test: expand agent harness coverage#513senamakel wants to merge 12 commits intotinyhumansai:mainfrom
Conversation
- Introduced a suite of tests for the `fire_hooks` function, ensuring that all hooks are dispatched even if one fails, enhancing reliability in the hook execution flow. - Added tests for the `sanitize_tool_output` function to validate the correct mapping of success and failure messages for various tool outputs. - Implemented a mock memory structure to facilitate testing of memory context building, ensuring that working memory is prioritized and deduplication occurs correctly. - Enhanced the `build_memory_context` function tests to verify filtering and truncation of memory entries, improving the robustness of memory management in the system. - Overall, these tests aim to strengthen the codebase by ensuring that critical functionalities related to hooks and memory context are thoroughly validated.
…mpt options - Updated the `provider_alias_and_route_selection_round_trip` test to dynamically resolve the first provider from the registry, ensuring accurate alias resolution. - Added new tests for `DumpPromptOptions` and `ComposioStubTools`, validating default settings and expected tool names. - Introduced tests for rendering main agent dumps, ensuring tool instructions and skill counts are correctly included in the output. - Enhanced prompt handling tests to cover cache boundary extraction and subagent render options, improving overall test coverage and reliability.
- Added new tests for `AgentError` variants to validate string formatting and error recovery from `anyhow`. - Improved formatting consistency in existing tests for better readability. - Enhanced the `sanitize_tool_output` test to ensure accurate mapping of success and failure messages. - Updated memory loader tests to enforce minimum limits and budget constraints, ensuring robust memory management. - Overall, these changes aim to strengthen the test suite by improving coverage and clarity in error handling scenarios.
…ering - Introduced new tests to validate the filtering of tools based on named scopes and disallowed tools, ensuring correct tool selection in debug dumps. - Added tests for rendering subagent dumps, including handling file prompt fallbacks and missing files without panicking. - Enhanced the workspace prompt handling to prefer custom prompt locations, improving the flexibility and reliability of agent prompt rendering. - Overall, these additions strengthen the test suite by covering critical functionalities related to tool filtering and prompt rendering.
…modal helpers - Introduced new tests for the memory loader to validate behavior when the header exceeds budget and when recall fails, ensuring robust memory management. - Added tests for multimodal helpers, covering image marker counting, payload extraction, and MIME type normalization, enhancing the reliability of multimodal interactions. - Overall, these additions strengthen the test suite by improving coverage and ensuring correct functionality in memory handling and multimodal processing.
- Introduced new tests for the `run_tool_call_loop` function, validating the rejection of vision markers for non-vision providers and ensuring correct streaming of final text chunks. - Added tests to verify that CLI-only tools are blocked in prompt mode and that native tool results are persisted as tool messages. - Enhanced the `Agent` class with tests for error handling and event publishing during single runs, ensuring robust agent behavior in various scenarios. - Overall, these additions strengthen the test suite by improving coverage and reliability in tool execution and agent interactions.
- Added new tests for the `run_tool_call_loop` function, including scenarios for auto-approving supervised tools on non-CLI channels and handling unknown tools with default max iterations. - Introduced `ErrorResultTool` and `FailingTool` to simulate error conditions during tool execution, improving coverage of error handling in the agent. - Updated the `ScriptedProvider` to return results wrapped in `anyhow::Result`, ensuring consistent error handling across test cases. - Overall, these enhancements strengthen the test suite by validating tool execution paths and agent behavior under various conditions.
- Enhanced formatting in various test cases for better clarity and consistency, including the use of line breaks and indentation. - Updated assertions to improve readability by aligning them with Rust's idiomatic style. - Overall, these changes aim to strengthen the test suite by making it more maintainable and easier to understand.
- Added comprehensive documentation to the `mod.rs` file, detailing the agent domain's purpose, key components, and their functionalities. - Improved clarity on how LLMs interact with the system, manage conversation history, and handle autonomous behaviors. - This update aims to enhance understanding and maintainability of the agent domain within the OpenHuman project.
- Enhanced comments and documentation in `dispatcher.rs`, `error.rs`, `hooks.rs`, and `harness/mod.rs` for better clarity and consistency. - Adjusted formatting in the `TurnContext` and `ToolCallRecord` structs to improve readability and understanding of their purpose and functionality. - Overall, these changes aim to strengthen the documentation and maintainability of the agent domain within the OpenHuman project.
📝 WalkthroughWalkthroughAdds extensive test coverage across agent, channels, context, triage, learning, and runtime modules; expands dispatcher ToolDispatcher interface and tool-call metadata (tool_call_id); augments TurnContext/ToolCallRecord with learning fields; and updates documentation for agent bus, harness, and subagent semantics. Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~40 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
…vior - Introduced new tests for the `DefaultMemoryLoader`, validating error propagation during primary recall failures and ensuring correct context emission when working memory is present. - Added tests to verify behavior when the working memory section exceeds budget constraints, enhancing memory management robustness. - Overall, these additions strengthen the test suite by improving coverage and reliability in memory handling and agent interactions.
…tests - Reformatted the `load_context` calls in memory loader tests for better readability by chaining method calls. - Enhanced documentation comments in the `pformat.rs` file to clarify the purpose and functionality of functions. - Improved consistency in spacing and formatting across various sections of the codebase, including the `interrupt.rs` and `builder.rs` files. - Overall, these changes aim to enhance code clarity and maintainability within the test suite and related modules.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/openhuman/agent/bus.rs (1)
361-373:⚠️ Potential issue | 🟠 MajorGuard global-handler registration test with
BUS_HANDLER_LOCKto avoid race conditions.Line 367 calls
register_agent_handlers()directly without acquiring the lock used by other bus tests (e.g.,mock_agent_run_turn()). This can race with tests that install handler stubs, causing nondeterministic test outcomes.Suggested fix
#[tokio::test] async fn register_agent_handlers_exposes_run_turn_on_global_registry() { // Read-only smoke test: prove the production registration path // actually puts `agent.run_turn` on the global registry. Does // NOT dispatch — dispatching from this test would race with any // other test that installs a handler override (e.g. the channel // dispatch integration tests in `runtime_dispatch.rs`). - register_agent_handlers(); + let _guard = use_real_agent_handler().await; let registry = crate::core::event_bus::native_registry() .expect("native registry should be initialized after register_agent_handlers");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/bus.rs` around lines 361 - 373, Wrap the test's call to register_agent_handlers() with the same global test lock used elsewhere to prevent races: acquire BUS_HANDLER_LOCK at the start of the async test (before calling register_agent_handlers()) and hold it until after the registry assertion completes, then release it; this mirrors other tests like mock_agent_run_turn() and ensures the AGENT_RUN_TURN_METHOD registration check runs under the BUS_HANDLER_LOCK to avoid nondeterministic handler-install races.src/openhuman/agent/triage/routing.rs (1)
391-446:⚠️ Potential issue | 🟠 MajorSerialize cache-mutating tests with a lock to prevent race-induced flakiness at line 422.
The test at line 407 (
decide_with_cache_respects_ttl_window) primes the cache withCacheState::Degradedand expects it to be returned within the TTL window. However, line 422 accepts eitherDegradedorRemote, which masks a race where a concurrent test clears the cache between setup and assertion. Since tests run in parallel by default and access the shared staticDECISION_CACHE, the assertion should be strict:assert_eq!(state, CacheState::Degraded).Fix: Add a test-scoped
Mutexlock to serialize the three tokio tests that accessDECISION_CACHE:✅ Deterministic test fix
#[cfg(test)] mod tests { use super::*; + static CACHE_TEST_LOCK: tokio::sync::Mutex<()> = tokio::sync::Mutex::const_new(()); #[tokio::test] async fn mark_degraded_forces_remote_on_next_resolve() { + let _guard = CACHE_TEST_LOCK.lock().await; clear_cache().await; mark_degraded().await;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/triage/routing.rs` around lines 391 - 446, The three tests that mutate DECISION_CACHE (mark_degraded_forces_remote_on_next_resolve, decide_with_cache_respects_ttl_window, cache_snapshot_returns_none_when_empty_and_refreshes_expired_entries) can race; serialize them by introducing a test-scoped async mutex and acquiring it at the start of each test to hold for the test duration, and tighten the assertion in decide_with_cache_respects_ttl_window to assert_eq!(state, CacheState::Degraded) (replace the current permissive matches! check). Use a single shared symbol like TEST_DECISION_CACHE_LOCK (a tokio::sync::Mutex<()> created with once_cell::sync::Lazy or once_cell::sync::OnceLock) and in each test do let _guard = TEST_DECISION_CACHE_LOCK.lock().await before calling clear_cache / priming DECISION_CACHE / decide_with_cache so the cache setup and assertion cannot be interleaved by other tests.
🧹 Nitpick comments (6)
src/openhuman/channels/commands.rs (1)
307-308: Simplify redundant default initialization in test setup.
Config::default()already provides defaultchannels_config, so the explicit reassignment can be removed.Proposed simplification
- let mut config = Config::default(); - config.channels_config = crate::openhuman::config::ChannelsConfig::default(); + let config = Config::default(); doctor_channels(config).await.unwrap();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/channels/commands.rs` around lines 307 - 308, The test creates a Config via Config::default() then immediately overwrites config.channels_config with ChannelsConfig::default(), which is redundant because Config::default() already sets channels_config; remove the explicit reassignment line (the assignment to config.channels_config = crate::openhuman::config::ChannelsConfig::default()) and rely on Config::default() to provide the default ChannelsConfig in the test setup for functions using Config.src/openhuman/agent/multimodal.rs (2)
629-632: Use a tempdir-derived missing path for platform-safe determinism.Line 629 currently uses a hardcoded absolute path; generating a missing path from
tempfile::tempdir()is more reliable across CI environments.Refactor suggestion
- let err = normalize_local_image("/definitely/missing.png", 1024) + let temp = tempfile::tempdir().unwrap(); + let missing = temp.path().join("missing.png"); + let err = normalize_local_image(missing.to_string_lossy().as_ref(), 1024) .await .expect_err("missing local file should fail");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/multimodal.rs` around lines 629 - 632, The test uses a hardcoded absolute path; replace it with a path derived from a tempfile::tempdir() so the "missing" file is platform-safe and deterministic: create a TempDir via tempfile::tempdir(), join a non-existent filename (e.g., temp_dir.path().join("definitely_missing.png")) and pass that path to normalize_local_image(...) in the test; keep the expect_err assertion and the check that err.to_string() contains "not found or unreadable" so the intent of normalize_local_image(...) is preserved.
612-614: Avoid exact full error-string equality in tests.Line 612-Line 614 asserts the entire formatted error text, which is brittle to harmless wording changes.
Refactor suggestion
- assert_eq!( - validate_mime("x", "text/plain").unwrap_err().to_string(), - "multimodal image MIME type is not allowed for 'x': text/plain" - ); + let err = validate_mime("x", "text/plain").unwrap_err().to_string(); + assert!(err.contains("MIME type is not allowed")); + assert!(err.contains("text/plain"));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/multimodal.rs` around lines 612 - 614, The test currently asserts exact equality of the formatted error string from validate_mime("x", "text/plain"), which is brittle; change the assertion to check for a stable substring or error kind instead — for example call let err = validate_mime("x", "text/plain").unwrap_err().to_string(); then assert!(err.contains("multimodal image MIME type is not allowed") && err.contains("'x'")) or, if the error type is exposed, match on the error variant from validate_mime instead of comparing the full Display output; this keeps the test robust while still verifying the expected failure.src/openhuman/context/debug_dump.rs (1)
754-785: Consider using RAII for test workspace cleanup.Several tests create temporary workspaces but use
let _ = std::fs::remove_dir_all(workspace);for cleanup, which won't execute if the test panics. Consider using thetempfilecrate'sTempDirfor automatic cleanup, or at minimum document that manual cleanup may leave directories on panic.♻️ Example with tempfile crate
+use tempfile::TempDir; + #[test] fn render_main_agent_dump_includes_tool_instructions_and_skill_count() { - let workspace = - std::env::temp_dir().join(format!("openhuman_debug_main_{}", uuid::Uuid::new_v4())); - std::fs::create_dir_all(&workspace).unwrap(); + let temp_dir = TempDir::new().unwrap(); + let workspace = temp_dir.path(); std::fs::write(workspace.join("SOUL.md"), "# Soul\nContext").unwrap(); // ... rest of test - let _ = std::fs::remove_dir_all(workspace); + // TempDir automatically cleans up on drop }Also applies to: 817-858, 860-902, 904-961
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/context/debug_dump.rs` around lines 754 - 785, Tests like render_main_agent_dump_includes_tool_instructions_and_skill_count create temporary workspace dirs and call std::fs::remove_dir_all at the end, which won't run on panic; update these tests to use tempfile::TempDir (or similar RAII temp-dir helper) so the directory is created via TempDir::new() and files are written to temp_dir.path(), removing the manual std::fs::remove_dir_all calls; apply this change to the test above and the other tests referencing the same pattern (those that call render_main_agent_dump, use StubTool and ToolCategory, and manually remove the workspace).src/openhuman/agent/bus.rs (1)
30-41: Remove duplicated rustdoc block forAgentTurnRequest.The same doc text appears twice consecutively, which adds noise and makes maintenance harder.
✂️ Suggested cleanup
-/// Full owned payload for a single agentic turn executed through the bus. -/// -/// All fields are either owned values, [`Arc`]s, or channel handles — the -/// bus carries them by value without touching serialization. Consumers can -/// therefore pass trait objects (`Arc<dyn Provider>`, tool trait-object -/// registries) and streaming senders (`on_delta`) through unchanged. /// Full owned payload for a single agentic turn executed through the bus. /// /// All fields are either owned values, [`Arc`]s, or channel handles — the /// bus carries them by value without touching serialization. Consumers can /// therefore pass trait objects (`Arc<dyn Provider>`, tool trait-object /// registries) and streaming senders (`on_delta`) through unchanged.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/bus.rs` around lines 30 - 41, The rustdoc comment for AgentTurnRequest is duplicated; remove the redundant copy so only a single doc block precedes the AgentTurnRequest definition, ensuring the remaining doc text is unchanged and whitespace/formatting around the struct/type declaration is preserved.src/openhuman/agent/harness/session/runtime.rs (1)
491-492: Avoid fixed sleep for async event assertions.At Line 491, a fixed 20ms delay can be flaky under CI load. Prefer waiting on a bounded condition (
timeout+ loop) before asserting captured events.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/harness/session/runtime.rs` around lines 491 - 492, Replace the fixed sleep+lock with a bounded wait loop using tokio::time::timeout: repeatedly acquire the events Mutex (events.lock().await) and check the desired condition on captured (e.g., non-empty or expected length) until it is satisfied or the timeout elapses, returning a test failure or panic when timeout occurs; use tokio::time::Duration for the overall timeout and a short async yield (or await on the Mutex again) inside the loop to avoid busy-waiting so the assertion is robust under CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/agent/harness/mod.rs`:
- Around line 2-5: Normalize the module-level doc comment spacing in the harness
header: remove the empty doc-comment line after the first //! and ensure each
leading //! has a single space then the sentence (no trailing spaces), and apply
the same fix to the duplicated header region (lines referenced in the review as
the block and lines 13-14); update the module-level doc comment in
src/openhuman/agent/harness/mod.rs so all //! lines are consistently formatted
to satisfy Prettier --check.
In `@src/openhuman/agent/memory_loader.rs`:
- Around line 226-230: The test name
new_enforces_minimum_limit_and_zero_budget_disables_output is misleading because
the assertion on DefaultMemoryLoader::new(0, 0.4).with_max_chars(0) and
load_context(&MockMemory, "hello") expects the memory context to be present;
rename the test function to reflect that zero budget still produces output
(e.g., new_enforces_minimum_limit_and_zero_budget_still_outputs_context) so the
name matches the behavior verified by load_context and the assert on "[Memory
context]".
In `@src/openhuman/agent/triage/escalation.rs`:
- Around line 226-227: Replace the timing-dependent
sleep(Duration::from_millis(20)).await calls with condition-based waits: instead
of sleeping, poll the actual condition you care about (e.g. a shared state,
queue length, or an async flag) inside a timeout using tokio::time::timeout and
either a small awaitable backoff (tokio::task::yield_now().await) or await a
synchronization primitive (tokio::sync::Notify, watch, or Barrier) that your
code under test signals; update the test(s) in escalation.rs that call
sleep(...) to wait for the concrete predicate (wrapped in timeout) so assertions
become deterministic and CI-safe.
In `@src/openhuman/agent/triage/events.rs`:
- Line 89: Replace the brittle fixed wait call
sleep(Duration::from_millis(20)).await with a bounded wait-until loop that polls
the event capture predicate and fails on timeout: use tokio::time::timeout (or
Instant+deadline) and repeatedly check the captured events/count/content in a
small sleep loop until the predicate is met or the timeout elapses, then assert
the expected events; this removes the race and makes the test deterministic
(locate the exact call sleep(Duration::from_millis(20)).await in the test and
replace it with the timeout+predicate loop around the captured events).
---
Outside diff comments:
In `@src/openhuman/agent/bus.rs`:
- Around line 361-373: Wrap the test's call to register_agent_handlers() with
the same global test lock used elsewhere to prevent races: acquire
BUS_HANDLER_LOCK at the start of the async test (before calling
register_agent_handlers()) and hold it until after the registry assertion
completes, then release it; this mirrors other tests like mock_agent_run_turn()
and ensures the AGENT_RUN_TURN_METHOD registration check runs under the
BUS_HANDLER_LOCK to avoid nondeterministic handler-install races.
In `@src/openhuman/agent/triage/routing.rs`:
- Around line 391-446: The three tests that mutate DECISION_CACHE
(mark_degraded_forces_remote_on_next_resolve,
decide_with_cache_respects_ttl_window,
cache_snapshot_returns_none_when_empty_and_refreshes_expired_entries) can race;
serialize them by introducing a test-scoped async mutex and acquiring it at the
start of each test to hold for the test duration, and tighten the assertion in
decide_with_cache_respects_ttl_window to assert_eq!(state, CacheState::Degraded)
(replace the current permissive matches! check). Use a single shared symbol like
TEST_DECISION_CACHE_LOCK (a tokio::sync::Mutex<()> created with
once_cell::sync::Lazy or once_cell::sync::OnceLock) and in each test do let
_guard = TEST_DECISION_CACHE_LOCK.lock().await before calling clear_cache /
priming DECISION_CACHE / decide_with_cache so the cache setup and assertion
cannot be interleaved by other tests.
---
Nitpick comments:
In `@src/openhuman/agent/bus.rs`:
- Around line 30-41: The rustdoc comment for AgentTurnRequest is duplicated;
remove the redundant copy so only a single doc block precedes the
AgentTurnRequest definition, ensuring the remaining doc text is unchanged and
whitespace/formatting around the struct/type declaration is preserved.
In `@src/openhuman/agent/harness/session/runtime.rs`:
- Around line 491-492: Replace the fixed sleep+lock with a bounded wait loop
using tokio::time::timeout: repeatedly acquire the events Mutex
(events.lock().await) and check the desired condition on captured (e.g.,
non-empty or expected length) until it is satisfied or the timeout elapses,
returning a test failure or panic when timeout occurs; use tokio::time::Duration
for the overall timeout and a short async yield (or await on the Mutex again)
inside the loop to avoid busy-waiting so the assertion is robust under CI.
In `@src/openhuman/agent/multimodal.rs`:
- Around line 629-632: The test uses a hardcoded absolute path; replace it with
a path derived from a tempfile::tempdir() so the "missing" file is platform-safe
and deterministic: create a TempDir via tempfile::tempdir(), join a non-existent
filename (e.g., temp_dir.path().join("definitely_missing.png")) and pass that
path to normalize_local_image(...) in the test; keep the expect_err assertion
and the check that err.to_string() contains "not found or unreadable" so the
intent of normalize_local_image(...) is preserved.
- Around line 612-614: The test currently asserts exact equality of the
formatted error string from validate_mime("x", "text/plain"), which is brittle;
change the assertion to check for a stable substring or error kind instead — for
example call let err = validate_mime("x",
"text/plain").unwrap_err().to_string(); then assert!(err.contains("multimodal
image MIME type is not allowed") && err.contains("'x'")) or, if the error type
is exposed, match on the error variant from validate_mime instead of comparing
the full Display output; this keeps the test robust while still verifying the
expected failure.
In `@src/openhuman/channels/commands.rs`:
- Around line 307-308: The test creates a Config via Config::default() then
immediately overwrites config.channels_config with ChannelsConfig::default(),
which is redundant because Config::default() already sets channels_config;
remove the explicit reassignment line (the assignment to config.channels_config
= crate::openhuman::config::ChannelsConfig::default()) and rely on
Config::default() to provide the default ChannelsConfig in the test setup for
functions using Config.
In `@src/openhuman/context/debug_dump.rs`:
- Around line 754-785: Tests like
render_main_agent_dump_includes_tool_instructions_and_skill_count create
temporary workspace dirs and call std::fs::remove_dir_all at the end, which
won't run on panic; update these tests to use tempfile::TempDir (or similar RAII
temp-dir helper) so the directory is created via TempDir::new() and files are
written to temp_dir.path(), removing the manual std::fs::remove_dir_all calls;
apply this change to the test above and the other tests referencing the same
pattern (those that call render_main_agent_dump, use StubTool and ToolCategory,
and manually remove the workspace).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: eee59033-9afc-445a-b18d-26dbe8420cc6
📒 Files selected for processing (28)
src/openhuman/agent/bus.rssrc/openhuman/agent/dispatcher.rssrc/openhuman/agent/error.rssrc/openhuman/agent/harness/memory_context.rssrc/openhuman/agent/harness/mod.rssrc/openhuman/agent/harness/session/runtime.rssrc/openhuman/agent/harness/session/turn.rssrc/openhuman/agent/harness/session/types.rssrc/openhuman/agent/harness/tool_loop.rssrc/openhuman/agent/hooks.rssrc/openhuman/agent/host_runtime.rssrc/openhuman/agent/memory_loader.rssrc/openhuman/agent/mod.rssrc/openhuman/agent/multimodal.rssrc/openhuman/agent/schemas.rssrc/openhuman/agent/triage/escalation.rssrc/openhuman/agent/triage/events.rssrc/openhuman/agent/triage/routing.rssrc/openhuman/channels/bus.rssrc/openhuman/channels/commands.rssrc/openhuman/channels/context.rssrc/openhuman/channels/routes.rssrc/openhuman/context/debug_dump.rssrc/openhuman/context/prompt.rssrc/openhuman/learning/prompt_sections.rssrc/openhuman/learning/reflection.rssrc/openhuman/learning/tool_tracker.rssrc/openhuman/learning/user_profile.rs
| async fn new_enforces_minimum_limit_and_zero_budget_disables_output() { | ||
| let loader = DefaultMemoryLoader::new(0, 0.4).with_max_chars(0); | ||
| let context = loader.load_context(&MockMemory, "hello").await.unwrap(); | ||
| assert!(context.contains("[Memory context]")); | ||
| } |
There was a problem hiding this comment.
Test name contradicts asserted behavior.
At Line 226, the name says “zero budget disables output”, but Line 229 asserts memory context is present. Please rename to match actual semantics and avoid confusion.
Suggested rename
- async fn new_enforces_minimum_limit_and_zero_budget_disables_output() {
+ async fn new_enforces_minimum_limit_and_zero_budget_is_unlimited() {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async fn new_enforces_minimum_limit_and_zero_budget_disables_output() { | |
| let loader = DefaultMemoryLoader::new(0, 0.4).with_max_chars(0); | |
| let context = loader.load_context(&MockMemory, "hello").await.unwrap(); | |
| assert!(context.contains("[Memory context]")); | |
| } | |
| async fn new_enforces_minimum_limit_and_zero_budget_is_unlimited() { | |
| let loader = DefaultMemoryLoader::new(0, 0.4).with_max_chars(0); | |
| let context = loader.load_context(&MockMemory, "hello").await.unwrap(); | |
| assert!(context.contains("[Memory context]")); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/memory_loader.rs` around lines 226 - 230, The test name
new_enforces_minimum_limit_and_zero_budget_disables_output is misleading because
the assertion on DefaultMemoryLoader::new(0, 0.4).with_max_chars(0) and
load_context(&MockMemory, "hello") expects the memory context to be present;
rename the test function to reflect that zero budget still produces output
(e.g., new_enforces_minimum_limit_and_zero_budget_still_outputs_context) so the
name matches the behavior verified by load_context and the assert on "[Memory
context]".
| sleep(Duration::from_millis(20)).await; | ||
|
|
There was a problem hiding this comment.
Replace fixed sleeps with condition-based waits in async tests.
Using sleep(Duration::from_millis(20)) creates timing-dependent assertions and can intermittently fail in slower CI runs.
🛠️ Suggested deterministic pattern
- use tokio::time::{sleep, Duration};
+ use tokio::time::{timeout, Duration};
@@
- sleep(Duration::from_millis(20)).await;
+ timeout(Duration::from_secs(1), async {
+ loop {
+ if seen.lock().await.iter().any(|event| matches!(
+ event,
+ DomainEvent::TriggerEvaluated { decision, external_id, .. }
+ if decision == "drop" && external_id == "esc-1"
+ )) {
+ break;
+ }
+ tokio::task::yield_now().await;
+ }
+ })
+ .await
+ .expect("timed out waiting for evaluated event");
@@
- sleep(Duration::from_millis(20)).await;
+ timeout(Duration::from_secs(1), async {
+ loop {
+ if seen.lock().await.iter().any(|event| matches!(
+ event,
+ DomainEvent::TriggerEvaluated { decision, external_id, .. }
+ if decision == "acknowledge" && external_id == "esc-1"
+ )) {
+ break;
+ }
+ tokio::task::yield_now().await;
+ }
+ })
+ .await
+ .expect("timed out waiting for evaluated event");Also applies to: 261-262
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/triage/escalation.rs` around lines 226 - 227, Replace the
timing-dependent sleep(Duration::from_millis(20)).await calls with
condition-based waits: instead of sleeping, poll the actual condition you care
about (e.g. a shared state, queue length, or an async flag) inside a timeout
using tokio::time::timeout and either a small awaitable backoff
(tokio::task::yield_now().await) or await a synchronization primitive
(tokio::sync::Notify, watch, or Barrier) that your code under test signals;
update the test(s) in escalation.rs that call sleep(...) to wait for the
concrete predicate (wrapped in timeout) so assertions become deterministic and
CI-safe.
| publish_escalated(&envelope, "trigger_reactor"); | ||
| publish_failed(&envelope, "boom"); | ||
|
|
||
| sleep(Duration::from_millis(20)).await; |
There was a problem hiding this comment.
Avoid fixed-delay waiting in the event-bus test.
Line 89 uses a fixed 20ms sleep, which can race with scheduler timing and intermittently fail on busy runners. Prefer a timeout + predicate loop on captured event count/content.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/triage/events.rs` at line 89, Replace the brittle fixed
wait call sleep(Duration::from_millis(20)).await with a bounded wait-until loop
that polls the event capture predicate and fails on timeout: use
tokio::time::timeout (or Instant+deadline) and repeatedly check the captured
events/count/content in a small sleep loop until the predicate is met or the
timeout elapses, then assert the expected events; this removes the race and
makes the test deterministic (locate the exact call
sleep(Duration::from_millis(20)).await in the test and replace it with the
timeout+predicate loop around the captured events).
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
src/openhuman/agent/memory_loader.rs (1)
226-230:⚠️ Potential issue | 🟡 MinorRename this test to match the asserted behavior.
Line 226 says “zero budget disables output”, but Line 229 asserts output is present. This makes the test intent misleading.
Proposed rename
- async fn new_enforces_minimum_limit_and_zero_budget_disables_output() { + async fn new_enforces_minimum_limit_and_zero_budget_is_unlimited() {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/memory_loader.rs` around lines 226 - 230, The test function new_enforces_minimum_limit_and_zero_budget_disables_output is misnamed because it asserts that output is present; rename it to reflect the asserted behavior (e.g., new_enforces_minimum_limit_and_zero_budget_allows_output) and keep the existing test body that constructs DefaultMemoryLoader::new(0, 0.4).with_max_chars(0), calls load_context(&MockMemory, "hello"), and asserts context.contains("[Memory context]").src/openhuman/agent/triage/escalation.rs (1)
187-187:⚠️ Potential issue | 🟡 MinorReplace fixed sleeps with condition-based waits
These sleeps are timing-dependent and can intermittently fail in CI under load. Please wait on a concrete predicate under
timeoutinstead.🛠️ Deterministic wait pattern
- use tokio::time::{sleep, Duration}; + use tokio::time::{timeout, Duration}; @@ - sleep(Duration::from_millis(20)).await; + timeout(Duration::from_secs(1), async { + loop { + if seen.lock().await.iter().any(|event| matches!( + event, + DomainEvent::TriggerEvaluated { decision, external_id, .. } + if decision == "drop" && external_id == "esc-1" + )) { + break; + } + tokio::task::yield_now().await; + } + }) + .await + .expect("timed out waiting for evaluated event");Also applies to: 243-243, 278-278, 321-321, 362-362
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/triage/escalation.rs` at line 187, The code imports and uses fixed sleeps (tokio::time::sleep, Duration) in escalation.rs which makes tests flaky; replace those sleeps with a condition-based wait under a timeout. Add a small helper (e.g., wait_for_condition or wait_for_predicate) that accepts an async predicate, a total timeout, and a poll interval, and implement it using tokio::time::timeout(...) around a loop that checks the predicate and awaits a short sleep between checks; then replace direct sleep(...) calls in the functions referencing escalation logic (look for uses of sleep and Duration in escalation.rs) with calls to this helper so tests wait for the concrete predicate rather than a fixed delay. Ensure the helper returns a clear Result/timeout error so callers can handle failures.
🧹 Nitpick comments (2)
src/openhuman/agent/harness/session/turn.rs (2)
1171-1179: Guard scripted provider queue exhaustion to avoid panic-based test failures.
remove(0)panics when the scripted response queue is empty, which makes failures less diagnosable than returning a typed error.Proposed fix
async fn chat( &self, request: ChatRequest<'_>, _model: &str, _temperature: f64, ) -> Result<ChatResponse> { self.requests.lock().await.push(request.messages.to_vec()); - self.responses.lock().await.remove(0) + let mut responses = self.responses.lock().await; + if responses.is_empty() { + return Err(anyhow::anyhow!( + "SequenceProvider response script exhausted" + )); + } + responses.remove(0) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/harness/session/turn.rs` around lines 1171 - 1179, The chat method currently does self.responses.lock().await.remove(0) which will panic if the scripted response queue is empty; change it to guard against empty responses and return a typed error instead. Inside chat (function name chat, types ChatRequest/ChatResponse, fields requests/responses), after pushing the request, acquire the responses lock and check if responses.is_empty(); if empty, return an Err with a clear error (e.g., an anyhow/eyre or crate-specific error) indicating the scripted response queue is exhausted; otherwise pop or remove the first element and return it as Ok. Ensure no unwrap/remove(0) is left to avoid panic-based test failures.
1124-1124: Use a bounded wait instead of fixed sleep for async event assertions.The fixed 20ms delay can be flaky under load. Prefer waiting until expected events arrive (with timeout).
Proposed fix
- use tokio::time::{sleep, timeout, Duration}; + use tokio::time::{timeout, Duration}; @@ - sleep(Duration::from_millis(20)).await; + timeout(Duration::from_secs(1), async { + loop { + if events.lock().await.len() >= 2 { + break; + } + tokio::task::yield_now().await; + } + }) + .await + .expect("expected ToolExecutionStarted and ToolExecutionCompleted events"); let captured = events.lock().await;Also applies to: 1453-1455
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/agent/harness/session/turn.rs` at line 1124, Replace the fixed sleep(Duration::from_millis(20)).await used to wait for async events with a bounded wait that uses tokio::time::timeout and actively awaits the expected event or condition; specifically, where you currently call sleep(...) use timeout(Duration::from_millis(<sane-ms>), async { loop { if check_event_or_stream_has_expected_item() { break } yield to the executor (e.g., tokio::task::yield_now().await) } }).await and handle the Err(timeout) case as a test failure; apply this change for all occurrences that import or use sleep, timeout, Duration (including the spots referenced by the review: the usage at the import line and the other occurrences around the second instance).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/agent/triage/escalation.rs`:
- Around line 333-335: The test is pattern-matching a non-existent field `error`
on DomainEvent::TriggerEscalationFailed; update the match arms to use the actual
field name `reason` instead (e.g. DomainEvent::TriggerEscalationFailed {
external_id, reason } if external_id == "esc-1" &&
reason.contains("missing-agent")), and apply the same change for the other
occurrence where TriggerEscalationFailed is matched.
- Around line 189-197: The tests share a fixed external_id ("esc-1" and
similarly "esc-ack", "esc-react-fail", "esc-escalate-fail") causing cross-test
collisions; change the envelope() helper(s) to generate a unique external_id per
invocation (for example by appending a short random value or UUID) when calling
TriggerEnvelope::from_composio so each test publishes/asserts against an
isolated id; apply the same change to the other envelope helpers used in the
tests (the ones that currently return "esc-ack", "esc-react-fail",
"esc-escalate-fail") to prevent parallel-test interference.
---
Duplicate comments:
In `@src/openhuman/agent/memory_loader.rs`:
- Around line 226-230: The test function
new_enforces_minimum_limit_and_zero_budget_disables_output is misnamed because
it asserts that output is present; rename it to reflect the asserted behavior
(e.g., new_enforces_minimum_limit_and_zero_budget_allows_output) and keep the
existing test body that constructs DefaultMemoryLoader::new(0,
0.4).with_max_chars(0), calls load_context(&MockMemory, "hello"), and asserts
context.contains("[Memory context]").
In `@src/openhuman/agent/triage/escalation.rs`:
- Line 187: The code imports and uses fixed sleeps (tokio::time::sleep,
Duration) in escalation.rs which makes tests flaky; replace those sleeps with a
condition-based wait under a timeout. Add a small helper (e.g.,
wait_for_condition or wait_for_predicate) that accepts an async predicate, a
total timeout, and a poll interval, and implement it using
tokio::time::timeout(...) around a loop that checks the predicate and awaits a
short sleep between checks; then replace direct sleep(...) calls in the
functions referencing escalation logic (look for uses of sleep and Duration in
escalation.rs) with calls to this helper so tests wait for the concrete
predicate rather than a fixed delay. Ensure the helper returns a clear
Result/timeout error so callers can handle failures.
---
Nitpick comments:
In `@src/openhuman/agent/harness/session/turn.rs`:
- Around line 1171-1179: The chat method currently does
self.responses.lock().await.remove(0) which will panic if the scripted response
queue is empty; change it to guard against empty responses and return a typed
error instead. Inside chat (function name chat, types ChatRequest/ChatResponse,
fields requests/responses), after pushing the request, acquire the responses
lock and check if responses.is_empty(); if empty, return an Err with a clear
error (e.g., an anyhow/eyre or crate-specific error) indicating the scripted
response queue is exhausted; otherwise pop or remove the first element and
return it as Ok. Ensure no unwrap/remove(0) is left to avoid panic-based test
failures.
- Line 1124: Replace the fixed sleep(Duration::from_millis(20)).await used to
wait for async events with a bounded wait that uses tokio::time::timeout and
actively awaits the expected event or condition; specifically, where you
currently call sleep(...) use timeout(Duration::from_millis(<sane-ms>), async {
loop { if check_event_or_stream_has_expected_item() { break } yield to the
executor (e.g., tokio::task::yield_now().await) } }).await and handle the
Err(timeout) case as a test failure; apply this change for all occurrences that
import or use sleep, timeout, Duration (including the spots referenced by the
review: the usage at the import line and the other occurrences around the second
instance).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4aa53ed4-7cf4-45ec-ae86-16bf485770ff
📒 Files selected for processing (12)
src/openhuman/agent/harness/definition.rssrc/openhuman/agent/harness/interrupt.rssrc/openhuman/agent/harness/mod.rssrc/openhuman/agent/harness/session/builder.rssrc/openhuman/agent/harness/session/runtime.rssrc/openhuman/agent/harness/session/turn.rssrc/openhuman/agent/harness/subagent_runner.rssrc/openhuman/agent/memory_loader.rssrc/openhuman/agent/pformat.rssrc/openhuman/agent/triage/escalation.rssrc/openhuman/agent/triage/evaluator.rssrc/openhuman/agent/triage/mod.rs
✅ Files skipped from review due to trivial changes (7)
- src/openhuman/agent/harness/interrupt.rs
- src/openhuman/agent/triage/mod.rs
- src/openhuman/agent/pformat.rs
- src/openhuman/agent/harness/mod.rs
- src/openhuman/agent/harness/definition.rs
- src/openhuman/agent/triage/evaluator.rs
- src/openhuman/agent/harness/session/builder.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/openhuman/agent/harness/session/runtime.rs
| fn envelope() -> TriggerEnvelope { | ||
| TriggerEnvelope::from_composio( | ||
| "gmail", | ||
| "GMAIL_NEW_GMAIL_MESSAGE", | ||
| "triage-escalation", | ||
| "esc-1", | ||
| json!({ "subject": "hello" }), | ||
| ) | ||
| } |
There was a problem hiding this comment.
Avoid cross-test event collisions from shared external_id
All tests publish/assert on the same external_id ("esc-1") against a global event bus. With parallel test execution, events from one test can satisfy/fail assertions in another.
🛠️ Proposed isolation fix
- fn envelope() -> TriggerEnvelope {
+ fn envelope(external_id: &str) -> TriggerEnvelope {
TriggerEnvelope::from_composio(
"gmail",
"GMAIL_NEW_GMAIL_MESSAGE",
"triage-escalation",
- "esc-1",
+ external_id,
json!({ "subject": "hello" }),
)
}
@@
- apply_decision(run(TriageAction::Drop), &envelope())
+ let external_id = "esc-drop";
+ apply_decision(run(TriageAction::Drop), &envelope(external_id))
@@
- } if decision == "drop" && external_id == "esc-1"
+ } if decision == "drop" && external_id == external_id(Apply the same pattern to the other tests: esc-ack, esc-react-fail, esc-escalate-fail.)
Also applies to: 240-240, 275-275, 313-316, 354-357
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/triage/escalation.rs` around lines 189 - 197, The tests
share a fixed external_id ("esc-1" and similarly "esc-ack", "esc-react-fail",
"esc-escalate-fail") causing cross-test collisions; change the envelope()
helper(s) to generate a unique external_id per invocation (for example by
appending a short random value or UUID) when calling
TriggerEnvelope::from_composio so each test publishes/asserts against an
isolated id; apply the same change to the other envelope helpers used in the
tests (the ones that currently return "esc-ack", "esc-react-fail",
"esc-escalate-fail") to prevent parallel-test interference.
| DomainEvent::TriggerEscalationFailed { external_id, error } | ||
| if external_id == "esc-1" && error.contains("missing-agent") | ||
| ))); |
There was a problem hiding this comment.
Fix TriggerEscalationFailed field match (error → reason)
These assertions pattern-match a non-existent field and will fail to compile against the current DomainEvent::TriggerEscalationFailed shape.
🛠️ Proposed fix
- DomainEvent::TriggerEscalationFailed { external_id, error }
- if external_id == "esc-1" && error.contains("missing-agent")
+ DomainEvent::TriggerEscalationFailed {
+ external_id,
+ reason,
+ ..
+ } if external_id == "esc-1" && reason.contains("missing-agent")- DomainEvent::TriggerEscalationFailed { external_id, error }
- if external_id == "esc-1" && error.contains("missing-agent")
+ DomainEvent::TriggerEscalationFailed {
+ external_id,
+ reason,
+ ..
+ } if external_id == "esc-1" && reason.contains("missing-agent")Also applies to: 374-376
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/agent/triage/escalation.rs` around lines 333 - 335, The test is
pattern-matching a non-existent field `error` on
DomainEvent::TriggerEscalationFailed; update the match arms to use the actual
field name `reason` instead (e.g. DomainEvent::TriggerEscalationFailed {
external_id, reason } if external_id == "esc-1" &&
reason.contains("missing-agent")), and apply the same change for the other
occurrence where TriggerEscalationFailed is matched.
Summary
src/openhuman/agent/harness/session/turn.rs, including transcript resume, full tool-cycle turns, max-iteration failure, hook dispatch, and inline tool-result budgeting.src/openhuman/agent/harness/session/runtime.rs, including accessors, history reset, error sanitization, overlap detection, and native tool-call persistence helpers.src/openhuman/agent/harness/tool_loop.rs, including provider failures, tool execution failures, supervised auto-approval, default iteration fallback, and native tool result persistence.Problem
openhuman::agenttree still had large untested sections in the harness hot path, especially around turn execution, runtime helpers, and the legacy tool loop.Solution
turn.rsso the session lifecycle coverage now includes more of the real runtime path.Submission Checklist
cargo test --lib 'openhuman::agent::'and targeted harness test modulesImpact
src/openhuman/agentincreased, with the main runtime-heavy harness files materially improved:session/turn.rsto87.81%tool_loop.rsto89.52%session/runtime.rsto89.33%Related
openhuman::agentcoverage viamemory_loader,triage/escalation,interrupt, andtriage/routingSummary by CodeRabbit
New Features
Documentation
Tests