
Bound external fallback provider execution #32

Merged
jmcte merged 5 commits into main from
codex/issue-3-provider-timeout-rate-limit
May 7, 2026

Conversation

@jmcte (Contributor) commented May 2, 2026

Summary

  • add configurable timeout and per-session invocation limits for external fallback providers
  • kill timed-out provider processes and return a structured timeout error
  • persist invocation counts in the native app runtime directory and document defaults/config knobs

Closes #3

Validation

  • cargo fmt --manifest-path rust/Cargo.toml -- --check
  • cargo test --manifest-path rust/Cargo.toml native_app::tests::external_provider_ -- --nocapture
  • cargo test --manifest-path rust/Cargo.toml external_provider -- --nocapture
  • cargo test --manifest-path rust/Cargo.toml native_app::tests::login_can_fallback_to_1password_cli
  • cargo test --manifest-path rust/Cargo.toml native_app::tests::login_bitwarden_fallback_matches_uri_before_selecting_item
  • cargo test --manifest-path rust/Cargo.toml --test security_regressions
  • cargo test --manifest-path rust/Cargo.toml --test native_app_e2e login_works_via_direct_fallback_when_service_not_running
  • cargo test --manifest-path rust/Cargo.toml --test native_app_e2e fill_works_via_direct_fallback_when_service_not_running
  • cargo test --manifest-path rust/Cargo.toml --test native_app_e2e direct_fallback_maps_malformed_response_to_proto_invalid_response
  • git diff --check

Notes

  • Full native_app_e2e still hangs in existing socket-backed tests (fill_uses_fill_intent_over_socket); the direct fallback e2e cases relevant to this issue passed individually.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e7a06378ab

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread rust/src/native_app.rs
Comment on lines +822 to +826
let status = loop {
    if let Some(status) = child.try_wait().map_err(|error| {
        APWError::new(
            Status::ProcessNotRunning,
            format!(

P1: Read provider output while enforcing timeout

This loop waits for the child to exit via try_wait() before draining stdout/stderr, even though both streams are configured as pipes. If the provider emits more than a pipe buffer (common for large bw list items or op item list JSON), the child can block on write and never reach exit, so APW hits the timeout path and kills a healthy process. The regression is specific to commands with larger output and causes false CommunicationTimeout errors under normal usage with bigger vaults.
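
A minimal sketch of one way to avoid the blocked-writer problem described above; it is not the code from this PR. Both pipes are drained on background threads while the polling loop enforces the deadline, so a provider that writes more than a pipe buffer can still exit. The names `run_with_timeout` and `RunError` are hypothetical, and the real code reports failures through `APWError` rather than this enum.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical error type for this sketch only.
#[derive(Debug)]
enum RunError {
    Spawn(String),
    Timeout,
    Wait(String),
}

/// Runs `command`, returning (exit success, stdout bytes, stderr bytes),
/// or `RunError::Timeout` if it does not exit within `timeout`.
fn run_with_timeout(
    mut command: Command,
    timeout: Duration,
) -> Result<(bool, Vec<u8>, Vec<u8>), RunError> {
    let mut child = command
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .map_err(|e| RunError::Spawn(e.to_string()))?;

    // Start draining both pipes before polling, so the child never
    // blocks on a full pipe buffer while we wait for it to exit.
    let mut stdout = child.stdout.take().expect("stdout was piped");
    let mut stderr = child.stderr.take().expect("stderr was piped");
    let out_reader = thread::spawn(move || {
        let mut buf = Vec::new();
        let _ = stdout.read_to_end(&mut buf);
        buf
    });
    let err_reader = thread::spawn(move || {
        let mut buf = Vec::new();
        let _ = stderr.read_to_end(&mut buf);
        buf
    });

    let deadline = Instant::now() + timeout;
    let status = loop {
        match child.try_wait() {
            Ok(Some(status)) => break status,
            Ok(None) if Instant::now() >= deadline => {
                // Kill and reap the child; the reader threads are left
                // to finish on their own once the pipes close.
                let _ = child.kill();
                let _ = child.wait();
                return Err(RunError::Timeout);
            }
            Ok(None) => thread::sleep(Duration::from_millis(10)),
            Err(e) => return Err(RunError::Wait(e.to_string())),
        }
    };

    let out = out_reader.join().unwrap_or_default();
    let err = err_reader.join().unwrap_or_default();
    Ok((status.success(), out, err))
}
```

With this shape, a command that prints a few hundred kilobytes completes normally, while a command that sleeps past the deadline is killed and reported as a timeout.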


Contributor Author


@codex please review



@athena-omt athena-omt left a comment


Found one substantive blocker.

reserve_external_provider_invocation implements the per-session cap with an unprotected read/modify/write of fallback-provider-session.json. Concurrent native-app fallback requests can all read the same invocations value, each pass the limit check, and then overwrite the file with the same incremented count. That means the configured cap is not actually enforceable under parallel browser/native-host requests, which is the security boundary this PR is trying to add.

Suggested fix: make the reservation atomic across processes, e.g. lock a sidecar/state file with flock/fs2, then read-check-write while holding the lock; or use an atomic append/counter mechanism. Add a concurrency regression test with max invocations set low.
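
A rough sketch of the read-check-write-under-lock idea, using only the standard library. Two things are assumptions made to keep it self-contained: the lock is a sidecar file created atomically with `create_new` (a stand-in for `flock`/`fs2`, not the same mechanism), and the state file holds a plain integer rather than the JSON in `fallback-provider-session.json`. The names `acquire_lock`, `LockGuard`, and `reserve_invocation` are hypothetical.

```rust
use std::fs::{self, OpenOptions};
use std::io::ErrorKind;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::{Duration, Instant};

/// Removes the sidecar lock file when dropped, releasing the lock.
struct LockGuard(PathBuf);

impl Drop for LockGuard {
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.0);
    }
}

/// Acquires the lock by atomically creating `lock_path`; retries until `wait` elapses.
fn acquire_lock(lock_path: &Path, wait: Duration) -> Result<LockGuard, String> {
    let deadline = Instant::now() + wait;
    loop {
        match OpenOptions::new().write(true).create_new(true).open(lock_path) {
            Ok(_) => return Ok(LockGuard(lock_path.to_path_buf())),
            Err(e) if e.kind() == ErrorKind::AlreadyExists => {
                if Instant::now() >= deadline {
                    return Err("timed out waiting for lock".to_string());
                }
                thread::sleep(Duration::from_millis(2));
            }
            Err(e) => return Err(format!("could not create lock file: {e}")),
        }
    }
}

/// Reserves one invocation, failing once `max` have been recorded.
/// The read, the limit check, and the write all happen while the lock is held.
fn reserve_invocation(state_path: &Path, max: u64) -> Result<u64, String> {
    let lock_path = state_path.with_extension("lock");
    let _guard = acquire_lock(&lock_path, Duration::from_secs(5))?;

    // Assumption: the state file contains just the count as text.
    let current = fs::read_to_string(state_path)
        .ok()
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(0);
    if current >= max {
        return Err(format!("invocation limit {max} reached"));
    }
    fs::write(state_path, (current + 1).to_string()).map_err(|e| e.to_string())?;
    Ok(current + 1)
}
```

One caveat of this stand-in: if a process dies while holding the lock, the sidecar file stays behind until the wait expires, whereas a kernel-level `flock` is released automatically when the process exits. A concurrency regression test along the lines suggested above would start several callers against a low `max` and assert that exactly `max` of them succeed.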

Validation note: git diff --check origin/main...HEAD passed. I attempted the targeted Rust fallback-provider tests, but this Athena Linux environment lacks OpenSSL pkg-config/dev metadata, so the local cargo run failed before tests executed. GitHub CI for the PR is green.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment



Comment thread rust/src/native_app.rs
Comment on lines +788 to +790
let limits = external_provider_limits(config);
reserve_external_provider_invocation(provider, &limits)?;


P2: Reserve invocations only after spawn succeeds

The invocation counter is incremented before command.spawn() runs, so failed launches (for example, missing execute permission or a transient ENOENT) still consume the session budget. With a low fallbackProviderMaxInvocations, users can be locked out by repeated spawn failures even though no external provider process actually ran, which contradicts the configured “invocations” semantics and makes recovery depend on reauth/editing config.
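
The ordering this comment asks for could look roughly like the sketch below: check capacity, spawn, and count the invocation only once the spawn has succeeded. `InvocationBudget` and `spawn_counted` are hypothetical names, and the counter is kept in memory here purely for illustration, whereas the PR persists it in the runtime directory.

```rust
use std::io;
use std::process::{Child, Command};

// Hypothetical in-memory stand-in for the persisted session counter.
struct InvocationBudget {
    used: u32,
    max: u32,
}

impl InvocationBudget {
    fn has_capacity(&self) -> bool {
        self.used < self.max
    }

    fn commit(&mut self) {
        self.used += 1;
    }
}

/// Spawns `command` if the budget allows it, and counts the invocation
/// only after the process has actually been started.
fn spawn_counted(budget: &mut InvocationBudget, command: &mut Command) -> io::Result<Child> {
    if !budget.has_capacity() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "invocation limit reached",
        ));
    }
    // A failed spawn returns early here and leaves the budget untouched.
    let child = command.spawn()?;
    budget.commit();
    Ok(child)
}
```

With a persisted counter shared between processes, the capacity check and the commit would need to happen under the same lock, otherwise splitting them reopens the race between concurrent callers.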


Contributor Author


@codex please review

Comment thread rust/src/native_app.rs
Comment on lines +738 to +742
let state = fs::read_to_string(&path)
    .ok()
    .and_then(|content| serde_json::from_str::<Value>(&content).ok());
let current_count = state
    .as_ref()

P2: Make invocation count update atomic across processes

This reads the state file and later writes back an incremented count without any inter-process lock, so concurrent apw calls can both observe the same current_count and proceed, allowing the configured max to be exceeded. The limit is documented as a hard session bound, but this read-modify-write race makes enforcement unreliable under parallel logins/fills.


Contributor Author


@codex please review



@athena-omt athena-omt left a comment


I could not complete a source-backed review because the required Athena worker node is currently not connected. The mandated first smoke/setup check against /openclaw-data/src/_worktrees/athena/omt-global/apw-cli/pr-32 failed before any repository inspection could run.

I’m not approving or requesting changes from source-only/coordinator inspection. The existing CHANGES_REQUESTED review about non-atomic fallback invocation reservation remains the current substantive review signal until a worker-backed follow-up can run.

@jmcte jmcte merged commit f9ff88f into main May 7, 2026
9 of 10 checks passed

Development

Successfully merging this pull request may close these issues.

Security: add rate-limiting and invocation timeout for external fallback provider

2 participants