
docs: enhance Dead Letter Queue documentation with detailed explanations and examples #1330

Open

sergiofilhowz wants to merge 5 commits into main from feat/dlq-documentation

Conversation

sergiofilhowz (Contributor) commented Mar 18, 2026


Summary by CodeRabbit

  • New Features

    • Added a top-level CLI "trigger" subcommand to invoke functions remotely (with examples and behavior).
    • Exposed a DLQ redrive operation to return messages to main queues.
  • Documentation

    • Revamped Dead-Letter Queue guide with inspection, redrive procedures, diagrams, and best practices.
    • Added RabbitMQ queue naming conventions, Queue Architecture page, and a "Trigger Functions from the CLI" guide; updated navigation.
  • Tests

    • Added end-to-end and unit tests covering DLQ redrive and CLI trigger scenarios.

vercel bot commented Mar 18, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| iii-website | Ready | Preview, Comment | Mar 19, 2026 4:10pm |
| motia-docs | Ready | Preview, Comment | Mar 19, 2026 4:10pm |


coderabbitai bot commented Mar 18, 2026

📝 Walkthrough

Adds DLQ operational and redrive documentation, RabbitMQ naming/topology docs, a CLI trigger subcommand (WebSocket invoke), a runtime dependency change, a new iii::queue::redrive builtin function with engine wiring, adapter redrive integration, and end-to-end tests for DLQ redrive.

Changes

  • DLQ guide (docs/how-to/dead-letter-queues.mdx): Rewrote the guide from a config walkthrough to DLQ inspection and manual redrive; added DLQ contents, RabbitMQ naming/topology notes, inspection CLI/UI commands, a redrive workflow, Mermaid diagrams, and SDK/CLI examples; removed prior registration/config examples.
  • Queue architecture & naming (docs/architecture/queues.mdx, docs/how-to/use-queues.mdx, docs/modules/module-queue.mdx): New architecture page and queue naming conventions; documents RabbitMQ resource naming (main/retry/DLQ exchanges and queues), explains the delayed-retry topology, and documents the builtin queue functions (enqueue, iii::queue::redrive).
  • CLI & engine wiring (engine/src/cli_trigger.rs, engine/src/main.rs, engine/Cargo.toml): Added the iii trigger subcommand (WebSocket invoke flow) with TriggerArgs, build_invoke_message, and run_trigger; integrated the subcommand into the main CLI, split server startup, and moved tokio-tungstenite to runtime dependencies.
  • Queue module & adapter (engine/src/modules/queue/queue.rs): Added RedriveInput/RedriveResult DTOs and QueueCoreModule::redrive (iii::queue::redrive), which calls the adapter's redrive_dlq; extended the mock adapter and added unit tests for redrive behavior.
  • E2E tests & CLI tests (engine/tests/dlq_redrive_e2e.rs): New end-to-end tests exercising the DLQ lifecycle, redrive flows, edge cases, and CLI-trigger behavior (parsing, invalid JSON, connection errors, --version).
  • Docs navigation & new how-to (docs/docs.json, docs/how-to/trigger-functions-from-cli.mdx): Added a CLI trigger how-to page and updated docs navigation to include the new architecture and how-to entries.

Sequence Diagram(s)

sequenceDiagram
    participant Worker as Worker
    participant Broker as RabbitMQ
    participant Retry as Retry Queue
    participant DLQ as Dead-Letter Queue
    participant Adapter as Builtin Adapter
    rect rgba(135,206,250,0.5)
    Worker->>Broker: Deliver job
    Broker->>Worker: Consume & process
    alt success
      Worker->>Broker: Ack
    else failure
      Worker->>Broker: Nack (or no-ack)
      Broker->>Retry: Route to retry exchange (x-message-ttl)
      Retry->>Broker: After TTL dead-letter -> main exchange
      loop retries < max_retries
        Broker->>Worker: Redeliver
      end
      alt retries exhausted
        Broker->>DLQ: Route to DLX -> DLQ (preserve metadata)
      end
    end
    end
sequenceDiagram
    participant CLI as iii (cli)
    participant Engine as Engine (ws)
    participant QueueModule as Queue Core
    participant Adapter as Queue Adapter
    CLI->>Engine: WebSocket Invoke (invokefunction: iii::queue::redrive)
    Engine->>QueueModule: Invoke redrive(input.queue)
    QueueModule->>Adapter: redrive_dlq(ns_queue_name)
    alt success
      Adapter-->>QueueModule: redriven_count
      QueueModule-->>Engine: FunctionResult::Success({queue, redriven})
      Engine-->>CLI: Pretty JSON result
    else failure
      Adapter-->>QueueModule: Error
      QueueModule-->>Engine: FunctionResult::Failure(redrive_failed)
      Engine-->>CLI: Error (stderr, exit 1)
    end
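The CLI-to-engine flow in the second diagram hinges on a single `invokefunction` WebSocket message. As a dependency-free sketch (the engine uses serde_json; the JSON is assembled by hand here, and the `data` shape `{"queue": ...}` is inferred from the diagram's `redrive(input.queue)` step), the payload for a redrive invocation looks like this:

```rust
// Sketch of the WebSocket payload the `iii trigger` flow sends, following the
// simplified message shape discussed in this review ("type", "function_id",
// "data"). JSON is built by hand only to keep the example self-contained.

fn build_invoke_message(function_id: &str, data_json: &str) -> String {
    // `data_json` is assumed to already be a valid JSON value.
    format!(r#"{{"type":"invokefunction","function_id":"{function_id}","data":{data_json}}}"#)
}

fn main() {
    let msg = build_invoke_message("iii::queue::redrive", r#"{"queue":"payment"}"#);
    // The engine routes on function_id and passes data to the queue module.
    assert!(msg.contains(r#""function_id":"iii::queue::redrive""#));
    println!("{msg}");
}
```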

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested reviewers

  • andersonleal
  • guibeira

Poem

🐇 I hopped through queues where messages sleep,
Peeked at DLQs in a burrowed heap,
Redrived the lost with a tiny drum—
Back to the main, they scurry and hum,
Hoppity-hop, the workflows leap!

🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 72.09%, below the required 80.00% threshold. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title 'docs: enhance Dead Letter Queue documentation with detailed explanations and examples' directly and accurately summarizes the main change: comprehensive DLQ documentation improvements across multiple files, with enhanced explanations, examples, and architectural details. |



ytallo previously approved these changes Mar 18, 2026

coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/how-to/dead-letter-queues.mdx`:
- Line 80: The doc text incorrectly states that RabbitMQ's native dead-lettering
via x-dead-letter-exchange routes exhausted messages to the DLQ; update the
sentence to explain that the adapter explicitly publishes a new DLQ message
containing wrapped metadata (original job, error, timestamp) instead of relying
on nack/requeue and x-dead-letter-exchange—reference the adapter logic in
retry.rs and publisher.rs to describe that the code constructs and publishes the
DLQ entry rather than triggering native RabbitMQ dead-lettering.
- Around line 185-243: The docs currently suggest using the Shovel plugin and
rabbitmqadmin with ackmode=ack_requeue_false, which move or ACK the raw DLQ
envelope (containing fields job, error, exhausted_at) instead of the safe
redrive behavior used by the adapter; update both recipes to match the adapter:
explicitly extract the nested job object from the DLQ envelope, reset
job.attempts_made = 0, publish that cleaned job to the main
exchange/routing_key, and only ACK the DLQ message after publish success (on
publish failure NACK/Requeue the DLQ message), and remove the Shovel-based “Move
messages” recommendation because it transfers the envelope unchanged.

In `@docs/how-to/use-queues.mdx`:
- Around line 928-930: The callout uses the wrong main queue name; update the
text to reference the actual queue identifier iii.__fn_queue::payment.queue (not
iii.__fn_queue::payment) and ensure the retry and DLQ examples remain consistent
(iii.__fn_queue::payment::retry.queue and iii.__fn_queue::payment::dlq.queue);
change the sentence in the Info block to say "For a queue named `payment`, the
main queue is `iii.__fn_queue::payment.queue`" and make the wording consistent
with the other docs that reference the same naming convention.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 870bd520-d547-4574-bee8-3d82602cf004

📥 Commits

Reviewing files that changed from the base of the PR and between a90a4e6 and 04d4f0e.

📒 Files selected for processing (3)
  • docs/how-to/dead-letter-queues.mdx
  • docs/how-to/use-queues.mdx
  • docs/modules/module-queue.mdx

| DLQ exchange | `iii.__fn_queue::<queue_name>::dlq` |
| DLQ queue | `iii.__fn_queue::<queue_name>::dlq.queue` |

The main queue is configured with `x-dead-letter-exchange` pointing to the DLQ exchange. When the engine `nack`s a message with `requeue=false` after all retries are exhausted, RabbitMQ routes it to the DLQ automatically.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Scripts executed (repository: iii-hq/iii):

# First, find the documentation file and verify its location
find . -path "*/docs/how-to/dead-letter-queues.mdx" -type f

# Next, find the adapter files mentioned in the review
find . -path "*/engine/src/modules/queue/adapters/rabbitmq/retry.rs" -type f
find . -path "*/engine/src/modules/queue/adapters/rabbitmq/publisher.rs" -type f

# Read the documentation file to see the context around line 80
cat -n docs/how-to/dead-letter-queues.mdx | head -100

# Read retry.rs around the dead-letter handling logic
sed -n '40,70p' engine/src/modules/queue/adapters/rabbitmq/retry.rs

# Read publisher.rs around the DLQ publish path
sed -n '69,97p' engine/src/modules/queue/adapters/rabbitmq/publisher.rs

# Check file sizes to understand the full flow
wc -l engine/src/modules/queue/adapters/rabbitmq/retry.rs
wc -l engine/src/modules/queue/adapters/rabbitmq/publisher.rs

Clarify that DLQ entries are explicitly published with wrapped metadata, not routed via native RabbitMQ dead-lettering.

The documented explanation incorrectly describes the operational model. The adapter code (retry.rs and publisher.rs) explicitly publishes a new DLQ entry containing the original job, error, and timestamp, rather than nacking the original message to trigger RabbitMQ's native dead-lettering via x-dead-letter-exchange.
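The wrapped-envelope model the reviewer describes can be sketched in a few lines. This is an illustration, not the adapter's actual code: the envelope field names (`job`, `error`, `exhausted_at`) come from this review, while the struct layouts, `to_dlq_envelope`, and `redrive` helpers are hypothetical stand-ins for the logic in retry.rs and publisher.rs.

```rust
// Sketch of the explicit DLQ publication and redrive model described in this
// review: the adapter wraps the exhausted job with failure metadata and
// publishes a NEW message to the DLQ (no native x-dead-letter-exchange
// routing); redrive extracts the nested job, resets its attempt counter, and
// republishes it to the main exchange. All names here are illustrative.

#[derive(Debug, Clone, PartialEq)]
struct Job {
    payload: String,
    attempts_made: u32,
}

#[derive(Debug, Clone)]
struct DlqEnvelope {
    job: Job,
    error: String,
    exhausted_at: u64, // unix timestamp of retry exhaustion
}

/// Wrap an exhausted job for explicit publication to the DLQ.
fn to_dlq_envelope(job: Job, error: String, now: u64) -> DlqEnvelope {
    DlqEnvelope { job, error, exhausted_at: now }
}

/// Redrive: pull the nested job out of the envelope and reset its attempt
/// counter before republishing to the main exchange. The DLQ message would
/// only be acked after the republish succeeds.
fn redrive(envelope: DlqEnvelope) -> Job {
    let mut job = envelope.job;
    job.attempts_made = 0;
    job
}

fn main() {
    let failed = Job { payload: r#"{"order":42}"#.into(), attempts_made: 3 };
    let envelope = to_dlq_envelope(failed, "handler timed out".into(), 1_742_342_400);
    println!("DLQ entry: error={} at={}", envelope.error, envelope.exhausted_at);
    let redriven = redrive(envelope);
    assert_eq!(redriven.attempts_made, 0);
}
```

This also shows why moving the raw DLQ message with Shovel is unsafe: the consumer would receive the envelope, not the original job.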


Comment on lines +928 to +930
<Info title="RabbitMQ queue naming">
When using the RabbitMQ adapter, iii creates exchanges and queues using a predictable naming convention. For a queue named `payment`, the main queue is `iii.__fn_queue::payment`, the retry queue is `iii.__fn_queue::payment::retry.queue`, and the DLQ is `iii.__fn_queue::payment::dlq.queue`. See [Dead Letter Queues](./dead-letter-queues#dlq-naming-convention) for the full resource map.
</Info>

⚠️ Potential issue | 🟠 Major

Use the actual main queue name in this callout.

iii.__fn_queue::payment is the main exchange, not the queue. The queue users need to inspect is iii.__fn_queue::payment.queue per engine/src/modules/queue/adapters/rabbitmq/naming.rs:1-67, and this already disagrees with docs/modules/module-queue.mdx:172-183.

✏️ Proposed fix
-  When using the RabbitMQ adapter, iii creates exchanges and queues using a predictable naming convention. For a queue named `payment`, the main queue is `iii.__fn_queue::payment`, the retry queue is `iii.__fn_queue::payment::retry.queue`, and the DLQ is `iii.__fn_queue::payment::dlq.queue`. See [Dead Letter Queues](./dead-letter-queues#dlq-naming-convention) for the full resource map.
+  When using the RabbitMQ adapter, iii creates exchanges and queues using a predictable naming convention. For a queue named `payment`, the main queue is `iii.__fn_queue::payment.queue`, the retry queue is `iii.__fn_queue::payment::retry.queue`, and the DLQ queue is `iii.__fn_queue::payment::dlq.queue`. See [Dead Letter Queues](./dead-letter-queues#dlq-naming-convention) for the full resource map.

…ng functions

- Bump version of the "iii", "iii-console", and "iii-sdk" packages to 0.9.0 in Cargo.lock.
- Add new documentation for triggering functions from the CLI, detailing usage and command structure.
- Introduce a new `iii::queue::redrive` function to move messages from the dead-letter queue back to the main queue, with appropriate input and output structures.
- Update existing documentation to reference the new queue architecture and CLI capabilities.

coderabbitai bot left a comment

🧹 Nitpick comments (1)
engine/src/cli_trigger.rs (1)

86-89: Consider returning an error instead of std::process::exit(1).

Using std::process::exit(1) bypasses normal control flow and prevents proper cleanup. Returning an Err would allow main to handle the exit code, improving testability and consistency.

♻️ Proposed refactor
             if let Some(error) = parsed.get("error").filter(|e| !e.is_null()) {
                 eprintln!("Error: {}", serde_json::to_string_pretty(error)?);
-                std::process::exit(1);
+                return Err(anyhow::anyhow!("Function returned an error"));
             }

Then in main.rs, you can handle the error and set the exit code:

match &cli.command {
    Some(Commands::Trigger(args)) => {
        if let Err(e) = cli_trigger::run_trigger(args).await {
            eprintln!("{}", e);
            std::process::exit(1);
        }
        Ok(())
    }
    None => run_serve(&cli).await,
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/cli_trigger.rs` around lines 86 - 89, The code in cli_trigger.rs
currently calls std::process::exit(1) when an "error" field is present; change
run_trigger (or whatever async entry function in cli_trigger.rs) to return a
Result<(), E> (e.g., anyhow::Result or crate error type) instead of exiting,
replace the exit call with returning Err(...) that includes the formatted error,
and propagate that Result to main where main matches Commands::Trigger and maps
Err to printing the error and calling std::process::exit(1) as shown in the
suggested refactor; update any callers of run_trigger to await and handle the
Result accordingly so normal cleanup and testability are preserved.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3e3a90da-2f8f-4cf2-ad50-c159280e5d39

📥 Commits

Reviewing files that changed from the base of the PR and between 04d4f0e and ba4ff9b.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • docs/architecture/queues.mdx
  • docs/docs.json
  • docs/how-to/dead-letter-queues.mdx
  • docs/how-to/trigger-functions-from-cli.mdx
  • docs/how-to/use-queues.mdx
  • docs/modules/module-queue.mdx
  • engine/Cargo.toml
  • engine/src/cli_trigger.rs
  • engine/src/main.rs
  • engine/src/modules/queue/queue.rs
✅ Files skipped from review due to trivial changes (2)
  • docs/docs.json
  • docs/how-to/use-queues.mdx

ytallo previously approved these changes Mar 19, 2026
- Introduce a new test file `dlq_redrive_e2e.rs` to validate the full lifecycle of the dead-letter queue (DLQ) redrive process.
- Implement tests covering enqueueing, failure handling, DLQ management, and reprocessing of messages.
- Ensure integration with the `iii::queue::redrive` function and CLI trigger capabilities.
… config option

- Add a conflict condition to the `use_default_config` argument, ensuring it cannot be combined with the `--config` flag.
… queue modules

- Adjusted formatting in `cli_trigger.rs` and `main.rs` for better alignment and consistency.
- Enhanced readability of function signatures and argument parsing in tests.
- Updated comments and assertions in `dlq_redrive_e2e.rs` for clarity and consistency.
coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
engine/src/main.rs (1)

95-171: Consider adding tests for use_default_config flag.

The test suite covers the trigger subcommand well but doesn't test the --use-default-config flag behavior. Consider adding:

  1. A test that --use-default-config parses successfully without --config
  2. A test that --use-default-config --config custom.yaml fails due to the conflict
💡 Suggested test additions
#[test]
fn use_default_config_parses_without_explicit_config() {
    let cli = Cli::try_parse_from(["iii", "--use-default-config"])
        .expect("should parse with use_default_config");
    assert!(cli.use_default_config);
}

#[test]
fn use_default_config_conflicts_with_explicit_config() {
    let result = Cli::try_parse_from([
        "iii",
        "--use-default-config",
        "--config",
        "custom.yaml",
    ]);
    assert!(result.is_err(), "should fail when both flags are provided");
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/main.rs` around lines 95 - 171, Add two unit tests in the tests
module to cover the use_default_config flag: one test named
use_default_config_parses_without_explicit_config that calls
Cli::try_parse_from(["iii", "--use-default-config"]) and asserts
cli.use_default_config is true, and one test named
use_default_config_conflicts_with_explicit_config that calls
Cli::try_parse_from(["iii", "--use-default-config", "--config", "custom.yaml"])
and asserts the result is an error; place them alongside the existing tests that
reference Cli and Commands so they exercise the same parsing logic.
engine/src/cli_trigger.rs (2)

59-67: Consider logging malformed server messages.

The unwrap_or_default() silently ignores JSON parse failures. While this is acceptable for skipping irrelevant messages, malformed messages from the server could indicate a protocol issue. Consider logging at debug/trace level:

let parsed: serde_json::Value = match serde_json::from_str(&text) {
    Ok(v) => v,
    Err(e) => {
        tracing::trace!("Ignoring non-JSON message: {}", e);
        continue;
    }
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/cli_trigger.rs` around lines 59 - 67, The loop reading messages
from the WebSocket uses serde_json::from_str(...).unwrap_or_default() which
silently swallows JSON parse errors; update the handling in the loop that calls
socket.next() and matches WsMessage::Text (the block that assigns parsed) to
explicitly match serde_json::from_str(&text) and on Err(e) call tracing::trace!
(or tracing::debug!) with the error and the raw text (or a truncated form), then
continue the loop so malformed server messages are logged but ignored; keep the
existing check for parsed.get("type") == Some("workerregistered") unchanged.

29-39: Consider omitting explicit null fields.

The server-side InvokeFunction definition (in engine/src/protocol.rs) marks invocation_id, traceparent, baggage, and action with #[serde(skip_serializing_if = "Option::is_none")], meaning these fields are optional. The explicit null values work but are unnecessary—the server will correctly default them to None if omitted.

💡 Simplified message construction
 pub fn build_invoke_message(function_id: &str, data: serde_json::Value) -> serde_json::Value {
     serde_json::json!({
         "type": "invokefunction",
-        "invocation_id": null,
         "function_id": function_id,
         "data": data,
-        "traceparent": null,
-        "baggage": null,
-        "action": null,
     })
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/cli_trigger.rs` around lines 29 - 39, The build_invoke_message
function currently includes explicit nulls for optional fields; remove the
"invocation_id", "traceparent", "baggage", and "action" keys from the JSON so
they are omitted instead of set to null (matching the server-side InvokeFunction
#[serde(skip_serializing_if = "Option::is_none")] behavior). Update
build_invoke_message to only emit "type", "function_id", and "data" (and any
non-optional fields) so the server will treat the missing fields as None.
engine/tests/dlq_redrive_e2e.rs (1)

240-247: Consider using deterministic synchronization instead of fixed sleeps.

The test uses fixed tokio::time::sleep durations to wait for async queue processing. While the timeouts appear generous (2000ms for max_retries=2, backoff_ms=50), this pattern can be flaky under CI load. Consider using a polling loop with timeout instead:

// Example: poll until condition or timeout
let deadline = tokio::time::Instant::now() + Duration::from_secs(5);
while dlq_count(&engine, "orders").await == 0 {
    if tokio::time::Instant::now() > deadline {
        panic!("Timeout waiting for DLQ entry");
    }
    tokio::time::sleep(Duration::from_millis(50)).await;
}

This is a nice-to-have improvement for test reliability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/tests/dlq_redrive_e2e.rs` around lines 240 - 247, Replace the fixed
tokio::time::sleep(Duration::from_millis(2000)).await with a deterministic
polling loop that waits until the desired condition is met or a deadline
expires: repeatedly check dlq_count(&engine, "orders").await (or inspect
fail_count.load(Ordering::SeqCst)) in a loop using tokio::time::Instant::now() +
timeout as the deadline, sleeping a short interval (e.g., 50ms) between polls,
and panic/err if the deadline is exceeded; after the loop assert the expected
range for total_failures as before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d07affd0-c311-4733-97c3-5d8f74698af6

📥 Commits

Reviewing files that changed from the base of the PR and between ba4ff9b and eae843c.

📒 Files selected for processing (4)
  • engine/src/cli_trigger.rs
  • engine/src/main.rs
  • engine/src/modules/queue/queue.rs
  • engine/tests/dlq_redrive_e2e.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • engine/src/modules/queue/queue.rs

Comment on lines +81 to +84
if let Some(error) = parsed.get("error").filter(|e| !e.is_null()) {
eprintln!("Error: {}", serde_json::to_string_pretty(error)?);
std::process::exit(1);
}

⚠️ Potential issue | 🟠 Major

Avoid std::process::exit(1) in library code.

Calling std::process::exit(1) directly bypasses normal Rust cleanup (drop handlers, async runtime shutdown) and makes the function untestable for error cases. Instead, return an Err and let the caller in main.rs decide how to exit.

🐛 Proposed fix

In cli_trigger.rs:

             if let Some(error) = parsed.get("error").filter(|e| !e.is_null()) {
-                eprintln!("Error: {}", serde_json::to_string_pretty(error)?);
-                std::process::exit(1);
+                return Err(anyhow::anyhow!(
+                    "Function invocation failed: {}",
+                    serde_json::to_string_pretty(error)?
+                ));
             }

In main.rs, update the trigger dispatch to handle the error:

     match &cli.command {
-        Some(Commands::Trigger(args)) => cli_trigger::run_trigger(args).await,
+        Some(Commands::Trigger(args)) => {
+            if let Err(e) = cli_trigger::run_trigger(args).await {
+                eprintln!("Error: {}", e);
+                std::process::exit(1);
+            }
+            Ok(())
+        }
         None => run_serve(&cli).await,
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/src/cli_trigger.rs` around lines 81 - 84, The code in cli_trigger.rs
currently prints the parsed error and calls std::process::exit(1) which prevents
proper cleanup and makes the function untestable; modify the block that checks
parsed.get("error") so it returns an Err (e.g., propagate a anyhow::Error or
your crate's Error type) instead of calling std::process::exit(1), preserving
the serde_json::to_string_pretty(error)? message in the error; then update the
caller in main.rs where the trigger dispatch is invoked to handle the Result
(map or match the Err to print the same error and call std::process::exit(1)
only at the top-level main) so exit remains only in main.

