
Add marmotd (MLS over Nostr) as optional messaging provider #10

Merged
AnthonyRonning merged 11 commits into master from feat/marmot-messenger
Feb 22, 2026

Conversation


AnthonyRonning commented Feb 21, 2026

Summary

Add marmotd as an optional messaging provider, giving Sage a decentralized MLS-encrypted transport over Nostr relays as an alternative to Signal.

Changes

New files:

  • messenger.rs — Messenger trait abstracting message send/receive, plus shared IncomingMessage/IncomingAttachment types
  • marmot.rs — MarmotClient that spawns marmotd as a subprocess, speaks JSONL over stdio, and handles welcome auto-accept, keypackage publishing, and pubkey -> group routing

Modified:

  • config.rs — MESSENGER env var (signal|marmot), marmot-specific config (relays, state dir, allowed pubkeys, auto-accept), npub (bech32) normalization
  • main.rs — Messenger-polymorphic event loop via Arc<Mutex<dyn Messenger>>, agents keyed by identity (Signal UUID or Nostr pubkey)
  • signal.rs — Implements Messenger trait, uses shared types from messenger.rs
  • Dockerfile — Builds marmotd from pika source (v0.3.2) to avoid GLIBC mismatch with prebuilt binaries
  • justfile — Messenger-aware start/restart/stop; conditionally skips signal-cli for marmot mode, mounts marmot-state volume
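The abstraction above can be pictured as a small trait that main.rs programs against. The following is a simplified synchronous sketch, not the actual code: the real trait is async and lives behind Arc<Mutex<dyn Messenger>>, and the exact signatures here are assumptions.

```rust
/// Shared incoming-message type; fields beyond those mentioned in the PR
/// description (source, message, reply_to) are omitted for brevity.
#[derive(Debug, Clone)]
pub struct IncomingMessage {
    pub source: String,   // Signal UUID or Nostr pubkey
    pub message: String,
    pub reply_to: String, // identity used to route the reply
}

/// Provider-agnostic transport trait (sketch; the real trait is async).
pub trait Messenger {
    fn send_message(&mut self, recipient: &str, text: &str) -> Result<(), String>;
}

/// Stub provider standing in for SignalClient / MarmotClient.
pub struct StubMessenger {
    pub sent: Vec<(String, String)>,
}

impl Messenger for StubMessenger {
    fn send_message(&mut self, recipient: &str, text: &str) -> Result<(), String> {
        self.sent.push((recipient.to_string(), text.to_string()));
        Ok(())
    }
}

fn main() {
    // The event loop only ever sees `dyn Messenger`, so swapping providers
    // becomes a config change rather than a code change.
    let mut messenger: Box<dyn Messenger> = Box::new(StubMessenger { sent: vec![] });
    messenger.send_message("some-identity", "hello").unwrap();
}
```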

Design Decisions

  • Pubkey as identity: Marmot agents are keyed by sender pubkey (not MLS group ID), keeping parity with Signal's 1:1 UUID model. A routing table maps pubkey -> latest group for reply delivery.
  • Optional, not default: MESSENGER=signal remains the default. Set MESSENGER=marmot to switch.
  • Build from source: marmotd prebuilt binaries target Ubuntu 24.04 (GLIBC 2.39), but our runtime is Debian Bookworm (2.36). A dedicated Dockerfile stage builds from the pika repo at the release tag.
  • Future multi-agent: Comments note where per-group agent threads could be layered on top when multi-agent/subagent support lands.
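The pubkey-as-identity decision reduces to a small map from sender pubkey to the last group they messaged from. A minimal standalone sketch follows; the type and method names are hypothetical, and the real table sits behind a Mutex inside MarmotClient.

```rust
use std::collections::HashMap;

/// pubkey -> latest group routing table (hypothetical standalone version).
struct GroupRoutes(HashMap<String, String>);

impl GroupRoutes {
    fn new() -> Self {
        GroupRoutes(HashMap::new())
    }

    /// Record the group a message arrived from; later messages overwrite
    /// earlier ones, so the most recent group becomes the reply target.
    fn record(&mut self, pubkey: &str, group_id: &str) {
        self.0.insert(pubkey.to_string(), group_id.to_string());
    }

    fn reply_group(&self, pubkey: &str) -> Option<&str> {
        self.0.get(pubkey).map(String::as_str)
    }
}

fn main() {
    let mut routes = GroupRoutes::new();
    routes.record("pubkey_a", "group_1");
    routes.record("pubkey_a", "group_2"); // same user, newer group
    // Replies to this user are delivered to the latest group.
    assert_eq!(routes.reply_group("pubkey_a"), Some("group_2"));
}
```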

Configuration

MESSENGER=marmot
MARMOT_RELAYS=wss://relay.damus.io,wss://nos.lol,wss://relay.primal.net
MARMOT_STATE_DIR=/data/marmot-state
MARMOT_ALLOWED_PUBKEYS=npub1...
MARMOT_AUTO_ACCEPT_WELCOMES=true
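The provider switch boils down to parsing one env var at startup. Below is a hedged sketch of that selection; the real MessengerType enum and defaulting logic in config.rs may differ.

```rust
use std::env;

#[derive(Debug, PartialEq)]
enum MessengerType {
    Signal,
    Marmot,
}

/// Parse a MESSENGER value, defaulting to Signal when unset and rejecting
/// unknown values so typos surface at startup rather than at send time.
fn parse_messenger(value: Option<&str>) -> Result<MessengerType, String> {
    match value {
        None | Some("signal") => Ok(MessengerType::Signal),
        Some("marmot") => Ok(MessengerType::Marmot),
        Some(other) => Err(format!("unknown MESSENGER value: {other}")),
    }
}

fn messenger_from_env() -> Result<MessengerType, String> {
    parse_messenger(env::var("MESSENGER").ok().as_deref())
}

fn main() {
    match messenger_from_env() {
        Ok(kind) => println!("selected messenger: {kind:?}"),
        Err(e) => eprintln!("{e}"),
    }
}
```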

Testing

  • All existing tests pass, fmt/clippy clean
  • Docker image builds and runs successfully
  • End-to-end tested: Pika client -> marmotd -> Sage agent -> marmotd -> Pika client

Summary by CodeRabbit

  • New Features

    • Multi-messenger support: choose Signal or Marmot via MESSENGER; unified messaging types and provider selection.
    • Marmot support: optional marmotd sidecar, relay/state config, allowed pubkeys, auto-accept welcomes.
    • Persistent reply context: store and restore per-chat reply_context for routing.
  • Refactor

    • Unified Messenger interface with shared send/receive/typing and health checks.
  • Chores

    • Updated startup tooling to handle Marmot state/volumes.
  • Tests

    • Added unit tests for pubkey normalization.
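The pubkey normalization those tests cover can be illustrated with the hex half of the problem. This standalone sketch lowercases and validates 64-character hex keys; npub (bech32) decoding, which the real normalize_pubkey also handles, is deliberately elided here.

```rust
/// Hypothetical standalone version of hex-pubkey validation. The real
/// normalize_pubkey in marmot.rs also decodes npub (bech32) input.
fn normalize_hex_pubkey(input: &str) -> Result<String, String> {
    let s = input.trim().to_lowercase();
    if s.len() == 64 && s.chars().all(|c| c.is_ascii_hexdigit()) {
        Ok(s)
    } else {
        Err(format!("not a 64-char hex pubkey: {input}"))
    }
}

fn main() {
    let mixed = "ABCDEF0123456789abcdef0123456789ABCDEF0123456789abcdef0123456789";
    // Mixed-case hex is accepted and normalized to lowercase.
    assert!(normalize_hex_pubkey(mixed).is_ok());
    // Bech32 input would need real decoding, which this sketch rejects.
    assert!(normalize_hex_pubkey("npub1notdecodedhere").is_err());
}
```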


AnthonyRonning and others added 2 commits February 21, 2026 15:44
Introduce a Messenger trait abstraction that allows Sage to use either
Signal (default) or Marmot for message transport. Marmot uses the marmotd
sidecar daemon from pika, communicating via JSONL over stdio to provide
MLS-encrypted messaging over Nostr relays.

Key changes:
- messenger.rs: Messenger trait with shared IncomingMessage/IncomingAttachment types
- marmot.rs: MarmotClient spawning marmotd subprocess, JSONL protocol, npub support
- config.rs: MESSENGER env var (signal|marmot) with marmot-specific config fields
- main.rs: Messenger-polymorphic event loop using Arc<Mutex<dyn Messenger>>
- signal.rs: Refactored to implement Messenger trait, uses shared types
- Dockerfile: Build marmotd from pika source to avoid GLIBC mismatch
- justfile: Messenger-aware start/restart/stop with conditional signal-cli

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Treat each Nostr pubkey as a stable identity (like Signal UUID) so the
same user gets the same agent regardless of which MLS group they message
from. MarmotClient maintains a pubkey -> latest group_id routing table
so replies reach the correct group.

This keeps parity with Signal's 1:1 identity model. When multi-agent
support lands, per-group agent threads can be layered on top while
sharing a parent identity for cross-thread memory.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

coderabbitai bot commented Feb 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a Messenger abstraction with Signal and new Marmot backends, Marmot client/daemon orchestration, config fields and DB migration for reply_context, refactors main runtime to use Messenger, updates build/dev tooling (Dockerfile, justfile) to optionally build/run marmotd.

Changes

Cohort / File(s) — Summary

  • Core messaging API (crates/sage-core/src/messenger.rs, crates/sage-core/src/lib.rs) — Add public Messenger trait and the IncomingMessage/IncomingAttachment types; export the marmot and messenger modules.
  • Marmot integration (crates/sage-core/src/marmot.rs) — New MarmotConfig/MarmotClient, bech32/pubkey helpers, marmotd spawn and stdout receive loop, group routing, request IDs, and a Messenger impl. Includes unit tests.
  • Signal provider updates (crates/sage-core/src/signal.rs) — Remove local message structs; use shared messenger types; implement Messenger for SignalClient; populate reply_to and align parsing with the new flow.
  • Configuration & schema (crates/sage-core/src/config.rs, crates/sage-core/src/schema.rs) — Add MessengerType and Marmot-related config fields; marmot_config() and allowed_users() accessors; add a nullable reply_context column to chat_contexts.
  • Agent context & DB (crates/sage-core/src/agent_manager.rs, crates/sage-core/migrations/*) — Add reply_context: Option<String> to ChatContext; methods update_reply_context and load_reply_contexts; migration to add the reply_context column.
  • Application wiring (crates/sage-core/src/main.rs) — Refactor startup and message routing to use the Messenger abstraction; conditional init for Signal or Marmot; unified send/typing/health flows; route by reply_to.
  • Build & dev tooling (Dockerfile, justfile) — Add an optional marmotd-builder stage and copy marmotd into the runtime image; create /data/marmot-state; update justfile tasks to select MESSENGER (signal|marmot).
  • Migrations (crates/sage-core/migrations/2026-02-22-010953-0000_add_reply_context_to_chat_contexts/*) — SQL up/down to add/remove the reply_context column in chat_contexts.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Main as Main.rs
    participant Config as Config
    participant Agent as AgentManager
    participant Messenger as Messenger Trait
    participant Signal as SignalClient
    participant Marmot as MarmotClient
    participant Storage as Storage/LLM

    User->>Main: Start application
    Main->>Config: Load config (MESSENGER)
    alt Messenger = Signal
        Main->>Signal: Initialize Signal client
        Main->>Main: Spawn Signal receive loop
    else Messenger = Marmot
        Main->>Marmot: Spawn marmotd daemon
        Marmot->>Marmot: Publish keypackage / join relays
        Main->>Main: Spawn Marmot receive loop
    end

    loop Receive & process
        Signal->>Main: IncomingMessage
        Marmot->>Main: IncomingMessage
        Main->>Config: allowed_users()
        Main->>Agent: get_or_create_agent(reply_to)
        Agent->>Storage: Generate response
        Storage-->>Agent: Response
        Agent->>Messenger: send_message(recipient, text)
        alt Backend = Signal
            Messenger->>Signal: deliver via SignalClient
        else Backend = Marmot
            Messenger->>Marmot: deliver via MarmotClient (group routing)
        end
    end

    loop Periodic health
        Main->>Messenger: refresh()
        Messenger->>Signal: refresh / reconnect
        Messenger->>Marmot: health check
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped where messengers now meet and play,
Marmot hums up, Signal guards the way,
One trait threads relays and reply,
I thump for routes that let messages fly,
A joyful rabbit queues the day.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 70.97%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: the title accurately and concisely summarizes the main change: adding marmotd as an optional messaging provider alongside Signal.


coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (4)
justfile (2)

130-181: Duplication between start and restart tasks.

The messenger setup logic (lines 139-163) is largely duplicated from the start task. Consider extracting the common messenger configuration into a helper recipe to reduce maintenance burden.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@justfile` around lines 130 - 181, The messenger configuration block
duplicated in the restart recipe should be extracted into a reusable helper
recipe (e.g., messenger_setup) and invoked from both start and restart; move the
logic that sets MESSENGER, MESSENGER_VOLUMES, and MESSENGER_ENV (the conditional
branches for "signal" and "marmot") into that helper, export or source the
resulting variables so restart and start call messenger_setup before running
podman, and update both recipes to remove the duplicated conditional and instead
rely on the shared messenger_setup helper.

90-103: Consider the security implications of eval with environment variables.

Using eval with dynamically constructed command strings is tolerable here because the environment variables come from a user-controlled .env file. However, the pattern is fragile if any variable contains special shell characters.

For future maintainability, consider constructing the podman command using arrays or a more explicit approach, though this is low priority for a development justfile.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@justfile` around lines 90 - 103, The command uses eval to run the podman
command string (the line beginning with eval podman run -d --name sage) which
risks word-splitting and shell injection when variables like $MESSENGER_VOLUMES,
$MESSENGER_ENV, MAPLE_* or BRAVE_API_KEY contain special characters; change to
build and execute the podman invocation without eval (expand variables safely),
e.g., pass arguments directly or construct an argument array and execute podman
with those arguments so quoting and whitespace are preserved and no untrusted
shell evaluation occurs.
crates/sage-core/src/config.rs (1)

92-106: Silent fallback for invalid pubkeys may mask configuration errors.

When normalize_pubkey fails, the code falls back to the original unparsed value via .unwrap_or(p). This means invalid pubkeys will be silently passed through and likely fail later at runtime when marmotd rejects them.

Consider logging a warning or failing fast during config loading to surface configuration mistakes early:

🔧 Suggested improvement
                         .map(|p| {
                             if p == "*" {
                                 p
                             } else {
-                                crate::marmot::normalize_pubkey(&p).unwrap_or(p)
+                                match crate::marmot::normalize_pubkey(&p) {
+                                    Ok(normalized) => normalized,
+                                    Err(e) => {
+                                        tracing::warn!("Invalid pubkey in MARMOT_ALLOWED_PUBKEYS '{}': {} - using as-is", p, e);
+                                        p
+                                    }
+                                }
                             }
                         })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sage-core/src/config.rs` around lines 92 - 106, The parsing of
marmot_allowed_pubkeys silently accepts invalid keys because
crate::marmot::normalize_pubkey(&p).unwrap_or(p) falls back to the raw value;
change this to surface errors during config load: call normalize_pubkey and on
Err either (a) return/propagate an error from the config builder so config
loading fails fast, or (b) log a clear warning and filter out the invalid entry
instead of passing it through. Update the marmot_allowed_pubkeys mapping to
handle the Result from normalize_pubkey explicitly (using map_err/and_then or
matching) and either propagate the error or emit a warning via your logger
before collecting the valid keys.
crates/sage-core/src/marmot.rs (1)

13-40: Consider using the established bech32 crate for npub decoding.

The custom bech32_decode_payload implementation works for the specific use case but skips checksum validation—it simply strips the last 6 characters without verifying they match the expected bech32 checksum. Using the standard bech32 crate would:

  1. Automatically validate the checksum (supporting both Bech32 and Bech32m formats per NIP-19)
  2. Catch malformed or corrupted inputs that pass the current length check
  3. Reduce maintenance burden and align with ecosystem standards

While the current bytes.len() != 32 check in normalize_pubkey provides some protection, checksum validation is the proper way to ensure bech32 integrity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sage-core/src/marmot.rs` around lines 13 - 40, Replace the ad-hoc
bech32_decode_payload implementation with the established bech32 crate: in
bech32_decode_payload, call bech32::decode to parse and validate the
human-readable part and checksum (handling both Bech32 and Bech32m per NIP-19),
then convert the returned 5-bit base32 data to bytes using the crate's
from_base32 helper (or an equivalent 5->8 bit conversion) and return that
Vec<u8> only if checksum/encoding validation succeeds; update any callers such
as normalize_pubkey to use the validated output (remove the strip-last-6
heuristic and rely on the crate to reject malformed inputs).

Comment on lines +208 to +219
let (client, stdout, _child) = marmot::spawn_marmot(&marmot_config)?;
let writer = marmot::writer_handle(&client);
let group_routes = marmot::group_routes_handle(&client);
let messenger: Arc<Mutex<dyn Messenger>> = Arc::new(Mutex::new(client));

let receive_handle = tokio::spawn(async move { run_receive_loop(reader, tx).await });
let receive_handle = tokio::spawn(async move {
marmot::run_marmot_receive_loop(stdout, writer, tx, marmot_config, group_routes)
.await
});

(signal_client, receive_handle)
(messenger, receive_handle)
}

⚠️ Potential issue | 🟡 Minor

Marmotd child process handle is discarded.

The _child process handle from spawn_marmot is ignored with an underscore prefix. This means:

  1. The marmotd process won't be explicitly killed on Sage shutdown
  2. The process could become orphaned if Sage crashes

Consider storing the child handle and killing it during cleanup, similar to how SignalClient handles subprocess mode in its Drop implementation.

🔧 Store child handle for cleanup
-            let (client, stdout, _child) = marmot::spawn_marmot(&marmot_config)?;
+            let (client, stdout, marmot_child) = marmot::spawn_marmot(&marmot_config)?;
             let writer = marmot::writer_handle(&client);
             let group_routes = marmot::group_routes_handle(&client);
             let messenger: Arc<Mutex<dyn Messenger>> = Arc::new(Mutex::new(client));
+            
+            // Store child handle for cleanup at shutdown
+            let marmot_child = Arc::new(std::sync::Mutex::new(Some(marmot_child)));

Then in the shutdown section, kill the marmotd process.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sage-core/src/main.rs` around lines 208 - 219, The marmotd child
handle returned from spawn_marmot is currently discarded as `_child`; change
this to store the handle (e.g., rename `_child` to `child_handle` and keep it
alongside `messenger`/`receive_handle` or inside a shared struct) and ensure it
is explicitly terminated during shutdown (call the child's kill() and wait for
its exit or use a Drop impl similar to SignalClient's subprocess cleanup).
Update the shutdown path that currently drops `receive_handle`/`messenger` to
also attempt a graceful kill and await the child process to avoid orphaned
marmotd processes.

Comment on lines 356 to +360
let attachment_path = format!(
"/signal-cli-data/.local/share/signal-cli/attachments/{}",
attachment.file
);
info!("🖼️ Image attachment detected: {} ({}) at {}", attachment.file, attachment.content_type, attachment_path);
info!("Image attachment detected: {} ({}) at {}", attachment.file, attachment.content_type, attachment_path);

⚠️ Potential issue | 🟠 Major

Hardcoded Signal attachment path breaks image processing for Marmot.

The attachment path is hardcoded to /signal-cli-data/.local/share/signal-cli/attachments/, which only works for Signal. When using Marmot as the messenger, this path won't exist or contain the relevant attachments.

Consider either:

  1. Making the attachment path messenger-aware via config
  2. Skipping vision processing for Marmot until attachment support is added
  3. Including the full attachment path in IncomingAttachment from each provider
🔧 Quick fix to skip vision for non-Signal
                 // Check for image attachments and run vision pre-processing
                 let attachment_text = {
+                    // Vision processing currently only works with Signal attachments
+                    if config.messenger_type != MessengerType::Signal {
+                        None
+                    } else {
                     let image_attachment = msg.attachments.iter().find(|a| vision::is_supported_image(&a.content_type));
                     // ... rest of vision processing
+                    }
                 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sage-core/src/main.rs` around lines 356 - 360, The code builds a
hardcoded attachment_path and logs it (attachment_path, attachment.file,
attachment.content_type) which breaks for non-Signal messengers like Marmot;
update processing to use the provider-supplied full path from IncomingAttachment
if available (prefer IncomingAttachment.path or similar), or else derive a
messenger-aware base path from config based on the message source, and if
neither exists skip vision processing for non-Signal providers (i.e., do not
assume "/signal-cli-data/..."). Locate the attachment handling in main.rs where
attachment_path is formed and replace the hardcoded construction with a check
for IncomingAttachment.path or a messenger-config lookup, falling back to
skipping vision for Marmot.

Comment on lines +304 to +438
// Phase 3: Main receive loop
loop {
line.clear();
match reader.read_line(&mut line) {
Ok(0) => {
warn!("marmotd closed stdout");
break;
}
Ok(_) => {
let event: serde_json::Value = match serde_json::from_str(line.trim()) {
Ok(v) => v,
Err(e) => {
debug!("marmotd non-json output: {} ({})", line.trim(), e);
continue;
}
};
let event_type = event.get("type").and_then(|t| t.as_str()).unwrap_or("");

match event_type {
"welcome_received" => {
let wrapper_id = event
.get("wrapper_event_id")
.and_then(|x| x.as_str())
.unwrap_or("");
let from = event
.get("from_pubkey")
.and_then(|x| x.as_str())
.unwrap_or("unknown");
let group_name = event
.get("group_name")
.and_then(|x| x.as_str())
.unwrap_or("");
info!(
"Marmot welcome received from {} (group: {}, wrapper: {})",
from, group_name, wrapper_id
);

if config.auto_accept_welcomes && !wrapper_id.is_empty() {
let req_id = format!("auto_{}", wrapper_id);
if let Err(e) = send_cmd(json!({
"cmd": "accept_welcome",
"request_id": req_id,
"wrapper_event_id": wrapper_id
})) {
warn!("Failed to auto-accept welcome: {}", e);
}
}
}
"group_joined" => {
let group_id = event
.get("nostr_group_id")
.and_then(|x| x.as_str())
.unwrap_or("unknown");
info!("Marmot joined group: {}", group_id);
}
"message_received" => {
let from_pubkey = event
.get("from_pubkey")
.and_then(|x| x.as_str())
.unwrap_or("");
let content =
event.get("content").and_then(|x| x.as_str()).unwrap_or("");
let group_id = event
.get("nostr_group_id")
.and_then(|x| x.as_str())
.unwrap_or("");
let created_at = event
.get("created_at")
.and_then(|x| x.as_u64())
.unwrap_or(0);

if content.is_empty() {
continue;
}

let preview_end = {
let max_len = 100.min(content.len());
let mut end = max_len;
while end > 0 && !content.is_char_boundary(end) {
end -= 1;
}
end
};
info!(
"Marmot message from {} in group {}: {}",
from_pubkey,
group_id,
&content[..preview_end]
);

// Track pubkey -> latest group for reply routing.
// This means the most recent group a user messages from
// becomes the reply target. When we add multi-agent support,
// each group could maintain its own agent thread instead.
if !from_pubkey.is_empty() && !group_id.is_empty() {
if let Ok(mut routes) = group_routes.lock() {
routes.insert(from_pubkey.to_string(), group_id.to_string());
}
}

let msg = IncomingMessage {
source: from_pubkey.to_string(),
source_name: None,
message: content.to_string(),
attachments: vec![],
timestamp: created_at,
reply_to: from_pubkey.to_string(),
};

if tx.blocking_send(msg).is_err() {
error!("Failed to send marmot message to channel");
break;
}
}
"ok" | "keypackage_published" => {
debug!("marmotd: {}", line.trim());
}
"error" => {
let msg = event
.get("message")
.and_then(|m| m.as_str())
.unwrap_or("unknown");
warn!("marmotd error: {}", msg);
}
_ => {
debug!("marmotd event: {}", line.trim());
}
}
}
Err(e) => {
error!("Error reading from marmotd: {}", e);
break;
}
}
}

⚠️ Potential issue | 🟡 Minor

Marmot receive loop lacks reconnection resilience unlike Signal TCP mode.

When marmotd closes stdout or encounters an error, the loop simply ends with a warning (lines 308-311, 433-436). Unlike the Signal TCP receive loop which has exponential backoff and automatic reconnection, the Marmot loop will terminate and not recover.

Consider adding a similar supervisor pattern that restarts marmotd on failure:

🔧 Example reconnection pattern (in main.rs)
             let receive_handle = tokio::spawn(async move {
-                marmot::run_marmot_receive_loop(stdout, writer, tx, marmot_config, group_routes)
-                    .await
+                let mut backoff = std::time::Duration::from_millis(250);
+                let backoff_max = std::time::Duration::from_secs(60);
+                
+                loop {
+                    match marmot::run_marmot_receive_loop(/* ... */).await {
+                        Ok(()) => {
+                            warn!("Marmot receive loop exited; restarting in {:?}", backoff);
+                        }
+                        Err(e) => {
+                            warn!("Marmot receive loop error; restarting in {:?}: {}", backoff, e);
+                        }
+                    }
+                    tokio::time::sleep(backoff).await;
+                    backoff = (backoff * 2).min(backoff_max);
+                    // Re-spawn marmotd here...
+                }
             });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sage-core/src/marmot.rs` around lines 304 - 438, The receive loop for
marmotd (the loop reading via reader.read_line and handling events into
tx/IncomingMessage, updating group_routes, and calling send_cmd) currently exits
on Ok(0) or Err(_) and never restarts; wrap this logic in a supervisor/reconnect
routine that respawns or reconnects marmotd, re-initializes the reader/stdout
handles and any necessary state (e.g., re-locking group_routes), and retries the
loop with exponential backoff on successive failures (reset backoff on
successful start). On Ok(0) and Err(e) branches, replace the break with
signaling the supervisor to attempt a restart (preserving/closing tx
appropriately and avoiding duplicated receivers), and ensure send_cmd and
channel usage are safe across restarts. Use an exponential backoff timer and a
limited retry policy to avoid tight restart loops.

devin-ai-integration[bot]

This comment was marked as resolved.

When MARMOT_ALLOWED_PUBKEYS=* is configured, don't pass any
--allow-pubkey args to marmotd so it accepts messages from everyone.
Previously normalize_pubkey("*") would error and crash on startup.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
coderabbitai bot left a comment

🧹 Nitpick comments (3)
crates/sage-core/src/marmot.rs (3)

401-405: Silent lock failure may hide issues.

If group_routes.lock() fails (e.g., mutex poisoned), the route update is silently skipped. Consider logging a warning when the lock fails to aid debugging.

🔧 Suggested improvement
                             if !from_pubkey.is_empty() && !group_id.is_empty() {
-                                if let Ok(mut routes) = group_routes.lock() {
-                                    routes.insert(from_pubkey.to_string(), group_id.to_string());
+                                match group_routes.lock() {
+                                    Ok(mut routes) => {
+                                        routes.insert(from_pubkey.to_string(), group_id.to_string());
+                                    }
+                                    Err(e) => {
+                                        warn!("Failed to update group routes: {}", e);
+                                    }
                                 }
                             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sage-core/src/marmot.rs` around lines 401 - 405, The code silently
ignores a failed Mutex lock on group_routes (the if let Ok(mut routes) =
group_routes.lock() { routes.insert(from_pubkey.to_string(),
group_id.to_string()); }) which can hide poisoning or other locking issues;
update the block to handle the Err case by logging a warning (including
from_pubkey and group_id context and the lock error) when group_routes.lock()
returns Err so failures are visible for debugging instead of being silently
skipped.

117-142: Missing retry logic for send_message compared to Signal implementation.

The SignalClient::send_message implementation includes retry logic with up to 3 attempts and automatic reconnection on pipe/connection errors. The MarmotClient::send_message has no such resilience—a single transient write failure will propagate directly to the caller.

Consider adding similar retry logic if marmotd can experience transient communication failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sage-core/src/marmot.rs` around lines 117 - 142, Add retry logic to
MarmotClient::send_message similar to SignalClient: wrap the send_cmd call in a
loop that attempts up to 3 tries (use the same next_request_id()/resolve_group()
for each attempt or regenerate request id per attempt as appropriate), and on
transient errors (pipe/connection/write errors from send_cmd) call the existing
reconnect/reopen logic used by SignalClient before retrying; if all attempts
fail, return the last error. Locate the send flow in MarmotClient::send_message
(uses resolve_group, next_request_id, and send_cmd) and implement the
retry+reconnect around the send_cmd invocation while preserving the log/preview
behavior.
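The retry-plus-reconnect shape the comment suggests can be sketched generically. `send` and `reconnect` are hypothetical closures standing in for `send_cmd` and the reconnect logic; this is not the real `MarmotClient` API, and it assumes `max_attempts >= 1`.

```rust
use std::time::Duration;

// Generic sketch of retry-with-reconnect. `send` and `reconnect` stand in
// for send_cmd and the pipe-reopen logic; not the real MarmotClient API.
fn send_with_retry<E>(
    max_attempts: u32,
    mut send: impl FnMut() -> Result<(), E>,
    mut reconnect: impl FnMut(),
) -> Result<(), E> {
    let mut last_err = None;
    for attempt in 1..=max_attempts {
        match send() {
            Ok(()) => return Ok(()),
            Err(e) => {
                last_err = Some(e);
                if attempt < max_attempts {
                    // Re-open the pipe to marmotd before the next attempt.
                    reconnect();
                    std::thread::sleep(Duration::from_millis(50 * u64::from(attempt)));
                }
            }
        }
    }
    Err(last_err.expect("max_attempts must be >= 1"))
}
```

A production version would only reconnect on errors that look like pipe or connection failures, as the Signal implementation does, rather than unconditionally.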

15-40: Consider using the bech32 crate instead of this custom decoder.

This custom implementation skips checksum validation; while the 32-byte length check in normalize_pubkey provides some protection, it doesn't catch all transmission errors that the bech32 checksum is designed to detect. The bech32 crate (v0.11.1 available) would provide full checksum verification and reduce maintenance burden.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sage-core/src/marmot.rs` around lines 15 - 40, The custom bech32
decoder in bech32_decode_payload skips checksum validation; replace its logic
with the bech32 crate (e.g., bech32::decode) to perform proper checksum
verification and conversion to raw bytes, return None on any decode/convert
error, and update callers (such as normalize_pubkey) to rely on this validated
output; remove reliance on BECH32_CHARSET and the manual bit-manipulation code
and ensure error mapping/logging preserves original behavior (Option<Vec<u8>> on
failure).

AnthonyRonning and others added 2 commits February 21, 2026 18:34
Store the Child handle inside MarmotClient and implement Drop to
kill + wait on the process, preventing orphaned marmotd daemons
on shutdown. Mirrors Signal's subprocess cleanup pattern.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Adds send_typing support (no more unknown variant errors),
--version flag, audio file streaming, and video call rejection.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
devin-ai-integration[bot]

This comment was marked as resolved.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/sage-core/src/marmot.rs (1)

154-168: Verify typing “stop” behavior with marmotd.

stop=true is a no-op. If marmotd supports explicit stop/timeout controls, consider sending them (or document that stop is unsupported).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sage-core/src/marmot.rs` around lines 154 - 168, The send_typing
method currently treats stop=true as a no-op; verify marmotd's protocol and if
it supports explicit stop/timeout semantics, modify send_typing (which uses
resolve_group, next_request_id, and send_cmd) to send an explicit stop typing
command (e.g., include a "stop": true or timeout field in the JSON payload) when
stop is true, otherwise add a clear code comment or documentation stating that
marmotd does not support stopping typing and the method will be a no-op; ensure
the request_id and nostr_group_id logic remains unchanged when emitting the stop
command.
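If marmotd does accept an explicit stop, the emitted JSONL line might look like the sketch below. The `send_typing` command name comes from the commit notes, but the `"stop"` field is an assumed protocol extension; verify it against marmotd before relying on it.

```rust
// Hypothetical wire format for an explicit typing-stop command. The
// "send_typing" command exists per the commit notes, but the "stop" field is
// an assumed protocol extension; verify against marmotd before relying on it.
fn typing_command(request_id: u64, nostr_group_id: &str, stop: bool) -> String {
    format!(
        r#"{{"cmd":"send_typing","request_id":{},"nostr_group_id":"{}","stop":{}}}"#,
        request_id, nostr_group_id, stop
    )
}
```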

Comment on lines +15 to +63
/// Decode a bech32-encoded string (npub1...) into its raw bytes.
fn bech32_decode_payload(s: &str) -> Option<Vec<u8>> {
let pos = s.rfind('1')?;
let data_part = &s[pos + 1..];
if data_part.len() < 6 {
return None;
}
let values: Vec<u8> = data_part
.chars()
.map(|c| BECH32_CHARSET.find(c).map(|i| i as u8))
.collect::<Option<Vec<_>>>()?;
let data_values = &values[..values.len() - 6];
let mut acc: u32 = 0;
let mut bits: u32 = 0;
let mut result = Vec::new();
for &v in data_values {
acc = (acc << 5) | (v as u32);
bits += 5;
if bits >= 8 {
bits -= 8;
result.push((acc >> bits) as u8);
acc &= (1 << bits) - 1;
}
}
Some(result)
}

/// Convert an npub (bech32) or hex pubkey string to hex.
/// Accepts both "npub1..." and raw 64-char hex.
pub fn normalize_pubkey(input: &str) -> Result<String> {
let trimmed = input.trim();
if trimmed.starts_with("npub1") {
let bytes =
bech32_decode_payload(trimmed).ok_or_else(|| anyhow!("invalid npub: {}", trimmed))?;
if bytes.len() != 32 {
return Err(anyhow!(
"npub decoded to {} bytes, expected 32",
bytes.len()
));
}
Ok(bytes.iter().map(|b| format!("{:02x}", b)).collect())
} else if trimmed.len() == 64 && trimmed.chars().all(|c| c.is_ascii_hexdigit()) {
Ok(trimmed.to_lowercase())
} else {
Err(anyhow!(
"invalid pubkey (expected npub1... or 64-char hex): {}",
trimmed
))
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Scripts executed against the repository (AnthonyRonning/sage):

# Search for bech32_decode_payload and related code in the repository
cd crates/sage-core && find . -name "*.rs" -type f | head -20

# Check the full marmot.rs file to understand the context
wc -l crates/sage-core/src/marmot.rs

# Read the marmot.rs file to see the full implementation
cat -n crates/sage-core/src/marmot.rs

# Search for bech32-related dependencies in Cargo.toml
rg "bech32" --type toml

# Search for MARMOT_ALLOWED_PUBKEYS usage
rg "MARMOT_ALLOWED_PUBKEYS" --type rust

# Search for any bech32 tests
rg "bech32" --type rust -A 5

# Check the full Cargo.toml to see current dependencies
find . -name "Cargo.toml" -path "*/sage-core/*" | xargs cat

# Search for any bech32 references in the codebase
rg "bech32" --type rust -C 3


🌐 Web query:

bech32 rust crate crates.io

💡 Result:

The Rust Bech32 crate you likely want is bech32 (from the rust-bitcoin ecosystem). (docs.rs)

  • Docs: docs.rs/bech32 (latest shown as 0.11.1) (docs.rs)
  • What it does: encode/decode Bech32 and supports Bech32m checksums (docs.rs)

Add it to Cargo.toml:

[dependencies]
bech32 = "0.11.1"

(Use the version you see as “latest” on docs.rs/crates.io.)

If you specifically need Bitcoin SegWit address encode/decode, the bech32 docs note that’s handled by bitcoin-bech32. (docs.rs)

Security note: crates.io has had typosquatting/malicious crates incidents; double-check the crate name and publisher before adding dependencies. (blog.rust-lang.org)


Validate bech32 checksum for npub inputs.

bech32_decode_payload discards the checksum (last 6 characters) without validation, allowing typos in MARMOT_ALLOWED_PUBKEYS to decode to unintended pubkeys. A typo that remains within the bech32 charset and decodes to exactly 32 bytes would be silently accepted. Use a spec-compliant decoder such as the bech32 crate (available on crates.io) or add explicit checksum verification.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sage-core/src/marmot.rs` around lines 15 - 63, The
bech32_decode_payload currently just strips the last 6 chars and ignores
checksum, so update normalize_pubkey/bech32_decode_payload to perform full
bech32 checksum validation (or delegate to the bech32 crate) instead of trusting
the payload: use bech32::decode on the full npub string (or implement the
polymod checksum verification) to verify hrp and checksum, then extract the data
bytes and convert from 5-bit words to 8-bit bytes and ensure length==32; if
decode/checksum fails, return Err from normalize_pubkey as you already do for
other errors. Ensure you reference bech32_decode_payload and normalize_pubkey
when making the change so checksum validation replaces the current naive
slice-and-ignore behavior.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (2)
crates/sage-core/src/main.rs (2)

156-179: Consider resetting backoff on prolonged successful connection.

The exponential backoff never resets after a successful period of operation. If the connection runs stably for hours then drops, the backoff will resume from the previous value. Consider resetting backoff to the initial value when a connection has been stable for some threshold duration.

♻️ Optional improvement to reset backoff
                 let receive_handle = tokio::spawn(async move {
                     let mut backoff = std::time::Duration::from_millis(250);
                     let backoff_max = std::time::Duration::from_secs(60);
+                    let backoff_initial = std::time::Duration::from_millis(250);

                     loop {
+                        let start = std::time::Instant::now();
                         match run_receive_loop_tcp(&host, port, &account, tx.clone()).await {
                             Ok(()) => {
+                                // Reset backoff if connection was stable for >5 minutes
+                                if start.elapsed() > std::time::Duration::from_secs(300) {
+                                    backoff = backoff_initial;
+                                }
                                 warn!(
                                     "Signal TCP receive loop exited unexpectedly; restarting in {:?}",
                                     backoff
                                 );
                             }
                             Err(e) => {
+                                if start.elapsed() > std::time::Duration::from_secs(300) {
+                                    backoff = backoff_initial;
+                                }
                                 warn!(
                                     "Signal TCP receive loop error; restarting in {:?}: {}",
                                     backoff, e
                                 );
                             }
                         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sage-core/src/main.rs` around lines 156 - 179, The receive loop's
exponential backoff (variables backoff/backoff_max in the tokio::spawn block
around receive_handle and run_receive_loop_tcp) never resets after long
successful connections; modify the loop to measure how long each run of
run_receive_loop_tcp(&host, port, &account, tx.clone()) lasted (e.g., use
Instant::now() before the await and check elapsed after it returns) and if the
elapsed time exceeds a chosen threshold (e.g., 30s or make it configurable)
reset backoff to the initial Duration::from_millis(250); otherwise keep the
exponential backoff behavior on failures, and continue to sleep and double
backoff as currently implemented.
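The reset rule can be isolated into a small pure function; the 5-minute stability threshold mirrors the reviewer's example and is an arbitrary, tunable value.

```rust
use std::time::Duration;

// Pure-function version of the suggested backoff reset. The 5-minute
// threshold mirrors the reviewer's example; it is an arbitrary, tunable value.
fn restart_delay(uptime: Duration, current_backoff: Duration) -> Duration {
    if uptime > Duration::from_secs(300) {
        Duration::from_millis(250) // stable run: restart the backoff ladder
    } else {
        current_backoff // quick failure: keep the escalated delay
    }
}
```

In the loop, capture `Instant::now()` before `run_receive_loop_tcp` and pass `start.elapsed()` here before sleeping.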

541-554: Abrupt shutdown may leave receive loop in inconsistent state.

Using abort() forcefully terminates the receive task without allowing graceful cleanup. Consider using a cancellation token or shutdown channel to signal the receive loops to exit cleanly before aborting, especially for Marmot which may have pending JSONL I/O.

♻️ Graceful shutdown pattern
+    // Create shutdown signal
+    let (shutdown_tx, mut shutdown_rx) = tokio::sync::oneshot::channel::<()>();
+
     // Main event loop
     loop {
         tokio::select! {
             // ... existing arms ...

             // Handle shutdown
             _ = tokio::signal::ctrl_c() => {
                 info!("Shutting down...");
+                let _ = shutdown_tx.send(());
                 break;
             }
         }
     }

     // Cleanup
-    receive_handle.abort();
+    // Give receive loop a moment to exit gracefully
+    tokio::time::sleep(std::time::Duration::from_millis(100)).await;
+    if !receive_handle.is_finished() {
+        receive_handle.abort();
+    }
     info!("🌿 Sage has shut down.");

Note: The receive loops would need to incorporate shutdown_rx to respond to the signal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sage-core/src/main.rs` around lines 541 - 554, The code currently
calls receive_handle.abort() which forcefully kills the receive task; instead
implement a graceful shutdown signal (e.g., a tokio::sync::watch or broadcast
channel or a CancellationToken) that the receive loop(s) listen to (add a
shutdown_rx parameter to the Marmot receive loop and any other receive tasks),
send the shutdown signal from the ctrl_c handler, then await
receive_handle.join().or_timeout to let the loop flush JSONL I/O and cleanup
before forcibly aborting as a last resort; update the receive loop function
signatures and the task spawn sites to accept the shutdown receiver and return
cleanly on signal.
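A thread-based sketch of the same idea (the async version would use tokio::sync::watch or a CancellationToken, as the comment notes); `receive_loop` is hypothetical, and the sleep stands in for a blocking read from marmotd.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Thread-based sketch of graceful shutdown. `receive_loop` is hypothetical;
// the sleep stands in for a blocking read from marmotd.
fn receive_loop(shutdown_rx: mpsc::Receiver<()>) -> u32 {
    let mut handled = 0;
    loop {
        // Graceful exit point: an explicit signal or a dropped sender both
        // end the loop, so abort() is only ever a last resort.
        match shutdown_rx.try_recv() {
            Ok(()) | Err(mpsc::TryRecvError::Disconnected) => break,
            Err(mpsc::TryRecvError::Empty) => {}
        }
        thread::sleep(Duration::from_millis(5)); // stand-in for one read
        handled += 1;
        if handled >= 100 {
            break; // safety bound so the sketch always terminates
        }
    }
    handled
}
```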

devin-ai-integration[bot]

This comment was marked as resolved.

Add reply_context column to chat_contexts table to store transport-
specific routing info (e.g. Marmot nostr_group_id for a pubkey).
Updated on each incoming message, restored into group_routes on
startup. Ensures scheduled tasks can deliver after restarts.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (2)
crates/sage-core/src/agent_manager.rs (1)

327-341: update_reply_context silently succeeds when no row matches.

If signal_identifier doesn't exist in chat_contexts, the UPDATE will affect 0 rows and return Ok(()). This could mask bugs where reply context is set before the chat context is created.

Consider checking the affected row count or documenting that the context must exist:

🔧 Optional: Verify at least one row was updated
     pub fn update_reply_context(&self, signal_identifier: &str, reply_ctx: &str) -> Result<()> {
         let mut conn = self
             .db_conn
             .lock()
             .map_err(|_| anyhow::anyhow!("Failed to acquire database lock"))?;

-        diesel::update(
+        let rows_updated = diesel::update(
             chat_contexts::table.filter(chat_contexts::signal_identifier.eq(signal_identifier)),
         )
         .set(chat_contexts::reply_context.eq(Some(reply_ctx)))
         .execute(&mut *conn)?;

+        if rows_updated == 0 {
+            tracing::warn!("update_reply_context: no row found for {}", signal_identifier);
+        }
+
         Ok(())
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sage-core/src/agent_manager.rs` around lines 327 - 341, The
update_reply_context function currently ignores the number of rows affected by
the diesel::update on chat_contexts::table and will silently succeed when 0 rows
are updated; change update_reply_context to capture the usize returned by
.execute(&mut *conn) and check it, returning an Err (e.g. anyhow::anyhow!("No
chat_context found for signal_identifier: {}", signal_identifier)) when the
count is 0 so callers know the signal_identifier must exist before setting
chat_contexts::reply_context.
crates/sage-core/src/marmot.rs (1)

302-312: Keypackage publish failure is logged but not propagated.

If the keypackage publish fails (lines 302-311), the loop breaks and continues to the main receive phase. This may be intentional (graceful degradation), but incoming messages requiring a keypackage exchange may fail silently.

Consider either retrying or returning an error if keypackage publishing is critical for operation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/sage-core/src/marmot.rs` around lines 302 - 312, The loop currently
treats a failed keypackage publish (when event_type == "error" and request_id ==
"init_kp") by logging via warn! and breaking, which hides the failure; instead,
modify the handler around that check in marmot.rs so the failure is propagated
or retried: replace the warn! + break behavior with either (a) return an
Err(...) from the enclosing function (or propagate the underlying error) so
callers can handle the failure, or (b) implement a bounded retry (e.g., 3
attempts with backoff) around the keypackage publish before returning an error;
update the branch that inspects event.get("message") / request_id == "init_kp"
to perform one of these actions and ensure the function signature and callers
accept the propagated error.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
devin-ai-integration[bot]

This comment was marked as resolved.

AnthonyRonning and others added 2 commits February 21, 2026 20:45
Phases 1 (wait for ready) and 2 (keypackage publish) now use the same
continue-on-parse-error pattern as Phase 3, preventing silent receive
loop death if marmotd outputs non-JSON lines during startup.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Respawns marmotd on any failure (process exit, read error) with
exponential backoff (250ms to 60s). Each restart re-initializes
stdin/stdout handles, kills the old process, and re-runs all three
phases (ready, keypackage, message loop). Mirrors Signal TCP mode's
supervisor pattern. Stops retrying only when the message channel is
closed (Sage shutting down).

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
devin-ai-integration[bot]

This comment was marked as resolved.

Replace spawn_marmot() with new_marmot_client() which only validates
config and creates placeholder handles. The supervisor loop is now
solely responsible for spawning marmotd, eliminating the race where
two instances ran against the same state directory.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
@AnthonyRonning AnthonyRonning merged commit 1bfbd53 into master Feb 22, 2026
8 checks passed