Add marmotd (MLS over Nostr) as optional messaging provider #10
AnthonyRonning merged 11 commits into master from
Conversation
Introduce a Messenger trait abstraction that allows Sage to use either Signal (default) or Marmot for message transport. Marmot uses the marmotd sidecar daemon from pika, communicating via JSONL over stdio to provide MLS-encrypted messaging over Nostr relays.

Key changes:
- messenger.rs: Messenger trait with shared IncomingMessage/IncomingAttachment types
- marmot.rs: MarmotClient spawning a marmotd subprocess, JSONL protocol, npub support
- config.rs: MESSENGER env var (signal|marmot) with marmot-specific config fields
- main.rs: messenger-polymorphic event loop using Arc<Mutex<dyn Messenger>>
- signal.rs: refactored to implement the Messenger trait, uses the shared types
- Dockerfile: build marmotd from pika source to avoid a GLIBC mismatch
- justfile: messenger-aware start/restart/stop with conditional signal-cli

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
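As context for reviewers, the JSONL-over-stdio framing can be sketched with the standard library alone; the command shapes below are illustrative, not marmotd's actual schema:

```rust
use std::io::{BufRead, BufReader, Cursor, Write};

// Each command or event is one JSON object per line ("JSONL").
fn write_frame(w: &mut impl Write, json: &str) -> std::io::Result<()> {
    w.write_all(json.as_bytes())?;
    w.write_all(b"\n")
}

// Read back complete frames, skipping blank lines.
fn read_frames(r: impl BufRead) -> Vec<String> {
    r.lines()
        .map_while(Result::ok)
        .filter(|l| !l.trim().is_empty())
        .collect()
}

fn main() -> std::io::Result<()> {
    let mut pipe = Vec::new(); // stands in for marmotd's stdin/stdout
    write_frame(&mut pipe, r#"{"cmd":"send_message","request_id":"1"}"#)?;
    write_frame(&mut pipe, r#"{"type":"ok","request_id":"1"}"#)?;
    let frames = read_frames(BufReader::new(Cursor::new(pipe)));
    assert_eq!(frames.len(), 2);
    assert!(frames[1].contains(r#""type":"ok""#));
    Ok(())
}
```

One object per newline-terminated line keeps framing trivial on both sides of the pipe; anything that fails to parse as JSON can be logged and skipped.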
Treat each Nostr pubkey as a stable identity (like a Signal UUID) so the same user gets the same agent regardless of which MLS group they message from. MarmotClient maintains a pubkey -> latest group_id routing table so replies reach the correct group. This keeps parity with Signal's 1:1 identity model. When multi-agent support lands, per-group agent threads can be layered on top while sharing a parent identity for cross-thread memory.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
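The pubkey -> latest-group routing described above can be sketched roughly like this (types simplified; the real MarmotClient wraps the map in Arc<Mutex<...>> and the names here are assumed):

```rust
use std::collections::HashMap;

/// Maps a Nostr pubkey to the most recent MLS group it messaged from,
/// so replies go back to that group (mirrors Signal's 1:1 identity model).
#[derive(Default)]
struct GroupRoutes {
    latest: HashMap<String, String>, // pubkey -> nostr_group_id
}

impl GroupRoutes {
    fn record(&mut self, pubkey: &str, group_id: &str) {
        self.latest.insert(pubkey.to_string(), group_id.to_string());
    }
    fn resolve(&self, pubkey: &str) -> Option<&str> {
        self.latest.get(pubkey).map(String::as_str)
    }
}

fn main() {
    let mut routes = GroupRoutes::default();
    routes.record("npub1alice", "group-a");
    routes.record("npub1alice", "group-b"); // later message from another group
    // Replies follow the most recent group the user spoke from.
    assert_eq!(routes.resolve("npub1alice"), Some("group-b"));
    assert_eq!(routes.resolve("npub1bob"), None);
}
```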
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior can be configured in the CodeRabbit settings.
📝 Walkthrough

Adds a Messenger abstraction with Signal and new Marmot backends, Marmot client/daemon orchestration, config fields and a DB migration for reply_context, refactors the main runtime to use Messenger, and updates build/dev tooling (Dockerfile, justfile) to optionally build/run marmotd.
Sequence Diagram(s)

sequenceDiagram
actor User
participant Main as Main.rs
participant Config as Config
participant Agent as AgentManager
participant Messenger as Messenger Trait
participant Signal as SignalClient
participant Marmot as MarmotClient
participant Storage as Storage/LLM
User->>Main: Start application
Main->>Config: Load config (MESSENGER)
alt Messenger = Signal
Main->>Signal: Initialize Signal client
Main->>Main: Spawn Signal receive loop
else Messenger = Marmot
Main->>Marmot: Spawn marmotd daemon
Marmot->>Marmot: Publish keypackage / join relays
Main->>Main: Spawn Marmot receive loop
end
loop Receive & process
Signal->>Main: IncomingMessage
Marmot->>Main: IncomingMessage
Main->>Config: allowed_users()
Main->>Agent: get_or_create_agent(reply_to)
Agent->>Storage: Generate response
Storage-->>Agent: Response
Agent->>Messenger: send_message(recipient, text)
alt Backend = Signal
Messenger->>Signal: deliver via SignalClient
else Backend = Marmot
Messenger->>Marmot: deliver via MarmotClient (group routing)
end
end
loop Periodic health
Main->>Messenger: refresh()
Messenger->>Signal: refresh / reconnect
Messenger->>Marmot: health check
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
🧹 Nitpick comments (4)

justfile (2)

130-181: Duplication between `start` and `restart` tasks.

The messenger setup logic (lines 139-163) is largely duplicated from the `start` task. Consider extracting the common messenger configuration into a helper recipe to reduce maintenance burden.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@justfile` around lines 130 - 181, The messenger configuration block duplicated in the restart recipe should be extracted into a reusable helper recipe (e.g., messenger_setup) and invoked from both start and restart; move the logic that sets MESSENGER, MESSENGER_VOLUMES, and MESSENGER_ENV (the conditional branches for "signal" and "marmot") into that helper, export or source the resulting variables so restart and start call messenger_setup before running podman, and update both recipes to remove the duplicated conditional and instead rely on the shared messenger_setup helper.
90-103: Consider the security implications of `eval` with environment variables.

Using `eval` with dynamically constructed command strings works here since the environment variables are sourced from a user-controlled `.env` file. However, this pattern can be fragile if any variable contains special shell characters. For future maintainability, consider constructing the podman command using arrays or a more explicit approach, though this is low priority for a development justfile.
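A hedged sketch of the array-based approach this comment suggests. The `podman` function is a stub so the sketch runs anywhere, and the variable names are illustrative rather than the justfile's exact ones:

```shell
# Stub standing in for the real podman binary, so the sketch runs anywhere:
# it prints each argument it received wrapped in <...> on one line.
podman() { printf '<%s>' "$@"; echo; }

# A value with spaces, a semicolon, and a command substitution that must
# NOT be evaluated. With eval, this string could word-split or execute.
BRAVE_API_KEY='key with spaces; $(not expanded)'

# Build the invocation as an argument array instead of a command string.
args=(run -d --name sage -e "BRAVE_API_KEY=$BRAVE_API_KEY")
podman "${args[@]}"
# -> <run><-d><--name><sage><-e><BRAVE_API_KEY=key with spaces; $(not expanded)>
```

Because each element of `args` expands as a single word, quoting and whitespace survive intact and nothing passes through a second round of shell evaluation.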
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@justfile` around lines 90 - 103, The command uses eval to run the podman command string (the line beginning with eval podman run -d --name sage) which risks word-splitting and shell injection when variables like $MESSENGER_VOLUMES, $MESSENGER_ENV, MAPLE_* or BRAVE_API_KEY contain special characters; change to build and execute the podman invocation without eval (expand variables safely), e.g., pass arguments directly or construct an argument array and execute podman with those arguments so quoting and whitespace are preserved and no untrusted shell evaluation occurs.

crates/sage-core/src/config.rs (1)
92-106: Silent fallback for invalid pubkeys may mask configuration errors.

When `normalize_pubkey` fails, the code falls back to the original unparsed value via `.unwrap_or(p)`. This means invalid pubkeys will be silently passed through and will likely fail later at runtime when marmotd rejects them. Consider logging a warning or failing fast during config loading to surface configuration mistakes early:
🔧 Suggested improvement

```diff
 .map(|p| {
     if p == "*" {
         p
     } else {
-        crate::marmot::normalize_pubkey(&p).unwrap_or(p)
+        match crate::marmot::normalize_pubkey(&p) {
+            Ok(normalized) => normalized,
+            Err(e) => {
+                tracing::warn!("Invalid pubkey in MARMOT_ALLOWED_PUBKEYS '{}': {} - using as-is", p, e);
+                p
+            }
+        }
     }
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sage-core/src/config.rs` around lines 92 - 106, The parsing of marmot_allowed_pubkeys silently accepts invalid keys because crate::marmot::normalize_pubkey(&p).unwrap_or(p) falls back to the raw value; change this to surface errors during config load: call normalize_pubkey and on Err either (a) return/propagate an error from the config builder so config loading fails fast, or (b) log a clear warning and filter out the invalid entry instead of passing it through. Update the marmot_allowed_pubkeys mapping to handle the Result from normalize_pubkey explicitly (using map_err/and_then or matching) and either propagate the error or emit a warning via your logger before collecting the valid keys.

crates/sage-core/src/marmot.rs (1)
13-40: Consider using the established `bech32` crate for npub decoding.

The custom `bech32_decode_payload` implementation works for the specific use case but skips checksum validation: it simply strips the last 6 characters without verifying they match the expected bech32 checksum. Using the standard `bech32` crate would:

- Automatically validate the checksum (supporting both Bech32 and Bech32m formats per NIP-19)
- Catch malformed or corrupted inputs that pass the current length check
- Reduce maintenance burden and align with ecosystem standards

While the current `bytes.len() != 32` check in `normalize_pubkey` provides some protection, checksum validation is the proper way to ensure bech32 integrity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sage-core/src/marmot.rs` around lines 13 - 40, Replace the ad-hoc bech32_decode_payload implementation with the established bech32 crate: in bech32_decode_payload, call bech32::decode to parse and validate the human-readable part and checksum (handling both Bech32 and Bech32m per NIP-19), then convert the returned 5-bit base32 data to bytes using the crate's from_base32 helper (or an equivalent 5->8 bit conversion) and return that Vec<u8> only if checksum/encoding validation succeeds; update any callers such as normalize_pubkey to use the validated output (remove the strip-last-6 heuristic and rely on the crate to reject malformed inputs).
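For reference, full checksum verification is small enough to sketch without the crate, though the crate remains the better choice. This is the BIP-173 polymod that npub's bech32 encoding uses (Bech32 constant 1; Bech32m would check against 0x2bc830a3):

```rust
const CHARSET: &str = "qpzry9x8gf2tvdw0s3jn54khce6mua7l";
const GEN: [u32; 5] = [0x3b6a_57b2, 0x2650_8e6d, 0x1ea1_19fa, 0x3d42_33dd, 0x2a14_62b3];

// BIP-173 checksum polynomial, evaluated over 5-bit values.
fn polymod(values: &[u8]) -> u32 {
    let mut chk: u32 = 1;
    for &v in values {
        let top = (chk >> 25) as u8;
        chk = ((chk & 0x01ff_ffff) << 5) ^ u32::from(v);
        for (i, g) in GEN.iter().enumerate() {
            if (top >> i) & 1 == 1 {
                chk ^= *g;
            }
        }
    }
    chk
}

/// Verify a bech32 string's checksum (hrp expansion + data, per BIP-173).
fn bech32_verify(s: &str) -> bool {
    let s = s.to_lowercase();
    let Some((hrp, data)) = s.rsplit_once('1') else { return false };
    if hrp.is_empty() || data.len() < 6 {
        return false;
    }
    let mut values: Vec<u8> = hrp.bytes().map(|b| b >> 5).collect();
    values.push(0);
    values.extend(hrp.bytes().map(|b| b & 31));
    for c in data.chars() {
        match CHARSET.find(c) {
            Some(i) => values.push(i as u8),
            None => return false, // character outside the bech32 alphabet
        }
    }
    polymod(&values) == 1
}

fn main() {
    // "A12UEL5L" is a valid test vector from BIP-173.
    assert!(bech32_verify("a12uel5l"));
    // A single flipped character must fail the checksum.
    assert!(!bech32_verify("a12uel5x"));
}
```

A decoder built on this would verify the checksum first, then convert the remaining 5-bit groups to bytes and apply the existing 32-byte length check in `normalize_pubkey`.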
crates/sage-core/src/main.rs (Outdated)

```diff
 let (client, stdout, _child) = marmot::spawn_marmot(&marmot_config)?;
 let writer = marmot::writer_handle(&client);
 let group_routes = marmot::group_routes_handle(&client);
 let messenger: Arc<Mutex<dyn Messenger>> = Arc::new(Mutex::new(client));

-let receive_handle = tokio::spawn(async move { run_receive_loop(reader, tx).await });
+let receive_handle = tokio::spawn(async move {
+    marmot::run_marmot_receive_loop(stdout, writer, tx, marmot_config, group_routes)
+        .await
+});

-(signal_client, receive_handle)
+(messenger, receive_handle)
 }
```
Marmotd child process handle is discarded.

The `_child` process handle from `spawn_marmot` is ignored with an underscore prefix. This means:

- The marmotd process won't be explicitly killed on Sage shutdown
- The process could become orphaned if Sage crashes

Consider storing the child handle and killing it during cleanup, similar to how SignalClient handles subprocess mode in its Drop implementation.
🔧 Store child handle for cleanup

```diff
-let (client, stdout, _child) = marmot::spawn_marmot(&marmot_config)?;
+let (client, stdout, marmot_child) = marmot::spawn_marmot(&marmot_config)?;
 let writer = marmot::writer_handle(&client);
 let group_routes = marmot::group_routes_handle(&client);
 let messenger: Arc<Mutex<dyn Messenger>> = Arc::new(Mutex::new(client));
+
+// Store child handle for cleanup at shutdown
+let marmot_child = Arc::new(std::sync::Mutex::new(Some(marmot_child)));
```

Then in the shutdown section, kill the marmotd process.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sage-core/src/main.rs` around lines 208 - 219, The marmotd child
handle returned from spawn_marmot is currently discarded as `_child`; change
this to store the handle (e.g., rename `_child` to `child_handle` and keep it
alongside `messenger`/`receive_handle` or inside a shared struct) and ensure it
is explicitly terminated during shutdown (call the child's kill() and wait for
its exit or use a Drop impl similar to SignalClient's subprocess cleanup).
Update the shutdown path that currently drops `receive_handle`/`messenger` to
also attempt a graceful kill and await the child process to avoid orphaned
marmotd processes.
```diff
 let attachment_path = format!(
     "/signal-cli-data/.local/share/signal-cli/attachments/{}",
     attachment.file
 );
-info!("🖼️ Image attachment detected: {} ({}) at {}", attachment.file, attachment.content_type, attachment_path);
+info!("Image attachment detected: {} ({}) at {}", attachment.file, attachment.content_type, attachment_path);
```
Hardcoded Signal attachment path breaks image processing for Marmot.

The attachment path is hardcoded to /signal-cli-data/.local/share/signal-cli/attachments/, which only works for Signal. When using Marmot as the messenger, this path won't exist or contain the relevant attachments.

Consider either:

- Making the attachment path messenger-aware via config
- Skipping vision processing for Marmot until attachment support is added
- Including the full attachment path in `IncomingAttachment` from each provider
🔧 Quick fix to skip vision for non-Signal

```diff
 // Check for image attachments and run vision pre-processing
 let attachment_text = {
+    // Vision processing currently only works with Signal attachments
+    if config.messenger_type != MessengerType::Signal {
+        None
+    } else {
     let image_attachment = msg.attachments.iter().find(|a| vision::is_supported_image(&a.content_type));
     // ... rest of vision processing
+    }
 };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sage-core/src/main.rs` around lines 356 - 360, The code builds a
hardcoded attachment_path and logs it (attachment_path, attachment.file,
attachment.content_type) which breaks for non-Signal messengers like Marmot;
update processing to use the provider-supplied full path from IncomingAttachment
if available (prefer IncomingAttachment.path or similar), or else derive a
messenger-aware base path from config based on the message source, and if
neither exists skip vision processing for non-Signal providers (i.e., do not
assume "/signal-cli-data/..."). Locate the attachment handling in main.rs where
attachment_path is formed and replace the hardcoded construction with a check
for IncomingAttachment.path or a messenger-config lookup, falling back to
skipping vision for Marmot.
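One way to realize the provider-supplied-path option is sketched below. The field and enum names are assumptions for illustration, not the crate's actual API:

```rust
// Sketch: prefer a provider-supplied absolute path on the attachment,
// fall back to Signal's known directory, and skip vision otherwise.
enum MessengerType { Signal, Marmot }

struct IncomingAttachment {
    file: String,
    content_type: String,          // unused here; kept to mirror the shared type
    path: Option<String>,          // assumed field: provider-supplied full path
}

fn attachment_path(att: &IncomingAttachment, messenger: &MessengerType) -> Option<String> {
    if let Some(p) = &att.path {
        return Some(p.clone()); // provider told us exactly where the file is
    }
    match messenger {
        // Only Signal has a known attachment directory.
        MessengerType::Signal => Some(format!(
            "/signal-cli-data/.local/share/signal-cli/attachments/{}",
            att.file
        )),
        // No path known for Marmot yet: caller skips vision processing.
        MessengerType::Marmot => None,
    }
}

fn main() {
    let att = IncomingAttachment {
        file: "img.jpg".into(),
        content_type: "image/jpeg".into(),
        path: None,
    };
    assert!(attachment_path(&att, &MessengerType::Signal).is_some());
    assert!(attachment_path(&att, &MessengerType::Marmot).is_none());
}
```

Returning `Option<String>` lets the vision branch degrade gracefully instead of handing a nonexistent Signal path to the image pipeline.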
crates/sage-core/src/marmot.rs (Outdated)
```rust
// Phase 3: Main receive loop
loop {
    line.clear();
    match reader.read_line(&mut line) {
        Ok(0) => {
            warn!("marmotd closed stdout");
            break;
        }
        Ok(_) => {
            let event: serde_json::Value = match serde_json::from_str(line.trim()) {
                Ok(v) => v,
                Err(e) => {
                    debug!("marmotd non-json output: {} ({})", line.trim(), e);
                    continue;
                }
            };
            let event_type = event.get("type").and_then(|t| t.as_str()).unwrap_or("");

            match event_type {
                "welcome_received" => {
                    let wrapper_id = event
                        .get("wrapper_event_id")
                        .and_then(|x| x.as_str())
                        .unwrap_or("");
                    let from = event
                        .get("from_pubkey")
                        .and_then(|x| x.as_str())
                        .unwrap_or("unknown");
                    let group_name = event
                        .get("group_name")
                        .and_then(|x| x.as_str())
                        .unwrap_or("");
                    info!(
                        "Marmot welcome received from {} (group: {}, wrapper: {})",
                        from, group_name, wrapper_id
                    );

                    if config.auto_accept_welcomes && !wrapper_id.is_empty() {
                        let req_id = format!("auto_{}", wrapper_id);
                        if let Err(e) = send_cmd(json!({
                            "cmd": "accept_welcome",
                            "request_id": req_id,
                            "wrapper_event_id": wrapper_id
                        })) {
                            warn!("Failed to auto-accept welcome: {}", e);
                        }
                    }
                }
                "group_joined" => {
                    let group_id = event
                        .get("nostr_group_id")
                        .and_then(|x| x.as_str())
                        .unwrap_or("unknown");
                    info!("Marmot joined group: {}", group_id);
                }
                "message_received" => {
                    let from_pubkey = event
                        .get("from_pubkey")
                        .and_then(|x| x.as_str())
                        .unwrap_or("");
                    let content =
                        event.get("content").and_then(|x| x.as_str()).unwrap_or("");
                    let group_id = event
                        .get("nostr_group_id")
                        .and_then(|x| x.as_str())
                        .unwrap_or("");
                    let created_at = event
                        .get("created_at")
                        .and_then(|x| x.as_u64())
                        .unwrap_or(0);

                    if content.is_empty() {
                        continue;
                    }

                    let preview_end = {
                        let max_len = 100.min(content.len());
                        let mut end = max_len;
                        while end > 0 && !content.is_char_boundary(end) {
                            end -= 1;
                        }
                        end
                    };
                    info!(
                        "Marmot message from {} in group {}: {}",
                        from_pubkey,
                        group_id,
                        &content[..preview_end]
                    );

                    // Track pubkey -> latest group for reply routing.
                    // This means the most recent group a user messages from
                    // becomes the reply target. When we add multi-agent support,
                    // each group could maintain its own agent thread instead.
                    if !from_pubkey.is_empty() && !group_id.is_empty() {
                        if let Ok(mut routes) = group_routes.lock() {
                            routes.insert(from_pubkey.to_string(), group_id.to_string());
                        }
                    }

                    let msg = IncomingMessage {
                        source: from_pubkey.to_string(),
                        source_name: None,
                        message: content.to_string(),
                        attachments: vec![],
                        timestamp: created_at,
                        reply_to: from_pubkey.to_string(),
                    };

                    if tx.blocking_send(msg).is_err() {
                        error!("Failed to send marmot message to channel");
                        break;
                    }
                }
                "ok" | "keypackage_published" => {
                    debug!("marmotd: {}", line.trim());
                }
                "error" => {
                    let msg = event
                        .get("message")
                        .and_then(|m| m.as_str())
                        .unwrap_or("unknown");
                    warn!("marmotd error: {}", msg);
                }
                _ => {
                    debug!("marmotd event: {}", line.trim());
                }
            }
        }
        Err(e) => {
            error!("Error reading from marmotd: {}", e);
            break;
        }
    }
}
```
Marmot receive loop lacks reconnection resilience unlike Signal TCP mode.
When marmotd closes stdout or encounters an error, the loop simply ends with a warning (lines 308-311, 433-436). Unlike the Signal TCP receive loop which has exponential backoff and automatic reconnection, the Marmot loop will terminate and not recover.
Consider adding a similar supervisor pattern that restarts marmotd on failure:
🔧 Example reconnection pattern (in main.rs)

```diff
 let receive_handle = tokio::spawn(async move {
-    marmot::run_marmot_receive_loop(stdout, writer, tx, marmot_config, group_routes)
-        .await
+    let mut backoff = std::time::Duration::from_millis(250);
+    let backoff_max = std::time::Duration::from_secs(60);
+
+    loop {
+        match marmot::run_marmot_receive_loop(/* ... */).await {
+            Ok(()) => {
+                warn!("Marmot receive loop exited; restarting in {:?}", backoff);
+            }
+            Err(e) => {
+                warn!("Marmot receive loop error; restarting in {:?}: {}", backoff, e);
+            }
+        }
+        tokio::time::sleep(backoff).await;
+        backoff = (backoff * 2).min(backoff_max);
+        // Re-spawn marmotd here...
+    }
 });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sage-core/src/marmot.rs` around lines 304 - 438, The receive loop for
marmotd (the loop reading via reader.read_line and handling events into
tx/IncomingMessage, updating group_routes, and calling send_cmd) currently exits
on Ok(0) or Err(_) and never restarts; wrap this logic in a supervisor/reconnect
routine that respawns or reconnects marmotd, re-initializes the reader/stdout
handles and any necessary state (e.g., re-locking group_routes), and retries the
loop with exponential backoff on successive failures (reset backoff on
successful start). On Ok(0) and Err(e) branches, replace the break with
signaling the supervisor to attempt a restart (preserving/closing tx
appropriately and avoiding duplicated receivers), and ensure send_cmd and
channel usage are safe across restarts. Use an exponential backoff timer and a
limited retry policy to avoid tight restart loops.
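The backoff policy that prompt describes can be sketched synchronously as a stand-in for the tokio supervisor. Durations are collected rather than slept so the sketch runs instantly; the function name and shape are illustrative:

```rust
use std::time::Duration;

// Supervisor policy sketch: after each exit of the receive loop, wait with
// exponential backoff (capped), resetting the backoff after a clean run.
// The real version would be a tokio task that also respawns marmotd.
fn supervise<F>(mut run_loop: F, max_restarts: u32) -> Vec<Duration>
where
    F: FnMut() -> Result<(), String>,
{
    let base = Duration::from_millis(250);
    let cap = Duration::from_secs(60);
    let mut backoff = base;
    let mut sleeps = Vec::new();
    for _ in 0..max_restarts {
        if run_loop().is_ok() {
            backoff = base; // clean exit: reset backoff before restarting
        }
        sleeps.push(backoff); // real code: tokio::time::sleep(backoff).await
        backoff = (backoff * 2).min(cap);
    }
    sleeps
}

fn main() {
    // Fails twice, then runs cleanly.
    let mut calls = 0;
    let sleeps = supervise(
        || {
            calls += 1;
            if calls < 3 { Err("marmotd died".to_string()) } else { Ok(()) }
        },
        4,
    );
    // 250ms, 500ms after the failures; reset to 250ms once a run succeeds.
    assert_eq!(
        sleeps,
        vec![
            Duration::from_millis(250),
            Duration::from_millis(500),
            Duration::from_millis(250),
            Duration::from_millis(250),
        ]
    );
}
```

Resetting on success is what keeps a long-lived daemon from accumulating a 60-second penalty after one bad night of flapping.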
When MARMOT_ALLOWED_PUBKEYS=* is configured, don't pass any
--allow-pubkey args to marmotd so it accepts messages from everyone.
Previously normalize_pubkey("*") would error and crash on startup.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
🧹 Nitpick comments (3)
crates/sage-core/src/marmot.rs (3)
401-405: Silent lock failure may hide issues.

If `group_routes.lock()` fails (e.g., mutex poisoned), the route update is silently skipped. Consider logging a warning when the lock fails to aid debugging.

🔧 Suggested improvement

```diff
 if !from_pubkey.is_empty() && !group_id.is_empty() {
-    if let Ok(mut routes) = group_routes.lock() {
-        routes.insert(from_pubkey.to_string(), group_id.to_string());
+    match group_routes.lock() {
+        Ok(mut routes) => {
+            routes.insert(from_pubkey.to_string(), group_id.to_string());
+        }
+        Err(e) => {
+            warn!("Failed to update group routes: {}", e);
+        }
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sage-core/src/marmot.rs` around lines 401 - 405, The code silently ignores a failed Mutex lock on group_routes (the if let Ok(mut routes) = group_routes.lock() { routes.insert(from_pubkey.to_string(), group_id.to_string()); }) which can hide poisoning or other locking issues; update the block to handle the Err case by logging a warning (including from_pubkey and group_id context and the lock error) when group_routes.lock() returns Err so failures are visible for debugging instead of being silently skipped.
117-142: Missing retry logic for send_message compared to the Signal implementation.

The `SignalClient::send_message` implementation includes retry logic with up to 3 attempts and automatic reconnection on pipe/connection errors. `MarmotClient::send_message` has no such resilience: a single transient write failure will propagate directly to the caller. Consider adding similar retry logic if marmotd can experience transient communication failures.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sage-core/src/marmot.rs` around lines 117 - 142, Add retry logic to MarmotClient::send_message similar to SignalClient: wrap the send_cmd call in a loop that attempts up to 3 tries (use the same next_request_id()/resolve_group() for each attempt or regenerate request id per attempt as appropriate), and on transient errors (pipe/connection/write errors from send_cmd) call the existing reconnect/reopen logic used by SignalClient before retrying; if all attempts fail, return the last error. Locate the send flow in MarmotClient::send_message (uses resolve_group, next_request_id, and send_cmd) and implement the retry+reconnect around the send_cmd invocation while preserving the log/preview behavior.
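The suggested retry-with-reconnect policy can be sketched like this. The function and hook names are illustrative; the real client would call send_cmd and respawn marmotd in the reconnect hook:

```rust
// Sketch: up to 3 send attempts, invoking a reconnect hook between failures.
fn send_with_retry<S, R>(mut send: S, mut reconnect: R) -> Result<(), String>
where
    S: FnMut() -> Result<(), String>,
    R: FnMut(),
{
    let mut last_err = String::new();
    for attempt in 1..=3 {
        match send() {
            Ok(()) => return Ok(()),
            Err(e) => {
                last_err = e;
                if attempt < 3 {
                    reconnect(); // e.g. respawn marmotd / reopen the pipe
                }
            }
        }
    }
    Err(last_err)
}

fn main() {
    // Fails twice with a transient pipe error, then succeeds.
    let mut calls = 0;
    let mut reconnects = 0;
    let result = send_with_retry(
        || {
            calls += 1;
            if calls < 3 { Err("broken pipe".to_string()) } else { Ok(()) }
        },
        || reconnects += 1,
    );
    assert!(result.is_ok());
    assert_eq!(calls, 3);
    assert_eq!(reconnects, 2);
}
```

Keeping the reconnect between attempts (not after the last one) avoids respawning a daemon nobody will talk to when the error turns out to be permanent.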
15-40: Consider using the `bech32` crate instead of this custom decoder.

This custom implementation skips checksum validation; while the 32-byte length check in `normalize_pubkey` provides some protection, it doesn't catch all transmission errors that the bech32 checksum is designed to detect. The `bech32` crate (v0.11.1 available) would provide full checksum verification and reduce maintenance burden.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sage-core/src/marmot.rs` around lines 15 - 40, The custom bech32 decoder in bech32_decode_payload skips checksum validation; replace its logic with the bech32 crate (e.g., bech32::decode) to perform proper checksum verification and conversion to raw bytes, return None on any decode/convert error, and update callers (such as normalize_pubkey) to rely on this validated output; remove reliance on BECH32_CHARSET and the manual bit-manipulation code and ensure error mapping/logging preserves original behavior (Option<Vec<u8>> on failure).
Store the Child handle inside MarmotClient and implement Drop to kill + wait on the process, preventing orphaned marmotd daemons on shutdown. Mirrors Signal's subprocess cleanup pattern. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Adds send_typing support (no more unknown variant errors), --version flag, audio file streaming, and video call rejection. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/sage-core/src/marmot.rs (1)
154-168: Verify typing "stop" behavior with marmotd.

`stop=true` is a no-op. If marmotd supports explicit stop/timeout controls, consider sending them (or document that stop is unsupported).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sage-core/src/marmot.rs` around lines 154 - 168, The send_typing method currently treats stop=true as a no-op; verify marmotd's protocol and if it supports explicit stop/timeout semantics, modify send_typing (which uses resolve_group, next_request_id, and send_cmd) to send an explicit stop typing command (e.g., include a "stop": true or timeout field in the JSON payload) when stop is true, otherwise add a clear code comment or documentation stating that marmotd does not support stopping typing and the method will be a no-op; ensure the request_id and nostr_group_id logic remains unchanged when emitting the stop command.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/sage-core/src/marmot.rs`:
- Around line 15-63: The bech32_decode_payload currently just strips the last 6
chars and ignores checksum, so update normalize_pubkey/bech32_decode_payload to
perform full bech32 checksum validation (or delegate to the bech32 crate)
instead of trusting the payload: use bech32::decode on the full npub string (or
implement the polymod checksum verification) to verify hrp and checksum, then
extract the data bytes and convert from 5-bit words to 8-bit bytes and ensure
length==32; if decode/checksum fails, return Err from normalize_pubkey as you
already do for other errors. Ensure you reference bech32_decode_payload and
normalize_pubkey when making the change so checksum validation replaces the
current naive slice-and-ignore behavior.
---
Duplicate comments:
In `@crates/sage-core/src/main.rs`:
- Around line 352-391: The code hardcodes a Signal attachment path when an image
is found (using vision::is_supported_image and constructing attachment_path),
which breaks other providers; update the logic in the image-attachment branch to
be provider-aware: inspect each attachment's provider/metadata (e.g., a provider
or source field on msg.attachments items or an absolute path field if present)
and only construct the Signal-specific path and call vision::describe_image when
the attachment clearly originates from Signal, otherwise either skip vision
preprocessing for that attachment (return None) or resolve the correct
provider-specific path before calling vision::describe_image; adjust the branch
around attachment_path/vision::describe_image and the detection using
vision::is_supported_image so non-Signal attachments are not fed the hardcoded
Signal path.
In `@crates/sage-core/src/marmot.rs`:
- Around line 235-458: The receive loop in run_marmot_receive_loop (the
spawn_blocking closure that reads from reader and currently breaks/returns on
closed stdout or errors) exits permanently on any read error or when marmotd
closes stdout; update this so the task is supervised and restarted with backoff
instead of silently ending—wrap the current spawn_blocking logic in a
retry/supervisor loop (or extract it to a helper like run_marmot_reader) that
catches the Err/closed conditions (the places logging "marmotd closed stdout",
"marmotd closed stdout during keypackage publish", and the Err branch logging
"Error reading from marmotd") and on those cases sleeps with exponential/backoff
jitter and respawns the blocking reader (re-establishing write/send_cmd state as
needed) rather than breaking; keep existing logging and ensure failing to send
via tx still propagates or causes a supervised restart so messaging isn’t
silently lost.
---
Nitpick comments:
In `@crates/sage-core/src/marmot.rs`:
- Around line 154-168: The send_typing method currently treats stop=true as a
no-op; verify marmotd's protocol and if it supports explicit stop/timeout
semantics, modify send_typing (which uses resolve_group, next_request_id, and
send_cmd) to send an explicit stop typing command (e.g., include a "stop": true
or timeout field in the JSON payload) when stop is true, otherwise add a clear
code comment or documentation stating that marmotd does not support stopping
typing and the method will be a no-op; ensure the request_id and nostr_group_id
logic remains unchanged when emitting the stop command.
```rust
/// Decode a bech32-encoded string (npub1...) into its raw bytes.
fn bech32_decode_payload(s: &str) -> Option<Vec<u8>> {
    let pos = s.rfind('1')?;
    let data_part = &s[pos + 1..];
    if data_part.len() < 6 {
        return None;
    }
    let values: Vec<u8> = data_part
        .chars()
        .map(|c| BECH32_CHARSET.find(c).map(|i| i as u8))
        .collect::<Option<Vec<_>>>()?;
    let data_values = &values[..values.len() - 6];
    let mut acc: u32 = 0;
    let mut bits: u32 = 0;
    let mut result = Vec::new();
    for &v in data_values {
        acc = (acc << 5) | (v as u32);
        bits += 5;
        if bits >= 8 {
            bits -= 8;
            result.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1;
        }
    }
    Some(result)
}

/// Convert an npub (bech32) or hex pubkey string to hex.
/// Accepts both "npub1..." and raw 64-char hex.
pub fn normalize_pubkey(input: &str) -> Result<String> {
    let trimmed = input.trim();
    if trimmed.starts_with("npub1") {
        let bytes =
            bech32_decode_payload(trimmed).ok_or_else(|| anyhow!("invalid npub: {}", trimmed))?;
        if bytes.len() != 32 {
            return Err(anyhow!(
                "npub decoded to {} bytes, expected 32",
                bytes.len()
            ));
        }
        Ok(bytes.iter().map(|b| format!("{:02x}", b)).collect())
    } else if trimmed.len() == 64 && trimmed.chars().all(|c| c.is_ascii_hexdigit()) {
        Ok(trimmed.to_lowercase())
    } else {
        Err(anyhow!(
            "invalid pubkey (expected npub1... or 64-char hex): {}",
            trimmed
        ))
    }
}
```
🧩 Analysis chain
🏁 Script executed:

```shell
# Search for bech32_decode_payload and related code in the repository
cd crates/sage-core && find . -name "*.rs" -type f | head -20
```

Repository: AnthonyRonning/sage
Length of output: 486

🏁 Script executed:

```shell
# Check the full marmot.rs file to understand the context
wc -l crates/sage-core/src/marmot.rs
```

Length of output: 97

🏁 Script executed:

```shell
# Read the marmot.rs file to see the full implementation
cat -n crates/sage-core/src/marmot.rs
```

Length of output: 21723

🏁 Script executed:

```shell
# Search for bech32-related dependencies in Cargo.toml
rg "bech32" --type toml
```

Length of output: 45

🏁 Script executed:

```shell
# Search for MARMOT_ALLOWED_PUBKEYS usage
rg "MARMOT_ALLOWED_PUBKEYS" --type rust
```

Length of output: 292

🏁 Script executed:

```shell
# Search for any bech32 tests
rg "bech32" --type rust -A 5
```

Length of output: 1410

🏁 Script executed:

```shell
# Check the full Cargo.toml to see current dependencies
find . -name "Cargo.toml" -path "*/sage-core/*" | xargs cat
```

Length of output: 1136

🏁 Script executed:

```shell
# Search for any bech32 references in the codebase
rg "bech32" --type rust -C 3
```

Length of output: 1488
🌐 Web query:
bech32 rust crate crates.io
💡 Result:
The Rust Bech32 crate you likely want is bech32 (from the rust-bitcoin ecosystem). (docs.rs)
- Docs: docs.rs/bech32 (latest shown as 0.11.1) (docs.rs)
- What it does: encode/decode Bech32 and supports Bech32m checksums (docs.rs)

Add it to Cargo.toml:

```toml
[dependencies]
bech32 = "0.11.1"
```

(Use the version you see as “latest” on docs.rs/crates.io.)
If you specifically need Bitcoin SegWit address encode/decode, the bech32 docs note that’s handled by bitcoin-bech32. (docs.rs)
Security note: crates.io has had typosquatting/malicious crates incidents; double-check the crate name and publisher before adding dependencies. (blog.rust-lang.org)
Citations:
- 1: https://docs.rs/bech32?utm_source=openai
- 2: https://docs.rs/crate/bech32/latest?utm_source=openai
- 3: https://docs.rs/crate/bech32/latest?utm_source=openai
- 4: https://docs.rs/crate/bech32/latest?utm_source=openai
- 5: https://blog.rust-lang.org/2022/05/10/malicious-crate-rustdecimal/?utm_source=openai
Validate bech32 checksum for npub inputs.
bech32_decode_payload discards the checksum (last 6 characters) without validation, allowing typos in MARMOT_ALLOWED_PUBKEYS to decode to unintended pubkeys. A typo that remains within the bech32 charset and decodes to exactly 32 bytes would be silently accepted. Use a spec-compliant decoder such as the bech32 crate (available on crates.io) or add explicit checksum verification.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/sage-core/src/marmot.rs` around lines 15 - 63, The
bech32_decode_payload currently just strips the last 6 chars and ignores
checksum, so update normalize_pubkey/bech32_decode_payload to perform full
bech32 checksum validation (or delegate to the bech32 crate) instead of trusting
the payload: use bech32::decode on the full npub string (or implement the
polymod checksum verification) to verify hrp and checksum, then extract the data
bytes and convert from 5-bit words to 8-bit bytes and ensure length==32; if
decode/checksum fails, return Err from normalize_pubkey as you already do for
other errors. Ensure you reference bech32_decode_payload and normalize_pubkey
when making the change so checksum validation replaces the current naive
slice-and-ignore behavior.
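The checksum verification this finding asks for is small enough to sketch with the standard library alone. This is an illustrative BIP-173 implementation (the generator constants come from the spec), not the project's code; delegating to the `bech32` crate, as the finding suggests, is the simpler fix.

```rust
// Std-only sketch of BIP-173 bech32 checksum verification: the validation step
// that the reviewed bech32_decode_payload skips.

fn polymod(values: &[u8]) -> u32 {
    // Generator constants published in BIP-173.
    const GEN: [u32; 5] = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3];
    let mut chk: u32 = 1;
    for &v in values {
        let b = chk >> 25;
        chk = ((chk & 0x01ff_ffff) << 5) ^ u32::from(v);
        for (i, g) in GEN.iter().enumerate() {
            if (b >> i) & 1 == 1 {
                chk ^= g;
            }
        }
    }
    chk
}

fn hrp_expand(hrp: &str) -> Vec<u8> {
    // High bits of each HRP char, a zero separator, then the low bits.
    let mut out: Vec<u8> = hrp.bytes().map(|b| b >> 5).collect();
    out.push(0);
    out.extend(hrp.bytes().map(|b| b & 31));
    out
}

/// True iff `data` (5-bit values, 6-value checksum included) verifies for `hrp`.
fn verify_checksum(hrp: &str, data: &[u8]) -> bool {
    let mut values = hrp_expand(hrp);
    values.extend_from_slice(data);
    polymod(&values) == 1
}

/// Compute the 6-value checksum for `hrp` plus `data` (5-bit values, no checksum).
fn create_checksum(hrp: &str, data: &[u8]) -> Vec<u8> {
    let mut values = hrp_expand(hrp);
    values.extend_from_slice(data);
    values.extend_from_slice(&[0u8; 6]);
    let pm = polymod(&values) ^ 1;
    (0..6).map(|i| ((pm >> (5 * (5 - i))) & 31) as u8).collect()
}

fn main() {
    // Round-trip: a freshly checksummed payload verifies; one flipped symbol fails.
    let payload: Vec<u8> = (0u8..32).collect();
    let mut data = payload.clone();
    data.extend(create_checksum("npub", &payload));
    assert!(verify_checksum("npub", &data));
    data[0] ^= 1;
    assert!(!verify_checksum("npub", &data));
    println!("checksum round-trip ok");
}
```

A decoder would run `verify_checksum` on the full 5-bit payload (checksum included) before stripping the last six values and converting to bytes.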
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
🧹 Nitpick comments (2)
crates/sage-core/src/main.rs (2)
156-179: Consider resetting backoff on prolonged successful connection.
The exponential backoff never resets after a successful period of operation. If the connection runs stably for hours then drops, the backoff will resume from the previous value. Consider resetting `backoff` to the initial value when a connection has been stable for some threshold duration.
♻️ Optional improvement to reset backoff

```diff
 let receive_handle = tokio::spawn(async move {
     let mut backoff = std::time::Duration::from_millis(250);
     let backoff_max = std::time::Duration::from_secs(60);
+    let backoff_initial = std::time::Duration::from_millis(250);
     loop {
+        let start = std::time::Instant::now();
         match run_receive_loop_tcp(&host, port, &account, tx.clone()).await {
             Ok(()) => {
+                // Reset backoff if connection was stable for >5 minutes
+                if start.elapsed() > std::time::Duration::from_secs(300) {
+                    backoff = backoff_initial;
+                }
                 warn!(
                     "Signal TCP receive loop exited unexpectedly; restarting in {:?}",
                     backoff
                 );
             }
             Err(e) => {
+                if start.elapsed() > std::time::Duration::from_secs(300) {
+                    backoff = backoff_initial;
+                }
                 warn!(
                     "Signal TCP receive loop error; restarting in {:?}: {}",
                     backoff, e
                 );
             }
         }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sage-core/src/main.rs` around lines 156 - 179, The receive loop's exponential backoff (variables backoff/backoff_max in the tokio::spawn block around receive_handle and run_receive_loop_tcp) never resets after long successful connections; modify the loop to measure how long each run of run_receive_loop_tcp(&host, port, &account, tx.clone()) lasted (e.g., use Instant::now() before the await and check elapsed after it returns) and if the elapsed time exceeds a chosen threshold (e.g., 30s or make it configurable) reset backoff to the initial Duration::from_millis(250); otherwise keep the exponential backoff behavior on failures, and continue to sleep and double backoff as currently implemented.
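The policy discussed above boils down to a pure function of the current delay and how long the last run lasted, which makes it easy to unit test. A sketch with the 250ms/60s bounds from the review and an assumed 5-minute stability threshold:

```rust
use std::time::Duration;

// Illustrative values: the 250ms / 60s bounds mirror the review's discussion;
// the 5-minute "stable" threshold is an assumption, not mandated by the PR.
const INITIAL: Duration = Duration::from_millis(250);
const MAX: Duration = Duration::from_secs(60);
const STABLE: Duration = Duration::from_secs(300);

/// Delay before the next reconnect attempt, given the current delay and how
/// long the last connection actually ran.
fn next_backoff(current: Duration, ran_for: Duration) -> Duration {
    if ran_for >= STABLE {
        INITIAL // long stable run: start the ramp over
    } else {
        (current * 2).min(MAX) // quick failure: keep doubling, capped
    }
}

fn main() {
    // Rapid crashes ramp up: 250ms -> 500ms -> ... -> 60s cap.
    assert_eq!(next_backoff(INITIAL, Duration::from_secs(1)), Duration::from_millis(500));
    assert_eq!(next_backoff(MAX, Duration::from_secs(1)), MAX);
    // A connection that held for 10 minutes resets the ramp.
    assert_eq!(next_backoff(MAX, Duration::from_secs(600)), INITIAL);
    println!("backoff policy behaves as expected");
}
```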
541-554: Abrupt shutdown may leave receive loop in inconsistent state.
Using `abort()` forcefully terminates the receive task without allowing graceful cleanup. Consider using a cancellation token or shutdown channel to signal the receive loops to exit cleanly before aborting, especially for Marmot which may have pending JSONL I/O.
♻️ Graceful shutdown pattern

```diff
+// Create shutdown signal
+let (shutdown_tx, mut shutdown_rx) = tokio::sync::oneshot::channel::<()>();
+
 // Main event loop
 loop {
     tokio::select! {
         // ... existing arms ...

         // Handle shutdown
         _ = tokio::signal::ctrl_c() => {
             info!("Shutting down...");
+            let _ = shutdown_tx.send(());
             break;
         }
     }
 }

 // Cleanup
-receive_handle.abort();
+// Give receive loop a moment to exit gracefully
+tokio::time::sleep(std::time::Duration::from_millis(100)).await;
+if !receive_handle.is_finished() {
+    receive_handle.abort();
+}
 info!("🌿 Sage has shut down.");
```

Note: The receive loops would need to incorporate `shutdown_rx` to respond to the signal.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sage-core/src/main.rs` around lines 541 - 554, The code currently calls receive_handle.abort() which forcefully kills the receive task; instead implement a graceful shutdown signal (e.g., a tokio::sync::watch or broadcast channel or a CancellationToken) that the receive loop(s) listen to (add a shutdown_rx parameter to the Marmot receive loop and any other receive tasks), send the shutdown signal from the ctrl_c handler, then await receive_handle.join().or_timeout to let the loop flush JSONL I/O and cleanup before forcibly aborting as a last resort; update the receive loop function signatures and the task spawn sites to accept the shutdown receiver and return cleanly on signal.
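The cooperative-exit idea can be shown without tokio: the loop polls a shared flag and returns on its own, leaving `abort()` as a last resort. This is a std-threads sketch only; the async version the review describes would use `tokio::select!` with a watch channel or a `CancellationToken`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// A receive loop that exits cooperatively when `shutdown` is set.
/// Returns how many poll iterations ran before the signal was observed.
fn run_until_shutdown(shutdown: Arc<AtomicBool>) -> u32 {
    let mut polls = 0u32;
    while !shutdown.load(Ordering::Relaxed) {
        // ... read one JSONL line, flush pending I/O ...
        polls += 1;
        thread::sleep(Duration::from_millis(1));
    }
    polls
}

fn main() {
    let shutdown = Arc::new(AtomicBool::new(false));
    let worker = {
        let flag = Arc::clone(&shutdown);
        thread::spawn(move || run_until_shutdown(flag))
    };
    thread::sleep(Duration::from_millis(10));
    shutdown.store(true, Ordering::Relaxed); // the ctrl_c handler's job
    let polls = worker.join().expect("receive loop panicked");
    println!("receive loop exited cleanly after {} polls", polls);
}
```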
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/sage-core/src/main.rs`:
- Around line 352-360: The code in the image-attachment handling block (inside
main.rs where msg.attachments, vision::is_supported_image, and the local
variable attachment_path are used) hardcodes Signal's attachments directory;
change it to derive the attachment location from the attachment metadata or from
a configurable source instead: use a property on the attachment (e.g.,
attachment.path or attachment.url) if available, or construct the path from a
configurable "attachments_dir" (env/config) rather than the hardcoded
"/signal-cli-data/…", and add a clear fallback/error log if neither is present
so vision pre-processing can find the file or fail noisily.
---
Nitpick comments:
In `@crates/sage-core/src/main.rs`:
- Around line 156-179: The receive loop's exponential backoff (variables
backoff/backoff_max in the tokio::spawn block around receive_handle and
run_receive_loop_tcp) never resets after long successful connections; modify the
loop to measure how long each run of run_receive_loop_tcp(&host, port, &account,
tx.clone()) lasted (e.g., use Instant::now() before the await and check elapsed
after it returns) and if the elapsed time exceeds a chosen threshold (e.g., 30s
or make it configurable) reset backoff to the initial
Duration::from_millis(250); otherwise keep the exponential backoff behavior on
failures, and continue to sleep and double backoff as currently implemented.
- Around line 541-554: The code currently calls receive_handle.abort() which
forcefully kills the receive task; instead implement a graceful shutdown signal
(e.g., a tokio::sync::watch or broadcast channel or a CancellationToken) that
the receive loop(s) listen to (add a shutdown_rx parameter to the Marmot receive
loop and any other receive tasks), send the shutdown signal from the ctrl_c
handler, then await receive_handle.join().or_timeout to let the loop flush JSONL
I/O and cleanup before forcibly aborting as a last resort; update the receive
loop function signatures and the task spawn sites to accept the shutdown
receiver and return cleanly on signal.
Add reply_context column to chat_contexts table to store transport- specific routing info (e.g. Marmot nostr_group_id for a pubkey). Updated on each incoming message, restored into group_routes on startup. Ensures scheduled tasks can deliver after restarts. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
🧹 Nitpick comments (2)
crates/sage-core/src/agent_manager.rs (1)
327-341: `update_reply_context` silently succeeds when no row matches.
If `signal_identifier` doesn't exist in `chat_contexts`, the `UPDATE` will affect 0 rows and return `Ok(())`. This could mask bugs where reply context is set before the chat context is created.
Consider checking the affected row count or documenting that the context must exist:
🔧 Optional: Verify at least one row was updated

```diff
 pub fn update_reply_context(&self, signal_identifier: &str, reply_ctx: &str) -> Result<()> {
     let mut conn = self
         .db_conn
         .lock()
         .map_err(|_| anyhow::anyhow!("Failed to acquire database lock"))?;

-    diesel::update(
+    let rows_updated = diesel::update(
         chat_contexts::table.filter(chat_contexts::signal_identifier.eq(signal_identifier)),
     )
     .set(chat_contexts::reply_context.eq(Some(reply_ctx)))
     .execute(&mut *conn)?;

+    if rows_updated == 0 {
+        tracing::warn!("update_reply_context: no row found for {}", signal_identifier);
+    }
+
     Ok(())
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sage-core/src/agent_manager.rs` around lines 327 - 341, The update_reply_context function currently ignores the number of rows affected by the diesel::update on chat_contexts::table and will silently succeed when 0 rows are updated; change update_reply_context to capture the usize returned by .execute(&mut *conn) and check it, returning an Err (e.g. anyhow::anyhow!("No chat_context found for signal_identifier: {}", signal_identifier)) when the count is 0 so callers know the signal_identifier must exist before setting chat_contexts::reply_context.

crates/sage-core/src/marmot.rs (1)
302-312: Keypackage publish failure is logged but not propagated.
If the keypackage publish fails (lines 302-311), the loop breaks and continues to the main receive phase. This may be intentional (graceful degradation), but incoming messages requiring a keypackage exchange may fail silently.
Consider either retrying or returning an error if keypackage publishing is critical for operation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/sage-core/src/marmot.rs` around lines 302 - 312, The loop currently treats a failed keypackage publish (when event_type == "error" and request_id == "init_kp") by logging via warn! and breaking, which hides the failure; instead, modify the handler around that check in marmot.rs so the failure is propagated or retried: replace the warn! + break behavior with either (a) return an Err(...) from the enclosing function (or propagate the underlying error) so callers can handle the failure, or (b) implement a bounded retry (e.g., 3 attempts with backoff) around the keypackage publish before returning an error; update the branch that inspects event.get("message") / request_id == "init_kp" to perform one of these actions and ensure the function signature and callers accept the propagated error.
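Option (b) above, the bounded retry, is a small generic helper. This is a blocking, std-only sketch with arbitrary attempt count and delays; the real code would use async sleeps and call the marmotd publish command.

```rust
use std::time::Duration;

/// Run `op` up to `attempts` times, doubling the delay between failures,
/// and propagate the last error instead of swallowing it.
fn retry_with_backoff<T, E>(
    attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = initial_delay;
    let mut last_err = None;
    for attempt in 0..attempts {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) => {
                last_err = Some(e);
                if attempt + 1 < attempts {
                    std::thread::sleep(delay);
                    delay *= 2; // exponential backoff between attempts
                }
            }
        }
    }
    Err(last_err.expect("attempts must be > 0"))
}

fn main() {
    // Succeeds on the third try.
    let mut calls = 0;
    let out = retry_with_backoff(3, Duration::from_millis(1), || {
        calls += 1;
        if calls < 3 { Err("publish failed") } else { Ok("published") }
    });
    assert_eq!(out, Ok("published"));
    assert_eq!(calls, 3);
    println!("published after {} attempts", calls);
}
```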
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/sage-core/src/main.rs`:
- Around line 208-235: The marmot receive path should be made resilient by
supervising and auto-restarting the spawn + receive loop when marmotd
disconnects: replace the current one-shot spawn of marmot::spawn_marmot and
tokio::spawn of marmot::run_marmot_receive_loop with a supervisor loop that
calls marmot::spawn_marmot, restores routes into group_routes (as currently done
with agent_manager.load_reply_contexts()), then spawns/awaits
marmot::run_marmot_receive_loop; on any error or exit retry with exponential
backoff and logging, recreate the messenger Arc<Mutex<dyn Messenger>> and
receive_handle for each restart, and ensure shared state (group_routes, tx) is
preserved across restarts.
- Around line 379-382: The hardcoded attachment_path string prevents Marmot from
finding attachments; update the code that builds attachment_path (the variable
currently formatted from "/signal-cli-data/...") to compute the base attachments
directory based on the active messenger (e.g., read MESSENGER env or use
existing config) and then join attachment.file using PathBuf/Path join rather
than a hardcoded Signal path; modify the logic where attachment_path is created
so it selects the correct base directory for "signal" vs "marmot" (or a
configurable ATTACHMENTS_DIR) and then constructs the full path by joining
attachment.file.
In `@crates/sage-core/src/marmot.rs`:
- Around line 15-40: The bech32_decode_payload function currently ignores the
6-character checksum and thus accepts payloads with invalid checksums; update
bech32_decode_payload to validate the Bech32 checksum before returning the
decoded payload (and return None on checksum failure). Specifically, when
parsing data_part in bech32_decode_payload, decode all characters (including the
last 6 checksum chars) into values using BECH32_CHARSET, split values into
payload and checksum, compute the Bech32 polymod checksum over the
human-readable part (derived from the prefix in the original string) and the
data values per the Bech32 spec, compare it to the decoded checksum, and only
proceed to convert and return the payload bytes if the checksum matches;
otherwise return None. Ensure this validation is applied where
MARMOT_ALLOWED_PUBKEYS are checked so typos with valid charset but bad checksum
are rejected.
- Around line 316-451: The receive loop currently breaks on reader.read_line
Ok(0) ("marmotd closed stdout") or Err and never recovers; replace the immediate
break with a reconnection routine that uses exponential backoff and retries
until success (e.g., on those branches call a reconnect function instead of
breaking). Implement or call a reconnect_marmot / start_marmot routine that
reinitializes the marmot process and reader (and re-sends any necessary setup
like auto_accept_welcomes via send_cmd), log each attempt and error, and upon
successful reconnection reset any transient state if needed (e.g., reader, any
per-connection buffers) and continue the main loop; refer to the
reader.read_line handling, the warn!("marmotd closed stdout") and error!("Error
reading from marmotd") branches, plus symbols group_routes and tx to ensure
routing/state is preserved across reconnects.
---
Nitpick comments:
In `@crates/sage-core/src/agent_manager.rs`:
- Around line 327-341: The update_reply_context function currently ignores the
number of rows affected by the diesel::update on chat_contexts::table and will
silently succeed when 0 rows are updated; change update_reply_context to capture
the usize returned by .execute(&mut *conn) and check it, returning an Err (e.g.
anyhow::anyhow!("No chat_context found for signal_identifier: {}",
signal_identifier)) when the count is 0 so callers know the signal_identifier
must exist before setting chat_contexts::reply_context.
In `@crates/sage-core/src/marmot.rs`:
- Around line 302-312: The loop currently treats a failed keypackage publish
(when event_type == "error" and request_id == "init_kp") by logging via warn!
and breaking, which hides the failure; instead, modify the handler around that
check in marmot.rs so the failure is propagated or retried: replace the warn! +
break behavior with either (a) return an Err(...) from the enclosing function
(or propagate the underlying error) so callers can handle the failure, or (b)
implement a bounded retry (e.g., 3 attempts with backoff) around the keypackage
publish before returning an error; update the branch that inspects
event.get("message") / request_id == "init_kp" to perform one of these actions
and ensure the function signature and callers accept the propagated error.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Phases 1 (wait for ready) and 2 (keypackage publish) now use the same continue-on-parse-error pattern as Phase 3, preventing silent receive loop death if marmotd outputs non-JSON lines during startup. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Respawns marmotd on any failure (process exit, read error) with exponential backoff (250ms to 60s). Each restart re-initializes stdin/stdout handles, kills the old process, and re-runs all three phases (ready, keypackage, message loop). Mirrors Signal TCP mode's supervisor pattern. Stops retrying only when the message channel is closed (Sage shutting down). Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Replace spawn_marmot() with new_marmot_client() which only validates config and creates placeholder handles. The supervisor loop is now solely responsible for spawning marmotd, eliminating the race where two instances ran against the same state directory. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Summary
Add marmotd as an optional messaging provider, giving Sage a decentralized MLS-encrypted transport over Nostr relays as an alternative to Signal.
Changes
New files:
- `messenger.rs` — `Messenger` trait abstracting message send/receive, shared `IncomingMessage`/`IncomingAttachment` types
- `marmot.rs` — `MarmotClient` that spawns marmotd as a subprocess, speaks JSONL over stdio, handles welcome auto-accept, keypackage publishing, and pubkey->group routing

Modified:

- `config.rs` — `MESSENGER` env var (signal|marmot), marmot-specific config (relays, state dir, allowed pubkeys, auto-accept), npub (bech32) normalization
- `main.rs` — Messenger-polymorphic event loop via `Arc<Mutex<dyn Messenger>>`, agents keyed by identity (Signal UUID or Nostr pubkey)
- `signal.rs` — Implements `Messenger` trait, uses shared types from `messenger.rs`
- `Dockerfile` — Builds marmotd from pika source (v0.3.2) to avoid GLIBC mismatch with prebuilt binaries
- `justfile` — Messenger-aware start/restart/stop; conditionally skips signal-cli for marmot mode, mounts marmot-state volume

Design Decisions

`MESSENGER=signal` remains the default. Set `MESSENGER=marmot` to switch.

Configuration
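The provider switch described above reduces to a three-way match on the variable's value. A sketch: the `MESSENGER` name and its `signal`/`marmot` values come from this PR, while the parsing details and error text are illustrative.

```rust
use std::env;

/// Resolve the messaging provider from the raw MESSENGER value.
/// Unset or empty falls back to the Signal default; unknown values error out.
fn messenger_kind(raw: Option<&str>) -> Result<&'static str, String> {
    match raw {
        None | Some("") | Some("signal") => Ok("signal"),
        Some("marmot") => Ok("marmot"),
        Some(other) => Err(format!(
            "MESSENGER must be 'signal' or 'marmot', got '{}'",
            other
        )),
    }
}

fn main() {
    // Wire-up: read the real environment variable.
    let var = env::var("MESSENGER").ok();
    let kind = messenger_kind(var.as_deref()).unwrap_or("signal");
    println!("active messenger: {}", kind);
    assert_eq!(messenger_kind(None), Ok("signal"));
    assert_eq!(messenger_kind(Some("marmot")), Ok("marmot"));
    assert!(messenger_kind(Some("pigeon")).is_err());
}
```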
Testing
Summary by CodeRabbit
New Features
Refactor
Chores
Tests