feat: support multimodal input (images, files, voice) from Discord attachments #161

I've searched existing issues and I confirm this is not a duplicate.

## Description

Support multimodal input by forwarding Discord message attachments (images, documents, voice messages, etc.) to downstream ACP agents.

Currently, `discord.rs` extracts only `msg.content` (text) and ignores `msg.attachments` entirely. The ACP prompt built in `connection.rs` is hardcoded to a single `"type": "text"` entry:

```rust
"prompt": [{"type": "text", "text": prompt}],
```

This means any image, file, or voice attachment sent by users in Discord is silently dropped.

## Proposed Changes

1. **Discord handler** (`src/discord.rs`): Parse `msg.attachments` and, for each image, file, or voice attachment, either download its contents or pass along its URL.
2. **ACP prompt** (`src/acp/connection.rs`): Extend the `prompt` array to include additional content types (e.g. `"type": "image"`, `"type": "file"`) alongside the existing text.
3. **Validation**: Check whether the downstream ACP agent supports multimodal content types and gracefully degrade to text-only if not.
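
The three steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `Attachment`, `ContentBlock`, `AgentCaps`, and `build_prompt` are hypothetical names, the struct is a stand-in for whatever the Discord library exposes, and the block variants simply mirror the `"image"` / `"file"` types mentioned in this issue rather than a confirmed ACP schema. How capabilities are actually discovered from the agent is left open.

```rust
/// Hypothetical stand-in for the attachment fields the handler would read.
#[derive(Debug, Clone)]
struct Attachment {
    filename: String,
    url: String,
    content_type: Option<String>,
}

/// Hypothetical prompt entry; variants mirror the types named in this issue,
/// and must be checked against the real ACP content schema before use.
#[derive(Debug, Clone, PartialEq)]
enum ContentBlock {
    Text(String),
    Image { url: String, mime: String },
    File { url: String, name: String },
}

/// Hypothetical capability flags reported by (or configured for) the agent.
#[derive(Debug, Clone, Copy, Default)]
struct AgentCaps {
    image: bool,
    file: bool,
}

/// Builds the prompt entries: the text first, then one entry per attachment.
/// Attachments the agent cannot accept are listed in a trailing text entry
/// instead of being silently dropped.
fn build_prompt(text: &str, attachments: &[Attachment], caps: AgentCaps) -> Vec<ContentBlock> {
    let mut blocks = vec![ContentBlock::Text(text.to_string())];
    let mut skipped: Vec<String> = Vec::new();

    for a in attachments {
        // Assumption: a missing content type is treated as a generic binary file.
        let mime = a.content_type.as_deref().unwrap_or("application/octet-stream");
        if mime.starts_with("image/") && caps.image {
            blocks.push(ContentBlock::Image {
                url: a.url.clone(),
                mime: mime.to_string(),
            });
        } else if caps.file {
            // Voice messages and documents (and images, if only files are
            // supported) are forwarded as generic files in this sketch.
            blocks.push(ContentBlock::File {
                url: a.url.clone(),
                name: a.filename.clone(),
            });
        } else {
            skipped.push(format!("{} ({})", a.filename, a.url));
        }
    }

    // Text-only fallback: tell the agent what it could not receive.
    if !skipped.is_empty() {
        blocks.push(ContentBlock::Text(format!(
            "[attachments not forwarded: {}]",
            skipped.join(", ")
        )));
    }
    blocks
}
```

With `AgentCaps::default()` (nothing supported), the result is text-only: the original message plus one note naming each attachment, so the agent at least knows something was sent. Whether to forward URLs or downloaded bytes is a separate decision this sketch does not settle.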

## Use Case

Users on Discord frequently share screenshots, error logs, documents, and voice messages when asking for help. Without multimodal support, the agent cannot see or process these attachments, limiting its usefulness to text-only interactions.
