feat(channels/matrix): add Matrix channel integration by emadomedher · Pull Request #356 · sipeed/picoclaw

emadomedher · 2026-02-17T08:14:48Z

Summary

Adds a full Matrix protocol channel using mautrix-go, enabling picoclaw agents to communicate over any Matrix homeserver — matrix.org, self-hosted Synapse, Conduit, Dendrite, etc.

Matrix is an open federated protocol used by ~115M accounts across thousands of homeservers.

Features

| Feature | Detail |
|---|---|
| Text messages | Markdown → Matrix HTML (`m.text` / `m.notice`) |
| Inbound voice | Audio events → STT via `Transcriber` interface (Whisper or Groq) |
| Outbound media | `m.image` / `m.audio` / `m.video` / `m.file` with MIME detection + Matrix content repo upload |
| Outbound voice | `voice=true` on message tool (requires #355) |
| Typing indicator | Native `PUT /typing`; shows "bot is typing…" with no placeholder messages |
| Group chat | `require_mention_in_group` (default: true); bot only responds when @-mentioned |
| Auto-join | `join_on_invite` (default: true) |
| Allow-list | `allow_from`: restrict to specific Matrix user IDs |
| Historical guard | Events before process start are ignored to prevent reply floods on restart |
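
The allow-list, mention requirement, and historical guard can be read as one inbound filter. The sketch below is an illustration only, not the code in pkg/channels/matrix.go: the `inboundEvent` and `filterConfig` types, their field names, the substring-based mention check, and the "empty allow-list admits everyone" rule are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// inboundEvent is a hypothetical, simplified view of a Matrix room message.
type inboundEvent struct {
	Sender   string // Matrix user ID, e.g. "@alice:example.com"
	Body     string // plain-text message body
	ServerTS int64  // origin_server_ts, milliseconds since the Unix epoch
	IsGroup  bool   // true when the room is not a one-to-one chat
}

// filterConfig mirrors the options in the feature table (field names assumed).
type filterConfig struct {
	BotUserID             string
	AllowFrom             []string
	RequireMentionInGroup bool
	StartedAtMS           int64 // process start time in milliseconds
}

// contains reports whether s is present in list.
func contains(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// shouldHandle reports whether an event should be forwarded to the agent.
func shouldHandle(cfg filterConfig, evt inboundEvent) bool {
	// Historical guard: drop events that predate process start.
	if evt.ServerTS < cfg.StartedAtMS {
		return false
	}
	// Allow-list: assumed here that an empty list admits every sender.
	if len(cfg.AllowFrom) > 0 && !contains(cfg.AllowFrom, evt.Sender) {
		return false
	}
	// Group chat: require a mention (simplified to a substring match).
	if evt.IsGroup && cfg.RequireMentionInGroup && !strings.Contains(evt.Body, cfg.BotUserID) {
		return false
	}
	return true
}

func main() {
	cfg := filterConfig{
		BotUserID:             "@bot:matrix.example.com",
		RequireMentionInGroup: true,
		StartedAtMS:           1000,
	}
	events := []inboundEvent{
		{Sender: "@alice:example.com", Body: "old message", ServerTS: 500},
		{Sender: "@alice:example.com", Body: "hello everyone", ServerTS: 2000, IsGroup: true},
		{Sender: "@alice:example.com", Body: "@bot:matrix.example.com hello", ServerTS: 2000, IsGroup: true},
	}
	for _, e := range events {
		fmt.Printf("%q handled=%v\n", e.Body, shouldHandle(cfg, e))
	}
}
```

Comparing timestamps as integer milliseconds keeps the guard independent of time zones, since Matrix events carry `origin_server_ts` in that unit.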

New files

pkg/channels/matrix.go: MatrixChannel implementation (~750 lines)
docs/MATRIX_SETUP.md: step-by-step setup guide

Config changes

pkg/config/config.go: MatrixConfig added to ChannelsConfig
config/config.example.json: matrix, tts, whisper example sections added

New dependency

maunium.net/go/mautrix v0.26.3 — mature Matrix client library, widely used in the Matrix ecosystem (bridges, bots, etc.)

Config example

"channels": {
  "matrix": {
    "enabled": true,
    "homeserver": "https://matrix.example.com",
    "user_id": "@bot:matrix.example.com",
    "access_token": "syt_...",
    "allow_from": [],
    "join_on_invite": true,
    "require_mention_in_group": true
  }
}
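
As a rough illustration of how this block might be loaded, the sketch below decodes it into a Go struct and applies the documented defaults (`join_on_invite` and `require_mention_in_group` both true) when a key is omitted. The struct layout and the `parseMatrixConfig` helper are assumptions for the example; the actual MatrixConfig in pkg/config/config.go may differ. Note that the fragment has to be wrapped in an outer JSON object before it parses.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MatrixConfig mirrors the keys of the config example (field names assumed).
type MatrixConfig struct {
	Enabled               bool     `json:"enabled"`
	Homeserver            string   `json:"homeserver"`
	UserID                string   `json:"user_id"`
	AccessToken           string   `json:"access_token"`
	AllowFrom             []string `json:"allow_from"`
	JoinOnInvite          bool     `json:"join_on_invite"`
	RequireMentionInGroup bool     `json:"require_mention_in_group"`
}

// rootConfig is a hypothetical wrapper holding only the part we need.
type rootConfig struct {
	Channels struct {
		Matrix MatrixConfig `json:"matrix"`
	} `json:"channels"`
}

// parseMatrixConfig decodes the matrix section, pre-filling the defaults so
// that keys missing from the JSON keep their default value of true.
func parseMatrixConfig(data []byte) (MatrixConfig, error) {
	var root rootConfig
	root.Channels.Matrix = MatrixConfig{JoinOnInvite: true, RequireMentionInGroup: true}
	if err := json.Unmarshal(data, &root); err != nil {
		return MatrixConfig{}, err
	}
	return root.Channels.Matrix, nil
}

func main() {
	raw := []byte(`{"channels": {"matrix": {
		"enabled": true,
		"homeserver": "https://matrix.example.com",
		"user_id": "@bot:matrix.example.com",
		"access_token": "syt_...",
		"allow_from": []
	}}}`)
	cfg, err := parseMatrixConfig(raw)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```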

Depends on

➡️ refactor(voice): introduce Transcriber interface for pluggable STT #353 — Transcriber interface (inbound voice)
➡️ feat(voice/stt): add local Whisper STT provider #354 — Whisper STT (optional, for local voice transcription)
➡️ feat(voice/tts): add TTS synthesis and voice parameter on message tool #355 — TTS voice output (optional, for outbound voice replies)

Can be reviewed and merged independently for text-only use; voice features activate automatically when #353–#355 are present.

Tests

All existing tests pass (go test ./...).

Currently Discord, Slack, and Telegram all hardcode *voice.GroqTranscriber as their transcription dependency. This makes it impossible to swap in a different STT backend without changing each channel file.

Add a Transcriber interface to pkg/voice/transcriber.go:

    type Transcriber interface {
        Transcribe(ctx context.Context, audioFilePath string) (*TranscriptionResponse, error)
        IsAvailable() bool
    }

GroqTranscriber already implements this interface (no change to its implementation). Update Discord, Slack, and Telegram to depend on the interface instead of the concrete type.

No behaviour change: this is a pure refactor that enables future STT providers (e.g. local Whisper) to be dropped in without modifying channel code.
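
To make the benefit of the interface concrete, here is a minimal sketch of channel-side code that depends only on Transcriber. The interface is copied from the commit message; the single-field `TranscriptionResponse`, the `stubTranscriber` type, the `transcribeVoice` helper, and the 30-second timeout are assumptions for illustration and are not taken from the repository.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// TranscriptionResponse is reduced to one assumed field for this sketch.
type TranscriptionResponse struct {
	Text string
}

// Transcriber is the interface quoted in the commit message.
type Transcriber interface {
	Transcribe(ctx context.Context, audioFilePath string) (*TranscriptionResponse, error)
	IsAvailable() bool
}

var errNoTranscriber = errors.New("no transcriber available")

// stubTranscriber is a hypothetical backend that returns canned text.
type stubTranscriber struct {
	text      string
	available bool
}

func (s stubTranscriber) Transcribe(ctx context.Context, audioFilePath string) (*TranscriptionResponse, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return &TranscriptionResponse{Text: s.text}, nil
}

func (s stubTranscriber) IsAvailable() bool { return s.available }

// transcribeVoice is what a channel might call for an inbound audio file.
// It knows nothing about Groq or Whisper, only about the interface.
func transcribeVoice(t Transcriber, audioFilePath string) (string, error) {
	if t == nil || !t.IsAvailable() {
		return "", errNoTranscriber
	}
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	resp, err := t.Transcribe(ctx, audioFilePath)
	if err != nil {
		return "", err
	}
	return resp.Text, nil
}

func main() {
	text, err := transcribeVoice(stubTranscriber{text: "hello", available: true}, "voice.ogg")
	fmt.Println(text, err)
}
```

Any type with these two methods can be passed in, which is what lets a new STT provider be added without editing the channel files.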

Add a Whisper transcription backend that talks to any OpenAI-compatible /v1/audio/transcriptions endpoint (Faster-Whisper, Whisper.cpp, etc.), allowing self-hosted, offline speech-to-text without a Groq API key.

Changes:
- pkg/voice/whisper.go: WhisperTranscriber implementing the Transcriber interface. Sends audio as multipart/form-data to the configured API base. Health-checks via GET /v1/models so IsAvailable() is network-aware.
- pkg/config/config.go: WhisperConfig{Enabled, APIBase} added to ToolsConfig. Default API base: http://localhost:8200.
- cmd/picoclaw/main.go: Whisper is tried first when enabled; falls back to Groq if Whisper is not reachable. Both attach to Telegram, Discord, and Slack via the Transcriber interface introduced in the previous commit.

Config example:

"tools": {
  "whisper": {
    "enabled": true,
    "api_base": "http://localhost:8200"
  }
}

Depends-on: refactor(voice): introduce Transcriber interface
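
The "Whisper first, Groq as fallback" wiring can be sketched as choosing the first available backend from an ordered list. This is an assumed shape, not the code in cmd/picoclaw/main.go: `pickTranscriber` and `namedStub` are hypothetical names, and the stubs stand in for the real WhisperTranscriber and GroqTranscriber.

```go
package main

import (
	"context"
	"fmt"
)

// TranscriptionResponse is reduced to one assumed field for this sketch.
type TranscriptionResponse struct {
	Text string
}

// Transcriber is the interface introduced by the previous commit.
type Transcriber interface {
	Transcribe(ctx context.Context, audioFilePath string) (*TranscriptionResponse, error)
	IsAvailable() bool
}

// namedStub is a hypothetical backend that reports its own name as the text.
type namedStub struct {
	name      string
	reachable bool
}

func (n namedStub) Transcribe(ctx context.Context, audioFilePath string) (*TranscriptionResponse, error) {
	return &TranscriptionResponse{Text: n.name}, nil
}

func (n namedStub) IsAvailable() bool { return n.reachable }

// pickTranscriber returns the first candidate that reports itself available,
// or nil when none does. Order encodes preference.
func pickTranscriber(candidates ...Transcriber) Transcriber {
	for _, c := range candidates {
		if c != nil && c.IsAvailable() {
			return c
		}
	}
	return nil
}

func main() {
	whisper := namedStub{name: "whisper", reachable: false} // e.g. local server is down
	groq := namedStub{name: "groq", reachable: true}

	chosen := pickTranscriber(whisper, groq)
	if chosen == nil {
		fmt.Println("no STT backend available")
		return
	}
	resp, _ := chosen.Transcribe(context.Background(), "voice.ogg")
	fmt.Println("using backend:", resp.Text)
}
```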

Adds outbound voice capability: the agent can now reply with audio by setting voice=true on the message tool, useful when the user sends a voice message or explicitly requests audio.

Changes:
- pkg/voice/synthesizer.go: Synthesizer interface (Synthesize, IsAvailable)
- pkg/voice/kokoro.go: KokoroSynthesizer, which talks to any OpenAI-compatible /v1/audio/speech endpoint (Kokoro, Piper, etc.). Health-check via GET /v1/models. Returns a temp .mp3 path; caller cleans up.
- pkg/bus/types.go: add Media []string to OutboundMessage (backward-compatible, omitempty). Enables any channel to receive file paths.
- pkg/channels/manager.go: add SendFileToChannel(), a synchronous media send that routes local file paths through the channel's Send().
- pkg/tools/message.go: add voice=true parameter + SynthesizeCallback + SendMediaCallback. Voice path: synthesize → send file → cleanup. Falls back to text if TTS unavailable. HasSentInRound fires for both.
- pkg/agent/loop.go: add SetVoiceCallbacks() to attach TTS to message tool after channel manager is available.
- cmd/picoclaw/main.go: wire Kokoro TTS after channels init; attaches to message tool via SetVoiceCallbacks().

Config example:

"tools": {
  "tts": {
    "enabled": true,
    "api_base": "http://localhost:8100",
    "voice": "en_us-lessac-medium"
  }
}

Depends-on: feat(voice/stt): add local Whisper STT provider
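
The voice path described above (synthesize → send file → cleanup, with a text fallback) might look roughly like the following. The callback names come from the commit message, but their signatures, the `reply` function, and the decision to return an error when the media send itself fails are assumptions made for this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Callback names follow the commit message; the signatures are assumed.
type SynthesizeCallback func(text string) (audioPath string, err error)
type SendMediaCallback func(audioPath string) error

var errTTSUnavailable = errors.New("tts unavailable")

// reply sends text as audio when voice is requested and synthesis succeeds,
// otherwise as plain text. It returns which mode was used.
func reply(text string, voice bool, synth SynthesizeCallback, sendMedia SendMediaCallback, sendText func(string) error) (string, error) {
	if voice && synth != nil && sendMedia != nil {
		path, err := synth(text)
		if err == nil {
			// Cleanup: the synthesizer hands back a temp file the caller owns.
			defer os.Remove(path)
			if err := sendMedia(path); err != nil {
				return "", err
			}
			return "voice", nil
		}
		// Synthesis failed (for example TTS is down): fall through to text.
	}
	if err := sendText(text); err != nil {
		return "", err
	}
	return "text", nil
}

func main() {
	// Stand-in synthesizer: writes placeholder bytes to a temp file.
	synth := func(text string) (string, error) {
		f, err := os.CreateTemp("", "reply-*.mp3")
		if err != nil {
			return "", err
		}
		defer f.Close()
		if _, err := f.WriteString(text); err != nil {
			os.Remove(f.Name())
			return "", err
		}
		return f.Name(), nil
	}
	sendMedia := func(path string) error { fmt.Println("sending file:", path); return nil }
	sendText := func(s string) error { fmt.Println("sending text:", s); return nil }

	mode, err := reply("hello", true, synth, sendMedia, sendText)
	fmt.Println(mode, err)
}
```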

Expand TTSConfig with model, format, and speed so the agent's voice is fully configurable from config.json without touching code.

- TTSConfig gains: Model, Format, Speed fields (all env-overridable)
- KokoroSynthesizer: add TTSProfile struct + NewKokoroSynthesizerFromProfile(). NewKokoroSynthesizer() is kept as a convenience wrapper (backward-compat)
- kokoroRequest: pass format and speed through to the API
- Temp file extension follows configured format (mp3/wav/ogg/etc.)
- main.go: wire all profile fields from config
- config.example.json: updated with model/format/speed examples

Full profile example:

"tts": {
  "enabled": true,
  "api_base": "http://localhost:8100",
  "voice": "af_nova",
  "model": "kokoro",
  "format": "mp3",
  "speed": 1.0
}
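
One way to picture the profile and the format-driven temp file extension is the sketch below. TTSProfile is named in the commit message, but the exact fields, the fallback values (`mp3`, speed `1.0`), and the `withDefaults` and `tempPattern` helpers are assumptions for illustration. The pattern is in the form accepted by `os.CreateTemp`, where `*` is replaced by a random string.

```go
package main

import "fmt"

// TTSProfile groups the voice settings (field set assumed from the config keys).
type TTSProfile struct {
	Voice  string
	Model  string
	Format string
	Speed  float64
}

// withDefaults fills unset fields with assumed fallback values.
func (p TTSProfile) withDefaults() TTSProfile {
	if p.Format == "" {
		p.Format = "mp3"
	}
	if p.Speed == 0 {
		p.Speed = 1.0
	}
	return p
}

// tempPattern returns an os.CreateTemp pattern whose extension follows the
// configured audio format, e.g. "tts-*.wav".
func (p TTSProfile) tempPattern() string {
	return "tts-*." + p.withDefaults().Format
}

func main() {
	profile := TTSProfile{Voice: "af_nova", Model: "kokoro", Format: "wav"}
	fmt.Println(profile.tempPattern())
	fmt.Printf("%+v\n", TTSProfile{Voice: "af_nova"}.withDefaults())
}
```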

Chatterbox exposes a /synthesize endpoint alongside the standard /v1/audio/speech one. The native endpoint adds two parameters unavailable in the OpenAI-compatible API:
- exaggeration (0.0–1.0): emotional expressiveness of the voice
- cfg_weight (0.0–1.0): how closely the voice follows the prompt

Routing: when model starts with 'chatterbox' (case-insensitive), Synthesize() posts to /synthesize with the Chatterbox body; otherwise it uses the standard /v1/audio/speech path. All other backends are unaffected.

Changes:
- kokoro.go: chatterboxRequest struct, isChatterbox() helper, Synthesize() branching logic, exaggeration/cfgWeight fields on KokoroSynthesizer
- TTSProfile: Exaggeration + CFGWeight fields (defaults: 0.5 / 0.5)
- config.go: TTSConfig gains Exaggeration + CFGWeight (env-overridable)
- main.go: wire new fields through TTSProfile
- config.example.json: document exaggeration + cfg_weight

Chatterbox config example:

"tts": {
  "enabled": true,
  "api_base": "http://localhost:8100",
  "model": "chatterbox-1",
  "voice": "default",
  "format": "mp3",
  "exaggeration": 0.5,
  "cfg_weight": 0.5
}

/v1/models returns 404 on Chatterbox — use /health instead. All other backends keep using /v1/models.
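
The routing rule and the health-check exception can be summarized in a few lines. `isChatterbox` is named in the commit message; `synthesizeURL` and `healthURL` are hypothetical helpers added here for illustration, and the real branching lives inside Synthesize() and IsAvailable() in kokoro.go.

```go
package main

import (
	"fmt"
	"strings"
)

// isChatterbox reports whether the model name starts with "chatterbox",
// ignoring case.
func isChatterbox(model string) bool {
	return strings.HasPrefix(strings.ToLower(model), "chatterbox")
}

// synthesizeURL picks the native Chatterbox endpoint or the
// OpenAI-compatible one, depending on the model name.
func synthesizeURL(apiBase, model string) string {
	base := strings.TrimRight(apiBase, "/")
	if isChatterbox(model) {
		return base + "/synthesize"
	}
	return base + "/v1/audio/speech"
}

// healthURL picks the availability probe: /health for Chatterbox,
// /v1/models for every other backend.
func healthURL(apiBase, model string) string {
	base := strings.TrimRight(apiBase, "/")
	if isChatterbox(model) {
		return base + "/health"
	}
	return base + "/v1/models"
}

func main() {
	for _, model := range []string{"kokoro", "chatterbox-1", "Chatterbox"} {
		fmt.Println(model, synthesizeURL("http://localhost:8100", model), healthURL("http://localhost:8100", model))
	}
}
```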


Adds a full Matrix protocol channel using mautrix-go, enabling agents to communicate over any Matrix homeserver (matrix.org, self-hosted Synapse, Conduit, etc.).

Features:
- Text messages: Markdown → Matrix HTML (m.text / m.notice)
- Inbound voice: audio events passed through the Transcriber interface (Whisper or Groq) → text before reaching the agent
- Outbound media: m.image / m.audio / m.video / m.file events with proper MIME detection and Matrix content repository upload
- Outbound voice: works via voice=true on the message tool (TTS PR)
- Native typing indicator: PUT /typing instead of a placeholder message
- Group chat: configurable require_mention_in_group (default: true)
- Invite handling: join_on_invite (default: true)
- allow_from filter: restrict to specific Matrix user IDs
- Historical event guard: events before process start are ignored

New files:
- pkg/channels/matrix.go: MatrixChannel implementation
- docs/MATRIX_SETUP.md: step-by-step setup guide

Config changes:
- pkg/config/config.go: MatrixConfig added to ChannelsConfig
- config/config.example.json: matrix, tts, and whisper example sections

Dependency: maunium.net/go/mautrix v0.26.3 (go.mod / go.sum updated)

Config example:

"channels": {
  "matrix": {
    "enabled": true,
    "homeserver": "https://matrix.example.com",
    "user_id": "@bot:matrix.example.com",
    "access_token": "syt_...",
    "allow_from": [],
    "join_on_invite": true,
    "require_mention_in_group": true
  }
}

Depends-on:
- refactor(voice): introduce Transcriber interface
- feat(voice/tts): add TTS synthesis and voice parameter on message tool
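
For readers unfamiliar with the native typing indicator, the sketch below builds the request path and JSON body as I understand them from the Matrix client-server API (`PUT /_matrix/client/v3/rooms/{roomId}/typing/{userId}`). It is illustrative only: the PR uses mautrix-go rather than hand-built requests, and `typingPath`, `typingPayload`, and the 30-second timeout are hypothetical choices for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// typingBody is the JSON body of a typing notification.
// Timeout is in milliseconds and only meaningful when Typing is true.
type typingBody struct {
	Typing  bool  `json:"typing"`
	Timeout int64 `json:"timeout,omitempty"`
}

// typingPath builds the request path, escaping both identifiers as
// single path segments.
func typingPath(roomID, userID string) string {
	return "/_matrix/client/v3/rooms/" + url.PathEscape(roomID) +
		"/typing/" + url.PathEscape(userID)
}

// typingPayload encodes the body; the timeout is dropped when typing stops.
func typingPayload(typing bool, timeoutMS int64) ([]byte, error) {
	body := typingBody{Typing: typing}
	if typing {
		body.Timeout = timeoutMS
	}
	return json.Marshal(body)
}

func main() {
	path := typingPath("!abc:example.com", "@bot:example.com")
	start, err := typingPayload(true, 30000)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	stop, _ := typingPayload(false, 0)
	fmt.Println("PUT", path)
	fmt.Println(string(start))
	fmt.Println(string(stop))
}
```

Sending the indicator this way avoids the "placeholder message, then edit" pattern, so nothing is left behind in the room history if the agent produces no reply.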

Myka added 4 commits February 17, 2026 11:09

emadomedher force-pushed the feat/matrix-channel branch from 53bdc64 to 0d23efb on February 17, 2026 08:22

emadomedher force-pushed the feat/matrix-channel branch from 0d23efb to f9b143c on February 17, 2026 08:30

Myka added 2 commits February 17, 2026 13:03

fix(voice/tts): use /health endpoint for Chatterbox availability check

b21ff3e


emadomedher force-pushed the feat/matrix-channel branch from f9b143c to 886d308 on February 17, 2026 10:04


emadomedher wants to merge 7 commits into sipeed:main from emadomedher:feat/matrix-channel
