feat(channels/matrix): add Matrix channel integration #356
Open
emadomedher wants to merge 7 commits into sipeed:main
added 4 commits on February 17, 2026 11:09
Currently Discord, Slack, and Telegram all hardcode *voice.GroqTranscriber
as their transcription dependency. This makes it impossible to swap in a
different STT backend without changing each channel file.
Add a Transcriber interface to pkg/voice/transcriber.go:
    type Transcriber interface {
        Transcribe(ctx context.Context, audioFilePath string) (*TranscriptionResponse, error)
        IsAvailable() bool
    }
GroqTranscriber already implements this interface (no change to its
implementation). Update Discord, Slack, and Telegram to depend on the
interface instead of the concrete type.
No behaviour change — this is a pure refactor that enables future STT
providers (e.g. local Whisper) to be dropped in without modifying channel
code.
Add a Whisper transcription backend that talks to any OpenAI-compatible
/v1/audio/transcriptions endpoint (Faster-Whisper, Whisper.cpp, etc.),
allowing self-hosted, offline speech-to-text without a Groq API key.
Changes:
- pkg/voice/whisper.go: WhisperTranscriber implementing the Transcriber
interface. Sends audio as multipart/form-data to the configured API base.
Health-checks via GET /v1/models so IsAvailable() is network-aware.
- pkg/config/config.go: WhisperConfig{Enabled, APIBase} added to
ToolsConfig. Default API base: http://localhost:8200.
- cmd/picoclaw/main.go: Whisper is tried first when enabled, with a fallback
to Groq if Whisper is not reachable. Whichever backend is selected is attached
to Telegram, Discord, and Slack via the Transcriber interface introduced in the previous commit.
Config example:
"tools": {
  "whisper": {
    "enabled": true,
    "api_base": "http://localhost:8200"
  }
}
Depends-on: refactor(voice): introduce Transcriber interface
Adds outbound voice capability: the agent can now reply with audio by
setting voice=true on the message tool, useful when the user sends a voice
message or explicitly requests audio.
Changes:
- pkg/voice/synthesizer.go: Synthesizer interface (Synthesize, IsAvailable)
- pkg/voice/kokoro.go: KokoroSynthesizer — talks to any OpenAI-compatible
/v1/audio/speech endpoint (Kokoro, Piper, etc.). Health-check via GET
/v1/models. Returns a temp .mp3 path; caller cleans up.
- pkg/bus/types.go: add Media []string to OutboundMessage (backward-compatible,
omitempty). Enables any channel to receive file paths.
- pkg/channels/manager.go: add SendFileToChannel() — synchronous media
send that routes local file paths through the channel's Send().
- pkg/tools/message.go: add voice=true parameter + SynthesizeCallback +
SendMediaCallback. Voice path: synthesize → send file → cleanup.
Falls back to text if TTS is unavailable. HasSentInRound fires for both paths.
- pkg/agent/loop.go: add SetVoiceCallbacks() to attach TTS to message tool
after channel manager is available.
- cmd/picoclaw/main.go: wire Kokoro TTS after channels init; attaches to
message tool via SetVoiceCallbacks().
Config example:
"tools": {
  "tts": {
    "enabled": true,
    "api_base": "http://localhost:8100",
    "voice": "en_us-lessac-medium"
  }
}
Depends-on: feat(voice/stt): add local Whisper STT provider
Expand TTSConfig with model, format, and speed so the agent's voice is
fully configurable from config.json without touching code.
- TTSConfig gains: Model, Format, Speed fields (all env-overridable)
- KokoroSynthesizer: add TTSProfile struct + NewKokoroSynthesizerFromProfile()
NewKokoroSynthesizer() is kept as a convenience wrapper (backward-compat)
- kokoroRequest: pass format and speed through to the API
- Temp file extension follows configured format (mp3/wav/ogg/etc.)
- main.go: wire all profile fields from config
- config.example.json: updated with model/format/speed examples
Full profile example:
"tts": {
  "enabled": true,
  "api_base": "http://localhost:8100",
  "voice": "af_nova",
  "model": "kokoro",
  "format": "mp3",
  "speed": 1.0
}
Chatterbox exposes a /synthesize endpoint alongside the standard
/v1/audio/speech one. The native endpoint adds two parameters unavailable
in the OpenAI-compatible API:
- exaggeration (0.0–1.0): emotional expressiveness of the voice
- cfg_weight (0.0–1.0): how closely the voice follows the prompt
Routing: when model starts with 'chatterbox' (case-insensitive), Synthesize()
posts to /synthesize with the Chatterbox body; otherwise it uses the standard
/v1/audio/speech path. All other backends are unaffected.
Changes:
- kokoro.go: chatterboxRequest struct, isChatterbox() helper, Synthesize()
branching logic, exaggeration/cfgWeight fields on KokoroSynthesizer
- TTSProfile: Exaggeration + CFGWeight fields (defaults: 0.5 / 0.5)
- config.go: TTSConfig gains Exaggeration + CFGWeight (env-overridable)
- main.go: wire new fields through TTSProfile
- config.example.json: document exaggeration + cfg_weight
Chatterbox config example:
"tts": {
  "enabled": true,
  "api_base": "http://localhost:8100",
  "model": "chatterbox-1",
  "voice": "default",
  "format": "mp3",
  "exaggeration": 0.5,
  "cfg_weight": 0.5
}
added 2 commits on February 17, 2026 13:03
Chatterbox returns 404 for GET /v1/models, so its health check uses GET /health instead. All other backends keep using /v1/models.
Adds a full Matrix protocol channel using mautrix-go, enabling agents to
communicate over any Matrix homeserver (matrix.org, self-hosted Synapse,
Conduit, etc.).
Features:
- Text messages: Markdown → Matrix HTML (m.text / m.notice)
- Inbound voice: audio events passed through the Transcriber interface
(Whisper or Groq) → text before reaching the agent
- Outbound media: m.image / m.audio / m.video / m.file events with
proper MIME detection and Matrix content repository upload
- Outbound voice: works via voice=true on the message tool (TTS PR)
- Native typing indicator: PUT /typing instead of a placeholder message
- Group chat: configurable require_mention_in_group (default: true)
- Invite handling: join_on_invite (default: true)
- allow_from filter: restrict to specific Matrix user IDs
- Historical event guard: events before process start are ignored
New files:
- pkg/channels/matrix.go: MatrixChannel implementation
- docs/MATRIX_SETUP.md: step-by-step setup guide
Config changes:
- pkg/config/config.go: MatrixConfig added to ChannelsConfig
- config/config.example.json: matrix, tts, and whisper example sections
Dependency: maunium.net/go/mautrix v0.26.3 (go.mod / go.sum updated)
Config example:
"channels": {
  "matrix": {
    "enabled": true,
    "homeserver": "https://matrix.example.com",
    "user_id": "@bot:matrix.example.com",
    "access_token": "syt_...",
    "allow_from": [],
    "join_on_invite": true,
    "require_mention_in_group": true
  }
}
Depends-on:
- refactor(voice): introduce Transcriber interface
- feat(voice/tts): add TTS synthesis and voice parameter on message tool
Summary
Adds a full Matrix protocol channel using mautrix-go, enabling picoclaw agents to communicate over any Matrix homeserver — matrix.org, self-hosted Synapse, Conduit, Dendrite, etc.
Matrix is an open federated protocol used by ~115M accounts across thousands of homeservers.
Features
- Text messages: Markdown → Matrix HTML (m.text/m.notice)
- Inbound voice: transcribed via the Transcriber interface (Whisper or Groq)
- Outbound media: m.image/m.audio/m.video/m.file with MIME detection + Matrix content repo upload
- Outbound voice: voice=true on the message tool (requires #355)
- Native typing indicator: PUT /typing shows "bot is typing…" natively, with no placeholder messages
- Group chat: require_mention_in_group (default: true); the bot only responds when @-mentioned
- Invite handling: join_on_invite (default: true)
- allow_from filter: restrict to specific Matrix user IDs
New files
- pkg/channels/matrix.go: MatrixChannel implementation (~750 lines)
- docs/MATRIX_SETUP.md: step-by-step setup guide
Config changes
- pkg/config/config.go: MatrixConfig added to ChannelsConfig
- config/config.example.json: matrix, tts, and whisper example sections added
New dependency
- maunium.net/go/mautrix v0.26.3: mature Matrix client library, widely used in the Matrix ecosystem (bridges, bots, etc.)
Config example
Depends on
Can be reviewed and merged independently for text-only use; voice features activate automatically when #353–#355 are present.
Tests
All existing tests pass (go test ./...).