feat(channels/matrix): add Matrix channel integration #356

Open
emadomedher wants to merge 7 commits into sipeed:main from emadomedher:feat/matrix-channel

@emadomedher

Summary

Adds a full Matrix protocol channel using mautrix-go, enabling picoclaw agents to communicate over any Matrix homeserver — matrix.org, self-hosted Synapse, Conduit, Dendrite, etc.

Matrix is an open federated protocol used by ~115M accounts across thousands of homeservers.

Features

| Feature | Detail |
|---|---|
| Text messages | Markdown → Matrix HTML (m.text / m.notice) |
| Inbound voice | Audio events → STT via Transcriber interface (Whisper or Groq) |
| Outbound media | m.image / m.audio / m.video / m.file with MIME detection + Matrix content repo upload |
| Outbound voice | voice=true on message tool (requires #355) |
| Typing indicator | Native PUT /typing shows "bot is typing…" in clients; no placeholder messages |
| Group chat | require_mention_in_group (default: true): bot only responds when @-mentioned |
| Auto-join | join_on_invite (default: true) |
| Allow-list | allow_from: restrict to specific Matrix user IDs |
| Historical guard | Events before process start are ignored to prevent reply floods on restart |
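
For reviewers, the three inbound filters (historical guard, allow-list, mention requirement) can be pictured as one predicate. This is a rough sketch, not the code in pkg/channels/matrix.go: `inboundEvent`, `filterConfig`, and `shouldHandle` are hypothetical names, timestamps are plain Unix milliseconds, and an empty allow_from is assumed to mean "allow everyone".

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// inboundEvent is a hypothetical, simplified view of a Matrix message event.
type inboundEvent struct {
	Sender      string
	Body        string
	TimestampMS int64 // event timestamp in Unix milliseconds
	IsGroup     bool  // true when the room is a group chat
}

// filterConfig mirrors allow_from and require_mention_in_group (names assumed).
type filterConfig struct {
	BotUserID             string
	AllowFrom             []string
	RequireMentionInGroup bool
	StartedAtMS           int64 // process start in Unix milliseconds
}

// shouldHandle reports whether an event should reach the agent.
func shouldHandle(cfg filterConfig, ev inboundEvent) bool {
	// Historical guard: ignore events from before process start.
	if ev.TimestampMS < cfg.StartedAtMS {
		return false
	}
	// Allow-list: an empty list is assumed to allow every sender.
	if len(cfg.AllowFrom) > 0 {
		allowed := false
		for _, id := range cfg.AllowFrom {
			if id == ev.Sender {
				allowed = true
				break
			}
		}
		if !allowed {
			return false
		}
	}
	// Group chats: require the bot's user ID to appear in the body.
	if ev.IsGroup && cfg.RequireMentionInGroup {
		return strings.Contains(ev.Body, cfg.BotUserID)
	}
	return true
}

func main() {
	now := time.Now().UnixMilli()
	cfg := filterConfig{
		BotUserID:             "@bot:matrix.example.com",
		RequireMentionInGroup: true,
		StartedAtMS:           now,
	}
	ev := inboundEvent{
		Sender:      "@alice:matrix.example.com",
		Body:        "@bot:matrix.example.com hello",
		TimestampMS: now + 1000,
		IsGroup:     true,
	}
	fmt.Println("handle:", shouldHandle(cfg, ev))
}
```

A real implementation would more likely check the structured mention data on the event than do a substring match; the substring check only keeps the sketch short.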

New files

  • pkg/channels/matrix.go: MatrixChannel implementation (~750 lines)
  • docs/MATRIX_SETUP.md: step-by-step setup guide

Config changes

  • pkg/config/config.go: MatrixConfig added to ChannelsConfig
  • config/config.example.json: matrix, tts, whisper example sections added

New dependency

maunium.net/go/mautrix v0.26.3 — mature Matrix client library, widely used in the Matrix ecosystem (bridges, bots, etc.)

Config example

"channels": {
  "matrix": {
    "enabled": true,
    "homeserver": "https://matrix.example.com",
    "user_id": "@bot:matrix.example.com",
    "access_token": "syt_...",
    "allow_from": [],
    "join_on_invite": true,
    "require_mention_in_group": true
  }
}
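
To show how the JSON above could map onto a Go struct, here is a hedged sketch. The field names are guesses derived from the JSON keys, and `defaultMatrixConfig` / `parseMatrixConfig` are hypothetical helpers; the actual MatrixConfig in pkg/config/config.go may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MatrixConfig is a sketch; field names are inferred from the JSON keys.
type MatrixConfig struct {
	Enabled               bool     `json:"enabled"`
	Homeserver            string   `json:"homeserver"`
	UserID                string   `json:"user_id"`
	AccessToken           string   `json:"access_token"`
	AllowFrom             []string `json:"allow_from"`
	JoinOnInvite          bool     `json:"join_on_invite"`
	RequireMentionInGroup bool     `json:"require_mention_in_group"`
}

// defaultMatrixConfig applies the documented defaults (both true).
func defaultMatrixConfig() MatrixConfig {
	return MatrixConfig{JoinOnInvite: true, RequireMentionInGroup: true}
}

// parseMatrixConfig decodes JSON on top of the defaults, so keys that are
// omitted from the file keep their default values.
func parseMatrixConfig(raw []byte) (MatrixConfig, error) {
	cfg := defaultMatrixConfig()
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{"enabled": true, "homeserver": "https://matrix.example.com"}`)
	cfg, err := parseMatrixConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Decoding over a pre-populated struct is one simple way to get "default: true" booleans without pointer fields.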

Depends on

Can be reviewed and merged independently for text-only use; voice features activate automatically when #353 and #355 are present.

Tests

All existing tests pass (go test ./...).

Myka added 4 commits February 17, 2026 11:09

Currently Discord, Slack, and Telegram all hardcode *voice.GroqTranscriber
as their transcription dependency. This makes it impossible to swap in a
different STT backend without changing each channel file.

Add a Transcriber interface to pkg/voice/transcriber.go:

  type Transcriber interface {
      Transcribe(ctx context.Context, audioFilePath string) (*TranscriptionResponse, error)
      IsAvailable() bool
  }

GroqTranscriber already implements this interface (no change to its
implementation). Update Discord, Slack, and Telegram to depend on the
interface instead of the concrete type.
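
As an illustration of what depending on the interface buys, the sketch below plugs a stub backend into a channel-like type. `stubTranscriber`, `channel`, `handleVoice`, and `run` are hypothetical, and TranscriptionResponse is reduced to a single assumed Text field.

```go
package main

import (
	"context"
	"fmt"
)

// TranscriptionResponse is reduced to one assumed field for this sketch.
type TranscriptionResponse struct {
	Text string
}

// Transcriber matches the interface quoted in the commit message.
type Transcriber interface {
	Transcribe(ctx context.Context, audioFilePath string) (*TranscriptionResponse, error)
	IsAvailable() bool
}

// stubTranscriber is a hypothetical backend used only for illustration.
type stubTranscriber struct{ reply string }

func (s stubTranscriber) Transcribe(ctx context.Context, audioFilePath string) (*TranscriptionResponse, error) {
	return &TranscriptionResponse{Text: s.reply}, nil
}

func (s stubTranscriber) IsAvailable() bool { return true }

// channel stands in for a Discord, Slack, or Telegram channel.
type channel struct {
	transcriber Transcriber
}

// handleVoice returns the transcript, or an empty string when no usable
// backend is attached.
func (c *channel) handleVoice(ctx context.Context, path string) (string, error) {
	if c.transcriber == nil || !c.transcriber.IsAvailable() {
		return "", nil
	}
	resp, err := c.transcriber.Transcribe(ctx, path)
	if err != nil {
		return "", err
	}
	return resp.Text, nil
}

// run wires a backend into a channel and transcribes one file.
func run(t Transcriber, path string) (string, error) {
	c := &channel{transcriber: t}
	return c.handleVoice(context.Background(), path)
}

func main() {
	text, err := run(stubTranscriber{reply: "hello from a stub"}, "/tmp/voice.ogg")
	fmt.Println(text, err)
}
```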

No behaviour change — this is a pure refactor that enables future STT
providers (e.g. local Whisper) to be dropped in without modifying channel
code.

Add a Whisper transcription backend that talks to any OpenAI-compatible
/v1/audio/transcriptions endpoint (Faster-Whisper, Whisper.cpp, etc.),
allowing self-hosted, offline speech-to-text without a Groq API key.
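
For context, a request against such an endpoint can look roughly like the sketch below. It is not the code in pkg/voice/whisper.go: `transcribe`, `newFakeWhisper`, and `roundTrip` are hypothetical, the "file" and "model" form fields and the `{"text": ...}` response follow the OpenAI transcription API, and the model name is a placeholder that a local server may ignore.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
)

// transcribe posts audio bytes as multipart/form-data and returns the text.
func transcribe(ctx context.Context, apiBase, filename string, audio []byte) (string, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, err := w.CreateFormFile("file", filename)
	if err != nil {
		return "", err
	}
	if _, err := part.Write(audio); err != nil {
		return "", err
	}
	// Placeholder model name; an assumption, not taken from the PR.
	if err := w.WriteField("model", "whisper-1"); err != nil {
		return "", err
	}
	if err := w.Close(); err != nil {
		return "", err
	}

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, apiBase+"/v1/audio/transcriptions", &body)
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("transcription failed: %s", resp.Status)
	}
	var out struct {
		Text string `json:"text"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Text, nil
}

// newFakeWhisper starts a local test server that answers with a fixed text.
func newFakeWhisper(reply string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/v1/audio/transcriptions" {
			http.NotFound(w, r)
			return
		}
		f, _, err := r.FormFile("file")
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		f.Close()
		json.NewEncoder(w).Encode(map[string]string{"text": reply})
	}))
}

// roundTrip exercises transcribe against the fake server.
func roundTrip(reply string) (string, error) {
	srv := newFakeWhisper(reply)
	defer srv.Close()
	return transcribe(context.Background(), srv.URL, "voice.ogg", []byte("fake audio"))
}

func main() {
	text, err := roundTrip("hello world")
	fmt.Println(text, err)
}
```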

Changes:
- pkg/voice/whisper.go: WhisperTranscriber implementing the Transcriber
  interface. Sends audio as multipart/form-data to the configured API base.
  Health-checks via GET /v1/models so IsAvailable() is network-aware.
- pkg/config/config.go: WhisperConfig{Enabled, APIBase} added to
  ToolsConfig. Default API base: http://localhost:8200.
- cmd/picoclaw/main.go: Whisper is tried first when enabled; falls back to
  Groq if Whisper is not reachable. Both attach to Telegram, Discord, and
  Slack via the Transcriber interface introduced in the previous commit.
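
The selection order described for cmd/picoclaw/main.go can be sketched as below. `availability`, `namedBackend`, and `pickTranscriber` are hypothetical names, and returning nil when neither backend is reachable is an assumption.

```go
package main

import "fmt"

// availability is the subset of the Transcriber interface needed here.
type availability interface {
	IsAvailable() bool
}

// namedBackend is a hypothetical stand-in for a Whisper or Groq transcriber.
type namedBackend struct {
	name string
	up   bool
}

func (b namedBackend) IsAvailable() bool { return b.up }

// pickTranscriber tries Whisper first when it is enabled and reachable,
// then falls back to Groq. It returns nil when neither is usable.
func pickTranscriber(whisperEnabled bool, whisper, groq availability) availability {
	if whisperEnabled && whisper != nil && whisper.IsAvailable() {
		return whisper
	}
	if groq != nil && groq.IsAvailable() {
		return groq
	}
	return nil
}

func main() {
	whisper := namedBackend{name: "whisper", up: false}
	groq := namedBackend{name: "groq", up: true}
	if chosen := pickTranscriber(true, whisper, groq); chosen != nil {
		fmt.Println("using", chosen.(namedBackend).name)
	}
}
```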

Config example:
  "tools": {
    "whisper": {
      "enabled": true,
      "api_base": "http://localhost:8200"
    }
  }

Depends-on: refactor(voice): introduce Transcriber interface

Adds outbound voice capability: the agent can now reply with audio by
setting voice=true on the message tool, useful when the user sends a voice
message or explicitly requests audio.

Changes:
- pkg/voice/synthesizer.go: Synthesizer interface (Synthesize, IsAvailable)
- pkg/voice/kokoro.go: KokoroSynthesizer — talks to any OpenAI-compatible
  /v1/audio/speech endpoint (Kokoro, Piper, etc.). Health-check via GET
  /v1/models. Returns a temp .mp3 path; caller cleans up.
- pkg/bus/types.go: add Media []string to OutboundMessage (backward-
  compatible, omitempty). Enables any channel to receive file paths.
- pkg/channels/manager.go: add SendFileToChannel() — synchronous media
  send that routes local file paths through the channel's Send().
- pkg/tools/message.go: add voice=true parameter + SynthesizeCallback +
  SendMediaCallback. Voice path: synthesize → send file → cleanup.
  Falls back to text if TTS unavailable. HasSentInRound fires for both.
- pkg/agent/loop.go: add SetVoiceCallbacks() to attach TTS to message tool
  after channel manager is available.
- cmd/picoclaw/main.go: wire Kokoro TTS after channels init; attaches to
  message tool via SetVoiceCallbacks().
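
The voice path (synthesize → send file → cleanup, with a text fallback) might be pictured like this. The callback signatures are guesses rather than the ones in pkg/tools/message.go, and `sendMessage`, `fakeSynth`, and `fileExists` are hypothetical helpers. The sketch falls back to text only when synthesis fails or no TTS callback is attached.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Hypothetical callback shapes; the real signatures may differ.
type SynthesizeCallback func(text string) (audioPath string, err error)
type SendMediaCallback func(channel, chatID, path string) error
type SendTextCallback func(channel, chatID, text string) error

var errTTSUnavailable = errors.New("tts unavailable")

// sendMessage sends audio when voice is requested and synthesis succeeds,
// otherwise plain text. The first result reports whether the voice path ran.
func sendMessage(voice bool, channel, chatID, text string,
	synth SynthesizeCallback, sendMedia SendMediaCallback, sendText SendTextCallback) (bool, error) {
	if voice && synth != nil && sendMedia != nil {
		if path, err := synth(text); err == nil {
			// The synthesizer returns a temp file; the caller removes it.
			defer os.Remove(path)
			return true, sendMedia(channel, chatID, path)
		}
	}
	return false, sendText(channel, chatID, text)
}

// fakeSynth writes the text to a temp file in place of real audio.
func fakeSynth(text string) (string, error) {
	f, err := os.CreateTemp("", "tts-*.mp3")
	if err != nil {
		return "", err
	}
	defer f.Close()
	_, err = f.WriteString(text)
	return f.Name(), err
}

// fileExists reports whether path still exists on disk.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	sendMedia := func(channel, chatID, path string) error {
		fmt.Println("sending file to", channel, chatID)
		return nil
	}
	sendText := func(channel, chatID, text string) error {
		fmt.Println("sending text:", text)
		return nil
	}
	asVoice, err := sendMessage(true, "matrix", "!room:example.org", "hello", fakeSynth, sendMedia, sendText)
	fmt.Println(asVoice, err)
}
```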

Config example:
  "tools": {
    "tts": {
      "enabled": true,
      "api_base": "http://localhost:8100",
      "voice": "en_us-lessac-medium"
    }
  }

Depends-on: feat(voice/stt): add local Whisper STT provider

Expand TTSConfig with model, format, and speed so the agent's voice is
fully configurable from config.json without touching code.

- TTSConfig gains: Model, Format, Speed fields (all env-overridable)
- KokoroSynthesizer: add TTSProfile struct + NewKokoroSynthesizerFromProfile()
  NewKokoroSynthesizer() is kept as a convenience wrapper (backward-compat)
- kokoroRequest: pass format and speed through to the API
- Temp file extension follows configured format (mp3/wav/ogg/etc.)
- main.go: wire all profile fields from config
- config.example.json: updated with model/format/speed examples
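
A rough sketch of the profile-based constructor and the backward-compatible wrapper follows. The type and constructor names come from this commit message, but the field set, the wrapper's parameters, the `tempPattern` helper, and the fallback values (taken from the example profile) are all assumptions.

```go
package main

import "fmt"

// TTSProfile groups the configurable voice settings (field set assumed).
type TTSProfile struct {
	APIBase string
	Voice   string
	Model   string
	Format  string
	Speed   float64
}

// KokoroSynthesizer holds the resolved profile in this sketch.
type KokoroSynthesizer struct {
	profile TTSProfile
}

// NewKokoroSynthesizerFromProfile fills unset fields with assumed defaults.
func NewKokoroSynthesizerFromProfile(p TTSProfile) *KokoroSynthesizer {
	if p.Model == "" {
		p.Model = "kokoro"
	}
	if p.Format == "" {
		p.Format = "mp3"
	}
	if p.Speed == 0 {
		p.Speed = 1.0
	}
	return &KokoroSynthesizer{profile: p}
}

// NewKokoroSynthesizer keeps a short two-argument form for existing callers
// (parameter list assumed).
func NewKokoroSynthesizer(apiBase, voice string) *KokoroSynthesizer {
	return NewKokoroSynthesizerFromProfile(TTSProfile{APIBase: apiBase, Voice: voice})
}

// tempPattern returns an os.CreateTemp pattern whose extension follows the
// configured format, for example "tts-*.wav".
func (k *KokoroSynthesizer) tempPattern() string {
	return "tts-*." + k.profile.Format
}

func main() {
	s := NewKokoroSynthesizerFromProfile(TTSProfile{
		APIBase: "http://localhost:8100",
		Voice:   "af_nova",
		Format:  "wav",
	})
	fmt.Println(s.tempPattern(), s.profile.Model, s.profile.Speed)
}
```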

Full profile example:
  "tts": {
    "enabled": true,
    "api_base": "http://localhost:8100",
    "voice":   "af_nova",
    "model":   "kokoro",
    "format":  "mp3",
    "speed":   1.0
  }

Chatterbox exposes a /synthesize endpoint alongside the standard
/v1/audio/speech one. The native endpoint adds two parameters unavailable
in the OpenAI-compatible API:
  - exaggeration (0.0–1.0): emotional expressiveness of the voice
  - cfg_weight  (0.0–1.0): how closely the voice follows the prompt

Routing: when model starts with 'chatterbox' (case-insensitive), Synthesize()
posts to /synthesize with the Chatterbox body; otherwise it uses the standard
/v1/audio/speech path. All other backends are unaffected.
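
The routing rule reduces to a case-insensitive prefix check. A minimal sketch: `isChatterbox` is named in this commit, while `synthesisPath` is a hypothetical helper added for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// isChatterbox reports whether the model name starts with "chatterbox",
// ignoring case.
func isChatterbox(model string) bool {
	return strings.HasPrefix(strings.ToLower(model), "chatterbox")
}

// synthesisPath picks the endpoint path for a given model name.
func synthesisPath(model string) string {
	if isChatterbox(model) {
		return "/synthesize"
	}
	return "/v1/audio/speech"
}

func main() {
	for _, model := range []string{"chatterbox-1", "Chatterbox", "kokoro"} {
		fmt.Println(model, "->", synthesisPath(model))
	}
}
```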

Changes:
- kokoro.go: chatterboxRequest struct, isChatterbox() helper, Synthesize()
  branching logic, exaggeration/cfgWeight fields on KokoroSynthesizer
- TTSProfile: Exaggeration + CFGWeight fields (defaults: 0.5 / 0.5)
- config.go: TTSConfig gains Exaggeration + CFGWeight (env-overridable)
- main.go: wire new fields through TTSProfile
- config.example.json: document exaggeration + cfg_weight

Chatterbox config example:
  "tts": {
    "enabled": true,
    "api_base": "http://localhost:8100",
    "model":    "chatterbox-1",
    "voice":    "default",
    "format":   "mp3",
    "exaggeration": 0.5,
    "cfg_weight":   0.5
  }

Myka added 2 commits February 17, 2026 13:03

/v1/models returns 404 on Chatterbox, so the health check uses /health instead.
All other backends keep using /v1/models.

Adds a full Matrix protocol channel using mautrix-go, enabling agents to
communicate over any Matrix homeserver (matrix.org, self-hosted Synapse,
Conduit, etc.).

Features:
- Text messages: Markdown → Matrix HTML (m.text / m.notice)
- Inbound voice: audio events passed through the Transcriber interface
  (Whisper or Groq) → text before reaching the agent
- Outbound media: m.image / m.audio / m.video / m.file events with
  proper MIME detection and Matrix content repository upload
- Outbound voice: works via voice=true on the message tool (TTS PR)
- Native typing indicator: PUT /typing instead of a placeholder message
- Group chat: configurable require_mention_in_group (default: true)
- Invite handling: join_on_invite (default: true)
- allow_from filter: restrict to specific Matrix user IDs
- Historical event guard: events before process start are ignored
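
One way to picture the outbound-media step is a mapping from detected MIME type to Matrix msgtype. The sketch below detects by file extension with the standard library; whether matrix.go detects by extension or by content sniffing is not stated here, and `msgTypeForMIME` and `msgTypeForPath` are hypothetical names.

```go
package main

import (
	"fmt"
	"mime"
	"path/filepath"
	"strings"
)

// msgTypeForMIME maps a MIME type to one of the four Matrix media msgtypes.
func msgTypeForMIME(mimeType string) string {
	switch {
	case strings.HasPrefix(mimeType, "image/"):
		return "m.image"
	case strings.HasPrefix(mimeType, "audio/"):
		return "m.audio"
	case strings.HasPrefix(mimeType, "video/"):
		return "m.video"
	default:
		return "m.file"
	}
}

// msgTypeForPath detects the MIME type from the file extension and falls
// back to application/octet-stream for unknown extensions.
func msgTypeForPath(path string) (msgType, mimeType string) {
	mimeType = mime.TypeByExtension(strings.ToLower(filepath.Ext(path)))
	if mimeType == "" {
		mimeType = "application/octet-stream"
	}
	return msgTypeForMIME(mimeType), mimeType
}

func main() {
	for _, p := range []string{"photo.png", "notes.unknownext"} {
		msgType, mimeType := msgTypeForPath(p)
		fmt.Println(p, msgType, mimeType)
	}
}
```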

New files:
- pkg/channels/matrix.go: MatrixChannel implementation
- docs/MATRIX_SETUP.md: step-by-step setup guide

Config changes:
- pkg/config/config.go: MatrixConfig added to ChannelsConfig
- config/config.example.json: matrix, tts, and whisper example sections

Dependency: maunium.net/go/mautrix v0.26.3 (go.mod / go.sum updated)

Config example:
  "channels": {
    "matrix": {
      "enabled": true,
      "homeserver": "https://matrix.example.com",
      "user_id": "@bot:matrix.example.com",
      "access_token": "syt_...",
      "allow_from": [],
      "join_on_invite": true,
      "require_mention_in_group": true
    }
  }

Depends-on:
  - refactor(voice): introduce Transcriber interface
  - feat(voice/tts): add TTS synthesis and voice parameter on message tool