benjaminbelaga/voice-midi-controller

Voice MIDI Controller

Control hardware synthesizers with your voice. Voice MIDI Controller is an AI-powered musical collaborator that listens, executes commands, proposes ideas, and learns from your reference tracks.

Three Workflows

This project combines three creative workflows:

A — Voice → MIDI Hardware

Speak musical ideas and control your Elektron Syntakt (or any MIDI synth) in real time.

"Add a note on E3 on step 5"     → MIDI note sent
"Make the filter darker"          → CC 74 lowered
"Propose a bassline"              → AI generates, you say "next" or "yes"

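As a rough illustration of the first two commands above, the sketch below turns a note name into a MIDI note number and builds the raw bytes of a Note On and a Control Change message. The function names are hypothetical and not part of this repository. The octave convention (C4 = 60) is an assumption; some hardware labels octaves differently, so "E3" may map to another number on your device.

```python
import re

# Semitone offsets of the natural notes within one octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(name: str) -> int:
    """Convert a name such as 'E3' or 'F#2' to a MIDI note number (assumes C4 = 60)."""
    match = re.fullmatch(r"([A-Ga-g])([#b]?)(-?\d+)", name.strip())
    if not match:
        raise ValueError(f"Unrecognized note name: {name!r}")
    letter, accidental, octave = match.groups()
    semitone = NOTE_OFFSETS[letter.upper()] + {"#": 1, "b": -1, "": 0}[accidental]
    number = 12 * (int(octave) + 1) + semitone
    if not 0 <= number <= 127:
        raise ValueError(f"Note {name!r} is outside the MIDI range 0-127")
    return number


def _check(channel: int, *data: int) -> None:
    """Validate a zero-based channel (0-15) and 7-bit data bytes (0-127)."""
    if not 0 <= channel <= 15:
        raise ValueError("MIDI channel must be 0-15")
    if any(not 0 <= value <= 127 for value in data):
        raise ValueError("MIDI data bytes must be 0-127")


def note_on(note: int, velocity: int = 100, channel: int = 0) -> bytes:
    """Build a Note On message: status 0x9n, note number, velocity."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])


def control_change(controller: int, value: int, channel: int = 0) -> bytes:
    """Build a Control Change message: status 0xBn, controller number, value."""
    _check(channel, controller, value)
    return bytes([0xB0 | channel, controller, value])
```

For example, `note_on(note_to_midi("E3"))` yields the three bytes for the first command, and `control_change(74, 40)` sets CC 74 to a lower value for the second. Actually sending the bytes to a device is left to the MIDI library in use.
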
B — Reference-Based Composition

Feed 5 reference tracks. The AI analyzes them, extracts patterns, and proposes ideas inspired by the pool.

5 tracks → stems separated → key/BPM/chords extracted → AI proposes
"Propose something inspired by the references"
"More like track 2 but slower"
"Next" / "Yes" / "Darker"

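One way to picture the reference pool is as a list of per-track analysis records that gets summarized before the AI proposes anything. The sketch below assumes each record carries a `key` and a `bpm` field. These field names, and both helper functions, are illustrative guesses and not the project's actual schema.

```python
from collections import Counter
from statistics import median


def summarize_pool(tracks):
    """Summarize a reference pool: most common key and median tempo."""
    keys = Counter(track["key"] for track in tracks)
    return {
        "common_key": keys.most_common(1)[0][0],
        "median_bpm": median(track["bpm"] for track in tracks),
    }


def seed_from_reference(tracks, index, tempo_scale=1.0):
    """Build a starting point from one reference (1-based index), optionally
    scaling its tempo, e.g. tempo_scale=0.9 for "more like track 2 but slower"."""
    reference = tracks[index - 1]
    return {"key": reference["key"], "bpm": round(reference["bpm"] * tempo_scale, 1)}
```

A request like "More like track 2 but slower" could then become `seed_from_reference(tracks, 2, tempo_scale=0.9)`, with the resulting key and tempo passed to whatever generates the proposal.
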
C — Audio Analysis & Stem Separation

Separate any track into stems (drums, bass, vocals, synths) and extract musical data (key, BPM, chords, MIDI).

python analyzer/pipeline.py references/ --output output/
# → stems/ + midi/ + analysis.json for each track
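
The per-track output layout described in the comment above can be sketched as a small planning step. This is only an assumption about how `analyzer/pipeline.py` might organize its output, since the pipeline is not implemented yet. The function name, the one-folder-per-track layout, and the list of audio extensions are all hypothetical, and no separation or analysis tool is actually invoked.

```python
from pathlib import Path

# Assumed set of audio file types to pick up; adjust as needed.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".aiff"}


def plan_outputs(sources, output):
    """Map each audio file to the stems/, midi/ and analysis.json paths
    it would produce, using one output folder per track."""
    plans = []
    for source in sorted(Path(s) for s in sources):
        if source.suffix.lower() not in AUDIO_EXTENSIONS:
            continue  # skip non-audio files such as notes or cover art
        track_dir = Path(output) / source.stem
        plans.append({
            "source": source,
            "stems": track_dir / "stems",
            "midi": track_dir / "midi",
            "analysis": track_dir / "analysis.json",
        })
    return plans
```

Calling `plan_outputs(Path("references").iterdir(), "output")` gives one plan per reference track, which a later step could hand to the separation, analysis, and audio-to-MIDI tools.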

How They Connect

C (analyze refs) → feeds → B (creative proposals) → outputs via → A (MIDI to hardware)

Each workflow works independently. Use just A for live control, just C for analysis, or the full pipeline.

Architecture

| Component | Tech | Runtime |
|---|---|---|
| MIDI Engine | Node.js + easymidi (MCP server) | Real-time |
| Creative Engine | Claude AI (MCP tools) | Interactive |
| Audio Analyzer | Python: Demucs + Essentia + Basic Pitch | Offline/batch |
| Voice Input | STT → Claude → MCP | Real-time |

See docs/architecture-overview.md for the full breakdown.

Primary Hardware: Elektron Syntakt

The first target is the Syntakt, a 12-track digital/analog groovebox, and a full CC mapping for it is included. Other MIDI-capable hardware should work once you update the config file.

See examples/syntakt-config.json for the complete mapping.
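
To show how a config-driven mapping might work, the sketch below resolves a named parameter to its CC number and applies a relative change such as "darker". The config shape here (a 1-based `channel` plus a `cc` name-to-number map) is a hypothetical stand-in, not the schema of `examples/syntakt-config.json`. Only the CC 74 filter mapping comes from this README. The starting value of 64 is also an assumption, because the synth's real state is unknown until it is read or set.

```python
import json

# Hypothetical config shape, for illustration only.
EXAMPLE_CONFIG = json.loads('{"channel": 1, "cc": {"filter_cutoff": 74}}')


class CCState:
    """Track assumed CC values and build Control Change messages from names."""

    def __init__(self, config, initial=64):
        self.channel = config["channel"] - 1  # config is 1-based, MIDI bytes are 0-based
        self.cc = dict(config["cc"])
        self.values = {name: initial for name in self.cc}

    def nudge(self, name, delta):
        """Change a parameter by delta, clamped to 0-127, and return the MIDI bytes."""
        value = max(0, min(127, self.values[name] + delta))
        self.values[name] = value
        return bytes([0xB0 | self.channel, self.cc[name], value])
```

With this shape, "Make the filter darker" could translate to `state.nudge("filter_cutoff", -20)`, and targeting another synth would mean swapping the config rather than the code.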

Key Concept: The "Next" Paradigm

Claude doesn't just execute commands — it proposes musical ideas. The musician navigates:

| Voice | Effect |
|---|---|
| "Propose a bassline" | AI generates, sends to synth |
| "Next" | New variation |
| "Yes" / "Keep" | Lock it in |
| "More like that" | Variations in same direction |
| "Darker" / "Simpler" | Guide the direction |
| "Something completely different" | Reset approach |

See examples/collaborative-workflow.md for a full session example.
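
The navigation commands above behave like a small state machine, sketched below under stated assumptions. `ProposalSession` and its `generate` callback are hypothetical names. In the real system the generation step would be Claude calling MCP tools, which is not modeled here.

```python
class ProposalSession:
    """Minimal sketch of the "Next" paradigm: propose, steer, keep."""

    STEERING = {"darker", "simpler"}

    def __init__(self, generate):
        self.generate = generate  # callable: list of hints -> proposal
        self.hints = []           # accumulated direction, e.g. ["darker"]
        self.current = None       # proposal currently playing
        self.kept = []            # proposals the musician locked in

    def handle(self, phrase):
        command = phrase.strip().lower()
        if command in ("yes", "keep"):
            if self.current is not None:
                self.kept.append(self.current)
            return self.current
        if command == "something completely different":
            self.hints = []       # reset the approach
        elif command in self.STEERING:
            self.hints.append(command)
        elif not (command in ("next", "more like that") or command.startswith("propose")):
            raise ValueError(f"Unknown command: {phrase!r}")
        # "next", "more like that", and "propose ..." regenerate with the current hints.
        self.current = self.generate(list(self.hints))
        return self.current
```

Keeping the hints as a list means "Darker" followed by "Next" keeps exploring in the darker direction until the musician resets with "Something completely different".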

Documentation

| Doc | Contents |
|---|---|
| Architecture Overview | Three workflows, tool choices, how they connect |
| Research | 5 approaches evaluated, STT options, existing tools |
| Design | Syntakt MIDI mapping, interaction modes, MCP tools |
| Speech-to-Text | STT engines compared |
| Approaches | Detailed evaluation of each technical approach |

Tool Stack

Real-time (MCP Server)

| Tool | Role |
|---|---|
| easymidi | MIDI output to hardware |
| MCP SDK | Claude ↔ MIDI bridge |

Audio Analysis (Python CLI)

| Tool | Role |
|---|---|
| Demucs v4 | Stem separation (best OSS) |
| Essentia | Key, BPM, chord detection |
| Basic Pitch | Audio → MIDI conversion |

Alternatives Evaluated

| Tool | Verdict |
|---|---|
| ableton-mcp | Good if you use Ableton (not required) |
| strudel-mcp | Good for patterns, but browser-dependent |
| daw-mcp | Supports Bitwig + Ableton |
| Spleeter | Outdated, surpassed by Demucs |
| LALAL.ai / AudioShake | Commercial, unnecessary for this use |

Status

Phase: Research & Design complete. Implementation next.

Completed:

  • Research: 5 approaches evaluated
  • Design: Architecture decided, MCP tools defined
  • Hardware: Syntakt MIDI CC mapping complete
  • Tool stack: Demucs + Essentia + Basic Pitch + easymidi
  • Examples: Patterns, CC control, notes, collaborative workflow

Planned:

  • Phase 1: Build MCP server + sequencer
  • Phase 2: Musical intelligence + proposal engine
  • Phase 3: Voice integration (STT)
  • Phase 4: Reference analysis pipeline

Quick Start (Coming Soon)

# Install MCP server (the subshell keeps you in the repo root)
(cd mcp-server && npm install)

# Install analyzer dependencies
(cd analyzer && pip install demucs essentia basic-pitch)

# Configure Claude Code
# Add MCP server to ~/.claude.json

# Start creating
claude "Connect to Syntakt and play a C minor arpeggio"

License

MIT
