Production-grade, renderer-agnostic streaming lip-sync engine for browser-based 2D animation. Real-time viseme detection from streaming audio via AudioWorklet + Web Audio API.
Zero dependencies. ~15KB minified. Works with any 2D rendering approach.
Pixel art cowgirl talking via the OpenAI Realtime API with real-time lip sync — try the demo
Existing lip-sync solutions are either C++ desktop tools (Rhubarb), tied to 3D avatars (TalkingHead), or locked behind paid cloud APIs (Azure, ElevenLabs viseme endpoints). This library fills the gap: a lightweight, browser-native engine that takes streaming audio in and emits viseme events out — bring your own renderer.
```sh
npm install @beer-digital/lipsync-engine
```

```js
import { LipSyncEngine, SVGMouthRenderer, base64ToInt16 } from '@beer-digital/lipsync-engine';

// 1. Create engine
const engine = new LipSyncEngine({
  sampleRate: 24000,
  workletUrl: '/streaming-processor.js', // Copy from dist/worklet/
});

// 2. Create SVG mouth renderer (no sprite sheet needed)
const mouth = new SVGMouthRenderer(document.getElementById('avatar-mouth'), {
  width: 120,
  height: 80,
  lipColor: '#cc4444',
  showTeeth: true,
});

// 3. Initialize (must happen after a user gesture)
await engine.init();

// 4. Wire viseme events to the renderer
engine.on('viseme', (frame) => mouth.render(frame));
engine.startAnalysis();

// 5. Feed audio from the OpenAI Realtime API
const ws = new WebSocket('wss://api.openai.com/v1/realtime?model=gpt-realtime', ...);
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'response.output_audio.delta') {
    engine.feedAudio(base64ToInt16(data.delta));
  }
};
```

Microphone input, analyzed without playback:

```js
const engine = new LipSyncEngine();
await engine.init();

const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
engine.attachStream(stream); // Analyzes without playing back (no feedback)

engine.on('viseme', (frame) => {
  console.log(frame.viseme, frame.intensity, frame.shape);
});
engine.startAnalysis();
```

Audio and video elements work too:

```js
const engine = new LipSyncEngine();
await engine.init();

const audio = document.querySelector('audio');
engine.attachElement(audio);
engine.startAnalysis();

engine.on('viseme', (frame) => {
  myCharacter.setMouth(frame.simpleViseme); // 'A' through 'F'
});

audio.play();
```

```
┌────────────────────────────────────────────────────────────────┐
│                         LipSyncEngine                          │
│                                                                │
│  ┌─────────────┐   ┌──────────────┐   ┌───────────────────┐    │
│  │ Audio Input │──▶│ AudioWorklet │──▶│ AnalyserNode      │    │
│  │ - PCM feed  │   │ Ring buffer  │   │ FFT analysis      │    │
│  │ - MediaStr  │   │ Gapless play │   │ Band energies     │    │
│  │ - Element   │   │ Position rpt │   │                   │    │
│  └─────────────┘   └──────────────┘   └─────────┬─────────┘    │
│                                                 │              │
│                                       ┌─────────▼─────────┐    │
│                                       │ FrequencyAnalyzer │    │
│                                       │ Viseme detection  │    │
│                                       │ Smoothing/holdoff │    │
│                                       │ Shape interp.     │    │
│                                       └─────────┬─────────┘    │
│                                                 │              │
│                                   emit('viseme', frame)        │
└─────────────────────────────────────────────────┬──────────────┘
                                                  │
                        ┌─────────────────────────┼───────────────────┐
                        │                         │                   │
                ┌───────▼───────┐         ┌───────▼───────┐   ┌───────▼───────┐
                │ SVGMouth      │         │ Canvas        │   │ CSSClass      │
                │ Renderer      │         │ Renderer      │   │ Renderer      │
                │ (Procedural)  │         │ (Sprites)     │   │ (CSS classes) │
                └───────────────┘         └───────────────┘   └───────────────┘
```
| Key | Phonemes | Description | Mouth Shape |
|---|---|---|---|
| sil | (silence) | Mouth closed | open: 0, width: 0.5 |
| PP | P, B, M | Lips pressed together | open: 0, width: 0.4 |
| FF | F, V | Lower lip to upper teeth | open: 0.05, width: 0.55 |
| TH | TH | Tongue between teeth | open: 0.1, width: 0.5 |
| DD | D, T, N, L | Tongue to upper palate | open: 0.2, width: 0.5 |
| kk | K, G | Back of tongue raised | open: 0.25, width: 0.45 |
| CH | CH, SH, J | Lips pursed forward | open: 0.15, round: 0.6 |
| SS | S, Z | Teeth close, slight smile | open: 0.05, width: 0.6 |
| nn | N, NG | Mouth slightly open | open: 0.15, width: 0.5 |
| RR | R | Lips slightly rounded | open: 0.2, round: 0.4 |
| aa | AA, AH | Wide open mouth | open: 0.9, width: 0.6 |
| E | EH, AE | Mouth open, slight smile | open: 0.5, width: 0.65 |
| I | IH, IY | Small opening, smile | open: 0.25, width: 0.7 |
| O | OH, AO | Rounded, medium open | open: 0.6, round: 0.8 |
| U | UW, OW | Small rounded opening | open: 0.2, round: 0.9 |
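The shape parameters above lend themselves to linear blending during viseme transitions; the library exports an `interpolateShapes` helper for exactly this. The sketch below is an illustrative stand-in — `blendShapes` is a hypothetical name, and the library's actual signature may differ:

```javascript
// Linear blend between two viseme mouth shapes, in the spirit of the
// exported interpolateShapes helper (illustrative; exact API may differ).
function blendShapes(from, to, t) {
  const lerp = (a = 0, b = 0) => a + (b - a) * t;
  return {
    open: lerp(from.open, to.open),
    width: lerp(from.width, to.width),
    round: lerp(from.round, to.round),
  };
}

// Halfway through a sil -> aa transition:
const sil = { open: 0, width: 0.5, round: 0 };
const aa = { open: 0.9, width: 0.6, round: 0 };
console.log(blendShapes(sil, aa, 0.5)); // roughly { open: 0.45, width: 0.55, round: 0 }
```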
| Key | Maps to | Use for |
|---|---|---|
| A | sil | Rest / closed |
| B | PP, nn | M, B, P sounds |
| C | E, I, SS | EE, soft sounds |
| D | aa, DD | AH, wide open |
| E | O, RR, CH | OH, round sounds |
| F | FF, TH, U | OO, F/V, tight |
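The engine already computes `frame.simpleViseme` for you; for illustration, the mapping table above boils down to a plain lookup. `SIMPLE_MAP` and `toSimpleViseme` are hypothetical names, not library exports, and keys absent from the table (such as `kk`) fall back to the rest pose here:

```javascript
// Lookup reproducing the simple-viseme table above (illustrative only;
// the engine provides frame.simpleViseme directly).
const SIMPLE_MAP = {
  sil: 'A',
  PP: 'B', nn: 'B',
  E: 'C', I: 'C', SS: 'C',
  aa: 'D', DD: 'D',
  O: 'E', RR: 'E', CH: 'E',
  FF: 'F', TH: 'F', U: 'F',
};

function toSimpleViseme(viseme) {
  return SIMPLE_MAP[viseme] ?? 'A'; // unmapped keys (e.g. 'kk') rest at 'A'
}

console.log(toSimpleViseme('aa')); // 'D'
```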
No sprite sheet needed — generates an animated SVG mouth driven by the {open, width, round} shape parameters.
```js
import { SVGMouthRenderer } from '@beer-digital/lipsync-engine';

const mouth = new SVGMouthRenderer(container, {
  width: 120,
  height: 80,
  lipColor: '#cc4444',
  innerColor: '#3a1111',
  teethColor: '#fff',
  showTeeth: true,
  lipThickness: 3,
});

engine.on('viseme', (frame) => mouth.render(frame));
```

Draw mouth frames from a sprite sheet image.
```js
import { CanvasRenderer } from '@beer-digital/lipsync-engine';

const renderer = new CanvasRenderer(canvas, {
  spriteSheet: 'mouth-sprites.png',
  frameWidth: 128,
  frameHeight: 128,
  visemeMap: { sil: 0, PP: 1, FF: 2, aa: 3, E: 4, O: 5 },
  columns: 4,
});

engine.on('viseme', (frame) => renderer.render(frame));
```

Sets data attributes and CSS classes on any element. Great for CSS animations, Lottie, or framework components.
```js
import { CSSClassRenderer } from '@beer-digital/lipsync-engine';

const renderer = new CSSClassRenderer(avatarElement, {
  attribute: 'data-viseme',
  classPrefix: 'mouth-',
  useSimpleVisemes: true, // Uses A–F
  setIntensity: true,     // Sets CSS custom properties
});

// In CSS:
// .mouth-A { background-position: 0 0; }
// .mouth-D { background-position: -128px 0; }
// Transform with: transform: scaleY(var(--lip-open));
```

Just listen for viseme events and render however you want:
```js
engine.on('viseme', (frame) => {
  // frame.viseme       → 'aa', 'PP', 'sil', etc.
  // frame.simpleViseme → 'A' through 'F'
  // frame.intensity    → 0..1 speech intensity
  // frame.shape.open   → 0..1 mouth openness
  // frame.shape.width  → 0..1 mouth width
  // frame.shape.round  → 0..1 lip roundness
  // frame.confidence   → 0..1 classification confidence
  // frame.bands        → { sub, low, mid, high, veryHigh }
  // frame.transition   → { from, to, progress }
  // frame.timeMs       → playback position in ms

  myLottieAnimation.goToFrame(visemeToFrame[frame.viseme]);
  // or
  myPixiSprite.texture = textures[frame.simpleViseme];
  // or
  myThreeJSMesh.morphTargetInfluences[0] = frame.shape.open;
});
```

All options with their defaults:

```js
new LipSyncEngine({
  sampleRate: 24000,        // Expected input sample rate
  fftSize: 256,             // FFT window size (power of 2)
  analyserSmoothing: 0.5,   // AnalyserNode smoothingTimeConstant
  silenceThreshold: 0.015,  // RMS below this = silence
  smoothingFactor: 0.35,    // Viseme transition smoothing (0–1)
  holdFrames: 2,            // Min frames before a viseme switch
  volume: 1.0,              // Playback volume
  startThresholdMs: 50,     // Buffered ms before auto-play
  bufferSeconds: 5,         // Ring buffer capacity
  analysisMode: 'raf',      // 'raf' or 'interval'
  analysisIntervalMs: 16,   // For interval mode
  workletUrl: null,         // Custom worklet URL
  disablePlayback: false,   // Analyze only, no audio output
});
```

| Method | Description |
|---|---|
| `init(ctx?)` | Initialize the audio pipeline (async; needs a user gesture) |
| `feedAudio(samples, rate?)` | Feed `Int16Array`, `Float32Array`, or `ArrayBuffer` PCM |
| `attachStream(stream)` | Attach a `MediaStream` (mic, WebRTC) |
| `attachElement(el)` | Attach an audio/video element |
| `startAnalysis()` | Start the viseme detection loop |
| `stopAnalysis()` | Stop analysis |
| `setVolume(0–1)` | Set playback volume |
| `clearBuffer()` | Clear the audio buffer |
| `play()` / `pause()` | Control playback |
| `reset()` | Reset all state |
| `getState()` | Get a snapshot of current state |
| `destroy()` | Release all resources |
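`feedAudio` accepts Float32 as well as Int16 PCM. For reference, a Float32 → Int16 conversion along the lines of the exported `float32ToInt16` helper could look like this — an illustrative sketch (`floatToInt16` is a hypothetical name, and the library's exact clamping/rounding may differ):

```javascript
// Convert [-1, 1] Float32 samples to signed 16-bit PCM, clamping out-of-range
// values. Mirrors the intent of the exported float32ToInt16 helper.
function floatToInt16(samples) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    out[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return out;
}

console.log(floatToInt16(new Float32Array([0, 0.5, -1, 2])));
// Int16Array [0, 16384, -32768, 32767]
```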
| Event | Data | Description |
|---|---|---|
| `viseme` | `VisemeFrame` | Emitted every analysis frame |
| `position` | `{timeMs, bufferLevel, bufferMs, isPlaying}` | Playback position |
| `playbackStarted` | — | Audio playback began |
| `playbackEnded` | — | Fade-out complete |
| `bufferUnderrun` | `{timeMs}` | Buffer empty |
| `initialized` | — | Engine ready |
| `destroyed` | — | Engine torn down |
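The `on(...)` surface above follows a conventional emitter contract (the source tree ships its own `src/utils/EventEmitter.js`). A minimal sketch of that pattern — not the library's actual implementation:

```javascript
// Tiny emitter illustrating the on/emit contract used by the engine's
// events; on() returns an unsubscribe function in this sketch.
class TinyEmitter {
  #listeners = new Map();

  on(event, fn) {
    if (!this.#listeners.has(event)) this.#listeners.set(event, new Set());
    this.#listeners.get(event).add(fn);
    return () => this.#listeners.get(event)?.delete(fn); // unsubscribe
  }

  emit(event, payload) {
    for (const fn of this.#listeners.get(event) ?? []) fn(payload);
  }
}

const bus = new TinyEmitter();
const frames = [];
const off = bus.on('viseme', (f) => frames.push(f.viseme));
bus.emit('viseme', { viseme: 'aa' });
off(); // later events are ignored
bus.emit('viseme', { viseme: 'sil' });
console.log(frames); // ['aa']
```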
```js
import {
  base64ToInt16,     // Decode base64 PCM (for TTS WebSocket APIs)
  int16ToBase64,     // Encode PCM to base64
  int16ToFloat32,    // Convert Int16 → Float32
  float32ToInt16,    // Convert Float32 → Int16
  calculateRMS,      // Root mean square amplitude
  resample,          // Resample between sample rates
  interpolateShapes, // Blend between viseme mouth shapes
} from '@beer-digital/lipsync-engine';
```

ElevenLabs streaming TTS:

```js
const ws = new WebSocket('wss://api.elevenlabs.io/v1/text-to-speech/...');
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.audio) {
    engine.feedAudio(base64ToInt16(data.audio));
  }
};
```

Web Speech API:

```js
const utterance = new SpeechSynthesisUtterance('Hello world');
const dest = engine.audioContext.createMediaStreamDestination();
// Note: the Web Speech API doesn't expose raw audio in most browsers.
// Use a TTS API with audio output for best results.
```

WebRTC:

```js
peerConnection.ontrack = (event) => {
  engine.attachStream(event.streams[0]);
  engine.startAnalysis();
};
```

Test with a microphone, audio files, or a synthetic waveform:
```sh
npm install
npm run dev
# Opens http://localhost:3000
```

Full voice conversation with real-time lip sync on a pixel art avatar:

```sh
npm install
OPENAI_API_KEY=sk-... npm run demo:realtime
# Opens http://localhost:3000/demo/realtime.html
```

Speak into your mic — the AI responds with voice and the avatar's mouth animates in real time. A lightweight WebSocket proxy (server.js) keeps your API key server-side.

```sh
npm install
npm run dev    # Interactive demo
npm run build  # Build for distribution
npm run test   # Run tests
npm run lint   # Lint source
```

```
lipsync-engine/
├── src/
│   ├── index.js                    # Main entry + exports
│   ├── types.d.ts                  # TypeScript definitions
│   ├── core/
│   │   ├── LipSyncEngine.js        # Main orchestrator
│   │   └── visemes.js              # Viseme constants + mappings
│   ├── analyzers/
│   │   └── FrequencyAnalyzer.js    # Real-time viseme detection
│   ├── renderers/
│   │   ├── SVGMouthRenderer.js     # Procedural SVG mouth
│   │   ├── CanvasRenderer.js       # Sprite sheet renderer
│   │   └── CSSClassRenderer.js     # CSS class toggler
│   ├── utils/
│   │   ├── EventEmitter.js         # Typed event system
│   │   ├── RingBuffer.js           # Lock-free ring buffer
│   │   └── audio-utils.js          # PCM conversion + DSP
│   └── worklets/
│       └── streaming-processor.js  # AudioWorklet (standalone)
├── demo/
│   ├── index.html                  # Interactive demo (mic/file/synth)
│   ├── realtime.html               # OpenAI Realtime voice demo
│   └── avatar.png                  # Pixel art avatar
├── server.js                       # WebSocket proxy for Realtime API
├── package.json
├── vite.config.js
└── README.md
```
| Browser | Minimum Version | Notes |
|---|---|---|
| Chrome | 66+ | Full support |
| Firefox | 76+ | Full support |
| Safari | 14.1+ | AudioWorklet support |
| Edge | 79+ | Chromium-based |
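On older browsers it can be worth feature-detecting `AudioWorklet` before calling `engine.init()`. The helper below is hypothetical, not part of the library's API; it checks the same capabilities the support table above reflects:

```javascript
// Capability check before initializing the engine (illustrative helper).
// AudioWorklet is available from Chrome 66, Firefox 76, Safari 14.1, Edge 79.
function supportsLipSync() {
  return typeof AudioContext === 'function' &&
         typeof AudioWorkletNode === 'function';
}

if (!supportsLipSync()) {
  console.warn('AudioWorklet unavailable; lip sync disabled.');
}
```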
The streaming-processor.js worklet file must be served from the same origin as your page (or with appropriate CORS headers). Copy it to your public assets:
```sh
cp node_modules/@beer-digital/lipsync-engine/dist/worklet/streaming-processor.js public/
```

Then reference it:

```js
const engine = new LipSyncEngine({
  workletUrl: '/streaming-processor.js',
});
```

Browsers require a user gesture before creating an AudioContext. Always call engine.init() inside a click/touch handler.
For SharedArrayBuffer support (not required, but improves performance), serve these headers:

```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```
MIT — Beer Digital LLC
