Mentra-Community/Karaoke

Real-time Karaoke System for Smart Glasses

Overview

This system provides real-time song recognition and synchronized lyric display for smart glasses using MentraOS. Users wearing smart glasses can see song information and synchronized lyrics for any music playing around them, creating an immersive karaoke experience.

🚀 Current Status

Working! The system successfully recognizes songs and displays lyrics on smart glasses. See PROJECT_STATUS.md for detailed implementation status, known issues, and next steps.

Core Features

  1. Real-time Audio Recognition - Continuously identify songs from ambient audio
  2. Synchronized Lyrics Display - Show timed lyrics chunked for smart glasses
  3. Position Tracking - Track current position in detected songs with error correction
  4. Song History - Maintain a log of all identified songs
  5. Fallback Display - Show song info + timestamp when lyrics are unavailable

Architecture

Main Application Class

class KaraokeApp extends AppServer {
  private userSessions = new Map<string, UserSession>();
  
  // Handles new user connections, creates UserSession instances
  protected async onSession(session: AppSession, sessionId: string, userId: string): Promise<void>
  
  // Cleanup when users disconnect
  protected async onStop(sessionId: string, userId: string, reason: string): Promise<void>
}

UserSession Class

Each connected user gets an isolated UserSession containing all managers:

class UserSession {
  // Core identification
  userId: string;
  sessionId: string;
  session: AppSession;  // MentraOS session for audio/display
  
  // State management
  currentSong?: CurrentSong;
  appState: AppState;
  
  // Managers (each handles specific functionality)
  recognitionManager: RecognitionManager;
  lyricsManager: LyricsManager;
  positionTracker: PositionTracker;
  displayManager: DisplayManager;
  historyManager: HistoryManager;
  
  constructor(userId: string, sessionId: string, session: AppSession)
  startListening(): void
  cleanup(): void
}

Core Types and Interfaces

enum AppState {
  LISTENING,                    // Waiting for song detection
  SONG_DETECTED_NO_LYRICS,     // Song found but no LRC available
  SONG_DETECTED_WITH_LYRICS,   // Song found with synchronized lyrics
  PROCESSING                   // Fetching lyrics/processing
}

interface CurrentSong {
  title: string;
  artist: string;
  album?: string;
  duration: number;           // Total song length in seconds
  detectedAt: number;         // Timestamp when first detected
  hasLyrics: boolean;
  lrcData?: LRCLine[];
  confidence: number;         // Recognition confidence (0-1)
}

interface LRCLine {
  timestamp: number;          // Time in seconds
  text: string;              // Lyric text
  endTime?: number;          // Optional end time for line
}

interface RecognitionResult {
  title: string;
  artist: string;
  album?: string;
  duration?: number;
  offsetSeconds?: number;     // Where in song detection occurred
  confidence: number;
  error?: string;
}

interface LyricsChunk {
  lines: string[];           // Max 2 lines
  startTime: number;         // When chunk should start displaying
  endTime: number;           // When chunk should end
  wordsPerLine: number[];    // Word count per line, for validation
}

interface RecognitionPoint {
  timestamp: number;         // When recognition occurred
  detectedOffset: number;    // Where in song we detected
  confidence: number;
  apiLatency: number;        // How long API call took
}

interface SongHistoryEntry {
  title: string;
  artist: string;
  album?: string;
  identifiedAt: number;      // Timestamp
  duration?: number;
  confidence: number;
}

Manager Responsibilities

RecognitionManager

Handles continuous audio processing and ACRCloud integration:

class RecognitionManager {
  private audioBuffer: Buffer[];
  private isRecording: boolean;
  private lastRecognitionTime: number;
  private readonly RECOGNITION_INTERVAL = 12000; // 12 seconds between checks
  private readonly AUDIO_BUFFER_DURATION = 8000; // 8 seconds of audio
  
  startListening(): void                          // Begin audio capture
  processAudioChunk(audioData: Buffer): void      // Handle incoming audio
  performRecognition(): Promise<RecognitionResult | null>  // Call ACRCloud
  shouldRecognize(): boolean                      // Check if time for next recognition
  reset(): void                                   // Clear buffers and state
}

LyricsManager

Fetches and processes LRC files, chunks lyrics for display:

class LyricsManager {
  private cachedLRC = new Map<string, LRCLine[]>();
  
  fetchLyrics(song: CurrentSong): Promise<LRCLine[] | null>    // Get LRC from sources
  chunkLyrics(lrcData: LRCLine[]): LyricsChunk[]              // Split into display chunks
  getCurrentChunk(position: number): LyricsChunk | null        // Get chunk for current time
  private parseLRC(lrcContent: string): LRCLine[]             // Parse LRC format
  private smartChunk(lines: LRCLine[]): LyricsChunk[]         // Apply 8-word/60-char limits
}

Chunking Rules:

  • Max 8 words per line
  • Max 60 characters per line
  • Max 2 lines per chunk
  • Never break a word in the middle
  • If a single word exceeds 60 characters, show it on its own line
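A minimal sketch of how these rules could be applied. The helper names wrapLine/smartChunk mirror the manager's methods but are illustrative, and the 5-second fallback duration for the final line is an assumption:

```typescript
interface LRCLine { timestamp: number; text: string; endTime?: number }
interface LyricsChunk { lines: string[]; startTime: number; endTime: number; wordsPerLine: number[] }

const MAX_WORDS = 8;
const MAX_CHARS = 60;

// Wrap one lyric line into display lines that respect the word/char limits,
// never splitting a word; an oversized single word still lands on its own line.
function wrapLine(text: string): string[] {
  const out: string[] = [];
  let current: string[] = [];
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = [...current, word].join(" ");
    if (current.length > 0 && (current.length >= MAX_WORDS || candidate.length > MAX_CHARS)) {
      out.push(current.join(" "));
      current = [];
    }
    current.push(word);
  }
  if (current.length) out.push(current.join(" "));
  return out;
}

// Group wrapped lines into 2-line chunks, timed from one LRC line to the next.
function smartChunk(lrc: LRCLine[]): LyricsChunk[] {
  const chunks: LyricsChunk[] = [];
  lrc.forEach((line, i) => {
    const wrapped = wrapLine(line.text);
    const start = line.timestamp;
    const end = line.endTime ?? lrc[i + 1]?.timestamp ?? start + 5; // assumed 5 s for the last line
    for (let j = 0; j < wrapped.length; j += 2) {
      const lines = wrapped.slice(j, j + 2);
      chunks.push({ lines, startTime: start, endTime: end, wordsPerLine: lines.map(l => l.split(" ").length) });
    }
  });
  return chunks;
}
```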

PositionTracker

Tracks song position with continuous calibration:

class PositionTracker {
  private songStartTime?: number;
  private detectedOffset?: number;
  private recognitionHistory: RecognitionPoint[];
  private estimatedDrift: number;
  
  startSong(detectedAt: number, songOffset: number, apiLatency: number): void
  getCurrentPosition(): number                    // Current seconds into song
  validatePosition(newOffset: number, detectionTime: number): boolean
  recalibrate(newOffset: number, detectionTime: number): void
  getConfidence(): number                         // How confident we are in timing
  reset(): void
}

Position Calculation:

current_position = (now - song_start_time) + detected_offset - estimated_drift
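Expressed as a small helper, assuming wall-clock times in milliseconds (as from Date.now()) and offsets/drift in seconds:

```typescript
// Direct translation of the position formula above.
function currentPosition(
  nowMs: number,
  songStartTimeMs: number,   // wall-clock time when the song was first detected
  detectedOffsetSec: number, // where in the song that detection occurred
  estimatedDriftSec: number  // accumulated drift estimated from re-recognitions
): number {
  const elapsedSec = (nowMs - songStartTimeMs) / 1000;
  return elapsedSec + detectedOffsetSec - estimatedDriftSec;
}
```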

DisplayManager

Manages what appears on smart glasses:

class DisplayManager {
  private currentDisplay: string;
  private lastUpdateTime: number;
  
  showListening(): void                           // "♪ Listening..." state
  showSongInfo(song: CurrentSong, position: number): void  // Song title/artist + timestamp
  showLyrics(chunk: LyricsChunk): void           // Chunked lyrics display
  showProcessing(): void                          // "Processing..." state
  private formatSongInfo(song: CurrentSong, position: number): string
  private formatLyrics(chunk: LyricsChunk): string
}

Display Formats:

  • Listening: ♪ Listening...
  • Song Info: ♪ Bohemian Rhapsody\nQueen\n2:34 / 5:55
  • Lyrics: Is this the real life?\nIs this just fantasy?
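The song-info format above could be produced with helpers along these lines (the names formatTime/formatSongInfo are illustrative, not taken from the codebase):

```typescript
// Render seconds as m:ss, e.g. 154 -> "2:34".
function formatTime(totalSeconds: number): string {
  const m = Math.floor(totalSeconds / 60);
  const s = Math.floor(totalSeconds % 60);
  return `${m}:${s.toString().padStart(2, "0")}`;
}

// Build the three-line song-info display: title, artist, position/duration.
function formatSongInfo(title: string, artist: string, positionSec: number, durationSec: number): string {
  return `♪ ${title}\n${artist}\n${formatTime(positionSec)} / ${formatTime(durationSec)}`;
}
```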

HistoryManager

Tracks identified songs over time:

class HistoryManager {
  private history: SongHistoryEntry[];
  private readonly MAX_HISTORY_SIZE = 100;
  
  addSong(song: CurrentSong): void               // Add to history
  getRecentSongs(limit?: number): SongHistoryEntry[]  // Get recent entries
  isDuplicate(song: CurrentSong): boolean        // Check if song recently added
  clearHistory(): void                           // Reset history
  exportHistory(): string                        // JSON export of history
}
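A sketch of the duplicate check, assuming an exact title/artist match within the 30-second window from the configuration:

```typescript
interface SongHistoryEntry { title: string; artist: string; identifiedAt: number }

const DUPLICATE_WINDOW_MS = 30_000; // matches duplicateWindowMs in KaraokeConfig

// A song counts as a duplicate if the same title/artist was logged within the window.
function isDuplicate(history: SongHistoryEntry[], title: string, artist: string, nowMs: number): boolean {
  return history.some(e =>
    e.title === title &&
    e.artist === artist &&
    nowMs - e.identifiedAt < DUPLICATE_WINDOW_MS
  );
}
```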

External Services

ACRCloudService

Wrapper for ACRCloud API calls:

class ACRCloudService {
  private host: string;
  private accessKey: string;
  private secretKey: string;
  
  recognize(audioBuffer: Buffer): Promise<RecognitionResult>
  private buildSignature(timestamp: number): string
  private createFormData(audioBuffer: Buffer, signature: string, timestamp: number): FormData
}
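buildSignature presumably follows ACRCloud's documented HMAC-SHA1 request signing; a sketch under that assumption, with the string-to-sign fields in the order the identify API describes (method, URI, access key, data type, signature version, timestamp):

```typescript
import { createHmac } from "crypto";

// HMAC-SHA1 over the newline-joined fields, base64-encoded, per ACRCloud's docs.
function buildSignature(accessKey: string, secretKey: string, timestamp: number): string {
  const stringToSign = ["POST", "/v1/identify", accessKey, "audio", "1", String(timestamp)].join("\n");
  return createHmac("sha1", secretKey).update(stringToSign).digest("base64");
}
```

The same signature and timestamp then go into the multipart form alongside the audio sample.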

LRCService

Fetches LRC files from multiple sources:

class LRCService {
  private sources: LRCSource[];
  
  fetchLRC(title: string, artist: string): Promise<string | null>
  private trySource(source: LRCSource, title: string, artist: string): Promise<string | null>
}

interface LRCSource {
  name: string;
  url: string;
  searchEndpoint: string;
  downloadEndpoint: string;
}

Audio Processing Flow

  1. Audio Capture: MentraOS streams audio chunks to RecognitionManager
  2. Buffer Management: Maintain 8-second rolling buffer of audio
  3. Recognition Timing: Perform recognition every 12 seconds
  4. Result Processing: Parse ACRCloud response, extract song metadata
  5. State Update: Update UserSession state, trigger appropriate actions
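Steps 1–2 above can be sketched as a rolling byte buffer. Only the 8-second window comes from this document; the 16 kHz 16-bit mono PCM format is an assumption:

```typescript
const SAMPLE_RATE = 16_000;    // assumed, not specified in this README
const BYTES_PER_SAMPLE = 2;    // assumed 16-bit mono PCM
const BUFFER_MS = 8_000;       // 8-second window from the architecture above
const MAX_BYTES = (SAMPLE_RATE * BYTES_PER_SAMPLE * BUFFER_MS) / 1000;

class RollingAudioBuffer {
  private chunks: Buffer[] = [];
  private totalBytes = 0;

  // Append an incoming chunk, then drop the oldest chunks once the
  // window exceeds 8 seconds of audio.
  push(chunk: Buffer): void {
    this.chunks.push(chunk);
    this.totalBytes += chunk.length;
    while (this.totalBytes > MAX_BYTES && this.chunks.length > 1) {
      this.totalBytes -= this.chunks.shift()!.length;
    }
  }

  // Contiguous copy of the current window, ready to send to recognition.
  snapshot(): Buffer {
    return Buffer.concat(this.chunks);
  }
}
```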

Lyrics Processing Flow

  1. LRC Fetching: Try multiple sources to find synchronized lyrics
  2. Parsing: Convert LRC format to internal LRCLine objects
  3. Chunking: Split lyrics into chunks sized for the smart glasses display
  4. Synchronization: Match chunks to current song position
  5. Display: Send formatted chunks to smart glasses
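Step 2 (parsing) for the common "[mm:ss.xx] text" LRC form might look like the sketch below; metadata tags such as [ar:] and multi-timestamp lines are skipped for brevity:

```typescript
interface LRCLine { timestamp: number; text: string }

// Parse LRC content into timestamped lines, sorted by time.
function parseLRC(lrcContent: string): LRCLine[] {
  const lines: LRCLine[] = [];
  const pattern = /^\[(\d{1,2}):(\d{2})(?:\.(\d{1,3}))?\](.*)$/;
  for (const raw of lrcContent.split(/\r?\n/)) {
    const m = raw.match(pattern);
    if (!m) continue; // skip metadata tags ([ar:], [ti:], ...) and blank lines
    const [, mm, ss, frac, text] = m;
    const fracSec = frac ? Number(`0.${frac}`) : 0;
    lines.push({ timestamp: Number(mm) * 60 + Number(ss) + fracSec, text: text.trim() });
  }
  return lines.sort((a, b) => a.timestamp - b.timestamp);
}
```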

Position Tracking Flow

  1. Initial Detection: Record when song first detected + estimated offset
  2. Continuous Validation: Re-recognize every 12 seconds
  3. Drift Calculation: Compare predicted vs actual position
  4. Recalibration: Adjust timing if drift exceeds threshold (±3 seconds)
  5. Confidence Scoring: Weight recent recognitions more heavily

Error Handling

Recognition Errors

  • No match found: Continue listening, show "♪ Listening..."
  • Low confidence: Require multiple confirmations before state change
  • API failure: Retry with exponential backoff
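The retry-with-exponential-backoff policy could be sketched as follows; the attempt count and delays are illustrative, not taken from the codebase:

```typescript
// Retry an async API call, doubling the delay after each failure.
async function recognizeWithBackoff<T>(
  call: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 1_000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      const delay = baseDelayMs * 2 ** attempt; // 1 s, 2 s, 4 s, 8 s
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  throw lastError; // all attempts exhausted
}
```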

Lyrics Errors

  • No LRC available: Fall back to song info + timestamp display
  • Parse failure: Log error, treat as no lyrics available
  • Chunk timing issues: Skip problematic chunks, continue with valid ones

Position Tracking Errors

  • Negative position: Reset to 0, recalibrate on next recognition
  • Position beyond song: Assume song ended, return to listening state
  • Large drift: Force recalibration, log for debugging

Performance Considerations

  • Caching: Cache LRC files to avoid repeated API calls
  • Memory Management: Limit audio buffer size, clear old data
  • Rate Limiting: Respect ACRCloud API limits
  • Battery Optimization: Efficient audio processing for mobile devices

Configuration

interface KaraokeConfig {
  recognition: {
    intervalMs: 12000;        // Time between recognitions
    bufferDurationMs: 8000;   // Audio buffer size
    confidenceThreshold: 0.7; // Minimum confidence for recognition
    maxDriftSeconds: 3;       // Max timing drift before recalibration
  };
  display: {
    maxWordsPerLine: 8;
    maxCharsPerLine: 60;
    linesPerChunk: 2;
    updateIntervalMs: 500;    // Display refresh rate
  };
  history: {
    maxEntries: 100;
    duplicateWindowMs: 30000; // Don't add same song within 30 seconds
  };
}

Implementation Status

✅ Completed

  • Phase 1: Basic recognition + song info display
  • Phase 2: LRC fetching + basic lyrics display
  • Phase 3: Basic chunking + position tracking (needs refinement)
  • Phase 4: History management + error handling (partial)

🚧 In Progress

  • Smart chunking for 5-line display limit
  • Position recalibration
  • Song switching detection

📋 Planned

  • Phase 5: Performance optimization + caching
  • Test suite with visual simulator
  • Intelligent LRC preprocessing

Quick Start

  1. Set environment variables:

    ACRCLOUD_HOST=identify-us-west-2.acrcloud.com
    ACRCLOUD_ACCESS_KEY=your_key
    ACRCLOUD_ACCESS_SECRET=your_secret
  2. Install dependencies:

    bun install
  3. Run the app:

    bun run dev

This system creates a seamless real-time karaoke experience where users can see synchronized lyrics for any song playing around them, with intelligent fallbacks and continuous accuracy improvements.
