Skip to content

voiceapiai/js-tts-wrapper

 
 

Repository files navigation

js-tts-wrapper

A JavaScript/TypeScript library that provides a unified API for working with multiple cloud-based Text-to-Speech (TTS) services. Inspired by py3-TTS-Wrapper, it simplifies the use of services like Azure, Google Cloud, IBM Watson, and ElevenLabs.

Features

  • Unified API: Consistent interface across multiple TTS providers
  • SSML Support: Use Speech Synthesis Markup Language to enhance speech synthesis
  • Speech Markdown: Optional support for easier speech markup
  • Voice Selection: Easily browse and select from available voices
  • Streaming Synthesis: Stream audio as it's being synthesized
  • Playback Control: Pause, resume, and stop audio playback
  • Word Boundaries: Get callbacks for word timing (where supported)
  • File Output: Save synthesized speech to audio files

Installation

npm install js-tts-wrapper

Optional Dependencies

The library uses a modular approach where TTS engine-specific dependencies are optional. Install only what you need using the provided scripts:

# For Azure TTS
npm run install:azure

# For Google Cloud TTS
npm run install:google

# For ElevenLabs TTS
npm run install:elevenlabs

# For AWS Polly TTS
npm run install:polly

# For OpenAI TTS
npm run install:openai

# For PlayHT TTS
npm run install:playht

# For SherpaOnnx TTS (offline TTS)
npm run install:sherpaonnx

# Install all cloud-based engines (Azure, Google, OpenAI, Polly)
npm run install:cloud

# Install all supported engines
npm run install:all

Or install dependencies manually:

# For Azure TTS
npm install @azure/cognitiveservices-speechservices@^1.0.0 microsoft-cognitiveservices-speech-sdk@^1.43.1

# For Google Cloud TTS
npm install @google-cloud/text-to-speech@^6.0.1

# For ElevenLabs TTS
npm install node-fetch@^2.0.0

# For AWS Polly TTS
npm install @aws-sdk/client-polly@^3.782.0

# For OpenAI TTS
npm install openai@^4.93.0

# For PlayHT TTS
npm install node-fetch@^2.0.0

# For SherpaOnnx TTS (offline TTS)
# Note: This is a native module that requires compilation
npm install sherpa-onnx-node@^1.11.3 decompress@^4.2.1 decompress-bzip2@^4.0.0 decompress-tarbz2@^4.1.1 decompress-targz@^4.1.1 tar-stream@^3.1.7

# For native platforms (macOS, Linux, Windows), you need to set environment variables
# You can use the provided helper scripts which handle all platforms:
node scripts/run-with-sherpaonnx.js your-script.js
# Or set manually:
# - macOS: export DYLD_LIBRARY_PATH=/path/to/your/project/node_modules/sherpa-onnx-darwin-arm64:$DYLD_LIBRARY_PATH
# - Linux: export LD_LIBRARY_PATH=/path/to/your/project/node_modules/sherpa-onnx-linux-x64:$LD_LIBRARY_PATH
# - Windows: set PATH=C:\path\to\your\project\node_modules\sherpa-onnx-win32-x64;%PATH%

# For SherpaOnnx WebAssembly TTS (browser-compatible TTS)
# No additional npm packages are needed, but you need to build the WebAssembly module
npm run install:sherpaonnx-wasm

# Then follow the instructions in docs/sherpaonnx-wasm.md to build the WebAssembly module

If you encounter issues installing or using sherpa-onnx-node, you can still use the wrapper with a mock implementation for testing purposes.

Quick Start

import { AzureTTSClient } from 'js-tts-wrapper';

// Initialize the client with your credentials
const tts = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

// List available voices
const voices = await tts.getVoices();
console.log(voices);

// Set a voice
tts.setVoice('en-US-AriaNeural');

// Speak some text
await tts.speak('Hello, world!');

// Use SSML for more control
const ssml = '<speak>Hello <break time="500ms"/> world!</speak>';
await tts.speak(ssml);

Supported Providers

Provider SSML Streaming Word Timing File Output Notes Version
Azure Yes Yes Yes Yes Full SSML support 1.43.1
Google Cloud Yes Yes Yes Yes Full SSML + markup 6.0.1
ElevenLabs No* Yes Partial Yes Strip SSML automatically node-fetch 2.0.0
OpenAI No* Yes Estimated** Yes Multiple voices available 4.93.0
PlayHT No* Yes Estimated** Yes Multiple voice engines node-fetch 2.0.0
AWS Polly Yes Yes Yes Yes Full SSML support 3.782.0
SherpaOnnx No* Yes Estimated** Yes Offline TTS, no internet 1.11.3
SherpaOnnx-Wasm No* Yes Estimated** Yes Browser-compatible TTS WebAssembly

*Engines that don't support SSML will automatically strip SSML tags and process the plain text.

**Word timings are estimated based on the total audio duration and word count.

Coming Soon

Provider Status
IBM Watson Planned

Core API

// Main methods
speak(text: string | SSML, options?: SpeakOptions): Promise<void>
speakStreamed(text: string | SSML, options?: SpeakOptions): Promise<void>
synthToBytes(text: string | SSML, options?: SpeakOptions): Promise<Uint8Array>
synthToBytestream(text: string | SSML, options?: SpeakOptions): Promise<ReadableStream<Uint8Array>>
getVoices(): Promise<UnifiedVoice[]>
setVoice(voiceId: string): void

// Playback control
pause(): void
resume(): void
stop(): void

// Events
on(event: 'start' | 'end' | 'boundary', callback: Function): void

SSML and Speech Markdown

The library provides utilities for working with SSML and Speech Markdown:

import { SSMLBuilder, SpeechMarkdownConverter } from 'js-tts-wrapper';

// Using the SSML builder
const builder = new SSMLBuilder();
const ssml = builder
  .add('Hello world')
  .addBreak('500ms')
  .addProsody('This is important', 'medium', 'high', '90')
  .toString();

// Using Speech Markdown
const markdown = 'Hello [500ms] (pitch:high world)';
const ssml = SpeechMarkdownConverter.toSSML(markdown);

Advanced Usage

Offline TTS with SherpaOnnx

The library includes support for SherpaOnnx, an offline TTS engine that doesn't require internet access:

import { SherpaOnnxTTSClient } from 'js-tts-wrapper';

// Initialize the client
const tts = new SherpaOnnxTTSClient();

// The client will automatically download models when needed
// You can also specify a custom models directory
const tts = new SherpaOnnxTTSClient({
  modelsDir: '/path/to/models'
});

// List available voices
const voices = await tts.getVoices();
console.log(voices);

// Set a voice
await tts.setVoice('icefall-fs-ljspeech-medium');

// Speak some text
await tts.speak('Hello, world!');

Note: SherpaOnnx requires the sherpa-onnx-node package, which is a native module that requires compilation.

On native platforms (macOS, Linux, Windows), you need to set environment variables. You can do this in several ways:

  1. Use the provided helper scripts which handle all platforms:
# Using the Node.js script
node scripts/run-with-sherpaonnx.js your-script.js

# Or using the shell script (Unix/macOS only)
./scripts/run-with-sherpaonnx.sh your-script.js
  1. Set it manually before running your script:
# macOS
export DYLD_LIBRARY_PATH=/path/to/your/project/node_modules/sherpa-onnx-darwin-arm64:$DYLD_LIBRARY_PATH

# Linux
export LD_LIBRARY_PATH=/path/to/your/project/node_modules/sherpa-onnx-linux-x64:$LD_LIBRARY_PATH

# Windows (Command Prompt)
set PATH=C:\path\to\your\project\node_modules\sherpa-onnx-win32-x64;%PATH%

If you encounter issues installing or using sherpa-onnx-node, the wrapper will fall back to a mock implementation for testing purposes.

OpenAI TTS

The library includes support for OpenAI's text-to-speech API:

import { OpenAITTSClient } from 'js-tts-wrapper';

// Initialize the client
const tts = new OpenAITTSClient({
  apiKey: 'your-openai-api-key', // Optional, defaults to OPENAI_API_KEY environment variable
});

// List available voices
const voices = await tts.getVoices();
console.log(voices);

// Set a voice
tts.setVoice('alloy');

// Set the model (defaults to gpt-4o-mini-tts)
tts.setModel('gpt-4o-mini-tts');

// Set instructions for the TTS engine
tts.setInstructions('Speak in a friendly and clear tone.');

// Set the response format (mp3, opus, aac, flac, wav, pcm)
tts.setResponseFormat('mp3');

// Speak some text
await tts.speak('Hello, world!');

// Stream some text
await tts.speakStreamed('This is streaming audio from OpenAI.');

Note: OpenAI TTS does not support SSML, and word boundaries are estimated since OpenAI doesn't provide word timing information.

PlayHT TTS

The library includes support for PlayHT's text-to-speech API:

import { PlayHTTTSClient } from 'js-tts-wrapper';

// Initialize the client
const tts = new PlayHTTTSClient({
  apiKey: 'your-playht-api-key', // Optional, defaults to PLAYHT_API_KEY environment variable
  userId: 'your-playht-user-id', // Optional, defaults to PLAYHT_USER_ID environment variable
});

// List available voices
const voices = await tts.getVoices();
console.log(voices);

// Set a voice
tts.setVoice('s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json');

// Set the voice engine (PlayHT1.0, PlayHT2.0, or null)
tts.setVoiceEngine('PlayHT1.0');

// Set the output format (wav, mp3)
tts.setOutputFormat('wav');

// Speak some text
await tts.speak('Hello, world!');

// Stream some text
await tts.speakStreamed('This is streaming audio from PlayHT.');

Note: PlayHT TTS does not support SSML, and word boundaries are estimated since PlayHT doesn't provide word timing information.

Language Normalization

The library provides a unified language normalization system that works across all TTS engines:

import { AzureTTSClient, LanguageNormalizer } from 'js-tts-wrapper';

// Normalize a language code
const normalized = LanguageNormalizer.normalize('en-US');
console.log(normalized);
// { bcp47: 'en-US', iso639_3: 'eng', display: 'English (United States)' }

// Get voices by language (works with both BCP-47 and ISO 639-3 codes)
const tts = new AzureTTSClient({ /* credentials */ });
const enVoices = await tts.getVoicesByLanguage('en-US'); // BCP-47
const engVoices = await tts.getVoicesByLanguage('eng');  // ISO 639-3

For more details, see Language Normalization.

Word Boundary Callbacks

import { AzureTTSClient } from 'js-tts-wrapper';

const tts = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

// Set up callbacks
function onWordBoundary(word, startTime, endTime) {
  console.log(`Word: ${word}, Duration: ${endTime - startTime}ms`);
}

function onStart() {
  console.log('Speech started');
}

function onEnd() {
  console.log('Speech ended');
}

// Connect callbacks
tts.connect('onStart', onStart);
tts.connect('onEnd', onEnd);

// Start playback with word boundary callbacks
await tts.startPlaybackWithCallbacks('Hello world', onWordBoundary);

Saving to File

import { AzureTTSClient } from 'js-tts-wrapper';
import { writeFile } from 'fs/promises';

const tts = new AzureTTSClient({
  subscriptionKey: 'your-subscription-key',
  region: 'westeurope'
});

// Synthesize to bytes and save to file
const audioBytes = await tts.synthToBytes('Hello world', { format: 'mp3' });
await writeFile('output.mp3', Buffer.from(audioBytes));

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Testing

The library includes a comprehensive test suite. For details on running tests, see TESTING.md.

# Run all tests
npm test

# Run TTS engine tests
npm run test:tts

# Run tests for specific engines
npm run test:azure
npm run test:elevenlabs
npm run test:google
npm run test:polly
npm run test:sherpaonnx

Examples

Examples are available in the examples directory:

# Run the unified example
npm run example

# Run examples for specific engines
npm run example:azure
npm run example:elevenlabs
npm run example:google
npm run example:polly
npm run example:sherpaonnx

Compatibility

This library is designed to be compatible with modern JavaScript frameworks and environments:

  • Node.js: Compatible with Node.js 18.x and 20.x
  • React: Compatible with React 18 and React 19
  • Next.js: Compatible with Next.js 13 and 14
  • TypeScript: Full TypeScript support with type definitions

The library uses a modular dependency approach that allows you to install only what you need, reducing bundle size and avoiding unnecessary dependencies.

SherpaOnnx WebAssembly TTS

SherpaOnnx WebAssembly TTS is a browser-compatible version of SherpaOnnx TTS. It uses WebAssembly to run the TTS engine directly in the browser, without requiring native modules or environment variables.

import { SherpaOnnxWasmTTSClient } from 'js-tts-wrapper';

// Create a SherpaOnnx WebAssembly TTS client
const tts = new SherpaOnnxWasmTTSClient({
  wasmPath: './sherpaonnx-wasm/tts.js'
});

// Initialize the WebAssembly module
await tts.initializeWasm('./sherpaonnx-wasm/tts.js');

// Get available voices
const voices = await tts.getVoices();
console.log(`Found ${voices.length} voices`);

// Set a voice
await tts.setVoice('piper_en_US');

// Synthesize speech
const text = 'This is a test of SherpaOnnx WebAssembly Text to Speech synthesis.';
const audioBytes = await tts.synthToBytes(text, { format: 'wav' });

// Play the audio in a browser
const audioBlob = new Blob([audioBytes], { type: 'audio/wav' });
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
audio.play();

Note: SherpaOnnx WebAssembly TTS requires building the WebAssembly module first. See docs/sherpaonnx-wasm.md for details.

License

MIT

About

JS TTS (Text to Speech) Wrapper Module enabling TTS for all TTS

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages

  • TypeScript 93.4%
  • JavaScript 4.3%
  • Shell 2.3%