25 changes: 17 additions & 8 deletions fern/pages/01-getting-started/universal-3-pro.mdx
@@ -9,17 +9,26 @@ import { AudioPlayer } from "../../assets/components/AudioPlayer";

## Overview

Universal-3-Pro is our most powerful Voice AI model yet, designed to capture the "hard stuff" that traditional ASR models struggle with, namely:
Universal-3-Pro is our most powerful Voice AI model yet, designed to capture the "hard stuff" that traditional ASR models struggle with.

- **Prompting** - control the style and context of transcription
- **Keyterms prompting** - boost accuracy of known rare words in transcription
- **Built-in code switching** - native switching between languages and context
- **Verbatim transcription** - control elements like disfluencies, stutters, false starts, colloquialisms, and more
- **Audio tags for non-speech** - add markers for non-speech events in the audio file
### Key Universal-3-Pro capabilities

Using the above altogether, you can get an entirely customized transcription output that rivals near-human-level transcription.
- **Keyterm prompting**: Improve recognition of domain-specific terminology, rare words, and proper nouns
- **Prompting**: Guide transcription style, formatting, and output characteristics

Without any prompting or changes, the model out of the box outperforms all ASR models on the market on accuracy, especially as it pertains to entities and rare words.
### Prompting controls

- **Verbatim transcription and disfluencies**: Capture speech exactly as spoken, including disfluencies, filler words, and false starts
- **Output style and formatting**: Control punctuation, capitalization, number formatting
- **Context-aware clues**: Help with jargon, names, and domain expectations
- **Entity accuracy and spelling**: Improve accuracy for proper nouns, brands, technical terms
- **Speaker attribution**: Mark speaker turns and add labels
- **Audio event tags**: Mark laughter, music, applause, background sounds
- **Code switching and multilingual**: Handle multilingual audio in the same transcript
- **Numbers and measurements**: Control how numbers, percentages, and measurements are formatted
- **Difficult audio handling**: Guidance for unclear audio, overlapping speech, interruptions

The model out of the box outperforms all ASR models on the market on accuracy, especially as it pertains to entities and rare words. With prompting, you can get an entirely customized transcription output that rivals near-human-level transcription.

## Quick start
