diff --git a/fern/pages/01-getting-started/universal-3-pro.mdx b/fern/pages/01-getting-started/universal-3-pro.mdx
index bf0c7ab3..2c4ed6ff 100644
--- a/fern/pages/01-getting-started/universal-3-pro.mdx
+++ b/fern/pages/01-getting-started/universal-3-pro.mdx
@@ -9,17 +9,26 @@ import { AudioPlayer } from "../../assets/components/AudioPlayer";
 
 ## Overview
 
-Universal-3-Pro is our most powerful Voice AI model yet, designed to capture the "hard stuff" that traditional ASR models struggle with, namely:
+Universal-3-Pro is our most powerful Voice AI model yet, designed to capture the "hard stuff" that traditional ASR models struggle with.
 
-- **Prompting** - control the style and context of transcription
-- **Keyterms prompting** - boost accuracy of known rare words in transcription
-- **Built-in code switching** - native switching between languages and context
-- **Verbatim transcription** - control elements like disfluencies, stutters, false starts, colloquialisms, and more
-- **Audio tags for non-speech** - add markers for non-speech events in the audio file
+### Key Universal-3-Pro capabilities
 
-Using the above altogether, you can get an entirely customized transcription output that rivals near-human-level transcription.
+- **Keyterm prompting**: Improve recognition of domain-specific terminology, rare words, and proper nouns
+- **Prompting**: Guide transcription style, formatting, and output characteristics
 
-Without any prompting or changes, the model out of the box outperforms all ASR models on the market on accuracy, especially as it pertains to entities and rare words.
+### Prompting controls
+
+- **Verbatim transcription and disfluencies**: Capture speech exactly as spoken, including disfluencies, filler words, and false starts
+- **Output style and formatting**: Control punctuation, capitalization, number formatting
+- **Context aware clues**: Help with jargon, names, and domain expectations
+- **Entity accuracy and spelling**: Improve accuracy for proper nouns, brands, technical terms
+- **Speaker attribution**: Mark speaker turns and add labels
+- **Audio event tags**: Mark laughter, music, applause, background sounds
+- **Code switching and multilingual**: Handle multilingual audio in same transcript
+- **Numbers and measurements**: Control how numbers, percentages, and measurements are formatted
+- **Difficult audio handling**: Guidance for unclear audio, overlapping speech, interruptions
+
+The model out of the box outperforms all ASR models on the market on accuracy, especially as it pertains to entities and rare words. With prompting, you can get an entirely customized transcription output that rivals near-human-level transcription.
 
 ## Quick start
 