awwester/assemblyai-code-switching

AssemblyAI Code Switching Demo

This is a minimal reproducible example demonstrating an issue with AssemblyAI's code switching feature for English/French bilingual audio.

Issue Description

When transcribing Canadian Parliament floor audio (which contains both English and French), the transcription appears to only return English text, with French portions being omitted or removed.

Setup

  1. Add your AssemblyAI API key to the .env file:
     ASSEMBLYAI_API_KEY='your-api-key-here'
  2. Install dependencies with pipenv:
     pipenv install
  3. Activate the pipenv environment:
     pipenv shell
  4. Run the demo:
     python assemblyai_code_switching_demo.py

Expected Behavior

The audio file contains mixed English and French speech from Canadian Parliament proceedings. With code switching enabled, we expect:

  • Both English and French utterances are transcribed
  • Each utterance carries a language_code attribute of "en" or "fr"
  • The full transcript contains both languages
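
One way to check these expectations against a finished transcript is to tally utterances by language. The sketch below is illustrative only: it assumes the utterances have already been converted to plain dictionaries with language_code and text keys. summarize_languages and missing_languages are hypothetical helpers, not part of the AssemblyAI SDK, and the sample utterances are made up.

```python
from collections import Counter

# Languages the demo expects to find in the transcript.
EXPECTED_LANGUAGES = {"en", "fr"}


def summarize_languages(utterances):
    """Count utterances per language_code.

    Utterances without a language_code are grouped under "unknown".
    """
    return Counter(u.get("language_code") or "unknown" for u in utterances)


def missing_languages(utterances, expected=EXPECTED_LANGUAGES):
    """Return the expected languages that never appear in the utterances."""
    return set(expected) - set(summarize_languages(utterances))


# Made-up sample data, standing in for real transcript utterances.
english_only = [
    {"language_code": "en", "text": "Thank you, Mr. Speaker."},
    {"language_code": "en", "text": "I move the motion."},
]
bilingual = english_only + [
    {"language_code": "fr", "text": "Merci, monsieur le Président."},
]

print(sorted(missing_languages(english_only)))
print(sorted(missing_languages(bilingual)))
```

A non-empty result from missing_languages means at least one expected language is absent from the transcript, which is the symptom this repository reports.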

Actual Behavior

Only English text appears in the transcript, with French portions missing or removed.

Audio Sample

URL: https://twocapitals.ca/assets/floor-audio-test.m4a

This is floor audio from Canadian Parliament, which naturally contains both English and French as both are official languages.

Environment

  • AssemblyAI Python SDK: 0.45.4
  • Python: 3.11+
  • Audio format: M4A
  • Languages: English (en) + French (fr)

Notes

According to AssemblyAI documentation, code switching can be enabled in two ways:

  1. Manual: Set language_codes to a list of two language codes (one must be 'en'), e.g., ["en", "fr"]
  2. Automatic: Enable language_detection=True and set code_switching=True within language_detection_options

However, the English-French language pair is not listed as one of the "optimal" pairs (only English-Spanish and English-German are listed as optimal). For other language combinations, the documentation notes that optimal results typically require the non-English language to be dominant in the audio.
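
The two configurations described above can be sketched as plain option dictionaries. This is a minimal illustration, not the demo script itself. The option names (language_codes, language_detection, language_detection_options, code_switching) come from the notes above, but the helper functions are hypothetical. How these options are passed to the SDK or the REST API is an assumption that should be checked against the referenced documentation.

```python
def manual_code_switching(language_codes):
    """Build options for manual code switching.

    Enforces the rule from the notes above: exactly two language codes,
    one of which must be "en".
    """
    codes = list(language_codes)
    if len(codes) != 2 or "en" not in codes:
        raise ValueError("expected exactly two language codes, one of them 'en'")
    return {"language_codes": codes}


def automatic_code_switching():
    """Build options for automatic code switching via language detection."""
    return {
        "language_detection": True,
        "language_detection_options": {"code_switching": True},
    }


# Hypothetical usage: the English/French pair used in this demo.
print(manual_code_switching(["en", "fr"]))
print(automatic_code_switching())
```

Keeping the two variants side by side makes it easy to run the same audio file through both and compare which languages appear in each transcript.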

Reference: https://www.assemblyai.com/docs/pre-recorded-audio/code-switching
