This directory contains a Python script (songSplitter.py) to automatically transcribe an audio file, split it into individual word segments, and generate a JSON mapping file. This is useful for creating datasets for projects that need audio corresponding to specific words.
It was originally used for an April Fools' Day event in which Twitch chat could play a song word by word by typing messages.
- Python 3.x
- OpenAI Whisper: For audio transcription.
- pydub: For audio manipulation (splitting).
- ffmpeg: Required by `pydub` for handling various audio formats (like MP3). Ensure `ffmpeg` is installed and accessible in your system's PATH.
You can install the Python libraries using pip:

`pip install -U openai-whisper pydub`

The songSplitter.py script takes the path to an audio file as input and performs the transcription and splitting process.
`python extractor/songSplitter.py <audio_file_path> [options]`

- `<audio_file_path>`: (Required) Path to the input audio file (e.g., `rickroll.mp3`).
- `-o`, `--output_dir`: Directory to save the segmented audio files (defaults to `output`).
- `-j`, `--json_path`: Path to save the output JSON mapping file (defaults to `splitsong.json`).
- `-m`, `--model`: Whisper model name to use for transcription (e.g., `tiny`, `base`, `small`, `medium`, `large`). Defaults to `medium`. Larger models are more accurate but require more resources (VRAM/RAM) and time.
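The options above map naturally onto Python's standard `argparse` module. The following is a minimal sketch of such a parser, not the script's actual code: the function name `build_parser` and the help strings are assumptions, while the flag names and defaults are taken from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Transcribe an audio file and split it into per-word segments."
    )
    # Required positional argument: the audio file to process.
    parser.add_argument("audio_file_path", help="Path to the input audio file")
    parser.add_argument("-o", "--output_dir", default="output",
                        help="Directory to save the segmented audio files")
    parser.add_argument("-j", "--json_path", default="splitsong.json",
                        help="Path to save the output JSON mapping file")
    parser.add_argument("-m", "--model", default="medium",
                        help="Whisper model name (tiny, base, small, medium, large)")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["rickroll.mp3", "-m", "base"])
print(args.audio_file_path, args.model, args.output_dir, args.json_path)
```

Options that are not passed fall back to the documented defaults, so only the audio path is mandatory.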
Suppose the Rick Roll song is saved as rickroll.mp3 in the parent directory of the extractor directory. To process it using the base model, save the segments in a directory named rickroll_words, and write the mapping to rickroll_map.json:

`python extractor/songSplitter.py ../rickroll.mp3 -m base -o rickroll_words -j rickroll_map.json`

The script will generate:
- Segmented Audio Files: Inside the specified output directory (`output` or `--output_dir`), you will find numerous small MP3 files (e.g., `000.mp3`, `001.mp3`, `002.mp3`, ...), each corresponding to a word detected in the original audio.
- JSON Mapping File: A JSON file (`splitsong.json` or `--json_path`) containing a list of objects, where each object maps a detected word (lowercase) to its corresponding audio segment file path.
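To make the two outputs above concrete, here is a rough sketch of how they could be produced with Whisper's word-level timestamps and pydub slicing. It is not the actual contents of songSplitter.py: the function names `normalize_word`, `build_mapping`, and `split_song` are hypothetical, and stripping punctuation from each word is an assumption (the description above only states that words are lowercased).

```python
import json
import os
import string


def normalize_word(raw: str) -> str:
    """Lowercase a token and trim whitespace and punctuation around it.

    Assumption: apostrophes are kept so contractions like "we're" survive.
    """
    strip_chars = string.whitespace + string.punctuation.replace("'", "")
    return raw.strip(strip_chars).lower()


def build_mapping(words, output_dir="output"):
    """Turn (word, start_seconds, end_seconds) tuples into mapping entries
    and matching (start_ms, end_ms) cut points."""
    entries, cuts = [], []
    for raw, start, end in words:
        word = normalize_word(raw)
        if not word:
            continue  # skip tokens that were only punctuation
        sound = f"{output_dir}/{len(entries):03d}.mp3"
        entries.append({"word": word, "sound": sound})
        cuts.append((round(start * 1000), round(end * 1000)))
    return entries, cuts


def split_song(audio_path, output_dir="output",
               json_path="splitsong.json", model_name="medium"):
    """Transcribe, cut one MP3 per word, and write the JSON mapping."""
    # Imported lazily so the helpers above work without these packages.
    import whisper
    from pydub import AudioSegment

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, word_timestamps=True)
    words = [(w["word"], w["start"], w["end"])
             for seg in result["segments"] for w in seg.get("words", [])]

    entries, cuts = build_mapping(words, output_dir)
    os.makedirs(output_dir, exist_ok=True)
    audio = AudioSegment.from_file(audio_path)
    for entry, (start_ms, end_ms) in zip(entries, cuts):
        # pydub slices by milliseconds.
        audio[start_ms:end_ms].export(entry["sound"], format="mp3")

    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(entries, fh, indent=2)
    return entries
```

Keeping `build_mapping` free of audio dependencies means the naming and normalization logic can be checked without loading a model.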
Example splitsong.json structure:
[
{
"word": "we're",
"sound": "output/000.mp3"
},
{
"word": "no",
"sound": "output/001.mp3"
},
{
"word": "strangers",
"sound": "output/002.mp3"
},
{
"word": "to",
"sound": "output/003.mp3"
},
// ... more words
]
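A consumer of this file, such as a chat integration, needs to go from typed words to segment paths. The sketch below shows one possible way to do that. The helper names `build_lookup` and `sounds_for_message` are hypothetical, and using the first segment recorded for a repeated word is an assumed design choice, not something the script prescribes. The sample data is inlined so the example runs without the generated file.

```python
import json


def build_lookup(entries):
    """Map each word to the first segment recorded for it."""
    lookup = {}
    for entry in entries:
        # setdefault keeps the earliest occurrence when a word repeats.
        lookup.setdefault(entry["word"], entry["sound"])
    return lookup


def sounds_for_message(message, lookup):
    """Return segment paths for the words of a message that exist in the mapping."""
    return [lookup[w] for w in message.lower().split() if w in lookup]


# In practice this would be json.load() on the generated mapping file.
sample = json.loads("""
[
  {"word": "we're", "sound": "output/000.mp3"},
  {"word": "no", "sound": "output/001.mp3"},
  {"word": "strangers", "sound": "output/002.mp3"},
  {"word": "to", "sound": "output/003.mp3"}
]
""")

lookup = build_lookup(sample)
print(sounds_for_message("No strangers to love", lookup))
# -> ['output/001.mp3', 'output/002.mp3', 'output/003.mp3']
```

Words that are missing from the mapping ("love" in this sample) are skipped rather than raising an error.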