Wraps the NVIDIA NeMo Forced Aligner tool to temporally align transcribed text with its audio source.
General user instructions for CLAMS apps are available in the CLAMS Apps documentation.
This app requires Python 3.10.12 or higher. For local installation of required Python modules, see requirements.txt.
For the full list of parameters, please refer to the app metadata from the CLAMS App Directory or the metadata.py file in this repository.
This app accepts an MMIF file that contains no annotations yet, only the file locations of the required
AudioDocument/VideoDocument and TextDocument sources.
Example input:
clams source audio:/path/to/source text:/path/to/text
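As a rough illustration of the input requirement, the sketch below checks that an MMIF-like dictionary lists both a media document and a text document. It assumes the MMIF JSON keeps its documents in a top-level "documents" list with an "@type" on each entry; the helper name has_required_sources is hypothetical, and the short type strings stand in for the full vocabulary URIs that real MMIF uses.

```python
from typing import Any


def has_required_sources(mmif: dict[str, Any]) -> bool:
    """Return True if the MMIF-like dict lists a media and a text document."""
    # Assumption: documents sit in a top-level "documents" list and each
    # entry names its document type in "@type".
    types = [doc.get("@type", "") for doc in mmif.get("documents", [])]
    has_media = any("AudioDocument" in t or "VideoDocument" in t for t in types)
    has_text = any("TextDocument" in t for t in types)
    return has_media and has_text


# Illustrative input only: short type names replace the full URIs.
example_input = {
    "documents": [
        {"@type": "AudioDocument",
         "properties": {"id": "d1", "location": "file:///path/to/source"}},
        {"@type": "TextDocument",
         "properties": {"id": "d2", "location": "file:///path/to/text"}},
    ],
    "views": [],  # no annotations yet
}

print(has_required_sources(example_input))
```

A check like this could run before the app is invoked, so that a missing transcript is caught early rather than during alignment.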
For each whitespace-delimited token in the source transcript, the app outputs
three annotations: a Token annotation for the token itself, a TimeFrame
annotation identifying the audio segment where the token appears, and an
Alignment annotation linking the Token to its TimeFrame.
For more details, see the output section of the app metadata.
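To make the relationship between the three annotation types concrete, here is a minimal sketch that follows each Alignment to recover (token, start, end) triples. The property names ("word", "start", "end", "source", "target"), the short type strings, and the sample values are assumptions for illustration; check the app metadata for the properties this app actually emits. The helper pair_tokens_with_times is hypothetical and not part of the app.

```python
def pair_tokens_with_times(annotations):
    """Follow each Alignment to pair a Token with its TimeFrame."""
    # Index every annotation by id so Alignment endpoints can be resolved.
    by_id = {a["properties"]["id"]: a for a in annotations}
    pairs = []
    for ann in annotations:
        if "Alignment" not in ann["@type"]:
            continue
        props = ann["properties"]
        ends = [by_id[props["source"]], by_id[props["target"]]]
        # The link direction is not assumed: look for each type at either end.
        token = next(e for e in ends if "Token" in e["@type"])
        frame = next(e for e in ends if "TimeFrame" in e["@type"])
        pairs.append((token["properties"]["word"],
                      frame["properties"]["start"],
                      frame["properties"]["end"]))
    return pairs


# Made-up annotations for the transcript "hello world" (times are invented).
sample = [
    {"@type": "Token", "properties": {"id": "tk1", "word": "hello"}},
    {"@type": "TimeFrame", "properties": {"id": "tf1", "start": 0, "end": 480}},
    {"@type": "Alignment", "properties": {"id": "al1", "source": "tf1", "target": "tk1"}},
    {"@type": "Token", "properties": {"id": "tk2", "word": "world"}},
    {"@type": "TimeFrame", "properties": {"id": "tf2", "start": 480, "end": 950}},
    {"@type": "Alignment", "properties": {"id": "al2", "source": "tf2", "target": "tk2"}},
]

for word, start, end in pair_tokens_with_times(sample):
    print(word, start, end)
```

Resolving the Alignment endpoints by id rather than by list position keeps the sketch independent of the order in which the annotations are written.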