Open
Description
Thank you very much for your excellent work on this tool.
I am currently trying to use it in ESPnet to generate alignments, similar to those produced by the MFA tool, for training a model with durations.
However, I have run into some minor issues.
I understand the tool is still under construction, but a little guidance would help me continue with my implementation.
- You need to add an `__init__.py` to the `alqalign` folder so that the command `alqalign.run` can be executed.
- In `run.py`, what is step 1 supposed to do? Does it transcribe the audio into words/text, or only into phonemes? Should it behave similarly to step 2?
- When step 1 is used, text is no longer required, right? So the `text` argument becomes unnecessary? Is it also necessary to move the text processing after step 1?
Lines 70 to 92 in 10081d5

    if text_file.is_dir():
        for text_path in sorted(text_file.glob('*')):
            utt_id = text_path.stem
            if utt_id in utt2audio:
                audio_files.append(utt2audio[utt_id])
                text_files.append(text_path)
                output_dirs.append(output_dir / utt_id)
                utt_ids.append(utt_id)
    else:
        for i, line in enumerate(open(text_file, 'r')):
            if text_format == 'kaldi':
                fields = line.strip().split()
                utt_id = fields[0]
                sent = ' '.join(fields[1:])
            else:
                utt_id = str(i)
                sent = line

            if utt_id in utt2audio:
                audio_files.append(utt2audio[utt_id])
                text_files.append(sent)
                output_dirs.append(output_dir / utt_id)
                utt_ids.append(utt_id)

Line 107 in 10081d5
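To make the question about the `text` argument concrete, here is a rough sketch of how the pairing logic quoted above could be isolated in a helper that tolerates a missing transcript. Everything in it is hypothetical: `Job`, `collect_jobs`, and the `"plain"` default are names I made up for illustration, not part of alqalign, and it assumes `utt2audio` has already been filled in. Unlike the quoted lines, it also skips blank lines in the text file.

```python
from pathlib import Path
from typing import Dict, List, NamedTuple, Optional, Union


class Job(NamedTuple):
    """One utterance to process (hypothetical container, not alqalign's)."""
    utt_id: str
    audio: str
    text: Optional[Union[Path, str]]  # None means "no transcript supplied"
    output_dir: Path


def collect_jobs(
    utt2audio: Dict[str, str],
    text_file: Optional[Union[str, Path]],
    output_dir: Union[str, Path],
    text_format: str = "plain",
) -> List[Job]:
    """Pair audio entries with transcripts, or with None when text is absent."""
    output_dir = Path(output_dir)

    # No transcript: every audio entry becomes a job whose text is left
    # for a later transcription step to fill in.
    if text_file is None:
        return [Job(u, a, None, output_dir / u) for u, a in sorted(utt2audio.items())]

    text_file = Path(text_file)
    jobs: List[Job] = []

    # Directory of per-utterance text files: the file stem is the utt_id.
    if text_file.is_dir():
        for text_path in sorted(text_file.glob("*")):
            utt_id = text_path.stem
            if utt_id in utt2audio:
                jobs.append(Job(utt_id, utt2audio[utt_id], text_path, output_dir / utt_id))
        return jobs

    # Single text file: "utt_id sentence" per line (kaldi) or one sentence
    # per line, with the line index used as the utt_id.
    with open(text_file, "r", encoding="utf-8") as f:
        for i, line in enumerate(f):
            line = line.strip()
            if not line:
                continue  # differs from the quoted code, which keeps blank lines
            if text_format == "kaldi":
                fields = line.split(maxsplit=1)
                utt_id = fields[0]
                sent = fields[1] if len(fields) > 1 else ""
            else:
                utt_id, sent = str(i), line
            if utt_id in utt2audio:
                jobs.append(Job(utt_id, utt2audio[utt_id], sent, output_dir / utt_id))
    return jobs
```

With something like this, `collect_jobs(utt2audio, None, output_dir)` would cover the transcription-only case, and the same call with a text path would cover the alignment case, so the position of the call relative to step 1 matters less.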
- When using an scp file, does it have to be read as plain text? Would it not be possible to load it with `kaldiio.load_scp` instead?
Lines 60 to 62 in 10081d5

    for line in open(audio_file):
        utt_id, ark_key = line.strip().split()
        utt2audio[utt_id] = ark_key
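For reference, this is roughly what I have in mind for the scp case. The first function is a minimal stand-in for the loop quoted above, with one change that is my assumption about what is wanted: it splits on the first whitespace only, so entries whose right-hand side contains spaces (for example piped commands) do not raise on unpacking. The second function just wraps `kaldiio.load_scp`, which as far as I know returns a lazy, dict-like object keyed by utterance ID. The names `read_scp` and `load_audio_with_kaldiio` are made up for this example.

```python
from typing import Dict


def read_scp(path: str) -> Dict[str, str]:
    """Map utt_id -> right-hand side of each line in a Kaldi-style scp file."""
    utt2audio: Dict[str, str] = {}
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            # Split once, so the value may itself contain spaces.
            fields = line.split(maxsplit=1)
            if len(fields) != 2:
                raise ValueError(f"{path}:{line_no}: expected '<utt_id> <value>'")
            utt_id, ark_key = fields
            utt2audio[utt_id] = ark_key
    return utt2audio


def load_audio_with_kaldiio(path: str):
    """Alternative: let kaldiio resolve the entries (needs `pip install kaldiio`)."""
    import kaldiio  # third-party; imported lazily so read_scp works without it

    # Assumption based on the kaldiio README: indexing the returned object
    # by utt_id loads that entry on demand.
    return kaldiio.load_scp(path)
```

The trade-off is that `read_scp` keeps only the string on each line, as the current code does, while the kaldiio route would hand back loaded data, so the later steps would need to accept arrays rather than paths.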