Few questions

Thank you very much for your excellent work with this tool.

I am trying to use it in ESPnet to generate alignments, much like the MFA (Montreal Forced Aligner) tool does, for training a model that uses durations.
However, I am running into a few minor issues.

I understand the tool is still under construction, but a little guidance would help me continue with my implementation.

- An `__init__.py` needs to be added to the `alqalign` folder so that `alqalign.run` can be executed.
- In `run.py`, what is step 1 supposed to do? Does it transcribe the audio into words/text, or only into phonemes? Should it behave similarly to step 2?
- When step 1 is used, the text is no longer required, right? So the `text` argument becomes unnecessary? If so, should the text processing (https://github.com/xinjli/alqalign/blob/10081d56d6e29c3739a3b01be275282d26bb5310/alqalign/run.py#L70-L92) be moved to after step 1 (https://github.com/xinjli/alqalign/blob/10081d56d6e29c3739a3b01be275282d26bb5310/alqalign/run.py#L107)?
- When using an scp file, does the file have to be loaded as plain text (https://github.com/xinjli/alqalign/blob/10081d56d6e29c3739a3b01be275282d26bb5310/alqalign/run.py#L60-L62)? Would it not be possible to simply use `kaldiio.load_scp` to load it?


Text processing, `run.py` lines 70-92:

	if text_file.is_dir():
	    for text_path in sorted(text_file.glob('*')):
	        utt_id = text_path.stem
	        if utt_id in utt2audio:
	            audio_files.append(utt2audio[utt_id])
	            text_files.append(text_path)
	            output_dirs.append(output_dir / utt_id)
	            utt_ids.append(utt_id)
	else:
	    for i, line in enumerate(open(text_file, 'r')):
	        if text_format == 'kaldi':
	            fields = line.strip().split()
	            utt_id = fields[0]
	            sent = ' '.join(fields[1:])
	        else:
	            utt_id = str(i)
	            sent = line

	        if utt_id in utt2audio:
	            audio_files.append(utt2audio[utt_id])
	            text_files.append(sent)
	            output_dirs.append(output_dir / utt_id)
	            utt_ids.append(utt_id)
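
To check that I am reading the single-file branch correctly, here is a minimal sketch of how I understand the two text formats to be parsed. `parse_text_lines` is a hypothetical helper of mine, not something in alqalign. Unlike the quoted code, it strips the trailing newline in the plain format and skips empty lines in the Kaldi format.

```python
from typing import Iterable, List, Tuple


def parse_text_lines(lines: Iterable[str], text_format: str = "plain") -> List[Tuple[str, str]]:
    """Return (utt_id, sentence) pairs, mirroring my reading of the quoted block."""
    pairs = []
    for i, line in enumerate(lines):
        if text_format == "kaldi":
            # Kaldi-style text: "<utt_id> <word> <word> ..."
            fields = line.strip().split()
            if not fields:
                continue  # my addition: ignore empty lines
            utt_id, sent = fields[0], " ".join(fields[1:])
        else:
            # Plain text: one sentence per line, the line index is the utterance id
            utt_id, sent = str(i), line.strip()
        pairs.append((utt_id, sent))
    return pairs


kaldi_pairs = parse_text_lines(["utt1 hello world\n", "utt2 good morning\n"], "kaldi")
plain_pairs = parse_text_lines(["hello world\n", "good morning\n"])
```

If this is right, the plain format only matches audio whose ids in the scp file are `0`, `1`, and so on, which is why I am asking whether `text` can be dropped entirely when step 1 is used.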

scp loading, `run.py` lines 60-62:

	for line in open(audio_file):
	    utt_id, ark_key = line.strip().split()
	    utt2audio[utt_id] = ark_key
