
Commit 8a61687

Update README.md
1 parent 21b1084 commit 8a61687

1 file changed

+11
-1
lines changed

README.md

Lines changed: 11 additions & 1 deletion
@@ -16,7 +16,17 @@ Each utterance has been first transcribed by an open-source ASR. The transcripti
 
 For each human transcriber, a transcription pipeline is built by the transcription system. For the quality control purposes, 5% of the utterances were taken from an existing spoken corpus (Mozilla Common Voice)
 
-Each utterance has been transcribed by two human transcribers. In the case where the relative WER of transcriptions was over 5%, the third transcriber resolved the conflict.
+Each utterance has been transcribed by two human transcribers. In the case where the relative WER of transcriptions was over 5%, the third transcriber resolved the conflict.
+
+# Normalized Alphabets
+The alphabets have been normalized as per the table below:
+Language | Alphabet
+---------|----------
+French | azertyuiopqsdfghjklmùwxcvbné'èçàêôâûœ
+Spanish | abcdefghijklmnñopqrstuvwxyzáéíóúüé
+Arabic | أنت سيرإلىمحةاقثعهذفبئضودجصكخشزطءغظآؤ
+Turkish | abcçdefgğhıijklmnoöprsştuüvyz
+
 
 # License and copyright
 The MediaSpeech dataset is distributed under the Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the video.
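
The README states that a third transcriber resolved the conflict whenever the relative WER between the two human transcriptions exceeded 5%. A minimal sketch of such a check follows. The README does not say how the relative WER was computed, so the word-level edit distance, the normalization by the longer transcript, and the function names `word_edit_distance`, `relative_wer`, and `needs_third_transcriber` are all assumptions made for illustration, not the dataset authors' code.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def relative_wer(first, second):
    """Word edit distance divided by the length of the longer transcript.

    Assumption: neither transcript is treated as the reference, so the
    longer one is used for normalization to keep the measure symmetric.
    """
    a, b = first.split(), second.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return word_edit_distance(a, b) / longest


def needs_third_transcriber(first, second, threshold=0.05):
    """True when the two transcripts disagree by more than the threshold."""
    return relative_wer(first, second) > threshold
```

With these definitions, two four-word transcripts that differ in one word have a relative WER of 0.25 and would be sent to a third transcriber, while identical transcripts would not.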
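
The normalized alphabets added in this commit can be used to check that a transcript contains only expected characters. The sketch below copies three rows of the table verbatim. The helper name `out_of_alphabet`, the decision to allow a space as a word separator, and the NFC normalization step are assumptions for illustration and are not described in the README.

```python
import unicodedata

# Alphabets copied verbatim from the README table (three of the four rows).
ALPHABETS = {
    "French": "azertyuiopqsdfghjklmùwxcvbné'èçàêôâûœ",
    "Spanish": "abcdefghijklmnñopqrstuvwxyzáéíóúüé",
    "Turkish": "abcçdefgğhıijklmnoöprsştuüvyz",
}


def out_of_alphabet(text, language, extra_allowed=" "):
    """Return the sorted characters of `text` that are not in the alphabet.

    Assumption: a space is allowed as a word separator. Both the text and
    the alphabet are NFC-normalized so that precomposed and decomposed
    forms of accented letters compare equal.
    """
    alphabet = unicodedata.normalize("NFC", ALPHABETS[language])
    allowed = set(alphabet) | set(extra_allowed)
    normalized = unicodedata.normalize("NFC", text)
    return sorted({ch for ch in normalized if ch not in allowed})
```

For example, `out_of_alphabet("taxi 42", "Turkish")` reports the digits and the letter `x`, none of which appear in the Turkish row, while `out_of_alphabet("günaydın", "Turkish")` returns an empty list.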
