These scripts are sufficient to convert the distributed forms of dictionaries into forms useful for our tools (notably HTK and ISS). Once a dictionary is in a standard form, the generic tools in ISS can be used to manipulate it further.
Instructions:
- Run
./CreateLinks.sh- This will link the media directories in each directory
Then for each dictionary that you need, cd <directory> and:
-
Run
./CreateDicts.sh- Converts the native dictionary format to something more standard
-
Run
./CreatePSaurus.sh- This will run phonitisaurus to generate an FST for the dictionary
Note that phonetisaurus FST creation (rather, the alignment stage) can use up a lot of memory, so it may be necessary to run it in the grid.
Phil Garner, March 2013