To save parsed output #3
base: master
- Removed extra whitespace from the dictionary input file
- Removed an irrelevant line from the input file
- Ran the new code and generated the output
  - the output is in data/dict.txt
  - entries the grammar could not parse are in error.log
- Changes in the grammar
  - an entry is valid even if there is a period at the end of the line
  - a POS can be terminated with either a full stop or a comma (the comma is a typo in the source)
  - glosses can be terminated with a colon (the colons are typos)
  - attempted to support phrase entries (multiple words in the headword)
    - this failed, so the code is commented out
- bailey now generates SFM output (in MDF) from the input text
- Better handling of the input file
  - when a line in the input text is malformed, bailey copies it to error.log and continues parsing with the next line
- The parsed output is stored in dict.txt
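The line-by-line flow described above (parse each line, copy failures to error.log, keep going, write the rest out as MDF) could look roughly like the sketch below. It is a minimal illustration, not bailey's code: it replaces the grammar with a single hypothetical regular expression (`ENTRY_RE`) for a simplified `#headword, pos. sense; sense.` line, and `parse_entry`, `to_sfm` and `process` are made-up names. The pattern mirrors the grammar relaxations listed in this PR (optional trailing period, `.` or `,` after the POS, `;` or `:` between senses). `\lx` and `\ps` are the standard MDF lexeme and part-of-speech markers; using `\ge` for every gloss is an assumption, since the PR doesn't say which gloss marker bailey emits.

```python
import io
import re
from typing import Iterable, Optional, TextIO

# Hypothetical stand-in for the grammar: "#headword, pos. sense; sense."
# - any number of leading '#' (like `hash = (~"#")*`)
# - the POS may end in '.' or ',' (the comma is a source typo)
# - the trailing period is optional
ENTRY_RE = re.compile(
    r"^#*(?P<head>[A-Za-z-]+),\s*(?P<pos>[a-z]+)[.,]\s+(?P<senses>.+?)\.?$"
)


def parse_entry(line: str) -> Optional[dict]:
    """Return the parsed fields, or None if the line does not match."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    # ';' and a mistyped ':' are both treated as sense separators.
    senses = [s.strip() for s in re.split(r"[;:]", m.group("senses")) if s.strip()]
    return {"head": m.group("head"), "pos": m.group("pos"), "senses": senses}


def to_sfm(entry: dict) -> str:
    """Render one entry as an MDF-style record (the \\ge marker is assumed)."""
    lines = [f"\\lx {entry['head']}", f"\\ps {entry['pos']}"]
    lines += [f"\\ge {sense}" for sense in entry["senses"]]
    return "\n".join(lines) + "\n\n"


def process(lines: Iterable[str], out: TextIO, err: TextIO) -> int:
    """Parse every line; copy unparseable lines to err and keep going.

    Returns the number of lines that could not be parsed.
    """
    failures = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        entry = parse_entry(line)
        if entry is None:
            # Keep the original line so it can be fixed by hand later.
            err.write(raw if raw.endswith("\n") else raw + "\n")
            failures += 1
            continue
        out.write(to_sfm(entry))
    return failures


if __name__ == "__main__":
    sample = ["#apple, n. fruit; tree.\n", "garbled line\n"]
    out, err = io.StringIO(), io.StringIO()
    print(process(sample, out, err), "line(s) failed")
    print(out.getvalue(), end="")
```

Against real files, `out` and `err` would be opened with `open("data/dict.txt", "w", encoding="utf-8")` and `open("error.log", "w", encoding="utf-8")`. Passing the file objects in, rather than opening them inside `process`, keeps the loop easy to test with `io.StringIO`.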
This has been recreated from the MDF documentation.
```
entry = hash headword comma pos ws senses subentry period emptyline
# entry = hash headphrase comma pos ws senses subentry period emptyline
hash = (~"#")*
# headphrase = headword (ws headword)*
```
I've added this to capture headwords made up of multiple words; most of the exceptions are due to these.
However, I couldn't get it to work.
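For comparison, the `headword (ws headword)*` shape can be written as a regular expression. The sketch below is only an illustration under assumed token definitions (a headword is a run of ASCII letters or hyphens, and the words of a phrase are separated by single spaces); `HEADWORD`, `HEADPHRASE` and `headphrase` are hypothetical names, and it does not diagnose why the grammar rule failed.

```python
import re
from typing import Optional

# Assumed token definitions, not taken from the project's grammar.
HEADWORD = r"[A-Za-z-]+"
# Same shape as `headphrase = headword (ws headword)*`, with ws = one space.
HEADPHRASE = rf"{HEADWORD}(?: {HEADWORD})*"
# Optional leading '#'s, the phrase, then the comma that precedes the POS.
ENTRY_HEAD_RE = re.compile(rf"^#*(?P<head>{HEADPHRASE}),")


def headphrase(line: str) -> Optional[str]:
    """Return the (possibly multi-word) headword before the first comma."""
    m = ENTRY_HEAD_RE.match(line)
    return m.group("head") if m else None
```

Requiring the comma right after the phrase is what stops the repetition at the boundary between the headword and the part of speech; without it, a line such as `#take off v.` would swallow the `v` into the headword.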
```diff
 sense = (ml ws ml)* ml
 ml = ~"[\u0d00-\u0d7f]*"
-semicolon = ~";"
+semicolon = ~"[;:]"
```
In many places the typists put a `:` where a `;` belonged. Since we are not preserving the data, I chose to accept either character.
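The combined effect of the `ml`, `sense` and `semicolon` rules can be approximated with Python's `re` module, as in the sketch below. It is not the project's grammar; `ML_WORD`, `SENSE`, `SENSES_RE` and `split_senses` are hypothetical names. It reuses the Malayalam block (U+0D00 to U+0D7F) from `ml` and accepts either `;` or `:` between senses. One deliberate difference: `ml` is written with `*`, so it also matches an empty string, whereas `ML_WORD` uses `+` so that empty senses are rejected.

```python
import re
from typing import List, Optional

# Character class copied from the grammar's `ml` rule (Malayalam block).
ML_WORD = "[\u0d00-\u0d7f]+"
# One sense: Malayalam words separated by whitespace, like `sense = (ml ws ml)* ml`.
SENSE = rf"{ML_WORD}(?:\s+{ML_WORD})*"
# Senses separated by ';' or a mistyped ':', like `semicolon = ~"[;:]"`.
SENSES_RE = re.compile(rf"{SENSE}(?:\s*[;:]\s*{SENSE})*")


def split_senses(text: str) -> Optional[List[str]]:
    """Return the list of senses if the whole string is well formed, else None."""
    if SENSES_RE.fullmatch(text) is None:
        return None
    return [s.strip() for s in re.split(r"[;:]", text)]
```

The trade-off of accepting `:` is that a colon which genuinely belongs inside a gloss would also be treated as a separator.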
I've made the changes listed above.
Please review the code before merging; the rest of the files and data can be merged directly.