I'm not sure if this is the best place to post this; I'm new to GitHub, so please let me know if it belongs elsewhere.
I'm interested in helping to replace MeCab with another parser, largely out of frustration with two issues:

1. Grammar structures that are homophones of forms marked as 'known' actually have more than one, often quite different, semantic use.
2. Collocations, colloquialisms, and figures of speech are disregarded and broken up into their individual parts.

Both of these have, in my experience, pushed cards to i+2 or worse. Solving this seems to be beyond the scope of general tokenizers / morphological analyzers: morphemizers like MeCab or even Sudachi tokenize a sentence into "morphemes", but the results I actually expect are 文節 (clauses). The only software I can find that produces those is J.DepP (https://www.tkl.iis.u-tokyo.ac.jp/~ynaga/jdepp/), and I'm unsure how it could be integrated. A rough illustration of the morpheme-vs-文節 gap is sketched below.
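To make the granularity gap concrete, here is a minimal sketch contrasting MeCab's morpheme output with a very rough 文節-style regrouping. This is not the add-on's code: the example sentence, the mecab-python3 usage, and the chunking heuristic are all my own assumptions, and the heuristic is nowhere near what J.DepP actually does; it only shows the level of chunking I'd hope to get.

```python
# Minimal sketch, not the add-on's code. Assumes the mecab-python3 package and a
# standard dictionary (e.g. ipadic/unidic) are installed; the example sentence and
# the chunking heuristic are illustrative only.
import MeCab

tagger = MeCab.Tagger()
sentence = "気になっているんです"

# Morpheme-level split, which is what MeCab-style morphemizers hand back.
morphemes = []
for line in tagger.parse(sentence).splitlines():
    if line == "EOS":
        break
    surface, features = line.split("\t")
    pos = features.split(",")[0]  # top-level POS: 名詞, 助詞, 動詞, 助動詞, ...
    morphemes.append((surface, pos))
print([s for s, _ in morphemes])
# Each individual morph may already be "known" even though the learner has never
# met the collocation 気になる as a unit.

# Very rough 文節-like regrouping: a content word opens a new chunk; particles and
# auxiliaries attach to the chunk before them. This is NOT what J.DepP does; it
# only illustrates the coarser granularity a real bunsetsu chunker would provide.
CONTENT_POS = {"名詞", "動詞", "形容詞", "副詞"}
chunks = []
for surface, pos in morphemes:
    if pos in CONTENT_POS or not chunks:
        chunks.append(surface)
    else:
        chunks[-1] += surface
print(chunks)  # coarser chunks, closer to 文節 boundaries (heuristic only)
```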
I would also like to revamp the system for comprehension cards, since the morph indicated as the target morph of a sentence is often not actually the morph that is unknown to the user. Ideally, I'd like to see an option that asks for user input to redefine the target (i.e. unfamiliar/unknown) morph in a sentence when the parsing dictionary gets it wrong. It's unclear at this time whether improving the parser would remove the need for this, but right now I think it could be a useful band-aid. A hypothetical sketch of what such an override could look like follows.
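As a thought experiment only, here is a hypothetical sketch of how a per-note override could sit alongside the parser's guess. Everything here (the class, the note ids, the morphs) is invented for illustration and does not correspond to any existing API in the add-on.

```python
# Hypothetical sketch only; TargetOverrides, the note ids, and the morphs below are
# invented for illustration and do not correspond to any existing add-on API.
from dataclasses import dataclass, field

@dataclass
class TargetOverrides:
    # note_id -> the morph the user says is actually unknown in that sentence
    overrides: dict[int, str] = field(default_factory=dict)

    def set_target(self, note_id: int, morph: str) -> None:
        """Record the user's correction for one card/sentence."""
        self.overrides[note_id] = morph

    def target_for(self, note_id: int, parser_guess: str) -> str:
        """Prefer the user's correction; otherwise fall back to the parser's pick."""
        return self.overrides.get(note_id, parser_guess)

# Example: the parser marks 気 as the target, but the user flags 気になる instead.
table = TargetOverrides()
table.set_target(12345, "気になる")
print(table.target_for(12345, "気"))   # -> 気になる (user override wins)
print(table.target_for(67890, "気"))   # -> 気 (no override recorded)
```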
I would love to help with the development of this, but I'm a little unsure where to start. Please let me know if there's anything I can do.
Thank you.