Names can refer to classes of humans/locations/organizations #4
Description
Thank you for publishing the preprocessing pipeline! I successfully ran it on Linux. It is convenient that the preprocessed entries are written out as the pipeline runs, so you can monitor its progress.
After looking at some of the resulting entries in the combined/ folder, I see that the categories "PERSON", "ORGANIZATION" and "LOCATION" are looser than I expected (my misunderstanding here).
Besides taking instances of "Q5" (persons), "Q82794" (locations) and "Q43229" (organizations), paranames/io/wikidata_dump_transliterations.py also takes instances of their subclasses. As a result, entries such as "Hispanic and Latino-American teenage boys" and "Government secretaries of Policies for Women of the State of Bahia" are classified as persons alongside ["Samuel Hamington"](https://www.wikidata.org/wiki/Q111165240), whereas I expected only "Samuel Hamington" to be included.
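To illustrate the difference, here is a minimal sketch that compares strict matching (the item's P31 "instance of" value is exactly Q5) with matching that also follows P279 "subclass of" edges transitively. It assumes a simplified in-memory representation of the claims. The toy data, the placeholder IDs `Q_HYPOTHETICAL_ITEM` / `Q_HYPOTHETICAL_CLASS` and the helper names are hypothetical. This is not how wikidata_dump_transliterations.py actually reads the dump.

```python
from typing import Dict, List, Set

# Hypothetical toy data (NOT read from a real dump): P31 ("instance of")
# claims per item and P279 ("subclass of") edges between classes.
# The "Q_HYPOTHETICAL_*" IDs are placeholders made up for this sketch.
INSTANCE_OF: Dict[str, List[str]] = {
    "Q111165240": ["Q5"],  # Samuel Hamington: instance of human (Q5)
    "Q_HYPOTHETICAL_ITEM": ["Q_HYPOTHETICAL_CLASS"],  # typed via a subclass of Q5
}
SUBCLASS_OF: Dict[str, List[str]] = {
    "Q_HYPOTHETICAL_CLASS": ["Q5"],  # placeholder class declared a subclass of human
}


def superclasses(cls: str, subclass_of: Dict[str, List[str]]) -> Set[str]:
    """Return cls plus every class reachable from it via P279 edges."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in the subclass graph
        seen.add(current)
        stack.extend(subclass_of.get(current, []))
    return seen


def matches_strict(item: str, target: str, instance_of: Dict[str, List[str]]) -> bool:
    """True only if the item is directly an instance of target."""
    return target in instance_of.get(item, [])


def matches_with_subclasses(
    item: str,
    target: str,
    instance_of: Dict[str, List[str]],
    subclass_of: Dict[str, List[str]],
) -> bool:
    """True if any P31 class of the item is target or a transitive subclass of it."""
    return any(target in superclasses(c, subclass_of) for c in instance_of.get(item, []))


strict = [q for q in INSTANCE_OF if matches_strict(q, "Q5", INSTANCE_OF)]
loose = [q for q in INSTANCE_OF if matches_with_subclasses(q, "Q5", INSTANCE_OF, SUBCLASS_OF)]
print(strict)  # ['Q111165240']
print(loose)   # ['Q111165240', 'Q_HYPOTHETICAL_ITEM']
```

Strict matching keeps "Samuel Hamington" and drops the item that is only reachable through a subclass. The trade-off is that it would likely also drop genuine entities whose P31 value is a more specific class (e.g. a location typed as a particular kind of place rather than directly as Q82794), so neither variant is obviously right for every category.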

Maybe it would be good to include data samples in the git repo or in the original paper =)