Hi!
I came across a bug where MER fails to identify an entity when it is immediately followed by a punctuation mark.
For example, in the following picture, Calcimycin is identified in the sentences "I like Calcimycin" and "I like Calcimycin it is a good aurelia aurita and Temefos is awesome! abate lowercase", but not in "I like Calcimycin, it is a good aurelia aurita and Temefos is awesome! abate lowercase", which has a comma right after the word Calcimycin.

The code that I used to create the lexicon (mesh_lex) is the following:
import merpy

# Load the "name=id" mapping file into a dict: term name -> MeSH ID
with open('MeSH_name_id_mapping.txt', encoding='utf-8') as finput_terms:
    l_terms = finput_terms.readlines()

dict_terms = {}
for i in l_terms:
    aux = i.split('=')
    dict_terms[aux[0].strip()] = aux[1].replace('\n', '')

# Load the tab-separated synonyms file: term -> comma-separated synonyms
with open('mesh_terms_synonyms.txt', encoding='utf-8') as finput_terms:
    l_terms_syn = finput_terms.readlines()

dict_terms_synonyms = {}
for i in l_terms_syn:
    aux = i.split('\t')
    dict_terms_synonyms[aux[0]] = aux[1].replace('\n', '')

# Map every term and each of its synonyms to the ID of the main term
conv_dict = {}
for key, values in dict_terms_synonyms.items():
    l_synonyms = values.split(',')
    if key not in l_synonyms:
        l_synonyms.append(key)
    for i in l_synonyms:
        conv_dict[i.strip()] = dict_terms.get(key)

# Create and process the lexicon
merpy.create_lexicon(conv_dict.keys(), "mesh_lex")
merpy.create_mappings(conv_dict, "mesh_lex")
merpy.show_lexicons()
merpy.process_lexicon("mesh_lex")

# Examples
print(merpy.get_entities("I like abdominal injuries", "mesh_lex"))
print(merpy.get_entities("I like Calcimycin", "mesh_lex"))
print(merpy.get_entities("I like Calcimycin, it is a good aurelia aurita and Temefos is awesome! abate lowercase", "mesh_lex"))
print(merpy.get_entities("I like Calcimycin it is a good aurelia aurita and Temefos is awesome! abate lowercase", "mesh_lex"))
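To check whether the punctuation really is the trigger, one option is to blank out punctuation before calling merpy.get_entities and compare the results. The helper below is only a sketch: blank_punctuation is a hypothetical name, not part of merpy, and it has not been confirmed that this makes MER find the entity. Each punctuation character is replaced by a single space, so the string keeps its length and character offsets stay aligned with the original text.

```python
import string

# Translation table mapping every ASCII punctuation character to one space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def blank_punctuation(text):
    """Return a copy of text with ASCII punctuation replaced by spaces.

    Hypothetical helper (not part of merpy). The output has the same length
    as the input, so offsets computed on it also apply to the original.
    """
    return text.translate(_PUNCT_TO_SPACE)


sentence = ("I like Calcimycin, it is a good aurelia aurita "
            "and Temefos is awesome! abate lowercase")
cleaned = blank_punctuation(sentence)
print(cleaned)
# Assumption: the cleaned string would then be passed on, e.g.
# merpy.get_entities(cleaned, "mesh_lex")
```

Note that this also blanks hyphens and commas inside lexicon terms, so terms that contain punctuation themselves would no longer match.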
Here are the files with the entities:
MeSH_name_id_mapping.txt
mesh_terms_synonyms.txt
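For reference, the line formats that the code above expects from these two files can be illustrated with a small in-memory sketch. The function name build_conv_dict and the sample lines and IDs are hypothetical placeholders, not taken from the attached files, and the sketch strips whitespace slightly more aggressively than the original code.

```python
def build_conv_dict(mapping_lines, synonym_lines):
    """Map each term and each of its synonyms to the ID of the main term.

    mapping_lines: lines of the form "name=id"
    synonym_lines: lines of the form "name<TAB>synonym1,synonym2,..."
    """
    name_to_id = {}
    for line in mapping_lines:
        name, _, term_id = line.partition("=")
        name_to_id[name.strip()] = term_id.strip()

    conv = {}
    for line in synonym_lines:
        term, _, synonyms = line.rstrip("\n").partition("\t")
        term = term.strip()
        names = [s.strip() for s in synonyms.split(",") if s.strip()]
        if term not in names:
            names.append(term)
        for name in names:
            # Terms missing from the mapping file end up mapped to None.
            conv[name] = name_to_id.get(term)
    return conv


# Hypothetical sample lines; the ID is a placeholder, not a real MeSH ID.
sample_mapping = ["Temefos=ID_PLACEHOLDER\n"]
sample_synonyms = ["Temefos\tabate\n"]
conv_dict = build_conv_dict(sample_mapping, sample_synonyms)
print(conv_dict)
```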