MER failing to identify entities in text #3

@Cobollero

Hi!

I came across a bug where MER fails to identify an entity if the entity is directly adjacent to a punctuation mark.

For example, in the following picture, Calcimycin is identified in the sentences "I like Calcimycin" and "I like Calcimycin it is a good aurelia aurita and Temefos is awesome! abate lowercase", but not in "I like Calcimycin, it is a good aurelia aurita and Temefos is awesome! abate lowercase", which has a comma right after the word Calcimycin.
[Image: Untitled]

The code that I used to create the lexicon (mesh_lex) is the following:

import merpy

# Map each MeSH term name to its identifier (one "name=id" pair per line).
dict_terms = {}
with open('MeSH_name_id_mapping.txt', encoding='utf-8') as finput_terms:
    for line in finput_terms:
        aux = line.rstrip('\n').split('=')
        if len(aux) > 1:  # skip blank or malformed lines
            dict_terms[aux[0].strip()] = aux[1]

# Map each term to its comma-separated synonyms (one "term<TAB>synonyms" pair per line).
dict_terms_synonyms = {}
with open('mesh_terms_synonyms.txt', encoding='utf-8') as finput_terms:
    for line in finput_terms:
        aux = line.rstrip('\n').split('\t')
        if len(aux) > 1:  # skip blank or malformed lines
            dict_terms_synonyms[aux[0]] = aux[1]

# Every synonym, plus the term itself, maps to the term's MeSH identifier.
conv_dict = {}
for key, values in dict_terms_synonyms.items():
    l_synonyms = values.split(',')
    if key not in l_synonyms:
        l_synonyms.append(key)

    for synonym in l_synonyms:
        # Keys of dict_terms were stripped when loaded, so strip here too.
        conv_dict[synonym.strip()] = dict_terms.get(key.strip())

# Build the lexicon and its term-to-identifier mappings, then preprocess it.
merpy.create_lexicon(conv_dict.keys(), "mesh_lex")
merpy.create_mappings(conv_dict, "mesh_lex")
merpy.show_lexicons()
merpy.process_lexicon("mesh_lex")

# Examples
print(merpy.get_entities("I like abdominal injuries", "mesh_lex"))
print(merpy.get_entities("I like Calcimycin", "mesh_lex"))
# Calcimycin is not identified here (comma right after the entity):
print(merpy.get_entities("I like Calcimycin, it is a good aurelia aurita and Temefos is awesome! abate lowercase", "mesh_lex"))
# Calcimycin is identified here (same sentence without the comma):
print(merpy.get_entities("I like Calcimycin it is a good aurelia aurita and Temefos is awesome! abate lowercase", "mesh_lex"))

Here are the files with the entities:
MeSH_name_id_mapping.txt
mesh_terms_synonyms.txt
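
In case it helps while this is open, here is a possible stop-gap, which I have not verified against MER itself. It swaps each punctuation mark that is followed by whitespace (or ends the text) for a single space before the text is handed to MER. Since it is a one-for-one character replacement, character offsets in the masked text should still line up with the original text. The helper name `mask_trailing_punctuation` and the choice of punctuation characters are my own assumptions, and it would not suit lexicon terms that themselves end in one of those characters.

```python
import re

# Punctuation that directly precedes whitespace or the end of the text.
# Assumption: lexicon terms do not themselves end in one of these characters.
_TRAILING_PUNCT = re.compile(r"[,.;:!?](?=\s|$)")


def mask_trailing_punctuation(text: str) -> str:
    """Replace each matched punctuation mark with one space, keeping the length."""
    return _TRAILING_PUNCT.sub(" ", text)


sentence = ("I like Calcimycin, it is a good aurelia aurita "
            "and Temefos is awesome! abate lowercase")
masked = mask_trailing_punctuation(sentence)
print(masked)

# Hypothetical usage with the lexicon built above (needs merpy installed):
# print(merpy.get_entities(masked, "mesh_lex"))
```

Punctuation inside a token, such as the comma in a chemical name like "1,2-diol", is left untouched because it is not followed by whitespace.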
