Open
Description
Hi @YoumiMa, I am using the following code to verify that every entity mention is present in the de-tokenized text:
nsplits = ['train', 'dev', 'test']
for nsplit in nsplits:
    root_path = "JacRED/{}.json".format(nsplit)
    data = get_json(root_path)  # read json file
    list_samples = []
    for raw_sample in data:
        entities = raw_sample['vertexSet']
        rel_name_dict = get_json("JacRED/meta/rel_info.json")
        sample_text = "".join(["".join(x) for x in raw_sample['sents']])
        ent2str = dict()
        for ient, entity in enumerate(entities):
            list_name = [x['name'] for x in entity]
            set_name = list(set(list_name))
            ent2str[ient] = set_name
        rel_list = []
        for rel_ in raw_sample['labels']:
            h_list, t_list, r, _ = rel_['h'], rel_['t'], rel_['r'], rel_['evidence']
            h_list = ent2str[h_list]
            t_list = ent2str[t_list]
            r_str = rel_name_dict[r]
            for h in h_list:
                for t in t_list:
                    assert h in sample_text and t in sample_text  # verify de-tokenization
                    rel_json = {"head": h, "tail": t, "relation": r_str}
                    rel_list.append(rel_json)
        sample = {"text": sample_text, "relation": rel_list}
It turns out that the assertion fails for 478/104/88 samples on the train/dev/test splits, respectively. Could you help me check?
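To narrow this down, a small diagnostic that records the offending mentions instead of stopping at the first failed assertion might help. This is only a sketch: `find_missing_mentions` and `toy_sample` are hypothetical names, the toy sample is made up rather than taken from JacRED, and the `sent_id` / `pos` fields are an assumption that the mentions follow the DocRED-style layout (sentence index plus `[start, end)` token offsets).

```python
def find_missing_mentions(raw_sample):
    """Return mentions whose `name` is not a substring of the de-tokenized text."""
    sents = raw_sample["sents"]
    # Same de-tokenization as in the snippet above: concatenate all tokens.
    text = "".join("".join(tokens) for tokens in sents)
    missing = []
    for ent_id, entity in enumerate(raw_sample["vertexSet"]):
        for mention in entity:
            name = mention["name"]
            if name in text:
                continue
            record = {"entity": ent_id, "name": name, "span_text": None}
            # Assumption: DocRED-style `sent_id` and `pos` = [start, end) token offsets.
            if "sent_id" in mention and "pos" in mention:
                start, end = mention["pos"]
                record["span_text"] = "".join(sents[mention["sent_id"]][start:end])
            missing.append(record)
    return missing


# Hypothetical toy sample (not real JacRED data) with one deliberately broken name.
toy_sample = {
    "sents": [["東京", "都", "は", "日本", "の", "首都", "。"]],
    "vertexSet": [
        [{"name": "東京都", "sent_id": 0, "pos": [0, 2]}],
        [{"name": "日本 ", "sent_id": 0, "pos": [3, 4]}],  # trailing space on purpose
    ],
}
print(find_missing_mentions(toy_sample))
```

Running this over each split and comparing `name` with `span_text` for the reported mentions should show whether the mismatches come from whitespace, character normalization, or something else in the annotation.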