In the following code:
slots = set([slot['slot'] for row in train_data for slot in row.get('labels', [])])
vocab = ["O"] + [prefix + slot for slot in slots for prefix in ["B-", "I-"]]
json.dump(vocab, open(dataset + "vocab.txt", "w+"))
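The slot names are collected in a Python set, and the BIO labels are then built into a Python list by iterating over that set.
But a set is unordered, so the label order is not guaranteed to be the same from one run to the next. In my experiment, vocab.txt differed between two runs. So I changed the code to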
slots = set([slot['slot'] for row in train_data for slot in row.get('labels', [])])
slots = sorted(list(slots))
vocab = ["O"] + [prefix + slot for slot in slots for prefix in ["B-", "I-"]]
json.dump(vocab, open(dataset + "vocab.txt", "w+"))
With this change, I get the same vocab.txt on every run.
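For reference, here is a minimal self-contained sketch of the same idea. The function name build_slot_vocab and the sample rows are hypothetical; only the sorting step mirrors the change above. It checks that the label order does not depend on the order in which rows are seen.

```python
import json


def build_slot_vocab(train_data):
    """Build a BIO label list whose order does not depend on set iteration order."""
    # Collect the distinct slot names; rows without "labels" contribute nothing.
    slots = {slot["slot"] for row in train_data for slot in row.get("labels", [])}
    # Sorting fixes the order, so the output is identical across runs.
    return ["O"] + [prefix + slot for slot in sorted(slots) for prefix in ("B-", "I-")]


# Hypothetical rows, shaped like the train_data used above.
rows = [
    {"labels": [{"slot": "time"}, {"slot": "people"}]},
    {"labels": [{"slot": "date"}]},
    {},  # a row without labels
]

vocab = build_slot_vocab(rows)
vocab_reversed = build_slot_vocab(rows[::-1])  # same rows, opposite order
print(json.dumps(vocab))
```

Passing the set straight to sorted() is enough, since sorted() accepts any iterable and returns a list.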
Source: dialoglue/data_utils/process_slot.py, lines 47 to 49 at 42737da