Skip to content

reproducibility for slot filling task #8

@libing125

Description

@libing125

in

slots = set([slot['slot'] for row in train_data for slot in row.get('labels', [])])
vocab = ["O"] + [prefix + slot for slot in slots for prefix in ["B-", "I-"]]
json.dump(vocab, open(dataset + "vocab.txt", "w+"))

Slot BIO labels are stored in a python set, then saved into a python list using a for loop.
But set is unordered. In my experiment, vocab.txt is different in two runs. So I changed the code to

slots = set([slot['slot'] for row in train_data for slot in row.get('labels', [])])
slots = sorted(list(slots))
vocab = ["O"] + [prefix + slot for slot in slots for prefix in ["B-", "I-"]]
json.dump(vocab, open(dataset + "vocab.txt", "w+"))

and get the same vocab.txt for every run.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions