This repository contains a ParTAGe-compliant, PyTorch-based implementation of a TAG/TWG supertagger.
The tool requires Python 3.8+. If you use conda, you can set up an appropriate
environment using the following commands (substituting `<env-name>` for the
name of the environment):

```
conda create --name <env-name> python=3.8
conda activate <env-name>
```

Then, to install the tool (together with its dependencies), run:

```
pip install .
```

Finally, install disco-dop from its GitHub repository.
The tool supports the same data format as partage.
The model configuration (embedding size, BiLSTM depth, etc.) and the training
configuration (number of epochs, learning rates, etc.) are currently hard-coded
in `supertagger/config.py`. They can be overridden during training by providing
appropriate `.json` configuration files.
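As a rough sketch of how such a JSON override could work, the snippet below merges a user-supplied `.json` file into a hard-coded default dictionary. The key names used here (`embedding_size`, `lstm_depth`, `epochs`, `learning_rate`) are hypothetical illustrations, not the actual names defined in `supertagger/config.py`:

```python
import json

# Hypothetical defaults mirroring the kinds of settings mentioned above;
# the real keys and values live in supertagger/config.py and may differ.
DEFAULT_CONFIG = {
    "embedding_size": 300,   # must match the fastText model's dimension
    "lstm_depth": 2,
    "epochs": 60,
    "learning_rate": 0.001,
}

def load_config(path=None):
    """Return the default config, with values overridden by a JSON file if given."""
    config = dict(DEFAULT_CONFIG)
    if path is not None:
        with open(path) as f:
            config.update(json.load(f))
    return config
```

With a file containing `{"epochs": 10}`, `load_config` would keep all defaults except `epochs`.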
To train a supertagging model, you will need:

* `fastText.bin`: a binary fastText model (important: the embedding size of the model must be specified in the configuration)
* `train.supertags`: a training dataset (see data format)
* `dev.supertags` (optional): a development dataset
Then, to train a model and save it in `model.pth`:

```
python -m supertagger train -f fastText.bin -t train.supertags -d dev.supertags --save model.pth
```

See `python -m supertagger train --help` for additional training options.
To use an existing model to supertag a given `input.supertags` file:

```
python -m supertagger tag -f fastText.bin -i input.supertags
```

To remove supertagging information from a given supertagging file, use the `blind` command, e.g.:

```
python -m supertagger blind -i input.supertags > input.blind.supertags
```