
TAG/TWG Supertagger

This repository contains a ParTAGe-compliant, PyTorch-based implementation of a TAG/TWG supertagger.

Table of Contents

  • Installation
  • Usage
      • Data format
      • Configuration
      • Training
      • Tagging
  • TODO: Blind

Installation

The tool requires Python 3.8+. If you use conda, you can set up an appropriate environment with the following commands (replacing <env-name> with the name of your environment):

conda create --name <env-name> python=3.8
conda activate <env-name>

Then, to install the tool (together with its dependencies), run:

pip install .

Finally, install disco-dop from its GitHub repository.

Usage

Data format

The tool supports the same format as partage.
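For a quick sanity check of a dataset, one convention worth knowing is that CoNLL-style token files typically separate sentences with blank lines; assuming the partage format follows this convention (the partage repository is the authoritative reference), a sentence counter can be sketched as:

```python
def count_sentences(path):
    """Count sentences in a token-per-line file, assuming the
    CoNLL-like convention that blank lines separate sentences.
    This is a sketch; check the partage documentation for the
    authoritative format specification."""
    count = 0
    in_sentence = False
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                in_sentence = True
            elif in_sentence:
                # A blank line after token lines closes a sentence.
                count += 1
                in_sentence = False
    # Account for a final sentence with no trailing blank line.
    return count + (1 if in_sentence else 0)
```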

Configuration

The model configuration (embedding size, BiLSTM depth, etc.) and the training configuration (number of epochs, learning rates, etc.) are currently hard-coded in supertagger/config.py. They can be overridden during training by providing appropriate .json configuration files.
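For illustration, an override file might look like the following. The key names shown here are hypothetical; the actual names are those defined in supertagger/config.py:

```json
{
  "embedding_size": 300,
  "lstm_depth": 2,
  "epochs": 60,
  "learning_rate": 0.001
}
```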

Training

To train a supertagging model, you will need:

  • fastText.bin: binary fastText model (important: the embedding size of the model must be specified in the configuration)
  • train.supertags: training dataset (see data format)
  • dev.supertags (optional): development dataset

Then, to train a model and save it in model.pth:

python -m supertagger train -f fastText.bin -t train.supertags -d dev.supertags --save model.pth

See python -m supertagger train --help for additional training options.

Tagging

To use an existing model to supertag a given input.supertags file:

python -m supertagger tag -f fastText.bin -i input.supertags

TODO: Blind

Add a command to remove supertagging information from a given supertagging file, e.g.:

python -m supertagger blind -i input.supertags > input.blind.supertags
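Until the blind command exists, the intended behavior could be approximated with a short standalone script. This is a sketch only: the assumption that each token line is tab-separated and that supertag information occupies every column after the first two is hypothetical and should be adjusted to the actual partage format.

```python
import sys

def blind(lines, keep_cols=2):
    """Drop everything after the first `keep_cols` tab-separated
    columns of each token line; blank lines (assumed sentence
    boundaries) pass through unchanged.  The column layout assumed
    here is hypothetical -- adjust to the actual partage format."""
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped:
            yield ""
        else:
            yield "\t".join(stripped.split("\t")[:keep_cols])

if __name__ == "__main__":
    for out in blind(sys.stdin):
        print(out)
```

Saved as, say, blind.py (a hypothetical name), it would be used analogously to the planned command: python blind.py < input.supertags > input.blind.supertags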
