This repository currently contains a script for aligning the Riverside and Oxford editions of Chaucer's Canterbury Tales, along with grammatical information from the former and additional metrical annotations collected by Chris Cannon and Tom Lippincott in 2021.
The only dependencies, other than the data itself, are a recent version of Python and the pandoc command-line tool.
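If you want to confirm both dependencies before going further, a small check along the following lines may help. It is only a sketch: the function name check_dependencies is hypothetical, and since "recent" is not pinned down here, the minimum version of 3.8 is an assumed placeholder rather than a documented requirement.

```python
import shutil
import sys

# Assumption: no minimum version is stated for this repository; 3.8 is a placeholder.
MIN_PYTHON = (3, 8)


def check_dependencies(min_python=MIN_PYTHON):
    """Return a list of human-readable problems; the list is empty if none were found."""
    problems = []
    # Compare only (major, minor) of the running interpreter against the threshold.
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {found} is older than the assumed minimum {wanted}")
    # shutil.which returns None when no executable of that name is on PATH.
    if shutil.which("pandoc") is None:
        problems.append("pandoc was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_dependencies():
        print("warning:", problem)
```

An empty list means the interpreter meets the assumed threshold and a pandoc executable is on your PATH; it does not verify that the pandoc version is one the script works with.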
The data can be downloaded here.
If you have cloned or otherwise downloaded this repository into the directory chaucer, and have downloaded the data as /some/file.zip, you can run:
cd chaucer/
unzip /some/file.zip
to unpack the data. You can run the following to set up and enter a suitable Python virtual environment:
python3 -m venv local
source local/bin/activate
Make sure you have pandoc installed by running:
pandoc -h
and, finally, you can run the script, writing the result to final_data.json:
python scripts/prepare_chaucer.py --output final_data.json
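Once the script has finished, you may want a quick look at what it produced. The sketch below assumes that final_data.json is a single JSON document; its schema is not described here, so the code only reports the top-level shape. The helper name summarize_json is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and describe its top-level shape without assuming a schema."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    summary = {"type": type(data).__name__}
    # Lists and objects have a meaningful length; scalars do not.
    if isinstance(data, (list, dict)):
        summary["length"] = len(data)
    # For an object, show up to ten keys (sorted) as a hint at its structure.
    if isinstance(data, dict):
        summary["keys"] = sorted(data)[:10]
    return summary


if __name__ == "__main__":
    # Assumption: the script was run with --output final_data.json as above.
    output = Path("final_data.json")
    if output.exists():
        print(summarize_json(output))
    else:
        print("final_data.json not found; run the script first")
```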