shakespeare-contabulate

Search Shakespeare's works by token, phrase, and regex across plays, acts/scenes, characters, and lines.

This repo contains:

A Python build pipeline (build.py) that reads Folger TEI XML in tei/ and emits JSON indexes.
A static web UI in docs/ that loads those generated JSON files.
Python and Playwright tests for data and UI behavior.

Requirements

Python 3.8+
Node.js + npm (for Playwright tests)
Optional: lxml (listed in requirements.txt, but build.py currently uses xml.etree.ElementTree)

Build Data

Generate/re-generate data files:

python3 build.py

Current build.py behavior is fixed to:

Input: tei/*.xml
Output root: docs/
Data output: docs/data/*.json
Line output: docs/lines/*.json

Generated files include:

docs/data/plays.json
docs/data/chunks.json
docs/data/characters.json
docs/data/tokens.json
docs/data/tokens2.json
docs/data/tokens3.json
docs/data/tokens_char.json
docs/data/tokens_char2.json
docs/data/tokens_char3.json
docs/lines/all_lines.json
docs/lines/<scene_id>.json (one file per scene)

The build also reads optional metadata from:

play_metadata.json
character_metadata.json

Run Locally

Serve the static site from docs/:

python3 -m http.server 8766 -d docs

Then open http://localhost:8766.

Tests

Run Python tests:

python3 -m unittest test_parse_play.py tests.test_build_output -v

or

python3 -m pytest tests test_parse_play.py -v

Run Playwright UI tests:

npm install
npx playwright install
npx playwright test

The Playwright config auto-starts a local server on port 8766 from docs/.

Notes

Generated JSON files under docs/data/ and docs/lines/ are committed in this repository.
If you want to build a subset corpus, keep only the desired TEI files in tei/ and run python3 build.py.

Name		Name	Last commit message	Last commit date
Latest commit History 178 Commits
.vscode		.vscode
docs		docs
tei		tei
tests		tests
.gitignore		.gitignore
README.md		README.md
README.txt		README.txt
add_abbreviations.py		add_abbreviations.py
build.py		build.py
character_metadata.generated.json		character_metadata.generated.json
character_metadata.json		character_metadata.json
package-lock.json		package-lock.json
package.json		package.json
play_metadata.json		play_metadata.json
playwright.config.js		playwright.config.js
requirements.txt		requirements.txt
test_parse_play.py		test_parse_play.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

shakespeare-contabulate

Requirements

Build Data

Run Locally

Tests

Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

shakespeare-contabulate

Requirements

Build Data

Run Locally

Tests

Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages