The Typability Index

This repo contains the R project for creating and testing The Typability Index, including scripts and generalisation data. This project is not required to use Web App. The accompanying paper is currently under review as of December 2024 - view the Pre-print.

Datasets

Training and Testing Dataset: The 136M Keystrokes Dataset (Dhakal et al., 2018) must be downloaded manually (instructions below).
Generalisation Dataset: Provided within this repository.

License

This GitHub repository is licensed under CC BY-NC-SA 4.0 to comply with the stipulation of the 136M Keystrokes Dataset (Dhakal et al., 2018) of non-commercial research purposes only.

Notes on the Scripts

Format: Scripts are written in R Markdown but are not intended to be knitted. They are designed for execution chunk-by-chunk.
Outputs: Pre-generated outputs are included in the repository. Running the scripts will overwrite these files.
Recommendation: Set up the data following the three steps below, then skip the first script (scripts/0-handle-timestamp-errors.Rmd) . Its resultant amended datafiles are already provided, which otherwise take a long time to recreate.

Ignored Files / Setting Up Correctly

Certain files are intentionally excluded from GitHub due to size or licensing constraints. To fully use this repository, you need to manually download a separate dataset.

1. 136M Keystrokes Dataset

Ignored Files:
data/dhakal_files_with_replacements_for_both_issues/*_keystrokes.txt

The Typability Index relies on the 136M Keystrokes Dataset (Dhakal et al., 2018), available at https://userinterfaces.aalto.fi/136Mkeystrokes/.
We do not redistribute this original dataset here. You must manually download and amend it as described below.

Minor corrections (<6% of files) were applied to the dataset for parsing and timestamp errors (details in Appendix 2 of our associated paper).
This repository includes only the amended files (67 manually fixed + 10,098 programmatically fixed). Rights remain with Dhakal et al. (2018); usage must comply with their original licence.

Steps to Configure the Dataset

Download: Obtain the original dataset from the website above.
Organise: Place the participant keystroke .txt files directly in data/dhakal_files_with_replacements_for_both_issues/ (no intermediate subfolders).
Apply Amendments:
- Copy files from data/fixed_parsing_issues/ and paste into data/dhakal_files_with_replacements_for_both_issues/ to overwrite 67 files with parsing fixes.
- Copy files from data/fixed_timestamp_error_codes/ into the same directory to overwrite 10,098 files with timestamp fixes. (Alternatively, you can regenerate these amended files by running scripts/0-handle-timestamp-errors.Rmd.)

2. Stitched Participant Speed File

Ignored File:
output/processed_data/ptps_sents_inputs_wpms/full_ptps_sents_inputs_wpms.txt

This file contains every participant's speed for each of their 15 sentences, and is created by concatenating 34 other txt files (each containing 5000 participants, except the final file). Reason for Ignoring: File size exceeds GitHub's 100MB limit.

Recreating This File

You don’t need to take any additional steps to recreate this file. It will be generated automatically when running the readBatched chunk in scripts/1-calculate-typability-scores.Rmd.

Additional Notes

All rights to the amended data remain with the original authors (Dhakal et al., 2018). See their website for licence details.
If you encounter issues or have questions, please raise them via GitHub Issues.

Thank you for your interest in The Typability Index!

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
SUPPLEMENTARY-MATERIALS		SUPPLEMENTARY-MATERIALS
data		data
output		output
scripts		scripts
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Typability-Index.Rproj		Typability-Index.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

The Typability Index

Datasets

License

Notes on the Scripts

Ignored Files / Setting Up Correctly

1. 136M Keystrokes Dataset

Steps to Configure the Dataset

2. Stitched Participant Speed File

Recreating This File

Additional Notes

About

Uh oh!

License

EA-Williams/The-Typability-Index

Folders and files

Latest commit

History

Repository files navigation

The Typability Index

Datasets

License

Notes on the Scripts

Ignored Files / Setting Up Correctly

1. 136M Keystrokes Dataset

Steps to Configure the Dataset

2. Stitched Participant Speed File

Recreating This File

Additional Notes

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks