Bible N-Gram Analysis

This project is intended as a collection of methods for building an optimized seed corpus of Bible verses.

ScriptureReference.py

This class lets other scripts retrieve Bible verses within a given range as a list. The verses are returned as a 2D list of [verse_reference, verse_text] pairs.
The class constructor also takes a translation argument (a translation from the eBible corpus). Support for different versifications is upcoming (currently not standard).
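To illustrate the output format, here is a minimal sketch of range retrieval. It is not the ScriptureReference implementation: `verses_in_range` is a hypothetical helper, and the sketch assumes a list of verse references in canonical order that is aligned line by line with a translation's verse texts, using `GEN 1:1`-style reference strings.

```python
from typing import List


def verses_in_range(refs: List[str], texts: List[str],
                    start: str, end: str) -> List[List[str]]:
    """Return [verse_reference, verse_text] pairs from start to end, inclusive.

    Hypothetical helper: assumes refs are in canonical order and aligned
    line by line with texts.
    """
    if len(refs) != len(texts):
        raise ValueError("references and texts must be aligned line by line")
    first, last = refs.index(start), refs.index(end)  # ValueError if missing
    if last < first:
        raise ValueError(f"{end} comes before {start}")
    pairs = []
    for ref, text in zip(refs[first:last + 1], texts[first:last + 1]):
        if text.strip():  # skip verses this translation leaves empty
            pairs.append([ref, text.strip()])
    return pairs


refs = ["GEN 1:1", "GEN 1:2", "GEN 1:3"]
texts = ["In the beginning God created the heaven and the earth.", "",
         "And God said, Let there be light: and there was light."]
print(verses_in_range(refs, texts, "GEN 1:1", "GEN 1:3"))
```

Skipping empty lines is a design choice of this sketch; keeping them instead would preserve a one-to-one mapping with the reference list.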

n_gram_approach.py

This script performs the n-gram analysis. In general, it finds the verses containing the n-grams that occur most frequently throughout the corpus. Verses are ranked by how frequent their n-grams are and by how many of those n-grams the seed corpus has not yet covered. All n-gram frequency scores added to and removed from verses are normalized by verse length.

The resulting seed corpus should contain verses whose n-grams are very common in the Bible, while the seed verses themselves should be very different from one another.
The script also provides a visualization of the score and the time taken for each seed verse.
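The selection described above can be sketched as a greedy loop. This is a minimal sketch under stated assumptions, not the script's actual code: verses are assumed to be pre-tokenized lists of words, `verse_ngrams` and `select_seed` are hypothetical names, "frequency" is taken as the raw occurrence count across the analyzed range, and scores are recomputed each round rather than updated incrementally. It also records each pick's score and elapsed time, the kind of data a per-seed visualization would plot.

```python
import time
from collections import Counter
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def verse_ngrams(tokens: List[str], j: int) -> Set[NGram]:
    """Distinct n-grams of order 1..j in one tokenized verse (hypothetical helper)."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, j + 1)
        for i in range(len(tokens) - n + 1)
    }


def select_seed(verses: List[List[str]], j: int, seed_size: int):
    """Greedily choose up to seed_size verse indices (hypothetical sketch).

    Returns (seed, history), where history holds (score, seconds) per pick.
    """
    # Corpus-wide occurrence count of every n-gram of order 1..j.
    freq: Counter = Counter()
    for tokens in verses:
        for n in range(1, j + 1):
            for i in range(len(tokens) - n + 1):
                freq[tuple(tokens[i:i + n])] += 1

    per_verse = [verse_ngrams(tokens, j) for tokens in verses]
    covered: Set[NGram] = set()  # n-grams already present in the seed corpus
    remaining = set(range(len(verses)))
    seed: List[int] = []
    history: List[Tuple[float, float]] = []

    for _ in range(min(seed_size, len(verses))):
        start = time.perf_counter()
        # Score = frequency mass of not-yet-covered n-grams, normalized by length.
        scores = {
            idx: sum(freq[g] for g in per_verse[idx] - covered)
            / max(len(verses[idx]), 1)
            for idx in remaining
        }
        # Highest score wins; ties go to the earliest verse.
        best = min(scores, key=lambda idx: (-scores[idx], idx))
        seed.append(best)
        remaining.discard(best)
        # Mark the chosen verse's n-grams as covered, which removes their
        # frequency from every other verse's score in later rounds.
        covered |= per_verse[best]
        history.append((scores[best], time.perf_counter() - start))

    return seed, history


verses = [s.split() for s in ["in the beginning", "the the", "and god said"]]
seed, history = select_seed(verses, j=2, seed_size=2)
print(seed)  # [0, 2]
```

In this toy run the second verse ("the the") loses to the third: its most frequent n-gram, "the", is already covered by the first pick, which is how the length-normalized coverage score pushes the seed corpus toward verses that differ from one another.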

There are six variables that you can adjust in this script:

j: The n-gram order. N-grams up to order n = j are scored by frequency count.
seed_size: The size of the seed corpus, i.e. the number of verses it will include.
start_verse: The first verse of the range to be analyzed.
end_verse: The last verse of the range to be analyzed.
ebible_filename: Filename of a translation in the eBible corpus (currently only romanized text is handled).
versification: Defaults to 'eng'. Possible values are based on the files found here: https://github.com/BibleNLP/ebible/tree/main/metadata
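To make the role of `j` concrete, the sketch below lists every n-gram up to order `j` for one verse. It assumes whitespace tokenization of romanized text, and `ngrams_up_to` is a hypothetical helper rather than a function from n_gram_approach.py.

```python
from typing import List, Tuple


def ngrams_up_to(tokens: List[str], j: int) -> List[Tuple[str, ...]]:
    """Return every n-gram of order 1..j, ordered by n, then by position."""
    grams = []
    for n in range(1, j + 1):
        # A verse of L tokens has L - n + 1 n-grams of order n.
        for i in range(len(tokens) - n + 1):
            grams.append(tuple(tokens[i:i + n]))
    return grams


tokens = "in the beginning god created".split()
grams = ngrams_up_to(tokens, 2)
print(len(grams))  # 9 (5 unigrams + 4 bigrams)
print(grams[5])    # ('in', 'the')
```

For verses much longer than `j`, raising `j` yields roughly `j` times as many n-grams as tokens, so larger values make frequency counting correspondingly more expensive.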

