Final Project: A Deeper Look into spaCy using the US Declaration of Independence

Student Name: Alexandra Coffin

GitHub Repository: https://github.com/accoffin12/mining_Declaration_Independence

GitHub Profile: https://github.com/accoffin12

CSIS 44610: Web Mining & Applied Natural Language Processing

Objective:

Natural Language Processing (NLP) has evolved over the past decade as algorithms have become more powerful. As NLP tools become common in daily life, the question becomes how capable they really are. They have proven competent at analyzing basic texts, tweets, and emails. The open question is whether they can analyze an older and more complex document. English has changed since the colonization of America, and one of the most accessible documents still written in Colonial English is the Declaration of Independence.

In this project, I examine the ability of spaCy and NLTK to analyze the 248-year-old document through multiple processes. Each process serves as a backbone, and the results are compared through Sentiment Analysis, Polarity, and a Dependency Parse.

Installations & Environment Requirements:

This notebook requires a 3.1x version of Python as well as the library installations listed below. To run this notebook, create a separate virtual environment using the following command. If you are running in VS Code and it asks whether to select the new environment for the workspace folder, select Yes.

python -m venv .venv

Once it is created, activate the environment using the following command (Windows):

.venv\Scripts\activate

After activation, install the following libraries to use the notebook.

pip install spacy
python -m spacy download en_core_web_sm
pip install beautifulsoup4
pip install nltk
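
One way to confirm the installs succeeded before opening the notebook is to check that the interpreter can find each package. The sketch below is not taken from the notebook: the helper name `missing_packages` is hypothetical, and it relies only on the standard library's `importlib`, so it runs even when nothing has been installed yet. Note that beautifulsoup4 is imported under the name `bs4`.

```python
import importlib.util


def missing_packages(names):
    """Return the import names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the libraries listed above; en_core_web_sm is the
# spaCy model package installed by the download command.
required = ["spacy", "en_core_web_sm", "bs4", "nltk"]

missing = missing_packages(required)
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All required packages were found.")
```

If any name is reported as missing, make sure the virtual environment is active and rerun the matching install command.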

Contents:

1. Load & Test Needed Modules
2. Importing the Document & Creating a Pickle
3. Read the Pickle & Print Test
4. Creating the Doc Object & Tokenizing
5. Stop Words part 1
6. Sentence Detection
   6.1: Number of Sentences in the Document
   6.2: Locating the first word in Sentences
   6.3: Index Position of Tokens
7. Attributes for Token Class
8. Sentiment Analysis & Word Frequency using Tokenization
   8.1: Stop Words Part 2: Updates
   8.2: Polarity
   8.3: Visualizations
      8.3.1: The Dependency Parse
      8.3.2: Graph for Distribution of Sentence Scores by Tokens
9. Lemmatization
   9.1: Lemmatization Process
   9.2: Word Frequency Using Lemmas
   9.3: Speech Tagging Nouns, Verbs, Adjectives, and Interjection
   9.4: Name Entity Analysis
   9.5: Visualizations using Lemmas
      9.5.1: Dependency Parsing
      9.5.2: Unique words
      9.5.3: Polarity
      9.5.4: Graph for Distribution of Sentence Scores by Lemmas
10. Conclusion

Methods:

To execute this project, a transcript of the Declaration of Independence is pulled from the National Archives page and then pickled. Once pickled, the data is loaded into the environment and run through two separate types of analysis:

Tokenization
Lemmatization
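
The pickling step described above can be sketched as a simple save-and-load round trip. This is a minimal illustration under stated assumptions, not the notebook's own code: it assumes the transcript is stored as a plain Python string, the helper names `save_text` and `load_text` and the file name `declaration_demo.pkl` are hypothetical, and a short excerpt stands in for the full transcript so the example runs without network access.

```python
import pickle
import tempfile
from pathlib import Path


def save_text(text: str, path: Path) -> None:
    """Serialize a text string to disk with pickle."""
    with path.open("wb") as handle:
        pickle.dump(text, handle)


def load_text(path: Path) -> str:
    """Read a pickled text string back from disk."""
    with path.open("rb") as handle:
        return pickle.load(handle)


# A short excerpt stands in for the full transcript (assumption for the demo).
excerpt = "When in the Course of human events"

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = Path(tmp_dir) / "declaration_demo.pkl"  # hypothetical file name
    save_text(excerpt, pkl_path)
    restored = load_text(pkl_path)

print(restored == excerpt)
```

Pickling keeps the downloaded text available offline, so later analysis steps do not need to request the page again. Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.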

Website: https://www.archives.gov/founding-docs/declaration-transcript

Resources:

This notebook was created using portions of the following tutorial: https://realpython.com/natural-language-processing-spacy-python/

The tutorial served as a base for developing the analysis process, since an older document required a more structured approach.

