Repository for Assignment 2 of the course CS613, Natural Language Processing, on Language Modeling and Smoothing.
In this assignment, we train n-gram models from unigram to quadgram on the provided dataset, implement different smoothing techniques, and compare the perplexities of the resulting models. Details of each task can be found here - documentation
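To illustrate the core idea (this is a minimal sketch, not the repository's `NGramProcessor.py` implementation), the snippet below trains a bigram model with Laplace (add-one) smoothing and computes perplexity on a sentence; the corpus and token names are made up for the example:

```python
# Minimal sketch: bigram model + Laplace (add-one) smoothing + perplexity.
# Not the repository's implementation; for illustration only.
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over tokenized sentences padded with <s>/</s>."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def laplace_prob(w_prev, w, unigrams, bigrams, vocab_size):
    # Add-one smoothing: (count(w_prev, w) + 1) / (count(w_prev) + V),
    # so unseen bigrams still get nonzero probability.
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)

def perplexity(sent, unigrams, bigrams, vocab_size):
    tokens = ["<s>"] + sent + ["</s>"]
    log_prob = sum(
        math.log(laplace_prob(p, w, unigrams, bigrams, vocab_size))
        for p, w in zip(tokens, tokens[1:])
    )
    # Perplexity = exp(-average log-probability per predicted token).
    return math.exp(-log_prob / (len(tokens) - 1))

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, bigrams = train_bigram(corpus)
V = len(unigrams)  # vocabulary size including <s> and </s>
pp = perplexity(["the", "cat", "sat"], unigrams, bigrams, V)
```

Lower perplexity indicates the model assigns higher probability to the held-out text; the repository repeats this comparison for each n-gram order and smoothing method.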
The repository contains the following folders:
- average_preplexity: This folder contains CSV files with the average perplexities over all models, for both the train and test datasets.
- dataset: This folder contains CSV files with all the data, including the raw data, the processed data, and the train and test subsets.
- perplexities: This folder contains one subfolder per smoothing technique. Each subfolder contains CSV files with the perplexities of each model after smoothing.
- Plots: This folder contains image files of plots showing how perplexity trends under the different smoothing techniques vary across the models.
The repository contains the following Python files:
- NGramProcessor.py: This file contains the class for the n-gram model, including the methods to calculate the model's perplexities.
- ngram_train.py: This file contains the code that creates and trains the n-gram models and computes their perplexities.
- plot_saver.py: This file contains functions to plot and save the average perplexities.
- preprocessing.py: This file contains a function to preprocess the data and save it.
Usage can be found in point 5 of the documentation.