
Okja88/NLP-Deep-Learning-Projects


Deep Learning in Natural Language Processing Portfolio

This repository contains two major NLP projects developed for the Specialist Diploma in Applied Generative AI (SDGAI). It explores sentiment classification using scraped mobile app data and creative text generation using Recurrent Neural Networks (RNNs).

📝 Project Overview

The two core projects explore the capabilities of Recurrent Neural Networks (RNNs) and advanced text preprocessing:

Problem 1: Sentiment Analysis of App Reviews

An end-to-end pipeline that scrapes real-world data from the Google Play Store using google-play-scraper. It uses a TensorFlow-based deep learning model to classify player sentiment into 5 categories, and features a custom preprocessing engine that handles emojis, contractions, and game-specific "entity locking" (e.g., preserving terms like "remote_raid").

Problem 2: Character-Level LSTM Text Generator

A generative model built using LSTM layers and Batch Normalization. The model was trained on the approximately 1.18 million characters of Harry Potter and the Goblet of Fire to learn stylistic nuances, vocabulary, and dialogue structures, and it generates original "fan-fiction" style text from a provided seed.

📁 Project Structure

Part 1: Sentiment Analysis Model

Objective: Scrape, preprocess, and classify user reviews to identify player sentiment.

Dataset Source: Live user reviews scraped from the Google Play Store (Pokémon GO).

Part 2: Character-Level Text Generator

Objective: Build an LSTM model to generate original "fan-fiction" text based on a literary corpus.

Dataset Source: The full text of Harry Potter and the Goblet of Fire (Harry_Potter_Book.txt).

.
├── Assignment_p1_NathanOngKeeWee.ipynb   # Sentiment Analysis (Pokémon GO Reviews)
├── Assignment_p2_NathanOngKeeWee.ipynb   # Character-Level LSTM Text Generator
├── requirements.txt
├── README.md
└── data/                                 # (Note: Place datasets here locally)
    ├── Harry_Potter_Book.txt
    └── *.pkl                             # Processed data files

🛠️ Setup & Requirements

  1. Environment Setup

It is recommended to use Python 3.11+. Install the necessary libraries using the following command:

    pip install tensorflow pandas numpy scikit-learn matplotlib seaborn google-play-scraper emoji contractions langdetect

  2. Dataset Preparation

For Part 1: The script automatically scrapes reviews from the Google Play Store using the google-play-scraper library. No manual download is required, though the processed data is saved as .pkl files for consistency.

For Part 2: Ensure the file Harry_Potter_Book.txt is placed in the root directory. This project processes approximately 1.18 million characters from the text.

🚀 Module Highlights

Part 1: Pokémon GO Sentiment Analysis

Balanced Scraping: Implemented a custom scraper to collect a balanced dataset of ~40,050 reviews (8,010 reviews per star rating) to prevent model bias.
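A minimal sketch of how per-rating balancing could work is shown here. It is not the notebook's actual scraper: `play_store_fetcher`, `collect_balanced`, the `Fetcher` type, and the trimming rule are hypothetical names and choices made for illustration. The only library call used is `reviews()` from google-play-scraper with its `filter_score_with` argument, and it is imported lazily so the balancing logic can be read and run without network access.

```python
from typing import Callable, Dict, List

# Hypothetical type: a fetcher takes (star_rating, count) and returns review dicts.
Fetcher = Callable[[int, int], List[dict]]


def play_store_fetcher(app_id: str) -> Fetcher:
    """Wrap google-play-scraper so it fetches one star rating at a time."""
    # Imported lazily: only needed when actually hitting the Play Store.
    from google_play_scraper import Sort, reviews

    def fetch(score: int, count: int) -> List[dict]:
        result, _continuation_token = reviews(
            app_id,
            lang="en",
            country="us",
            sort=Sort.NEWEST,
            count=count,
            filter_score_with=score,  # restrict results to this star rating
        )
        return result

    return fetch


def collect_balanced(fetch: Fetcher, per_rating: int) -> List[dict]:
    """Collect up to `per_rating` unique reviews for each rating 1-5, then
    trim every class to the size of the smallest so the classes stay equal."""
    by_score: Dict[int, List[dict]] = {}
    for score in range(1, 6):
        unique: Dict[str, dict] = {}
        for review in fetch(score, per_rating):
            unique.setdefault(review["reviewId"], review)  # drop duplicates
        by_score[score] = list(unique.values())[:per_rating]

    smallest = min(len(items) for items in by_score.values())
    return [r for score in range(1, 6) for r in by_score[score][:smallest]]
```

Assuming the Pokémon GO package id is `com.nianticlabs.pokemongo`, a call such as `collect_balanced(play_store_fetcher("com.nianticlabs.pokemongo"), per_rating=8010)` would aim for the 8,010-per-rating split described above.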

Robust NLP Pipeline:

Entity Locking: Phrases like "Remote Raid," "Stardust," and "Niantic" are locked (e.g., remote_raid) to ensure the model treats them as single tokens.

Modern Text Handling: The pipeline converts emojis to text (e.g., 😭 to :loudly_crying_face:) and expands contractions.

Language Audit: Uses langdetect to filter out non-English reviews that slip through the initial scraper filters.
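The entity locking and text normalization steps above can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's code: the `ENTITIES` list contains only the phrases named in this README, the function names are hypothetical, and the order of the steps is a guess. `emoji.demojize` and `contractions.fix` are the functions those two packages provide for these conversions; they are imported lazily so `lock_entities` runs on its own.

```python
import re
from typing import Iterable

# Illustrative list only: the phrases named in this README.
ENTITIES = ["remote raid", "stardust", "niantic"]


def lock_entities(text: str, entities: Iterable[str] = ENTITIES) -> str:
    """Rewrite each known phrase as one lowercase, underscore-joined token."""
    # Longest phrases first, so a longer entity is not split by a shorter one.
    for phrase in sorted(entities, key=len, reverse=True):
        words = [re.escape(word) for word in phrase.split()]
        pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
        text = pattern.sub(phrase.replace(" ", "_"), text)
    return text


def normalize_review(text: str) -> str:
    """Expand contractions, convert emojis to text, then lock entities."""
    import contractions  # third-party: pip install contractions
    import emoji         # third-party: pip install emoji

    text = contractions.fix(text)   # expands contractions such as "don't"
    text = emoji.demojize(text)     # e.g. 😭 becomes :loudly_crying_face:
    return lock_entities(text)
```

Locking happens last in this sketch so that earlier steps cannot break a phrase apart before it is joined.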

Model: A Deep Learning classifier built with TensorFlow using TextVectorization and CategoricalCrossentropy.
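Keras's CategoricalCrossentropy expects one-hot targets, so a 1-5 star rating has to become a 5-element vector before training. The sketch below shows that encoding and the loss for a single example, L = -Σ yᵢ log pᵢ, in plain Python. Two things are assumed here: that the 5 categories correspond to star ratings, and the helper names, which are hypothetical.

```python
import math
from typing import List

NUM_CLASSES = 5  # assumption: one class per star rating


def one_hot_rating(stars: int) -> List[float]:
    """Encode a 1-5 star rating as a one-hot vector of length 5."""
    if not 1 <= stars <= NUM_CLASSES:
        raise ValueError(f"rating must be between 1 and {NUM_CLASSES}")
    vector = [0.0] * NUM_CLASSES
    vector[stars - 1] = 1.0
    return vector


def categorical_crossentropy(y_true: List[float], y_pred: List[float],
                             eps: float = 1e-7) -> float:
    """Cross-entropy for one example: -sum(y_i * log(p_i)).

    Predictions are clipped away from 0 and 1 to keep log() finite.
    """
    return -sum(
        t * math.log(min(max(p, eps), 1.0 - eps))
        for t, p in zip(y_true, y_pred)
    )


# A 4-star review where the model puts 60% of its probability on class 4.
loss = categorical_crossentropy(one_hot_rating(4), [0.1, 0.1, 0.1, 0.6, 0.1])
```

Because the target is one-hot, only the predicted probability of the true class contributes to the loss, which here is -ln(0.6).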

Part 2: Harry Potter Text Generator

Architecture: A Sequential model using LSTM (Long Short-Term Memory) layers, Batch Normalization, and Dropout to prevent overfitting.

Preprocessing: The text is cleaned and converted into a 3D Tensor using one-hot encoding for character-level training.

Creative Inference: The model uses a "seed" text to generate new sentences, effectively learning the writing style and vocabulary of the Harry Potter series.
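Seeded generation can be sketched as a loop that repeatedly asks the model for a next-character distribution and appends a sampled character. The version below is an assumption-laden illustration: `generate` and `predict_next` are hypothetical names, and sampling from the predicted distribution is one common choice that this README does not confirm. With a trained model, `predict_next` would one-hot encode the window and call the model's predict method.

```python
import random
from typing import Callable, List, Optional, Sequence


def generate(seed: str,
             predict_next: Callable[[str], Sequence[float]],
             vocab: List[str],
             length: int,
             seq_len: int,
             rng: Optional[random.Random] = None) -> str:
    """Extend `seed` by `length` characters, one sampled character at a time."""
    rng = rng or random.Random()
    text = seed
    for _ in range(length):
        window = text[-seq_len:]          # the model only sees the last seq_len chars
        probs = predict_next(window)      # one probability per entry in vocab
        next_char = rng.choices(vocab, weights=probs, k=1)[0]
        text += next_char
    return text


# Stand-in "model" for demonstration: always predicts the next letter in a cycle.
demo_vocab = ["a", "b", "c"]


def cycle_model(window: str) -> List[float]:
    nxt = (demo_vocab.index(window[-1]) + 1) % len(demo_vocab)
    return [1.0 if i == nxt else 0.0 for i in range(len(demo_vocab))]


sample = generate("a", cycle_model, demo_vocab, length=5, seq_len=3)
```

The sliding window matters because the network was trained on fixed-length sequences, so only the most recent `seq_len` characters are fed back in at each step.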

🚀 Key Highlights: Technical Details

Sentiment Analysis (Problem 1):

Data Scraping: Implemented a balanced scraper for Google Play Store reviews, targeting ~40,000 reviews across 5 rating buckets.

Preprocessing Pipeline: Built a robust NLP pipeline handling "squeezed" text, emoji-to-text conversion, and game-specific entity locking (e.g., remote_raid, stardust).

Language Audit: Integrated a language detection step to filter out non-English noise from the dataset.
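A language audit of this kind might look like the sketch below. `keep_english` is a hypothetical helper, and the `detect` parameter exists only so the filter can be exercised with a stand-in detector. When no detector is passed, it falls back to langdetect's `detect`, with `DetectorFactory.seed` fixed so results are repeatable.

```python
from typing import Callable, Iterable, List, Optional


def keep_english(texts: Iterable[str],
                 detect: Optional[Callable[[str], str]] = None) -> List[str]:
    """Return only the texts whose detected language code is 'en'."""
    if detect is None:
        # Imported lazily: third-party, pip install langdetect
        from langdetect import DetectorFactory, detect as langdetect_detect
        DetectorFactory.seed = 0  # langdetect is non-deterministic without a seed
        detect = langdetect_detect

    kept = []
    for text in texts:
        try:
            if detect(text) == "en":
                kept.append(text)
        except Exception:
            # langdetect raises an exception when a text has no usable
            # features (for example, empty or emoji-only reviews); skip those.
            continue
    return kept
```

Very short reviews are the hardest case for any detector, so it may be worth inspecting a sample of what gets dropped before discarding it.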

Character Generator (Problem 2):

Model Architecture: Developed an LSTM-based Recurrent Neural Network (RNN) with Batch Normalization and Dropout for stable training.

Creative Inference: Utilized a "seed" text to generate original fan-fiction style content based on the Harry Potter and the Goblet of Fire corpus.

Data Processing: Cleaned and vectorized over 1.1 million characters into 3D Tensors for deep learning training.
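The 3D tensor mentioned above typically has the shape (number of sequences, sequence length, vocabulary size). The following is a sketch of one common way to build it with sliding windows, not the notebook's exact code: `vectorize`, `seq_len`, and `step` are assumed names, and the window settings used in the project are not stated in this README.

```python
import numpy as np


def vectorize(text: str, seq_len: int, step: int = 1):
    """One-hot encode sliding windows of `text` for next-character prediction.

    Returns X with shape (num_windows, seq_len, vocab_size),
    y with shape (num_windows, vocab_size), and the sorted vocabulary.
    """
    chars = sorted(set(text))
    char_to_idx = {ch: i for i, ch in enumerate(chars)}

    # Each window needs one extra character after it to serve as the target.
    starts = range(0, len(text) - seq_len, step)
    X = np.zeros((len(starts), seq_len, len(chars)), dtype=np.bool_)
    y = np.zeros((len(starts), len(chars)), dtype=np.bool_)

    for n, start in enumerate(starts):
        for t, ch in enumerate(text[start:start + seq_len]):
            X[n, t, char_to_idx[ch]] = True
        y[n, char_to_idx[text[start + seq_len]]] = True
    return X, y, chars


X, y, chars = vectorize("hello world", seq_len=4)
```

On a corpus of over a million characters, a boolean dtype and a `step` greater than 1 both help keep the tensor's memory footprint manageable.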

📊 Results & Insights

Sentiment Analysis: The model effectively distinguishes between "Outright Hate" (high confidence) and "Technical Complaints" (moderate confidence), providing a probability "spread" for debugging ambiguous reviews.
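One way to quantify such a probability "spread" is to look at the gap between the top two classes and at the entropy of the whole distribution. The sketch below is illustrative only: `probability_spread`, its output keys, the example probabilities, and the mapping of class index to star rating are all assumptions, and the notebook may measure spread differently.

```python
import math
from typing import Dict, List


def probability_spread(probs: List[float]) -> Dict[str, float]:
    """Summarize how decisive a 5-class softmax output is."""
    ranked = sorted(probs, reverse=True)
    return {
        # Assumption: class index 0 corresponds to a 1-star rating.
        "predicted_stars": probs.index(ranked[0]) + 1,
        "confidence": ranked[0],
        # Small margin: the top two classes are nearly tied.
        "margin": ranked[0] - ranked[1],
        # High entropy: probability is smeared across many classes.
        "entropy": -sum(p * math.log(p) for p in probs if p > 0),
    }


# Made-up distributions: a decisive prediction versus an ambiguous one.
decisive = probability_spread([0.90, 0.05, 0.03, 0.01, 0.01])
ambiguous = probability_spread([0.30, 0.28, 0.22, 0.12, 0.08])
```

Reviews with a small margin or high entropy are natural candidates for manual inspection when debugging the classifier.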

Text Generation: The LSTM achieved approximately 65% accuracy in predicting the next character, allowing it to generate recognizable character names and thematic dialogue.

👤 Author

Nathan Ong Kee Wee

Developed for the Deep Learning in Natural Language Processing Module (AY2025/26)
