GitHub - JalilaMuadi/NLP-

🧠 Arabic Spell Checker – NLP-based Contextual Correction Tool

📋 Overview

This project presents an Arabic spell checker that leverages Natural Language Processing (NLP) techniques to provide intelligent, context-aware corrections. It aims to improve the accuracy and reliability of spell checking for Arabic, one of the most morphologically rich and complex languages.

🎯 Objectives

Detect misspelled Arabic words.
Suggest corrections based on context and linguistic patterns.
Compare traditional edit-distance methods with probabilistic language models.

🧩 Methodology

🔹 Dataset

Dictionary dataset: ~890,000 Arabic words, used as the candidate vocabulary for Levenshtein-based correction.
Arabic corpus: 8,660 lines, used to train the n-gram model.
Both datasets were collected from open online linguistic resources.
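
As a rough illustration of how a corpus of lines can feed an n-gram model, the sketch below counts bigrams and applies add-one smoothing. It is not the project's code: the function names `train_bigrams` and `bigram_prob`, the whitespace tokenization, the choice of bigrams, and the two toy sentences are all assumptions made for the example.

```python
from collections import Counter


def train_bigrams(lines):
    """Count unigrams and bigrams over whitespace-tokenized lines."""
    unigrams, bigrams = Counter(), Counter()
    for line in lines:
        # Boundary markers let the model score the first and last word.
        tokens = ["<s>"] + line.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Toy stand-in for the real corpus (hypothetical sentences).
corpus = ["ذهب الولد إلى المدرسة", "ذهب الولد إلى البيت"]
unigrams, bigrams = train_bigrams(corpus)

seen = bigram_prob("إلى", "المدرسة", unigrams, bigrams)    # pair occurs in the corpus
unseen = bigram_prob("إلى", "ذهب", unigrams, bigrams)      # pair never occurs
```

A word pair that appears in the corpus receives a higher probability than one that never does, which is the signal a context-aware checker can use to prefer one candidate correction over another.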

🔹 Algorithms Used

Levenshtein Distance – for edit-distance-based correction.
N-gram Language Model – for context-based probability scoring.
Beam Search – for combining frequency-based scoring with contextual correction.
Tokenization, Stemming, POS Tagging – for text preprocessing and linguistic analysis.
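
The edit-distance step can be sketched with the standard dynamic-programming form of Levenshtein distance. This is a minimal example, not the repository's implementation: the helper `suggest`, its `max_distance` and `limit` parameters, and the three-word dictionary are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def suggest(word, dictionary, max_distance=2, limit=5):
    """Return dictionary words within max_distance edits, closest first."""
    scored = [(levenshtein(word, entry), entry) for entry in dictionary]
    close = sorted(pair for pair in scored if pair[0] <= max_distance)
    return [entry for _, entry in close[:limit]]


# Hypothetical mini-dictionary; the misspelling ends in ه instead of ة.
dictionary = ["مدرسة", "مدرس", "كتاب"]
suggestions = suggest("مدرسه", dictionary)
```

Scanning every dictionary entry is linear in the dictionary size, so with a vocabulary of roughly 890,000 words a real implementation would likely need to prune candidates first (for example by word length).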

🛠️ Tools & Technologies

Language: Python
Libraries: NLTK, NumPy, Tkinter
NLP Techniques: Tokenization, Stemming, POS Tagging
Interface: GUI built using Tkinter
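
For readers unfamiliar with Arabic preprocessing, here is a small standard-library sketch of tokenization. It is a stand-in for illustration only and does not use NLTK; the `tokenize` function, the decision to strip diacritics and tatweel, and the Unicode ranges chosen are assumptions, not the project's actual pipeline.

```python
import re

# Assumed normalization: drop short-vowel marks (U+064B to U+0652) and tatweel.
DIACRITICS = re.compile(r"[\u064B-\u0652]")
TATWEEL = "\u0640"
# Runs of basic Arabic letters (U+0621 to U+064A).
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")


def tokenize(text):
    """Split text into Arabic word tokens, ignoring punctuation and digits."""
    text = DIACRITICS.sub("", text).replace(TATWEEL, "")
    return ARABIC_WORD.findall(text)


tokens = tokenize("ذهب الولد، إلى المدرسة.")
```

Normalizing before lookup keeps vocalized and unvocalized spellings of the same word from being treated as different dictionary entries.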

🧮 Results

The Levenshtein method effectively handled minor typographical errors (insertions, deletions, substitutions).
The N-gram model achieved higher contextual accuracy and better prediction of common word sequences.
Combining both approaches enhanced overall correction accuracy and flexibility.
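
One way the two approaches might be combined is a beam search that proposes candidates by edit distance and ranks whole sentences by bigram probability. The sketch below is written under stated assumptions: `beam_correct`, `beam_width`, `max_distance`, the `edit_penalty` weight, the add-one smoothing, and the toy corpus are all hypothetical and not taken from the repository.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def beam_correct(tokens, vocab, unigrams, bigrams,
                 beam_width=3, max_distance=1, edit_penalty=2.0):
    """Return the highest-scoring corrected token sequence."""
    vocab_size = max(len(unigrams), 1)
    beams = [(0.0, ["<s>"])]  # (log score, history)
    for token in tokens:
        # Candidates: vocabulary words close to the token, else keep it as-is.
        near = [(w, edit_distance(token, w)) for w in vocab]
        candidates = [(w, d) for w, d in near if d <= max_distance] or [(token, 0)]
        expanded = []
        for score, history in beams:
            prev = history[-1]
            for word, dist in candidates:
                # Add-one smoothed bigram log-probability, minus a cost per edit.
                logp = math.log((bigrams[(prev, word)] + 1) /
                                (unigrams[prev] + vocab_size))
                expanded.append((score + logp - edit_penalty * dist, history + [word]))
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_width]  # keep only the best partial sentences
    return beams[0][1][1:]  # drop the "<s>" marker


# Toy corpus and counts (hypothetical data).
corpus = ["ذهب الولد إلى المدرسة", "ذهب الولد إلى البيت"]
unigrams, bigrams = Counter(), Counter()
for line in corpus:
    words = ["<s>"] + line.split() + ["</s>"]
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))
vocab = {w for w in unigrams if w not in ("<s>", "</s>")}

corrected = beam_correct("ذهب الولد إلى المدرسه".split(), vocab, unigrams, bigrams)
```

The edit penalty keeps the search from rewriting words that are already valid, while the bigram term breaks ties between equally close candidates using context.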

🧰 How to Run

Open the UI folder.
Run the executable:
```
main.exe
```
or run the script with Python:
```
python main.py
```
Enter Arabic text in the GUI to test the spell checker interactively.

👥 Group Members

Jalila Moaddi – 1201611@student.birzeit.edu
Maryan Kassis – 1200861@student.birzeit.edu
Obada Hattab – 1171616@student.birzeit.edu

Department of Electrical and Computer Engineering – Birzeit University

🔮 Future Enhancements

Expand the Arabic corpus for better contextual accuracy.
Integrate modern BERT-based language models for semantic understanding.
Optimize runtime efficiency and GUI usability.
Develop a web-based interface for public testing.

📚 References

Jurafsky, D., & Martin, J. H. (2024). N-gram Language Models. Stanford University.
Saturn Cloud (2024). Stemming in Natural Language Processing.
UBIAI (2023). NLP Techniques: Tokenization, POS Tagging and NER.
ScienceDirect (2024). Levenshtein Distance Overview.

