This repository contains a PyTorch and Transformers implementation for fine-tuning the DistilBERT base uncased model on the dair-ai/emotion dataset. The pipeline includes downloading the dataset, loading the pretrained model, training, saving the model and checkpoints, validating, evaluating, resuming training from a checkpoint, and performing inference.
- Fine-tune DistilBERT on the `dair-ai/emotion` dataset.
- Save and load model checkpoints.
- Resume training from the last saved checkpoint.
- Perform inference on custom text inputs.
- Use PyTorch with CUDA support.
- Includes a `requirements.txt` for dependencies.
Ensure you have Python 3.10+ installed.
To install PyTorch, visit the official PyTorch website and follow the instructions to select the appropriate version for your system and CUDA setup.
Run the following command to install the required Python packages:
```bash
pip install -r requirements.txt
```
Clone this repository:
```bash
git clone https://github.com/Rahul2991/DistilBert-Based-Emotion-Classification.git
cd DistilBert-Based-Emotion-Classification
```
Start training:
```bash
python train.py
```
Training output includes:
- Checkpoints: saved in the `checkpoints/` directory.
- Final model: saved in the `results/` directory.
Resume training from a checkpoint:
```bash
python train.py --resume_from_checkpoint checkpoints/<checkpoint_folder>
```
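When several checkpoints exist, the most recent one can be identified by the step number in its folder name. A minimal sketch of such a lookup (the `latest_checkpoint` helper is hypothetical, not part of this repository):

```python
import re

def latest_checkpoint(dir_names):
    """Return the checkpoint-<step> folder with the highest step, or None."""
    steps = []
    for name in dir_names:
        m = re.fullmatch(r"checkpoint-(\d+)", name)
        if m:
            steps.append((int(m.group(1)), name))
    return max(steps)[1] if steps else None

# Example: pick the newest checkpoint among the contents of checkpoints/
print(latest_checkpoint(["checkpoint-500", "checkpoint-10000", "other"]))
```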
After training, you can validate the model on the test dataset:
```bash
python evaluate_model.py
```

The script reports the following metrics:

- Accuracy
- Precision
- Recall
- F1 Score

Results on the test set:

- Accuracy: 0.9210
- Precision: 0.9218
- Recall: 0.9210
- F1 Score: 0.9212
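Weighted-average metrics of this kind can be computed from raw predictions with scikit-learn. This is a hedged sketch using toy labels for illustration, not the repository's evaluation code (in the real dataset the six classes are the emotion labels):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy ground-truth and predicted class indices, for illustration only.
y_true = [0, 1, 2, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 2]

accuracy = accuracy_score(y_true, y_pred)
# average="weighted" weights each class by its support in y_true.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, "
      f"Recall: {recall:.4f}, F1: {f1:.4f}")
```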
To perform inference on custom text:
Run the inference script:

```bash
python inference.py
```
The script will output the predicted emotion label.
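Turning the model's raw logits into an emotion label typically amounts to a softmax followed by an argmax. A sketch in plain NumPy, assuming the dataset's standard label order (sadness, joy, love, anger, fear, surprise); the `logits_to_label` helper is illustrative, not this repository's inference code:

```python
import numpy as np

LABELS = ["sadness", "joy", "love", "anger", "fear", "surprise"]

def logits_to_label(logits):
    """Softmax the six class logits and return (label, probability)."""
    z = np.asarray(logits, dtype=float)
    probs = np.exp(z - z.max())   # subtract max for numerical stability
    probs /= probs.sum()
    i = int(np.argmax(probs))
    return LABELS[i], float(probs[i])

label, prob = logits_to_label([0.2, 4.1, 0.5, -1.0, 0.0, 0.3])
print(label)  # argmax is index 1 -> "joy"
```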
```
.
├── train.py            # Script for training the model
├── evaluate_model.py   # Script for evaluating the model
├── inference.py        # Script for running inference
├── requirements.txt    # Dependencies
├── results/            # Directory for saving the final model
├── checkpoints/        # Directory for saving training checkpoints
└── README.md           # Project documentation
```
```bash
python train.py
python train.py --resume_training 1 --resume_training_checkpoint checkpoints/checkpoint-10000
python evaluate_model.py
python inference.py
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face for the `transformers` and `datasets` libraries.
- The `dair-ai/emotion` dataset for providing labeled emotion data.
- PyTorch for the deep learning framework.
```bibtex
@inproceedings{saravia-etal-2018-carer,
    title = "{CARER}: Contextualized Affect Representations for Emotion Recognition",
    author = "Saravia, Elvis and
      Liu, Hsien-Chi Toby and
      Huang, Yen-Hao and
      Wu, Junlin and
      Chen, Yi-Shin",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D18-1404",
    doi = "10.18653/v1/D18-1404",
    pages = "3687--3697",
    abstract = "Emotions are expressed in nuanced ways, which varies by collective or individual experiences, knowledge, and beliefs. Therefore, to understand emotion, as conveyed through text, a robust mechanism capable of capturing and modeling different linguistic nuances and phenomena is needed. We propose a semi-supervised, graph-based algorithm to produce rich structural descriptors which serve as the building blocks for constructing contextualized affect representations from text. The pattern-based representations are further enriched with word embeddings and evaluated through several emotion recognition tasks. Our experimental results demonstrate that the proposed method outperforms state-of-the-art techniques on emotion recognition tasks.",
}
```