This repository contains a PyTorch and Transformers implementation for fine-tuning the DistilBERT base uncased model on the dair-ai/emotion dataset. The pipeline includes downloading the dataset, loading the pretrained model, training, saving the model and checkpoints, validating, evaluating, resuming training from a checkpoint, and performing inference.
- Fine-tune DistilBERT on the `dair-ai/emotion` dataset.
- Save and load model checkpoints.
- Resume training from the last saved checkpoint.
- Perform inference on custom text inputs.
- Use PyTorch with CUDA support.
- Includes a `requirements.txt` for dependencies.
Ensure you have Python 3.10+ installed.
To install PyTorch, visit the official PyTorch website and follow the instructions to select the appropriate version for your system and CUDA setup.
Run the following command to install the required Python packages:
```bash
pip install -r requirements.txt
```
Clone this repository:
```bash
git clone https://github.com/Rahul2991/DistilBert-Based-Emotion-Classification.git
cd DistilBert-Based-Emotion-Classification
```
Start training:
```bash
python train.py
```
Training output includes:
- Checkpoints: saved in the `checkpoints/` directory.
- Final model: saved in the `results/` directory.
Resume training from a checkpoint:
```bash
python train.py --resume_from_checkpoint checkpoints/<checkpoint_folder>
```
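When several checkpoints exist, the most recent one can be identified by the step number in its folder name. A minimal sketch of such a lookup (the `latest_checkpoint` helper is hypothetical, not part of this repository):

```python
import re

def latest_checkpoint(dir_names):
    """Return the checkpoint-<step> folder with the highest step, or None."""
    steps = []
    for name in dir_names:
        m = re.fullmatch(r"checkpoint-(\d+)", name)
        if m:
            steps.append((int(m.group(1)), name))
    return max(steps)[1] if steps else None

# Example: pick the newest checkpoint among the contents of checkpoints/
print(latest_checkpoint(["checkpoint-500", "checkpoint-10000", "other"]))
```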
After training, you can validate the model on the test dataset:
```bash
python evaluate_model.py
```

The script reports the following metrics:

- Accuracy
- Precision
- Recall
- F1 Score

Results on the test set:

- Accuracy: 0.9210
- Precision: 0.9218
- Recall: 0.9210
- F1 Score: 0.9212
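Weighted-average metrics of this kind can be computed from raw predictions with scikit-learn. This is a hedged sketch using toy labels for illustration, not the repository's evaluation code (in the real dataset the six classes are the emotion labels):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy ground-truth and predicted class indices, for illustration only.
y_true = [0, 1, 2, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 2]

accuracy = accuracy_score(y_true, y_pred)
# average="weighted" weights each class by its support in y_true.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, "
      f"Recall: {recall:.4f}, F1: {f1:.4f}")
```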
To perform inference on custom text:
Run the inference script:

```bash
python inference.py
```
The script will output the predicted emotion label.
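Turning the model's raw logits into an emotion label typically amounts to a softmax followed by an argmax. A sketch in plain NumPy, assuming the dataset's standard label order (sadness, joy, love, anger, fear, surprise); the `logits_to_label` helper is illustrative, not this repository's inference code:

```python
import numpy as np

LABELS = ["sadness", "joy", "love", "anger", "fear", "surprise"]

def logits_to_label(logits):
    """Softmax the six class logits and return (label, probability)."""
    z = np.asarray(logits, dtype=float)
    probs = np.exp(z - z.max())   # subtract max for numerical stability
    probs /= probs.sum()
    i = int(np.argmax(probs))
    return LABELS[i], float(probs[i])

label, prob = logits_to_label([0.2, 4.1, 0.5, -1.0, 0.0, 0.3])
print(label)  # argmax is index 1 -> "joy"
```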
```
.
├── train.py            # Script for training the model
├── evaluate_model.py   # Script for evaluating the model
├── inference.py        # Script for running inference
├── requirements.txt    # Dependencies
├── results/            # Directory for saving the final model
├── checkpoints/        # Directory for saving training checkpoints
└── README.md           # Project documentation
```
```bash
python train.py
python train.py --resume_training 1 --resume_training_checkpoint checkpoints/checkpoint-10000
python evaluate_model.py
python inference.py
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face for the `transformers` and `datasets` libraries.
- The `dair-ai/emotion` dataset for providing labeled emotion data.
- PyTorch for the deep learning framework.
```bibtex
@inproceedings{saravia-etal-2018-carer,
    title = "{CARER}: Contextualized Affect Representations for Emotion Recognition",
    author = "Saravia, Elvis and
      Liu, Hsien-Chi Toby and
      Huang, Yen-Hao and
      Wu, Junlin and
      Chen, Yi-Shin",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D18-1404",
    doi = "10.18653/v1/D18-1404",
    pages = "3687--3697",
    abstract = "Emotions are expressed in nuanced ways, which varies by collective or individual experiences, knowledge, and beliefs. Therefore, to understand emotion, as conveyed through text, a robust mechanism capable of capturing and modeling different linguistic nuances and phenomena is needed. We propose a semi-supervised, graph-based algorithm to produce rich structural descriptors which serve as the building blocks for constructing contextualized affect representations from text. The pattern-based representations are further enriched with word embeddings and evaluated through several emotion recognition tasks. Our experimental results demonstrate that the proposed method outperforms state-of-the-art techniques on emotion recognition tasks.",
}
```