This repository contains the implementation of a Speech+Transcript Emotion Recognition model based on Graph Contrastive Learning (GCL).
The goal of this project is to recognize emotions from multimodal data (Text and Audio). The model utilizes Graph Contrastive Learning to effectively capture the interplay between different modalities within conversational contexts.
The dataset consists of text transcripts and audio recordings acquired from scripts designed to evoke specific emotions.
The model classifies input data into one of the following 7 emotion categories:
- Joy (기쁨)
- Neutral (중립)
- Afraid (불안)
- Surprise (당황)
- Hurt (상처)
- Sadness (슬픔)
- Anger (분노)
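For illustration, the seven categories can be encoded as integer class indices before training. The names and ordering below are assumptions for this sketch, not the repository's actual label scheme:

```python
# Hypothetical label mapping; the actual ordering depends on the
# dataset loader in this repository.
EMOTIONS = ["joy", "neutral", "anxiety", "surprise", "hurt", "sadness", "anger"]

LABEL_TO_ID = {name: i for i, name in enumerate(EMOTIONS)}
ID_TO_LABEL = {i: name for name, i in LABEL_TO_ID.items()}

print(LABEL_TO_ID["sadness"])  # 5 under this ordering
```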
This project is built with Python and requires the following libraries:
- torch
- pandas
- numpy
- sklearn
- pyyaml
- typing
- matplotlib
- datetime
- Clone this repository.
- Install the required packages using pip:

```
pip install -r requirements.txt
```

- Train and evaluate the model by executing:

```
python train.py --dataset IITP-SMED --cuda_id 0
```

The `--dataset` argument must be one of `IITP-SMED` or `IITP-SMED-STT`.
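The command-line interface above could be parsed as follows. This is a hedged sketch of how `train.py` might handle its flags, not the repository's actual argument-parsing code; only the flag names and dataset choices come from this README:

```python
import argparse

def build_parser():
    # Sketch of the CLI assumed by the usage example in this README.
    parser = argparse.ArgumentParser(
        description="Train the GCL-based emotion recognition model")
    parser.add_argument("--dataset", required=True,
                        choices=["IITP-SMED", "IITP-SMED-STT"],
                        help="which dataset to train and evaluate on")
    parser.add_argument("--cuda_id", type=int, default=0,
                        help="index of the GPU to use")
    return parser

# Parse the flags from the usage example above.
args = build_parser().parse_args(["--dataset", "IITP-SMED", "--cuda_id", "0"])
print(args.dataset, args.cuda_id)
```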
IITP-SMED and IITP-SMED-STT are our empirical datasets, constructed with funding from IITP in South Korea. Experiments on additional datasets must be reproduced by users on their own.