🎤 Speech Emotion Recognition

MARS Open Project 2025 – Project 1 Submission

Author: Rudra Sharma
Contact: rudra310sharma@gmail.com

[📹 Watch Demo Video](https://drive.google.com/file/d/1adyPLr-APf5GupYBAbAqJzR5N_2CpJsU/view?usp=drive_link)


🧠 Project Overview

This project implements a complete pipeline to classify human emotions from speech using deep learning. Built using the RAVDESS dataset, it leverages feature extraction with Librosa and a custom-trained 1D Convolutional Neural Network (Conv1D) model.

It predicts eight emotions: neutral, calm, happy, sad, angry, fearful, disgust, and surprised.


🎯 Objective

Build a robust, modular, and interactive pipeline that:

  • Extracts acoustic features from .wav audio files
  • Augments and preprocesses data
  • Trains a deep learning model for emotion classification
  • Supports both web-based (Streamlit) and CLI-based prediction

📁 Dataset

Dataset Used: RAVDESS – Ryerson Audio-Visual Database of Emotional Speech and Song

  • Files: Audio_Speech_Actors_01-24, Audio_Song_Actors_01-24
  • Each file encodes emotion metadata in its filename (see the parsing sketch below)
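
As a quick illustration (not necessarily the repo's exact parsing code), the emotion label can be read from the third hyphen-separated field of a RAVDESS filename:

```python
# Minimal sketch: map a RAVDESS filename to its emotion label.
# RAVDESS names look like "03-01-06-01-02-01-12.wav"; the third field is the emotion code.
from pathlib import Path

EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def emotion_from_filename(path: str) -> str:
    """Return the emotion label encoded in a RAVDESS filename."""
    code = Path(path).stem.split("-")[2]
    return EMOTIONS[code]

print(emotion_from_filename("Actor_01/03-01-06-01-02-01-01.wav"))  # -> "fearful"
```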

📊 Feature Extraction

We use librosa to extract meaningful audio features:

  • MFCCs – Mel-Frequency Cepstral Coefficients (40)
  • Chroma – Pitch class information
  • Mel Spectrogram
  • RMS Energy
  • Zero-Crossing Rate

👉 Each audio file is converted into a 162-length feature vector.
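
A minimal sketch of the extraction step, following the common mean-pooled recipe that yields a 162-length vector (1 ZCR + 12 chroma + 20 MFCC + 1 RMS + 128 mel means); the exact coefficient counts and load parameters are assumptions and may differ from the training notebook:

```python
import numpy as np
import librosa

def extract_features(path: str) -> np.ndarray:
    # duration/offset are illustrative assumptions, not the notebook's confirmed values
    y, sr = librosa.load(path, duration=2.5, offset=0.6)
    zcr    = np.mean(librosa.feature.zero_crossing_rate(y=y), axis=1)       # 1 value
    chroma = np.mean(librosa.feature.chroma_stft(y=y, sr=sr), axis=1)       # 12 values
    mfcc   = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20), axis=1)   # 20 values
    rms    = np.mean(librosa.feature.rms(y=y), axis=1)                      # 1 value
    mel    = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)    # 128 values
    return np.hstack([zcr, chroma, mfcc, rms, mel])                         # shape: (162,)
```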


🧱 Model Architecture

Our model is a deep Conv1D-based neural network:

Conv1D (256) → MaxPooling → Conv1D (256) → MaxPooling → Conv1D (128) → MaxPooling → Dropout → Conv1D (64) → MaxPooling → Flatten → Dense (32) → Dropout → Output Layer (8 softmax classes)

  • Loss: Categorical Crossentropy
  • Optimizer: Adam
  • Validation Accuracy: ~85%
  • Evaluation: Confusion matrix, accuracy, F1-score
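
A hedged Keras sketch of the stack above; kernel sizes, padding, pool sizes, and dropout rates are illustrative assumptions, and only the layer order and widths follow the description:

```python
from tensorflow.keras import layers, models

def build_model(input_len: int = 162, n_classes: int = 8) -> models.Sequential:
    model = models.Sequential([
        layers.Input(shape=(input_len, 1)),
        layers.Conv1D(256, kernel_size=5, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(256, kernel_size=5, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Dropout(0.2),
        layers.Conv1D(64, kernel_size=5, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Flatten(),
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),  # 8 emotion classes
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```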

🔄 Data Augmentation

Each training sample is augmented into three variations:

  1. Original
  2. With noise
  3. Time-stretch + Pitch-shift

This tripling improves generalization and robustness.
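
A minimal sketch of the three variants, using librosa's built-in time-stretch and pitch-shift; the noise scale, stretch rate, and pitch steps are illustrative values, not necessarily those used in the notebook:

```python
import numpy as np
import librosa

def add_noise(y: np.ndarray, scale: float = 0.005) -> np.ndarray:
    """Add low-amplitude Gaussian noise to the waveform."""
    return y + scale * np.random.randn(len(y))

def stretch_and_shift(y: np.ndarray, sr: int, rate: float = 0.8, n_steps: float = 2.0) -> np.ndarray:
    """Slow the clip down slightly, then shift its pitch up."""
    y = librosa.effects.time_stretch(y, rate=rate)
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

def augment(y: np.ndarray, sr: int) -> list:
    """Return the original clip plus its two augmented variants."""
    return [y, add_noise(y), stretch_and_shift(y, sr)]
```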


🚀 How to Run

🔧 1. Install Dependencies

pip install -r requirements.txt

🔁 2. Train the Model (optional)

Open model_training.ipynb in Jupyter or Colab and run all cells.

This generates emotion_classification_model.h5 and scaler.pkl.

📂 3. Predict from CLI

python test_model.py path_to_audio.wav
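
A minimal sketch of what the CLI path could look like, assuming the saved model expects the scaled 162-length vector from the feature-extraction sketch above; the features module name, label order, and artifact filenames are assumptions rather than the repo's exact code:

```python
import sys
import pickle
import numpy as np
from tensorflow.keras.models import load_model

from features import extract_features  # hypothetical module holding the librosa helper sketched above

LABELS = ["neutral", "calm", "happy", "sad", "angry", "fearful", "disgust", "surprised"]

model = load_model("emotion_classification_model.h5")
with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)

features = extract_features(sys.argv[1]).reshape(1, -1)   # (1, 162)
features = scaler.transform(features)[..., np.newaxis]    # (1, 162, 1) for Conv1D
probs = model.predict(features)[0]
print(LABELS[int(np.argmax(probs))])
```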

🌐 4. Run Streamlit App

streamlit run app.py

Upload a .wav audio file and get a real-time prediction.
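
A minimal sketch of the web path under the same assumptions (hypothetical features helper module, assumed label order and artifact filenames):

```python
import pickle
import numpy as np
import streamlit as st
from tensorflow.keras.models import load_model

from features import extract_features  # hypothetical module with the librosa helper sketched above

LABELS = ["neutral", "calm", "happy", "sad", "angry", "fearful", "disgust", "surprised"]

st.title("🎤 Speech Emotion Recognition")
uploaded = st.file_uploader("Upload a .wav file", type=["wav"])

if uploaded is not None:
    st.audio(uploaded)
    model = load_model("emotion_classification_model.h5")
    with open("scaler.pkl", "rb") as f:
        scaler = pickle.load(f)
    x = scaler.transform(extract_features(uploaded).reshape(1, -1))[..., np.newaxis]
    st.success(f"Predicted emotion: {LABELS[int(np.argmax(model.predict(x)))]}")
```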

🗂 Project Structure

emotion-classification/
├── model_training.ipynb              # Full training and preprocessing
├── emotion_classification_model.h5   # Trained Conv1D model
├── scaler.pkl                        # Fitted StandardScaler
├── app.py                            # Streamlit-based frontend
├── test_model.py                     # CLI prediction script
├── requirements.txt                  # Dependencies
└── README.md                         # You’re here

Covers:

  • Model pipeline
  • Web app prediction
  • CLI usage

📈 Performance Metrics

  • Validation Accuracy: ~85%
  • Macro F1-Score: >80%

🧪 Model Evaluation

Confusion Matrix

(confusion matrix plot: see the image in the repository)

Results Summary

(results summary plot: see the image in the repository)

🔮 Future Improvements

  • 🎙️ Live mic recording input
  • 📦 Docker-based deployment
  • 🎛️ Probabilistic class output visualization
  • 🎵 Multi-language dataset support

📝 Deliverables Checklist

| Deliverable | Status |
| --- | --- |
| Dataset description | ✅ |
| Feature extraction | ✅ |
| Model architecture | ✅ |
| Code and training pipeline | ✅ |
| Trained model + scaler saved | ✅ |
| CLI-based inference | ✅ |
| Streamlit web UI | ✅ |
| Demo video | ✅ |
| Clean README.md with instructions | ✅ |
