TriadNet: A Multimodal Architecture for Big Five Personality Traits Assessment

Soham Pahari*, Sandeep Chand Kumain*, Lalit K Awasthi†

*School of Computer Science, UPES, Dehradun, India
†SPU Mandi, Himachal Pradesh, India

Official implementation of TriadNet, a multimodal deep-learning framework for assessing the Big Five personality traits from video, audio, and text.
- State-of-the-art Performance: Achieves 97.8% cluster accuracy and 96.9% mean trait accuracy on ChaLearn LAP 2017
- Multimodal Fusion: Jointly models visual, auditory, and textual behavioral cues
- Novel Architecture Components:
- Dynamic Face Graph Network (DFGN): Captures spatial-temporal facial dynamics via landmark graphs
- Prosody-Aware Capsule Network (PACN): Models prosodic and acoustic-semantic patterns
- Contextual Sentiment Flow (CSF): Tracks emotional and linguistic shifts in text
- Iterative Modality Dialogue (IMD): Cross-attention-based fusion mechanism
- Trait-Specific Attention Gate (TSAG): Individualized reasoning for each personality trait
Clone the repository from GitHub:
git clone https://github.com/sohampahari/TriadNet.git
cd TriadNet
Python 3.10 is recommended. Create a virtual environment first, then install the dependencies from requirements.txt.
Example:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
If you run on a GPU, install the CUDA drivers and a PyTorch build that matches your CUDA version.
Create a data folder before running the scripts:
# create data folder
mkdir -p data
Place all data under the data/ folder. We use the First Impressions V2 dataset from the ChaLearn LAP CVPR 2017 challenge, available at:
https://chalearnlap.cvc.uab.cat/
Dataset Statistics:
- ~10,000 video clips total (6,000 train / 2,000 validation / 2,000 test)
- Average duration: 15 seconds
- Big Five trait annotations: [0, 1] fractional scores
For fault tolerance, the pipeline saves embeddings to CSV after each step, so an interrupted run can be resumed without recomputing earlier stages. For each split you run, the output files follow this naming pattern:
Video embeddings:
- video_embeddings_train.csv
- video_embeddings_vali.csv
- video_embeddings_test.csv

Audio embeddings:
- audio_embeddings_train.csv
- audio_embeddings_vali.csv
- audio_embeddings_test.csv

Text embeddings:
- text_embeddings_train.csv
- text_embeddings_vali.csv
- text_embeddings_test.csv

Transcriptions:
- transcribed_videos.csv

Fused features:
- fused_av.csv (video + audio, 1024 dims)
- fused_avt.csv (video + audio + text, 1536 dims)
You can change the file names in the code if you prefer another naming scheme.
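The resume-on-failure behavior described above can be sketched as a simple check for existing output files. This is illustrative only: `needs_run` is a hypothetical helper, not part of the repository's code, and it assumes the default file-naming pattern listed above.

```python
from pathlib import Path

# Hypothetical helper (not in the repo): skip an extraction step when its
# output CSV already exists, so an interrupted run resumes where it stopped.
def needs_run(step: str, split: str, data_dir: str = "data") -> bool:
    out = Path(data_dir) / f"{step}_embeddings_{split}.csv"
    return not out.exists()

# Walk all splits and modalities; only missing outputs would be recomputed.
for split in ("train", "vali", "test"):
    for step in ("video", "audio", "text"):
        if needs_run(step, split):
            print(f"would run {step} extraction for split '{split}'")
```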
1. Prepare the dataset files under data/.
2. Run the video embedding script. This creates video_embeddings_*.csv.
3. Run the audio embedding script. This creates audio_embeddings_*.csv.
4. Run the text transcriber. This creates transcribed_videos.csv and text_embeddings_*.csv.
5. Run the fusion script. This creates fused_av.csv or fused_avt.csv (use fused_avt.csv).
6. Run traingate.py to train models using the fused features.
We recommend computing all embeddings for every split before starting training.
- video_emb.py: video feature extractor and GCN.
- audio_emb.py: audio extractor and PACN.
- text_emb.py: Whisper transcriber and BERT + CSF.
- fusion.py: IMD and iterative fusion. Use mode av or avt.
- traingate.py: training script.
- requirements.txt: Python packages.
Run for each split (train, validation, test):
python video_emb.py --split train
python video_emb.py --split validation
python video_emb.py --split test
What it does:
- Detects faces using MTCNN
- Extracts 68 facial landmarks per frame
- Applies Dynamic Face Graph Network (DFGN)
- Outputs 512-D video embeddings
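To illustrate the landmark-graph idea behind DFGN, the sketch below builds a k-nearest-neighbour adjacency over 2-D landmark points, the kind of graph a landmark-based GCN could operate on. This is not the repository's code; `knn_adjacency`, the toy point set, and `k=4` are assumptions for illustration (a real run would use the 68 landmarks per frame).

```python
import math

# Illustrative sketch: connect each landmark to its k nearest neighbours.
# Ties are broken by node index (sorting on (distance, index) pairs).
def knn_adjacency(landmarks, k=4):
    """landmarks: list of (x, y) points; returns dict node -> k nearest nodes."""
    adj = {}
    for i, (xi, yi) in enumerate(landmarks):
        dists = [
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(landmarks) if j != i
        ]
        dists.sort()
        adj[i] = [j for _, j in dists[:k]]
    return adj

# Toy example with 5 points standing in for the 68 facial landmarks
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
print(knn_adjacency(points, k=2))
```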
Run for each split:
python audio_emb.py --split train
python audio_emb.py --split validation
python audio_emb.py --split test
What it does:
- Segments audio into windows
- Extracts MFCCs + Wav2Vec2 embeddings
- Applies Prosody-Aware Capsule Network (PACN)
- Outputs 512-D audio embeddings
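The windowing step can be sketched as fixed-length segmentation with overlap, the usual precursor to extracting MFCC or Wav2Vec2 features per window. The `segment` function and the window/hop sizes below are assumptions for illustration, not the repository's exact parameters.

```python
# Illustrative sketch: split a 1-D signal into overlapping fixed-length
# windows; a trailing partial window is dropped.
def segment(signal, window, hop):
    return [signal[i:i + window] for i in range(0, len(signal) - window + 1, hop)]

samples = list(range(10))  # stand-in for raw audio samples
print(segment(samples, window=4, hop=2))
# -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```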
Run for each split:
python csf_emb.py --split train
python csf_emb.py --split validation
python csf_emb.py --split test
What it does:
- Transcribes videos using Whisper
- Generates BERT contextual embeddings
- Applies Contextual Sentiment Flow (CSF)
- Outputs 512-D text embeddings
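As a rough intuition for what "sentiment flow" means, the sketch below computes simple trajectory features (mean, range, sign flips) over assumed per-sentence sentiment scores in [-1, 1]. It is a stand-in only; the actual CSF module uses BERT embeddings with an LSTM and attention, and `sentiment_flow_features` is a hypothetical name.

```python
# Illustrative only: summarize how sentiment shifts across an utterance.
def sentiment_flow_features(scores):
    mean = sum(scores) / len(scores)
    rng = max(scores) - min(scores)
    # count transitions between negative and non-negative sentiment
    flips = sum(1 for a, b in zip(scores, scores[1:]) if (a < 0) != (b < 0))
    return {"mean": mean, "range": rng, "sign_flips": flips}

print(sentiment_flow_features([0.3, 0.5, -0.2, -0.4, 0.1]))
```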
Choose fusion mode:
# Audio-Visual fusion (1024-D)
python fusion.py --mode av
# Audio-Visual-Text fusion (1536-D)
python fusion.py --mode avt
What it does:
- Applies Iterative Modality Dialogue (IMD)
- Uses cross-attention for multimodal fusion
- Creates fused representations
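The cross-attention at the heart of the fusion step can be sketched with a minimal single-head version in pure Python: queries from one modality attend over keys/values from another. This is illustrative only; the actual IMD module uses multi-head cross-attention over 512-D features for 3 iterations, and the tiny 2-D vectors below are made up.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Single-head scaled dot-product cross-attention (illustrative sketch).
def cross_attention(queries, keys, values):
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

# Tiny example: 2 "video" query vectors attend over 3 "audio" key/value vectors
q = [[1.0, 0.0], [0.0, 1.0]]
kv = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(cross_attention(q, kv, kv))
```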
python traingate.py
Training details:
- Uses fused features from Step 4
- Applies Trait-Specific Attention Gate (TSAG)
- Hierarchical prediction: clusters → traits
- Loss:
L = 0.3 * L_cluster + 0.7 * L_traits
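The combined objective above can be sketched in plain Python. The specific loss choices here are assumptions for illustration (cross-entropy for the cluster head, MSE for the five trait regressions); the paper's exact formulations may differ.

```python
import math

def cross_entropy(probs, target_idx):
    # negative log-likelihood of the true cluster under predicted probabilities
    return -math.log(probs[target_idx])

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# L = 0.3 * L_cluster + 0.7 * L_traits
def triadnet_loss(cluster_probs, cluster_target, trait_pred, trait_target):
    return 0.3 * cross_entropy(cluster_probs, cluster_target) \
         + 0.7 * mse(trait_pred, trait_target)

# Made-up predictions for one sample: 3 cluster probs, 5 trait scores in [0, 1]
loss = triadnet_loss([0.7, 0.2, 0.1], 0,
                     [0.52, 0.61, 0.48, 0.55, 0.40],
                     [0.50, 0.60, 0.50, 0.50, 0.45])
print(round(loss, 4))
```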
- DFGN: Graph convolution + LSTM for facial dynamics + EmotionNet for appearance
- PACN: Capsule routing for prosodic hierarchies
- CSF: BERT + LSTM + attention for sentiment flow
- IMD: Multi-head cross-attention fusion (3 iterations)
- TSAG: Per-trait attention gates
- Hierarchical Predictor: Cluster classification + trait regression
| Metric | Value |
|---|---|
| Cluster Accuracy | 97.80% |
| Mean Trait Accuracy | 96.92% |
| Mean Squared Error | 0.0116 |
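For context, the ChaLearn First Impressions benchmark conventionally reports per-trait "accuracy" as 1 minus the mean absolute error over the [0, 1] trait scores. The sketch below computes that metric on made-up predictions (the numbers are not from our experiments).

```python
# ChaLearn-style trait accuracy: 1 - mean absolute error over [0, 1] scores.
def trait_accuracy(pred, target):
    mae = sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
    return 1.0 - mae

pred   = [0.52, 0.61, 0.48, 0.55, 0.40]  # made-up predicted trait scores
target = [0.50, 0.60, 0.50, 0.50, 0.45]  # made-up ground-truth scores
print(round(trait_accuracy(pred, target), 4))
```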
- MTCNN face detection runs on CPU or GPU.
- Whisper models are large; use a GPU for faster transcription.
- If memory is tight, reduce max_frames or segment_length.
- If you want one node per landmark, change the feature design in the video code.
Lead Developer: Soham Pahari
📧 Email: paharisoham@gmail.com
This project is licensed under the MIT License.
Last Updated: November 2025