
Learning Symmetrical Cross-Modal Correlations for Speech-Preserving Facial Expression Manipulation

This repository contains the source code for our work.

Maintainer: Guohua Zhang

Abstract: Speech-preserving facial expression manipulation (SPFEM) aims to automatically modify facial emotions while maintaining speech content animations. However, it lacks paired training examples where two corresponding frames represent identical speech content but with different emotions. In this research, we investigate the inherent characteristic of similar speeches having comparable facial expressions and mouth movements. We introduce a novel symmetrical cross-modal correlation learning (SCMCL) framework that leverages this characteristic to establish paired supervision to improve facial expression manipulation. Specifically, the framework initially learns a symmetrical cross-modal metric, ensuring that the similarity metric in one modality (e.g., audio) is strongly correlated with that in another modality (e.g., images). Given an input video clip, we can extract similar audio clips and their corresponding image frames in a specific emotion. We then ensure that the visual similarity between the generated image and the retrieved image correlates with the corresponding audio similarity. This approach can be effortlessly integrated with existing algorithms as an additional objective, providing detailed paired supervision for high-quality facial expression manipulation. Our extensive qualitative and quantitative evaluations across various settings demonstrate the effectiveness of the proposed algorithm.
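As a rough illustration of the correlation objective described above, the following is a minimal PyTorch sketch. All names (scmcl_loss, gen_img_feat, retrieved_img_feats, etc.) are placeholders for this illustration only, not the repository's actual training code; it assumes features for the generated frame, the retrieved frames, and the audio clips have already been extracted with the learned symmetrical cross-modal metric.

import torch
import torch.nn.functional as F

def scmcl_loss(gen_img_feat, retrieved_img_feats, input_audio_feat, retrieved_audio_feats):
    # Visual similarity profile: generated frame vs. each of the K retrieved frames.
    vis_sim = F.cosine_similarity(gen_img_feat.unsqueeze(0), retrieved_img_feats, dim=-1)   # (K,)
    # Audio similarity profile: input clip vs. each of the K retrieved audio clips.
    aud_sim = F.cosine_similarity(input_audio_feat.unsqueeze(0), retrieved_audio_feats, dim=-1)  # (K,)
    # Penalize disagreement between the two profiles; the audio side is detached
    # because it only provides the (pseudo-paired) supervision signal.
    return F.mse_loss(vis_sim, aud_sim.detach())

# Toy usage with random features (D = 128, K = 5 retrieved neighbors).
loss = scmcl_loss(torch.randn(128), torch.randn(5, 128), torch.randn(128), torch.randn(5, 128))

In a full pipeline, a term of this kind would be added to the base manipulation losses with a weighting factor, as the abstract notes that the objective plugs into existing algorithms.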

Framework of facial emotion manipulation while retaining the original mouth motion, i.e. speech.

We show examples of 3 basic emotions.

Updates

03/04/2023: We have released some related works.

07/02/2024: We have added code and instructions for training our model.

Getting Started

Clone the repo:

git clone https://github.com/guohua-zhang/LSCMC.git
cd LSCMC

Requirements

Create a conda environment using the provided environment.yml file.

conda env create -f environment.yml

Activate the environment.

conda activate NED

Files

  1. Follow the instructions in DECA (under the Prepare data section) to acquire the 3 files ('generic_model.pkl', 'deca_model.tar', 'FLAME_albedo_from_BFM.npz') and place them under "./DECA/data".
  2. Fill out the form to get access to FSGAN's pretrained models. Then download 'lfw_figaro_unet_256_2_0_segmentation_v1.pth' (from the "v1" folder) and place it under "./preprocessing/segmentation". A quick sanity check for these downloads is sketched below.
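The following optional snippet (not part of the repository) simply verifies that the pretrained assets listed above sit in the paths the preprocessing scripts expect; run it from the repository root.

import os

expected_files = [
    "./DECA/data/generic_model.pkl",
    "./DECA/data/deca_model.tar",
    "./DECA/data/FLAME_albedo_from_BFM.npz",
    "./preprocessing/segmentation/lfw_figaro_unet_256_2_0_segmentation_v1.pth",
]
missing = [path for path in expected_files if not os.path.isfile(path)]
if missing:
    print("Missing files:", *missing, sep="\n  ")
else:
    print("All pretrained assets found.")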

Run

./scripts/preprocess.sh
./scripts/train.sh
./scripts/postprocess.sh

Acknowledgements

We would like to thank the great repositories from which our code borrows.
