AuthentiFace is a real-time authentication system that uses micro-expression analysis and liveness detection to secure face-based logins. It introduces a dual-head Vision Transformer that predicts both emotional state and live/spoof status from a single shared backbone.
- RAF-DB (Real-world Affective Faces Database): https://www.kaggle.com/datasets/ashishpatel26/raf-db
- SAMM v1 (micro-expression dataset): https://www.kaggle.com/datasets/sajidshahriar/samm-v1-micro-expression-dataset
- CelebA-Spoof (face anti-spoofing dataset): https://www.kaggle.com/datasets/kpvisionlab/celeb-a-spoof-dataset
Traditional face authentication is vulnerable to:
- High-quality printed photos
- Screen replay attacks
- Deepfake videos
Common liveness defenses (blink detection, texture analysis, simple CNN-based presentation attack detection) are now easily spoofed. A stronger, involuntary biometric cue is required for modern security systems.
- Micro-expressions last as little as 1/25 of a second, are involuntary, and are difficult for replay screens or deepfake systems to reproduce convincingly.
- AuthentiFace is, to our knowledge, the first system to integrate micro-expression learning into liveness detection.
- A dual-head Vision Transformer jointly handles emotion classification and liveness prediction.
- Three-stage curriculum training improves generalization:
- RAF-DB for macro-expression learning
- SAMM for micro-expression fine-tuning
- CelebA-Spoof for liveness classification
- User accesses the login interface.
- Webcam frames are captured and processed.
- Each frame is passed through the ViT model to generate:
  - Emotion prediction
  - Live/spoof probability
- Temporal smoothing ensures stable decisions across frames.
- Decision logic:
  - Live → Login granted
  - Spoof → Access denied + automated email alert
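The automated email alert on the spoof branch could be implemented with Python's standard `smtplib` and `email` modules. The subject line, host, and credentials below are placeholders, not the project's actual configuration:

```python
import smtplib
from email.message import EmailMessage

def build_spoof_alert(sender: str, recipient: str) -> EmailMessage:
    """Compose the alert email sent after a confirmed spoof attempt."""
    msg = EmailMessage()
    msg["Subject"] = "AuthentiFace: spoof attempt blocked"  # placeholder subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("A presentation attack was detected and access was denied.")
    return msg

def send_spoof_alert(msg: EmailMessage, smtp_host: str, password: str) -> None:
    """Deliver the alert over SMTPS (host and credentials are placeholders)."""
    with smtplib.SMTP_SSL(smtp_host, 465) as server:
        server.login(msg["From"], password)
        server.send_message(msg)
```

Separating message construction from delivery keeps the alert content testable without a live SMTP connection.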
- Fully edge-based inference
- Real-time visualization of liveness and emotion
- Stable prediction through smoothing
- Security alerts for confirmed spoof attempts
- Real-time webcam frames
- BGR → RGB conversion
- Center crop
- Resize to 224×224
- ImageNet-normalized tensor as model input
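The preprocessing steps above can be sketched in PyTorch; the channel reversal below is equivalent to OpenCV's BGR→RGB conversion, and `F.interpolate` stands in for `cv2.resize`:

```python
import numpy as np
import torch
import torch.nn.functional as F

# Standard ImageNet normalization constants, shaped for CHW broadcasting
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def preprocess_frame(frame_bgr: np.ndarray) -> torch.Tensor:
    """BGR uint8 webcam frame (H, W, 3) -> normalized 1x3x224x224 tensor."""
    # BGR -> RGB: reverse the channel axis
    rgb = frame_bgr[:, :, ::-1].copy()
    # Center crop to a square
    h, w, _ = rgb.shape
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    crop = rgb[top:top + side, left:left + side]
    # HWC uint8 -> CHW float in [0, 1], with a batch dimension
    x = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    # Resize to the ViT input resolution
    x = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)
    # ImageNet normalization
    return (x - IMAGENET_MEAN) / IMAGENET_STD
```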
- Patch embedding (16×16 patches)
- Transformer encoder layers
- Global pooled embedding feeding both prediction heads
- Emotion Head: 7-class macro/micro expression classification
- Liveness Head: binary live/spoof classification
- Shared backbone improves feature reuse and micro-expression sensitivity
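A minimal sketch of such a dual-head ViT in PyTorch; the embedding dimension, depth, and head counts below are illustrative, not the project's actual configuration:

```python
import torch
import torch.nn as nn

class DualHeadViT(nn.Module):
    """Shared ViT backbone with an emotion head and a liveness head."""

    def __init__(self, img_size=224, patch=16, dim=192, depth=4, heads=3,
                 n_emotions=7):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Patch embedding: a strided conv turns 16x16 patches into tokens
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches, dim))
        # Transformer encoder layers over the patch tokens
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Both heads read the same globally pooled embedding
        self.emotion_head = nn.Linear(dim, n_emotions)  # 7 expression classes
        self.liveness_head = nn.Linear(dim, 2)          # live vs. spoof

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # B, N, dim
        tokens = self.encoder(tokens + self.pos_embed)
        pooled = tokens.mean(dim=1)                              # global pooling
        return self.emotion_head(pooled), self.liveness_head(pooled)
```

Because both heads share one backbone, gradients from the emotion task shape the same features the liveness head consumes, which is the mechanism behind the feature-reuse claim above.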
- Rolling window of recent probabilities
- Moving average to reduce noise
- Threshold-based decision logic for reliable authorization
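The smoothing-and-threshold logic might look like the following; the window size and decision threshold are illustrative values, not the project's tuned settings:

```python
from collections import deque

class LivenessSmoother:
    """Moving average over a rolling window of recent live probabilities."""

    def __init__(self, window: int = 10, threshold: float = 0.7):
        self.probs = deque(maxlen=window)  # rolling window of recent scores
        self.threshold = threshold         # tunable decision threshold

    def update(self, live_prob: float) -> bool:
        """Add one frame's live probability; return the smoothed decision."""
        self.probs.append(live_prob)
        smoothed = sum(self.probs) / len(self.probs)
        return smoothed >= self.threshold
```

A single noisy frame cannot flip the decision: a spoof verdict requires the windowed average, not any individual prediction, to cross the threshold.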
- Stage 1 (RAF-DB): learns general facial expression structure.
- Stage 2 (SAMM): fine-tunes the backbone for micro-expression sensitivity.
- Stage 3 (CelebA-Spoof): trains the liveness head for presentation attack detection.
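One way the three-stage curriculum could be wired, shown on a stand-in two-head model; the per-stage freezing choices are assumptions for illustration, not the project's exact recipe:

```python
import torch.nn as nn

class TwoHeadModel(nn.Module):
    """Stand-in for the dual-head ViT: a shared backbone and two task heads."""

    def __init__(self, dim=32, n_emotions=7):
        super().__init__()
        self.backbone = nn.Linear(8, dim)  # placeholder for the ViT encoder
        self.emotion_head = nn.Linear(dim, n_emotions)
        self.liveness_head = nn.Linear(dim, 2)

def set_trainable(module: nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad = flag

def configure_stage(model: TwoHeadModel, stage: int) -> list:
    """Freeze/unfreeze parts of the model and return the trainable parameters."""
    if stage in (1, 2):
        # Stage 1 (RAF-DB) and stage 2 (SAMM): train backbone + emotion head
        set_trainable(model.backbone, True)
        set_trainable(model.emotion_head, True)
        set_trainable(model.liveness_head, False)
    else:
        # Stage 3 (CelebA-Spoof): train the liveness head (backbone frozen here)
        set_trainable(model.backbone, False)
        set_trainable(model.emotion_head, False)
        set_trainable(model.liveness_head, True)
    return [p for p in model.parameters() if p.requires_grad]
```

The returned parameter list is what would be handed to the optimizer for that stage.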
- Liveness detection:
  - Accuracy: 93%
  - Precision (spoof): 1.00
  - No spoof misclassified as live
  - Strong security and low false positives
- Emotion recognition:
  - Accuracy: 80%
  - Macro F1: 0.7985
  - Reliable across emotion categories
- Integrating datasets with different structures
- Achieving stable real-time performance
- Handling noise and lighting variability
- Designing secure decision thresholds
- Creating an interpretable and responsive UI
- SAMM dataset size limits micro-expression diversity
- Sensitive to lighting and low-resolution cameras
- No depth/IR/rPPG sensing; relies only on RGB
- Possible demographic variation in micro-expression patterns
- Tested under varying lighting, occlusions, pose changes, and motion blur
- Temporal smoothing improved reliability
- User study (six participants) showed:
  - High usability
  - Successful detection of printed and replay attacks
  - Automated email alerts working as intended