# 🔬 MirrorMetrics: AI Identity Consistency Analyzer


MirrorMetrics is a scientific benchmarking tool for evaluating Face LoRAs (Stable Diffusion fine-tuned models). It uses InsightFace (ArcFace) to perform local biometric analysis and generates a rich, interactive Plotly dashboard, all running entirely on your machine.

## Why use this tool? (The Problem)

Training a LoRA often feels like guessing. You might ask:

- How do I know if my LoRA is overtrained?
- Why does my character look rigid?
- Is my dataset consistent?

MirrorMetrics solves this by measuring Identity Consistency, Face Geometry, and Flexibility mathematically.

## In short

Put the dataset you used to train your LoRA in the Reference_Images folder, and the generated images in the Lora_Candidates folder. Create one subfolder per LoRA you want to compare; each folder's name is used as that LoRA's label in the dashboard.

The LoRAs can come from the same training run (with different settings or step counts, for example) or be trained for entirely different base models; both cases work.

Then you run the script and it generates a dashboard with all the metrics. The way I use it most at the moment is:

- Compare the dataset plots to spot outliers that could skew the results.
- Compare the LoRA plots to see which one is most consistent with the dataset, especially panel 1, which shows each image's overall similarity score against the median of the dataset.
- Look at panels 5 and 6 to see how well the LoRA renders the face at different angles, and how strongly it tends to do so (keep your prompts in mind: if you explicitly ask for a right-profile image, you will of course get a cluster of dots on the right side of the graph).

"How consistent is the identity your LoRA generates?" β€” MirrorMetrics answers this question with data.


## ✨ Features

| Feature | Description |
| --- | --- |
| 🧬 Biometric Similarity | Cosine similarity between face embeddings and a reference centroid |
| 🔄 Leave-One-Out (LOO) | Robust reference scoring that excludes each image from its own centroid calculation; great for cleaning noisy datasets |
| 📊 Interactive Plotly Dashboard | 7-panel dark-themed HTML dashboard with a floating control panel |
| 🎯 Pose Analysis | Yaw/pitch scatter plots to evaluate identity stability across head orientations |
| 🗺️ t-SNE Identity Map | 2D projection of face embeddings to visualize identity clusters |
| 👤 Age Detection | Per-image age estimation via deep learning |
| 🔒 Privacy-Focused | Everything runs locally; no images are ever uploaded |

## 📋 Prerequisites

- Python 3.10+
- NVIDIA GPU (recommended, not required): the script runs on CPU too, just slower. See the installation tips below.

## 🚀 Installation

```bash
# Clone the repository
git clone https://github.com/AndyLone22/MirrorMetrics.git
cd MirrorMetrics

# Create a virtual environment (recommended)
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # macOS / Linux

# Install dependencies
pip install -r requirements.txt
```
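To sanity-check the environment after installing, one option is a quick ONNX Runtime provider check (a minimal sketch; MirrorMetrics does not require this step):

```python
import onnxruntime as ort

# On a working GPU setup this list should include 'CUDAExecutionProvider';
# CPU-only installs will show just 'CPUExecutionProvider'.
print(ort.get_available_providers())
```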

> [!IMPORTANT]
> The requirements.txt includes all NVIDIA CUDA libraries (cuBLAS, cuDNN, etc.), so the setup is fully standalone: no system-level CUDA Toolkit installation is needed. This makes the total venv size ~4.5 GB.

> [!TIP]
> Already have CUDA 12 installed system-wide? You can save ~3 GB by removing all nvidia-* lines from requirements.txt before running pip install. The script will use your system CUDA libraries instead.
>
> No NVIDIA GPU? Replace onnxruntime-gpu with onnxruntime in requirements.txt and remove all nvidia-* lines. The script will run on CPU (slower but functional).

> [!NOTE]
> On the first run, InsightFace will automatically download the buffalo_l model (~300 MB). This is a one-time operation.
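For reference, that download is triggered by the standard InsightFace initialization, roughly like the sketch below (MirrorMetrics handles this internally; the parameter values shown are common defaults, not necessarily the script's exact settings):

```python
from insightface.app import FaceAnalysis

# Instantiating buffalo_l downloads the model pack on first use.
app = FaceAnalysis(name="buffalo_l")
# ctx_id=0 selects the first GPU; use ctx_id=-1 to force CPU.
app.prepare(ctx_id=0, det_size=(640, 640))
```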


## 📖 Usage

### 1. Prepare your folders

Good, structured inputs are essential for the analysis. The reference images should be exactly the ones used in the dataset for training the LoRAs. With each LoRA, generate images from at least 10 different prompts, in batches of 3 per prompt, to get good variety, and make sure the prompts are differentiated, covering many positions, angles, and situations. If you only generate portraits, the variance will of course be flat, but that won't mean the model is inflexible, only that the evaluation set was built poorly. I suggest deciding on 10 standard prompts and always using those, so that with experience you learn how to read the results.

```
MirrorMetrics/
├── Reference_Images/       ← Your real reference photos (the "ground truth")
│   ├── photo1.jpg
│   ├── photo2.jpg
│   └── ...
├── Lora_Candidates/        ← One subfolder per LoRA to evaluate
│   ├── MyLoRA_v1/
│   │   ├── gen_001.png
│   │   └── gen_002.png
│   ├── MyLoRA_v2/
│   │   └── ...
│   └── ...
└── mirror_metrics.py
```
- Reference_Images/: place your real face photos here. More variety (angles, lighting) = better centroid.
- Lora_Candidates/: create one subfolder per LoRA (or per experiment). Each subfolder will appear as a separate group in the dashboard (see the sketch below).
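Conceptually, the grouping amounts to treating each subfolder as a label, roughly like this (a hypothetical sketch, not the script's actual code):

```python
from pathlib import Path

EXTENSIONS = {"jpg", "jpeg", "png", "webp", "bmp"}

for group_dir in sorted(Path("Lora_Candidates").iterdir()):
    if group_dir.is_dir():
        images = [p for p in group_dir.iterdir()
                  if p.suffix.lower().lstrip(".") in EXTENSIONS]
        # The folder name becomes the group label in the dashboard.
        print(group_dir.name, len(images))
```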

### 2. Run the analysis

Windows: just double-click run.bat; it activates the venv and launches the script automatically.

Linux / macOS: Run chmod +x run.sh once, then ./run.sh.

Or manually from any platform:

```bash
python mirror_metrics.py
```

### 3. View the results

The script generates two output files in the project root:

| File | Description |
| --- | --- |
| `Dashboard_<timestamp>.html` | Interactive Plotly dashboard; open in any browser |
| `Data_<timestamp>.csv` | Raw data export for further analysis |

## 📊 Dashboard Panels

The generated dashboard contains 7 analysis panels:

1. Face Similarity: box plot of cosine similarity scores per group
2. Age Distribution: violin plot of estimated ages
3. Face Ratio: bounding-box aspect ratio distribution
4. Detection Confidence: face detector confidence scores
5. Profile Stability: similarity vs. absolute yaw angle (does identity hold in profile views?)
6. Pose Variety: yaw vs. pitch bubble chart (bubble size = identity strength)
7. Identity Map: t-SNE 2D projection of face embeddings

A floating control panel lets you toggle individual groups on/off or cycle through them in solo mode.


## 🧪 Methodology

### Leave-One-Out (LOO) Scoring

For reference images, each photo is scored against the centroid of all other reference images (excluding itself). This prevents inflated self-similarity and helps identify outlier photos in your reference set.

For generated images (LoRA candidates), each photo is compared against the full reference centroid.
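As an illustration, here is a minimal NumPy sketch of this scoring scheme (function names and array shapes are mine, not necessarily those used in mirror_metrics.py):

```python
import numpy as np

def loo_scores(refs: np.ndarray) -> np.ndarray:
    """Score each reference embedding against the centroid of all OTHER references."""
    total = refs.sum(axis=0)
    scores = []
    for emb in refs:
        centroid = (total - emb) / (len(refs) - 1)  # centroid excluding this image
        # Cosine similarity between the held-out embedding and that centroid.
        scores.append(emb @ centroid / (np.linalg.norm(emb) * np.linalg.norm(centroid)))
    return np.array(scores)

def candidate_scores(refs: np.ndarray, gens: np.ndarray) -> np.ndarray:
    """Score each generated embedding against the FULL reference centroid."""
    centroid = refs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    gens = gens / np.linalg.norm(gens, axis=1, keepdims=True)
    return gens @ centroid
```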

### t-SNE Visualization

All face embeddings (reference + generated) are projected into 2D using t-SNE. Tight clusters indicate consistent identity; scattered points suggest identity drift.
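A minimal sketch of such a projection with scikit-learn (the input array and parameter values here are illustrative assumptions, not taken from the script):

```python
import numpy as np
from sklearn.manifold import TSNE

# Stack of ArcFace embeddings, references and generations together: (n_images, 512).
embeddings = np.random.default_rng(0).normal(size=(120, 512))  # placeholder data

# Perplexity must stay below the number of samples.
tsne = TSNE(n_components=2, perplexity=min(30, len(embeddings) - 1), random_state=42)
coords_2d = tsne.fit_transform(embeddings)  # (n_images, 2) points for the Identity Map
```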


βš™οΈ Configuration

You can customize these variables at the top of mirror_metrics.py:

| Variable | Default | Description |
| --- | --- | --- |
| `PATH_REFERENCE` | `Reference_Images` | Path to reference images folder |
| `PATH_CANDIDATES_ROOT` | `Lora_Candidates` | Path to LoRA candidates root folder |
| `EXTENSIONS` | `jpg, jpeg, png, webp, bmp` | Supported image formats |
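Edited in place, the top of the script would look roughly like this (the exact value types are an assumption based on the table above):

```python
# Top of mirror_metrics.py; values shown are the defaults.
PATH_REFERENCE = "Reference_Images"        # folder with the training dataset photos
PATH_CANDIDATES_ROOT = "Lora_Candidates"   # one subfolder per LoRA to evaluate
EXTENSIONS = ("jpg", "jpeg", "png", "webp", "bmp")  # accepted image formats
```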

## FAQ

### How do I check if my LoRA is overtrained?

Look at the Face Similarity chart. If the score is extremely high (>0.85) on all images and the Face Ratio variance is near zero, your model is likely overfitted (memorizing pixels instead of concepts).
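If you prefer numbers over eyeballing, the exported CSV makes this easy to check (a rough sketch; the column names used here are assumptions, so check the header of your own Data_<timestamp>.csv):

```python
import glob
import pandas as pd

# Load the most recent export.
csv_path = sorted(glob.glob("Data_*.csv"))[-1]
df = pd.read_csv(csv_path)

# Hypothetical column names: "group", "similarity", "face_ratio".
for group, sub in df.groupby("group"):
    print(f"{group}: mean similarity={sub['similarity'].mean():.3f}, "
          f"face-ratio variance={sub['face_ratio'].var():.5f}")
```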

### Can I use this for Flux or SDXL?

Yes. MirrorMetrics works on the output images, so it is compatible with Stable Diffusion 1.5, SDXL, Pony, Flux, QWEN, Z-Image, and any other image model.

### How do I evaluate dataset quality?

Use the "Reference" box in the charts. If the purple box is very tall or has outliers, check those images to see why they give a high difference evaluation to the mean of the rest of the images, then discard them if you feel it's best.

### Why does the Age Distribution vary so much?

This usually indicates inconsistent skin texture in your dataset. The biometric engine uses high-frequency details (pores, wrinkles, skin smoothing) to estimate age. If you mixed "soft" lighting images with "harsh" realistic photos, the tool might interpret the texture difference as an age difference.

### Why does Detection Confidence drop on good images?

This often happens with extreme angles (e.g. looking back over the shoulder, steep profiles). The detector expects standard facial geometry, so perspective compression can lower the confidence score even when the anatomy is correct. Low confidence on a profile shot is acceptable; low confidence on a front-facing shot means your model is broken. In short, the data always needs interpretation.

### Can I use this tool before training?

Yes! You can run the tool pointing only at your dataset folder and analyze the purple "Reference" box. If the box is very tall or has dots floating far below it, you have "poison" images (outliers) in your dataset. Removing them before training could save you time and GPU hours, but it's data, not a verdict: always interpret it before deciding how to act.

### How do I measure Flexibility (Creativity)?

Look at the Face Ratio and Pose Variety charts. If the Face Ratio is a flat line and Pose Variety is clustered at the center, the model is just "photocopying" the data (low creativity). If the charts show wide variance (like a violin shape) and scattered dots, the model understands the 3D structure well enough to generate new expressions and angles (high flexibility). Of course, all of this requires a good evaluation set of generated images with different poses, angles, positions, backgrounds, and so on.


## 📄 License

This project is licensed under the MIT License; see the LICENSE.txt file for details.


## 🤝 Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.
