Facial Recognition Systems

One-line summary: End-to-end pipelines that detect, align, embed, and match human faces for verification (1:1) and identification (1:N).

Modality: Face
Related concepts: Deep Learning Architectures for Biometrics, Transformer Architectures for Biometrics, Anti Spoofing Techniques, Bias and Fairness in Biometrics, Biometric Image Quality, Multimodal Biometrics
Last updated: 2026-04-04

Overview

Modern facial recognition operates as a four-stage pipeline:

Detection — Localize faces in an image (RetinaFace, SCRFD, YOLOv8-Face).
Alignment — Warp the detected crop to a canonical pose using 5-point or 68-point landmarks.
Embedding — Map the aligned face to a compact feature vector (typically 128–512 dims) using a deep network trained with a margin-based loss.
Matching — Compare embeddings via cosine similarity or L2 distance against a gallery for verification or identification.
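
The alignment stage can be sketched as a least-squares similarity transform (the Umeyama method) from detected 5-point landmarks to a fixed template. This is a minimal NumPy sketch under stated assumptions, not any particular library's implementation: `TEMPLATE_112`, `estimate_similarity`, and `apply_transform` are hypothetical names, and the template coordinates are an assumed 112x112 layout of the kind seen in open-source ArcFace-style pipelines.

```python
import numpy as np

# Assumed reference layout for a 112x112 crop, in the order:
# left eye, right eye, nose tip, left mouth corner, right mouth corner.
TEMPLATE_112 = np.array([
    [38.2946, 51.6963],
    [73.5318, 51.5014],
    [56.0252, 71.7366],
    [41.5493, 92.3655],
    [70.7299, 92.2041],
])

def estimate_similarity(src, dst):
    """Least-squares similarity transform (rotation, uniform scale, translation)
    mapping src points onto dst points. Returns a 2x3 affine matrix."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean

    # Cross-covariance between centered destination and source points.
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.ones(2)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        d[-1] = -1.0

    R = U @ np.diag(d) @ Vt
    src_var = (src_c ** 2).sum() / len(src)
    scale = (S * d).sum() / src_var
    t = dst_mean - scale * R @ src_mean
    return np.hstack([scale * R, t[:, None]])

def apply_transform(M, pts):
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=np.float64)
    return pts @ M[:, :2].T + M[:, 2]
```

The resulting 2x3 matrix has the shape expected by affine warping routines such as OpenCV's `cv2.warpAffine(image, M, (112, 112))`. The landmark order returned by the detector has to match the template order, which varies between detectors.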

The field is dominated by metric-learning losses that enforce inter-class separation and intra-class compactness in the embedding space.
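
Once embeddings are L2-normalized, the matching stage reduces to dot products in that embedding space. The sketch below illustrates 1:1 verification and 1:N identification with NumPy; the function names and the `threshold=0.3` default are hypothetical, and a real threshold would be calibrated on held-out genuine and impostor pairs.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    x = np.asarray(x, dtype=np.float32)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def verify(probe, reference, threshold=0.3):
    """1:1 verification: accept when cosine similarity reaches the threshold."""
    score = float(l2_normalize(probe) @ l2_normalize(reference))
    return score >= threshold, score

def identify(probe, gallery, top_k=5):
    """1:N identification: indices and scores of the top_k most similar gallery rows."""
    scores = l2_normalize(gallery) @ l2_normalize(probe)
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return order, scores[order]
```

For unit-norm vectors, squared L2 distance and cosine similarity are related by ||a - b||^2 = 2 - 2 cos(a, b), so ranking by either gives the same order.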

Technical Details

Loss Functions (Evolution)

Loss	Year	Key Idea
Contrastive Loss	2006	Pair-based distance learning
Triplet Loss (FaceNet)	2015	Anchor-positive-negative margin
Center Loss	2016	Penalizes distance to class center
SphereFace (A-Softmax)	2017	Multiplicative angular margin on normalized class weights
CosFace (LMCL)	2018	Additive cosine margin
ArcFace	2019	Additive angular margin; widely used default
AdaFace	2022	Quality-adaptive margin
UniTSFace	2023	Unified threshold integrated sample-to-sample loss
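
To make the difference between the additive cosine margin (CosFace) and the additive angular margin (ArcFace) concrete, here is a small NumPy sketch of how each modifies the target-class logit before softmax cross-entropy. It is a forward-pass illustration only: the function names are hypothetical, the defaults `s=64.0`, `m=0.5`, and `m=0.35` are assumed typical values, and the handling that training code usually adds for angles where theta + m exceeds pi is omitted.

```python
import numpy as np

def _cosines(embeddings, weights):
    """Cosine between each embedding (N, D) and each class weight (C, D)."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    return np.clip(e @ w.T, -1.0, 1.0)

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """Additive angular margin: target logit becomes s * cos(theta + m)."""
    cos = _cosines(embeddings, weights)
    rows = np.arange(len(labels))
    out = cos.copy()
    out[rows, labels] = np.cos(np.arccos(cos[rows, labels]) + m)
    return s * out

def cosface_logits(embeddings, weights, labels, s=64.0, m=0.35):
    """Additive cosine margin: target logit becomes s * (cos(theta) - m)."""
    cos = _cosines(embeddings, weights)
    rows = np.arange(len(labels))
    out = cos.copy()
    out[rows, labels] -= m
    return s * out

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy with a numerically stable log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

In both cases only the target-class logit is penalized, so the network has to pull each embedding closer to its class weight than plain softmax would require.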

Backbone Architectures

ResNet-100/200 — Workhorse backbone; ArcFace + R100 remains a strong baseline.
MobileFaceNet — Lightweight backbone for on-device inference (~1M params).
EfficientNet-B4 — Balanced accuracy-efficiency trade-off.
ViT-based — See Transformer Architectures for Biometrics; ViT-B/16 + ArcFace now competitive with CNNs on IJB-C.
EdgeFace (2024) — Hybrid CNN-Transformer based on EdgeNeXt with low-rank linear layers, designed for edge devices.

Inference Pipeline Considerations

Template aggregation — When multiple frames are available (video, multi-crop), embeddings are pooled (mean, quality-weighted, attention-based).
Score normalization — Z-norm, T-norm, and S-norm calibrate raw cosine scores for large-scale identification.
Quantization — INT8 / FP16 embeddings reduce storage for billion-scale galleries with minimal accuracy loss.
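
The three considerations above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions: all function names are hypothetical, the quality weights are assumed to come from some external quality estimator, the cohort scores are assumed to be impostor comparisons, and the quantizer is a simple symmetric per-vector scheme rather than any specific library's format.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    x = np.asarray(x, dtype=np.float32)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def aggregate_template(embeddings, quality=None):
    """Pool per-frame embeddings (N, D) into one template.
    With quality=None this is mean pooling; otherwise a quality-weighted mean."""
    embeddings = np.asarray(embeddings, dtype=np.float32)
    if quality is None:
        quality = np.ones(len(embeddings), dtype=np.float32)
    w = np.asarray(quality, dtype=np.float32)
    pooled = (w[:, None] * embeddings).sum(axis=0) / w.sum()
    return l2_normalize(pooled)

def z_norm(score, cohort_scores):
    """Standardize a raw score with the mean and std of cohort (impostor) scores."""
    return (score - np.mean(cohort_scores)) / (np.std(cohort_scores) + 1e-12)

def s_norm(score, enroll_cohort_scores, probe_cohort_scores):
    """Symmetric normalization: average of the enrollment-side (Z-norm)
    and probe-side (T-norm) standardized scores."""
    return 0.5 * (z_norm(score, enroll_cohort_scores)
                  + z_norm(score, probe_cohort_scores))

def quantize_int8(embedding):
    """Symmetric per-vector INT8 quantization. Returns (codes, scale)."""
    embedding = np.asarray(embedding, dtype=np.float32)
    scale = float(np.abs(embedding).max()) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero vector: any scale reproduces it
    codes = np.clip(np.round(embedding / scale), -127, 127).astype(np.int8)
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover an approximate float32 embedding from INT8 codes."""
    return codes.astype(np.float32) * scale
```

A 512-dimensional float32 embedding takes 2,048 bytes; the INT8 codes take 512 bytes plus one scale value, which is where the storage saving for large galleries comes from.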

Key Models & Papers

Model / Paper	Year	Contribution
DeepFace (Facebook)	2014	Early large-scale deep-learning face verification system; 97.35% on LFW
FaceNet (Google)	2015	Triplet loss + 128-d embeddings; 99.63% LFW
ArcFace (Deng et al.)	2019	Additive angular margin; SOTA at publication on IJB-B/C and MegaFace
AdaFace (Kim et al.)	2022	Quality-adaptive margin; strong on low-quality benchmarks (IJB-S, TinyFace)
TransFace	2023	ViT training recipe with patch-level augmentation (DPAP) and entropy-based hard sample mining (EHSM)
UniTSFace	2023	Unified threshold integrated sample-to-sample (USS) loss
TopoFR	2024	Topology-preserving face recognition with persistent homology regularization

Datasets

Dataset	Size	Year	Notes
LFW	13K images / 5.7K identities	2007	Saturated benchmark (>99.8% for SOTA)
MS1MV2 (MS-Celeb-1M cleaned)	5.8M / 85K ids	2019	Standard training set after noise cleaning
WebFace260M	260M / 4M ids	2021	Among the largest public face datasets; raw labels are noisy (cleaned subset: WebFace42M, 42M / 2M ids)
CASIA-WebFace	500K / 10.5K ids	2014	Smaller training set, useful for ablations
IJB-B / IJB-C	~76.8K images and frames / 1,845 ids (IJB-B); ~148.8K / 3,531 ids (IJB-C)	2017/2018	Mixed media (still + video); standard eval
IJB-S	202 ids / 350 surveillance videos	2018	Low-quality surveillance benchmark
TinyFace	169K / 5.1K ids	2018	Low-resolution faces in the wild
BUPT-BalancedFace	1.3M / 28K ids	2020	Racially balanced training set

Challenges

Low-quality / unconstrained faces — Pose, illumination, occlusion, low resolution, and motion blur degrade embedding quality. Biometric Image Quality plays a critical role.
Demographic bias — Accuracy varies across race, gender, and age. See Bias and Fairness in Biometrics.
Aging and appearance change — Longitudinal stability of embeddings over years.
Billion-scale search — Efficient ANN indexing (FAISS, ScaNN) needed for national ID or web-scale galleries.
Presentation attacks — Print, replay, 3D mask, and deepfake attacks. See Anti Spoofing Techniques.
Privacy regulation — GDPR, Illinois BIPA, EU AI Act restrict collection and use. See Privacy Preserving Biometrics.

State of the Art (SOTA)

As of early 2026:

IJB-C TAR@FAR=1e-4: ~97.5% (ArcFace-R200 + quality-weighted fusion), ~97.8% with ViT-L ensembles.
LFW: 99.87%+ (essentially saturated).
IJB-S (surveillance): AdaFace and quality-aware methods lead (~72% Rank-1 at far end).
Real-world error rates: NIST FRVT ongoing evaluations show top commercial systems at FNMR < 0.2% @ FMR=1e-6 for frontal images.
On-device: Sub-5ms inference on flagship mobile SoCs with MobileFaceNet/EdgeFace.
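
Metrics such as TAR@FAR=1e-4 and FNMR@FMR=1e-6 are operating points read off the genuine and impostor score distributions. The sketch below shows one way to compute them from raw similarity scores; `tar_at_far` is a hypothetical helper that treats scores as independent pairwise comparisons, so it does not reproduce the template-based protocols of benchmarks like IJB-C.

```python
import numpy as np

def tar_at_far(genuine, impostor, far=1e-4):
    """True accept rate at a target false accept rate.

    genuine:  similarity scores of same-identity comparisons
    impostor: similarity scores of different-identity comparisons
    Returns (tar, threshold); a comparison is accepted when score > threshold.
    """
    genuine = np.asarray(genuine, dtype=np.float64)
    impostor_desc = np.sort(np.asarray(impostor, dtype=np.float64))[::-1]

    # Number of impostor comparisons that may be accepted at the target FAR.
    k = int(np.floor(far * len(impostor_desc)))

    # Accepting scores strictly above the (k+1)-th highest impostor score
    # lets through at most k impostors, so the measured FAR never exceeds `far`.
    threshold = impostor_desc[min(k, len(impostor_desc) - 1)]
    tar = float(np.mean(genuine > threshold))
    return tar, float(threshold)
```

In verification terminology FMR corresponds to FAR and FNMR to 1 - TAR, so an FNMR@FMR figure can be read from the same function. Estimating a rate as low as 1e-6 reliably requires many millions of impostor comparisons.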

Open Questions

Can self-supervised or foundation-model pre-training (DINOv2, MAE) replace supervised ArcFace training for face recognition?
How can we build fair systems that equalize error rates across demographic groups without sacrificing overall accuracy?
What is the theoretical limit of face recognition under extreme pose/illumination variation?
Will synthetic training data (generated by diffusion models) fully replace privacy-sensitive real face datasets?

References

Deng, J., Guo, J., Xue, N., & Zafeiriou, S. (2019). ArcFace: Additive Angular Margin Loss for Deep Face Recognition. CVPR.
Kim, M., Jain, A. K., & Liu, X. (2022). AdaFace: Quality Adaptive Margin for Face Recognition. CVPR.
Dan, J. et al. (2024). TopoFR: A Closer Look at Topology Alignment on Face Recognition. NeurIPS.
Grother, P. et al. (ongoing). NIST FRVT. https://pages.nist.gov/frvt/

Backlinks: Deep Learning Architectures for Biometrics, Transformer Architectures for Biometrics, Anti Spoofing Techniques, Bias and Fairness in Biometrics, Multimodal Biometrics, Biometric Image Quality, Biometric Datasets and Benchmarks, Real World Biometric Deployments