Anti-Spoofing Techniques¶
One-line summary: Presentation attack detection (PAD) methods that distinguish genuine biometric samples from spoofs — prints, replays, masks, synthetic audio, silicone fingers, and deepfakes — across all modalities.
Modality: Cross-modal
Related concepts: Facial Recognition Systems, Iris Recognition, Fingerprint Recognition, Voice Biometrics, Palm Recognition, Deep Learning Architectures for Biometrics, Biometric Datasets and Benchmarks
Last updated: 2026-04-04
Overview¶
A biometric system is only as secure as its ability to reject spoofs (presentation attacks). The ISO/IEC 30107 standard defines:
- Presentation Attack (PA) — Presenting a biometric artifact to the sensor with intent to interfere with system operation.
- Presentation Attack Instrument (PAI) — The artifact used (printed photo, silicone finger, replay video, 3D mask, synthesized audio).
- Presentation Attack Detection (PAD) — The subsystem that detects PAs.
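ISO/IEC 30107-3 evaluates PAD subsystems with two complementary error rates: APCER (the proportion of presentation attacks wrongly accepted as bona fide) and BPCER (the proportion of bona fide presentations wrongly rejected), often summarized as their average, ACER. A minimal sketch of these metrics, assuming the common convention that higher scores mean "more likely bona fide" (the function name and toy scores are illustrative, not from any standard implementation):

```python
def pad_metrics(bona_fide_scores, attack_scores, threshold):
    """ISO/IEC 30107-3 style PAD error rates at one operating threshold.

    Assumed convention: higher score = more likely bona fide.
    """
    # APCER: fraction of presentation attacks accepted as bona fide
    apcer = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    # BPCER: fraction of bona fide presentations rejected as attacks
    bpcer = sum(s < threshold for s in bona_fide_scores) / len(bona_fide_scores)
    # ACER: average of the two, commonly reported as a single summary number
    acer = (apcer + bpcer) / 2
    return apcer, bpcer, acer

# Toy example with made-up scores
apcer, bpcer, acer = pad_metrics(
    bona_fide_scores=[0.9, 0.8, 0.95, 0.7, 0.6],
    attack_scores=[0.1, 0.3, 0.55, 0.2],
    threshold=0.5,
)
```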
Taxonomy by Modality¶
| Modality | Common Attack Types | Common Defenses |
|---|---|---|
| Face | Print attack, screen replay, 3D silicone/resin mask, deepfake video | Texture analysis, depth estimation, liveness cues (blink, head motion), rPPG (remote photoplethysmography) |
| Iris | Printed iris, cosmetic contact lens, prosthetic eye, screen replay | Texture/spectral analysis, pupil dynamics, 3D iris structure |
| Fingerprint | Silicone/gelatin/Play-Doh mold, printed fingerprint, 3D-printed ridge pattern | Liveness detection (pulse, sweat, temperature), material classification, distortion analysis |
| Voice | Replay, text-to-speech, voice conversion, deepfake audio | Spectral artifact detection, channel analysis, challenge-response (random phrase) |
| Palm | Printed palm image, synthetic vein overlay | NIR vein liveness (blood flow), depth sensing, texture analysis |
Technical Details¶
Face Anti-Spoofing (FAS)¶
Hardware-based:
- Structured-light depth (Apple Face ID) — 30K dot projector maps 3D face; rejects flat prints/screens.
- Time-of-flight (ToF) depth — similar depth-based rejection.
- Multi-spectral imaging — NIR + VIS; skin reflectance differs from paper/screen.
Software-based (single RGB camera):

| Method | Year | Approach |
|---|---|---|
| LBP / texture analysis | 2011+ | Local Binary Patterns detect micro-texture differences between real skin and print/screen |
| Moiré pattern detection | 2015+ | Detect moiré artifacts from screen replay |
| Depth from RGB (PRNet, 3DDFA) | 2018+ | Estimate 3D face shape; flat → spoof |
| rPPG (remote PPG) | 2016+ | Detect subtle pulse-induced color changes in facial skin; absent in spoofs |
| CDCN (Yu et al.) | 2020 | Central Difference Convolution for fine-grained texture + depth-map supervision |
| ViTranZFAS | 2022 | Vision Transformer for zero-shot cross-dataset FAS |
| One-class FAS | 2024 | Train only on real faces; detect anomalies as spoofs; handles unseen attack types |
| FLIP (Cai et al.) | 2023 | Foundation model (CLIP) adapted for FAS; strong cross-dataset generalization |
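To make the oldest entry in the table concrete, here is a deliberately simplified 8-neighbor, radius-1 LBP in NumPy. This is a sketch of the idea only: production systems typically use multi-scale, rotation-invariant variants (e.g., scikit-image's `local_binary_pattern`) and feed the histogram to a trained classifier; the function names here are my own.

```python
import numpy as np

def lbp_8neighbor(img):
    """Basic 8-neighbor Local Binary Pattern (radius 1).

    Each interior pixel is encoded by thresholding its 8 neighbors
    against the center value, yielding an 8-bit code in [0, 255].
    Real skin, paper prints, and screens produce different code
    distributions because of their micro-texture.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # Neighbor offsets in clockwise order, starting at the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy: h - 1 + dy, 1 + dx: w - 1 + dx]
        codes |= (neighbor >= center).astype(np.uint8) << bit
    return codes

def lbp_histogram(img):
    """Normalized 256-bin LBP code histogram: the feature vector a
    classifier (e.g., an SVM) would consume to separate real vs. spoof."""
    hist, _ = np.histogram(lbp_8neighbor(img), bins=256, range=(0, 256))
    return hist / hist.sum()
```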
Key challenge: cross-dataset generalization. Models trained on one dataset (e.g., CelebA-Spoof) often fail on another (e.g., OULU-NPU) because of domain shift in cameras, lighting, and attack instruments. Domain generalization and domain adaptation are active research areas.
Iris Anti-Spoofing¶
- Textured contact lens detection — Analyze high-frequency patterns; deep classifiers (D-NetPAD) achieve ~99% accuracy on LivDet-Iris.
- Printed iris — Detect print artifacts (dot patterns, paper texture, lack of specular reflection).
- Pupil dynamics — Real pupils dilate/constrict in response to light; static images don't.
- 3D structure — Real irises are concave; prints are flat.
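The pupil-dynamics cue above reduces to a simple signal-level check: track the estimated pupil radius across frames while the illumination changes, and require a noticeable constriction/dilation. A toy sketch (the function name and the 10% threshold are illustrative assumptions, not values from any deployed system):

```python
import statistics

def pupil_responds(radii_px, min_relative_change=0.1):
    """Toy pupil-dynamics liveness check.

    Given per-frame pupil radius estimates (pixels) captured while a
    light stimulus is applied, a live eye should constrict or dilate
    noticeably, while a printed iris or static replay stays essentially
    constant. The 10% relative-change threshold is an untuned assumption.
    """
    mean_r = statistics.fmean(radii_px)
    span = max(radii_px) - min(radii_px)
    return (span / mean_r) >= min_relative_change
```

For example, a live eye constricting under a flash (`[21.0, 20.5, 18.2, 16.9, 17.4]`) passes, while a static print (`[12.0, 12.1, 11.9, 12.0]`) fails.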
Fingerprint Anti-Spoofing (Liveness Detection)¶
- Hardware: Multispectral sensors capture subsurface features (sweat pores, dermal ridges). Pulse oximetry detects blood flow.
- Software: CNN classifiers on fingerprint images (LivDet competition). Texture-based: pore analysis, ridge distortion patterns. Material-specific: silicone, gelatin, wood glue have different optical properties.
- LivDet competition (2009–2023): Accuracy has improved from ~90% to ~99%+ on known materials; unknown materials still challenging.
Voice Anti-Spoofing¶
- ASVspoof challenge series — Benchmark for replay, TTS, and voice conversion detection.
- Features: Linear frequency cepstral coefficients (LFCCs), constant-Q transform (CQT), raw waveform.
- Models: LCNN, RawNet2, AASIST (graph attention), Wav2Vec 2.0 fine-tuned.
- Deepfake audio challenge: VALL-E, XTTS, and GPT-4o voice cloning create increasingly realistic spoofs; detection relies on subtle codec artifacts and prosody anomalies.
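The challenge-response defense listed in the taxonomy table can be sketched in a few lines: issue an unpredictable phrase, then compare the ASR transcript of the spoken reply against it. Because the phrase is random, a pre-recorded replay cannot contain it, and a real-time TTS/voice-conversion attack must synthesize it on the fly, which is harder and leaves more artifacts. All names here are hypothetical, and a deployed system would combine this with speaker verification and a spoofing-countermeasure score:

```python
import secrets

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def make_challenge(n_digits=4):
    """Draw a random digit phrase for the user to speak aloud."""
    return [secrets.choice(DIGIT_WORDS) for _ in range(n_digits)]

def verify_response(challenge, transcript_words):
    """Check the ASR transcript of the reply against the challenge.

    This sketch checks the phrase only; it is not a full PAD pipeline.
    """
    return [w.lower() for w in transcript_words] == challenge

challenge = make_challenge()
# `transcript_words` would come from an ASR system in practice
assert verify_response(challenge, [w.upper() for w in challenge])
```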
Datasets¶
| Dataset | Modality | Size | Notes |
|---|---|---|---|
| OULU-NPU | Face | 4.9K videos / 55 subjects | 4 protocols testing generalization |
| CelebA-Spoof | Face | 625K images / 10K subjects | Large-scale; rich annotations |
| SiW-Mv2 | Face | 785 videos / unknown attacks | Cross-domain FAS benchmark |
| CASIA-FASD | Face | 600 videos / 50 subjects | Early benchmark; print + replay |
| LivDet-Iris (2017–2023) | Iris | Varies | Contact lens + print attacks |
| LivDet-Fingerprint (2009–2023) | Fingerprint | Varies by year | Multi-material spoof competition |
| ASVspoof 2019/2021/2024 | Voice | ~600K utterances | Replay + TTS + VC attacks |
| WildDeepfake | Face video | 7K clips | In-the-wild deepfake detection |
Challenges¶
- Unseen attack types — A PAD trained on silicone fingers may fail against wood glue or 3D-printed spoofs. Zero-shot and one-class methods are critical.
- Cross-dataset generalization — Domain shift between training and deployment environments (different cameras, lighting, populations).
- Deepfakes — Rapid advances in generative models (diffusion-based face swap, neural voice cloning) outpace detection.
- Computational cost — PAD must run in real-time alongside the biometric matcher, often on resource-constrained devices.
- User experience — Active liveness (blink, turn head, say a phrase) adds friction; passive liveness preferred but harder.
- Adversarial attacks on PAD — Adversarial perturbations can fool PAD classifiers into accepting spoofs.
State of the Art (SOTA)¶
As of early 2026:
- Face PAD (within-dataset): ACER < 1% on OULU-NPU Protocol 1; ACER < 5% on the hardest protocol (unseen attack + environment).
- Face PAD (cross-dataset): HTER 5–10% (FLIP, CDCN with domain generalization).
- Fingerprint PAD (LivDet 2023): average classification error ~1.5% on known materials; ~5% on unknown.
- Iris PAD (LivDet-Iris 2023): APCER < 1% for textured contact lenses.
- Voice PAD (ASVspoof 2024): EER ~2% for the best systems on the LA track; replay detection EER < 1%.
- Deepfake detection: AUC ~95% within-dataset; ~80% cross-dataset. Active research area.
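Several of the voice and deepfake numbers above are equal error rates. The EER is the operating point where the false-acceptance rate on spoofs equals the miss rate on bona fide samples; a minimal threshold-sweep approximation, again assuming higher scores mean "more likely bona fide" (the function name is mine):

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Approximate the EER by sweeping the threshold over all observed
    scores and returning the point where the false-acceptance rate on
    spoofs and the rejection rate on bona fide samples are closest.
    Assumed convention: higher score = more likely bona fide.
    """
    best = None
    for t in sorted(set(bona_fide_scores) | set(spoof_scores)):
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```

On perfectly separable scores the EER is 0; production evaluations use finer-grained tooling (e.g., DET-curve interpolation), but the quantity reported is the same.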
Open Questions¶
- Can a unified PAD model work across all modalities?
- Will generative AI (diffusion models, neural codecs) permanently outpace detection, or will the arms race stabilize?
- How to certify PAD systems for regulatory compliance (EU AI Act, ISO/IEC 30107-3)?
- Can physiological signals (pulse, blood oxygenation) be reliably extracted from consumer-grade sensors for passive liveness?
References¶
- Yu, Z. et al. (2020). Searching Central Difference Convolutional Networks for Face Anti-Spoofing. CVPR.
- Cai, R. et al. (2023). FLIP: Cross-domain Face Anti-Spoofing with Language Guidance. ICCV.
- Todisco, M. et al. (2019). ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection. Interspeech.
- Yambay, D. et al. (2023). LivDet 2023 — Fingerprint Liveness Detection Competition. IJCB.
- ISO/IEC 30107-3:2023. Biometric Presentation Attack Detection.
Backlinks: Facial Recognition Systems, Iris Recognition, Fingerprint Recognition, Voice Biometrics, Palm Recognition, Multimodal Biometrics, Deep Learning Architectures for Biometrics, Biometric Datasets and Benchmarks