Anti-Spoofing Techniques¶
One-line summary: Presentation attack detection (PAD) methods that distinguish genuine biometric samples from spoofs — prints, replays, masks, synthetic audio, silicone fingers, and deepfakes — across all modalities.
Modality: Cross-modal
Related concepts: Facial Recognition Systems, Iris Recognition, Fingerprint Recognition, Voice Biometrics, Palm Recognition, Deep Learning Architectures for Biometrics, Biometric Datasets and Benchmarks
Last updated: 2026-04-04
Overview¶
A biometric system is only as secure as its ability to reject spoofs (presentation attacks). The ISO/IEC 30107 standard defines:
- Presentation Attack (PA) — Presenting a biometric artifact to the sensor with intent to interfere with system operation.
- Presentation Attack Instrument (PAI) — The artifact used (printed photo, silicone finger, replay video, 3D mask, synthesized audio).
- Presentation Attack Detection (PAD) — The subsystem that detects PAs.
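ISO/IEC 30107-3 evaluates PAD subsystems with two complementary error rates: APCER (the proportion of presentation attacks wrongly accepted as bona fide) and BPCER (the proportion of bona fide presentations wrongly rejected), often summarized as their average, ACER. A minimal sketch of these metrics, assuming the common convention that higher scores mean "more likely bona fide" (the function name and toy scores are illustrative, not from any standard implementation):

```python
def pad_metrics(bona_fide_scores, attack_scores, threshold):
    """ISO/IEC 30107-3 style PAD error rates at one operating threshold.

    Assumed convention: higher score = more likely bona fide.
    """
    # APCER: fraction of presentation attacks accepted as bona fide
    apcer = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    # BPCER: fraction of bona fide presentations rejected as attacks
    bpcer = sum(s < threshold for s in bona_fide_scores) / len(bona_fide_scores)
    # ACER: average of the two, commonly reported as a single summary number
    acer = (apcer + bpcer) / 2
    return apcer, bpcer, acer

# Toy example with made-up scores
apcer, bpcer, acer = pad_metrics(
    bona_fide_scores=[0.9, 0.8, 0.95, 0.7, 0.6],
    attack_scores=[0.1, 0.3, 0.55, 0.2],
    threshold=0.5,
)
```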
Taxonomy by Modality¶
| Modality | Common Attack Types | Common Defenses |
|---|---|---|
| Face | Print attack, screen replay, 3D silicone/resin mask, deepfake video | Texture analysis, depth estimation, liveness cues (blink, head motion), rPPG (remote photoplethysmography) |
| Iris | Printed iris, cosmetic contact lens, prosthetic eye, screen replay | Texture/spectral analysis, pupil dynamics, 3D iris structure |
| Fingerprint | Silicone/gelatin/Play-Doh mold, printed fingerprint, 3D-printed ridge pattern | Liveness detection (pulse, sweat, temperature), material classification, distortion analysis |
| Voice | Replay, text-to-speech, voice conversion, deepfake audio | Spectral artifact detection, channel analysis, challenge-response (random phrase) |
| Palm | Printed palm image, synthetic vein overlay | NIR vein liveness (blood flow), depth sensing, texture analysis |
Technical Details¶
Face Anti-Spoofing (FAS)¶
Hardware-based:
- Structured-light depth (Apple Face ID) — 30K dot projector maps 3D face; rejects flat prints/screens.
- Time-of-flight (ToF) depth — similar depth-based rejection.
- Multi-spectral imaging — NIR + VIS; skin reflectance differs from paper/screen.
Software-based (single RGB camera):

| Method | Year | Approach |
|---|---|---|
| LBP / texture analysis | 2011+ | Local Binary Patterns detect micro-texture differences between real skin and print/screen |
| Moiré pattern detection | 2015+ | Detect moiré artifacts from screen replay |
| Depth from RGB (PRNet, 3DDFA) | 2018+ | Estimate 3D face shape; flat → spoof |
| rPPG (remote PPG) | 2016+ | Detect subtle pulse-induced color changes in facial skin; absent in spoofs |
| CDCN (Yu et al.) | 2020 | Central Difference Convolution for fine-grained texture + depth-map supervision |
| ViTranZFAS | 2022 | Vision Transformer for zero-shot cross-dataset FAS |
| One-class FAS | 2024 | Train only on real faces; detect anomalies as spoofs; handles unseen attack types |
| FLIP (Cai et al.) | 2023 | Foundation model (CLIP) adapted for FAS; strong cross-dataset generalization |
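To make the oldest entry in the table concrete, here is a deliberately simplified 8-neighbor, radius-1 LBP in NumPy. This is a sketch of the idea only: production systems typically use multi-scale, rotation-invariant variants (e.g., scikit-image's `local_binary_pattern`) and feed the histogram to a trained classifier; the function names here are my own.

```python
import numpy as np

def lbp_8neighbor(img):
    """Basic 8-neighbor Local Binary Pattern (radius 1).

    Each interior pixel is encoded by thresholding its 8 neighbors
    against the center value, yielding an 8-bit code in [0, 255].
    Real skin, paper prints, and screens produce different code
    distributions because of their micro-texture.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # Neighbor offsets in clockwise order, starting at the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy: h - 1 + dy, 1 + dx: w - 1 + dx]
        codes |= (neighbor >= center).astype(np.uint8) << bit
    return codes

def lbp_histogram(img):
    """Normalized 256-bin LBP code histogram: the feature vector a
    classifier (e.g., an SVM) would consume to separate real vs. spoof."""
    hist, _ = np.histogram(lbp_8neighbor(img), bins=256, range=(0, 256))
    return hist / hist.sum()
```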
Key challenge: cross-dataset generalization. Models trained on one dataset (e.g., CelebA-Spoof) often fail on another (e.g., OULU-NPU) because of domain shift in cameras, lighting, and attack instruments. Domain generalization and domain adaptation are active research areas.
Iris Anti-Spoofing¶
- Textured contact lens detection — Analyze high-frequency patterns; deep classifiers (D-NetPAD) achieve ~99% accuracy on LivDet-Iris.
- Printed iris — Detect print artifacts (dot patterns, paper texture, lack of specular reflection).
- Pupil dynamics — Real pupils dilate/constrict in response to light; static images don't.
- 3D structure — Real irises are concave; prints are flat.
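The pupil-dynamics cue above reduces to a simple signal-level check: track the estimated pupil radius across frames while the illumination changes, and require a noticeable constriction/dilation. A toy sketch (the function name and the 10% threshold are illustrative assumptions, not values from any deployed system):

```python
import statistics

def pupil_responds(radii_px, min_relative_change=0.1):
    """Toy pupil-dynamics liveness check.

    Given per-frame pupil radius estimates (pixels) captured while a
    light stimulus is applied, a live eye should constrict or dilate
    noticeably, while a printed iris or static replay stays essentially
    constant. The 10% relative-change threshold is an untuned assumption.
    """
    mean_r = statistics.fmean(radii_px)
    span = max(radii_px) - min(radii_px)
    return (span / mean_r) >= min_relative_change
```

For example, a live eye constricting under a flash (`[21.0, 20.5, 18.2, 16.9, 17.4]`) passes, while a static print (`[12.0, 12.1, 11.9, 12.0]`) fails.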
Fingerprint Anti-Spoofing (Liveness Detection)¶
- Hardware: Multispectral sensors capture subsurface features (sweat pores, dermal ridges). Pulse oximetry detects blood flow.
- Software: CNN classifiers on fingerprint images (LivDet competition). Texture-based: pore analysis, ridge distortion patterns. Material-specific: silicone, gelatin, wood glue have different optical properties.
- LivDet competition (2009–2023): Accuracy has improved from ~90% to ~99%+ on known materials; unknown materials still challenging.
Voice Anti-Spoofing¶
- ASVspoof challenge series — Benchmark for replay, TTS, and voice conversion detection.
- Features: Linear frequency cepstral coefficients (LFCCs), constant-Q transform (CQT), raw waveform.
- Models: LCNN, RawNet2, AASIST (graph attention), Wav2Vec 2.0 fine-tuned.
- Deepfake audio challenge: VALL-E, XTTS, and GPT-4o voice cloning create increasingly realistic spoofs; detection relies on subtle codec artifacts and prosody anomalies.
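The challenge-response defense listed in the taxonomy table can be sketched in a few lines: issue an unpredictable phrase, then compare the ASR transcript of the spoken reply against it. Because the phrase is random, a pre-recorded replay cannot contain it, and a real-time TTS/voice-conversion attack must synthesize it on the fly, which is harder and leaves more artifacts. All names here are hypothetical, and a deployed system would combine this with speaker verification and a spoofing-countermeasure score:

```python
import secrets

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def make_challenge(n_digits=4):
    """Draw a random digit phrase for the user to speak aloud."""
    return [secrets.choice(DIGIT_WORDS) for _ in range(n_digits)]

def verify_response(challenge, transcript_words):
    """Check the ASR transcript of the reply against the challenge.

    This sketch checks the phrase only; it is not a full PAD pipeline.
    """
    return [w.lower() for w in transcript_words] == challenge

challenge = make_challenge()
# `transcript_words` would come from an ASR system in practice
assert verify_response(challenge, [w.upper() for w in challenge])
```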
Datasets¶
| Dataset | Modality | Size | Notes |
|---|---|---|---|
| OULU-NPU | Face | 4.9K videos / 55 subjects | 4 protocols testing generalization |
| CelebA-Spoof | Face | 625K images / 10K subjects | Large-scale; rich annotations |
| SiW-Mv2 | Face | 785 videos / unknown attacks | Cross-domain FAS benchmark |
| CASIA-FASD | Face | 600 videos / 50 subjects | Early benchmark; print + replay |
| LivDet-Iris (2017–2023) | Iris | Varies | Contact lens + print attacks |
| LivDet-Fingerprint (2009–2023) | Fingerprint | Varies by year | Multi-material spoof competition |
| ASVspoof 2019/2021/2024 | Voice | ~600K utterances | Replay + TTS + VC attacks |
| WildDeepfake | Face video | 7K clips | In-the-wild deepfake detection |
Challenges¶
- Unseen attack types — A PAD trained on silicone fingers may fail against wood glue or 3D-printed spoofs. Zero-shot and one-class methods are critical.
- Cross-dataset generalization — Domain shift between training and deployment environments (different cameras, lighting, populations).
- Deepfakes — Rapid advances in generative models (diffusion-based face swap, neural voice cloning) outpace detection.
- Computational cost — PAD must run in real-time alongside the biometric matcher, often on resource-constrained devices.
- User experience — Active liveness (blink, turn head, say a phrase) adds friction; passive liveness preferred but harder.
- Adversarial attacks on PAD — Adversarial perturbations can fool PAD classifiers into accepting spoofs.
State of the Art (SOTA)¶
As of early 2026:
- Face PAD (within-dataset): ACER < 1% on OULU-NPU Protocol 1; ACER < 5% on the hardest protocol (unseen attack + environment).
- Face PAD (cross-dataset): HTER 5–10% (FLIP, CDCN with domain generalization).
- Fingerprint PAD (LivDet 2023): average classification error ~1.5% on known materials; ~5% on unknown.
- Iris PAD (LivDet-Iris 2023): APCER < 1% for textured contact lenses.
- Voice PAD (ASVspoof 2024): EER ~2% for the best systems on the LA track; replay detection EER < 1%.
- Deepfake detection: AUC ~95% within-dataset; ~80% cross-dataset. Active research area.
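Several of the voice and deepfake numbers above are equal error rates. The EER is the operating point where the false-acceptance rate on spoofs equals the miss rate on bona fide samples; a minimal threshold-sweep approximation, again assuming higher scores mean "more likely bona fide" (the function name is mine):

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Approximate the EER by sweeping the threshold over all observed
    scores and returning the point where the false-acceptance rate on
    spoofs and the rejection rate on bona fide samples are closest.
    Assumed convention: higher score = more likely bona fide.
    """
    best = None
    for t in sorted(set(bona_fide_scores) | set(spoof_scores)):
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```

On perfectly separable scores the EER is 0; production evaluations use finer-grained tooling (e.g., DET-curve interpolation), but the quantity reported is the same.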
Open Questions¶
- Can a unified PAD model work across all modalities?
- Will generative AI (diffusion models, neural codecs) permanently outpace detection, or will the arms race stabilize?
- How to certify PAD systems for regulatory compliance (EU AI Act, ISO/IEC 30107-3)?
- Can physiological signals (pulse, blood oxygenation) be reliably extracted from consumer-grade sensors for passive liveness?
References¶
- Yu, Z. et al. (2020). Searching Central Difference Convolutional Networks for Face Anti-Spoofing. CVPR.
- Cai, R. et al. (2023). FLIP: Cross-domain Face Anti-Spoofing with Language Guidance. ICCV.
- Todisco, M. et al. (2019). ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection. Interspeech.
- Yambay, D. et al. (2023). LivDet 2023 — Fingerprint Liveness Detection Competition. IJCB.
- ISO/IEC 30107-3:2023. Biometric Presentation Attack Detection.
Backlinks: Facial Recognition Systems, Iris Recognition, Fingerprint Recognition, Voice Biometrics, Palm Recognition, Multimodal Biometrics, Deep Learning Architectures for Biometrics, Biometric Datasets and Benchmarks