
Anti-Spoofing Techniques

One-line summary: Presentation attack detection (PAD) methods that distinguish genuine biometric samples from spoofs — prints, replays, masks, synthetic audio, silicone fingers, and deepfakes — across all modalities.

Modality: Cross-modal
Related concepts: Facial Recognition Systems, Iris Recognition, Fingerprint Recognition, Voice Biometrics, Palm Recognition, Deep Learning Architectures for Biometrics, Biometric Datasets and Benchmarks
Last updated: 2026-04-04


Overview

A biometric system is only as secure as its ability to reject spoofs (presentation attacks). The ISO/IEC 30107 standard defines:

  • Presentation Attack (PA) — Presenting a biometric artifact to the sensor with intent to interfere with system operation.
  • Presentation Attack Instrument (PAI) — The artifact used (printed photo, silicone finger, replay video, 3D mask, synthesized audio).
  • Presentation Attack Detection (PAD) — The subsystem that detects PAs.

Taxonomy by Modality

| Modality | Common Attack Types | Common Defenses |
|---|---|---|
| Face | Print attack, screen replay, 3D silicone/resin mask, deepfake video | Texture analysis, depth estimation, liveness cues (blink, head motion), rPPG (remote photoplethysmography) |
| Iris | Printed iris, cosmetic contact lens, prosthetic eye, screen replay | Texture/spectral analysis, pupil dynamics, 3D iris structure |
| Fingerprint | Silicone/gelatin/Play-Doh mold, printed fingerprint, 3D-printed ridge pattern | Liveness detection (pulse, sweat, temperature), material classification, distortion analysis |
| Voice | Replay, text-to-speech, voice conversion, deepfake audio | Spectral artifact detection, channel analysis, challenge-response (random phrase) |
| Palm | Printed palm image, synthetic vein overlay | NIR vein liveness (blood flow), depth sensing, texture analysis |

Technical Details

Face Anti-Spoofing (FAS)

Hardware-based:

  • Structured-light depth (Apple Face ID) — a ~30,000-dot projector maps the 3D face; flat prints and screens are rejected.
  • Time-of-flight (ToF) depth — similar depth-based rejection.
  • Multi-spectral imaging — NIR + VIS; skin reflectance differs from paper and screens.
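The depth-based rejection logic above can be sketched in a few lines: a real face has tens of millimetres of relief between nose and cheeks, while a print or screen is planar. The function name and the 15 mm threshold below are illustrative assumptions, not values from any shipping system.

```python
import numpy as np

def is_flat_presentation(depth_map: np.ndarray, min_range_mm: float = 15.0) -> bool:
    """Reject presentations whose facial depth relief is too shallow.

    `min_range_mm` is an illustrative threshold; real systems calibrate
    per sensor and combine this with other cues.
    """
    valid = depth_map[depth_map > 0]           # ignore missing-depth pixels
    if valid.size == 0:
        return True                            # no depth signal at all: reject
    # Robust percentiles so sensor noise/outliers don't inflate the range.
    relief = np.percentile(valid, 95) - np.percentile(valid, 5)
    return bool(relief < min_range_mm)

# Flat plane (print/screen) vs. a crude "face" with a raised nose region.
flat = np.full((64, 64), 400.0)                # every pixel at 400 mm
face = flat.copy()
face[24:40, 24:40] -= 30.0                     # central region 30 mm closer
print(is_flat_presentation(flat))  # → True  (rejected as spoof)
print(is_flat_presentation(face))  # → False (plausible 3D relief)
```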

Software-based (single RGB camera):

| Method | Year | Approach |
|---|---|---|
| LBP / texture analysis | 2011+ | Local Binary Patterns detect micro-texture differences between real skin and print/screen |
| Moiré pattern detection | 2015+ | Detect moiré artifacts from screen replay |
| Depth from RGB (PRNet, 3DDFA) | 2018+ | Estimate 3D face shape; flat → spoof |
| rPPG (remote PPG) | 2016+ | Detect subtle pulse-induced color changes in facial skin; absent in spoofs |
| CDCN (Yu et al.) | 2020 | Central Difference Convolution for fine-grained texture + depth map supervision |
| ViTranZFAS | 2022 | Vision Transformer for zero-shot cross-dataset FAS |
| FLIP (Srivatsan et al.) | 2023 | Foundation model (CLIP) adapted for FAS; strong cross-dataset generalization |
| One-Class FAS | 2024 | Train only on real faces; flag anomalies as spoofs; handles unseen attack types |
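The LBP row above is simple enough to show concretely. This is a minimal 8-neighbour LBP histogram extractor, a sketch of the feature stage only; in a classic FAS pipeline the histogram would feed a learned classifier such as an SVM, which is omitted here.

```python
import numpy as np

def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """Basic 8-neighbour Local Binary Pattern histogram (256 bins, L1-normalised).

    Each pixel gets an 8-bit code: one bit per neighbour, set when the
    neighbour is >= the centre. Micro-texture statistics like these
    differ between live skin and paper prints or screen replays.
    """
    c = gray[1:-1, 1:-1]
    # Eight neighbours, clockwise from top-left; each contributes one bit.
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        neighbour = gray[1 + dy:gray.shape[0] - 1 + dy, 1 + dx:gray.shape[1] - 1 + dx]
        codes |= (neighbour >= c).astype(np.uint8) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
noisy = rng.integers(0, 256, (100, 100)).astype(np.uint8)   # texture-rich patch
smooth = np.full((100, 100), 128, dtype=np.uint8)           # texture-free patch
h_noisy, h_smooth = lbp_histogram(noisy), lbp_histogram(smooth)
# A perfectly flat patch yields only code 255 (every neighbour >= centre).
print(h_smooth[255])  # → 1.0
```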

Key challenge: Cross-dataset generalization. Models trained on one dataset (e.g., CelebA-Spoof) often fail on another (e.g., OULU-NPU). Domain generalization and domain adaptation are active research areas.

Iris Anti-Spoofing

  • Textured contact lens detection — Analyze high-frequency patterns; deep classifiers (D-NetPAD) achieve ~99% accuracy on LivDet-Iris.
  • Printed iris — Detect print artifacts (dot patterns, paper texture, lack of specular reflection).
  • Pupil dynamics — Real pupils dilate/constrict in response to light; static images don't.
  • 3D structure — Real irises are concave; prints are flat.
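The pupil-dynamics cue in the list above reduces to a simple temporal test: track the pupil radius across frames and require some variation. The 5% relative-variation threshold below is an assumption for illustration; real systems model the light-stimulus response curve rather than raw range.

```python
import numpy as np

def pupil_is_dynamic(radii_px, min_rel_variation: float = 0.05) -> bool:
    """Liveness cue from pupil dynamics (illustrative threshold).

    Real pupils oscillate (hippus) and constrict/dilate with light;
    a printed iris or static replay shows a near-constant radius.
    """
    r = np.asarray(radii_px, dtype=float)
    variation = (r.max() - r.min()) / r.mean()
    return bool(variation >= min_rel_variation)

live_track = [31, 33, 36, 34, 30, 29, 32]    # radii from a live eye, per frame
spoof_track = [32, 32, 32, 32, 32, 32, 32]   # static print: no dynamics
print(pupil_is_dynamic(live_track))   # → True
print(pupil_is_dynamic(spoof_track))  # → False
```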

Fingerprint Anti-Spoofing (Liveness Detection)

  • Hardware: Multispectral sensors capture subsurface features (sweat pores, dermal ridges). Pulse oximetry detects blood flow.
  • Software: CNN classifiers on fingerprint images (LivDet competition). Texture-based: pore analysis, ridge distortion patterns. Material-specific: silicone, gelatin, wood glue have different optical properties.
  • LivDet competition (2009–2023): Accuracy has improved from ~90% to ~99%+ on known materials; unknown materials still challenging.
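One software-side liveness cue mentioned above, perspiration, can be sketched as a two-frame comparison: moisture spreading from sweat pores changes ridge intensities between captures a second apart, while a silicone or gelatin mold stays static. The score and the synthetic frames below are purely illustrative.

```python
import numpy as np

def perspiration_score(frame_t0: np.ndarray, frame_t1: np.ndarray) -> float:
    """Mean absolute intensity change between two captures ~1 s apart.

    Live fingers perspire, so ridge gray levels drift between frames;
    a mold does not. A deployed system would threshold this score per
    sensor and fuse it with texture/material cues.
    """
    diff = np.abs(frame_t1.astype(float) - frame_t0.astype(float))
    return float(diff.mean())

rng = np.random.default_rng(1)
base = rng.integers(60, 200, (128, 128)).astype(np.uint8)       # first capture
live_t1 = np.clip(base.astype(int) - rng.integers(0, 12, base.shape), 0, 255)
spoof_t1 = base.copy()                                          # mold: static
print(perspiration_score(base, live_t1) > perspiration_score(base, spoof_t1))  # → True
```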

Voice Anti-Spoofing

  • ASVspoof challenge series — Benchmark for replay, TTS, and voice conversion detection.
  • Features: Linear frequency cepstral coefficients (LFCCs), constant-Q transform (CQT), raw waveform.
  • Models: LCNN, RawNet2, AASIST (graph attention), Wav2Vec 2.0 fine-tuned.
  • Deepfake audio challenge: VALL-E, XTTS, and GPT-4o voice cloning create increasingly realistic spoofs; detection relies on subtle codec artifacts and prosody anomalies.
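The channel-analysis idea in the list above can be illustrated with a toy spectral feature: replayed audio passes through a loudspeaker, the room, and a second microphone, which typically attenuates the highest band relative to genuine close-talk speech. Real systems use richer features (LFCC, CQT, raw waveform) and learned classifiers; this ratio only demonstrates the artifact being exploited, and the signals below are synthetic.

```python
import numpy as np

def highband_energy_ratio(signal: np.ndarray, sr: int, cutoff_hz: int = 6000) -> float:
    """Fraction of spectral energy above `cutoff_hz` (toy replay cue)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sr)
    return float(spectrum[freqs >= cutoff_hz].sum() / spectrum.sum())

sr = 16000
t = np.arange(sr) / sr
# Genuine speech stand-in: strong low tone plus visible 7 kHz content.
genuine = np.sin(2 * np.pi * 300 * t) + 0.3 * np.sin(2 * np.pi * 7000 * t)
# Replay stand-in: same content after loudspeaker-like high-band attenuation.
replayed = np.sin(2 * np.pi * 300 * t) + 0.03 * np.sin(2 * np.pi * 7000 * t)
print(highband_energy_ratio(genuine, sr) > highband_energy_ratio(replayed, sr))  # → True
```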

Datasets

| Dataset | Modality | Size | Notes |
|---|---|---|---|
| OULU-NPU | Face | 4.9K videos / 55 subjects | 4 protocols testing generalization |
| CelebA-Spoof | Face | 625K images / 10K subjects | Large-scale; rich annotations |
| SiW-Mv2 | Face | 785 videos / unknown attacks | Cross-domain FAS benchmark |
| CASIA-FASD | Face | 600 videos / 50 subjects | Early benchmark; print + replay |
| LivDet-Iris (2017–2023) | Iris | Varies | Contact lens + print attacks |
| LivDet-Fingerprint (2009–2023) | Fingerprint | Varies by year | Multi-material spoof competition |
| ASVspoof 2019/2021/2024 | Voice | ~600K utterances | Replay + TTS + VC attacks |
| WildDeepfake | Face video | 7K clips | In-the-wild deepfake detection |

Challenges

  • Unseen attack types — A PAD trained on silicone fingers may fail against wood glue or 3D-printed spoofs. Zero-shot and one-class methods are critical.
  • Cross-dataset generalization — Domain shift between training and deployment environments (different cameras, lighting, populations).
  • Deepfakes — Rapid advances in generative models (diffusion-based face swap, neural voice cloning) outpace detection.
  • Computational cost — PAD must run in real-time alongside the biometric matcher, often on resource-constrained devices.
  • User experience — Active liveness (blink, turn head, say a phrase) adds friction; passive liveness preferred but harder.
  • Adversarial attacks on PAD — Adversarial perturbations can fool PAD classifiers into accepting spoofs.

State of the Art (SOTA)

As of early 2026:

  • Face PAD (within-dataset): ACER < 1% on OULU-NPU Protocol 1; ACER < 5% on the hardest protocol (unseen attack + environment).
  • Face PAD (cross-dataset): HTER 5–10% (FLIP, CDCN with domain generalization).
  • Fingerprint PAD (LivDet 2023): average classification error ~1.5% on known materials; ~5% on unknown.
  • Iris PAD (LivDet-Iris 2023): APCER < 1% for textured contact lenses.
  • Voice PAD (ASVspoof 2024): EER ~2% for the best systems on the LA track; replay detection EER < 1%.
  • Deepfake detection: AUC ~95% within-dataset; ~80% cross-dataset. Active research area.
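The error rates quoted above follow the ISO/IEC 30107-3 vocabulary, which is mechanical to compute from labelled decisions. A minimal sketch (the function name and toy data are my own):

```python
import numpy as np

def pad_metrics(labels: np.ndarray, decisions: np.ndarray):
    """ISO/IEC 30107-3 style error rates for a PAD subsystem.

    labels:    1 = attack presentation, 0 = bona fide presentation
    decisions: 1 = classified as attack, 0 = classified as bona fide

    APCER: proportion of attacks wrongly accepted as bona fide.
    BPCER: proportion of bona fide presentations wrongly rejected.
    ACER:  their average, widely reported in FAS papers.
    """
    attacks = labels == 1
    bona_fide = labels == 0
    apcer = float((decisions[attacks] == 0).mean())
    bpcer = float((decisions[bona_fide] == 1).mean())
    return apcer, bpcer, (apcer + bpcer) / 2

labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
decisions = np.array([1, 1, 1, 0, 0, 0, 1, 0])  # one missed attack, one false reject
print(pad_metrics(labels, decisions))  # → (0.25, 0.25, 0.25)
```

In a benchmark report, APCER is usually given per attack-instrument species and the worst species is the headline number.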

Open Questions

  • Can a unified PAD model work across all modalities?
  • Will generative AI (diffusion models, neural codecs) permanently outpace detection, or will the arms race stabilize?
  • How to certify PAD systems for regulatory compliance (EU AI Act, ISO 30107-3)?
  • Can physiological signals (pulse, blood oxygenation) be reliably extracted from consumer-grade sensors for passive liveness?

References

  • Yu, Z. et al. (2020). Searching Central Difference Convolutional Networks for Face Anti-Spoofing. CVPR.
  • Srivatsan, K. et al. (2023). FLIP: Cross-domain Face Anti-Spoofing with Language Guidance. ICCV.
  • Todisco, M. et al. (2019). ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection. Interspeech.
  • Yambay, D. et al. (2023). LivDet 2023 — Fingerprint Liveness Detection Competition. IJCB.
  • ISO/IEC 30107-3:2023. Biometric Presentation Attack Detection.

Backlinks: Facial Recognition Systems, Iris Recognition, Fingerprint Recognition, Voice Biometrics, Palm Recognition, Multimodal Biometrics, Deep Learning Architectures for Biometrics, Biometric Datasets and Benchmarks