Face Recognition Attendance System

Problem

Automate classroom attendance: from a single photo, recognize 10–15 students in real time and log attendance with timestamps.

Hard constraints:

Only 2–3 enrollment images per student
No model retraining at deployment
Low false acceptance rate (wrong person marked = unacceptable)

Architecture Decision

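Closed-set classifiers (a softmax over student IDs) were rejected: the output layer has one logit per enrolled student, so every new student requires retraining. Open-set metric learning was chosen instead; enrolling a new student only requires storing embeddings.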
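Because the model is frozen, adding a student reduces to storing a prototype embedding. The following is a minimal sketch. It assumes NumPy and assumes the 512-D embeddings have already been produced by the extraction stage. `Gallery`, `enroll`, and `l2_normalize` are hypothetical names, not a library API. The sketch averages the normalized enrollment embeddings and re-normalizes the mean, mirroring the averaged-prototype fix listed under Failures & Fixes.

```python
from __future__ import annotations

import numpy as np

EMBED_DIM = 512  # ArcFace embedding size, per the architecture table


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    v = np.asarray(v, dtype=np.float32)
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return v / norm


class Gallery:
    """Hypothetical in-memory gallery: one prototype embedding per student."""

    def __init__(self) -> None:
        self.prototypes: dict[str, np.ndarray] = {}

    def enroll(self, student_id: str, embeddings: list[np.ndarray]) -> None:
        """Store the re-normalized mean of the student's normalized embeddings."""
        if not embeddings:
            raise ValueError("need at least one enrollment embedding")
        unit = [l2_normalize(e) for e in embeddings]
        for u in unit:
            if u.shape != (EMBED_DIM,):
                raise ValueError(f"expected shape ({EMBED_DIM},), got {u.shape}")
        # Averaging 2-3 views (e.g. different angles) smooths pose/lighting noise;
        # re-normalizing keeps the prototype on the unit sphere.
        self.prototypes[student_id] = l2_normalize(np.mean(unit, axis=0))
```

Enrolling a new student touches only the `prototypes` dictionary. Model weights never change, which is exactly the property that ruled out the closed-set classifier.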

Stage	Component
Face Detection	InsightFace (handles alignment internally)
Feature Extraction	ArcFace — 512-D L2-normalized embedding
Similarity	Cosine similarity
Decision	Empirically calibrated threshold
Storage	Local serialized embeddings (float32)
Output	Timestamped CSV attendance log
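The Similarity and Decision stages amount to a nearest-neighbour search with a rejection option. The following is a minimal sketch. It assumes NumPy, unit-length float32 embeddings, and a prototype matrix with one row per enrolled student. `match_face` is a hypothetical helper. The 0.40 default is the threshold reported under Threshold Calibration.

```python
from __future__ import annotations

import numpy as np


def match_face(
    query: np.ndarray,
    prototypes: np.ndarray,
    student_ids: list[str],
    threshold: float = 0.40,
) -> tuple[str | None, float]:
    """Return (student_id, score) for the best match, or (None, score) if rejected.

    Assumes `query` has shape (512,), `prototypes` has shape (n, 512), and all
    vectors are L2-normalized, so each dot product is a cosine similarity.
    """
    if len(student_ids) == 0:
        return None, float("-inf")             # empty gallery: nothing to match
    scores = prototypes @ query                # similarity to every enrolled student
    best = int(np.argmax(scores))
    best_score = float(scores[best])
    if best_score < threshold:
        return None, best_score                # open-set rejection: treat as unknown
    return student_ids[best], best_score
```

This sketch returns the score on a rejection as well as on a match. Logging near-threshold scores provides data for revisiting the threshold later.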

No training code runs in production. The system is entirely inference + similarity search.
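The last stage in the table, the timestamped CSV log, is plain bookkeeping. The sketch below uses only the standard library. It applies the per-day rule from Failures & Fixes (one entry per identity per date) through an in-memory set. The class name, CSV columns, and file path are illustrative assumptions.

```python
from __future__ import annotations

import csv
from datetime import datetime
from pathlib import Path


class AttendanceLog:
    """Hypothetical CSV logger: at most one row per (student_id, date)."""

    def __init__(self, path: str | Path) -> None:
        self.path = Path(path)
        self._seen: set[tuple[str, str]] = set()  # per-day in-memory cache

    def mark(self, student_id: str, when: datetime | None = None) -> bool:
        """Append a row unless the student is already marked for that date.

        Returns True if a row was written, False if it was a duplicate.
        """
        when = when or datetime.now()
        key = (student_id, when.date().isoformat())
        if key in self._seen:
            return False
        self._seen.add(key)
        new_file = not self.path.exists()
        with self.path.open("a", newline="") as f:
            writer = csv.writer(f)
            if new_file:
                writer.writerow(["student_id", "date", "timestamp"])
            writer.writerow([student_id, key[1], when.isoformat(timespec="seconds")])
        return True
```

Because the cache lives only in memory, restarting the process mid-day would allow a second entry for the same student. One option is to rebuild `_seen` from the existing CSV on startup.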

Failures & Fixes

Failure	Root Cause	Fix
Wrong person recognized with high confidence	Embeddings not L2-normalized before comparison, so dot-product scores were not true cosine similarities	Explicit normalization after extraction
Recognition quality degraded silently over time	Embeddings stored as Python lists, precision lost on reload	Enforced `float32`, validated shape on load
Same student marked multiple times per session	No temporal deduplication logic	Per-day in-memory cache — one entry per identity per date
Worked in tests, failed in real classroom	Test images too clean, no pose/lighting variation	Multi-angle enrollment, averaged prototype embeddings
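The storage fix amounts to treating dtype, shape, and norm as checked invariants rather than assumptions. The following is a minimal sketch. It assumes NumPy's `.npz` format and a hypothetical layout: an `ids` array alongside an `(n, 512)` float32 `embeddings` matrix. `save_gallery` and `load_gallery` are illustrative names. The unit-norm check on load also catches the missing-normalization bug from the first row of the table.

```python
from __future__ import annotations

import numpy as np

EMBED_DIM = 512


def save_gallery(path: str, student_ids: list[str], embeddings: np.ndarray) -> None:
    """Store prototypes as a float32 (n, 512) matrix plus a parallel ID array."""
    emb = np.asarray(embeddings, dtype=np.float32)
    if emb.shape != (len(student_ids), EMBED_DIM):
        raise ValueError(f"bad embedding matrix shape {emb.shape}")
    np.savez(path, ids=np.asarray(student_ids, dtype=str), embeddings=emb)


def load_gallery(path: str, atol: float = 1e-4) -> tuple[list[str], np.ndarray]:
    """Load and validate: dtype, shape, and unit norm are checked, not assumed."""
    with np.load(path, allow_pickle=False) as data:
        ids = [str(s) for s in data["ids"]]
        emb = data["embeddings"]  # read fully into memory before the file closes
    if emb.dtype != np.float32:
        raise ValueError(f"expected float32, got {emb.dtype}")
    if emb.shape != (len(ids), EMBED_DIM):
        raise ValueError(f"expected shape ({len(ids)}, {EMBED_DIM}), got {emb.shape}")
    norms = np.linalg.norm(emb, axis=1)
    if not np.allclose(norms, 1.0, atol=atol):
        raise ValueError("embeddings are not L2-normalized")
    return ids, emb
```

Failing loudly on load turns a silent quality regression into an immediate error.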

Threshold Calibration

Initial threshold of 0.6 (intuition-based) caused silent false positives — different people matched with high confidence.

Calibration approach:

Compute cosine-similarity distributions for same-identity (genuine) and different-identity (impostor) pairs
Locate the score range where the two distributions separate
Bias the threshold toward a lower FAR (false acceptance rate), even at the cost of a higher FRR (false rejection rate)
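One way to turn these steps into a number is to sweep candidate thresholds over genuine and impostor scores and keep the lowest threshold whose FAR stays under a target. This is a sketch under stated assumptions: the score arrays come from held-out labelled pairs, the FAR target `max_far` is a free parameter, and `far_frr` and `pick_threshold` are hypothetical helpers. It is not necessarily the exact procedure used here.

```python
from __future__ import annotations

import numpy as np


def far_frr(genuine, impostor, threshold: float) -> tuple[float, float]:
    """Return (FAR, FRR) at a threshold; a pair is accepted when score >= threshold.

    FAR = share of impostor (different-identity) pairs accepted.
    FRR = share of genuine (same-identity) pairs rejected.
    """
    genuine = np.asarray(genuine, dtype=np.float64)
    impostor = np.asarray(impostor, dtype=np.float64)
    far = float(np.mean(impostor >= threshold))
    frr = float(np.mean(genuine < threshold))
    return far, frr


def pick_threshold(genuine, impostor, max_far: float = 0.0) -> float:
    """Lowest observed score whose FAR is <= max_far.

    FAR is capped first; among thresholds meeting the cap, the lowest one
    rejects the fewest genuine pairs, so FRR is minimized second.
    """
    genuine = np.asarray(genuine, dtype=np.float64)
    impostor = np.asarray(impostor, dtype=np.float64)
    candidates = np.unique(np.concatenate([genuine, impostor]))  # sorted ascending
    for t in candidates:  # FAR never increases as t grows
        far, _ = far_frr(genuine, impostor, float(t))
        if far <= max_far:
            return float(t)
    # No observed score meets the cap: step just above the highest impostor score.
    return float(np.nextafter(impostor.max(), np.inf))
```

With `max_far=0.0`, the result is the lowest observed score above every impostor score, or just above the highest impostor score if no such score exists. A small calibration set estimates FAR only coarsely, so leaving some margin above the returned value is a reasonable precaution.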

Same-identity avg similarity:      0.75
Different-identity avg similarity:  0.25
Final threshold chosen:             0.40

The distributions bound the choice, but the final value is not the output of a metric optimization; it is a deliberate business decision. A wrong match in attendance is worse than a missed one.

Lessons

Embedding-based systems fail silently without similarity distribution validation
Most bugs were representation bugs, not model bugs
Threshold selection is a business constraint, not a purely mathematical one
Production ML is about containing failure modes — not maximizing accuracy numbers
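The first lesson suggests a cheap check that can run whenever the gallery changes: compare the enrolled students against each other. Prototype-to-prototype similarity is only a proxy for the different-identity distribution, but any pair that already scores above the threshold is a likely false acceptance. `audit_gallery` is a hypothetical helper that assumes NumPy and L2-normalized prototypes.

```python
from __future__ import annotations

import numpy as np


def audit_gallery(
    prototypes: np.ndarray, student_ids: list[str], threshold: float = 0.40
) -> list[tuple[str, str, float]]:
    """Return student pairs whose prototypes score at or above the threshold.

    Any such pair is a candidate false acceptance: a photo of one student could
    be matched to the other. Results are sorted from most to least similar.
    """
    sims = prototypes @ prototypes.T                       # pairwise cosine similarities
    rows, cols = np.triu_indices(len(student_ids), k=1)    # each pair once, no self-pairs
    flagged = [
        (student_ids[r], student_ids[c], float(sims[r, c]))
        for r, c in zip(rows, cols)
        if sims[r, c] >= threshold
    ]
    return sorted(flagged, key=lambda pair: pair[2], reverse=True)
```

This check does not replace calibration on real classroom photos. It does catch look-alike enrollments, and it catches unnormalized embeddings, which push nearly every pair above the threshold.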

Have a Better Approach?

Open-set face recognition has many valid approaches. If you know a better calibration method or a more robust embedding model for low-resource settings, or have thoughts on liveness detection, I'd love to hear from you.
