Face Recognition Attendance System
Engineering a production-grade face recognition pipeline using InsightFace embeddings and cosine similarity — no model training at deployment time.

Problem
Automate classroom attendance. One photo, 10–15 students recognized in real time, attendance logged with timestamps.
Hard constraints:
- Only 2–3 enrollment images per student
- No model retraining at deployment
- Low false acceptance rate (wrong person marked = unacceptable)
Architecture Decision
Closed-set classifiers (softmax over student IDs) were rejected because adding a student changes the output layer and forces a full retrain. Open-set metric learning was chosen instead: enrolling a student only adds embeddings to the stored gallery.
| Stage | Component |
|---|---|
| Face Detection | InsightFace (handles alignment internally) |
| Feature Extraction | ArcFace — 512-D L2-normalized embedding |
| Similarity | Cosine similarity |
| Decision | Empirically calibrated threshold |
| Storage | Local serialized embeddings (float32) |
| Output | Timestamped CSV attendance log |
No training code runs in production. The system is entirely inference + similarity search.
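
To make the Similarity and Decision stages concrete, here is a minimal sketch that assumes embeddings have already been extracted and are available as NumPy arrays. The names `l2_normalize` and `identify`, and the random toy gallery, are hypothetical and not taken from the project. The 0.40 threshold is the value reported under Threshold Calibration.

```python
import numpy as np

EMBEDDING_DIM = 512  # ArcFace embedding size listed in the table above


def l2_normalize(vectors):
    """Scale each vector to unit length so a dot product equals cosine similarity."""
    vectors = np.asarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / np.maximum(norms, 1e-12)  # guard against division by zero


def identify(query, gallery, names, threshold):
    """Return (name, score) for the best match, or (None, score) if below threshold."""
    query = l2_normalize(query)
    gallery = l2_normalize(gallery)  # a real system would normalize once at enrollment
    scores = gallery @ query         # cosine similarity against every enrolled student
    best = int(np.argmax(scores))
    score = float(scores[best])
    return (names[best] if score >= threshold else None), score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gallery = rng.normal(size=(3, EMBEDDING_DIM))  # stand-in for enrolled embeddings
    names = ["alice", "bob", "carol"]              # hypothetical student IDs
    # A rescaled copy of an enrolled vector should still match after normalization.
    print(identify(gallery[1] * 3.0, gallery, names, threshold=0.40))
    # An unrelated random vector should fall below the threshold.
    print(identify(rng.normal(size=EMBEDDING_DIM), gallery, names, threshold=0.40))
```

Because rejection is just "best score below threshold", an unknown face needs no extra class, which is what keeps the pipeline free of training code.
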
Failures & Fixes
| Failure | Root Cause | Fix |
|---|---|---|
| Wrong person recognized with high confidence | Embeddings not L2-normalized before comparison | Explicit normalization after extraction |
| Recognition quality degraded silently over time | Embeddings stored as Python lists, precision lost on reload | Enforced float32, validated shape on load |
| Same student marked multiple times per session | No temporal deduplication logic | Per-day in-memory cache — one entry per identity per date |
| Worked in tests, failed in real classroom | Test images too clean, no pose/lighting variation | Multi-angle enrollment, averaged prototype embeddings |
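
The fixes in the table could look roughly like the sketch below. It is an illustration under assumptions, not the project's code: `build_prototype`, `load_gallery`, and `AttendanceLog` are hypothetical names, and the gallery is assumed to be stored as a single `.npy` file written with `np.save`.

```python
import csv
from datetime import datetime
from pathlib import Path

import numpy as np

EMBEDDING_DIM = 512


def build_prototype(embeddings):
    """Average several enrollment embeddings into one unit-length prototype."""
    stacked = np.asarray(embeddings, dtype=np.float32)
    stacked = stacked / np.linalg.norm(stacked, axis=1, keepdims=True)  # normalize each view
    mean = stacked.mean(axis=0)
    return (mean / np.linalg.norm(mean)).astype(np.float32)  # re-normalize the average


def load_gallery(path):
    """Load stored prototypes and reject anything with the wrong dtype or shape."""
    gallery = np.load(path, allow_pickle=False)
    if gallery.dtype != np.float32:
        raise ValueError(f"expected float32 embeddings, got {gallery.dtype}")
    if gallery.ndim != 2 or gallery.shape[1] != EMBEDDING_DIM:
        raise ValueError(f"expected shape (n, {EMBEDDING_DIM}), got {gallery.shape}")
    return gallery


class AttendanceLog:
    """CSV log that records each identity at most once per calendar date."""

    def __init__(self, csv_path):
        self.csv_path = Path(csv_path)
        self._seen = set()  # in-memory only: (student_id, ISO date) pairs already written

    def mark(self, student_id, when=None):
        """Append a timestamped row; return False if already marked that day."""
        when = when or datetime.now()
        key = (student_id, when.date().isoformat())
        if key in self._seen:
            return False
        self._seen.add(key)
        with self.csv_path.open("a", newline="") as f:
            csv.writer(f).writerow([student_id, when.isoformat(timespec="seconds")])
        return True
```

A typical flow would be `np.save("gallery.npy", np.stack(prototypes))` at enrollment, `load_gallery("gallery.npy")` at startup, and `log.mark(student_id)` for every accepted match. Since the cache lives in memory, it does not survive a restart in the middle of a session.
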
Threshold Calibration
Initial threshold of 0.6 (intuition-based) caused silent false positives — different people matched with high confidence.
Calibration approach:
- Compute similarity distributions for same-identity and different-identity pairs
- Find the boundary where distributions separate
- Bias toward lower FAR (false acceptance), even at cost of higher FRR (false rejection)
Results:
- Same-identity average similarity: 0.75
- Different-identity average similarity: 0.25
- Final threshold chosen: 0.40
This choice is not dictated by the metrics alone; it is a deliberate business decision. A wrong match in attendance is worse than a missed one.
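
One way to carry out the calibration steps is sketched below. It is a minimal illustration under assumptions: the function names, the `target_far` parameter, and the toy data are hypothetical, and the write-up does not say how the boundary was actually located, so this is not a reconstruction of how 0.40 was obtained.

```python
import numpy as np


def pairwise_scores(embeddings, labels):
    """Split all pairwise cosine similarities into same- and different-identity sets."""
    x = np.asarray(embeddings, dtype=np.float32)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = x @ x.T
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)  # each pair once, no self-pairs
    return sims[same & upper], sims[~same & upper]


def threshold_for_far(genuine, impostor, target_far):
    """Return (threshold, FAR, FRR) for the lowest threshold with FAR <= target_far."""
    genuine = np.asarray(genuine, dtype=np.float64)
    impostor = np.asarray(impostor, dtype=np.float64)
    # Candidates: every observed score, plus one just above the highest impostor score
    # so that a target of zero false acceptances is always reachable.
    above_all = np.nextafter(impostor.max(), np.inf)
    candidates = np.unique(np.concatenate([genuine, impostor, [above_all]]))
    for t in candidates:  # np.unique returns values in ascending order
        far = float(np.mean(impostor >= t))  # impostor pairs wrongly accepted
        if far <= target_far:
            frr = float(np.mean(genuine < t))  # genuine pairs wrongly rejected
            return float(t), far, frr
    raise AssertionError("unreachable: above_all always yields FAR == 0")


if __name__ == "__main__":
    # Toy data only: 3 identities with 3 noisy views each, not real face embeddings.
    rng = np.random.default_rng(0)
    centers = rng.normal(size=(3, 64))
    views = np.repeat(centers, 3, axis=0) + 0.5 * rng.normal(size=(9, 64))
    labels = np.repeat(["a", "b", "c"], 3)
    genuine, impostor = pairwise_scores(views, labels)
    print(threshold_for_far(genuine, impostor, target_far=0.0))
```

Fixing the acceptable FAR first and reading off the resulting FRR keeps the business constraint explicit, rather than picking a threshold and discovering the error rates afterwards.
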
Lessons
- Embedding-based systems fail silently without similarity distribution validation
- Most bugs were representation bugs, not model bugs
- Threshold selection is a business constraint, not a purely mathematical one
- Production ML is about containing failure modes — not maximizing accuracy numbers
Have a Better Approach?
Open-set face recognition has many valid approaches. If you know a better calibration method or a more robust embedding model for low-resource settings, or have thoughts on liveness detection, I'd love to hear from you.