From Camera Feed to Confidence: Building a Production-Ready Face Recognition Attendance System

Abstract

This post documents the end-to-end engineering journey of building a face detection and recognition system for automated attendance, designed for real-world deployment in low-resource educational settings. Unlike polished tutorials, this write-up focuses on the actual engineering work: model selection trade-offs, data representation mistakes, embedding storage failures, threshold calibration issues, false-positive analysis, and production constraints.

The final system uses a pre-trained InsightFace embedding model with vector similarity search to identify individuals in real time, generate attendance logs with timestamps, and prevent duplicate entries per day. This post is written as a research case study and engineering postmortem, intended for ML engineers evaluating applied problem-solving ability rather than theoretical novelty.

1. Problem Context and Motivation

Project Type

Applied computer vision system for automated attendance using face recognition, targeting classroom environments where a teacher captures photos of 10–15 students at once.

Real-World Motivation

Manual attendance systems are:

Time-consuming (5–15 minutes per class)
Prone to proxy attendance
Difficult to audit or analyze historically

The goal was not to invent a new face recognition model, but to engineer a reliable inference pipeline that works under real constraints:

Limited data per identity
Inconsistent lighting and camera quality
No retraining allowed during deployment
Strong requirement for low false positives

2. System Overview

At a high level, the system consists of two distinct phases:

Enrollment Phase

Camera -> Face Detection (Multi-face) -> Embedding Generation -> Embedding Store

Inference Phase

Camera -> Face Detection (Multi-face) -> Embedding Generation
             |
      Similarity Search
             |
      Identity Decision
             |
Attendance Logging (timestamped)

Key design decision: No model training in production. The system relies entirely on pre-trained embeddings and similarity metrics.

3. Baseline Approaches Considered (and Rejected)

Before settling on the final architecture, several baselines were evaluated.

3.1 Baseline 1: Haar Cascades + Raw Pixel Comparison

❌ Extremely sensitive to lighting and pose
❌ No meaningful similarity metric
❌ Completely unsuitable beyond demos

3.2 Baseline 2: LBPH / Eigenfaces

✅ Simple to implement
❌ Poor generalization
❌ Required retraining when new identities were added
❌ High false acceptance rate in practice

3.3 Baseline 3: CNN Classifier (Softmax over identities)

❌ Required full retraining for each new student
❌ Not scalable for dynamic enrollment
❌ Overkill for inference-only constraints

Conclusion: Any closed-set classifier was fundamentally misaligned with the problem.

4. Final Model Architecture

Chosen Architecture

InsightFace (ArcFace-based embedding model)

Output: 512-dimensional embedding, L2-normalized before comparison
Training: Pre-trained on large-scale face datasets
Usage: Inference-only

Why InsightFace?

Strong intra-class compactness
Large inter-class margins
Designed for open-set recognition
Proven robustness in unconstrained settings

Final Pipeline

Stage	Component	Notes
Face Detection	InsightFace detector	Handles alignment internally
Feature Extraction	InsightFace embedding model	512-D vector
Similarity	Cosine similarity	Normalized embeddings
Decision	Threshold-based	Calibrated empirically
Storage	Local serialized embeddings	JSON / pickle
Output	CSV attendance log	Timestamped
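
The Similarity and Decision stages in the table above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `identify`, the dictionary layout of the gallery, and the default threshold (borrowed from the example calibration output later in this post) are all assumptions.

```python
import numpy as np

def identify(query, gallery, threshold=0.40):
    """Return (identity, score) for the best match, or (None, score) if below threshold.

    query:   1-D embedding of the detected face.
    gallery: dict mapping identity -> stored 1-D embedding (hypothetical layout).
    """
    if not gallery:
        return None, 0.0
    ids = list(gallery)
    # Stack stored embeddings into an (N, D) matrix and L2-normalize each row.
    mat = np.asarray([gallery[i] for i in ids], dtype=np.float32)
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)
    q = np.asarray(query, dtype=np.float32)
    q = q / np.linalg.norm(q)
    # For unit vectors the dot product equals cosine similarity.
    scores = mat @ q
    best = int(np.argmax(scores))
    score = float(scores[best])
    return (ids[best] if score >= threshold else None), score
```

Returning the score even on rejection keeps near-misses visible, which helps when inspecting borderline decisions. The sketch assumes no stored embedding is an all-zero vector.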

5. Metrics and Why They Matter

This project did not optimize for accuracy in the classification sense. Instead, the critical metrics were:

Key Metrics

False Acceptance Rate (FAR): a wrong person is accepted as someone else. This is catastrophic in attendance systems.

False Rejection Rate (FRR): a valid person is rejected. This is annoying but acceptable within limits.

Cosine Similarity Distributions

Same-identity similarity distribution
Different-identity similarity distribution

Observed Distributions (Example)

Scenario	Typical Similarity Range
Same person	~0.50–0.90
Different persons	~0.10–0.35

Key Insight: The overlap between these distributions is where real engineering decisions happen.
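
To make the two metrics concrete, here is a small sketch of how FAR and FRR could be computed at a given threshold from two lists of similarity scores. The function name and the convention that a score at or above the threshold counts as a match are assumptions for illustration.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR), treating scores >= threshold as a match.

    genuine:  similarity scores for same-identity pairs.
    impostor: similarity scores for different-identity pairs.
    """
    # A false accept is an impostor pair that clears the threshold.
    false_accepts = sum(s >= threshold for s in impostor)
    # A false reject is a genuine pair that falls below it.
    false_rejects = sum(s < threshold for s in genuine)
    return false_accepts / len(impostor), false_rejects / len(genuine)
```

Every score that lands in the overlap region ends up in one of these two counts, depending on which side of the threshold it falls.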

6. Threshold Calibration: A Non-Trivial Problem

Initial Mistake

Initially, a naive threshold (0.6) was chosen based on intuition. This led to:

Different people being verified as the same person
Silent failures that looked “correct” in logs

Fix

We implemented a calibration script:

Compute similarities for same-person pairs
Compute similarities for different-person pairs
Choose a threshold that maximizes separation between the two score distributions

Example output:

Same-identity avg similarity: 0.75
Different-identity avg similarity: 0.25
Suggested threshold: 0.40
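
A calibration script along these lines might look like the sketch below. It is one possible reading of "maximizes separation", not the original script: here separation is taken to be the gap between the true-accept rate and the false-accept rate at each candidate threshold. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np

def pair_scores(embeddings, labels):
    """Split all pairwise cosine similarities into same- and different-identity lists."""
    x = np.asarray(embeddings, dtype=np.float32)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    same, diff = [], []
    for i, j in combinations(range(len(labels)), 2):
        score = float(x[i] @ x[j])
        (same if labels[i] == labels[j] else diff).append(score)
    return same, diff

def best_separation_threshold(same, diff):
    """Pick the observed score that maximizes (true-accept rate - false-accept rate)."""
    best_t, best_gap = None, -1.0
    for t in sorted(set(same) | set(diff)):
        tar = sum(s >= t for s in same) / len(same)
        far = sum(s >= t for s in diff) / len(diff)
        if tar - far > best_gap:
            best_t, best_gap = t, tar - far
    return best_t
```

Only observed scores are tried as candidates, so the result depends on how many enrollment images are available per identity.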

Final Decision

We intentionally biased towards lower FAR, even at the cost of slightly higher FRR.
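
One way to encode this bias is to fix a FAR budget first and then take the lowest threshold that satisfies it, which keeps FRR as small as that budget allows. The sketch below is an assumption about how such a rule could be written; the function name and the `max_far` parameter are hypothetical.

```python
def threshold_for_far(genuine, impostor, max_far=0.0):
    """Return (threshold, far, frr) for the lowest observed score with FAR <= max_far.

    Scores >= threshold count as a match. Returns None if no observed
    score meets the budget.
    """
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        if far <= max_far:
            # FAR never increases as t grows, so the first hit is the lowest valid threshold.
            frr = sum(s < t for s in genuine) / len(genuine)
            return t, far, frr
    return None
```

With a budget of zero, the threshold sits just above the highest impostor score seen during calibration, and any genuine pairs below that point become the accepted FRR cost.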

7. Failure Modes and Debugging Narrative

This section reflects actual engineering failures encountered.

Failure 1: Embeddings Were “Correct” but Recognition Was Wrong

Symptom: Different people were being recognized as the same identity with high confidence.

Root Cause:

Embeddings were not normalized before similarity comparison.
A dot product over unnormalized vectors is not cosine similarity: it grows with vector magnitude, so scores cannot be compared against a fixed threshold.

Fix: Explicit L2 normalization after embedding extraction.
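
A minimal sketch of the normalization step, with a toy example of why it matters. The helper name `l2_normalize` and the `eps` guard against zero-length vectors are assumptions, not taken from the original code.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length; eps guards against division by zero."""
    v = np.asarray(v, dtype=np.float32)
    return v / max(float(np.linalg.norm(v)), eps)

# Two vectors pointing the same way but with different magnitudes.
a = np.array([3.0, 4.0])
b = np.array([30.0, 40.0])

raw = float(np.dot(a, b))                           # 250.0, driven by magnitude
cos = float(l2_normalize(a) @ l2_normalize(b))      # approximately 1.0
```

After normalization every score falls in [-1, 1], which is what makes a single calibrated threshold meaningful.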

Failure 2: Embedding Storage Corruption

Symptom: System ran without errors, but recognition quality degraded over time.

Root Cause:

Embeddings were stored as Python lists without type consistency.
On reload, precision loss occurred.

Fix:

Enforced float32 arrays
Validated embedding shapes on load
Rejected malformed entries
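
The three fixes above could be combined in a load routine like the following sketch. It assumes a JSON file mapping identity to a list of floats; the function names, the file layout, and the decision to return rejected identities instead of raising are illustrative choices.

```python
import json

import numpy as np

EMBEDDING_DIM = 512  # matches the 512-D embedding used in this post

def save_store(path, store):
    """Write embeddings as float32-derived lists so every entry has one consistent type."""
    payload = {ident: np.asarray(vec, dtype=np.float32).tolist()
               for ident, vec in store.items()}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)

def load_store(path, dim=EMBEDDING_DIM):
    """Load embeddings as float32 arrays; return (store, rejected_identities)."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    store, rejected = {}, []
    for ident, values in raw.items():
        try:
            vec = np.asarray(values, dtype=np.float32)
        except (TypeError, ValueError):
            rejected.append(ident)
            continue
        # Reject wrong shapes and non-finite values instead of silently using them.
        if vec.shape != (dim,) or not np.all(np.isfinite(vec)):
            rejected.append(ident)
            continue
        store[ident] = vec
    return store, rejected
```

Surfacing the rejected identities makes a corrupted store visible at startup, when it is still cheap to re-enroll.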

Failure 3: Duplicate Attendance Entries

Symptom: Same person marked multiple times during a single session.

Root Cause:

No state tracking for “already marked today”
Frame-by-frame inference without temporal logic

Fix:

Introduced per-day in-memory cache
Attendance logged only once per identity per date
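
A per-day cache of this kind can be very small. The sketch below is an assumed shape for it, with a hypothetical class name; the `today` parameter exists only so the behavior can be exercised with fixed dates.

```python
from datetime import date

class DailyAttendanceCache:
    """Remembers which identities have already been marked on the current date."""

    def __init__(self):
        self._day = None
        self._seen = set()

    def mark(self, identity, today=None):
        """Return True only the first time `identity` is seen on `today`."""
        today = today or date.today()
        # Reset the cache when the date rolls over.
        if today != self._day:
            self._day, self._seen = today, set()
        if identity in self._seen:
            return False
        self._seen.add(identity)
        return True
```

In a frame-by-frame loop, the attendance row would be written only when `mark` returns True, so repeated detections of the same face in later frames are ignored.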

Failure 4: System Looked “Perfect” in Testing, Failed in Practice

Symptom: High confidence during testing, unexpected matches in real usage.

Root Cause:

Test images were too clean
No variation in pose, distance, lighting

Fix:

Enrollment with multiple images per identity
Averaged prototype embeddings
Enforced minimum face size during detection
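
Two of these fixes are easy to sketch: averaging several embeddings into one prototype, and filtering out faces that are too small. The 80-pixel cutoff, the function names, and the `(x1, y1, x2, y2)` bounding-box layout are assumptions for illustration, not values from the deployed system.

```python
import numpy as np

MIN_FACE_PX = 80  # hypothetical cutoff; tune for the actual camera and distance

def large_enough(bbox, min_side=MIN_FACE_PX):
    """True if the shorter side of an (x1, y1, x2, y2) box is at least min_side pixels."""
    x1, y1, x2, y2 = bbox
    return min(x2 - x1, y2 - y1) >= min_side

def build_prototype(embeddings):
    """Average several embeddings of one identity into a single unit-length prototype."""
    x = np.asarray(embeddings, dtype=np.float32)
    # Normalize each sample first so no single image dominates the mean.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    mean = x.mean(axis=0)
    # The mean of unit vectors is generally shorter than 1, so renormalize.
    return mean / np.linalg.norm(mean)
```

Renormalizing the mean matters because the prototype is later compared with a dot product, which only equals cosine similarity for unit vectors.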

8. Attendance Logging Design

Design Constraints

Append-only
Human-readable
Easy to export
No database dependency (initially)

Final Format (CSV)

ID	Date	Time
101	2026-01-16	09:12:03

Guarantee

One record per identity per day
Deterministic behavior across runs
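
An append-only CSV logger that upholds the one-record-per-day guarantee could be sketched as below. Because an in-memory cache alone would not survive a restart, this version rereads the file before writing; that choice, the function names, and the comma-delimited layout are assumptions.

```python
import csv
import os
from datetime import datetime

FIELDS = ["ID", "Date", "Time"]

def already_marked(path, day):
    """Return the set of IDs that already have a row for `day` (YYYY-MM-DD)."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as f:
        return {row["ID"] for row in csv.DictReader(f) if row["Date"] == day}

def log_attendance(path, identity, now=None):
    """Append one row for `identity`; return False if already logged today."""
    now = now or datetime.now()
    day = now.strftime("%Y-%m-%d")
    if str(identity) in already_marked(path, day):
        return False
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # Mode "a" keeps the log append-only; existing rows are never rewritten.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow({"ID": identity, "Date": day,
                         "Time": now.strftime("%H:%M:%S")})
    return True
```

Rereading the file on every call is linear in the log size, which is acceptable for a classroom-sized log but would need an index or a database at larger scale.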

9. What This System Is — and Is Not

What It Is

A production-grade inference pipeline
A practical application of metric learning
A system designed around failure containment

What It Is Not

A novel face recognition model
A training-heavy deep learning project
A demo optimized only for visuals

10. Lessons Learned

Embedding-based systems fail silently if not calibrated properly
Most bugs were data and representation bugs, not model bugs
Threshold selection is a business decision, not a mathematical one
Engineering discipline matters more than model complexity
Production ML is about controlling failure modes, not maximizing metrics

11. Future Work

Integrate robust liveness detection (anti-spoofing)
Move embedding storage to a vector database
Introduce audit logs for rejected matches
Edge-device optimization

Closing Thoughts

This project reinforced a core belief:

Real machine learning work begins after the model is chosen.

The hardest problems were not solved with more data or bigger models, but with careful reasoning about representations, metrics, and failure modes. This system is intentionally simple, intentionally constrained, and intentionally honest about its trade-offs — exactly what production ML demands.

If you’re reviewing this as a recruiter or engineer, this project represents how I approach applied ML: rigorously, skeptically, and with respect for reality.

Connect

LinkedIn: LinkedIn
Email: devp1866@gmail.com

Author: Devkumar Patel
Domain: Face Recognition · Computer Vision · Applied Machine Learning