Machine Learning · Computer Vision

From Camera Feed to Confidence: Building a Production-Ready Face Recognition Attendance System

A research-style engineering case study on designing, debugging, and deploying a face detection and recognition system using InsightFace embeddings, vector similarity search, and real-world constraints.

September 15, 2025
Devkumar Patel
Face Recognition · InsightFace · Applied ML · Engineering Case Study

Abstract

This post documents the end-to-end engineering journey of building a face detection and recognition system for automated attendance, designed for real-world deployment in low-resource educational settings. Unlike polished tutorials, this write-up focuses on the actual engineering work: model selection trade-offs, data representation mistakes, embedding storage failures, threshold calibration issues, false-positive analysis, and production constraints.

The final system uses a pre-trained InsightFace embedding model with vector similarity search to identify individuals in real time, generate attendance logs with timestamps, and prevent duplicate entries per day. This post is written as a research case study and engineering postmortem, intended for ML engineers evaluating applied problem-solving ability rather than theoretical novelty.


1. Problem Context and Motivation

Project Type

Applied computer vision system for automated attendance using face recognition, targeting classroom environments where a teacher captures photos of 10–15 students at once.

Real-World Motivation

Manual attendance systems are:

  • Time-consuming (5–15 minutes per class)
  • Prone to proxy attendance
  • Difficult to audit or analyze historically

The goal was not to invent a new face recognition model, but to engineer a reliable inference pipeline that works under real constraints:

  • Limited data per identity
  • Inconsistent lighting and camera quality
  • No retraining allowed during deployment
  • Strong requirement for low false positives

2. System Overview

At a high level, the system consists of two distinct phases:

Enrollment Phase

Camera -> Face Detection (Multi-face) -> Embedding Generation -> Embedding Store

Inference Phase

Camera -> Face Detection (Multi-face) -> Embedding Generation
             |
      Similarity Search
             |
      Identity Decision
             |
Attendance Logging (timestamped)

Key design decision: No model training in production. The system relies entirely on pre-trained embeddings and similarity metrics.


3. Baseline Approaches Considered (and Rejected)

Before settling on the final architecture, several baselines were evaluated.

3.1 Baseline 1: Haar Cascades + Raw Pixel Comparison

  • ❌ Extremely sensitive to lighting and pose
  • ❌ No meaningful similarity metric
  • ❌ Completely unsuitable beyond demos

3.2 Baseline 2: LBPH / Eigenfaces

  • ✅ Simple to implement
  • ❌ Poor generalization
  • ❌ Required retraining when new identities were added
  • ❌ High false acceptance rate in practice

3.3 Baseline 3: CNN Classifier (Softmax over identities)

  • ❌ Required full retraining for each new student
  • ❌ Not scalable for dynamic enrollment
  • ❌ Overkill for inference-only constraints

Conclusion: Any closed-set classifier was fundamentally misaligned with the problem.


4. Final Model Architecture

Chosen Architecture

InsightFace (ArcFace-based embedding model)

  • Output: 512-dimensional embedding (L2-normalized before comparison)
  • Training: Pre-trained on large-scale face datasets
  • Usage: Inference-only

Why InsightFace?

  • Strong intra-class compactness
  • Large inter-class margins
  • Designed for open-set recognition
  • Proven robustness in unconstrained settings

Final Pipeline

Stage              | Component                       | Notes
-------------------|---------------------------------|------------------------------
Face Detection     | InsightFace detector            | Handles alignment internally
Feature Extraction | InsightFace embedding model     | 512-D vector
Similarity         | Cosine similarity               | Normalized embeddings
Decision           | Threshold-based                 | Calibrated empirically
Storage            | Local serialized embeddings     | JSON / pickle
Output             | CSV attendance log              | Timestamped
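
The similarity and decision stages can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `l2_normalize` and `identify` are hypothetical, the toy 4-D vectors stand in for real 512-D embeddings, and the default threshold of 0.40 is simply the value quoted in the calibration example later in this post.

```python
from typing import Optional, Sequence, Tuple

import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector (or each row of a matrix) to unit L2 norm."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.clip(norm, 1e-12, None)


def identify(query: np.ndarray, gallery: np.ndarray, ids: Sequence[str],
             threshold: float = 0.40) -> Tuple[Optional[str], float]:
    """Return (identity, score) for the best match, or (None, score) if it is below threshold."""
    q = l2_normalize(np.asarray(query, dtype=np.float32))
    g = l2_normalize(np.asarray(gallery, dtype=np.float32))
    scores = g @ q                      # cosine similarity against every enrolled identity
    best = int(np.argmax(scores))
    score = float(scores[best])
    return (ids[best] if score >= threshold else None, score)


# Toy 4-D vectors stand in for 512-D embeddings (illustrative values only).
gallery = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]])
ids = ["101", "102"]

match, score = identify(np.array([0.9, 0.1, 0.0, 0.0]), gallery, ids)
unknown, low = identify(np.array([0.0, 0.0, 1.0, 0.0]), gallery, ids)
```

Because enrollment only adds a row to `gallery` and an entry to `ids`, new identities can be added without retraining anything, which is the property the closed-set baselines lacked.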

5. Metrics and Why They Matter

This project did not optimize for accuracy in the classification sense. Instead, the critical metrics were:

Key Metrics

False Acceptance Rate (FAR): A person is accepted as someone else. This is catastrophic in an attendance system.

False Rejection Rate (FRR): A valid person is rejected. Annoying, but acceptable within limits.

Cosine Similarity Distributions

  • Same-identity similarity distribution
  • Different-identity similarity distribution

Observed Distributions (Example)

Scenario          | Cosine similarity (approx. range)
------------------|----------------------------------
Same person       | ~0.50–0.90
Different persons | ~0.10–0.35

Key Insight: The overlap between these distributions is where real engineering decisions happen.
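
One common way to make the two error rates precise is shown below. The notation is introduced here for illustration and is not from the original project: τ is the decision threshold, G is the set of same-identity (genuine) similarity scores, and I is the set of different-identity (impostor) scores.

```latex
\mathrm{FAR}(\tau) = \frac{\lvert \{\, s \in I : s \ge \tau \,\} \rvert}{\lvert I \rvert},
\qquad
\mathrm{FRR}(\tau) = \frac{\lvert \{\, s \in G : s < \tau \,\} \rvert}{\lvert G \rvert}
```

Raising τ can only lower FAR and raise FRR. Wherever the two score distributions overlap, no single τ drives both rates to zero, so the choice becomes a trade-off.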


6. Threshold Calibration: A Non-Trivial Problem

Initial Mistake

Initially, a naive threshold (0.6) was chosen based on intuition. This led to:

  • Different people being verified as the same person
  • Silent failures that looked “correct” in logs

Fix

We implemented a calibration script:

  • Compute similarities for same-person pairs
  • Compute similarities for different-person pairs
  • Choose a threshold that maximizes separation

Example output:

Same-identity avg similarity: 0.75
Different-identity avg similarity: 0.25
Suggested threshold: 0.40
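
A calibration script along these lines might look like the sketch below. The original does not say how "maximizes separation" was defined, so this version makes an assumption: it picks the lowest threshold on a fixed grid whose FAR stays within a target, which favors low FAR first and low FRR second. The function names, the grid step, and the score lists are all hypothetical and are not measurements from the project.

```python
from statistics import mean


def far_frr(genuine, impostor, threshold):
    """FAR: share of impostor scores accepted; FRR: share of genuine scores rejected."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def calibrate(genuine, impostor, max_far=0.0, step=0.01):
    """Return the lowest threshold on a 0..1 grid whose FAR stays within max_far."""
    for i in range(int(round(1 / step)) + 1):
        t = round(i * step, 4)
        far, frr = far_frr(genuine, impostor, t)
        if far <= max_far:
            return t, far, frr
    raise ValueError("no threshold on the grid meets the FAR target")


# Hypothetical scores for illustration only.
genuine = [0.52, 0.68, 0.75, 0.81, 0.90]     # same-person pairs
impostor = [0.10, 0.18, 0.25, 0.31, 0.35]    # different-person pairs

threshold, far, frr = calibrate(genuine, impostor)
print(f"Same-identity avg similarity: {mean(genuine):.2f}")
print(f"Different-identity avg similarity: {mean(impostor):.2f}")
print(f"Suggested threshold: {threshold:.2f} (FAR={far:.2f}, FRR={frr:.2f})")
```

With real data the two lists would come from all same-person and different-person pairs in a held-out enrollment set, and `max_far` would be set by how costly a false acceptance is.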

Final Decision

We intentionally biased towards lower FAR, even at the cost of slightly higher FRR.


7. Failure Modes and Debugging Narrative

This section reflects actual engineering failures encountered.

Failure 1: Embeddings Were “Correct” but Recognition Was Wrong

Symptom: Different people were being recognized as the same identity with high confidence.

Root Cause:

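  • Embeddings were not normalized before similarity comparison.
  • A raw dot product equals cosine similarity only when both vectors have unit L2 norm; otherwise the score scales with embedding magnitude and cannot be compared against a fixed threshold.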

Fix: Explicit L2 normalization after embedding extraction.
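
The effect is easy to reproduce with toy vectors. The following is a small sketch with made-up 3-D values (real embeddings are 512-D) and a hypothetical helper named `l2_normalize`. It shows how an unnormalized dot product can look like a strong match while the cosine similarity is moderate.

```python
import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Return v scaled to unit L2 norm (guarding against a zero vector)."""
    return v / max(float(np.linalg.norm(v)), 1e-12)


# Two toy vectors pointing in different directions, both with large magnitude.
a = np.array([10.0, 0.0, 0.0])
b = np.array([6.0, 8.0, 0.0])

raw_dot = float(a @ b)                              # dominated by vector magnitude
cosine = float(l2_normalize(a) @ l2_normalize(b))   # depends only on the angle between a and b
```

Any fixed threshold is trivially exceeded by `raw_dot`, which matches the symptom of different people being accepted with high confidence.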

Failure 2: Embedding Storage Corruption

Symptom: System ran without errors, but recognition quality degraded over time.

Root Cause:

  • Embeddings were stored as Python lists without type consistency.
  • On reload, precision loss occurred.

Fix:

  • Enforced float32 arrays
  • Validated embedding shapes on load
  • Rejected malformed entries
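
A load-time validation step in this spirit could look like the sketch below. It assumes embeddings are stored as a JSON object mapping an identity to a list of floats. That layout, along with the names `load_embeddings` and `dump_embeddings`, is an assumption made for illustration and is not the project's actual storage format.

```python
from __future__ import annotations

import json

import numpy as np

EMBEDDING_DIM = 512  # matches the 512-D embeddings described above


def load_embeddings(text: str, dim: int = EMBEDDING_DIM) -> dict[str, np.ndarray]:
    """Parse a JSON object of {identity: [floats]} and keep only well-formed entries."""
    store = {}
    for identity, values in json.loads(text).items():
        try:
            vec = np.asarray(values, dtype=np.float32)
        except (TypeError, ValueError):
            continue  # non-numeric or ragged data
        if vec.shape != (dim,) or not np.all(np.isfinite(vec)):
            continue  # wrong dimensionality, NaN, or inf
        store[identity] = vec
    return store


def dump_embeddings(store: dict[str, np.ndarray]) -> str:
    """Serialize embeddings as plain float lists after casting to float32."""
    return json.dumps({k: np.asarray(v, dtype=np.float32).tolist() for k, v in store.items()})


raw = json.dumps({
    "101": [0.1] * 512,   # valid
    "102": [0.1] * 100,   # wrong length, rejected
    "103": ["a"] * 512,   # not numeric, rejected
})
store = load_embeddings(raw)
round_trip = load_embeddings(dump_embeddings(store))
```

In a real deployment it would probably be worth logging each rejected entry rather than skipping it silently, since silent degradation was the original symptom.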

Failure 3: Duplicate Attendance Entries

Symptom: Same person marked multiple times during a single session.

Root Cause:

  • No state tracking for “already marked today”
  • Frame-by-frame inference without temporal logic

Fix:

  • Introduced per-day in-memory cache
  • Attendance logged only once per identity per date
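
The per-day cache can be as small as a set that resets when the date changes. The class below is a hypothetical sketch of that idea, not the project's implementation. The name `DailyAttendanceCache` and the choice to pass the date in explicitly (which keeps the logic easy to test) are assumptions.

```python
from datetime import date


class DailyAttendanceCache:
    """Remember which identities were already marked on the current date."""

    def __init__(self):
        self._day = None
        self._seen = set()

    def mark(self, identity: str, today: date) -> bool:
        """Return True only the first time an identity is seen on `today`."""
        if today != self._day:        # a new day starts with an empty cache
            self._day = today
            self._seen = set()
        if identity in self._seen:
            return False
        self._seen.add(identity)
        return True


cache = DailyAttendanceCache()
day1 = date(2026, 1, 16)
results = [cache.mark("101", day1), cache.mark("101", day1), cache.mark("102", day1)]
next_day = cache.mark("101", date(2026, 1, 17))
```

The recognition loop would call `mark` for every accepted match and write a log row only when it returns `True`.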

Failure 4: System Looked “Perfect” in Testing, Failed in Practice

Symptom: High confidence during testing, unexpected matches in real usage.

Root Cause:

  • Test images were too clean
  • No variation in pose, distance, lighting

Fix:

  • Enrollment with multiple images per identity
  • Averaged prototype embeddings
  • Enforced minimum face size during detection
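
Two of these fixes are short enough to sketch directly. The code below is illustrative only. The function names are hypothetical, the bounding box is assumed to be `(x1, y1, x2, y2)` in pixels, and the 80-pixel minimum is a placeholder rather than the value used in the project.

```python
import numpy as np


def build_prototype(embeddings) -> np.ndarray:
    """Average several embeddings of one person into a single unit-norm prototype."""
    e = np.asarray(embeddings, dtype=np.float32)
    norms = np.linalg.norm(e, axis=1, keepdims=True)
    e = e / np.clip(norms, 1e-12, None)        # normalize each sample first
    mean = e.mean(axis=0)
    return mean / max(float(np.linalg.norm(mean)), 1e-12)   # re-normalize the average


def is_large_enough(bbox, min_side: int = 80) -> bool:
    """Reject detections whose shorter side is below min_side pixels."""
    x1, y1, x2, y2 = bbox
    return min(x2 - x1, y2 - y1) >= min_side


# Toy 2-D samples stand in for several 512-D embeddings of the same person.
samples = np.array([[3.0, 4.0], [4.0, 3.0]])
proto = build_prototype(samples)
```

Normalizing each sample before averaging keeps one unusually large embedding from dominating the prototype, and the final re-normalization keeps the prototype compatible with cosine-similarity thresholds.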

8. Attendance Logging Design

Design Constraints

  • Append-only
  • Human-readable
  • Easy to export
  • No database dependency (initially)

Final Format (CSV)

ID  | Date       | Time
----|------------|---------
101 | 2026-01-16 | 09:12:03

Guarantee

  • One record per identity per day
  • Deterministic behavior across runs
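
One way to get both guarantees from a plain CSV file is to rebuild the set of already-logged (ID, Date) pairs from the file itself before appending. The sketch below assumes the three-column layout shown above. The function names and the temporary file path are hypothetical, and re-reading the file on every call is a simplification; for larger logs, the per-day in-memory cache from Failure 3 avoids the repeated reads.

```python
import csv
import os
import tempfile
from datetime import datetime

FIELDS = ["ID", "Date", "Time"]


def load_marked(path):
    """Rebuild the set of (ID, Date) pairs already present in the log."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as f:
        return {(row["ID"], row["Date"]) for row in csv.DictReader(f)}


def log_attendance(path, identity, when):
    """Append one row unless this identity already has a row for that date."""
    day = when.strftime("%Y-%m-%d")
    if (identity, day) in load_marked(path):
        return False
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(FIELDS)   # write the header once, on first use
        writer.writerow([identity, day, when.strftime("%H:%M:%S")])
    return True


path = os.path.join(tempfile.mkdtemp(), "attendance.csv")
first = log_attendance(path, "101", datetime(2026, 1, 16, 9, 12, 3))
repeat = log_attendance(path, "101", datetime(2026, 1, 16, 9, 30, 0))
with open(path, newline="") as f:
    rows = list(csv.reader(f))
```

Because the duplicate check reads from the file rather than from process memory, restarting the program in the middle of the day does not produce a second row for the same person.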

9. What This System Is — and Is Not

What It Is

  • A production-grade inference pipeline
  • A practical application of metric learning
  • A system designed around failure containment

What It Is Not

  • A novel face recognition model
  • A training-heavy deep learning project
  • A demo optimized only for visuals

10. Lessons Learned

  1. Embedding-based systems fail silently if not calibrated properly
  2. Most bugs were data and representation bugs, not model bugs
  3. Threshold selection is a business decision, not a mathematical one
  4. Engineering discipline matters more than model complexity
  5. Production ML is about controlling failure modes, not maximizing metrics

11. Future Work

  • Integrate robust liveness detection (anti-spoofing)
  • Move embedding storage to a vector database
  • Introduce audit logs for rejected matches
  • Edge-device optimization

Closing Thoughts

This project reinforced a core belief:

Real machine learning work begins after the model is chosen.

The hardest problems were not solved with more data or bigger models, but with careful reasoning about representations, metrics, and failure modes. This system is intentionally simple, intentionally constrained, and intentionally honest about its trade-offs — exactly what production ML demands.

If you’re reviewing this as a recruiter or engineer, this project represents how I approach applied ML: rigorously, skeptically, and with respect for reality.



Author: Devkumar Patel
Domain: Face Recognition · Computer Vision · Applied Machine Learning