AI Attendance Manager - Techwiz6 Project [OpenCV + MTCNN + FAISS + Web Development]
Forget Shouting "Present!"—Our System Clocks You In Before You Even Find Your Seat.
Let's be real: facial recognition has officially left the realm of sci-fi. It's in your phone, it's at the airport, and now—if you're brave enough—it's staring back at you from the classroom door.
The best part? You don't need to be a Gandalf-level coding wizard to make it work. With some open-source magic, a halfway-decent webcam, and the sheer, caffeine-fueled determination of a student on a deadline, you can absolutely build this yourself.
"Woah" at what we built.
"Wow" at what we learned.
And maybe even "Where has this been all my semester?!" about automated attendance.
1. Getting the Basics Straight: It's All About the Data
You can't build a face-recognition system without faces, just like you can't win a hackathon without a solid supply of pizza and code. It all starts with the data.
1.1 The Dataset: How to Collect 1,400 Reasons to Smile
Our mission: build an attendance manager for a real classroom. That meant we needed a dataset of real students, not just a bunch of celebrity photos from the internet. So we captured every angle imaginable:
- The "I-just-woke-up" angle? Check.
- The "is-that-a-double-chin?" tilt? You bet.
- The "squinting-at-the-whiteboard" expression? Absolutely.
Plus glasses on, glasses off, good lighting, weird shadows... we covered it all.
Why this obsession? Because we wanted our model to recognize a student whether they're fresh-faced at 8 a.m. or dead-eyed during finals week. A robust dataset is your first and best line of defense against real-world chaos.
Let's be real: collecting data isn't technically complex, but it is a special kind of exhausting. It's the digital equivalent of herding cats—exceptionally selfie-conscious cats. Unless you have the backing of a giant corporation or a university lab, you're basically going door-to-door asking for people's most precious commodity: their digital likeness.
Pro-Tip for the Solo Builder: If you're flying solo, your new best friends are the massive public datasets on places like Kaggle. It's like a buffet of faces that someone else has already collected; just check each dataset's license and consent terms before you dig in.
For Techwiz, however, we went the custom route. Our process was a two-part ballet of modern tech:
- The Paparazzi Phase: We used a phone camera to record short video clips of each student slowly rotating, nodding, and tilting their head. Think of it as a minimalist dance routine, set to the gentle hum of a computer fan.
- The Slicer-Dicer 3000: We then wrote a Python script to automatically chop those videos into hundreds of individual frames. This saved us from the mind-numbing hell of manually saving 1,400+ images. Automate the boring stuff!
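Here is a minimal sketch of what such a slicer could look like, assuming OpenCV (`cv2`) is installed and there is one video per student. The 0.2-second sampling interval, the output file naming, and the helper names `sampling_step` and `slice_video` are illustrative choices, not the project's actual script.

```python
from pathlib import Path


def sampling_step(fps: float, every_n_seconds: float) -> int:
    """How many frames apart the saved frames are (always at least 1)."""
    return max(1, round(fps * every_n_seconds))


def slice_video(video_path: str, out_dir: str, every_n_seconds: float = 0.2) -> int:
    """Save one frame every `every_n_seconds` as a JPEG in `out_dir`; return the count."""
    import cv2  # OpenCV is only needed here, so sampling_step works without it

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Cannot open video: {video_path}")
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # some files report 0 FPS; assume 30
    step = sampling_step(fps, every_n_seconds)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(video_path).stem  # e.g. the student's name, if videos are named that way
    written = index = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of the video (or a read error)
            break
        if index % step == 0:
            cv2.imwrite(str(out / f"{stem}_{index:05d}.jpg"), frame)
            written += 1
        index += 1
    cap.release()
    return written
```

For example, a 20-second clip recorded at 30 fps would keep every 6th frame, about 100 images per student. Sampling by time rather than saving every frame avoids piles of near-identical neighboring images.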
Why This Angle-Filled Madness?
- Better Generalization: By feeding our model every possible angle, we force it to learn the essence of a face, not just one perfect mugshot. This prevents it from throwing a fit (a.k.a. overfitting) when someone dares to sit slightly off-center.
- Stronger Resilience: A student might be slouching, looking at the board, or whispering to a friend. Our system, trained on a circus of angles, just shrugs and says, "Yep, I know you."
1.2 Face Detection: Where We Play 'Where's Waldo?' with Pixels
With our massive pile of images ready, it was time for step two: finding the faces in them. This is where Face Detection comes in—it's the bouncer at the club, identifying who gets in (the face) and what gets left out (the noisy background).
For Techwiz, we didn't need to reinvent the wheel. We leaned on the giants of open-source: the classic Haar Cascades and the more modern, mighty MTCNN. After a classic showdown, the winner was clear: MTCNN took the crown 🏆
- Haar Cascades: The reliable old veteran. Fast, but a bit clumsy. It often missed profiles or got confused by non-ideal lighting.
- MTCNN (Multi-Task Cascaded Convolutional Neural Network): The new sharpshooter. It's not just detecting a face; it's pinpointing facial landmarks (eyes, nose, mouth) with scary accuracy, even on dramatic side profiles.
🔥 Critical Pro-Tip Alert! MTCNN is a bit of a diva: it expects images in RGB format. If you forget to convert your images from BGR (OpenCV's default channel order for cv2.imread and camera frames), it will fail spectacularly, and you'll spend an hour wondering why it can't find a face that's staring right at you.
Once MTCNN points a digital finger and says "FACE HERE," we do the obvious: we crop it. For our project, we standardized all detected faces to a clean 160x160 pixel square. This gives our next step a consistent, tidy image to work with. No mess, no fuss.
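The BGR-to-RGB fix and the crop step can be sketched in plain NumPy. This assumes the detector returns boxes as `(x, y, width, height)`, the format the `mtcnn` pip package's `detect_faces()` uses; other implementations (such as facenet-pytorch's MTCNN) return corner coordinates instead. The nearest-neighbour resize keeps the example dependency-free; in practice `cv2.resize` is the usual tool.

```python
import numpy as np


def bgr_to_rgb(frame_bgr: np.ndarray) -> np.ndarray:
    """Reverse the channel axis: OpenCV frames are BGR, MTCNN expects RGB."""
    return np.ascontiguousarray(frame_bgr[..., ::-1])


def crop_face(image: np.ndarray, box: tuple[int, int, int, int], size: int = 160) -> np.ndarray:
    """Crop an (x, y, width, height) box, clamp it to the image, resize to size x size."""
    img_h, img_w = image.shape[:2]
    x, y, w, h = box
    x0, y0 = max(0, x), max(0, y)  # detectors can return slightly negative coordinates
    x1, y1 = min(img_w, x + w), min(img_h, y + h)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box lies outside the image")
    face = image[y0:y1, x0:x1]
    # Nearest-neighbour resize: pick evenly spaced source rows and columns.
    rows = (np.arange(size) * face.shape[0] / size).astype(int)
    cols = (np.arange(size) * face.shape[1] / size).astype(int)
    return face[rows][:, cols]
```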
1.3 Face Embedding – The Magic of Turning Your Face into 512 Numbers
Alright, we've found the faces and cropped them nice and neat. But a picture is just a picture to a computer. To make it truly understand, we need to speak its language: numbers. This next step is where the real magic happens. We're not storing images anymore; we're distilling the very essence of a face into a mathematical recipe. We call this magical process Face Embedding.
Think of it like this: we're running every face through a secret machine that spits out a unique 512-dimensional vector, a fancy term for a list of 512 numbers. This list isn't random; it's a fingerprint. The network doesn't store literal measurements; it learns features that implicitly capture the geometric DNA of your face: the spacing of your eyes, the shape of your jawline, the arch of your brows.
The goal? To make sure that vector for Nguyen Van Tuan is mathematically worlds apart from the vector for Nguyen Van Tien, even if their names are confusingly similar!
How We Did It: The Sorcerer's Tool
- We didn't train this magic machine from scratch (that would require a wizard's hat and a lifetime). Instead, we used a pre-trained Deep Learning model called FaceNet. This model is a genius; it's already seen millions of faces and learned what makes each one unique.
- We fed our 160x160 cropped face images into FaceNet.
- For each image, it gave us back that magical list of 512 numbers.
- Boom! A face is no longer a picture. It's now a point in a wild, 512-dimensional galaxy.
This is the ultimate data glow-up. We traded bulky image files for sleek, lightweight vectors. This makes the next steps (comparing and matching faces) incredibly fast and efficient. Instead of comparing 76,800 pixel values (160x160x3) per face, we just calculate the distance between two points in space. It's smart, scalable, and seriously cool.
We used the InceptionResnetV1 FaceNet model from facenet-pytorch (pretrained on VGGFace2), feeding each cropped face tensor (normalized to [-1, 1]) in batches of 32. In return, we got a 512-dimensional embedding for every face image, distinctive enough that even twin-like faces shouldn't fool the system.
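A sketch of that preprocessing, assuming faces arrive as 160x160 RGB `uint8` arrays. The `(x - 127.5) / 128` scaling mirrors facenet-pytorch's `fixed_image_standardization`, which maps pixels into roughly [-1, 1]; the helper names are hypothetical.

```python
from typing import Iterator

import numpy as np


def standardize(face_rgb: np.ndarray) -> np.ndarray:
    """Map a uint8 HxWx3 face to a float32 3xHxW array in roughly [-1, 1]."""
    chw = face_rgb.astype(np.float32).transpose(2, 0, 1)  # channels first, as PyTorch expects
    return (chw - 127.5) / 128.0


def batches(faces: list[np.ndarray], batch_size: int = 32) -> Iterator[np.ndarray]:
    """Yield stacked (B, 3, 160, 160) arrays; the last batch may be smaller."""
    for start in range(0, len(faces), batch_size):
        yield np.stack([standardize(f) for f in faces[start:start + batch_size]])
```

Each batch could then be wrapped with `torch.from_numpy(batch)` and passed to `InceptionResnetV1(pretrained='vggface2').eval()` inside `torch.no_grad()`, which returns a `(B, 512)` tensor of embeddings.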
Each embedding got mapped to a student ID, like:
"Nguyen_Van_Tuan" → "SV_0007"
Pro tip: skip mapping if you want to display actual names instead of IDs.
All embeddings were packed neatly into a compressed .npz file. Compact, tidy, and ready to feed into our model—think of it as prepped ingredients for a gourmet AI dish.
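A possible way to do the ID mapping and the `.npz` packing with NumPy. The sequential `SV_0001`-style numbering (assigned in alphabetical order of names) and the array keys `embeddings` and `ids` are assumptions for illustration; the post doesn't say how IDs were assigned.

```python
import numpy as np


def build_id_map(names: list[str]) -> dict[str, str]:
    """Assign IDs like 'SV_0001' to each distinct name, in sorted order (hypothetical scheme)."""
    return {name: f"SV_{i:04d}" for i, name in enumerate(sorted(set(names)), start=1)}


def save_embeddings(path: str, embeddings: np.ndarray, names: list[str]) -> dict[str, str]:
    """Store one ID per embedding row in a compressed .npz file; return the name -> ID map."""
    id_map = build_id_map(names)
    ids = np.array([id_map[n] for n in names])  # plain unicode array, no pickling needed
    np.savez_compressed(path, embeddings=embeddings.astype(np.float32), ids=ids)
    return id_map


def load_embeddings(path: str) -> tuple[np.ndarray, np.ndarray]:
    with np.load(path) as data:
        return data["embeddings"], data["ids"]
```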
If you want to dig deeper into the math-magic, check out the original paper: FaceNet: A Unified Embedding for Face Recognition and Clustering (Schroff, Kalenichenko, and Philbin, CVPR 2015). If you're more eager to explore the practical side of face recognition, the FaceNet repository by David Sandberg is an excellent starting point. This TensorFlow-based implementation offers a comprehensive suite of tools for face detection, alignment, and embedding generation.
1.4 Building the Model – From Numbers to Names
We had our hands on a whole set of embeddings—those 512-number vectors representing each face. Now, time to split the data: 80% for training, 20% for testing. The mission? Turning a pile of meaningless numbers into accurate names. Tough—but fun!
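The post doesn't say how the 80/20 split was drawn. One reasonable option, sketched here as an assumption, is to split per student so every student appears in both the training and test sets (it assumes at least two images per student):

```python
import numpy as np


def stratified_split(labels: np.ndarray, test_frac: float = 0.2, seed: int = 42):
    """Return (train_idx, test_idx), holding out about test_frac of each student's samples."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for label in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == label))  # this student's rows, shuffled
        n_test = max(1, round(len(idx) * test_frac))  # at least one test image per student
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return np.array(sorted(train)), np.array(sorted(test))
```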
Our model had to follow three unbreakable golden rules:
- Absolute accuracy: Wrong? You're fired! No one wants to be misidentified.
- Scalability: 14 students today, maybe 1,000 tomorrow. Gotta be ready!
- Blazing speed: Mass attendance can't be slower than a snail. Otherwise, we might as well stick to roll call on paper.
After some serious tinkering, we decided to pit four strong candidates against each other to pick the best fit:
- Cosine Similarity: Simple and speedy. Works fine for small classes, but scales poorly.
- SVM (Support Vector Machine): Accurate for small classes, but retraining is required for new students.
- HNSW (Hierarchical Navigable Small World): Lightning-fast, scalable, but can be slightly inaccurate with tricky angles or shadows.
- FAISS (Facebook AI Similarity Search): Super fast, accurate, scalable—our final hero!
1.4.1 Cosine Similarity – The "Hello World" of Face Recognition
Take a new face's embedding and compare it to every single one in our database using cosine similarity. Closest match wins.
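Lab result: 100% accuracy on 14 students. Live camera: accuracy dropped to roughly 85–92% in real time. And since every query is compared against every stored embedding, the cost grows linearly with the number of faces, so this brute-force method doesn't scale.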
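A minimal NumPy version of the brute-force approach. `cosine_match` is a hypothetical helper; note that each query touches every stored embedding, which is exactly why the cost grows with class size.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cosine_match(query: np.ndarray, gallery: np.ndarray, gallery_ids: list[str]) -> tuple[str, float]:
    """Compare one embedding against every stored one; the highest cosine similarity wins."""
    sims = l2_normalize(gallery) @ l2_normalize(query)  # shape (N,): one score per stored face
    best = int(np.argmax(sims))
    return gallery_ids[best], float(sims[best])
```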
1.4.2 SVM – The Smart Classifier
An SVM with a linear kernel draws separating boundaries between students in embedding space. It's accurate for small classes, but every new student means retraining the whole classifier, which quickly becomes a nightmare.
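A sketch of the SVM approach with scikit-learn's `SVC`, assuming embeddings are L2-normalised first (a common choice, not something the post specifies). The helper names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.svm import SVC


def train_svm(embeddings: np.ndarray, ids: np.ndarray) -> SVC:
    """Fit a linear SVM on unit-length embeddings; must be rerun whenever a student is added."""
    clf = SVC(kernel="linear")
    clf.fit(normalize(embeddings), ids)
    return clf


def predict(clf: SVC, embeddings: np.ndarray) -> np.ndarray:
    """Predict a student ID for each embedding row."""
    return clf.predict(normalize(embeddings))
```

Because the fitted classifier only knows the labels in `clf.classes_`, a newly enrolled student can never be predicted until `train_svm` is rerun on the whole dataset, which is the retraining pain described above.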
1.4.3 HNSW – The Graph Ninja
HNSW builds a layered graph over the embeddings and hops through it to find approximate nearest neighbors. It's fast and scalable, with minor accuracy drops in awkward conditions.
1.4.4 FAISS – The Superhero
FAISS, Facebook AI's library for efficient similarity search, packs our embeddings into an optimized index that stays fast even with thousands of entries. In real time it proved fast, accurate, and scalable, making it the winner for our smart attendance system!
If you've made it this far, congrats—you survived the model-building thunderdome! Next up: real-time deployment, spotting faces live, handling tricksters, and smooth attendance tracking. The adventure continues!
2. Application Deployment – Real-Time Face Recognition in Action
So, our model is trained, tested, and now it's time for the big stage: real-time face recognition! 🎥
We hooked it up to a live camera feed, letting our system scan faces continuously, like a hawk with a PhD in spotting students.
Step 1: Face spotting
MTCNN jumps in first, detecting multiple faces per frame—even if someone's tilting their head like a confused giraffe. Only faces with a high detection score (say >0.9) get through. The rest? Rejected. Sorry, photobombers!
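The detection-score filter is a one-liner. This sketch assumes detections are dicts with a `confidence` key, as returned by the `mtcnn` pip package; the helper name is hypothetical.

```python
def keep_confident(detections: list[dict], min_confidence: float = 0.9) -> list[dict]:
    """Keep only detections scoring above the threshold (0.9, as in the text)."""
    return [d for d in detections if d["confidence"] > min_confidence]
```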
Step 2: Anti-spoofing ninja mode 🥷
No cheating here. Each detected face is checked by Silent Face Anti-Spoofing (SFAS) to see if it's real or just a photo, video, or mask trying to sneak in. Spoofs get booted instantly. Real faces? Welcome aboard.
Step 3: Prepping the face for the spotlight
Faces are cropped to 160x160 pixels and normalized, ready for their cameo in our InceptionResnetV1 model (pretrained on VGGFace2). Out pops a 512-dimensional embedding, the mathematical fingerprint that makes each student unmistakable.
Step 4: Who's that student? 🤔
We consult our FAISS index, the superhero librarian of embeddings, to find the closest match by cosine similarity. If even the best match falls below our similarity threshold, the face is labeled "Unknown." Otherwise, we proudly tag them with their name.
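Turning the top-1 search result into a name or "Unknown" might look like this sketch. The 0.6 threshold is a placeholder assumption that would need tuning on validation data; the post doesn't state its value. FAISS reports a row index of -1 when it has no result, so that case is treated as unknown too.

```python
import numpy as np


def label_matches(sims: np.ndarray, rows: np.ndarray, ids: list[str], threshold: float = 0.6) -> list[str]:
    """Map top-1 (similarity, row) pairs to names; weak or missing matches become 'Unknown'."""
    return [
        ids[r] if r >= 0 and s >= threshold else "Unknown"
        for s, r in zip(sims[:, 0], rows[:, 0])
    ]
```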
Step 5: Stability matters
To avoid "flicker fame" (labels flipping every other frame), we keep a queue of recent predictions and use majority voting. When a face shows up consistently for, say, 5 frames, the identity is locked in.
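A sketch of the majority-vote smoothing, assuming one stabilizer per tracked face (the post doesn't describe how faces are matched across frames). Here a name is locked in once it wins at least `min_votes` of the last `window` predictions; the exact rule and the class name are assumptions.

```python
from collections import Counter, deque
from typing import Optional


class IdentityStabilizer:
    """Lock in a name only after it wins the vote over recent frames."""

    def __init__(self, window: int = 5, min_votes: int = 3):
        self.history: deque = deque(maxlen=window)  # oldest prediction drops off automatically
        self.min_votes = min_votes
        self.locked: Optional[str] = None

    def update(self, prediction: str) -> Optional[str]:
        """Add this frame's prediction; return the locked name (None until one is locked)."""
        self.history.append(prediction)
        name, votes = Counter(self.history).most_common(1)[0]
        if votes >= self.min_votes and name != "Unknown":
            self.locked = name
        return self.locked
```

Setting `min_votes` equal to `window` gives the stricter rule of requiring the same name in all five recent frames.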
Step 6: Showtime
Finally, we draw a neat rectangle around each student's face on the video, slap the name on top, and voilà—attendance done in style, faster than a teacher can say "Present!"
Haha, you guys are moving at lightning speed—we've already wrapped up Level 2! Now, let's dive into the next stage.
3. Web Application Deployment – Making Attendance a Breeze
For the Techwiz project, we've split this chapter into four key parts, showing exactly what we did and how we did it, with a peek behind the curtain of our face-recognition adventure. Our face-recognition magic isn't just behind the scenes: it's got a fancy frontend, a busy backend, and a solid database, all working together like a well-rehearsed orchestra.
3.1 Frontend – The Teacher's Playground
We built a dynamic web app with Next.js, styled with TailwindCSS and powered by shadcn/ui components. Teachers can:
- Manage classes
- View student lists
- Start attendance sessions
Auth is handled by Clerk, keeping logins and sessions secure. During attendance, the frontend becomes a real-time display: showing live video, drawing boxes around students' faces, and labeling them—all without breaking a sweat.
3.2 Backend – The Brain
Our FastAPI backend does the heavy lifting:
- Handles all API requests (new classes, adding students…)
- Manages authentication tokens
- Processes the real-time video stream from the frontend
Basically, the backend sees everything, thinks fast, and whispers back the results in real-time so the frontend can display them instantly.
3.3 Database – The Memory Vault
We used PostgreSQL hosted on Neon. Everything (users, classes, students, and attendance records) is stored safely and efficiently. The backend talks to it via psycopg and SQLAlchemy. Historical attendance? Always there when teachers want to check.
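To make the data model concrete, here is a hypothetical schema sketched with Python's built-in `sqlite3` so it runs anywhere. The real project uses PostgreSQL on Neon via SQLAlchemy; the table and column names below are assumptions. The `ON CONFLICT ... DO NOTHING` clause is valid in both SQLite (3.24+) and PostgreSQL, so a student recognized in many frames is only marked once per session.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY,
    clerk_user_id TEXT UNIQUE NOT NULL          -- ID issued by the auth provider
);
CREATE TABLE IF NOT EXISTS classes (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    teacher_id INTEGER NOT NULL REFERENCES users(id)
);
CREATE TABLE IF NOT EXISTS students (
    id TEXT PRIMARY KEY,                        -- e.g. 'SV_0007'
    full_name TEXT NOT NULL,
    class_id INTEGER NOT NULL REFERENCES classes(id)
);
CREATE TABLE IF NOT EXISTS attendance (
    student_id TEXT NOT NULL REFERENCES students(id),
    session_id TEXT NOT NULL,
    marked_at TEXT NOT NULL,                    -- ISO-8601 timestamp
    PRIMARY KEY (student_id, session_id)        -- at most one mark per session
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default
    conn.executescript(SCHEMA)
    return conn


def mark_present(conn: sqlite3.Connection, student_id: str, session_id: str, marked_at: str) -> bool:
    """Insert an attendance row; return False if the student was already marked."""
    before = conn.total_changes
    conn.execute(
        "INSERT INTO attendance (student_id, session_id, marked_at) VALUES (?, ?, ?) "
        "ON CONFLICT (student_id, session_id) DO NOTHING",
        (student_id, session_id, marked_at),
    )
    conn.commit()
    return conn.total_changes > before
```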
3.4 Workflow – From Start to Finish
Here's how a session actually flows:
- Start: Teacher logs in, sets a session end time.
- Connect: Frontend fires up a WebRTC channel to the backend, using IP Webcam as the video source.
- Stream: Video frames flow non-stop to the backend.
- Process: Backend runs its magic, identifying faces and preparing overlays.
- Return: Results fly back to the frontend, showing names and boxes in real-time.
- Finish: Session ends, video stops, attendance is finalized, and the record is saved for posterity.
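The session lifecycle above can be modeled with a small in-memory object. This is a hypothetical sketch (the class and method names are ours): it records each student's first sighting, ignores anything after the teacher's end time, and produces a present/absent list when the session is finalized.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class AttendanceSession:
    """Collects recognitions for one class until the teacher's chosen end time."""
    class_id: int
    ends_at: datetime
    present: dict[str, datetime] = field(default_factory=dict)
    finalized: bool = False

    def record(self, student_id: str, seen_at: datetime) -> bool:
        """Mark a recognized student present; return True only for a new, valid mark."""
        if self.finalized or seen_at > self.ends_at or student_id == "Unknown":
            return False
        if student_id in self.present:  # keep the first sighting time
            return False
        self.present[student_id] = seen_at
        return True

    def finalize(self, roster: list[str]) -> dict[str, bool]:
        """Close the session and return present/absent for everyone on the class roster."""
        self.finalized = True
        return {sid: sid in self.present for sid in roster}
```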
In a Nutshell: The Magic Behind the Curtain
So, how does it all come together? It's simple, really:
- The Frontend is the friendly face—it shows the video feed and the final result.
- The Backend is the brilliant brain—it does the heavy lifting of processing and recognition.
- The Database is the reliable memory: it remembers every class, student, and attendance record.
And the best part? Students get counted without a single "Present!" ever being shouted. Mission accomplished.
Hahaha, you guys rock! So, from the first line of code to the final live demo—we did it!
This project was a wild ride from theory to reality, and we're incredibly proud of what we've built. But don't just take our word for it—see it in action for yourself!
Want to try it out or geek out over the code? The entire project—from the data collection scripts to the FAISS-powered inference engine—is open-source and waiting for you on GitHub. Fork it, break it, improve it, and make it your own!
Got questions or want to collaborate? Drop us an issue on GitHub or