Face Recognition Login in iVendNext POS uses two distinct stages to authenticate a user: a browser-based liveness check, followed by a server-side face comparison. Both must succeed before the user is granted access.


This article explains how each stage works and what happens between the moment a user clicks Login with Face ID and the moment they are redirected into the application.
Liveness detection solves a specific problem: a camera cannot inherently tell the difference between a live face and a photograph of a face. Without liveness checks, someone could hold a printed photo of an enrolled user in front of the camera and potentially authenticate as that person. The liveness system prevents this by requiring signals that only a live person can produce.
When a user clicks Login with Face ID, the browser requests access to the front-facing camera. If permission is granted, the camera feed appears in the face detection widget — on the left panel of the login page on desktop, or in a modal overlay on narrow screens. At the same time, the system loads the MediaPipe FaceLandmarker model from a CDN. This model processes the live camera feed in real time, detecting 478 facial landmark points at approximately 30 frames per second.
The MediaPipe model is loaded once per browser session and cached afterwards. Subsequent face login attempts within the same session do not reload the model.
The system requires the browser and device to have a functioning front-facing camera. Internet access is required to load the MediaPipe WebAssembly bundle and model file from their external CDNs the first time they are used in a session.
Rather than a simple binary pass/fail, the system accumulates a liveness score from 0 to 100. The score is composed of two parts:
Blink score — contributes up to 60 points.
Movement score — contributes up to 40 points.
Both must contribute to the total. Because the blink score is capped at 60 points and the movement score at 40, neither signal can reach 100 on its own: blinking without any head movement does not proceed to capture, and head movement without any blink does not proceed either.
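The two-part score can be pictured with a short Python sketch. It is illustrative only: the real scoring runs in the browser, and the class name, method names and points awarded per event are assumptions. Only the 60 and 40 point caps and the pass mark of 100 come from this article.

```python
from dataclasses import dataclass

BLINK_CAP = 60      # maximum contribution from blinks (from the article)
MOVEMENT_CAP = 40   # maximum contribution from head movement (from the article)
PASS_SCORE = 100    # total needed before capture is considered


@dataclass
class LivenessScore:
    """Hypothetical container for the two score components."""
    blink: float = 0.0
    movement: float = 0.0

    def add_blink(self, points: float) -> None:
        # Clamp so that blinks can never exceed their share of the total.
        self.blink = min(BLINK_CAP, self.blink + points)

    def add_movement(self, points: float) -> None:
        # Clamp so that movement can never exceed its share of the total.
        self.movement = min(MOVEMENT_CAP, self.movement + points)

    @property
    def total(self) -> float:
        return self.blink + self.movement

    def ready(self) -> bool:
        # Both components must have contributed and the total must reach 100.
        return self.blink > 0 and self.movement > 0 and self.total >= PASS_SCORE
```

With these caps, repeated calls to add_blink alone leave the total at 60, so ready() stays false until movement has also been credited.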

A blink is detected by measuring the Eye Aspect Ratio (EAR) — a numerical measure of how open or closed the eyes are, computed from facial landmark coordinates. When the EAR falls below 0.19 for between 2 and 14 consecutive frames and then rises again, the system registers a valid blink.
Two constraints prevent spoofing through blink simulation:
A minimum of 800 milliseconds must pass between two credited blinks. Rapid repeated closing and opening is not counted.
A closure lasting more than 14 frames is not counted as a blink. This prevents a static photograph that is tilted toward the camera from registering as an "eye closure."
Scoring does not begin until the face has been consistently detected in at least 8 consecutive frames. This warmup period prevents false scores from brief accidental detections.
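The blink rules above can be sketched as a small state machine. This is a minimal illustration in Python, not the product's browser code. The EAR formula shown is the commonly used six-landmark definition and is an assumption about how the system computes it. The class and function names are hypothetical. The thresholds (0.19, 2 to 14 frames, 800 ms, 8 warmup frames) are taken from this article.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]

EAR_CLOSED = 0.19        # eye counts as closed below this EAR
MIN_CLOSED_FRAMES = 2    # shortest closure accepted as a blink
MAX_CLOSED_FRAMES = 14   # longest closure accepted as a blink
MIN_BLINK_GAP_MS = 800   # minimum time between two credited blinks
WARMUP_FRAMES = 8        # consecutive face frames needed before scoring


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """Common EAR definition (assumed): p1 and p4 are the eye corners,
    p2 and p3 lie on the upper lid, p6 and p5 on the lower lid below them."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


class BlinkDetector:
    def __init__(self) -> None:
        self.face_frames = 0     # consecutive frames with a detected face
        self.closed_frames = 0   # length of the current eye closure
        self.last_blink_ms: Optional[float] = None

    def update(self, ear: Optional[float], now_ms: float) -> bool:
        """Feed one frame (ear is None when no face was found).
        Returns True only on the frame where a blink is credited."""
        if ear is None:
            self.face_frames = 0
            self.closed_frames = 0
            return False
        self.face_frames += 1
        if self.face_frames < WARMUP_FRAMES:
            return False                      # still warming up
        if ear < EAR_CLOSED:
            self.closed_frames += 1           # closure in progress
            return False
        # Eyes are open again: judge the closure that just ended.
        closed, self.closed_frames = self.closed_frames, 0
        if not (MIN_CLOSED_FRAMES <= closed <= MAX_CLOSED_FRAMES):
            return False                      # too short or too long
        if (self.last_blink_ms is not None
                and now_ms - self.last_blink_ms < MIN_BLINK_GAP_MS):
            return False                      # too soon after the last blink
        self.last_blink_ms = now_ms
        return True
```

A caller would compute the EAR for each frame from the landmark coordinates and pass it to update() together with a millisecond timestamp.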
The system tracks the position of the nose-tip landmark across frames. A valid movement is counted when the nose-tip moves between 0.004 and 0.028 normalised coordinate units per frame. Movement below 0.004 is filtered as micro-jitter or camera noise. Movement above 0.028 is rejected as a sharp shake rather than a subtle movement.
A minimum of 700 milliseconds must pass between two credited movement events.
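The movement rule can be sketched in the same way. The Python below is a hedged illustration: it assumes the per-frame movement is the straight-line distance between successive nose-tip positions in normalised coordinates, which the article does not state explicitly, and the class name is hypothetical. The 0.004 and 0.028 bounds and the 700 ms gap are from this article.

```python
import math
from typing import Optional, Tuple

MIN_STEP = 0.004         # below this, movement is treated as jitter or noise
MAX_STEP = 0.028         # above this, movement is treated as a sharp shake
MIN_MOVE_GAP_MS = 700    # minimum time between two credited movements


class MovementDetector:
    def __init__(self) -> None:
        self.prev: Optional[Tuple[float, float]] = None
        self.last_move_ms: Optional[float] = None

    def update(self, nose: Tuple[float, float], now_ms: float) -> bool:
        """Feed the nose-tip position for one frame.
        Returns True when a movement event is credited."""
        prev, self.prev = self.prev, nose
        if prev is None:
            return False                      # first frame, nothing to compare
        step = math.dist(prev, nose)          # assumed distance metric
        if not (MIN_STEP <= step <= MAX_STEP):
            return False                      # jitter or shake
        if (self.last_move_ms is not None
                and now_ms - self.last_move_ms < MIN_MOVE_GAP_MS):
            return False                      # too soon after the last event
        self.last_move_ms = now_ms
        return True
```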
The liveness score is not permanent once accumulated. Both the blink score and the movement score decay over time if the user becomes inactive:
Blink score begins decaying at 10 points per second after a 2-second pause in blink activity.
Movement score begins decaying at 5 points per second after a 1-second pause in movement.
This prevents a scenario where a user briefly blinks once and then holds completely still for several minutes while the score sits at the threshold. To proceed to capture, the system requires both a blink and a movement to have occurred within the last 5 seconds at the moment the score reaches 100.
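As a worked illustration of the decay rates, the following Python sketch assumes the decay is linear and that a score never drops below zero. Neither detail is confirmed by this article, and the function names are hypothetical. The grace periods and rates are the ones listed above.

```python
def decayed_score(score: float, idle_s: float,
                  grace_s: float, rate_per_s: float) -> float:
    """Score remaining after idle_s seconds without the relevant activity."""
    decaying_for = max(0.0, idle_s - grace_s)   # decay starts after the grace period
    return max(0.0, score - rate_per_s * decaying_for)


def blink_score_after(score: float, idle_s: float) -> float:
    # 2-second grace period, then 10 points per second.
    return decayed_score(score, idle_s, grace_s=2.0, rate_per_s=10.0)


def movement_score_after(score: float, idle_s: float) -> float:
    # 1-second grace period, then 5 points per second.
    return decayed_score(score, idle_s, grace_s=1.0, rate_per_s=5.0)
```

Under these assumptions, a full blink score of 60 left idle for 5 seconds falls to 30, and a full movement score of 40 left idle for 3 seconds also falls to 30.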
When the accumulated score reaches 100, the system performs one final anti-spoof check: it confirms that both a blink event and a movement event occurred within the most recent 5-second window. If either condition is not met — for example, because the score reached 100 very slowly and the last blink happened more than 5 seconds ago — the system does not proceed. The user must continue the natural blink-and-move pattern until both signals are fresh.
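The final gate amounts to a freshness check on two timestamps. A minimal Python sketch, with a hypothetical function name and millisecond timestamps assumed:

```python
from typing import Optional

FRESH_WINDOW_MS = 5000   # both signals must be newer than this
PASS_SCORE = 100


def may_capture(total_score: float,
                last_blink_ms: Optional[float],
                last_move_ms: Optional[float],
                now_ms: float) -> bool:
    """True only when the score is at the threshold and both the last
    blink and the last movement fall inside the 5-second window."""
    if total_score < PASS_SCORE:
        return False
    for last in (last_blink_ms, last_move_ms):
        if last is None or now_ms - last > FRESH_WINDOW_MS:
            return False                      # signal missing or stale
    return True
```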
Once liveness is confirmed, the system does not capture immediately. It first waits for the user to settle into a pose that will produce a clean, usable face image for the server comparison.
The pose requirements are:
Eyes open, with an EAR of 0.26 or above.
Face centred in the frame, with the nose-tip within ±0.16 horizontal and ±0.20 vertical units of the frame centre.
Head roll of 10 degrees or less — the face must not be significantly tilted.
Nose movement of 0.004 units or less per frame — the user must be still.
All four conditions must hold simultaneously for 5 consecutive frames before the capture fires. The user is prompted to look straight at the camera with eyes open and hold still. The system shows "Hold still, capturing best frame…" once the pose is stable and the capture is about to occur.
When the pose is confirmed, the system crops the face region from the video frame with 15% padding on each side, enforces a minimum crop size of 64×64 pixels, and encodes the crop as a JPEG image at maximum quality. The captured image is passed from the browser-side Vue application to the login page via a window bridge function. It is never written to disk or stored in the browser.
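The pose gate and the crop rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the browser implementation: the frame centre is taken to be (0.5, 0.5) in normalised coordinates, the minimum size is assumed to be enforced by enlarging the crop rather than rejecting it, and all names are hypothetical. The numeric thresholds come from this article.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FramePose:
    ear: float        # eye aspect ratio
    nose_x: float     # nose-tip position, normalised 0..1
    nose_y: float
    roll_deg: float   # head roll in degrees
    nose_step: float  # nose-tip movement since the previous frame


def pose_ok(p: FramePose) -> bool:
    return (p.ear >= 0.26
            and abs(p.nose_x - 0.5) <= 0.16    # centred horizontally (assumed centre)
            and abs(p.nose_y - 0.5) <= 0.20    # centred vertically (assumed centre)
            and abs(p.roll_deg) <= 10.0
            and p.nose_step <= 0.004)


class PoseGate:
    """Fires once the pose has been good for `required` frames in a row."""

    def __init__(self, required: int = 5) -> None:
        self.required = required
        self.streak = 0

    def update(self, p: FramePose) -> bool:
        self.streak = self.streak + 1 if pose_ok(p) else 0
        return self.streak >= self.required


def padded_crop(box: Tuple[int, int, int, int], frame_w: int, frame_h: int,
                pad: float = 0.15, min_size: int = 64) -> Tuple[int, int, int, int]:
    """Grow an (x, y, w, h) face box by `pad` on each side, enlarge it to at
    least min_size (assumed behaviour), and clamp it to the frame.
    Clamping at a frame edge can still leave a crop smaller than min_size."""
    x, y, w, h = box
    w2 = max(w * (1 + 2 * pad), min_size)
    h2 = max(h * (1 + 2 * pad), min_size)
    cx, cy = x + w / 2, y + h / 2
    left = max(0, round(cx - w2 / 2))
    top = max(0, round(cy - h2 / 2))
    right = min(frame_w, round(cx + w2 / 2))
    bottom = min(frame_h, round(cy + h2 / 2))
    return left, top, right - left, bottom - top
```

A single bad frame resets the streak, so the user has to hold the pose for a full run of 5 frames before the gate fires.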
The login page converts the captured face image to a base64-encoded string and submits it, along with the username typed by the user, to the server API endpoint ivendnext_pos.api.auth.face_login.
On the server, the following steps occur:
The system retrieves the stored face encoding from the User Face Maps record for the given username. This encoding is the 128-dimension numerical vector generated during enrolment.
The submitted base64 image is decoded and converted to a NumPy array using OpenCV.
The dlib face detector locates the face in the submitted image. If no face is found, the login fails with the message "No face detected in camera stream."
The dlib ResNet face recognition model computes a 128-dimension encoding for the submitted face image.
The system calculates the Euclidean distance between the live encoding and the stored enrolment encoding.
If the distance is 0.5 or less, the faces are considered a match. If the distance is greater than 0.5, the match fails.
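The final comparison step can be illustrated with the Python standard library alone. This sketch deliberately omits the dlib detection and encoding steps and starts from two 128-value encodings. The function names are hypothetical, and the stored encoding is assumed to arrive as the JSON array of floats held in the Image Info field.

```python
import json
import math
from typing import List, Sequence

MATCH_THRESHOLD = 0.5    # maximum Euclidean distance for a match
ENCODING_SIZE = 128      # length of a face encoding


def load_stored_encoding(image_info_json: str) -> List[float]:
    """Parse the stored enrolment encoding from its JSON representation."""
    encoding = [float(v) for v in json.loads(image_info_json)]
    if len(encoding) != ENCODING_SIZE:
        raise ValueError("expected a 128-value face encoding")
    return encoding


def faces_match(stored: Sequence[float], live: Sequence[float],
                threshold: float = MATCH_THRESHOLD) -> bool:
    """Match when the Euclidean distance is at or below the threshold."""
    return math.dist(stored, live) <= threshold
```

A lower threshold makes matching stricter and a higher one makes it more permissive. The value of 0.5 is the one this article states for the server.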

When the Euclidean distance is 0.5 or less, the server creates a standard Frappe session for the user via the Frappe LoginManager. This session is identical in every respect to a session created by a password-based login — the same roles, the same permissions, the same session timeout, and the same audit log entries apply. The user is redirected to /app.

If the Euclidean distance exceeds 0.5, the server returns the message "Face match failed. Access denied." No session is created. The camera widget closes. The user can attempt the process again or use their password instead.
Several other conditions can interrupt the login flow before the distance comparison is reached:
In all error cases, the full server-side traceback is written to the Frappe Error Log with the title "Face Login Error" to allow the administrator to investigate.
One aspect of Face Recognition Login that differs from a pure biometric system is that the username must still be typed before face login is attempted. The Login with Face ID button remains disabled until at least one character has been entered in the username field.
This is intentional. The server uses the username to look up the correct User Face Maps record before performing any comparison. Without a username, the server would have no way to know which stored encoding to compare against. The username is not verified against a password — it is used only to retrieve the correct reference encoding from the database.
Understanding what data is retained after a face login attempt is important for both administrators and users.
What is stored permanently:
The reference photograph attached to the User Face Maps record (stored as a file on the server).
The 128-dimension encoding computed from that reference photograph (stored in the Image Info field as a JSON array of floats).
What is never stored:
The live camera video stream — it is processed in the browser in real time and discarded.
The captured face image — it is transmitted to the server for comparison and not saved to disk or the database.
The live encoding computed on the server — it is computed in memory, compared against the stored encoding, and discarded.

The server retains only the outcome of the authentication attempt, through the standard Frappe session and login audit mechanisms.