Research on face feature point detection algorithm under computer vision
Published Online: Mar 19, 2025
Received: Nov 07, 2024
Accepted: Feb 16, 2025
DOI: https://doi.org/10.2478/amns-2025-0503
© 2025 Dan Li et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Face recognition is a form of biometrics that identifies a person by acquiring and analyzing facial features [1]. With continued in-depth study, face recognition algorithms have reached a high level of efficiency and accuracy, making face recognition the mainstream biometric technology, widely used in practice. For example, it underpins everyday security tasks such as login authentication [2], attendance tracking [3], purchase payment [4], and criminal identification in public security systems [5]. The main components of a face recognition system, which also define the directions of research, are face detection, face alignment (keypoint detection), facial feature extraction, and face classification and recognition [6]. Among these, face feature point detection, also known as face keypoint detection or face alignment, processes an already detected face image in finer detail, using the computer to locate the coordinates of each key facial part, such as the eyebrows, eyes, mouth, nose, and face contour [7]. Accurate feature point localization facilitates the extraction of facial features and is therefore valuable for automatic face recognition, expression recognition and analysis, and face modeling [8–10]. Moreover, a good face alignment algorithm is robust to interference in the image: it adapts well to illumination changes, background changes, changes in face angle, occlusion, and so on, and can still accurately locate the keypoints when image quality is poor [11–12]. Face keypoint detection is therefore of great significance for handling the various challenges in face recognition, and it is a central component of the field.
Face recognition, as an important research topic in the application of biometrics, draws on a wide range of fields, including computer applications, computer vision, perceptual science, image processing, psychology, statistics, pattern recognition, and neurology [13–16].
This paper focuses on face feature point detection under computer vision. A cascade regression model is built; a face pose change module is designed based on weak invariance and pose-indexed features; and the model is kept under incremental learning by training the regressors at every level of the cascade, improving feature extraction accuracy. Comparisons with existing methods, together with pupil localization accuracy and detection speed tests, verify that the cascade regression algorithm can extract face feature points quickly and accurately.
The accuracy of face feature point detection relies on the continuous training and testing of the cascade regression algorithm, which is described in the following sections on face feature point detection, the cascade regression algorithm, and the learning and training process.
Face feature points are a set of points with special semantics, such as points on the cheeks, eyebrows, and lips, on a face image. For ease of exposition, the set composed of all face images is denoted as II = {
Following the customary terminology in the field of face feature point detection, in this paper, we refer to the vector
Definition 1 (Face Shape) For a given
Face feature point detection is usually learned on a given dataset to obtain the corresponding model, and the average face shape has a special meaning in a carefully constructed dataset. For ease of exposition, this section gives definitions of the face feature point dataset and the average face shape:
Definition 2 (Face Feature Point Dataset and Mean Shape) A face feature point dataset consists of a series of images and the face shapes on those images, denoted as D = {(
In a standard face feature point dataset, the sample capacity is usually large, and the angles and poses of the faces in the image are widely and uniformly distributed, so
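Although the paper's formal definition of the mean shape is truncated above, one common construction is to average the shapes after removing translation and scale differences. A minimal sketch under that assumption (the normalization choice here is illustrative, not taken from the paper):

```python
import numpy as np

def mean_shape(shapes):
    """Average a set of (L, 2) landmark arrays after normalization.

    Each shape is centered (removing translation) and scaled to unit
    RMS size (removing scale) before averaging, so that differences in
    face position and size do not bias the mean shape.
    """
    normalized = []
    for s in shapes:
        s = np.asarray(s, dtype=float)
        centered = s - s.mean(axis=0)                  # remove translation
        scale = np.sqrt((centered ** 2).sum() / len(centered))
        normalized.append(centered / scale)            # remove scale
    return np.mean(normalized, axis=0)

# Two toy 3-point "shapes" differing only by translation and scale
s1 = [[0, 0], [2, 0], [1, 2]]
s2 = [[10, 10], [14, 10], [12, 14]]   # s2 = 2*s1 + (10, 10)
m = mean_shape([s1, s2])
```

Because the two toy shapes differ only by a translation and a scale, they normalize to the same shape, and the mean shape equals that common normalized shape.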
In algorithms of the cascade regression class, face shape-indexed features are often used to describe how well the current face feature points match the image. Such features are constructed by indexing pixel coordinates with the face shape, i.e., for different face shapes, the positions of these pixels should carry similar semantics. For example, when the
On different face images, after shape indexing, the positions of the corresponding points relative to the face shapes are geometrically invariant and have some semantic information.
Shape indexing can be realized with the help of various geometrical means. For the sake of exposition, in this paper, the process of mapping a set of reference points
In the ERT algorithm, the shape-indexed pixel-difference feature is obtained by computing the difference of gray values between shape-indexed pixel points under condition (1).
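As an illustration, a minimal sketch of ERT-style shape-indexed pixel-difference extraction. The offset parameterization (each probe pixel expressed as a landmark index plus a fixed displacement) is an assumption for illustration, not the paper's exact construction:

```python
import numpy as np

def pixel_diff_features(image, shape, offsets_a, offsets_b):
    """Shape-indexed pixel-difference features (illustrative sketch).

    image     : 2-D grayscale array.
    shape     : (L, 2) landmark coordinates (x, y).
    offsets_a, offsets_b : (F, 3) arrays; each row is
        (landmark_index, dx, dy), i.e. a probe pixel expressed
        relative to a landmark so it keeps similar semantics
        across different face shapes.
    Returns an (F,) vector of gray-level differences.
    """
    h, w = image.shape
    feats = np.empty(len(offsets_a))
    for i, (pa, pb) in enumerate(zip(offsets_a, offsets_b)):
        xa = int(np.clip(shape[int(pa[0]), 0] + pa[1], 0, w - 1))
        ya = int(np.clip(shape[int(pa[0]), 1] + pa[2], 0, h - 1))
        xb = int(np.clip(shape[int(pb[0]), 0] + pb[1], 0, w - 1))
        yb = int(np.clip(shape[int(pb[0]), 1] + pb[2], 0, h - 1))
        # difference of gray values at the two shape-indexed pixels
        feats[i] = float(image[ya, xa]) - float(image[yb, xb])
    return feats
```

Pixel differences of this kind are cheap to compute and partially cancel additive illumination changes, which is one reason they suit real-time cascades.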
A regression-based face feature point detection algorithm can be divided into two parts: extracting features from the input image, and mapping the extracted features to face feature points with a learned regressor. However, because the distribution of feature points is complex, it is very difficult to learn a single strong regressor that maps the features directly to exact locations, and the results are often unsatisfactory. Drawing on the idea of ensemble learning, researchers therefore proposed to obtain a strong regressor by combining multiple weak regressors in order to realize face feature point detection. The algorithmic framework is divided into a training phase and a testing phase.

Training phase. The training phase consists of three main modules: generating training samples, training the optimal weak regressors, and updating the training samples. The framework of the algorithm is shown in Figure 1. Each training sample can be described as

After determining the set of training samples

Where

After completing the training of the optimal weak regressor

Testing phase. In the testing phase, for an input face image the face feature points are initialized; pose-indexed features are extracted according to the current feature point distribution; and the extracted features are fed to the trained weak regressor to obtain a set of feature point distribution residuals, which update the current feature point distribution. This process is repeated for each weak regressor to finally approximate the true face feature point distribution. The iteration can be represented by equation (3).
Where

Training block diagram of cascade regression algorithm

Test block diagram of cascade regression algorithm
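The testing phase described above can be sketched as a simple loop. Here each trained weak regressor is assumed to be available as a callable and `extract_features` stands in for any pose-indexed feature extractor; both names are illustrative:

```python
import numpy as np

def cascade_predict(image, init_shape, regressors, extract_features):
    """Test phase of a cascaded regressor (illustrative sketch).

    Starting from an initial shape, each stage extracts pose-indexed
    features at the *current* shape estimate and adds the residual
    predicted by that stage's weak regressor:

        S_{t+1} = S_t + R_t( phi(image, S_t) )

    regressors : list of callables, feature vector -> shape increment.
    extract_features(image, shape) : any pose-indexed extractor.
    """
    shape = np.array(init_shape, dtype=float)
    for reg in regressors:
        feats = extract_features(image, shape)
        shape = shape + reg(feats)   # step toward the true shape
    return shape
```

With toy regressors that each remove half the remaining residual, the cascade converges geometrically toward the target shape, which mirrors the coarse-to-fine behavior of the real framework.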
Pose-indexed features are a feature extraction method. Compared with common feature extraction algorithms such as HOG and SIFT, their most important characteristic is that they incorporate pose change information into the extraction process. For face images, this pose information is implicit in the distribution of face feature points. Pose-indexed features are the key to the good results of the cascade regression framework, and the overall robustness of the algorithm can be substantially improved by designing them well.
Formally, the pose-indexed feature extraction operator for a face image can be defined as
Weak invariance is a property of pose-indexed features: consistent pose-indexed features can be extracted from images of the same face in different poses. To facilitate the description of weak invariance, this paper defines the camera mapping as
That is, satisfying weak invariance means that consistent features can be extracted from face images with different poses for the same set of feature points
Considering that pose-indexed features are computed very frequently during training and testing of the whole cascade regression framework, the features should also be kept as simple as possible so that the algorithm can achieve real-time face feature point detection. Based on pixel difference features, the algorithm first randomly selects
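One concrete way to realize shape indexing with (approximate) weak invariance is to express reference points in the mean-shape frame and map them into the image through the similarity transform that aligns the mean shape with the current shape. This is a common construction in cascade regression methods, sketched here as an illustration rather than the paper's exact formulation:

```python
import numpy as np

def similarity_align(src, dst):
    """Least-squares similarity transform (scale + rotation +
    translation) mapping point set src onto dst, both (L, 2).
    Returns (A, t) with  dst ~= src @ A.T + t  (Procrustes-style fit).
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    denom = (src_c ** 2).sum()
    cov = dst_c.T @ src_c               # 2x2 cross-covariance
    a = (cov[0, 0] + cov[1, 1]) / denom  # scale * cos(theta)
    b = (cov[1, 0] - cov[0, 1]) / denom  # scale * sin(theta)
    A = np.array([[a, -b], [b, a]])
    return A, mu_d - mu_s @ A.T

def index_points(ref_points, mean_shape, cur_shape):
    """Map reference points defined in the mean-shape frame into the
    image via the similarity transform carrying the mean shape onto
    the current shape -- one realization of shape indexing."""
    A, t = similarity_align(mean_shape, cur_shape)
    return ref_points @ A.T + t
```

Because the indexed points follow the shape's rotation, scale, and translation, their positions relative to the face remain geometrically consistent across poses, which is exactly the weak invariance discussed above.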
Statistics of the data distribution are used to train the regressors at every level, so that the cascade regression framework supports incremental learning: it gradually corrects the initial shape toward the true annotated shape and improves the accuracy of face feature point localization. Since incremental learning must be built on an existing offline model, this section first introduces the learning process of the offline model.
The
Where
The learning process of traditional cascade regression methods requires iteratively training each level of regressors in sequence until the error between the current shape and the true shape no longer decreases. However, such a sequential approach lengthens model training and does not allow each regressor to be updated incrementally. As shown in Figure 3, since the shape displacement space

The training manner of conventional CR

Parallel update method based on mixed Gaussian distribution sampling
Specifically, the input space of the level
Step
Step
The EM algorithm is used to fit the Gaussian mixture distribution, which better characterizes the distribution of shape displacement vectors at the level
With the known distribution statistics of the data in the
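Once the mixture parameters for a level's shape displacements are known (fit, e.g., with EM), training inputs for that level can be generated by sampling from the mixture rather than by running all earlier levels. A hedged sketch of the sampling step, assuming the mixture parameters are given:

```python
import numpy as np

def sample_gmm(weights, means, covs, n, rng=None):
    """Draw n samples from a Gaussian mixture (illustrative sketch).

    In a parallel-update scheme, the shape displacements entering
    level t are modeled by a Gaussian mixture; sampling from it lets
    each level's regressor be trained on synthetic displacement
    vectors independently of the other levels.
    """
    rng = np.random.default_rng(rng)
    weights = np.asarray(weights, dtype=float)
    # choose a mixture component for each sample, then draw from it
    comp = rng.choice(len(weights), size=n, p=weights / weights.sum())
    return np.stack([
        rng.multivariate_normal(means[k], covs[k]) for k in comp
    ])
```

Sampling all levels' inputs from their fitted mixtures is what decouples the levels and makes the parallel update in Figure 4 possible.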
Through the training and testing of the cascade regression algorithm, a model that can continuously learn and update is obtained. In the following section, it is compared with existing detection methods and applied to pupil localization accuracy experiments, which verify the practical prospects of the cascade regression algorithm.
To further validate the effectiveness of the proposed algorithm, its experimental results are compared with those of existing methods on the LFPW and LFW face datasets.
Figure 5 shows the average relative localization error of the proposed algorithm for each face feature point on the LFPW dataset, together with the statistical results of the sample consistency method and of manual annotation on the same dataset. The proposed algorithm exceeds the accuracy of manual labeling at all feature points, indicating better stability than manual labeling. Compared with the sample consistency method, the proposed algorithm is very close in localization accuracy at stable feature points and achieves higher accuracy at unstable feature points (those, such as the center of the eyebrow, whose texture features are relatively indistinct), with an accuracy improvement of more than 10%.
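The paper does not spell out how the relative localization error is normalized. A common convention on datasets such as LFPW is to divide the per-landmark pixel error by the inter-ocular distance; the following sketch uses that assumed normalizer:

```python
import numpy as np

def relative_errors(pred, gt, left_eye_idx, right_eye_idx):
    """Per-landmark localization error normalized by inter-ocular
    distance. The inter-ocular normalizer is an assumption here,
    not a detail taken from the paper.

    pred, gt : (L, 2) predicted and ground-truth landmark arrays.
    Returns an (L,) array of relative errors.
    """
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    iod = np.linalg.norm(gt[left_eye_idx] - gt[right_eye_idx])
    return np.linalg.norm(pred - gt, axis=1) / iod
```

Normalizing by a face-intrinsic distance makes errors comparable across images with different face sizes.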

Comparison of average relative positioning errors of face feature points
Figure 6 shows the cumulative error distribution curves of the proposed algorithm and the sample consistency method. To further compare the accuracy of the localization detectors, the figure also shows the cumulative error distribution curves of the support vector machines used by the proposed algorithm and by the sample consistency method; the support vector machine localization result of the proposed algorithm is given by the weighted Mean-Shift algorithm. The red curve represents the proposed algorithm with face shape constraints; the green curve, the sample consistency method with face shape constraints; the blue curve, the cumulative error distribution given by the support vector machine of the sample consistency method; and the purple curve, that given by the support vector machine of the proposed algorithm.

Cumulative relative error distribution curve and comparison
Because the training set used to train the probabilistic-output support vector machine in this work has fewer samples and its parameters were not carefully tuned, the cumulative error distribution given by our support vector machine is worse than that of the sample consistency method's support vector machine: for almost all values of the relative localization error, its detection rate is lower. After adding the face shape constraint, however, the cumulative error distribution curve of the proposed algorithm is better than that of the sample consistency method, i.e., the detection rate of the proposed algorithm is higher for any given value of the relative localization error. This shows that introducing face shape constraints on top of a weak localization detector can greatly improve the accuracy of face feature point localization, and it illustrates the importance of face shape constraints, especially higher-order constraints, for this task.
Figure 7 shows the detection rate of the proposed algorithm for each feature point on the LFW face dataset, compared with the results of the sample consistency method, the conditional random forest method, and other methods. The proposed algorithm improves the detection rate considerably over the other methods, with all detection rates above 90%.

Detection rate of each feature point on LFW face data set
The analysis and comparison above verify that the proposed algorithm achieves a high detection rate in face feature point detection. A pupil localization accuracy test is then carried out to further analyze the algorithm's detection error, and the corresponding accuracy, at the scale of a few pixels.
The error analysis is computed in Matlab: the squared distance between each manually calibrated point and the point computed by the program is plotted, with the vertical coordinate being the squared distance between the manual calibration point and the algorithm's output (in units of squared pixel distance) and the horizontal coordinate indexing the pixel points. The pupil localization accuracy plot is shown in Figure 8.

Pupil precision location map
By analyzing Figure 8, the following conclusions can be drawn:
Overall, the accuracy of pupil center detection for the left and right eyes is within a distance of 3 pixels, and left eye corner detection is also within 3 pixels. Right eye corner detection is within 5-6 pixels, with most samples within 5 pixels. The lower accuracy for the right eye corner stems from the fact that the eye corners in the test samples are not clear; they are also difficult to confirm in real life. Analysis of the test samples shows that the pupil accuracy error is caused by lighting: white pixel regions appear inside the pupil area, so the computed center of the dark pupil region is pushed off-center. The test accuracy at large and small pixel scales is similar, mainly because the same detection algorithm is used.
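The quantity plotted in Figure 8 is straightforward to reproduce. A small sketch, re-expressed in Python rather than the paper's Matlab:

```python
import numpy as np

def squared_pixel_errors(manual_pts, detected_pts):
    """Squared Euclidean distance (pixel^2) between manually
    calibrated points and the algorithm's detected points -- the
    per-sample quantity plotted on the vertical axis of the
    pupil-accuracy figure.

    manual_pts, detected_pts : (N, 2) arrays of (x, y) coordinates.
    Returns an (N,) array of squared distances.
    """
    d = np.asarray(manual_pts, float) - np.asarray(detected_pts, float)
    return (d ** 2).sum(axis=1)
```

For example, a detection 3 pixels off in x and 4 in y yields a squared error of 25, i.e. a 5-pixel distance, matching how the 3- and 5-pixel accuracy bounds above are read off the plot.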
This test shows that, although affected to some extent by the clarity of the facial features, the brightness of the lighting, and other factors, the proposed algorithm still maintains high detection accuracy for small-pixel face feature points such as the pupil and the eye corner.
Combined with the preceding tests and evaluation, the cascade regression algorithm has high detection accuracy in face feature point detection and is well suited to face detection applications. In the following section, the cascade regression algorithm is further evaluated against the detection speed index to clarify its application prospects in face detection.
There are many evaluation indexes for face detection algorithms, among which detection speed measures whether an algorithm can process face feature points quickly in real scenarios. We test the data forward propagation speed and single-detection speed of the proposed algorithm on the AFW, PASCAL Faces, FDDB, and other datasets, obtaining the results in Table 1.
Face detection module evaluation
Test set | Forward propagation time/ms | Detection time/ms
---|---|---
AFW | 7.39 | 10.76
PASCAL Faces | 7.25 | 12.28
FDDB | 7.48 | 11.68
CASIA-WebFace | 7.27 | 10.80
WIDER Face | 7.42 | 11.59
MALF | 7.36 | 12.15
The face detection module of the proposed algorithm has about 5M parameters. Combined with Table 1, a single forward propagation on one GTX 1080Ti GPU takes less than 7.5 ms, and a single detection takes about 11.5 ms on average, corresponding to a detection speed of about 87 FPS. The previous section verified that the proposed algorithm has high detection accuracy, and it simultaneously maintains the advantage of fast detection speed. The algorithm can therefore detect different face feature points both quickly and accurately, and on this basis it has broad practical application prospects in real life.
This paper has focused on the advantages of the cascade regression algorithm in face feature point detection. The overall framework of the cascade regression algorithm is built through training and testing, combining the weak invariance of pose-indexed features with the training of regressors at all levels of the cascade to improve the accuracy of face feature point detection. Experiments and comparative analysis show that the detection rate of the cascade regression algorithm exceeds 90%, the pupil detection accuracy is about 3 pixels, and the detection speed is about 87 FPS, giving it the advantages of fast detection speed and high detection accuracy.
This paper has studied the face feature point detection algorithm from the perspective of computer vision, providing scientific and effective support for the application of the cascade regression algorithm in real life. The results show that the cascade regression algorithm can quickly and accurately recognize different face feature points and has good application prospects in the field of face recognition.