Deep Learning Models for Biometric Recognition based on Face, Finger vein, Fingerprint, and Iris: A Survey
Article category: Article
Published online: 15 Jun 2024
Pages: 117 - 157
Received: 23 May 2024
Accepted: 07 Jun 2024
DOI: https://doi.org/10.2478/jsiot-2024-0007
Keywords
© 2023 Saif Mohanad Kadhim et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Establishing the identity of a person is critical in our vastly interconnected society, and a broad variety of systems demand reliable personal recognition methods to either confirm or determine the identity of the individual requesting their services. Biometrics is the branch of science that has gained a significant place in the technical world; it refers to the analysis of biological data for both the identification and verification of an individual based on relevant behavioral and physiological traits. These traits possess powerful properties owing to their constant and unrivaled nature, which distinguishes individuals from each other [1]. Biometric-based recognition systems, known as Identity Verification (IV) techniques, offer an authentication method that uses biometric features to measure, identify, and validate individuals' identities automatically [2]. These systems have been developed to improve the safety and security of digital domains by governing both the authentication and the identification of an individual in fields such as forensics, defense, surveillance, and banking. Spoofing the biometric modality of a registered individual is difficult and may even be impossible, which is what makes these systems grow and gain popularity.
The main factors that distinguish these systems from traditional ones in terms of privacy and security are their robustness, accuracy, and resistance to spoofing. Unlike other methods, biometric traits cannot be forged, borrowed, or forgotten, and stealing one is nearly impractical [3]. Deep learning techniques and architectures have obtained a significant place in the design of security systems with tremendous performance. They have been adopted to raise the accuracy rates of multiple biometric recognition systems in the last few years, owing to their excellent performance and their ability to deal with vast amounts of data of any kind [4].
The motivational force behind developing biometric systems is the recurring need for recognition and approval of individual identity, which is fundamental nowadays to many automated processes. Many functional, interesting, and constructive applications that depend on biometrics of different kinds can help maximize the convenience and efficiency of transportation, minimize fraud, enhance safety criteria, and, in general, improve the degree of national security. Deep learning (DL) approaches can discover features describing the data, which are used in discriminative training so as to distinguish between a large number of individuals. One of the main obstacles is handling all kinds of variation, such as large inter-class differentiation and noisy biometric data. To do this, the model must be robust enough, which necessitates large volumes of data and massive efforts to accumulate information that exhibits gradual change over time (e.g., face datasets for gender recognition). In this scenario, the motivation for DL-based biometric systems is their tendency to minimize error rates, enhance accuracy, lower fraud, offer multiple options for decreasing costs, and provide better scalability, physical safety, and comfort. The purpose of this survey is to investigate how biometric systems benefit from the merits of DL techniques in achieving their goals (e.g., recognition, verification, gender recognition, and spoof detection of individuals) by focusing on only four popular traits (face, fingerprint, iris, and finger vein). A survey is introduced to identify important and interesting research that obtains high-quality results using deep learning architectures in biometric systems. This work will enable new researchers to familiarize themselves with the significance and growth of DL-based biometric recognition systems.
The significant contributions of this paper are as follows:
- Introduce a brief overview of biometrics and DL neural networks (NNs), covering the most significant points describing them.
- Focus on four biometric traits only (finger vein, iris, face, and fingerprint), which are considered the most powerful and widely utilized for recognition and authentication, because their clear presence makes them comfortable for users, unlike other traits such as the ear, which may be covered and hard to reach (e.g., when wearing a hijab).
- Provide a comprehensive collection of both outstanding and new works from the past seven years that adapted deep learning of different types and architectures to biometric systems.
- Present a review of related works for finger vein verification and recognition based on DL. This is because most surveys do not focus on finger veins only, but rather on all the veins of the hand.
- Analyze how biometric recognition approaches have leveraged the advantages of DL and explore ways in which these systems can be further enhanced.
This paper is composed of six main sections, as described in Figure 1. To facilitate smoother reading of this survey, a detailed description of each section is given as follows:
- Introduction: Provides a brief introduction to biometrics and deep learning, describes the motivation behind this survey, presents the contributions, and illustrates the layout of this work.
- Biometric Recognition Overview: Presents an overview of the types of biometric traits, their significant features and applications, as well as a description of recognition and verification based on these traits.
- Deep Learning Background: Introduces the historical background of deep learning networks and structures, as well as their common applications.
- Deep Learning in Biometrics: Presents a comprehensive overview of related works in DL-based biometric systems from 2016 to 2023, covering four traits (face, fingerprint, iris, and finger vein).
- Performance Measurement: Illustrates the kinds of public datasets utilized for the four biometric characteristics, comparison tables between the most powerful works, and a discussion of the performance of the reviewed models.
- Conclusions: Presents the conclusions reached by performing this survey.

Organization of the survey
Recognizing humans based on their body characteristics has become more and more interesting in emerging technology applications. This is because biometrics cannot be forged, forgotten, or borrowed, and stealing one is nearly impractical. Biometrics is the methodological study of measuring and analyzing the biological data of a specific person for the purpose of authentication or identification [5]. Basically, any kind of biometric system relies on one of the individual's biometric traits, captured by measuring the characteristics of the individual's behavior or body. These characteristics are also known as identifiers or modalities and serve as the basis for building up the systems [6]. In general, biometric characteristics can be separated into two main kinds, behavioral and physiological, as described in Figure 2. Physiological traits are hereditary characteristics produced in the early embryonic stages of human development, such as the fingerprint, face, retina, iris, and hand. Behavioral traits, on the other side, are learned or acquired rather than inherited, such as voice patterns, signature, handwriting, and keystroke dynamics [7].

Biometric Traits
Any biometric system depends on utilizing features extracted from human characteristics after scanning them with some kind of sensor and storing them in a designated dataset as a reference. A set of features is then obtained from the dataset and used as input to a classifier to perform the matching procedure. Figure 3 describes the main steps used when designing a biometric system [8].

Example of a general biometric system [8]
A biometric system can perform either verification or identification. Verification, also noted as authentication, is a one-to-one process: the captured biometric trait of the individual claiming an identity is compared to a stored template, and the outcome is a binary value representing either acceptance or rejection [9]. Identification is a one-to-many procedure, mainly utilized to compare a capture against every recorded biometric template in the database and determine an individual's identity from the highest similarity result; a minimal matching sketch illustrating both modes is given after Table 1. There is a set of requirements that should be satisfied as much as possible by the utilized trait so that it can serve the desired goal in a recognition-based system. The requirements for biometric-based systems are illustrated in Table 1 [10].
BIOMETRIC-BASED SYSTEMS REQUIREMENTS

Requirement | Description |
---|---|
Universality | All authorized individuals must possess the utilized biometric trait |
Uniqueness | No two authorized individuals have similar characteristics of the trait |
Permanence | The obtained trait does not change over a given duration of time |
Performance | The achievable security, speed, accuracy, and robustness |
Acceptability | Agreed to by the population of individuals without objection |
Circumvention | The degree to which the system can be fooled using a fake biometric |
Collectability | The simplicity of gathering trait samples in a manner comfortable for the individual |
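To make the one-to-one versus one-to-many distinction concrete, the following minimal Python sketch compares hypothetical feature vectors with cosine similarity. The `threshold` value, the 128-dimensional templates, and the dictionary-based gallery are illustrative assumptions, not part of any cited system.

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity between two (hypothetical) biometric feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(probe, enrolled_template, threshold=0.7):
    """One-to-one verification: accept or reject the claimed identity."""
    return cosine_similarity(probe, enrolled_template) >= threshold

def identify(probe, gallery):
    """One-to-many identification: pick the enrolled identity with the
    highest similarity score across the whole gallery (id -> template)."""
    scores = {pid: cosine_similarity(probe, tpl) for pid, tpl in gallery.items()}
    return max(scores, key=scores.get)

gallery = {pid: np.random.rand(128) for pid in ("alice", "bob")}  # toy templates
probe = gallery["alice"] + 0.05 * np.random.rand(128)             # noisy re-capture
print(verify(probe, gallery["alice"]), identify(probe, gallery))
```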
Deep learning encompasses a wide range of machine learning methodologies and architectures that share the characteristic of employing multiple hierarchical layers of non-linear information processing steps [11]. Deep learning algorithms make use of hierarchical data in conjunction with robust computational resources and efficient optimization techniques. Deep learning models have effectively delivered the intended outcomes across various domains, including but not limited to computer vision, speech recognition, and natural language processing (NLP). These models appear to be the most viable option for addressing the ongoing proliferation of challenges associated with biometric recognition across various domains [12]. The efficacy of deep approaches rests on their capacity to surmount the challenges that hold conventional methods back. A collection of elements governs deep learning [13]:
- Feature learning: learn features that describe the data and drive other interconnected operations. This requires disentangling the numerous associated factors into these features, as opposed to hand-crafted features that are designed to remain fixed on the intended factors.
- Hierarchical representation: features are represented hierarchically, with the most basic ones encoded in the lower layers and the more complex ones learned by the higher layers. This ensures that both local and global feature representations are successfully encoded.
- Distributed representation: this operates on a many-to-many basis, in which the representations are distributed because multiple neurons can express a single factor and a single neuron can explain multiple factors. This can alleviate the curse of dimensionality and yield a dense representation.
- Computational resources: recent advancements in parallel computation and graphics processing units (GPUs) enable deep neural networks to be executed and trained on a substantial number of training samples.
- Big datasets, consisting of an extensive quantity of training samples, enable deep learning to achieve significant advances in numerous domains, including natural language processing (NLP).
The computer vision domains utilize an extensive variety of magnificent DL architectures, including Convolutional Neural Networks (CNN), Auto-Encoders, Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Generative Adversarial Networks (GANs) [14].
Based on the multi-layer perceptron architecture, the CNN is the most widely used deep architecture; it was initially proposed by LeCun et al. [13] to conduct classification tasks and address a variety of computer vision issues. The CNN draws inspiration from the visual cortex, an intricate network of cells in the human brain that performs the crucial function of light detection in receptive fields [15]; these receptive fields are compact, overlapping sub-regions of the visual field. The CNN is a specialized entity within the NN framework that possesses a grid topology. Its primary structure consists of a collection of filters that process and arrange input data at various locations, ultimately producing an output map. A CNN's primary architecture consists of three key layers: convolutional, pooling, and fully connected. A nonlinear activation function (nonlinearity) is applied to the outputs, generating an equivalent number of feature maps that are subsequently transmitted as inputs to the next layer [16]. In most cases, the pooling and convolutional layers are followed by one or more fully connected layers. The CNN's principal advantage is its weight-sharing scheme, which employs a sliding kernel to traverse the image and collect local information in order to extract the image's features. Because the kernel's weights are shared across the entire image, the CNN has fewer parameters than a fully connected network. Furthermore, when numerous convolution layers are stacked, the higher-level layers are able to learn from progressively broader receptive fields [17].
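As an illustration of the convolution-pooling-fully-connected structure and the weight sharing described above, here is a minimal PyTorch sketch; it is not taken from any cited work, and the layer sizes and the 64×64 grayscale input are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Convolution + pooling stages followed by a fully connected classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # shared 3x3 kernels slide over the image
            nn.ReLU(),                                    # nonlinearity producing the feature maps
            nn.MaxPool2d(2),                              # pooling shrinks spatial resolution
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper layer sees a broader receptive field
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # fully connected output layer

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = SmallCNN()(torch.randn(4, 1, 64, 64))  # e.g., a batch of 64x64 grayscale samples
```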
Recurrent neural networks (RNNs) have been extensively employed to process sequential data (e.g., speech, videos, and text), in which the information at each instant depends on the information encountered previously. In general, RNNs consist of an input layer, an output layer, and one or more hidden layers. In this multi-time-step network, the hidden-layer nodes from the previous time-step are linked to the nodes of the current time-step [18]. Consequently, hidden-layer nodes receive two inputs at each time-step: one carrying the current time-step's data and the other originating from the hidden-layer representation computed at the previous time-step.
LSTM is one of the most well-known RNN architectures designed to process sequential data. It is designed to capture long-term dependencies more effectively than conventional RNNs, which frequently suffer from vanishing or exploding gradients; the LSTM network attempts to resolve this via internal gates. An LSTM is composed of a memory cell and three primary gates (input, output, forget) [19]. The cell retains values over arbitrary time intervals, whereas the gates govern the flow of information into and out of the cell.
The Gated Recurrent Unit (GRU) was introduced by Cho et al. [20] in 2014. GRUs possess the same capability as LSTMs to manage long-term dependencies. The structural distinction is that the GRU is more straightforward and lacks distinct memory cells, rendering it computationally cheaper to train. The number of gates also distinguishes the GRU from the LSTM: it consists of only two gate structures, the update gate and the reset gate, which are capable of addressing the challenge of forecasting time series with extended intervals and delays. The update gate regulates the degree to which information from the previous moment is incorporated into the current moment, whereas the reset gate regulates the extent to which information from the previous moment is disregarded [21].
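The recurrent family described in the last three paragraphs can be sketched with PyTorch's built-in modules. This toy sequence classifier is purely illustrative, with arbitrary dimensions; it is not any cited model.

```python
import torch
import torch.nn as nn

class SequenceClassifier(nn.Module):
    """Gated recurrent classifier over a (batch, time, features) sequence."""
    def __init__(self, input_dim=40, hidden_dim=64, num_classes=5):
        super().__init__()
        # nn.GRU uses the two-gate cell; nn.LSTM would add a memory cell and a
        # forget gate (and returns the pair (h_n, c_n) instead of h_n alone).
        self.rnn = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        _, h_n = self.rnn(x)          # final hidden state summarizes the sequence
        return self.head(h_n[-1])

logits = SequenceClassifier()(torch.randn(8, 100, 40))  # 100 time-steps of 40-d features
```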
Auto-encoders are NN models trained in an unsupervised manner to discover efficient data encodings. An auto-encoder transforms the input data into a latent space by compressing the data and subsequently reconstructing an output that is essentially the same as the input. It comprises two components: the encoder compresses the input into the desired representation, as expressed by the function z = f(x) during encoding, and the decoder reconstructs the original input from the latent-space representation, as expressed by the function y = g(z) during decoding [22].
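A minimal sketch of the encoder z = f(x) and decoder y = g(z) pair, trained with a reconstruction loss; the layer widths and the flattened 784-dimensional input are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Encoder z = f(x) compresses the input; decoder y = g(z) reconstructs it."""
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.rand(16, 784)                      # e.g., flattened 28x28 inputs
loss = nn.functional.mse_loss(model(x), x)   # reconstruction objective
```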
The DBN, built from stacked Restricted Boltzmann Machines (RBMs) [24], was initially developed by Hinton et al. [23] in a manner similar to a stacked auto-encoder, consisting of stacked basic learning modules. The fundamental RBM consists of two layers: a visible layer holding the input data and a hidden layer. With the exception of the top two layers, which form an undirected bipartite graph, all layers of the DBN have directed connections. No connections exist between units belonging to the same layer (visible or hidden). The weights w between the units, together with the bias of each layer, constitute the DBN's parameters.
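For concreteness, the following is a minimal sketch of the RBM building block trained with one-step contrastive divergence (CD-1), the standard procedure used for greedy layer-wise DBN pre-training; all sizes and the learning rate are illustrative assumptions.

```python
import torch

class RBM:
    """Two-layer RBM (visible + hidden) trained with one-step contrastive divergence."""
    def __init__(self, n_visible, n_hidden, lr=0.01):
        self.W = torch.randn(n_visible, n_hidden) * 0.01   # weights w between the layers
        self.b_v = torch.zeros(n_visible)                  # visible-layer bias
        self.b_h = torch.zeros(n_hidden)                   # hidden-layer bias
        self.lr = lr

    def sample_h(self, v):
        p = torch.sigmoid(v @ self.W + self.b_h)           # no intra-layer connections
        return p, torch.bernoulli(p)

    def sample_v(self, h):
        p = torch.sigmoid(h @ self.W.t() + self.b_v)
        return p, torch.bernoulli(p)

    def cd1_step(self, v0):
        ph0, h0 = self.sample_h(v0)                        # positive phase
        pv1, _ = self.sample_v(h0)                         # one reconstruction step
        ph1, _ = self.sample_h(pv1)                        # negative phase
        n = v0.shape[0]
        self.W += self.lr * (v0.t() @ ph0 - pv1.t() @ ph1) / n
        self.b_v += self.lr * (v0 - pv1).mean(0)
        self.b_h += self.lr * (ph0 - ph1).mean(0)

RBM(784, 128).cd1_step(torch.rand(32, 784).bernoulli())    # one toy update
```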
The GAN, a contemporary family of DL models, is composed of two networks, a discriminator and a generator [25]. The generator's main role is to generate samples from a particular distribution that are sufficiently similar to the original samples to deceive the discriminator, while the discriminator's role is to distinguish the fabricated (fake) samples from the authentic ones. In a GAN, the generator network learns to map input noise z (which may be drawn from a prior distribution such as a Gaussian) into meaningful output.
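The adversarial game between the two networks can be sketched as alternating updates. This toy MLP-based generator/discriminator pair, its dimensions, and its hyperparameters are illustrative assumptions, not the architecture of any cited GAN.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))  # generator
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))   # discriminator
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def gan_step(real):
    """One alternating update: train D to separate real from fake, then G to fool D."""
    n = real.size(0)
    fake = G(torch.randn(n, latent_dim))      # map Gaussian noise z to a sample

    # Discriminator step: real samples labeled 1, generated samples labeled 0
    d_loss = bce(D(real), torch.ones(n, 1)) + bce(D(fake.detach()), torch.zeros(n, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: push D's output on fakes toward the "real" label
    g_loss = bce(D(fake), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

gan_step(torch.randn(32, data_dim))           # toy batch standing in for real data
```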
Overall, each deep learning technique operates differently, with the capability of handling a particular data type and carrying out a specific operation within a particular domain and application. Table 2 provides an overview of the key benefits and drawbacks associated with the deep learning techniques that were previously demonstrated [26,27,28,29,30,31].
THE ADVANTAGES AND DISADVANTAGES OF THE MOST WIDELY USED DEEP LEARNING ARCHITECTURES
Architecture | Advantages | Disadvantages |
---|---|---|
CNN | Unsupervised feature learning; low complexity thanks to the reduced parameter count and weight sharing; high performance in image recognition and classification | Requires a large dataset; long training time; unable to handle input variations (i.e., orientation, position, environment) |
RNN | Can remember and learn from past data to give better predictions; able to capture long sequential patterns in large data; often utilized for natural language processing tasks | Computationally expensive; more prone to overfitting and vanishing-gradient problems; hard to optimize due to the large number of layers and parameters |
LSTM | Handles long-term dependencies better; its gated memory cells make it less susceptible to the vanishing gradient problem; very effective at modeling complex sequential data | More complicated than RNNs and requires more training data to learn effectively; not suited for prediction or classification tasks; slow to train on large datasets; does not work effectively for all kinds of data, such as nonlinear or noisy data |
GRU | Uses less memory and is faster than LSTM; has fewer parameters than LSTM | Low learning efficiency due to its slow convergence rate; too-long training time; may suffer from under-fitting |
AE | Unsupervised, so it needs no labeled training data; converts high-dimensional data into low-dimensional features; high scalability as data grows; minimizes the noise of the input data | High complexity; computationally expensive; needs a large training dataset; loses interpretability when representing features in a latent space |
DBN | Unsupervised feature learning; robust in classification (size, position, color, view angle, rotation); applicable to many kinds of datasets; resistant to overfitting owing to the RBMs' contribution to model regularization; can manage missing data | High complexity; computationally expensive; needs a large training dataset |
GAN | Can deal with partially labeled data; efficiently generates samples that resemble the originals; used for generating images and videos | Hard to train due to the need for different data types in a continuous manner; training cannot be completed when patterns are missing; has difficulty dealing with discrete data (e.g., text) |
This section provides an overview of deep learning advancements in biometric systems for various purposes, including segmentation, feature extraction, classification, and matching. These applications rely on four widely recognized biometric characteristics: the fingerprint, face, finger vein, and iris. From 2016 to 2023, the outcomes and deep learning architectures of over 190 published works in the field have been compiled. Table 3 [32] describes the significant attributes, qualities, and uses of the four most widely employed biometrics.
BIOMETRIC FEATURES AND APPLICATIONS

Biometric trait | Significant Features | Applications |
---|---|---|
Face | No need for physical contact; easy template storage; comfortable, with less complicated statistics; rapid identification procedure; changes with time, age, and incidental events; differences between twins are difficult to detect; affected by lighting in the surrounding environment; may be partially occluded by other objects | Access control; Face ID; human-computer interaction; criminal identification; monitoring; smart cards |
Fingerprint | Modern, reliable, safe, highly accurate, and low cost; rapid matching; needs little memory space; affected by wounds, dust, and twists; needs physical contact | Driver authentication; criminal identification and forensics; authentication on license and visa cards; access control |
Iris | Scalable, accurate, and highly covered; small sample size; rapid processing but high cost; unparalleled structure; remains stable throughout life; difficult to alter; high randomness; no physical contact needed, only user collaboration; can be hidden by some eye parts such as lashes; affected by some illness conditions | Criminal identification and forensics; identification; access control; national security screening at seaports, land borders, and airports |
Finger vein | Sanitary, without any touch; highly accurate and hard to spoof; unique; affected by body temperature; affected by some diseases; tiny template size; minimal processing | Driver identification; door security login; bank services; physical access monitoring and time attendance; airports, hospitals, schools |
Recently, face-based deep learning has received significant attention because facial images can reveal a great deal of information. These images can indicate an individual's mood, intention, and attentiveness, so they are considered an effective means of identity verification. Facial images can also be utilized in many gender- and age-estimation applications. A novel method for age-invariant Face Recognition (FR) was introduced in [33] using DL-driven CNN descriptors, maintaining an accuracy of 80-90%. In [34], a Multi-Scaled Principal Component Analysis (PCA) Network (MS-PCANet), a multiple-scale combined DL Network (DLN), is described as requiring a significantly smaller training dataset than a conventional CNN to achieve the intended outcomes. For face verification, [35] proposed a framework consisting of a Convolutional Fusion Network (CFN) and a Deep Mixture Model (DMM); the accuracy on the two datasets (DS) is 82% and 87.5%, respectively. A low error rate was achieved when a deep CNN was applied to the multimodal thermal, depth, and RGB (RGB-D-T) FR problem in [36]. A novel approach was introduced in [37] for analyzing frontal views in FR with varying illumination, occlusion, and disguise, by extracting the dynamic subspace of the images and deriving the discriminative components of each individual; a KNN classifier was implemented and a 95% Recognition Rate (RR) was achieved. The application of a deep CNN to the neonate database of high-quality images at IIT (BHU) resulted in an accuracy of 91.03% for natural faces [38]. A deep-CNN-based framework for age classification was proposed in [39]; the framework was trained utilizing a transfer learning approach and achieved a 90% result. A CNN for Near-infrared (NIR) FR demonstrated increased identification rates while requiring reduced training and processing time [40]. A DCNN with the Caffe framework for vehicle and FR was proposed in [41] and achieved 91.22% accuracy on a collected DS. [42] proposed a deep network model utilizing VGG Net that optimally processed FR by incorporating both visible-light and NIR images. [43] presents a novel CNN framework referred to as the Low-Rank-Recovery Network (LRRNet), which effectively processes severely damaged images. A novel framework is presented in [44] that integrates the benefits of locally crafted feature descriptors and the Deep Belief Network (DBN); this framework demonstrated encouraging outcomes on four DSs. A DNN termed the Noise-Resistant Network (NR-Network) was developed by [45] to handle FR under noisy conditions; it yielded an RR of 70-85%. A One-Class-in-One-Neuron (OCON) system capable of identifying multi-expression, occluded, and obscured features via an efficient yet compact DL was introduced in [46] and yielded excellent results. [47] proposed a novel deep transfer neural network (FMTNet) method for facial attribute classification based on multi-label learning; it achieved an accuracy of 91.66%. A Recurrent Regression Neural Network (RRNN) framework was introduced in [48] to combine two traditional tasks, cross-pose FR on still images and on videos, with a 95.6% RR. Successful results were obtained when [49] introduced a novel scheme utilizing DL and a Faster RCNN framework integrating a number of strategies. The authors of [50] propose a dependable approach to real-time FR while also applying a filter to the images.
Extracted binary patterns are fed into a multilayer perceptron trained with the gradient descent algorithm, achieving 91% accuracy. [51] implemented a DL network with triplet loss and achieved 95.5% accuracy. An FR method assisted by a facial texture feature-based DL feature (FTFA-DLF) was introduced in [52] and achieved a 97.02% RR. [53] introduced an active body detection algorithm for CNNs that utilizes dynamic features and achieves an RR ranging from 98% to 100% on a variety of DSs. The Re-Classification Network (RCNet), a deep cascaded detection method that iteratively applies bounding-box regression, was introduced by [54]; it obtained an exceptional recall rate of 87%. The circular symmetrical Gabor filter (2D)2PCA neural network [CSGF(2D)2PCANet], a novel DLN proposed by [55], achieved 97%-100% accuracy across a variety of variations. By combining the additive cosine margin and multiplicative angular margin in a softmax loss for a DCNN, [56] achieved 99.77% and 96.40% on two DSs, respectively. [57] proposed a CNN-based algorithm with an accuracy of 97.9%. [58] introduced the Deep Stacked Denoising Sparse Auto-encoders (DS-DSA) system, which achieved a 98.16% accuracy rate. Enumerate Net, a deep local descriptor learning framework based on a DCNN proposed by [59], attained 98.68% accuracy. Achieving an accuracy of 96.86%, [60] proposed the DL network L1-2D2PCANet using L1-norm-based two-dimensional principal component analysis (L1-2DPCA). A CNN was utilized with a pre-trained VGG-Face model in [61]. A deep class-skewed learning method with 96.80% accuracy was investigated in [62]. A CNN was utilized in [63] for Facial Expression Recognition (FER), yielding a 98.43% RR. Using a CNN and data augmentation, [64] achieved an accuracy of 98.1%. In [65], a novel loss function for deep learning FR dubbed additive angular margin (ArcFace) was introduced; this function applies an angular penalty margin to the angle between the deep features and their corresponding weights in order to generate highly distinctive features. The algorithm was executed on four distinct datasets, with the maximum LFW result being 99.82%. A Deep Convolutional-Optimized Kernel Extreme Learning Machine (DC-OKELM) algorithm was introduced in [66]; it exhibited an error rate of 0.5. With an accuracy of 97.06%, [67] proposed a profile-to-frontal revise mapping (PTFRM) module in conjunction with a deep CNN. A maximum accuracy of 99.19% was attained with a tree-based DL model for automatic FR in a cloud environment, as proposed in [68]. A Receptive Field Enhanced Multi-Task Cascaded CNN with a precision of 98.37% was proposed in [69]. A deep CNN was used to train a small-sized DL model for mobile devices with an accuracy of over 98% in [70]. A multi-input CNN model and an SPP-based CNN, both of which achieved 97.6% accuracy, are proposed in [71]. A facial diagnostic system for Turner syndrome (TS) was developed by [72] using DCNNs with an accuracy of 97.0%. [73] introduced mpdCNN, a deep-CNN-based architecture designed for FR in surveillance, with an accuracy greater than 99%. An accuracy of 57% is reported when CNN deep learning models for FR are implemented in [74]. [75] introduced a novel one-dimensional deep CNN (1D-DCNN) classifier that achieved a 100% accuracy rate when combined with linear discriminant analysis (LDA). With an accuracy of 95.78%, [76] proposed an Optimized Multi-Task Cascaded CNN (OMTCNN) and a lightweight FR algorithm based on a CNN (LCNN).
[77] introduced an innovative method for FR from videos based on CNNs, with an accuracy of 83.48%. A deep CNN framework for FR in an unconstrained environment was devised by [78] with an accuracy of 99.2%. A deep learning loss function called MagFace is proposed in [79]; it learns a universal feature embedding and uses the feature magnitude to measure the quality of the given face. It offers an adaptive mechanism for learning well-structured within-class feature distributions by pulling easy samples toward class centers and pushing away difficult ones, which prevents overfitting on chaotic, low-quality samples and improves FR in the wild. It was evaluated on one training dataset and seven evaluation datasets, with LFW achieving the highest accuracy of 99.83%. Variational Prototype Learning (VPL), which represents each class as a distribution in the latent space rather than a point, is proposed in [80]. By recognizing the delayed feature drift phenomenon, the memorized features are injected directly into the prototypes; this method is straightforward, memory-efficient, and simple to implement. It was applied to eight DSs plus one for training and achieved an accuracy of 99.83% on LFW. A novel methodology was introduced in [81] that utilized field-programmable gate arrays (FPGAs) in conjunction with a DCNN model to achieve an accuracy of 96.9%. In their study, [82] introduced a dependable approach to the masked FR problem using DL-based features and occlusion removal in conjunction with three pre-trained deep CNNs (VGG-16, AlexNet, and ResNet-50) for feature extraction; the application of a Multilayer Perceptron (MLP) for classification yielded a result of 88.9%. [83] built a classifier comprising DL and LBP in conjunction with KNN that achieved 87% precision. With an accuracy of 97%, [84] proposed a novel deep FR framework for dim images comprising a CNN-based feature restoration network, an embedding matching module, and a feature extraction network. A CNN model was developed by [85] that accurately identified the presence of coverings on human faces while preventing overfitting; the model can identify faces and masks in both still images and videos with an accuracy rate of 99.15%. A hierarchical feature fusion method for face recognition was proposed in [86], which learns shallow and deep facial features using supervisory information; the approach utilized a Lightened CNN with an accuracy of 99.9%. In order to maximize the mutual information between the embeddings of profile and frontal face images, [87] implements Pose Aware Adversarial Domain Adaptation (PADA) with a coupled ResNet50 encoder, enabling the coupled encoder to acquire pose-agnostic representations during face recognition; the coupled encoder was then applied to four datasets. [88] incorporates a quality-aware injection procedure (QAFace), a novel weighting scenario introduced during sample injection, into the Softmax-based classification framework for face recognition. This integration addresses the issue of unrecognizable samples in the dataset by disregarding them, thereby enhancing the similarity score among positive samples; the proposed procedure was implemented on seven datasets. [89] introduced a controllable face synthesis model (CFSM) as a solution to the problem of FR in unconstrained environments.
It accomplishes this by bridging the gap between semi-constrained training datasets and unconstrained testing datasets: the CFSM generates facial images in the style of the desired dataset. It was implemented on four unconstrained datasets, with IJB-B achieving the highest accuracy of 94.61%.
A fingerprint is a pattern of ridges and furrows located on the tip of each finger. The patterns of ridges and furrows and the minutiae points on the finger are utilized to distinguish a fingerprint. It is the oldest and most widely utilized trait for recognition due to its effectiveness, simplicity, and ease of acquisition [90]. [91] presented a fingerprint (FP) liveness detection method based on a DBN with an accuracy of 99.4%. [92] applied a CNN to FP liveness detection with an overall accuracy of 95.5%. [93] presented a novel DL-based indoor fingerprinting system using Channel State Information (CSI), termed DeepFi, with a 0.9425 mean error. [94] presented a pore extraction method using deep CNNs and pore intensity refinement, which showed good performance. [95] utilized a CNN integrated with an ensemble model and a batch normalization method to achieve 97.2% accuracy. [96] proposed a novel approach named D-LVQ for fingerprint identification on large DBs, with a 99.075% RR. [97] proposed an automated latent FP recognition algorithm that utilized CNNs, with a result of 76.6%. [98] proposed a novel latent FP enhancement method based on FingerNet, inspired by recent developments in CNNs, showing a processing speed of 0.7 s. [99] focused on very low-quality FP images and based the model on a CNN to achieve a 93% result. [100] proposed an approach using CNNs that averted the need for an explicit feature extraction operation and gave 99.6% accuracy. [101] proposed the use of a DCNN for FP feature extraction and classification of wireless channels based on software-defined radio; an accuracy of 96.46% was obtained. [102] proposed an end-to-end contactless 3D FP representation learning model based on a CNN and three Siamese networks, with an accuracy of 99.89%. [103] introduced a new CNN architecture for the FP liveness detection problem, with a 98.60% highest accuracy on four DSs. [104] employed a deep CNN to detect physical-layer attributes to identify perceptual radio devices by FP, with a 92.29% result. [105] proposed a method that could extract the coordinates of pores from touch-based, touchless, and latent FP images using a CNN and achieved satisfactory accuracy. [106] designed a novel hybrid location image using Wi-Fi and magnetic field fingerprints; a CNN was then employed to classify the locations of the FP images and gave high accuracy within 1 m under different smartphone orientations. [107] proposed a partial algorithm based on DL with a residual network for the recognition of partial FP images, with a 93% classification result. [108] described an appropriate pipeline for using DL to improve the brain functional-connectivity-based fingerprinting process, based on functional Magnetic Resonance Imaging (fMRI) data-processing results, for identifying people with an average accuracy of 0.3132 ± 0.0129. [109] adopted a DL-based Long-Term-Evolution (LTE) signal FP positioning method for outdoor environment positioning along with a modified deep residual network (ResNet) and obtained 94.73% accuracy. [110] illustrated a novel DL-based radio frequency fingerprint (RFF) recognition technique for Internet of Things (IoT) terminal verification, with 93.8% accuracy. In [111], Fingerprint Liveness Detection (FLD) was proposed by applying a Deep Residual Network (DRN), with a 97.04% highest result. [112] proposed a novel matching algorithm that employed a pair of DCNNs to learn both high-level global features and low-level minutia features, with a 1.87 equal error rate (EER).
[113] proposed a new spoof detection framework, trained exclusively on the new fake type, integrated into a spoof detector consisting of multiple Support Vector Machines (SVMs); after applying an incremental learning algorithm, the results were within an acceptable range. In [114], a radio FP was created within a CNN architecture to learn significant features from complex FPs for indoor sites, with 84.17% accuracy. [115] proposed data pre- and post-processing algorithms with DL classifiers for Wi-Fi FP-based indoor positioning, with a success rate of 95.94%. [116] employed a 1-D CNN to learn a discriminative and compact representation of FPs, with a 0.06% rate. [117] learned accurate features from raw FP images rather than using explicit feature extraction, along with DCNNs for FP classification, with a 95.3% result. [118] presented a method that used Deep Boltzmann Machines along with KNN to recognize FPs accurately against fabricated materials used for spoofing and achieved a 96.00% highest accuracy. [119] proposed a novel pore matching method with a DCNN, denoted DeepPoreID, and gave an EER of 0.16. [120] proposed a method for broken FPs based on DL fuzzy theory using a DCNN, with a 97.1% RR. [121] presented a multi-task CNN-based method to recover FP ridge structures from corrupted FP images, with a matching accuracy of 84.10%. [122] proposed a robust framework to detect spoofing attacks in FP recognition using a DCNN architecture in mobile financial applications, with an accuracy of more than 99.80%. In [123], a new model based on a DCNN was suggested and the effects of two dedicated optimizers were assessed, of which Adam achieved 91.73% accuracy. [124] proposed a multi-task fully DCNN for jointly learning minutiae location detection and the corresponding direction computation; this system operated directly on the whole grayscale FP, with 98.24% accuracy. [125] introduced an intelligent computational approach to automatically authenticate FPs for personal identification; the features were obtained using Gabor filtering and a DCNN, and PCA was performed to minimize overfitting, giving 99.87% accuracy. [126] proposed a CNN-based model for the feature-level fusion of FPs and online signatures, with 99.1% accuracy. [127] introduced a traditional point-matching FP recognition algorithm and damaged-FP recognition based on a CNN, with a 98.65% RR. In [128], an outdoor positioning system based on a wavelet-feature FP image and DL was proposed; a DNN with a two-level hierarchical structure was utilized, and a ResNet-based rough locator and an MLP-based fine locator achieved more than 90% accuracy. [129] proposed a new algorithm for recovering fingerprints using a new ML Pix2Pix model and skeleton image features of FPs, with a 100% RR. In [130], a deep features-based touchless 3D-fingerprint classification system was proposed based on the transfer DL model AlexNet-CNN, which gave 90.20% accuracy. In [131], a CNN-based fingerprint confirmation strategy was presented with no image preprocessing and gave 99.1% accuracy. [132] proposed an efficient deep fingerprint classification network (DFCN) model to accurately classify between real and fake FPs, with 99.22% accuracy. [133] performed gender classification based on FPs using a CNN method, with an accuracy level equal to 99.9667%. [134] proposed a novel 2D contactless FP matching method based on DL, named Fingerprint Triplet-GAN (FTG), using a generative adversarial network, with an EER of 3.4%.
[135] exploited the relationship of spatial ridges in FPs and proposed a novel FP liveness detection method based on spatial ridge continuity (FLD-SRC) using a deep CNN, with a 0.3 EER. [136] presented an algorithm for FP classification using a CNN to process low-quality images and gave a 75.6% highest accuracy. [137] introduced a method for automatically determining the architecture of a CNN model for FP classification, with 98.89% accuracy. [138] investigated a CNN for molecular FP prediction based on data acquired by mass spectrometry, with 95% accuracy.
The iris is a circular, disc-shaped muscle whose pigmentation gives the eye its apparent color. The iris texture exhibits many significant features that make it a powerful biometric characteristic that can be adopted for authentication purposes [139]. [140] proposed a DL-based framework for heterogeneous iris verification, namely DeepIris, which computes the resemblance between pairs of iris images based on CNNs, with a 0.15 EER. [141] proposed a new optimization and recognition process for iris feature selection using a proposed Modified ADMM and DL Algorithm (MADLA), with an RR of more than 70%. [142] utilized an adaptive Gabor filter selection plan and a DBN, with a 99.90% RR. [143] proposed a two-stage iris segmentation scheme based on a CNN for iris segmentation in noisy environments of iris recognition (IR) with a visible-light camera sensor, with an error equal to 0.0034. [144] presented the diagnosis of iris nevus using a CNN and a DBN, with an RR of 93.67%. [145] described a processing chain based on CNNs that defined the regions of interest for periocular recognition, with a 99% RR. [146] explored pre-trained CNNs for IR, with a highest RR of 98.8%. [147] constructed a DL representation, named IrisConvNet, that integrated a CNN and a softmax classifier for images of the right and left irises, with a 100% RR. [148] proposed an algorithm based on a CNN for iris sensor model identification and achieved an accuracy exceeding 99%. [149] developed a deep feature fusion network based on a CNN that exploited the complementary information present in the iris and periocular region, with a 0.60 EER. [150] proposed a combination of convolutional and residual networks (MiCoRe-Net) for the eye recognition task, with 99.08% accuracy. [151] developed an IR system for smartphones using Deep Sparse Filtering (DSF) along with matching strategies, with 90% accuracy. [152] illustrated a new Presentation Attack Detection (PAD) technique for IR systems (iPAD) with a NIR camera; DL-based and handcrafted techniques along with an SVM were utilized and achieved acceptable results. [153] proposed a new method for IR by integrating features adopted from both local and global iris spots; a CNN and an SVM were used with a NIR camera, giving a best EER of 0.016. [154] evaluated the learned features extracted from a pre-trained CNN (AlexNet model) followed by a multi-class SVM algorithm to perform classification, with 98.3% accuracy. In [155], an adaptive architecture named irisConvDeeper was proposed, with a 99.57% RR on the CASIA-Iris-V3 dataset. [156] proposed a DL method based on the Capsule Network architecture for IR and achieved 99.37% accuracy. [157] investigated cross-spectral IR using a range of DL architectures, with an EER equal to 4.50%. [158] proposed a DL-based unified and generalizable framework for accurate iris detection, segmentation, and recognition, with an EER of 1.12%. [159] introduced a new technique that quickly locates a coarse iris bounding box without segmenting the iris region, based on a deep ResNet, with a 1.331 EER. [160] illustrated an IR algorithm based on a multi-layer analogous convolutional frame and cooperative representation to address high intra-class differences, with 99% accuracy. [161] proposed a network model of fully dilated convolution combined with U-Net (FD-UNet), with a 97.36% F1 score. [162] applied images enhanced through fuzzy operations to train DL methods, with 89.2% accuracy.
[163] proposed iris image augmentation based on a conditional Generative Adversarial Network (cGAN), with an EER of 2.96. [164] investigated a new DL-based approach for IR, with an EER equal to 7.14%. [165] explored single-image super-resolution using CNNs for IR, with a 27.54 EER. [166] used DL-based iris segmentation models, and the EER reached less than 1%. In [167], three distinct models based on an ensemble of convolutional and residual blocks were proposed to enrich heterogeneous (cross-sensor) IR, with a lowest EER of 1.01%. [168] presented a method of heterogeneous IR based on an entropy-feature lightweight NN under multi-source feature fusion, with more than a 99% RR. In [169], an application of a combined network model based on EfficientNet-b0 was presented, with 99.65% accuracy. [170] introduced an interactive variant of UNet for iris segmentation, with a 98.3% RR. [171] explored an efficient technique that used a CNN and an SVM for feature extraction and classification, with 96.3% accuracy. [172] proposed a DL-based method that actively utilized the various features contained in periocular images, with an 11.51 EER. [173] trained DCNNs on a large number of iris samples to extract iris features using T-Center loss and showed 99.3% accuracy. [174] introduced an effective DL-based integrated model for precise iris recognition, segmentation, and detection, with a 99.14% maximum accuracy. [175] employed a DCNN based on partial convolution operators to extract iris features, with a 97.35% result. [176] proposed a few-shot learning approach for IR based on Model-Agnostic Meta-Learning (MAML), with 99.06% accuracy. [177] performed IR with the DL-based YOLOv2 model and gave 99% precision. [178] presented a DBN strategy for performing IR, with a maximum accuracy of 97.96%. [179] enhanced the quality of iris images by blurring the iris region and applying DL-based de-blurring along with an SVM, with an EER of 7.49%. [180] trained a novel condensed 2-channel (2-ch) CNN with few training samples for IR, with a 0.33% EER. [181] proposed a dense squeeze-and-excitation network (DenseSENet) to extract common features in cross-domain iris images, with 99.06% accuracy. [182] explored the IR problem using a basic CNN model and a hybrid DL model, with 97.8% accuracy. [183] proposed a fast IR method that required a single matching operation and was based on pre-trained image classification models as CNN feature extractors, with 99.99% accuracy. [184] proposed open-set IR based on DL, with a result of 99.00% accuracy. [185] proposed a multitask deep active contour model for off-angle iris image segmentation, with an EER of 0.159. [186] proposed a CNN model to develop a robust, scale- and rotation-invariant, and scalable IR system, with an EER of 0.46%. [187] proposed a self-supervised framework utilizing the pix2pix conditional adversarial network within an algorithm to generate iris masks. [188] proposed a novel iris identification framework that integrated the lightweight MobileNet architecture with customized ArcFace and Triplet loss functions, with an EER of 0.45 and 99.99% accuracy.
Finger vein identification depends on the vein patterns existing on the palmar side of the finger, which run from fingertip to finger root. Veins are blood vessels that carry blood to the heart, and every person's veins have distinctive behavioral and physical features [189]. [190] presented an effective approach to dorsal vein recognition with a novel shape representation method to describe the geometric structure of the venous network and directly extract CNN features, with a 99.27% RR. [191] presented a DCNN for the problem of plant identification from leaf vein patterns and gave a 96.9% RR. [192] proposed a PAD method for NIR camera-based finger-vein (FV) recognition systems using a CNN along with an SVM for classification, with an error of 0.00. [193] proposed a DL model based on a CNN to extract and restore vein features utilizing prior knowledge and gave a 1.42 EER. [194] proposed an FV recognition method robust to different kinds of DBs and environmental alterations, based on a CNN, with a 0.396 EER. [195] adapted an FV recognition algorithm based on feature block fusion and a DBN (FBF-DBN) along with a CNN, with 99.6% accuracy. In [196], a CNN was employed to handle the vein recognition task with a regularized RBF network, achieving an 89.43% RR. [197] presented a novel FV recognition algorithm employing a secure biometric template scheme based on DL and random projections, named FVR-DLRP, with a 91.2% result. [198] proposed a lightweight DL framework for FV verification with three convolution layers and gave an EER equal to 0.10. [199] proposed a global max-pooling to maintain the spatial location information on the feature maps of the convolutional layer to capture the details of the finger vein; the discriminative deep convolutional features achieved 96.81% accuracy. In [200], a novel hand-dorsa vein recognition model was presented using a previously trained DNN with convolutional activations as the region representation; an SVM was used for classification and showed a 0.068 EER. [201] proposed a CNN-based FV identification system with a 98.33% best accuracy. In [202], FV and finger shape multimodal biometrics using a NIR sensor based on a DCNN were introduced and achieved more than 90% accuracy. [203] introduced a modern approach to FV authentication based on a CNN and Supervised Discrete Hashing, showing an EER of 0.093. [204] proposed an end-to-end Finger Vein Graph Neural Network (FVGNN) and a multi-stage DNN composed of an embedding network and an edge-feature learning network; the GNN showed 99.98% accuracy. [205] presented a lightweight, fully convolutional GAN architecture, known as FCGAN, which utilized prior batch normalization; a novel FCGAN-CNN scheme for FV classification showed accuracy of more than 99%. [206] described a novel DL-based method that combined a Convolutional Auto-Encoder (CAE) with an SVM for FV verification, with a 99.95% best RR. [207] proposed a new filter generation technique that could adopt the vein lines for FV recognition, based on DL with a Ridge Regression Classifier (RRC), with a 99.89% RR. [208] tested a method that was less susceptible to noise and relied on the whole network, using a deep Densely Connected Convolutional Network (DenseNet), with a 2.35 EER. [209] obtained a novel biometric template protection algorithm utilizing the Binary Decision Diagram (BDD) for DL-based FV biometric systems, with 98.70% accuracy.
In [210], a CNN model pre-trained on ImageNet was utilized to build a CNN-based local descriptor named CNN Competitive Order (CNN-CO), showing a 0.74 EER. [211] proposed a novel 3D reconstruction method to obtain an overall-view 3D FV image; a corresponding 3D FV feature extraction and matching technique based on a lightweight CNN with depthwise separable convolution showed a 0.94 EER. [212] proposed a modern method of identifying FVs with CNNs using a center loss function and dynamic regularization and provided 99.05% accuracy. [213] proposed a lightweight CNN, named the Finger-Vein Recognition and Anti-Spoofing Network (FVRAS-Net), which combined recognition and anti-spoofing into a unified CNN model using Multitask Learning (MTL) and achieved a 95% RR. In [214], a novel densely connected convolutional auto-encoder was adopted on top of backbone deep CNNs for FV verification, with an RR of more than 99.99%. [215] introduced a recognition model based on convolutional and recurrent networks that could capture the FV structure within an array from the camera, with an RR equal to 99.13%. [216] obtained an enhanced deep network, called the Merge Convolutional Neural Network (Merge CNN), which relied on several CNNs with short paths, with an RR of 99.56%. [217] illustrated a new FV pattern algorithm based on an improved CNN and Curvature Gray Feature Decomposition (CGFD), with an RR of 98.4%. [218] presented a new technique for recovering optically blurred FV images with an enhanced cGAN and recognizing the recovered FV images with a deep CNN, with a 1.8 EER. In [219], a score-level fusion was performed for two outcome scores of DCNNs extracted from both the texture and the shape of the image, achieving an EER equal to 0.05. [220] presented an end-to-end model to capture vein textures by combining an FCN with a Conditional Random Field (CRF), with a 0.36 EER. [221] provided a lightweight image enhancement technique for individual identification by FV based on a CNN, with more than 99.84% accuracy. [222] showed a lightweight algorithm for FV recognition and conformity that uses a lightweight convolutional model in the backbone network and adopts a triplet loss function during training, giving 99.6% accuracy. In [223], an FV recognition method was proposed that utilized a Multi-Receptive-Field Bilinear CNN (MRFBCNN), with more than 99% accuracy. [224] showed a new loss function, named Arccosine Center Loss, for FV recognition based on a DCNN, with 99.79% accuracy. [225] proposed an attention mechanism, known as the Joint Attention (JA) module, and built a new FV authentication architecture, called the JA Finger Vein Network (JAFVNet), with a 0.08 EER. In [226], an FV recognition system with template protection was introduced; the recognition performance based on a DCNN achieved an RR beyond 96%. [227] presented a fusion loss consolidating classification and metric learning losses to train a 6-layer simple CNN, reaching a 0.21 EER. [228] illustrated a new method for enhancing FV recognition performance by recovering motion-blurred FV images using a modified de-blur GAN and a DCNN, giving a 0.85 EER. [229] utilized DCNN models for feature extraction and employed the triplet loss network model of one-shot learning, with accuracy of more than 95%. [230] proposed a novel CNN-based FV recognition approach with bias field correction and a spatial attention mechanism, giving an accuracy of 99.53%. [231] provided an Xception model, a pre-trained CNN based on residual interconnections, for FV recognition, achieving 99% accuracy.
In [232], a novel Trilateral Iterative Hermitian Feature Transformation-based Deep Perceptive Fuzzy Neural Network (TFHFT-DPFNN) model was presented to learn biometric features, giving robust and accurate FV verification with 98% accuracy. [233] proposed a GCNN with receptive fields, reaching a 95.22% best accuracy. In [234], a deep neural network named the Hierarchical Content-Aware Network (HCAN) was proposed to extract the discriminative hierarchical features of FVs, giving a 0.97 EER. In [235], a new GAN, known as the Triplet-Classifier GAN, was implemented for FV verification using a triplet-loss-based CNN, attaining a 0.03 EER. [236] delved into a Vision Transformer (ViT)-based method and presented a novel model, FVT, for authentication purposes, with an EER of 1.50.
This section illustrates the kinds of public datasets utilized for the intended biometrics and compares the introduced related works having the best and highest results on a specific dataset.
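Since many of the works above report the equal error rate (EER), a short sketch of how it is typically computed from genuine and impostor match scores may help. The threshold sweep below is one common approximation, and the synthetic Gaussian score sets are purely illustrative assumptions.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep a decision threshold over the match scores and return the point
    where the false acceptance rate (FAR) and false rejection rate (FRR) meet."""
    eer, gap = 1.0, np.inf
    for t in np.unique(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)   # impostor comparisons wrongly accepted
        frr = np.mean(genuine < t)     # genuine comparisons wrongly rejected
        if abs(far - frr) < gap:
            gap, eer = abs(far - frr), (far + frr) / 2
    return eer

rng = np.random.default_rng(0)         # synthetic, purely illustrative score sets
print(equal_error_rate(rng.normal(0.8, 0.1, 1000), rng.normal(0.4, 0.1, 1000)))
```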
The datasets that are popularly adopted for multiple biometric systems are introduced in this section as follows:
Labeled Faces in the Wild (LFW): Popularly utilized in most face recognition models, it contains 13,233 images belonging to 5,749 individuals, who are celebrities, captured in unconstrained settings. When utilized for verification, the images are arranged as 6,000 face pairs in 10 groups. Three benchmarking protocols are defined for this dataset: the image-restricted protocol, the unrestricted protocol, and the unsupervised protocol. It is considered very challenging, since it exhibits huge variations in pose, lighting, facial expression, age, gender, ethnicity, and general imaging and environmental conditions [237].
Yale and Yale Face B: The Yale dataset is one of the classical face recognition datasets [238]. It includes 165 grayscale images of 15 individuals, with 11 images per subject, one per facial expression or configuration. Its upgraded version is the Yale Face Database B [239], which has 16,128 images of 28 subjects under 9 poses and 64 illumination conditions. It offers many variations in pose, illumination, expression, and glasses, and, most importantly, aging artifacts.
CMU Multi-PIE: This face database contains more than 750,000 images of 337 people [240], [241]. Subjects were imaged under 15 viewpoints and 19 illumination conditions while displaying a range of facial expressions.
YouTube Faces (YTF): This dataset is composed of 3,425 YouTube videos of 1,595 celebrities. It is arranged as 5,000 video pairs in 10 sections and is widely used for face recognition [242].
AR Face database: This database includes 4,000 frontal images of 126 people's faces under varying illuminations, occlusions, and expressions. It is used mostly for face recognition and facial attribute recognition [243].
PolyU NIR Face: The Biometric Research Centre at the Hong Kong Polytechnic University developed a NIR face capture device and used it to collect a large NIR face database [244]. With this device, NIR face images were gathered from 335 individuals, about 100 images each, for a total of nearly 34,000 images. It provides a platform for developing and evaluating near-infrared face recognition techniques, with samples varying in expression, pose, scale, focus, time, etc.
MORPH: This dataset is widely employed for estimating facial attributes and contains two albums of face images annotated with properties such as age, gender, and ethnicity. Album 1 has 1,724 images of 515 people, while Album 2 includes 55,134 images of about 13,000 individuals [245].
VGGFace2: This is a large-scale face recognition dataset. Images were obtained through the Google image search engine and show large differences in pose, age, illumination, ethnicity, and profession. It includes 3.31 million images from 9,131 subjects (identities), with an average of 362.6 images per subject; the number of images per identity ranges from 87 to 843 [246].
IJB-A: This dataset consists of 5,712 images and 2,085 videos from 500 identities, with an average of 11.4 images and 4.2 videos per identity, and wide variation in pose, illumination, expression, resolution, and occlusion [247].
CASIA-WebFace: This dataset is composed of 494,414 face images of 10,575 real identities collected from the web. It is used for face identification and face verification tasks [248].
Adience: This is the most popular dataset for age and gender estimation based on unconstrained facial images. It is composed of 26,580 images gathered from 2,284 people, with age labels categorized into eight groups [249].
MS-Celeb: Microsoft Celeb is a dataset of 10 million face images of nearly 100,000 individuals, harvested from the Internet for the purpose of developing face recognition technologies [250].
CelebA: CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200,000 images of celebrities [251]. CelebA offers wide variation, large quantities, and rich annotations, covering more than 10,000 identities and 202,599 face images with known landmark locations and 40 binary attribute annotations per image.
MegaFace: This dataset is widely used to test the performance of facial recognition algorithms (identification and verification) [252]. It contains 1 million images of 690,000 identities gathered from Flickr [253]. The evaluation sets are two databases, FaceScrub and FGNet. The FaceScrub dataset has 106,863 images of 530 celebrities, while FGNet, mainly used for testing age-invariant face recognition, has 1,002 images of 82 individuals.
SDUMLA-HMT face: This dataset captures faces in different poses and includes 8,904 images with different facial expressions, 3 gaze directions (upward, forward, and downward), 4 expressions (smile, frown, surprise, closed eyes), and 2 accessories (glasses, hat) [254].
FERET: This dataset is used for face recognition and contains 11,338 color images of size 512×768 pixels, captured in a semi-controlled environment with 13 different poses from 994 subjects [255].
ORL: This dataset contains only 10 images per individual for a total of 40 subjects, making 400 images in all. The face images differ in capture angle, illumination, and facial expression. Nearly all faces are in a straight frontal view, sometimes with a slight rotation to the left or right. This dataset was created at Cambridge University [256].
Fingerprint Verification Competition (FVC 2002): This benchmark was adopted to evaluate fingerprint algorithms and is composed of four datasets (DB1–DB4), three gathered from different kinds of sensors and one generated synthetically. Each dataset is composed of two collections of images: Set A has 100 subjects with 8 impressions per subject, while Set B has 10 subjects with 8 impressions per subject [257].
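To see how such a collection yields verification statistics, the sketch below counts genuine and impostor comparison pairs for a Set-A-like collection (100 subjects × 8 impressions). The exhaustive symmetric pairing shown here is an illustrative assumption; the official FVC protocol prescribes its own reduced pairing scheme.

```python
# Counting comparison pairs for a dataset of 100 subjects x 8 impressions.
# Exhaustive symmetric pairing is an illustrative assumption, not the
# official FVC matching rule.
from itertools import combinations

subjects, impressions = 100, 8
samples = [(s, i) for s in range(subjects) for i in range(impressions)]

genuine = sum(1 for (s1, _), (s2, _) in combinations(samples, 2) if s1 == s2)
impostor = sum(1 for (s1, _), (s2, _) in combinations(samples, 2) if s1 != s2)

print(genuine)   # 100 * C(8,2) = 2,800 genuine comparisons
print(impostor)  # C(800,2) - 2,800 = 316,800 impostor comparisons
```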
PolyU Fingerprint: This dataset, presented by the Hong Kong Polytechnic University, includes two databases (DBI and DBII) with a total of 1,480 high-resolution images captured from 148 fingers [258].
CASIA Fingerprint: CASIA Fingerprint V5 includes 20,000 images of 500 subjects [259]; the images are taken from eight fingers (the left and right thumb, second, third, and fourth fingers) of the volunteers.
NIST SD27: This dataset contains 258 latent fingerprints along with their corresponding reference fingerprints [260].
WVU DB: This dataset was released by West Virginia University and includes 449 latent fingerprint images with their corresponding references [261].
SDUMLA-HMT fingerprint: This is composed of 25,440 fingerprint images captured from the thumb, index, and middle fingers of both hands using 5 different kinds of sensors [254].
CASIA-Iris-Thousand: This dataset contains 20,000 iris images taken from 1,000 subjects with an IKEMB-100 camera. The main variations in the images come from eyeglasses and specular reflections [262].
VSSIRIS: The images in this dataset were acquired with an iPhone 5S and a Lumia 1020 under unconstrained conditions in the visible spectrum. It comprises 560 iris images obtained from 28 subjects, primarily from European countries [263].
UBIRIS: This dataset has two distinct versions: UBIRIS.v1 and UBIRIS.v2. The first version is composed of 1,877 images collected from 241 eyes in two distinct sessions and simulates less constrained imaging conditions [264]. The second version has over 11,000 images (and is continuously growing) with more realistic noise factors.
Mobile Iris Challenge Evaluation (MICHE I): This consists of iris images acquired without any restrictions using smartphones. It comprises more than 3,732 images taken from 92 subjects with three different smartphones [265].
IITD Iris: The IIT Delhi iris dataset includes 2,240 images taken from 224 different individuals at a resolution of 320×240 pixels. The iris images in this dataset vary in color distribution and iris size [266].
Q-FIRE: This dataset includes 3,123 high-resolution iris images captured from 5 ft and 2,902 low-resolution images taken from 11 ft. All high- and low-resolution images were collected from 160 subjects [267].
ND-CrossSensor-Iris-2013: This is composed of two databases in which the iris is captured with two kinds of sensors: the LG2200 and the LG4000. The LG2200 captured 116,564 iris images, while the LG4000 captured 29,986. These images belong to 676 subjects [268].
MMU iris: This dataset contains 450 images with 5 images per iris and 2 irises per subject. All the images were taken using the LG Iris Access 2200 at a range of 7–25 cm [269].
SDUMLA-HMT iris dataset: This contains 1,060 iris images gathered at distances from 6 cm to 32 cm under near-infrared illumination, using a device developed by the University of Science and Technology of China. Each subject provided 10 iris images, 5 per eye [254].
THU-FVFDT1: This contains raw finger vein and finger dorsal texture images of 220 different subjects, captured in two sessions separated by intervals of roughly tens of seconds, with one session used for training and the other for testing. Four finger vein images and four finger dorsal texture images were captured simultaneously in each session, resulting in 440 images of size 720×576 [270].
UTFVP: This database for finger vein recognition consists of 1,300 images from 60 clients. This was produced at the University of Twente, the Netherlands [271].
MMCBNU_6000 finger vein dataset: This contains images of six fingers (the ring, middle, and index fingers of both hands) from 100 volunteers: 6,000 images in total, with 10 images per finger at 480×640 resolution [272].
PLUSVein-FV3: This is composed of palmar and dorsal images of 360 fingers from 60 different subjects (the ring, middle, and index fingers of both hands), taken in a single session with 5 samples per finger. Two variants of the same sensor were used, one adopting NIR laser modules for illumination and the other NIR LEDs [273].
HKPU-FV: The Hong Kong Polytechnic University finger vein image database is composed of finger vein and finger surface texture images captured simultaneously from both males and females, resulting in 6,264 images acquired from 156 subjects. The images were taken in two separate sessions with a minimum interval of one month and a maximum interval of over six months, yielding 24 images per subject [274].
SDUMLA-HMT: This is composed of 3,816 images from 106 subjects, captured from the index, middle, and ring fingers of both hands; the capture of each of the 6 fingers is repeated 6 times to acquire 6 finger vein images per finger [254].
Deep learning architectures have been employed extensively for biometrics, and a set of performance measurements has been developed to evaluate how these systems behave and perform.
Facial biometric systems are developed to fulfill various objectives, such as counterfeit detection, age estimation, gender recognition, and recognition, and each requires metrics specific to its task. Verification is associated with the re-identification problem, in which the objective is to determine whether a given sample matches previously registered samples. Performance is frequently assessed as verification accuracy, particularly when a labeled testing dataset is available. Another prevalent performance metric is the Equal Error Rate (EER), the error rate at the decision threshold where the False Rejection Rate equals the False Acceptance Rate. In addition, the metrics used for face recognition comprise both closed-set and open-set identification accuracy, while age estimation performance is assessed with the Mean Absolute Error (MAE). The proliferation of face biometric systems has prompted the development of an extensive assortment of algorithms and datasets. The purpose of this section is to give an overview of the most effective works; it therefore describes the performance of several prospective DL-based facial biometric models and compares them on well-known datasets. The outcomes of different deep learning facial biometric systems applied to the LFW dataset for identification and recognition tasks are detailed in Table 4, while the outcomes of systems using the Yale dataset are presented in Table 5.
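Since the EER recurs throughout the tables that follow, here is a minimal sketch of how it is computed from score distributions. The Gaussian genuine and impostor scores are placeholders for a real system's outputs, and the threshold grid is an assumption for illustration.

```python
# A minimal sketch of EER computation: sweep a decision threshold over
# genuine and impostor score distributions and take the point where the
# false acceptance and false rejection rates cross. The Gaussian scores
# are placeholders for a real system's outputs.
import numpy as np

rng = np.random.default_rng(1)
genuine = rng.normal(0.7, 0.1, 5000)    # same-identity comparison scores
impostor = rng.normal(0.3, 0.1, 50000)  # different-identity comparison scores

thresholds = np.linspace(0.0, 1.0, 1001)
far = np.array([(impostor >= t).mean() for t in thresholds])  # false accepts
frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejects

i = np.argmin(np.abs(far - frr))        # threshold where FAR and FRR meet
eer = (far[i] + frr[i]) / 2
print(f"EER = {eer:.4f} at threshold {thresholds[i]:.3f}")
```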
FACE-BASED DEEP LEARNING RESULTS USING THE LFW DATASET
Method | Year | Architecture | Accuracy | EER |
---|---|---|---|---|
Tian, L., et al. [34] | 2016 | Multiple Scales Combined DL | 93.16% | - |
Xiong, C., et al. [35] | 2016 | Deep Mixture Model (DMM) and Convolutional Fusion Network (CFN) | 87.50% | 1.57 |
Al-Waisy, A. S., et al. [44] | 2017 | Deep Belief Network (DBN) | 98.83% | 0.012 |
Zhuang, Ni, et al. [47] | 2018 | Deep transfer NN | 84.34% | - |
Santoso, K., et al. [51] | 2018 | DL network using triplet loss | 95.5% | - |
Li, Y., et al. [52] | 2018 | DCNN | 97.2% | - |
Luo, D., et al. [54] | 2018 | Deep cascaded detection method | 99.43% | 0.16 |
Kong, J., et al. [55] | 2018 | Novel DLN | 95.84% | - |
Iqbal, M., et al. [56] | 2019 | DCNN | 99.77% | - |
Khan, M. Z., et al. [57] | 2019 | DCNN | 97.9% | - |
Elmahmudi, A., et al. [61] | 2019 | CNN + pre-trained VGG | 99% | - |
Wang, P., et al. [62] | 2019 | Deep class-skewed learning method | 99.9% | - |
Bendjillali, R., et al. [63] | 2019 | DCNN | 98.13% | - |
Goel, T., et al. [66] | 2020 | Deep Convolutional-Optimized Kernel Extreme Learning Machine (DC-OKELM) | 99.2% | 0.04 |
Zhang, J., et al. [86] | 2022 | Lightened CNN | 99.9% | - |
FACE-BASED DEEP LEARNING RESULTS USING THE YALE AND YALE FACE B DATASETS
Method | Year | Architecture | Accuracy | EER |
---|---|---|---|---|
Tripathi, B. K. [46] | 2017 | One-Class-in-One-Neuron (OCON) DL | 97.4% | - |
Kong, J., et al. [55] | 2018 | Novel DLN | 100% | - |
Görgel, P., et al. [58] | 2019 | Deep Stacked De-Noising Sparse Autoencoders (DS-DSA) | 98.16% | - |
Li, Y. K., et al. [60] | 2019 | DL network L1-2D2PCANet | 96.86% | 0.77 |
Goel, T., et al. [66] | 2020 | Deep Convolutional-Optimized Kernel Extreme Learning Machine (DC-OKELM) | - | 6.67 |
Fingerprint models also commonly report their performance using accuracy or EER. Table 6 lists the works with the highest performance among fingerprint-based deep learning models over the last seven years, in terms of accuracy and/or EER, depending on the dataset used.
PERFORMANCE RESULTS OF THE BEST FINGERPRINT-BASED DEEP LEARNING MODELS
Method | Year | Dataset | Architecture | Accuracy | EER |
---|---|---|---|---|---|
Kim, S., et al. [91] | 2016 | Collected | DBN | 99.4% | - |
Jeon, W. S., et al. [95] | 2017 | FVC | DCNN | 97.2% | - |
Wang, Z., et al. [96] | 2017 | NIST | Novel approach (D-LVQ) | 99.075% | - |
Peralta, D., et al. [100] | 2018 | Collected | DCNN | 99.6% | - |
Yu, Y., et al. [101] | 2018 | Collected | DCNN | 96.46% | - |
Lin, C., et al. [102] | 2018 | - | DCNN | 99.89% | 0.64 |
Jung, H. Y., et al. [103] | 2018 | - | DCNN | 98.6% | - |
Yuan, C., et al. [111] | 2019 | LivDet 2013 | Deep Residual Network (DRN) | 97.04% | - |
Haider, Amir, et al. [115] | 2019 | Collected | DCNN | 95.94% | - |
Song, D., et al. [116] | 2019 | Collected | 1-D CNN | - | 0.06 |
Uliyan, D. M., et al. [118] | 2020 | LivDet 2013 | Deep Boltzmann Machines along with KNN | 96% | - |
Liu, Feng, et al. [119] | 2020 | - | DeepPoreID | - | 0.16 |
Yang, X., et al. [120] | 2020 | Collected | DCNN | 97.1% | - |
Arora, S., et al. [122] | 2020 | DigitalPersona 2015 | DCNN | 99.80% | - |
Zhang, Z., et al. [124] | 2021 | - | DCNN | 98.24% | - |
Ahsan, M., et al. [125] | 2021 | Collected | Gabor filtering and DCNN + PCA | 99.87% | 4.28 |
Leghari, M., et al. [126] | 2021 | Collected | DCNN | 99.87% | - |
Li, H. [127] | 2021 | NIST | DCNN | 98.65% | - |
Lee, Samuel, et al. [129] | 2021 | NIST | Proposed Pix2Pix DL model | 100% | - |
Nahar, P., et al. [131] | 2021 | - | DCNN | 99.1% | - |
Ibrahim, A. M., et al. [132] | 2021 | - | DCNN | 99.22% | - |
Gustisyaf, A. I., et al. [133] | 2021 | Collected | DCNN | 99.9667% | - |
Yuan, C., Yu, et al. [135] | 2022 | - | DCNN | - | 0.3 |
Saeed, F., et al. [137] | 2022 | FVC | DCNN | 98.89% | - |
Many recent iris-based models have reported their accuracy on various iris datasets for identity recognition or authentication. The iris is considered one of the strongest biometrics for providing the desired security level. Deep learning-based iris systems have shown tremendous performance, even though there are issues related to the complexity of the iris texture. Public iris datasets have been employed in many systems. Table 7 and Table 8 show the performance of DL-based iris recognition systems using the IITD and CASIA-Iris-Thousand datasets, respectively, while Table 9 covers results on other public datasets, including UBIRIS, ND-CrossSensor, and CASIA-V4.
IRIS-BASED DEEP LEARNING MODEL RESULTS USING THE IITD DATASET
Method | Year | Architecture | Accuracy | EER |
---|---|---|---|---|
Al-Waisy, Alaa S., et al. [147] | 2018 | DCNN + softmax | 100% | - |
Alaslani, M.G. [154] | 2018 | Alex-Net + SVM | 98.3% | - |
Chen, Ying, et al. [155] | 2019 | DCNN + softmax | 98.1% | - |
Liu, Ming, et al. [162] | 2019 | DCNN | 86.8% | - |
Chen, Y., et al. [173] | 2020 | DCNN | 99.3% | 0.74 |
Chen, Y., et al. [175] | 2021 | DCNN | 97.24% | 0.18 |
Chen, Ying, et al. [181] | 2021 | DenseSENet | 99.06% | 0.945 |
Alinia Lat, Reihan, et al. [188] | 2022 | DCNN | 99.99% | 0.45 |
IRIS-BASED DEEP LEARNING MODEL RESULTS USING THE CASIA-IRIS-THOUSAND DATASET
Method | Year | Architecture | Accuracy | EER |
---|---|---|---|---|
Liu, N., et al. [140] | 2016 | DCNN | - | 0.15 |
Nguyen, K., et al. [146] | 2017 | DCNN | 98.8% | - |
Alaslani, M.G. [154] | 2018 | Alex-Net Model + SVM | 96.6% | - |
Lee, Y.W., et al. [159] | 2019 | Deep ResNet | - | 1.3331 |
Liu, Ming, et al. [162] | 2019 | DCNN | 83.1% | 0.16 |
Chen, Y., et al. [175] | 2021 | DCNN | 99.14% | - |
Alinia Lat, Reihan, et al. [188] | 2022 | DCNN | 99.84% | 1.87 |
IRIS-BASED DEEP LEARNING MODEL RESULTS USING MULTIPLE KINDS OF IRIS DATASETS
Dataset | Method | Architecture | Accuracy | EER |
---|---|---|---|---|
CASIA-V4 | He, Fei, et al. [142] | Gabor + DBN | 99.998% | - |
 | Wang, Zi, et al. [150] | Convolutional and Residual network | 99.08% | - |
 | Zhang, Wei, et al. [161] | Fully Dilated U-Net (FD-UNet) | 97.36% | - |
 | Azam, M. S., et al. [171] | DCNN + SVM | 96.3% | - |
 | Chen, Y., et al. [175] | DCNN | 97.35% | 1.05 |
UBIRIS | Proença, H., et al. [145] | DCNN | 99.8% | 0.019 |
 | Wang, Zi, et al. [150] | Convolutional and Residual network | 96.12% | - |
 | Zhang, Wei, et al. [161] | Fully Dilated U-Net (FD-UNet) | 94.81% | - |
 | Shirke, S. D., et al. [178] | DBN | 97.9% | - |
ND | Nguyen, Kien, et al. [146] | Pre-trained CNNs | 98.7% | - |
 | Zhang, Wei, et al. [161] | Fully Dilated U-Net (FD-UNet) | 96.74% | - |
Finger vein is one of the more recently adopted characteristics in biometric systems. It is also one of the most reliable, as its rich set of properties allows it to excel compared to other traits. Deep learning techniques have been widely employed in finger vein-based systems and have provided the desired performance in terms of accuracy and error rate. Table 10 shows a set of related finger vein-based deep learning models achieving the highest results across multiple kinds of datasets.
PERFORMANCE RESULTS OF THE BEST FINGER VEIN-BASED DEEP LEARNING MODELS
Method | Year | Dataset | Architecture | Accuracy | EER |
---|---|---|---|---|---|
Nguyen, Dat Tien, et al. [192] | 2017 | - | CNN + SVM | - | 0.00
Chen, Cheng, et al. [195] | 2017 | Collected | DBN + CNN | 99.6% | - |
Fang, Y. et al. [198] | 2018 | MMCBNU | DCNN | - | 0.10 |
Wang, Jun, et al. [200] | 2018 | PolyU | CNN + SVM | - | 0.068 |
Das, Rig, et al. [201] | 2018 | UTFVP | CNN | 98.33% | - |
Xie, C., et al. [203] | 2019 | - | CNN + Supervised Discrete Hashing | - | 0.093 |
Li, J., et al. [204] | 2019 | SDUMLA | Graph Neural Network (GNN) | 99.98% | -
Zhang, J., et al. [205] | 2019 | SDUMLA | Fully Convolutional GAN + CNN | 99.15% | 0.87 |
Hou, B., et al. [206] | 2019 | FV-USM | Convolutional Auto-Encoder (CAE) + SVM | 99.95 % | 0.12 |
Kamaruddin, N.M., et al. [207] | 2019 | FV-USM | PCANET | 100% | - |
Yang, W., et al. [209] | 2019 | MMCBNU | Proposed DL (multilayer extreme learning machine + binary decision diagram (BDD)) | 98.70% | - |
Zhao, D., et al. [212] | 2020 | MMCBNU | DCNN | 99.05% | 0.503
Kuzu, R. S. [214] | 2020 | SDUMLA | DCNN + Autoencoder | 99.99% | 0.009
Kuzu, R., et al. [215] | 2020 | Collected | CNN + LSTM | 99.13% | -
Boucherit, I., et al. [216] | 2020 | THU-FVFDT2 | DCNN | 99.56% | -
Zhao, Jia-Yi, et al. [217] | 2020 | FV-USM | DCNN | 98% | - |
Noh, K. J., et al. [219] | 2020 | HKPolyU | DCNN | - | 0.05 |
Zeng, J., et al. [220] | 2020 | MMCBNU | RNN + Conditional Random Field (CRF) | - | 0.36 |
Bilal, A., et al. [221] | 2021 | SDUMLA | DCNN | 99.84% | - |
Shen, J, et al. [222] | 2021 | PKU-FVD | DCNN | 99.6% | 0.67 |
Wang, K., et al. [223] | 2021 | FV-USM | Multi-Receptive Field Bilinear CNN | 100% | -
Hou, B [224] | 2021 | FV-USM | DCNN | 99.79% | 0.25 |
Huang, J., et al. [225] | 2021 | MMCBNU | Joint Attention Finger Vein Network | - | 0.08 |
Huang, Z., et al. [230] | 2021 | SDUMLA | DCNN | 99.53% | - |
Shaheed, K., et al. [231] | 2022 | SDUMLA | DCNN | 99% | - |
Muthusamy, D. [232] | 2022 | SDUMLA | Deep Perceptive Fuzzy NN (DPFNN) | 98% | - |
Hou, B., et al. [235] | 2022 | FV-USM | Triplet-Classifier GAN | 99.66% | 0.03 |
Recently, DNNs have become a hot research topic due to the high results attained in many real-life applications, and the literature contains a vast number of research results on the stability analysis and related problems of many kinds of biometric systems and networks. The high results obtained for all kinds of biometric deep learning models (as shown in Tables 4 to 10) indicate the success of the various deep learning architectures in biometric systems and the powerful capabilities they provide across different kinds of datasets. The results of the aforementioned models for the four biometric characteristics demonstrate that deep learning architectures share a strong common property, namely stability in producing the desired results, even when the obtained results differ slightly. Such slight differences can occur for multiple reasons, such as the type of dataset used, an inappropriate deep learning architecture, or an issue in the preprocessing and feature extraction techniques. The literature shows that the CNN architecture achieved the highest results and is the most utilized and most successful biometric model for verification and recognition of individuals; it also gives high performance in gender recognition and spoof detection. Both the iris and the finger vein are relatively recently adopted characteristics in biometric systems, and they are challenging to use due to their complexity and the vast number of features they carry. However, deep learning can overcome these challenges and reveal promising model behaviors.
In this work, a summary of the latest DL-based models using four popular biometric traits (face, fingerprint, iris, and finger vein) was provided. The recent related works (from 2016 to 2022) for the four biometric traits (nearly 50 works for each trait), using different deep learning architectures and datasets, were presented. The many applications of biometric systems include recognition, verification, gender identification, and spoof detection. Deep learning has shown tremendous performance in biometric systems, with high results, minimal loss, and the ability to handle many kinds of features of different sizes. The adoption of deep learning architectures in biometric systems is a powerful technique for data classification, even on challenging datasets of various types and sizes. However, there are difficulties in building a biometric system with DL models that achieves the desired results, such as issues with the chosen dataset, the feature extraction method, noise in data collected from real life, and the selection of the DL model. The DL model may also suffer from overfitting, which occurs when there is a large gap between the error on the training samples and the error on the testing samples; this gap is often caused by having a large number of parameters relative to the number of observations. An interesting focus for future research would be to propose a novel integrated deep learning system that classifies the four biometric traits discussed in this work.