Open Access

Deep Learning for Sign Language Recognition: A Comparative Review

15 June 2024


Figure 1: Paper organization.

Figure 2: Samples of sign language datasets.

Figure 3: Samples of gesture datasets.

Figure 4: The procedural stages of sign language recognition.

Figure 5: Sample images (class 9) from NUS hand posture dataset-II (data subset A), showing the variations in hand posture sizes and appearances.

Classifiers employed in related works on SLR using DL

| Author | Year | Input modality | Classifier | Result |
|---|---|---|---|---|
| [129] | 2018 | Static | DCNN | 92.4% |
| [131] | 2018 | Static | DCNN | 99.85% |
| [133] | 2018 | Static | DCNN | 85.3% |
| [134] | 2018 | Static | Restricted Boltzmann machine | 98.13% |
| [135] | 2018 | Isolated | LRCNs and 3D CNNs | 99% |
| [136] | 2018 | Static | DAN | 73.4% |
| [137] | 2018 | Static | CNNs of variant depth sizes and stacked denoising autoencoders | 92.83% |
| [139] | 2018 | Static | DCNN | 82.5% |
| [142] | 2018 | Static | DCNN | 90.3% |
| [145] | 2018 | Isolated | DCNN | 88.59% |
| [146] | 2018 | Continuous | CNN-HMM hybrid | 7.4 error |
| [147] | 2018 | Static | DCNN | 98.05% |
| [151] | 2018 | Isolated | 3DCNN and enhanced fully connected (FCRNN) | 69.2% |
| [155] | 2019 | Continuous | Deep capsule networks and game theory | 92.50% |
| [156] | 2019 | Continuous | Hierarchical Attention Network (HAN) and Latent Space | 82.7% |
| [157] | 2019 | Static | DCNN | 93.667% |
| [160] | 2019 | Static | DCNN | 97% |
| [161] | 2019 | Continuous | DCNN | 2.80 WER |
| [162] | 2019 | Continuous, Isolated | Modified LSTM | 72.3%; 89% |
| [167] | 2019 | Isolated | DCNN based on DenseNet | 90.3% |
| [168] | 2019 | Static | DCNN | 97.71% |
| [176] | 2020 | Static | DCNN | 90% |
| [181] | 2020 | Static | DCNN | 97.6% |
| [184] | 2020 | Static | Eight CNN layers + stochastic pooling, batch normalization, and dropout | 89.32% |
| [185] | 2020 | Isolated | Cascaded model (SSD, CNN, LSTM) | 98.42% |
| [187] | 2020 | Static | Deep Elman recurrent neural network | 98.89% |
| [188] | 2020 | Static | DCNN | 93% |
| [190] | 2020 | Static | Enhanced AlexNet | 89.48% |
| [198] | 2020 | Static | Multimodal fine-tuned VGG16 CNN + Leap Motion network | 82.55% |
| [199] | 2020 | Continuous | Multi-channel CNN | 10.8 WER |
| [200] | 2020 | Static | Hybrid model based on Inception v3 + SVM | 99.90% |
| [201] | 2020 | Static | 11-layer CNN | 95% |
| [205] | 2021 | Static | Three-layered CNN model | 90.8% |
| [206] | 2021 | Isolated | Hybrid deep learning with convolutional LSTM and BiLSTM | 76.21% |
| [209] | 2021 | Isolated | DCNN + sentiment analysis | 99.63% |
| [211] | 2021 | Continuous | GRU + LSTM | 19.56 error |
| [214] | 2021 | Isolated | Generic temporal convolutional network | 77.42% |
| [215] | 2021 | Static | DCNN | 96.65% |
| [216] | 2021 | Static | DCNN | 99.7% |
| [220] | 2021 | Static | Pretrained InceptionV3 + mini-batch gradient descent optimizer | 85% |
| [221] | 2021 | Static | CNN with parameters optimized by the PSO algorithm | 99.58% |
| [223] | 2021 | Continuous | Visual hierarchy to lexical sequence alignment network (H2SNet) | 91.72% |
| [227] | 2021 | Static | Novel lightweight deep learning model based on a bottleneck motivated by deep residual learning | 99.52% |
| [228] | 2021 | Continuous | Novel hyperparameter-based optimized Generative Adversarial Networks (H-GANs) | 97% |
| [229] | 2021 | Isolated | 3DCNN | 88.24% |
| [232] | 2021 | Continuous | Bidirectional encoder representations from transformers (BERT) + ResNet | 23.30 WER |
| [234] | 2021 | Continuous | Generative Adversarial Network (SLRGAN) | 23.4 WER |
| [238] | 2021 | Static | DCNN | 97% |
| [239] | 2022 | Static | DCNN optimized by a hybrid of Electric Fish Optimization (EFO) and the Whale Optimization Algorithm (WOA), called Electric Fish based Whale Optimization Algorithm (E-WOA) | 98.7% |
| [241] | 2022 | Isolated | CNN + RNN | 98.8% |
| [242] | 2022 | Static | Modified CapsNet architecture (SLR-CapsNet) | 99.60% |
| [245] | 2022 | Static | DCNN | 99.52% |
| [247] | 2022 | Static | DCNN + diffGrad optimizer | 88.01% |
| [250] | 2022 | Static | DCNN | 92% |
| [251] | 2022 | Static | DCNN | 99.38% |
| [252] | 2022 | Static | Lightweight CNN | 94.30% |
| [254] | 2022 | Isolated | Hybrid model based on VGG16-BiLSTM | 83.36% |

Related works on SLR using DL that address the overfitting problem

| Author(s) | Year | Dataset | Model | Technique | Result |
|---|---|---|---|---|---|
| [129] | 2018 | NTU | DCNN | Augmentation | 92.4% |
| [130] | 2018 | Collected | Modified VGG net | Dropout | 84.68% |
| [132] | 2018 | Ishara-Lipi | DCNN | Dropout | 94.88% |
| [133] | 2018 | Collected | DCNN | Small convolutional filter sizes, dropout, and learning strategy | 85.3% |
| [136] | 2018 | HUST | Deep Attention Network (DAN) | Data augmentation | 73.4% |
| [142] | 2018 | ASL Finger Spelling A | DNN | DenseNet | 90.3% |
| [143] | 2018 | Collected | 3DCNN | SGD | 88.7% |
| [146] | 2018 | SIGNUM | CNN-HMM hybrid | Augmentation | 7.4 error |
| [157] | 2019 | Collected | DCNN | Augmentation | 93.667% |
| [79] | 2019 | Collected | ResNet-152 | Batch size, augmentation | 55.28% |
| [163] | 2019 | Collected | VGG-16 | Dropout | 95% |
| [166] | 2019 | Collected | DCNN | Augmentation | 95.83% |
| [167] | 2019 | Collected | DCNN | DenseNet | 90.3% |
| [171] | 2019 | Collected | LSTM | Increase hidden state number | 94.7% |
| [172] | 2019 | NVIDIA | SqueezeNet | Augmentation | 83.29% |
| [173] | 2019 | G3D | Four stream CNN | Sharing of multimodal features with RGB spatial features during training, and dropout | 86.87% |
| [175] | 2019 | Collected | DCNN | Augmentation | 98.9% |
| [176] | 2020 | Collected | DCNN | Pooling layer | 90% |
| [181] | 2020 | Collected | DCNN | Reduce epochs to 30, and dropout added after each max pooling | 97.6% |
| [184] | 2020 | Collected | CNN with 8 layers | Augmentation | 89.32% |
| [188] | 2020 | MNIST | CNN | Dropout | 93% |
| [190] | 2020 | Collected | Enhanced AlexNet | Augmentation | 89.48% |
| [191] | 2020 | Collected | SVM | Augmentation, and k-fold cross validation | 99.9% |
| [193] | 2020 | KETI | CNN + LSTM | New data augmentation | 96.2% |
| [194] | 2020 | Collected | VGG16, and ResNet152 with enhanced softmax layer | Augmentation | 99% |
| [196] | 2020 | Collected | RNN-LSTM | Dropout layer (DR) | 99.81% |
| [201] | 2020 | Collected | CNN | Dropout layer, and augmentation | 95% |
| [203] | 2020 | NTU | 2 stream CNN | Randomness in the features interlocking fusion with dropout | 93.01% |
| [207] | 2021 | Jochen-Triesch’s | DCNN | Two dropouts | 99.96% |
| [214] | 2021 | Collected | Generic temporal convolutional network (TCN) | Dropout | 77.42% |
| [215] | 2021 | Collected | DCNN | Dropout | 96.65% |
| [216] | 2021 | Collected | DCNN | Cyclical learning rate method | 99.7% |
| [217] | 2021 | MU | Modified AlexNet and VGG16 | Augmentation | 99.82% |
| [222] | 2021 | Collected | CNN | Dropout | 97.62% |
| [229] | 2021 | Collected | 3DCNN | Dropout and regularization | 88.24% |
| [236] | 2021 | Collected | ResNet-18 | Zero-patience stopping criteria | 93.4% |
| [238] | 2021 | Collected | DCNN | Synthetic Minority Oversampling Technique (SMOTE) | 97% |
| [240] | 2022 | Collected | DCNN | Augmentation | 99.67% |
| [253] | 2022 | Collected | ResNet50-BiLSTM | Augmentation | 99% |
| [256] | 2022 | Collected | LSTM, and GRU | Dropout | 97% |
| [263] | 2022 | BdSL | CNN | Augmentation | 99.91% |

Public sign language datasets

| Dataset | Language | Equipment | Modalities | Signers | Samples |
|---|---|---|---|---|---|
| ASL alphabets [45] | American | Webcam | RGB images | - | 87,000 |
| MNIST [46] | American | Webcam | Grey images | - | 27,455 |
| ASL Fingerspelling A [47] | American | Microsoft Kinect | RGB and depth images | 5 | 48,000 |
| NYU [48] | American | Kinect | RGB and depth images | 36 | 81,009 |
| ASL by Surrey [49] | American | Kinect | RGB and depth images | 23 | 130,000 |
| Jochen-Triesch [50] | American | Cam | Grey images with different background | 24 | 720 |
| MKLM [51] | American | Leap Motion device and a Kinect sensor | RGB and depth images | 14 | 1,400 |
| NTU-HD [52] | American | Kinect sensor | RGB and depth images | 10 | 1,000 |
| HUST [53] | American | Microsoft Kinect | RGB and depth images | 10 | 10,880 |
| RVL-SLLL [54] | American | Cam | RGB video | 14 | |
| ChicagoFSWild [55] | American | Collected online from YouTube | RGB video | 160 | 7,304 |
| ASLG-PC12 [56] | American | Cam | RGB video | - | 880 |
| American Sign Language Lexicon Video (ASLLVD) [57] | American | Cam | RGB videos of different angles | 6 | 3,300 |
| MU [58] | American | Cam | RGB images with illumination variations in five different angles | 5 | 2,515 |
| ASLID [59] | American | Webcam | RGB images | 6 | 809 |
| KSU-SSL [60] | Arabic | Cam and Kinect | RGB videos with uncontrolled environment | 40 | 16,000 |
| KArSL [61] | Arabic | Kinect V2 | RGB video | 3 | 75,300 |
| ArSL by University of Sharjah [62] | Arabic | Analog camcorder | RGB images | 3 | 3,450 |
| JTD [63] | Indian | Webcam | RGB images with 3 different backgrounds | 24 | 720 |
| IISL2020 [64] | Indian | Webcam | RGB video with uncontrolled environment | 16 | 12,100 |
| RWTH-PHOENIX-Weather 2014 [65] | German | Webcam | RGB video | 9 | 8,257 |
| SIGNUM [66] | German | Cam | RGB video | 25 | 33,210 |
| DEVISIGN-D [67] | Chinese | Cam | RGB videos | 8 | 6,000 |
| DEVISIGN-L [67] | Chinese | Cam | RGB videos | 8 | 24,000 |
| CSL-500 [68] | Chinese | Cam | RGB, depth and skeleton videos | 50 | 25,000 |
| Chinese Sign Language [69] | Chinese | Kinect | RGB, depth and skeleton videos | 50 | 125,000 |
| 38 BdSL [70] | Bengali | Cam | RGB images | 320 | 12,160 |
| Ishara-Lipi [71] | Bengali | Cam | Greyscale images | - | 1,800 |
| ChaLearn14 [72] | Italian | Kinect | RGB and depth video | 940 | 940 |
| Montalbano II [73] | Italian | Kinect | RGB and depth video | 20 | 940 |
| UFOP–LIBRAS [74] | Brazilian | Kinect | RGB, depth and skeleton videos | 5 | 2,800 |
| AUTSL [75] | Turkish | Kinect v2 | RGB, depth and skeleton videos | 43 | 38,336 |
| RKS-PERSIANSIGN [76] | Persian | Cam | RGB video | 10 | 10,000 |
| LSA64 [77] | Argentine | Cam | RGB video | 10 | 3,200 |
| Polytropon (PGSL) [78] | Greek | Cam | RGB video | 6 | 840 |
| KETI [79] | Korean | Cam | RGB video | 40 | 14,672 |

Public gesture datasets

| Name | Modality | Device | Signers | Samples |
|---|---|---|---|---|
| LMDHG [82] | RGB and depth videos | Kinect and | 21 | 608 |
| Shape Retrieval Contest (SHREC) [83] | RGB and depth videos | Intel RealSense short range depth camera | 28 | 2,800 |
| UTD–MHAD [84] | RGB, depth and skeleton videos | Kinect and wearable inertial sensor | 8 | 861 |
| The Multicamera Human Action Video Data (MuHAVi) [85] | RGB video | 8 camera views | 14 | 1,904 |
| NUMA [86] | RGB, depth and skeleton videos | 10 Kinect with three different views | 10 | 1,493 |
| WEIZMANN [87] | Low resolution RGB video | Camera with 10 different viewpoints | 9 | 90 |
| NTU RGB [88] | RGB, depth and skeleton videos | Kinect | 40 | 56,880 |
| Cambridge hand gesture [89] | RGB video captured under five different illuminations | Cam | 9 | 900 |
| VIVA [90] | RGB and depth videos | Kinect | 8 | 885 |
| MSR [91] | RGB and depth videos | Kinect | 10 | 320 |
| CAD-60 [92] | RGB and depth video in different environments, such as a kitchen, a living room, and an office | Kinect | 4 | 48 |
| HDM05 MoCap (motion capture) [93] | RGB video | Cam | 5 | 2,337 |
| CMU [94] | RGB images | Cam | 25 | 204 |
| isoGD [95] | RGB and depth videos | Kinect | 21 | 47,933 |
| NVIDIA [96] | RGB and depth video | Kinect | 8 | 885 |
| G3D [97] | RGB and depth video | Kinect | 16 | 1,280 |
| UT Kinect [98] | RGB and depth video | Kinect | 10 | 200 |
| First-Person [99] | RGB and depth video | RealSense SR300 cam | 6 | 1,175 |
| Jester [100] | RGB | Cam | 25 | 148,092 |
| EgoGesture [101] | RGB and depth video | Kinect | 50 | 2,081 |
| NUS II [102] | RGB images with complex backgrounds, and various hand shapes and sizes | Cam | 40 | 2,000 |

Related works on SLR using DL that address movement, orientation, trajectory, and occlusion problems

| Author(s) | Year | Type of variation | Language | Signing mode | Model | Accuracy | Error rate |
|---|---|---|---|---|---|---|---|
| [129] | 2018 | Similarities, and occlusion | American | Static | DCNN | 92.4% | |
| [135] | 2018 | Movement | Brazilian | Isolated | Long-term Recurrent Convolutional Networks | 99% | - |
| [138] | 2018 | Size, shape, and position of the fingers or hands | American | Static | CNN | 82% | - |
| [140] | 2018 | Hand movement | American | Isolated | VGG 16 | 99% | - |
| [144] | 2018 | Movement | American | Isolated | Leap Motion Controller | 88.79% | - |
| [145] | 2018 | 3D motion | Indian | Isolated | Joint Angular Displacement Maps (JADMs) | 92.14% | |
| [150] | 2018 | Head and hand movements | Indian | Continuous | CNN | 92.88% | - |
| [155] | 2019 | Hand movement | Indian | Continuous | Wearable systems to measure muscle intensity, hand orientation, motion, and position | 92.50% | - |
| [156] | 2019 | Variant hand orientations | Chinese | Continuous | Hierarchical Attention Network (HAN) and Latent Space | 82.7% | - |
| [165] | 2019 | Similarity and trajectory | Chinese | Isolated | Deep 3-d Residual ConvNet + BiLSTM | 89.8% | - |
| [166] | 2019 | Orientation of camera, hand position and movement, inter-hand relation | Vietnamese | Isolated | DCNN | 95.83% | |
| [173] | 2019 | Movement, self-occlusions, orientation, and angles | Indian | Continuous | Four stream CNN | 86.87% | |
| [174] | 2019 | Movement at different distances from the camera | American | Static | Novel DNN | 97.29% | - |
| [176] | 2020 | Angles, distance, object size, and rotations | Arabic | Static | Image augmentation | 90% | 0.53 |
| [180] | 2020 | Fingers' configuration, hand's orientation, and its position to the body | Arabic | Isolated | Multilayer perceptron + autoencoder | 87.69% | |
| [185] | 2020 | Hand movement | Persian | Isolated | Single Shot Detector (SSD) + CNN + LSTM | 98.42% | |
| [186] | 2020 | Shape, orientation, and trajectory | Greek | Isolated | Fully convolutional attention-based encoder-decoder | 95.31% | - |
| [192] | 2020 | Trajectory | Greek | Isolated | Incorporate the depth dimension in the coordinates of the hand joints | 93.56% | - |
| [195] | 2020 | Finger angles and multi-finger movements | Taiwanese | Continuous | Wristband with ten modified barometric sensors + dual DCNN | 97.5% | |
| [196] | 2020 | Movement of fingers and hands | Chinese | Isolated | Motion data from IMU sensors | 99.81% | - |
| [197] | 2020 | Finger movement | Chinese | Isolated | Trigno Wireless sEMG acquisition system used to collect multichannel sEMG signals of forearm muscles | 93.33% | |
| [199] | 2020 | Finger and arm motions, two-handed signs, and hand rotation | Chinese | Continuous | Two armbands embedded with an IMU sensor and multi-channel sEMG sensors, attached to the forearms to capture both arm and finger movements | - | 10.8% |
| [76] | 2020 | Hand occlusion | Persian | Isolated | Skeleton detection | 99.8% | |
| [204] | 2020 | Trajectory | Brazilian | Isolated | Convert the trajectory information into spherical coordinates | 64.33% | |
| [210] | 2021 | Trajectory | Arabic | Isolated | Multi-Sign Language Ontology (MSLO) | 94.5% | |
| [213] | 2021 | Movement | Korean | Isolated | 3DCNN | 91% | |
| [214] | 2021 | Finger movement | Chinese | Isolated | Low-cost data glove with simple hardware structure that captures finger movement and bending simultaneously | 77.42% | |
| [218] | 2021 | Skewing, and angle rotation | Bengali | Static | DCNN | 99.57% | 0.56 |
| [219] | 2021 | Hand motion | American | Continuous | Sensing gloves | 86.67% | |
| [223] | 2021 | Spatial appearance and temporal motion | Chinese | Continuous | Lexical prediction network | 91.72% | 6.10 |
| [226] | 2021 | Finger self-occlusions, view invariance | Indian | Continuous | Motion modelled deep attention network (M2DA-Net) | 84.95% | |
| [228] | 2021 | Occlusions of hand/hand, hands/face, or hands/upper body postures | American | Continuous | Novel hyperparameter-based optimized Generative Adversarial Networks (H-GANs), with a deep Long Short-Term Memory (LSTM) as generator and an LSTM with 3D Convolutional Neural Network (3D-CNN) as discriminator | 97% | 1.4 |
| [230] | 2021 | Variant view | American | Isolated | Cascaded 3-D CNNs | 96% | |
| [233] | 2021 | Hand occlusion | Italian | Isolated | LSTM + CNN | 99.08% | |
| [237] | 2021 | Finger occlusion, motion blurring, variant signing styles | Chinese | Continuous | Dual network upon a Graph Convolutional Network (GCN) | 98.08% | |
| [239] | 2022 | Self-structural characteristics, and occlusion | Indian | Continuous | Dynamic Time Warping (DTW) | 98.7% | |
| [240] | 2022 | High similarity and complexity | American | Static | DCNN | 99.67% | 0.0016 |
| [241] | 2022 | Movement | Arabic | Isolated | The difference function | 98.8% | |
| [259] | 2022 | Hand occlusion | American | Static | Re-formation layer in the CNN | 91.40% | |
| [260] | 2022 | Trajectory, hand shapes, and orientation | American | Isolated | MediaPipe landmarks with GRU | 99% | |
| [261] | 2022 | Ambiguous and 3D double-hand motion trajectories | American | Isolated | 3D extended Kalman filter (EKF) tracking, and approximation of a probability density function over a time frame | 97.98% | |
| [262] | 2022 | Movement | Turkish | Continuous | Motion History Images (MHI) generated from RGB video frames | 94.83% | |
| [264] | 2022 | Movement | Argentine | Continuous | Accumulative video motion (AVM) technique | 91.8% | |
| [269] | 2022 | Orientation angle, prosodic, and similarity | American | Continuous | Robust fast Fisher vector (FFV) in deep Bi-LSTM | 98.33% | |
| [270] | 2022 | Variant length, sequential patterns | English | Isolated | Novel Residual-Multi Head model | 95.03% | |

Related works on SLR using DL that aim to achieve generalization

Results are listed in the same order as the datasets.

| Author(s) | Year | Datasets | Technique | Result |
|---|---|---|---|---|
| [129] | 2018 | ASL Finger Spelling A; NTU | DCNN | 92.4%; 99.7% |
| [134] | 2018 | NYU; MU; ASL Fingerspelling A; ASL Surrey | Restricted Boltzmann Machine (RBM) | 90.01%; 99.31%; 98.13%; 97.56% |
| [136] | 2018 | NTU; HUST | DAN | 98.5%; 73.4% |
| [143] | 2018 | Collected CSL; ChaLearn14 | 3D-CNN | 88.7%; 95.3% |
| [145] | 2018 | Collected; MD05; CMU | JADM + CNN | 88.59%; 87.92%; 87.27% |
| [146] | 2018 | RWTH 2012; RWTH 2014; SIGNUM | CNN-HMM hybrid | 30.0 WER; 32.5; 7.4 |
| [156] | 2019 | Collected; RWTH-2014 | Hierarchical Attention Network (HAN) + Latent Space (LS-HAN) | 82.7%; 61.6% |
| [161] | 2019 | RWTH-2014; SIGNUM | DCNN | 22.86 WER; 2.80 |
| [164] | 2019 | CSL; IsoGD | Proposed multimodal two-stream CNN | 96.7%; 63.78% |
| [165] | 2019 | DEVISIGN-D; Collected | Deep 3-d Residual ConvNet + BiLSTM | 89.8%; 86.9% |
| [170] | 2019 | KSU-SSL; ArSL; RVL-SLLL | 3D-CNN | 77.32%; 34.90%; 70% |
| [173] | 2019 | Collected RGB-D; MSR; UT Kinect; G3D | Four stream CNN | 86.87%; 86.98%; 85.23%; 88.68% |
| [174] | 2019 | Jochen-Triesch; MKLM; Novel SI-PSL | Novel DNN | 97.29%; 96.8%; 51.88% |
| [182] | 2020 | KSU-SSL; ArSL by University of Sharjah; RVL-SLLL | 3DCNN | 84.38%; 34.9%; 70% |
| [186] | 2020 | PGSL; ChicagoFSWild; RWTH 2014T | DCNN | 95.31%; 92.63%; 76.30% |
| [187] | 2020 | ASL; MU | Deep Elman recurrent neural network | 98.89%; 97.5% |
| [192] | 2020 | GSL; ChicagoFSWild | CNN | 93.56%; 91.38% |
| [76] | 2020 | NYU; First-Person; RKS-PERSIANSIGN | CNN | 4.64 error; 91.12%; 99.8% |
| [202] | 2020 | NUS; American fingerspelling A | DCNN | 94.7%; 99.96% |
| [203] | 2020 | HDM05; CMU; NTU; Collected | 2 stream CNN | 93.42%; 92.67%; 94.42%; 93.01% |
| [204] | 2020 | UTD–MHAD; IsoGD; Collected | Linear SVM classifier | 94.81%; 67.36%; 64.33% |
| [207] | 2021 | Collected RGB images; Jochen-Triesch’s | DCNN | 99.96%; 100% |
| [210] | 2021 | LSA64; LSA; Collected | 3DCNN | 98.5%; 99.2%; 94.5% |
| [211] | 2021 | ASLG-PC12; RWTH-2014 | GRU and LSTM with Bahdanau and Luong attention mechanisms | 66.59%; 19.56% BLEU |
| [221] | 2021 | ASL alphabet; ASL MNIST; MSL | Optimized CNN based on PSO | 99.58%; 99.58%; 99.10% |
| [225] | 2021 | KSU-ArSL; Jester; NVIDIA | Inception-BiLSTM | 84.2%; 95.8%; 86.6% |
| [226] | 2021 | Collected; NTU; MuHAVi; WEIZMANN; NUMA | Motion modelled deep attention network (M2DA-Net) | 84.95%; 89.98%; 85.12%; 82.25%; 88.25% |
| [228] | 2021 | RWTH-2014; ASLLVD | Novel hyperparameter-based optimized Generative Adversarial Networks (H-GANs) | 73.9%; 97% |
| [232] | 2021 | RWTH-2014; Collected | Bidirectional encoder representations from transformers (BERT) + ResNet | 20.1; 23.30 WER |
| [233] | 2021 | Montalbano II; isoGD; MSR; CAD-60 | LSTM + CNN | 99.08%; 86.10%; 98.40%; 95.50% |
| [234] | 2021 | RWTH2014; (CSL); (GSL) | GAN | 23.4; 2.1; 2.26 |
| [237] | 2021 | CSL-500; DEVISIGN-L | Dual network upon a Graph Convolutional Network (GCN) | 98.08%; 64.57% |
| [242] | 2022 | SLDD; MNIST | Modified CapsNet architecture (SLR-CapsNet) | 99.52%; 99.60% |
| [243] | 2022 | RKS-PERSIANSIGN; First-Person; ASVID; isoGD | Single shot detector, 2D convolutional neural network, singular value decomposition (SVD), and LSTM | 99.5%; 91%; 93%; 86.1% |
| [247] | 2022 | Collected; Collected; ASL finger spelling | DCNN + diffGrad optimizer | 92.43%; 88.01%; 99.52% |
| [248] | 2022 | 38 BdSL; Collected; Ishara-Lipi | BenSignNet | 94.00%; 99.60%; 99.60% |
| [251] | 2022 | Collected; Collected; Collected | DCNN | 99.41%; 99.48%; 99.38% |
| [254] | 2022 | Collected; Cambridge hand gesture | Hybrid model based on VGG16-BiLSTM | 83.36%; 97% |
| [255] | 2022 | Collected; MNIST; JTD; NUS | Hybrid Fist CNN | 97.89%; 95.68%; 94.90%; 95.87% |
| [256] | 2022 | ASL; GSL; AUTSL; IISL2020 | LSTM + GRU | 95.3%; 94%; 95.1%; 97.1% |
| [261] | 2022 | Collected; SHREC; LMDHG | DLSTM | 97.98%; 96.99%; 97.99% |
| [262] | 2022 | AUTSL; Collected | 3D-CNN | 93.53%; 94.83% |
| [265] | 2022 | CSL-500; Jester; EgoGesture | Deep R(2+1)D | 97.45%; 97.05%; 94% |
| [266] | 2022 | MU; HUST-ASL | End-to-end fine-tuning of a pre-trained CNN model with score-level fusion | 98.14%; 64.55% |
| [269] | 2022 | SHREC; Collected; LMDHG | FFV-Bi-LSTM | 92.99%; 98.33%; 93.08% |

Related works on SLR using DL that address the problem of varying environmental conditions

| Author(s) | Year | Language | Modality | Type of condition | Technique | Result |
|---|---|---|---|---|---|---|
| [130] | 2018 | Bengali | RGB images | Variant background and skin colors | Modified VGG net | 84.68% |
| [134] | 2018 | American | RGB images | Noise and missing data | Augmentation | 98.13% |
| [150] | 2018 | Indian | RGB video | Different viewing angles, background lighting, and distance | Novel CNN | 92.88% |
| [158] | 2019 | American | Binary images | Noise | Erosion, closing, contour generation, and polygonal approximation | 96.83% |
| [159] | 2019 | American | Depth image | Variant illumination, and background | Attain depth images | 88.7% |
| [164] | 2019 | Chinese | RGB and depth video | Variant illumination, and background | Two-stream spatiotemporal network | 96.7% |
| [173] | 2019 | Indian | RGB and depth video | Variant illumination, background, and camera distance | Four stream CNN | 86.87% |
| [178] | 2020 | Arabic | RGB images | Variant illumination, and skin color | DCNN | 94.31% |
| [179] | 2020 | Arabic | RGB videos | Variant illumination, background, pose, scale, shape, position, and clothes | Bi-directional Long Short-Term Memory (BiLSTM) | 89.59% |
| [180] | 2020 | Arabic | RGB videos | Variant illumination, clothes, position, scale, and speed | 3DCNN and softmax function | 87.69% |
| [182] | 2020 | Arabic | RGB videos | Variations in heights and distances from camera | Normalization | 84.3% |
| [194] | 2020 | Arabic | RGB images | Variant illumination, and background | VGG16 and ResNet152 with enhanced softmax layer | 99% |
| [201] | 2020 | American | Grayscale images | Illumination, and skin color | Set the hand histogram | 95% |
| [202] | 2020 | American | RGB images | Variant illumination, background | DCNN | 99.96% |
| [206] | 2021 | Indian | RGB video | Variant illuminations, camera positions, and orientations | GoogLeNet + BiLSTM | 76.21% |
| [207] | 2021 | Indian | RGB images | Light and dark backgrounds | DCNN with a small number of parameters | 99.96% |
| [209] | 2021 | American | RGB video | Noise | Gaussian blur | 99.63% |
| [213] | 2021 | Korean | Depth videos | Low resolution | Augmentation | 91% |
| [224] | 2021 | Bengali | RGB images | Variant backgrounds, camera angle, light contrast, and skin tone | Conventional deep learning + zero-shot learning (ZSL) | 93.68% |
| [225] | 2021 | Arabic | RGB video | Variant illumination, background, and clothes | Inception-BiLSTM | 84.2% |
| [227] | 2021 | American | Thermal images | Varying illumination | Adopt live images taken by a low-resolution thermal camera | 99.52% |
| [229] | 2021 | Indian | RGB video | Varying illumination | 3DCNN | 88.24% |
| [230] | 2021 | American | RGB video | Noise, varying illumination | Median filtering + histogram equalization | 96% |
| [236] | 2021 | Arabic | RGB images | Variant illumination, and background | Region-based Convolutional Neural Network (R-CNN) | 93.4% |
| [239] | 2022 | Indian | RGB video | Variant illumination, and views | Grey scale conversion and histogram equalization | 98.7% |
| [241] | 2022 | Arabic | RGB video | Variant illumination, and background | CNN + RNN | 98.8% |
| [249] | 2022 | Arabic | Greyscale images | Variant illumination, and background | Sobel filter | 97% |
| [253] | 2022 | Arabic | RGB and depth video | Variant background | ResNet50-BiLSTM | 99% |
| [259] | 2022 | American | RGB and depth images | Noise and illumination variation | Median filtering and histogram equalization | 91.4% |
| [261] | 2022 | American | Skeleton video | Noise in video frames | An innovative weighted least square (WLS) algorithm | 97.98% |
| [270] | 2022 | English | Wi-Fi signal | Noise and uncleaned Wi-Fi signals | Principal Component Analysis (PCA) | 95.03% |

Related works on SLR using DL that address the feature extraction problem

| Author(s) | Year | Dataset | Technique | Signing mode | Feature(s) | Result |
|---|---|---|---|---|---|---|
| [130] | 2018 | Collected | DCNN | Static | Hand shape | 84.6% |
| [135] | 2018 | Collected | 3D CNN | Isolated | Spatiotemporal | 99% |
| [138] | 2018 | ASL Finger Spelling | CNN | Static | Depth and intensity | 82% |
| [141] | 2018 | RWTH-2014 | 3D Residual Convolutional Network (3D-ResNet) | Continuous | Spatial information, and temporal connections across frames | 37.3 WER |
| [143] | 2018 | Collected | 3D-CNNs | Isolated | Spatiotemporal | 88.7% |
| [144] | 2018 | Collected | DCNN | Isolated | Hand palm sphere radius, and position of hand palm and fingertip | 88.79% |
| [149] | 2018 | ASL Finger Spelling | Histograms of oriented gradients, and Zernike moments | Static | Hand shape | 94.37% |
| [150] | 2018 | Collected | CNN | Continuous | Hand shape | 92.88% |
| [151] | 2018 | Collected | 3DRCNN | Continuous/Isolated | Motion, depth, and temporal | 69.2% |
| [152] | 2018 | SHREC | Leap Motion Controller (LMC) sensor | Isolated, static | Finger bones of hands | 96.4% |
| [153] | 2018 | Collected | Hybrid Discrete Wavelet Transform, Gabor filter, and histogram of distances from centre of mass | Static | Hand shape | 76.25% |
| [154] | 2018 | Collected | DCNN | Static | Facial expressions | 89% |
| [156] | 2019 | Collected | Two-stream 3-D CNN | Continuous | Spatiotemporal | 82.7% |
| [158] | 2019 | Collected | CNN | Static | Hand shape | 96.83% |
| [79] | 2019 | Collected | OpenPose library | Continuous | Human key points (hand, face, body) | 55.2% |
| [159] | 2019 | ASL fingerspelling | PCANet | Static | Hand shape (corners, edges, blobs, or ridges) | 88.7% |
| [161] | 2019 | SIGNUM | Stacked temporal fusion layers in DCNN | Continuous | Spatiotemporal | 2.80 WER |
| [162] | 2019 | Collected | Leap Motion device | Continuous, Isolated | 3D positions of the fingertips | 72.3%; 89% |
| [163] | 2019 | Collected | CNN | Static | Hand shape | 95% |
| [164] | 2019 | CSL | D-shift Net | Continuous | Spatial, time, and temporal features | 96.7% |
| [165] | 2019 | DEVISIGN-D | B3D ResNet | Isolated | Spatiotemporal | 89.8% |
| [166] | 2019 | Collected | Local and GIST descriptor | Isolated | Spatial and scene-based features | 95.83% |
| [169] | 2019 | Collected | Restricted Boltzmann Machine (RBM) | Isolated | Handshape, and network generated features | 88.2% |
| [170] | 2019 | KSU-SSL | 3D-CNN | Isolated | Hand shape, position, orientation, and temporal dependence in consecutive frames | 77.32% |
| [171] | 2019 | Collected | C3D, and Kinect device | Continuous | Temporal, and skeleton | 94.7% |
| [175] | 2019 | Collected | OpenPose library with Kinect V2 | Static | 3D skeleton | 98.9% |
| [177] | 2020 | Ishara-Lipi | MobileNet V1 | Isolated | Two hands shape | 95.71% |
| [178] | 2020 | Collected | DCNN | Static | Hand shape | 94.31% |
| [179] | 2020 | Collected | Single layer Convolutional Self-Organizing Map (CSOM) | Isolated | Hand shape | 89.59% |
| [180] | 2020 | KSU-SSL | Enhanced C3D architecture | Isolated | Spatiotemporal of hand and body | 87.69% |
| [182] | 2020 | KSU-SSL | 3DCNN | Isolated | Spatiotemporal | 84.3% |
| [185] | 2020 | Collected | ResNet50 model | Isolated | Hand shape, Extra Spatial Hand Relation (ESHR) features, Hand Pose (HP), and temporal | 98.42% |
| [186] | 2020 | Polytropon (PGSL) | ResNet-18 | Isolated | Optical flow of skeletal, handshapes, and mouthing | 95.31% |
| [187] | 2020 | Collected | Discrete cosine transform, Zernike moment, scale-invariant feature transform, and social ski driver optimization algorithm | Static | Hand shape | 98.89% |
| [189] | 2020 | RWTH-2014 | Temporal convolution unit and dynamic hierarchical bidirectional GRU unit | Continuous | Spatiotemporal | 10.73% BLEU |
| [191] | 2020 | Collected | Standard score normalization on the raw Channel State Information (CSI) acquired from the Wi-Fi device, and MIFS algorithm | Static, and continuous | The cross-cumulant features (unbiased estimates of covariance, normalized skewness, normalized kurtosis) | 99.9% |
| [192] | 2020 | GSL | OpenPose human joint detector | Isolated | 3D hand skeletal, and region of hand, and mouth | 93.56% |
| [197] | 2020 | Collected | Four channel surface electromyography (sEMG) signals | Isolated | Time-frequency joint features | 93.33% |
| [199] | 2020 | Collected | Euler angle, quaternion from IMU signal | Continuous | Hand rotation | 10.8% WER |
| [76] | 2020 | RKS-PERSIANSIGN | 3DCNNs | Isolated | Spatiotemporal | 99.8% |
| [202] | 2020 | ASL fingerspelling A | DCNN | Static | Hand shape | 99.96% |
| [203] | 2020 | Collected | Color-coded topographical descriptor constructed from joint distances and angles, used in 2 CNN streams | Isolated | Distance and angular | 93.01% |
| [204] | 2020 | Collected | Two CNN models and a descriptor based on histogram of cumulative magnitudes | Isolated | Two hands, skeleton, and body | 64.33% |
| [208] | 2021 | RWTH-2014T | Semantic Focus of Interest Network with Face Highlight Module (SFoI-Net-FHM) | Isolated | Body and facial expression | 10.89 BLEU |
| [210] | 2021 | Collected | ConvLSTM | Isolated | Spatiotemporal | 94.5% |
| [212] | 2021 | Collected | ResNet50 | Static | Hand area, the length of axis of first eigenvector, and hand position changes | 96.42% |
| [214] | 2021 | Collected | f-CNN (fusion of 1-D CNN and 2-D CNN) | Isolated | Time and spatial-domain features of finger resistance movement | 77.42% |
| [217] | 2021 | MU | Modified AlexNet and VGG16 | Static | Hand edges and shape | 99.82% |
| [222] | 2021 | Collected | VGG net of six convolutional layers | Static | Hand shape | 97.62% |
| [224] | 2021 | 38 BdSL | DenseNet201, and Linear Discriminant Analysis | Static | Hand shape | 93.68% |
| [225] | 2021 | KSU-ArSL | Bi-LSTM | Isolated | Spatiotemporal | 84.2% |
| [226] | 2021 | Collected | Paired pooling network in view pair pooling net (VPPN) | Isolated | Spatiotemporal | 84.95% |
| [228] | 2021 | ASLLVD | Bayesian Parallel Hidden Markov Model (BPaHMM) + stacked denoising variational autoencoders (SD-VAE) + PCA | Continuous | Shape of hand, palm, and face, along with their position, speed, and distance between them | 97% |
| [230] | 2021 | ASLLVD | Cascaded 3-D CNNs | Isolated | Spatiotemporal | 96.0% |
| [231] | 2021 | Collected | Leap Motion controller | Static, and isolated | Sphere radius, angles between fingers, and their distance | 91.82% |
| [232] | 2021 | RWTH-2014 | (3+2+1)D ResNet | Continuous | Height, motion of hand, and frame blurriness levels | 23.30 WER |
| [233] | 2021 | Montalbano II | AlexNet + Optical Flow (OF) + Scene Flow (SF) methods | Isolated | Pixel level, and hand pose | 99.08% |
| [234] | 2021 | RWTH-2014 | GAN | Continuous | Spatiotemporal | 23.4 WER |
| [235] | 2021 | MNIST | DCNN | Static | Hand shape | 98.58% |
| [236] | 2021 | Collected | R-CNN | Static | Hand shape | 93% |
| [237] | 2021 | CSL-500 | Multi-scale spatiotemporal attention network (MSSTA) | Isolated | Spatiotemporal | 98.08% |
| [242] | 2022 | MNIST | Modified CapsNet | Static | Spatial, and orientations | 99.60% |
| [243] | 2022 | RKS-PERSIANSIGN | Singular value decomposition (SVD) | Isolated | 3D hand key points between the segments of each finger, and their angles | 99.5% |
| [244] | 2022 | Collected | 2DCRNN + 3DCRNN | Continuous | Spatiotemporal out of small patches | 99% |
| [246] | 2022 | Collected | Atrous convolution mechanism, and semantic spatial multi-cue model | Static, Isolated | Pose, face, and hand; spatial, full frame | 99.85% |
| [253] | 2022 | Collected | 4 DNN models using 2D and 3D CNN | Isolated | Spatiotemporal | 99% |
| [255] | 2022 | Collected | Scale-Invariant Feature Transform (SIFT) | Static | Corner, edges, rotation, blurring, and illumination | 97.89% |
| [256] | 2022 | Collected | InceptionResNetV2 | Isolated | Hand shape | 97% |
| [257] | 2022 | Collected | AlexNet | Static | Hand shape | 94.81% |
| [258] | 2022 | Collected | Sensor + mathematical equations + CNN | Continuous | Mean, magnitude of mean, variance, correlation, covariance, and frequency domain features + spatiotemporal | 0.088 WER |
| [260] | 2022 | Collected | MediaPipe framework | Isolated | Hands, body, and face | 99% |
| [261] | 2022 | Collected | Bi-RNN network, maximal information correlation, and Leap Motion controller | Isolated | Hand shape, orientation, position, and motion of 3D skeletal videos | 97.98% |
| [264] | 2022 | LSA64 | Dynamic motion network (DMN) + accumulative motion network (AMN) | Isolated | Spatiotemporal | 91.8% |
| [265] | 2022 | CSL-500 | Spatial–temporal–channel attention (STCA) | Isolated | Spatiotemporal | 97.45% |
| [268] | 2022 | Collected | SURF (Speeded Up Robust Features) | Isolated | Distribution of the intensity material within the neighborhood of the interest point | 99% |
| [269] | 2022 | Collected | Thresholding and Fast Fisher Vector Encoding (FFV) | Isolated | Hand, palm, finger shape, and position and 3D skeletal hand characteristics | 98.33% |

Related works on SLR using DL that address the segmentation problem

| Author(s) | Year | Input modality | Segmentation method | Result |
|---|---|---|---|---|
| [131] | 2018 | RGB image | HSV color model | 99.85% |
| [148] | 2018 | RGB image | Skin segmentation algorithm based on color information | 94.7% |
| [149] | 2018 | RGB images | k-means-based algorithm | 94.37% |
| [158] | 2019 | RGB images | Color segmentation by MLP network | 96.83% |
| [159] | 2019 | Depth image | Wrist line localization by algorithm-based thresholding | 88.7% |
| [164] | 2019 | RGB and depth video | Aligned Random Sampling in Segments (ARSS) | 96.7% |
| [168] | 2019 | RGB and depth images | Depth-based segmentation using data of Kinect RGB-D camera | 97.71% |
| [171] | 2019 | RGB video | Adaptive temporal encoder designed to capture crucial RGB visemes and skeleton signees | 94.7% |
| [179] | 2020 | RGB videos | Hand semantic segmentation using DeepLabv3+ | 89.59% |
| [180] | 2020 | RGB videos | Novel method based on OpenPose | 87.69% |
| [182] | 2020 | RGB videos | Viola and Jones, and human body part ratios | 84.3% |
| [183] | 2020 | RGB images | Roberts edge detection method | 99.3% |
| [185] | 2020 | RGB video | SSD, a feed-forward convolutional network, with a final Non-Maximum Suppression (NMS) step to estimate the final detection | 98.42% |
| [187] | 2020 | RGB images | Sobel edge detector, and skin color by thresholding | 98.89% |
| [188] | 2020 | RGB images | OpenCV with a Region of Interest (ROI) box in the driver program | 93% |
| [189] | 2020 | RGB videos | Frame stream density compression (FSDC) algorithm | 10.73 error |
| [199] | 2020 | RGB videos | Attention-based encoder-decoder model designed to realize end-to-end continuous SLR without segmentation | 10.8% WER |
| [200] | 2020 | RGB images | Single Shot MultiBox Detection (SSD) | 99.90% |
| [209] | 2021 | RGB video | Canny | 99.63% |
| [216] | 2021 | RGB images | Erosion, dilation, and watershed segmentation | 99.7% |
| [219] | 2021 | RGB video | Data sliding window | 86.67% |
| [236] | 2021 | RGB images | R-CNN | 93% |
| [239] | 2022 | RGB videos | Novel Adaptive Hough Transform (AHT) | 98.7% |
| [246] | 2022 | RGB images, and video | Grad-CAM and CamShift algorithm | 99.85% |
| [248] | 2022 | Grey images | YCbCr, HSV and watershed algorithm | 99.60% |
| [249] | 2022 | RGB images | Sobel operator method | 97% |
| [263] | 2022 | RGB images | Semantic segmentation | 99.91% |
| [267] | 2022 | RGB images | R-CNN | 99.7% |
| [268] | 2022 | RGB video | Mask created by extracting the maximum connected region in the foreground, assuming it to be the hand, + Canny method | 99% |
Language: English