Deep Learning for Sign Language Recognition: A Comparative Review

Abd Al-Latief, Shahad Thamear; Yussof, Salman; Ahmad, Azhana; Khadim, Saif

Open access

Journal of Smart Internet of Things

Volume 2024 (2024): Issue 1 (June 2024)

Article category: Article

Publication date: 15 Jun 2024

Page range: 77 - 116

Received: 27 May 2024

Accepted: 05 Jun 2024

DOI: https://doi.org/10.2478/jsiot-2024-0006

Keywords
Sign language, Recognition, Deep Learning, Classification

© 2023 Shahad Thamear Abd Al-Latief et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.

The procedural stages of sign language recognition

Sample images (class 9) from NUS hand posture dataset-II (data subset A), showing the variations in hand posture sizes and appearances.

Classifiers employed in related works on SLR using DL

Reference	Year	Input modality	Classifier	Result
[129]	2018	Static	DCNN	92.4%
[131]	2018	Static	DCNN	99.85%
[133]	2018	Static	DCNN	85.3%
[134]	2018	Static	Restricted Boltzmann machine	98.13%
[135]	2018	Isolated	LRCNs and 3D CNNs	99%
[136]	2018	Static	DAN	73.4%
[137]	2018	Static	CNNs of varying depth and stacked denoising autoencoders	92.83%
[139]	2018	Static	DCNN	82.5%
[142]	2018	Static	DCNN	90.3%
[145]	2018	Isolated	DCNN	88.59%
[146]	2018	Continuous	CNN-HMM hybrid	7.4 error
[147]	2018	Static	DCNN	98.05%
[151]	2018	Isolated	3DCNN and enhanced fully connected (FCRNN)	69.2%
[155]	2019	Continuous	Deep capsule networks and game theory	92.50%
[156]	2019	Continuous	Hierarchical Attention Network (HAN) and Latent Space	82.7%
[157]	2019	Static	DCNN	93.667%
[160]	2019	Static	DCNN	97%
[161]	2019	Continuous	DCNN	2.80 WER
[162]	2019	Continuous; Isolated	Modified LSTM	72.3%; 89%
[167]	2019	Isolated	DenseNet-based DCNN	90.3%
[168]	2019	Static	DCNN	97.71%
[176]	2020	Static	DCNN	90%
[181]	2020	Static	DCNN	97.6%
[184]	2020	Static	Eight CNN layers + stochastic pooling, batch normalization, and dropout	89.32%
[185]	2020	Isolated	Cascaded model (SSD, CNN, LSTM)	98.42%
[187]	2020	Static	Deep Elman recurrent neural network	98.89%
[188]	2020	Static	DCNN	93%
[190]	2020	Static	Enhanced AlexNet	89.48%
[198]	2020	Static	Multimodal fine-tuned VGG16 CNN + Leap Motion network	82.55%
[199]	2020	Continuous	Multi-channel CNN	10.8 WER
[200]	2020	Static	Hybrid model based on Inception v3 + SVM	99.90%
[201]	2020	Static	11-layer CNN	95%
[205]	2021	Static	Three-layered CNN model	90.8%
[206]	2021	Isolated	Hybrid deep learning with convolutional LSTM and BiLSTM	76.21%
[209]	2021	Isolated	DCNN + sentiment analysis	99.63%
[211]	2021	Continuous	GRU + LSTM	19.56 error
[214]	2021	Isolated	Generic temporal convolutional network	77.42%
[215]	2021	Static	DCNN	96.65%
[216]	2021	Static	DCNN	99.7%
[220]	2021	Static	Pretrained InceptionV3 + mini-batch gradient descent optimizer	85%
[221]	2021	Static	CNN with parameters optimized by the PSO algorithm	99.58%
[223]	2021	Continuous	Visual hierarchy to lexical sequence alignment network (H2SNet)	91.72%
[227]	2021	Static	Novel lightweight deep learning model based on a bottleneck design motivated by deep residual learning	99.52%
[228]	2021	Continuous	Novel hyperparameter-based optimized Generative Adversarial Networks (H-GANs)	97%
[229]	2021	Isolated	3DCNN	88.24%
[232]	2021	Continuous	Bidirectional encoder representations from transformers (BERT) + ResNet	23.30 WER
[234]	2021	Continuous	Generative Adversarial Network (SLRGAN)	23.4 WER
[238]	2021	Static	DCNN	97%
[239]	2022	Static	DCNN optimized by the Electric Fish based Whale Optimization Algorithm (E-WOA), a hybrid of Electric Fish Optimization (EFO) and the Whale Optimization Algorithm (WOA)	98.7%
[241]	2022	Isolated	CNN + RNN	98.8%
[242]	2022	Static	Modified CapsNet architecture (SLR-CapsNet)	99.60%
[245]	2022	Static	DCNN	99.52%
[247]	2022	Static	DCNN + diffGrad optimizer	88.01%
[250]	2022	Static	DCNN	92%
[251]	2022	Static	DCNN	99.38%
[252]	2022	Static	Lightweight CNN	94.30%
[254]	2022	Isolated	Hybrid model based on VGG16-BiLSTM	83.36%
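
Several continuous-SLR rows in this table report a word error rate (WER) instead of an accuracy. As a rough illustration of how such a figure is commonly obtained, the sketch below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the reference length. This is the usual textbook definition; the cited works may differ in detail, and the gloss sequences and the function name `word_error_rate` are made up for the example.

```python
def word_error_rate(reference, hypothesis):
    """Return word-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical gloss sequences: one substitution and one deletion.
reference = ["TOMORROW", "RAIN", "NORTH", "WIND"]
hypothesis = ["TOMORROW", "SNOW", "NORTH"]
print(f"WER = {100.0 * word_error_rate(reference, hypothesis):.1f}%")  # WER = 50.0%
```

Because WER counts errors, lower is better, so WER rows cannot be ranked directly against the accuracy rows.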

Related works on SLR using DL that address the overfitting problem

Reference	Year	Dataset	Model	Technique	Result
[129]	2018	NTU	DCNN	Augmentation	92.4%
[130]	2018	Collected	Modified VGG net	Dropout	84.68%
[132]	2018	Ishara-Lipi	DCNN	Dropout	94.88%
[133]	2018	Collected	DCNN	small convolutional filter sizes, Dropout, and learning strategy	85.3%
[136]	2018	HUST	Deep Attention Network (DAN)	data augmentation	73.4%
[142]	2018	ASL Finger Spelling A	DNN	Dense Net	90.3%
[143]	2018	Collected	3DCNN	SGD	88.7%
[146]	2018	SIGNUM	CNN-HMM hybrid	Augmentation	7.4 error
[157]	2019	Collected	DCNN	Augmentation	93.667%
[79]	2019	Collected	ResNet-152	batch size, Augmentation	55.28%
[163]	2019	Collected	VGG-16	Dropout	95%
[166]	2019	Collected	DCNN	Augmentation	95.83%
[167]	2019	Collected	DCNN	Dense Net	90.3%
[171]	2019	Collected	LSTM	Increase hidden state number	94.7%
[172]	2019	NVIDIA	Squeeze-net	Augmentation	83.29%
[173]	2019	G3D	Four-stream CNN	Sharing of multimodal features with RGB spatial features during training, and dropout	86.87%
[175]	2019	Collected	DCNN	Augmentation	98.9%
[176]	2020	Collected	DCNN	Pooling Layer	90%
[181]	2020	Collected	DCNN	Epochs reduced to 30, and dropout added after each max-pooling layer	97.6%
[184]	2020	Collected	CNN with 8 layers	Augmentation	89.32%
[188]	2020	MNIST	CNN	Dropout	93%
[190]	2020	Collected	Enhanced Alex Net	Augmentation	89.48%
[191]	2020	Collected	SVM	Augmentation, and k-fold cross validation	99.9%
[193]	2020	KETI	CNN+LSTM	New data augmentation	96.2%
[194]	2020	Collected	VGG16, and ResNet152 with enhanced softmax layer	Augmentation	99%
[196]	2020	Collected	RNN-LSTM	dropout layer (DR)	99.81%
[201]	2020	Collected	CNN	dropout layer, and augmentation	95%
[203]	2020	NTU	2 stream CNN	randomness in the features interlocking fusion with dropout	93.01%
[207]	2021	Jochen-Triesch’s	DCNN	two dropouts	99.96%
[214]	2021	Collected	Generic temporal convolutional network (TCN)	Dropout	77.42%
[215]	2021	Collected	DCNN	Dropout	96.65%
[216]	2021	Collected	DCNN	Cyclical learning rate method	99.7%
[217]	2021	MU	Modified AlexNet and VGG16	Augmentation	99.82%
[222]	2021	Collected	CNN	Dropout	97.62%
[229]	2021	Collected	3DCNN	Dropout & Regularization	88.24%
[236]	2021	Collected	ResNet-18	Zero-patience stopping criteria	93.4%
[238]	2021	Collected	DCNN	Synthetic Minority Oversampling Technique (SMOTE)	97%
[240]	2022	Collected	DCNN	Augmentation	99.67%
[253]	2022	Collected	ResNet50-BiLSTM	Augmentation	99%
[256]	2022	Collected	LSTM, and GRU	Dropout	97%
[263]	2022	BdSL	CNN	Augmentation	99.91%
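
Dropout is the most frequently listed technique in the table above. A minimal sketch of how it is commonly applied follows, assuming the widely used "inverted dropout" formulation, in which kept activations are rescaled by 1/(1 - p) during training so that nothing needs rescaling at inference. The function name `inverted_dropout` and its parameters are illustrative and are not taken from any of the cited works, which may implement dropout differently.

```python
import random


def inverted_dropout(activations, drop_prob, rng, training=True):
    """Zero each activation with probability drop_prob and rescale the rest."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        # At inference time the layer acts as the identity.
        return list(activations)
    keep_prob = 1.0 - drop_prob
    # Dividing by keep_prob keeps the expected value of each unit unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)  # fixed seed only to make the illustration repeatable
hidden = [0.2, 1.5, 0.7, 0.9]  # hypothetical hidden-layer activations
print(inverted_dropout(hidden, drop_prob=0.5, rng=rng))
print(inverted_dropout(hidden, drop_prob=0.5, rng=rng, training=False))
```

Randomly silencing units in this way discourages the network from relying on any single feature, which is why it is typically paired with small collected datasets such as those listed here.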

Public sign language datasets

Dataset	Language	Equipment	Modalities	Signers	Samples
ASL alphabets [45]	American	Webcam	RGB images	-	87,000
MNIST [46]	American	Webcam	Grey images	-	27,455
ASL Fingerspelling A [47]	American	Microsoft Kinect	RGB and depth images	5	48,000
NYU [48]	American	Kinect	RGB and depth images	36	81,009
ASL by Surrey [49]	American	Kinect	RGB and depth images	23	130,000
Jochen-Triesch [50]	American	Cam	Grey images with different background	24	720
MKLM [51]	American	Leap Motion device and a Kinect sensor	RGB and depth images	14	1400
NTU-HD [52]	American	Kinect sensor	RGB and depth images	10	1000
HUST [53]	American	Microsoft Kinect	RGB and depth images	10	10880
RVL-SLLL [54]	American	Cam	RGB video	14
ChicagoFSWild [55]	American	Collected online from YouTube	RGB video	160	7,304
ASLG-PC12 [56]	American	Cam	RGB video	-	880
American Sign Language Lexicon Video (ASLLVD) [57]	American	Cam	RGB videos of different angles	6	3,300
MU [58]	American	Cam	RGB images with illumination variations in five different angles	5	2515
ASLID [59]	American	Web cam	RGB images	6	809
KSU-SSL [60]	Arabic	Cam and Kinect	RGB Videos with uncontrolled environment	40	16000
KArSL [61]	Arabic	Kinect V2	RGB video	3	75,300
ArSL by University of Sharjah [62]	Arabic	Analog camcorder	RGB images	3	3450
JTD [63]	Indian	Webcam	RGB images with 3 different backgrounds	24	720
IISL2020 [64]	Indian	Webcam	RGB video with uncontrolled environment	16	12100
RWTH-PHOENIX-Weather 2014 [65]	German	Webcam	RGB Video	9	8,257
SIGNUM [66]	German	Cam	RGB Video	25	33210
DEVISIGN-D [67]	Chinese	Cam	RGB videos	8	6000
DEVISIGN-L [67]	Chinese	Cam	RGB videos	8	24000
CSL-500 [68]	Chinese	Cam	RGB, depth and skeleton videos	50	25,000
Chinese Sign Language [69]	Chinese	Kinect	RGB, depth and skeleton videos	50	125000
38 BdSL [70]	Bengali	Cam	RGB images	320	12,160
Ishara-Lipi [71]	Bengali	Cam	Greyscale images	-	1800
ChaLearn14 [72]	Italian	Kinect	RGB and depth video	940	940
Montalbano II [73]	Italian	Kinect	RGB and depth video	20	940
UFOP–LIBRAS [74]	Brazilian	Kinect	RGB, depth and skeleton videos	5	2800
AUTSL [75]	Turkish	Kinect v2	RGB, depth and skeleton videos	43	38,336
RKS-PERSIANSIGN [76]	Persian	Cam	RGB video	10	10,000
LSA64 [77]	Argentine	Cam	RGB video	10	3200
Polytropon (PGSL) [78]	Greek	Cam	RGB video	6	840
KETI [79]	Korean	Cam	RGB video	40	14,672

Gesture public datasets

Name	Modality	Device	Signers	Samples
LMDHG [82]	RGB, and depth videos	Kinect and	21	608
Shape Retrieval Contest (SHREC) [83]	RGB, and depth videos	Intel RealSense short range depth camera	28	2800
UTD–MHAD [84]	RGB, depth and skeleton videos	Kinect and wearable inertial sensor	8	861
The Multicamera Human Action Video Data (MuHAVi) [85]	RGB video	8 camera views	14	1904
NUMA [86]	RGB, depth and skeleton videos	10 Kinect with three different views	10	1493
WEIZMANN [87]	Low resolution RGB video	Camera with 10 different viewpoints	9	90
NTU RGB [88]	RGB, depth and skeleton videos	Kinect	40	56,880
Cambridge hand gesture [89]	RGB video captured under five different illuminations	Cam	9	900
VIVA [90]	RGB, and depth videos	Kinect	8	885
MSR [91]	RGB, and depth videos	Kinect	10	320
CAD-60 [92]	RGB and depth video in different environments, such as a kitchen, a living room, and office	Kinect	4	48
HDM05MoCap (motion capture) [93]	RGB video	Cam	5	2337
CMU [94]	RGB images	Cam	25	204
isoGD [95]	RGB and depth videos	Kinect	21	47,933
NVIDIA [96]	RGB and depth video	Kinect	8	885
G3D [97]	RGB and depth video	Kinect	16	1280
UT Kinect [98]	RGB and depth video	Kinect	10	200
First-Person [99]	RGB and depth video	RealSense SR300 cam	6	1,175
Jester [100]	RGB	Cam	25	148,092
Ego Gesture [101]	RGB and depth video	Kinect	50	2,081
NUS II [102]	RGB images with complex backgrounds, and various hand shapes and sizes	Cam	40	2000

Related works on SLR using DL that address movement, orientation, trajectory, and occlusion problems

Reference	Year	Type of variation	Language	Signing mode	Model	Accuracy	Error rate
[129]	2018	Similarities and occlusion	American	Static	DCNN	92.4%	-
[135]	2018	Movement	Brazilian	Isolated	Long-term Recurrent Convolutional Networks	99%	-
[138]	2018	Size, shape, and position of the fingers or hands	American	Static	CNN	82%	-
[140]	2018	Hand movement	American	Isolated	VGG 16	99%	-
[144]	2018	Movement	American	Isolated	Leap Motion Controller	88.79%	-
[145]	2018	3D motion	Indian	Isolated	Joint Angular Displacement Maps (JADMs)	92.14%	-
[150]	2018	Head and hand movements	Indian	Continuous	CNN	92.88%	-
[155]	2019	Hand movement	Indian	Continuous	Wearable systems to measure muscle intensity, hand orientation, motion, and position	92.50%	-
[156]	2019	Variant hand orientations	Chinese	Continuous	Hierarchical Attention Network (HAN) and Latent Space	82.7%	-
[165]	2019	Similarity and trajectory	Chinese	Isolated	Deep 3-d Residual ConvNet + BiLSTM	89.8%	-
[166]	2019	Camera orientation, hand position and movement, inter-hand relation	Vietnamese	Isolated	DCNN	95.83%	-
[173]	2019	Movement, self-occlusions, orientation, and angles	Indian	Continuous	Four-stream CNN	86.87%	-
[174]	2019	Movement at different distances from the camera	American	Static	Novel DNN	97.29%	-
[176]	2020	Angles, distance, object size, and rotations	Arabic	Static	Image augmentation	90%	0.53
[180]	2020	Finger configuration, hand orientation, and hand position relative to the body	Arabic	Isolated	Multilayer perceptron + autoencoder	87.69%	-
[185]	2020	Hand movement	Persian	Isolated	Single Shot Detector (SSD) + CNN + LSTM	98.42%	-
[186]	2020	Shape, orientation, and trajectory	Greek	Isolated	Fully convolutional attention-based encoder-decoder	95.31%	-
[192]	2020	Trajectory	Greek	Isolated	Incorporating the depth dimension in the coordinates of the hand joints	93.56%	-
[195]	2020	Finger angles and multi-finger movements	Taiwanese	Continuous	Wristband with ten modified barometric sensors + dual DCNN	97.5%	-
[196]	2020	Movement of fingers and hands	Chinese	Isolated	Motion data from IMU sensors	99.81%	-
[197]	2020	Finger movement	Chinese	Isolated	Trigno Wireless sEMG acquisition system used to collect multichannel sEMG signals of forearm muscles	93.33%	-
[199]	2020	Finger and arm motions, two-handed signs, and hand rotation	Chinese	Continuous	Two armbands with an IMU sensor and multi-channel sEMG sensors attached to the forearms to capture both arm and finger movements	-	10.8%
[76]	2020	Hand occlusion	Persian	Isolated	Skeleton detection	99.8%	-
[204]	2020	Trajectory	Brazilian	Isolated	Conversion of the trajectory information into spherical coordinates	64.33%	-
[210]	2021	Trajectory	Arabic	Isolated	Multi-Sign Language Ontology (MSLO)	94.5%	-
[213]	2021	Movement	Korean	Isolated	3DCNN	91%	-
[214]	2021	Finger movement	Chinese	Isolated	Low-cost data glove with a simple hardware structure that captures finger movement and bending simultaneously	77.42%	-
[218]	2021	Skewing and angle rotation	Bengali	Static	DCNN	99.57%	0.56
[219]	2021	Hand motion	American	Continuous	Sensing gloves	86.67%	-
[223]	2021	Spatial appearance and temporal motion	Chinese	Continuous	Lexical prediction network	91.72%	6.10
[226]	2021	Finger self-occlusions, view invariance	Indian	Continuous	Motion modelled deep attention network (M2DA-Net)	84.95%	-
[228]	2021	Occlusions of hand/hand, hands/face, or hands/upper body postures	American	Continuous	Novel hyperparameter-based optimized Generative Adversarial Networks (H-GANs) with a deep Long Short-Term Memory (LSTM) generator and an LSTM with 3D Convolutional Neural Network (3D-CNN) discriminator	97%	1.4
[230]	2021	Variant view	American	Isolated	Cascaded 3-D CNNs	96%	-
[233]	2021	Hand occlusion	Italian	Isolated	LSTM + CNN	99.08%	-
[237]	2021	Finger occlusion, motion blurring, variant signing styles	Chinese	Continuous	Dual network built upon a Graph Convolutional Network (GCN)	98.08%	-
[239]	2022	Self-structural characteristics and occlusion	Indian	Continuous	Dynamic Time Warping (DTW)	98.7%	-
[240]	2022	High similarity and complexity	American	Static	DCNN	99.67%	0.0016
[241]	2022	Movement	Arabic	Isolated	The difference function	98.8%	-
[259]	2022	Hand occlusion	American	Static	Re-formation layer in the CNN	91.40%	-
[260]	2022	Trajectory, hand shapes, and orientation	American	Isolated	MediaPipe landmarks with GRU	99%	-
[261]	2022	Ambiguous and 3D double-hand motion trajectories	American	Isolated	3D extended Kalman filter (EKF) tracking and approximation of a probability density function over a time frame	97.98%	-
[262]	2022	Movement	Turkish	Continuous	Motion History Images (MHI) generated from RGB video frames	94.83%	-
[264]	2022	Movement	Argentine	Continuous	Accumulative video motion (AVM) technique	91.8%	-
[269]	2022	Orientation angle, prosodic, and similarity	American	Continuous	Robust fast Fisher vector (FFV) in deep Bi-LSTM	98.33%	-
[270]	2022	Variant length, sequential patterns	English	Isolated	Novel Residual-Multi Head model	95.03%	-

Related works on SLR using DL that aim to achieve generalization

Reference	Year	Datasets	Technique	Results
[129]	2018	ASL Finger Spelling A; NTU	DCNN	92.4%; 99.7%
[134]	2018	NYU; MU; ASL Fingerspelling A; ASL Surrey	Restricted Boltzmann Machine (RBM)	90.01%; 99.31%; 98.13%; 97.56%
[136]	2018	NTU; HUST	DAN	98.5%; 73.4%
[143]	2018	Collected CSL; ChaLearn14	3D-CNN	88.7%; 95.3%
[145]	2018	Collected; MD05; CMU	JADM + CNN	88.59%; 87.92%; 87.27%
[146]	2018	RWTH 2012; RWTH 2014; SIGNUM	CNN-HMM hybrid	30.0 WER; 32.5 WER; 7.4 WER
[156]	2019	Collected; RWTH-2014	Hierarchical Attention Network (HAN) + Latent Space (LS-HAN)	82.7%; 61.6%
[161]	2019	RWTH-2014; SIGNUM	DCNN	22.86 WER; 2.80 WER
[164]	2019	CSL; IsoGD	Proposed multimodal two-stream CNN	96.7%; 63.78%
[165]	2019	DEVISIGN-D; Collected	Deep 3-d Residual ConvNet + BiLSTM	89.8%; 86.9%
[170]	2019	KSU-SSL; ArSL; RVL-SLLL	3D-CNN	77.32%; 34.90%; 70%
[173]	2019	Collected RGB-D; MSR; UT Kinect; G3D	Four-stream CNN	86.87%; 86.98%; 85.23%; 88.68%
[174]	2019	Jochen-Triesch; MKLM; Novel SI-PSL	Novel DNN	97.29%; 96.8%; 51.88%
[182]	2020	KSU-SSL; ArSL by University of Sharjah; RVL-SLLL	3DCNN	84.38%; 34.9%; 70%
[186]	2020	PGSL; ChicagoFSWild; RWTH 2014T	DCNN	95.31%; 92.63%; 76.30%
[187]	2020	ASL; MU	Deep Elman recurrent neural network	98.89%; 97.5%
[192]	2020	GSL; ChicagoFSWild	CNN	93.56%; 91.38%
[76]	2020	NYU; First-Person; RKS-PERSIANSIGN	CNN	4.64 error; 91.12%; 99.8%
[202]	2020	NUS; American fingerspelling A	DCNN	94.7%; 99.96%
[203]	2020	HDM05; CMU; NTU; Collected	Two-stream CNN	93.42%; 92.67%; 94.42%; 93.01%
[204]	2020	UTD–MHAD; IsoGD; Collected	Linear SVM classifier	94.81%; 67.36%; 64.33%
[207]	2021	Collected RGB images; Jochen-Triesch	DCNN	99.96%; 100%
[210]	2021	LSA64; LSA; Collected	3DCNN	98.5%; 99.2%; 94.5%
[211]	2021	ASLG-PC12; RWTH-2014	GRU and LSTM with Bahdanau and Luong attention mechanisms	66.59%; 19.56% BLEU
[221]	2021	ASL alphabet; ASL MNIST; MSL	Optimized CNN based on PSO	99.58%; 99.58%; 99.10%
[225]	2021	KSU-ArSL; Jester; NVIDIA	Inception-BiLSTM	84.2%; 95.8%; 86.6%
[226]	2021	Collected; NTU; MuHAVi; WEIZMANN; NUMA	Motion modelled deep attention network (M2DA-Net)	84.95%; 89.98%; 85.12%; 82.25%; 88.25%
[228]	2021	RWTH-2014; ASLLVD	Novel hyperparameter-based optimized Generative Adversarial Networks (H-GANs)	73.9%; 97%
[232]	2021	RWTH-2014; Collected	Bidirectional encoder representations from transformers (BERT) + ResNet	20.1 WER; 23.30 WER
[233]	2021	Montalbano II; isoGD; MSR; CAD-60	LSTM + CNN	99.08%; 86.10%; 98.40%; 95.50%
[234]	2021	RWTH-2014; CSL; GSL	GAN	23.4; 2.1; 2.26
[237]	2021	CSL-500; DEVISIGN-L	Dual network built upon a Graph Convolutional Network (GCN)	98.08%; 64.57%
[242]	2022	SLDD; MNIST	Modified CapsNet architecture (SLR-CapsNet)	99.52%; 99.60%
[243]	2022	RKS-PERSIANSIGN; First-Person; ASVID; isoGD	Single shot detector, 2D convolutional neural network, singular value decomposition (SVD), and LSTM	99.5%; 91%; 93%; 86.1%
[247]	2022	Collected; Collected; ASL finger spelling	DCNN + diffGrad optimizer	92.43%; 88.01%; 99.52%
[248]	2022	38 BdSL; Collected; Ishara-Lipi	BenSignNet	94.00%; 99.60%; 99.60%
[251]	2022	Collected; Collected; Collected	DCNN	99.41%; 99.48%; 99.38%
[254]	2022	Collected; Cambridge hand gesture	Hybrid model based on VGG16-BiLSTM	83.36%; 97%
[255]	2022	Collected; MNIST; JTD; NUS	Hybrid Fist CNN	97.89%; 95.68%; 94.90%; 95.87%
[256]	2022	ASL; GSL; AUTSL; IISL2020	LSTM + GRU	95.3%; 94%; 95.1%; 97.1%
[261]	2022	Collected; SHREC; LMDHG	DLSTM	97.98%; 96.99%; 97.99%
[262]	2022	AUTSL; Collected	3D-CNN	93.53%; 94.83%
[265]	2022	CSL-500; Jester; Ego Gesture	Deep R(2+1)D	97.45%; 97.05%; 94%
[266]	2022	MU; HUST-ASL	End-to-end fine-tuning of a pre-trained CNN model with score-level fusion	98.14%; 64.55%
[269]	2022	SHREC; Collected; LMDHG	FFV-Bi-LSTM	92.99%; 98.33%; 93.08%

Related works on SLR using DL that address the problem of varying environmental conditions

Reference	Year	Language	Modality	Type of condition	Technique	Result
[130]	2018	Bengali	RGB images	Variant background and skin colors	Modified VGG net	84.68%
[134]	2018	American	RGB images	noise and missing data	Augmentation	98.13%
[150]	2018	Indian	RGB video	Different viewing angles, background lighting, and distance	Novel CNN	92.88%
[158]	2019	American	Binary images	Noise	Erosion, closing, contour generation, and polygonal approximation	96.83%
[159]	2019	American	Depth image	Variant illumination, and background	Attain depth images	88.7%
[164]	2019	Chinese	RGB, and depth video	Variant illumination, and background	Two-stream spatiotemporal network	96.7%
[173]	2019	Indian	RGB, and depth video	Variant illumination, background, and camera distance	Four stream CNN	86.87%
[178]	2020	Arabic	RGB images	Variant illumination, and skin color	DCNN	94.31%
[179]	2020	Arabic	RGB videos	Variant illumination, background, pose, scale, shape, position, and clothes	Bi-directional Long Short-Term Memory (BiLSTM)	89.59%
[180]	2020	Arabic	RGB Videos	Variant illumination, clothes, position, scale, and speed	3DCNN and SoftMax function	87.69%
[182]	2020	Arabic	RGB Videos	Variations in heights and distances from camera	Normalization	84.3%
[194]	2020	Arabic	RGB images	variant illumination, and background	VGG16 and the ResNet152 with enhanced softmax layer	99%
[201]	2020	American	Grayscale images	illumination, and skin color	Set the hand histogram	95%
[202]	2020	American	RGB images	Variant illumination, background	DCNN	99.96%
[206]	2021	Indian	RGB video	Variant illuminations, camera positions, and orientations	GoogLeNet + BiLSTM	76.21%
[207]	2021	Indian	RGB images	Light and dark backgrounds	DCNN with few numbers of parameters	99.96%
[209]	2021	American	RGB video	Noise	Gaussian Blur	99.63%
[213]	2021	Korean	Depth Videos	Low resolution	Augmentation	91%
[224]	2021	Bengali	RGB images	Variant backgrounds, camera angle, light contrast, and skin tone	Conventional deep learning + Zero-shot learning ZSL	93.68%
[225]	2021	Arabic	RGB video	Variant illumination, background, and clothes	Inception-BiLSTM	84.2%
[227]	2021	American	Thermal images	Varying illumination	Adopt live images taken by a low-resolution thermal camera	99.52%
[229]	2021	Indian	RGB video	Varying illumination	3DCNN	88.24%
[230]	2021	American	RGB video	Noise, varying illumination	Median filtering + histogram equalization	96%
[236]	2021	Arabic	RGB images	Variant illumination, and background	Region-based Convolutional Neural Network (R-CNN)	93.4%
[239]	2022	Indian	RGB video	Variant illumination, and views	Grey scale conversion and histogram equalization	98.7%
[241]	2022	Arabic	RGB video	Variant illumination, and background	CNN+ RNN	98.8%
[249]	2022	Arabic	Greyscale images	Variant illumination, and background	Sobel filter	97%
[253]	2022	Arabic	RGB, and depth video	Variant Background	ResNet50-BiLSTM	99%
[259]	2022	American	RGB, and depth images	Noise and illumination variation	Median filtering and histogram equalization	91.4%
[261]	2022	American	Skeleton video	Noise in video frames	An innovative weighted least square (WLS) algorithm	97.98%
[270]	2022	English	Wi-Fi signal	Noise and uncleaned Wi-Fi signals	Principal Component Analysis (PCA)	95.03%
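
Several rows above rely on histogram equalization to reduce the effect of illumination changes. The sketch below shows the standard CDF-based mapping for a flat list of 8-bit grayscale values. It is an illustrative pure-Python version (the function name `equalize_histogram` and the sample values are hypothetical), not the pipeline of any cited work, which would typically operate on full images with an image-processing library.

```python
def equalize_histogram(pixels, levels=256):
    """Spread grayscale intensities over the full range using the CDF."""
    if not pixels:
        return []
    # Count how often each intensity level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution: number of pixels with intensity <= level.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = min(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Constant image: there is no contrast to stretch.
        return list(pixels)
    scale = levels - 1
    return [round((cdf[p] - cdf_min) * scale / (total - cdf_min))
            for p in pixels]


dark_patch = [52, 55, 61, 59]  # hypothetical low-contrast pixel values
print(equalize_histogram(dark_patch))  # prints [0, 85, 255, 170]
```

The narrow input range 52 to 61 is stretched to the full 0 to 255 range while the ordering of intensities is preserved, which is the property that makes the step useful before recognition under poor lighting.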

Related works on SLR using DL that address the feature extraction problem

Reference	Year	Dataset	Technique	Signing mode	Feature(s)	Result
[130]	2018	Collected	DCNN	Static	Hand shape	84.6%
[135]	2018	Collected	3D CNN	Isolated	Spatiotemporal	99%
[138]	2018	ASL Finger Spelling	CNN	Static	Depth and intensity	82%
[141]	2018	RWTH-2014	3D Residual Convolutional Network (3D-ResNet)	Continuous	Spatial information and temporal connections across frames	37.3 WER
[143]	2018	Collected	3D-CNNs	Isolated	Spatiotemporal	88.7%
[144]	2018	Collected	DCNN	Isolated	Hand palm sphere radius, and position of hand palm and fingertip	88.79%
[149]	2018	ASL Finger Spelling	Histograms of oriented gradients and Zernike moments	Static	Hand shape	94.37%
[150]	2018	Collected	CNN	Continuous	Hand shape	92.88%
[151]	2018	Collected	3DRCNN	Continuous; Isolated	Motion, depth, and temporal	69.2%
[152]	2018	SHREC	Leap Motion Controller (LMC) sensor	Isolated; Static	Finger bones of hands	96.4%
[153]	2018	Collected	Hybrid Discrete Wavelet Transform, Gabor filter, and histogram of distances from Centre of Mass	Static	Hand shape	76.25%
[154]	2018	Collected	DCNN	Static	Facial expressions	89%
[156]	2019	Collected	Two-stream 3-D CNN	Continuous	Spatiotemporal	82.7%
[158]	2019	Collected	CNN	Static	Hand shape	96.83%
[79]	2019	Collected	OpenPose library	Continuous	Human key points (hand, face, body)	55.2%
[159]	2019	ASL fingerspelling	PCANet	Static	Hand shape (corners, edges, blobs, or ridges)	88.7%
[161]	2019	SIGNUM	Stacked temporal fusion layers in DCNN	Continuous	Spatiotemporal	2.80 WER
[162]	2019	Collected	Leap Motion device	Continuous; Isolated	3D positions of the fingertips	72.3%; 89%
[163]	2019	Collected	CNN	Static	Hand shape	95%
[164]	2019	CSL	D-shift Net	Continuous	Spatial, time, and temporal features	96.7%
[165]	2019	DEVISIGN-D	B3D ResNet	Isolated	Spatiotemporal	89.8%
[166]	2019	Collected	Local and GIST descriptor	Isolated	Spatial and scene-based features	95.83%
[169]	2019	Collected	Restricted Boltzmann Machine (RBM)	Isolated	Handshape and network-generated features	88.2%
[170]	2019	KSU-SSL	3D-CNN	Isolated	Hand shape, position, orientation, and temporal dependence in consecutive frames	77.32%
[171]	2019	Collected	C3D and Kinect device	Continuous	Temporal and skeleton	94.7%
[175]	2019	Collected	OpenPose library with Kinect V2	Static	3D skeleton	98.9%
[177]	2020	Ishara-Lipi	MobileNet V1	Isolated	Shape of two hands	95.71%
[178]	2020	Collected	DCNN	Static	Hand shape	94.31%
[179]	2020	Collected	Single-layer Convolutional Self-Organizing Map (CSOM)	Isolated	Hand shape	89.59%
[180]	2020	KSU-SSL	Enhanced C3D architecture	Isolated	Spatiotemporal features of hand and body	87.69%
[182]	2020	KSU-SSL	3DCNN	Isolated	Spatiotemporal	84.3%
[185]	2020	Collected	ResNet50 model	Isolated	Hand shape, Extra Spatial Hand Relation (ESHR) features, Hand Pose (HP), and temporal features	98.42%
[186]	2020	Polytropon (PGSL)	ResNet-18	Isolated	Optical flow of skeletal, handshapes, and mouthing	95.31%
[187]	2020	Collected	Discrete cosine transform, Zernike moments, scale-invariant feature transform, and social ski-driver optimization algorithm	Static	Hand shape	98.89%
[189]	2020	RWTH-2014	Temporal convolution unit and dynamic hierarchical bidirectional GRU unit	Continuous	Spatiotemporal	10.73% BLEU
[191]	2020	Collected	Standard-score normalization of the raw Channel State Information (CSI) acquired from the Wi-Fi device, and MIFS algorithm	Static; Continuous	Cross-cumulant features (unbiased estimates of covariance, normalized skewness, normalized kurtosis)	99.9%
[192]	2020	GSL	OpenPose human joint detector	Isolated	3D hand skeleton, and hand and mouth regions	93.56%
[197]	2020	Collected	Four-channel surface electromyography (sEMG) signals	Isolated	Time-frequency joint features	93.33%
[199]	2020	Collected	Euler angles and quaternions from IMU signal	Continuous	Hand rotation	10.8% WER
[76]	2020	RKS-PERSIANSIGN	3DCNNs	Isolated	Spatiotemporal	99.8%
[202]	2020	ASL fingerspelling A	DCNN	Static	Hand shape	99.96%
[203]	2020	Collected	Color-coded topographical descriptor constructed from joint distances and angles, used in a two-stream CNN	Isolated	Distance and angular	93.01%
[204]	2020	Collected	Two CNN models and a descriptor based on histogram of cumulative magnitudes	Isolated	Two hands, skeleton, and body	64.33%
[208]	2021	RWTH-2014T	Semantic Focus of Interest Network with Face Highlight Module (SFoI-Net-FHM)	Isolated	Body and facial expression	10.89 BLEU
[210]	2021	Collected	ConvLSTM	Isolated	Spatiotemporal	94.5%
[212]	2021	Collected	ResNet50	Static	Hand area, length of the axis of the first eigenvector, and hand position changes	96.42%
[214]	2021	Collected	f-CNN (fusion of 1-D CNN and 2-D CNN)	Isolated	Time- and spatial-domain features of finger resistance movement	77.42%
[217]	2021	MU	Modified AlexNet and VGG16	Static	Hand edges and shape	99.82%
[222]	2021	Collected	VGG net of six convolutional layers	Static	Hand shape	97.62%
[224]	2021	38 BdSL	DenseNet201 and Linear Discriminant Analysis	Static	Hand shape	93.68%
[225]	2021	KSU-ArSL	Bi-LSTM	Isolated	Spatiotemporal	84.2%
[226]	2021	Collected	Paired pooling network in view pair pooling net (VPPN)	Isolated	Spatiotemporal	84.95%
[228]	2021	ASLLVD	Bayesian Parallel Hidden Markov Model (BPaHMM) + stacked denoising variational autoencoders (SD-VAE) + PCA	Continuous	Shape of hand, palm, and face, along with their position, speed, and distance between them	97%
[230]	2021	ASLLVD	Cascaded 3-D CNNs	Isolated	Spatiotemporal	96.0%
[231]	2021	Collected	Leap Motion Controller	Static; Isolated	Sphere radius, angles between fingers, and their distance	91.82%
[232]	2021	RWTH-2014	(3+2+1)D ResNet	Continuous	Height, motion of hand, and frame blurriness levels	23.30 WER
[233]	2021	Montalbano II	AlexNet + Optical Flow (OF) + Scene Flow (SF) methods	Isolated	Pixel level and hand pose	99.08%
[234]	2021	RWTH-2014	GAN	Continuous	Spatiotemporal	23.4 WER
[235]	2021	MNIST	DCNN	Static	Hand shape	98.58%
[236]	2021	Collected	R-CNN	Static	Hand shape	93%
[237]	2021	CSL-500	Multi-scale spatiotemporal attention network (MSSTA)	Isolated	Spatiotemporal	98.08%
[242]	2022	MNIST	Modified CapsNet	Static	Spatial and orientations	99.60%
[243]	2022	RKS-PERSIANSIGN	Singular value decomposition (SVD)	Isolated	3D hand key points between the segments of each finger, and their angles	99.5%
[244]	2022	Collected	2DCRNN + 3DCRNN	Continuous	Spatiotemporal features from small patches	99%
[246]	2022	Collected	Atrous convolution mechanism and semantic spatial multi-cue model	Static; Isolated	Pose, face, hand, spatial, and full-frame features	99.85%
[253]	2022	Collected	4 DNN models using 2D and 3D CNN	Isolated	Spatiotemporal	99%
[255]	2022	Collected	Scale-Invariant Feature Transform (SIFT)	Static	Corners, edges, rotation, blurring, and illumination	97.89%
[256]	2022	Collected	InceptionResNetV2	Isolated	Hand shape	97%
[257]	2022	Collected	AlexNet	Static	Hand shape	94.81%
[258]	2022	Collected	Sensor + mathematical equations + CNN	Continuous	Mean, magnitude of mean, variance, correlation, covariance, and frequency-domain features + spatiotemporal	0.088 WER
[260]	2022	Collected	MediaPipe framework	Isolated	Hands, body, and face	99%
[261]	2022	Collected	Bi-RNN network, maximal information correlation, and Leap Motion Controller	Isolated	Hand shape, orientation, position, and motion of 3D skeletal videos	97.98%
[264]	2022	LSA64	Dynamic motion network (DMN) + accumulative motion network (AMN)	Isolated	Spatiotemporal	91.8%
[265]	2022	CSL-500	Spatial–temporal–channel attention (STCA)	Isolated	Spatiotemporal	97.45%
[268]	2022	Collected	SURF (Speeded Up Robust Features)	Isolated	Distribution of the intensity material within the neighborhood of the interest point	99%
[269]	2022	Collected	Thresholding and Fast Fisher Vector (FFV) encoding	Isolated	Hand, palm, finger shape, and position and 3D skeletal hand characteristics	98.33%

Related works on SLR using DL that address the segmentation problem

Reference	Year	Input modality	Segmentation method	Result
[131]	2018	RGB image	HSV color model	99.85%
[148]	2018	RGB image	Skin segmentation algorithm based on color information	94.7%
[149]	2018	RGB images	k-means-based algorithm	94.37%
[158]	2019	RGB images	Color segmentation by MLP network	96.83%
[159]	2019	Depth image	Wrist line localization by algorithm-based thresholding	88.7%
[164]	2019	RGB, and depth video	Aligned Random Sampling in Segments (ARSS)	96.7%
[168]	2019	RGB, and depth images	Depth based segmentation using data of Kinect RGB-D camera	97.71%
[171]	2019	RGB video	Adaptive temporal encoder designed to capture crucial RGB visemes and skeleton signees	94.7%
[179]	2020	RGB videos	Semantic hand segmentation with DeepLabv3+	89.59%
[180]	2020	RGB videos	Novel method based on OpenPose	87.69%
[182]	2020	RGB Videos	Viola and Jones, and human body part ratios	84.3%
[183]	2020	RGB images	Robert edge detection method	99.3 %
[185]	2020	RGB video	Single Shot Detector (SSD), a feed-forward convolutional network, with a final Non-Maximum Suppression (NMS) step to produce the final detection	98.42%
[187]	2020	RGB images	Sobel edge detector, and skin color by thresholding	98.89%
[188]	2020	RGB images	Open-CV with a Region of Interest (ROI) box in the driver program	93%
[189]	2020	RGB Videos	Frame stream density compression (FSDC) algorithm	10.73 error
[199]	2020	RGB Videos	Attention-based encoder-decoder model for end-to-end continuous SLR without segmentation	10.8% WER
[200]	2020	RGB images	Single Shot Multi Box Detection (SSD)	99.90%
[209]	2021	RGB Video	Canny	99.63%
[216]	2021	RGB images	Erosion, Dilation, and Watershed Segmentation	99.7 %
[219]	2021	RGB Video	Data sliding window	86.67%
[236]	2021	RGB images	R-CNN	93%
[239]	2022	RGB videos	Novel Adaptive Hough Transform (AHT)	98.7%
[246]	2022	RGB images, and video	Grad Cam and Cam shift algorithm	99.85%
[248]	2022	Grey images	YCbCr, HSV and watershed algorithm	99.60%
[249]	2022	RGB images	Sobel operator method	97 %
[263]	2022	RGB images	Semantic	99.91%
[267]	2022	RGB images	R-CNN	99.7%
[268]	2022	RGB video	Mask created by extracting the maximum connected region in the foreground, assumed to be the hand, + Canny method	99%

Language: English

Publication frequency: 2 times per year
Journal subjects: Engineering, Introductions and Overviews, History of Engineering, Electrical Engineering, Fundamentals of Electrical Engineering, Electronics, Information Technology
