The repetitive sounds produced by the heart valves are known as Heart Sounds. They originate from the movement of the bicuspid/mitral, tricuspid, and aortic valves, which open and close to let blood flow through the heart, creating the heartbeat sound. The study of the phonocardiogram (PCG) signal is essential because it enables early detection and diagnosis of valvular heart disorders. In medical practice, people suffering from valvular heart disease more often than not face the risk of cardiac arrest or cardiac malfunction at later stages of life. Various methods have been developed, and several recent studies continue to investigate these PCG signals.
In healthy young adults, a PCG signal is characterized by the periodic occurrence of the S1 and S2 Heart Sounds at repeated intervals, as shown in Figure 1. In abnormal subjects, the normal Heart Sounds may be accompanied by S3 and S4 sounds. The systolic period extends from the onset of S1 to the onset of S2, and the diastolic period extends from the onset of S2 to the onset of the next S1; in abnormal Heart Sounds, the diastolic period may also contain S3 and S4. The diastolic period lasts around 240 ms, whereas the systolic period lasts about 360 ms. One complete human cardiac cycle is composed of a systolic period and a diastolic period.
A typical recorded PCG signal [1]. PCG, phonocardiogram.
Feature learning [1, 27] and categorization are central to Heart Sound signal analysis. Considerable progress has been made in Heart Sound segmentation, feature extraction, and cardiac disease identification and categorization. Nevertheless, an adaptive model for denoising, preprocessing, and filtering background noise would still significantly aid further processing in this domain, especially for Heart Sounds captured in real time in settings such as diagnostic centers and labs. Feature learning has been applied to captured Heart Sound signals using transformation methods, analyzing Heart Sounds by computing features such as the following:
- Acoustic features: MFCCs, Mel, Chroma, Contrast, and Tonnetz
- Time-domain features: RMS, Signal Energy, Signal Power, ZCR, and THD
- Frequency-domain features
- State amplitude-domain features
- Entropy-domain features
- Cyclostationary features
- Statistical features: Skewness, Kurtosis, Standard Deviation, Variance, and Mean
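A few of these time-domain and statistical features can be computed directly with NumPy. The snippet below is an illustrative sketch on a synthetic signal, not the paper's exact feature pipeline; the function name and test signal are assumptions for demonstration:

```python
import numpy as np

def time_and_stat_features(x):
    """Compute a handful of the listed time-domain and statistical features."""
    rms = np.sqrt(np.mean(x ** 2))                  # root mean square
    energy = np.sum(x ** 2)                         # signal energy
    zcr = np.mean(np.abs(np.diff(np.sign(x))) > 0)  # zero-crossing rate
    mean, std = np.mean(x), np.std(x)
    skew = np.mean(((x - mean) / std) ** 3)         # skewness
    kurt = np.mean(((x - mean) / std) ** 4) - 3.0   # excess kurtosis
    return {"rms": rms, "energy": energy, "zcr": zcr,
            "mean": mean, "std": std, "skewness": skew, "kurtosis": kurt}

# Example: a 1 s synthetic "heart sound" burst at 100 Hz, fs = 2 kHz
fs = 2000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t) * np.exp(-5 * t)
feats = time_and_stat_features(x)
```

Acoustic features such as MFCCs or Chroma are usually obtained with an audio library rather than written by hand.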
Figures 1(a) and 1(b) show the various steps involved in PCG signal analysis. The features of human Heart Sounds are extracted through computation, after which categorization is performed using various deep-learning methods.
Various phases engaged in PCG analysis [2]. PCG, phonocardiogram.
Graphical representation of various steps involved in PCG signal classification [2, 14]. PCG, phonocardiogram.
The study was designed around the analysis and diagnosis of valvular Heart Sounds. Different Heart Sound analysis methods exist, but screening time, portability, and cost are the important parameters when selecting the proper one.
The data collection was carried out on selected volunteers of different age groups and genders. Formal consent was obtained from the volunteers, followed by application of the real-time analysis method developed.
In the data analysis part, artificial intelligence-based intelligent systems have been used. Different deep-learning algorithms developed in Python were chosen and implemented for the analysis.
The motivation behind this study of PCG signal and its analysis is mainly divided into two parts:
- Challenges in PCG signal analysis.
- Commercialization of PCG signal analysis for detecting and screening valvular diseases.
The significant challenges in PCG signal analysis are:
Filter design for noise handling while recording real-time Heart Sound: This challenge concerns the design of a proper signal conditioning unit for the captured Heart Sound. Ambient noise and the noise from movement of the microphone chest piece cable are the two important noises that need to be dealt with. Over the years, researchers have designed various pre-amplifier circuits, filter circuits, etc., but no satisfactory solution has yet been found for the chest piece cable movement noise.
Separating lung sound from Heart Sound: Another major challenge is the separation of Heart Sound from lung sound. When collecting Heart Sound in a real-time environment, lung sound gets superimposed on it, which can introduce classification errors when analyzing the Heart Sound alone. As Heart Sounds and lung sounds lie in different frequency ranges, separation can be achieved through proper filter selection, and different filters have been designed that separate the lung sound from the Heart Sound quite easily.
Development of an efficient artificial intelligence-based PCG signal analysis method: An efficient classification method is equally important for Heart Sound analysis. Several deep-learning and machine-learning methods developed over the years are discussed in this paper. Heart Sound classification relies on an efficient deep-learning method, which should perform well on evaluation parameters such as accuracy, precision, recall, and F1 score.
Design of a low-cost hardware circuit for collecting low-amplitude real-time Heart Sound: The design of a low-cost hardware circuit for picking up low-amplitude Heart Sound is another challenging task in PCG signal analysis. Low-amplitude Heart Sounds are captured through the chest piece of a stethoscope fitted with a microphone, followed by a pre-amplifier circuit. A proper pre-amplifier circuit plays a key role in collecting low-amplitude real-time Heart Sound.
Earlier, the financial cost of Heart Sound signal analysis for patients suffering from various heart diseases was high. Nowadays, various low-cost PCG signal analysis methods are being developed and entering the market. Apart from their lower cost, the accuracy and feasibility of these devices are equally important. Thus, low-cost PCG signal analysis is the need of the hour.
This study highlights a few significant recent works by different researchers. The research focuses on identifying Heart Sounds and analyzing them; in this work, categorization of the PCG signal is essential.
As the literature shows, many researchers are working in this field and have described different algorithms and categorization methods for cardiac sounds. Randhawa and Singh [1] surveyed methods for automatically analyzing and classifying cardiac sounds. Gupta et al. [2] worked on identifying the S1 and S2 Heart Sounds using a stacked autoencoder with combinatory features for categorizing fundamental Heart Sounds; a CNN approach was also used. This method has restrictions in analyzing and detecting the S3 Heart Sound. Lecun et al. [3] proposed an innovative algorithm to detect heart and lung sounds (HLS) based on the variational mode decomposition (VMD) of the PCG signal. It is helpful for detecting the third Heart Sound but has limitations in real-time applications. Roy and Roy [4] proposed a new method of extracting PCG signals from noise through sparse recovery analysis. Barma et al. [5] focused on detection of the second Heart Sound by computing the time duration and the energy of the normalized instantaneous frequencies of the A2 (aortic valve) and P2 (pulmonary valve) components, but the method could not categorize cardiac sounds.
Tavel [6] and Muduli et al. [7] worked on Heart Sound analysis using the discrete wavelet transform (DWT) algorithm, which has specific restrictions in real-time analysis. Roy et al. [8] analyzed PCG signals using the Shannon energy envelope and the DWT method, but the approach cannot classify Heart Sounds efficiently. Mishra et al. [9] performed heart defect analysis and classification using pattern recognition techniques, but the method is based on echo imaging. Mishra et al. [10] analyzed PCG signals through classification by feature extraction, but not in a real-time analytical process. El-Segaier et al. [11] analyzed and categorized Heart Sounds using a fuzzy classifier-based soft computing method; the laboratory study was performed on a readily available online Heart Sound repository, so it was an offline study not validated with human volunteers. Nygaard et al. [12] studied the classification of PCG signals. Anju and Sanjay Kumar [13] performed segmentation and categorization of PCG signals for Heart Sound analysis. There is no significant work on developing real-time screening of valvular diseases. Buchanna et al. [152] studied the classification of EEG signals using fractal analysis and support vector regression. Ye et al. [151] analyzed multimodal medical images using semantic segmentation maps generated through deep learning. Many researchers have studied PCG signal analysis; some of the corresponding algorithms are given in Table 1.
A study of PCG signals and the research gaps reported in the literature
Ref. | Authors | Year | Research gap | Method used
---|---|---|---|---
[15] | Dewangan et al. | 2018 | Basic features used in the time domain only | Heart Sound analysis using the DWT method
[16] | Thomas Schanze | 2017 | Biomedical heart signal analysis without the latest machine-learning and deep-learning methods | Singular value decomposition
[17] | Othman and Khaleel | 2017 | Only time-domain analysis has been made | PCG signal analysis using the Shannon energy envelope and DWT method
[18] | Martinek et al. | 2017 | PCG signal analysis applicable to the fetal heart only, not real-time human subjects | Adaptive filtering-based fetal heart rate monitoring
[19] | Abhijay Rao | 2017 | It is a survey paper only, not a research paper | Biomedical signal processing
[20] | Sh-Hussain et al. | 2016 | Frequency-domain and statistical-domain features need to be analyzed | Heart Sound monitoring system using wavelet transformation
[21] | Prasad and Kumar | 2015 | Real-time PCG signal analysis was not applied | Analysis of various DWT methods for feature-extracted PCG signals
[22] | Pan et al. | 2015 | Real-time analysis not done | Categorization of PCG signals using multimodal features
[23] | Lubaib and Muneer | 2015 | More features need to be considered for this analysis | Using pattern recognition techniques
[24] | Roy et al. | 2014 | No proper experimentation has been done | A survey on classification of PCG signals
[25] | Mishra et al. | 2013 | It deals with PCG signal noise removal only, not classification | Denoising of Heart Sound signal using DWT
[26] | Zhao et al. | 2013 | PCG signal/Heart Sound biometric | Marginal spectrum analysis
[27] | Singh and Cheema | 2013 | Limited application of deep-learning algorithms | Classification using feature extraction
[28] | Safara et al. | 2013 | Only time-domain analysis has been made | Multi-level basis selection of wavelet
[29] | Salleh et al. | 2012 | PCG signal analysis using a Kalman filter, not any standard deep-learning method | Heart Sound analysis: a Kalman filter-based approach
[30] | Misal and Sinha | 2012 | It deals with PCG signal noise removal only, not classification | Denoising of PCG signal using DWT
[31] | Kasturiwale | 2012 | Biomedical signal analysis with limited features | Analysis using component extraction
[32] | McNames and Aboy | 2008 | It deals with the modeling of PCG signals, not classification and analysis | Statistical modeling of PCG signals
[33] | Ahmad et al. | 2009 | More feature extraction needs to be done for the PCG signal analysis | Classification of PCG signal using an adaptive fuzzy inference system
[34] | Debbal and Bereksi-Reguig | 2006 | Analysis in the time domain only | PCG signal analysis using the CWT
[35] | Gupta et al. | 2005 | Real-time PCG signal analysis was not done | Segmentation and categorization of Heart Sound for analysis
[36] | Muthuswamy | 2004 | It is a survey paper only, not a research paper | Biomedical signal analysis
DWT, discrete wavelet transform; PCG, phonocardiogram.
This review is structured as follows. Section 2 provides a background of different types of Heart Sound and their characteristics. Section 3 provides details of real-time PCG signal analysis. Section 4 explains the other methods used in Heart Sound analysis. Section 5 describes the deep-learning and machine-learning methods adopted in Heart Sound analysis. Finally, Section 6 summarizes the conclusions and future scope.
Table 2 shows the frequency ranges of the different types of PCG signals used in Heart Sound analysis [37]. Typically, Heart Sound lies in the frequency range of 45–450 Hz, including stenosis and regurgitation conditions of the heart. The minimum and maximum frequency of each PCG signal type is also shown in Table 2.
Heart Sound frequencies found in various abnormal Heart Sounds.
Type of Heart Sound | Minimum frequency (Hz) | Maximum frequency (Hz) |
---|---|---|
First Heart Sound | 100 | 200 |
Second Heart Sound | 50 | 250 |
AR | 60 | 380 |
PR | 90 | 150 |
AS | 100 | 450 |
PS | 150 | 400 |
ASD | 60 | 200 |
VSD | 50 | 180 |
MR | 60 | 400 |
TR | 90 | 400 |
MS | 45 | 90 |
TS | 90 | 400 |
MVP | 45 | 90 |
PDA | 90 | 140 |
Flow murmur | 85 | 300 |
AR, aortic regurgitation; AS, aortic stenosis; ASD, atrial septal defect; MR, mitral regurgitation; MS, mitral stenosis; MVP, mitral valve prolapse; PDA, patent ductus arteriosus; PR, pulmonary regurgitation; PS, pulmonary stenosis; TR, tricuspid regurgitation; VSD, ventricular septal defect.
Figure 2 highlights the characteristics and features of different PCG signals. Generally, the S1 and S2 sounds can be heard in normal Heart Sounds, with short duration and high pitch. In abnormal Heart Sounds, the S3 and S4 sounds can be heard, with short duration and low pitch. Common disorders associated with abnormal Heart Sounds are mitral stenosis (MS), aortic stenosis (AS), ventricular septal defect (VSD), etc. Apart from these, other disorders may present as murmur Heart Sounds; aortic regurgitation (AR), mitral regurgitation (MR), mitral valve prolapse (MVP), and patent ductus arteriosus (PDA) are a few examples.
Characteristics of various kinds of Heart Sounds. AR, aortic regurgitation; AS, aortic stenosis; MR, mitral regurgitation; MS, mitral stenosis; MVP, mitral valve prolapse; PDA, patent ductus arteriosus; PR, pulmonary regurgitation; PS, pulmonary stenosis; TR, tricuspid regurgitation; TS, tricuspid stenosis; VSD, ventricular septal defect.
Sitting, standing, and supine are the postures used for capturing Heart Sounds. Typically, stethoscope sensors are placed at different locations on the body, such as the Cardiac Apex, Upper Left Sternal Border (ULSB), Upper Right Sternal Border (URSB), and Lower Left Sternal Border (LLSB).
Table 2 provides the characteristics of frequencies in different types of abnormal Heart Sounds. The minimum and maximum frequency range of every kind of Heart Sound is mentioned in Table 2. Generally, Heart Sound frequency ranges from 45 Hz to 450 Hz. Thus, valuable information lies only in this range, and the rest needs to be filtered out using various processing methods.
Different types of valvular Heart Sound disorders are explained below:
- AS – Aortic valve cannot open properly.
- AR – Aortic valve cannot close completely.
- PR – Pulmonary valve cannot close completely.
- PS – Pulmonary valve cannot open completely.
- ASD – A congenital hole between the heart's upper chambers.
- VSD – A congenital hole between the heart's lower chambers.
- MR – Mitral valve cannot close completely.
- TR – Tricuspid valve cannot close completely.
- MS – Mitral valve cannot open properly.
- TS – Tricuspid valve cannot open completely.
- MVP – Mitral valve leaflets bulge (prolapse) into the left atrium.
- PDA – An opening between the two major blood vessels leading from the heart.
- Flow murmur – An unusual sound heard when blood flows rapidly through the heart chambers.
- Normal Heart Sound – Composed of the S1 (first Heart Sound) and S2 (second Heart Sound) sounds only.
Figure 3 (a) depicts the block diagram of sensors used in Heart Sound signal analysis. It also gives an idea of how filters play a pivotal role in PCG signal analysis. Generally, low-pass, high-pass, and bandpass filters are used in this process. Newer techniques such as adaptive noise cancellers (ANCs) and adaptive line enhancers (ALEs) are used in the modern era to meet these requirements with better efficiency and stability.
Schematic diagram of sensors, pre-amplifier, and filters used in PCG signal analysis. PCG, phonocardiogram.
The PCG signal in the chest generates pressure waves that are picked up by the stethoscope diaphragm. The electret microphone fitted inside the chest piece and the stethoscope tube convert the Heart Sound signal into an electrical signal. This electrical signal is boosted by an audio amplifier based on a MAX9812 IC with a gain of 20 and fed to a notch filter to remove electrical interference. The processed signal is then fed to an analog tunable bandpass filter with a 35–470 Hz passband; the Heart Sound signal typically lies within 35–470 Hz for both normal and abnormal sounds. Finally, the signal is digitized by a USB sound card connected to a Raspberry Pi 4B computer. The Heart Sound samples picked up in real time by the stethoscope sensor are saved in WAV format for further processing on the Raspberry Pi 4B with the developed deep-learning models, as illustrated in Figure 3 (b).
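The WAV storage step can be sketched with Python's standard-library `wave` module; the sampling rate, file name, and test signal below are illustrative assumptions, not the exact acquisition settings:

```python
import wave
import numpy as np

def save_wav(path, samples, fs=8000):
    """Save a float array in [-1, 1] as a 16-bit mono WAV file."""
    pcm = (np.clip(samples, -1, 1) * 32767).astype(np.int16)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(fs)
        w.writeframes(pcm.tobytes())

def load_wav(path):
    """Load a mono 16-bit WAV file back into a float array in [-1, 1]."""
    with wave.open(path, "rb") as w:
        fs = w.getframerate()
        pcm = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    return pcm.astype(np.float32) / 32767.0, fs

fs = 8000
t = np.arange(fs) / fs
heart = 0.5 * np.sin(2 * np.pi * 60 * t)   # stand-in for a captured Heart Sound
save_wav("pcg_sample.wav", heart, fs)
x, fs2 = load_wav("pcg_sample.wav")
```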
Schematic diagram of PCG signal acquisition system. PCG, phonocardiogram.
Figures 4 and 5 illustrate the denoising methods used in the Heart Sound analysis and diagnosis. Table 3 highlights the various sensors used in the denoising algorithms adopted for the Heart Sound classification.
Block diagram of ANC. ANC, adaptive noise canceller.
Block diagram of ALE. ALE, adaptive line enhancer.
Different sensors used in real-time Heart Sound analysis methods.
Domain | Sensor | No. of channels | Denoising algorithm
---|---|---|---
Time | Electret condenser microphone | 1 | LMS-ANC
Time | Electronic stethoscope | 1 | Single-input ANC
Time | Microphone | 1 | LMS-ALE, RLS-ALE
Frequency | Electronic stethoscope with an electret microphone | 1 | DWT, Hilbert transform
Frequency | Microphone | 1 | DWT, LMS-ALE, RLS-ALE
Frequency | NM | NM | EMD
ALE, adaptive line enhancer; ANC, adaptive noise canceller; DWT, discrete wavelet transform; EMD, empirical mode decomposition; LMS, least mean square; NM, not mentioned; RLS, recursive least square.
The ANC takes two input signals from two separate sources: the first is the noisy signal, and the other contains the ambient noise only. The filtered reference noise is subtracted from the noisy signal at the subtractor, producing the desired noise-free signal.
In ALEs, digital adaptive filters change their characteristics depending on changes in the input signal. Adaptive filters generally operate using the least mean square (LMS) algorithm and can be considered one of the better tools for Heart Sound denoising. Figure 5 shows an ALE in which a delay block is appended to the adaptive filter to obtain a noise-free Heart Sound signal.
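A minimal LMS-based noise canceller along these lines can be sketched as follows, assuming a reference input that captures the ambient noise; the tap count, step size, and synthetic signals are illustrative choices, not values from the cited designs:

```python
import numpy as np

def lms_anc(primary, reference, n_taps=16, mu=0.01):
    """LMS adaptive noise canceller: filter the reference (correlated with
    the noise) and subtract it from the primary input; the error signal is
    the cleaned output."""
    w = np.zeros(n_taps)
    out = np.zeros_like(primary)
    for i in range(n_taps, len(primary)):
        x = reference[i - n_taps + 1:i + 1][::-1]  # most recent samples first
        y = w @ x                                  # noise estimate
        e = primary[i] - y                         # error = cleaned sample
        w += 2 * mu * e * x                        # LMS weight update
        out[i] = e
    return out

rng = np.random.default_rng(0)
fs = 2000
t = np.arange(2 * fs) / fs
s = np.sin(2 * np.pi * 50 * t)        # stand-in Heart Sound
noise = rng.normal(0, 1, len(t))      # ambient noise (reference input)
primary = s + 0.8 * noise             # noisy recording
cleaned = lms_anc(primary, noise)
```

An ALE differs only in that the reference is a delayed copy of the primary input itself.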
Generally, Heart Sound banks are used in real-time valvular Heart Sound analysis. The following Heart Sound bank repositories are available:
- 2016 PhysioNet/Computing in Cardiology (CinC) Challenge – normal and abnormal Heart Sounds [76]
- PASCAL Classifying Heart Sounds Challenge – three classes: normal (N), murmur (M), and extrasystole (EXT) [75]
- Yaseen et al. – five classes of Heart Sounds (normal, AS, MR, MS, and MVP), available in public databases
- Kaggle dataset for normal and abnormal Heart Sounds [77]
Cardiac sounds are captured with an electronic stethoscope at a sampling frequency of about 4.7 kHz. The collected Heart Sounds are saved as WAV files. The auscultation recordings [129, 130] are made in postures such as sitting, standing, and supine; all other essential pieces of information are stored for further processing.
General preprocessing methods used for Heart Sound analysis are:
- Normalization
- Filtering
- Feature extraction
The cardiac sounds are preprocessed first to generate the required Heart Sound data, as shown in Figure 6. Under the preprocessing stage, Heart Sound normalization occurs, followed by filtering and feature extraction. A typical normalization process is shown in Figure 7.
Block diagram of Heart Sound preprocessing.
Normalization process.
Normalization is expressed as shown in Eq. (1):
In the filtering operation, second-order high-pass and low-pass Butterworth filters with 15 Hz and 700 Hz cut-off frequencies, respectively, are used to filter the normalized PCG signal and remove unwanted background noise. Feature learning is then applied to the filtered signal as described in the literature [41, 45, 88]. The four commonly used envelope-energy-based algorithms employed to produce features are: (1) Hilbert envelope, (2) homomorphic envelope map, (3) wavelet envelope, and (4) power spectral density envelope.
Heart Sound segmentation is the splitting of the Heart Sound into four sections: the first Heart Sound (S1), systole, the second Heart Sound (S2), and diastole. Each section [42, 49, 82] contains essential features that discriminate the various types of cardiac sounds. The common segmentation methods used by researchers in recent years are mainly:
- Envelope-based methods:
  - Shannon energy-based method
  - Hilbert envelope method
  - Squared envelope method
- Hidden Markov model methods
- Feature-based methods
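The squared-envelope method, for example, can be sketched as a moving average of the squared signal followed by peak picking; the window length, peak spacing, and threshold below are illustrative choices, not values from the cited works:

```python
import numpy as np
from scipy.signal import find_peaks

def squared_envelope_peaks(pcg, fs, win_ms=20):
    """Squared-envelope segmentation sketch: smooth the squared signal
    with a moving average and locate candidate S1/S2 peaks."""
    env = pcg ** 2
    win = int(fs * win_ms / 1000)
    env = np.convolve(env, np.ones(win) / win, mode="same")
    # Peaks at least 200 ms apart and above 30 % of the envelope maximum
    peaks, _ = find_peaks(env, distance=int(0.2 * fs), height=0.3 * env.max())
    return peaks, env

# Synthetic PCG: five S1/S2-like 80 ms bursts over two cardiac cycles
fs = 2000
t = np.arange(2 * fs) / fs
pcg = np.zeros_like(t)
for onset in [0.1, 0.45, 0.9, 1.25, 1.7]:
    mask = (t >= onset) & (t < onset + 0.08)
    pcg[mask] = np.sin(2 * np.pi * 80 * (t[mask] - onset))
peaks, env = squared_envelope_peaks(pcg, fs)
```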
The segmented Heart Sound [89, 95, 97] goes to the feature extraction block, where new features are extracted. This reduces the Heart Sound data size, and classification is then performed on these extracted features using CNN-based deep-learning methods [90, 96]. Figure 8 describes the reduction in data size of the PCG signal achieved by the feature learning method.
Different segments in Heart Sound.
The method based on the Shannon energy of the Heart Sound [125, 126] is considered for extracting the Heart Sound envelope, as shown in Figure 9. Shannon energy emphasizes the medium-intensity components of the cardiac sound and suppresses the effect of noise. It is among the best ways to extract the envelope of a cardiac sound signal compared with other methods, and the following terminologies are used to compute the different parameters:
- Shannon Energy: It is the average of the spectrum energy in the Heart Sound signal.
- Shannon Entropy: It is the amount of information in the Heart Sound signal.
- Absolute Value: The positive values of the Heart Sound signal.
- Energy (square) value: It is related to the square of the amplitude of the PCG signal.
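The Shannon energy of a normalized Heart Sound can be sketched as follows; the averaging window length is an illustrative choice:

```python
import numpy as np

def shannon_energy(x, eps=1e-12):
    """Per-sample Shannon energy -x^2 * log(x^2) of a peak-normalized
    signal; it emphasizes medium-intensity components over low- and
    high-intensity extremes."""
    x = x / np.max(np.abs(x))
    return -(x ** 2) * np.log(x ** 2 + eps)

def average_shannon_energy(x, fs, win_ms=20):
    """Average Shannon energy over short windows: the envelope commonly
    used for S1/S2 detection."""
    se = shannon_energy(x)
    win = int(fs * win_ms / 1000)
    return np.convolve(se, np.ones(win) / win, mode="same")

# Medium-amplitude samples yield the largest Shannon energy:
vals = shannon_energy(np.array([0.1, 0.5, 1.0]))
```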
Reduction in data size due to feature extraction.
To differentiate various kinds of cardiac sounds [127, 128] as normal or abnormal, and then to identify an abnormal sound as an MR or MS case, the following stages are considered in this method:
- Averaging the energy envelope
- Finding the averaged envelope peaks
- Fixing the threshold (S1 and S2 duration computation)
- Murmur detection and analysis
Figure 10 (a) illustrates the PCG of a typical cardiac sound, and Figure 10 (b) denotes the Shannon energy envelope of a typical cardiac sound. Figure 11 depicts the flowchart of the said algorithm used.
PCG of a normal cardiac Heart Sound. PCG, phonocardiogram.
Shannon energy envelogram of the PCG of a normal Heart Sound. PCG, phonocardiogram.
Flowchart of the method used.
The Shannon energy method can be a handy tool for identifying different types of Heart Sounds. Figure 10 describes the feature extraction technique using the Shannon energy envelope method. In the flowchart, the preprocessed Heart Sound is checked against a threshold amplitude value (set by the user) and a cycle duration of less than 0.9 s. If it satisfies these conditions, it is a normal sound; otherwise, the algorithm proceeds to locate the S1 and S2 peaks and extract the murmur. The murmur sound undergoes DWT, and the 5th-level DWT output is examined in the frequency domain using the fast Fourier transform. If the center frequency lies at 200–250 Hz, the sound is classified as MR; otherwise, it is MS.
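The final decision step (center frequency in 200–250 Hz implies MR, otherwise MS) can be sketched as below. For brevity, the spectral peak is taken directly from the FFT of the murmur segment rather than from the 5th-level DWT output, and the test signals are synthetic stand-ins:

```python
import numpy as np

def dominant_frequency(murmur, fs):
    """Frequency of the magnitude-spectrum peak of the murmur segment."""
    spec = np.abs(np.fft.rfft(murmur))
    freqs = np.fft.rfftfreq(len(murmur), 1 / fs)
    return freqs[np.argmax(spec)]

def classify_murmur(murmur, fs):
    """Flowchart rule: center frequency in 200-250 Hz -> MR, else MS."""
    f = dominant_frequency(murmur, fs)
    return "MR" if 200 <= f <= 250 else "MS"

fs = 2000
t = np.arange(fs) / fs
mr_like = np.sin(2 * np.pi * 220 * t)   # energy centered near 220 Hz
ms_like = np.sin(2 * np.pi * 70 * t)    # energy centered near 70 Hz
```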
In this method [15, 25, 40], Heart Sound signals are decomposed to a desired number of levels (n), and the variance of the detail coefficients is computed at the required levels. The computation is made separately for normal, AS, AR, MS, and MR types of PCG signals. The schematic diagram of this PCG signal analysis is shown in Figure 12.
Schematic diagram of Heart Sound analysis using DWT. ANFIS, adaptive neuro fuzzy inference system; DWT, discrete wavelet transform; DT-CWT, dual tree complex wavelet transform.
Different abnormal Heart Sounds are considered and analyzed; all Heart Sounds are taken from a standard cardiac repository available online. Figure 13 illustrates the time-domain representation of the normal PCG signal and of AS, AR, MS, and MR.
PCG analysis of different types of Heart Sounds [41, 39, 49]. PCG, phonocardiogram.
Figure 14 (a) illustrates a generalized block diagram of the DWT. In this transform, the input signal initially passes through two filters: high-pass and low-pass. The high-pass filter output is known as the detail coefficient, and the low-pass filter output as the approximation coefficient. The low-pass output is passed through two filters again, generating another set of detail and approximation coefficients, and this process continues up to the required level n.
Different coefficients in DWT [121, 122]. DWT, discrete wavelet transform.
Figure 14 (b) describes the flowchart of DWT where a signal S is at first divided into CA1 (approximation coefficient at level 1) and CD1 (detail coefficient at level 1). CA1 is further divided into CA2 and CD2, and this process continues until the desired level is reached. In this way, different PCG signals can be classified using DWT, as illustrated in Figure 15.
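This recursive CA/CD splitting can be sketched with the Haar wavelet (chosen here for simplicity; the cited works may use other mother wavelets), including the per-level detail-coefficient variances used as the feature set:

```python
import numpy as np

def haar_dwt(signal, levels):
    """Multilevel DWT sketch with the Haar wavelet: at each level the
    current approximation CAk is split into CA(k+1) and CD(k+1)."""
    details = []
    ca = np.asarray(signal, dtype=float)
    for _ in range(levels):
        ca, cd = ((ca[0::2] + ca[1::2]) / np.sqrt(2),
                  (ca[0::2] - ca[1::2]) / np.sqrt(2))
        details.append(cd)
    return ca, details          # final approximation + [CD1, CD2, ...]

# Variance of the detail coefficients at each level forms a feature set
x = np.sin(2 * np.pi * 100 * np.arange(1024) / 1024)
ca, details = haar_dwt(x, 3)
variances = [float(np.var(cd)) for cd in details]
```

Because the Haar transform is orthonormal, the total signal energy is preserved across the coefficients, which makes the per-level variances directly comparable.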
Flow chart of DWT [121, 122]. DWT, discrete wavelet transform.
Time-domain analysis of different Heart Sounds using wavelet transform [121, 122]. AR, aortic regurgitation; MS, mitral stenosis.
Various types of PCG signals are analyzed using the DWT and plotted in Figure 15. The Heart Sounds are plotted as amplitude versus time for the Heart Sound samples. Systolic and diastolic murmurs can easily be observed from these characteristic plots.
The DWT [8, 24, 40] is a transformation method for analyzing the temporal and spectral features of non-stationary signals such as sound. PCG signals are convenient for identifying the different types of valvular heart disease; ECG readings are not sufficient to obtain information related to the heart valves. This paper uses the DWT to categorize normal and various abnormal Heart Sounds without resorting to a laborious segmentation step, and the method is easily deployed with a stethoscope, without the help of ECG readings. Heart Sounds are acquired through data acquisition and decomposed into n levels using the DWT; the variance of the detail coefficients at the required number of levels then forms a feature set for classifying normal Heart Sound, abnormal Heart Sound, and cardiac murmurs. Using the DWT, any Heart Sound is represented in terms of detail coefficients and approximation coefficients. Figures 16 (a) and (b) show the description of various Heart Sounds [123, 124] using time-domain and frequency-domain analysis based on the DWT.
Time (a) and frequency (b)-domain analysis of different PCG signals using DWT. AS, aortic stenosis; DWT, discrete wavelet transform; MR, mitral regurgitation; MS, mitral stenosis; MVP, mitral valve prolapse; PCG, phonocardiogram.
In the Harmonic-Distribution [1, 43, 44], harmonics whose amplitude and phase values are negligible may be ignored. The Harmonic-Distribution is produced by first transforming the time-domain Heart Sound into the frequency domain and then taking out the real and imaginary components separately.
Stages for computing the Harmonic-Distribution of a recorded cardiac sound signal:
1. Load the cardiac sound using Python in the time domain.
2. Compute the Fourier transform of the cardiac sound [117, 118] for frequency-domain spectrum analysis.
3. Compute the real and imaginary parts of the frequency-domain cardiac sound signal.
4. Calculate the amplitude and phase of every harmonic frequency component.
5. Plot the Harmonic vs. Amplitude and Harmonic vs. Phase values of the cardiac sound signal.
For illustration, a harmonic order of up to 40 is selected. The following distributions provide the Harmonic-Distribution of amplitude for normal and abnormal Heart Sounds as well as for cardiac murmurs [119, 120].
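The stages above can be sketched as follows for a synthetic tone; the fundamental frequency `f0` and the test signal are illustrative assumptions, and the plotting step is omitted:

```python
import numpy as np

def harmonic_distribution(x, fs, f0, n_harmonics=40):
    """Amplitude and phase of the first n harmonics of fundamental f0,
    following the listed steps (FFT, then real/imaginary parts)."""
    spec = np.fft.rfft(x) / len(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    amps, phases = [], []
    for k in range(1, n_harmonics + 1):
        bin_k = np.argmin(np.abs(freqs - k * f0))    # nearest FFT bin
        re, im = spec[bin_k].real, spec[bin_k].imag  # real/imaginary parts
        amps.append(2 * np.hypot(re, im))            # one-sided amplitude
        phases.append(np.arctan2(im, re))            # phase
    return np.array(amps), np.array(phases)

fs = 2000
t = np.arange(fs) / fs
# Fundamental at 50 Hz plus a weaker 3rd harmonic at 150 Hz
x = 1.0 * np.sin(2 * np.pi * 50 * t) + 0.3 * np.sin(2 * np.pi * 150 * t)
amps, phases = harmonic_distribution(x, fs, f0=50)
```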
Figures 17 (a–c) highlight various Harmonic-Distributions of normal, abnormal, and murmur Heart Sound.
(a). Amplitude distribution of Normal Heart. (b). Amplitude distribution of Abnormal Heart.
Amplitude distribution of Murmurs.
Figure 18 illustrates the classification of various types of cardiac sounds. Murmurs have a smaller amplitude distribution, while abnormal sounds have a higher amplitude distribution. This method can identify and classify multiple types of Heart Sounds accurately.
Comparison of different Heart Sounds.
Figure 19 highlights the earlier studies of deep-learning-based PCG signal analysis and classification methods published in the last 5–7 years.
Earlier studies on deep-learning-based methods for cardiac sound classification.
The convolution operation is the accumulation of scalar multiplications between the elements of two matrices. A typical convolution operation is highlighted in Figure 20, with a square input matrix and a kernel of size 2. The convolution occurs between a part of the input matrix and the kernel, as shown in the figure. Each time the kernel slides over the input matrix, it produces one output value, and this process continues until the entire matrix is covered.
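The sliding-kernel operation described here (strictly, cross-correlation, as implemented in CNN layers) can be sketched as:

```python
import numpy as np

def convolve2d_valid(inp, kernel):
    """Slide the kernel over the input matrix; each position yields one
    output value (the sum of elementwise products), as in Figure 20.
    No padding, stride 1 ("valid" mode)."""
    kh, kw = kernel.shape
    oh, ow = inp.shape[0] - kh + 1, inp.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(inp[i:i + kh, j:j + kw] * kernel)
    return out

# A 4x4 input with a 2x2 kernel produces a 3x3 output
inp = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, 1.0]])
out = convolve2d_valid(inp, kernel)
```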
Figure 21 highlights a block diagram of the machine-learning and deep-learning methods adapted to categorize a set of entities; it basically depicts a comparison between the two approaches. In machine learning, preprocessing, feature extraction, and feature selection [115, 116] precede classification. In the deep-learning model, no such separate blocks exist; there is only the deep model followed by the classification part. Deep-learning models can work with large amounts of data because they can be built with many layers and go very deep, whereas machine-learning models cannot handle such large data volumes and cannot go as deep. Figure 22 shows the architecture of a generalized deep-learning-based [113, 114] convolutional neural network model.
Comparison of machine-learning and deep-learning architectures.
Architecture of CNN model [137] for Heart Sound classification.
Various methods have been used over the years for PCG signal analysis [38, 53] (Figure 23). However, because of their accuracy, precision, recall, and F1 score, deep-learning-based CNN methods [84, 98, 99] have found the greatest popularity in comparison with the other methods.
PCG signal classification based on deep-learning models [138, 139]. PCG, phonocardiogram.
Table 4 summarizes recently developed CNN [111, 112] and RNN models for valvular Heart Sound analysis and diagnosis. It covers the methods, optimizers, features used, types of Heart Sound, and algorithms involved in the deep-learning models [100, 110]. Figure 24 depicts a clinical PCG signal analysis method used by a medical practitioner.
A summary of CNN and RNN based methods used in PCG signal analysis [108, 109].
Sl. no. | Authors and year [Ref.] | Model | Features | Segmentation used | Optimizer | Heart Sound types |
---|---|---|---|---|---|---|
1 | Maknickas and Maknickas 2017 [55] | 2D-CNN | MFSC | No | RMSprop | N, A |
2 | Alafif et al. 2020 [56] | 2D-CNN + transfer learning | MFCC | No | SGD | N, A |
3 | Deng et al. 2020 [54] | CNN + RNN | Improved MFCC | No | Adam | N, A |
4 | Abduh et al. 2019 [57] | 2D-DNN | MFSC | No | Adam | N, A |
5 | Chen et al. 2018 [58] | 2D-CNN | Wavelet transform + Hilbert–Huang features | No | Adam | N, M, EXT |
6 | Rubin et al. 2016 [59] | 2D-CNN | MFCC | Yes | Adam | N, A |
7 | Nilanon et al. 2016 [60] | 2D-CNN | Spectrograms | No | SGD | N, A |
8 | Dominguez-Morales et al. 2018 [61] | 2D-CNN | Spectrograms | No | Adam | N, A |
9 | Bozkurt et al. 2018 [62] | 2D-CNN | MFCC + MFSC | Yes | Adam | N, A |
10 | Chen et al. 2019 [63] | 2D-CNN | MFSC | No | Adam | N, A |
11 | Cheng et al. 2019 [64] | 2D-CNN | Spectrograms | No | Adam | N, A |
12 | Demir et al. 2019 [65] | 2D-CNN | Spectrograms | No | Adam | N, M, EXT |
13 | Ryu et al. 2016 [66] | 1D-CNN | 1D time-series signals | No | SGD | N, A |
14 | Xu et al. 2018 [67] | 1D-CNN | 1D time-series signals | No | SGD | N, A |
15 | Xiao et al. 2020 [68] | 1D-CNN | 1D time-series signals | No | SGD | N, A |
16 | Oh et al. 2020 [69] | 1D-CNN WaveNet | 1D time-series signals | NO | Adam | N, AS, MS, MR, MVP |
17 | Khan et al. 2020 [70] | LSTM | MFCC | No | Adam | N, A |
18 | Yang et al. 2016 [71] | RNN. | 1D time-series signals | No | Adam | N, A |
19 | Raza et al. 2018 [72] | LSTM | 1D time-series signals | No | Adam | N, A |
20 | Tschannen et al. 2016 [73] | 2D-CNN + SVM | Deep features | Yes | Adam | N, A |
A, abnormal Heart Sounds; AS, aortic stenosis; EXT, extra systole Heart Sounds; M, murmur Heart Sounds; MR, mitral regurgitation; MS, mitral stenosis; MVP, mitral valve prolapse; N, normal Heart Sounds; PCG, phonocardiogram.
Application of Heart Sound classification method adopted by a medical practitioner [140, 141].
Figure 24 depicts the application of a stethoscope [135, 136] in a Heart Sound analysis problem. Heart Sound is acquired from the patient and passed to the preprocessing block for filtering, normalization, and segmentation. Deep-learning techniques are then applied to classify the cardiac sound. Finally, the classified cardiac sound is sent to the medical practitioner to examine the patient's Heart Sound [133, 134].
Figure 25 compares the deep-learning-based classification [39, 50] method used for valvular Heart Sound analysis with the traditional machine-learning methods. It highlights low-level features used for deep learning and handcrafted high-level features used for conventional machine learning [85, 86].
Study of machine-learning vs. deep-learning methods for cardiac sound classification [148, 149, 150].
It is the first Convolutional Neural Network model developed in deep learning. It has five layers comprising three convolutions and two fully connected layers. This network also includes two max-pooling layers and one Softmax layer, as illustrated in Figure 26.
Schematic block diagram of Le Net.
It is the second Convolutional Neural Network model developed in deep learning. It has eight layers comprising five convolution layers and three fully connected layers. This network also contains three max-pooling layers and one Softmax layer, as depicted in Figure 27.
Schematic block diagram of Alex Network.
It is the third Convolutional Neural Network model developed in deep learning. It contains 16 layers comprising 13 convolution layers and 3 fully connected layers, as illustrated in Figure 28.
Schematic block diagram of VGG 16.
It is the fourth Convolutional Neural Network model developed in the field of deep learning. It has 19 layers comprising 16 convolution layers and 3 fully connected layers, as illustrated in Figure 29.
Schematic block diagram of VGG 19.
It is the fifth Convolutional Neural Network model developed in the field of deep learning. It has 121 layers comprising 120 convolution layers and 1 fully connected layer, as illustrated in Figure 30.
Schematic block diagram of Dense Net 121.
It is the seventh Convolutional Neural Network model developed in deep learning, named after its squeeze and expand layers. Generally, it is 18 layers deep and contains fire blocks. A fire block comprises the desired number of squeeze and expand filters, as illustrated in Figure 31. Squeeze layers use 1 × 1 filters, whereas expand layers combine 1 × 1 and 3 × 3 filters.
Schematic block diagram of Squeeze Net.
The Mobile Net is compatible with mobile operating systems and can be embedded in mobile devices. The modified CNN-based mobile network [6, 15] is used in this research. It is based on depth-wise separable convolution, where convolution takes place in two steps: a depth-wise convolution followed by a point-wise convolution. Depth-wise separable convolution reduces model complexity, as it requires far fewer computations than standard convolution. Figure 32 shows a modified mobile network module, where a depth-wise convolution with a 3 × 3 kernel, batch normalization, and a ReLU activation function is followed by a 1 × 1 standard convolution [9, 18], batch normalization, and a ReLU activation function.
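A quick way to see the saving is to count learnable weights in each kind of layer. The layer sizes below (3 × 3 kernel, 128 input channels, 256 output channels) are hypothetical examples, not values from any specific MobileNet configuration:

```python
def standard_conv_params(k, c_in, c_out):
    # A standard conv layer learns one k x k x c_in filter per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depth-wise step: one k x k filter per input channel;
    # point-wise step: a 1 x 1 conv mixing c_in channels into c_out.
    return k * k * c_in + c_in * c_out

# Hypothetical MobileNet-style layer: 3 x 3 kernel, 128 -> 256 channels
std = standard_conv_params(3, 128, 256)        # 9 * 128 * 256 = 294912
sep = depthwise_separable_params(3, 128, 256)  # 1152 + 32768 = 33920
ratio = sep / std
```

The ratio works out to 1/k² + 1/c_out (here about 0.115), which is why MobileNet-style models run with roughly an order of magnitude fewer weights and multiplications per layer.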
Schematic block diagram of Mobile Net.

5.9 Inception Net Model
It is a sparsely connected network that runs deep. It contains multiple inception modules connected in series. An inception module comprises a max-pooling branch and convolution filters of sizes 1, 3, and 5 applied at the same layer. Thus, multiple convolutions occur at the same layer, followed by a concatenation operation, as shown in Figure 33.
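The concatenation step can be sketched with placeholder feature maps. The branch channel counts below are assumptions, and each branch is assumed to be padded so all outputs share the same spatial size:

```python
import numpy as np

# Placeholder feature maps for one inception module (channels-last layout).
# Padding is assumed to keep every branch's spatial size at 8 x 8.
h, w = 8, 8
branch_1x1 = np.zeros((h, w, 16))   # 1 x 1 convolution branch
branch_3x3 = np.zeros((h, w, 24))   # 3 x 3 convolution branch
branch_5x5 = np.zeros((h, w, 8))    # 5 x 5 convolution branch
branch_pool = np.zeros((h, w, 8))   # max-pool + 1 x 1 projection branch

# The module's output stacks all branches along the channel axis.
out = np.concatenate([branch_1x1, branch_3x3, branch_5x5, branch_pool], axis=-1)
```

The output keeps the input's spatial size while its channel count is the sum of the branch channels, letting the next layer see features at several receptive-field sizes at once.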
Schematic block diagram of Inception Net.

5.10 Residual Net Model
It is a very deep network that uses skip connections. As the number of layers increases, the vanishing-gradient problem occurs. This problem can be resolved by the residual blocks used in the residual network, as illustrated in Figure 34.
Schematic block diagram of Residual Net.
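A minimal sketch of a residual block, y = ReLU(F(x) + x), with hypothetical weight matrices; the skip connection carries the input around F, which is what lets gradients flow in very deep stacks:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # y = relu(F(x) + x): the skip connection adds the input back in,
    # so the gradient can flow around F(x) even in very deep stacks.
    h = relu(x @ w1)         # F's first layer
    return relu(h @ w2 + x)  # add the identity (skip) path, then activate

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))           # a batch of 4 feature vectors
w1 = rng.standard_normal((16, 16)) * 0.01  # hypothetical near-zero weights
w2 = rng.standard_normal((16, 16)) * 0.01
y = residual_block(x, w1, w2)
# With near-zero weights the block behaves almost like relu(x):
# the identity path dominates, so extra layers cannot hurt much.
```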
There are different means to compute the effectiveness of any deep-learning [106, 107] algorithm: Accuracy, Precision, Recall, and F-Measure. Calculating these parameters requires the true-positive (TP), false-positive (FP), true-negative (TN), and false-negative (FN) counts, as given below.
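These four metrics follow directly from the confusion-matrix counts; the counts used in the example call are hypothetical:

```python
def evaluation_metrics(tp, fp, tn, fn):
    # Standard confusion-matrix metrics for a binary classifier.
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for a normal/abnormal Heart Sound classifier
acc, prec, rec, f1 = evaluation_metrics(tp=90, fp=10, tn=85, fn=15)
```

The F-Measure is the harmonic mean of Precision and Recall, so it penalizes a classifier that trades one heavily for the other.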
Figure 35 compares all CNN-based deep-learning models [43, 80, 93] used for different datasets. It is observed that complex deep-learning models like inception networks, residual networks, and Xception networks work best in comparison with the other models [81, 91, 92]. Figure 36 illustrates a deep-neural-network-based model architecture containing multiple hidden layers comprising several thousand nodes.
Figure 37 illustrates a block diagram of a machine-learning algorithm like Random Forest, which can be applied to smaller datasets and achieves decent accuracy in the classification [104, 105] of PCG signals. It is an ensemble method based on the supervised machine-learning principle. The following table is based on various research works on Heart Sound analysis using different deep-learning models.
Architecture of a machine-learning algorithm like Random Forest [146, 147].
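A minimal sketch of the bootstrap-and-vote idea behind Random Forest. For brevity each "tree" is a single-split decision stump, and the one-feature dataset is hypothetical rather than real PCG features:

```python
import random

def train_stump(data):
    # Pick the (feature, threshold) split that best separates the labels.
    best = None
    n_features = len(data[0][0])
    for f in range(n_features):
        for x, _ in data:
            t = x[f]
            left = [y for xi, y in data if xi[f] <= t]
            right = [y for xi, y in data if xi[f] > t]
            err = min(left.count(0), left.count(1)) + min(right.count(0), right.count(1))
            if best is None or err < best[0]:
                left_label = max(set(left), key=left.count) if left else 0
                right_label = max(set(right), key=right.count) if right else 0
                best = (err, f, t, left_label, right_label)
    return best[1:]

def predict_stump(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label

def random_forest(data, n_trees=15, seed=0):
    # Each "tree" sees its own bootstrap sample of the training data.
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]

def predict_forest(forest, x):
    # Ensemble prediction = majority vote over all trees.
    votes = [predict_stump(s, x) for s in forest]
    return max(set(votes), key=votes.count)

# Hypothetical one-feature "PCG" data: label 0 = normal, 1 = abnormal
data = [([0.1], 0), ([0.2], 0), ([0.3], 0), ([0.7], 1), ([0.8], 1), ([0.9], 1)]
forest = random_forest(data)
```

A full Random Forest additionally grows deep trees and subsamples features at each split; scikit-learn's `RandomForestClassifier` implements the complete algorithm.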
Table 5 compares the training and validation performance of different CNN-based models used over the last 3–5 years for valvular Heart Sound analysis [44, 78, 94]. The analysis shows that training and validation loss decrease drastically as we move towards more complex models like the inception, residual, and Xception networks, while training and validation accuracy increase considerably for large numbers of heart data samples [79, 101]. In this paper, different deep-learning methods are used for cardiac sound analysis. Table 6 depicts the study [45, 102, 103] of performance metrics of different CNN-based deep-learning methods applied to cardiac sound analysis and shows how Efficient Network-B3 works best for PCG signal categorization in terms of evaluation metrics. The following table is based on the research carried out over the last 3–5 years on Heart Sound analysis using various deep-learning models.
Comparison of training and validation performance metrics of CNN-based deep-learning models.
Model | Training loss | Training accuracy | Validation loss | Validation accuracy
---|---|---|---|---
LeNet-5 | 0.4352 | 0.7089 | 0.4757 | 0.6799
Alex Net | 0.3476 | 0.7379 | 0.3837 | 0.7199
VGG16 | 0.2979 | 0.8102 | 0.3134 | 0.7978
VGG19 | 0.2692 | 0.8709 | 0.2899 | 0.8469
DenseNet121 | 0.2476 | 0.9087 | 0.2665 | 0.8876
Squeeze Network | 0.2097 | 0.9265 | 0.2098 | 0.8906
Mobile Network | 0.1576 | 0.9435 | 0.1376 | 0.9073
Inception Network | 0.0472 | 0.9843 | 0.0654 | 0.9863
Residual Network | 0.0432 | 0.9856 | 0.0533 | 0.9892
Xception Network | 0.0320 | 0.9951 | 0.0325 | 0.9926
Comparison of performance metrics in various CNN-based deep-learning models.
Model | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | – | –
---|---|---|---|---|---|---
LeNet-5 | 68.97 | 69.75 | 67.44 | 66.68 | 1086 | 1135
Alex Net | 72.34 | 70.74 | 73.88 | 71.24 | 1233 | 1337
VGG16 | 74.08 | 75.29 | 75.08 | 75.13 | 1352 | 1011
VGG19 | 82.17 | 83.33 | 82.19 | 84.21 | 1343 | 947
DenseNet121 | 92.47 | 93.55 | 93.47 | 94.48 | 1201 | 937
Squeeze Network | 93.57 | 92.65 | 94.79 | 93.56 | 1098 | 1012
Mobile Network | 95.09 | 94.35 | 95.68 | 93.46 | 890 | 987
Inception Network | 96.96 | 98.17 | 98.96 | 97.02 | 710 | 974
Residual Network | 97.32 | 98.42 | 98.32 | 98.35 | 783 | 1056
Xception Network | 99.13 | 98.18 | 98.43 | 99.19 | 750 | 865
Efficient Network-B3 | 99.49 | 98.62 | 98.72 | 99.37 | 786 | 854
Recently, Heart Sound analysis has been a trending domain of research. Researchers are developing different deep-learning models for PCG signal analysis and classification that can produce more accurate computations for the user and, simultaneously, be handy in the early detection of valvular heart diseases.
In this review paper, the authors have touched upon feature extraction techniques, preprocessing methods, segmentation methods, and CNN-based classification methods using deep learning. The preliminary concepts of Heart Sound signal analysis are introduced, and the different signal processing algorithms adopted in Heart Sound signal analysis are discussed. Finally, the performance metrics of various deep-learning methods applied to various Heart Sound datasets are addressed, and the other applications of Heart Sound signal analysis are summarized. The entire analysis has been done by comparing different research works in this field.

The conventional acoustic stethoscopes used during auscultation are inadequate for the detection and diagnosis of heart anomalies in human subjects; they lack the efficiency needed to classify various Heart Sounds for early prediction of heart diseases. The need of the hour is to work with electronic stethoscopes built around platforms such as the Raspberry Pi, which offer an intelligent way of overcoming the deficiencies of acoustic stethoscopes. The electronic stethoscope is not only efficient but can also address ambient noise while collecting real-time Heart Sound. It is also a portable, low-cost option and features a low screening time for the diagnosis and analysis of Heart Sound.