
Optimization Driven Variational Autoencoder GAN for Artifact Reduction in EEG Signals for Improved Neurological Disorder and Disability Assessment

Feb 24, 2025

Introduction

Numerous studies have shown that neurological problems are increasing at an alarming rate. The WHO states that one in four people worldwide will experience neurological problems at some point in their lives [1]. Neurological diseases are the second most common group of diseases worldwide after ischemic heart disease, and they affect both the brain and the nervous system of the human body [2]. Many neurological disorders are well documented and rather prevalent, while many others are uncommon. They include a variety of conditions, such as epilepsy, learning disabilities, neuromuscular disorders, autism, Alzheimer's disease, attention deficit hyperactivity disorder (ADHD), multiple sclerosis, Parkinson's disease, sleep problems, and cerebral palsy. Mental illnesses are classified as “psychiatric diseases” and are primarily characterized by abnormalities in cognition, emotion, or behavior that lead to suffering or functional impairment. A variety of brain-imaging modalities are available to diagnose neurological disorders, including positron emission tomography (PET), near-infrared spectroscopy (NIRS), magnetoencephalography (MEG), electroencephalography (EEG), and functional magnetic resonance imaging (fMRI) [3]. This paper emphasizes EEG analysis because its cost-effectiveness, non-invasiveness, and portability make it a widely used approach. An EEG systematically monitors and records the brain's electrical activity to assess cerebral processes. Many studies use EEG data to detect neurological diseases, neurodevelopmental problems, acute neurological events, and patient behavior [4], [5]. Traumatic brain injury is the leading cause of disability and death in children worldwide, and over five million Americans are disabled as a result of a traumatic brain injury. Researchers believe that a computer-aided diagnosis (CAD) system trained on extensive patient data, physiological signals, and images using advanced signal processing and AI/ML techniques can help neurologists, neurosurgeons, radiologists, and other medical professionals improve clinical decision-making. Research in this area has increased significantly over the last ten years.

In [6], a generalized EEG neural network (GENet) architecture based on a convolutional neural network was developed, which is able to identify various neurological disorders from EEG data. This paradigm facilitates the execution of the essential functions of the categorization process. In [7], a deep neural network (DNN)-based hybrid ensemble feature selection (HEFS) framework for Parkinson's disease identification was proposed. Multi-level dimensionality reduction (MLDR) is applied to the HEFS matrices. After normalizing the matrix scores, merging the scores, reconstructing a new dataset, and reducing the features using neighborhood component analysis (NCA), an accuracy of 97.08 % and an F1-score of 98.10 % were achieved. In [8], a unique expert system was introduced that utilizes only EEG information for the early diagnosis of schizophrenia; a deep learning network was developed to improve the accuracy of the image categorization outcomes.

In [9]–[11], the use of variable-frequency complex demodulation (VFCDM) and convolutional neural networks (CNN) to differentiate between healthy, interictal, and ictal states was investigated using EEG data. The time-frequency spectrum (TFS) shows frequency changes across different states that correspond to fluctuations in brain activity. The leave-one-subject-out cross-validation (LOSO CV) method routinely achieves good performance, ranging from 90 % to 99 % across different combinations of healthy and epileptic states. In [12]–[14], the EEG temporal spatial network (ETSNet) is introduced, which includes a squeeze-and-excitation block and several CNNs tailored to the eyes-open and eyes-closed resting states. Several limitations are evident from the above studies:

The lack of standardized assessment measures and datasets makes comparison difficult.

The computational complexity of some deep learning models, such as CNNs combined with long short-term memory (LSTM) networks, may limit their practical use.

Proposed Methodology

An optimization-enhanced variational autoencoder generative adversarial network (OE-VAE-GAN) is proposed for artifact reduction in EEG signals. It offers a robust approach to cleaning EEG data, especially in clinical and research contexts where artifacts (e.g., from muscle movements, eye blinks, or ambient noise) compromise the quality of data analysis, as shown in Fig. 1.

Fig. 1.

BSO-VAE-GAN architecture for artifact reduction.
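As a rough illustration of how the components in Fig. 1 might be composed in code, the following is a minimal PyTorch sketch; the module names, constructor arguments, and forward data flow are placeholders chosen for this illustration, not the authors' implementation.

```python
# Hypothetical composition of the artifact-reduction pipeline sketched in Fig. 1.
# All module names and the data flow are illustrative placeholders.
import torch.nn as nn

class ArtifactReducer(nn.Module):
    def __init__(self, encoder, sampler, feature_generator, reconstructor, discriminator):
        super().__init__()
        self.encoder = encoder                      # estimates latent z and cutoff theta from noisy EEG
        self.sampler = sampler                      # reparameterized sampling of (z, theta)
        self.feature_generator = feature_generator  # generates the four feature subsets
        self.reconstructor = reconstructor          # rebuilds the cleaned EEG signal
        self.discriminator = discriminator          # PatchGAN critic used only during training

    def forward(self, noisy_eeg):
        stats = self.encoder(noisy_eeg)
        z, theta = self.sampler(stats)
        features = self.feature_generator(z, theta)
        return self.reconstructor(features)
```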

Probabilistic variational autoencoder (PVA) based filtering

First, define the low-pass filter (LPF) and high-pass filter (HPF):

LPF[t] = \frac{S[t]\, w[t]}{\sum_{i=0}^{T-1} S[i]\, w[i]}

HPF[t] = \delta[t] - LPF[t]

where S[t], w[t], and δ[t] represent the sinc filter, the Hamming window, and the discrete unit impulse function, respectively. The PVA consists of five main parts: feature extractor, encoder, sampler, feature generator, and signal reconstructor. These components work together to synthesize EEG signals. The feature extractor g_x(·) uses cascaded filters to divide the input signal y into four amplitude-modulated subsets x ∈ {x_HH, x_HL, x_LH, x_LL}, which are the learning targets of the feature generator. The encoder g_e(·) learns the distribution parameters of the latent variable z and the cutoff frequency θ. It makes two assumptions:

θ_k ~ U(0, 1) for k = 1, 2, ..., 6, corresponding to the six cutoff frequencies in the proposed model; this distribution is approximated by a Bernoulli distribution.

z_j ~ N(μ_{z_j}, σ²_{z_j}) for each dimension j ∈ {1, 2, ..., J}, where J is a hyperparameter that controls the number of latent dimensions.

The samplers g_z(·) and g_θ(·) provide distinct estimates of the data distribution. The method uses the reparameterization trick when sampling z_j and θ_k so that gradients can flow through the network during backpropagation. Using the encoder outputs, the feature generator g_{z'}(·) creates four feature signals for the signal reconstructor. The signal reconstructor g_y(·) uses the generated feature subsets to reconstruct the signal while preserving its fundamental characteristics. Monte Carlo (MC) sampling can be used to estimate the expected log-likelihood of reconstructing the raw signal y from the reconstructed feature signals x:

-E_{q(x,z,\theta|y)}[\log p(y|x)] \approx -\frac{1}{L}\sum_{l=1}^{L} \log p\left(y \mid x^{(l)}\right)

where L is the number of samples. To maintain convergence stability, the mean square error (MSE) is minimized instead of this negative log-likelihood:

J(\varphi_y; y) = \frac{1}{N}\sum_{n=1}^{N}\left(y_n - g_y(x_n)\right)^2

where n is the sample index and N is the total number of training samples. Under variational inference, optimizing the latent variable z requires minimizing the Kullback-Leibler divergence (KLD) between the approximate posterior q(z|y) and the prior p(z):

E_{q(z|y)\, q(\theta|y)\, q(x|y,\theta)}\left[\log \frac{q(z|y)}{p(z)}\right] = KLD\left(q(z|y) \,\|\, p(z)\right)

The closed-form KLD between Gaussian distributions and the VAE reparameterization trick were used for this optimization.
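For concreteness, the following is a minimal NumPy/PyTorch sketch of the two ingredients above: the windowed-sinc low-/high-pass pair defined at the start of this subsection, and the reparameterization trick with the MSE-plus-Gaussian-KLD objective. Function names, the tap count, and the standard-normal prior are assumptions made for illustration, not the authors' code.

```python
import numpy as np
import torch

def windowed_sinc_lowpass(cutoff, num_taps=101):
    """Normalized windowed-sinc kernel: LPF[t] = S[t]w[t] / sum_i S[i]w[i].

    `cutoff` is the normalized cutoff frequency (0 < cutoff < 0.5, cycles/sample);
    `num_taps` should be odd so the kernel has a well-defined center sample.
    """
    t = np.arange(num_taps) - (num_taps - 1) / 2
    sinc = np.sinc(2 * cutoff * t)       # S[t]: ideal low-pass impulse response
    window = np.hamming(num_taps)        # w[t]: Hamming window
    kernel = sinc * window
    return kernel / kernel.sum()         # normalize so the DC gain is 1

def windowed_sinc_highpass(cutoff, num_taps=101):
    """Spectral inversion: HPF[t] = delta[t] - LPF[t]."""
    hpf = -windowed_sinc_lowpass(cutoff, num_taps)
    hpf[(num_taps - 1) // 2] += 1.0      # add the discrete unit impulse at the center
    return hpf

def reparameterize(mu_z, logvar_z):
    """z = mu + sigma * eps keeps the sampling step differentiable."""
    eps = torch.randn_like(mu_z)
    return mu_z + torch.exp(0.5 * logvar_z) * eps

def vae_loss(y, y_hat, mu_z, logvar_z):
    """MSE reconstruction term plus closed-form Gaussian KLD to a N(0, I) prior."""
    recon = torch.mean((y - y_hat) ** 2)
    kld = -0.5 * torch.mean(1 + logvar_z - mu_z ** 2 - logvar_z.exp())
    return recon + kld
```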

Generative adversarial network with optimization process

A GAN consists of two CNNs, a generator and a discriminator, trained with opposing objectives. For the discriminator we use a PatchGAN, which classifies each local patch as real or generated; penalizing the signal at the patch level encourages the generator to reproduce high-frequency components accurately. The total GAN training loss function is:

L_{GAN}(G,D) = E_{x,y}\left[\log D(x,y)\right] + E_{x,z}\left[\log\left(1 - D\left(x, G(x,z)\right)\right)\right]

The generator G minimizes the loss function L_GAN(G,D), while the discriminator D maximizes it to discriminate between the generated samples G(x,z) and the real samples y. The discriminator's estimation-error loss is fed back to guarantee successful training. The final objective is therefore:

G^{*} = \arg\min_{G}\max_{D} L_{GAN}(G,D) + \gamma L_{L1}(G)

where L_{L1}(G) is an additional L1-norm loss on the generator output that pulls it toward the ground truth, and γ is an adjustable weight set to 100. The multi-modal feature fusion of image and interaction information in L_GAN drives the generative network to reproduce the same pathological features as the original signal:

L_{GAN} = \sum_{n=1}^{N} \log D_{\theta_d}\left(G_{\theta_g}\left(I^{\alpha}, t\right)\right)

where t is the interactive information and N is the number of samples. A visual perceptual loss is used to maintain perceptual similarity:

L_{GAN/i,j} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left(\varphi_{i,j}\left(I^{f}\right)_{x,y} - \varphi_{i,j}\left(G_{\theta_g}\left(I^{\alpha}, t\right)\right)_{x,y}\right)^{2}
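The objective above follows a conditional, pix2pix-style setup; a hedged PyTorch sketch of the adversarial and L1 terms is given below. The `disc` and `gen` outputs are assumed to be PatchGAN logits, the γ = 100 weight is taken from the text, and this is one plausible reading of the loss rather than the authors' code.

```python
import torch
import torch.nn.functional as F

GAMMA = 100.0  # weight of the L1 term, as stated in the text

def discriminator_loss(disc, x, y_real, y_fake):
    """D maximizes log D(x, y) + log(1 - D(x, G(x, z))); we minimize the negation."""
    real_logits = disc(x, y_real)
    fake_logits = disc(x, y_fake.detach())          # do not backprop into the generator here
    loss_real = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
    loss_fake = F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    return loss_real + loss_fake

def generator_loss(disc, x, y_real, y_fake):
    """G minimizes the adversarial term plus gamma * L1 distance to the ground truth."""
    fake_logits = disc(x, y_fake)
    adv = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    l1 = F.l1_loss(y_fake, y_real)
    return adv + GAMMA * l1
```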

Performance analysis
Dataset description

This section describes both study datasets. The CHB-MIT dataset [12] includes 22 participants: 17 females aged 1.5–19 years and 5 males aged 3–22 years. The collection comprises 198 seizures in 969 hours of EEG recordings, so seizure segments are far outnumbered by seizure-free segments. The second dataset, KAU, consists of scalp EEG recorded at 256 Hz from two male patients aged 28 years. The two KAU subjects are comparable to the individuals in the CHB-MIT dataset, and subject age was taken into account: across both datasets, the age range is 1–28 years. This is significant because age considerably affects the clinical and electroencephalographic features of seizures [13]. Both KAU individuals had 38-channel EEGs; they had two 495 s seizures and four 417 s seizures, respectively. For the CHB-MIT dataset, 18 of the 23 channels were selected because they are common to all recordings.
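For illustration only, the sketch below shows how the 18 channels common to all CHB-MIT recordings might be selected and the 256 Hz recordings cut into fixed windows, assuming EDF files and the MNE-Python library; the channel list, window length, and file handling are assumptions, not the study's preprocessing code.

```python
import numpy as np
import mne

FS = 256            # sampling rate of both datasets [Hz]
WINDOW_SEC = 4      # hypothetical segment length, chosen for illustration

# Placeholder list of the 18 bipolar channels assumed to be shared by all recordings.
COMMON_CHANNELS = ["FP1-F7", "F7-T7", "T7-P7", "P7-O1", "FP1-F3", "F3-C3",
                   "C3-P3", "P3-O1", "FP2-F4", "F4-C4", "C4-P4", "P4-O2",
                   "FP2-F8", "F8-T8", "T8-P8", "P8-O2", "FZ-CZ", "CZ-PZ"]

def load_segments(edf_path):
    """Load one EDF recording, keep the common channels, and cut fixed windows."""
    raw = mne.io.read_raw_edf(edf_path, preload=True, verbose="error")
    raw.pick(COMMON_CHANNELS)                  # keep only channels present everywhere
    data = raw.get_data()                      # shape: (n_channels, n_samples)
    win = FS * WINDOW_SEC
    n_windows = data.shape[1] // win
    return data[:, :n_windows * win].reshape(len(COMMON_CHANNELS), n_windows, win)
```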

Experimental setup

The time-step value in this study varied from 0 to m_t, with t set to 50. The prior and downstream tasks were each trained for 30 epochs with a batch size of 64. The network was trained using Python, PyTorch, and an RTX 3060 Ti GPU. The BrOpt_VAGAN model, with its complex VAE and GAN components, can be computationally intensive: real-time EEG analysis requires significant hardware resources (e.g., GPUs, TPUs) that may not be available in all medical settings, especially for portable or low-power devices. The results of the proposed artifact-elimination method using the BrOpt_VAGAN network are examined below by comparing them with well-known methods, namely HEFS+DNN [12] and VFCDM+CNN [14], with the goal of reducing artifacts caused by random noise. The metrics used for assessment are the MSE and the signal-to-noise ratio (SNR).

The MSE measures the difference between the actual response and the desired response:

MSE = \frac{1}{N}\sum_{n=1}^{N}\left(O^{n} - D^{n}\right)^{2}

where O^n is the actual output and D^n is the desired response for sample n.
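A small NumPy sketch of the two metrics follows; the MSE mirrors the formula above, while the SNR definition (in dB, relative to the clean reference) is an assumption, since the text does not state which SNR variant is used.

```python
import numpy as np

def mse(desired, output):
    """MSE = (1/N) * sum_n (O^n - D^n)^2 over the N samples."""
    desired = np.asarray(desired, dtype=float)
    output = np.asarray(output, dtype=float)
    return np.mean((output - desired) ** 2)

def snr_db(clean, denoised):
    """Assumed SNR definition: 10*log10(signal power / residual-error power) in dB."""
    clean = np.asarray(clean, dtype=float)
    residual = np.asarray(denoised, dtype=float) - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))
```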

Table 1 shows the accuracy and error of the proposed BrOpt_VAGAN method for pseudo-clean and noisy inputs.

Table 1. Accuracy performance of the proposed BrOpt_VAGAN model.

Input type      Artifact component   Accuracy [%]   Error [%]
Pseudo-clean    brain                98.5           12.41
                eye                  96.2           11.53
                muscle               97.3           12.74
Noisy input     brain                98.6           11.84
                eye                  95.9           11.90
                muscle               93.5           12.56

Fig. 2 shows that BrOpt_VAGAN consistently achieves the lowest MSE % with values between 11.2 % and 12.6 % and thus has superior accuracy. In contrast, HEFS+DNN shows stable but higher MSE values (20.1 % to 20.9 %), while VFCDM+CNN has the highest MSE % across all channels (21.3 % to 21.76 %). This shows that BrOpt_VAGAN is the most effective method for error reduction regardless of the number of channels, outperforming both HEFS+DNN and VFCDM+CNN.

Fig. 2.

Comparison of MSE with the EEG+brain signal artifact.

Fig. 3 shows the MSE of HEFS+DNN, VFCDM+CNN, and BrOpt_VAGAN for EEG signals with eye signal artifacts across 2 to 10 channels. HEFS+DNN has stable MSE values between 19.4 % and 19.5 %, showing constant performance. VFCDM+CNN has the highest MSE (21.5 % to 21.9 %), indicating that it is less effective at handling these artifacts. BrOpt_VAGAN achieves MSE values between 15.3 % and 15.9 %, lower than both HEFS+DNN and VFCDM+CNN. Overall, HEFS+DNN is the most stable, while BrOpt_VAGAN yields the lowest error for eye artifacts.

Fig. 3.

Comparison of MSE with the EEG+eye signal artifact.

Fig. 4 compares the MSE of the aforementioned methods when processing EEG signals with muscle signal artifacts over different numbers of channels. HEFS+DNN consistently shows the highest MSE %, ranging from 27.4 % to 28.6 %, indicating a significant error in handling muscle artifacts. VFCDM+CNN achieves lower MSE values, ranging from 19.2 % to 19.5 %, indicating moderate effectiveness in artifact reduction. BrOpt_VAGAN consistently achieves the lowest MSE %, with values between 12.3 % and 12.98 %, indicating superior performance in minimizing errors due to muscle artifacts. Overall, BrOpt_VAGAN is the most effective, followed by VFCDM+CNN, while HEFS+DNN is the least effective in this context.

Fig. 4.

Comparison of MSE with the EEG+muscle signal artifact.

Conclusion

This study presented the BrOpt_VAGAN framework for automated classification of neurological disorders from raw EEG data. The experiments were performed on publicly available benchmark datasets, under eyes-closed and eyes-open conditions as recommended in the CHB-MIT dataset publication and other research papers. The results show that the proposed method achieves up to 98.7 % accuracy on the specified dataset using multiple channels. The performance improvement is shown for five-class classification, which confirms the effectiveness and efficiency of the BrOpt_VAGAN framework. Future work will therefore explore loss functions tailored to imbalanced data and ensemble methods, which may further improve accuracy. Furthermore, fine-tuning the self-supervised learning approach with larger and more diverse datasets could improve generalization and make the method more reliable in practical applications.
