A Bearing Fault Diagnosis Model based on Minimum Average Composite Entropy and Parallel Attention Mechanism Convolutional Neural Network
Published online: 14 Aug 2025
Pages: 178 - 189
Received: 28 Feb 2025
Accepted: 27 May 2025
DOI: https://doi.org/10.2478/msr-2025-0022
© 2025 Zhen Zhang, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.
Bearings are the “joints” of rotating machinery and often have to operate under extreme conditions such as high temperature, high pressure, high speed, high intensity, and variable loads. Failure of these components can severely compromise the safety of the equipment during operation. Therefore, the study of bearing fault diagnosis technology is crucial to minimize economic losses and loss of life [1], [2].
Current bearing fault diagnosis technologies can be mainly categorized into two types: methods based on signal processing and those based on artificial intelligence. Currently, scientific researchers often study one type of fault diagnosis method in isolation. However, it is important to recognize that these two approaches are not independent but rather form a unified whole. It is a challenge to effectively improve the accuracy of fault diagnosis by focusing on only one type of method. For example, artificial intelligence (AI) techniques can help with the parameter optimization of fault diagnosis methods and the classification of faults within signal processing. Conversely, signal processing methods can improve the data quality of AI techniques and thus increase the accuracy and generalization capability of AI-based fault diagnosis methods.
The fault signal feature extraction method represented by variational mode decomposition (VMD) can effectively solve the modal aliasing phenomenon in empirical mode decomposition (EMD) and has attracted the attention of most researchers. However, in VMD, the penalty parameters α and the decomposition layer
Compared to back propagation (BP) networks, the convolutional neural network (CNN) can detect deep relationships in the data, resists overfitting, and achieves high diagnostic accuracy, which has attracted much attention in the academic community [11]. Chao used a CNN to identify the cavitation intensity of a high-speed axial plunger pump, which greatly improved the identification accuracy [12]. Bao et al. used a CNN on time-varying stress data and effectively identified failure damage in a pipe rack of an offshore platform [13]. Jiang applied a CNN to the fault diagnosis of nuclear power plant bearings; compared with other methods, the model trained with the CNN showed better noise resistance and generalization ability [14]. However, the above CNN-based fault diagnosis methods rely on one-dimensional signals or grayscale images. According to research, a CNN learns far more effectively from two-dimensional images than from one-dimensional signals, while a grayscale image cannot clearly represent the relationship between signal amplitude and time, so the time-domain information of the bearing fault signal is easily lost [15]. The attention mechanism (AM) continuously updates the feature weights according to the loss obtained during network training and assigns each feature a weight that reflects its importance, which makes the training of the CNN more effective. Han used AM for fault diagnosis of transmission machinery, which improved the computational efficiency and diagnostic accuracy of the CNN [16]. Xiang used AM to diagnose fan faults, and the trained model can predict the abnormal state of the fan [17]. Xu applied AM to solve the problem of low performance of CNN in complex working environments, and the results show that the proposed model has good reliability in Guangdong bearing fault diagnosis [18].
It should be noted that different attention mechanisms focus on different aspects of the data, so a single attention mechanism is clearly insufficient for analyzing complex, time-varying, non-stationary fault signals [19].
Based on the above analysis, this paper introduces a feature parameter called minimum average composite entropy (MACE), which is robust to sporadic noise while remaining sensitive to faults. Using this parameter as the fitness function of the optimization algorithm for the key VMD parameters improves the quality of the input signals. This paper also proposes a fault diagnosis algorithm based on a parallel fusion attention convolutional neural network (PFACNN). Combined with MACE, this algorithm significantly improves the accuracy and generalization ability of the fault diagnosis model under complex noise working conditions. The research framework is shown in Fig. 1.

Research block diagram.
In actual production, fault-bearing signals often contain a large amount of noise and invalid signals. If these cannot be processed correctly, this significantly impairs the diagnostic accuracy of intelligent diagnosis algorithms. VMD can represent input signals as a set of modal functions, which is widely used in signal processing. Reference is made to the decomposition process in the literature [20]. When applying VMD, the mode number
When the bearing fault signal

Fig. 2 shows the schematic diagram of
As can be seen in Fig. 2,

Fig. 3 shows the variation of
As can be seen in Fig. 3, the data show that
After normalizing
Therefore, the MACE (
DBO is an optimization algorithm that simulates the behavior of the praying mantis and offers the advantages of a smaller number of parameters and rapid global search capabilities, as described in [23]. To solve the problems of slow convergence and susceptibility to local optima, the Lévy flight strategy is incorporated into the Mantis optimization algorithm in this paper to improve its global search proficiency. The Mantis position update formula is:
After optimizing the key parameters of the VMD with I-DBO, the optimal number of parameter modes
Fig. 4 shows the algorithm flow of the signal processing module.

Schematic diagram of the variation with defects.
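The Lévy flight perturbation used to improve global search is commonly generated with Mantegna's algorithm. The sketch below assumes that construction; it is not the position update formula of this paper. The function names, the step scale of 0.01, the exponent 1.5, and the parameter bounds in the usage note are all hypothetical.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step with Mantegna's algorithm (assumed, not from the paper)."""
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)   # numerator: N(0, sigma_u^2)
    v = rng.gauss(0.0, 1.0)       # denominator: N(0, 1)
    return u / abs(v) ** (1 / beta)


def levy_update(position, best, lower, upper, scale=0.01, beta=1.5, rng=random):
    """Perturb each coordinate relative to the best solution and clip it to its bounds."""
    new_position = []
    for x, b, lo, hi in zip(position, best, lower, upper):
        candidate = x + scale * levy_step(beta, rng) * (x - b)
        new_position.append(min(max(candidate, lo), hi))
    return new_position
```

For example, with a hypothetical search space of 2 to 10 modes and a penalty parameter between 100 and 5000, `levy_update([5.0, 2000.0], [4.0, 1500.0], [2.0, 100.0], [10.0, 5000.0])` returns a new candidate inside those bounds. Most steps are small, but the heavy tail occasionally produces a long jump, which is what helps the search leave a local optimum.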
The Gramian angular field (GAF) can convert one-dimensional waveforms into two-dimensional image mapping in the Cartesian coordinate system. Since it can well reflect the temporal characteristics of signals and has the advantage of fewer parameters to be adjusted and no need for human selection of wavelet basis, the GAF is highly appreciated by the academic community [25]. In the Gramian angular field mapping, the time series
From
As shown in (11), the GAF can clearly express the temporal relationship of the signal. After the time series of the signal is mapped to the polar coordinates, the time series can be calculated by the angle and angle difference of each time point, so that GAF and field (
As shown in (12), the length
The
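The GAF mapping can be illustrated with a minimal sketch that assumes the standard definitions: the series is rescaled to [-1, 1], each value is encoded as an angle through the arccosine, and the image is built from the cosine of angle sums (summation field) or the sine of angle differences (difference field). The function names are hypothetical, and the rescaling convention is an assumption rather than a reproduction of (11) and (12).

```python
import math


def rescale(series):
    """Rescale a series linearly to the interval [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must not be constant")
    return [((x - hi) + (x - lo)) / (hi - lo) for x in series]


def gramian_angular_field(series, kind="summation"):
    """Return the GAF image of a 1-D series as a nested list (n x n)."""
    scaled = rescale(series)
    # Clamp before acos to guard against floating-point overshoot.
    phi = [math.acos(min(1.0, max(-1.0, x))) for x in scaled]
    if kind == "summation":
        return [[math.cos(a + b) for b in phi] for a in phi]
    if kind == "difference":
        return [[math.sin(a - b) for b in phi] for a in phi]
    raise ValueError("kind must be 'summation' or 'difference'")
```

A series of length n yields an n x n image, so a 2048-point sample would normally be downsampled or aggregated before conversion. How the paper handles the image size is not covered by this sketch.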
During model operation, CNN disregards the correlation between each channel and spatial information, which leads to problems such as uneven resource distribution and lower resolution accuracy. The attention mechanism, on the other hand, focuses on prominent features, helping to improve the model training speed and resolution accuracy. Currently widely used attention mechanisms include the Channel Attention Module (CAM) and the Spatial Attention Module (SAM) [27]. The CAM evaluates the importance of each feature map and assigns different weights depending on its importance. The expression for the channel attention mechanism in this paper is:
Then the output

Schematic diagram of the channel attention mechanism.
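To make the idea of channel weighting concrete, the following sketch assumes a widely used form of channel attention: global average pooling and global max pooling produce one descriptor per channel, both pass through a shared two-layer perceptron, and a sigmoid turns the summed scores into channel weights. This is an illustrative assumption and not necessarily the expression used in this paper. The weight matrices `w1` and `w2` are hypothetical placeholders for learned parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def shared_mlp(vector, w1, w2):
    """Two-layer perceptron with a ReLU hidden layer and no biases."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vector))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feature_map, w1, w2):
    """feature_map: list of C channels, each an H x W nested list.

    Returns (weights, reweighted_feature_map).
    """
    avg_desc = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    max_desc = [max(map(max, ch)) for ch in feature_map]
    scores = [
        a + m
        for a, m in zip(shared_mlp(avg_desc, w1, w2), shared_mlp(max_desc, w1, w2))
    ]
    weights = [sigmoid(s) for s in scores]
    output = [
        [[w * v for v in row] for row in ch]
        for w, ch in zip(weights, feature_map)
    ]
    return weights, output
```

In a trained network the two weight matrices are learned together with the convolution kernels, so informative channels end up with weights close to 1 and uninformative ones close to 0.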
The spatial attention mechanism transforms the input spatial information into another space through a spatial transformation module and, while retaining the key position information, weights the output at each spatial position to distinguish the importance of the positions. The channel attention mechanism

Schematic diagram of the spatial attention mechanism.
Overall, CAM focuses on “what” is important, whereas SAM focuses on “where” it is located. Attention to different information therefore yields differently trained deep models.
Based on the above analysis, this paper proposes a CNN fault diagnosis model based on minimum average composite entropy and a parallel fusion attention mechanism. The model consists of a signal processing module, a signal conversion module and a fault diagnosis classification module. In the signal processing module, the

MACE + PFACNN model structure diagram.
The bearing fault data from Case Western Reserve University (CWRU) have become the standard experimental data for verifying bearing fault extraction algorithms [28]. The data were obtained from the vibration signal at the drive end of the experimental system at a sampling frequency of 12000 Hz. In this test, electrical discharge machining is used to introduce single-point damage into the rolling bearing. The bearing is a 6205-2RS JEM SKF ball bearing, and its structural parameters are shown in Table 1. Fig. 8 shows the time domain waveform of the outer ring fault signal. MACE is used as the fitness function of the improved Mantis algorithm, and Fig. 9 shows the fitness function curve. The amplitude is normalized.
Experimental parameters.
Inner diameter [mm] | Pitch diameter [mm] | Thickness [mm] | Outer diameter [mm] | Rolling diameter [mm] | Contact angle [°] |
---|---|---|---|---|---|
25 | 39 | 15 | 52 | 8 | 0 |

Time domain waveform of outer ring signal.

Fitness function.
As can be seen in Fig. 8, the fault signal contains a lot of noise and it is difficult to recognize the fault information. As can be seen in Fig. 9,
As can be seen in Fig. 10,

Entropy change curve.
As can be seen in Fig. 11(a), the reconstructed signal clearly shows the fourth-order fault frequency information of the bearing inner ring. The repeated impact phenomenon shown in Fig. 11(b) is also clearer than in Fig. 8. Fig. 11 demonstrates the effectiveness of the proposed signal feature extraction method.

Analysis results after MACE-VMD.
To verify the classification accuracy of the proposed fault diagnosis model, inner ring, outer ring, and rolling element faults with sizes of 0.18 mm, 0.36 mm and 0.54 mm were selected, along with one set of normal bearing data, for a total of 10 datasets. Each sample contains 2048 data points, and 100 samples are taken from each dataset; 70 of them are randomly assigned to the training set and the remaining 30 to the test set. The types of datasets and the composition of the training and test sets are shown in Table 2. The diagnostic results of the model are shown in Table 3.
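The sample preparation described above can be sketched as follows. The sketch assumes that samples are cut as fixed-length windows from one vibration record; whether the windows overlap is not stated, so the `stride` parameter is an assumption, as are the function names and the seed.

```python
import random


def segment_signal(signal, sample_length=2048, n_samples=100, stride=None):
    """Cut n_samples windows of sample_length points from a 1-D record.

    stride defaults to sample_length (no overlap); a smaller stride gives
    overlapping windows when the record is short.
    """
    stride = sample_length if stride is None else stride
    last_start = (n_samples - 1) * stride
    if last_start + sample_length > len(signal):
        raise ValueError("signal too short for the requested windows")
    return [signal[s:s + sample_length] for s in range(0, last_start + 1, stride)]


def split_samples(samples, n_train=70, seed=0):
    """Randomly split samples into a training set and a test set."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]
```

Applied to each of the 10 datasets, `split_samples(segment_signal(record))` gives 70 training and 30 test samples per class, matching the counts in Table 2.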
Dataset classification.
Fault location | Failure diameter [mm] | Tag | Training set | Test set |
---|---|---|---|---|
Regular | 0 | 1 | 70 | 30 |
Inner ring | 0.18 | 2 | 70 | 30 |
Inner ring | 0.36 | 3 | 70 | 30 |
Inner ring | 0.54 | 4 | 70 | 30 |
Outer ring | 0.18 | 5 | 70 | 30 |
Outer ring | 0.36 | 6 | 70 | 30 |
Outer ring | 0.54 | 7 | 70 | 30 |
Rolling body | 0.18 | 8 | 70 | 30 |
Rolling body | 0.36 | 9 | 70 | 30 |
Rolling body | 0.54 | 10 | 70 | 30 |
Fault diagnosis results.
Category | Accuracy [%] | Category | Accuracy [%] |
---|---|---|---|
1 | 100 | 6 | 100 |
2 | 100 | 7 | 100 |
3 | 100 | 8 | 99.3 |
4 | 100 | 9 | 99.3 |
5 | 100 | 10 | 100 |
As shown in Table 3, the MACE + PFACNN model was evaluated on the 10 categories of the CWRU dataset: 9 categories achieved 100 % accuracy, recall and F1 score, and only category 8 was misclassified. The overall fault diagnosis accuracy of the MACE + PFACNN model is 99.3 %. This high accuracy on multi-category fault signals shows the good classification performance of the algorithm.
To test the influence of the signal processing module, the signal transformation module and the fault diagnosis module on model feature classification extraction, the model was ablated by deleting or replacing modules. The classification results of the different module combinations are shown in Table 4:
Results of ablation experiment.
Module 1 | Module 2 | Module 3 | Model | Accuracy [%] |
---|---|---|---|---|
× | × | √ | A | 91.2 |
× | √ | √ | B | 93.7 |
√ | × | √ | C | 97.4 |
√ | √ | × | D | 95.3 |
√ | √ | √ | E | 98.9 |
As can be seen in Table 4, the signal feature extraction module has the largest impact on the overall diagnostic accuracy in the ablation experiments. Using the same CNN diagnostic module, the Gramian angular field transformation can improve the diagnostic accuracy by 2.5 %, while using the signal feature extraction module can increase the diagnostic accuracy by 5.2 %. The combination of the signal feature extraction module and the fault diagnosis module outperforms the other module combinations. The experiments show the advantages of the proposed fault signal feature extraction module and the rationality of the proposed model.
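The quoted gains can be traced back to Table 4 with a few subtractions. The mapping used here is an assumption read from the order in which the modules are introduced: module 1 is signal processing (feature extraction), module 2 is the signal transformation, and module 3 is the fault diagnosis network.

```python
# Accuracy [%] of the ablation models, copied from Table 4.
ablation = {"A": 91.2, "B": 93.7, "C": 97.4, "D": 95.3, "E": 98.9}

# Adding the Gramian angular field transformation to the plain network (A -> B).
gain_transformation = round(ablation["B"] - ablation["A"], 1)

# Adding the signal feature extraction module on top of model B (B -> E).
gain_feature_extraction = round(ablation["E"] - ablation["B"], 1)

# Adding only the signal feature extraction module to the plain network (A -> C).
gain_feature_extraction_alone = round(ablation["C"] - ablation["A"], 1)

# Restoring the proposed diagnosis module in model D (D -> E).
gain_diagnosis_module = round(ablation["E"] - ablation["D"], 1)
```

Under this reading, the 2.5 % figure corresponds to the step from A to B and the 5.2 % figure to the step from B to E, while the feature extraction module alone (A to C) accounts for 6.2 percentage points.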
To demonstrate the advantages of this algorithm, we will use the same dataset to analyze the noise resistance performance of popular algorithms from recent years. The comparative experiments are mainly divided into two categories: one is a combination of one-dimensional signal processing module + CNN, and the other is a combination of two-dimensional signals + CNN.
The one-dimensional signal processing module + CNN uses ensemble empirical mode decomposition (EEMD) to preprocess the signal and takes the classical correlation coefficient as the basis for extracting the effective modal components. The modal components with the highest correlation coefficient are selected as input features of the CNN. Three network modules, including CNN, CNN support vector machine (CNNSVM), and CNNBiGRU [29],[30],[31], were selected.
Two-dimensional signal + CNN selects three types of models with higher accuracy for comparative verification: grayscale image + CNN, continuous wavelet transform + CNN, and Gramian angular field feature parameters + parallel CNN. Fig. 12 shows the experimental results of the proposed algorithm and the comparison algorithms. Table 5 shows the accuracy, recall rate, and F1 results of the five methods under 0 dB working conditions. All data is the average value after 50 Monte Carlo experiments.
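White noise at a prescribed signal-to-noise ratio is usually generated by scaling the noise power to the measured signal power. The sketch below assumes zero-mean Gaussian noise and this standard definition of SNR in decibels; the function name and the seed argument are hypothetical.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=None):
    """Return signal plus Gaussian white noise at the requested SNR [dB]."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]
```

At 0 dB the noise power equals the signal power, which is the working condition reported in Table 5. Repeating the experiment with different seeds and averaging the metrics gives the Monte Carlo mean.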

Analysis results of noise resistance.
Results of 0 dB white noise comparison test.
Model | MACE + PFACNN | IF+ CNN | M+ CNN | E+CNN | E+ CNNSVM |
---|---|---|---|---|---|
Accuracy [%] | 89.2 | 64.5 | 51.6 | 67.8 | 72.4 |
Recall rate [%] | 89.8 | 64.9 | 52.1 | 68.2 | 72.9 |
F1 [%] | 89.7 | 64.6 | 51.8 | 67.7 | 72.6 |
As can be seen in Fig. 12, the bearing fault diagnosis models without a signal processing module (IF + CNN, M + CNN and G + DCNN) maintain good diagnostic accuracy at high signal-to-noise ratios, but their noise tolerance is low and their accuracy drops rapidly below 3 dB. Improving the diagnosis module raises the baseline accuracy but does not improve the noise threshold at which the diagnosis remains credible. Models that include a fault signal feature extraction module improve this credibility threshold, but their overall accuracy remains low because they operate on one-dimensional rather than two-dimensional signals and their networks lack a fusion attention mechanism. The proposed algorithm combines the fault signal feature extraction module based on minimum average composite entropy with the CNN fault diagnosis module based on the parallel fusion attention mechanism, and therefore achieves both a better credibility threshold and a higher diagnostic accuracy. As shown in Table 5, at 0 dB, MACE + PFACNN has the highest accuracy, recall and F1. The fault diagnosis models equipped with a signal processing module have much higher accuracy than the models without one.
To further study the diagnostic accuracy of the complex noise model, a random periodic occasional pulse signal and white noise are added to the original signal to test the diagnostic accuracy of the algorithm and the comparison algorithm. The results are shown in Fig. 13. Table 6 shows the accuracy, recall rate, and F1 results of the five methods under 5 dB working conditions. All data is the average value after 50 Monte Carlo experiments.

Analysis results of the ability to resist complex noises.
Experimental results of 5 dB complex noise comparison.
Model | MACE + PFACNN | RVMD+ DCNN | RVMD+ CNN | E+CNN BiGRU | E+CNN SVM |
---|---|---|---|---|---|
Accuracy [%] | 91.3 | 80.1 | 52.1 | 54.2 | 48.3 |
Recall rate [%] | 91.6 | 80.5 | 52.7 | 54.9 | 48.8 |
F1 [%] | 91.4 | 80.3 | 52.5 | 54.6 | 48.6 |
As can be seen in Fig. 13, the diagnostic accuracy of all algorithms decreases after sporadic impact noise is added. The two types of algorithms in which the signal feature extraction module was replaced are less robust to sporadic impact noise and fail to extract the characteristic components of the signal accurately, so both their diagnostic accuracy threshold and their overall accuracy decrease significantly. In contrast, the three algorithms that use the minimum average composite entropy feature extraction module proposed in this paper degrade more gradually. The accuracy of the MACE + PFACNN model is over 15 % higher than that of the comparison models. It can also be observed that the model without the signal transformation module is more accurate than the model without the fault diagnosis module.
As shown in Table 6, under the condition of 5 dB complex noise, the fault accuracy, recall rate and F1 of the MACE + PFACNN algorithm are much higher than those of the comparison algorithm. Table 6 demonstrates the great advantages of the algorithm in this paper under the condition of complex noise.
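For a multi-class problem, accuracy, recall and F1 are typically derived from the confusion matrix. The sketch below assumes macro averaging over classes, which is a common choice but is not stated in the paper; the function name and the small confusion matrix in the usage note are hypothetical.

```python
def macro_metrics(confusion):
    """confusion[i][j] = number of class-i samples predicted as class j.

    Returns overall accuracy and macro-averaged precision, recall and F1.
    """
    n = len(confusion)
    total = sum(map(sum, confusion))
    accuracy = sum(confusion[i][i] for i in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = confusion[i][i]
        actual = sum(confusion[i])                     # samples that truly belong to class i
        predicted = sum(row[i] for row in confusion)   # samples assigned to class i
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

For a hypothetical two-class test with 30 samples per class, `macro_metrics([[30, 0], [3, 27]])` gives an accuracy and a macro recall of 0.95. With balanced classes such as the 30 test samples per category used here, overall accuracy and macro recall coincide.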
In actual production, bearings unavoidably operate under variable load conditions. To verify the generalization ability of the model under variable load, this paper uses bearing data from Xichang University for validation, selecting three load conditions ranging from 1 to 3 HP. Both the proposed model and the comparison models are run in 50 Monte Carlo simulations, and the results are averaged. The analysis results of the proposed model are shown in Fig. 14, and the experimental results are shown in Table 7.

Analysis results of the generalization ability.
Generalization experiment results – Accuracy [%].
Model | MACE + PFACNN | E+CNN | E+SVM | IF+CNN | M+DCNN |
---|---|---|---|---|---|
I-II | 95.8 | 86.5 | 88.7 | 92.1 | 93.4 |
II-I | 96.0 | 88.7 | 89.3 | 90.1 | 93.5 |
I-III | 94.9 | 89.9 | 89.6 | 91.1 | 94.3 |
III-I | 95.6 | 88.9 | 90.1 | 92.6 | 94.2 |
II-III | 96.6 | 90.1 | 89.4 | 91.4 | 94.9 |
III-II | 94.8 | 90.0 | 91.1 | 92.1 | 94.3 |
Mean | 95.6 | 89.0 | 89.7 | 91.5 | 94.1 |
In Fig. 14, I-II indicates that the model was trained with I as the source domain and tested for its generalization ability with II as the target domain. Table 7 shows the accuracy [%] of each method. As can be seen in Fig. 14 and Table 7, the average accuracy of E + CNN and E + SVM is below 90 %. The average accuracy of IF + CNN and M + DCNN is slightly higher than that of the former two. The proposed MACE + PFACNN model has an accuracy of 95.67 %, which is 1.53 % higher than the highest M + DCNN model among the comparison algorithms. Fig. 14 and Table 7 show that the MACE + PFACNN model has good generalization ability.
The test equipment consists of a SpectraQuest rotating machinery fault test bench, dynamic acceleration sensors from the Yangzhou Branch, and Bentley displacement sensors, together with LMS data acquisition equipment. MBER-12 K is used as the fault bearing model. Composite faults, which include ball, inner ring, and outer ring faults, are detected. The speed is kept constant at 2700 rpm, and the LMS SCADAS mobile data acquisition system acquires the vibration signals from the acceleration sensor at a sampling frequency of 12.8 kHz. Fig. 15 shows the layout of the experiment. Various faults are simulated in this experiment, including roller faults, inner ring faults, outer ring faults, and their combinations. Table 8 lists the classification of the dataset. Fig. 16 shows the confusion matrix and classification visualization of the MACE + PFACNN fault diagnosis.

Experimental layout.
1. Motor, 2. Coupling, 3. Acceleration sensor, 4. Bearing housing I, 5. Spindle, 6. Rotor, 7. Acceleration sensor, 8. Bearing housing II, 9. Bearing I, 10. Bearing II.
Dataset classification.
Fault location | Fault position | Tag | Training set | Test set |
---|---|---|---|---|
Regular | - | 1 | 70 | 30 |
Inner ring | - | 2 | 70 | 30 |
Outer ring | 90° | 3A | 70 | 30 |
Outer ring | 135° | 3B | 70 | 30 |
Regular | - | 4 | 70 | 30 |
Compound failure | outer 90° | 5A | 70 | 30 |
Compound failure | outer 135° | 5B | 70 | 30 |

Fault diagnosis results.
As can be seen in Fig. 16 and Fig. 17, the MACE + PFACNN algorithm classified all five categories with an accuracy of 100 %. More importantly, the classification accuracy is still 100 %.

Visualization results after diagnosis.
To further investigate the generalization ability of the model on the bench test data, we built two single-fault datasets by changing the fault location on the outer ring (Dataset 3A and Dataset 3B) and two composite-fault datasets in the same way (Dataset 5A and Dataset 5B). Both our method and the comparison methods were run for 50 Monte Carlo trials, and the average of the results was taken as the final result. Fig. 18 shows the four generalization tests, each with one operating condition as the source domain and another as the target domain. Specifically, 3A-3B indicates that Dataset 3A is the source data and Dataset 3B is the target data, and the other labels follow the same convention. The generalization test results are shown in Table 9 as the accuracy [%] of each method.

Analysis results of generalization ability.
Generalization experiment results – Accuracy [%].
Model | MACE + PFACNN | E+CNN | E+SVM | IF+CNN | M+ DCNN |
---|---|---|---|---|---|
3A-3B | 97.98 | 91.14 | 91.12 | 89.99 | 90.11 |
3B-3A | 93.64 | 92.13 | 88.96 | 90.96 | 89.11 |
5A-5B | 93.87 | 90.11 | 89.13 | 86.57 | 90.40 |
5B-5A | 92.01 | 89.41 | 90.11 | 88.76 | 89.13 |
Mean | 94.37 | 90.69 | 89.83 | 89.07 | 89.68 |
As shown in Fig. 18 and Table 9, the average classification accuracy of the four comparison models is about 90 %, while the accuracy of the model proposed in this paper is 94.37 %, which is 3.68 % higher than that of the best-performing E + CNN model among the comparison algorithms. Moreover, the classification accuracy of single faults in this model is higher than that of composite faults. Fig. 18 and Table 9 show that the model proposed in this paper has good generalization abilities.
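The mean accuracies and the margin over the best comparison model can be recomputed directly from the rows of Table 9. The snippet is only an arithmetic check; the variable names are hypothetical.

```python
# Accuracy [%] per transfer task (3A-3B, 3B-3A, 5A-5B, 5B-5A), copied from Table 9.
table9 = {
    "MACE + PFACNN": [97.98, 93.64, 93.87, 92.01],
    "E+CNN": [91.14, 92.13, 90.11, 89.41],
    "E+SVM": [91.12, 88.96, 89.13, 90.11],
    "IF+CNN": [89.99, 90.96, 86.57, 88.76],
    "M+DCNN": [90.11, 89.11, 90.40, 89.13],
}

# Mean accuracy of each model over the four transfer tasks.
means = {model: sum(values) / len(values) for model, values in table9.items()}

# Best comparison model and the margin of the proposed model over it.
baselines = {m: v for m, v in means.items() if m != "MACE + PFACNN"}
best_baseline = max(baselines, key=baselines.get)
margin = means["MACE + PFACNN"] - baselines[best_baseline]
```

The best comparison model is E+CNN, and the margin of roughly 3.68 percentage points matches the figure quoted above. The same table also shows that the proposed model's mean over the single-fault tasks (3A-3B, 3B-3A) is higher than its mean over the composite-fault tasks (5A-5B, 5B-5A).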
Given the low classification accuracy, low noise resistance, and weak generalization of intelligent diagnostic models for rotating machinery bearings, this paper focuses on the comprehensive optimization of the fault diagnosis models. The paper proposes an innovative fault signal feature extraction module based on MACE. This algorithm uses minimum composite entropy as a fitness function for the VMD key parameter optimization algorithm, which improves the quality of fault data. A novel PFACNN is also presented, which improves the accuracy of fault diagnosis. The main conclusions of this paper are:
Experimental data and bench tests show that the proposed MACE + PFACNN bearing fault diagnosis model achieves excellent classification accuracy with rates of 99.3 % and 100 %, respectively.
The fault signal processing improvement module based on MACE possesses high fault sensitivity and robustness to complex noise and effectively extracts fault signal features. It has a significant impact on the classification accuracy of the entire model.
Compared with the comparison model, the MACE + PFACNN bearing fault diagnosis model exhibits superior noise resistance, especially in the presence of composite noise, with resistance much higher than that of the comparison model.
In addition, the MACE + PFACNN bearing fault diagnosis model shows robust generalization ability under variable load conditions and variable fault locations. The classification accuracy for these two conditions is 1.53 % and 3.68 % higher than that of the best-performing comparison model, respectively.
The discussion of the generalization ability of the proposed fault diagnosis model covers only faults in different positions. The generalization ability at variable speed has not yet been investigated and will be the subject of further research. A promising approach is to convert the variable speed problem into a constant speed problem by using appropriate signal processing methods, e.g., tacholess order tracking (TLOT).
The proposed fault diagnosis model achieves good accuracy for fault classification only; it does not yet assess the severity of a bearing fault. Future research will extend the model to evaluate the fault degree.