Implementation and Evaluation of Machine Learning Algorithms in Ball Bearing Fault Detection
Published online: 12 Apr. 2025
Page range: 22 - 29
Received: 12 Apr. 2024
Accepted: 07 Mar. 2025
DOI: https://doi.org/10.2478/msr-2025-0004
Keywords
© 2025 Pavle Stepanić et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Monitoring and predicting the condition of industrial systems and processes during operation has become the subject of increasingly intensive research in recent years, as it forms the basis of the Condition-Based Maintenance and Predictive Maintenance concepts, which are integral to Industry 4.0 [1]. The early detection of malfunctions allows operators to rectify the problem as part of regular maintenance and thus avoid additional, unplanned, and often very costly downtime due to sudden failures. Early fault detection implies a sufficiently long warning time, and thus more time for appropriate corrective action and rational planning of system repair. This translates directly into cost and time savings while increasing the productivity of the entire process.
Investigations of the most common failures of electrical equipment in industrial processes and electric motor drives [2] have found that significant production losses are caused by faults in electric motors. Various factors can contribute to motor failure, such as environmental influences (dust, temperature, vibrations), production or design errors, improper installation, incorrect use, wear due to long-term operation, abrasion, erosion, and material aging. Which motor components fail most frequently varies from study to study, but three main categories stand out: stator failure, rotor failure, and bearing failure [3], [4]. Several studies [5], [6], [7], [8] have concluded that the majority of electrical machine failures are related to bearing failures.
Bearings are critical elements in machinery as they support rotating structures and much of the energy is transmitted through bearings during operation. Bearing failures lead to other major and more expensive machine failures. Therefore, bearings in most machines are regularly maintained and replaced to prevent major breakdowns. On the other hand, frequent, unnecessary disassembly and assembly of electrical machines is also a major cause of failures. The advantages of the condition-based maintenance concept for bearings, which is based on the monitoring and control of selected condition parameters (e.g., vibrations and sound) and only intervenes if the condition parameters exceed defined limit values, are therefore extremely beneficial for increasing the productivity of the industrial plant. In this way, the effective lifetime of the entire system is increased and unnecessary downtime is avoided.
As bearing failures are in most cases associated with increased vibration levels, vibration monitoring is crucial for the detection of bearing failures. However, the choice of methods for vibration monitoring and the probability of fault detection depends largely on the chosen vibration signal processing technique. Vibration signals are typically noisy due to interference from other vibration sources. One of the key steps in acquiring fault information is to increase the signal-to-noise ratio, which is difficult to achieve without processing vibration signals [9].
From a mathematical point of view, bearing fault detection is a process in which data from the measurement space and/or characteristic features are mapped into the fault space. The mapping process actually represents a problem of pattern recognition and classification [10],[11],[12]. For intelligent fault diagnosis methods to be widely used in industrial machinery, the applied techniques must be general and the system must be adjustable, calibrated, and autonomous. The integration of the mentioned features is a significant challenge when implementing intelligent fault diagnosis systems in plants. In the last thirty years, with the development of computer and information technologies, various methods for fault diagnosis have been developed, which can be categorized into three groups [13],[14],[15],[16],[17],[18],[19]:
the statistical approach, the application of artificial intelligence, and model-based techniques.
Two approaches can be recognized in the application of artificial intelligence techniques: classical machine learning (ML) algorithms and deep learning (DL) algorithms. Classical ML methods have been used successfully for decades to detect and categorize bearing faults. Developments in DL algorithms over the last ten years have demonstrated the superiority of DL-based methods in fault feature extraction and classification. Most of the literature applying ML algorithms reports satisfactory results with classification accuracy above 90 %. To achieve even better performance under versatile operating conditions and in noisy environments, DL-based methods are becoming increasingly popular. The effectiveness of ML and DL algorithms in bearing fault diagnostics relies on a strong correlation of the extracted features with the underlying physical laws. Both approaches (ML or DL) are justified in research and depend on the problem under consideration. To optimize the selection of algorithms for bearing fault detection, the authors of [20] propose a structured approach consisting of three key considerations:
Data size and quality: A thorough understanding of environmental conditions and operational variations is crucial for data collection. Similar to the setup environment considerations, data collected under simple, controlled conditions can be effectively analyzed using classical ML models. Conversely, datasets characterized by significant noise, variations in operating conditions, or abrupt changes require robust DL architectures with sufficient data augmentation and noise-resistant features.
By systematically considering these factors, engineers and researchers can make informed decisions about the optimal ML/DL approach for bearing fault diagnosis, ensuring higher accuracy and reliability in real-world applications.
The success of a model in performing the task for which it was developed is its most important characteristic; hence the tendency to continuously improve models. Research on the application of classical ML algorithms in bearing fault diagnosis shows a trend towards integrating advanced feature extraction methods (e.g., wavelet transforms, the fast Fourier transform (FFT)) and optimization algorithms (e.g., genetic algorithms (GA), principal component analysis (PCA)) to improve model performance. Table 1 provides an overview of some studies of this type, with emphasis on success in performing the task.
Table 1. Hybrid ML models.

| Applied ML model | Research description |
|---|---|
| SVM with GA | A study applied SVM combined with GA to develop optimal classifiers for distinguishing healthy and faulty bearings in ASD systems, achieving 97.5 % accuracy [21]. |
| SVM and ANN with CWT | This study explored the use of SVM and ANN alongside CWT to analyze frame vibrations during motor start-up, achieving 96.67 % accuracy with SVM and 90 % with ANN [22]. |
| PCA and SVDD | PCA and SVDD were used to predict bearing failures, achieving 93.45 % accuracy [23]. |
| GA-based SVM | A GA-based kernel discriminative feature analysis was combined with one-against-all multicategory SVMs (OAA MCSVMs) for fault diagnosis in low-speed bearings, achieving the highest reported accuracy of 98.66 % [24]. |
| FEM and WPT with SVM | A hybrid approach integrating FEM, WPT, and SVM was proposed for fault classification, achieving 81 % accuracy for inner race faults and 79 % for rolling body faults [25]. |
| FFT-based feature extraction with SVM | Frequency domain features derived from FFT were used to train an SVM model for bearing fault classification, achieving 87.35 % accuracy [26]. |
Note: SVM - support vector machine; GA - genetic algorithm; ASD - adjustable speed drive; ANN - artificial neural network; CWT - continuous wavelet transform; PCA - principal component analysis; SVDD - support vector data description; FEM - finite element method; OAA - one-against-all; MCSVM - multicategory SVM; WPT - wavelet packet transform; FFT - fast Fourier transform.
Zhang et al. (2020) presented a valuable study in which they systematically summarized the existing literature on bearing fault diagnosis using DL algorithms [20]. Their extensive analysis provides a detailed overview of the applications of DL algorithms and their success rates, reaching about 99 % in some studies.
In this paper, the results of using ML algorithms to detect ball bearing faults by analyzing vibration signals are presented. In contrast to the research approaches analyzed above (ML, DL, hybrid ML models), this study focuses on hyperparameter optimization of ML algorithms to improve classical ML models, which represents a novel contribution to ball bearing fault detection. Of particular value in this study are the results of the optimized ML models, which show maximum performance in detecting ball bearing failures. The paper is divided into five sections. Section 2 describes the experimental part, i.e., the acquisition of vibration signals using the developed ball bearing test bench, and the digital processing of the collected data by extracting suitable statistical parameters in the time domain. Section 3 describes the operating principles of the ML algorithms used for fault detection, K-nearest neighbor (KNN) and support vector machine (SVM), and presents the results of classifying the bearings as healthy or defective. Section 4 presents the optimization of the hyperparameters of the ML algorithms to improve their performance. Section 5 contains concluding remarks.
Fig. 1 shows the system used to detect and record rolling bearing vibrations. The vibrations are recorded by two sensors mounted radially on the bearing housing (sensor 1 and sensor 2, placed at a 90° angle to each other). The sensors are piezoelectric accelerometers manufactured by Brüel & Kjær. The low-level signals from the accelerometers are fed through preamplifiers (Charge Amplifier Type 2635, Brüel & Kjær), whose outputs were connected to two channels of the acquisition and recording system (TEAC LX-10 Recording Unit). The sampling frequency per channel was set to 96 kHz. The LX-10 was connected to a PC, on which the recording parameters of the TEAC unit were set using the dedicated LX Navi software. The vibrations were measured at a shaft rotation speed of 1518 rpm, i.e., 25.3 Hz. The collected digital signals were recorded on the PC and then processed using standard digital signal processing tools.
By recording vibration signals, a database of damaged and healthy bearings, i.e. the training set, is formed. An experimental recording of the vibration signals of 196 bearings was performed. Of the 66 undamaged bearings, 44 bearings were new, while 22 healthy bearings were already in use. The remaining 130 bearings were damaged. The vibrations of each bearing were recorded simultaneously over a period of 10 seconds at a sampling frequency of 96 kHz using two radially arranged sensors.

Fig. 1. The developed ball bearing test station.
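As an illustration only, the following minimal Python sketch shows how such recordings could be organized for later processing; the file format, directory layout, and loader are hypothetical assumptions, not the authors' actual acquisition software, and only the sampling parameters (96 kHz, 10 s, two channels) come from the description above.

```python
import numpy as np

FS = 96_000                    # sampling frequency per channel [Hz]
DURATION_S = 10                # recording length per bearing [s]
N_SAMPLES = FS * DURATION_S    # 960,000 samples per channel

def load_recording(path):
    """Hypothetical loader: interleaved float32 samples for sensor 1 and
    sensor 2 stored in a single binary file (the format is an assumption)."""
    raw = np.fromfile(path, dtype=np.float32)
    return raw.reshape(-1, 2)[:N_SAMPLES]   # shape: (samples, 2 sensors)

# Label convention used in the sketches that follow: 0 = healthy, 1 = damaged.
```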
Next, feature extraction was performed for each recorded vibration signal by digital processing in the time domain. The following statistical parameters were used to describe the vibration signals: the arithmetic mean, the root mean square, the modified square mean, the skewness index, the kurtosis index, and the I factor.
Based on the aforementioned statistical parameters, a 9-dimensional feature vector is formed to represent the characteristics of the recorded vibration signals of bearings. The nine extracted features thus serve as inputs, while the classification into healthy bearings or bearings with faults represents the outputs for the developed ML-based algorithms.
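A minimal sketch of such time-domain feature extraction is shown below, using standard textbook definitions of the listed statistics; the paper's exact formulas for the modified square mean and the I factor are not reproduced, so those features (and the full 9-dimensional composition) are omitted here rather than guessed.

```python
import numpy as np
from scipy import stats

def time_domain_features(x):
    """Standard time-domain statistics of one vibration channel."""
    mean = np.mean(x)                           # arithmetic mean
    rms = np.sqrt(np.mean(x ** 2))              # root mean square
    skewness = stats.skew(x)                    # skewness index
    kurtosis = stats.kurtosis(x, fisher=False)  # kurtosis index
    return np.array([mean, rms, skewness, kurtosis])

# One feature vector per bearing could, for example, concatenate both sensor channels:
# features = np.concatenate([time_domain_features(sig[:, 0]),
#                            time_domain_features(sig[:, 1])])
```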
The world of ML is vast and includes many techniques and algorithms that can be used in various applications. The present research problem calls for ML algorithms designed for classification, which belong to the category of supervised learning and therefore require an adequately labeled dataset. The data presented in Section 2 were processed using two ML algorithms: KNN and SVM.
The KNN algorithm is a popular and simple supervised learning algorithm. KNN classifies an unknown instance by finding the k instances from the training set that are closest to it with respect to a selected metric and assigning it the class that is most common among those k instances. Let X denote the matrix of instances from the training set and Y the matrix of instances to be classified; the rows of these matrices are treated as row vectors x and y. The following distance metrics were used:

Euclidean distance: $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

Minkowski distance: $d(x, y) = \left( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^{p} \right)^{1/p}$

Cosine distance: $d(x, y) = 1 - \dfrac{x y^{\mathsf{T}}}{\lVert x \rVert \, \lVert y \rVert}$
Several variants of the KNN algorithm were used for the classification of ball bearings. Table 2 shows all variants of the KNN algorithms used and their success rate in classifying the tested ball bearings.
Table 2. The k-nearest neighbor (KNN) classification.

| Model No | Model name | Distance metric | Distance weight | Number of neighbors | Classification success rate |
|---|---|---|---|---|---|
| 1 | Cosine KNN | Cosine | Equal | 10 | 98.5 % |
| 2 | Coarse KNN | Euclidean | Equal | 100 | 74 % |
| 3 | Fine KNN | Euclidean | Equal | 1 | 97.4 % |
| 4 | Weighted KNN | Euclidean | Squared inverse | 10 | 97.4 % |
| 5 | Medium KNN | Euclidean | Equal | 1 | 98 % |
| 6 | Cubic KNN | Minkowski | Equal | 10 | 98.2 % |
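For illustration, a minimal scikit-learn sketch of a model corresponding to the best-performing variant in Table 2 (cosine distance, 10 neighbors, equal weights) might look as follows; the feature matrix X and label vector y are assumed to come from the extraction step in Section 2, and the cross-validation scheme is an assumption rather than the paper's exact evaluation protocol.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Cosine KNN analogous to model 1 in Table 2.
# X: (n_bearings, n_features) feature matrix, y: labels (0 = healthy, 1 = damaged)
knn = KNeighborsClassifier(n_neighbors=10, metric='cosine', weights='uniform')
scores = cross_val_score(knn, X, y, cv=5)
print(f"Mean cross-validated accuracy: {scores.mean():.3f}")
```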
SVM is one of the key ML methods and is based on a clear geometric intuition. The SVM algorithm works by constructing a decision boundary, or hyperplane, between the data to separate the different classes. The main problem of the SVM algorithm is that a hyperplane may not be an appropriate boundary between the classes, since in practice the boundary shapes can be arbitrary. The soft margin formulation does not solve this problem either: it assumes that a hyperplane approximates the shape of the boundary reasonably well and merely tolerates some data ending up on the wrong side. If, for example, the true boundary is circular, a straight line is clearly not a good approximation. Therefore, kernel functions are needed to obtain different shapes of decision boundaries according to the arrangement of the data. Kernel functions map the data to a different, often higher-dimensional space, with the expectation that the classes will be easier to separate after this transformation; complex non-linear decision boundaries may thus reduce to linear boundaries in the mapped feature space. The kernel functions used for the development of the SVM classification models are given below:
Radial basis function (RBF): $K(x_i, x_j) = \exp\!\left(-\dfrac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right)$

Linear: $K(x_i, x_j) = x_i^{\mathsf{T}} x_j$

Polynomial: $K(x_i, x_j) = \left(x_i^{\mathsf{T}} x_j + c\right)^{d}$
Three types of SVM algorithms were used for the classification of ball bearings. Table 3 shows the variants of the SVM algorithms used and their success rate in classifying the tested ball bearings.
Table 3. SVM classification.

| Model No | Kernel function | Classification success rate |
|---|---|---|
| 1 | Linear | 93.9 % |
| 2 | Polynomial (quadratic) | 99.5 % |
| 3 | RBF | 99 % |

Note: RBF - radial basis function
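A minimal scikit-learn sketch of a model corresponding to the best-performing variant in Table 3 (second-order polynomial kernel) is given below; as before, X and y are the assumed feature matrix and labels, and the standardization and cross-validation choices are assumptions, not the paper's exact settings.

```python
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Quadratic-kernel SVM analogous to model 2 in Table 3.
# Feature scaling is included because kernel SVMs are sensitive to feature scale.
svm = make_pipeline(StandardScaler(), SVC(kernel='poly', degree=2))
print(cross_val_score(svm, X, y, cv=5).mean())
```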
The choice of quality measures for the models depends on the type of problem to be solved and the desired results. In this case, general performance measures, namely the confusion matrix and the receiver operating characteristic (ROC) curve, are used for the developed models (Fig. 2 and Fig. 3).

Fig. 2. Confusion matrix for the cosine KNN algorithm.

Fig. 3. ROC curve for the cosine KNN algorithm.
All KNN models achieved high classification success rates; the highest, 98.5 %, was achieved by the cosine KNN model, which uses the cosine distance metric with 10 neighbors. According to the confusion matrix, the cosine KNN model produced 128 correct and 2 incorrect classifications for the first class (damaged ball bearings) and 65 correct and 1 incorrect classification for the second class (healthy ball bearings). The ROC curve is additionally used as a measure of classifier quality. For each threshold, two values are calculated: the true positive ratio (TPR) and the false positive ratio (FPR). For a given class, $\mathrm{TPR} = \mathrm{TP}/(\mathrm{TP} + \mathrm{FN})$ and $\mathrm{FPR} = \mathrm{FP}/(\mathrm{FP} + \mathrm{TN})$.
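The following sketch shows how these two quality measures could be computed with scikit-learn for the cosine KNN model from the earlier sketch; the cross-validated prediction scheme is an assumption, and class 1 (damaged) is taken as the positive class.

```python
from sklearn.metrics import confusion_matrix, roc_curve, roc_auc_score
from sklearn.model_selection import cross_val_predict

# Predicted labels and class-1 probabilities obtained via cross-validation.
y_pred = cross_val_predict(knn, X, y, cv=5)
y_score = cross_val_predict(knn, X, y, cv=5, method='predict_proba')[:, 1]

print(confusion_matrix(y, y_pred))        # rows: true class, columns: predicted class
fpr, tpr, thresholds = roc_curve(y, y_score, pos_label=1)
print("AUC:", roc_auc_score(y, y_score))
```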
Similar to the KNN algorithms, the developed SVM models have also shown high classification success rates. The best performing SVM model is the SVM algorithm with a second-order polynomial kernel function, which achieves a success rate of 99.5 %. The aforementioned model provided only one incorrect classification, which belonged to the first class (damaged ball bearings) (Fig. 4 and Fig. 5).

Fig. 4. Confusion matrix for the quadratic SVM algorithm.

Fig. 5. ROC curve for the quadratic SVM algorithm.
Although the ML algorithms presented in Section 3 have shown satisfactory performance in solving the classification problem, possible improvements are considered here. ML algorithms have hyperparameters that can be adjusted to maximize the performance of the developed model. Hyperparameter optimization has become increasingly prevalent in both scientific and commercial approaches, with the same goal: to obtain a more accurate model. The adjustment of hyperparameters in ML algorithms belongs to the category of optimization problems. Research in hyperparameter optimization has a relatively long history, spanning more than 30 years [27], [28], [29], [30], and it has been shown that different configurations of hyperparameters give the best results for different datasets [28].
The first step in formulating the optimization problem is to define the participants in the optimization process: the ML algorithm, its hyperparameters together with their admissible ranges (the hyperparameter search space), and an objective function, typically a validation loss or classification error, that is to be minimized over this space.
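In the usual notation of the hyperparameter optimization literature (an assumption here, since the paper's own symbols are not reproduced), the problem can be written as

$\lambda^{*} = \arg\min_{\lambda \in \Lambda} \mathcal{L}\big(A_{\lambda},\, D_{\mathrm{train}},\, D_{\mathrm{valid}}\big)$,

where $A_{\lambda}$ denotes the ML algorithm configured with hyperparameters $\lambda$ from the search space $\Lambda = \Lambda_{1} \times \dots \times \Lambda_{N}$, and $\mathcal{L}$ is the validation loss.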
Hyperparameter optimization of ML algorithms can be performed with any black-box optimization method, and global optimization algorithms are preferred due to the non-convex nature of the problem. Bayesian optimization is a global optimization method that is very popular for hyperparameter optimization of ML algorithms; its mechanisms are presented by Shahriari et al. [32] and Brochu et al. [33]. Bayesian optimization iterates between two key components: a probabilistic surrogate model and an acquisition function. The probabilistic surrogate model represents the objective function, while the acquisition function (2) uses the predictive distribution of the surrogate model to determine the utility of the different candidate points.
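The specific acquisition function denoted (2) above is not reproduced here; as a representative example (an assumption, not necessarily the form used in the paper), the widely used expected improvement criterion is

$\alpha_{\mathrm{EI}}(\lambda) = \mathbb{E}\big[\max\big(f_{\min} - f(\lambda),\, 0\big)\big]$,

where $f_{\min}$ is the best objective value observed so far and the expectation is taken over the surrogate model's predictive distribution at $\lambda$.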
The hyperparameter optimization was performed for the algorithms used in Section 3, i.e., for the SVM and KNN algorithms. The optimization of both types of algorithms resulted in models with 100 % accuracy, i.e., the optimized models produced no incorrect classifications. The optimization of the SVM algorithm was performed using the presented Bayesian hyperparameter optimization. Table 4 shows the ranges of the optimized parameters.
Table 4. SVM hyperparameter search range.

| Hyperparameter | Range |
|---|---|
| Box constraint level | 0.001–1000 |
| Kernel scale | 0.001–1000 |
| Kernel function | Gaussian, Linear, Quadratic, Cubic |
| Standardize data | true, false |
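The hyperparameter names in Table 4 (box constraint level, kernel scale) follow a MATLAB-style SVM parameterization; purely as an illustrative alternative, a comparable Bayesian search could be set up in Python with scikit-optimize roughly as follows, where the mapping of the box constraint to C and the kernel scale to gamma, as well as the representation of the kernel choices, are assumptions.

```python
from skopt import BayesSearchCV
from skopt.space import Real, Categorical
from sklearn.svm import SVC

# Search space loosely mirroring Table 4.
search_space = {
    'C': Real(1e-3, 1e3, prior='log-uniform'),      # box constraint level
    'gamma': Real(1e-3, 1e3, prior='log-uniform'),  # kernel scale
    'kernel': Categorical(['rbf', 'linear', 'poly']),
}
opt_svm = BayesSearchCV(SVC(), search_space, n_iter=30, cv=5, random_state=0)
opt_svm.fit(X, y)
print(opt_svm.best_params_, opt_svm.best_score_)
```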
After 78 seconds and 30 iterations, the optimization process was stopped, as shown in Fig. 6. The optimization results as well as the characteristics of the optimized SVM model can be found in Table 5.

Fig. 6. The optimization flow for the SVM algorithm.
Table 5. The optimized hyperparameters of the SVM algorithm.

| Hyperparameter | Value |
|---|---|
| Box constraint level | 977.88 |
| Kernel scale | 1 |
| Kernel function | Quadratic |
| Standardize data | true |
| Accuracy | 100 % |
The optimization of the KNN algorithm was also performed using the presented Bayesian hyperparameter optimization. Table 6 contains the ranges of the parameters being optimized.
Table 6. KNN hyperparameter search range.

| Hyperparameter | Range |
|---|---|
| Number of neighbors | 1–98 |
| Distance metric | Euclidean, Cosine, Correlation, Chebyshev, Hamming, Minkowski, Spearman, Jaccard, City block, Mahalanobis |
| Distance weight | Equal, Inverse, Squared inverse |
| Standardize data | true, false |
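Analogously to the SVM case, a Bayesian search over a comparable KNN space could be sketched with scikit-optimize as below; only a subset of the distance metrics and weighting schemes in Table 6 is representable with scikit-learn's options, so the space shown is an approximation rather than the paper's exact search space.

```python
from skopt import BayesSearchCV
from skopt.space import Integer, Categorical
from sklearn.neighbors import KNeighborsClassifier

# Search space loosely mirroring Table 6.
knn_space = {
    'n_neighbors': Integer(1, 98),
    'metric': Categorical(['euclidean', 'cosine', 'correlation',
                           'chebyshev', 'minkowski', 'cityblock']),
    'weights': Categorical(['uniform', 'distance']),
}
opt_knn = BayesSearchCV(KNeighborsClassifier(), knn_space, n_iter=30, cv=5, random_state=0)
opt_knn.fit(X, y)
print(opt_knn.best_params_, opt_knn.best_score_)
```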
After about 35 seconds and 30 iterations, the optimization process was stopped, as shown in Fig. 7. The optimization results and the characteristics of the optimized KNN model are listed in Table 7.

Fig. 7. The optimization flow for the KNN algorithm.
Table 7. The optimized hyperparameters of the KNN algorithm.

| Hyperparameter | Value |
|---|---|
| Number of neighbors | 1 |
| Distance metric | Correlation |
| Distance weight | Inverse |
| Standardize data | true |
| Accuracy | 100 % |
The paper presents a novel method for fault detection in ball bearings using two types of ML algorithms: KNN and SVM. The datasets for the development of the algorithms were obtained through vibration measurements at the developed station for capturing, recording, and analyzing vibrations in ball bearings. The first part of the research on the application of said ML algorithms in ball bearing fault detection involved experimentally varying the algorithm parameters to obtain a more precise classifier model. This approach resulted in highly accurate algorithms: the KNN classifier achieved an accuracy of 98.5 % and the SVM classifier achieved an accuracy of 99.5 %. The second part of the research consisted of optimizing the hyperparameters of the two applied ML algorithms. Bayesian optimization was used as the optimization method. The result of the applied optimization was hyperparameters that led to both the KNN and SVM algorithms achieving a maximum success rate of 100 %.
The approach of applying hyperparameter optimization to ML algorithms has proven to be fully justified as the obtained classifier models achieved maximum accuracy. The methodology presented in this paper is applicable to a wide range of challenges in industrial diagnostics and predictive maintenance.