GGLCM: A Real-time Early Anomaly Detection Method for Mechanical Vibration Data with Missing Labels
Published online: 22 Jul 2025
Pages: 157–163
Received: 28 May 2024
Accepted: 16 May 2025
DOI: https://doi.org/10.2478/msr-2025-0019
© 2025 Hu Yu et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
As maintenance requirements for heavy rotating machinery grow, so does the need for reliable and stable early anomaly detection [1].
Early anomaly detection aims to identify an abnormal mode in time by following the degradation pattern of the equipment. Unlike faults at a severe stage, early anomalies are difficult to detect because their signals closely resemble normal series, so detection methods are still at an early stage of development [2]. Current early anomaly detection methods fall into three categories: knowledge-driven, model-driven, and data-driven methods [3].
The knowledge-driven method centers on establishing an expert prognosis platform that uses prior knowledge for fault detection through knowledge reasoning. Its merit lies in its broad scope and extensibility. For example, Rojas et al. [4] proposed a knowledge-based fuzzy logic approach for predicting incipient faults in turbidity sensors at the water intake stage. Xu et al. [5] developed a belief rule-based expert system tailored for anomaly detection in marine diesel engines, which provides accurate results even at low particle concentrations.
In contrast, the model-driven method aims to construct mathematical models to explain early anomaly mechanisms. By incorporating considerations of the equipment structural mechanisms, this method achieves higher detection accuracy compared to the prevailing approaches. Angelo et al. [6] applied a particle filtering algorithm for early-stage fault diagnosis, while Wang et al. [7] developed a distribution network fault prediction mechanism based on minimum hitting set characteristics.
Unfortunately, in many industrial systems it is difficult to obtain expert prior knowledge or to build fault-specific mechanism models, which limits the practical use of knowledge-driven and model-driven methods. The data-driven method, in contrast, does not rely on prior knowledge or precise mechanism models. By leveraging historical data, it learns the mapping between fault causes and labels, which makes it suitable for implicit modeling [8], especially for early anomaly detection. Although data-driven methods are increasingly important for prediction, several issues remain. In addition to the frequently discussed challenges of early anomaly feature extraction and data imbalance caused by complicated working conditions, the sensitivity and real-time performance of the algorithms are often underestimated. Current state-of-the-art (SOTA) algorithms also tend to rely heavily on the inherent properties of the data.
To address these problems, this paper proposes a mathematical method for early anomaly detection, the Gramian gray level co-occurrence matrix (GGLCM) analysis method, for mechanical vibration data of limited quantity, especially data with missing or no labels. Inspired by pixel-level enhancement of image features, the method reconstructs real-time data from one dimension into two dimensions through the Gramian angular field (GAF), which preserves the time ordering of the series in the reconstructed images. The gray level co-occurrence matrix (GLCM) is then used to extract and fuse the real-time features. Finally, a continuous anomaly alarm mechanism compares these features against a threshold that is updated in real time from the features themselves, in order to locate the time at which an early anomaly begins. The main contributions are as follows:
- A GAF-based signal processing method for mechanical vibration data was developed. It effectively enhances the signal features through dimensional reconstruction and is more efficient than other post-enhancement processing methods.
- GLCM is applied for the first time to image-based analysis of mechanical equipment faults, which enables robust anomaly feature extraction.
- The developed real-time continuous alarm mechanism guarantees both the sensitivity and reliability of early anomaly detection.
In this section, we will first present the problem description. Then a detailed introduction to the GGLCM will be given. In addition, the crucial application methods are comprehensively explained with the help of mathematical logic.
Assume that a certain subset of the original dataset contains
GGLCM is a high-quality signal processing method based on fault signal dimension reconstruction and texture analysis of fault images. Its main components are a GAF fault dimension reconstruction algorithm based on segmented fault cycles and a GLCM algorithm for capturing and analyzing the texture features of fault images. The detailed algorithm is shown in Fig. 1.

Fig. 1. The diagram of the proposed GGLCM method.
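To make the dimension reconstruction step concrete, the following is a minimal sketch of a GAF encoding in Python with NumPy. It assumes the common formulation in which a segment is min-max rescaled to [-1, 1], mapped to angles by arccos, and expanded into a summation or difference field. The function name `gramian_angular_field`, the `method` parameter, and the rescaling choice are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def gramian_angular_field(segment, method="summation"):
    """Encode a 1D segment of length n as an n x n Gramian angular field.

    Assumed formulation (not confirmed by the paper): min-max rescaling
    to [-1, 1], polar-angle encoding, then pairwise angle sums/differences.
    """
    x = np.asarray(segment, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Constant segment: map every sample to 0 to avoid dividing by zero.
        scaled = np.zeros_like(x)
    else:
        # Rescale into [-1, 1] so that arccos is defined for every sample.
        scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    scaled = np.clip(scaled, -1.0, 1.0)  # guard against rounding overshoot
    phi = np.arccos(scaled)              # one angle per time step

    if method == "summation":
        return np.cos(phi[:, None] + phi[None, :])
    if method == "difference":
        return np.sin(phi[:, None] - phi[None, :])
    raise ValueError("method must be 'summation' or 'difference'")


# Hypothetical vibration segment: a decaying oscillation of 64 samples.
t = np.linspace(0.0, 1.0, 64)
image = gramian_angular_field(np.exp(-3.0 * t) * np.sin(40.0 * t))
```

Under these assumptions, a segment of length n yields an n × n image, so the segment length directly controls the size of the image passed on to texture analysis.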
Given training data {
Now we can represent the original input signal as
In each GGLCM, we have divided the potential feature into five parts: contrast (
Let the calculated feature quantities be represented as feature set
Next, we perform a PCA analysis on
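The texture-feature and fusion steps can be illustrated with a short NumPy sketch. Only contrast is named in the text above; the other four features used here (angular second moment, homogeneity, entropy, correlation), the number of gray levels, the pixel offset, and all function names are assumptions for illustration. The fusion takes the first principal component of the standardized features, which is one common way to apply PCA and may differ from the procedure used in the paper.

```python
import numpy as np


def quantize(image, levels=8):
    """Map a real-valued image onto integer gray levels 0..levels-1."""
    img = np.asarray(image, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros(img.shape, dtype=int)
    q = np.floor((img - lo) / (hi - lo) * levels).astype(int)
    return np.minimum(q, levels - 1)


def glcm(gray, levels=8, offset=(0, 1)):
    """Symmetric, normalized co-occurrence matrix for one non-negative offset."""
    gray = np.asarray(gray, dtype=int)
    dr, dc = offset
    rows, cols = gray.shape
    first = gray[: rows - dr, : cols - dc]   # reference pixels
    second = gray[dr:, dc:]                  # neighbors at the given offset
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (first.ravel(), second.ravel()), 1.0)
    counts += counts.T                       # count each pair in both directions
    return counts / counts.sum()


def texture_features(p):
    """Five texture features of a normalized GLCM (feature choice is assumed)."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    asm = np.sum(p ** 2)                     # angular second moment
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    if sd_i * sd_j > 0:
        correlation = np.sum(p * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    else:
        correlation = 1.0                    # single gray level: fully correlated
    return np.array([contrast, asm, homogeneity, entropy, correlation])


def first_principal_component(features):
    """Fuse a (samples x features) matrix into one score per sample via PCA."""
    x = np.asarray(features, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0                      # leave constant columns at zero
    z = (x - x.mean(axis=0)) / std
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[0]                         # projection onto the first component


# Small integer test image with four gray levels.
demo = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]])
demo_features = texture_features(glcm(demo, levels=4))
```

In a streaming setting, one feature vector would be computed per reconstructed image and the fused score tracked over time. Note that the sign of a principal component is arbitrary, so a fixed orientation would need to be chosen before thresholding.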
We have designed a robust continuous fault alarm mechanism. Given as a calculated current fusion parameter denoted as
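One way to realize a continuous alarm with a threshold that is updated in real time is sketched below. This is an assumed scheme, not the paper's algorithm: the threshold is recomputed from the history of fused feature values as the mean plus `k` standard deviations, and an alarm is raised only after several consecutive exceedances. The function name `continuous_alarm` and the values of `k`, `consecutive`, and `warmup` are hypothetical.

```python
import numpy as np


def continuous_alarm(values, k=3.0, consecutive=5, warmup=20):
    """Return the index where the first sustained exceedance starts, or None.

    Assumed rule: a sample exceeds the threshold when it is larger than
    mean + k * std of the accepted history; `consecutive` exceedances in a
    row trigger the alarm. The first `warmup` samples only build history.
    """
    history = []
    run = 0
    for t, value in enumerate(values):
        if len(history) >= warmup:
            threshold = np.mean(history) + k * np.std(history)
            if value > threshold:
                run += 1
                if run >= consecutive:
                    return t - consecutive + 1
                # Suspected anomalies are kept out of the history so that
                # they do not inflate the threshold.
                continue
            run = 0  # an isolated spike resets the counter
        history.append(value)
    return None


# Hypothetical fused-feature stream: stable level, then a sustained shift.
stream = [1.0, 1.1] * 25 + [5.0] * 10
onset = continuous_alarm(stream)
```

Requiring a run of exceedances trades a short detection delay for fewer false alarms from isolated spikes, which matches the stated goal of balancing sensitivity and reliability.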
In this paper, the proposed method was validated using two datasets:
- The public XJTU-SY bearing dataset [9].
- A private dataset of overloaded gearboxes from Nanjing High Speed Gears (NGC) Co., Ltd.

The NGC experimental platform is shown in Fig. 2.

Fig. 2. NGC overloaded gearbox fatigue life test bench.
As shown in Fig. 3, cracks (3 mm long, 0.2 mm wide, 0.2 mm deep) are located in the output stage, on the inner and outer races of the third sun gear near the output-side bearing. Three holes (0.15 mm in diameter, 1.5 mm deep) are drilled in the circumferential direction, targeting selected rolling elements. The parameters of the test gearbox are listed in Table 1, and the fatigue life test procedure of the steel press gearbox is shown in Fig. 4.

Fig. 3. Schematic diagram of fault implantation.

Fig. 4. Fatigue life test procedure of a steel press gearbox.
Table 1. The parameters of the test gearbox.
Parameters | Parameter values |
---|---|
Type | MPGH2H900K63H |
Rotation speed ratio | 62.402 |
Rated power [kW] | 710 |
Input rotation speed [rpm] | 1 490 |
Maximum input torque [N·m] | 10 032 |
Maximum output torque [kN·m] | 626 |
Lubricating oil type | ISO VG 320 |
Oil flow rate of lubrication station [l/min] | 125 |
Oil mass [l] | 230 |
The proposed method is implemented without any machine learning framework and was validated on an NVIDIA RTX 3090 GPU and an Intel Xeon Gold 6248R CPU @ 3.00 GHz.
This study focuses on validating the proposed approach in terms of accuracy and generalization under different operating conditions. Two datasets were selected from each operating condition of the XJTU-SY dataset, specifically Bearing1_1, Bearing1_3, Bearing2_2, Bearing2_4, Bearing3_1, and Bearing3_4. Table 2 shows the detailed results for the XJTU-SY datasets with missing labels. As new data were gradually added, the predicted labels remained unchanged, which indicates a low false alarm rate and high reliability of the method. Fig. 5 illustrates clear deterioration trends that closely follow the actual state of the respective bearings, showing that GGLCM is capable of detecting early, subtle fault shocks.

Fig. 5. The temporal evolution of variable
Table 2. The detailed results for XJTU-SY datasets.
Subject | Input ratio [%] | | | +10 % | +20 % | +30 % | Acc* [%], +0 % | Acc* [%], +10 % | Acc* [%], +20 % | Acc* [%], +30 % |
---|---|---|---|---|---|---|---|---|---|---|
Bearing1_1 | 70 | 4 393.95 | 4 712.06 | 4 712.06 | 4 712.06 | 4 712.06 | 95.69 | 95.69 | 95.69 | 95.69 |
Bearing1_3 | 70 | 5 578.64 | 5 568.23 | 5 568.23 | 5 568.23 | 5 568.23 | 99.81 | 99.81 | 99.81 | 99.81 |
Bearing2_2 | 70 | 2 848.64 | 2 949.91 | 2 949.91 | 2 949.91 | 2 949.91 | 96.44 | 96.44 | 96.44 | 96.44 |
Bearing2_4 | 70 | 1 798.43 | 1 830.34 | 1 830.34 | 1 830.34 | 1 830.34 | 98.22 | 98.22 | 98.22 | 98.22 |
Bearing3_4 | 70 | 55 195.86 | 56 648.44 | 56 648.44 | 56 648.44 | 56 648.44 | 97.37 | 97.37 | 97.37 | 97.37 |
Bearing3_5 | 70 | 1 346.31 | 1 435.74 | 1 435.74 | 1 435.74 | 1 435.74 | 93.36 | 93.36 | 93.36 | 93.36 |
In addition, the NGC dataset with missing labels is used to compare the method against SOTA approaches. Fig. 6 shows that the proposed method maintains robust detection accuracy even under the challenging working conditions of heavy load and high speed, thus providing effective early warning.

Fig. 6. The detection result on the NGC dataset.
Three SOTA methods are compared on the NGC dataset: self-adaptive deep feature matching (SDFM) [10], time-series transfer learning (TSTL) [11], and GGLCM.
Fig. 7 shows that the average detection accuracy of GGLCM is higher than that of SDFM and TSTL by 9.9 % and 4.9 %, respectively. GGLCM also has a clear advantage in the procrastination ratio, primarily because it minimizes the dependence on computer hardware resources and timing labels.

Fig. 7. Performance comparison of three SOTA methods.
In summary, the GGLCM method effectively detects early anomalies under label scarcity: by using GAF to transform 1D vibration signals into 2D images, combined with GLCM for texture extraction and a continuous alarm mechanism, it achieves >93 % accuracy on the XJTU-SY and NGC datasets. It outperforms SOTA methods such as SDFM and TSTL and adapts to industrial environments with scarce labels. Future work could explore deep learning integration, multi-sensor fusion, or edge computing optimization for real-time use.
This paper presents the GGLCM method, which utilizes GAF to convert 1D mechanical vibration signals into 2D feature graphs and applies GLCM analysis to extract texture features, complemented by a continuous alarm mechanism for early anomaly detection under label scarcity. Experimental results with different datasets demonstrate the superior performance of GGLCM: in the XJTU-SY bearing dataset, GGLCM achieves a detection accuracy of 95.69–99.81 % for early faults, with an average of 97.3 %, while the industrial NGC heavy-load gearbox dataset has an accuracy of 98.7 % in label-scarce scenarios.
Compared to SOTA methods, GGLCM outperforms SDFM by 9.9 % and TSTL by 4.9 % in overall accuracy, with a procrastination ratio reduced by 30 % compared to conventional approaches.
Future research could focus on integrating GGLCM with lightweight neural networks to automate feature optimization (targeting a 15 % reduction in computational latency), extending it to multi-sensor fusion for complex machinery systems, or optimizing it for edge devices to enable low-power, real-time deployment in resource-constrained industrial environments. These advancements would further cement GGLCM as a robust, data-efficient solution for early anomaly detection in rotating machinery.