GGLCM: A Real-time Early Anomaly Detection Method for Mechanical Vibration Data with Missing Labels

In view of the increasing maintenance requirements for heavy rotating machinery, there is an increased need for improved reliability and stability in early anomaly detection [1].

Early anomaly detection can predict the abnormal mode in time according to the degradation rules. Unlike serious stage fault, early anomaly detection is difficult due to its great similarity with normal series. Therefore, the detection method is still in the initial stage [2]. Current early anomaly detection methods are divided into the following three categories: knowledge-driven method, model-driven method, and data-driven method [3].

The knowledge-driven method centers on establishing an expert prognosis platform that utilizes prior knowledge for fault detection through knowledge reasoning. Its merit lies in improving the scope and extension. In particular, Rojas et al. [4] proposed a knowledge-based fuzzy logic approach for predicting incipient faults in turbidity sensors at the water intake stage. Xu et al. [5] developed a belief rule-based expert system tailored for anomaly detection in marine diesel engines, providing accurate results even at low particle concentrations.

In contrast, the model-driven method aims to construct mathematical models to explain early anomaly mechanisms. By incorporating considerations of the equipment structural mechanisms, this method achieves higher detection accuracy compared to the prevailing approaches. Angelo et al. [6] applied a particle filtering algorithm for early-stage fault diagnosis, while Wang et al. [7] developed a distribution network fault prediction mechanism based on minimum hitting set characteristics.

Unfortunately, in many industrial systems, it is difficult to obtain expert prior knowledge and build fault-specific mechanism models. Consequently, the aforementioned knowledge-driven and model-driven methods have practical limitations. However, the data-driven method provides effective insights into early anomaly detection as it does not rely on prior knowledge or precise mechanism models. By leveraging historical data, this approach reveals the mapping relationship between the fault cause and the label, making it suitable for implicit modeling [8], especially in the context of early anomaly detection. Despite the increasing importance of data-driven methods for prediction, certain issues remain noteworthy. In addition to the frequently discussed challenges of early anomaly feature extraction and data imbalance caused by complicated working conditions, often underestimated factors include the sensitivity and real-time nature of the algorithms. The current state-of-the-art (SOTA) algorithms tend to rely heavily on the inherent properties of the data.

To solve the aforementioned problems, this paper proposes a mathematical method for early anomaly detection, namely, the Gramian gray level co-occurrence matrix (GGLCM) analysis method for mechanical vibration data with poor quantity, especially missing labels or non-labels. Inspired by the pixel level enhancement of image features, this method reconstructs real-time data from one dimension into two dimensions through the Gramian angular field (GAF), fixing the time series in the corresponding reconstructed images. Meanwhile, the gray level co-occurrence matrix (GLCM) helps to ensure the extraction and fusion of the real-time features. Then, the continuous anomaly alarm mechanism works for the real-time updated threshold calculated by the real-time features to locate the timing position of early anomalies. The main contributions are as follows:

A GAF-based signal processing method for mechanical vibration data was developed. It can effectively enhance the signal features from the perspective of dimensional reconstruction and is more efficient than other post-enhancement processing methods.

First application of GLCM for image-based analysis of mechanical equipment faults, which enables robust anomaly feature extraction.

The developed real-time continuous alarm mechanism guarantees both the sensitivity and reliability of early anomaly detection.

2.

Methodology

In this section, we will first present the problem description. Then a detailed introduction to the GGLCM will be given. In addition, the crucial application methods are comprehensively explained with the help of mathematical logic.

A.

Problem description

Assume that a certain subset of the original dataset contains C different fault-labeled sample sets ${(X^{E_{j}}, Y^{E_{j}})}_{j = 1}^{C}$ \left( {{X^{{E_j}}},{Y^{{E_j}}}} \right)_{j = 1}^C , where for each sample set $(X^{E_{j}}, Y^{E_{j}}), X^{E_{j}} = {\{x_{i}^{E_{j}}\}}_{i = 1}^{n_{E_{j}}} \in ℝ^{n_{E_{j}} \times M}$ \left( {{X^{{E_j}}},{Y^{{E_j}}}} \right),{X^{{E_j}}} = \left\{ {x_i^{{E_j}}} \right\}_{i = 1}^{{n_{{E_j}}}} \in {\mathbb{R}^{{n_{{E_j}}} \times M}} , and $Y^{E_{j}} = {\{y_{i}^{E_{j}}\}}_{i = 1}^{n_{E_{j}}} \in ℝ^{n_{E_{j}} \times M}$ {Y^{{E_j}}} = \left\{ {y_i^{{E_j}}} \right\}_{i = 1}^{{n_{{E_j}}}} \in {\mathbb{R}^{{n_{{E_j}}} \times M}} , and M represents feature dimensions, $x_{i}^{E_{j}}$ x_i^{{E_j}} denotes vibration signals, and $y_{i}^{E_{j}}$ y_i^{{E_j}} represents fault labels, i.e. 0 or 1. Furthermore, assume that the other certain subset of continuous data input still contains corresponding C different time-labeled sample sets ${(X^{F_{j}}, Y^{F_{j}})}_{j = 1}^{C}$ \left( {{X^{{F_j}}},{Y^{{F_j}}}} \right)_{j = 1}^C , where for each sample set $(X^{F_{j}}, Y^{F_{j}}), X^{F_{j}} = {\{x_{i}^{F_{j}}\}}_{i = 1}^{n_{F_{j}}} \in ℝ^{n_{F_{j}} \times M}$ \left( {{X^{{F_j}}},{Y^{{F_j}}}} \right),{X^{{F_j}}} = \left\{ {x_i^{{F_j}}} \right\}_{i = 1}^{{n_{{F_j}}}} \in {\mathbb{R}^{{n_{{F_j}}} \times M}} , and $Y^{F_{j}} = {\{y_{i}^{F_{j}}\}}_{i = 1}^{n_{F_{j}}} \in ℝ^{n_{F_{j}} \times M}$ {Y^{{F_j}}} = \left\{ {y_i^{{F_j}}} \right\}_{i = 1}^{{n_{{F_j}}}} \in {\mathbb{R}^{{n_{{F_j}}} \times M}} . Therefore, the complete representation of two consecutive labeled vibration signals can be expressed as ${(X^{(E_{j} + F_{j})}, Y^{(E_{j} + F_{j})})}_{j = 1}^{2 C}$ \left( {{X^{({E_j} + {F_j})}},{Y^{\left( {{E_j} + {F_j}} \right)}}} \right)_{j = 1}^{2C} . In addition, the unlabeled data, i.e. ${(X^{F_{j}}, None)}_{j = 1}^{C}$ \left( {{X^{{F_j}}},None} \right)_{j = 1}^C is substituted for the original ${(X^{F_{j}}, Y^{F_{j}})}_{j = 1}^{C}$ \left( {{X^{{F_j}}},{Y^{{F_j}}}} \right)_{j = 1}^C , resulting in a complete set of historical fault signals denoted as ${(X^{E_{j} + F_{j}}, Y^{E_{j} + None})}_{j = 1}^{2 C}$ \left( {{X^{{E_j} + {F_j}}},{Y^{{E_j} + None}}} \right)_{j = 1}^{2C} . Our task is to calculate an effective threshold Ψ as a conditional parameter in order to obtain a regression label: $Ψ 〈{(X^{E_{j} + F_{j}}, Y^{E_{j} + None})}_{j = 1}^{2 C}〉$ \Psi \left\langle {\left( {{X^{{E_j} + {F_j}}},{Y^{{E_j} + None}}} \right)_{j = 1}^{2C}} \right\rangle , used to discriminate between the mechanical fault and non-fault stages. When the other certain subset is used for Ψ, there is a new threshold Ψ* based on Ψ. The mapping between the predicted label values and the continuous new data is then as follows: $Y^{F_{est}^{*}} = Ψ^{*} 〈X^{test}, {(X^{E_{j}}, Y^{E_{j}})}_{j = 1}^{C}, {(X^{F_{j}}, Y^{F_{j}})}_{j = 1}^{C}〉$ {Y^{F_{est}^*}} = {\Psi ^*}\left\langle {{X^{test}},\left( {{X^{{E_j}}},{Y^{{E_j}}}} \right)_{j = 1}^C,\left( {{X^{{F_j}}},{Y^{{F_j}}}} \right)_{j = 1}^C} \right\rangle . Our idea is that the GGLCM maintains a high representation capacity even when the parameters are updated with new data, so as to minimize the discrepancy between the predicted temporal label and the actual fault time. In other words: $Y^{F_{est}^{*}} ≅ Y^{F_{est}} ≅ Y^{F_{actual}}$ {Y^{F_{est}^*}} \cong {Y^{{F_{est}}}} \cong {Y^{{F_{actual}}}} .

B.

Structure of GGLCM

GGLCM is a high-quality signal processing method based on fault signal dimension reconstruction and texture analysis of fault images. Its main components are a GAF fault dimension reconstruction algorithm based on segmented fault cycles and a GLCM algorithm for capturing and analyzing the texture features of fault images. The detailed algorithm is shown in Fig. 1.

Given training data {X^E, Y^E} ∈ ℝ^N×(M+1) where N is the number of samples, M is the feature dimensions, and “1” means that the predicted label is one-dimensional. To compress the amount of data, we calculate the length based on the rotation speed and sampling frequency to ensure that each coding length contains enough information (1) $X_{m}^{E} = \frac{X^{E}}{m}, m = 1, 2, \dots, \frac{N \times rpm}{60 \times f_{s}}$ X_m^E = \frac{{{X^E}}}{m},\;\;\;\;\;\;m = 1,2, \ldots ,\;\frac{{N \times rpm}}{{60 \times {f_s}}} where rpm (r/min) is the rotation speed and f_s (Hz) is the sampling frequency. By piecewise aggregation approximation (PAA) a new sequence is encoded by (2) $I_{k} = [\frac{\sum_{1}^{k} X_{m_{i}}^{E}}{k}, \frac{\sum_{k + 1}^{2 k} X_{m_{o}}^{E}}{2 k}, \dots, \frac{\sum_{(n - 1) k + 1}^{nk} X_{m_{p}}^{E}}{nk}]$ {I_k} = \left[ {\frac{{\mathop \sum \nolimits_1^k X_{{m_i}}^E}}{k},\frac{{\mathop \sum \nolimits_{k + 1}^{2k} X_{{m_o}}^E}}{{2k}}, \ldots ,\frac{{\mathop \sum \nolimits_{\left( {n - 1} \right)k + 1}^{nk} X_{{m_p}}^E}}{{nk}}} \right] where $X_{m_{i}}^{E}$ X_{{m_i}}^E , $X_{m_{o}}^{E}$ X_{{m_o}}^E , and $X_{m_{p}}^{E}$ X_{{m_p}}^E denote the corresponding amplitudes in the $X_{m}^{E}$ X_m^E , $I_{k} \in X_{m}^{E}$ {I_k} \in X_m^E . The representation of I_k in the polar coordinate can be calculated by (3) $\emptyset_{k}^{n} = arccos ({\bar{I}}_{k}), {\bar{I}}_{k} = exp (- |I_{k}|) \notin (0, 1)$ \emptyset _k^n = {\rm{\;arccos\;}}({\bar I_k}),\;\;{\bar I_k} = {\rm{exp\;}}\left( { - \left| {{I_k}} \right|} \right) \notin \left( {0,1} \right) where $\emptyset_{k}^{n}$ \emptyset _k^n denotes the polar coordinate representation of the sample point in I_k. Each element in the Gramian matrix is decoded by (4) $G_{m}^{E} = [\begin{matrix} cos (\emptyset_{k}^{1} \pm \emptyset_{k}^{1}) & \dots & cos (\emptyset_{k}^{1} \pm \emptyset_{k}^{n}) \\ ⋮ & ⋱ & ⋮ \\ cos (\emptyset_{k}^{n} \pm \emptyset_{k}^{1}) & \dots & cos (\emptyset_{k}^{n} \pm \emptyset_{k}^{n}) \end{matrix}]$ G_m^E = \left[ {\begin{array}{*{20}{c}}{\cos \left( {\emptyset _k^1 \pm \emptyset _k^1} \right)}& \cdots &{\cos \left( {\emptyset _k^1 \pm \emptyset _k^n} \right)}\\ \vdots & \ddots & \vdots \\{\cos \left( {\emptyset _k^n \pm \emptyset _k^1} \right)}& \cdots &{\cos \left( {\emptyset _k^n \pm \emptyset _k^n} \right)}\end{array}} \right]

Now we can represent the original input signal as $(G_{m}^{E}, Y^{E}) \in ℝ^{N \times (M + 1)}$ \left( {G_m^E,{Y^E}} \right) \in {\mathbb{R}^{N \times \left( {M + 1} \right)}} . By analogy, the newly fault-labeled and non-fault-labeled input signal can be denoted as $(G_{m}^{E}, Y^{E})$ \left( {G_m^E,{Y^E}} \right) and $(G_{m}^{E}, None)$ \left( {G_m^E,None} \right) , respectively. Since GAF is formed by the primary information on the main diagonal and the secondary information in other areas, we have defined θ₁ = 135°, θ₂ = 0° or 90° as the two different directions of the gray levels. The GLCM of the primary information in $G_{m}^{E}$ G_m^E is given by (5) ${GLCM}_{Δ 1} (G_{m}^{E}) = \sum_{\begin{matrix} 0 \leq i \leq 256 \\ 0 \leq j \leq 256 \end{matrix}} \sum_{k = 1}^{N} \frac{G_{mk}^{E} (i, j | θ_{1}, d)}{G_{mk}^{E} (any | θ_{1}, d)}$ GLC{M_{\Delta 1}}\left( {G_m^E} \right) = \sum\limits_{\begin{array}{*{20}{c}}{0 \le i \le 256}\\{0 \le j \le 256}\end{array}} {\sum\nolimits_{k = 1}^N {\frac{{G_{mk}^E(i,j|{\theta _1},d)}}{{G_{mk}^E(any|{\theta _1},d)}}} } where d is the defined distance, i and j denote the pixel pairs. By analogy, the secondary information in $G_{m}^{E}$ G_m^E is given as ${GLCM}_{Δ 2} (G_{m}^{E})$ GLC{M_{\Delta 2}}\left( {G_m^E} \right) , and the entire GGLCM of $G_{m}^{E}$ G_m^E is given by (6) $GLCM (G_{m}^{E}) = {GLCM}_{Δ 1} (G_{m}^{E}) + {GLCM}_{Δ 2} (G_{m}^{E})$ GLCM\left( {G_m^E} \right) = GLC{M_{\Delta 1}}\left( {G_m^E} \right) + GLC{M_{\Delta 2}}\left( {G_m^E} \right)

In each GGLCM, we have divided the potential feature into five parts: contrast (CON), angular second moment (ASM), entropy (ENT), inverse differential moment (IDM), and correlation (CORR). These evaluation metrics are as follows: (7) $CON = \sum_{i} \sum_{j} {(i - j)}^{2} GLCM (G_{m}^{E})$ CON = \sum\limits_i {\sum\nolimits_j {{{\left( {i - j} \right)}^2}GLCM\left( {G_m^E} \right)} } (8) $ASM = \sum_{i} \sum_{j} GLCM {(G_{m}^{E})}^{2}$ ASM = \sum\limits_i {\sum\nolimits_j {GLCM{{\left( {G_m^E} \right)}^2}} } (9) $ENT = \sum_{i} \sum_{j} GLCM (G_{m}^{E}) log (GLCM (G_{m}^{E}))$ ENT = \sum\limits_i {\sum\nolimits_j {GLCM\left( {G_m^E} \right)\log \left( {GLCM\left( {G_m^E} \right)} \right)} } (10) $IDM = \sum_{i} \sum_{j} \frac{GLCM (G_{m}^{E})}{1 + {(1 - j)}^{2}}$ IDM = \sum\limits_i {\sum\nolimits_j {\frac{{GLCM\left( {G_m^E} \right)}}{{1 + {{\left( {1 - j} \right)}^2}}}} } (11) $CORR = \sum_{i = 0}^{{quant}_{k}} \sum_{j = 0}^{{quant}_{k}} \frac{(i - \bar{ι}) * (j - \bar{j}) * GLCM {(G_{m}^{E})}^{2}}{Varience}$ CORR = \mathop \sum \nolimits_{i = 0}^{quan{t_k}} \sum\nolimits_{j = 0}^{quan{t_k}} {\frac{{\left( {i - \bar \iota } \right)*\left( {i - \bar j} \right)*GLCM{{\left( {G_m^E} \right)}^2}}}{{Varience}}}

Let the calculated feature quantities be represented as feature set f(GLCM, V_f). Perform a pairwise monotonicity ranking of the features in the set using the Spearman correlation coefficient until the five feature quantities are monotonically sorted. The monotonicity score is given by (12) $ρ = \frac{6 \sum_{i = 0}^{n} d_{i}^{2}}{n (n^{2} - 1)}$ \rho = \frac{{6\sum _{i = 0}^nd_i^2}}{{n\left( {{n^2} - 1} \right)}} where d_i is the difference between the rankings of two variables and n is the number of observations. Thus, we obtain a ranked feature set f (GLCM, (V_f |ρ)).

Next, we perform a PCA analysis on f (GLCM, (V_f |ρ)), the first two principal components, PCA1 and PCA2, are used to fuse the features, which can be given by (13) $Y_{sel}^{m \times 1} f (GLCM, (V_{f} | ρ)) S \frac{1}{m - 1} \{\begin{array}{l} {(\frac{f (GLCM, (V_{f} | ρ)) - μ (f)}{σ (f)})}^{T} \\ (\frac{f (GLCM, (V_{f} | ρ)) - μ (f)}{σ (f)}) V \end{array}\} = V Λ$ Y_{sel}^{m \times 1}f\left( {GLCM,\;\left( {{V_f}|\rho } \right)} \right)S\frac{1}{{m - 1}}\left\{ {\begin{array}{*{20}{l}}{{{\left( {\frac{{f\left( {GLCM,\left( {{V_f}|\rho } \right)} \right) - \mu \left( f \right)}}{{\sigma \left( f \right)}}} \right)}^T}}\\{\left( {\frac{{f\left( {GLCM,\left( {{V_f}|\rho } \right)} \right) - \mu \left( f \right)}}{{\sigma \left( f \right)}}} \right)V}\end{array}} \right\} = V\Lambda where V denotes the eigenvectors corresponding to the covariance matrix and Λ denotes the diagonal matrix. Thus, GAF_GLCM is reduced from 5 dimensions to 1 dimension. We perform three rounds of first-order exponential smoothing on $Y_{sel}^{m \times 1}$ Y_{sel}^{m \times 1} . $y_{t}^{(1)}$ y_t^{\left( 1 \right)} , $y_{t}^{(2)}$ y_t^{\left( 2 \right)} and $y_{t}^{(3)}$ y_t^{\left( 3 \right)} are the smoothing values for the first, second, and third rounds, respectively. (14) $\{\begin{array}{l} y_{t}^{(1)} = α Y_{sel}^{m \times 1} (t) + (1 - α) y_{t - 1}^{(1)}, y_{0}^{(1)} = Y_{sel}^{m \times 1} (0) \\ y_{t}^{(2)} = α y_{t}^{(1)} + (1 - α) y_{t - 1}^{(2)}, y_{0}^{(2)} = y_{0}^{(1)} \\ y_{t}^{(3)} = α y_{t}^{(2)} + (1 - α) y_{t - 1}^{(3)}, y_{0}^{(3)} = y_{0}^{(2)} \end{array}$ \left\{ {\begin{array}{*{20}{l}}{y_t^{\left( 1 \right)} = \alpha Y_{sel}^{m \times 1}\left( t \right) + \left( {1 - \alpha } \right)y_{t - 1}^{\left( 1 \right)},y_0^{\left( 1 \right)} = Y_{sel}^{m \times 1}\left( 0 \right)}\\{y_t^{\left( 2 \right)} = \alpha y_t^{\left( 1 \right)} + \left( {1 - \alpha } \right)y_{t - 1}^{\left( 2 \right)},y_0^{\left( 2 \right)} = y_0^{\left( 1 \right)}}\\{y_t^{\left( 3 \right)} = \alpha y_t^{\left( 2 \right)} + \left( {1 - \alpha } \right)y_{t - 1}^{\left( 3 \right)},y_0^{\left( 3 \right)} = y_0^{\left( 2 \right)}}\end{array}} \right. where α is the smoothing coefficient. We perform the second Z-score normalization on $y_{t}^{(3)}$ y_t^{\left( 3 \right)} , obtaining a new normalized sequence, i.e., Y_{1:T} = [y₁, … , y_t, … , y_T], which is considered as the final oscillation sequence.

We have designed a robust continuous fault alarm mechanism. Given as a calculated current fusion parameter denoted as y_T. Then the threshold for the i^th moment can be expressed as $y_{T}^{i}$ y_T^i and the anomaly label at time i is L_i. The alarm mechanism is shown in Algorithm 1.

Algorithm 1.

Continuous fault alarm mechanism.

Data: Fusion parameters y_T, Threshold $y_{T}^{i}$ y_T^i , New data $y_{T}^{i + j}$ y_T^{i+j}

Result: Anomaly labels L_i

1 : for i ← 1 to n do

2 : $y_{T}^{i} \leftarrow f (y_{T}^{i - 1}, y_{T}^{i + j})$ y_T^i \leftarrow f\left( {y_T^{i - 1},\;y_T^{i + j}} \right) ;

3 : $S_{i} \leftarrow \sum_{j = i - 9}^{i} (y_{T}^{j} < y_{T}^{j - 1})$ {S_i} \leftarrow \mathop \sum \nolimits_{j = i - 9}^i (y_T^j < y_T^{j - 1})

4 : If S_i ≥ 10 then

5 : L_i ← 1;

6 : else

7 : L_i ← 0;

3.

Experiments

A.

Experimental setup

In this paper, the proposed method was validated using two datasets:

The public XJTU-SY bearing dataset [9].

The private dataset with overloaded gearboxes from Nanjing High Speed Gears (NGC) Co., Ltd.

NGC experimental platform is shown in Fig. 2.

As shown in Fig. 3, cracks (3 mm length, 0.2 mm width, 0.2 mm depth) can be seen in the output stage on the inner and outer races of the third sun gear near the output-side bearing. Three holes (0.15 mm diameter, 1.5 mm depth) are drilled in the circumferential direction and target selected rolling elements. The parameters of the test gearbox are also listed in Table 1. The fatigue life test procedure of the steel press gearbox is shown in Fig. 4.

Table 1.

The parameters of the test gearbox.

Parameters	Parameter values
Type	MPGH2H900K63H
Rotation speed ratio	62.402
Rated power [kW]	710
Input rotation speed [rpm]	1 490
Maximum input torque [N·m]	10 032
Maximum output torque [kN·m]	626
Lubricating oil type	ISO VG 320
Oil flow rate of lubrication station [l/min]	125
Oil mass [l]	230

B.

Validation results

The proposed method is implemented without using any machine learning framework and validated on an Nvidia 3090 GPU and an Intel ® Xeon ® Gold 6248R CPU @ 3.00 GHz.

This research is solely concerned with validating the proprietary approach in terms of accuracy and generalization under different operating conditions. Two datasets were selected from each operating condition of the XJTU-SY dataset, specifically Bearing1_1, Bearing1_3, Bearing2_2, Bearing2_4, Bearing3_1, and Bearing3_4. Table 2 shows the detailed results for XJTU-SY datasets with missing labels as new data was gradually entered, the predicted labels remained unchanged, theoretically illustrating the low false alarm rate and high reliability of this method. Fig. 5 illustrates clear deterioration trends that are very similar to the actual state of the respective bearings. This observation shows that GGLCM is capable of detecting early, subtle fault shocks.

Table 2.

The detailed results for XJTU-SY datasets.

Subject	Input ratio [%]	Y^F_actual [s]	Y^F_est [s]	$y^{F_{est}^{}} [s]$ {y^{F_{est}^}}\left[ {\rm{s}} \right]			Acc* [%]

				+10 %	+20 %	+30 %	+0 %	+10 %	+20 %	+30 %
Bearing1_1	70	4 393.95	4 712.06	4 712.06	4 712.06	4 712.06	95.69	95.69	95.69	95.69
Bearing1_3	70	5 578.64	5 568.23	5 568.23	5 568.23	5 568.23	99.81	99.81	99.81	99.81
Bearing2_2	70	2 848.64	2 949.91	2 949.91	2 949.91	2 949.91	96.44	96.44	96.44	96.44
Bearing2_4	70	1 798.43	1 830.34	1 830.34	1 830.34	1 830.34	98.22	98.22	98.22	98.22
Bearing3_4	70	55 195.86	56 648.44	56 648.44	56 648.44	56 648.44	97.37	97.37	97.37	97.37
Bearing3_5	70	1 346.31	1 435.74	1 435.74	1 435.74	1 435.74	93.36	93.36	93.36	93.36

C.

Evolutions of the GGLCM anomaly detection ability

In addition, the NGC dataset with missing labels is used to validate the SOTA of this methodology. Fig. 6 shows that the current method maintains robust detection accuracy even under the challenging working conditions of heavy load and high speed, thus providing effective early warning.

Three groups of SOTA methods are performed on the NGC dataset, including self-adaptive deep feature matching (SDFM) [10], time-series transfer learning (TSTL) [11], and GGLCM.

Fig. 7 shows that the average detection accuracy of GGLCM is higher than that of SDFM and TSTL by 9.9 % and 4.9 %, respectively. In particular, GGLCM has an overwhelming advantage in the procastination ratio. The primary factor is that GGLCM minimizes the dependence on computer hardware resources and timing labels.

In summary, the GGLCM method effectively detects early anomalies under label scarcity: by using GAF to transform 1D vibration signals into 2D images, combined with GLCM for texture extraction and a continuous alarm mechanism, it achieves >93 % accuracy on the XJTU-SY and NGC datasets. It outperforms SOTA methods such as SDFM and TSTL and adapts to industrial environments with scarce labels. Future work could explore deep learning integration, multi-sensor fusion, or edge computing optimization for real-time use.

4.

Conclusion

This paper presents the GGLCM method, which utilizes GAF to convert 1D mechanical vibration signals into 2D feature graphs and applies GLCM analysis to extract texture features, complemented by a continuous alarm mechanism for early anomaly detection under label scarcity. Experimental results with different datasets demonstrate the superior performance of GGLCM: in the XJTU-SY bearing dataset, GGLCM achieves a detection accuracy of 95.69–99.81 % for early faults, with an average of 97.3 %, while the industrial NGC heavy-load gearbox dataset has an accuracy of 98.7 % in label-scarce scenarios.

Compared to SOTA methods, GGLCM outperforms SDFM by 9.9 % and TSTL by 4.9 % in overall accuracy, with a procastination ratio reduced by 30 % compared to conventional approaches.

Future research could focus on integrating GGLCM with lightweight neural networks to automate feature optimization (targeting a 15 % reduction in computational latency), extending it to multi-sensor fusion for complex machinery systems, or optimizing it for edge devices to enable low-power, real-time deployment in resource-constrained industrial environments. These advancements would further cement GGLCM as a robust, data-efficient solution for early anomaly detection in rotating machinery.

Langue:: Anglais

Périodicité:: 6 fois par an
Sujets de la revue:: Ingénierie, Ingénierie électrique, Ingénierie de contrôle, métrologie et essais

RSS Feed de la revue

GGLCM: A Real-time Early Anomaly Detection Method for Mechanical Vibration Data with Missing Labels

Hu Yu

Xiaoyu Che

Chao Zhang

Rupeng Zhu

Weifang Chen

Publié en ligne: 22 juil. 2025

Pages: 157 - 163

Reçu: 28 mai 2024

Accepté: 16 mai 2025

DOI: https://doi.org/10.2478/msr-2025-0019

Mots clésearly anomaly detection, feature reconstruction, missing labels, reliable detection, real-time detection

© 2025 Hu Yu et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Mots clés
early anomaly detection, feature reconstruction, missing labels, reliable detection, real-time detection