High-Dimensional Feature Optimization and Real-Time Prediction Model with Support Vector Machines for Fault Diagnosis of Electrical Equipment

As electrification in China continues to develop, the failure of any piece of power equipment, whether primary equipment such as generators, transformers, and circuit breakers or auxiliary equipment such as capacitors and insulators, can cause a partial or even total blackout in the affected area. Such outages disrupt daily life and industrial production, can endanger lives, and can cause heavy economic losses [1–3].

Social demand for power equipment is increasing, and so are equipment maintenance costs. According to statistics, equipment maintenance management accounts for 65% to 85% of the total cost of construction equipment management, and a failure may lead to large expenditures and other significant adverse effects, so the maintenance and management of power equipment is critical [4–6]. However, the traditional management mode is limited: it generally deals with faults after they occur, often cannot monitor the operating status of equipment in a timely manner, and often begins maintenance only once the equipment can no longer run. The key to guaranteeing normal operation is therefore to track the operating status of power equipment so that faults can be detected in advance and handled promptly. Condition-based maintenance uses real-time monitoring data to assess the health of the equipment and to decide whether maintenance is needed [7–10]. Detecting power equipment faults early and preventing them from developing ensures normal operation and improves the reliability of grid power supply [11–12].

As electrical equipment is used more intensively, requirements for its reliability keep rising, and fault diagnosis of electrical equipment has become a major research topic in the electrical field [13–14]. Electrical equipment faults are random, fuzzy, and uncertain. When a fault occurs, a single fault state is usually reflected in several kinds of information, a single kind of information may reflect several faults, and different fault states may coexist [15–17]. Given these characteristics, equipment state identification needs a fault diagnosis model based on multi-source information [18]. Information fusion diagnosis and condition assessment of power equipment faults is therefore both academically important and valuable in engineering applications [19–21]. First, observing the multiple phenomena that accompany a fault through a variety of sensors makes it possible to analyze the mechanism and underlying laws by which insulation faults arise and develop, which supports a more comprehensive explanation of fault evolution and more accurate fault state identification and early warning methods. Second, the differences and complementarity of multi-source information compensate for the shortcomings of detection based on a single source. By making full use of the various types of information and relevant expert experience, the key factors behind the cause and development of power equipment faults can be analyzed comprehensively to obtain stable and reliable information characterizing the fault state. This facilitates a more fault-tolerant and reliable fault diagnosis and evaluation system, improves the speed of information processing and the correctness of decision-making, and changes the traditional approach in which each sensor is diagnosed independently [22–25].

This article combines principal component analysis (PCA) with a support vector machine (SVM) to study fault diagnosis of electrical equipment, and proposes a fault diagnosis model based on an SVM with optimized parameters. PCA extracts the key fault features of the equipment, which preprocesses the fault data and reduces its dimensionality. A multi-class fault classification method based on a hierarchical binary tree (MC-HBT) is then given, and its parameters are optimized with the imperialist competitive algorithm (ICA). The optimized parameters are used to construct the MC-HBT classifier. The resulting ISFD-POPS model is tested experimentally on the diagnosis of five types of electrical equipment faults. On this basis, a fault prediction model is established in which an extreme learning machine (ELM) is improved by an improved sparrow search algorithm; experiments on actual data from 22 sets of known faults demonstrate the good prediction performance of the model.

2

Electrical equipment fault diagnosis model

2.1

Principal component analysis

2.1.1

Principal Element Analysis

Principal component analysis (PCA), also called principal element analysis, is a widely used statistical technique. The sample matrix is first centered; its covariance matrix is then computed, and the eigenvalues and eigenvectors of that matrix are used to form the pattern vectors, with the eigenvalues arranged from largest to smallest. According to the cumulative variance contribution rate, the dimensions with little influence are discarded to achieve dimensionality reduction. PCA does not require a mathematical model of the fault, only a statistical model of the data [26]. It reduces the dimensionality of the original measurement data, extracts the most influential features, and removes correlation between variables as well as system noise. PCA is widely used in data compression, fault diagnosis and prediction, signal processing, fuzzy identification, and other fields.

PCA is a multivariate statistical method that reduces the dimensionality of data while retaining its main feature information. Its mathematical principle is as follows.

Assume that the measurement data matrix is X ∈ R^{n×p}, where n is the number of samples and p is the number of measurement variables:

$$X = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1p} \\ x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \dots & x_{np} \end{bmatrix} = (x_{1}, x_{2}, \dots, x_{p}) \tag{1}$$

where x_j = (x_1j, x_2j, ⋯, x_nj)^T, j = 1, 2, ⋯, p. PCA transforms the p observed variables x₁, x₂, ⋯, x_p into new comprehensive variables F_j = a_{j1} x₁ + a_{j2} x₂ + ⋯ + a_{jp} x_p, j = 1, 2, ⋯, p, that is:

$$\begin{cases} F_{1} = a_{11} x_{1} + a_{12} x_{2} + \dots + a_{1p} x_{p} \\ F_{2} = a_{21} x_{1} + a_{22} x_{2} + \dots + a_{2p} x_{p} \\ \quad \vdots \\ F_{p} = a_{p1} x_{1} + a_{p2} x_{2} + \dots + a_{pp} x_{p} \end{cases} \tag{2}$$

The model satisfies the following conditions:

1) F_i and F_j are uncorrelated for i ≠ j, where i, j = 1, 2, ⋯, p.

2) D(F₁) > D(F₂) > ⋯ > D(F_p).

3) $a_{k1}^{2} + a_{k2}^{2} + \dots + a_{kp}^{2} = 1$, k = 1, 2, …, p.

Here F is the principal component, also known as the composite variable, and the coefficients a_ij, whose squares sum to 1 in each row, are called the principal component coefficients. The model can be written in matrix form as:

$$F = AX \tag{3}$$

$$F = (F_{1}, F_{2}, \dots, F_{p})^{T}, \quad X = (x_{1}, x_{2}, \dots, x_{p})^{T} \tag{4}$$

$$A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1p} \\ a_{21} & a_{22} & \dots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \dots & a_{pp} \end{bmatrix} = (a_{1}, a_{2}, \dots, a_{p})^{T} \tag{5}$$

PCA has a simple geometric interpretation. Assume that the number of variables is 2 and the number of samples is n, so that the samples have an elliptical distribution in the two-dimensional plane. The origin is kept fixed and a new coordinate system is established, rotated by an angle θ relative to the old one. The angle θ can be found from the rotation operation of the matrix, and the following relation holds:

$$Y = \begin{bmatrix} y_{11} & y_{12} & \dots & y_{1n} \\ y_{21} & y_{22} & \dots & y_{2n} \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1n} \\ x_{21} & x_{22} & \dots & x_{2n} \end{bmatrix} = U \cdot X \tag{6}$$

where U⁻¹ = U^T and sin² θ + cos² θ = 1. The geometry of PCA after the rotation is shown in Fig. 1.

In the new coordinates y₁–y₂, the correlation between the n sample points changes markedly: the variance is concentrated along the y₁ axis and the correlation tends to 0. The same reasoning extends to more than two variables. PCA orders the eigenvalues according to the correlation between variables, and this is quantified by the covariance matrix of F:

$$Var(F) = Var(AX) = \Lambda = \begin{pmatrix} \lambda_{1} & & & \\ & \lambda_{2} & & \\ & & \ddots & \\ & & & \lambda_{p} \end{pmatrix} \tag{7}$$
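The two-variable rotation of Eq. (6) can be illustrated with a minimal sketch. It assumes the closed form θ = ½·atan2(2s₁₂, s₁₁ − s₂₂) for the angle that removes the correlation, which is a standard result for 2 × 2 covariance matrices and is not stated in the paper; the function names and the sample data are hypothetical.

```python
import math

def covariance(u, v):
    """Sample covariance of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)

def decorrelating_angle(x1, x2):
    """Angle theta for which the rotated variables y1, y2 are uncorrelated."""
    s11, s22, s12 = covariance(x1, x1), covariance(x2, x2), covariance(x1, x2)
    return 0.5 * math.atan2(2.0 * s12, s11 - s22)

def rotate(x1, x2, theta):
    """Apply U = [[cos, sin], [-sin, cos]] from Eq. (6) to every sample."""
    c, s = math.cos(theta), math.sin(theta)
    y1 = [c * a + s * b for a, b in zip(x1, x2)]
    y2 = [-s * a + c * b for a, b in zip(x1, x2)]
    return y1, y2

# Hypothetical, strongly correlated measurements of two variables.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]

theta = decorrelating_angle(x1, x2)
y1, y2 = rotate(x1, x2, theta)
print("theta (rad):", theta)
print("cov(y1, y2):", covariance(y1, y2))
print("var(y1), var(y2):", covariance(y1, y1), covariance(y2, y2))
```

With this choice of θ the covariance of the rotated variables vanishes up to rounding, the larger share of the variance falls on y₁, and the total variance is unchanged because U is orthogonal.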

Let V denote the covariance matrix of the original input samples. The input samples are first standardized, after which the covariance matrix V becomes the correlation matrix R = XX′.

Let A be the orthogonal coefficient matrix. The covariance matrix of the principal components is Var(F) = AXX′A′ = ARA′ = Λ, and because A is orthogonal (A′A = I), it follows that RA′ = A′Λ:

$$\begin{bmatrix} r_{11} & r_{12} & \dots & r_{1p} \\ r_{21} & r_{22} & \dots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \dots & r_{pp} \end{bmatrix} \cdot \begin{bmatrix} a_{11} & a_{21} & \dots & a_{p1} \\ a_{12} & a_{22} & \dots & a_{p2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1p} & a_{2p} & \dots & a_{pp} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{21} & \dots & a_{p1} \\ a_{12} & a_{22} & \dots & a_{p2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1p} & a_{2p} & \dots & a_{pp} \end{bmatrix} \cdot \begin{bmatrix} \lambda_{1} & & & \\ & \lambda_{2} & & \\ & & \ddots & \\ & & & \lambda_{p} \end{bmatrix} \tag{8}$$

The correlation matrix R therefore has eigenvalues λ₁ ≥ λ₂ ≥ ⋯ ≥ λ_p, and the corresponding vectors a_i are its eigenvectors.

It follows that Var(F_i) = λ_i, and the covariance between two different components is:

$$Cov(a_{i}^{'} X, a_{j}^{'} X) = a_{i}^{'} R a_{j} = a_{i}^{'} \left( \sum_{\alpha = 1}^{p} \lambda_{\alpha} a_{\alpha} a_{\alpha}^{'} \right) a_{j} = \sum_{\alpha = 1}^{p} \lambda_{\alpha} (a_{i}^{'} a_{\alpha}) (a_{\alpha}^{'} a_{j}) = 0, \quad i \neq j \tag{9}$$

The resulting composite variables are mutually uncorrelated and are ordered by decreasing variance:

$$\begin{cases} F_{1} = a_{11} x_{1} + a_{12} x_{2} + \dots + a_{1p} x_{p} \\ F_{2} = a_{21} x_{1} + a_{22} x_{2} + \dots + a_{2p} x_{p} \\ \quad \vdots \\ F_{p} = a_{p1} x_{1} + a_{p2} x_{2} + \dots + a_{pp} x_{p} \end{cases} \tag{10}$$

2.1.2

Determination of the number of primary elements

The number of principal elements affects how completely the information is retained, and the cumulative variance contribution method is commonly used to determine it. In this paper, the number of principal elements is determined jointly by the mean value of the multiple correlation coefficient (MCC) and the cumulative variance contribution ratio (CPV) [27]. The CPV generated from the DS table is used for an initial estimate of the number of principal elements; the mean multiple correlation coefficient is then examined, and the estimate is corrected iteratively until the final number is determined.

The cumulative variance contribution (CPV) is:

$$CPV_{k} = \sum_{i = 1}^{k} PV_{i} = \sum_{i = 1}^{k} \left( \frac{\lambda_{i}}{\sum_{j = 1}^{m} \lambda_{j}} \right) \tag{11}$$

where $PV_{i} = \lambda_{i} / \sum_{j = 1}^{m} \lambda_{j}$ is the variance contribution ratio of the i-th principal element.

The CPV threshold is generally chosen to be 85% or more. To reduce the subjectivity of this choice, the mean multiple correlation coefficient is introduced as a second criterion that refines the number of principal elements selected by the CPV.

The specific algorithm for PCA dimensionality reduction of 80 sets of 19-dimensional data is as follows:

1) Build a sample library Mysample from the 80 groups of sample data and define the dimensions dim1 to dim19, where dim1 = Mysample(:,1), dim2 = Mysample(:,2), and so on.

2) Compute the covariance matrix by calling cov(Mysample); its diagonal holds the variance of each dimension, std(dim1)^2, std(dim2)^2, and so on. The result can be verified by centering the sample matrix: x = Mysample - repmat(mean(Mysample), 80, 1), C = (x' * x)/(size(x,1) - 1), where size(x,1) is the number of samples, n = 80.

3) Find the eigenvalues and eigenvectors of the covariance matrix C with [eigenvectors, eigenvalues] = eig(C), and select the components that make up the pattern vector.

4) Compute the data after dimensionality reduction: FinalData = rowFeatureVector * rowDataAdjust, where rowFeatureVector is the transpose of the pattern vector, with rows arranged in descending order of eigenvalue, and rowDataAdjust is the transpose of the data after the mean of each dimension has been subtracted. The cumulative contribution is calculated and the less influential dimensions are discarded, keeping the cumulative variance contribution above 0.85.

5) Calculate the multiple correlation coefficient $\rho(x_{j}, T) = \sqrt{\sum_{i = 1}^{k} \lambda_{i} P_{j, i}^{2}}$, repeating until its mean value is greater than 0.9.

6) Combine the mean multiple correlation coefficient with the CPV criterion to jointly determine the number of principal elements.
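The CPV part of this procedure (Eq. (11) with the 85% threshold) can be sketched as follows. This is only an illustration under stated assumptions: the eigenvalues are hypothetical, the function names are not from the paper, and the additional check on the mean correlation coefficient in steps 5) and 6) is deliberately left out.

```python
def cumulative_contribution(eigenvalues):
    """Return CPV_1..CPV_m for eigenvalues sorted from largest to smallest."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cpv, running = [], 0.0
    for lam in ordered:
        running += lam
        cpv.append(running / total)
    return cpv

def choose_num_components(eigenvalues, threshold=0.85):
    """Smallest k whose cumulative variance contribution reaches the threshold."""
    for k, value in enumerate(cumulative_contribution(eigenvalues), start=1):
        if value >= threshold:
            return k
    return len(eigenvalues)

# Hypothetical eigenvalues of a covariance matrix (any order is accepted).
eigenvalues = [1.0, 4.0, 2.0, 3.0]
print(cumulative_contribution(eigenvalues))
print(choose_num_components(eigenvalues, threshold=0.85))
```

For these hypothetical eigenvalues the first two components explain 70% of the variance and the first three explain 90%, so three components are kept at the 85% threshold.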

2.2

ISFD-POPS modeling

To diagnose electrical equipment faults accurately, this section proposes an electrical equipment fault diagnosis model (ISFD-POPS) based on PCA and an SVM with optimized parameters. First, typical fault samples are selected by screening the fault information of the electrical equipment and the equipment state decision library. These samples are divided into a training set and a test set and reduced in dimension by PCA, which retains as many fault features as possible without affecting the overall diagnosis of the model. Next, the MC-HBT classifier is constructed according to the specific structure of the electrical equipment, and the Gaussian radial basis function (RBF) is selected as the kernel function of the ISFD-POPS model. Finally, the kernel parameters are optimized with ICA, and the MC-HBT classifier is trained on the training set to produce a classifier with optimized parameters.

The ISFD-POPS model is shown in Fig. 2.

2.2.1

PCA-based critical fault characterization

PCA can reduce high dimensional data to low dimensions and retain some of the most important features in the original data while removing noise and some correlated features, thus improving the processing efficiency of the data and reducing the time cost [28]. In this section, PCA is used to reduce the dimensionality of fault samples, and the following steps are generally followed when PCA is applied to fault diagnosis:

Assume that there are m substation fault samples with n fault features each, forming the matrix X_mn:

$$X_{mn} = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1n} \\ x_{21} & x_{22} & \dots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \dots & x_{mn} \end{bmatrix} \tag{12}$$

Step 1: Center the matrix X_mn by subtracting the mean of each feature, which gives Eq. (13):

$$X = X_{mn} - \bar{X}_{mn} \tag{13}$$

Step 2: From the centered matrix X, calculate the n × n covariance matrix C of the features:

$$C = \frac{1}{m - 1} X^{T} X \tag{14}$$

Step 3: Calculate the eigenvalues λ_i and eigenvectors ω_i of the covariance matrix C.

Step 4: Select the eigenvectors ω₁, ω₂, ⋯, ω_k corresponding to the k largest eigenvalues and stack them as rows, in order of decreasing eigenvalue, to form the matrix W_kn. The value of k is chosen so that:

$$\frac{\sum_{i = 1}^{k} \lambda_{i}}{\sum_{i = 1}^{n} \lambda_{i}} \geq t \tag{15}$$

where a larger threshold t retains more eigenvalues; t can be selected according to the complexity of the fault information.

Step 5: Obtain the matrix after dimensionality reduction:

$$Y_{km} = W_{kn} X^{T} \tag{16}$$

Step 6: Calculate the contribution rate of each principal component vector in turn. The contribution rate y_i of the i-th principal component vector to the fault information is:

$$y_{i} = \frac{\lambda_{i}}{\sum_{j = 1}^{n} \lambda_{j}} \tag{17}$$

2.2.2

Nonlinear transformations of SVMs

The core goal of an SVM is to find, through training, an optimal separating hyperplane for linearly separable problems; it is a typical binary classifier.

The nonlinear transformation of the SVM proceeds in the following steps.

Step 1: Construct the SVM optimization problem with a nonlinear mapping, as shown in Eq. (18):

$$\begin{aligned} \min\ & \Phi(\omega, \xi) = \frac{1}{2} \|\omega\|^{2} + C \sum_{i = 1}^{l} \xi_{i} \\ \text{s.t.}\ & y_{i} [\omega^{T} \varphi(x_{i}) + b] \geq 1 - \xi_{i}, \\ & \xi_{i} \geq 0, \quad i = 1, 2, \dots, l \end{aligned} \tag{18}$$

where Φ is the objective function, φ is the nonlinear mapping, ω is the normal vector of the SVM hyperplane, ξ_i is a slack variable, and C is the penalty factor. The second and third lines are the constraints, in which x_i is the i-th fault sample, b is the offset of the SVM hyperplane, and y_i is the category label, which takes a value in {-1, 1}.

Step 2: Construct the Lagrangian function, as shown in Eq. (19):

$$L(\omega, b, \xi, \alpha, \beta) = \Phi(\omega, \xi) - \sum_{i = 1}^{l} \alpha_{i} \{ y_{i} [\omega^{T} \varphi(x_{i}) + b] - 1 + \xi_{i} \} - \sum_{i = 1}^{l} \beta_{i} \xi_{i} \tag{19}$$

where α_i and β_i are Lagrange multipliers with α_i ≥ 0 and β_i ≥ 0.

Step 3: From the Lagrangian function, obtain the dual problem:

$$\begin{aligned} \max\ & \Psi(\alpha) = - \frac{1}{2} \sum_{i, j = 1}^{l} \alpha_{i} \alpha_{j} y_{i} y_{j} K(x_{i}, x_{j}) + \sum_{i = 1}^{l} \alpha_{i} \\ \text{s.t.}\ & \sum_{i = 1}^{l} \alpha_{i} y_{i} = 0, \quad \alpha_{i} \in [0, C], \quad i = 1, 2, \dots, l \end{aligned} \tag{20}$$

where K(x_i, x_j) = φ(x_i)^T φ(x_j) and the second line is the constraint.

Step 4: Obtain the decision function, as shown in Eq. (21):

$$f(x) = \operatorname{sign} \left[ \sum_{i = 1}^{l} \alpha_{i} y_{i} K(x, x_{i}) + b \right] \tag{21}$$

where K(x, x_i) is the kernel function, which gives the SVM its ability to handle nonlinear data. To realize the nonlinear transformation of the SVM and diagnose intelligent substation faults accurately, a suitable kernel function must be chosen and its parameters optimized with a suitable optimization algorithm.
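The decision function of Eq. (21) can be evaluated as in the minimal sketch below. The kernel is passed in as a callable so that any kernel can be plugged in; the support vectors, multipliers, and offset are hypothetical values chosen for illustration rather than the result of actually solving the dual problem, and a score of exactly zero is assigned to class +1 by convention.

```python
def linear_kernel(x, z):
    """Plain inner product, used here only as the simplest possible kernel."""
    return sum(a * b for a, b in zip(x, z))

def svm_decision(x, support_vectors, alphas, labels, b, kernel):
    """Evaluate f(x) = sign(sum_i alpha_i * y_i * K(x, x_i) + b)."""
    score = b
    for sv, alpha, y in zip(support_vectors, alphas, labels):
        score += alpha * y * kernel(x, sv)
    return 1 if score >= 0 else -1

# Hypothetical trained model: two support vectors, one per class.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
alphas = [0.5, 0.5]
labels = [1, -1]
b = 0.0

print(svm_decision((2.0, 0.0), support_vectors, alphas, labels, b, linear_kernel))
print(svm_decision((-1.0, -3.0), support_vectors, alphas, labels, b, linear_kernel))
```

Only samples with non-zero α_i contribute to the sum, which is why the sketch stores support vectors rather than the whole training set.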

2.2.3

ICA-based SVM parameter optimization

Like other population-based optimization algorithms, the imperialist competitive algorithm (ICA) starts by randomly generating a number of initial solutions and then searches among them for the optimal solution by optimizing a given objective function. In this section, the RBF kernel is selected as the kernel function of the ISFD-POPS model:

$$K(x_{i}, x_{j}) = \exp(- \gamma \|x_{i} - x_{j}\|^{2}), \quad \gamma > 0 \tag{22}$$

where γ is the key parameter of the kernel. During SVM training, the penalty factor C and the kernel parameter γ strongly affect the generalization performance of the model, so these two parameters need to be tuned. To apply ICA for parameter optimization, the state cost function is set as the objective function. Because the sample size is small, the objective function is built from the average classification accuracy under k-fold cross-validation, which makes full use of the dataset and reduces overfitting. In k-fold cross-validation the dataset is randomly divided into k folds; each fold in turn serves as the validation set while the remaining k – 1 folds are used for training. The objective function P(C, γ) is:

$$P(C, \gamma) = - \frac{1}{k} \sum_{i = 1}^{k} \left( \frac{f_{T}^{i}}{f^{i}} \times 100\% \right) \tag{23}$$

where f ⁱ is the number of samples in the i-th validation set, $f_{T}^{i}$ is the number of correctly classified samples in that validation set, and k is the number of folds. Because of the negative sign, minimizing P corresponds to maximizing the average accuracy.
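A minimal sketch of the RBF kernel of Eq. (22) and the cross-validation objective of Eq. (23) is given below. It assumes that the per-fold counts of correctly classified samples are already available from training and validating an SVM on each fold; the function names and the numbers are hypothetical.

```python
import math

def rbf_kernel(xi, xj, gamma):
    """K(xi, xj) = exp(-gamma * ||xi - xj||^2) with gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)

def cv_objective(correct_per_fold, size_per_fold):
    """P(C, gamma): negative mean validation accuracy in percent over k folds."""
    k = len(size_per_fold)
    accuracies = [100.0 * c / s for c, s in zip(correct_per_fold, size_per_fold)]
    return -sum(accuracies) / k

# Hypothetical 3-fold result for one candidate pair (C, gamma).
print(rbf_kernel((1.0, 0.0), (0.0, 0.0), gamma=1.0))
print(cv_objective(correct_per_fold=[8, 9, 10], size_per_fold=[10, 10, 10]))
```

An optimizer such as ICA would call an objective of this kind once per candidate (C, γ) and keep the pair with the lowest value.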

2.2.4

MC-HBT Classifier

This section proposes a multi-class classifier based on a hierarchical binary tree (MC-HBT) and uses it as the basis for fault diagnosis in substations. The MC-HBT classifier is shown in Fig. 3. (a: Normal. b: Circuit breaker spurious trip. c: Control circuit disconnection fault. d: Trip blocked by low oil (gas) pressure in the operating mechanism. e: Hydraulic mechanism oil pump pumping timeout. f: Circuit breaker interrupter chamber explosion. g: Circuit breaker fails to open on protection action. h: Low SF6 gas pressure fault.)
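The way a hierarchical binary tree turns binary classifiers into a multi-class decision can be sketched as follows. The tree layout, the class names, and the threshold rules standing in for trained SVMs are all hypothetical; the actual structure used in the paper is the one in Fig. 3, which is not reproduced here.

```python
class Leaf:
    """Terminal node holding a final class label."""
    def __init__(self, label):
        self.label = label

class Node:
    """Internal node: a binary classifier that routes a sample to one subtree."""
    def __init__(self, classifier, positive, negative):
        self.classifier = classifier  # callable returning +1 or -1
        self.positive = positive      # subtree followed when the output is +1
        self.negative = negative      # subtree followed when the output is -1

def classify(tree, x):
    """Walk from the root to a leaf, applying one binary decision per level."""
    node = tree
    while isinstance(node, Node):
        node = node.positive if node.classifier(x) > 0 else node.negative
    return node.label

# Stand-ins for trained binary SVMs (simple threshold rules, for illustration).
def is_normal(x):
    return 1 if x[0] < 0.5 else -1

def is_fault_b(x):
    return 1 if x[1] < 0.5 else -1

# First level separates "a" (normal) from all faults, second level splits faults.
tree = Node(is_normal, Leaf("a"), Node(is_fault_b, Leaf("b"), Leaf("c")))
print(classify(tree, (0.1, 0.9)), classify(tree, (0.9, 0.1)), classify(tree, (0.9, 0.9)))
```

A tree over N classes needs at most N – 1 binary classifiers, and a sample is evaluated only by the classifiers on its path from the root to a leaf.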

2.2.5

ISFD-POPS modeling process

To achieve accurate fault diagnosis, the ISFD-POPS model proposed in this section needs to be trained several times. The fault information of the electrical equipment is screened, the extracted data are divided into training and test sets, the RBF kernel function is selected, and ICA is used to optimize the kernel parameters to generate the MC-HBT classifier. The overall steps are as follows:

Step 1: Linearly transform the initial sample set using PCA and use the cumulative contribution rate to select the principal components. Reducing the transformed sample set to the selected principal components yields a new sample set.

Step 2: Train on the new samples, forming the objective function from the new sample set and the SVM with unknown parameters.

Step 3: Use ICA to find the optimized solution of the objective function, determine the optimized SVM parameters for fault classification of electrical equipment, and use the optimized parameters to build the first-layer classifier.

Step 4: Following the order of the MC-HBT classifiers, classify the training set sequentially and build all classifiers by the method in Step 3 to form the MC-HBT classifier model for fault identification.

Step 5: Input the test samples into the ISFD-POPS model and calculate their predicted values.

The overall flow of the ISFD-POPS model is shown in Figure 4.

3

ISFD-POPS based fault diagnosis of electrical equipment

The fault types studied in this section were selected on two grounds. On the one hand, gradual faults of the regulating valves of electrical equipment are highly uncertain and not well suited to fault diagnosis studies, so the subsequent diagnosis focuses on abrupt faults. On the other hand, considering how likely each fault is to occur in actual operation, the study focuses on faults with a higher probability of occurrence. On these grounds, five faults were studied: F1 valve clogging fault, F7 fluid overheating evaporation or critical flow fault, F8 actuator twisting fault, F10 membrane head perforation fault, and F14 pressure sensor fault. The simulation diagrams of the five faults (F1, F7, F8, F10, F14) are shown in Fig. 5 to Fig. 9. (In the figures, CV is the output signal amplitude of the external controller, P1 and P2 are the pressures before and after the control valve, x is the stem displacement, F is the medium flow rate in the main pipeline, T1 is the fluid temperature, and the remaining variable is the diagnostic fault signal.)

Both the training dataset and the test dataset were reduced by PCA with the principal components selected at the 95% level, which reduced the original 6-dimensional data to 3 dimensions. The number of samples in both the training and test datasets is set to 550; in Table 1 and Table 2 this is the number of test samples of each fault type. The fault diagnosis results of PCA-PSO-SVM and PCA-SVM are shown in Table 1 and Table 2. The fault diagnosis accuracy of the traditional method is 54.36%, and it rises to 62.84% with the ISFD-POPS model proposed in this paper. The AUC value is 0.92 before applying the ISFD-POPS model and 0.99 after applying it. This indicates that the improved method outperforms the original method.

Table 1.

Before using the model presented in this article

	Prediction fault type
Actual fault type		f₁	f₇	f₈	f₁₀	f₁₄
	f₁	550	0	0	0	0
	f₇	0	550	0	0	0
	f₈	195	0	95	86	174
	f₁₀	256	0	81	138	75
	f₁₄	193	0	140	55	162

Table 2.

After using the model presented in this article

	Prediction fault type
Actual fault type		f₁	f₇	f₈	f₁₀	f₁₄
	f₁	523	0	27	0	0
	f₇	0	550	0	0	0
	f₈	63	0	205	5	277
	f₁₀	78	0	134	275	63
	f₁₄	65	0	278	32	175
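The overall accuracies quoted above can be recomputed from the two confusion matrices. The sketch below copies the counts from Table 1 and Table 2, with rows as actual fault types and columns as predicted fault types in the order f₁, f₇, f₈, f₁₀, f₁₄; the function names are illustrative.

```python
def overall_accuracy(confusion):
    """Share of samples on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

def per_class_recall(confusion):
    """Correctly identified share of each actual fault type (row-wise)."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]

# Table 1: before using the proposed model.
before = [
    [550, 0, 0, 0, 0],
    [0, 550, 0, 0, 0],
    [195, 0, 95, 86, 174],
    [256, 0, 81, 138, 75],
    [193, 0, 140, 55, 162],
]
# Table 2: after using the proposed model.
after = [
    [523, 0, 27, 0, 0],
    [0, 550, 0, 0, 0],
    [63, 0, 205, 5, 277],
    [78, 0, 134, 275, 63],
    [65, 0, 278, 32, 175],
]

print(round(100 * overall_accuracy(before), 2))  # 54.36
print(round(100 * overall_accuracy(after), 2))   # 62.84
print([round(r, 3) for r in per_class_recall(after)])
```

The per-class values show that f₁ and f₇ are identified almost perfectly in both tables, while most of the remaining errors are confusions among f₈, f₁₀, and f₁₄.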

4

Fault prediction model for electrical equipment based on an improved extreme learning machine

4.1

Fault prediction methods for electrical equipment

In this paper, the Extreme Learning Machine (ELM) model is used to predict the characteristic gas content inside the electrical equipment, so as to accurately predict future failures of the electrical equipment.

The extreme learning machine (ELM) is a relatively new single-hidden-layer feed-forward neural network (SLFN) algorithm. With the rapid rise of neural networks in recent years, more and more neural network algorithms have been applied to transformer fault prediction. Because of its strong learning ability, the single-hidden-layer network quickly became a research hotspot once it was applied; its advantages are strong learning ability, fast learning speed, and strong generalization ability [29].

The mathematical foundation of the ELM algorithm is as follows:

1) Interpolation theorem

The interpolation theorem states that, for any M distinct samples (x_i, t_i) ∈ R^d × R^m, i = 1, 2, …, M, the output of an SLFN with Q hidden-layer neurons can be written as:

$$\sum_{i = 1}^{Q} \alpha_{i} g_{i}(x_{j}) = \sum_{i = 1}^{Q} \alpha_{i} G(a_{i}, b_{i}, x_{j}) = o_{j}, \quad j = 1, 2, \dots, M \tag{24}$$

If the SLFN approximates the samples with zero error, that is, $\sum_{j = 1}^{M} \|o_{j} - t_{j}\| = 0$, then there must exist (a_i, b_i) and α_i such that:

$$\sum_{i = 1}^{Q} \alpha_{i} G(a_{i}, b_{i}, x_{j}) = t_{j}, \quad j = 1, 2, \dots, M \tag{25}$$

These M equations can be written compactly as:

$$H \alpha = T \tag{26}$$

where H is the hidden-layer output matrix of the SLFN:

$$H = \begin{bmatrix} h(x_{1}) \\ \vdots \\ h(x_{M}) \end{bmatrix} = \begin{bmatrix} G(a_{1}, b_{1}, x_{1}) & \dots & G(a_{Q}, b_{Q}, x_{1}) \\ \vdots & \ddots & \vdots \\ G(a_{1}, b_{1}, x_{M}) & \dots & G(a_{Q}, b_{Q}, x_{M}) \end{bmatrix}_{M \times Q} \tag{27}$$

and:

$$\alpha = \begin{bmatrix} \alpha_{1}^{T} \\ \vdots \\ \alpha_{Q}^{T} \end{bmatrix}_{Q \times m}, \quad T = \begin{bmatrix} t_{1}^{T} \\ \vdots \\ t_{M}^{T} \end{bmatrix}_{M \times m} \tag{28}$$

Given a data set (x_i, t_i) ∈ R^d × R^m with N samples and an arbitrarily small positive number ε > 0, if the activation function g is infinitely differentiable in a given interval and the parameters $\{a_{i}, b_{i}\}_{i = 1}^{M}$ are generated randomly, then there always exists an SLFN with M (M < N) hidden-layer nodes such that:

$$\| H_{N \times M} \alpha_{M \times m} - T_{N \times m} \| < \varepsilon \tag{29}$$

However, interpolation has a certain oscillatory nature, which reduces the generalization ability of the algorithm. For this reason, the approximation ability of the hidden-layer neurons is investigated next.

Let F²(X) be the space of functions g on a subset X of n-dimensional Euclidean space that satisfy $\int_{X} |g(x)|^{2} dx < \infty$. For any u, v ∈ F²(X), the inner product <u, v> is:

$$\langle u, v \rangle = \int_{X} u(x) v(x) dx \tag{30}$$

According to the definition of the norm of a function, the distance between the objective function f and the model function f_n of the neural network is:

$$\| f_{n} - f \| = \left[ \int_{X} | f_{n}(x) - f(x) |^{2} dx \right]^{\frac{1}{2}} \tag{31}$$

Lemma 1: Given a function g(x) that is not a polynomial, for any q ∈ [1, ∞), span{g(a · x + b) : (a, b) ∈ R^d × R} is dense in the corresponding function space.

Lemma 2: Let L be a bounded, integrable, continuous function on R^d with $\int_{R^{d}} L(x) dx \neq 0$. Then for any q ∈ [1, ∞), span{L((x – a)/b) : (a, b) ∈ R^d × R⁺} is likewise dense in that space.

These two lemmas show that an extreme learning machine can, by training and adjusting its parameters, approximate any function. However, they only establish universal approximation for additive-type and RBF-type activation functions and do not indicate how to obtain the optimal parameters. Many scholars have therefore studied the optimization of these parameters and proposed corresponding algorithms. This paper optimizes the parameters with an improved sparrow search algorithm so that the extreme learning machine model is better suited to predicting the dissolved gas content of transformers.

2) ELM model

In an SLFN, the weights and biases between the input and hidden layers and the weight vectors between the hidden and output layers need to be determined. For the output weights α, the value α* can be determined as the least squares solution of Hα = T:

$$\| H \alpha^{*} - T \| = \min_{\alpha} \| H \alpha - T \|$$

If the number of training samples N equals the number of hidden-layer neurons L, the hidden-layer output matrix is square, and there exist a_i, b_i, α_i such that Hα = T holds exactly. In most cases, however, this equality does not hold, and the least squares solution of the linear system becomes:

$$\alpha^{*} = H^{+} T \tag{32}$$

where H⁺ is the Moore-Penrose generalized inverse of H.

Based on the above, the ELM algorithm can be described by the following steps:

(1) Take as given a training set (x_i, t_i) ∈ R^d × R^m, i = 1, 2, …, N, a hidden-layer function G(a_i, b_i, x), and the number of hidden-layer neurons L.

(2) Randomly generate the hidden-layer parameters (a_i, b_i), i = 1, 2, …, L.

(3) Compute the output matrix H of the hidden-layer neurons.

(4) Compute the connection weights α* = H⁺T between the hidden layer and the output layer.

(5) Generate the resulting model from (a_i, b_i) and α*.

The structure of the extreme learning machine is shown in Fig. 10.
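The training steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `elm_fit` and `elm_predict`, the uniform range used for the random hidden-layer parameters, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, T, L, rng):
    """Train a single-hidden-layer ELM with L hidden neurons."""
    d = X.shape[1]
    # Step (2): random hidden-layer weights a and biases b (range is an assumption).
    a = rng.uniform(-1.0, 1.0, size=(d, L))
    b = rng.uniform(-1.0, 1.0, size=L)
    # Step (3): hidden-layer output matrix H, shape (N, L).
    H = sigmoid(X @ a + b)
    # Step (4): output weights via the Moore-Penrose pseudoinverse, alpha* = H^+ T.
    alpha = np.linalg.pinv(H) @ T
    return a, b, alpha

def elm_predict(X, a, b, alpha):
    return sigmoid(X @ a + b) @ alpha

# Synthetic regression data, used only to exercise the functions.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 3))
T = np.sin(X.sum(axis=1, keepdims=True))

a, b, alpha = elm_fit(X, T, L=20, rng=rng)
train_rmse = float(np.sqrt(np.mean((elm_predict(X, a, b, alpha) - T) ** 2)))
print("training RMSE:", train_rmse)
```

Because only `alpha` is solved for, training reduces to one pseudoinverse computation; the quality of the fit then depends on how well the random `a` and `b` happen to suit the data.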

From the above, the advantages of the extreme learning machine algorithm are:

(1) Training is fast: the network is trained once, with no need for repeated training.

(2) The algorithm generalizes well and does not easily fall into a local optimum.

Its disadvantages are:

(1) As a neural network, it can still overfit.

(2) The random selection of input weights and hidden-layer biases can make the algorithm less effective on a particular problem.

The theory of the ELM, its network structure, and its advantages and disadvantages all show that the choice of connection weights and biases plays a decisive role in the performance of the model. If the optimal biases and weights can be chosen according to the characteristics of each input data set, the model will perform better. This paper therefore uses FA-ISSA to optimize these two sets of parameters and further improve the prediction accuracy of the algorithm.
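One way to picture this optimization is as a search over the hidden-layer weights and biases, scored by the error of the resulting ELM. The sketch below defines such a fitness function and, purely as a stand-in, minimizes it with plain random search. It does not implement FA-ISSA; the packing of parameters into a vector `theta`, the name `elm_fitness`, the search range, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fitness(theta, X, T, L):
    """RMSE of an ELM whose hidden weights and biases are packed in theta."""
    d = X.shape[1]
    a = theta[: d * L].reshape(d, L)   # input-to-hidden weights
    b = theta[d * L:]                  # hidden-layer biases
    H = sigmoid(X @ a + b)
    alpha = np.linalg.pinv(H) @ T      # output weights, alpha* = H^+ T
    return float(np.sqrt(np.mean((H @ alpha - T) ** 2)))

# Synthetic data, used only to exercise the fitness function.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(40, 2))
T = X[:, :1] ** 2 + X[:, 1:]
L = 5
dim = X.shape[1] * L + L

# Stand-in optimizer: evaluate random candidates and keep the best one.
# A swarm method such as FA-ISSA would update candidates iteratively instead.
candidates = rng.uniform(-1.0, 1.0, size=(30, dim))
scores = np.array([elm_fitness(c, X, T, L) for c in candidates])
best = candidates[np.argmin(scores)]
best_score = float(scores.min())
print("best RMSE over candidates:", best_score)
```

In practice the fitness would more plausibly be computed on held-out data rather than the training set, so that the search does not simply reward overfitting.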

4.2 Construction of the prediction model

4.2.1 Selection and Processing of Feature Quantities

In this paper, the contents of five gases, H₂, CH₄, C₂H₄, C₂H₆, and C₂H₂, are used as the input feature data of the FA-ISSA-ELM model. Since the contents of the five gases in the oil differ in magnitude, they need to be normalized. The normalization formula is as follows:

$X_{i, j}^{1} = \frac{X_{i, j}}{\sum_{k = 1}^{5} X_{i, k}}$ (33)

where $X_{i, j}$ denotes the original content of the j-th characteristic gas in the i-th sample and $X_{i, j}^{1}$ denotes its normalized value.
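The normalization divides each gas content by the total of the five gases in the same sample, so every normalized sample sums to 1. A minimal sketch follows, using the first two rows of Table 3 with C₂H₂ set to 0, as the paper reports for this data set. The function name `normalize_gases` and the column order are assumptions for the example, and the sketch assumes every sample has a nonzero total.

```python
import numpy as np

def normalize_gases(X):
    """Divide each gas content by the sum of all gas contents in its sample."""
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1, keepdims=True)  # one total per sample (row)
    return X / totals

# Columns: H2, CH4, C2H4, C2H6, C2H2 (values in uL/L, first two rows of Table 3).
samples = np.array([
    [22.1, 32.0, 67.6, 110.1, 0.0],
    [22.3, 31.6, 67.8, 109.2, 0.0],
])
normalized = normalize_gases(samples)
print(normalized.round(4))
```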

4.2.2 Predictive models

In this chapter, the improved extreme learning machine is used as the model for training and testing, and three metrics, root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), serve as the evaluation criteria for prediction accuracy. The sigmoid function is chosen as the activation function of the hidden layer. The connection weights between the input layer and the hidden layer and the biases of the hidden-layer neurons obtained by FA-ISSA optimization are used as the initial network weights and biases. The values at previous moments are used as inputs to predict the value at the current moment: the values at times 1 to n are input and the value at time n+1 is output, then the values at times 2 to n+1 are input and the value at time n+2 is output, and so on. The model is established according to the results of parameter selection, and the flow of the dissolved gas content prediction model is shown in Figure 11.
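The input and output pairs described here form a sliding window over the gas series: n consecutive values are the input, the next value is the target, and the window then moves forward by one step. A small sketch, assuming a window length of n = 3 and using the first six H₂ values from Table 3 (the helper name `make_windows` is hypothetical):

```python
import numpy as np

def make_windows(series, n):
    """Build (input window, next value) pairs from a 1-D series."""
    series = np.asarray(series, dtype=float)
    # Row i holds the values at positions i .. i+n-1; its target is position i+n.
    inputs = np.array([series[i:i + n] for i in range(len(series) - n)])
    targets = series[n:]
    return inputs, targets

h2 = [22.1, 22.3, 21.6, 21.1, 19.9, 20.2]  # first six H2 values in Table 3
inputs, targets = make_windows(h2, n=3)
for window, target in zip(inputs, targets):
    print(window, "->", target)
```

The window length corresponds to the dimension of the model input, so choosing n is part of the parameter selection step.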

4.3 Effectiveness of fault prediction for electrical equipment

In this paper, 14 groups of data on five characteristic gases, measured on transformer No. 2 in a 110 kV step-up substation of a nuclear power plant, are used as the fault characterization data of the electrical equipment to verify the accuracy of the prediction model and to predict whether this transformer has a fault. Since the 14 groups were not sampled at equal intervals, Newton interpolation is used to preprocess them into equal-interval data. The fault data are thereby supplemented to 22 groups, of which 20 are used as the training set and the last 2 groups, both actually measured, are used as the test set. The transformer fault characteristic gas data (μL/L) are shown in Table 3.

Table 3. Transformer fault characteristic gas data (μL/L)

Time	H₂	C₂H₆	C₂H₄	CH₄
Supplementary data	22.1	110.1	67.6	32
2022/08/11	22.3	109.2	67.8	31.6
2022/09/01	21.6	107.8	66.7	31.7
2022/09/08	21.1	106.7	66.6	31.2
2022/10/11	19.9	106.5	64.5	30.6
Supplementary data	20.2	104.8	64.2	30.9
2022/12/25	18.2	103.9	62.6	30.5
Supplementary data	19.1	104.7	62.9	30.5
Supplementary data	18.8	103.7	63.4	30.4
2023/01/01	18.6	102.4	62.5	30.7
Supplementary data	18.7	102.7	62.5	31
2023/02/12	18.7	102.5	61.9	30.2
Supplementary data	19	103.9	62.5	31.6
2023/04/12	19.3	103.5	63	31.5
2023/05/01	19.5	103.8	63.2	31
Supplementary data	19.6	101.9	63.5	29.5
Supplementary data	19.9	102.5	63.2	31
2023/07/02	21.2	104.5	64.5	31.5
2023/08/11	21.2	105.1	64.2	31.5
2023/10/11	21.6	108.6	64.4	30
2024/01/01	21.9	110.6	65.3	32.6
2024/01/15	22	115.2	65.1	33.3
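The Newton interpolation used to fill in the supplementary rows can be sketched with divided differences. The paper does not state how many neighbouring points or what polynomial degree were used, so the example below is only an assumption: it fits one polynomial through four consecutive measured H₂ values (2022/08/11 to 2022/10/11, expressed as days since the first measurement) and evaluates it on a 20-day grid. The function names are hypothetical, and the output is not meant to reproduce the supplementary values in Table 3.

```python
import numpy as np

def newton_coefficients(x, y):
    """Divided-difference coefficients of the Newton interpolating polynomial."""
    x = np.asarray(x, dtype=float)
    coef = np.array(y, dtype=float)
    for j in range(1, len(x)):
        # After pass j, coef[i] holds the j-th order divided difference
        # f[x_{i-j}, ..., x_i] for every i >= j.
        coef[j:] = (coef[j:] - coef[j - 1:-1]) / (x[j:] - x[:-j])
    return coef

def newton_eval(x, coef, t):
    """Evaluate the Newton polynomial at t using nested multiplication."""
    t = np.asarray(t, dtype=float)
    result = np.full_like(t, coef[-1])
    for k in range(len(coef) - 2, -1, -1):
        result = result * (t - x[k]) + coef[k]
    return result

# Days elapsed since 2022/08/11 for four measured rows, with their H2 values.
days = np.array([0.0, 21.0, 28.0, 61.0])
h2 = [22.3, 21.6, 21.1, 19.9]

coef = newton_coefficients(days, h2)
grid = np.arange(0.0, 61.0, 20.0)       # equal 20-day spacing (assumed)
resampled = newton_eval(days, coef, grid)
print(dict(zip(grid.tolist(), resampled.round(2).tolist())))
```

A single high-degree polynomial through many points can oscillate between nodes, so interpolating over short groups of neighbouring measurements is usually the safer choice.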

Based on the H₂ gas content data in the table, the optimal parameter values and the optimization error are derived using the electrical equipment fault prediction model based on the improved extreme learning machine. The prediction results for hydrogen at different dimensions are shown in Table 4. As the table shows, the parameter optimization error drops sharply at dimension 5 and reaches its minimum of 0.00241 at dimension 6, after which it stays close to the minimum. To limit computational complexity, parameter optimization is not carried out at higher dimensions, and the optimal dimension and parameters found here are used as a reference when optimizing for the other gases.

Table 4. Prediction results for hydrogen (H₂) at different dimensions

Dimension m	Parameter C	Parameter g	Parameter ε	Parameter optimization error
2	6	220.3	0.000945	0.25
3	25.6	125.4	0.08	0.18
4	9.5	255.1	0.000945	0.344
5	72.14	46.8	0.145	0.0051
6	55.31	91.3	0.000945	0.00241
7	53.64	75.6	0.00088	0.00575
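Selecting the optimal dimension amounts to picking the row of Table 4 with the smallest optimization error. A short sketch of that lookup, with the table values copied into a list of tuples (the variable names are chosen for this example only):

```python
# Rows of Table 4: (dimension m, C, g, epsilon, parameter optimization error).
table4 = [
    (2, 6.0, 220.3, 0.000945, 0.25),
    (3, 25.6, 125.4, 0.08, 0.18),
    (4, 9.5, 255.1, 0.000945, 0.344),
    (5, 72.14, 46.8, 0.145, 0.0051),
    (6, 55.31, 91.3, 0.000945, 0.00241),
    (7, 53.64, 75.6, 0.00088, 0.00575),
]

# Pick the row whose optimization error (last field) is smallest.
best_row = min(table4, key=lambda row: row[4])
best_m, best_C, best_g, best_eps, best_err = best_row
print("optimal dimension:", best_m, "error:", best_err)
```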

The fault feature prediction model is built according to the optimal dimension and optimal parameters for H₂, and the 20 groups of training data are used to predict the 2 groups of test data. The H₂ prediction results are shown in Fig. 12. As can be seen in the figure, the last two H₂ data points are test data, and their predictions are close to the actual values: the relative errors (RE) are 3.51% and -3.68%, respectively, and the mean absolute percentage error (MAPE) is 3.3%, which is relatively small and basically meets the prediction requirements.
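The error measures used in this section can be written out as follows. The paper does not give their formulas in this passage, so the sketch assumes the usual definitions, with the relative error taken as (predicted - actual) / actual; the sample values are made up for illustration and are not taken from the experiments.

```python
import numpy as np

def relative_error_pct(actual, predicted):
    """Signed relative error of each prediction, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return (predicted - actual) / actual * 100.0

def rmse(actual, predicted):
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(actual, predicted):
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(diff)))

def mape_pct(actual, predicted):
    """Mean of the absolute relative errors, in percent."""
    return float(np.mean(np.abs(relative_error_pct(actual, predicted))))

# Illustrative values only.
actual = [20.0, 25.0]
predicted = [21.0, 24.0]

re_pct = relative_error_pct(actual, predicted)
print("RE%:", re_pct)
print("RMSE:", rmse(actual, predicted))
print("MAE:", mae(actual, predicted))
print("MAPE%:", mape_pct(actual, predicted))
```

For these made-up numbers the relative errors are 5% and -4%, so the MAPE is 4.5%, while both the RMSE and the MAE equal 1.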

To further verify the effectiveness of the prediction model, the H₂ training set is also fed into the constructed model as a test set, and the comparison between the actual and predicted data for the training set is shown in Fig. 13. Because the training samples are too few to fully reflect the coupling between the characteristic gases, and because the prediction model itself has limited accuracy, each predicted value differs from the actual value, and the error grows steadily. Nevertheless, the figure also shows that the predicted data follow the same trend as the actual data and that the growth in error is very small. This suggests that when the model established in this paper is used to predict unknown data, the predicted values and trend will remain close to the actual ones. The predicted data can then be fed into the electrical equipment diagnostic model to determine whether the equipment may develop a thermal fault, which helps maintenance personnel discover potential faults and maintain the equipment as early as possible to avoid failures.

Fault characteristic prediction models are built according to the optimal dimensions and optimal parameters of the different gases, and a BP neural network is used to re-predict the characteristic data for comparison with the prediction model built in this paper; the results are shown in Table 5. Since the values of C₂H₂ in the given fault data are all 0, this gas is not considered. As can be seen from the table, the BP prediction model is more accurate than the model proposed in this paper only for C₂H₆; for the remaining gases its accuracy is much lower than that of the SVR prediction model, which supports the accuracy of the prediction model constructed in this paper. At the same time, the prediction accuracy for H₂ is much lower than for the other gases. This may be because the parameters used when constructing the model are not optimal, or because the H₂ curve fitted to the equal-interval data is not accurate enough and needs further adjustment. Overall, the table shows that the prediction model established in this paper basically meets the prediction requirements, with high accuracy and strong practicality.

Table 5. Test results

Name	Time	Actual value	SVR predicted value	BP predicted value	RE% (SVR)	RE% (BP)	MAPE% (SVR)	MAPE% (BP)
H₂	1.39	20.51	20.937	20.902	4.12	5.33	4.04	5.14
H₂	2.01	20.59	20.96	20.72	-3.99	5.32	4.04	5.14
C₂H₆	0.67	111.8	111.503	111.812	0.66	-0.079	1.925	0.301
C₂H₆	1.89	113.52	112.57	112.43	1.91	1.46	1.925	0.301
C₂H₄	1.48	64.41	64.12	64.95	1.145	-2.2	0.9975	1.735
C₂H₄	2.31	64.89	64.285	64.89	-0.59	2.9	0.9975	1.735
CH₄	0.98	31.86	32.122	31.924	2.5	3.6	2.502	3.43
CH₄	2.47	31.6	31.74	32.18	1.184	-3.19	2.502	3.43

5 Conclusion

This article addresses fault diagnosis and fault prediction models for electrical equipment, and uses example analysis to demonstrate the practicality of the two models proposed in this paper.

When the prediction model proposed in this paper is used to predict the H₂ fault characteristic, the predictions are close to the actual results and the error is relatively small: the relative errors (RE) are 3.51% and -3.68%, respectively, and the mean absolute percentage error (MAPE) is 3.3%.

In the fault characteristic prediction experiments built according to the optimal dimensions and optimal parameters of the different gases, the BP prediction model is more accurate than the proposed model only for C₂H₆. For all the other gases its accuracy is lower than that of the electrical equipment fault prediction model constructed in this paper based on the improved extreme learning machine, which indicates that the prediction model built in this paper has good prediction performance.



Lei Li

Yanling Gao

Published online: 19 March 2025

Received: 25 Oct. 2024

Accepted: 03 Feb. 2025

DOI: https://doi.org/10.2478/amns-2025-0480

Keywords: Support vector machine, Principal component analysis, Fault diagnosis, Extreme learning machine, Fault prediction

© 2025 Lei Li et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
