Feeder loss estimation of transformer in long-short memory network, based on FCM clustering
Published: 24 Sep 2025
Received: 10 Jan 2025
Accepted: 05 May 2025
DOI: https://doi.org/10.2478/amns-2025-0995
© 2025 Songyu Wu et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
The distribution network feeder line loss rate, as a key indicator for measuring the economic efficiency and technical management level of power supply enterprises, is not only directly related to the power management efficiency of enterprises, but also closely related to factors such as the scientific design of regional distribution networks, technical application effects, equipment operation status, personnel professional level, and overall management efficiency [1-2]. Optimizing the composition of distribution network line losses is a direct way for power supply enterprises to improve economic efficiency, and it is also an important means to achieve increased income and efficiency. With the continuous improvement of the operation mechanism of power supply enterprises, the control standards for line loss rate are becoming stricter. In this context, the importance of deepening the analysis of distribution line loss is becoming increasingly significant [3].
The core value of line loss analysis lies in evaluating whether the distribution system operates rationally, identifying problems in operation strategies, distribution architecture, equipment efficiency, metering accuracy, and electricity management, and providing a scientific basis for accurate loss reduction strategies [4]. A reasonable benchmark value for the line loss rate supports power supply enterprises in planning line loss management and formulating effective loss reduction measures, which brings significant economic and social benefits. However, research in this field is still lacking. Reference [5] proposes a calculation strategy for the line loss rate under the three-phase equilibrium state. This method replaces the previous electricity loss standard with a line loss rate range standard to represent the severity of electricity theft. Its limitation is that it considers only two factors: electricity loss and three-phase imbalance. Reference [6] considers load characteristics in the classification analysis of distribution networks and sets the benchmark value of distribution network feeders from the median of their typical distribution, mainly within the median value ±0.5%. However, this method is strongly subjective, and its way of determining benchmark values does not generalize well. In addition, given the large number of feeders in a distribution network, manual setting cannot meet application needs and has little practical value.
With advances in artificial intelligence, data-driven line loss rate prediction models have become a research focus. They mainly fall into two types.

(1) Single estimation models. Reference [7] uses fast independent component analysis for feature selection on feeder line loss data and then uses support vector regression to predict the feeder line loss of the distribution network. Reference [8] improves the support vector regression model with particle swarm optimization, an evolutionary computation method, significantly enhancing the accuracy of feeder line loss prediction. Reference [9] screens the data features of feeder line losses using traditional grey relational analysis; this preprocessing improves the prediction accuracy of the subsequent neural network model. Reference [10] first reconstructs the features of feeder line loss data using a denoising auto-encoder and then uses a long short-term memory network to predict the feeder line loss rate. Reference [11] adopts a deep transfer learning strategy to predict and analyze the network loss of distribution lines containing distributed new energy sources.

(2) Multi-model fusion models. Reference [12] builds on Bootstrap Aggregating of base learners, integrated into the random forest algorithm, to predict the line loss rate of distribution network feeders. Reference [13] uses a Boosting strong classifier to improve the performance of Extreme Gradient Boosting Tree (XGBoost) and applies it to predict the line loss rate of distribution network feeders. Reference [14] applies stacked generalization to predict feeder line losses in distribution networks, where the meta model uses gradient boosting trees and the base models are traditional machine learning models.

These studies have improved the accuracy of line loss rate prediction to a certain extent, but the models used are all machine learning models, which suffer from insufficient feature mining and generalization ability in complex scenarios.
This article proposes a method for estimating feeder line loss that integrates fuzzy C-means (FCM) clustering with a long short-term memory (LSTM) network Transformer model. First, fuzzy C-means clustering is used to group the data related to feeder line loss estimation. Line loss is then estimated independently for each group after data preprocessing, which allows finer control of feeder line loss in the distribution network. In addition, this article introduces an improved long short-term memory network Transformer model with a double-layer structure. Its main advantage is that it uses a multi-head attention mechanism to fuse and generate features of distribution network feeder line loss data and improves prediction efficiency and accuracy through parallel computing, thereby achieving efficient and accurate short-term prediction of distribution network feeder line loss.
The index system for the feeder line loss rate in distribution networks is diverse and complex, and current distribution networks fall short in information integrity and real-time performance. To improve the universality and operability of the method while balancing the sensitivity of each index to line loss against the convenience of data acquisition, this work constructs a three-dimensional evaluation framework for the feeder line loss rate from three core perspectives: line characteristics, operating parameters, and management level, as detailed in Table 1.
Table 1. Evaluation system of line loss rate index
Dimension | Index |
---|---|
Line characteristics | Line current carrying capacity |
Line characteristics | Cable conversion rate |
Line characteristics | Capacity of distribution transformer |
Line characteristics | Power supply radius |
Operation parameter | Average electricity consumption |
Operation parameter | Maximum load rate |
Operation parameter | Annual maximum current |
Management level | Meter reading accuracy |
Management level | Equipment aging rate |
When constructing the indicator system for distribution line loss rate, the two core dimensions - line attributes and operating parameters - have a direct impact on the theoretical line loss rate, while the management factor dimension mainly affects the management of line loss rate. The specific application strategy of the indicators is explained as follows: Firstly, based on the screening principle of “high representation within the class and low correlation between classes”, two indicators, power supply radius and average power consumption, are selected from the line attributes and operating parameters, and the fuzzy C-means clustering algorithm [15] is used to finely classify the feeder lines. Secondly, to enhance the practicality of the theoretical line loss rate correction value, both the line properties and operating parameters are included in the ground state correction category of statistical calculation methods. In addition, management factor indicators are used to solve the management line loss coefficient in the optimization model of the benchmark value of the theoretical line loss rate of the feeder line.
The individual indicators are defined as follows.

Power supply radius. Defined as the physical length of the line from the power source point of the distribution network to the end of its supply area. The equivalent resistance of the feeder line is calculated with the equivalent resistance method [16].

Line current carrying capacity. This indicator measures the maximum current that a distribution network feeder can safely carry. It is directly related to the current carrying potential of the line and to the potential increase in line loss rate.

Cable laying ratio. Defined as the proportion of cable length to the overall length of the feeder line. Unlike overhead lines, cable lines are prone to eddy current losses due to bundling effects, resulting in a higher overall line loss rate. An increase in the cable laying ratio is therefore often accompanied by an increase in line losses.

Rated capacity of distribution transformer. The design capacity of the distribution transformers installed on the feeder line. The economic operating range depends on the transformer model; by keeping the load rate within the economic range, line loss can be minimized and efficiency maximized.

Annual average power supply. The average value of the annual active power supply. This indicator reflects the load level of the distribution network and is correlated with the line loss rate.

Annual peak withstand current. The maximum current that the feeder can temporarily withstand without endangering the safe operation of the equipment; exceeding this limit causes equipment failure. Based on the traditional equivalent resistance method, feeder line losses in distribution networks can be calculated effectively [17].

Annual peak load rate. This indicator measures the maximum operating load of a transformer during normal operation throughout the year. In this study, it refers to the highest load rate recorded over a year, calculated as the ratio of the actual maximum load power of the transformer to its rated capacity [18]:

$$\beta_{\max} = \frac{P_{\max}}{S_N}$$

where $\beta_{\max}$ is the annual peak load rate, $P_{\max}$ is the actual maximum load power of the transformer, and $S_N$ is its rated capacity.
Given the complexity of the distribution network structure, the non-uniformity of equipment operating parameters, the diversity of operating modes, and the limitations of big data analysis methods, the current efficiency of data application is low, and the setting of benchmark values for line loss rates mainly relies on historical data or management experience, lacking sufficient theoretical support. In response to this issue, this article proposes an optimization design framework that integrates fuzzy clustering and long short-term memory network Transformer, aiming to improve the accuracy of setting benchmark values for line loss rate. Figure 1 provides a detailed implementation process of the framework.
First, feeder samples are extracted from the distribution network, and the parameters of the three-dimensional line loss rate index system and the feeder network topology information are collected. Next, the fuzzy C-means long short-term memory network Transformer model is used to cluster the samples, and the optimal clustering result is determined through validity index testing, which divides the feeders into ground state classes. On this basis, the equivalent resistance method and the statistical calculation method are combined to carry out the ground state correction of the theoretical line loss rate based on the network topology of the ground state feeders. Finally, following the classification principle, the optimized benchmark values of the theoretical line loss rate are solved for each type of feeder. This article focuses on the design and implementation of the fuzzy C-means long short-term memory network Transformer algorithm; the last two processes are described in references [19-20].

Figure 1. Method flowchart
Given the fuzziness of category boundaries in actual classification, this study adopts the fuzzy C-means clustering algorithm to preprocess feeder data. The specific operation process is as follows:
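Let $x_1, \dots, x_n$ be the feeder samples and $c$ the number of clusters. In its standard form [15], fuzzy C-means minimizes the objective

$$J_m = \sum_{i=1}^{n}\sum_{j=1}^{c} u_{ij}^{m}\,\lVert x_i - v_j\rVert^{2}, \qquad \sum_{j=1}^{c} u_{ij} = 1,$$

by alternating the two updates

$$v_j = \frac{\sum_{i=1}^{n} u_{ij}^{m}\,x_i}{\sum_{i=1}^{n} u_{ij}^{m}}, \qquad u_{ij} = \left[\sum_{k=1}^{c}\left(\frac{\lVert x_i - v_j\rVert}{\lVert x_i - v_k\rVert}\right)^{\frac{2}{m-1}}\right]^{-1},$$

until the change in the membership matrix falls below a tolerance. Where, $u_{ij}$ is the membership degree of sample $x_i$ in cluster $j$, $v_j$ is the center of cluster $j$, and $m > 1$ is the fuzziness exponent.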
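The clustering step can be sketched in a few lines of NumPy. This is a minimal sketch of standard fuzzy C-means, not necessarily the exact variant used in this study; the function name `fuzzy_c_means`, the fuzziness exponent `m=2.0`, the tolerance, and the synthetic two-feature data (standing in for standardized power supply radius and average power consumption) are all assumptions made for illustration.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means. X has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    # Random initial membership matrix; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    def update_centers(U):
        # Centers are membership-weighted means of the samples.
        Um = U ** m
        return (Um.T @ X) / Um.sum(axis=0)[:, None]

    for _ in range(max_iter):
        centers = update_centers(U)
        # Distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # Membership is proportional to dist ** (-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        shift = np.abs(U_new - U).max()
        U = U_new
        if shift < tol:
            break
    return update_centers(U), U

# Hypothetical standardized features: [power supply radius, average power consumption].
rng = np.random.default_rng(1)
short_light = rng.normal([-1.0, -1.0], 0.2, size=(20, 2))
long_heavy = rng.normal([1.0, 1.0], 0.2, size=(20, 2))
X = np.vstack([short_light, long_heavy])

centers, U = fuzzy_c_means(X, n_clusters=2)
labels = U.argmax(axis=1)  # hard assignment, used only for inspection
```

Each row of `U` gives the degree to which a feeder belongs to every cluster, which is what distinguishes this from hard clustering when category boundaries are fuzzy. In practice the features would be standardized first so that neither indicator dominates the distance.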
Long Short Term Memory Network (LSTM) [22], as a variant of Recurrent Neural Network, effectively maintains and regulates information flow by introducing three special cell state regulation mechanisms: input gate, forget gate, and output gate, while capturing long-term dependencies between data. This design alleviates the difficulties of gradient vanishing and exploding encountered by recurrent neural networks when processing training data. Specifically, long short-term memory networks can dynamically adjust their cellular state content based on input sequence information, and generate new memory states by integrating the current time step input with previous memory states, thereby ensuring the model’s ability in memory and reasoning.
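Figure 2 shows the recurrent architecture of the long short-term memory network, whose units carry two core states: the hidden state $h_t$ and the cell state $c_t$. In the standard formulation [22], with input $x_t$ at time step $t$, the gates and states are updated as

$$f_t = \sigma(W_f[h_{t-1}, x_t] + b_f), \quad i_t = \sigma(W_i[h_{t-1}, x_t] + b_i), \quad o_t = \sigma(W_o[h_{t-1}, x_t] + b_o),$$

$$\tilde{c}_t = \tanh(W_c[h_{t-1}, x_t] + b_c), \quad c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \quad h_t = o_t \odot \tanh(c_t),$$

where $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates, $\sigma$ is the sigmoid function, $\odot$ is the element-wise product, and $W$ and $b$ are learnable weights and biases.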

Figure 2. Recurrent neural network architecture of the long short-term memory network
The Transformer architecture [23] is a sequence-to-sequence prediction model that consists of two major components: an encoder and a decoder. The encoder maps the input historical feeder line loss data and meteorological information to a high-dimensional feature space, forming feature vectors that give the decoder the contextual dependencies it needs to generate an accurate output sequence. The encoder is a stack of identical layers, each containing a multi-head self-attention module and a feedforward neural network, with layer normalization and residual connections to enhance model stability.
In the Transformer model, by performing a weighted sum operation on the historical input data sequence, the self attention mechanism can dynamically adjust the weights of various influencing factors, allowing the model to focus on any key position of the input line loss data sequence, rather than being limited to a fixed time window. As shown in Figure 3, the application of multi head attention mechanism enables the Transformer model to pay parallel attention to multiple different regions of the input sequence matrix X, endowing the model with global insight into time series and thereby improving its representation performance, making it perform better in processing global and local features of feeder line loss data sequences.
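The attention mechanism is built from three parts: the query matrix $Q$, the key matrix $K$, and the value matrix $V$, each obtained from the input sequence matrix $X$ by a learned linear projection. In the standard scaled dot-product form [23],

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$

where $d_k$ is the dimension of the key vectors. Multi-head attention applies this operation in parallel with separate projections per head and concatenates the results:

$$\mathrm{head}_i = \mathrm{Attention}(XW_i^{Q}, XW_i^{K}, XW_i^{V}), \qquad \mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^{O}.$$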
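To make the multi-head mechanism concrete, the following NumPy sketch computes standard scaled dot-product self-attention with several heads over one input sequence. The function name `multi_head_self_attention`, the random (untrained) projection matrices, and the sizes (12 time steps, model width 8, 2 heads) are hypothetical choices for illustration, not values from this study.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """X: (seq_len, d_model); every weight matrix: (d_model, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads

    def split(M):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    # Every position scores every other position, independently per head.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)          # (n_heads, seq_len, seq_len)
    heads = weights @ V                         # (n_heads, seq_len, d_head)
    # Concatenate the heads and mix them with the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo, weights

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 12, 8, 2
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.normal(scale=0.3, size=(d_model, d_model)) for _ in range(4))

out, attn = multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads)
```

Each row of `attn[h]` is a probability distribution over all time steps, which is why the model is not restricted to a fixed time window: any step can receive weight from any other.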

Figure 3. Multi-head attention structure diagram
The prediction of feeder line loss is a task involving multivariate temporal analysis. Although long short-term memory networks have shown reliability in capturing multivariate temporal features, their multi gate computing mechanism increases computational complexity and is prone to problems such as long training time, high risk of overfitting, and difficulty in parallelization when processing long time series. In addition, when long short-term memory networks process long-term information, some time periods of data may be ignored or overwhelmed due to temporal noise, affecting their predictive performance. In contrast, the Transformer model can comprehensively consider all positions of the input sequence in a single processing, without being limited by the distance between features, and its parallel computing characteristics significantly improve the training efficiency of the model [24].
In view of the above issues, this article proposes a long short-term memory network Transformer fusion model, which combines the self-attention mechanism of the Transformer with the sequence modeling strengths of the long short-term memory network to improve the accuracy of feeder line loss prediction. Specifically, a long short-term memory network encoder performs a preliminary encoding of the input data, and a Transformer decoder then produces the desired output sequence. As shown in Figure 4, the long short-term memory network module is placed before the multi-head attention module of the Transformer, so that the two are fused to improve algorithm performance.
The long short-term memory network layer utilizes its unique gating mechanism to finely regulate the abandonment ratio, addition amount, and output amount of multidimensional device operating factors and feeder line loss information within the cellular state, effectively extracting key temporal features from multivariate data and forming valuable hidden state vectors. Furthermore, the self attention mechanism of the Transformer model is introduced to focus on a subset of features that are highly relevant to the current prediction task, reducing the allocation of attention to non core environmental factors. At the same time, historical state information is learned and fused with current observation data to jointly predict the degree of line loss. The specific calculation process is as follows:
Where, L represents the results

Figure 4. Long short-term memory network Transformer model
In the fusion layer, the output vectors of the long short-term memory network layer are concatenated with the corresponding input vectors at each position, and the fused vectors are passed to the activation function layer. The Gaussian Error Linear Unit (GELU) is selected as the activation function, and the weights and biases of the layer are adjusted during training to adapt to diverse input distributions and task requirements. This mechanism enhances the ability of the long short-term memory network Transformer model to learn and represent complex functions and features in feeder line loss prediction, thereby improving the accuracy and robustness of prediction. Since each sub-layer uses a residual connection and normalization, the output of each sub-layer is:

$$\mathrm{Output} = \mathrm{LayerNorm}\big(x + \mathrm{Sublayer}(x)\big)$$

where $x$ is the input of the sub-layer and $\mathrm{Sublayer}(x)$ is the function implemented by the sub-layer itself.
In the long short-term memory network Transformer architecture, the first layer, the long short-term memory network, acts as an encoder that transforms the time series data into vector representations; the second layer, the Transformer, then serves as a decoder for further processing. The model first uses the long short-term memory network to extract local features of the sequence data and then captures global feature relationships through the Transformer, aiming to improve the accuracy of feeder line loss prediction.
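The data flow of the two-layer design can be illustrated with a forward pass in NumPy. This is only a sketch under stated assumptions: the weights are random and untrained, a single attention head is used for brevity, layer normalization has no learnable gain or bias, and the parameter names in `p`, the sizes, and the choice of reading the prediction from the last time step are hypothetical, not taken from this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gelu(z):
    # Common tanh approximation of the Gaussian Error Linear Unit.
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z ** 3)))

def layer_norm(z, eps=1e-5):
    mu = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)

def lstm_encode(X, W, U, b):
    """X: (T, d_in); W: (d_in, 4*d_h); U: (d_h, 4*d_h); b: (4*d_h,)."""
    d_h = U.shape[0]
    h, c = np.zeros(d_h), np.zeros(d_h)
    states = []
    for x_t in X:
        i, f, o, g = np.split(x_t @ W + h @ U + b, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g          # forget part of the old memory, add new content
        h = o * np.tanh(c)         # expose a gated view of the memory
        states.append(h)
    return np.stack(states)        # (T, d_h)

def self_attention(H, Wq, Wk, Wv):
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    s = Q @ K.T / np.sqrt(K.shape[1])
    a = np.exp(s - s.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)
    return a @ V

def lstm_transformer_forward(X, p):
    H = lstm_encode(X, p["W"], p["U"], p["b"])               # layer 1: local temporal features
    F = gelu(np.concatenate([H, X], axis=1) @ p["Wf"])       # fuse LSTM output with the input
    A = layer_norm(F + self_attention(F, p["Wq"], p["Wk"], p["Wv"]))  # residual + normalization
    return float(A[-1] @ p["w_out"])                         # read the prediction at the last step

rng = np.random.default_rng(0)
T, d_in, d_h = 24, 7, 16   # hypothetical: 24 time steps, 7 input features
p = {
    "W": rng.normal(scale=0.1, size=(d_in, 4 * d_h)),
    "U": rng.normal(scale=0.1, size=(d_h, 4 * d_h)),
    "b": np.zeros(4 * d_h),
    "Wf": rng.normal(scale=0.1, size=(d_h + d_in, d_h)),
    "Wq": rng.normal(scale=0.1, size=(d_h, d_h)),
    "Wk": rng.normal(scale=0.1, size=(d_h, d_h)),
    "Wv": rng.normal(scale=0.1, size=(d_h, d_h)),
    "w_out": rng.normal(scale=0.1, size=d_h),
}
X = rng.normal(size=(T, d_in))
H = lstm_encode(X, p["W"], p["U"], p["b"])
y_hat = lstm_transformer_forward(X, p)
```

The recurrent loop in `lstm_encode` is sequential, while `self_attention` processes all time steps in one matrix product; that difference is the source of the parallelism attributed to the Transformer layer.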
This article uses data from 1121 feeder lines in a city in Guangdong Province to verify the effectiveness of the proposed method. The data is sourced from the line loss management system to ensure accurate quality, covering theoretical line loss rate, statistical line loss rate, power supply, line length, rated capacity of distribution transformers, line operation time, and distribution transformer operation time. To enhance the reliability of the validation, the data was divided into 10 equal parts based on the cross validation principle, and divided into training set, validation set, and test set in a ratio of 7:2:1. The final model estimation result was taken as the average of 10 cross validations.
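One plausible reading of this protocol is that the samples are shuffled into 10 near-equal folds and that, in each of 10 rounds, 7 folds are used for training, 2 for validation, and 1 for testing, with the roles rotating between rounds. The sketch below implements that reading; the function name `rotating_splits`, the seed, and the rotation order are assumptions, and the text does not confirm that the splits were built exactly this way.

```python
import random

def rotating_splits(n_samples, n_folds=10, n_train=7, n_val=2, seed=0):
    """Yield (train, val, test) index lists, rotating fold roles each round."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Near-equal folds: fold k takes every n_folds-th shuffled index.
    folds = [idx[k::n_folds] for k in range(n_folds)]
    for r in range(n_folds):
        order = [folds[(r + j) % n_folds] for j in range(n_folds)]
        train = [i for fold in order[:n_train] for i in fold]
        val = [i for fold in order[n_train:n_train + n_val] for i in fold]
        test = [i for fold in order[n_train + n_val:] for i in fold]
        yield train, val, test

# 1121 feeders, as in the data set described above.
splits = list(rotating_splits(1121))
```

Averaging the test error over the 10 rounds then gives the final estimate, and every feeder appears in a test set exactly once.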
To evaluate the accuracy of the model’s estimation of line loss rate, this paper introduces Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) as error measurement indicators. RMSE has a high sensitivity to outliers and can effectively highlight large deviation data; MAE avoids the cancellation of positive and negative errors, ensuring the comprehensiveness of error assessment. The calculation formulas for the two indicators are expressed as follows [25]:
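$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}$$

Where, $y_i$ is the actual line loss rate of the $i$-th sample, $\hat{y}_i$ is its estimated value, and $n$ is the number of samples.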
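A small worked example shows how the two indicators react differently to one large deviation. The numbers below are made up for illustration and are not results from this study.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

# Hypothetical line loss rates (%) with one large miss on the last sample.
y_true = [4.0, 5.0, 6.0, 7.0]
y_pred = [4.5, 5.0, 5.0, 9.0]

mae_value = mae(y_true, y_pred)    # (0.5 + 0 + 1 + 2) / 4 = 0.875
rmse_value = rmse(y_true, y_pred)  # sqrt((0.25 + 0 + 1 + 4) / 4), about 1.146
```

The single error of 2.0 contributes 4.0 of the 5.25 total squared error, so RMSE rises well above MAE, which is the sensitivity to outliers described above.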
To verify the superiority of the method proposed in this article, a linear weighted model was also introduced as a reference, whose expression is:
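$$\hat{y} = \sum_{k=1}^{K} w_k\,\hat{y}_k$$

Where, $\hat{y}$ is the combined estimate of the line loss rate, $\hat{y}_k$ is the estimate of the $k$-th basic estimation model, $w_k$ is its weight, and $K$ is the number of models combined.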
This study adopts two strategies for weight allocation. The first strategy is Weighted Sum (WS), which sorts the estimation errors of each basic estimation model, selects the four models with the smallest errors, and assigns weights of 0.45, 0.35, 0.25, and 0.15 in descending order. The second strategy utilizes differential evolution optimization weighting (DEOW) algorithm, which constructs an optimization function aimed at minimizing estimation error and uses differential evolution optimization techniques to search for the optimal weight combination. The specific expression of the optimization function constructed is:
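Both weighting strategies can be sketched as follows. Several points are assumptions rather than details from this study: the four fixed weights quoted above sum to 1.2, so the sketch rescales them to sum to 1; the objective minimized by differential evolution is taken to be RMSE; the weights are constrained to be non-negative and to sum to 1; and the mutation scheme is a simplified DE/rand/1/bin written by hand. The function names and the synthetic base-model predictions are hypothetical.

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def ws_weights(P, y, raw=(0.45, 0.35, 0.25, 0.15)):
    """Rank base models by RMSE and give the fixed weights to the best ones."""
    errors = np.array([rmse(y, P[:, j]) for j in range(P.shape[1])])
    best = np.argsort(errors)[:len(raw)]        # smallest error first
    w = np.zeros(P.shape[1])
    w[best] = np.asarray(raw) / np.sum(raw)     # assumption: rescale to sum to 1
    return w

def deow_weights(P, y, pop_size=20, n_gen=100, F=0.6, CR=0.9, seed=0):
    """Search for combination weights with a simple differential evolution loop."""
    rng = np.random.default_rng(seed)
    k = P.shape[1]

    def normalize(v):
        v = np.clip(v, 1e-9, None)              # keep weights non-negative
        return v / v.sum()

    def cost(v):
        return rmse(y, P @ normalize(v))

    pop = rng.random((pop_size, k))
    pop[0] = 1.0 / k                            # seed with equal weights as a baseline
    fit = np.array([cost(v) for v in pop])
    for _ in range(n_gen):
        for i in range(pop_size):
            # Simplification: donors are not forced to differ from member i.
            a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
            mutant = a + F * (b - c)
            cross = rng.random(k) < CR
            cross[rng.integers(k)] = True       # at least one gene from the mutant
            trial = np.where(cross, mutant, pop[i])
            f_trial = cost(trial)
            if f_trial <= fit[i]:               # greedy selection
                pop[i], fit[i] = trial, f_trial
    return normalize(pop[np.argmin(fit)])

# Synthetic stand-in for five base models with different noise levels.
rng = np.random.default_rng(42)
y = rng.normal(3.0, 1.0, size=200)
P = y[:, None] + rng.normal(0.0, [0.3, 0.4, 0.5, 0.6, 0.7], size=(200, 5))

w_ws = ws_weights(P, y)
w_de = deow_weights(P, y)
```

Here the weights are fitted and scored on the same synthetic data purely to keep the sketch short. In practice they would be fitted on the validation set, since weights tuned on few samples can overfit.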
The performance of each model is evaluated on the test sets of the three feeder categories. Table 2 compares the root mean square error (RMSE) and mean absolute error (MAE) of each model in line loss rate estimation. Figure 5, Figure 6, and Figure 7 present the distribution of line loss estimation errors of the different models for each feeder category.
Table 2. Comparison of line loss rate estimation errors
Model | First category MAE | First category RMSE | Second category MAE | Second category RMSE | Third category MAE | Third category RMSE |
---|---|---|---|---|---|---|
GBDT | 0.358 | 0.519 | 0.297 | 0.435 | 0.319 | 0.428 |
AdaBoost | 0.358 | 0.506 | 0.346 | 0.437 | 0.348 | 0.432 |
XGBoost | 0.361 | 0.487 | 0.314 | 0.469 | 0.305 | 0.371 |
WS | 0.347 | 0.472 | 0.281 | 0.396 | 0.324 | 0.409 |
DEOW | 0.334 | 0.493 | 0.276 | 0.408 | 0.351 | 0.478 |
Proposed algorithm | 0.309 | 0.426 | 0.268 | 0.375 | 0.283 | 0.365 |

Figure 5. Distribution of feeder line loss rate error (first category)

Figure 6. Distribution of feeder line loss rate error (second category)

Figure 7. Distribution of feeder line loss rate error (third category)
First, the predictive performance of the five basic estimation models on each type of feeder is examined. Apart from the algorithm proposed in this article, the WS model has the lowest root mean square error (RMSE) on the first and second types of feeders. On the third type, however, WS does not stand out; instead, the XGBoost model reaches the lowest RMSE, 0.371 (Table 2), and its mean absolute error (MAE) is also the smallest for this type of feeder. For the third type, the XGBoost errors are mainly concentrated in the range 0.21-0.32, with few outliers. On the first and second types, however, XGBoost is not optimal, which indicates that a single basic estimation model has limitations when predicting specific types of feeders.

Next, the linear weighted model is compared with the baseline estimation models. On the first and second types of feeders, the RMSE and MAE of the linear weighted model are lower than those of the best baseline estimation model for each type, because the linear weighted model performs a second-stage integration of the prediction results, which improves accuracy. On the third type, however, overfitting caused by the small sample size makes the linear weighted model perform worse than the best baseline estimation model. Linear weighted models can therefore improve prediction accuracy to some extent but still fall short on small samples.

In summary, neither a single estimation model nor a linear weighted model can accurately predict line loss rates across multiple types of feeders: a single model is limited on specific feeder types, and the linear weighted model, although it can further improve accuracy, is prone to overfitting on small samples. New prediction methods are therefore needed to predict the line loss rate of all feeder types accurately.
The model proposed in this article alleviates the overfitting problem and improves the estimation accuracy of the line loss rate for all feeder types. Specifically, in estimating line loss rates for the first and second types of feeders, the root mean square error (RMSE) of this model was reduced by 5.3% and 8.2%, respectively, compared to the optimal base model for each type of feeder. This advantage is mainly attributed to the application of expert neural networks in the model, which can deeply explore the nonlinear relationships in line loss data and thus achieve higher estimation accuracy than the base models and the linear weighted model for feeder categories with sufficient training samples. Although the RMSE of this model is 1.7% higher than that of the XGBoost model on the third type of feeder, which has less training data, it is 6.81% lower than that of the linear weighted model, demonstrating a certain resistance to overfitting. From the perspective of mean absolute error (MAE), compared with the linear weighted model, our method reduces MAE by 6.9%, 5.1%, and 11.5%, respectively, on the three types of feeders (as shown in Table 2). Compared with the optimal basic model for each type of feeder, the MAE of this method also decreases by 8.7%, 7.6%, and 5.2%, respectively. In addition, the error distribution diagrams (Figures 5-7) show that, for each type of feeder line, the error distribution of the proposed method is more concentrated and skewed toward the low-error region, and the median of the error distribution is close to the mean, which further supports the accuracy and stability of the proposed model.
In order to comprehensively evaluate the feasibility of our method, we also compared it with the methods in references [26] and [27]. The specific comparison results are shown in Table 3.
Table 3. Comparison of estimation errors of different algorithms
Method | First category MAE | First category RMSE | Second category MAE | Second category RMSE | Third category MAE | Third category RMSE |
---|---|---|---|---|---|---|
Reference [26] | 0.365 | 0.478 | 0.296 | 0.396 | 0.317 | 0.396 |
Reference [27] | 0.359 | 0.468 | 0.287 | 0.387 | 0.328 | 0.413 |
Proposed algorithm | 0.317 | 0.446 | 0.273 | 0.371 | 0.287 | 0.375 |
According to the data in Table 3, the model proposed in this paper has lower root mean square error (RMSE) and mean absolute error (MAE) than the other two algorithms. For example, in the estimation of the first type of feeder line, the MAE of our method is 13.15% lower than that of the method in reference [26]. In the estimation of the third type of feeder line, the RMSE of our method is 5.30% lower than that of reference [26]. This advantage can be attributed to the fact that the meta estimation models used in references [26] and [27] are traditional machine learning models, which have weaker data mining capabilities than the mixture-of-experts (MoE) model used in this paper. In summary, compared to existing line loss estimation models, the two-layer estimation model based on an ensemble tree model and a mixture of experts proposed in this paper performs better in estimating statistical line loss rates for different types of feeders.
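The percentage reductions quoted for Table 3 can be reproduced with a one-line calculation. In the sketch below the helper name `relative_reduction` is hypothetical, and the inputs are the Table 3 entries: first category MAE of 0.365 (reference [26]) against 0.317 (proposed), and third category RMSE of 0.396 (the reference [26] row) against 0.375 (proposed).

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

# First category MAE: reference [26] vs. proposed algorithm.
mae_first = relative_reduction(0.365, 0.317)
# Third category RMSE: the 0.396 entry of Table 3 vs. proposed algorithm.
rmse_third = relative_reduction(0.396, 0.375)

print(f"{mae_first:.2f}% {rmse_third:.2f}%")
```

Rounded to two decimals these evaluate to 13.15% and 5.30%, matching the figures in the text.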
This article proposes a method for estimating feeder line loss that combines fuzzy C-means clustering with a long short-term memory network Transformer model. The specific contributions are summarized as follows:
1) A three-dimensional feeder line loss rate indicator system is constructed, covering line attributes, operating parameters, and management factors. The system takes into account both the contribution of each indicator to the feeder line loss rate and the ease of data acquisition, ensuring the practicality and operability of the indicator system.
2) To deal with the large number of feeders in the distribution network, this paper adopts the fuzzy C-means clustering algorithm, combined with the massive multi-source data of the power grid monitoring system, to divide complex feeders into a finite number of typical types. Based on the principle that similar feeders behave similarly, theoretical benchmark values for the line loss rate are set for each type of feeder, which significantly reduces the dimensionality of the research object and greatly improves work efficiency.
3) A dual-layer long short-term memory network Transformer model is introduced, which generates efficient prediction vectors by combining multiple attention heads with the feature quantities of feeder line loss data. The parallel computing characteristics of the attention heads enable the model to predict short-term feeder line losses efficiently and accurately.
Future research can focus on the following key areas:
1) Data preprocessing and quality control: develop new data preprocessing strategies to enhance data integrity and quality. Specific measures include better data cleaning techniques, optimized interpolation methods, and refined outlier detection algorithms, all aimed at ensuring the accuracy and reliability of the clustering and prediction models.
2) Clustering algorithms and adaptive parameters: explore and apply clustering techniques such as adaptive fuzzy C-means clustering and density clustering to improve clustering accuracy and stability, and study adaptive parameter selection mechanisms to minimize the impact of human intervention on clustering results.
3) Model generalization: enhance the generalization ability of the model by integrating more feature variables, constructing more complex network architectures, or adopting ensemble learning strategies. In addition, transfer learning with feeder data from different regions and operating conditions can improve the adaptability and prediction accuracy of the model in new environments.
4) Computational efficiency: develop efficient algorithms and combine them with acceleration technologies such as distributed computing frameworks and GPU acceleration to reduce the computational burden of Transformer models, and explore model compression and pruning to reduce model size, further accelerate computation, and improve overall efficiency.