The electric power communication network has become indispensable to the operation of the electric power system. The backbone optical transmission networks of regional grid and provincial power companies keep growing, and they carry more and more business information. Mastering the network and ensuring its safe and stable operation has therefore become a complex problem in daily operation and management. At the same time, the power business places increasingly high requirements on fault analysis, fault location, and fault handling after a network accident occurs. This requires operation and maintenance personnel to be familiar with the equipment and with the system's alarms and fault states, and to operate the network management system proficiently [1]. In addition, modern power business management requires maintenance personnel to improve their ability to analyse and deal with accidents and to test the effectiveness of anti-accident plans through repeated communication accident drills. This requires the support of a fault simulation system.
At present, power system protection services are mainly carried on transmission networks such as SDH and OTN. To support monitoring, analysis, and management of the SDH network simulation process, we need models and methods for the alarm correlation analysis and fault handling procedures required by fault management. Some existing methods focus on the correlation between alarms; they mainly include distributed association rules, fuzzy rule association, and hierarchical attribute similarity [2]. However, these methods consider only the relationship between alarms and ignore the root cause of the alarm and the interaction between network elements. Regarding the relationship between alarms and faults, some scholars have briefly analysed it using association rule tools, and others have proposed various SDH network alarm and fault correlation analysis methods from the perspective of optimising association rules.
However, these studies are mainly technical discussions and lack verification at the simulation system level. For example, for fault simulation of the SDH network, some scholars designed a simulation of how alarms propagate and correlate, but it does not involve correlation analysis of faults.
In summary, current research focuses on the correlation between network fault alarms and lacks both theoretical support and a practical business view. Therefore, this paper first proposes a business-driven method. It then analyses, from the business perspective, possible faults and their impact on the upper layers, working from the bottom layer to the top layer [3]; this is a process in which the effect of a fault cascades upward. After that, the paper establishes a correlation analysis model between failure causes and alarms and a mapping model of the impact level of alarms on the business. Finally, a simulation system is implemented based on this architecture, and the validity of the architecture is verified by reconstructing an accident process.
The current hierarchical structure of the electric power communication network includes the optical cable layer, the transmission layer, the data layer, and the business layer. To carry out the business-driven alarm-fault correlation analysis proposed in this paper, we first need to build a complete mapping and analysis model between these layers. In the scenario considered here, the optical cable layer covers the specific optical cables at the bottom. The transmission layer covers the transport network based on SDH and OTN technologies. The data layer covers the various IP-based data networks carried on the transmission network [4]. Finally, the business layer refers to the system protection services, such as AC/DC co-control and wide-area security. This paper builds the interaction between faults and alarms across the layers bottom-up, from the business perspective. The details are shown in Figure 1.
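The bottom-up interaction between layers can be pictured as a small dependency graph in which each element lists the upper-layer elements carried on it. The following is a minimal sketch under that assumption, not the paper's implementation; the element names (`cable-A`, `sdh-ring-1`, and so on) and the function `affected_by` are hypothetical.

```python
from collections import deque

# Hypothetical topology: each element maps to the upper-layer elements
# that are carried on it (cable -> SDH -> IP link -> business service).
CARRIES = {
    "cable-A": ["sdh-ring-1"],
    "cable-B": ["sdh-ring-2"],
    "sdh-ring-1": ["ip-link-1"],
    "sdh-ring-2": ["ip-link-2"],
    "ip-link-1": ["dispatch-automation"],
    "ip-link-2": ["dispatch-automation", "wams"],
}


def affected_by(fault):
    """Return every upper-layer element reachable from the faulty element."""
    seen = set()
    queue = deque([fault])
    while queue:
        element = queue.popleft()
        for upper in CARRIES.get(element, []):
            if upper not in seen:
                seen.add(upper)
                queue.append(upper)
    return seen


if __name__ == "__main__":
    # A fault on a bottom-layer cable is traced up to the services it may touch.
    print(sorted(affected_by("cable-B")))
```

A breadth-first walk is used here only because it visits the layers in bottom-up order; whether an affected service is actually interrupted depends on its path redundancy, which this sketch does not model.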
To achieve business-oriented unified management, we must complete end-to-end interworking between optical transport layer equipment and data layer equipment. At the business display level, the alarm information and the collaborative analysis results of the data layer and the transmission layer must be displayed when an alarm occurs. Therefore, we adopt a standardised northbound interface protocol while realising business concatenation [5]. We extract equipment and network alarm information and bring the data layer and transmission layer alarms together to achieve unified display, unified analysis, and precise fault location at the business management level. Services are modelled with either a three-tier or a four-tier business model.
The three-tier business model mainly applies to the existing 2M dedicated-line services. From the bottom up, it comprises the optical cable layer, the transmission layer (SDH), and the business layer. As an example, consider the relay protection service of a particular power company. The schematic diagram of its three-tier business model is shown in Figure 2.
The relay protection service is equipped with two independent paths, active and standby, at the network layer and the optical cable layer. The shorter orange path is the active path, and the yellow path is the backup path. Correlated alarms are generated when the optical cable, the network layer equipment, or a link on either the active path or the backup path fails [6]. However, current alarms mainly report failures of elements such as transmission equipment ports and multiplex sections. They do not reflect the potential impact on the service when the backup path is interrupted.
The four-tier business model mainly applies to the existing system protection services carried on the data network, for example panoramic monitoring, dispatch automation, and WAMS. From the bottom up, it comprises the optical cable layer, the transmission layer (SDH), the data layer (dispatch data network), and the business layer. To illustrate the four-tier model, we take the dispatch automation service of a particular substation as an example. In Figure 3, the dispatch automation service is configured with three paths (primary, standby, and detour) across the optical cable layer, the SDH layer, and the data layer.
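The path redundancy described for both models can be sketched as follows. This is an illustrative assumption about how a service and its paths might be represented, not the paper's data model; the path contents and the names `path_status` and `service_view` are hypothetical. The point of the sketch is that a failure on a non-active path leaves the service running but reduces its remaining redundancy, which a port-level alarm alone does not show.

```python
# Hypothetical service definition: each path lists the elements it traverses,
# from the optical cable layer up to the data layer.
PATHS = {
    "primary": ["cable-1", "sdh-1", "router-1"],
    "standby": ["cable-2", "sdh-2", "router-2"],
    "detour": ["cable-3", "sdh-3", "router-3"],
}


def path_status(failed):
    """Map each path name to True (usable) or False (hit by a failed element)."""
    return {
        name: not any(element in failed for element in elements)
        for name, elements in PATHS.items()
    }


def service_view(failed):
    """Report which path carries the service and how many spare paths remain."""
    status = path_status(failed)
    usable = [name for name in PATHS if status[name]]
    return {
        "carried_on": usable[0] if usable else None,
        "spare_paths": max(len(usable) - 1, 0),
    }


if __name__ == "__main__":
    # Standby cable cut: the service stays on the primary path, but only one
    # spare path is left.
    print(service_view({"cable-2"}))
```

The paths are tried in the order primary, standby, detour; a real system would take this preference order from the service configuration.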
Based on the above two bearer models, we need to build an association model from the underlying fault to the business risk. The correlation between equipment failures and alarms, and the correlation among alarms, have been studied thoroughly. However, in the field of electric power communication there is not enough research on the causes of fault alarms and their influence, or on the mapping between faults and business [7]. We therefore build an alarm-fault correlation analysis model for the electric power communication network.
We assume that the total number of factors is
In the optimisation model, there are many variables, and the solution is complicated. Therefore, we need to select an appropriate algorithm to solve the problem according to the mathematical characteristics of the model.
First, in actual engineering, the probability that mutually exclusive factors in a factor set occur simultaneously is low, and their occurrences are largely independent of each other. So we can ignore the high-order part of
The optimisation model can be shown to be a continuous optimisation problem. Since both the objective and the constraints are quadratic, the Lagrange multiplier method is a practical way to solve it [10]. To simplify the solution, this paper proposes an extended Newton iteration method. The process is shown in Figure 4 and proceeds as follows:
Step 1: We set the initial solution as
Step 2: We let
Step 3: Normalise
Step 4: If |
Step 5: If
Solving the model with the above method yields the correlation model between the causes and the fault alarm.
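For orientation, the textbook Lagrange multiplier formulation and the Newton step on its stationarity conditions can be written as below. This is the generic form for an equality-constrained problem, not the paper's extended iteration; the symbols $f$, $g$, $x$, $\lambda$, and the iteration index $k$ are assumptions introduced here for illustration.

```latex
\begin{aligned}
&\min_{x} \; f(x) \quad \text{s.t.} \quad g(x) = 0, \\
&\mathcal{L}(x, \lambda) = f(x) + \lambda^{\top} g(x), \\
&\nabla_{x} \mathcal{L}(x, \lambda) = \nabla f(x) + J_g(x)^{\top} \lambda = 0,
 \qquad g(x) = 0, \\
&\begin{bmatrix}
   \nabla_{xx}^{2} \mathcal{L}(x_k, \lambda_k) & J_g(x_k)^{\top} \\
   J_g(x_k) & 0
 \end{bmatrix}
 \begin{bmatrix} \Delta x \\ \Delta \lambda \end{bmatrix}
 = -
 \begin{bmatrix} \nabla_{x} \mathcal{L}(x_k, \lambda_k) \\ g(x_k) \end{bmatrix},
 \qquad
 x_{k+1} = x_k + \Delta x, \quad \lambda_{k+1} = \lambda_k + \Delta \lambda .
\end{aligned}
```

Here $J_g$ denotes the Jacobian of the constraints. When $f$ and $g$ are quadratic, the Hessian block and the Jacobian are at most linear in $x$ and $\lambda$, which keeps each Newton step inexpensive to assemble.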
Take an optical fibre interruption in a robust communication network of a city on the east coast as an example. First, we list the classification and set of factors associated with the critical fault alarm (i.e. factor
According to current statistics, fibre interruptions have occurred 10 times. Of these, 1 was caused by strong wind, 0 by earthquakes, 1 by rat bite, 1 by fire, 2 by optical fibre movement, and 5 by construction excavation.
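As a quick check on these statistics, the counts can be turned into empirical shares per factor category. The sketch below uses only the counts quoted above; the grouping into natural and human factors follows the split used in the case study, and the function name `category_shares` is hypothetical.

```python
# Counts taken from the statistics quoted above (10 interruptions in total).
COUNTS = {
    "strong wind": 1,
    "earthquake": 0,
    "rat bite": 1,
    "fire": 1,
    "fibre movement": 2,
    "construction excavation": 5,
}

# Natural versus human grouping, as used in the case study.
CATEGORY = {
    "strong wind": "natural",
    "earthquake": "natural",
    "rat bite": "natural",
    "fire": "natural",
    "fibre movement": "human",
    "construction excavation": "human",
}


def category_shares():
    """Return the fraction of recorded interruptions per factor category."""
    total = sum(COUNTS.values())
    per_category = {}
    for cause, count in COUNTS.items():
        key = CATEGORY[cause]
        per_category[key] = per_category.get(key, 0) + count
    return {key: count / total for key, count in per_category.items()}


if __name__ == "__main__":
    for key, share in sorted(category_shares().items(), key=lambda kv: -kv[1]):
        print(f"{key}: {share:.2f}")
```

Human factors account for 7 of the 10 recorded interruptions and natural factors for 3; no interruption in this record is attributed to the equipment itself.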
First, we model the inducement probability of natural factors. We assume that earthquakes are graded on levels 1 ~ 10, strong wind on levels 1 ~ 12, rat bite on levels 1 ~ 4, and fire on levels 1 ~ 5. We also assume that the initial impact level is level 4 for earthquakes, level 6 for strong wind, level 2 for rat bite, and level 1 for fire. At this time, the area is facing a level 8 typhoon and a level 1 rat-bite hazard. Applying the quantitative method of inducement probability, we obtain the probabilities of the four natural factors: 0, 0.33
Human factors. We assume that the level of fibre movement is only 1 and the level of construction cutting is 2, and that the arrival rates of the two factors are 0.001 and 0.002, respectively. At present, fibre movement is taking place, and construction carries the possibility of a level 1 cut. Therefore, the probability of occurrence of construction factors can be obtained as 1 −
Equipment factors. We assume that the equipment has been in use for 3 years and that its service life is 5 years. Then the corresponding failure probability is 1 − 0.4
Let’s take natural factors as an example, and we assume that
Similarly, we can get that the weights of construction factors are 0.654 and 0.832, the corresponding aggregate probability is 0.402, the weight of the equipment itself is 1, and the corresponding inducement probability is 0.005.
Finally, the probability of occurrence of a severe fault alarm is obtained as 0.4182. That is, the probability of a fibre interruption is high under current conditions. Among the causes, human factors are the most likely, followed by natural factors and then equipment factors. This ranking is consistent with the historical statistics. Therefore, operation and maintenance personnel can follow this order when troubleshooting the cause of the failure.
Services are ultimately carried on equipment and links. Equipment has three states: operation, maintenance, and failure. The corresponding power business also has three states: normal, interrupted, and detour. The relationship between the states is shown in Figure 5. Maintenance is a known, planned action: the business path is migrated before the overhaul, so the transmission of the business is not affected. In contrast, service interruptions and detours caused by equipment or link failures are unplanned, and failing to handle them in time is likely to cause security incidents [11]. However, there is currently a lack of adequate analysis of the risk of networks operating in the detour state. We therefore quantify and model the risk level.
In the power communication network, the business operation risk refers to the possibility of unsafe operation of the power grid due to the failure of the power communication service channel. Therefore, balancing the number of channels for essential services to reduce business operation risks will become a necessary technical means to improve the reliability of power communication network services.
Risk is defined as the product of the probability of an event and the value of its impact once it occurs. Then the risk quantification formula
With known equipment, nodes, service importance, network topology, and service distribution, we can estimate the consequences of equipment and link failures. We map it to each of the above intervals to predict the risk level of the business.
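The definition of risk as probability times impact, followed by a mapping to risk levels, can be sketched as below. The interval boundaries, the level names, and the normalisation of impact to [0, 1] are all hypothetical choices made for illustration; the paper's own intervals are not reproduced here.

```python
# Hypothetical risk intervals (upper bound, label); not the paper's values.
RISK_LEVELS = [
    (0.2, "low"),
    (0.5, "medium"),
    (0.8, "high"),
]


def risk_value(probability, impact):
    """Risk = probability of the event times its impact (both assumed in [0, 1])."""
    if not 0.0 <= probability <= 1.0 or not 0.0 <= impact <= 1.0:
        raise ValueError("probability and impact must lie in [0, 1]")
    return probability * impact


def risk_level(probability, impact):
    """Return the label of the first interval whose upper bound exceeds the risk."""
    value = risk_value(probability, impact)
    for upper, label in RISK_LEVELS:
        if value < upper:
            return label
    return "critical"


if __name__ == "__main__":
    # Fault probability from the case study with an assumed impact of 0.9.
    print(risk_level(0.4182, 0.9))
```

In practice the impact term would be derived from service importance, topology, and service distribution, as the text describes; here it is simply passed in as a number.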
For the four-tier business model proposed above, we build a bottom-up alarm-fault correlation model of the power communication network. The details are shown in Figure 6. Each layer in the figure can generate an alarm. A red line indicates that a fault occurs when the two ends of the line are disconnected.
In the simulation system, after a failure occurs, the system displays the alarm status of the equipment and the affected business. For example, the fault display in the data layer is shown in Figure 7 below.
This paper proposes a business-driven alarm-fault correlation analysis for the power communication network to model the impact of network failures on services. From a business point of view, we build an interactive correlation model for different services. On this basis, we build a business risk model that maps underlying network faults to different risk levels. Using the business model and the business risk model, we build a business-oriented simulation system. Finally, the effectiveness of the fault correlation analysis is verified through examples.