Open Access

Application of Bayesian Network and Support Vector Machine in Evaluation and Prediction of Network Security Situation

27 Feb 2025

Introduction

Bayesian networks have proven highly successful as decision-making tools. Focusing on the independence properties of data, an effective Bayesian network learning algorithm has been proposed [1]. A novel Bayesian method has been described, and an approach for evaluating Bayesian network algorithms has been compared with various alternatives [2]. Bayesian networks are comparatively easy to construct and interpret, and a model combining domain knowledge with Bayesian methods can learn new knowledge [3]. Score measurement is a necessary step of Bayesian network learning: the score quantifies how well a candidate structure fits the data [4]. Bayesian methods can avoid overfitting the data, and techniques for learning network construction parameters have been described [5]. A Bayesian network model can represent the dependency structure under analysis, and a new method has been presented for analyzing data sets when little data is available [6]. Support vector machines have been introduced in detail, offering an approach that works differently from neural networks [7]. To solve a disease classification problem, the support vector machine method has been adopted; two further types of tissues and cells were analyzed to show how the support vector machine handles multiple classes [8]. Because the accuracy of the support vector machine algorithm is clearly higher than that of other algorithms, it has been used for relevance-feedback image retrieval [9]. To improve the bounds of support vector machine operation, conformal mapping has been used to enlarge and separate boundary surfaces at oblique angles, improving the separability between classes [10]. Building on this work, we seek a way to solve the network security situation problem.
According to the relevant tests, the accuracy of this algorithm is higher than that of other algorithms [11]. To better analyze how grey parameters vary over an interval, a color-scale filling method can be used to improve prediction accuracy [12]. More accurate temporal methods, such as genetic algorithms combined with wavelet networks, can be used to improve the accuracy of network security situation prediction [13]. A new security situation measure can be used to evaluate the target network model, with different experiments providing more detailed information about complex network situations [14]. In RBF-neural-network-based situation analysis, detection after prediction can be improved through staged operation, effectively improving the sensitivity of the perception interval [15].

Operation Promotion Mode of Perceived Security Degree in Network Situation Layer
Intrusion detection technology

An intrusion detection system has three basic components: the response module, the analysis engine, and the data source. According to the data source monitored, intrusion detection systems can be divided into three categories. The first type works on the event logs of the monitored system; it is highly host-dependent, has poor real-time performance and portability, and is not applicable to network-based protocols. The second type monitors network traffic directly. The third type is usually deployed in a distributed mode: its detection targets span a variety of data sources, so the detected data is rich. By combining the strengths of the first two approaches it overcomes the defect of a single structure, but the difficulty and overhead of network management are relatively large.

Data mining technology

Data mining is the technology of analyzing large volumes of data to discover the rules they follow; its process, which mainly includes a data preparation stage and rule conversion, comprises roughly three distinct stages. It is a highly practical, domain-oriented technology. By examining the surface appearance of events, data mining uncovers hidden clues, revealing the potential laws and connections between seemingly unrelated things and thereby enabling prediction of the future.

Data fusion technology

Like situational awareness, the concept of data fusion was first put forward for military applications. Work began in the early 1970s, and the technology made real progress during the 1980s.

Data fusion technology has developed rapidly in recent decades. Researchers at home and abroad are studying it, and a large number of research results appear in academic conferences and journals, although research on the technology in China began mainly around the 1980s. Fusion technology comprehensively analyzes the received data and processes it according to specific standards. It enables fast, effective processing of multi-channel information on the modern battlefield, analyzing and synthesizing multi-source data to support decision-making for specific tasks.

Situation prediction

In an increasingly complex network environment, predicting the future security situation and changing trend of the network provides guidance for network managers in choosing the correct treatment scheme, which enhances the initiative of network defense and reduces risk as much as possible.

Situation prediction is based on collecting, transforming, and processing historical and current situation data sequences. A mathematical model is established to find the development law of the situation data, from which the future trend is inferred by analogy, forming a scientific judgment, prediction, and estimate; this yields a qualitative or quantitative description and an early warning, so that managers can plan and decide accurately. The specific implementation process is as follows: first, technical methods are used to process and transform the obtained historical situation data sequence; next, mathematical models are used to discover and identify the relationships between security situation data sequences, establishing equations that relate the time variable to the situation variables; finally, the situation as a function of time is obtained.
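As an illustration of this process, the following sketch fits a simple linear trend to a hypothetical situation sequence and extrapolates it. The data values and the choice of a linear model are assumptions for illustration only; the paper's own approach is the BN/SVM scheme developed below.

```python
import numpy as np

# Hypothetical historical situation values sampled at equal time steps.
history = np.array([0.18, 0.21, 0.25, 0.24, 0.30, 0.33, 0.37, 0.40])
t = np.arange(len(history))

# Fit a simple linear trend (situation = a*t + b) as the "mathematical model".
a, b = np.polyfit(t, history, deg=1)

# Extrapolate the next three time steps to obtain the predicted trend.
future_t = np.arange(len(history), len(history) + 3)
forecast = a * future_t + b

print([round(v, 3) for v in forecast])
```

In practice the linear model would be replaced by whichever equation relating time and situation variables the identification step produces.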

Two-class support vector machine
Optimal hyperplane

In statistical learning, the VC dimension is an index measuring the capacity of a set of classification functions: it is the largest number N of samples for which the function set can correctly realize all 2^N possible labelings of some N-sample data set, and it reflects the generalization ability of the model. Generalization ability is the key to evaluating the advantages and disadvantages of an algorithm: the stronger it is, the better the model performs on unseen data. Its quantitative definition is shown in equations (1) and (2):

$$R(\alpha) \le R_{emp}(\alpha) + \psi\!\left(\frac{N}{h}\right) \tag{1}$$

$$\psi\!\left(\frac{N}{h}\right) = \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\eta}{4}}{N}} \tag{2}$$

Where $\psi(N/h)$ is called the confidence range, $h$ refers to the VC dimension, $N$ represents the number of samples, and $\eta$ is the confidence parameter (the bound holds with probability $1-\eta$).

Optimizing the hyperplane can reduce the structural risk of the support vector machine. Given a hyperplane $w^T x + b = 0$, suppose it satisfies formula (3):

$$y = \begin{cases} 1, & w^T x + b \ge 1 \\ -1, & w^T x + b \le -1 \end{cases} \tag{3}$$

Let the interval between the hyperplanes $w^T x + b = 1$ and $w^T x + b = -1$ be $\Delta$. The hyperplane is then a $\Delta$-margin classification hyperplane, and its VC dimension satisfies formula (4):

$$h \le \min\!\left(\left\lceil \frac{R^2}{\Delta^2} \right\rceil,\, n\right) + 1 \tag{4}$$
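A quick numeric check of the bound in formula (4); the values of R, Δ, and n are chosen purely for illustration:

```python
import math

def vc_bound(R, delta, n):
    # h <= min(ceil(R**2 / delta**2), n) + 1, from formula (4)
    return min(math.ceil(R**2 / delta**2), n) + 1

# Illustrative values: data enclosed in a sphere of radius R = 2, dimension n = 10.
print(vc_bound(2.0, 0.5, 10))  # small margin: bound capped by the dimension n
print(vc_bound(2.0, 2.0, 10))  # larger margin: much smaller bound
```

The second call returns a smaller value, mirroring the observation that the VC dimension decreases as the margin grows.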

Here $R$ is the radius of the smallest sphere enclosing the data, according to which the key dimensions of the data are extracted and screened during collection. The bound shows that the VC dimension decreases as the classification margin increases, so enlarging the margin enhances the generalization of the model. Figure 1 shows that expanding the margin $\Delta$ improves the separability of the samples; even samples with unknown labels can then be accurately classified.

Figure 1.

Linear plane optimal equation diagram

Hard-margin support vector machine

The analysis in Figure 1 shows that the three classification hyperplanes generated are $l_1$, $l_2$ and $l$, where $l$ lies between $l_1$ and $l_2$; $l_1$ and $l_2$ are given by equations (5) and (6):

$$l_1: \; w^T x + b = 1 \tag{5}$$

$$l_2: \; w^T x + b = -1 \tag{6}$$

The classification hyperplane of the intermediate region is $w^T x + b = 0$.

Let $x_i$ and $x_j$ be arbitrary points on $l_1$ and $l_2$ respectively. The distance $\Delta$ between $l_1$ and $l_2$ is then the projection of the segment $x_i x_j$ onto the normal vector $w$, as in equation (7):

$$d = \frac{\left|w^T (x_i - x_j)\right|}{\|w\|} \tag{7}$$

Combining equations (5) and (6) gives equation (8):

$$d = \frac{2}{\|w\|} \tag{8}$$

Maximizing this margin lowers the confidence range and hence the structural risk, as shown in equation (9):

$$\max_{w,b} \frac{2}{\|w\|} \quad \text{s.t.} \quad \begin{cases} w^T x_i + b \ge 1, & y_i = 1 \\ w^T x_i + b \le -1, & y_i = -1 \end{cases} \tag{9}$$

The objective problem is transformed into an equivalent convex form, as equation (10) shows:

$$\min_{w,b} \frac{\|w\|^2}{2} \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1 \tag{10}$$

Introducing the dual multipliers $\alpha$ gives the Lagrange function in equation (11):

$$L(w,b,\alpha) = \frac{1}{2}\|w\|^2 + \sum_{i=1}^{N} \alpha_i \left(1 - y_i(w^T x_i + b)\right) \tag{11}$$

Setting the partial derivatives with respect to $w$ and $b$ to zero gives equations (12) and (13):

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{N} \alpha_i y_i x_i = 0 \tag{12}$$

$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{N} \alpha_i y_i = 0 \tag{13}$$

Substituting these into the original equation yields the dual problem, as shown in equation (14):

$$\max_{\alpha} \left(\sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^T x_j\right) \quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \;\; \alpha_i \ge 0 \tag{14}$$

Let $\alpha^*$ be the optimal dual solution. The normal vector of the decision function is $w^* = \sum_{i=1}^{N} \alpha_i^* y_i x_i$, the offset is $b^* = y_j - w^{*T} x_j$ for any $j$ with $0 < \alpha_j^*$, and the decision function is $f(x) = \operatorname{sgn}(w^{*T} x + b^*)$.
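As a concrete check of the dual solution and decision function above, consider a hand-worked two-point example; the points and the dual optimum α* = (0.25, 0.25) are chosen for illustration (for this symmetric pair they satisfy the constraints of equation (14)):

```python
import numpy as np

# Toy separable set: x1 = (1, 1) with y1 = +1, x2 = (-1, -1) with y2 = -1.
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])

# For this two-point problem the dual optimum is alpha* = (0.25, 0.25):
# it satisfies sum(alpha_i * y_i) = 0 and maximizes the dual objective.
alpha = np.array([0.25, 0.25])

w = (alpha * y) @ X                  # w* = sum_i alpha_i* y_i x_i
j = 0                                # any index with alpha_j* > 0
b = y[j] - w @ X[j]                  # b* = y_j - w*^T x_j

f = lambda x: np.sign(w @ x + b)     # f(x) = sgn(w*^T x + b*)

print(w, b, 2 / np.linalg.norm(w))   # margin d = 2 / ||w||, here 2*sqrt(2)
print(f(np.array([2.0, 0.5])), f(np.array([-1.0, -2.0])))
```

The recovered hyperplane is $x_1 + x_2 = 0$, the separating line one would expect by symmetry, and both probe points are classified on the correct side.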

Under a nonlinear change of representation, the problem is likewise transformed into the dual of the objective equation; the purpose is to extend the relationships between samples to the construction of a linear classifier. When the data are not linearly separable, samples must be tolerated on the wrong side of the margin so as to keep the risk low.

A slack variable $\xi_i$ can be introduced, indicating the degree to which an “erroneous” partition of a training sample is tolerated. The constraint becomes $y_i(w^T x_i + b) \ge 1 - \xi_i$, and the total degree of “erroneous” division is $\sum_{i=1}^{N}\xi_i$. The support vector formulation that guarantees a maximum margin with few misclassifications is equation (15):

$$\min_{w,b} \; \frac{\|w\|^2}{2} + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1,\dots,N \tag{15}$$

$C$ is the penalty coefficient. Introducing multipliers into the Lagrange equation gives equation (16):

$$L(w,b,\xi,\alpha,\beta) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i + \sum_{i=1}^{N}\alpha_i\left(1 - \xi_i - y_i(w^T x_i + b)\right) - \sum_{i=1}^{N}\beta_i \xi_i \tag{16}$$

Formulas (17), (18) and (19) are obtained by setting the partial derivatives with respect to $w$, $b$ and $\xi_i$ to zero:

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{N}\alpha_i y_i x_i = 0 \tag{17}$$

$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{N}\alpha_i y_i = 0 \tag{18}$$

$$\frac{\partial L}{\partial \xi_i} = C - \alpha_i - \beta_i = 0 \tag{19}$$

Substituting equations (17)–(19) into equation (16) gives the dual problem, equation (20):

$$\max_{\alpha} \left(\sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{N}\alpha_i\alpha_j y_i y_j x_i^T x_j\right) \quad \text{s.t.} \quad \sum_{i=1}^{N}\alpha_i y_i = 0, \;\; 0 \le \alpha_i \le C \tag{20}$$
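A minimal sketch of solving the soft-margin problem of equation (15) by stochastic subgradient descent on synthetic data. The Pegasos-style solver, the data, and all hyper-parameter values here are illustrative assumptions, not the paper's training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two noisy 2-D classes; their spread means a few points need nonzero slack.
X = np.vstack([rng.normal(+2, 1.0, (50, 2)), rng.normal(-2, 1.0, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])

C, lr, epochs = 1.0, 0.01, 200
w, b = np.zeros(2), 0.0

for _ in range(epochs):
    for i in rng.permutation(len(y)):
        margin = y[i] * (w @ X[i] + b)
        if margin < 1:                       # inside the margin: xi_i > 0
            w -= lr * (w - C * y[i] * X[i])  # subgradient of ||w||^2/2 + C*xi_i
            b += lr * C * y[i]
        else:
            w -= lr * w                      # only the regularizer contributes

accuracy = np.mean(np.sign(X @ w + b) == y)
print(round(accuracy, 2))
```

Exact QP solvers are normally used instead; the subgradient loop is shown only because it makes the role of the slack term in the update visible.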

Nonlinear SVM

Most real data are nonlinear, and traditional machine learning often converts nonlinear data into a linear problem by mapping them to a higher dimension. However, this approach easily causes the curse of dimensionality and increases computational complexity. The kernel function is a mapping tool satisfying equation (21):

$$K(x,z) = \langle \phi(x), \phi(z) \rangle \tag{21}$$

Where $\phi$ denotes a mapping from the input space $X$ to a Hilbert space $F$.
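Equation (21) can be verified numerically for the homogeneous degree-2 polynomial kernel, whose feature map φ is known explicitly in 2-D; this particular kernel is chosen purely for illustration:

```python
import numpy as np

def phi(x):
    # Explicit feature map for the degree-2 polynomial kernel (x.z)^2 in 2-D.
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def K(x, z):
    return (x @ z) ** 2   # kernel computed without ever forming phi

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(K(x, z), phi(x) @ phi(z))   # both equal <phi(x), phi(z)> = 16
```

The kernel evaluates the inner product in the 3-dimensional feature space while only ever touching the original 2-dimensional inputs, which is exactly how the dimension disaster is avoided.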

The original objective is transformed into the high-dimensional space as equation (22):

$$\min_{w,b} \; \frac{\|w\|^2}{2} + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.} \quad y_i\left(w^T\phi(x_i) + b\right) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i=1,\dots,N \tag{22}$$

When analyzing the target equation, explicitly computing $\phi(x_i)$ further increases the dimension and data width, so the amount of calculation surges and the problem becomes difficult to solve. Therefore, the original equation is modeled first and then transformed into its dual form, as in equation (23):

$$\begin{aligned} F = \max_{\alpha,\beta,\gamma}\; & -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} y_i y_j \alpha_i \alpha_j K(x_i,x_j) - \frac{1}{2}\sum_{s=1}^{u}\sum_{t=1}^{u} (\beta_s-\gamma_s)(\beta_t-\gamma_t) K(x_s^*,x_t^*) \\ & -\sum_{i=1}^{N}\sum_{s=1}^{u} \alpha_i y_i (\beta_s-\gamma_s) K(x_i,x_s^*) + \sum_{i=1}^{N} y_i\alpha_i + \sum_{s=1}^{u}\beta_s - \sum_{s=1}^{u}\gamma_s \\ \text{s.t.}\; & \sum_{i=1}^{N} y_i\alpha_i + \sum_{s=1}^{u}\beta_s + \sum_{s=1}^{u}\gamma_s = 0, \quad 0 \le \alpha_i \le C_1,\; i=1,\dots,N, \quad 0 \le \beta_s,\gamma_s \le C_2,\; s=1,\dots,u \end{aligned} \tag{23}$$

Where $K(x_i,x_j) = \langle\phi(x_i),\phi(x_j)\rangle$. Let $\alpha^*$ be the optimal dual solution; the decision function is then $f(x) = \operatorname{sgn}\left(\sum_{i=1}^{N}\alpha_i^* y_i K(x_i,x) + b^*\right)$.

Analysis of Situation Assessment Results

The situation assessment experiment focuses on extracting a good representation and low-dimensional features from the original data, and studies the influence of the low-dimensional features extracted by the BN algorithm on the accuracy and running time of two-class (normal/abnormal) and multi-class (specific attack type) classification of the network data set, thus verifying the effectiveness of the BN algorithm for situation assessment.

In order to evaluate the proposed evaluation model and compare it with the models proposed by other researchers, the standard evaluation criteria from data mining are adopted and expressed with a two-dimensional confusion matrix, as shown in Table 1.

Table 1. Evaluation indicators

Actual \ Predicted	Attack	Normal
Attack	TP	FN
Normal	FP	TN

The accuracy formula is shown in formula (24):

$$AC = \frac{TP + TN}{TP + FP + FN + TN} \tag{24}$$

The recall formula is shown in formula (25):

$$R = \frac{TP}{TP + FN} \tag{25}$$

The precision formula is shown in formula (26):

$$P = \frac{TP}{TP + FP} \tag{26}$$

The F1-score formula is shown in formula (27):

$$F1 = \frac{2 \times R \times P}{R + P} \tag{27}$$

The detection rate formula is shown in formula (28):

$$DR = \frac{TN}{TN + FN} \tag{28}$$

False Alarm Rate (FAR): the percentage of normal data misreported as attack data relative to the samples classified as attacks, as shown in formula (29):

$$FAR = \frac{FP}{TP + FP} \tag{29}$$
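The indicators in formulas (24)–(29) can be computed directly from the confusion-matrix counts of Table 1; the counts below are hypothetical, chosen only to exercise the formulas:

```python
# Hypothetical confusion-matrix counts (TP, FN, FP, TN) for illustration.
TP, FN, FP, TN = 950, 50, 30, 970

AC  = (TP + TN) / (TP + FP + FN + TN)   # accuracy, formula (24)
R   = TP / (TP + FN)                    # recall, formula (25)
P   = TP / (TP + FP)                    # precision, formula (26)
F1  = 2 * R * P / (R + P)               # F1-score, formula (27)
DR  = TN / (TN + FN)                    # detection rate as defined in formula (28)
FAR = FP / (TP + FP)                    # false alarm rate, formula (29)

print(round(AC, 4), round(F1, 4), round(FAR, 4))
```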

BN Algorithm Parameter Selection

In the constructed environment, a DNN network and BN algorithm are used for experimental analysis. The purpose of the experiment is to study the efficiency of the improved algorithm and analyze the efficiency of BN model, so as to analyze the accuracy and time performance of situation assessment.

a. Softmax score is added to the output layer, and the number of hidden layers and neurons is determined by comparing the scores.

According to Table 2, as the number of hidden layers increases, the classification accuracy rises but training takes longer. In a big data environment the training time would explode, so too many hidden layers are not feasible. With 6 hidden layers the accuracy and detection rate are highest, the false alarm rate is lowest, and the classification effect is best; however, the training time is 53.69 s, more than twice that with 3 hidden layers.

Table 2. Benchmarking results of BN indicators under different hidden-layer structures

Hidden layer structure AC(%) DR(%) FAR(%) ttrain(s)
[110,90,80,60,40,20] 99.31 99.28 0.0962 53.69
[105,90,80,60,40] 99.15 98.99 0.108 49.03
[100,80,60,40,20] 98.98 98.86 0.121 44.32
[100,70,50,30] 98.66 98.33 0.135 39.03
[110,70,40,30] 98.55 98.1 0.131 36.26
[90,60,40] 95.74 98.59 0.159 25.19
[80,60,30] 93.68 93.01 0.163 18.98
[90,40] 87.89 87.39 0.305 16.18
[70,30] 88.69 87.69 0.298 13.25
[70] 83.36 82.26 0.361 6.25
[60] 86.59 86.32 0.359 5.45

With the number of hidden layers fixed, different numbers of neurons are set for comparison and classification, as shown in Table 3.

Table 3. Comparison of indicators under different neuron counts

Hidden layer neuron AC(%) DR(%) FAR(%)
[110,90,80] 93.56 93.69 0.256
[90,80,60] 92.13 92.37 0.159
[100,70,40] 95.36 95.21 0.134
[90,65,40] 92.39 92.56 0.201
[90,90,90] 85.36 85.69 0.31

According to Table 3, when the number of neurons is set to [100,70,40] , the accuracy rate and detection rate are 95.36% and 95.21%, respectively, and the false alarm rate is the smallest.

b. Influence of hyper-parameters on performance

Following the previous analysis of the BN algorithm, appropriate hyper-parameters during feature extraction are beneficial for extracting low-dimensional features, affecting classification accuracy and model training time. The training model is used to select the best parameters.

According to Table 4, through experiments, the optimal values of hyper-parameters in two-class and five-class cases are determined.

Table 4. BN sparse parameters

Parameter	Candidate values	Experimental optimum
ρ	0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8	Two-class: 0.6; Five-class: 0.4
λ	0.01, 0.001, 0.000001, 0.05, 0.0005, 0.000005	Two-class: 0.000005; Five-class: 0.001
β	1, 2, 3, 4, 5, 6	Two-class: 3; Five-class: 3
epoch	100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000	Two-class: 1000; Five-class: 600

c. Selection of BN Parameters

For the selection of hidden layers, 1-5 layers are selected in turn for experiments, and the structure of hidden layers is shown in Table 5.

Table 5. BN hidden layer structures

Number of hidden layers Two-class network structure Five-classification network structure
1 [60] [80]
2 [80,50] [90,50]
3 [80,50,30] [90,60,40]
4 [90,70,50,25] [100,80,50,30]
5 [100,70,45,30,10] [100,80,60,45,25]

The detection rates and false alarm rates under different numbers of hidden layers, for five-class and two-class classification, are shown in Tables 6 and 7, respectively:

Table 6. Comparison of classification results under five categories

Number of hidden layers DR(%) Normal Probe Dos U2R R2L FAR(%)
1 97.56 98.36 97.15 97.89 24.66 84.58 2.44
2 98.23 98.64 98.01 98.12 25.39 85.66 1.77
3 99.44 99.5 99.09 99.58 36.99 90.12 0.56
4 99.36 99.36 98.22 99.65 24.55 87.69 0.64
5 98.89 98.89 99.01 98.75 23.68 86.99 1.11

Table 7. Comparison of classification results under two categories

Number of hidden layers DR(%) Normal Attack FAR(%)
1 98.34 98.5 97.15 1.64
2 99.45 99.5 99.11 0.55
3 99.33 99.33 98.76 0.67
4 99.01 99.3 98.71 0.99
5 98.21 99.09 98.01 1.79

Table 6 shows that the total detection rate is best with 3 hidden layers. Increasing to 4 layers raises the DoS detection rate but lowers the U2R and R2L detection rates, and more layers reduce both the overall detection rate and the other per-type detection rates.

According to Table 7, when the number of hidden layers is 2, the total detection rate is the highest, the detection rate of normal type and attack type is also the highest, and the false alarm rate is the lowest. Therefore, in the case of two classifications, the number of hidden layers is set to 2.

d. Selection of Final Parameters of BN Algorithm

After many experiments, BN and DNN network parameters are set as shown in Table 8 and Table 9.

Table 8. DNN parameters

Parameter category Value
Network structure [122,80,50,2],[122,90,60,40,5]
Activation function Relu,Sigmoid,Softmax
Optimization function Adam
Mini-Batch Size 128
Regularization L2=0.002, Dropout=0.5
Learning rate 0.01
epoch 200,600

Table 9. BN parameters

Parameter category Value
Network structure [122,100,70,40,122]
ρ 0.6, 0.4
λ 0.000005, 0.001
β 3
Optimization function Adam
epoch 600, 1000
Learning rate 0.001
Batch Size 500
Activation function Sigmoid
Situation assessment based on open data set

a. Influence of low-dimensional features on binary classification model

After determining the parameters, we continue to analyze the influence of the data on classification. The performance of the BN-based algorithm and a single DNN is compared on the training set and test set respectively, as shown in Figure 2.

Figure 2.

Comparison of two-class classification performance

As shown in Figure 2, compared with the DNN network, the BN algorithm achieves only a 0.125% accuracy improvement on the training set; the small increase may indicate some overfitting during training. On the test set, however, the BN algorithm performs significantly better than the DNN network, with a substantial accuracy improvement of 4.09%. This result strongly supports the advantage of the BN algorithm in extracting effective features from the intrusion data set, thereby improving classification accuracy.

The time comparisons on the training set and the test set are shown in Table 10 and Table 11.

According to Table 10 and Table 11, compared with the DNN network, the training and test time of the BN algorithm on the training set are reduced by 22.17% and 75.96% respectively, and on the test set by 28.28% and 73.89% respectively. The experiments show that the BN algorithm reduces the training and testing time of the DNN network, which is very important for the efficiency of network security situation assessment.

Table 10. Time comparison between DNN and BN on the training set

Algorithm Training time(s) Test time(s)
DNN 7236.421 98.258
BN 5632.459 23.618

Table 11. Time comparison between DNN and BN on the test set

Algorithm Training time(s) Test time(s)
DNN 912.365 18.984
BN 654.325 4.957

b. Influence of low-dimensional features on five-classification model

In order to verify the superiority of BN algorithm, similar to the experimental process of binary classification, the performance of BN algorithm and DNN algorithm are compared.

According to Figure 3, in five classifications, all performance indexes of BN algorithm are better than DNN network on the test set, and the accuracy rate is improved by 4.42%.

Figure 3.

Comparison of five-class classification performance between DNN and BN

At the same time, the training and test time of the algorithm in five categories are counted. The time on the training set is shown in Table 12, and the training time and test time on the test set are shown in Table 13.

Table 12. Time comparison on the training set

Algorithm Training time(s) Test time(s)
DNN 49567.01 354.983
BN 26451.2 58.145

Table 13. Time comparison on the test set

Algorithm Training time(s) Test time(s)
DNN 3215.332 63.982
BN 1249.547 6.487

According to Table 12 and Table 13, compared with DNN network, the training time and test time of BN algorithm on training set are reduced by 46.64% and 83.62% respectively, and the training time and test time on test set are reduced by 61.14% and 89.86% respectively. Generally speaking, the proposed BN algorithm is more efficient than DNN network in five classifications.

c. Comparative analysis with other algorithms

The precision comparison is shown in Figure 4, the recall comparison in Figure 5, the F1-score comparison in Figure 6, and the accuracy comparison in Figure 7.

Figure 4.

Comparison of precision P

Figure 5.

Comparison of recall rate R

Figure 6.

F1-score comparison

Figure 7.

Accuracy AC comparison

According to Figures 4, 5, 6 and 7, the detection rates for U2R and R2L are relatively low under the SVM and DT algorithms. This is because these two attack types are scarce in the training set, giving very little feature information, so the detection rate for them is not high. Overall, the simulation results show that the BN algorithm is superior to the other algorithms in accuracy, precision, recall, and F1 value, and improves the accuracy on minority classes, so the BN algorithm is effective.

Based on the above analysis, the same number of test samples is randomly selected from the test set; combined with the proposed attack-probability-based situation quantification method, the network security situation is quantitatively evaluated and the security situation level is classified. The results are shown in Table 14.

Table 14. Situation assessment values and grade division for 10 groups of data

Sample number Actual value Evaluation value Situation grade Operational status
1 0.19 0.18 Safety Basically normal
2 0.33 0.35 Low risk Mild influence
3 0.39 0.4 Low risk Mild influence
4 0.66 0.63 High risk Major damage
5 0.05 0.06 Safety Basically normal
6 0.52 0.52 Moderate risk Medium threat
7 0.19 0.19 Low risk Mild influence
8 0.36 0.37 Low risk Mild influence
9 0.59 0.55 Moderate risk Medium threat
10 0.18 0.15 Safety Basically normal

The calculated evaluation values are close to the ideal outputs of the actual test samples with small error, and the situation level and running status accurately match the evaluation table, which shows that the algorithm outperforms the traditional methods.

The situation assessment results obtained using SVM, DT, DNN and BN are shown in Figure 8.

Figure 8.

Situation results under different algorithms

According to Figure 8, among the BN, DNN, SVM and DT algorithms, the situation evaluation values obtained by BN fit the actual value curve most closely, followed by the DNN network; the situation curve under the DT algorithm lies above the actual curve as a whole, while the curve under the SVM algorithm lies below it.

According to Table 15, the BN algorithm has the shortest classification time and the DNN network the longest, while SVM runs more efficiently than DT. Compared with DNN, the BN algorithm improves classification time efficiency by 75.63%, situation assessment time efficiency by 34.39%, and total time efficiency by 73.95%; compared with SVM and DT, it improves total time efficiency by 42.31% and 53.87%, respectively.

Table 15. Comparison of operational efficiency under different algorithms

Method Classification time(s) Evaluation time(s) Total time(s)
BN 28.5753 3.2636 31.8339
DNN 117.2427 4.9744 122.2171
SVM 50.5316 4.6532 55.1848
DT 64.6728 4.3367 69.0095
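The efficiency gains quoted above follow directly from the times in Table 15; a small script reproduces them:

```python
# Times from Table 15 (classification, evaluation, total), in seconds.
times = {"BN":  (28.5753, 3.2636, 31.8339),
         "DNN": (117.2427, 4.9744, 122.2171),
         "SVM": (50.5316, 4.6532, 55.1848),
         "DT":  (64.6728, 4.3367, 69.0095)}

def saving(base, other, idx):
    # Percentage time reduction of `base` relative to `other` for column idx.
    return round((times[other][idx] - times[base][idx]) / times[other][idx] * 100, 2)

print(saving("BN", "DNN", 0))  # classification time vs DNN: 75.63
print(saving("BN", "DNN", 1))  # evaluation time vs DNN:     34.39
print(saving("BN", "DNN", 2))  # total time vs DNN:          73.95
print(saving("BN", "SVM", 2))  # total time vs SVM:          42.31
print(saving("BN", "DT", 2))   # total time vs DT:           53.87
```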

The simulation results show that using the BN algorithm to extract effective features from the network data set, achieving dimension reduction, effectively reduces the time for classification and situation assessment, and the evaluated situation values are close to the actual values, so they can serve as a basis for situation assessment. The deep neural network DNN and the shallow learning algorithms SVM and DT also produce good evaluation results, but their running time is too long and their efficiency is low.

In summary, the improved BN situation assessment algorithm performs well on the NSL-KDD dataset and can effectively improve time efficiency; it is feasible, though there remains considerable room for improvement.

Conclusion

With the rapid development of Internet technology, networks have spread throughout people’s lives, and the insecurity factors they bring keep increasing, causing more and more harm and seriously affecting daily life. To increase network security, this paper uses a Bayesian network and a support vector machine to evaluate and predict the network security situation. This method plays a very positive role in the emergency response capability of network security and improves the speed and accuracy of security situation prediction, which is important for network security situation awareness technology in many fields and is therefore a current research hotspot, with the main research direction being further improvements in prediction speed and accuracy. Under existing technical premises, this paper puts forward an improved optimization algorithm and prediction model.

Language:
English
Publication frequency:
1 issue per year
Journal subjects:
Biological Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other