Boosting is an important ensemble learning technique that can turn a weak learner, whose prediction accuracy is only slightly better than random guessing, into a strong learner with much higher prediction accuracy [1]. Boosting offers a new and efficient approach to designing learning algorithms for problems in which a strong learner is very difficult to construct directly [2]. As a meta-algorithmic framework, Boosting can be applied to nearly all popular machine learning algorithms to further improve their prediction accuracy. AdaBoost, short for Adaptive Boosting, is one of its most successful representatives [3, 4]. Since AdaBoost was proposed, many well-known machine learning researchers have continued to study its theoretical properties, and this solid theory has laid a sound foundation for its successful application [5]. AdaBoost's success lies not only in its efficiency but also in the fact that it turned Boosting from an initial conjecture into a truly practical algorithm. Several of its techniques, such as altering the original sample distribution by re-weighting the training samples, have also inspired the design of other statistical learning algorithms. These theoretical achievements have greatly promoted ensemble learning and its applications, for example:

The AdaBoost-based face detection system of S. Yin et al. [6] is designed around two characteristics of the task: bidirectional data computation and data diversity. The system adopts a parallel, configurable architecture and computes the integral image bidirectionally to improve the efficiency of parallel processing. It also performs adaptive sub-window cascade classification to cope with data diversity, which further improves the efficiency of detecting diverse faces, and it reaches 30 frames per second on 1080p video at maximum performance with minimum power consumption.

S. W. Foo et al. [7] studied the performance of the AdaBoost algorithm for updating noise estimates in sub-control-based speech enhancement. Their method classifies signal frames as speech or non-speech, computes the power spectrum with an estimator over the periods recognized as non-speech, and takes this as the noise power spectrum. The experimental results show good performance.

W. Hu et al. [8] proposed an improved online AdaBoost that uses an online Gaussian mixture model as the weak classifier. They also proposed a distributed intrusion detection framework in which the online AdaBoost algorithm builds a local parameterized detection model on each node; using a small number of samples on the node combined with the local parametric models, a global detection model is then constructed on each node. The experimental results show that, compared with traditional online AdaBoost based on decision trees, the improved AdaBoost based on Gaussian mixture models has a lower false positive rate and a higher detection rate.

A trustworthy network is a Layer-2 forwarding network composed of communication terminals, programmable network switches, multi-protocol network controllers, and protocol conversion gateways. The communication terminals, programmable network switches, and protocol conversion gateways can simultaneously support both the proprietary trusted protocol stack and the open TCP/IP protocol stack. The multi-protocol network controller can switch the protocol stack used for communication in real time according to network operation and maintenance requirements.

As shown in Figure 1, the trustworthy network inserts a network security control sublayer, the Link Security Control (LSC) sublayer, between the link layer and the network layer. At the source end of a communication, the trustworthy network encodes the data being transmitted.

By introducing the LSC sublayer, the standard TCP/IP protocol stack is transformed into a proprietary trustworthy protocol stack that can use existing Ethernet links directly while remaining compatible with operating systems and network applications built on the TCP/IP stack. It is also easy to design and develop, and it serves as a reliable communication protocol for ensuring network security.

Research on network anomaly intrusion detection mainly focuses on three areas: packet characteristics, network flow behavior, and user behavior [9, 10].

Packet characteristics: these mainly refer to packet-level information such as the protocol (e.g., TCP or UDP) and the source and destination addresses. Current types of network attacks include [10]:

Distributed Denial of Service (DDoS). In a DDoS attack, either one attacker controls multiple machines in different locations and uses them to attack a victim simultaneously, or multiple attackers in different locations attack one or several targets at the same time. The attack is called distributed because its sources are spread across different locations and may involve multiple attackers.

User-to-Root (U2R). In this type of attack, the attacker initially has only an ordinary user account on a node and then obtains superuser (root) privileges [11].

Remote-to-Local (R2L). In this type of attack, the attacker initially has no user account on a node but uses password guessing and cracking to find vulnerabilities in the victim and gain local access to the node.

Probe. The goal of this attack is to collect information about the target network or its ports. Although it does not crash the computer system, it is often preparation for subsequent attacks.

Network flow behavior: network flow behavior analysis is an efficient way to improve network security by aggregating flow data and extracting the corresponding flow behavior features across different geographic locations and times [12].

User behavior: by identifying the IP address, MAC address, token, and other network access credentials of the user's devices, the user's behavior while accessing the network is counted and analyzed to discover its regular patterns; these patterns are then combined with the network anomaly intrusion strategy to detect abnormal user behavior in the network [13].

In network anomaly intrusion detection, intrusion detection systems and firewalls are usually used together. Intrusion detection mainly determines whether there is malicious access, tampering with information in the system, or even an attempt to crash the system. Methods for network anomaly intrusion detection can be divided into traditional rule-based methods and machine learning model-based methods [9, 11].

Traditional methods mainly include packet filtering, application proxies, and similar technologies [10]. Packet filtering encodes knowledge accumulated by domain experts as rules, but such rules only defend within the range they delineate and cannot defend against all attacks. Under heavy network access, application proxy technology cannot strike a good balance between normal service and security detection. Traditional methods also include network detection based on signal processing, which mainly uses the generalized likelihood ratio to detect abnormal signals, but this method also relies on human experience [9].

Methods based on machine learning can be divided into supervised learning and unsupervised learning [10].

The supervised learning methods can be summarized as follows:

Method based on Support Vector Machine (SVM). SVM is a relatively simple supervised machine learning algorithm used for classification and regression. Basically, SVM finds a hyperplane that forms a boundary between classes of data; in two-dimensional space, this hyperplane is simply a line. SVM computes the empirical risk with the hinge loss function and adds a regularization term to the optimization problem so that it minimizes the structural risk. SVM is considered a robust and sparse classifier, and it can perform nonlinear classification through the kernel method, which makes it one of the commonly used kernel learning methods [14].
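To make the "hinge loss plus regularization" objective concrete, here is a minimal sketch of a linear SVM trained by full-batch subgradient descent with NumPy. It is not the kernelized solver referred to above; the function names, the regularization weight `lam`, the step size `lr`, and the epoch count are illustrative assumptions, and labels are assumed to be in {-1, +1}.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize mean hinge loss + lam * ||w||^2 by full-batch subgradient descent.

    X: (n, d) feature matrix; y: labels in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples inside the margin incur hinge loss
        # Subgradient of the empirical hinge risk plus the L2 regularizer
        grad_w = -(y[active][:, None] * X[active]).sum(axis=0) / n + 2 * lam * w
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Classify by the side of the hyperplane w.x + b = 0."""
    return np.where(X @ w + b >= 0, 1, -1)
```

For example, on the points [2, 2], [3, 3], [-2, -2], [-3, -3] with labels 1, 1, -1, -1, the learned hyperplane separates the two classes. The regularizer is what turns empirical risk minimization into structural risk minimization.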

Robust Support Vector Machine (RSVM). RSVM adds an averaging technique to the standard SVM that smooths the decision surface and automatically controls the amount of regularization [15]. Compared with the standard SVM, RSVM uses significantly fewer support vectors and therefore has a shorter test time.

Bayesian network method. The Bayesian network is currently considered one of the most effective theoretical models for representing and reasoning with uncertain knowledge. Also known as a belief network, it extends the Bayes method. A Bayesian network is a directed acyclic graph whose nodes represent random variables and whose directed edges represent dependencies between those variables. The strength of each dependency is expressed by a conditional probability, and nodes without parents are assigned prior probabilities. Bayesian network methods are well suited to expressing and analyzing uncertain and probabilistic events, can support decisions that depend conditionally on multiple control factors, and can draw inferences from imprecise, incomplete, or uncertain information or knowledge [16].
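As a small illustration of how the prior on a parentless node and the conditional probability tables combine, the sketch below performs exact inference by enumeration on a hypothetical three-node network in which an intrusion node is the parent of two evidence nodes. The node names and all probability values are invented for illustration only.

```python
# Hypothetical network: intrusion -> ids_alert, intrusion -> traffic_spike.
# All probabilities are illustrative, not estimated from real data.
PRIOR_INTRUSION = 0.01  # prior for the parentless node

# Conditional probability tables: P(child = True | intrusion)
CPT = {
    "ids_alert": {True: 0.90, False: 0.05},
    "traffic_spike": {True: 0.80, False: 0.10},
}


def posterior_intrusion(evidence):
    """Return P(intrusion = True | evidence) by enumerating both parent values.

    evidence maps child node names to observed booleans; the children are
    conditionally independent given the parent, as the graph structure implies.
    """
    joint = {}
    for intrusion in (True, False):
        p = PRIOR_INTRUSION if intrusion else 1.0 - PRIOR_INTRUSION
        for child, observed in evidence.items():
            p_true = CPT[child][intrusion]
            p *= p_true if observed else 1.0 - p_true
        joint[intrusion] = p
    return joint[True] / (joint[True] + joint[False])


print(posterior_intrusion({"ids_alert": True}))                          # about 0.154
print(posterior_intrusion({"ids_alert": True, "traffic_spike": True}))   # about 0.593
```

With these made-up numbers, one alert raises the probability of intrusion from 1% to about 15%, and a second piece of evidence raises it to about 59%, showing how the network accumulates uncertain evidence.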

Method based on decision tree model. Decision tree models recursively partition the data using a splitting criterion, such as information gain, in order to classify it [1].

Methods based on neural networks. A neural network, or connectionist model, is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks to perform distributed parallel information processing. It relies on the complexity of the system and processes information by adjusting the interconnections among a large number of internal nodes [17]. In the 1990s, limited by computer hardware, training even neural networks with 2–3 layers was very slow. However, with the growth of hardware resources and data, neural network methods have achieved leading results on specific tasks in many fields [18].

The unsupervised learning methods can be summarized as follows:

Methods based on mathematical statistics. Such methods mainly evaluate how abnormal a sample is through its probability distribution. However, if the samples do not follow a Gaussian distribution or the pre-assumed distribution, the performance of the method degrades. A representative method is the chi-square test [19].
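A minimal sketch of a chi-square goodness-of-fit check on a window of traffic, assuming a hypothetical baseline distribution of protocol types in normal traffic. The baseline shares, the four protocol categories, and the 0.05 significance level are illustrative assumptions rather than values from this paper; 7.815 is the standard chi-square critical value for 3 degrees of freedom at alpha = 0.05.

```python
# Hypothetical baseline shares of TCP, UDP, ICMP, and other packets in normal traffic
BASELINE = [0.70, 0.20, 0.05, 0.05]
# Chi-square critical value for 3 degrees of freedom (4 categories - 1) at alpha = 0.05
CRITICAL_VALUE = 7.815


def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_anomalous_window(counts):
    """Flag a traffic window whose protocol mix deviates significantly from BASELINE."""
    total = sum(counts)
    expected = [share * total for share in BASELINE]
    return chi_square_statistic(counts, expected) > CRITICAL_VALUE


print(is_anomalous_window([68, 22, 5, 5]))   # False: close to the baseline
print(is_anomalous_window([30, 20, 45, 5]))  # True: ICMP-heavy window
```

The dependence on BASELINE is exactly the weakness noted above: if normal traffic does not follow the assumed distribution, the test's verdicts become unreliable.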

Methods based on principal component analysis. The principal component analysis method mainly uses two scores, computed from the major principal components and from the minor principal components, respectively. If a score exceeds its threshold, the access can be judged abnormal. The main advantages of this method are that it requires no statistical assumptions about the data distribution and is computationally efficient [20]. However, since there is no specific standard for choosing the scores, the application scenarios of the algorithm are limited.
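The two-score idea can be sketched with NumPy as follows, in the spirit of principal-component classifiers that score a sample separately on the major and the minor components. This is an illustrative sketch, not the exact method of [20]: the standardization step, the function names, the choice of `q_major` and `r_minor`, and the 99th-percentile thresholds on normal data are all assumptions.

```python
import numpy as np


def fit_pca(X_normal):
    """Fit PCA on standardized normal traffic; components sorted by decreasing variance."""
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0) + 1e-12  # guard against constant features
    Z = (X_normal - mean) / std
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))  # ascending order
    order = np.argsort(eigvals)[::-1]
    return mean, std, eigvals[order], eigvecs[:, order]


def pca_scores(X, model, q_major, r_minor):
    """Return (major, minor) scores: sums of squared component scores divided by
    their eigenvalues, over the q largest and the r smallest components."""
    mean, std, eigvals, eigvecs = model
    y = ((X - mean) / std) @ eigvecs  # principal component scores
    major = (y[:, :q_major] ** 2 / eigvals[:q_major]).sum(axis=1)
    minor = (y[:, -r_minor:] ** 2 / eigvals[-r_minor:]).sum(axis=1)
    return major, minor


# Synthetic "normal" data: the first two features are strongly correlated
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
X_train = np.column_stack([x1, x1 + 0.1 * rng.normal(size=500), rng.normal(size=500)])
model = fit_pca(X_train)
major, minor = pca_scores(X_train, model, q_major=1, r_minor=1)
c1, c2 = np.percentile(major, 99), np.percentile(minor, 99)

# A record that breaks the correlation between the first two features
test_major, test_minor = pca_scores(np.array([[1.0, -1.0, 0.0]]), model, 1, 1)
is_abnormal = (test_major > c1) | (test_minor > c2)
print(is_abnormal)
```

The minor-component score catches records that violate correlations present in normal traffic even when each feature on its own looks ordinary, which is why both scores are used.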

Methods based on information theory. These methods mainly use the concepts of entropy, cross entropy, and information gain from information theory to evaluate whether a model is suitable for a new data set [1, 21].
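A minimal sketch of the entropy-based quantities mentioned above, comparing the empirical distribution of a new traffic window against a reference distribution. Measuring the mismatch by KL divergence (cross entropy minus entropy), the destination-port example, and the function names are illustrative assumptions.

```python
import math
from collections import Counter


def empirical_distribution(values, support):
    """Relative frequency of each symbol in `support` among `values`."""
    counts = Counter(values)
    return [counts[s] / len(values) for s in support]


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q, eps=1e-12):
    """Cross entropy H(p, q) in bits; q is clamped to eps to avoid log(0)."""
    return -sum(pi * math.log2(max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL(p || q) = H(p, q) - H(p): extra bits paid for modeling p with q."""
    return cross_entropy(p, q) - entropy(p)


# Hypothetical destination-port windows: reference (normal) vs. new traffic
support = ["80", "443", "22"]
reference = empirical_distribution(["443"] * 6 + ["80"] * 3 + ["22"], support)
new_window = empirical_distribution(["22"] * 8 + ["443"] * 2, support)
print(kl_divergence(new_window, reference))
```

A large divergence indicates that the reference model describes the new data poorly, which is the sense in which these methods judge whether a model suits a new data set.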

Methods based on mixture models. These methods fit a distribution by exploiting the ability of mixture models to approximate general distributions. Before the rise of deep learning, methods of this type, such as the Gaussian mixture model, were widely used for anomaly detection [22, 23, 24]. However, representing a Laplace distribution with Gaussian components requires an infinite number of components, which limits practical applications.
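To show how a fitted mixture model scores samples, the following sketch evaluates the log-density of a one-dimensional Gaussian mixture, using the log-sum-exp trick for numerical stability; a low density suggests an anomaly. The mixture parameters (assumed to have been fitted beforehand, e.g. by EM, which is omitted here), the packet-size example, and the threshold are hypothetical.

```python
import math


def gmm_log_density(x, weights, means, stds):
    """log sum_k w_k N(x | mu_k, sigma_k^2) for a 1-D Gaussian mixture,
    computed with log-sum-exp to avoid underflow far from all components."""
    logs = [
        math.log(w) - 0.5 * math.log(2 * math.pi * s * s) - (x - m) ** 2 / (2 * s * s)
        for w, m, s in zip(weights, means, stds)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


# Hypothetical mixture over packet sizes (bytes): small control packets and large data packets
WEIGHTS, MEANS, STDS = [0.6, 0.4], [64.0, 1400.0], [10.0, 100.0]
THRESHOLD = -20.0  # hypothetical: log-densities below this are flagged as anomalous

for size in (64.0, 1450.0, 700.0):
    score = gmm_log_density(size, WEIGHTS, MEANS, STDS)
    print(size, round(score, 2), score < THRESHOLD)
```

A packet size between the two modes receives a very low density and is flagged, while sizes near either mode are accepted.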

Supervised learning methods generally perform better than unsupervised ones, but they require a large amount of labeled data, whereas unsupervised learning does not [25].

This paper applies the AdaBoost algorithm to anomaly intrusion detection in the trustworthy network. It uses a simple decision tree as the base weak learner, and the AdaBoost algorithm combines multiple weak learners into a strong learner by re-weighting the samples, further improving the defense of the entire trustworthy network against malicious behavior in network attacks.

The flow of the AdaBoost algorithm for network intrusion detection is shown in Figure 2. First, the network access data that has passed trustworthy network authentication is preprocessed, and a decision tree model and an AdaBoost-based ensemble model are constructed. Then, the model is trained by optimizing the exponential loss function to distinguish network attacks from normal network access. Finally, the performance of AdaBoost-based network anomaly intrusion detection is evaluated on the test set, using the P-R curve and the average precision derived from it as evaluation metrics.

The verification performed by the LSC sublayer is generally based on a hash function. If the verification code or verification method is leaked, the LSC verification is likely to be bypassed, so an additional defense mechanism is needed. After the LSC sublayer verification, the data segment is parsed as shown in Figure 3. In the intrusion detection process, the text is then encoded, and the data set is divided into a training set and a test set.
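The concern about leaked verification material can be illustrated with a keyed hash. The sketch below assumes, hypothetically, that the LSC tag is a truncated HMAC-SHA256 over the payload; the actual LSC scheme is not specified here, and `lsc_tag`, `lsc_verify`, and `TAG_LEN` are invented names. It shows that anyone holding the leaked key can produce tags that pass verification, which motivates a second, content-based line of defense.

```python
import hashlib
import hmac

TAG_LEN = 8  # hypothetical tag length in bytes


def lsc_tag(key: bytes, payload: bytes) -> bytes:
    """Hypothetical LSC tag: HMAC-SHA256 of the payload, truncated to TAG_LEN bytes."""
    return hmac.new(key, payload, hashlib.sha256).digest()[:TAG_LEN]


def lsc_verify(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Constant-time comparison of the received tag with the expected one."""
    return hmac.compare_digest(lsc_tag(key, payload), tag)


leaked_key = b"shared-secret"                     # assume this key has leaked
malicious = b"malicious request"
forged_tag = lsc_tag(leaked_key, malicious)       # computed by the attacker
print(lsc_verify(leaked_key, malicious, forged_tag))  # True: the forgery passes
```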

The network access data is encoded as text, and each character is mapped to its corresponding numeric value according to a code table. The specific text format is described in Section 3 (Experiments). During this process, the input length required by the model is fixed: inputs shorter than this length are padded, and longer inputs are truncated. Assume there is a dataset $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i$ is the $i$-th encoded network access record and $y_i$ is its label.
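A minimal sketch of the encoding step, assuming each record is a whitespace-separated hex dump like the example in Section 3 and that each byte token (rather than each individual character) is mapped to an integer from 0 to 255. The function name `encode_record` and the padding value 0 are hypothetical choices.

```python
from typing import List


def encode_record(hex_text: str, max_len: int, pad_value: int = 0) -> List[int]:
    """Map each hex byte token (e.g. 'ff') to an integer in 0-255, then pad with
    pad_value or truncate so the result has exactly max_len entries."""
    codes = [int(token, 16) for token in hex_text.split()]
    if len(codes) >= max_len:
        return codes[:max_len]
    return codes + [pad_value] * (max_len - len(codes))


print(encode_record("45 00 01 5e", 6))  # [69, 0, 1, 94, 0, 0]
print(encode_record("45 00 01 5e", 2))  # [69, 0]
```

Fixing the length this way gives every record $x_i$ the same dimensionality, which the decision tree base learners require.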

To verify the effectiveness of the method, the data set is divided into training data and test data in a fixed proportion. The training data is used for model training, and the test data is used for model evaluation.

The model construction process is shown in Figure 4.

A single decision tree recursively assigns the input data to different leaf nodes according to the information gain at each node.

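The construction of a single decision tree is as follows: at each node, with $D$ the set of samples reaching that node, select the feature $A_j$ with the largest information gain $g(D, A_j) = H(D) - H(D \mid A_j)$, where $H(D)$ is the entropy of the labels in $D$ and $H(D \mid A_j)$ is their conditional entropy given $A_j$; split the node on $A_j$ and repeat recursively on each child node.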
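The recursive construction can be sketched as an ID3-style tree over categorical features (for example, individual byte values), splitting each node on the feature with the largest information gain. This is an illustrative sketch; the tree representation, the stopping rules, the toy records, and the function names are assumptions, since the base learner is described here only as a simple decision tree.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, j):
    """g(D, A_j) = H(D) - H(D | A_j) for categorical feature j."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[j], []).append(y)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def build_tree(rows, labels, features):
    """ID3-style recursion. A leaf is a class label; an internal node is
    (feature index, {feature value: subtree}, majority label for unseen values)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda j: info_gain(rows, labels, j))
    if info_gain(rows, labels, best) <= 0:
        return majority  # no feature reduces uncertainty
    rest = [j for j in features if j != best]
    branches = {}
    for value in {row[best] for row in rows}:
        pairs = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree([r for r, _ in pairs], [y for _, y in pairs], rest)
    return (best, branches, majority)


def predict(tree, row):
    """Follow branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        j, branches, default = tree
        tree = branches.get(row[j], default)
    return tree


# Toy records: (first byte, protocol byte); label 1 = normal, 0 = abnormal (illustrative)
rows = [(0x45, 0x06), (0x45, 0x11), (0x60, 0x06), (0x60, 0x11)]
labels = [1, 0, 1, 0]
tree = build_tree(rows, labels, features=[0, 1])
print(tree)
```

In the toy data, only the second feature separates the labels, so the root splits on it and both children are pure leaves.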

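The AdaBoost model is constructed as follows:

First, initialize the sample weights uniformly: $w_{1,i} = 1/n$, $i = 1, 2, \ldots, n$.

Second, for $m = 1, 2, \ldots, M$:

On the training data weighted by $w_{m,i}$, learn a base classifier $G_m(x) \in \{-1, +1\}$; compute its weighted error rate $e_m = \sum_{i=1}^{n} w_{m,i} \, I(G_m(x_i) \neq y_i)$ and its coefficient $\alpha_m = \frac{1}{2} \ln \frac{1 - e_m}{e_m}$; then update the weights as $w_{m+1,i} = w_{m,i} \exp(-\alpha_m y_i G_m(x_i)) / Z_m$, where $Z_m = \sum_{i=1}^{n} w_{m,i} \exp(-\alpha_m y_i G_m(x_i))$ is a normalization factor.

Finally, use the following model to make predictions:

$$G(x) = \operatorname{sign}\left( \sum_{m=1}^{M} \alpha_m G_m(x) \right)$$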
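Putting these steps together, here is a from-scratch sketch of discrete AdaBoost with one-level decision trees (stumps) as base learners, using NumPy. The function names are hypothetical. `n_estimators` and `learning_rate` mirror the hyperparameters reported in the experiments, but treating the learning rate as a multiplicative shrinkage on each coefficient $\alpha_m$ (similar in spirit to scikit-learn's `learning_rate`) is an assumption. Labels are given as 1 (normal) and 0 (abnormal), as in the experiments, and mapped to +1/-1 internally.

```python
import numpy as np


def fit_stump(X, y, w):
    """Weighted decision stump: search every (feature, threshold, polarity) and
    return the one with the smallest weighted error as (err, j, t, polarity)."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - t) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, t, polarity)
    return best


def stump_predict(X, j, t, polarity):
    return np.where(polarity * (X[:, j] - t) >= 0, 1, -1)


def adaboost_fit(X, y01, n_estimators=150, learning_rate=1.0):
    """Discrete AdaBoost with stumps; y01 uses 1 = normal, 0 = abnormal."""
    y = np.where(y01 == 1, 1, -1)             # map labels to {-1, +1}
    w = np.full(len(y), 1.0 / len(y))         # w_{1,i} = 1/n
    ensemble = []
    for _ in range(n_estimators):
        err, j, t, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log((1 - e) / e) finite
        alpha = learning_rate * 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, j, t, pol)
        w = w * np.exp(-alpha * y * pred)     # misclassified samples gain weight
        w /= w.sum()                          # normalize by Z_m
        ensemble.append((alpha, j, t, pol))
    return ensemble


def adaboost_predict(X, ensemble):
    """sign(sum_m alpha_m G_m(x)), mapped back to 1 = normal, 0 = abnormal."""
    score = sum(alpha * stump_predict(X, j, t, pol) for alpha, j, t, pol in ensemble)
    return np.where(score >= 0, 1, 0)


X = np.array([[1], [2], [3], [4], [5], [6]])
y01 = np.array([1, 1, 0, 0, 1, 1])  # no single stump separates this
ensemble = adaboost_fit(X, y01, n_estimators=3)
print(adaboost_predict(X, ensemble))  # [1 1 0 0 1 1]
```

In the toy example, no single stump can isolate the middle interval, but three re-weighted stumps combined by their coefficients classify every sample correctly. With a learning rate as small as 0.005, each coefficient is shrunk heavily, which is why a small learning rate is usually paired with a larger number of base classifiers.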

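In the model training process, the model is mainly optimized with the exponential loss function. The specific steps are as follows:

First, the parameters of the $m$-th base classifier and its coefficient are optimized while the previously learned part of the model is held fixed, i.e., the model is the forward stagewise additive model $f(x) = \sum_{m=1}^{M} \alpha_m G_m(x)$.

Network intrusion anomaly detection is a binary classification problem with labels $y_i \in \{-1, +1\}$ and predicted value $\operatorname{sign}(f(x))$.

Define the exponential loss function as follows:

$$L(y, f(x)) = \exp(-y f(x))$$

In the $m$-th iteration, $f_m(x) = f_{m-1}(x) + \alpha_m G_m(x)$, and $(\alpha_m, G_m)$ are chosen as

$$(\alpha_m, G_m) = \arg\min_{\alpha, G} \sum_{i=1}^{n} \bar{w}_{m,i} \exp(-y_i \alpha G(x_i)), \qquad \bar{w}_{m,i} = \exp(-y_i f_{m-1}(x_i)).$$

Step by step to optimize: in the first step, for any fixed $\alpha > 0$, the minimizing base classifier is

$$G_m = \arg\min_{G} \sum_{i=1}^{n} \bar{w}_{m,i} \, I(y_i \neq G(x_i)).$$

In the second step, setting the derivative of the loss with respect to $\alpha$ to zero gives

$$\alpha_m = \frac{1}{2} \ln \frac{1 - e_m}{e_m}, \qquad e_m = \frac{\sum_{i=1}^{n} \bar{w}_{m,i} \, I(y_i \neq G_m(x_i))}{\sum_{i=1}^{n} \bar{w}_{m,i}}.$$

Finally, make the forward update $f_m(x) = f_{m-1}(x) + \alpha_m G_m(x)$ based on the learned decision tree $G_m$. The weights for the next round are $\bar{w}_{m+1,i} = \bar{w}_{m,i} \exp(-y_i \alpha_m G_m(x_i))$, which, after normalization, coincide with the AdaBoost weight update.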
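A quick numerical check of the closed-form coefficient: with normalized weights and weighted error e, adding α·G changes the exponential loss to (1 - e)·exp(-α) + e·exp(α). The sketch below, using an illustrative e = 0.2, confirms that a grid search lands on ½ ln((1 - e)/e) and that the minimum loss equals 2·sqrt(e(1 - e)). The function names are hypothetical.

```python
import math


def exp_loss_after_step(alpha: float, err: float) -> float:
    """Normalized exponential loss after adding alpha * G, where G has weighted error err:
    correct mass (1 - err) is scaled by e^-alpha, misclassified mass err by e^alpha."""
    return (1 - err) * math.exp(-alpha) + err * math.exp(alpha)


def closed_form_alpha(err: float) -> float:
    return 0.5 * math.log((1 - err) / err)


err = 0.2  # illustrative weighted error of the m-th base classifier
alpha_star = closed_form_alpha(err)
alpha_grid = min((i / 1000 for i in range(3001)), key=lambda a: exp_loss_after_step(a, err))
print(round(alpha_star, 4), alpha_grid)  # 0.6931 0.693
```

Because the minimum loss 2·sqrt(e(1 - e)) is below 1 whenever e < 0.5, each round strictly reduces the exponential loss, which is why a weak learner only slightly better than random guessing is enough.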

The performance of the AdaBoost algorithm for anomaly intrusion detection is evaluated on the test data set, using the P-R curve and the average precision derived from it as evaluation metrics.

To verify the effectiveness of the AdaBoost algorithm for anomaly intrusion detection, this paper uses network message data extracted from the trustworthy network. A total of 20,000 network access records were collected, of which 15,000 are normal and 5,000 are abnormal. In the experiment, the data set is divided into training data and test data in a ratio of 7:3. The training data is used to optimize the model parameters, and the test data is used to evaluate the performance of the AdaBoost algorithm for anomaly intrusion detection.
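A minimal sketch of the 7:3 split with NumPy. Stratifying by label, so that both parts keep the 3:1 ratio of normal to abnormal records, is an assumption; only the ratio itself is stated above. The function name and the random seed are illustrative.

```python
import numpy as np


def stratified_split(y, train_ratio=0.7, seed=0):
    """Return sorted train/test index arrays, splitting each class separately so
    both parts keep the original class proportions."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        cut = int(round(train_ratio * len(idx)))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.sort(np.array(train_idx)), np.sort(np.array(test_idx))


# Label counts from the experiment: 15,000 normal (1) and 5,000 abnormal (0)
y = np.array([1] * 15_000 + [0] * 5_000)
train_idx, test_idx = stratified_split(y)
print(len(train_idx), len(test_idx), int((y[test_idx] == 0).sum()))  # 14000 6000 1500
```

Stratifying keeps the class imbalance identical in both parts, so precision measured on the test set reflects the same imbalance as the training data.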

In network intrusion detection, normal and abnormal network access data are often imbalanced, and the P-R curve is more sensitive to such imbalance. The shape of the P-R curve and the value of the average precision are therefore used to reflect the model's intrusion detection ability.

The experiments were run on a laptop with a 4-core Intel(R) Core(TM) i7-4720HQ CPU @ 2.60 GHz and 16 GB of memory. The data set is the network access data set described above, extracted from a real-world trustworthy network.

An example of network access is presented as follows:

ff ff ff ff ff ff 00 e0 4c 81 53 57 09 ad 00 79

d5 00 00 00 08 00 45 00 01 5e d4 5f 00 00 80 11

65 30 00 00 00 00 ff ff ff ff 00 44 00 43 01 4a

ca 8e 01 01 06 00 dd 6e 8d 3a 00 00 80 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0

4c 81 53 57 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 63 82

53 63 35 01 03 3d 07 01 00 e0 4c 81 53 57 32 04

0a 01 01 1b 0c 0f 4c 41 50 54 4f 50 2d 54 52 52

4d 4e 48 44 41 51 12 00 00 00 4c 41 50 54 4f 50

2d 54 52 52 4d 4e 48 44 41 3c 08 4d 53 46 54 20

35 2e 30 37 0e 01 03 06 0f 1f 21 2b 2c 2e 2f 77

79 f9 fc ff

First, the access data is preprocessed. According to Figure 3, the first 14 bytes are the destination address, the source address, and the frame type, and bytes 15–24 are the LSC sublayer. Assuming that the LSC sublayer verification has been compromised, so that some abnormal traffic bypasses the LSC verification mechanism, the middle part of each data packet is extracted; its content is as follows:

45 00 01 5e d4 5f 00 00 80 11

65 30 00 00 00 00 ff ff ff ff 00 44 00 43 01 4a

ca 8e 01 01 06 00 dd 6e 8d 3a 00 00 80 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0

4c 81 53 57 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 63 82

53 63 35 01 03 3d 07 01 00 e0 4c 81 53 57 32 04

0a 01 01 1b 0c 0f 4c 41 50 54 4f 50 2d 54 52 52

4d 4e 48 44 41 51 12 00 00 00 4c 41 50 54 4f 50

2d 54 52 52 4d 4e 48 44 41 3c 08 4d 53 46 54 20

35 2e 30 37 0e 01 03 06 0f 1f 21 2b 2c 2e 2f 77
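The slicing step can be sketched as follows. The offsets are read off the example above rather than taken from Figure 3: in the full dump, the retained segment begins at the 23rd byte (0x45, the first byte of an IPv4 header), and the final four bytes (79 f9 fc ff) are omitted, so the sketch uses `head=22` and `tail=4`. The function name is hypothetical, and the offsets would need to be adjusted to the actual frame layout.

```python
from typing import List


def extract_segment(hex_dump: str, head: int = 22, tail: int = 4) -> List[int]:
    """Drop the first `head` bytes and the last `tail` bytes of a hex dump,
    returning the remaining bytes as integers."""
    data = [int(token, 16) for token in hex_dump.split()]
    return data[head:len(data) - tail]


first_two_lines = (
    "ff ff ff ff ff ff 00 e0 4c 81 53 57 09 ad 00 79 "
    "d5 00 00 00 08 00 45 00 01 5e d4 5f 00 00 80 11"
)
print([f"{b:02x}" for b in extract_segment(first_two_lines)])
# ['45', '00', '01', '5e', 'd4', '5f']
```

Slicing with `len(data) - tail` instead of `-tail` keeps the function correct when `tail` is 0.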

The dataset is labeled with 1 for normal access and 0 for abnormal access. AdaBoost is an ensemble learning algorithm that adjusts sample weights, and the base classifier of the ensemble is a simple classification decision tree. The number of base classifiers is 150, and the learning rate in the training phase is set to 0.005.

The training data is fed into the model, and AdaBoost's loss function is minimized to obtain the optimal parameters. On the test data set, the P-R curve and the average precision derived from it are used as evaluation metrics.

Most existing studies evaluate binary classification with the ROC curve. This paper instead uses the P-R curve as the evaluation metric because the P-R curve is sensitive to class imbalance: changes in the proportion of positive and negative samples cause large changes in the P-R curve, whereas the ROC curve is insensitive and changes very little. This matters here because, in network intrusion detection, normal accesses far outnumber abnormal ones.
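The sensitivity argument can be checked with a few lines of arithmetic. For a fixed operating point (a fixed TPR and FPR, hence a fixed point on the ROC curve), precision = TP / (TP + FP) still depends on the class ratio. The rates and class counts below are illustrative, not results from this paper.

```python
def precision_at(tpr: float, fpr: float, n_pos: int, n_neg: int) -> float:
    """Precision implied by a fixed ROC operating point (TPR, FPR) and class counts."""
    tp = tpr * n_pos
    fp = fpr * n_neg
    return tp / (tp + fp)


# Same operating point (same ROC point), very different precision
balanced = precision_at(0.9, 0.01, n_pos=1_000, n_neg=1_000)
skewed = precision_at(0.9, 0.01, n_pos=1_000, n_neg=100_000)
print(round(balanced, 3), round(skewed, 3))  # 0.989 0.474
```

The ROC point is identical in both cases, yet precision drops from about 0.99 to about 0.47 when negatives outnumber positives 100 to 1.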

The outcomes of the AdaBoost-based network anomaly intrusion detection method are categorized as follows:

True Positive (TP): the AdaBoost-based method predicts abnormal access, and the access is actually abnormal.

False Positive (FP): the AdaBoost-based method predicts abnormal access, but the access is actually normal.

True Negative (TN): the AdaBoost-based method predicts normal access, and the access is actually normal.

False Negative (FN): the AdaBoost-based method predicts normal access, but the access is actually abnormal.

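Based on the above four outcomes, precision and recall are defined as follows:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$

Given the precision $P_n$ and recall $R_n$ at the $n$-th threshold, the average precision is

$$AP = \sum_{n} (R_n - R_{n-1}) P_n$$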
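A minimal sketch of computing the P-R points and the average precision as a recall-weighted sum of precisions, with one threshold at each distinct score. It assumes abnormal access is the positive class (consistent with the TP definition above), so the paper's labels (normal 1, abnormal 0) are flipped before the call; the function names and toy scores are illustrative.

```python
def precision_recall_points(scores, positives):
    """P-R points from high to low threshold. positives: 1 = abnormal access
    (positive class), 0 = normal; higher score = more likely abnormal.
    Assumes at least one positive sample."""
    ranked = sorted(zip(scores, positives), key=lambda pair: -pair[0])
    total_pos = sum(positives)
    tp = fp = 0
    points = []
    for i, (score, is_pos) in enumerate(ranked):
        tp += is_pos
        fp += 1 - is_pos
        # Emit a point only after all samples tied at this score are counted
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((tp / (tp + fp), tp / total_pos))
    return points


def average_precision(scores, positives):
    """AP = sum_n (R_n - R_{n-1}) * P_n over the thresholds above, with R_0 = 0."""
    ap, prev_recall = 0.0, 0.0
    for precision, recall in precision_recall_points(scores, positives):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Paper labels (1 = normal, 0 = abnormal) flipped so that abnormal is positive
paper_labels = [0, 1, 0, 1]
positives = [1 - y for y in paper_labels]
print(round(average_precision([0.9, 0.8, 0.7, 0.6], positives), 3))  # 0.833
```

Only thresholds where recall increases contribute to the sum, so AP reaches 1 exactly when every abnormal record is ranked above every normal one.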

For the experimental dataset, the P-R curve and average precision are shown in Figure 5.

The average precision is 0.999771897810219, and the precision (vertical axis) of the P-R curve stays very close to the line

This paper applied the AdaBoost algorithm to anomaly intrusion detection in the trustworthy network. The method can monitor network access anomalies at the network edge and micro-perimeter. With a simple decision tree as the base weak learner, AdaBoost combines multiple weak learners into a strong learner by re-weighting the samples, further improving the defense of the entire trustworthy network against malicious behavior. In the experiments, the average precision of the proposed method exceeds 0.999, indicating that it effectively distinguishes abnormal network attacks from normal network access and improves the security of the trustworthy network. Future work will verify the real-time performance of the method for practical deployment in large-scale networks.

