
Algorithm of overfitting avoidance in CNN based on maximum pooled and weight decay

Published online: 22 Aug 2022
Volume & Issue: AHEAD OF PRINT
Pages: -
Received: 25 Oct 2021
Accepted: 15 May 2022
Introduction

With the development of artificial intelligence, deep learning techniques such as the convolutional neural network (CNN) have attracted increasing attention from researchers and engineers and have been widely applied [1,2,3,4,5]. In image analysis, for example, compared with fully connected neural networks, the CNN greatly improves accuracy and efficiency thanks to its small number of connection parameters, shared connection weights and local receptive fields, and it has been widely adopted by experts and technicians in the field. However, an overly complex model structure, too many neurons and weight parameters, a training sample set that is too small, or too many training rounds may cause the model to fall into overfitting during training [6, 7]: the training curve fits well and the accuracy is high, but when the trained model is applied to the test data set its accuracy is very low [8].

Many scholars have proposed solutions to the overfitting phenomenon that arises during model training. For the problems of early machine learning, Li et al. [9] discussed three ways of dealing with overfitting in the BP algorithm, namely the early stopping method, the adjustment method and the automatic generation of hidden nodes; this is one of the earlier studies of the overfitting problem. In 2012, the dropout method was proposed by Srivastava et al. [10] and Hinton et al. [11] to address overfitting. Chen et al. [12] proposed an improved singular value decomposition (SVD) algorithm by studying overfitting in the rating prediction problem, and related work has been reported elsewhere [13,14,15]. For the overfitting problem of deep-learning-based CNN models, a series of solutions have also been proposed [16, 17], including data expansion, regularisation, ensemble learning and early stopping. However, limited by factors such as the state of knowledge, the application setting, the research period and the still-developing theory of deep learning, these results each have their own advantages but also certain limitations.

Based on the characteristics of the CNN and on previous research into CNN overfitting, we discuss a mask-based maximum pooled dropout method for the pooling layer and combine it with weight decay to reduce overfitting. Theoretical analysis and experimental comparison show that this method can effectively avoid overfitting during CNN model training and improve the ability to detect images.

Structure of CNN

In general, a CNN consists of three kinds of layers: convolution, pooling and fully connected layers. A group of convolution operations and a group of pooling operations form one convolution-pooling stage. After one convolution-pooling stage is executed, the next stage continues the extraction of sample features, and the final fully connected layer flattens the obtained features into one dimension and completes the classification. Because the CNN uses local connections between neurons, shares the local connection parameters and downsamples through pooling, it has fewer weight parameters and fewer features, and it represents objects better than a general neural network during feature extraction. Moreover, its representation ability generalises better.

Assume that $x_i^l$ represents the $i$-th input of the $l$-th layer of the CNN, $i = 1, 2, 3, \cdots, n^l$, where $n^l$ is the number of features in the $l$-th layer; $w^{l,k}$ denotes the $k$-th convolution kernel of the $l$-th layer; $b^{l,k}$ denotes the bias corresponding to the $k$-th convolution kernel of the $l$-th layer; and $z^{l+1,k}$ is the output obtained by convolving the input $x_i^l$ with the kernel $w^{l,k}$ and adding the corresponding bias. With $f(\cdot)$ the activation function of this layer, the whole convolution operation can be expressed as:
$$z^{l+1,k} = \sum_{i=1}^{n^l} con(w^{l,k}, x_i^l) + b^{l,k} \quad (1)$$
$$x^{l+1,k} = f(z^{l+1,k}) \quad (2)$$
In formula (1), $con$ denotes the convolution operation on the $l$-th layer; its result is used as the input of the next layer $(l+1)$ and is then processed by $f(\cdot)$, which yields the output $x^{l+1,k}$, the $k$-th feature map of the $(l+1)$-th layer.
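As a hedged illustration (not the authors' code), the NumPy sketch below mirrors formulas (1) and (2) for a single kernel; the function names and the choice of ReLU as the activation $f$ are assumptions.

```python
import numpy as np

def conv2d_single(x, w):
    """Valid 2-D convolution (implemented as cross-correlation) of one feature map x with one kernel w."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def conv_layer_output(x_l, w_lk, b_lk, f=lambda z: np.maximum(z, 0.0)):
    """Formulas (1)-(2): sum the convolutions of all n^l input maps x_i^l with the kernel w^{l,k},
    add the bias b^{l,k}, then apply the activation f (ReLU assumed here)."""
    z = sum(conv2d_single(x_i, w_lk) for x_i in x_l)  # Eq. (1), summation term
    z = z + b_lk                                      # Eq. (1), bias term
    return f(z)                                       # Eq. (2)
```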

After the convolution operation, the next layer is a pooling layer, which samples the features of the convolution layer to reduce the dimensionality and complexity of the network. The pooling operation can be written as:
$$s_m^{l+1} = pool(s_{m,1}^l, s_{m,2}^l, \ldots, s_{m,j}^l, \ldots, s_{m,n}^l) \quad (3)$$
In formula (3), $s_{m,j}^l$ represents the value of the $j$-th pooling unit in the $m$-th pooling region of $x^{l+1,k}$, $l$ indexes the pooling layer, and $n$ is the number of pooling units in the pooling region; for example, with 2*2 pooling the number of pooling units is n = 4. $pool(\cdot)$ denotes the pooling operation. Maximum pooling is widely used because it retains the essential characteristics of the object, and it is the pooling method adopted in this paper.
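A minimal sketch of formula (3) with $pool(\cdot)$ taken as the maximum, assuming a square non-overlapping pooling window; the names are illustrative, not from the paper.

```python
import numpy as np

def max_pool(x, t=2):
    """Formula (3) with pool(.) = max: non-overlapping t*t maximum pooling of a 2-D feature map."""
    H, W = x.shape
    out = np.zeros((H // t, W // t))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = x[i * t:(i + 1) * t, j * t:(j + 1) * t]  # one pooling region s_m
            out[i, j] = region.max()                          # keep the maximum unit
    return out
```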

Immediately after the pooling operation, the fully connected operation is performed, and the weight parameters are updated by backpropagation through the loss function. The loss function directly determines the training effect and performance of the CNN, so its design is very important; see 'Weight decay calculation' in the next section.

Overfitting avoidance algorithm

The previous section introduced the basic principles of the CNN. Building on it, this section designs a CNN model with maximum pooled dropout and weight decay so that overfitting during model training is avoided as much as possible.

Maximum pooled dropout

Maximum pooling is a common operation in the CNN pooling layer. In this paper, dropout is introduced into maximum pooling to reduce the occurrence of overfitting during model training. Combining dropout with maximum pooling also overcomes the drawback of average pooling, which dilutes neurons with high activation values, and the drawback of plain maximum pooling, which ignores the values of the other pooling units, so the characteristics of the pooling layer can be expressed more effectively.

Before describing the maximum pooled dropout operation, we first describe the mask processing. Mask processing sets the activation values of some neurons in a pooling region of the pooling layer to 0 with a certain probability and then performs the maximum pooling operation over the retained neurons. The design process is described below.

Assume that $r^l$ represents the mask of the $l$-th layer, drawn from a Bernoulli distribution with parameter $p$, where $p$ is the maintenance probability, a very important parameter whose design is described in detail later. The mask processing can then be expressed as:
$$s_m^{\prime l} = r^l * s_m^l \quad (4)$$
$$s_m^{l+1} = Maxpooling(s_{m,1}^{\prime l}, s_{m,2}^{\prime l}, \cdots, s_{m,j}^{\prime l}, \cdots, s_{m,n}^{\prime l}) \quad (5)$$
In Eq. (4), $s_m^{\prime l}$ represents the value of the $m$-th pooling region $s_m^l$ of the $l$-th layer after masking, and $s_m^{l+1}$ in Eq. (5) is the result of taking the maximum over all pooling units of the $m$-th pooling region after the mask. The specific implementation is shown in Figure 1.
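The sketch below illustrates Eqs. (4) and (5) for a single pooling region, assuming an element-wise Bernoulli($p$) mask; the variable names and example values are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_max_pool_region(s_m, p=0.5):
    """Eqs. (4)-(5): keep each pooling unit with probability p (Bernoulli mask r^l),
    set the others to 0, then take the maximum over the region."""
    r = rng.binomial(1, p, size=s_m.shape)  # mask r^l ~ Bernoulli(p)
    s_masked = r * s_m                      # Eq. (4): suppressed units become 0
    return s_masked.max()                   # Eq. (5): maximum over the masked region

# e.g. masked_max_pool_region(np.array([[45.0, 12.0], [7.0, 42.0]]), p=0.5)
# may return 42.0 instead of 45.0 if the unit holding 45 happens to be masked.
```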

Fig. 1

Operation diagram of the mask based on maximum pooled.

In Figure 1, a 2*2 pooling kernel with a stride of 2 is adopted. If only maximum pooling is used, the result is [[45], [57]]. If maximum pooling is carried out after applying the mask, the result is [[42], [55]], because some pooling units have been discarded.

The following describes the design of maximum pooled dropout and how the maintenance probability p is obtained.

As noted above, the Bernoulli parameter $p$ is a very important quantity: it is the maintenance probability of the maximum pooled dropout process, an important index for avoiding overfitting, and the parameter that determines which neuron units are dropped. Suppose the maintenance probability of each pooling region of a feature map is set to $p$ (it can be chosen manually according to the actual situation, generally $p = 0.5$), so that the discard probability is $q = 1 - p$. Suppose further that the unit values $(s_{m,1}^l, s_{m,2}^l, \ldots, s_{m,n}^l)$ of pooling region $m$ in the $l$-th layer are rearranged in ascending order, $0 < d_{m,1}^l < d_{m,2}^l < \ldots < d_{m,n}^l$ (each $d_{m,j}^l$ is one of the pooling units $s_{m,j}^l$). Then $d_{m,j}^l$ is selected as the maximum of the whole pooling region exactly when every larger unit $(d_{m,j+1}^l, \ldots, d_{m,n}^l)$ is suppressed (discarded) and $d_{m,j}^l$ itself is kept; since the maximum is taken over the retained values, the pooled output is $d_{m,j}^l$, namely:
$$p_j = poss(s_m^{l+1} = d_{m,j}^l) = p\,q^{\,n-j}, \quad j = 1, 2, \ldots, n \quad (6)$$
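A quick numerical check of Eq. (6), under the assumption that each unit is kept independently with probability $p$; the function names and the Monte Carlo check are illustrative additions, not part of the paper.

```python
import random
from collections import Counter

def selection_probabilities(n=4, p=0.5):
    """Eq. (6): probability that the j-th smallest unit d_{m,j} becomes the pooled output."""
    q = 1.0 - p
    return {j: p * q ** (n - j) for j in range(1, n + 1)}

def simulate(n=4, p=0.5, trials=100_000, seed=0):
    """Monte Carlo check: keep each of the n ordered units independently with probability p
    and output the index of the largest kept unit (0 if every unit was dropped)."""
    random.seed(seed)
    counts = Counter()
    for _ in range(trials):
        kept = [j for j in range(1, n + 1) if random.random() < p]
        counts[max(kept) if kept else 0] += 1
    return {j: counts[j] / trials for j in range(0, n + 1)}

# For n = 4, p = 0.5, Eq. (6) gives p_4 = 0.5, p_3 = 0.25, p_2 = 0.125, p_1 = 0.0625;
# the remaining q^n = 0.0625 is the case in which every unit is dropped and the output is 0.
```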

Analysing formula (6), it can be seen that when maximum pooled dropout is performed in a pooling region, the $j$-th activation value $d_{m,j}^l$ of the ascending order is selected according to this multinomial distribution as the output value of the pooling region, namely:
$$s_m^{l+1} = d_{m,j}^l, \quad p_j \in (p_1, p_2, p_3, \ldots, p_n) \quad (7)$$

Assume that the $l$-th layer has $r$ feature maps, each of size $s$, that the pooling region size is $t*t$ and that the pooling stride is $t$. Without overlapping pooling this gives $rs/t$ pooling regions, so the calculation amount (number of parameters) of the $l$-th layer of the model during training is $(t+1)^2 rs/t$ (including a bias). After maximum pooled dropout is introduced, pooling units are randomly suppressed under the relevant probability constraints, that is, $t$ is effectively reduced, and the calculation amount of model training decreases exponentially, which effectively reduces the complexity and thus the occurrence of overfitting.

Weight decay calculation

When training a large CNN, the maximum pooled dropout described above can reduce the occurrence of overfitting. However, if the loss value of the model changes drastically during training, the weights of the related neurons are too large, so the absolute values of the derivatives obtained by backpropagation are large, that is, the model is too complex. Weight decay reduces the complexity of the model by preventing the neuron weights from becoming too large, thereby reducing overfitting to a certain extent.

Usually the mean squared error function, which is also used for linear regression, is taken as the loss function when examining the overfitting of a CNN. This function, denoted $L_0$, is shown in Eq. (8), where $N$ is the number of training samples, $y_i$ is the actual label of sample $i$ and $o_i$ is the output for sample $i$:
$$L_0 = \frac{1}{N} \sum_{i=1}^{N} (y_i - o_i)^2 \quad (8)$$

To reduce the weight values of the neurons in the network, a penalty term is added to the loss function of formula (8), giving the loss function $L$:
$$L = L_0 + \frac{\lambda}{2} \sum_w \|w\|^2 \quad (9)$$
The right-hand term in formula (9) is the penalty term, where $w$ is the connection weight of a neuron and $\lambda$ ($\lambda > 0$) is the penalty coefficient, which sets the proportion between the penalty term $\sum_w \|w\|^2$ and $L_0$; the factor 1/2 is a constant introduced for convenience of differentiation. Taking derivatives of (9) gives:
$$\begin{cases} \dfrac{\partial L}{\partial w} = \dfrac{\partial L_0}{\partial w} + \lambda w \\[4pt] \dfrac{\partial L}{\partial b} = \dfrac{\partial L_0}{\partial b} \end{cases} \quad (10)$$
In Eq. (10), $b$ is the bias of a neuron (such as a convolution kernel) in the CNN model, which appears in the output $o_i$ of formula (8), $o_i = w \cdot x_{i-1} + b$. Analysing formula (10), it can be found that adding the penalty term does not affect the update of the bias $b$, that is, formulas (8) and (9) have the same partial derivative with respect to $b$; but for the update of the network weight $w$, after the penalty term is introduced in formula (9) we have:
$$w' = w - \eta \left( \frac{\partial L_0}{\partial w} + \lambda w \right) = w - \eta \frac{\partial L_0}{\partial w} - \eta \lambda w = (1 - \eta \lambda) w - \eta \frac{\partial L_0}{\partial w} \quad (11)$$
In Eq. (11), $\eta$ is the learning rate and $w'$ is the new weight obtained after the weight update with the penalty term of formula (9). Analysis of Eq. (11) shows that the coefficient of $w$ is 1 when there is no penalty term; after the penalty term is added, the coefficient of $w$ is $1 - \eta\lambda$. Since $\eta$ and $\lambda$ are positive numbers smaller than 1, $1 - \eta\lambda < 1$, that is, the updated weight with the penalty term is smaller than that without it. This is the theoretical meaning of weight decay. Note that the term $\eta \frac{\partial L_0}{\partial w}$ in Eq. (11) is the gradient term of backpropagation; its existence has nothing to do with the penalty term, so it is not included when discussing the weight decay.
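A hedged NumPy sketch of the update rule in Eq. (11); the gradient of $L_0$ is passed in as an argument and the parameter values are illustrative.

```python
import numpy as np

def sgd_step_with_weight_decay(w, grad_L0, eta=0.1, lam=1e-3):
    """Eq. (11): w' = (1 - eta*lam) * w - eta * dL0/dw.
    The factor (1 - eta*lam) shrinks the weight at every step; the bias update of Eq. (10)
    would use only its own gradient, since the penalty term does not involve b."""
    return (1.0 - eta * lam) * w - eta * grad_L0

# Example: with a zero data gradient the weight still decays towards 0.
w = np.array([0.8, -0.5])
for _ in range(3):
    w = sgd_step_with_weight_decay(w, grad_L0=np.zeros_like(w))
print(w)  # each component is multiplied by (1 - 0.1 * 0.001) per step
```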

Continuing the analysis of Eq. (11): when $w > 0$, $w'$ becomes smaller, and when $w < 0$, $w'$ becomes larger. Because $|w| < 1$, the effect of Eq. (11) is to push $|w'| \to 0$, that is, to reduce the magnitude of $w'$ as much as possible, which reduces the network weights, reduces the complexity of the network and avoids overfitting. It should be pointed out that the penalty coefficient $\lambda$ in formula (9) is an important parameter whose value directly affects the training effect of the model. If $\lambda$ is set too large, the weights decrease too fast and underfitting may occur, to the point where training cannot proceed; if $\lambda$ is set too small, overfitting will occur. $\lambda$ can be set using rules based on the Bayesian method; details can be found in Ref. [18] and, for reasons of space, are not given here.

Experiment and result analysis
Experimental configuration and environment

Running parameters: the adaptive learning rate algorithm AdaGrad, a gradient-based optimisation algorithm, is applied in model training. The initial learning rate is η = 0.1 and is adjusted automatically. The network weights are initialised from a normal distribution with mean 0 and variance 0.01, the batch size during training is 100, the bias is initialised to b = 0, the default maintenance probability of maximum pooled dropout is p = 0.5, and the penalty coefficient λ of the weight decay calculation is set according to the literature [18]; in this experiment λ = 0.

Experimental environment: the deep learning framework is TensorFlow, the programming language is Python 3.8.4, the OS is Windows 10, and the hardware includes an Intel Core i7 CPU@3.00, 8 GB RAM and an NVIDIA GTX 3080Ti GPU.

Experimental data sets: two data sets are used in this experiment. The first is CVC-ClincDB [19], from a hospital in Barcelona, Spain. It is a collection of gastric polyp case images from 23 patients, comprising 31 sequences with 612 clear images, which are divided into 4 categories according to pathological type: normal images and polyp images 1, 2 and 3. Each image has a resolution of 384*384 with 3 channels (RGB); some samples are shown in Figure 2 (left). Because this data set is small, it can be used to check whether the method overfits when applied to small data sets. Before the experiment it was divided into a training set (500 images) and a test set (112 images), and the data were normalised to the interval [0, 1]. The second, CIFAR-10, is a data set of 60,000 colour images in 10 categories; some samples are shown in Figure 2 (right). Before the experiment this data set was likewise divided into 50,000 images for training and 10,000 images for testing. Each sample is a 32*32, 3-channel RGB image.

Fig. 2

Some samples of CVC-ClincDB and CIFAR-10 data sets.

Comparative experiment: to compare experimental results, we selected the GPU parallel optimisation-based immune CNN method proposed by Gong et al. [6] (ICNN), the improved neural networks with feature-detector dropout proposed by Hinton et al. [11] (INN_FD) and the stochastic CNN pooling method proposed by Zeiler and Fergus [20] (SCP) for comparative experiments. Among them, the ICNN method has a relatively low test accuracy on the data set (only 80.2%), but its performance is stable and it has a solid theoretical basis. The INN_FD method proposed by Hinton is a dropout algorithm applied to the fully connected layer; it is technically mature and simple to implement. The SCP method of Ref. [20] randomly selects pooled mask values when training the model and, when the model is applied to detect samples, uses the probabilities of the pooling units in the pooling region as a probability-weighted mean, so it reduces overfitting of the trained model well. It is therefore reasonable to choose these three methods for comparative experiments.

Experimental evaluation indicators: one indicator is the change curves of the loss value and the accuracy, and the other is the error rate obtained when the 4 methods are used to test the data sets. The actual effect of the method is verified by comparing the changes of the curves and the error rates of the sample tests of the various methods.

CVC-ClincDB data set experiment

For convenience of description, the maximum pooled dropout and weight decay algorithm proposed in this paper is denoted MD_WD. The CNN structure used for MD_WD is 1*28*28->6C3->1P2->12C3->1P2->1000N->4N. The model consists of 2 convolution-pooling groups and a fully connected layer; the input of the model is a 28*28 single-channel image and the output has 4 categories. The first convolution-pooling group is 6C3->1P2 and the second is 12C3->1P2, where C means convolution, P means pooling, 6C3 means 6 convolution kernels of size 3*3 each, 1P2 means a stride of 1 with a 2*2 pooling kernel, and 1000N represents a fully connected layer with 1000 neurons. For the network structures of the other three comparison methods (ICNN/INN_FD/SCP), see the relevant references; for reasons of space they are not described in detail here.
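A rough Keras sketch of how the 1*28*28->6C3->1P2->12C3->1P2->1000N->4N structure could be read; this is our interpretation, not the authors' code. The paper's mask-based maximum pooled dropout is approximated here by an ordinary Dropout layer before pooling, weight decay by an L2 kernel regulariser, and the activation, padding and λ value are assumptions.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers

wd = regularizers.l2(1e-3)  # weight decay via an L2 penalty; the lambda value is illustrative

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                                   # 1*28*28 single-channel input
    layers.Conv2D(6, 3, activation='relu', kernel_regularizer=wd),    # 6C3: 6 kernels of size 3*3
    layers.Dropout(0.5),                                              # stand-in for mask-based pooled dropout
    layers.MaxPooling2D(pool_size=2, strides=1),                      # 1P2: 2*2 pooling, stride 1
    layers.Conv2D(12, 3, activation='relu', kernel_regularizer=wd),   # 12C3
    layers.Dropout(0.5),
    layers.MaxPooling2D(pool_size=2, strides=1),                      # 1P2
    layers.Flatten(),
    layers.Dense(1000, activation='relu', kernel_regularizer=wd),     # 1000N: fully connected layer
    layers.Dense(4, activation='softmax'),                            # 4N: four output categories
])

model.compile(optimizer=keras.optimizers.Adagrad(learning_rate=0.1),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```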

Figure 3 shows the change curves of the loss function value and the accuracy of the proposed MD_WD method during model training, with a maximum number of iterations of 20,000. The left figure shows that as the number of iterations increases, the loss values of both the training set and the validation set decrease; when the number of iterations reaches 10,000, the loss curves of the training and validation sets become stable. Similarly, the right figure shows that the accuracy of the training set and the test set increases with the number of iterations and also becomes stable at around 10,000 iterations, indicating that the MD_WD method did not overfit in this experiment.

Fig. 3

Curve of iteration times, loss value and accuracy in CVC-ClincDB dataset.

To further verify the performance of the proposed method, the 4 methods above are used to test the data set under different maintenance probabilities; the resulting error rates are shown in Table 1. Because the ICNN method does not involve the maintenance probability, its error rate does not change with it. Comparing the data in Table 1: first, reading horizontally, every method except ICNN obtains its lowest error rate when the maintenance probability is 0.5, but the error rate of INN_FD varies most sharply with the maintenance probability; secondly, comparing the lowest errors of the 4 methods vertically, our MD_WD method has the minimum error of only 1.63%, the lowest value of the SCP method is 2.08% and that of INN_FD is 6.23%, indicating that the proposed method can be used on small data sets, reduces overfitting very well and thus achieves better detection results.

Table 1. The error rate (%) under different maintenance probabilities in CVC-ClincDB.

p       0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
ICNN    6.36   6.36   6.36   6.36   6.36   6.36   6.36   6.36   6.36
INN_FD  7.27   6.85   6.50   6.23   6.20   6.31   6.38   6.40   6.51
SCP     2.23   2.17   2.10   2.12   2.08   2.26   2.31   2.37   2.49
MD_WD   2.21   2.15   2.03   1.95   1.63   1.88   2.19   2.26   2.27
CIFAR-10 dataset

The experiment in Section 4.2 used a small data set, whereas CIFAR-10 is a large natural colour image data set with 3 channels, and the differences between its categories are larger than those of the CVC-ClincDB data set of Section 4.2. The CNN structure is therefore changed to 3*32*32->16C5->2P2->64C3->2P2->2000N->10N, where each item has the same meaning as in the previous experiment, and the experiments of the other three methods are the same as in Section 4.2. Training is carried out for 5000 iterations, and the curves obtained with the MD_WD method are shown in Figure 4. Analysing the curve on the left of Figure 4, as the number of iterations increases the loss curve of the validation set keeps declining and becomes stable after about 4000 iterations; although the loss value of the training set fluctuates strongly during the iterations, it also decreases as the iterations proceed. The accuracy in the right figure rises gradually as the number of iterations increases and levels off around 4000 iterations, showing that the MD_WD method can still effectively reduce overfitting when applied to large data sets.

Fig. 4

Curve of iteration times, loss value and accuracy in the CIFAR-10 dataset.

As in the previous experiment, the 4 methods are tested on this data set with their respective trained models under different maintenance probability values p. The error rates obtained are shown in Table 2; ICNN is again independent of the maintenance probability, with the same explanation as in the previous section. A horizontal reading of the error rates in Table 2 shows that each method achieves its lowest misclassification rate for maintenance probabilities between 0.4 and 0.6.

Table 2. The error rate (%) under different maintenance probabilities in CIFAR-10.

p       0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
ICNN    7.44   7.44   7.44   7.44   7.44   7.44   7.44   7.44   7.44
INN_FD  8.87   7.95   7.76   7.10   6.15   7.21   7.38   8.12   8.69
SCP     4.83   4.67   4.55   4.31   4.09   3.92   4.16   4.47   5.39
MD_WD   3.21   3.08   2.97   2.93   2.81   2.88   3.08   3.16   3.35

The INN_FD method has its minimum error rate of 6.15% at p = 0.5, SCP has its minimum error rate of 3.92% at p = 0.6, and the method of this paper has its minimum error rate of 2.81% at p = 0.5. In addition, a vertical comparison of the lowest error rates of the methods shows that the minimum error rate of the MD_WD method is only 2.81%, indicating that the proposed method can also be applied to large data sets and achieves good generalisation.

Conclusion

One of the challenges in deep neural networks is generalisation, that is, making the model perform well not only on the training set but also on the test set. This paper proposes maximum pooled dropout and weight decay within an improved CNN structure and fuses them to reduce overfitting during training and improve the generalisation performance of the model. The contribution of this paper has two aspects. The first is a dropout method in pooling: after the pooling layer is processed by the mask method, the unit values in the pooling region are sorted and then discarded with a certain probability, and the maximum of the retained values is taken. The second is the introduction of a penalty term into the iterative weight update (weight decay) to reduce the complexity of the network, together with its analysis and derivation.

However, the proposed method has some shortcomings. First, when maximum pooled dropout is performed and the convolution output is further processed by the semi-linear activation function (ReLU), the value of a non-maximum output unit may become 0, so the corresponding neuron fails to update (its gradient is 0), that is, it becomes a dead neural unit. Second, the introduced penalty coefficient λ not only lacks a theoretical basis for its setting but also appears, in theory, to be connected with the maximum pooled dropout probability p; these two aspects are not combined in this study. Solving these problems is the goal of the next step of our research.



References

[1] Yang X, Li X, Guan Y, et al. Overfitting reduction of pose estimation for deep learning visual odometry. China Communications, 2020, 17(6): 196-210. DOI: 10.23919/JCC.2020.06.016.
[2] Cha K H, Petrick N, Pezeshk A, et al. Reducing overfitting of a deep learning breast mass detection algorithm in mammography using synthetic images. Progress in Biomedical Optics and Imaging, Proceedings of SPIE, 2019, 188-194. DOI: 10.1117/12.2512604.
[3] Gonzalez G, Ash S Y, Estepar R S J. Reply to Mummadi et al.: Overfitting and use of mismatched cohorts in deep learning models: preventable design limitations. American Journal of Respiratory and Critical Care Medicine, 2018, 198(4): 545-555. DOI: 10.1164/rccm.201803-0540LE.
[4] Ashiquzzaman A, Tushar A K, Islam M R, et al. Reduction of overfitting in diabetes prediction using deep learning neural network. IT Convergence and Security 2017, 2018, 449: 35-43. DOI: 10.1007/978-981-10-6451-7_5.
[5] Zhang S, Gong Y, Wang J. The development of deep convolutional neural networks and their applications in the field of computer vision. Chinese Journal of Computers, 2019, 42(03): 453-482. DOI: 10.11897/SP.J.1016.2019.00453.
[6] Gong T, Fan T, Guo J, et al. GPU-based parallel optimization of immune convolutional neural network and embedded system. Engineering Applications of Artificial Intelligence, 2016, 36(25): 226-238. DOI: 10.1016/j.engappai.2016.08.019.
[7] Liu D, Liu J. Neural network model for deep learning overfitting problem. Journal of Natural Science of Xiangtan University, 2018, 40(2): 96-99.
[8] Cheng J, Zeng G, Lu D, et al. Dropout-based improved convolutional neural network model averaging method. Journal of Computer Applications, 2019, 39(06): 1601-1606.
[9] Li J, Qin G, Wen X, Hu F. The overfitting problem of neural network learning algorithm and its solution. Vibration, Testing and Diagnosis, 2002, (04): 16-20+76.
[10] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[11] Hinton G E, Srivastava N, Krizhevsky A, et al. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint, 2012, arXiv:1207.0580.
[12] Chen D, Yan Z, Liu H. Overfitting phenomenon of SVD series algorithms in scoring prediction. Journal of Shandong University (Engineering Science Edition), 2014, 44(03): 15-21.
[13] Shen W, Li Y, Yang Z, et al. Attribute reduction to prevent overfitting. Application Research of Computers, 2020, 37(09): 2665-2668.
[14] Qin G, Li Z. Research and application of BP network over-fitting problem. Journal of Wuhan University (Engineering Science Edition), 2006, (06): 55-58.
[15] Yang Z, Gao F, Fu S, et al. Overfitting effect of artificial neural network based nonlinear equalizer, mathematical origin to transmission evolution. 2020, 63(6): 82-97. DOI: 10.1007/s11432-020-2873-x.
[16] Zhang C, Vinyals O, Munos R, Bengio S. A study on overfitting in deep reinforcement learning. Statistics, 2018, (2): 1-25.
[17] Wang J, Wang Z, Wang H. Improved CNN algorithm based on PSO algorithm and dropout. Journal of Changchun University of Technology, 2019, 40(01): 26-30.
[18] Mackay D J C. Bayesian interpolation. Neural Computation, 1992, 4(3): 415-447.
[19] Mackay D J C. Bayesian interpolation. Neural Computation, 1992, 4(3): 415-447.
[20] Zeiler M D, Fergus R. Stochastic pooling for regularization of deep convolutional neural networks. https://arxiv.org/pdf/1301.3557.pdf [2018-09-17].
