Application of Nonlinear Fractional Differential Equations in Computer Artificial Intelligence Algorithms

To study the application of nonlinear fractional differential equations in computer artificial intelligence algorithms, the concept, properties, and commonly used models of artificial neural networks are first introduced, and the state of research, at home and abroad, on applying fractional calculus theory to neural network technology is reviewed. The definitions, properties, and numerical calculation methods of fractional calculus are then presented in detail. On the basis of an analysis of the artificial intelligence neural network algorithm, fractional differentiation theory is introduced and a BP neural network based on fractional-order theory is constructed. The Sigmoid function is used as the node function of the neural network, actual data is used as the sample set, and a fractional-order network is trained. Finally, through training, the effect of changes in the two function parameters a and p on the training of the entire network is summarized, and a simple comparison is made with the fractional-order neural network based on the Sigmoid function. A variable-order iterative learning algorithm is proposed and applied to the training of neural networks; experiments show the feasibility of this algorithm and its advantages in convergence speed and convergence accuracy.


Introduction
Artificial intelligence algorithms accumulate large amounts of user data through network platforms; the values of AI algorithm controllers (and in many cases developers) are thus deeply embedded in the supposedly neutral technology and quietly affect the audience's ideas, decisions, and behavior patterns. Through continuous machine learning, algorithms have achieved huge advantages over humans in data capture, learning, and push, thereby strengthening their impact on the audience. Platform media such as Toutiao and social platforms such as WeChat are actively using artificial intelligence algorithms to increase user stickiness and market penetration. Different from the manual news filtering and pushing mechanism of the traditional media era, and also from the social news filtering mechanism of the social network era, the AI algorithm recommendation mechanism of the big data era has stronger data capture and learning ability, and the scale and efficiency of information push increase exponentially. The advent of artificial intelligence algorithms has revolutionized this. Relying on big data technology, the artificial intelligence algorithm uses the user's behavior data (such as browsed content, forwards, and comments) together with deep machine learning and algorithmic analysis of identity data to accurately identify and push information matching each user's value needs and preferences; that is, "only what you pay attention to is the headline." On this basis, algorithmic recommendation divides users into multiple overlapping groups according to their different value preferences and information needs and pushes the required information to each group, realizing big data filtering that differs from Moments filtering, so that users can control the capture and reception of information more freely [1]. Fractional calculus provides a powerful tool for describing the memory and hereditary properties of various materials and processes, and it has been used in many scientific and engineering fields, such as viscoelasticity, anomalous diffusion, fluid mechanics, biology, chemistry, acoustics, and control theory. In this way, fractional differential equations, which are a class of integro-differential equations with singular kernels, naturally appear in applied research. The existence and uniqueness theorem for solutions of fractional ordinary differential equations has been established. For linear fractional differential equations, the commonly used integral transform methods, including the Laplace, Fourier, and Mellin transforms, yield analytical solutions.

Solving methods of nonlinear fractional differential equations
There are two commonly used numerical algorithms for nonlinear fractional differential equations: the predictor-corrector method and the time-frequency domain conversion algorithm. The former is a classical method, a generalization of the Adams-Bashforth-Moulton method for solving first-order differential equations, and is widely used in practical fractional-order computation. The latter includes approximation methods based on continued-fraction expansion and interpolation, and approximation methods based on curve-fitting identification techniques.
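As a sketch of the predictor-corrector idea, the following Python code implements the standard Adams-Bashforth-Moulton (PECE) scheme of Diethelm, Ford, and Freed for the Caputo initial value problem D^α y(t) = f(t, y), y(0) = y0, with 0 < α ≤ 1. The function and variable names are illustrative, not taken from the source.

```python
import math

def fde_pece(f, alpha, y0, T, N):
    """Adams-Bashforth-Moulton predictor-corrector scheme for the Caputo
    fractional ODE  D^alpha y(t) = f(t, y),  y(0) = y0,  0 < alpha <= 1.
    Returns the time grid and the numerical solution on [0, T] with N steps."""
    h = T / N
    t = [j * h for j in range(N + 1)]
    y = [y0] + [0.0] * N
    fv = [f(t[0], y0)] + [0.0] * N          # stored values f(t_j, y_j)
    c1 = h**alpha / math.gamma(alpha + 1)   # predictor prefactor
    c2 = h**alpha / math.gamma(alpha + 2)   # corrector prefactor
    for n in range(N):
        # predictor: fractional rectangle rule over the whole history
        p = y0
        for j in range(n + 1):
            b = (n + 1 - j)**alpha - (n - j)**alpha
            p += c1 * b * fv[j]
        # corrector: fractional trapezoidal rule
        s = (n**(alpha + 1) - (n - alpha) * (n + 1)**alpha) * fv[0]
        for j in range(1, n + 1):
            a = (n - j + 2)**(alpha + 1) + (n - j)**(alpha + 1) \
                - 2 * (n - j + 1)**(alpha + 1)
            s += a * fv[j]
        y[n + 1] = y0 + c2 * (s + f(t[n + 1], p))
        fv[n + 1] = f(t[n + 1], y[n + 1])
    return t, y
```

For α = 1 the scheme reduces to the classical second-order Adams predictor-corrector method, which gives a quick sanity check against D y = −y, y(0) = 1.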
Decomposition methods have been used effectively to solve linear and nonlinear fractional differential equations. The numerical scheme for the decomposition-based fractional differential equation is given in [1]; the ADM-Pade approximation technique has also been applied to fractional differential equations, and the Rach-Adomian-Meyers modified decomposition method has been extended to solve nonlinear fractional differential equations. Other analytical and numerical methods for nonlinear fractional differential equations can be found in [2]. We consider the initial value problem of a nonlinear fractional ordinary differential equation:

  D^α y(t) = f(y(t)) + g(t),  0 < α ≤ 1,  (1)
  y(0) = c_0,  (2)

where f is an analytic nonlinear function and g(t) is the system input. Applying the fractional integral operator J^α to both sides of equation (1), we get:

  y(t) = c_0 + J^α g(t) + J^α f(y(t)).  (3)

We decompose the solution into y = Σ_{n=0}^∞ y_n and the analytic nonlinearity into f(y) = Σ_{n=0}^∞ A_n, where the A_n are the Adomian polynomials. Substituting the decompositions of the solution and of the nonlinear term into equation (3), we get:

  Σ_{n=0}^∞ y_n = c_0 + J^α g(t) + J^α Σ_{n=0}^∞ A_n.

From this we obtain the recursive form of the solution components:

  y_0 = c_0 + J^α g(t),  y_{n+1} = J^α A_n,  n ≥ 0.

Or we apply the recursive format modified by Wazwaz. Here we decompose the system input as g = Σ_{n=0}^∞ g_n and take:

  y_0 = c_0 + J^α g_0,  y_{n+1} = J^α g_{n+1} + J^α A_n,  n ≥ 0.

The n-term approximate solution is φ_n = Σ_{m=0}^{n−1} y_m.

Next we consider the Rach-Adomian-Meyers modified decomposition method. For the initial value problem (1)-(2), if α is a rational number, α = p/q, where p and q are co-prime positive integers, the system input g(t) can be expressed in the form of a generalized power series:

  g(t) = Σ_{n=0}^∞ b_n t^{n/q}.  (12)

We note that every function g(t) analytic at the point t = 0 can be expressed in the form of Eq. (12). We decompose the solution into a generalized power series:

  y(t) = Σ_{n=0}^∞ a_n t^{n/q},  a_0 = c_0,  (13)

so that (13) satisfies the initial value condition (2). The nonlinear term is then written as:

  f(y(t)) = Σ_{n=0}^∞ A_n t^{n/q},  (15)

where the A_n are the Adomian polynomials in the coefficients a_0, a_1, …, a_n. Calculating the fractional derivatives term by term, we have:

  D^α y(t) = Σ_{n=1}^∞ a_n [Γ(n/q + 1) / Γ(n/q − α + 1)] t^{(n−p)/q}.  (16)

Substituting equations (12), (15), and (16) into equation (1), and comparing the coefficients of equal powers, we obtain the recurrence format for the coefficients a_n:

  a_{n+p} = [Γ(n/q + 1) / Γ((n+p)/q + 1)] (A_n + b_n),  n ≥ 0.

The n-term approximate solution is φ_n = Σ_{m=0}^{n−1} a_m t^{m/q}. Accelerated convergence techniques are used to accelerate the convergence of sequences or series, and even to extend the domain of convergence. In this article we use the iterated Shanks transformation. For example, from the approximate solutions we obtain the sequence {φ_n}. First, we denote A_n = φ_n, and then apply the transformation:

  S(A_n) = (A_{n+1} A_{n−1} − A_n²) / (A_{n+1} + A_{n−1} − 2 A_n).

Table 1 shows the transformation process; S in the table is the result after transformation.
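The Shanks transformation described above can be sketched in Python; the example accelerates the partial sums of the alternating harmonic series toward ln 2 (the example series is illustrative, not from the source).

```python
import math

def shanks(seq):
    """One pass of the Shanks transformation
    S(A_n) = (A_{n+1} A_{n-1} - A_n^2) / (A_{n+1} + A_{n-1} - 2 A_n),
    applied to a convergent sequence of partial sums."""
    out = []
    for n in range(1, len(seq) - 1):
        den = seq[n + 1] + seq[n - 1] - 2 * seq[n]
        num = seq[n + 1] * seq[n - 1] - seq[n] ** 2
        out.append(num / den if den != 0 else seq[n])
    return out

# partial sums of the alternating series 1 - 1/2 + 1/3 - ... = ln 2
partial, s = [], 0.0
for k in range(1, 10):
    s += (-1) ** (k + 1) / k
    partial.append(s)

once = shanks(partial)   # one transformation
twice = shanks(once)     # iterated Shanks, as used in the text
```

One pass already reduces the error by several orders of magnitude compared with the raw partial sums; iterating the transformation improves the estimate further.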

The basic algorithm of computer artificial intelligence network
An artificial intelligence neural network consists of an input layer, a hidden layer, and an output layer, each composed of multiple neuron nodes. The hidden layer can be one layer or several. Although the hidden layer is not connected to the outside world, its state is crucial; to a considerable extent it directly determines the mapping between input and output. Many scholars have proved that a three-layer neural network containing only an input layer, a hidden layer, and an output layer already has strong approximation ability [3]. Learning in an artificial intelligence neural network is supervised and consists of two phases: forward propagation of the signal and back propagation of the error. The basic training process is to feed sample data into the input layer, pass it through the hidden layer to the output layer, and finally produce the output from the output layer. Both the hidden layer and the output layer have differentiable activation functions, and the output of each layer of neurons affects only the output of the next layer. During forward propagation the weights and thresholds of the network do not change. If the output differs from the expected value, the error is back-propagated: the neural network adjusts the connection weights of each layer according to the error, and this repeats until the error meets the requirement. The sample data enters the hidden layer from the input layer, and the input of a single hidden-layer node t is:

  net_t = Σ_s W_st x_s + θ_t,  (21)

where the x_s are the inputs, W_st the input-to-hidden weights, and θ_t the threshold. The output of hidden-layer node t is:

  y_t = f(net_t),  (22)

where f(net) is the node function, or activation function, of the hidden layer. Generally, different functions are selected according to the actual situation of the sample, such as a linear function or the hyperbolic tangent. The data is then passed from the hidden layer to the output layer, and the input of a single output-layer node k is:

  net_k = Σ_t W_tk y_t + θ_k,  (23)

and the output of output-layer node k is:

  o_k = g(net_k),  (24)

where g is the node function of the output layer. In the initial stage there is an error between the actual output value of the network and the expected value. In actual data training, the error is usually computed with the squared-error formula, namely formula (25):

  E = (1/2) Σ_k (d_k − o_k)²,  (25)

where d_k is the expected output value of output neuron k and o_k its actual output value. The error here refers to the sum of the errors of all neurons in the output layer.
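The forward-propagation and squared-error formulas above, together with the gradient-descent weight update used in training, can be sketched as a minimal Python example. The network size, learning rate, and toy AND sample set are illustrative assumptions, not the paper's actual sample set.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
# 2 inputs -> 2 hidden neurons -> 1 output neuron (toy dimensions)
W1 = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]  # input->hidden
b1 = [0.0, 0.0]                                                         # hidden thresholds
W2 = [random.uniform(-0.5, 0.5) for _ in range(2)]                      # hidden->output
b2 = 0.0                                                                # output threshold
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]             # logical AND
u = 0.5                                                                 # learning rate

def forward(x):
    """Forward propagation: hidden outputs and network output."""
    h = [sigmoid(sum(W1[t][s] * x[s] for s in range(2)) + b1[t]) for t in range(2)]
    y = sigmoid(sum(W2[t] * h[t] for t in range(2)) + b2)
    return h, y

for epoch in range(5000):
    for x, d in data:
        h, y = forward(x)
        # back-propagate the squared error E = (d - y)^2 / 2
        delta_o = (y - d) * y * (1 - y)                          # output-layer delta
        delta_h = [delta_o * W2[t] * h[t] * (1 - h[t]) for t in range(2)]
        for t in range(2):
            W2[t] -= u * delta_o * h[t]
            for s in range(2):
                W1[t][s] -= u * delta_h[t] * x[s]
            b1[t] -= u * delta_h[t]
        b2 -= u * delta_o
```

After training, the network output is close to the expected value for each sample, which is exactly the stable state described in the following paragraph.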
The final stable state of the artificial intelligence neural network is that the actual output value equals, or is infinitely close to, the expected value. Therefore, during data training, the connection weights and thresholds of the neural network must be modified continually. The weight-correction formula of the neural network adopts the gradient descent method, whose essence is the simple static steepest-descent optimization algorithm [4]:

  W_tk ← W_tk − u ∂E/∂W_tk,  (26)
  W_st ← W_st − u ∂E/∂W_st.  (27)
Here u is the weight-correction coefficient, that is, the learning rate. The size of the learning rate affects the convergence speed of the algorithm: if it is too small, the network converges very slowly; if it is too large, the network oscillates and fails to converge. In the classical network algorithm, each iteration requires an accurate one-dimensional search to obtain the optimal iterative step size. However, a one-dimensional search requires many computations, consumes much computing time, and is difficult to program and apply. Therefore, a one-dimensional search is generally not used to optimize the learning rate; instead a fixed value between 0 and 1 is used. The size of the initial weights also affects learning. Usually the initial weights are chosen as small positive and negative values near 0, preferably randomly and uniformly distributed, which expands the search range for the optimal weights. The partial derivative of the error E with respect to the weight W in equations (26)-(27) is also called the weight correction of one iteration. The weight correction can be decomposed into a relationship with the input of the hidden layer or the input of the output layer, which makes the derivatives easy to calculate; substituting equations (21) and (22) into the correction of the input-to-hidden weights W_st yields the corresponding update formula. In MATLAB, running the integer-order artificial intelligence neural network program with 30 samples gives Figure 1 and Figure 2, in which the abscissa is the number of iteration steps, the ordinate is the training error, P is the number of hidden-layer neurons, and u is the learning rate. Figure 1 shows the network trained on the same set of data with the same number of hidden neurons but different learning rates; Figure 2 shows the network trained on the same set of data with the same learning rate but different numbers of hidden neurons [5]. It can be seen from Figure 1 that the smaller the learning rate, the slower the network converges. Figure 2 shows that too many hidden neurons make the convergence rate very slow and the convergence very poor, but it is not the case that fewer neurons always give a smaller convergence error; only with an appropriate number of neurons does the network obtain better convergence and a smaller convergence error. In this section, the node function of the neural network is the Sigmoid function, because the output of this function is close to the signal output form of biological neurons and can simulate their nonlinear characteristics; moreover, the nonlinearity of the Sigmoid function also enhances the nonlinear mapping ability of the neural network [8]. The mathematical expression of the Sigmoid function is:

  f(x) = 1 / (1 + e^(−x)),

and its first derivative is:

  f′(x) = f(x)(1 − f(x)).

We hope that the artificial intelligence neural network itself can change the derivative order according to the change of the convergence error and realize global self-adaptation; that is, we construct a self-adaptive artificial intelligence neural network. When the error differs greatly between one iteration and the next, the fractional order takes a smaller value, so that the network can learn at a relatively fast speed; to prevent training saturation, that is, the error rising instead of falling, the adjustment of the fractional order before and after is made slightly larger [9]. Considering that during the training of the artificial intelligence neural network the training state is changed by adjusting the two parameters a and p, the node functions of the hidden layer and the output layer can use a two-parameter node function f(x) with parameters a and p. When a is the same and the value of p is less than 1, the output value of the function changes faster; the larger the value of p, the slower the output value rises, gradually flattening. When a equals 2 and p takes a number less than 1, the graph of the function is similar to that of the Sigmoid function, the value interval is similar, and the upper bound of the function within (0,1) tends to 1.
When a is the same and takes a number less than 1, no matter what value p takes, the function graph is similar to that of the Sigmoid function. Only when the value of p is small will the function value be less than 0; otherwise the values all lie within (0,1). When a equals 2 and x takes a value greater than 0, the function value changes little before and after, tending to a smooth straight line. From the comparison of the curves in Figure 3, it can be found that if a and p both take values greater than 1, the change of the function value of f(x) is relatively stable, with little difference before and after; only when one of a and p is greater than 1 and the other less than 1 does the function value change greatly. When a takes a number less than 1 or p takes a number greater than 1, the function f(x) is similar to the Sigmoid function. In practical application, the relationship between the parameters and their relationship with the function values should be considered when adjusting parameters [10]. When a and p are both less than 1, the error curves for different a do not differ much, the final convergence is not very good, and the convergence error remains relatively large. It can be seen that the smaller the value of a, the faster the curve declines, that is, the faster the error between the actual output value and the expected value changes. Similar to the previous conclusion, the values of a and p should not be too large, otherwise the convergence of the network will be poor and the purpose of training will not be achieved. Through the training of the fractional-order artificial intelligence neural network based on the Sigmoid function, the effects of the fractional order, the learning rate, and the number of hidden-layer neurons on training are summarized. Comparing the training results of the fractional-order artificial intelligence neural network with those of the integer-order network, the advantages and disadvantages of each are summarized. On this basis, a variable-order iterative algorithm is proposed, that is, a switch between integer order and fractional order in which the order is adjusted adaptively according to the errors before and after. The actual network training results demonstrate the advantages of the algorithm. A fractional-order artificial intelligence network based on this two-parameter function is then constructed and trained, the effect of the function's two parameters on network training is summarized, and a simple comparison is made with the network based on the Sigmoid function.

Conclusion
The author mainly studies the algorithm and application of a computer artificial intelligence neural network based on fractional calculus theory. Fractional-order theory is introduced into the algorithm of the computer artificial intelligence neural network; the fractional-order artificial network algorithm is derived from the two fractional-order definitions, the neural network is trained on a specific data sample set, and the results are compared with those of the integer-order neural network. The simulation results show that the fractional-order computer artificial intelligence neural network trains faster but is slightly weaker in convergence accuracy. Therefore, a variable-order iterative learning algorithm is proposed and applied to the training of neural networks; the results show the feasibility of this algorithm and its advantages in convergence speed and convergence accuracy.
However, the variable-order iterative algorithm proposed by the author adjusts the fractional order adaptively according to the ratio of the errors before and after. Although the training effect on the artificial intelligence network is good, the rule is overly simple, and for different sample sets some values in the algorithm may need to be modified. Therefore, a better and more widely applicable order-adjustment algorithm still needs to be found.
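The error-ratio rule described above can be sketched as a minimal Python function. The thresholds, step size, and order bounds below are illustrative assumptions, not values from the source; they are exactly the kind of sample-set-dependent constants the text says may need modification.

```python
def adjust_order(alpha, e_prev, e_curr,
                 alpha_min=0.5, alpha_max=1.0, step=0.05):
    """Illustrative sketch of the variable-order rule: adapt the derivative
    order from the ratio of consecutive training errors.
    All numeric constants here are assumptions, not the paper's values."""
    if e_prev == 0:
        return alpha
    ratio = e_curr / e_prev
    if ratio > 1.0:
        # error rose: training nears saturation, adjust the order
        # more strongly (toward the integer order)
        alpha = min(alpha_max, alpha + 2 * step)
    elif ratio < 0.9:
        # error still falling fast: keep a smaller fractional order
        # so the network learns at a relatively fast speed
        alpha = max(alpha_min, alpha - step)
    return alpha
```

For example, a rising error (e_curr > e_prev) pushes the order up toward 1, a rapidly falling error pulls it down, and a nearly unchanged error leaves the order as it is.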

Figure 1. Training error of the integer-order artificial intelligence neural network with different learning rates.
Figure 2. Training error of the integer-order artificial intelligence neural network with different numbers of hidden-layer neurons.

Figure 3. Function diagram of f(x).