This paper uses the principle of contraction mapping to discuss the existence and uniqueness of the solution of the explicit finite difference method for the fractional diffusion equation with time delay. The Laplace transform method is used to obtain the expression of the solution, and the existence theorem and the existence and uniqueness theorem for the solution of the boundary value problem are established. Finally, an example is given to verify the correctness of the conclusions: the experimental results show that the parallel algorithm proposed in this paper agrees with the exact solution.
- Fractional differential equations
- Principle of contraction mapping
- College mathematics
- Riesz fractional derivative
Fractional differential equations play a very important role in dynamic systems arising in mathematics, physics, chemistry, and fluid mechanics. Because accurate solutions are difficult to obtain for most fractional problems, more and more research focuses on numerical algorithms, such as the finite difference method, for solving fractional equations. As a typical fractional differential equation, the fractional reaction-diffusion equation has attracted wide attention.
General-purpose GPU computing has become practical with the improvement of programming models and hardware resources. The maturity of CUDA makes it simple to develop non-graphics applications, and it also provides an energy-efficient architecture for particle transport. Fractional numerical simulations are extremely time-consuming, and the short-memory principle and parallel computing are applied to overcome this difficulty. Some scholars have proposed an MPI-based parallel method for solving Riesz fractional equations; that algorithm still achieves a parallel efficiency of 79.39% on 64 processes relative to eight processes. However, there is no related research on fine-grained data-level parallel algorithms on GPUs or other accelerators. This paper studies fine-grained parallel algorithms for the fractional reaction-diffusion equation in Riesz space.
The Riesz fractional derivative and the grid points of the discretization define the explicit scheme; the parallel algorithm for the fractional differential equation is given below.
Parallel algorithm for the Riesz space fractional reaction-diffusion equation with CUDA.
record time T1
call kernel kernel1<<<bs1, ts1>>>(…)
for n = 1 to N-1 do
    call kernel kernel2<<<bs2, ts2>>>(…)
    call kernel kernel3<<<bs3, ts3>>>(…)
end for
record time T2
output T2-T1 and U^N…
free GPU memory and stop CUDA environment
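The host-side structure of the algorithm above can be sketched, under assumptions, as a serial C analogue in which ordinary functions stand in for the three kernels; the kernel bodies here are placeholders (a unit spike as the initial condition and a simple stable explicit step), not the paper's kernels.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Placeholder for kernel1: set the initial condition on u[1..M]. */
void init_condition(double *u, int M) {
    for (int i = 1; i <= M; i++)
        u[i] = (i == (M + 1) / 2) ? 1.0 : 0.0;   /* a unit spike */
}

/* Placeholder for kernel2: one stable explicit step (r = 1/4). */
void explicit_step(const double *u, double *unew, int M) {
    for (int i = 1; i <= M; i++)
        unew[i] = 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1];
}

/* Host driver mirroring the pseudocode: time, iterate, swap, report.
   Returns the final solution U^N (caller frees it). */
double *run_driver(int M, int N) {
    double *u    = calloc(M + 2, sizeof *u);     /* zeroed: Dirichlet b.c. */
    double *unew = calloc(M + 2, sizeof *unew);
    clock_t t1 = clock();                        /* record time T1 */
    init_condition(u, M);                        /* kernel1 */
    for (int n = 1; n <= N - 1; n++) {
        explicit_step(u, unew, M);               /* kernel2 */
        double *tmp = u; u = unew; unew = tmp;   /* kernel3: swap buffers */
    }
    clock_t t2 = clock();                        /* record time T2 */
    printf("elapsed: %.6f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    free(unew);
    return u;
}
```

On the GPU, `u` and `unew` would be device buffers, so swapping pointers each step avoids copying the solution between iterations.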
The initial value condition is applied by kernel1. The most time-consuming CUDA kernel is kernel2, so the computational complexity of kernel2 dominates the running time of the algorithm.
The experimental platform consists of an NVIDIA Quadro FX 5800 GPU and a quad-core Intel Xeon E5540 CPU, running the Kylin 3.1 operating system. All experiments use double-precision arithmetic. The compilers are CUDA NVCC 3.0 and Intel Fortran 11.1, both with level-three optimization. MPICH2 1.3 is used for communication on the CPU, with one MPI process per CPU core, i.e., four-process MPI parallelism. The fine-grained data-level parallel algorithm proposed in this paper agrees well with the exact solution of the numerical example. The values of Δt and h are 0.5/256 and 2.0/17, respectively.
The performance comparison of the GPU and CPU parallel algorithms is shown in Figure 1; the parallel algorithm on the CPU adopts the MPI-based parallel programming model. Figure 1a shows the comparison when N is fixed at 64 and M-1 increases from 4096 to 20480; the speedup ratio is between 3.07 and 4.20. Figure 1b shows the comparison when M-1 is fixed at 40960 and N increases from 128 to 2048; the speedup ratio is between 4.12 and 4.25.
The effect of replacing algorithm 2 with the similarly optimized algorithm 3 is shown in Figure 2a, with both the thread block size and N fixed at 64. "Opt2" denotes the comparison between the original algorithm 2 and the optimized algorithm in which a thread block computes two grid points per time step; "opt4" and "opt8" denote the corresponding comparisons for four and eight grid points per time step. "Opt2", "opt4", and "opt8" achieve speedups of 1.33, 1.60, and 1.78, respectively. The effect of the thread block size on performance is shown in Figure 2b, where M-1 is 12288 and N is 256; a thread block size of 64 is a good choice. Limited by the size of shared memory, compilation fails when the thread block size is 256.
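The opt2/opt4/opt8 optimization is a form of thread coarsening: each thread block computes several grid points per time step, so values already loaded for one point are reused for its neighbors instead of being refetched. The following is a minimal serial sketch of the idea, using a simple three-point stencil as a stand-in for the fractional stencil; `G` plays the role of the coarsening factor (2, 4, or 8), and both function names are illustrative.

```c
/* Baseline: each grid point reloads all three of its inputs. */
void step_baseline(const double *u, double *unew, int M) {
    for (int i = 1; i <= M; i++)
        unew[i] = 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1];
}

/* Coarsened by G: each outer iteration computes G grid points, shifting a
   small window of already-loaded values the way a thread block reuses
   shared memory across the points it owns. */
void step_coarsened(const double *u, double *unew, int M, int G) {
    for (int i = 1; i <= M; i += G) {
        double left = u[i - 1], mid = u[i];        /* loaded once, reused */
        for (int j = i; j < i + G && j <= M; j++) {
            double right = u[j + 1];               /* one new load per point */
            unew[j] = 0.25 * left + 0.5 * mid + 0.25 * right;
            left = mid; mid = right;               /* slide the window */
        }
    }
}
```

The coarsened version performs one memory load per grid point instead of three, which is where the observed 1.33x to 1.78x speedups come from.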
An algorithm for computing the value of BS would itself have to model the relationship between the parallel algorithm, the GPU architecture, and the programming model. Such modeling is valuable but very difficult, and the relevant parameters are hard to quantify. Therefore, in the parallel algorithm research we have done, BS is not computed analytically; instead, it is chosen empirically by testing candidate values and keeping a good one. Generally, values such as 32, 64, 128, 192, 256, and 512 are appropriate.
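The empirical choice of BS can be sketched as a simple sweep: time one run per candidate block size and keep the fastest. The loop body below is a dummy workload standing in for an actual timed solver run; on the GPU each call would instead launch kernel2 with the candidate block size.

```c
#include <time.h>

/* Stand-in for timing one solver run configured with block size bs.
   (On the GPU this would be a timed kernel launch with <<<grid, bs>>>.) */
double time_with_block_size(int bs) {
    clock_t t0 = clock();
    volatile double s = 0.0;                 /* dummy workload */
    for (int i = 0; i < 1000000; i++)
        s += (double)(i % bs);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* Sweep the usual candidate block sizes and keep the fastest one. */
int pick_block_size(void) {
    int candidates[] = {32, 64, 128, 192, 256, 512};
    int n = (int)(sizeof candidates / sizeof candidates[0]);
    int best = candidates[0];
    double best_t = time_with_block_size(best);
    for (int i = 1; i < n; i++) {
        double t = time_with_block_size(candidates[i]);
        if (t < best_t) { best_t = t; best = candidates[i]; }
    }
    return best;
}
```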
This paper proposes a fine-grained data-level parallel algorithm on the GPU for the explicit finite difference method for the Riesz space fractional reaction-diffusion equation. The experimental results show that the proposed algorithm is effective. Parallel algorithms for numerical methods for fractional differential equations on GPUs therefore deserve the attention of researchers.