AHEAD OF PRINT
Journal Details
License
Format
Journal
eISSN
2444-8656
First Published
01 Jan 2016
Publication timeframe
2 times per year
Languages
English
access type Open Access

The Uniqueness of Solutions of Fractional Differential Equations in University Mathematics Teaching Based on the Principle of Compression Mapping

Published Online: 15 Jul 2022
Volume & Issue: AHEAD OF PRINT
Page range: -
Received: 07 Jan 2022
Accepted: 29 Mar 2022
Abstract

This paper uses the principle of compression mapping to discuss the existence and uniqueness of solutions of the explicit finite difference method for the fractional diffusion equation with time delay. The Laplace transform method yields the required expression of the solution. At the same time, the existence theorem and the existence-and-uniqueness theorem for solutions of the boundary value problem are established. Finally, an example is given to verify the correctness of the conclusions. The experimental results show that the parallel algorithm proposed in this paper agrees well with the exact solution.

Keywords

MSC 2010

Introduction

Fractional differential equations play a very important role in dynamic systems arising in mathematics, physics, chemistry, and fluid mechanics. As most fractional problems are difficult to solve exactly, a growing body of research seeks numerical algorithms for them. Many numerical methods have been developed for fractional equations, such as the finite difference method [1]. As a typical fractional differential equation, the fractional reaction-diffusion equation has attracted wide attention.

General-purpose GPU computing has become practical with the improvement of programming models and hardware resources. The maturity of CUDA makes it simple to develop non-graphics applications, and it also provides an energy-efficient architecture for particle transport [2]. Fractional numerical simulations are extremely time-consuming, and short-memory principles and parallel computing have been applied to overcome this difficulty. Some scholars have proposed an MPI-based parallel method for solving Riesz fractional equations; that algorithm retains a parallel efficiency of 79.39% when scaling from 8 to 64 processes. However, there is no related research on fine-grained data-level parallel algorithms on GPUs or other accelerators. This paper studies parallel algorithms for the following fractional reaction-diffusion equation in Riesz space:

\frac{\partial u(x,t)}{\partial t} = -u(x,t) + {}_x D_0^a u(x,t) + q(x,t) \quad (1)

When a = 2, equation (1) is the standard reaction-diffusion equation. u(x, t) and g(x) are both real-valued, sufficiently smooth functions, and {}_x D_0^a u(x,t) is the Riesz fractional derivative.

Related background

The Riesz fractional derivative {}_x D_0^a u(x,t) is defined in the range 1 < a ≤ 2 as:

{}_x D_0^a u(x,t) = -c \left( I_+^{-a} + I_-^{-a} \right) u(x,t) \quad (2)

where c = 1/(2\cos(\pi a/2)). Define t_n = nΔt, x_i = ih, 0 ≤ n ≤ N, 0 ≤ i ≤ M, where M and N are positive integers [3]. u_i^n and q_i^n are the numerical approximations of u(x_i, t_n) and q(x_i, t_n). The time layer adopts the first-order forward Euler format. The explicit finite difference approximation at grid point (t_n, x_i) is:

u_i^{n+1} = \sum_{k=0}^{i-2} C g_{i+1-k} u_k^n + C(g_0 + g_2) u_{i-1}^n + \left( (1-\Delta t) + 2Cg_1 \right) u_i^n + C(g_0 + g_2) u_{i+1}^n + \sum_{k=i+2}^{M} C g_{k-i+1} u_k^n + \Delta t\, q_i^n \quad (3)

where g_0 = 1, g_k = (−1)^k a(a−1)⋯(a−k+1)/k! for k = 1, 2, 3, …, and C = −cΔt/h^a.
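The weights g_k can be generated with the recurrence g_k = g_{k−1}(k−1−a)/k, which follows from the product formula above. A minimal Python sketch (illustrative only, not the paper's code):

```python
def grunwald_coeffs(a, K):
    """Weights g_0..g_K: g_0 = 1, g_k = (-1)^k * a(a-1)...(a-k+1)/k!,
    computed via the recurrence g_k = g_{k-1} * (k - 1 - a) / k."""
    g = [1.0]
    for k in range(1, K + 1):
        g.append(g[-1] * (k - 1 - a) / k)
    return g
```

For a = 2 the weights reduce to 1, −2, 1, 0, …, i.e. the classical second-difference stencil, consistent with equation (1) becoming the standard reaction-diffusion equation at a = 2.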

Parallel algorithm details

In the classical case (a = 2), the grid point u_i^{n+1} of the (n+1)-th time layer depends only on u_{i-1}^n, u_i^n, u_{i+1}^n of the n-th layer, giving computational complexity O(NM). From equation (3), however, u_i^{n+1} depends on all grid points of the previous time layer, so the complexity is O(NM^2). This leads to a much larger amount of calculation than for classical differential equations, especially when M is large.

The parallel algorithm for the fractional differential equation on the GPU includes three parts: preprocessing, a parallel solver, and postprocessing. Preprocessing prepares initialization variables, matrices, etc. on the GPU. The parallel solver is concentrated in the iteration of equation (3), which is the most time-consuming part of the entire numerical method. Postprocessing mainly outputs result information [4]. kernel1 calculates the initial condition from g(x) in equation (1). kernel2 performs the multiply-add operations of the coefficients with u_i^n. kernel3 adds Δt q_i^n to the kernel2 results.

Algorithm 1

Parallel algorithm for the Riesz space fractional reaction-diffusion equation with CUDA.

initialize CUDA environment and allocate GPU memory
record time T1
call kernel kernel1<<<bs1, ts1>>>(…)
for n = 1 to N−1 do
  call kernel kernel2<<<bs2, ts2>>>(…)
  call kernel kernel3<<<bs3, ts3>>>(…)
end for
record time T2
output T2−T1 and U^N
free GPU memory and stop CUDA environment
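The host-side control flow above can be modeled serially in Python, with placeholder callables standing in for the three CUDA kernels (init, step, and source below are hypothetical stand-ins, not the paper's API):

```python
import time

def solve(M, N, init, step, source):
    # Mirrors Algorithm 1: kernel1 builds u^0 from g(x); each time layer
    # then applies the weighted-sum update (kernel2) and adds the source
    # term dt*q (kernel3). Timing brackets the whole solver loop.
    t1 = time.perf_counter()            # record time T1
    u = init(M)                         # kernel1
    for n in range(N - 1):
        # kernel2 followed by kernel3
        u = [s + q for s, q in zip(step(u, n), source(u, n))]
    t2 = time.perf_counter()            # record time T2
    return u, t2 - t1                   # U^N and elapsed time
```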

Basic CUDA core

The initial value condition g(x) determines kernel1; in the example of this article, g(x) = x^2(2 − x)^2. The calculation of kernel3 is similar to that of kernel1 and is also very simple. kernel2 is shown in Algorithm 2, where BS is the thread block size in CUDA, a predefined constant. Each thread block completes the calculation of one grid point u_i^{n+1} of the (n+1)-th time step. Assume that M−1 is divisible by BS and that BS is an integer power of 2 [5].

e^1_{kη+tid} is the coefficient defined in equation (3) used to calculate u_{bid}^{n+1}, and bid is the thread block number. In Algorithm 2, all grid points of the n-th time step are divided into (M−1)/BS parts. Each thread performs BS multiplication and BS addition operations and then stores the local variable μ in shared memory [6]. The parallel summation within a thread block is shown in lines 13–16. The first thread saves the final result to global memory, where the data persists.
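The in-block summation (lines 13–16 of Algorithm 2) is a standard shared-memory tree reduction. A serial Python model of that logic (for illustration only; the barrier is implicit here because the loop is sequential):

```python
def block_reduce(sm, bs):
    # Models Algorithm 2's reduction: threads with tid < k add in their
    # partner's partial sum sm[tid + k]; k halves each round until 1.
    # Requires bs to be a power of two, as the text assumes for BS.
    sm = list(sm)             # copy of the "shared memory" array
    k = bs // 2
    while k >= 1:
        for tid in range(k):  # active threads: tid < k
            sm[tid] += sm[tid + k]
        k //= 2               # __syncthreads() separates the rounds
    return sm[0]              # thread 0 writes this to global memory
```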

Algorithm 2

The most time-consuming CUDA kernel2

input: n, M′ = M − 1, u_*^n
output: u_{bid}^{n+1}
shared memory sm[0 … BS−1]
for all threads in every thread block do in parallel
  local variables tid, bid, bs, η, μ
  tid ← threadIdx.x
  bid ← blockIdx.x
  bs ← blockDim.x
  η ← M′ / bs
  μ ← 0
  for k = 0 to η by k = k + 1 do
    μ ← μ + e^1_{kη+tid} · u^n_{kη+tid}
  end for
  sm[tid] ← μ
  __syncthreads()
  for k = bs/2 to 1 by k = k/2 do
    if tid < k then
      sm[tid] ← sm[tid] + sm[tid+k]
    end if
    __syncthreads()
  end for
  if tid == 0 then
    u_{bid}^{n+1} ← sm[0]
  end if
end for

Optimization

The computational complexity of kernel1 and kernel3 is O(M), while kernel2 costs O(M) per grid point, i.e. O(M^2) per time step. In Algorithm 2, each thread block loads all grid point variables u_*^n of the n-th time step but produces the value of only one grid point of the (n+1)-th time step. Thus u_*^n can be reused to reduce global memory accesses [7]: each thread block loads u_*^n once and computes the values of 2, 4, or even 8 grid points of the (n+1)-th time step. The two-point variant, Algorithm 3, needs double the shared memory, and the number of thread blocks is halved.
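The reuse idea can be seen in a small sketch: one pass over u^n accumulates two weighted sums with two coefficient vectors (here e1 and e2 are illustrative stand-ins for the coefficients of two adjacent output points), so u^n is read once instead of twice:

```python
def fused_two_outputs(u, e1, e2):
    # Algorithm 3's optimization in miniature: each value of u^n is
    # loaded once and contributes to two accumulators, halving the
    # traffic on u^n relative to two separate passes.
    mu1 = mu2 = 0.0
    for uk, a1, a2 in zip(u, e1, e2):
        mu1 += a1 * uk
        mu2 += a2 * uk
    return mu1, mu2
```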

Algorithm 3

Optimized CUDA kernel2

input: n, M′ = M − 1, u_*^n
output: u_{2bid}^{n+1}, u_{2bid+1}^{n+1}
shared memory sm[0 … 2BS−1]
for all threads in every thread block do in parallel
  local variables tid, bid, bs, η, μ1, μ2
  get tid, bid, bs, η as in Algorithm 2
  μ1 ← 0; μ2 ← 0
  for k = 0 to η by k = k + 1 do
    μ1 ← μ1 + e^1_{kη+tid} · u^n_{kη+tid}
    μ2 ← μ2 + e^2_{kη+tid} · u^n_{kη+tid}
  end for
  sm[tid] ← μ1
  sm[tid+BS] ← μ2
  __syncthreads()
  for k = bs/2 to 1 by k = k/2 do
    if tid < k then
      sm[tid] ← sm[tid] + sm[tid+k]
      sm[tid+BS] ← sm[tid+BS] + sm[tid+BS+k]
    end if
    __syncthreads()
  end for
  if tid == 0 then
    u_{2bid}^{n+1} ← sm[0]
    u_{2bid+1}^{n+1} ← sm[BS]
  end if
end for

Experimental results and discussion

The experimental platform consists of an NVIDIA Quadro FX 5800 GPU and a quad-core Intel Xeon E5540 CPU. The operating system is Kylin 3.1. All experiments use double-precision arithmetic. The compilers are CUDA NVCC 3.0 and Intel Fortran 11.1, with level-three compiler optimization. MPICH2 1.3 handles communication on the CPU; each CPU core corresponds to one MPI process, giving quad-core MPI parallelism [8]. The fine-grained data-level parallel algorithm proposed in this paper agrees well with the exact solution of the numerical example. The values of Δt and h are 0.5/256 and 2.0/17.

Performance comparison

The performance comparison of the parallel algorithms on GPU and CPU is shown in Figure 1. The parallel algorithm on the CPU adopts the MPI-based parallel programming model [9]. Figure 1a shows the comparison when N is fixed at 64 and M−1 increases from 4096 to 20480; the speedup ratio is between 3.07 and 4.20. Figure 1b shows the comparison when M−1 is fixed at 40960 and N increases from 128 to 2048; the speedup ratio is between 4.12 and 4.25.

Figure 1

Performance comparison of parallel algorithms on GPU and CPU

The comparison of Algorithm 2 with the similarly optimized Algorithm 3 is shown in Figure 2a. The thread block size is fixed at 64, and the value of N is also fixed at 64. "opt2" denotes the comparison of Algorithm 3, in which a thread block computes two grid points per time step, with the original Algorithm 2. "opt4" and "opt8" denote the corresponding comparisons for the variants computing four and eight grid points per thread block per time step. "opt2", "opt4", and "opt8" achieve 1.33, 1.60, and 1.78 times acceleration, respectively. The effect of thread block size on algorithm performance is shown in Figure 2b, with M−1 = 12288 and N = 256; it can be seen that 64 is a better choice [10]. Limited by the size of shared memory, compilation fails when the thread block size is 256.

Figure 2

The effect of algorithm optimization and the effect of thread block size on algorithm performance

Discussion

Computing an optimal BS value analytically would require modeling the relationship among the parallel algorithm, the GPU architecture, and the programming model. Such modeling is valuable but very difficult, and the relevant parameters are hard to quantify. Therefore, in the parallel algorithm research we have done, we do not compute the value of BS; instead, we test candidate values empirically and take the best one. Generally, 32, 64, 128, 192, 256, and 512 are appropriate candidates.
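This empirical selection amounts to a small benchmark loop. A hedged sketch (run_kernel is a hypothetical stand-in for launching the solver with a given thread-block size):

```python
import time

def pick_block_size(run_kernel, candidates=(32, 64, 128, 192, 256, 512)):
    # Time each candidate BS and keep the fastest, mirroring the
    # practical tuning approach described in the text.
    best_bs, best_t = None, float("inf")
    for bs in candidates:
        start = time.perf_counter()
        run_kernel(bs)
        elapsed = time.perf_counter() - start
        if elapsed < best_t:
            best_bs, best_t = bs, elapsed
    return best_bs
```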

Conclusion

This paper proposes a fine-grained data-level parallel algorithm on the GPU for the explicit finite difference method for the fractional reaction-diffusion equation in Riesz space. The experimental results show that the proposed algorithm is effective. Parallel algorithms for numerical methods of fractional differential equations on GPUs therefore deserve further attention from researchers.


References

[1] Zhang, Y., Li, Y., & Chen, M. Iterative learning control for linear generalized distributed parameter system. Neural Computing and Applications, 2019; 31(9): 4503–4512. doi:10.1007/s00521-018-3835-0

[2] Zangeneh-Nejad, F., Sounas, D. L., Alù, A., & Fleury, R. Analogue computing with metamaterials. Nature Reviews Materials, 2021; 6(3): 207–225. doi:10.1038/s41578-020-00243-2

[3] Aboutanios, E., Sethu, V., Ambikairajah, E., Taubman, D. S., & Epps, J. Teaching Signal Processing Through Frequent and Diverse Design: A Pedagogical Approach. IEEE Signal Processing Magazine, 2021; 38(3): 133–143. doi:10.1109/MSP.2021.3057855

[4] Johnson, J. K., Geng, S., Hoffman, M. W., Adesnik, H., & Wessel, R. Precision multidimensional neural population code recovered from single intracellular recordings. Scientific Reports, 2020; 10(1): 1–23. doi:10.1038/s41598-020-72936-1

[5] Dirks, C., Striewski, P., Wirth, B., Aalto, A., & Olguin-Olguin, A. A mathematical model for bleb regulation in zebrafish primordial germ cells. Mathematical Medicine and Biology: A Journal of the IMA, 2021; 38(2): 218–254. doi:10.1093/imammb/dqab002

[6] Phelps, G., Gitomer, D. H., Iaconangelo, C. J., Etkina, E., Seeley, L., & Vokos, S. Developing Assessments of Content Knowledge for Teaching Using Evidence-centered Design. Educational Assessment, 2020; 25(2): 91–111. doi:10.1080/10627197.2020.1756256

[7] Kovács, I. A., Barabási, D. L., & Barabási, A. L. Uncovering the genetic blueprint of the C. elegans nervous system. Proceedings of the National Academy of Sciences, 2020; 117(52): 33570–33577. doi:10.1073/pnas.2009093117

[8] Gençoğlu, M. & Agarwal, P. Use of Quantum Differential Equations in Sonic Processes. Applied Mathematics and Nonlinear Sciences, 2021; 6(1): 21–28. doi:10.2478/amns.2020.2.00003

[9] Harisha, Ranjini, P., Lokesha, V. & Kumar, S. Degree Sequence of Graph Operator for some Standard Graphs. Applied Mathematics and Nonlinear Sciences, 2020; 5(2): 99–108. doi:10.2478/amns.2020.2.00018

[10] Huang, Y., Karami, B., Shahsavari, D., & Tounsi, A. Static stability analysis of carbon nanotube reinforced polymeric composite doubly curved micro-shell panels. Archives of Civil and Mechanical Engineering, 2021; 21(4): 1–15. doi:10.1007/s43452-021-00291-7
