The Parallel Tiled WZ Factorization Algorithm for Multicore Architectures

Agullo, E., Demmel, J., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Ltaief, H., Luszczek, P. and Tomov, S. (2009). Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series 180(1): 012037, DOI: 10.1088/1742-6596/180/1/012037.

Amdahl, G.M. (1967). Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the Spring Joint Computer Conference, AFIPS’67 (Spring), Atlantic City, NJ, USA, pp. 483–485, DOI: 10.1145/1465482.1465560.

Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A. and Sorensen, D. (1999). LAPACK Users’ Guide, 3rd Edn., SIAM, Philadelphia, PA, DOI: 10.1137/1.9780898719604.

Buttari, A., Langou, J., Kurzak, J. and Dongarra, J. (2009). A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing 35(1): 38–53, DOI: 10.1016/j.parco.2008.10.002.

Bylina, B. (2018). The block WZ factorization, Journal of Computational and Applied Mathematics 331(C): 119–132, DOI: 10.1016/j.cam.2017.10.004.

Bylina, B. and Bylina, J. (2007). Incomplete WZ factorization as an alternative method of preconditioning for solving Markov chains, in R. Wyrzykowski et al. (Eds.), PPAM, Lecture Notes in Computer Science, Vol. 4967, Springer, Berlin/Heidelberg, pp. 99–107, DOI: 10.1007/978-3-540-68111-3_11.

Bylina, B. and Bylina, J. (2009). Influence of preconditioning and blocking on accuracy in solving Markovian models, International Journal of Applied Mathematics and Computer Science 19(2): 207–217, DOI: 10.2478/v10006-009-0017-3.

Bylina, B. and Bylina, J. (2015). Strategies of parallelizing nested loops on the multicore architectures on the example of the WZ factorization for the dense matrices, in M. Ganzha et al. (Eds.), Proceedings of the 2015 Federated Conference on Computer Science and Information Systems, Annals of Computer Science and Information Systems, Vol. 5, IEEE, Piscataway, NJ, pp. 629–639, DOI: 10.15439/2015F354.

Donfack, S., Dongarra, J., Faverge, M., Gates, M., Kurzak, J., Luszczek, P. and Yamazaki, I. (2015). A survey of recent developments in parallel implementations of Gaussian elimination, Concurrency and Computation: Practice and Experience 27(5): 1292–1309, DOI: 10.1002/cpe.3306.

Dongarra, J., Du Croz, J., Duff, I.S. and Hammarling, S. (1990). A set of level-3 basic linear algebra subprograms, ACM Transactions on Mathematical Software 16(1): 1–17, DOI: 10.1145/77626.79170.

Dongarra, J.J., Faverge, M., Ltaief, H. and Luszczek, P. (2013). Achieving numerical accuracy and high performance using recursive tile LU factorization, Concurrency and Computation: Practice and Experience 26(6): 1408–1431, DOI: 10.1002/cpe.3110.

Dumas, J.G., Gautier, T., Pernet, C., Roch, J.L. and Sultan, Z. (2016). Recursion based parallelization of exact dense linear algebra routines for Gaussian elimination, Parallel Computing 57: 235–249, DOI: 10.1016/j.parco.2015.10.003.

Evans, D. and Hatzopoulos, M. (1979). A parallel linear system solver, International Journal of Computer Mathematics 7(3): 227–238, DOI: 10.1080/00207167908803174.

Flynn, M.J. (1972). Some computer organizations and their effectiveness, IEEE Transactions on Computers 21(9): 948–960, DOI: 10.1109/TC.1972.5009071.

García, I., Merelo, J., Bruguera, J. and Zapata, E. (1990). Parallel quadrant interlocking factorization on hypercube computers, Parallel Computing 15(1–3): 87–100, DOI: 10.1016/0167-8191(90)90033-6.

Gustavson, F.G. (1997). Recursion leads to automatic variable blocking for dense linear-algebra algorithms, IBM Journal of Research and Development 41(6): 737–756, DOI: 10.1147/rd.416.0737.

Intel (2019). Math Kernel Library, https://software.intel.com/en-us/mkl.

Kurzak, J., Langou, J., Langou, C.D.J., Ltaief, H., Luszczek, P., Yarkhan, A., Haidar, A., Hoffman, J., Agullo, P.D.E., Buttari, A. and Hadri, B. (2010). PLASMA Users’ Guide: Parallel Linear Algebra Software for Multicore Architectures, Version 2.3, http://icl.cs.utk.edu/projectsfiles/plasma/pdf/users_guide.pdf.

Marqués, M., Quintana-Ortí, G., Quintana-Ortí, E.S. and van de Geijn, R.A. (2011). Using desktop computers to solve large-scale dense linear algebra problems, The Journal of Supercomputing 58(2): 145–150, DOI: 10.1007/s11227-010-0394-2.

Rao, S.C.S. (1997). Existence and uniqueness of WZ factorization, Parallel Computing 23(8): 1129–1139, DOI: 10.1016/S0167-8191(97)00042-2.

Yalamov, P. and Evans, D. (1995). The WZ matrix factorisation method, Parallel Computing 21(7): 1111–1120, DOI: 10.1016/0167-8191(94)00088-R.

Yarkhan, A., Kurzak, J., Luszczek, P. and Dongarra, J. (2017). Porting the PLASMA numerical library to the OpenMP standard, International Journal of Parallel Programming 45(3): 612–633, DOI: 10.1007/s10766-016-0441-6.

eISSN: 2083-8492
Language: English

Publication frequency: 4 times per year
Journal subjects: Mathematics, Applied Mathematics

Published: 04 Jul 2019

Pages: 407–419

Received: 08 Sep 2018

Accepted: 02 Mar 2019

DOI: https://doi.org/10.2478/amcs-2019-0030

Keywords: tiled algorithm, WZ factorization, solution of linear systems, Amdahl’s law, high performance computing, multicore architectures

© 2019 Beata Bylina et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
