The Non–Symmetric –Step Lanczos Algorithm: Derivation of Efficient Recurrences and Synchronization–Reducing Variants of BiCG and QMR

Stefan Feuerriegel; H. Martin Bücker

Accesso libero

The Non–Symmetric s–Step Lanczos Algorithm: Derivation of Efficient Recurrences and Synchronization–Reducing Variants of BiCG and QMR

Stefan Feuerriegel

H. Martin Bücker

| 30 dic 2015

International Journal of Applied Mathematics and Computer Science

Volume 25 (2015): Numero 4 (December 2015)

Special issue: Complex Problems in High-Performance Computing Systems, Editors: Mauro Iacono, Joanna Kołodziej

INFORMAZIONI SU QUESTO ARTICOLO

Articolo precedente

Articolo Successivo

Cita

Pubblicato online: 30 dic 2015

Pagine: 769 - 785

Ricevuto: 15 apr 2014

DOI: https://doi.org/10.1515/amcs-2015-0055

Parole chiave
synchronization-reducing, -step Lanczos, -step BiCG, -step QMR, efficient recurrences

© 2015 Stefan Feuerriegel et al., published by De Gruyter Open

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

Balay, S., Gropp, W.D., McInnes, L.C. and Smith, B.F. (1997). Efficient management of parallelism in object oriented numerical software libraries, in E. Arge et al. (Eds.), Modern Software Tools in Scientific Computing, Birkhäuser Press, Boston, MA, pp. 163–202.10.1007/978-1-4612-1986-6_8Search in Google Scholar

Bücker, H.M. (2002). Iteratively solving large sparse linear systems on parallel computers, in J. Grotendorst, D. Marx and A. Muramatsu (Eds.), Quantum Simulations of Complex Many-Body Systems: From Theory to Algorithms, NIC Series, Vol. 10, John Von Neumann Institute for Computing, Jülich, pp. 521–548.Search in Google Scholar

Bücker, H.M. and Sauren, M. (1996). A parallel version of the quasi-minimal residual method based on coupled two-term recurrences, in J. Waśniewski et al. (Eds.), Applied Parallel Computing, Lecture Notes in Computer Science, Vol. 1184, Springer, Berlin, pp. 157–165.10.1007/3-540-62095-8_17Search in Google Scholar

Bücker, H.M. and Sauren, M. (1997). A variant of the biconjugate gradient method suitable for massively parallel computing, in G. Bilardi et al. (Eds.), Solving Irregularly Structured Problems in Parallel, Lecture Notes in Computer Science, Vol. 1253, Springer, Berlin, pp. 72–79.10.1007/3-540-63138-0_7Search in Google Scholar

Bücker, H.M. and Sauren, M. (1999). Reducing global synchronization in the biconjugate gradient method, in T. Yang (Ed.), Parallel Numerical Computations with Applications, Kluwer Academic Publishers, Norwell, MA, pp. 63–76.10.1007/978-1-4615-5205-5_5Search in Google Scholar

Cappello, F., Geist, A., Gropp, B., Kale, L., Kramer, B. and Snir, M. (2009). Toward exascale resilience, International Journal of High Performance Computing Applications23(4): 374–388.10.1177/1094342009347767Search in Google Scholar

Carson, E. and Demmel, J. (2014). A residual replacement strategy for improving the maximum attainable accuracy of s-step Krylov subspace methods, SIAM Journal on Matrix Analysis and Applications35(1): 22–43.10.1137/120893057Search in Google Scholar

Carson, E., Knight, N. and Demmel, J. (2014). Avoiding communication in nonsymmetric Lanczos-based Krylov subspace methods, SIAM Journal on Scientific Computing35(5): S42–S61.10.1137/120881191Search in Google Scholar

Chronopoulos, A.T. (1986). A class of parallel iterative methods implemented on multiprocessors, Technical report UIUCDCS-R-86-1267, Department of Computer Science, University of Illinois, Urbana, IL.Search in Google Scholar

Chronopoulos, A.T. and Gear, C.W. (1989). s-Step iterative methods for symmetric linear systems, Journal of Computational and Applied Mathematics25(2): 153–168.10.1016/0377-0427(89)90045-9Search in Google Scholar

Chronopoulos, A.T. and Swanson, C.D. (1996). Parallel iterative s-step methods for unsymmetric linear systems, Parallel Computing22(5): 623–641.10.1016/0167-8191(96)00022-1Search in Google Scholar

Curfman McInnes, L., Smith, B., Zhang, H. and Mills, R.T. (2014). Hierarchical Krylov and nested Krylov methods for extreme-scale computing, Parallel Computing40(1): 17–31.10.1016/j.parco.2013.10.001Search in Google Scholar

Davis, N.E., Robey, R.W., Ferenbaugh, C.R., Nicholaeff, D. and Trujillo, D.P. (2012). Paradigmatic shifts for exascale supercomputing, Journal of Supercomputing62(2): 1023–1044.10.1007/s11227-012-0789-3Search in Google Scholar

Demmel, J., Heath, M. and van der Vorst, H. (1993). Parallel numerical linear algebra, Acta Numerica2(1): 111–197.10.1017/S096249290000235XSearch in Google Scholar

Duff, I.S. (2012). European exascale software initiative: Numerical libraries, solvers and algorithms, in D. Hutchison et al. (Eds.), Euro-Par 2011: Parallel Processing Workshops, Lecture Notes in Computer Science, Vol. 7155, Springer, Berlin, pp. 295–304.10.1007/978-3-642-29737-3_34Search in Google Scholar

Duff, I.S. and van der Vorst, H.A. (1999). Developments and trends in the parallel solution of linear systems, Parallel Computing25(13–14): 1931–1970.10.1016/S0167-8191(99)00077-0Search in Google Scholar

Feuerriegel, S. and Bücker, H.M. (2013a). A normalization scheme for the non-symmetric s-step Lanczos algorithm, in J. Kolodziej et al. (Eds.), ICA3PP, Part II, Lecture Notes in Computer Science, Vol. 8286, Springer, Berlin, pp. 30–39.Search in Google Scholar

Feuerriegel, S. and Bücker, H.M. (2013b). Synchronization-reducing variants of the biconjugate gradient and the quasi-minimal residual methods, in J. Kolodziej et al. (Eds.), ICA3PP, Part I, Lecture Notes in Computer Science, Vol. 8285, Springer, Berlin, pp. 226–235.Search in Google Scholar

Fischer, B. and Freund, R. (1994). An inner product-free conjugate gradient-like algorithm for Hermitian positive definite systems, in J. Brown et al. (Eds.), Cornelius Lanczos International Centenary Conference, SIAM, Philadelphia, PA, pp. 288–290.Search in Google Scholar

Fletcher, R. (1976). Conjugate gradient methods for indefinite systems, in G. Watson (Ed.), Numerical Analysis, Lecture Notes in Computer Science, Vol. 506, Springer, Berlin, pp. 73–89.10.1007/BFb0080116Search in Google Scholar

Freund, R. and Nachtigal, N. (1991). QMR: A quasi-minimal residual method for non-Hermitian linear systems, Numerische Mathematik60(1): 315–339.10.1007/BF01385726Search in Google Scholar

Freund, R.W. and Hochbruck, M. (1991). A biconjugate gradient type algorithm on massively parallel architectures, in R. Vichnevetsky and J.J.H. Miller (Eds.), IMACS’91, Criterion Press, Dublin, pp. 720–721.Search in Google Scholar

Freund, R.W. and Hochbruck, M. (1992). A biconjugate gradient-type algorithm for the iterative solution of non-Hermitian linear systems on massively parallel architectures, in C. Brezinski and U. Kulisch (Eds.), IMACS’91, North Holland, Amsterdam, pp. 169–178.Search in Google Scholar

Freund, R.W. and Nachtigal, N.M. (1994). An implementation of the QMR method based on coupled two-term recurrences, SIAM Journal on Scientific Computing15(2): 313.10.1137/0915022Search in Google Scholar

Ghysels, P., Ashby, T.J., Meerbergen, K. and Vanroose, W. (2013). Hiding global communication latency in the GMRES algorithm on massively parallel machines, SIAM Journal on Scientific Computing35(1): C48–C71.10.1137/12086563XSearch in Google Scholar

Ghysels, P. and Vanroose, W. (2014). Hiding global synchronization latency in the preconditioned conjugate gradient algorithm, Parallel Computing40(7): 224–238.10.1016/j.parco.2013.06.001Search in Google Scholar

Gustafsson, M., Demmel, J. and Holmgren, S. (2012a). Numerical evaluation of the communication-avoiding Lanczos algorithm, Technical Report 2012-001, Department of Information Technology, Uppsala University, Uppsala.Search in Google Scholar

Gustafsson, M., Kormann, K. and Holmgren, S. (2012b). Communication-efficient algorithms for numerical quantum dynamics, in K. Jónasson (Ed.), Applied Parallel and Scientific Computing, Lecture Notes in Computer Science, Vol. 7134, Springer, Berlin, pp. 368–378.Search in Google Scholar

Gutknecht, M.H. (1997). Lanczos-type solvers for nonsymmetric linear systems of equations, Acta Numerica6(1): 271–397.10.1017/S0962492900002737Search in Google Scholar

Hernandez, V., Roman, J.E. and Vidal, V. (2005). SLEPc: A scalable and flexible toolkit for the solution of eigenvalue problems, ACM Transactions on Mathematical Software31(3): 351–362.10.1145/1089014.1089019Search in Google Scholar

Hoemmen, M.F. (2010). Communication-avoiding Krylov Subspace Methods, Ph.D. thesis, EECS Department, University of California, Berkeley, CA.Search in Google Scholar

Kandalla, K., Yang, U., Keasler, J., Kolev, T., Moody, A., Subramoni, H., Tomko, K., Vienne, J., De Supinski, B. and Panda, D. (2012). Designing non-blocking allreduce with collective offload on InfiniBand clusters: A case study with conjugate gradient solvers, Proceedings of the 2012 IEEE 26th International Parallel Distributed Processing Symposium (IPDPS), Shanghai, China, pp. 1156–1167.Search in Google Scholar

Kim, S.K. (2010). Efficient biorthogonal Lanczos algorithm on message passing parallel computer, in C.H. Hsu and V. Malyshkin (Eds.), MTPP 2010, Lecture Notes in Computer Science, Vol. 6083, Springer, Berlin, pp. 293–299.10.1007/978-3-642-14822-4_33Search in Google Scholar

Kim, S.K. and Chronopoulos, A. (1991). A class of Lanczos-like algorithms implemented on parallel computers, Parallel Computing17(6–7): 763–778.10.1016/S0167-8191(05)80065-1Search in Google Scholar

Kim, S.K. and Chronopoulos, A.T. (1992). An efficient nonsymmetric Lanczos method on parallel vector computers, Journal of Computational and Applied Mathematics42(3): 357–374.10.1016/0377-0427(92)90085-CSearch in Google Scholar

Kim, S.K. and Kim, T.H. (2005). A study on the efficient parallel block Lanczos method, in J. Zhang, J.-H. He and Y. Fu (Eds.), Computational and Information Science, Lecture Notes in Computer Science, Vol. 3314, Springer, Berlin/Heidelberg, pp. 231–237.Search in Google Scholar

Lanczos, C. (1950). An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, Journal of Research of the National Bureau of Standards45(4): 255–282.10.6028/jres.045.026Search in Google Scholar

Meurant, G. (1986). The conjugate gradient method on supercomputers, Supercomputer13: 9–17.Search in Google Scholar

Mohiyuddin, M., Hoemmen, M., Demmel, J. and Yelick, K. (2009). Minimizing communication in sparse matrix solvers, Conference on High Performance Computing Networking, Storage and Analysis, Portland, OR, USA, pp. 36:1–36:12.Search in Google Scholar

Saad, Y. (1989). Krylov subspace methods on supercomputers, SIAM Journal on Scientific and Statistical Computing10(6): 1200–1232.10.1137/0910073Search in Google Scholar

Sauren, M. and Bücker, H.M. (1998). On deriving the quasi-minimal residual method, SIAM Review40(4): 922–926.10.1137/S0036144596319368Search in Google Scholar

Shalf, J., Dosanjh, S. and Morrison, J. (2011). Exascale computing technology challenges, in D. Hutchison et al. (Eds.), High Performance Computing for Computational Science, VECPAR 2010, Lecture Notes in Computer Science, Vol. 6449, Springer, Berlin, pp. 1–25.10.1007/978-3-642-19328-6_1Search in Google Scholar

van der Vorst, H. (1990). Iterative methods for the solution of large systems of equations on supercomputers, Advances in Water Resources13(3): 137–146.10.1016/0309-1708(90)90005-OSearch in Google Scholar

van der Vorst, H.A. and Ye, Q. (2000). Residual replacement strategies for Krylov subspace iterative methods for the convergence of true residuals, SIAM Journal on Scientific Computing22(3): 835–852.10.1137/S1064827599353865Search in Google Scholar

Van Rosendale, J. (1983). Minimizing inner product data dependencies in conjugate gradient iteration, NASA Contractor Report NASA-CR-172178, NASA Langley Research Center, Hampton, VA.Search in Google Scholar

Zhu, S.-X., Gu, T.-X. and Liu, X.-P. (2014). Minimizing synchronizations in sparse iterative solvers for distributed supercomputers, Computers & Mathematics with Applications67(1): 199–209.10.1016/j.camwa.2013.11.008Search in Google Scholar

Zuo, X., Gu, T.-X. and Mo, Z. (2010). An improved GPBi-CG algorithm suitable for distributed parallel computing, Applied Mathematics and Computation215(12): 4101–4109.10.1016/j.amc.2009.11.044Search in Google Scholar

eISSN:: 2083-8492
Lingua:: Inglese

Frequenza di pubblicazione:: 4 volte all'anno
Argomenti della rivista:: Mathematics, Applied Mathematics

Feed RSS della rivista