[1] M. Adams, P. Colella, D. T. Graves, J. N. Johnson, N. D. Keen, T. J. Ligocki, D. F. Martin, P. W. McCorquodale, D. Modiano, P. Schwartz, T. Sternberg, and B. van Straalen. Chombo software package for AMR applications – design document. Technical Report LBNL-6616E, Lawrence Berkeley National Laboratory, Jan 2015.
[2] S. Balay, W. D. Gropp, L. C. McInnes, and B. F. Smith. Efficient management of parallelism in object oriented numerical software libraries. In Modern Software Tools in Scientific Computing, pages 163–202. Birkhäuser Press, 1997. doi:10.1007/978-1-4612-1986-6_8
[3] W. Bangerth, R. Hartmann, and G. Kanschat. deal.II – a general purpose object oriented finite element library. ACM Trans. Math. Softw., 33(4):24/1–24/27, 2007. doi:10.1145/1268776.1268779
[4] P. Bastian, C. Engwer, D. Göddeke, O. Iliev, O. Ippisch, M. Ohlberger, S. Turek, J. Fahlke, S. Kaulmann, S. Müthing, and D. Ribbrock. EXA-DUNE: Flexible PDE solvers, numerical methods and applications. In Euro-Par 2014: Parallel Processing Workshops, volume 8806 of Lecture Notes in Computer Science, pages 530–541. Springer, 2014. doi:10.1007/978-3-319-14313-2_45
[5] M. Bauer, F. Schornbaum, C. Godenschwager, M. Markl, D. Anderl, H. Köstler, and U. Rüde. A Python extension for the massively parallel multiphysics simulation framework waLBerla. International Journal of Parallel, Emergent and Distributed Systems, 31(6):529–542, 2016. doi:10.1080/17445760.2015.1118478
[6] B. Bergen, T. Gradl, F. Hülsemann, and U. Rüde. A massively parallel multigrid method for finite elements. Computing in Science and Engineering, 8(6):56–62, 2006. doi:10.1109/MCSE.2006.102
[7] B. Bergen and F. Hülsemann. Hierarchical hybrid grids: data structures and core algorithms for multigrid. Numer. Linear Algebra Appl., 11:279–291, 2004. doi:10.1002/nla.382
[8] M. Blatt, A. Burchardt, A. Dedner, C. Engwer, J. Fahlke, B. Flemisch, C. Gersbacher, C. Grüser, F. Gruber, C. Gräninger, D. Kempf, R. Klöfkorn, T. Malkmus, S. Müthing, M. Nolte, M. Piatkowski, and O. Sander. The distributed and unified numerics environment, version 2.4. Archive of Numerical Software, 4(100):13–29, 2016.
[9] M. Bolten, F. Franchetti, P. H. J. Kelly, C. Lengauer, and M. Mohr. Algebraic description and automatic generation of multigrid methods in SPIRAL. Concurrency and Computation: Practice and Experience, 29(17):4105:1–4105:11, 2017. Special Issue on Advanced Stencil-Code Engineering. doi:10.1002/cpe.4105
[10] T. Brandvik and G. Pullan. SBLOCK: A framework for efficient stencil-based PDE solvers on multi-core platforms. In 2010 10th IEEE International Conference on Computer and Information Technology, pages 1181–1188, Jun 2010. doi:10.1109/CIT.2010.214
[11] M. Christen, O. Schenk, and H. Burkhart. PATUS: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures. In 2011 IEEE International Parallel Distributed Processing Symposium, pages 676–687, May 2011. doi:10.1109/IPDPS.2011.70
[12] C. Coarfa, Y. Dotsenko, J. Mellor-Crummey, F. Cantonnet, T. El-Ghazawi, A. Mohanti, Y. Yao, and D. Chavarría-Miranda. An evaluation of global address space languages: Co-array Fortran and unified parallel C. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '05, pages 36–47, New York, NY, USA, 2005. ACM. doi:10.1145/1065944.1065950
[13] Z. DeVito, N. Joubert, F. Palacios, S. Oakley, M. Medina, M. Barrientos, E. Elsen, F. Ham, A. Aiken, K. Duraisamy, E. Darve, J. Alonso, and P. Hanrahan. Liszt: A domain specific language for building portable mesh-based PDE solvers. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pages 1–12. ACM, 2011. doi:10.1145/2063384.2063396
[14] H. C. Edwards, C. R. Trott, and D. Sunderland. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns. Journal of Parallel and Distributed Computing, 74(12):3202–3216, 2014. Special issue on Domain-Specific Languages and High-Level Frameworks for High-Performance Computing. doi:10.1016/j.jpdc.2014.07.003
[15] R. D. Falgout, J. E. Jones, and U. M. Yang. The design and implementation of hypre, a library of parallel high performance preconditioners. In Numerical Solution of Partial Differential Equations on Parallel Computers, pages 267–294, Berlin, Heidelberg, 2006. Springer. doi:10.1007/3-540-31619-1_8
[16] M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. SIGPLAN Not., 33(5):212–223, May 1998. doi:10.1145/277652.277725
[17] K. Fürlinger, C. Glass, A. Knüpfer, J. Tao, D. Hünich, K. Idrees, M. Maiterth, Y. Mhedheb, and H. Zhou. DASH: Data structures and algorithms with support for hierarchical locality. In Euro-Par 2014 Workshops (Porto, Portugal), pages 542–552, 2014. doi:10.1007/978-3-319-14313-2_46
[18] B. Gmeiner, T. Gradl, H. Köstler, and U. Rüde. Highly parallel geometric multigrid algorithm for hierarchical hybrid grids. In K. Binder, G. Münster, and M. Kremer, editors, NIC Symposium 2012, volume 45 of Publication series of the John von Neumann Institute for Computing, pages 323–330, Jülich, Germany, 2012.
[19] B. Gmeiner, M. Huber, L. John, U. Rüde, and B. Wohlmuth. A quantitative performance study for Stokes solvers at the extreme scale. J. Comput. Sci., 17(3):509–521, 2016. doi:10.1016/j.jocs.2016.06.006
[20] B. Gmeiner, H. Köstler, M. Stürmer, and U. Rüde. Parallel multigrid on hierarchical hybrid grids: a performance study on current high performance computing clusters. Concurrency and Computation: Practice and Experience, 26(1):217–240, 2014. doi:10.1002/cpe.2968
[21] B. Gmeiner, U. Rüde, H. Stengel, C. Waluga, and B. Wohlmuth. Performance and Scalability of Hierarchical Hybrid Multigrid Solvers for Stokes Systems. SIAM J. Sci. Comput., 37(2):C143–C168, 2015. doi:10.1137/130941353
[22] B. Gmeiner, U. Rüde, H. Stengel, C. Waluga, and B. Wohlmuth. Towards textbook efficiency for parallel multigrid. Numer. Math. Theory Methods Appl., 8:22–46, 2015. doi:10.4208/nmtma.2015.w10si
[23] T. Gysi, T. Grosser, and T. Hoefler. MODESTO: Data-centric analytic optimization of complex stencil programs on heterogeneous architectures. In Proceedings of the 29th ACM on International Conference on Supercomputing, ICS '15, pages 177–186, New York, NY, USA, 2015. ACM. doi:10.1145/2751205.2751223
[24] T. Gysi, C. Osuna, O. Fuhrer, M. Bianco, and T. C. Schulthess. STELLA: A domain-specific tool for structured grid methods in weather and climate models. In Proceedings International Conference on High Performance Computing, Networking, Storage and Analysis (SC), pages 41:1–41:12. ACM, Nov 2015. doi:10.1145/2807591.2807627
[25] M. Heisig. Petalisp: A Common Lisp library for data parallel programming. In 11th European Lisp Symposium, page 4, 2018.
[26] M. Heisig and H. Köstler. Petalisp: run time code generation for operations on strided arrays. In Proceedings of the 5th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, pages 11–17. ACM, 2018. doi:10.1145/3219753.3219755
[27] M. A. Heroux, R. A. Bartlett, V. E. Howle, R. J. Hoekstra, J. J. Hu, T. G. Kolda, R. B. Lehoucq, K. R. Long, R. P. Pawlowski, E. T. Phipps, A. G. Salinger, H. K. Thornquist, R. S. Tuminaro, J. M. Willenbring, A. Williams, and K. S. Stanley. An overview of the Trilinos project. ACM Trans. Math. Softw., 31(3):397–423, 2005. doi:10.1145/1089014.1089021
[28] L. V. Kale and S. Krishnan. CHARM++: A portable concurrent object oriented system based on C++. SIGPLAN Notices, 28(10):91–108, Oct 1993. doi:10.1145/167962.165874
[29] N. Kohl, D. Thönnes, D. Drzisga, D. Bartuschat, and U. Rüde. The HyTeG finite-element software framework for scalable multigrid solvers. International Journal of Parallel, Emergent and Distributed Systems, 0(0):1–20, 2018.
[30] H. Köstler, C. Schmitt, S. Kuckuk, F. Hannig, J. Teich, and U. Rüde. A Scala prototype to generate multigrid solver implementations for different problems and target multi-core platforms. Int. J. of Computational Science and Engineering, 14(2):150–163, 2017. doi:10.1504/IJCSE.2017.082879
[31] H. Köstler, M. Stürmer, and T. Pohl. Performance engineering to achieve real-time high dynamic range imaging. Journal of Real-Time Image Processing, pages 1–13, 2013. doi:10.1007/s11554-012-0312-3
[32] S. Kronawitter, S. Kuckuk, H. Köstler, and C. Lengauer. Automatic data layout transformations in the ExaStencils code generator. Parallel Processing Letters, 28(03):1850009, 2018. doi:10.1142/S0129626418500093
[33] S. Kronawitter, S. Kuckuk, H. Köstler, and C. Lengauer. Automatic data layout transformations in the ExaStencils code generator. Parallel Processing Letters, 28(03):1850009, 2018. doi:10.1142/S0129626418500093
[34] S. Kronawitter, S. Kuckuk, and C. Lengauer. Redundancy elimination in the ExaStencils code generator. In Algorithms and Architectures for Parallel Processing, pages 159–173, Cham, 2016. Springer International Publishing. doi:10.1007/978-3-319-49956-7_13
[35] S. Kuckuk, G. Haase, D. A. Vasco, and H. Köstler. Towards generating efficient flow solvers with the ExaStencils approach. Concurrency and Computation: Practice and Experience, 29(17):4062:1–4062:17, 2017. Special Issue on Advanced Stencil-Code Engineering. doi:10.1002/cpe.4062
[36] S. Kuckuk and H. Köstler. Automatic generation of massively parallel codes from ExaSlang. Computation, 4(3):27:1–27:20, 2016. Special Issue on High Performance Computing (HPC) Software Design. doi:10.3390/computation4030027
[37] S. Kuckuk and H. Köstler. Whole program generation of massively parallel shallow water equation solvers. In 2018 IEEE International Conference on Cluster Computing (CLUSTER), pages 78–87, Sept 2018. doi:10.1109/CLUSTER.2018.00020
[38] S. Kuckuk and H. Köstler. Automatic generation of massively parallel codes from ExaSlang. Computation, 4(3):27:1–27:20, 2016. Special Issue on High Performance Computing (HPC) Software Design. doi:10.3390/computation4030027
[39] S. Kuckuk, L. Leitenmaier, C. Schmitt, D. Schönwetter, H. Köstler, and D. Fey. Towards virtual hardware prototyping for generated geometric multigrid solvers. Technical Report CS 2017-01, Technische Fakultät, 2017.
[40] C. Lengauer, S. Apel, M. Bolten, A. Größlinger, F. Hannig, H. Köstler, U. Rüde, J. Teich, A. Grebhahn, S. Kronawitter, et al. ExaStencils: Advanced stencil-code engineering. In European Conference on Parallel Processing, pages 553–564. Springer, 2014. doi:10.1007/978-3-319-14313-2_47
[41] C. Lengauer, S. Apel, M. Bolten, A. Größlinger, F. Hannig, H. Köstler, U. Rüde, J. Teich, A. Grebhahn, S. Kronawitter, S. Kuckuk, H. Rittich, and C. Schmitt. ExaStencils: Advanced stencil-code engineering. In L. Lopes et al., editors, Euro-Par 2014: Parallel Processing Workshops, volume 8806 of Lecture Notes in Computer Science (LNCS), pages 553–564. Springer, 2014. doi:10.1007/978-3-319-14313-2_47
[42] A. Logg, K.-A. Mardal, and G. N. Wells. Automated Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational Science and Engineering (LNCSE). Springer, 2012. doi:10.1007/978-3-642-23099-8
[43] N. Maruyama, K. Sato, T. Nomura, and S. Matsuoka. Physis: An implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers. In SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12, Nov 2011. doi:10.1145/2063384.2063398
[44] G. R. Mudalige, I. Reguly, M. B. Giles, C. Bertolli, and P. H. J. Kelly. OP2: An active library framework for solving unstructured mesh-based applications on multi-core and many-core architectures. In Proc. Innovative Parallel Computing (InPar), San Jose, California, May 2012. IEEE. doi:10.1109/InPar.2012.6339594
[45] G. Ofenbeck, T. Rompf, and M. Püschel. Staging for generic programming in space and time. SIGPLAN Not., 52(12):15–28, Oct 2017. doi:10.1145/3170492.3136060
[46] M. Püschel, F. Franchetti, and Y. Voronenko. Spiral, volume 4, pages 1920–1933. Springer, 2011.
[47] F. Rathgeber, D. A. Ham, L. Mitchell, M. Lange, F. Luporini, A. T. T. McRae, G.-T. Bercea, G. R. Markall, and P. H. J. Kelly. Firedrake: Automating the finite element method by composing abstractions. ACM Trans. on Mathematical Software (TOMS), 43(3):24:1–24:27, 2016. doi:10.1145/2998441
[48] P. Rawat, M. Kong, T. Henretty, J. Holewinski, K. Stock, L.-N. Pouchet, J. Ramanujam, A. Rountev, and P. Sadayappan. SDSLc: A multi-target domain-specific compiler for stencil computations. In Proc. 5th Int’l Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), pages 6:1–6:10. ACM, Nov 2015. doi:10.1145/2830018.2830025
[49] C. Schmitt, S. Kuckuk, F. Hannig, H. Köstler, and J. Teich. ExaSlang: A domain-specific language for highly scalable multigrid solvers. In Proc. 4th Int’l Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), pages 42–51. IEEE Computer Society, Nov. 2014. doi:10.1109/WOLFHPC.2014.11
[50] C. Schmitt, M. Schmid, F. Hannig, J. Teich, S. Kuckuk, and H. Köstler. Generation of multigrid-based numerical solvers for FPGA accelerators. In Proc. 2nd Int’l Workshop on High-Performance Stencil Computations (HiStencils), pages 9–15, Jan. 2015.
[51] C. Schmitt, M. Schmid, S. Kuckuk, H. Köstler, J. Teich, and F. Hannig. Reconfigurable hardware generation of multigrid solvers with conjugate gradient coarse-grid solution. Parallel Processing Letters, 28(04):1850016, 2018. doi:10.1142/S0129626418500160
[52] J. Schmitt, H. Köstler, J. Eitzinger, and R. Membarth. Unified code generation for the parallel computation of pairwise interactions using partial evaluation. In 2018 17th International Symposium on Parallel and Distributed Computing (ISPDC), pages 17–24. IEEE, 2018. doi:10.1109/ISPDC2018.2018.00012
[53] Y. Tang, R. A. Chowdhury, B. C. Kuszmaul, C.-K. Luk, and C. E. Leiserson. The Pochoir stencil compiler. In Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 117–128. ACM, 2011. doi:10.1145/1989493.1989508
[54] U. Trottenberg, C. Oosterlee, and A. Schüller. Multigrid. Academic Press, San Diego, CA, USA, 2001.
[55] A. Vogel, S. Reiter, M. Rupp, A. Nägel, and G. Wittum. UG 4: A novel flexible software system for simulating PDE based models on high performance computers. Computing and Visualization in Science, 16(4):165–179, 2013. doi:10.1007/s00791-014-0232-9
[56] T. Weinzierl. The Peano software – parallel, automaton-based, dynamically adaptive grid traversals. ACM Transactions on Mathematical Software (TOMS), 45(2):14, 2019. doi:10.1145/3319797