Anatomy of high-performance matrix multiplication
K Goto, RA van de Geijn
ACM Transactions on Mathematical Software (TOMS) 34 (3), 12, 2008
SUMMA: Scalable universal matrix multiplication algorithm
RA van de Geijn, J Watts
Concurrency Practice and Experience 9 (4), 255-274, 1997
High-performance implementation of the level-3 BLAS
K Goto, R van de Geijn
ACM Transactions on Mathematical Software (TOMS) 35 (1), 1-14, 2008
FLAME: Formal linear algebra methods environment
JA Gunnels, FG Gustavson, GM Henry, RA van de Geijn
ACM Transactions on Mathematical Software (TOMS) 27 (4), 422-455, 2001
Using PLAPACK: Parallel Linear Algebra Package
RA van de Geijn, P Alpatov
MIT Press, 1997
Elemental: A new framework for distributed memory dense matrix computations
J Poulson, B Marker, RA van de Geijn, JR Hammond, NA Romero
ACM Transactions on Mathematical Software (TOMS) 39 (2), 1-24, 2013
Collective communication: theory, practice, and experience
E Chan, M Heimlich, A Purkayastha, R van de Geijn
Concurrency and Computation: Practice and Experience 19 (13), 1749-1783, 2007
A fast solution method for three-dimensional many-particle problems of linear elasticity
Y Fu, KJ Klimkowski, GJ Rodin, E Berger, JC Browne, JK Singer, ...
International Journal for Numerical Methods in Engineering 42 (7), 1215-1229, 1998
BLIS: A framework for rapidly instantiating BLAS functionality
FG Van Zee, RA van de Geijn
ACM Transactions on Mathematical Software (TOMS) 41 (3), Article 14, 2015
The science of deriving dense linear algebra algorithms
P Bientinesi, JA Gunnels, ME Myers, ES Quintana-Ortí, RA van de Geijn
ACM Transactions on Mathematical Software (TOMS) 31 (1), 1-26, 2005
On reducing TLB misses in matrix multiplication
K Goto, R van de Geijn
Technical Report TR02-55, Department of Computer Sciences, U. of Texas at Austin, 2002
Interprocessor collective communication library (InterCom)
M Barnett, L Shuler, R van de Geijn, S Gupta, DG Payne, J Watts
Proceedings of IEEE Scalable High Performance Computing Conference, 357-364, 1994
SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures
E Chan, ES Quintana-Ortí, G Quintana-Ortí, R van de Geijn
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms …, 2007
Programming matrix algorithms-by-blocks for thread-level parallelism
G Quintana-Ortí, ES Quintana-Ortí, RA van de Geijn, FG Van Zee, E Chan
ACM Transactions on Mathematical Software (TOMS) 36 (3), 1-26, 2009
Solving dense linear systems on platforms with multiple hardware accelerators
G Quintana-Ortí, FD Igual, ES Quintana-Ortí, RA van de Geijn
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of …, 2009
Parallel out-of-core computation and updating of the QR factorization
BC Gunter, RA van de Geijn
ACM Transactions on Mathematical Software (TOMS) 31 (1), 60-78, 2005
Broadcasting on meshes with wormhole routing
M Barnett, DG Payne, RA van de Geijn, J Watts
Journal of Parallel and Distributed Computing 35 (2), 111-122, 1996
A look at scalable dense linear algebra libraries
JJ Dongarra, R van de Geijn, DW Walker
Oak Ridge National Lab., TN (United States), 1992
SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks
E Chan, FG Van Zee, P Bientinesi, ES Quintana-Ortí, G Quintana-Ortí, ...
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of …, 2008
A family of high-performance matrix multiplication algorithms
JA Gunnels, GM Henry, RA van de Geijn
International Conference on Computational Science, 51-60, 2001
