Title, authors, venue | Cited by | Year |
The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science A Marek, V Blum, R Johanni, V Havu, B Lang, T Auckenthaler, A Heinecke, ... Journal of Physics: Condensed Matter 26 (21), 213201, 2014 | 186 | 2014 |
Design and implementation of the Linpack benchmark for single and multi-node systems based on Intel® Xeon Phi coprocessor A Heinecke, K Vaidyanathan, M Smelyanskiy, A Kobotov, R Dubtsov, ... 2013 IEEE 27th International Symposium on Parallel and Distributed …, 2013 | 174 | 2013 |
Petascale high order dynamic rupture earthquake simulations on heterogeneous supercomputers A Heinecke, A Breuer, S Rettenberger, M Bader, AA Gabriel, C Pelties, ... SC'14: Proceedings of the International Conference for High Performance …, 2014 | 102 | 2014 |
LIBXSMM: accelerating small matrix multiplications by runtime code generation A Heinecke, G Henry, M Hutchinson, H Pabst SC'16: Proceedings of the International Conference for High Performance …, 2016 | 97 | 2016 |
ls1 mardyn: The Massively Parallel Molecular Dynamics Code for Large Systems C Niethammer, S Becker, M Bernreuther, M Buchholz, W Eckhardt, ... Journal of Chemical Theory and Computation 10 (10), 4455-4464, 2014 | 96 | 2014 |
From GPGPU to many-core: Nvidia Fermi and Intel Many Integrated Core architecture A Heinecke, M Klemm, HJ Bungartz Computing in Science & Engineering 14 (2), 78-83, 2012 | 79 | 2012 |
591 TFLOPS multi-trillion particles simulation on SuperMUC W Eckhardt, A Heinecke, R Bader, M Brehm, N Hammer, H Huber, ... International Supercomputing Conference, 1-12, 2013 | 78 | 2013 |
Mixed precision training of convolutional neural networks using integer operations D Das, N Mellempudi, D Mudigere, D Kalamkar, S Avancha, K Banerjee, ... arXiv preprint arXiv:1802.00930, 2018 | 75 | 2018 |
Qsparse-local-SGD: Distributed SGD with quantization, sparsification, and local computations D Basu, D Data, C Karakus, SN Diggavi IEEE Journal on Selected Areas in Information Theory 1 (1), 217-226, 2020 | 72* | 2020 |
Sustained petascale performance of seismic simulations with SeisSol on SuperMUC A Breuer, A Heinecke, S Rettenberger, M Bader, AA Gabriel, C Pelties International Supercomputing Conference, 1-18, 2014 | 60 | 2014 |
Efficient shared-memory implementation of high-performance conjugate gradient benchmark and its application to unstructured matrices J Park, M Smelyanskiy, K Vaidyanathan, A Heinecke, DD Kalamkar, X Liu, ... SC'14: Proceedings of the International Conference for High Performance …, 2014 | 55 | 2014 |
A study of bfloat16 for deep learning training D Kalamkar, D Mudigere, N Mellempudi, D Das, K Banerjee, S Avancha, ... arXiv preprint arXiv:1905.12322, 2019 | 48 | 2019 |
Anatomy of high-performance deep learning convolutions on SIMD architectures E Georganas, S Avancha, K Banerjee, D Kalamkar, G Henry, H Pabst, ... SC18: International Conference for High Performance Computing, Networking …, 2018 | 44 | 2018 |
Parallel matrix multiplication based on space-filling curves on shared memory multicore platforms A Heinecke, M Bader Proceedings of the 2008 workshop on Memory access on future processors: a …, 2008 | 42 | 2008 |
High order seismic simulations on the Intel Xeon Phi processor (Knights Landing) A Heinecke, A Breuer, M Bader, P Dubey International Conference on High Performance Computing, 343-362, 2016 | 39 | 2016 |
Hardware-oriented implementation of cache oblivious matrix operations based on space-filling curves M Bader, R Franz, S Günther, A Heinecke International Conference on Parallel Processing and Applied Mathematics, 628-638, 2007 | 36 | 2007 |
Performance optimizations for scalable implicit RANS calculations with SU2 TD Economon, D Mudigere, G Bansal, A Heinecke, F Palacios, J Park, ... Computers & Fluids 129, 146-158, 2016 | 34 | 2016 |
Extending a highly parallel data mining algorithm to the Intel® Many Integrated Core architecture A Heinecke, M Klemm, D Pflüger, A Bode, HJ Bungartz European Conference on Parallel Processing, 375-384, 2011 | 33 | 2011 |
Cache oblivious dense and sparse matrix multiplication based on Peano curves M Bader, A Heinecke Proceedings of the PARA 8, 2008 | 32 | 2008 |
Option pricing with a direct adaptive sparse grid approach HJ Bungartz, A Heinecke, D Pflüger, S Schraufstetter Journal of Computational and Applied Mathematics 236 (15), 3741-3750, 2012 | 31 | 2012 |