Ahmad Abdelfattah
Research Scientist, Innovative Computing Laboratory, University of Tennessee
Verified email at icl.utk.edu
Title
Cited by
Year
Performance, design, and autotuning of batched GEMM for GPUs
A Abdelfattah, A Haidar, S Tomov, J Dongarra
International Conference on High Performance Computing, 21-38, 2016
Cited by: 72; Year: 2016
High-performance matrix-matrix multiplications of very small matrices
I Masliah, A Abdelfattah, A Haidar, S Tomov, M Baboulin, J Falcou, ...
European Conference on Parallel Processing, 659-671, 2016
Cited by: 40; Year: 2016
High-performance tensor contractions for GPUs
A Abdelfattah, M Baboulin, V Dobrev, J Dongarra, C Earl, J Falcou, ...
Procedia Computer Science 80, 108-118, 2016
Cited by: 40; Year: 2016
KBLAS: An optimized library for dense matrix-vector multiplication on GPU accelerators
A Abdelfattah, D Keyes, H Ltaief
ACM Transactions on Mathematical Software (TOMS) 42 (3), 1-31, 2016
Cited by: 35; Year: 2016
With extreme computing, the rules have changed
J Dongarra, S Tomov, P Luszczek, J Kurzak, M Gates, I Yamazaki, H Anzt, ...
Computing in Science & Engineering 19 (3), 52-62, 2017
Cited by: 29; Year: 2017
Parallel programming models for dense linear algebra on heterogeneous systems
J Dongarra, M Abalenkovs, A Abdelfattah, M Gates, A Haidar, J Kurzak, ...
Supercomputing frontiers and innovations 2 (4), 67-86, 2016
Cited by: 29; Year: 2016
A novel fast and accurate pseudo-analytical simulation approach for MOAO
É Gendron, A Charara, A Abdelfattah, D Gratadour, D Keyes, H Ltaief, ...
Adaptive Optics Systems IV 9148, 91486L, 2014
Cited by: 28; Year: 2014
The design of fast and energy-efficient linear solvers: On the potential of half-precision arithmetic and iterative refinement techniques
A Haidar, A Abdelfattah, M Zounon, P Wu, S Pranesh, S Tomov, ...
International Conference on Computational Science, 586-600, 2018
Cited by: 23; Year: 2018
C++ API for BLAS and LAPACK
M Gates, P Luszczek, A Abdelfattah, J Kurzak, J Dongarra, K Arturov, ...
Technical Report 2, ICL-UT-17-03, 2017
Cited by: 18*; Year: 2017
Pipelining computational stages of the tomographic reconstructor for multi-object adaptive optics on a multi-GPU system
A Charara, H Ltaief, D Gratadour, D Keyes, A Sevin, A Abdelfattah, ...
SC'14: Proceedings of the International Conference for High Performance …, 2014
Cited by: 17; Year: 2014
Optimizing memory-bound SYMV kernel on GPU hardware accelerators
A Abdelfattah, J Dongarra, D Keyes, H Ltaief
International Conference on High Performance Computing for Computational …, 2012
Cited by: 17; Year: 2012
Fast Cholesky factorization on GPUs for batch and native modes in MAGMA
A Abdelfattah, A Haidar, S Tomov, J Dongarra
Journal of Computational Science 20, 85-93, 2017
Cited by: 15; Year: 2017
Systematic approach in optimizing numerical memory-bound kernels on GPU
A Abdelfattah, D Keyes, H Ltaief
European Conference on Parallel Processing, 207-216, 2012
Cited by: 15; Year: 2012
A guide for achieving high performance with very small matrices on GPU: a case study of batched LU and Cholesky factorizations
A Haidar, A Abdelfattah, M Zounon, S Tomov, J Dongarra
IEEE Transactions on Parallel and Distributed Systems 29 (5), 973-984, 2017
Cited by: 13; Year: 2017
Performance tuning and optimization techniques of fixed and variable size batched Cholesky factorization on GPUs
A Abdelfattah, A Haidar, S Tomov, J Dongarra
Procedia Computer Science 80, 119-130, 2016
Cited by: 13; Year: 2016
Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs
A Abdelfattah, A Haidar, S Tomov, J Dongarra
Proceedings of the International Conference on Supercomputing, 1-10, 2017
Cited by: 11; Year: 2017
On the development of variable size batched computation for heterogeneous parallel architectures
A Abdelfattah, A Haidar, S Tomov, J Dongarra
2016 IEEE International Parallel and Distributed Processing Symposium …, 2016
Cited by: 10; Year: 2016
Fast Batched Matrix Multiplication for Small Sizes using Half-Precision Arithmetic on GPUs
A Abdelfattah, S Tomov, J Dongarra
2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2019
Cited by: 9; Year: 2019
Roadmap for the development of a linear algebra library for exascale computing: SLATE: Software for linear algebra targeting exascale
A Abdelfattah, H Anzt, A Bouteiller, A Danalis, J Dongarra, M Gates, ...
SLATE Working Note 1, Innovative Computing Laboratory, University of Tennessee, 2017
Cited by: 9; Year: 2017
Performance optimization of Sparse Matrix‐Vector Multiplication for multi‐component PDE‐based applications using GPUs
A Abdelfattah, H Ltaief, D Keyes, J Dongarra
Concurrency and Computation: Practice and Experience 28 (12), 3447-3465, 2016
Cited by: 9; Year: 2016