Khaled Hamidouche
AMD Research
Verified email at amd.com
Title / Cited by / Year
S-Caffe: Co-designing MPI runtimes and Caffe for scalable deep learning on modern GPU clusters
AA Awan, K Hamidouche, JM Hashmi, DK Panda
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of …, 2017
Cited by: 160; Year: 2017
Efficient inter-node MPI communication using GPUDirect RDMA for InfiniBand clusters with NVIDIA GPUs
S Potluri, K Hamidouche, A Venkatesh, D Bureddy, DK Panda
2013 42nd International Conference on Parallel Processing, 80-89, 2013
Cited by: 154; Year: 2013
MVAPICH-PRISM: A proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters
S Potluri, D Bureddy, K Hamidouche, A Venkatesh, K Kandalla, ...
Proceedings of the International Conference on High Performance Computing …, 2013
Cited by: 53; Year: 2013
Efficient large message broadcast using NCCL and CUDA-aware MPI for deep learning
AA Awan, K Hamidouche, A Venkatesh, DK Panda
Proceedings of the 23rd European MPI Users' Group Meeting, 15-22, 2016
Cited by: 47; Year: 2016
Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters
R Shi, S Potluri, K Hamidouche, J Perkins, M Li, D Rossetti, DK Panda
2014 21st International Conference on High Performance Computing (HiPC), 1-10, 2014
Cited by: 45; Year: 2014
A case for application-oblivious energy-efficient MPI runtime
A Venkatesh, A Vishnu, K Hamidouche, N Tallent, D Panda, D Kerbyson, ...
Proceedings of the International Conference for High Performance Computing …, 2015
Cited by: 41; Year: 2015
Designing MPI library with dynamic connected transport (DCT) of InfiniBand: early experiences
H Subramoni, K Hamidouche, A Venkatesh, S Chakraborty, DK Panda
Supercomputing: 29th International Conference, ISC 2014, Leipzig, Germany …, 2014
Cited by: 37; Year: 2014
HAND: A hybrid approach to accelerate non-contiguous data movement using MPI datatypes on GPU clusters
R Shi, X Lu, S Potluri, K Hamidouche, J Zhang, DK Panda
2014 43rd International Conference on Parallel Processing, 221-230, 2014
Cited by: 32; Year: 2014
Designing optimized MPI broadcast and allreduce for Many Integrated Core (MIC) InfiniBand clusters
K Kandalla, A Venkatesh, K Hamidouche, S Potluri, D Bureddy, DK Panda
2013 IEEE 21st Annual Symposium on High-Performance Interconnects, 63-70, 2013
Cited by: 32; Year: 2013
Power-check: An energy-efficient checkpointing framework for HPC clusters
RR Chandrasekar, A Venkatesh, K Hamidouche, DK Panda
2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2015
Cited by: 28; Year: 2015
A scalable and portable approach to accelerate hybrid HPL on heterogeneous CPU-GPU clusters
R Shi, S Potluri, K Hamidouche, X Lu, K Tomko, DK Panda
2013 IEEE International Conference on Cluster Computing (CLUSTER), 1-8, 2013
Cited by: 27; Year: 2013
Parallel Smith-Waterman comparison on multicore and manycore computing platforms with BSP++
K Hamidouche, FM Mendonca, J Falcou, ACMA de Melo, D Etiemble
International Journal of Parallel Programming 41, 111-136, 2013
Cited by: 27; Year: 2013
CUDA kernel based collective reduction operations on large-scale GPU clusters
CH Chu, K Hamidouche, A Venkatesh, AA Awan, DK Panda
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016
Cited by: 24; Year: 2016
A framework for an automatic hybrid MPI+OpenMP code generation
K Hamidouche, J Falcou, D Etiemble
SpringSim (hpc), 48-55, 2011
Cited by: 24; Year: 2011
Re-designing CNTK deep learning framework on modern GPU enabled clusters
DS Banerjee, K Hamidouche, DK Panda
2016 IEEE international conference on cloud computing technology and science …, 2016
Cited by: 23; Year: 2016
Scalable Graph500 design with MPI-3 RMA
M Li, X Lu, S Potluri, K Hamidouche, J Jose, K Tomko, DK Panda
2014 IEEE International Conference on Cluster Computing (CLUSTER), 230-238, 2014
Cited by: 23; Year: 2014
Hybrid bulk synchronous parallelism library for clustered SMP architectures
K Hamidouche, J Falcou, D Etiemble
Proceedings of the fourth international workshop on High-level parallel …, 2010
Cited by: 22; Year: 2010
Designing MPI library with On-Demand Paging (ODP) of InfiniBand: challenges and benefits
M Li, K Hamidouche, X Lu, H Subramoni, J Zhang, DK Panda
SC'16: Proceedings of the International Conference for High Performance …, 2016
Cited by: 20; Year: 2016
Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters
K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ...
2015 IEEE International Conference on Cluster Computing, 78-87, 2015
Cited by: 19; Year: 2015
Designing scalable out-of-core sorting with hybrid MPI+PGAS programming models
J Jose, S Potluri, H Subramoni, X Lu, K Hamidouche, K Schulz, H Sundar, ...
Proceedings of the 8th International Conference on Partitioned Global …, 2014
Cited by: 19; Year: 2014