Nathan DeBardeleben
Title
Cited by
Year
Addressing failures in exascale computing
M Snir, RW Wisniewski, JA Abraham, SV Adve, S Bagchi, P Balaji, J Belak, ...
The International Journal of High Performance Computing Applications 28 (2 …, 2014
Cited by: 349; Year: 2014
Memory errors in modern systems: The good, the bad, and the ugly
V Sridharan, N DeBardeleben, S Blanchard, KB Ferreira, J Stearley, ...
ACM SIGARCH Computer Architecture News 43 (1), 297-310, 2015
Cited by: 230; Year: 2015
Feng shui of supercomputer memory: positional effects in DRAM and SRAM faults
V Sridharan, J Stearley, N DeBardeleben, S Blanchard, S Gurumurthi
SC'13: Proceedings of the International Conference on High Performance …, 2013
Cited by: 168; Year: 2013
Understanding GPU errors on large-scale HPC systems and the implications for system design and operation
D Tiwari, S Gupta, J Rogers, D Maxwell, P Rech, S Vazhkudai, D Oliveira, ...
2015 IEEE 21st International Symposium on High Performance Computer …, 2015
Cited by: 114; Year: 2015
High-end computing resilience: Analysis of issues facing the HEC community and path-forward for research and development
N DeBardeleben, J Laros, JT Daly, SL Scott, C Engelmann, B Harrod
Whitepaper, Dec, 2009
Cited by: 72; Year: 2009
GPGPUs: How to Combine High Computational Power with High Reliability
LB Gomez, F Cappello, L Carro, N DeBardeleben, B Fang, S Gurumurthi, ...
Cited by: 63; Year: 2014
F-SEFI: A Fine-Grained Soft Error Fault Injection Tool for Profiling Application Vulnerability
Q Guan, N DeBardeleben, S Blanchard, S Fu
Proceedings of the 2014 IEEE 28th International Parallel and Distributed …, 2014
Cited by: 60; Year: 2014
On the diversity of cluster workloads and its impact on research results
G Amvrosiadis, JW Park, GR Ganger, GA Gibson, E Baseman, ...
2018 USENIX Annual Technical Conference (USENIX ATC 18), 533-546, 2018
Cited by: 51; Year: 2018
Impact of sub-optimal checkpoint intervals on application efficiency in computational clusters
WM Jones, JT Daly, N DeBardeleben
Proceedings of the 19th ACM International Symposium on High Performance …, 2010
Cited by: 43; Year: 2010
Application monitoring and checkpointing in HPC: looking towards exascale systems
WM Jones, JT Daly, N DeBardeleben
Proceedings of the 50th Annual Southeast Regional Conference, 262-267, 2012
Cited by: 34; Year: 2012
Inter-agency workshop on HPC resilience at extreme scale
J Daly, B Harrod, T Hoang, L Nowell, B Adolf, S Borkar, N DeBardeleben, ...
National Security Agency Advanced Computing Systems, 2012
Cited by: 33; Year: 2012
Developing scientific applications using Eclipse
GR Watson, NA DeBardeleben
Computing in Science & Engineering 8 (4), 50-61, 2006
Cited by: 32; Year: 2006
Experimental framework for injecting logic errors in a virtual machine to profile applications for soft error resilience
N DeBardeleben, S Blanchard, Q Guan, Z Zhang, S Fu
European Conference on Parallel Processing, 282-291, 2011
Cited by: 31; Year: 2011
GPU behavior on a large HPC cluster
N DeBardeleben, S Blanchard, L Monroe, P Romero, D Grunau, C Idler, ...
European Conference on Parallel Processing, 680-689, 2013
Cited by: 27; Year: 2013
Towards practical algorithm based fault tolerance in dense linear algebra
P Wu, Q Guan, N DeBardeleben, S Blanchard, D Tao, X Liang, J Chen, ...
Proceedings of the 25th ACM International Symposium on High-Performance …, 2016
Cited by: 25; Year: 2016
Experimental and analytical study of Xeon Phi reliability
D Oliveira, L Pilla, N DeBardeleben, S Blanchard, H Quinn, I Koren, ...
Proceedings of the International Conference for High Performance Computing …, 2017
Cited by: 24; Year: 2017
Silent data corruption resilient two-sided matrix factorizations
P Wu, N DeBardeleben, Q Guan, S Blanchard, J Chen, D Tao, X Liang, ...
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of …, 2017
Cited by: 19; Year: 2017
Interpretable anomaly detection for monitoring of high performance computing systems
E Baseman, S Blanchard, N DeBardeleben, A Bonnie, A Morrow
Outlier Definition, Detection, and Description on Demand Workshop at ACM …, 2016
Cited by: 18; Year: 2016
Exploring time and frequency domains for accurate and automated anomaly detection in cloud computing systems
Q Guan, S Fu, N DeBardeleben, S Blanchard
2013 IEEE 19th Pacific Rim International Symposium on Dependable Computing …, 2013
Cited by: 18; Year: 2013
An investigation of the effects of hard and soft errors on graphics processing unit‐accelerated molecular dynamics simulations
RM Betz, NA DeBardeleben, RC Walker
Concurrency and Computation: Practice and Experience 26 (13), 2134-2140, 2014
Cited by: 16; Year: 2014