The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism. Y Oyama, N Maruyama, N Dryden, E McCarthy, P Harrington, J Balewski, et al. IEEE Transactions on Parallel and Distributed Systems 32 (7), 1641-1652, 2020. Cited by 43.
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws? J Domke, E Vatai, A Drozd, P Chen, Y Oyama, L Zhang, S Salaria, et al. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …), 2021. Cited by 33.
Predicting Statistics of Asynchronous SGD Parameters for a Large-Scale Distributed Deep Learning System on GPU Supercomputers. Y Oyama, A Nomura, I Sato, H Nishimura, Y Tamatsu, S Matsuoka. 2016 IEEE International Conference on Big Data (Big Data), 66-75, 2016. Cited by 29.
Accelerating Deep Learning Frameworks with Micro-batches. Y Oyama, T Ben-Nun, T Hoefler, S Matsuoka. IEEE Cluster 2018. Cited by 26*.
Co-design Center for Exascale Machine Learning Technologies (ExaLearn). FJ Alexander, J Ang, JA Bilbrey, J Balewski, T Casey, R Chard, J Choi, et al. The International Journal of High Performance Computing Applications 35 (6 …), 2021. Cited by 12.
Prediction Apparatus, Prediction Method, and Prediction Program. H Nishimura, S Matsuoka, A Nomura, Y Oyama, I Sato. US Patent App. 15/439,304, 2018. Cited by 5.
Learning System and Learning Method. I Sato, R Fujisaki, A Nomura, Y Oyama, S Matsuoka. US Patent 11,521,057, 2022. Cited by 3.
Toward Training a Large 3D Cosmological CNN with Hybrid Parallelization. Y Oyama, N Maruyama, N Dryden, P Harrington, J Balewski, S Matsuoka, et al. Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP2019), 2019. Cited by 2.
Efficient and Large Scale Pre-training Techniques for Japanese Natural Language Processing. A Kasagi, M Asaoka, A Tabuchi, Y Oyama, T Honda, Y Sakai, T Dang, et al. 2021 Ninth International Symposium on Computing and Networking (CANDAR), 108-113, 2021. Cited by 1.
Dihydrogen. N Maruyama, BV Essen, NJ Dryden, TR Benson, TY Moon, Y Oyama. Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States), 2020. Cited by 1.
μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching. Y Oyama, T Ben-Nun, T Hoefler, S Matsuoka, 2018. Cited by 1.
Asynchronous, Data-Parallel Deep Convolutional Neural Network Training with Linear Prediction Model for Parameter Transition. I Sato, R Fujisaki, Y Oyama, A Nomura, S Matsuoka. Neural Information Processing: 24th International Conference, ICONIP 2017 …, 2017. Cited by 1.
Accelerating AlphaFold2 Inference of Protein Three-Dimensional Structure on the Supercomputer Fugaku. Y Oyama, A Tabuchi, A Tokuhisa. Proceedings of the 13th Workshop on AI and Scientific Computing at Scale …, 2023.
Accelerating Hybrid DFT Simulations Using Performance Modeling on Supercomputers. Y Oyama, T Honda, A Ishikawa, K Shirahata. 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet …, 2023.
Accelerating the Density Functional Theory Software CP2K on the Supercomputer Fugaku. Y Oyama, T Honda, K Shirahata. IPSJ SIG Technical Report (Web) 2022 (HPC-185), 2022.
Classifying Applications with Machine Learning on Memory Access Data. 土川稔生, 遠藤敏夫, 大山洋介, 野村哲弘, 近藤正章, 松岡聡. IPSJ SIG Technical Report on High Performance Computing (HPC) 2019 (12), 1-7, 2019.
The Relationship between Computation Time and Accuracy When Using Batch Normalization in Deep Learning. 八島慶汰, 大山洋介, 松岡聡. IPSJ SIG Technical Report on High Performance Computing (HPC) 2018 (1), 1-6, 2018.
Automatic Generation of Computer Traces by Machine Learning. 土川稔生, 大山洋介, 野村哲弘, 松岡聡. IPSJ SIG Technical Report on High Performance Computing (HPC) 2018 (28), 1-6, 2018.
Less is More: Accelerating Deep Neural Networks with Micro-Batching. Y Oyama, T Ben-Nun, T Hoefler, S Matsuoka. IPSJ SIG Technical Report, 2017.
Reducing Communication Volume with Low-Precision Floating-Point Numbers in Data-Parallel Deep Learning. 大山洋介, 野村哲弘, 佐藤育郎, 松岡聡. IPSJ SIG Technical Report, 2017.