An in-network architecture for accelerating shared-memory multiprocessor collectives B Klenk, N Jiang, G Thorson, L Dennison 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture …, 2020 | 63 | 2020 |
An Overview of MPI Characteristics of Exascale Proxy Applications B Klenk, H Fröning International Supercomputing Conference, 217-236, 2017 | 55 | 2017 |
Energy-efficient collective reduce and allreduce operations on distributed GPUs L Oden, B Klenk, H Fröning Cluster, Cloud and Grid Computing (CCGrid), 2014 14th IEEE/ACM International …, 2014 | 30 | 2014 |
Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors B Klenk, H Fröning, H Eberle, L Dennison IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2017 | 28 | 2017 |
Analyzing put/get APIs for thread-collaborative processors B Klenk, L Oden, H Fröning 2014 43rd International Conference on Parallel Processing Workshops, 411-418, 2014 | 19 | 2014 |
Analyzing communication models for distributed thread-collaborative processors in terms of energy and time B Klenk, L Oden, H Fröning 2015 IEEE International Symposium on Performance Analysis of Systems and …, 2015 | 18 | 2015 |
TeraRack: A Tbps Rack for Machine Learning Training M Khani, M Ghobadi, M Alizadeh, Z Zhu, M Glick, K Bergman, A Vahdat, ... | 8* | |
Energy-efficient stencil computations on distributed GPUs using dynamic parallelism and GPU-controlled communication L Oden, B Klenk, H Fröning Energy Efficient Supercomputing Workshop (E2SC), 2014, 31-40, 2014 | 6 | 2014 |
Why Data Science and Machine Learning Need Silicon Photonics B Klenk, L Dennison Optical Fiber Communication Conference, M4F.6, 2020 | 5 | 2020 |
GPU-centric communication for improved efficiency B Klenk, L Oden, H Fröning International Workshop on Green Programming, Computing and Data Processing …, 2014 | 4 | 2014 |
Energy-efficient Computing on Distributed GPUs using Dynamic Parallelism and GPU controlled Communication L Oden, B Klenk, H Fröning 2nd Workshop on Energy-efficient SuperComputing (E2SC), 2014 | 2 | 2014 |
Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy L Oden, B Klenk, H Fröning Parallel Computing 57, 125-134, 2016 | 1 | 2016 |
DISTRIBUTED BATCH NORMALIZATION USING ESTIMATES AND ROLLBACK LR Dennison, B Klenk US Patent App. 16/669,979, 2020 | | 2020 |
DISTRIBUTED BATCH NORMALIZATION USING PARTIAL POPULATIONS LR Dennison, B Klenk US Patent App. 16/669,925, 2020 | | 2020 |
Communication Architectures for Scalable GPU-centric Computing Systems B Klenk | | 2018 |
Scalable Communication Architectures for GPU-centric Systems B Klenk, H Fröning | | |
Energy-efficient Distributed GPU Communication B Klenk, L Oden, H Fröning | | |
Future Memory Technologies B Klenk | | |