Efficient inter-node MPI communication using GPUDirect RDMA for InfiniBand clusters with NVIDIA GPUs S Potluri, K Hamidouche, A Venkatesh, D Bureddy, DK Panda 2013 42nd International Conference on Parallel Processing, 80-89, 2013 | 185 | 2013 |
OMB-GPU: A micro-benchmark suite for evaluating MPI libraries on GPU clusters D Bureddy, H Wang, A Venkatesh, S Potluri, DK Panda Recent Advances in the Message Passing Interface: 19th European MPI Users …, 2012 | 70 | 2012 |
Efficient large message broadcast using NCCL and CUDA-aware MPI for deep learning AA Awan, K Hamidouche, A Venkatesh, DK Panda Proceedings of the 23rd European MPI Users' Group Meeting, 15-22, 2016 | 57 | 2016 |
MVAPICH-PRISM: A proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters S Potluri, D Bureddy, K Hamidouche, A Venkatesh, K Kandalla, ... Proceedings of the International Conference on High Performance Computing …, 2013 | 52 | 2013 |
Compute optimization mechanism for deep neural networks A Bleiweiss, A Venkatesh, G Keskin, J Gierach, O Elibol, T Bar-On, ... US Patent 12,086,705, 2024 | 50 | 2024 |
Efficient intra-node communication on Intel-MIC clusters S Potluri, A Venkatesh, D Bureddy, K Kandalla, DK Panda 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid …, 2013 | 49 | 2013 |
A case for application-oblivious energy-efficient MPI runtime A Venkatesh, A Vishnu, K Hamidouche, N Tallent, D Panda, D Kerbyson, ... Proceedings of the international conference for high performance computing …, 2015 | 45 | 2015 |
Designing MPI library with dynamic connected transport (DCT) of InfiniBand: early experiences H Subramoni, K Hamidouche, A Venkatesh, S Chakraborty, DK Panda International Supercomputing Conference, 278-295, 2014 | 40 | 2014 |
The work of Einsiedler, Katok and Lindenstrauss on the Littlewood conjecture A Venkatesh Bulletin-American Mathematical Society 45 (1), 117, 2008 | 39 | 2008 |
Designing optimized MPI broadcast and allreduce for many integrated core (MIC) InfiniBand clusters K Kandalla, A Venkatesh, K Hamidouche, S Potluri, D Bureddy, DK Panda 2013 IEEE 21st Annual Symposium on High-Performance Interconnects, 63-70, 2013 | 33 | 2013 |
Power-check: An energy-efficient checkpointing framework for HPC clusters RR Chandrasekar, A Venkatesh, K Hamidouche, DK Panda 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2015 | 32 | 2015 |
MPI-based parallel synchronous vector evaluated particle swarm optimization for multi-objective design optimization of composite structures SN Omkar, A Venkatesh, M Mudigere Engineering Applications of Artificial Intelligence 25 (8), 1611-1627, 2012 | 32 | 2012 |
CUDA kernel based collective reduction operations on large-scale GPU clusters CH Chu, K Hamidouche, A Venkatesh, AA Awan, DK Panda 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016 | 28 | 2016 |
Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ... 2015 IEEE International Conference on Cluster Computing, 78-87, 2015 | 28 | 2015 |
Evaluation of energy characteristics of MPI communication primitives with RAPL A Venkatesh, K Kandalla, DK Panda 2013 IEEE International Symposium on Parallel & Distributed Processing …, 2013 | 25 | 2013 |
UPC on MIC: Early experiences with native and symmetric modes M Luo, M Li, A Venkatesh, X Lu, DK Panda 7th International Conference on PGAS Programming Models, 198, 2013 | 23 | 2013 |
A comprehensive performance evaluation of OpenSHMEM libraries on InfiniBand clusters J Jose, J Zhang, A Venkatesh, S Potluri, DK Panda Workshop on OpenSHMEM and Related Technologies, 14-28, 2014 | 20 | 2014 |
Designing non-blocking personalized collectives with near perfect overlap for RDMA-enabled clusters H Subramoni, AA Awan, K Hamidouche, D Pekurovsky, A Venkatesh, ... High Performance Computing: 30th International Conference, ISC High …, 2015 | 19 | 2015 |
A high performance broadcast design with hardware multicast and GPUDirect RDMA for streaming applications on InfiniBand clusters A Venkatesh, H Subramoni, K Hamidouche, DK Panda 2014 21st International Conference on High Performance Computing (HiPC), 1-10, 2014 | 19 | 2014 |
MIC-Check: A distributed check pointing framework for the Intel many integrated cores architecture R Rajachandrasekar, S Potluri, A Venkatesh, K Hamidouche, ... Proceedings of the 23rd international symposium on High-performance parallel …, 2014 | 19 | 2014 |