Akshay Venkatesh
NVIDIA; Ohio State University
Efficient inter-node MPI communication using GPUDirect RDMA for InfiniBand clusters with NVIDIA GPUs
S Potluri, K Hamidouche, A Venkatesh, D Bureddy, DK Panda
2013 42nd International Conference on Parallel Processing, 80-89, 2013
OMB-GPU: A micro-benchmark suite for evaluating MPI libraries on GPU clusters
D Bureddy, H Wang, A Venkatesh, S Potluri, DK Panda
Recent Advances in the Message Passing Interface: 19th European MPI Users …, 2012
Efficient large message broadcast using NCCL and CUDA-aware MPI for deep learning
AA Awan, K Hamidouche, A Venkatesh, DK Panda
Proceedings of the 23rd European MPI Users' Group Meeting, 15-22, 2016
MVAPICH-PRISM: A proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters
S Potluri, D Bureddy, K Hamidouche, A Venkatesh, K Kandalla, ...
Proceedings of the International Conference on High Performance Computing …, 2013
Efficient intra-node communication on Intel-MIC clusters
S Potluri, A Venkatesh, D Bureddy, K Kandalla, DK Panda
2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid …, 2013
A case for application-oblivious energy-efficient MPI runtime
A Venkatesh, A Vishnu, K Hamidouche, N Tallent, D Panda, D Kerbyson, ...
Proceedings of the international conference for high performance computing …, 2015
Compute optimization mechanism for deep neural networks
A Bleiweiss, A Venkatesh, G Keskin, J Gierach, O Elibol, T Bar-On, ...
US Patent App. 15/858,014, 2019
Designing MPI library with dynamic connected transport (DCT) of InfiniBand: early experiences
H Subramoni, K Hamidouche, A Venkatesh, S Chakraborty, DK Panda
International Supercomputing Conference, 278-295, 2014
The work of Einsiedler, Katok and Lindenstrauss on the Littlewood conjecture
A Venkatesh
Bulletin of the American Mathematical Society 45 (1), 117-134, 2008
Designing optimized MPI broadcast and allreduce for Many Integrated Core (MIC) InfiniBand clusters
K Kandalla, A Venkatesh, K Hamidouche, S Potluri, D Bureddy, DK Panda
2013 IEEE 21st Annual Symposium on High-Performance Interconnects, 63-70, 2013
MPI-based parallel synchronous vector evaluated particle swarm optimization for multi-objective design optimization of composite structures
SN Omkar, A Venkatesh, M Mudigere
Engineering Applications of Artificial Intelligence 25 (8), 1611-1627, 2012
Power-check: An energy-efficient checkpointing framework for HPC clusters
RR Chandrasekar, A Venkatesh, K Hamidouche, DK Panda
2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2015
CUDA kernel based collective reduction operations on large-scale GPU clusters
CH Chu, K Hamidouche, A Venkatesh, AA Awan, DK Panda
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016
Evaluation of energy characteristics of MPI communication primitives with RAPL
A Venkatesh, K Kandalla, DK Panda
2013 IEEE International Symposium on Parallel & Distributed Processing …, 2013
Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters
K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ...
2015 IEEE International Conference on Cluster Computing, 78-87, 2015
UPC on MIC: Early experiences with native and symmetric modes
M Luo, M Li, A Venkatesh, X Lu, DK Panda
7th International Conference on PGAS Programming Models, 198, 2013
A comprehensive performance evaluation of OpenSHMEM libraries on InfiniBand clusters
J Jose, J Zhang, A Venkatesh, S Potluri, DK Panda
Workshop on OpenSHMEM and Related Technologies, 14-28, 2014
Designing non-blocking personalized collectives with near perfect overlap for RDMA-enabled clusters
H Subramoni, AA Awan, K Hamidouche, D Pekurovsky, A Venkatesh, ...
High Performance Computing: 30th International Conference, ISC High …, 2015
A high performance broadcast design with hardware multicast and GPUDirect RDMA for streaming applications on InfiniBand clusters
A Venkatesh, H Subramoni, K Hamidouche, DK Panda
2014 21st International Conference on High Performance Computing (HiPC), 1-10, 2014
MIC-Check: A distributed check pointing framework for the Intel many integrated cores architecture
R Rajachandrasekar, S Potluri, A Venkatesh, K Hamidouche, ...
Proceedings of the 23rd international symposium on High-performance parallel …, 2014