Rethinking attention with performers K Choromanski, V Likhosherstov, D Dohan, X Song, A Gane, T Sarlos, ... arXiv preprint arXiv:2009.14794, 2020 | 948 | 2020 |
Masked language modeling for proteins via linearly scalable long-context transformers K Choromanski, V Likhosherstov, D Dohan, X Song, A Gane, T Sarlos, ... arXiv preprint arXiv:2006.03555, 2020 | 65 | 2020 |
Polyvit: Co-training vision transformers on images, videos and audio V Likhosherstov, A Arnab, K Choromanski, M Lucic, Y Tay, A Weller, ... arXiv preprint arXiv:2111.12993, 2021 | 44 | 2021 |
Rethinking attention with performers K Choromanski, V Likhosherstov, D Dohan, X Song, A Gane, T Sarlos, ... arXiv preprint, 2020 | 28 | 2020 |
Ode to an ODE KM Choromanski, JQ Davis, V Likhosherstov, X Song, JJ Slotine, J Varley, ... Advances in Neural Information Processing Systems 33, 3338-3350, 2020 | 22 | 2020 |
Rethinking attention with Performers K Choromanski, V Likhosherstov, D Dohan, X Song, A Gane, T Sarlos, ... arXiv preprint arXiv:2009.14794, 2020 | 19 | 2020 |
Sub-linear memory: How to make performers slim V Likhosherstov, KM Choromanski, JQ Davis, X Song, A Weller Advances in Neural Information Processing Systems 34, 6707-6719, 2021 | 14 | 2021 |
Large‐scale log analysis of digital reading P Braslavski, V Likhosherstov, V Petras, M Gäde Proceedings of the Association for Information Science and Technology 53 (1 …, 2016 | 12 | 2016 |
On the expressive power of self-attention matrices V Likhosherstov, K Choromanski, A Weller arXiv preprint arXiv:2106.03764, 2021 | 11 | 2021 |
From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers K Choromanski, H Lin, H Chen, T Zhang, A Sehanobish, V Likhosherstov, ... International Conference on Machine Learning, 3962-3983, 2022 | 10 | 2022 |
Hybrid random features K Choromanski, H Chen, H Lin, Y Ma, A Sehanobish, D Jain, MS Ryoo, ... arXiv preprint arXiv:2110.04367, 2021 | 10 | 2021 |
Stochastic flows and geometric optimization on the orthogonal group K Choromanski, D Cheikhi, J Davis, V Likhosherstov, A Nazaret, ... International Conference on Machine Learning, 1918-1928, 2020 | 6 | 2020 |
UFO-BLO: Unbiased first-order bilevel optimization V Likhosherstov, X Song, K Choromanski, J Davis, A Weller arXiv preprint arXiv:2006.03631, 2020 | 5 | 2020 |
Inference and Sampling of K33-free Ising Models V Likhosherstov, Y Maximov, M Chertkov International Conference on Machine Learning, 3963-3972, 2019 | 4 | 2019 |
Ten months of digital reading: An exploratory log study P Braslavski, V Petras, V Likhosherstov, M Gäde Research and Advanced Technology for Digital Libraries: 20th International …, 2016 | 4 | 2016 |
Chefs' Random Tables: Non-Trigonometric Random Features V Likhosherstov, KM Choromanski, KA Dubey, F Liu, T Sarlos, A Weller Advances in Neural Information Processing Systems 35, 34559-34573, 2022 | 3 | 2022 |
Debiasing a first-order heuristic for approximate bi-level optimization V Likhosherstov, X Song, K Choromanski, JQ Davis, A Weller International Conference on Machine Learning, 6621-6630, 2021 | 3 | 2021 |
CWY parametrization for scalable learning of orthogonal and Stiefel matrices V Likhosherstov, J Davis, K Choromanski, A Weller CoRR, abs/2004.08675, 2020 | 3 | 2020 |
Unlocking pixels for reinforcement learning via implicit attention KM Choromanski, D Jain, W Yu, X Song, J Parker-Holder, T Zhang, ... arXiv preprint arXiv:2102.04353, 2021 | 2 | 2021 |
Tractable minor-free generalization of planar zero-field Ising models V Likhosherstov, Y Maximov, M Chertkov Journal of Statistical Mechanics: Theory and Experiment 2020 (12), 124007, 2020 | 2 | 2020 |