Learning where to learn: Gradient sparsity in meta and continual learning. J. von Oswald, D. Zhao, S. Kobayashi, S. Schug, M. Caccia, N. Zucchet, et al. Advances in Neural Information Processing Systems 34, pp. 5250–5263, 2021. Cited by 49.
Random initialisations performing above chance and how to find them. F. Benzing, S. Schug, R. Meier, J. von Oswald, Y. Akram, N. Zucchet, et al. arXiv preprint arXiv:2209.07509, 2022. Cited by 19.
A contrastive rule for meta-learning. N. Zucchet, S. Schug, J. von Oswald, D. Zhao, J. Sacramento. Advances in Neural Information Processing Systems, 2022. Cited by 19.
Beyond backpropagation: bilevel optimization through implicit differentiation and equilibrium propagation. N. Zucchet, J. Sacramento. Neural Computation 34 (12), 2022. Cited by 18.
Uncovering mesa-optimization algorithms in transformers. J. von Oswald, E. Niklasson, M. Schlegel, S. Kobayashi, N. Zucchet, et al. arXiv preprint arXiv:2309.05858, 2023. Cited by 16.
The least-control principle for local learning at equilibrium. A. Meulemans, N. Zucchet, S. Kobayashi, J. von Oswald, J. Sacramento. Advances in Neural Information Processing Systems, 2022. Cited by 15.
Online learning of long-range dependencies. N. Zucchet, R. Meier, S. Schug, A. Mujika, J. Sacramento. Advances in Neural Information Processing Systems, 2023. Cited by 6.
Gated recurrent neural networks discover attention. N. Zucchet, S. Kobayashi, Y. Akram, J. von Oswald, M. Larcher, A. Steger, et al. arXiv preprint arXiv:2309.01775, 2023. Cited by 4.
A complementary systems theory of meta-learning. S. Schug, N. Zucchet, J. von Oswald, J. Sacramento. Cosyne, 2023.