Tighter problem-dependent regret bounds in reinforcement learning without domain knowledge using value function bounds A Zanette, E Brunskill International Conference on Machine Learning, 7304-7312, 2019 | 306 | 2019 |
Learning near optimal policies with low inherent Bellman error A Zanette, A Lazaric, M Kochenderfer, E Brunskill International Conference on Machine Learning, 10978-10989, 2020 | 240 | 2020 |
Frequentist regret bounds for randomized least-squares value iteration A Zanette, D Brandfonbrener, E Brunskill, M Pirotta, A Lazaric International Conference on Artificial Intelligence and Statistics, 1954-1964, 2020 | 150 | 2020 |
Provable benefits of actor-critic methods for offline reinforcement learning A Zanette, MJ Wainwright, E Brunskill Advances in neural information processing systems 34, 13626-13640, 2021 | 127 | 2021 |
Exponential lower bounds for batch reinforcement learning: Batch RL can be exponentially harder than online RL A Zanette International Conference on Machine Learning, 12287-12297, 2021 | 88 | 2021 |
Provably efficient reward-agnostic navigation with linear value iteration A Zanette, A Lazaric, MJ Kochenderfer, E Brunskill Advances in Neural Information Processing Systems 33, 11756-11766, 2020 | 61 | 2020 |
Cautiously optimistic policy optimization and exploration with linear function approximation A Zanette, CA Cheng, A Agarwal Conference on Learning Theory, 4473-4525, 2021 | 59 | 2021 |
Almost horizon-free structure-aware best policy identification with a generative model A Zanette, MJ Kochenderfer, E Brunskill Advances in Neural Information Processing Systems 32, 2019 | 41 | 2019 |
Limiting extrapolation in linear approximate value iteration A Zanette, A Lazaric, MJ Kochenderfer, E Brunskill Advances in Neural Information Processing Systems 32, 2019 | 39 | 2019 |
Robust super-level set estimation using Gaussian processes A Zanette, J Zhang, MJ Kochenderfer Machine Learning and Knowledge Discovery in Databases: European Conference …, 2019 | 38 | 2019 |
Design of experiments for stochastic contextual linear bandits A Zanette, K Dong, JN Lee, E Brunskill Advances in Neural Information Processing Systems 34, 22720-22731, 2021 | 24 | 2021 |
Problem dependent reinforcement learning bounds which can identify bandit structure in MDPs A Zanette, E Brunskill International Conference on Machine Learning, 5747-5755, 2018 | 23 | 2018 |
When is realizability sufficient for off-policy reinforcement learning? A Zanette International Conference on Machine Learning, 40637-40668, 2023 | 16 | 2023 |
ArCHer: Training language model agents via hierarchical multi-turn RL Y Zhou, A Zanette, J Pan, S Levine, A Kumar arXiv preprint arXiv:2402.19446, 2024 | 12 | 2024 |
Bellman residual orthogonalization for offline reinforcement learning A Zanette, MJ Wainwright Advances in Neural Information Processing Systems 35, 3137-3151, 2022 | 10 | 2022 |
Information directed reinforcement learning A Zanette, R Sarkar Technical report, 2017 | 7 | 2017 |
Stabilizing Q-learning with linear architectures for provably efficient learning A Zanette, M Wainwright International Conference on Machine Learning, 25920-25954, 2022 | 6 | 2022 |
Policy finetuning in reinforcement learning via design of experiments using offline data R Zhang, A Zanette Advances in Neural Information Processing Systems 36, 2024 | 4 | 2024 |
Enriching the finite element method with meshfree particles in structural mechanics A Zanette, M Ferronato, C Janna International Journal for Numerical Methods in Engineering 110 (7), 675-700, 2017 | 2 | 2017 |
Accelerating Best-of-N via Speculative Rejection R Zhang, M Haider, M Yin, J Qiu, M Wang, P Bartlett, A Zanette 2nd Workshop on Advancing Neural Network Training: Computational Efficiency …, 0 | 1 | |