Lauro Langosco
Verified email at cam.ac.uk
Title
Cited by
Year
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217, 2023
Cited by: 379; Year: 2023
Goal Misgeneralization in Deep Reinforcement Learning
L Langosco, J Koch, L Sharkey, J Pfau, L Orseau, D Krueger
ICML 2022, 9, 2022
Cited by: 119; Year: 2022
Harms from Increasingly Agentic Algorithmic Systems
A Chan, R Salganik, A Markelius, C Pang, N Rajkumar, D Krasheninnikov, ...
Proceedings of the 2023 ACM Conference on Fairness, Accountability, and …, 2023
Cited by: 87*; Year: 2023
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ...
arXiv preprint arXiv:2404.09932, 2024
Cited by: 84; Year: 2024
Unifying Grokking and Double Descent
X Davies, L Langosco, D Krueger
ML Safety Workshop NeurIPS 2022, 2023
Cited by: 33; Year: 2023
Neural Variational Gradient Descent
L Langosco di Langosco, V Fortuin, H Strathmann
ICML Workshop on Uncertainty & Robustness in Deep Learning, 2021
Cited by: 20*; Year: 2021
Detecting Backdoors with Meta-Models
L Langosco, N Alex, W Baker, D Quarel, H Bradley, D Krueger
NeurIPS 2023 Workshop on Backdoors in Deep Learning-The Good, the Bad, and …, 2023
Cited by: 3; Year: 2023
Training Equilibria in Reinforcement Learning
L Langosco, D Krueger, A Gleave
Deep Reinforcement Learning Workshop NeurIPS 2022, 2022
Year: 2022