Kshitij Sachan
Member of Technical Staff, Redwood Research
Verified email at rdwrs.com
Title
Cited by
Year
Polysemanticity and capacity in neural networks
A Scherlis, K Sachan, AS Jermyn, J Benton, B Shlegeris
arXiv preprint arXiv:2210.01892, 2022
Cited by: 13
Year: 2022
Sleeper agents: Training deceptive LLMs that persist through safety training
E Hubinger, C Denison, J Mu, M Lambert, M Tong, M MacDiarmid, ...
arXiv preprint arXiv:2401.05566, 2024
Cited by: 9
Year: 2024
AI control: Improving safety despite intentional subversion
R Greenblatt, B Shlegeris, K Sachan, F Roger
arXiv preprint arXiv:2312.06942, 2023
Cited by: 5
Year: 2023
Debating with More Persuasive LLMs Leads to More Truthful Answers
A Khan, J Hughes, D Valentine, L Ruis, K Sachan, A Radhakrishnan, ...
arXiv preprint arXiv:2402.06782, 2024
Cited by: 3
Year: 2024