On the opportunities and risks of foundation models R Bommasani, DA Hudson, E Adeli, R Altman, S Arora, S von Arx, ... arXiv preprint arXiv:2108.07258, 2021 | 4424 | 2021 |
WILDS: A benchmark of in-the-wild distribution shifts PW Koh, S Sagawa, H Marklund, SM Xie, M Zhang, A Balsubramani, ... arXiv preprint arXiv:2012.07421, 2021 | 1472 | 2021 |
Holistic Evaluation of Language Models P Liang, R Bommasani, T Lee, D Tsipras, D Soylu, M Yasunaga, Y Zhang, ... arXiv preprint arXiv:2211.09110, 2022 | 1217* | 2022 |
StarCoder: may the source be with you! R Li, LB Allal, Y Zi, N Muennighoff, D Kocetkov, C Mou, M Marone, C Akiki, ... arXiv preprint arXiv:2305.06161, 2023 | 857* | 2023 |
Extending the WILDS Benchmark for Unsupervised Adaptation S Sagawa, PW Koh, T Lee, I Gao, SM Xie, K Shen, A Kumar, W Hu, ... arXiv preprint arXiv:2112.05090, 2021 | 128 | 2021 |
Evaluating Human-Language Model Interaction M Lee, M Srivastava, A Hardy, J Thickstun, E Durmus, A Paranjape, ... arXiv preprint arXiv:2212.09746, 2022 | 101 | 2022 |
Holistic Evaluation of Text-to-Image Models T Lee, M Yasunaga, C Meng, Y Mai, JS Park, A Gupta, Y Zhang, ... Thirty-seventh Conference on Neural Information Processing Systems Datasets …, 2023 | 98 | 2023 |
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text E Bolton, A Venigalla, M Yasunaga, D Hall, B Xiong, T Lee, R Daneshjou, ... arXiv preprint arXiv:2403.18421, 2024 | 84* | 2024 |
Can small and synthetic benchmarks drive modeling innovation? A retrospective study of question answering modeling approaches NF Liu, T Lee, R Jia, P Liang | 29* | 2021 |
Cheaply estimating inference efficiency metrics for autoregressive transformer models D Narayanan, K Santhanam, P Henderson, R Bommasani, T Lee, ... Advances in Neural Information Processing Systems 36, 66518-66538, 2023 | 9* | 2023 |
VHELM: A Holistic Evaluation of Vision Language Models T Lee, H Tu, CH Wong, W Zheng, Y Zhou, Y Mai, JS Roberts, M Yasunaga, ... arXiv preprint arXiv:2410.07112, 2024 | 3* | 2024 |
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making M Li, S Zhao, Q Wang, K Wang, Y Zhou, S Srivastava, C Gokmen, T Lee, ... arXiv preprint arXiv:2410.07166, 2024 | 2 | 2024 |
Image2Struct: Benchmarking Structure Extraction for Vision-Language Models J Somerville Roberts, T Lee, C Heem Wong, M Yasunaga, Y Mai, P Liang arXiv e-prints, arXiv:2410.22456, 2024 | 1* | 2024 |