Hannah Rose Kirk
Title
Cited by
Year
Bias out-of-the-box: An empirical analysis of intersectional occupational biases in popular generative language models
HR Kirk, Y Jun, F Volpin, H Iqbal, E Benussi, F Dreyer, A Shtedritski, ...
Advances in neural information processing systems 34, 2611-2624, 2021
Cited by: 123; Year: 2021
Auditing large language models: a three-layered approach
J Mökander, J Schuett, HR Kirk, L Floridi
AI and Ethics, 1-31, 2023
Cited by: 105; Year: 2023
DataPerf: Benchmarks for data-centric AI development
M Mazumder, C Banbury, X Yao, B Karlaš, W Gaviria Rojas, S Diamos, ...
Advances in Neural Information Processing Systems 36, 2024
Cited by: 70; Year: 2024
SemEval-2023 Task 10: Explainable detection of online sexism
HR Kirk, W Yin, B Vidgen, P Röttger
arXiv preprint arXiv:2303.04222, 2023
Cited by: 70; Year: 2023
A prompt array keeps the bias away: Debiasing vision-language models with adversarial learning
H Berg, SM Hall, Y Bhalgat, W Yang, HR Kirk, A Shtedritski, M Bain
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the …, 2022
Cited by: 62; Year: 2022
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale
Proceedings of the 2022 Conference of the North American Chapter of the …, 2021
Cited by: 42; Year: 2021
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
HR Kirk, B Vidgen, P Röttger, SA Hale
arXiv preprint arXiv:2303.05453, 2023
Cited by: 40; Year: 2023
Handling and Presenting Harmful Text in NLP
HR Kirk, A Birhane, B Vidgen, L Derczynski
EMNLP Findings, 2022
Cited by: 27*; Year: 2022
Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements
C Borchers, DS Gala, B Gilburt, E Oravkin, W Bounsi, YM Asano, HR Kirk
Proceedings of the 4th workshop on gender bias in natural language …, 2022
Cited by: 24; Year: 2022
Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset
HR Kirk, Y Jun, P Rauba, G Wachtel, R Li, X Bai, N Broestl, M Doff-Sotta, ...
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), 2021
Cited by: 23; Year: 2021
XSTest: A test suite for identifying exaggerated safety behaviours in large language models
P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy
arXiv preprint arXiv:2308.01263, 2023
Cited by: 21; Year: 2023
Assessing language model deployment with risk cards
L Derczynski, HR Kirk, V Balachandran, S Kumar, Y Tsvetkov, MR Leiser, ...
arXiv preprint arXiv:2303.18190, 2023
Cited by: 15; Year: 2023
The nuances of Confucianism in technology policy: An inquiry into the interaction between cultural and political systems in Chinese digital ethics
HR Kirk, K Lee, C Micallef
International Journal of Politics, Culture, and Society, 1-24, 2020
Cited by: 10; Year: 2020
Casteist but not racist? Quantifying disparities in large language model bias between India and the West
K Khandelwal, M Tonneau, AM Bean, HR Kirk, SA Hale
arXiv preprint arXiv:2309.08573, 2023
Cited by: 9; Year: 2023
Balancing the picture: Debiasing vision-language datasets with synthetic contrast sets
B Smith, M Farinha, SM Hall, HR Kirk, A Shtedritski, M Bain
arXiv preprint arXiv:2305.15407, 2023
Cited by: 8; Year: 2023
Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning
HR Kirk, B Vidgen, SA Hale
Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying …, 2022
Cited by: 6; Year: 2022
The past, present and better future of feedback learning in large language models for subjective human preferences and values
HR Kirk, AM Bean, B Vidgen, P Röttger, SA Hale
arXiv preprint arXiv:2310.07629, 2023
Cited by: 5; Year: 2023
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution
SM Hall, F Gonçalves Abrantes, H Zhu, G Sodunke, A Shtedritski, HR Kirk
Advances in Neural Information Processing Systems 36, 2024
Cited by: 3; Year: 2024
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models
B Vidgen, HR Kirk, R Qian, N Scherrer, A Kannappan, SA Hale, P Röttger
arXiv preprint arXiv:2311.08370, 2023
Cited by: 3; Year: 2023
Adversarial Nibbler: A data-centric challenge for improving the safety of text-to-image models
A Parrish, HR Kirk, J Quaye, C Rastogi, M Bartolo, O Inel, J Ciro, ...
arXiv preprint arXiv:2305.14384, 2023
Cited by: 3; Year: 2023