Xudong Han
LibrAI & MBZUAI
Verified email at mbzuai.ac.ae
Title
Cited by
Year
Do-not-answer: A dataset for evaluating safeguards in LLMs
Y Wang, H Li, X Han, P Nakov, T Baldwin
arXiv preprint arXiv:2308.13387, 2023
Cited by: 152; Year: 2023
Jais and Jais-chat: Arabic-centric foundation and instruction-tuned open generative large language models
N Sengupta, SK Sahu, B Jia, S Katipomu, H Li, F Koto, W Marshall, ...
arXiv preprint arXiv:2308.16149, 2023
Cited by: 90; Year: 2023
Diverse adversaries for mitigating bias in training
X Han, T Baldwin, T Cohn
arXiv preprint arXiv:2101.10001, 2021
Cited by: 66; Year: 2021
Balancing out bias: Achieving fairness through balanced training
X Han, T Baldwin, T Cohn
arXiv preprint arXiv:2109.08253, 2021
Cited by: 59*; Year: 2021
Evaluating debiasing techniques for intersectional biases
S Subramanian, X Han, T Baldwin, T Cohn, L Frermann
arXiv preprint arXiv:2109.10441, 2021
Cited by: 54; Year: 2021
Contrastive learning for fair representations
A Shen, X Han, T Cohn, T Baldwin, L Frermann
arXiv preprint arXiv:2109.10645, 2021
Cited by: 30; Year: 2021
Optimising equal opportunity fairness in model training
A Shen, X Han, T Cohn, T Baldwin, L Frermann
arXiv preprint arXiv:2205.02393, 2022
Cited by: 25; Year: 2022
Fairlib: A unified framework for assessing and improving fairness
X Han, A Shen, Y Li, L Frermann, T Baldwin, T Cohn
Proceedings of the 2022 Conference on Empirical Methods in Natural Language …, 2022
Cited by: 24*; Year: 2022
Decoupling Adversarial Training for Fair NLP
X Han, T Baldwin, T Cohn
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021
Cited by: 23; Year: 2021
Learning from failure: Integrating negative examples when fine-tuning large language models as agents
R Wang, H Li, X Han, Y Zhang, T Baldwin
arXiv preprint arXiv:2402.11651, 2024
Cited by: 21; Year: 2024
Does representational fairness imply empirical fairness?
A Shen, X Han, T Cohn, T Baldwin, L Frermann
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022 …, 2022
Cited by: 21; Year: 2022
Against The Achilles' Heel: A Survey on Red Teaming for Generative Models
L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang, J Gao, Y Zhang, W Che, ...
Journal of Artificial Intelligence Research 82, 687-775, 2025
Cited by: 17; Year: 2025
Towards equal opportunity fairness through adversarial learning
X Han, T Baldwin, T Cohn
arXiv preprint arXiv:2203.06317, 2022
Cited by: 14; Year: 2022
A Chinese dataset for evaluating the safeguards in large language models
Y Wang, Z Zhai, H Li, X Han, L Lin, Z Zhang, J Zhao, P Nakov, T Baldwin
arXiv preprint arXiv:2402.12193, 2024
Cited by: 11; Year: 2024
Systematic evaluation of predictive fairness
X Han, A Shen, T Cohn, T Baldwin, L Frermann
arXiv preprint arXiv:2210.08758, 2022
Cited by: 11; Year: 2022
Fair enough: Standardizing evaluation and model selection for fairness research in NLP
X Han, T Baldwin, T Cohn
arXiv preprint arXiv:2302.05711, 2023
Cited by: 10; Year: 2023
Grounding learning of modifier dynamics: An application to color naming
X Han, P Schulz, T Cohn
arXiv preprint arXiv:1909.07586, 2019
Cited by: 6; Year: 2019
Uncertainty Estimation for Debiased Models: Does Fairness Hurt Reliability?
G Kuzmin, A Vazhentsev, A Shelmanov, X Han, S Suster, M Panov, ...
Proceedings of the 13th International Joint Conference on Natural Language …, 2023
Cited by: 5; Year: 2023
Commodity recommendation for users based on E-commerce data
F Yang, X Han, J Lang, W Lu, L Liu, L Zhang, J Pan
Proceedings of the 2nd International Conference on Big Data Research, 146-149, 2018
Cited by: 5; Year: 2018
Do-not-answer: A dataset for evaluating safeguards in LLMs, 2023
Y Wang, H Li, X Han, P Nakov, T Baldwin
URL https://arxiv.org/abs/2308.13387
Cited by: 5