| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark | H Sun, G Xu, J Deng, J Cheng, C Zheng, H Zhou, N Peng, X Zhu, ... | arXiv preprint arXiv:2110.08466, 2021 | 53 | 2021 |
| Safety Assessment of Chinese Large Language Models | H Sun, Z Zhang, J Deng, J Cheng, M Huang | arXiv preprint arXiv:2304.10436, 2023 | 51 | 2023 |
| Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements | J Deng, J Cheng, H Sun, Z Zhang, M Huang | arXiv preprint arXiv:2302.09270, 2023 | 26* | 2023 |
| AlignBench: Benchmarking Chinese Alignment of Large Language Models | X Liu, X Lei, S Wang, Y Huang, Z Feng, B Wen, J Cheng, P Ke, Y Xu, ... | arXiv preprint arXiv:2311.18743, 2023 | 11 | 2023 |
| CritiqueLLM: Scaling LLM-as-Critic for Effective and Explainable Evaluation of Large Language Model Generation | P Ke, B Wen, Z Feng, X Liu, X Lei, J Cheng, S Wang, A Zeng, Y Dong, ... | arXiv preprint arXiv:2311.18702, 2023 | 11 | 2023 |
| Black-Box Prompt Optimization: Aligning Large Language Models without Model Training | J Cheng, X Liu, K Zheng, P Ke, H Wang, Y Dong, J Tang, M Huang | arXiv preprint arXiv:2311.04155, 2023 | 11 | 2023 |
| PAL: Persona-Augmented Emotional Support Conversation Generation | J Cheng, S Sabour, H Sun, Z Chen, M Huang | arXiv preprint arXiv:2212.09235, 2022 | 11 | 2022 |
| Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation | Z Zhang, J Cheng, H Sun, J Deng, F Mi, Y Wang, L Shang, M Huang | arXiv preprint arXiv:2212.01810, 2022 | 6 | 2022 |
| InstructSafety: A Unified Framework for Building Multidimensional and Explainable Safety Detector through Instruction Tuning | Z Zhang, J Cheng, H Sun, J Deng, M Huang | Findings of the Association for Computational Linguistics: EMNLP 2023, 10421 …, 2023 | 2 | 2023 |