AI alignment: A comprehensive survey J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang, Y Duan, Z He, J Zhou, ... arXiv preprint arXiv:2310.19852, 2023 | 181 | 2023 |
Aligner: Achieving efficient alignment through weak-to-strong correction J Ji, B Chen, H Lou, D Hong, B Zhang, X Pan, J Dai, Y Yang arXiv preprint arXiv:2402.02416, 2024 | 41 | 2024 |
Aligner: Efficient alignment by learning to correct J Ji, B Chen, H Lou, D Hong, B Zhang, X Pan, T Qiu, J Dai, Y Yang The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024 | 2 | 2024 |
Language Models Resist Alignment J Ji, K Wang, T Qiu, B Chen, J Zhou, C Li, H Lou, Y Yang arXiv preprint arXiv:2406.06144, 2024 | 1 | 2024 |