Title, authors, venue | Cited by | Year |
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... arXiv preprint arXiv:2206.04615, 2022 | 754 | 2022 |
BBQ: A hand-built bias benchmark for question answering A Parrish, A Chen, N Nangia, V Padmakumar, J Phang, J Thompson, ... arXiv preprint arXiv:2110.08193, 2021 | 131 | 2021 |
Seasonal dynamics of bacterial meningitis: a time-series analysis J Paireau, A Chen, H Broutin, B Grenfell, NE Basta The Lancet Global Health 4 (6), e370-e377, 2016 | 98 | 2016 |
Training language models with language feedback at scale J Scheurer, JA Campos, T Korbak, JS Chan, A Chen, K Cho, E Perez arXiv preprint arXiv:2303.16755, 2023 | 92* | 2023 |
Pretraining language models with human preferences T Korbak, K Shi, A Chen, RV Bhalerao, C Buckley, J Phang, SR Bowman, ... International Conference on Machine Learning, 17506-17533, 2023 | 88 | 2023 |
QuALITY: Question Answering with Long Input Texts, Yes! SR Bowman, A Chen, H He, N Joshi, J Ma, N Nangia, V Padmakumar, ... NAACL 2022, 2022 | 65* | 2022 |
Generating logical forms from graph representations of text and entities P Shaw, P Massey, A Chen, F Piccinno, Y Altun arXiv preprint arXiv:1905.08407, 2019 | 42 | 2019 |
EvoPrompting: Language models for code-level neural architecture search A Chen, D Dohan, D So Advances in Neural Information Processing Systems 36, 2024 | 39 | 2024 |
Improving code generation by training with natural language feedback A Chen, J Scheurer, T Korbak, JA Campos, JS Chan, SR Bowman, K Cho, ... arXiv preprint arXiv:2303.16749, 2023 | 37 | 2023 |
SQuALITY: Building a long-document summarization dataset the hard way A Wang, RY Pang, A Chen, J Phang, SR Bowman arXiv preprint arXiv:2205.11465, 2022 | 26 | 2022 |
Reasoning from radically incomplete information: The case of containers E Davis, G Marcus, A Chen Proceedings of the second annual conference on advances in cognitive systems …, 2013 | 21 | 2013 |
What do NLP researchers believe? Results of the NLP community metasurvey J Michael, A Holtzman, A Parrish, A Mueller, A Wang, A Chen, D Madaan, ... arXiv preprint arXiv:2208.12852, 2022 | 19 | 2022 |
Training language models with language feedback J Scheurer, JA Campos, JS Chan, A Chen, K Cho, E Perez arXiv preprint arXiv:2204.14146, 2022 | 17 | 2022 |
Adversarially constructed evaluation sets are more challenging, but may not be fair J Phang, A Chen, W Huang, SR Bowman arXiv preprint arXiv:2111.08181, 2021 | 12 | 2021 |
Teaching BERT to wait: Balancing accuracy and latency for streaming disfluency detection A Chen, V Zayats, DD Walker, D Padfield arXiv preprint arXiv:2205.00620, 2022 | 11 | 2022 |
Single-turn debate does not help humans answer hard reading-comprehension questions A Parrish, H Trivedi, E Perez, A Chen, N Nangia, J Phang, SR Bowman arXiv preprint arXiv:2204.05212, 2022 | 11 | 2022 |
Two failures of self-consistency in the multi-step reasoning of LLMs A Chen, J Phang, A Parrish, V Padmakumar, C Zhao, SR Bowman, K Cho arXiv preprint arXiv:2305.14279, 2023 | 10 | 2023 |
Sudden drops in the loss: Syntax acquisition, phase transitions, and simplicity bias in MLMs A Chen, R Shwartz-Ziv, K Cho, ML Leavitt, N Saphra arXiv preprint arXiv:2309.07311, 2023 | 9 | 2023 |
Latent State Models of Training Dynamics MY Hu, A Chen, N Saphra, K Cho arXiv preprint arXiv:2308.09543, 2023 | 1 | 2023 |
AI safety by debate via regret minimization X Chen, A Chen, D Foster, E Hazan arXiv preprint arXiv:2312.04792, 2023 | | 2023 |