Direct preference optimization: Your language model is secretly a reward model R Rafailov, A Sharma, E Mitchell, CD Manning, S Ermon, C Finn Advances in Neural Information Processing Systems 36, 2024 | 1886 | 2024 |
Combo: Conservative offline model-based policy optimization T Yu, A Kumar, R Rafailov, A Rajeswaran, S Levine, C Finn Advances in neural information processing systems 34, 28954-28967, 2021 | 431 | 2021 |
Open X-Embodiment: Robotic learning datasets and RT-X models A O'Neill, A Rehman, A Gupta, A Maddukuri, A Gupta, A Padalkar, A Lee, ... arXiv preprint arXiv:2310.08864, 2023 | 356* | 2023 |
Just ask for calibration: Strategies for eliciting calibrated confidence scores from language models fine-tuned with human feedback K Tian, E Mitchell, A Zhou, A Sharma, R Rafailov, H Yao, C Finn, ... arXiv preprint arXiv:2305.14975, 2023 | 191 | 2023 |
Offline reinforcement learning from images with latent space models R Rafailov, T Yu, A Rajeswaran, C Finn Learning for dynamics and control, 1154-1168, 2021 | 130 | 2021 |
OpenVLA: An Open-Source Vision-Language-Action Model MJ Kim, K Pertsch, S Karamcheti, T Xiao, A Balakrishna, S Nair, ... arXiv preprint arXiv:2406.09246, 2024 | 117 | 2024 |
Offline meta-reinforcement learning with advantage weighting E Mitchell, R Rafailov, XB Peng, S Levine, C Finn International Conference on Machine Learning, 7780-7791, 2021 | 114 | 2021 |
Diffusion model alignment using direct preference optimization B Wallace, M Dang, R Rafailov, L Zhou, A Lou, S Purushwalkam, S Ermon, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 93 | 2024 |
Disentangling length from quality in direct preference optimization R Park, R Rafailov, S Ermon, C Finn arXiv preprint arXiv:2403.19159, 2024 | 62 | 2024 |
From r to Q*: Your Language Model is Secretly a Q-Function R Rafailov, J Hejna, R Park, C Finn arXiv preprint arXiv:2404.12358, 2024 | 55 | 2024 |
Contrastive preference learning: Learning from human feedback without RL J Hejna, R Rafailov, H Sikchi, C Finn, S Niekum, WB Knox, D Sadigh arXiv preprint arXiv:2310.13639, 2023 | 48 | 2023 |
Visual adversarial imitation learning using variational models R Rafailov, T Yu, A Rajeswaran, C Finn Advances in Neural Information Processing Systems 34, 3016-3028, 2021 | 47 | 2021 |
Preference fine-tuning of LLMs should leverage suboptimal, on-policy data F Tajwar, A Singh, A Sharma, R Rafailov, J Schneider, T Xie, S Ermon, ... arXiv preprint arXiv:2404.14367, 2024 | 45 | 2024 |
Aligning modalities in vision large language models via preference fine-tuning Y Zhou, C Cui, R Rafailov, C Finn, H Yao arXiv preprint arXiv:2402.11411, 2024 | 40 | 2024 |
Direct preference optimization: Your language model is secretly a reward model (2023) R Rafailov, A Sharma, E Mitchell, S Ermon, CD Manning, C Finn arXiv preprint arXiv:2305.18290, 2022 | 39 | 2022 |
Vision-based manipulators need to also see from their hands K Hsu, MJ Kim, R Rafailov, J Wu, C Finn arXiv preprint arXiv:2203.12677, 2022 | 38 | 2022 |
An emulator for fine-tuning large language models using small language models E Mitchell, R Rafailov, A Sharma, C Finn, CD Manning arXiv preprint arXiv:2310.12962, 2023 | 30 | 2023 |
Open X-Embodiment: Robotic learning datasets and RT-X models Q Vuong, S Levine, HR Walke, K Pertsch, A Singh, R Doshi, C Xu, J Luo, ... Towards Generalist Robots: Learning Paradigms for Scalable Skill Acquisition …, 2023 | 27 | 2023 |
On the sum of powered distances to certain sets of points on the circle N Nikolov, R Rafailov Pacific journal of mathematics 253 (1), 157-168, 2011 | 23 | 2011 |
Is model collapse inevitable? Breaking the curse of recursion by accumulating real and synthetic data M Gerstgrasser, R Schaeffer, A Dey, R Rafailov, H Sleight, J Hughes, ... arXiv preprint arXiv:2404.01413, 2024 | 22 | 2024 |