Title | Authors | Venue | Cited by | Year |
Revisiting some common practices in cooperative multi-agent reinforcement learning | W Fu, C Yu, Z Xu, J Yang, Y Wu | arXiv preprint arXiv:2206.07505, 2022 | 37 | 2022 |
Continuously discovering novel strategies via reward-switching policy optimization | Z Zhou, W Fu, B Zhang, Y Wu | arXiv preprint arXiv:2204.02246, 2022 | 27 | 2022 |
Learning Agile Bipedal Motions on a Quadrupedal Robot | Y Li, J Li, W Fu, Y Wu | arXiv preprint arXiv:2311.05818, 2023 | 5 | 2023 |
Iteratively learn diverse strategies with state distance information | W Fu, W Du, J Li, S Chen, J Zhang, Y Wu | Advances in Neural Information Processing Systems 36, 2024 | 2 | 2024 |
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study | S Xu, W Fu, J Gao, W Ye, W Liu, Z Mei, G Wang, C Yu, Y Wu | arXiv preprint arXiv:2404.10719, 2024 | 1 | 2024 |
Iteratively learning novel strategies with diversity measured in state distances | W Fu, W Du, J Li, S Chen, J Zhang, Y Wu | | 1 | 2022 |
SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores | Z Mei, W Fu, G Wang, H Zhang, Y Wu | arXiv preprint arXiv:2306.16688, 2023 | | 2023 |