Trust region-guided proximal policy optimization Y Wang, H He, X Tan, Y Gan Advances in Neural Information Processing Systems 32, 2019 | 54 | 2019 |
Stabilizing Q learning via soft mellowmax operator Y Gan, Z Zhang, X Tan Proceedings of the AAAI Conference on Artificial Intelligence 35 (9), 7501-7509, 2021 | 6 | 2021 |
Alleviating the estimation bias of deep deterministic policy gradient via co-regularization Y Li, YH Wang, YZ Gan, XY Tan Pattern Recognition 131, 108872, 2022 | 3 | 2022 |
Robust Action Gap Increasing with Clipped Advantage Learning Z Zhang, Y Gan, X Tan Proceedings of the AAAI Conference on Artificial Intelligence 36 (8), 9145-9152, 2022 | | 2022 |
Smoothing Advantage Learning Y Gan, Z Zhang, X Tan Proceedings of the AAAI Conference on Artificial Intelligence 36 (6), 6657-6664, 2022 | | 2022 |