Natural actor–critic algorithms S Bhatnagar, RS Sutton, M Ghavamzadeh, M Lee Automatica 45 (11), 2471-2482, 2009 | 466 | 2009 |
Bayesian reinforcement learning: A survey M Ghavamzadeh, S Mannor, J Pineau, A Tamar arXiv preprint arXiv:1609.04436, 2016 | 228 | 2016 |
Best arm identification: A unified approach to fixed budget and fixed confidence V Gabillon, M Ghavamzadeh, A Lazaric NIPS-Twenty-Sixth Annual Conference on Neural Information Processing Systems, 2012 | 200 | 2012 |
Incremental natural actor-critic algorithms S Bhatnagar, M Ghavamzadeh, M Lee, RS Sutton Advances in neural information processing systems 20, 105-112, 2007 | 168 | 2007 |
High-confidence off-policy evaluation P Thomas, G Theocharous, M Ghavamzadeh Proceedings of the AAAI Conference on Artificial Intelligence 29 (1), 2015 | 161 | 2015 |
Regularized policy iteration AM Farahmand, M Ghavamzadeh, C Szepesvári, S Mannor NIPS, 441-448, 2008 | 156 | 2008 |
Hierarchical multi-agent reinforcement learning R Makar, S Mahadevan, M Ghavamzadeh Proceedings of the fifth international conference on Autonomous agents, 246-253, 2001 | 152 | 2001 |
Supervised actor-critic reinforcement learning MT Rosenstein, AG Barto, J Si, A Barto, W Powell Learning and Approximate Dynamic Programming: Scaling Up to the Real World …, 2004 | 150 | 2004 |
Hierarchical multi-agent reinforcement learning M Ghavamzadeh, S Mahadevan, R Makar Autonomous Agents and Multi-Agent Systems 13 (2), 197-229, 2006 | 141 | 2006 |
A Lyapunov-based approach to safe reinforcement learning Y Chow, O Nachum, E Duenez-Guzman, M Ghavamzadeh arXiv preprint arXiv:1805.07708, 2018 | 133 | 2018 |
High confidence policy improvement P Thomas, G Theocharous, M Ghavamzadeh International Conference on Machine Learning, 2380-2388, 2015 | 120 | 2015 |
Finite-Sample Analysis of Proximal Gradient TD Algorithms B Liu, J Liu, M Ghavamzadeh, S Mahadevan, M Petrik UAI, 504-513, 2015 | 111 | 2015 |
Risk-constrained reinforcement learning with percentile risk criteria Y Chow, M Ghavamzadeh, L Janson, M Pavone The Journal of Machine Learning Research 18 (1), 6070-6120, 2017 | 110 | 2017 |
Bayesian multi-task reinforcement learning A Lazaric, M Ghavamzadeh ICML-27th International Conference on Machine Learning, 599-606, 2010 | 97 | 2010 |
Speedy Q-learning MG Azar, R Munos, M Ghavamzadeh, HJ Kappen Spain, Granada: NIPS, 2011 | 96 | 2011 |
Ad recommendation systems for life-time value optimization G Theocharous, PS Thomas, M Ghavamzadeh Proceedings of the 24th International Conference on World Wide Web, 1305-1310, 2015 | 95 | 2015 |
More robust doubly robust off-policy evaluation M Farajtabar, Y Chow, M Ghavamzadeh International Conference on Machine Learning, 1447-1456, 2018 | 94 | 2018 |
Multi-bandit best arm identification V Gabillon, M Ghavamzadeh, A Lazaric, S Bubeck | 93 | 2011 |
Bayesian policy gradient algorithms Y Engel, M Ghavamzadeh Advances in neural information processing systems 19, 457, 2007 | 87 | 2007 |
Finite-sample analysis of least-squares policy iteration A Lazaric, M Ghavamzadeh, R Munos Journal of Machine Learning Research 13, 3041-3074, 2012 | 85 | 2012 |