GPT-4 technical report J Achiam, S Adler, S Agarwal, L Ahmad, I Akkaya, FL Aleman, D Almeida, ... arXiv preprint arXiv:2303.08774, 2023 | 6172 | 2023 |
Dota 2 with large scale deep reinforcement learning C Berner, G Brockman, B Chan, V Cheung, P Dębiak, C Dennison, ... arXiv preprint arXiv:1912.06680, 2019 | 2022 | 2019 |
Learning dexterous in-hand manipulation OpenAI, M Andrychowicz, B Baker, M Chociej, R Jozefowicz, B McGrew, ... The International Journal of Robotics Research 39 (1), 3-20, 2020 | 1850 | 2020 |
Evolution strategies as a scalable alternative to reinforcement learning T Salimans, J Ho, X Chen, S Sidor, I Sutskever arXiv preprint arXiv:1703.03864, 2017 | 1822 | 2017 |
OpenAI baselines P Dhariwal, C Hesse, O Klimov, A Nichol, M Plappert, A Radford, ... | 1067 | 2017 |
Stable baselines A Hill, A Raffin, M Ernestus, A Gleave, A Kanervisto, R Traore, P Dhariwal, ... | 946 | 2018 |
Parameter space noise for exploration M Plappert, R Houthooft, P Dhariwal, S Sidor, RY Chen, X Chen, T Asfour, ... arXiv preprint arXiv:1706.01905, 2017 | 765 | 2017 |
Emergent complexity via multi-agent competition T Bansal, J Pachocki, S Sidor, I Sutskever, I Mordatch arXiv preprint arXiv:1710.03748, 2017 | 494 | 2017 |
Schema networks: Zero-shot transfer with a generative causal model of intuitive physics K Kansky, T Silver, DA Mély, M Eldawy, M Lázaro-Gredilla, X Lou, ... International conference on machine learning, 1809-1818, 2017 | 287 | 2017 |
UCB exploration via Q-ensembles RY Chen, S Sidor, P Abbeel, J Schulman arXiv preprint arXiv:1706.01502, 2017 | 138 | 2017 |
Dota 2 with large scale deep reinforcement learning CB OpenAI, G Brockman, B Chan, V Cheung, P Debiak, C Dennison, ... arXiv preprint arXiv:1912.06680 2, 2019 | 121 | 2019 |
Tensor programs v: Tuning large neural networks via zero-shot hyperparameter transfer G Yang, EJ Hu, I Babuschkin, S Sidor, X Liu, D Farhi, N Ryder, J Pachocki, ... arXiv preprint arXiv:2203.03466, 2022 | 120 | 2022 |
Tuning large neural networks via zero-shot hyperparameter transfer G Yang, E Hu, I Babuschkin, S Sidor, X Liu, D Farhi, N Ryder, J Pachocki, ... Advances in Neural Information Processing Systems 34, 17084-17097, 2021 | 96 | 2021 |
Evolution strategies as a scalable alternative to reinforcement learning. arXiv 2017 T Salimans, J Ho, X Chen, S Sidor, I Sutskever arXiv preprint arXiv:1703.03864, 2017 | 72 | 2017 |
OpenAI baselines (2017) P Dhariwal, C Hesse, O Klimov, A Nichol, M Plappert, A Radford, ... URL https://github.com/openai/baselines, 2016 | 63 | 2016 |
Dota 2 with large scale deep reinforcement learning. arXiv 2019 C Berner, G Brockman, B Chan, V Cheung, P Debiak, C Dennison, ... arXiv preprint arXiv:1912.06680 | 53 | |
UCB and infogain exploration via Q-ensembles RY Chen, J Schulman, P Abbeel, S Sidor arXiv preprint arXiv:1706.01502 9, 2017 | 29 | 2017 |
OpenAI baselines C Hesse, M Plappert, A Radford, J Schulman, S Sidor, Y Wu | 20 | 2017 |
Reinforcement learning with natural language signals S Sidor Massachusetts Institute of Technology, 2016 | 7 | 2016 |
Time resource networks S Sidor, P Yu, C Fang, B Williams arXiv preprint arXiv:1602.03203, 2016 | 2 | 2016 |