
Read full paper
Details
Joint research with the University of Warsaw, Gdańsk University of Technology, and the Polish Academy of Sciences; presented at IJCNN 2021 and at the DRL Workshop, NeurIPS 2020
- The paper presents a novel method, Shoot Tree Search (STS), which makes it possible to control more explicitly the balance between the depth and breadth of the search needed for planning in large state spaces.
- The algorithm can be understood as an interpolation between two celebrated search mechanisms: MCTS and random shooting. It also lets the user control the bias-variance trade-off, akin to TD(n), but in the tree search context. In experiments on challenging domains, STS gets the best of both worlds, consistently achieving higher scores.
Authors: Konrad Czechowski, Piotr Januszewski, Piotr Kozakowski, Łukasz Kuciński, Piotr Miłoś
Abstract
Planning in large state spaces inevitably needs to balance the depth and breadth of the search. This balance has a crucial impact on a planner's performance, and most methods manage the interplay implicitly. We present a novel method, Shoot Tree Search (STS), which makes it possible to control this trade-off more explicitly. Our algorithm can be understood as an interpolation between two celebrated search mechanisms: MCTS and random shooting. It also lets the user control the bias-variance trade-off, akin to TD(n), but in the tree search context. In experiments on challenging domains, we show that STS can get the best of both worlds, consistently achieving higher scores.
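To make the interpolation concrete, here is a minimal Python sketch of the idea the abstract describes: a standard MCTS loop whose expansion phase adds a chain of `shoot_len` nodes via a random rollout instead of a single node. This is an illustration under our own assumptions, not the paper's implementation: the `env_model(state, action) -> (next_state, reward)` callable, the node structure, the UCT constant, and the simplified backup rule are all hypothetical.

```python
import math
import random

class Node:
    """A search-tree node holding visit statistics."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> child Node
        self.visits = 0
        self.value_sum = 0.0

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def uct_select(node, c=1.4):
    # Standard UCT rule over a fully expanded node's children.
    return max(
        node.children.values(),
        key=lambda ch: ch.value() + c * math.sqrt(math.log(node.visits) / ch.visits),
    )

def sts_iteration(root, env_model, actions, shoot_len, gamma=0.99):
    # 1) Selection: descend by UCT while nodes are fully expanded.
    node = root
    while node.children and len(node.children) == len(actions):
        node = uct_select(node)

    # 2) Expansion by shooting: instead of adding a single node (vanilla
    #    MCTS), roll a random trajectory of shoot_len steps from the leaf
    #    and add every visited state to the tree as a chain of nodes.
    ret, discount = 0.0, 1.0
    for _ in range(shoot_len):
        untried = [a for a in actions if a not in node.children]
        action = random.choice(untried)
        next_state, reward = env_model(node.state, action)  # assumed model API
        child = Node(next_state, parent=node)
        node.children[action] = child
        ret += discount * reward
        discount *= gamma
        node = child

    # 3) Backup: propagate the shoot's discounted return to the root.
    #    (Simplified: every ancestor receives the same return.)
    while node is not None:
        node.visits += 1
        node.value_sum += ret
        node = node.parent

# Toy usage with a hypothetical 1-D model: the state drifts by the action,
# and the reward pulls the state toward zero.
env_model = lambda s, a: (s + a, -abs(s + a))
root = Node(state=3)
for _ in range(500):
    sts_iteration(root, env_model, actions=[-1, 0, 1], shoot_len=4)
best_action = max(root.children, key=lambda a: root.children[a].visits)
```

Setting `shoot_len = 1` recovers the usual one-node MCTS expansion, while a `shoot_len` on the order of the planning horizon makes each iteration behave like a random shot from the selected leaf; this single knob is the depth-breadth interpolation the abstract refers to.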