Reinforcement learning is gaining notice as a way to train neural networks to solve open problems that require a flexible, creative approach. As a huge amount of computing power and time is required to train a reinforcement learning agent, it is no surprise that researchers are looking for ways to shorten the process. Expert-augmented learning appears to be an interesting way to do that.
This article looks at:
- Why the learning process in reinforcement learning is long and complex
- The transfer of expert knowledge into neural networks to solve this challenge
- Applying expert-augmented reinforcement learning in practice
- Possible use cases for the technique
Montezuma’s Revenge on AI
The most common way to validate reinforcement learning algorithms is to let them play Atari’s all-time classics like Space Invaders or Breakout. These games provide an environment that is complex enough to test whether the model can deal with numerous variables, yet simple enough not to burn up the servers providing the computing power. Although agents tend to crack those games relatively easily, titles like the classic Montezuma’s Revenge pose a considerable challenge.

For those who missed this classic, Montezuma’s Revenge is a platform game in which an Indiana Jones-like character (nicknamed Panama Joe) explores ancient Aztec pyramids riddled with traps, snakes, scorpions and sealed doors, the keys to which, of course, are hidden in other rooms. While similar to the Mario Bros games, it was one of the first examples of the “Metroidvania” subgenre, of which the Metroid and Castlevania series are the best-known examples.

Montezuma’s Revenge provides a different gaming experience than Space Invaders: the world it presents is more open, and not all objects on the map are hostile. The agent needs to figure out that a snake is deadly, while the key is required to open the door, so stepping on it is not only harmless but crucial to finishing the level. Currently, reinforcement learning alone struggles to solve Montezuma’s Revenge. Having a more experienced player provide guidance could be a huge time-saver.
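As a side note, experiments of this kind are typically run against the Arcade Learning Environment. The article doesn’t name its tooling, so the snippet below is only a minimal sketch, assuming the Gymnasium and ale-py packages, of how such an environment is instantiated and how weak a purely random policy is in this game.

```python
# Minimal sketch: a random agent in Montezuma's Revenge via Gymnasium/ALE.
# Assumes `pip install gymnasium ale-py`; the article doesn't specify tooling.
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # makes the "ALE/..." environment ids available
env = gym.make("ALE/MontezumaRevenge-v5")
observation, info = env.reset(seed=0)

total_reward = 0.0
for _ in range(1000):
    action = env.action_space.sample()  # purely random policy
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        observation, info = env.reset()

env.close()
# Random play almost never collects any points here, which is exactly
# why the game is such a hard benchmark for exploration.
print(f"Random play collected {total_reward} points")
```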
The will chained, the mind unchained

To share human knowledge with a neural network, information must be provided about what experts do and how they behave in a given environment. In the case of Montezuma’s Revenge, this means providing snapshots of the screen together with the player’s reactions. If the expert were driving a car instead, a number of additional steps would be needed: the track would have to be recorded, along with information about the car and the position of the steering wheel.

At every stage of training, the agent is motivated not only to maximize rewards but also to mimic the human. This is particularly helpful when no immediate reward is coming from the game environment. The drawback of following the expert, however, is that the network doesn’t develop the ability to react to unexpected situations. A network trained only on recordings of Kimi Räikkönen’s driving would perform well on the recorded track, but racing in different weather conditions or against new opponents would render it helpless. This is precisely where reinforcement learning shines.

In the case of Montezuma’s Revenge, our algorithm was trained to strike a balance between following the expert and maximizing the reward. Thus if the expert never stepped on the snake, the agent wouldn’t either. If the expert had done something, the agent would most likely do the same. If the agent found itself in a new situation, it would try to follow the behavior of the expert, but if the reward for ignoring the expert’s suggestions was high enough, it opted for the larger payoff. If you get lost, you get to the road and stick to it until you reach a familiar neighborhood, right? The agent is always motivated to mimic the expert’s actions; methods that merely copy human behavior at the start and then let the agent explore randomly are too weak to deliver noteworthy results.

The idea of augmenting reinforcement learning with expert knowledge proved surprisingly effective. Our model performed well in Montezuma’s Revenge, beating level after level. Moreover, it never stopped exploiting the reward policy: the agent spotted a previously undocumented bug in the game, and this discovery led to a score of 804,900 points – a world record. Our agent was pushed on by an endless reward-maximization loop. Although annoying, the loop itself is proof that the agent is not mindlessly following the expert. With enough motivation, it is able to develop its own strategy to maximize its rewards, using the expert knowledge creatively.

Cloning and enhancing human behavior are among the ultimate goals of machine learning. Nevertheless, the expert doesn’t actually need to be human, and that leads to interesting possibilities: a machine can be used to mimic other machines programmed with methods that don’t employ artificial intelligence, and then build on top of them.
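The balance between imitation and reward maximization described above can be expressed as a single training objective: the usual policy-gradient loss plus a cross-entropy term that pulls the policy toward the expert’s recorded actions. The sketch below is a simplified illustration of that idea rather than the article’s actual training code; the function, coefficient and tensor shapes are assumptions.

```python
# Illustrative sketch of an expert-augmented policy loss (assumed PyTorch
# setup; the article's real implementation is not shown here).
import torch
import torch.nn.functional as F

def expert_augmented_loss(logits, actions, advantages,
                          expert_logits, expert_actions, expert_coef=0.1):
    """Mix a policy-gradient term with an expert imitation term.

    logits, actions, advantages    -- from the agent's own rollouts
    expert_logits, expert_actions  -- the policy's outputs on states taken
                                      from the expert's recorded trajectories
    expert_coef                    -- weight of the imitation term (assumed)
    """
    # Standard policy-gradient loss: reinforce actions with high advantage.
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    pg_loss = -(chosen * advantages).mean()

    # Imitation loss: cross-entropy toward the expert's actions. This term
    # keeps the agent close to the expert where environment rewards are sparse.
    imitation_loss = F.cross_entropy(expert_logits, expert_actions)

    # When the advantages (i.e. the environment's rewards) are large enough,
    # the policy-gradient term dominates and the agent departs from the expert.
    return pg_loss + expert_coef * imitation_loss
```

Because the imitation term stays in the objective throughout training, the agent remains motivated to mimic the expert, in contrast to methods that clone behavior once and then explore randomly.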
Summary – reducing costs
Empowering reinforcement learning with expert knowledge opens new avenues of development for AI-powered devices.
- It combines the best of two worlds: following human behavior while keeping the superhuman knack reinforcement learning agents have for exploiting convenient opportunities and loopholes in the environment.
- It increases safety by reducing randomness, especially in the early stage of learning.
- It significantly reduces the time required for learning, as the agent gets hints from a human expert, thus reducing the need for completely random exploration.