
Expert augmented reinforcement learning – agents of Montezuma’s Revenge

September 21, 2018/in Reinforcement learning /by Konrad Budek

Reinforcement learning is gaining notice as a way to train neural networks to solve open problems that require a flexible, creative approach. As a huge amount of computing power and time is required to train a reinforcement learning agent, it is no surprise that researchers are looking for ways to shorten the process. Expert augmented learning appears to be an interesting way to do that.

This article looks at:

  • Why the learning process in reinforcement learning is long and complex
  • The transfer of expert knowledge into neural networks to solve this challenge
  • Applying expert augmented reinforcement learning in practice
  • Possible use cases for the technique

Designing a system of rewards that motivates an RL agent to behave in the desired way is fundamental to the technique. While this is indeed effective, a number of drawbacks limit its usefulness. One is the complexity of the training process, which grows rapidly with the complexity of the problem to be solved. What’s more, the agent’s first attempts to solve a problem are usually entirely random. In Learning to Run, a project in which an agent was trained to move like a human, the agent would fall forward or backward during its first few million runs.
When both the environment and the task are complex, the possibilities for “doing it wrong” grow, and the data scientist may be unable to spot a hidden drawback within the model.
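That initial randomness is typically the result of an exploration strategy such as epsilon-greedy action selection. The sketch below is illustrative only (the function and values are not from the project): early in training epsilon is close to 1, so almost every action is random, which is exactly why the first few million runs look chaotic.

```python
import random

def choose_action(q_values, epsilon):
    """Epsilon-greedy selection: with probability epsilon the agent acts
    at random (exploration); otherwise it picks the action with the
    highest estimated value (exploitation)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Early in training epsilon is near 1.0, so the agent mostly flails around.
early_actions = [choose_action([0.1, 0.5, 0.2], epsilon=1.0) for _ in range(5)]
# Late in training epsilon is annealed toward 0, and behavior becomes greedy.
late_action = choose_action([0.1, 0.5, 0.2], epsilon=0.0)
```

Epsilon is usually annealed over training, so the random flailing fades as the value estimates improve.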
Of course, the agent looks for ways to maximize the reward and reduce penalties, usually without seeing the larger picture. That’s why any glitch in the environment will be exploited to the full once discovered. Here’s a good example from the game Qbert:

Details about both the agent and the bug it found are covered in this paper: Arxiv.
The challenge in teaching neural networks to perform tasks humans do so effortlessly, like grabbing a can of coke or driving a car, is transferring the knowledge required to perform the task. It would be awesome just to put the neural network in the seat next to Kimi Raikkonen and let it learn how to handle the car like a professional driver. Unfortunately, that isn’t possible.
Or is it?

Montezuma’s revenge on AI

The most common way to validate reinforcement learning algorithms is to let them play Atari’s all-time classics like Space Invaders or Breakout. These games provide an environment that is complex enough to test if the model can deal with numerous variables, yet simple enough not to burn up the servers providing the computing power.
Although the agents tend to crack those games relatively easily, games like classic Montezuma’s Revenge pose a considerable challenge.


For those who missed this classic, Montezuma’s Revenge is a platform game in which an Indiana Jones-like character (nicknamed Panama Joe) explores ancient Aztec pyramids riddled with traps, snakes, scorpions and sealed doors, the keys to which, of course, are hidden in other rooms. While similar to the Mario Bros games, it was one of the first examples of the “Metroidvania” subgenre, whose best-known representatives are the Metroid and Castlevania series.
Montezuma’s Revenge provides a different gaming experience than Space Invaders: the world it presents is more open, and not all objects on the map are hostile. The agent needs to figure out that a snake is deadly, while the key is required to open the door and stepping on it is not only harmless but crucial to finishing the level.
Currently, reinforcement learning alone struggles to solve Montezuma’s Revenge. Having a more experienced player provide guidance could be a huge time-saver.

The will chained, the mind unchained

To share human knowledge with a neural network, information must be provided about what experts do and how they behave in a given environment. In the case of Montezuma’s Revenge, this means providing a snapshot of the screen together with the player’s reaction. For an expert driving a car, a number of additional steps would be required: the track would have to be recorded, along with information about the car and the position of the steering wheel.
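Recording such demonstrations can be as simple as storing (observation, action) pairs as the expert plays. A minimal sketch, with hypothetical names that are not from the project:

```python
from dataclasses import dataclass, field

@dataclass
class ExpertDataset:
    """Stores (observation, action) pairs recorded while an expert plays.
    For Montezuma's Revenge the observation would be a screen frame;
    here a string stands in as a placeholder."""
    samples: list = field(default_factory=list)

    def record(self, observation, action):
        self.samples.append((observation, action))

demo = ExpertDataset()
demo.record("frame_0001", "JUMP")   # the expert pressed jump on this frame
demo.record("frame_0002", "RIGHT")  # ...then moved right on the next one
```

During training, batches sampled from such a dataset let the agent compare its own choices against what the expert did in the same situation.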
At every stage of training, the agent is not only motivated to maximize rewards, but also to mimic the human. This is particularly helpful when there is no immediate reward coming from the game environment.
However, the drawback of merely following the expert is that the network never develops the ability to react to unexpected situations. To return to the example of Raikkonen’s driving, the network would perform well on the track that was recorded, but racing in other weather conditions or against new opponents would render it helpless. This is precisely where reinforcement learning shines.
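One common way to express this trade-off, sketched here rather than taken verbatim from the project, is to add a weighted imitation term (cross-entropy against the expert’s actions) to the usual RL loss. The function and weight below are illustrative assumptions:

```python
import math

def combined_loss(rl_loss, policy_probs, expert_actions, expert_weight=0.5):
    """Hypothetical combined objective: the usual RL loss plus a
    cross-entropy term pushing the policy toward the expert's actions.
    expert_weight trades off imitation against reward maximization."""
    imitation = -sum(math.log(probs[a])
                     for probs, a in zip(policy_probs, expert_actions))
    imitation /= len(expert_actions)
    return rl_loss + expert_weight * imitation

# If the policy already assigns probability 1.0 to the expert's action,
# the imitation term vanishes and only the RL loss remains.
loss = combined_loss(0.25, [[1.0, 0.0]], [0])
```

Setting `expert_weight` to zero recovers plain reinforcement learning; raising it makes the agent cling more tightly to the demonstrations.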


In the case of Montezuma’s Revenge, our algorithm was trained to strike a balance between following the expert and maximizing the reward. Thus if the expert never stepped on the snake, the agent wouldn’t, either. If the expert had done something, the agent would likely do the same. In a new situation, it would try to follow the expert’s behavior, but if the reward for ignoring the expert’s suggestions was high enough, it opted for the larger payoff.
If you get lost, you get to the road and stick to it until you reach a familiar neighborhood, right? The agent is always motivated to mimic the expert’s actions. Methods that merely copy human behavior at the start and then let the agent explore randomly are too weak to deliver noteworthy results.
The idea of augmenting reinforcement learning with expert knowledge proved surprisingly effective. Our model performed well in Montezuma’s Revenge, beating level after level. Moreover, it never stopped exploiting the reward policy to maximize its rewards: the agent spotted a previously unpublished bug in the game. This discovery led to a score of 804,900 points, a world record. Our agent was pushed on by the endless reward maximization loop depicted here:

Although annoying, the loop itself is proof that the agent is not mindlessly following the expert. With enough motivation it is able to develop its own strategy to maximize its rewards, thus using the expert knowledge creatively.
Cloning and enhancing human behavior are among the ultimate goals of machine learning. Nevertheless, the expert doesn’t actually need to be a human, which leads to interesting possibilities. A machine can be used to mimic other machines programmed with methods that don’t employ artificial intelligence, and then build on top of that behavior.

Summary – reducing costs

Empowering reinforcement learning with expert knowledge opens new avenues of development for AI-powered devices.

  • It takes the best from both worlds: it follows human behavior while retaining the superhuman knack of reinforcement learning agents for exploiting convenient opportunities and loopholes in the environment.
  • It increases safety by reducing randomness, especially in the early stage of learning.
  • It significantly reduces the time required for learning, as the agent gets hints from a human expert, thus reducing the need for completely random exploration.

As the cost of designing a reinforcement learning agent grows exponentially alongside the task’s level of complexity and the number of variables involved, using expert knowledge to train the agent is very cost-effective: it reduces not only the cost of data and computing power, but also the time required to gain results. The technical details of our solution can be found here: Arxiv.org and here: GitHub repository.

Special cooperation

In this project we cooperated with independent researcher Michał Garmulewicz (blog, github), who provided fundamental technical and conceptual input. We hope to continue such cooperation with Michał and other researchers.
