Building a Matrix with reinforcement learning and artificial imagination

August 2, 2018 / in Reinforcement learning / by Konrad Budek

Time travel and unchaining the time-matter continuum is no big deal. Nor is recruiting a dragon slayer, a Jedi Knight and a Transformer – a child’s mind is able to create fantastic worlds in seconds. So what would happen if robots had an artificial imagination?

Developing innovative strategies in Go or unorthodox approaches to chess are just the most obvious examples of how a reinforcement learning agent can be creative.
Go, chess and League of Legends all draw on the imagination: players use abstract thinking to predict their opponent’s actions and construct a strategy for upcoming moves. Keeping a few scenarios of upcoming actions in mind is one aspect of using imagination, and it is essential to optimal performance. The creation of sub-worlds in the mind can be subconscious.
Professional drivers and football players essentially use a world created within their minds to react in real time to the space around them. It’s hardly a big deal to run to where the ball was a moment ago. But it is crucial to be where it is going to be.
Reinforcement learning agents might appear to have no imagination at all – at the beginning of an experiment their actions are totally random. Only through rewards and penalties do they build a strategy to maximize the outcome, be it accident-free driving or effective control over a robotic arm.

In other words, they learn from knowledge gained through experimentation and experience. That knowledge is limited, so limited that at the beginning it is a big, fat 0.

“Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world.” – Albert Einstein

So what would happen if a purely knowledge-powered reinforcement learning agent had an imagination? Does it even need one?

Imagine dragons cars, imagine worlds

AI has been empowered with imagination for two purposes – to tune up the performance of the agent and to create a separate world within its… well… mind.
The first case comes from David Ha (Google Brain) and Jürgen Schmidhuber (Nnaisense, IDSIA), whose agent controlled a race car on a track. The model was rewarded for every race it completed and every track it visited. Every time the car finished a race, a new track was randomly generated. Although the agent learned to drive collision-free, its driving was jerky and tight.

Building an additional artificial neural network to predict the effect of moves and maneuvers before they were executed resulted in a smoother ride. Processing the movement within “imagination” before making the actual decision significantly improved the agent’s performance. What’s more, the neural network was also able to generate random race tracks on its own – that is, it was basically dreaming about racing. For further information, see the description of the “World models” experiment.
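To make the idea of “processing the movement within imagination” more concrete, here is a minimal sketch. It is not the “World models” implementation: the hand-written `world_model` function stands in for the learned neural network, and the car dynamics, the candidate steering values and the function names are all assumptions made up for illustration. The agent tries each steering command inside the model first and only then commits to the one whose imagined future keeps the car closest to the center of the track.

```python
# Toy illustration only: the dynamics and all names below are hypothetical.

def world_model(offset, velocity, steer):
    """Predict the car's next sideways offset and sideways velocity."""
    velocity = 0.8 * velocity + steer
    return offset + velocity, velocity

def imagine(offset, velocity, first_steer, horizon=5):
    """Roll the model forward: steer once, then coast, summing squared offsets."""
    cost = 0.0
    steer = first_steer
    for _ in range(horizon):
        offset, velocity = world_model(offset, velocity, steer)
        cost += offset ** 2
        steer = 0.0  # after the first imagined move the car just coasts
    return cost

def choose_steer(offset, velocity, candidates=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Pick the command whose imagined future stays closest to the track center."""
    return min(candidates, key=lambda s: imagine(offset, velocity, s))

# A car that has drifted to the right of the center is steered back to the left.
print(choose_steer(offset=2.0, velocity=0.0))
```

No real move is made until `choose_steer` returns, so every rejected maneuver costs only a few imagined steps.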
In another case, DeepMind conducted an experiment on rendering a 3D environment based on images the agent was fed. Although rotating an object within the imagination is effortless for humans, machines struggle to do so, and spatial imagination has never been their strong suit. Nevertheless, the model was able to build a 3D environment from the 2D images of the object it was provided. DeepMind has published the details of the experiment.
So AI is currently able to dream, build scenarios of future actions within its mind and build fully functional and plausible models of the world with just a few clues about it.
So if an agent can effectively race without actually racing or build a world within, how about building an Inception? Do agents dream of electric worlds?

The brain in a vat paradox

Training a reinforcement learning agent is expensive, mainly due to low sample efficiency. The agent needs hours (days? months?) of experience to become proficient in the task it has to perform, and its first attempts are totally random and usually fail after just a few seconds of testing.
The first few hundred autonomous car rides end after a few seconds with the agent unceremoniously running into a tree or a wall. It needs a couple of days to figure out how to brake or make a turn.
Is simulating an entire city for a car that crashes after a few seconds really necessary?
In fact, it isn’t.
The same applies to an artificial limb or any other environment. A robotic hand controlled by an RL agent starts from entirely random moves, breaks things and grabs everything but the can of Coke. Nonetheless, simulating a full environment replete with realistic physics is unnecessary.

That’s why deepsense.ai and Google Brain designed neural networks that simulate the agent’s environment and generate plausible data instead of providing the real thing. This amounts to trapping the RL agent within the dream of a neural network designed to mimic the world in the manner of a Cartesian “evil genius”, feeding the agent fabricated signals instead of real ones.

The agent is a brain in a vat, unable to determine–and totally indifferent to–whether the training environment is a real one or merely a matrix created by the neural network.
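The brain-in-a-vat setup can be sketched in a few lines. This is a toy under stated assumptions, not the deepsense.ai and Google Brain system: a lookup table of remembered transitions stands in for the neural network, a five-cell corridor stands in for the game, and every name (`real_step`, `learn_world`, `dream_step`, `train_in_dream`) is hypothetical. The point it illustrates is that the training routine only ever calls `dream_step`, so the agent cannot tell that its signals are fabricated.

```python
import random

ACTIONS = (-1, 1)  # step left or step right
LENGTH = 5         # hypothetical corridor; the goal is the last cell

def real_step(state, action):
    """The real environment: returns (next_state, reward)."""
    nxt = min(max(state + action, 0), LENGTH - 1)
    return nxt, (1.0 if nxt == LENGTH - 1 else 0.0)

def learn_world(steps, rng):
    """Wander at random in the real environment and memorize every transition."""
    memory, state = {}, 0
    for _ in range(steps):
        action = rng.choice(ACTIONS)
        memory[(state, action)] = real_step(state, action)
        state = memory[(state, action)][0]
    return memory

def dream_step(memory, state, action):
    """Fabricated signal: replay the memory; unknown moves leave the agent in place."""
    return memory.get((state, action), (state, 0.0))

def train_in_dream(memory, episodes, rng, alpha=0.5, gamma=0.9):
    """Tabular Q-learning that calls dream_step only, never real_step."""
    q = {(s, a): 0.0 for s in range(LENGTH) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(50):
            action = rng.choice(ACTIONS)  # explore at random (off-policy)
            nxt, reward = dream_step(memory, state, action)
            target = reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if reward > 0:
                break  # goal reached, the dreamed episode ends
    return q

def greedy(q, state):
    """The trained agent's choice in a given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

After a short random wander in the real corridor, for example `memory = learn_world(500, random.Random(0))`, the agent can be trained with `train_in_dream(memory, 300, random.Random(1))` and then released into the real environment, where `greedy` picks its moves.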
But why do so?

We need to go deeper

While the researchers mentioned above pulled off a similar trick, deepsense.ai was the first to build Inception around Atari games, a standard benchmark environment for RL models. Building models that can effectively deal with playing Space Invaders or Breakout is a step toward designing agents that can carry out practical and useful tasks, such as driving autonomous cars.
Even Pong provides an environment with many variables to control, as perhaps best evidenced by the first few dozen trials ending without the ball being hit once. At the same time, the agent can still be trained effectively without running the full environment. A Matrix (or Inception maybe?) built by a neural network is good enough to train an agent to play, just as a simulator is sufficient for pilots to polish their skills without sitting in a real plane, and thus without the risk of a crash.
But “good enough” is hardly perfect, as the film below clearly shows. The screen on the right shows actual Pong, while the middle one features the simulation.

Did you see what happened? There is no spoon... or rather, no ball. When the agent gains skills and proficiency in its tasks, it may reach the end of the Matrix and break the illusion. The world ends, and there are no more skills to gain in that environment.

So where do we go from here? To update the Matrix, that’s where.

Matrix Reloaded

As the agent is unable to improve its skills within the testing environment, the environment must now be improved. The best way to do that is to let the agent wander around the full simulation to gather new data for the neural network simulating the world.
In the case of Pong or other Atari games, it is about observing the ball’s behavior or various types of aliens falling from the sky. If the training were being done on an autonomous car, it would be encountering new types of crossroads and bridges or parking near the shopping mall instead of just driving around a street corner.
Full of new memories, the agent shares its knowledge about the world with the neural network simulating the environment. The network rebuilds the simulated world and the training can be continued.
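The cycle described above (train in the dream, reach its edge, gather real experience, rebuild the dream) can be sketched as a loop. Everything here is a simplifying assumption: the “world model” is a dictionary of observed outcomes rather than a neural network, the track is a short list in which exactly one of two actions avoids a crash in each cell, and the function names are made up for this example.

```python
# Hypothetical track: in each cell exactly one of the two actions avoids a crash.
SAFE_MOVES = [1, 0, 1, 1, 0, 1]

def real_step(cell, action):
    """The real environment: True means the car crashed in this cell."""
    return action != SAFE_MOVES[cell]

def train_in_dream(model):
    """Practice inside the learned world only; stop where the dream runs out."""
    policy, cell = {}, 0
    while True:
        safe = [a for a in (0, 1) if model.get((cell, a)) is False]
        if not safe:
            return policy  # the end of the Matrix: nothing left to learn here
        policy[cell] = safe[0]
        cell += 1

def drive_for_real(policy, model):
    """One real episode: use what was learned, improvise beyond it, log everything."""
    cell = 0
    while cell < len(SAFE_MOVES):
        if cell in policy:
            action = policy[cell]
        else:
            action = next(a for a in (0, 1) if (cell, a) not in model)
        crashed = real_step(cell, action)
        model[(cell, action)] = crashed  # new memories for the world model
        if crashed:
            break
        cell += 1
    return cell  # how far the car got

def reload_matrix(max_rounds=20):
    """Alternate dream training and real driving until the track is finished."""
    model, progress = {}, []
    for _ in range(max_rounds):
        policy = train_in_dream(model)
        progress.append(drive_for_real(policy, model))
        if progress[-1] == len(SAFE_MOVES):
            break
    return progress

print(reload_matrix())  # [0, 2, 3, 5, 6]
```

Each number is how far the car got in one real drive. The first ride ends immediately, and every later ride goes further because the dream was rebuilt from the previous crash.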
Still, there is no point in simulating Australia, Finland or the city of Bielefeld for an agent that will never drive out of Kansas City.

Imagining worlds

Building artificial worlds controlled and simulated by a neural network greatly reduces the cost of acquiring the data required to train a reinforcement learning agent. At the same time, the agent still gains valuable skills while training in the artificial reality.
Currently, deepsense.ai and Google Brain are able to simulate Atari games. In the future, it will be possible to build neural networks that simulate city environments, or entire cities, to train autonomous cars or robotic arms while saving significantly on maintenance costs.
So if we can do it now, are you sure there even IS a spoon?
