Building a Matrix with reinforcement learning and artificial imagination
Time travel and unchaining the time-matter continuum are no big deal. Nor is recruiting a dragon slayer, a Jedi Knight and a Transformer – a child’s mind is able to create fantastic worlds in seconds. So what would happen if robots had an artificial imagination?
Innovative strategies in Go and unorthodox approaches to chess are just top-of-mind examples of how a reinforcement learning agent can be creative.
Go, chess and League of Legends all draw on the imagination: players use abstract thinking to predict their opponent’s actions and construct a strategy for upcoming moves. Keeping a few scenarios of upcoming actions in mind is one aspect of using imagination, and it is essential to optimal performance. The creation of sub-worlds in the mind can be subconscious.
Professional drivers and football players basically use a world created within their minds to react to the real time and space around them. It’s hardly a big deal to run to where the ball was a moment ago. But it is crucial to be where it is going to be.
Reinforcement learning agents might appear to have no imagination at all – at the beginning of an experiment their actions are totally random. It is only through rewards and penalties that they build a strategy to maximize the outcome, be it accident-free driving or effective control of a robotic arm.
In other words, they learn, with knowledge gained through experimentation and experience. That knowledge is limited – so limited that at the beginning it is a big, fat 0.
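To make that concrete, here is a minimal sketch of what learning from rewards alone looks like: a tabular Q-learning agent in a toy one-dimensional corridor. The environment, its size and its single reward are invented for illustration; the point is only that the agent’s knowledge literally starts as a table of zeros and is shaped by nothing but rewards.

```python
# A toy illustration (not any production setup): tabular Q-learning on an
# invented 1-D corridor with a single reward at the right end.
import numpy as np

n_states, n_actions = 10, 2          # corridor cells; actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))  # the agent's "knowledge" starts as a big, fat 0
alpha, gamma, eps = 0.1, 0.99, 0.1

rng = np.random.default_rng(0)
for episode in range(500):
    s = 0
    for _ in range(50):
        # At first every choice is effectively random; a strategy emerges from rewards alone.
        a = rng.integers(n_actions) if rng.random() < eps or not Q[s].any() else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Reward-driven update: the only way the agent ever "learns" anything.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if r > 0:
            break
```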
“Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world” – Albert Einstein.
So what would happen if a purely knowledge-powered reinforcement learning agent had an imagination? Does it even need one?
Imagine dragons cars, imagine worlds
AI has been empowered with imagination for two purposes – to tune up the performance of the agent and to create a separate world within its… well… mind.
The first case comes from David Ha (Google Brain) and Jürgen Schmidhuber (Nnaisense, IDSIA), who put an agent in control of a race car on a track. The model was rewarded for every race it completed and every track it visited. Every time the car finished a race, a new track was randomly generated. Although the agent learned to drive collision-free, its driving was jerky and tight.
Building an additional artificial neural network to predict the effect of moves and maneuvers before they were executed resulted in smoother driving. Processing a movement within “imagination” before making the actual decision proved to significantly improve the agent’s performance. What’s more, the neural network was also able to generate random race tracks on its own – that is, it was basically dreaming about racing. For further information, please read the description of the “World Models” experiment.
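A rough sketch of that idea: before committing to a maneuver, the agent rolls each candidate action forward inside a learned dynamics model and picks the one with the best imagined outcome. The linear model and random weights below are stand-in assumptions; the actual World Models setup uses a VAE plus a recurrent network, which is not reproduced here.

```python
# A minimal sketch of "processing the movement within imagination" before acting.
# The dynamics model below is an invented placeholder, not the World Models architecture.
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_actions, horizon = 4, 3, 5

# Pretend these weights were fitted on logged driving data (assumption).
W_state = rng.normal(scale=0.1, size=(n_actions, state_dim, state_dim))
w_reward = rng.normal(scale=0.1, size=state_dim)

def imagine_step(state, action):
    """Predict the next state and reward entirely 'in the agent's head'."""
    next_state = W_state[action] @ state
    reward = float(w_reward @ next_state)
    return next_state, reward

def plan(state):
    """Pick the first action of the best imagined rollout (greedy lookahead)."""
    best_action, best_return = 0, -np.inf
    for a in range(n_actions):
        s, total, act = state.copy(), 0.0, a
        for _ in range(horizon):
            s, r = imagine_step(s, act)
            total += r
            act = rng.integers(n_actions)  # random continuation, kept deliberately simple
        if total > best_return:
            best_action, best_return = a, total
    return best_action

action = plan(rng.normal(size=state_dim))  # the decision is made before any real maneuver
```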
In another case, DeepMind conducted an experiment on rendering a 3D environment based on images the agent was fed. Although rotating an object within the imagination is effortless for humans, machines struggle to do so, and spatial imagination has never been their strong suit. Nevertheless, the model was able to build a 3D environment from the 2D images of the object it was provided. You can find details about the experiment here.
So AI is currently able to dream, build scenarios of future actions within its mind and build fully functional and plausible models of the world with just a few clues about it.
So if an agent can effectively race without actually racing or build a world within, how about building an Inception? Do agents dream of electric worlds?
The brain in a vat paradox
Training a reinforcement learning agent is expensive, mainly due to low sample-efficiency. The agent needs hours (days? months?) of experience to become proficient in the task it has to perform, and its first attempts are totally random, usually failing after just a few seconds of testing.
The first few hundred autonomous car rides end after a few seconds with the agent unceremoniously running into a tree or a wall. It needs a couple of days to figure out how to brake or make a turn.
Is simulating an entire city for a car that crashes after a few seconds really necessary?
In fact, it isn’t.
The same applies to an artificial limb or any other environment. A robotic hand controlled by an RL agent starts from entirely random moves, breaking things and grabbing everything but the can of Coke. Nonetheless, simulating a full environment replete with realistic physics is unnecessary.
That’s why deepsense.ai and Google Brain designed neural networks that simulate the testing environment for the agent, generating plausible data instead of providing the real thing. This amounts to entrapping the RL agent within a dream of a neural network designed to mimic the world in the manner of a Cartesian “evil genius”, providing the agent with fabricated signals instead of real ones.
The agent is a brain in a vat, unable to determine – and totally indifferent to – whether the training environment is a real one or merely a matrix created by the neural network.
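Sketched in code, the setup looks roughly like this: during training the agent only ever queries a learned model that fabricates observations and rewards, never the real environment. Both the model and the crude policy update below are placeholder assumptions, not the actual deepsense.ai / Google Brain architecture.

```python
# A minimal "brain in a vat" sketch: every training step happens inside a
# learned model; the real environment is never touched. All components are
# invented placeholders for illustration.
import numpy as np

rng = np.random.default_rng(0)
obs_dim, n_actions = 8, 4

class LearnedWorldModel:
    """Stands in for a neural network trained on real experience (assumption)."""
    def __init__(self):
        self.A = rng.normal(scale=0.1, size=(n_actions, obs_dim, obs_dim))
        self.r = rng.normal(scale=0.1, size=obs_dim)
    def step(self, obs, action):
        next_obs = np.tanh(self.A[action] @ obs)   # fabricated next observation
        reward = float(self.r @ next_obs)          # fabricated reward signal
        return next_obs, reward

world = LearnedWorldModel()
policy = np.zeros((n_actions, obs_dim))            # linear action preferences

for episode in range(200):                         # the agent never leaves the vat
    obs = rng.normal(size=obs_dim)
    for _ in range(30):
        logits = policy @ obs
        probs = np.exp(logits) / np.exp(logits).sum()
        action = int(rng.choice(n_actions, p=probs))
        next_obs, reward = world.step(obs, action)
        policy[action] += 0.01 * reward * obs      # crude reward-weighted update (assumption)
        obs = next_obs
```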
But why do so?
We need to go deeper
While the researchers mentioned above managed a similar trick, deepsense.ai was the first to build an Inception around Atari games, a standard benchmark environment for RL models. Building models that can effectively play Space Invaders or Breakout is a step toward designing agents that can carry out practical and useful tasks, such as driving autonomous cars.
Even the game Pong provides an environment with many variables to control, as perhaps best evidenced by the first few dozen trials ending without the ball ever being hit. At the same time, the gameplay can still be trained effectively without running the full environment. A Matrix (or Inception, maybe?) built by a neural network is good enough to train an agent to play, just as a flight simulator is sufficient to allow pilots to polish up their skills without sitting in a real plane, and thus to avoid the risk of a crash.
But “good enough” is hardly perfect, as the video below clearly shows. The screen on the right shows actual Pong, while the middle one features the simulation.
Did you see what happened? There is no spoon ball. When the agent gains skills and proficiency in its tasks, it may reach the end of the Matrix and break the illusion. The world ends, and there are no more skills to gain in that environment.
So where do we go from here? To update the Matrix, that’s where.
Matrix Reloaded
As the agent is unable to improve its skills within the testing environment, the environment must now be improved. The best way to do that is to let the agent wander around the full simulation to gather new data for the neural network simulating the world.
In the case of Pong or other Atari games, it is about observing the ball’s behavior or the various types of aliens falling from the sky. If the training were being done on an autonomous car, it would be encountering new types of crossroads and bridges, or parking near a shopping mall instead of just driving around a street corner.
Full of new memories, the agent shares its knowledge about the world with the neural network simulating the environment. The network rebuilds the simulated world and the training can be continued.
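The resulting loop can be sketched as follows. Every function here is a hypothetical stub invented for illustration; only the alternation itself (a short, expensive excursion into the real environment, a rebuild of the simulated world, then long, cheap training inside it) reflects the scheme described above.

```python
# A minimal sketch of the "Matrix Reloaded" training loop. All components are
# placeholder stubs; only the alternation of real rollouts, world-model
# retraining and in-model agent training mirrors the text.
import numpy as np

rng = np.random.default_rng(0)

def collect_real_experience(policy, n_steps=100):
    """Let the agent wander the full, expensive environment to gather new data (stub)."""
    return [(rng.normal(size=4), int(rng.integers(2)), float(rng.normal())) for _ in range(n_steps)]

def retrain_world_model(world_model, experience):
    """Rebuild the simulated world from the freshly gathered memories (stub)."""
    world_model["data_seen"] += len(experience)
    return world_model

def train_agent_in_model(policy, world_model, n_steps=1000):
    """Cheap training inside the learned simulation; no real environment needed (stub)."""
    return policy + 0.01 * rng.normal(size=policy.shape)

policy = np.zeros(4)
world_model = {"data_seen": 0}

for iteration in range(5):
    real_experience = collect_real_experience(policy)            # expensive, kept short
    world_model = retrain_world_model(world_model, real_experience)
    policy = train_agent_in_model(policy, world_model)           # cheap, runs for much longer
```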
Still, there is no point in simulating Australia, Finland or Bielefeld for an agent that will never drive out of Kansas City.
Imagining worlds
Building artificial worlds controlled and simulated by a neural network greatly reduces the cost of acquiring the data required to train a reinforcement learning agent. At the same time, the agent gains valuable skills while training in the artificial reality.
Currently, deepsense.ai and Google Brain are able to simulate Atari games. In the future, it will be possible to build neural networks simulating city environments, or entire cities, to train autonomous cars or robotic arms while saving significantly on maintenance costs.
So if we can do it now, are you sure there even IS a spoon?