Five AI trends 2020 to keep an eye on


March 9, 2020/in Deep learning, Machine learning, Reinforcement learning /by Konrad Budek

While making predictions may be easy, delivering accurate ones is an altogether different story. That’s why in this column we won’t just be looking at the most important trends of 2020, but we’ll also look at how the ideas we highlighted last year have developed.

In summarizing the trends of 2020, one conclusion we’ve come to is that society is increasingly interested in AI technology, both in the threats it poses and in the broader problems that still need to be addressed.

AI trends 2019 in review – how accurate were our predictions?

In our AI Trends 2019 blogpost we chronicled last year’s most important trends and directions of development to watch. It was shortly after launching the AI Monthly Digest, a monthly summary of the most significant and exciting machine learning news. Here’s a short summary of what we were right and wrong about in our predictions.

  • Chatbots and virtual assistants – powered by a focus on the development of Natural Language Processing (NLP), this market grew robustly, as we predicted. The chatbot market was worth $2.6 billion in 2019 and is predicted to reach $9.4 billion by 2024.
  • The time needed for training would fall – this trend is reflected in larger neural networks being trained in a feasible time, with GPT-2 being the best example.
  • Autonomous vehicles are on the rise – the best proof is our own contribution to the matter in a joint venture with Volkswagen.
  • Machine learning and artificial intelligence are being democratized and productionized – according to Gartner, 37% of organizations have implemented AI in some form. That’s a 270% increase over the last four years.
  • AI and ML responsibility and transparency – this trend encompasses delivering unbiased models and tools. The story of Amazon using an AI-based recruiting tool that turned out to be biased against female applicants made enough waves to highlight the need for further human control and supervision over automated solutions.

As it turned out, deepsense.ai’s data science team was up to date and well-informed on these matters.

“It is difficult to make predictions, especially about the future.”
-Niels Bohr

The world is far from slowing down, and Artificial Intelligence (AI) appears to be one of the most dominant technologies at work today. The demand for AI talent has doubled in the last two years, with technology and the financial sector absorbing 60% of the talented employees on the market.

The Artificial Intelligence market itself is predicted to reach $390.9 billion by 2025, primarily by automating dull and repetitive tasks. It is predicted that AI will resolve around 20% of unmet healthcare demands.

Considering the impact of AI on people’s daily lives, spotting the right trends to follow is even more important. AI is arguably the most important technology trend of 2020, so enjoy our list!

Natural language processing (NLP) – further development

Whether the world was ready for it or not, GPT-2 was released last year, with the balance between safety and progress as a guiding motif. Initially, OpenAI refused to make the full model and dataset public due to the risk of the technology being used for malicious ends.

The organization released versions of the model throughout 2019, with each confirmed to be “hardened against malicious usage”. The model was considered cutting edge, though like most things in tech, another force soon prevailed. At the end of January 2020, Google Brain took the wraps off of Meena, a 2.6-billion parameter end-to-end neural conversational model trained on 341 GB of online text.

The convenience of NLP solutions is enjoyed by users who have embraced virtual assistants like Google Assistant, Alexa or Siri. According to Adroit Market Research, the Intelligent Virtual Assistant market is predicted to grow at a 33% compound annual growth rate between now and 2025. The market was valued at $2.1 billion in 2019. The increasing use of smartphones and other wearable intelligent devices, among other trends, is predicted to drive that growth.
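Figures like these can be sanity-checked with simple compound-growth arithmetic. The helper below is illustrative; the $2.1 billion base and the 33% CAGR come from the report cited above:

```python
def project_market(value, cagr, years):
    """Project a market value forward under a constant compound annual growth rate."""
    return value * (1 + cagr) ** years

# $2.1B in 2019 growing at a 33% CAGR until 2025 (6 years of compounding)
projected_2025 = project_market(2.1, 0.33, 2025 - 2019)
print(f"${projected_2025:.1f}B")  # roughly $11.6B
```

The compounded figure lands in the same ballpark as typical analyst projections for this market, which is a useful cross-check on the reported growth rate.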

Having started with a consumer-centric approach, virtual assistants are predicted to become more involved in business operations, further automating processes as well as tedious and repetitive tasks. According to Computerworld, approximately 40% of business representatives are planning to implement voice technology within 24 months – that is, no later than 2021. NLP is shaping up to be a major trend not only this year, but well into the future.

Autonomous vehicles


It is 2020 and driverless cars have yet to hit the streets. In hindsight, the Guardian’s prediction that there would be 10 million self-driving cars on the road by 2020 is all too easy to scoff at now.

On the other hand, tremendous progress has been made and with every month the autonomous car gets closer to rolling out.

deepsense.ai has also contributed to the progress, cooperating with Volkswagen on building a reinforcement learning-based model that, when transferred from a simulated to a real environment, managed to safely drive a car.

But deepsense.ai is far from the only company delivering significant research on autonomous cars and developing technology in this field. Also, there is a great difference between seeing an autonomous car on busy city streets and in the less demanding highway environment, where we can expect the automation and semi-automation of driving to arrive first.

According to the US Department of Transportation, 63.3% of the $1,139 billion of goods shipped in 2017 were moved on roads. Had autonomous vehicles been enlisted to do the hauling, the transport could have been organized more efficiently, and the need for human effort vastly diminished. Machines can drive for hours without losing concentration. Road freight is globally the largest producer of emissions and consumes more than 70% of all energy used for freight. Every optimization made to fuel usage and routes will improve both energy and time management.

AI getting popular – beneath the surface

There is a lot of buzz around how AI-powered solutions impact our daily lives. While the most obvious change may be NLP powering virtual assistants like Google Assistant, Siri or Alexa, the impact on our daily lives runs much deeper, even if it’s not all that visible at first glance. Artificial intelligence-powered solutions have a strong influence on manufacturing, impacting prices and supply chains of goods.

Here are a few applications being used without batting an eye:

  • Demand forecasting – companies collect tremendous amounts of data on their customer relationships and transactional history. Also, with the e-commerce revolution humming along, retail companies have gained access to gargantuan amounts of data about customer service, products and services. deepsense.ai delivers demand forecasting tools that not only process such data but also combine it with external sources to deliver more accurate predictions than standard heuristics. Helping companies avoid overstocking while continuing to satisfy demand is one essential benefit demand forecasting promises.
  • Quality control – harnessing the power of image recognition enables companies to deliver more accurate and reliable quality control automation tools. Because machines are domain-agnostic, the tools can be applied in various businesses, from fashion to construction to manufacturing. Any product that can be controlled using human sight can also be placed under the supervision of computer vision-powered tools.
  • Manufacturing process optimization – the big data revolution impacts all businesses, but with IoT and the building of intelligent solutions, companies get access to even more data to process. But it is not about gathering and endlessly processing data in search of insights – the data is also fuel for optimization, sometimes in surprising ways. Thanks solely to optimization, Google reduced its cooling bill by 40% without adding any new components to its system. Beyond cutting costs, companies also use process optimization to boost employee safety and reduce the number of accidents.
  • Office processes optimization – AI-powered tools can also be used to augment the daily tasks done by various specialists, including lawyers or journalists. Ernst & Young is using an NLP tool to review contracts, enabling their specialists to use their time more efficiently. Reuters, a global media corporation and press agency, uses AI-powered video transcription tools to deliver time-coded speech-to-text tools that are compatible with 11 languages.
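As an illustration of the demand forecasting point above, here is a minimal sketch of how an external signal can adjust a standard heuristic. The moving-average baseline and the promotion-uplift parameter are hypothetical simplifications for illustration, not deepsense.ai’s actual models:

```python
def moving_average_forecast(history, window=4):
    """Standard heuristic: forecast next-period demand as the mean of recent sales."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def adjusted_forecast(history, promo_uplift, promo_planned, window=4):
    """Combine the heuristic with an external signal (here, a planned promotion)."""
    base = moving_average_forecast(history, window)
    return base * (1 + promo_uplift) if promo_planned else base

weekly_units = [120, 130, 125, 135]
print(moving_average_forecast(weekly_units))        # 127.5
print(adjusted_forecast(weekly_units, 0.25, True))  # 159.375
```

Real systems learn the uplift from data and fold in many external sources (weather, holidays, pricing), but the principle is the same: external signals correct a purely historical baseline.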

Thanks to the versatility and flexibility of such AI-powered solutions, business applications are possible even in the most surprising industries and companies. So even if a person were to completely abandon technology (right…), the services and products delivered to them would still be produced or augmented with AI, be they clothing, food or furniture.

AI getting mainstream in culture and society

The motif of AI is prevalent in the arts, though usually not in a good way. Isaac Asimov was among the first writers to hold that autonomous robots would need to follow a moral code in order not to become dangerous to humans. Of course, fiction has offered a number of memorable examples of AI run amok, including the Terminator and HAL 9000 from 2001: A Space Odyssey.

The question of moral principles may once have been elusive and abstract, but autonomous cars have necessitated a legal framework ascribing responsibility for accidents. Amazon learned about the need to control AI models the hard way, albeit in a less mobile environment: a recruiting tool the company was using had to be scrapped due to a bias against women.

The impact of AI applications on people’s daily lives, choices and careers is building pressure to deliver legal regulations on model transparency as well as information not only about outcomes, but also the reasons behind them. Delivering AI in a black-box mode is not the most suitable way to operate, especially as the number of decisions made automatically by AI-powered solutions increases.

Automating the development of AI

Making AI mainstream is not only about making AI systems more common, but also about widening the availability of AI tools and their accessibility to less-skilled users. The number of solutions powered by machine and deep learning models will only increase. It should therefore come as no surprise that the people responsible for automating others’ jobs are keen to support their own jobs with automation.

Google has entered the field with AutoML, a tool that simplifies the process of developing AI and makes it available to a wider audience – one that, presumably, is not going to use ML algorithms in especially non-standard ways. AutoML joins IBM’s AutoAI, which supports data preparation.

Also, there are targeted cloud offerings for companies seeking to harness ready-to-use components in their daily jobs with a view to augmenting their standard procedures with machine learning.

Summary

While the 2020 AI trends themselves are similar to those of 2019, the details have changed immensely, so refreshing our perspective seemed worth our while. The world is changing, ML is advancing, and AI is ever more ubiquitous in our daily lives.

Driverless car or autonomous driving? Tackling the challenges of autonomous vehicles


November 26, 2018/in Machine learning, Reinforcement learning /by Konrad Budek

Among both traditional carmakers and cutting-edge tech behemoths, there is massive competition to bring autonomous vehicles to market.

It was a beautiful, sunny day, June 18, 1914, when the brilliant engineer Lawrence Sperry stunned the jury of the Concours de la Securité en Aéroplane (Airline Safety Competition) by flying past their box with his hands held high. It was the first time the public had ever seen a gyroscopic stabilizer, one of the first autopiloting devices. Over a hundred years later, automatic flight control devices and maritime autopilots are common, while cars still require human operation. Thanks to machine learning and autonomous cars, that’s about to change.

What is the future of autonomous vehicles?

According to recent reports, autonomous cars are going to disrupt the private, public and freight transportation industries. A recent Deloitte publication reports that society is putting more and more trust in autonomous vehicles. In 2017, 74% of US, 72% of German and 69% of Canadian respondents declared that fully autonomous cars would not be safe. But those rates have now dropped significantly, to 47%, 45% and 44%, respectively.
Plans for building self-driving cars have been revealed by BMW, Nissan and Ford, while Uber and the Google-affiliated Waymo are also in the thick of the race. Companies aim both to build urban driving vehicles and autonomous trucks, while a startup scene supporting autonomous technology is emerging.
Thanks to the increasing popularity of autonomous cars, up to 40% of mileage could be driven in self-driving vehicles in 2030. But, as always, the devil is in the details.

What is an autonomous car?

To answer that question, the National Highway Traffic Safety Administration uses the autonomous vehicle taxonomy designed by the Society of Automotive Engineers (SAE), which defines six levels of automation:

  • Level 0, no automation – the driver performs all driving tasks.
  • Level 1, driver assistance – the car has built-in functions to assist the driver, who must nonetheless remain engaged in the driving process. Cruise control is one of the best examples.
  • Level 2, partial automation – the vehicle combines automated functions like acceleration and steering, but the driver must remain engaged. The gyroscopic stabilizer is an aviation analogue of partial automation.
  • Level 3, conditional automation – a human driver is necessary in totally unpredictable situations, but is not required to monitor the environment at all times.
  • Level 4, high automation – the car can handle all driving within certain conditions without human attention. BMW currently has a fleet of about 40 level 4 cars on testing grounds near Munich and in California.
  • Level 5, full automation – the car may not even have a steering wheel and can deal with any situation it encounters. Fully autonomous vehicles, which do not yet exist, occupy level 5.
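The taxonomy maps naturally onto a small data structure. A minimal Python sketch follows; the enum names and the `driver_required` helper are illustrative, not an official SAE API:

```python
from enum import IntEnum

class SAELevel(IntEnum):
    """SAE driving-automation levels, as summarized above."""
    NO_AUTOMATION = 0
    DRIVER_ASSISTANCE = 1
    PARTIAL_AUTOMATION = 2
    CONDITIONAL_AUTOMATION = 3
    HIGH_AUTOMATION = 4
    FULL_AUTOMATION = 5

def driver_required(level: SAELevel) -> bool:
    """Below level 4, a human driver must be ready to intervene."""
    return level < SAELevel.HIGH_AUTOMATION

print(driver_required(SAELevel.PARTIAL_AUTOMATION))  # True
print(driver_required(SAELevel.FULL_AUTOMATION))     # False
```

Using `IntEnum` keeps the levels ordered, so comparisons like "below level 4" read directly off the taxonomy.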

Building level 4 and 5 driverless vehicles is a great challenge because the driving process has a number of complicating factors. Unlike the pilot of a plane or ship, drivers usually have little to no time to respond to a changing environment. They must monitor the state of the machine, their surroundings, and the other drivers on the road. What’s more, any mistake can cause an accident – 37,133 people were killed in traffic accidents on American roads in 2017.
While we may not give ourselves much credit as drivers, humans’ ability to process signals from various senses to control a car is a superpower. It is not only about looking at the road – many drivers estimate the distance between cars by looking at reflections in the body of the car in front of them. Many drivers can hear changes in their engine’s performance or sense changing grip on various types of road surface.
To effectively replace human perception, sophisticated assistance systems rely on numerous sensors. GM’s report on autonomous cars and driving technology safety lists:

  • Cameras – detect and track pedestrians and cyclists, monitor free space and traffic lights
  • Articulating radars – detect moving vehicles at long range over a wide field of view
  • Short-range radars – monitor objects around the vehicle
  • Long-range radars – detect vehicles and measure velocity
  • Lidars – detect fixed and moving objects with high-precision laser sensors

Handling data from various sources that must be processed in real time is a perfect task for deep neural networks, especially when it involves simultaneous work on non-homogeneous data: radar signals, images from cameras and lidar readings.
But building a system that automates driving is an enormous challenge, especially given the sheer number of serious decisions to be made when driving and the fact that a single bad decision can result in disaster.

Two ways autonomous cars work

There are currently two approaches to building the models that control autonomous vehicles.
A component-based system – the controller is built from several independent models and software components, each designed to handle one task, be it road sign recognition, managing the state of the vehicle or interpreting the sensors’ signals.

  • Pros – dividing the system into subsystems makes building the software easier. Each component can be optimized and developed individually, thus improving the system as a whole.
  • Cons – developing the model requires a massive amount of data to be gathered and processed. The image recognition module needs to be fed different data than the engine control module, which makes preparing training datasets more than a little challenging. What’s more, integrating the subsystems may be a challenge in and of itself.

End-to-end system – with this approach, a single model capable of conducting the entire driving process is built – from gathering information from the sensors to steering and reacting accordingly. deepsense.ai is moving ahead with just such a model.

  • Pros – it is easier to perform all the training within the simulation environment. Modern simulators provide the model with a high-quality, diverse urban environment. Using the simulated environment greatly reduces the cost of gathering data.

Although it is possible to label and prepare data gathered with the simulator, the technique requires a bit more effort. What’s more, a pre-trained neural network can be used to mimic the simulated environment (a Matrix of sorts) to further reduce the amount of data to gather or generate. We expect such a model to perform better than a component-based system would.

  • Cons – this type of model may be harder to interpret or reverse-engineer. When it comes to further tuning the model or reducing the challenge posed by the reality gap (see below) it may be a significant obstacle.
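The contrast between the two approaches can be sketched in a toy example. All module names and the single-threshold logic below are hypothetical stand-ins for real perception and planning stacks, not an actual controller:

```python
# Component-based: independent modules, each developed and optimized separately.
def perceive(sensor_frame):
    """Hypothetical perception module: turns raw sensor readings into a world state."""
    return {"obstacle_ahead": sensor_frame["lidar_min_range"] < 5.0}

def plan(world_state):
    """Hypothetical planning module: turns the world state into a maneuver."""
    return "brake" if world_state["obstacle_ahead"] else "cruise"

def component_based_controller(sensor_frame):
    """The full pipeline is a composition of separately built components."""
    return plan(perceive(sensor_frame))

# End-to-end: a single learned function from raw sensors straight to controls.
def end_to_end_controller(sensor_frame, policy):
    """One model maps sensors directly to an action; `policy` stands in for a trained network."""
    return policy(sensor_frame)

frame = {"lidar_min_range": 3.2}
print(component_based_controller(frame))  # brake
print(end_to_end_controller(frame, lambda f: "brake" if f["lidar_min_range"] < 5.0 else "cruise"))  # brake
```

The component pipeline exposes an inspectable intermediate state, which is exactly what the end-to-end model trades away for simpler training, mirroring the pros and cons above.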

Facing the reality gap

Using a simulator-trained model in a real car is always challenging due to what is known as the reality gap. The reality gap encompasses all the differences and unexpected situations the model may encounter that the designer was unable to predict and prepare it for.
There are countless examples. The position of cameras in a real car may be different than in a simulated one, the simulation physics are necessarily incomplete, and there may be a hidden bug the model could exploit. Furthermore, the sensors’ readings may differ from the real ones in calibration or precision. There may be a construction feature that causes the car to behave differently in reality than in a simulation. Even the brightest data scientist is unable to predict all the possible scenarios. What would happen if a bird started to peck at the camera? Or if the car encountered a boy dressed as Superman pretending to fly? Or, more plausibly, if after a collision there were an oil stain that looked exactly like a puddle, but would obviously have an entirely different effect on the tires’ grip of the road?
To address these challenges, data scientists randomize the data and the training environment to let the model gather more varied experiences. The model will learn how to control the car in changing weather and lighting conditions. By changing the camera and sensor settings, a neural network will gain enough experience to handle the differences or any changes that may occur when the model is being used.
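The randomization described above can be sketched as sampling a fresh configuration for each training episode. The parameter names and ranges below are illustrative assumptions, not deepsense.ai’s actual setup:

```python
import random

def randomized_episode_config(rng):
    """Sample a randomized training episode. Varying weather, lighting and sensor
    calibration forces the policy to cover conditions the real car may encounter."""
    return {
        "weather": rng.choice(["clear", "rain", "fog"]),
        "sun_angle_deg": rng.uniform(0, 180),
        "camera_height_m": rng.uniform(1.2, 1.6),   # camera mounting differs between sim and real car
        "lidar_noise_std": rng.uniform(0.0, 0.05),  # imperfect sensor calibration
    }

rng = random.Random(42)
configs = [randomized_episode_config(rng) for _ in range(1000)]
weathers = {c["weather"] for c in configs}
print(sorted(weathers))  # over 1000 episodes, all three weather types appear
```

A policy trained over such a distribution cannot latch onto one fixed camera height or lighting condition, which is precisely what narrows the reality gap.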

Fighting the gap every lap

Using a simulated environment is one effective way of evaluating a model, but a pronounced reality gap still remains. To acquire better information (and also to have some fun on the job), data scientists evaluate their neural networks by launching them in small-scale models. There is currently an interesting Formula 1/10 autonomous car racing competition being held. Designing the software to control cars and compete against other teams is a challenging (yet fun) way to evaluate models. Small-scale models are tested on tracks with angles and long straights that are perfect for acceleration. Although the cars aren’t driven by humans, the team provides full-time technical assistance and puts the car back on track when it falters.
It’s also a great way to impress the jury in a similar manner to what Lawrence Sperry did more than a hundred years ago!
If this sounds interesting to you, deepsense.ai has recently launched a deep learning workshop where participants learn to create models to control these small cars. The training is conducted in cooperation with one of the world’s champion Formula 1/10 racers.
The text was prepared in cooperation with Krzysztof Galias, deepsense.ai data scientist.

Expert augmented reinforcement learning - agents of Montezuma’s Revenge


September 21, 2018/in Reinforcement learning /by Konrad Budek

Reinforcement learning is gaining notice as a way to train neural networks to solve open problems that require a flexible, creative approach. As a huge amount of computing power and time is required to train a reinforcement learning agent, it is no surprise that researchers are looking for ways to shorten the process. Expert augmented learning appears to be an interesting way to do that.

This article looks at:

  • Why the learning process in reinforcement learning is long and complex
  • The transfer of expert knowledge into neural networks to solve this challenge
  • Applying expert augmented reinforcement learning in practice
  • Possible use cases for the technique

Designing a system of rewards that motivates an RL agent to behave in the desired way is fundamental to the technique. While this is indeed effective, a number of drawbacks limit its usefulness. One is the complexity of the training process, which grows rapidly with the complexity of the problems to be solved. What’s more, the agent’s first attempts to solve problems are usually entirely random. In Learning to Run, a project in which an agent was trained to move like a human, the agent would fall forward or backward during its first few million runs.
When both the environment and the task are complex, the possibilities for “doing it wrong” grow, and the data scientist may be unable to spot a hidden drawback within the model.
Of course, the agent looks for ways to maximize the reward and reduce the penalties usually without seeing the larger picture. That’s why any glitch in the environment will be maximally exploited when discovered. Here’s a good example from the game Qbert:

Details about both the agent and the bug it found are covered in this paper: Arxiv.
The challenge in teaching neural networks to perform tasks humans do so effortlessly, like grabbing a can of coke or driving a car, is transferring the knowledge required to perform the task. It would be awesome just to put the neural network in the seat next to Kimi Raikkonen and let it learn how to handle the car like a professional driver. Unfortunately, that isn’t possible.
Or is it?

Montezuma’s revenge on AI

The most common way to validate reinforcement learning algorithms is to let them play Atari’s all-time classics like Space Invaders or Breakout. These games provide an environment that is complex enough to test if the model can deal with numerous variables, yet simple enough not to burn up the servers providing the computing power.
Although agents tend to crack those games relatively easily, games like the classic Montezuma’s Revenge pose a considerable challenge.

Related:  Building a Matrix with reinforcement learning and artificial imagination

For those who missed this classic, Montezuma’s Revenge is a platform game where an Indiana Jones-like character (nicknamed Panama Joe) explores the ancient Aztec pyramids, which are riddled with traps, snakes, scorpions and sealed doors, the keys to which, of course, are hidden in other rooms. While similar to Mario Bros games, it was one of the first examples of the “Metroidvania” subgenre, with the Metroid and Castlevania series being the most well-known games.
Montezuma’s Revenge provides a different gaming experience than Space Invaders: the world it presents is more open, and not all objects on the map are hostile. The agent needs to figure out that a snake is deadly, while the key is required to open the door and stepping on it is not only harmless but crucial to finishing the level.
Currently, reinforcement learning alone struggles to solve Montezuma’s Revenge. Having a more experienced player provide guidance could be a huge time-saver.

The will chained, the mind unchained

To share human knowledge with a neural network, information must be provided about what experts do and how they behave in a given environment. In the case of Montezuma’s Revenge, this means providing a snapshot of the screen and the player’s reaction. If the expert is driving a car, any number of additional steps would have to be taken: the track would have to be recorded, and information about the car and the position of the steering wheel would also need to be provided.
At every stage of training, the agent is not only motivated to maximize rewards, but also to mimic the human. This is particularly helpful when there is no immediate reward coming from the game environment.
However, the drawback of following the expert is that the network doesn’t develop an ability to react to unexpected situations. Following the example of Raikonnen’s driving, the network would be able to perform well on a track that was recorded, but racing in other weather conditions or against new opponents would render the network helpless. This is precisely where reinforcement learning shines.

Related:  Learning to run - an example of reinforcement learning

In the case of Montezuma’s Revenge, our algorithm was trained to strike a balance between following the expert and maximizing the reward. Thus if the expert never stepped on the snake, the agent wouldn’t, either. If the expert had done something, the agent would likely do the same. When the agent found itself in a new situation, it would try to follow the behavior of the expert; but if the reward for ignoring the expert’s suggestions was high enough, it opted for the larger payoff.
If you get lost, you get to the road and stick to it until you reach a familiar neighborhood, right? The agent is always motivated to mimic the expert’s actions. Methods that merely copy human behavior at the start and then let the agent explore randomly are too weak to deliver noteworthy results.
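The balance described above can be sketched as a combined reward: an environment term plus an imitation bonus for matching the expert’s recorded action. The weighting and function names below are illustrative, not the paper’s exact formulation:

```python
def augmented_reward(env_reward, agent_action, expert_action, imitation_weight=0.1):
    """Environment reward plus a bonus for matching the expert's recorded action.
    The imitation term keeps learning on track when env_reward is sparse, while a
    large enough env_reward can still outweigh the expert's suggestion."""
    imitation_bonus = imitation_weight if agent_action == expert_action else 0.0
    return env_reward + imitation_bonus

# Sparse environment reward: the imitation bonus dominates.
print(augmented_reward(0.0, "jump", "jump"))  # 0.1
# Large reward for deviating: the agent can override the expert.
print(augmented_reward(1.0, "left", "jump"))  # 1.0
```

Because the imitation term never disappears, the agent stays anchored to expert behavior in unfamiliar states, yet a sufficiently large environment reward can still pull it onto its own strategy.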
The idea of augmenting reinforcement learning with expert knowledge proved to be surprisingly effective. Our model performed well in Montezuma’s Revenge, beating level after level. Moreover, it didn’t stop exploiting the reward policy to maximize its rewards: the agent spotted an unpublished bug in the game. This discovery led to a score of 804,900 points – a world record. Our agent was pushed on by the endless reward maximization loop depicted here:

Although annoying, the loop itself is proof that the agent is not mindlessly following the expert. With enough motivation it is able to develop its own strategy to maximize its rewards, thus using the expert knowledge creatively.
Cloning and enhancing human behavior are among the ultimate goals of machine learning. Nevertheless, the expert doesn’t actually need to be a human. This leads to interesting possibilities. A machine can be used to mimic other machines programmed with methods that don’t employ artificial intelligence and then build on top of it.

Summary – reducing costs

Empowering reinforcement learning with expert knowledge opens new avenues of development for AI-powered devices.

  • It takes the best from both worlds: following human behavior while retaining the reinforcement learning agent’s superhuman knack for exploiting convenient opportunities and loopholes present in the environment.
  • It increases safety by reducing randomness, especially in the early stage of learning.
  • It significantly reduces the time required for learning, as the agent gets hints from a human expert, thus reducing the need for completely random exploration.

As the cost of designing a reinforcement learning agent grows exponentially with the task’s complexity and the number of variables involved, using expert knowledge to train the agent is very cost-effective: it reduces not only the cost of data and computing power, but also the time required to obtain results. The technical details of our solution can be found here: Arxiv.org and here: GitHub repository.

Special cooperation

In this project we cooperated with independent researcher Michał Garmulewicz (blog, github), who provided fundamental technical and conceptual input. We hope to continue such cooperation with Michał and other researchers.

Building a Matrix with reinforcement learning and artificial imagination


August 2, 2018/in Reinforcement learning /by Konrad Budek

Time travel and unchaining the time-matter continuum is no big deal. Nor is recruiting a dragon slayer, a Jedi Knight and a Transformer – a child’s mind is able to create fantastic worlds in seconds. So what would happen if robots had an artificial imagination?

Developing innovative strategies in Go or unorthodox approaches to chess are just top-of-mind examples of how the agent in reinforcement learning can be creative.
Go, chess and League of Legends all draw on the imagination: players use abstract thinking to predict their opponent’s actions and construct a strategy for upcoming moves. Keeping a few scenarios of upcoming actions in mind is one of the aspects of using imagination, which is essential to optimal performance. The creation of sub-worlds in the mind can be subconscious.
Professional drivers and football players essentially use a world created within their minds to react to the time and space around them in real time. It’s hardly a big deal to run to where the ball was a moment ago. But it is crucial to be where it is going to be.
Reinforcement learning agents might appear to have no imagination at all – at the beginning of an experiment their actions are totally random. It is only through rewards and penalties that they build a strategy to maximize the outcome – accident-free driving, say, or effective control over a robotic arm.

Related:  What is reinforcement learning? The complete guide

In other words, they learn with knowledge gained through experimentation and experience. That knowledge is limited – so limited that at the beginning it amounts to a big, fat zero.

Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world – Albert Einstein.

So what would happen if a purely knowledge-powered reinforcement learning agent had an imagination? Does it even need one?

Imagine dragons cars, imagine worlds

AI has been empowered with imagination for two purposes – to tune up the performance of the agent and to create a separate world within its… well… mind.
The first was demonstrated by David Ha (Google Brain) and Jürgen Schmidhuber (NNAISENSE, IDSIA), whose agent controlled a race car on a track. The model was rewarded for every race it completed and every track it visited. Each time the car finished a race, a new track was randomly generated. Although the agent learned to drive collision-free, its driving was jerky and stiff.

Related:  Learning to run - an example of reinforcement learning

Building an additional artificial neural network to predict the effect of moves and maneuvers before they were executed resulted in smoother driving. Processing a movement within “imagination” before making the actual decision proved to significantly improve the agent’s performance. What’s more, the neural network was also able to generate random race tracks on its own – that is, it was basically dreaming about racing. For further information, read the description of the “World Models” experiment.
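The “imagining” step can be sketched with a toy example in which a lookup table stands in for the learned dynamics network (all class names, states and actions here are our own illustration, not the experiment’s code):

```python
import random

class DreamModel:
    """Toy stand-in for a learned dynamics model: it memorises observed
    (state, action) -> next-state transitions and replays them on demand."""
    def __init__(self):
        self.memory = {}

    def observe(self, state, action, next_state):
        self.memory.setdefault((state, action), []).append(next_state)

    def imagine(self, state, action):
        """Predict the outcome of an action before actually taking it."""
        seen = self.memory.get((state, action))
        return random.choice(seen) if seen else state  # unknown: no change

model = DreamModel()
model.observe("straight", "steer_left", "drifting_left")

# The agent can now evaluate the manoeuvre inside its "imagination"
# before committing to it on the real track.
predicted = model.imagine("straight", "steer_left")
```

A real world model replaces the lookup table with a neural network, which is what lets it generalize to situations – and whole race tracks – it has never seen.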
In another case, DeepMind conducted an experiment on rendering a 3D environment based on images the agent was fed. Although rotating an object in the imagination is effortless for humans, machines struggle to do so; spatial imagination has never been their strong suit. Nevertheless, the model was able to build a 3D environment from the 2D images of the object it was provided. You can find details about the experiment here.
So AI is currently able to dream, build scenarios of future actions within its mind and build fully functional and plausible models of the world with just a few clues about it.
So if an agent can effectively race without actually racing or build a world within, how about building an Inception? Do agents dream of electric worlds?

The brain in a vat paradox

Training a reinforcement learning agent is expensive, mainly due to low sample efficiency. The agent needs hours (days? months?) of experience to become proficient in the task it has to perform, and its first attempts are totally random, usually failing after just a few seconds of testing.
The first few hundred autonomous car rides end after a few seconds with the agent unceremoniously running into a tree or a wall. It needs a couple of days to figure out how to brake or make a turn.
Is simulating an entire city for a car that crashes after a few seconds really necessary?
In fact, it isn’t.
The same applies to an artificial limb or any other environment. A robotic hand controlled by an RL agent starts with entirely random moves, breaking things and grabbing everything but the can of Coke. Nonetheless, simulating a full environment replete with realistic physics is unnecessary.

Related:  Playing Atari with deep reinforcement learning - deepsense.ai’s approach

That’s why deepsense.ai and Google Brain designed neural networks that simulate the testing environment for the agent, generating plausible data instead of providing the real thing. This amounts to entrapping the RL agent within a dream of a neural network designed to mimic the world in the manner of a Cartesian “evil genius”, providing the agent with fabricated signals instead of real ones.

The agent is a brain in a vat, unable to determine–and totally indifferent to–whether the training environment is a real one or merely a matrix created by the neural network.
But why do so?

We need to go deeper

While the researchers mentioned above managed a similar trick, deepsense.ai was the first to build an Inception around Atari games, a standard benchmark environment for RL models. Building models that can effectively play Space Invaders or Breakout is a step toward designing agents that can carry out practical and useful tasks, such as driving autonomous cars.
Even the game Pong provides an environment with many variables to control, as perhaps best evidenced by the first few dozen trials ending without a ball being hit. At the same time, the gameplay can still be trained effectively without running the full environment. A Matrix (or Inception maybe?) built by a neural network is good enough to train an agent to play just as a simulator is sufficient to allow pilots to polish up their skills without sitting in a real plane, and thus to avoid the risk of a crash.
But “good enough” is hardly perfect, as the video below clearly shows. The screen on the right shows actual Pong, while the middle one features the simulation.

Did you see what happened? There is no spoon ball. When the agent gains skills and proficiency in its tasks, it may reach the end of the Matrix and break the illusion. The world ends, and there are no more skills to gain in that environment.

So where do we go from here? To update the Matrix, that’s where.

Matrix Reloaded

As the agent is unable to improve its skills within the testing environment, the environment must now be improved. The best way to do that is to let the agent wander around the full simulation to gather new data for the neural network simulating the world.
In the case of Pong or other Atari games, it is about observing the ball’s behavior or various types of aliens falling from the sky. If the training were being done on an autonomous car, it would be encountering new types of crossroads and bridges or parking near the shopping mall instead of just driving around a street corner.
Full of new memories, the agent shares its knowledge about the world with the neural network simulating the environment. The network rebuilds the simulated world and the training can be continued.
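The alternating loop described above can be sketched with a toy environment. The corridor environment, the tabular world model and the “always move right” policy are our own minimal illustration, not the actual deepsense.ai/Google Brain code:

```python
class ToyEnv:
    """A 1-D corridor: positions 0..3, reward 1.0 for reaching position 3."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action is +1 (right) or -1 (left)
        self.pos = max(0, min(3, self.pos + action))
        return self.pos, (1.0 if self.pos == 3 else 0.0), self.pos == 3

class WorldModel:
    """Memorises real transitions, then acts as the 'Matrix' simulator."""
    def __init__(self):
        self.table = {}

    def observe(self, s, a, s2, r):
        self.table[(s, a)] = (s2, r)

    def step(self, s, a):
        return self.table.get((s, a), (s, 0.0))  # unseen: predict no change

# 1. Let the agent wander the real environment to gather data
#    (here exhaustively, for determinism).
env, model = ToyEnv(), WorldModel()
for s0 in range(4):
    for a in (-1, 1):
        env.reset()
        env.pos = s0
        s2, r, _ = env.step(a)
        model.observe(s0, a, s2, r)

# 2. Continue training entirely inside the learned simulation.
s, total = 0, 0.0
for _ in range(3):
    s, r = model.step(s, +1)  # a trivial "always move right" policy
    total += r
```

When the dreamed rollouts stop yielding anything new, it is time to return to step 1 and refresh the model with fresh real-world experience.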
Still, there is no point in simulating Australia, Finland or Bielefeld city for an agent that will never drive out of Kansas City.

Imagining worlds

Building artificial worlds controlled and simulated by a neural network greatly reduces the cost of acquiring the data required to train a reinforcement learning agent, while the agent still gains valuable skills from training in the artificial reality.
Currently, deepsense.ai and Google Brain are able to simulate Atari games. In the future, it will be possible to build neural networks simulating entire city environments in which to train autonomous cars or robotic arms while saving significantly on maintenance costs.
So if we can do it now, are you sure there even IS a spoon?

What is reinforcement learning? deepsense.ai’s complete guide

July 5, 2018/in Deep learning, Machine learning, Reinforcement learning, Popular posts /by Błażej Osiński and Konrad Budek

With an estimated market size of 7.35 billion US dollars, artificial intelligence is growing by leaps and bounds. McKinsey predicts that AI techniques (including deep learning and reinforcement learning) have the potential to create between $3.5T and $5.8T in value annually across nine business functions in 19 industries.

Although machine learning is often seen as a monolith, this cutting-edge technology is diversified, with various sub-types including supervised and unsupervised machine learning, deep learning, and the state-of-the-art technology of deep reinforcement learning.

What is reinforcement learning?

Reinforcement learning is the training of machine learning models to make a sequence of decisions. The agent learns to achieve a goal in an uncertain, potentially complex environment. In reinforcement learning, an artificial intelligence faces a game-like situation. The computer employs trial and error to come up with a solution to the problem. To get the machine to do what the programmer wants, the artificial intelligence gets either rewards or penalties for the actions it performs. Its goal is to maximize the total reward.
Although the designer sets the reward policy – that is, the rules of the game – they give the model no hints or suggestions for how to solve it. It’s up to the model to figure out how to perform the task to maximize the reward, starting from totally random trials and finishing with sophisticated tactics and superhuman skills. By leveraging the power of search and many trials, reinforcement learning is currently the most effective way to spark a machine’s creativity. In contrast to human beings, an artificial intelligence can gather experience from thousands of parallel gameplays if the reinforcement learning algorithm is run on sufficiently powerful computer infrastructure.
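The reward-maximization loop can be illustrated with a deliberately tiny example. This is our own sketch of trial-and-error (epsilon-greedy) learning, not any particular production algorithm:

```python
import random
random.seed(0)  # fixed seed so the run is reproducible

def environment(action):
    """The hidden rule the agent must discover: action 1 pays, 0 does not."""
    return 1.0 if action == 1 else 0.0

values = [0.0, 0.0]  # the agent's running estimate of each action's reward
counts = [0, 0]

for _ in range(200):
    # Trial and error: mostly exploit the best current estimate,
    # but try a random action 10% of the time.
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = 0 if values[0] >= values[1] else 1
    r = environment(action)  # the reward (or lack of one) for this trial
    counts[action] += 1
    values[action] += (r - values[action]) / counts[action]
```

After a few exploratory steps the estimate for action 1 approaches its true reward, and from then on the agent exploits it for the rest of the run – exactly the “totally random trials to sophisticated tactics” progression described above, in miniature.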

Examples of reinforcement learning

Applications of reinforcement learning were in the past limited by weak computer infrastructure. However, as Gerald Tesauro’s backgammon AI superplayer, developed in the 1990s, shows, progress did happen. That early progress is now rapidly accelerating, with powerful new computational technologies opening the way to completely new, inspiring applications.
Training the models that control autonomous cars is an excellent example of a potential application of reinforcement learning. In an ideal situation, the computer should get no instructions on driving the car. The programmer would avoid hard-wiring anything connected with the task and allow the machine to learn from its own errors. In a perfect situation, the only hard-wired element would be the reward function.

  • For example, in usual circumstances we would require an autonomous vehicle to put safety first, minimize ride time, reduce pollution, offer passengers comfort and obey the law. With an autonomous race car, on the other hand, we would emphasize speed much more than the driver’s comfort. The programmer cannot predict everything that could happen on the road. Instead of building lengthy “if-then” instructions, the programmer prepares the reinforcement learning agent to be capable of learning from a system of rewards and penalties. The agent (another name for a reinforcement learning algorithm performing the task) gets rewards for reaching specific goals.
  • Another example: deepsense.ai took part in the “Learning to run” project, which aimed to train a virtual runner from scratch. The runner is an advanced and precise musculoskeletal model designed by the Stanford Neuromuscular Biomechanics Laboratory. Teaching the agent to run is a first step in building a new generation of prosthetic legs, ones that automatically recognize people’s walking patterns and tweak themselves to make moving easier and more effective. While it is possible, and has been done in Stanford’s labs, hard-wiring all the commands and predicting all possible patterns of walking would require a lot of work from highly skilled programmers.
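The different weightings for a city car and a race car, mentioned in the first example, could be expressed as a reward function along these lines (the signal names and weights are purely illustrative, not from any production system):

```python
# Hypothetical per-step driving signals with hand-picked weights;
# a real system would use far richer signals than these.
CITY_WEIGHTS = {"collision": -100.0, "minutes_elapsed": -0.1, "comfort": 0.5}
RACE_WEIGHTS = {"collision": -100.0, "minutes_elapsed": -5.0, "comfort": 0.0}

def reward(signals, weights):
    """Scalar reward: a weighted sum of the measured driving signals."""
    return sum(weights[k] * signals.get(k, 0.0) for k in weights)
```

The agent’s behaviour follows the weights: with `RACE_WEIGHTS`, every elapsed minute costs fifty times more than in the city, so the learned policy will trade comfort for speed.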

For more real-life applications of reinforcement learning, check out this article.

Related:  Learning to run - an example of reinforcement learning

Challenges with reinforcement learning

The main challenge in reinforcement learning lies in preparing the simulation environment, which is highly dependent on the task to be performed. When the model has to go superhuman in chess, Go or Atari games, preparing the simulation environment is relatively simple. When it comes to building a model capable of driving an autonomous car, building a realistic simulator is crucial before letting the car ride on the street. The model has to figure out how to brake or avoid a collision in a safe environment, where sacrificing even a thousand cars comes at a minimal cost. Transferring the model out of the training environment and into the real world is where things get tricky.
Scaling and tweaking the neural network controlling the agent is another challenge. There is no way to communicate with the network other than through the system of rewards and penalties. This in particular may lead to catastrophic forgetting, where acquiring new knowledge causes some of the old knowledge to be erased from the network (to read up on this issue, see this paper, published during the International Conference on Machine Learning).
Yet another challenge is reaching a local optimum – that is, the agent performs the task as it is, but not in the optimal or required way. A “jumper” jumping like a kangaroo instead of walking, as was expected of it, is a great example, and one that can be found in our recent blog post.
Finally, there are agents that will maximize the reward without performing the task they were designed for. An interesting example can be found in the OpenAI video below, where the agent learned to gain rewards, but not to complete the race.

What distinguishes reinforcement learning from deep learning and machine learning?

In fact, there should be no clear divide between machine learning, deep learning and reinforcement learning. It is like a parallelogram – rectangle – square relation, where machine learning is the broadest category and deep reinforcement learning the narrowest one.
In the same way, reinforcement learning is a specialized application of machine and deep learning techniques, designed to solve problems in a particular way.

Although the ideas seem to differ, there is no sharp divide between these subtypes. Moreover, they merge within projects, as the models are designed not to stick to a “pure type” but to perform the task in the most effective way possible. So “what precisely distinguishes machine learning, deep learning and reinforcement learning” is actually a tricky question to answer.

  • Machine learning – a form of AI in which computers are given the ability to progressively improve their performance on a specific task with data, without being directly programmed (this is the definition of Arthur Lee Samuel, who coined the term “machine learning”). It comes in two types: supervised and unsupervised machine learning.

Supervised machine learning happens when a programmer can provide a label for every training input into the machine learning system.

  • Example – by analyzing the historical data taken from coal mines, deepsense.ai prepared an automated system for predicting dangerous seismic events up to 8 hours before they occur. The records of seismic events were taken from 24 coal mines that had collected data for several months. The model was able to recognize the likelihood of an explosion by analyzing the readings from the previous 24 hours.
[Figure: AAIA’16 Data Mining Challenge – seismic events, main working height randomization. Some of the mines could be identified exactly by their main working height values; to obstruct identification, we added Gaussian noise.]

From the AI point of view, a single model was performing a single task on a cleaned and normalized dataset. To get more details on the story, read our blog post.
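The essence of the supervised setup – every training input comes with a label – can be shown with a deliberately tiny classifier. The data, names and threshold rule below are invented for illustration; the actual seismic model was far more sophisticated:

```python
# Toy labeled dataset: each input is a window of sensor readings,
# each label says whether a dangerous event followed (1) or not (0).
train = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.9, 0.8], 1), ([0.8, 0.95], 1)]

def fit_threshold(data):
    """Learn the mean-energy threshold separating 'safe' from 'dangerous':
    the midpoint between the highest safe mean and the lowest dangerous one."""
    means = sorted((sum(x) / len(x), y) for x, y in data)
    hi_safe = max(m for m, y in means if y == 0)
    lo_danger = min(m for m, y in means if y == 1)
    return (hi_safe + lo_danger) / 2

def predict(x, threshold):
    return 1 if sum(x) / len(x) > threshold else 0
```

The key property is that the labels drive the learning: without them, `fit_threshold` would have nothing to separate – which is exactly where unsupervised learning, described next, differs.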
Unsupervised learning takes place when the model is provided only with the input data, but no explicit labels. It has to dig through the data and find the hidden structure or relationships within. The designer might not know what the structure is or what the machine learning model is going to find.

  • An example we employed was for churn prediction. We analyzed customer data and designed an algorithm to group similar customers. However, we didn’t choose the groups ourselves. Later on, we could identify high-risk groups (those with a high churn rate) and our client knew which customers they should approach first.
  • Another example of unsupervised learning is anomaly detection, where the algorithm has to spot the element that doesn’t fit in with the group. It may be a flawed product, potentially fraudulent transaction or any other event associated with breaking the norm.
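A minimal sketch of the anomaly-detection idea: no labels are given, and the algorithm itself decides which reading “doesn’t fit” (the values are invented for illustration):

```python
# Unlabeled sensor readings; one of them breaks the norm.
data = [10.1, 9.8, 10.3, 9.9, 10.0, 25.0, 10.2]

# Flag any point more than two standard deviations from the mean.
mean = sum(data) / len(data)
std = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5
anomalies = [x for x in data if abs(x - mean) > 2 * std]
```

No one told the algorithm which value was the outlier – it emerges from the structure of the data alone, which is the defining trait of unsupervised learning.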

Deep learning is based on neural networks with several layers, designed to perform more sophisticated tasks. The construction of deep learning models was inspired by the design of the human brain, albeit simplified. Deep learning models consist of a few neural network layers which are, in principle, responsible for gradually learning more abstract features from the data.
Although deep learning solutions are able to provide marvelous results, in terms of scale they are no match for the human brain. Each layer uses the outcome of a previous one as an input and the whole network is trained as a single whole. The core concept of creating an artificial neural network is not new, but only recently has modern hardware provided enough computational power to effectively train such networks by exposing a sufficient number of examples. Extended adoption has brought about frameworks like TensorFlow, Keras and PyTorch, all of which have made building machine learning models much more convenient.
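The layer-by-layer idea can be shown in a few lines of plain Python: each layer’s output feeds the next, with hand-picked weights standing in for the ones a real network would learn through training:

```python
def relu(vector):
    """Element-wise activation: negative signals are zeroed out."""
    return [max(0.0, x) for x in vector]

def dense(inputs, weights, biases):
    """One fully connected layer: each output unit is a weighted sum of
    all inputs plus a bias. `weights` holds one list per output unit."""
    return [sum(x * w for x, w in zip(inputs, unit)) + b
            for unit, b in zip(weights, biases)]

x = [1.0, 2.0]                                    # input features
hidden = relu(dense(x, [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]))
out = dense(hidden, [[2.0, 1.0]], [0.1])          # final layer, one output
```

Frameworks like TensorFlow, Keras and PyTorch wrap exactly this pattern – layers composed into a network trained as a single whole – behind far more convenient and efficient APIs.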

  • Example: deepsense.ai designed a deep learning-based model for the National Oceanic and Atmospheric Administration (NOAA). It was designed to recognize right whales in aerial photos taken by researchers. For further information about this endangered species and deepsense.ai’s work with the NOAA, read our blog post. From a technical point of view, recognizing a particular whale specimen in aerial photos is pure deep learning. The solution consists of a few machine learning models performing separate tasks. The first one was in charge of finding the head of the whale in the photograph, while the second normalized the photo by cutting and turning it, which ultimately provided a unified view (a passport photo) of a single whale.


The third model was responsible for recognizing particular whales from photos that had been prepared and processed earlier. A network composed of 5 million neurons located the blowhead bonnet-tip. Over 941,000 neurons looked for the head and more than 3 million neurons were used to classify the particular whale. That’s over 9 million neurons performing the task, which may seem like a lot, but pales in comparison to the more than 100 billion neurons at work in the human brain. We later used a similar deep learning-based solution to diagnose diabetic retinopathy using images of patients’ retinas.
Reinforcement learning, as stated above, employs a system of rewards and penalties to compel the computer to solve a problem by itself. Human involvement is limited to changing the environment and tweaking the system of rewards and penalties. As the computer maximizes the reward, it is prone to seeking unexpected ways of doing so. Human involvement is focused on preventing it from exploiting the system and motivating the machine to perform the task in the way expected. Reinforcement learning is useful when there is no one “proper way” to perform a task, yet there are rules the model has to follow to perform its duties correctly. Take the rules of the road, for example.

  • Example: By tweaking and seeking the optimal policy for deep reinforcement learning, we built an agent that in just 20 minutes reached a superhuman level in playing Atari games. Similar algorithms can in principle be used to build the AI for an autonomous car or a prosthetic leg. In fact, one of the best ways to evaluate a reinforcement learning approach is to give the model an Atari video game to play, such as Arkanoid or Space Invaders. According to Google Brain’s Marc G. Bellemare, who introduced Atari video games as a reinforcement learning benchmark, “although challenging, these environments remain simple enough that we can hope to achieve measurable progress as we attempt to solve them”.
[Video stills: Breakout and Assault gameplay – initial performance, after 15 minutes of training, and after 30 minutes of training.]

In particular, if an artificial intelligence is going to drive a car, learning to play some Atari classics can be considered a meaningful intermediate milestone. Autonomous vehicles are an interesting potential application of reinforcement learning: a developer is unable to predict all future road situations, so letting the model train itself with a system of penalties and rewards in a varied environment is possibly the most effective way for the AI to broaden the experience it both has and collects.

Related:  Playing Atari with deep reinforcement learning - deepsense.ai’s approach

Conclusion

The key distinguishing factor of reinforcement learning is how the agent is trained. Instead of inspecting the data provided, the model interacts with the environment, seeking ways to maximize the reward. In the case of deep reinforcement learning, a neural network is in charge of storing the experiences, thus improving the way the task is performed.

Is reinforcement learning the future of machine learning?

Although reinforcement learning, deep learning and machine learning are interconnected, none of them in particular is going to replace the others. Yann LeCun, the renowned French scientist and head of AI research at Facebook, jokes that reinforcement learning is the cherry on a great AI cake, with machine learning the cake itself and deep learning the icing. Without the previous layers, the cherry would top nothing.
In many use cases, using classical machine learning methods will suffice. Purely algorithmic methods not involving machine learning tend to be useful in business data processing or managing databases.
Sometimes machine learning only supports a process being performed in another way, for example by seeking ways to optimize speed or efficiency.
When a machine has to deal with unstructured and unsorted data, or with various types of data, neural networks can be very useful. How machine learning improved the quality of machine translation has been described by The New York Times.

Summary

Reinforcement learning is no doubt a cutting-edge technology that has the potential to transform our world. However, it need not be used in every case. Nevertheless, reinforcement learning seems to be the most likely way to make a machine creative – as seeking new, innovative ways to perform its tasks is in fact creativity. This is already happening: DeepMind’s now famous AlphaGo played moves that were first considered glitches by human experts, but in fact secured victory against one of the strongest human players, Lee Sedol.
Thus, reinforcement learning has the potential to be a groundbreaking technology and the next step in AI development.

