Artificial intelligence imagining and reasoning about the future

deepsense.ai • 1 minute read • 9 March, 2018

Researchers from the deepsense.ai machine learning team, Piotr Miłoś, Błażej Osiński and Henryk Michalewski, together with Łukasz Kaiser from Google Brain's TensorFlow team, optimized the infrastructure for reinforcement learning in the Tensor2Tensor project. The team enhanced an advanced reinforcement learning package with improvements related to the state-of-the-art algorithm called Proximal Policy Optimization, which was originally developed by OpenAI. The algorithm has proved very versatile: it has been used to play games such as Dota 2, to solve robotic tasks such as Learning to Run (where our model took sixth place), and to play Atari games.
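For readers unfamiliar with Proximal Policy Optimization, the core of the algorithm is its clipped surrogate objective, which limits how far a single update can move the policy away from the one that collected the data. The snippet below is a minimal sketch in plain Python, not the Tensor2Tensor implementation. The function name `ppo_clip_objective`, its parameter names and the default `epsilon=0.2` are illustrative assumptions.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    All names and the default epsilon are assumptions made for this sketch.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Usage: the second action became twice as likely, but its gain is capped at 1 + epsilon.
objective = ppo_clip_objective(
    new_log_probs=[0.0, math.log(2.0)],
    old_log_probs=[0.0, 0.0],
    advantages=[1.0, 1.0],
)
```

Taking the minimum of the two terms means the objective never rewards pushing the ratio beyond the clipping range when the advantage is positive, which is what keeps the updates conservative.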

AI imagination and reasoning

The idea behind the improvements was to develop an artificial intelligence capable of imagining and reasoning about the future. Instead of relying on precise and costly simulators, or on even more costly real-world data, the new AI spends most of its effort imagining possible future events, which is much cheaper than gathering real data. At the same time, a properly trained imagination is a far cry from daydreaming: it makes it possible to model reality precisely and reason about it hundreds of times faster than would be possible using simulators. The novelty in Tensor2Tensor is an implementation of Proximal Policy Optimization that is contained entirely in the computation graph. This is the main technical factor behind the lightning-fast imagination.
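To make the idea of imagination concrete, the sketch below rolls a policy forward inside a model of the environment instead of calling a simulator. It is a toy illustration under stated assumptions, not the project's code: `imagined_rollout`, `toy_model`, `toy_policy` and `toy_reward` are hypothetical names, and the hand-written `toy_model` stands in for what would in practice be a trained neural network.

```python
def imagined_rollout(model, policy, reward_fn, state, horizon):
    """Roll a policy forward inside a model, never touching the real simulator."""
    trajectory = []
    total_reward = 0.0
    for _ in range(horizon):
        action = policy(state)
        # The next state is predicted by the model, not observed from a simulator.
        next_state = model(state, action)
        reward = reward_fn(next_state)
        trajectory.append((state, action, reward))
        total_reward += reward
        state = next_state
    return trajectory, total_reward


# Hypothetical stand-ins: a one-dimensional world where the goal is position 3.
def toy_model(state, action):
    return state + action


def toy_policy(state):
    return 1 if state < 3 else 0


def toy_reward(state):
    return -abs(3 - state)


trajectory, total_reward = imagined_rollout(
    toy_model, toy_policy, toy_reward, state=0, horizon=5
)
```

Because every step is just a function call on the model, many such rollouts can be generated cheaply, and the resulting trajectories can then be fed to a policy optimization algorithm.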

End-to-end training inside a computation graph

In the second stage of the project, researchers from deepsense.ai, the University of Warsaw and Google Brain are focusing on end-to-end training of a reinforcement learning agent entirely inside a computation graph. One step in the experiment is to implement the Proximal Policy Optimization algorithm using only TensorFlow atoms. The training will run on Cloud Tensor Processing Units (TPUs), custom Google-designed chips for machine learning. Assuming that a game simulator can be represented as a neural network, we expect that the whole training process can then be kept in the memory of the Cloud TPU. Stay tuned for the results of our project!
