AI Monthly Digest 20 – TL;DR
With the world spinning faster every day and delivering ever more news and information to process, the temptation to cherry-pick what we consume keeps growing. AI-powered summarization is among the stories we cover in this edition of AI Monthly Digest.
Other stories include proof of deep learning’s supremacy in board games, biased machines, and what comes after Atari as a training sandbox.
TL;DR not working (yet)
A research team from the Allen Institute for Artificial Intelligence and the University of Washington has created a scientific summarization engine. The key goal was to check if an AI-based solution could shorten a text and capture the most essential information from a scientific paper. Including the most important information while also providing enough context to make the piece comprehensible is a challenging task even for a human. That the text needs to be grammatically correct and as interesting as possible makes the challenge all the more difficult.
TL;DR – it didn’t work. Yet. For all the details, see the arXiv paper.
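The generic building blocks are already on the shelf, even if they fall short on scientific papers. As a rough illustration of the task (not the authors’ system), here is a minimal sketch using the off-the-shelf summarization pipeline from Hugging Face’s transformers library; the default model and the length limits below are assumptions chosen for the demo:

```python
# A minimal abstractive-summarization sketch. This is NOT the Allen
# Institute/UW system from the paper – it only illustrates what the
# task looks like with an off-the-shelf model. The default pipeline
# model and the length limits are assumptions made for this demo.
from transformers import pipeline

summarizer = pipeline("summarization")  # downloads a default model

text = (
    "Scientific output has grown steadily for decades, and researchers "
    "now face millions of new papers every year. Automated summarization "
    "promises to condense each paper into a few sentences that preserve "
    "its key contribution while remaining readable and grammatical."
)

result = summarizer(text, max_length=40, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```

Running a generic model like this on a full scientific paper is exactly where things break down: the essential claims are scattered across sections, and the model has no notion of which ones matter.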
Why it matters
While summarization engines aren’t fully feasible yet, the polish applied over the next few years should eventually make them shine.
According to research done at the University of Ottawa, the number of scientific papers published since 1665 surpassed the 50 million mark in 2009. Add the approximately 2.5 million new papers published every year, and you quickly see that staying up to date in any field, let alone several, is only getting harder.
We are also beset by news on all sides. To wade through the overload, people need to manage their attention carefully. That’s why a good automated summarization engine is a service we increasingly need, not only in science but also in our daily lives.
Tackling biased language models
“Garbage in, garbage out,” data scientists will say from time to time, alluding to the fact that an AI model or prediction can only be as good as the dataset it was trained on.
A dataset can be full of hidden or obscure misinformation that a human supervisor usually misses or considers irrelevant given the knowledge they have already gathered. A good example of a model gone bad is the one Amazon used to analyze applications for engineering roles, which turned out to be biased against female engineers. Analyzing the profiles of the company’s existing engineers, the model found significantly fewer women among them and concluded that women can’t code. For a human this is obvious nonsense, but in AI there is no such thing as common sense.
Considering that, it is not surprising that research is being done to tackle bias in Natural Language Processing. A consortium from MIT, Intel, and the Montreal Institute for Learning Algorithms (MILA) has come up with a way to evaluate NLP models in a more disciplined and structured manner. Their StereoSet is a dataset that measures typical biases in English, checking whether a model is truly neutral toward its subject. The research is available on arXiv.
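To make the idea concrete, here is a loose, simplified sketch of a StereoSet-style probe (not the paper’s exact metric, which also scores unrelated options and aggregates everything into a combined score): score a stereotypical and an anti-stereotypical sentence with a language model and see which one it prefers.

```python
# Simplified StereoSet-style probe: does a language model assign a
# higher likelihood to a stereotypical sentence than to its
# anti-stereotypical counterpart? A loose illustration only – the
# paper's actual metric also uses unrelated options and aggregates
# results across thousands of examples.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_likelihood(sentence: str) -> float:
    """Average per-token log-likelihood of a sentence under the model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)  # loss = mean negative log-likelihood
    return -out.loss.item()

stereotype = "The engineer fixed the server. He worked late."
anti_stereotype = "The engineer fixed the server. She worked late."

s = avg_log_likelihood(stereotype)
a = avg_log_likelihood(anti_stereotype)
print(f"stereotype: {s:.3f}  anti-stereotype: {a:.3f}")
print("model prefers the", "stereotypical" if s > a else "anti-stereotypical", "sentence")
```

An unbiased model would, across many such pairs, prefer neither variant systematically.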
Why it matters
Apart from the fact that any discrimination is evil, a biased model comes with multiple disadvantages. Coming back to the Amazon example, nobody knows how many talented female coders the company’s model rejected. Biases in language processing hurt society and business in multiple ways, and reducing them is one of the most significant challenges to overcome in the near future.
AI playing Game Boy games
An iconic gadget of the late twentieth century, the Game Boy and Game Boy Color made it into the palms and pockets of kids worldwide – 118 million of them, to be exact. The platform handled multiple types of portable games, from simple platformers like Super Mario to more sophisticated RPG-like experiences like the Pokémon series.
Long story short, the platform delivers significantly more advanced games than Atari did while keeping the graphics simple. That makes it a good training sandbox for reinforcement learning models today.
Why it matters
Reinforcement learning shines in multiple classes of problems, especially those with no straightforward solution. The trick is to deliver an environment where the model encounters varied challenges while keeping the simulator relatively lightweight. Atari is currently the standard sandbox for testing new approaches, even as the needs and the classes of problems to solve evolve.
Delivering a Game Boy simulator suitable for running AI experiments is an interesting way to broaden the research. The software is available on GitHub.
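The project itself aside, the general recipe for turning any emulator into an RL testbed is the same: expose the screen as the observation, button presses as actions, and some in-game quantity as the reward. Below is a hypothetical sketch of that interface – GameBoyEnv, its methods, and the placeholder reward are illustrative assumptions, not the actual API of the project on GitHub:

```python
# Hypothetical sketch of wrapping a Game Boy emulator as a gym-style
# RL environment. GameBoyEnv and its internals are illustrative
# assumptions, not the real API of the project linked on GitHub.
import random

class GameBoyEnv:
    """Toy stand-in for an emulator wrapper with a gym-like interface."""
    ACTIONS = ["up", "down", "left", "right", "a", "b", "start", "noop"]

    def __init__(self, rom_path: str):
        self.rom_path = rom_path
        self.frame = 0

    def reset(self):
        self.frame = 0
        return self._observation()

    def step(self, action: str):
        # A real wrapper would press `action`, advance the emulator one
        # frame, and derive the reward from game memory (score, progress).
        self.frame += 1
        reward = random.random()   # placeholder reward signal
        done = self.frame >= 1000  # placeholder episode length
        return self._observation(), reward, done, {}

    def _observation(self):
        # A real wrapper would return the 160x144 screen buffer here.
        return self.frame

# A random agent – the usual smoke test for a new RL environment.
env = GameBoyEnv("roms/example.gb")
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done, info = env.step(random.choice(GameBoyEnv.ACTIONS))
    total += reward
print(f"episode return: {total:.1f}")
```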
It’s official – deep learning supremacy in board games
The supremacy of deep learning in board games is now a fact. Leela Chess Zero has won the latest Top Chess Engine Championship, formerly known as the Thoresen Chess Engines Competition. The tournament has been run since 2010 and is considered an unofficial computer chess championship.
The winner of the 2019 edition was Stockfish, an open-source chess engine that is not powered by ML-based techniques and utilizes more traditional ways of playing chess. Here, “more traditional” doesn’t mean ineffective – no human has beaten Stockfish.
Stockfish was initially published in 2008, giving it 12 years of constant development. In early 2019 it was still the champ, but only barely, squeaking out a 50.5 to 49.5 advantage. Since then, LCZ – backed by just ten months of development with deep learning – has beaten Stockfish twice: in the first half of 2019, and again this year, when it won 52.5 to 47.5 over 100 games.
To give a sense of scale, a typical non-professional player has a chess rating (under the so-called Elo rating system) of about 1000 points. A talented amateur can reach 2000 points. The current world champion, Magnus Carlsen, is rated at about 2900.
Leela and Stockfish, meanwhile, are both over 3800 – far beyond human reach.
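To see what those gaps mean in practice, the Elo system maps a rating difference straight to an expected score. A quick back-of-the-envelope calculation with the standard formula and the article’s rough ratings:

```python
# Expected score under the standard Elo formula: for players rated
# r_a and r_b, player A's expected score (win = 1, draw = 0.5) is
#   E_A = 1 / (1 + 10 ** ((r_b - r_a) / 400))
# The ratings below are the article's approximate figures.
def expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

print(f"club player (2000) vs Carlsen (2900): {expected_score(2000, 2900):.4f}")
print(f"Carlsen (2900) vs engine (3800):      {expected_score(2900, 3800):.4f}")
```

Each 900-point gap translates to an expected score of roughly half a percent – about one draw in a hundred games, with no wins.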
Why it matters
First of all, this isn’t only about chess. Setting aside some arcane insight into particular moves or ways out of interesting positions, most of the moves are incomprehensible to lay observers. Appreciating the strategy and tactics behind each move requires a rare level of chess mastery.
So in fact, the match was between two approaches to solving problems in computer science – traditional, hand-crafted code versus deep learning – and in chess, the latter has proved superior.