Deep learning Archives - deepsense.ai

3D meets AI – an unexplored world of new business opportunities

AI has become a powerful force in computer vision and it has unleashed tangible business opportunities for 2D visual data such as images and videos. Applying AI can bring tremendous results in a number of fields. To learn more about this exciting area, read our overview of 2D computer vision algorithms and applications.

Despite its popularity, there is nothing inherent to 2D imagery that makes it uniquely suitable for AI application. In fact, artificial intelligence systems can analyze various forms of information, including volumetric data. Although an increasing number of companies already use 3D data gathered by lidar or 3D cameras, AI applications are not yet mainstream in their industries.

In this post, we describe how to leverage 3D data across multiple industries with the use of AI. Later in the article we’ll take a closer look at the nuts and bolts of the technology and show what it takes to apply AI to 3D data. At the end of the post, you’ll also find an interactive demo to play with.

In the 3D world, there is no Swiss Army Knife

3D data is what we call volumetric information. The most common types include:

2.5D data, including information on depth or the distance to visible objects, but no volumetric information of what’s hidden behind them. Lidar data is an example.
3D data, with full volumetric information. Examples include MRI scans or objects rendered with computer graphics.
4D data, where volumetric information is captured as a sequence, and the outcome is a recording where one can go back and forth in time to see the changes occurring in the volume. We refer to this as 3D + time, which we can treat as the 4th dimension. Such representation enables us to visualize and model dynamic 3D processes, which is especially useful in medical applications such as respiratory or cardiac monitoring.

There are also multiple data representations. These include a stack of 2D images along the normal axis, a sparse point cloud representation and a voxelized representation. Such data can have additional channels, like reflectance at every point in a lidar’s view.
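
To make the difference between a point cloud and a voxelized representation concrete, here is a minimal Python sketch. It is an illustration under our own assumptions, not code from any of the projects described in this post: the function name `voxelize`, the 0.5 m voxel size and the sample points are all hypothetical. Each point carries a reflectance value as an extra channel, and every occupied voxel stores the mean reflectance of the points that fall into it.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size):
    """Turn (x, y, z, reflectance) points into a sparse voxel grid.

    Returns a dict that maps an integer voxel index (i, j, k) to the mean
    reflectance of the points inside that voxel. Empty voxels are not stored.
    """
    buckets = defaultdict(list)
    for x, y, z, reflectance in points:
        index = (
            floor(x / voxel_size),
            floor(y / voxel_size),
            floor(z / voxel_size),
        )
        buckets[index].append(reflectance)
    return {index: sum(vals) / len(vals) for index, vals in buckets.items()}


# Hypothetical sample: four lidar returns, coordinates in metres.
points = [
    (0.2, 0.1, 0.0, 0.5),
    (0.4, 0.3, 0.1, 0.7),
    (1.6, 0.2, 0.0, 0.9),
    (-0.3, 0.2, 0.0, 0.1),
]
grid = voxelize(points, voxel_size=0.5)
for index, mean_reflectance in sorted(grid.items()):
    print(index, round(mean_reflectance, 3))
```

Only occupied voxels are kept in the dictionary, which matters for sparse data such as lidar scans, where most of the volume is empty.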

Depending on the business need, there can be different objectives for using AI: object detection and classification, semantic segmentation, instance segmentation and movement parameterization, to name a few. Moreover, every setup has its own characteristics and limitations that should be addressed with a dedicated approach (or, in the case of artificial neural networks, with a sophisticated and thoroughly designed architecture). These are the main reasons our clients come to us to take advantage of our experience in the field. We are responsible for delivering the AI part of specific projects, even when the majority of the client’s competencies are built in-house.

Let us have a closer look at a few examples

1. Autonomous driving

Task: 3D object detection and classification
Data: 2.5D point clouds captured with a lidar: sparse data, large distances between points

Autonomous driving data are very sparse because:

The distances between objects in outdoor environments are significant.
In the majority of cases, lidar rays from the front and rear of the car don’t return to the lidar, since there are no objects to reflect them.
The resolution of objects gets worse the further they are from the laser scanner. Due to the angular expansion of the beam, it’s impossible to determine the precise shape of objects that are far away.
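
A rough back-of-the-envelope sketch of the last point: if neighbouring rays are separated by a fixed angle, the gap between adjacent points grows linearly with range. The 0.2 degree angular step below is an assumed value chosen for illustration, not the specification of any particular scanner, and `point_spacing` is a hypothetical helper.

```python
from math import radians, tan


def point_spacing(distance_m, angular_step_deg):
    """Gap in metres between two neighbouring lidar returns.

    Assumes a flat surface facing the sensor and a fixed angular step
    between neighbouring rays.
    """
    return 2 * distance_m * tan(radians(angular_step_deg) / 2)


for distance in (10, 50, 100):
    gap_cm = 100 * point_spacing(distance, angular_step_deg=0.2)
    print(f"{distance:>4} m away: about {gap_cm:.1f} cm between points")
```

Under these assumptions, an object ten times further away is covered by points that are ten times further apart, which is why distant shapes are hard to recover.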

For autonomous driving, we needed a system that can take advantage of data sparsity to infer 3D bounding boxes around objects. One such network is the part-aware and part-aggregation neural network, Part-A² net (https://arxiv.org/abs/1907.03670). This is a two-stage network that exploits the high separability of objects, which functions as segmentation information.

In the first stage, the network estimates the position of foreground points of objects inside bounding boxes generated by an anchor-based or anchor-free scheme. Then, in the second stage, the network aggregates local information for box refinement and class estimation. The network output is shown below, with the colors of points in bounding boxes showing their relative location as perceived by the Part-A² net.

Source of image: From Points to Parts: 3D Object Detection from Point Cloud with Part-aware and Part-aggregation Network

2. Indoor scene mapping

Task: Object instance segmentation
Data: Point clouds, sparse data, relatively small distances between points

A different setup is called for when mapping indoor environments, for example in instance segmentation of objects in office spaces or shops (for better intuition, see the S3DIS dataset). Here we employ a relatively high-density point cloud representation and the BoNet architecture.

In this case the space is divided into a grid of 1 x 1 x 1 meter cubes. In each cube, a few thousand points are sampled for further processing. In an autonomous driving scenario, such a grid division would make little sense given the sheer number of cubes produced, many of which are empty and only a few of which contain any relevant information.
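
The grid division described above can be sketched in a few lines of Python. This is a simplified illustration and not the preprocessing code of BoNet or of our projects: the function name `split_into_cubes`, the default of 4096 samples per cube and the choice to sample with replacement when a cube holds too few points are all assumptions made for the example.

```python
import random
from collections import defaultdict
from math import floor


def split_into_cubes(points, cube_size=1.0, samples_per_cube=4096, seed=0):
    """Assign (x, y, z) points to cubic cells and draw a fixed-size sample per cell.

    Returns a dict mapping a cube index (i, j, k) to a list of exactly
    `samples_per_cube` points. Cubes without any points never appear.
    """
    rng = random.Random(seed)
    cubes = defaultdict(list)
    for point in points:
        key = tuple(floor(coord / cube_size) for coord in point[:3])
        cubes[key].append(point)

    sampled = {}
    for key, members in cubes.items():
        if len(members) >= samples_per_cube:
            # Enough points: sample without replacement.
            sampled[key] = rng.sample(members, samples_per_cube)
        else:
            # Too few points: repeat some so every cube has the same size.
            sampled[key] = [rng.choice(members) for _ in range(samples_per_cube)]
    return sampled


# Tiny hypothetical scene: four points in one cube, one point in its neighbour.
scene = [
    (0.1, 0.2, 0.3),
    (0.5, 0.5, 0.5),
    (0.9, 0.1, 0.2),
    (0.4, 0.4, 0.4),
    (1.5, 0.2, 0.3),
]
blocks = split_into_cubes(scene, cube_size=1.0, samples_per_cube=3)
for key, block in sorted(blocks.items()):
    print(key, len(block))
```

A fixed number of points per cube gives the network inputs of constant size. In a dense indoor scan most cubes are occupied, so little work is wasted, whereas a sparse outdoor scan would yield mostly empty cubes.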

The network produces semantic segmentation masks as well as bounding boxes. The inference is a two-stage process. The first stage produces a global feature vector to predict a fixed number of bounding boxes. It also computes scores to indicate whether some of the predicted classes are inside those boxes. The point-level and global features derived in the first stage are then used to predict a point-level binary mask with the class assignment. The pictures below show a typical scene with the segmentation masks.

An example from the S3DIS dataset. From left: input image, semantic segmentation labels, instance segmentation labels

3. Medical diagnosis

Task: 3D Semantic segmentation
Data: Stacked 2D images, dense data, small distance between images

This is a highly controlled setup, where all 2D images are carefully and densely stacked together. Such a representation can be treated as a natural extension of a 2D setup. In such cases, modifying existing 2D approaches will deliver satisfactory results.

An example of a modified 2D approach is the 3D U-Net (https://arxiv.org/abs/1606.06650), where all 2D operations for a classical U-Net are replaced by their 3D counterparts. If you want to know more about AI in medicine, check out how it can be used to help with COVID-19 diagnosis and other challenges.
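
As a toy illustration of what replacing 2D operations with their 3D counterparts means, the sketch below implements max pooling, one of the operations used in a U-Net, first over 2 x 2 pixel windows and then over 2 x 2 x 2 voxel windows. It is plain Python on nested lists for readability. A real model would rely on a deep learning framework, and the function names and sample data are ours.

```python
def max_pool_2d(image, k=2):
    """Max pooling over non-overlapping k x k windows of a 2D image."""
    height, width = len(image), len(image[0])
    return [
        [
            max(image[i + di][j + dj] for di in range(k) for dj in range(k))
            for j in range(0, width - k + 1, k)
        ]
        for i in range(0, height - k + 1, k)
    ]


def max_pool_3d(volume, k=2):
    """The 3D counterpart: max pooling over k x k x k windows of a volume."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [
            [
                max(
                    volume[z + dz][i + di][j + dj]
                    for dz in range(k)
                    for di in range(k)
                    for dj in range(k)
                )
                for j in range(0, width - k + 1, k)
            ]
            for i in range(0, height - k + 1, k)
        ]
        for z in range(0, depth - k + 1, k)
    ]


# A 4 x 4 "slice" and a volume made of two stacked slices.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
volume = [image, [[value + 16 for value in row] for row in image]]

print(max_pool_2d(image))
print(max_pool_3d(volume))
```

The only structural change is the extra loop over the depth axis, which is why a densely stacked set of 2D images is such a natural fit for adapted 2D architectures.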

Source: Head CT scan

4. A 3D-enhanced 2D approach

There is also another case where, luckily, it can be relatively straightforward to apply expertise and technology developed for 2D in 3D applications. One such scenario is when 2D labels are available, but the data and the inference products are in 3D. Another is when 3D information can play a supportive role.

In such a case, a depth map produced by 3D cameras can be treated as an additional image channel beyond regular RGB colors. Such additional information makes neural networks more sensitive to edges and thus yields better object boundaries.
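
A minimal sketch of this idea: the depth map is clipped, normalized and appended to each pixel as a fourth channel, producing an RGB-D image that a 2D network can consume once its first layer accepts four input channels. The function name `add_depth_channel`, the millimetre units and the clipping range are assumptions made for the example.

```python
def add_depth_channel(rgb, depth_mm, max_depth_mm=5000):
    """Append a normalized depth value to every RGB pixel.

    rgb:      H x W nested list of (r, g, b) tuples with values in 0..255
    depth_mm: H x W nested list of depth readings in millimetres
    Returns an H x W nested list of (r, g, b, d) tuples scaled to 0..1.
    """
    rgbd = []
    for rgb_row, depth_row in zip(rgb, depth_mm):
        row = []
        for (r, g, b), depth in zip(rgb_row, depth_row):
            # Clip to the assumed working range before normalizing.
            d_norm = min(max(depth, 0), max_depth_mm) / max_depth_mm
            row.append((r / 255, g / 255, b / 255, d_norm))
        rgbd.append(row)
    return rgbd


# Hypothetical 1 x 2 image: a near red pixel and a far green pixel.
rgb = [[(255, 0, 0), (0, 255, 0)]]
depth_mm = [[500, 4000]]
rgbd = add_depth_channel(rgb, depth_mm, max_depth_mm=2000)
print(rgbd)
```

Two neighbouring pixels with similar colors but very different depth values now differ clearly in the fourth channel, which is the extra cue for object boundaries.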

Source: Azure Kinect DK depth camera

Examples of the projects we have delivered in such a setup include:

Defect detection based on 2D and 3D images

We developed an AI system for a tire manufacturer to detect diverse types of defects. 3D data played a crucial role as it allowed for ultra-precise detection of submillimeter-size bubbles and scratches.

Object detection in a factory

We designed a system to detect and segment industrial assets in a chemical facility that had been thoroughly scanned with high resolution laser scanners. Combining 2D and 3D information allowed us to digitize the topology of the installation and its pipe system.

3D data needs a mix of competencies

At deepsense.ai, we have a team of data scientists and software engineers handling the algorithmic, visualization, and integration capabilities. Our teams are set up to flexibly adapt to specific business cases and provide tailor-made AI solutions. The solutions they produce are an alternative to pre-made, off-the-shelf products, which often prove too rigid and constrained; they fail once user expectations deviate from the assumptions of their designers.

Processing and visualizing data in near real time with appropriate user experience is no piece of cake. Doing so requires a tough balancing act that combines specific business needs, technical limitations resulting from huge data loads and the need to support multiple platforms.

It is always easier to discuss such issues with an example. The following sections show what it takes to develop an object detection system for autonomous vehicles with outputs accessible from a web browser. The goal is to predict bounding boxes of 3 different classes: car, pedestrian and cyclist, 360 degrees around the car. Such a project can be divided into 4 interconnected components: data processing, algorithms, visualizations and deployment.

Data preprocessing

In our example, we use the KITTI and A2D2 datasets, two common datasets for autonomous driving, and ones our R&D hub relies on heavily. In both datasets, we use data from spinning lidars for inference and cameras for visualization purposes.

Lidars and cameras work independently, capturing data at different rates. To obtain a full picture, all data have to be mapped to a common coordinate system and adjusted for time. This is no easy task. As lidars are constantly spinning, each point is captured at a different time, while simultaneously the position and rotation of the car in relation to world coordinates is changing. Meanwhile, the precise location and angle of the car are not known perfectly due to limitations of geolocation systems such as GPS. These issues make it extremely difficult to precisely and stably determine the absolute positions of objects around the vehicle (SLAM can be used to tackle some of the problems).
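
The two basic operations behind this alignment, moving points into a common coordinate system and pairing each lidar sweep with the closest camera frame in time, can be sketched as follows. This is a deliberately simplified illustration: it assumes the sensor is rotated only around the vertical axis, ignores the motion of the car during a sweep, and uses hypothetical function names and made-up calibration values rather than anything taken from KITTI or A2D2.

```python
from math import cos, radians, sin


def lidar_to_vehicle(point, yaw_deg, translation):
    """Rotate a lidar point around the vertical axis, then shift it into the vehicle frame."""
    x, y, z = point
    tx, ty, tz = translation
    yaw = radians(yaw_deg)
    return (
        cos(yaw) * x - sin(yaw) * y + tx,
        sin(yaw) * x + cos(yaw) * y + ty,
        z + tz,
    )


def nearest_frame(lidar_time, camera_times):
    """Index of the camera frame whose timestamp is closest to the lidar timestamp."""
    return min(
        range(len(camera_times)),
        key=lambda i: abs(camera_times[i] - lidar_time),
    )


# Made-up calibration: lidar mounted 1.7 m above the vehicle origin, rotated by 90 degrees.
point_in_vehicle = lidar_to_vehicle((1.0, 0.0, 0.0), yaw_deg=90, translation=(0.0, 0.0, 1.7))
print(tuple(round(c, 3) for c in point_in_vehicle))

# Camera frames roughly every 33 ms, lidar sweep stamped at 105 ms.
camera_times = [0.0, 0.033, 0.066, 0.1, 0.133]
print(nearest_frame(0.105, camera_times))
```

A production pipeline would use full 3D rotations from the dataset calibration files and compensate for ego-motion within each sweep. The sketch only shows the shape of the problem.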

Fortunately, absolute positioning of objects around the vehicle is not always required.

Algorithms

There are a vast number of approaches when it comes to 3D data. However, factors such as the distance to and between objects and high sparsity play an essential role in which algorithm we ultimately settle on. As in the first example above, we used Part-A² net.

Deployment

We have relied on a complete, in-house solution for visualization, data handling, and UI. We have used expertise in the Unity engine to develop a cross-platform, graphically rich and fully flexible solution. In terms of a platform, we opted for maximum availability, which can be satisfied by a popular web browser like Chrome or Mozilla Firefox and WebGL as Unity’s compilation platform.

Visualization/UI

WebGL, while very comfortable for the user, disables drive access and advanced GPU features, limits available RAM to 2GB and processing to a single thread. Additionally, while standalone solutions in Unity may rely on existing libraries for point cloud visualization, making it possible to visualize hundreds of millions of points (thanks to advanced GPU features), this is not the case in WebGL.

Therefore, we have developed an in-house visualization solution enabling real-time, in-browser visualization of up to 70 million points. Give it a try!

Such visualization can be tailored to a company’s specific needs. In a recent project, we took a different approach: we used AR glasses to visualize a factory in all its complexity. This enabled our client to reach a next-level user experience and see the factory in a whole new light.

Summary

We hope that this post has shed some light on how AI can be used with 3D data. If you have a particular 3D use case in mind or you are just curious about the potential for AI solutions in your field, please reach out to us. We’ll be happy to share our experience and discuss potential ways we can help you apply the power of artificial intelligence in your business. Please drop us an email at contact@deepsense.ai.

AI in healthcare – tackling COVID-19 and other future challenges

May 8, 2020/in Data science, Deep learning /by Paulina Knut, Maciej Leoniak and Konrad Budek

Throughout history, tackling pandemics has always been about using the latest knowledge and approaches. Today, with AI-powered solutions, healthcare has new tools to tackle present and future challenges, and the COVID-19 pandemic will prove to be a catalyst of change.

It was probably a typical October day in Messina, a Sicilian port, when 12 Genoese ships docked. People were horrified to discover the dead bodies of sailors aboard, and with them the entrance of the Black Death to Europe. Today, in the age of vaccines and advanced medical treatments, the specter of a pandemic may until recently have seemed a phantom menace. But the COVID-19 pandemic has proved otherwise.

There are currently several challenges regarding COVID-19, including symptoms that can easily be mistaken for those of the common flu. An X-ray or CT image of the lungs is a key element in the diagnosis and treatment of COVID-19 – the disease produces several telltale signs that are easy for trained professionals to spot. Or for a trained neural network.

Neural networks – a building block for medical AI analysis

Computer scientists have traditionally developed methods that let them find keypoints on images based on defined heuristics, which allow them to tackle a huge array of problems. For example, locating machine parts on a uniform conveyor belt where simple colour filtration differentiates them from the background. But this is not the case for more sophisticated problems, where extensive domain knowledge is required.

Enter Neural Networks, algorithms inspired by the mathematical model of how the human brain processes signals. In the same way as humans gain knowledge by gathering experience, Neural Networks process data and learn on their own, instead of being manually tuned.

In AI-powered image processing, every pixel is represented as an input node and its value is passed to neurons in the next layer, allowing the interdependencies between pixels to be captured. As seen in the face detection model below, the lower layers develop the ability to filter simple shapes like edges and corners (e.g., eye corners) or color gradients. These are then used by intermediate layers to construct more sophisticated shapes representing the parts of the objects being analysed (in this case eyes, parts of lips or a lung edge etc.). The high layers analyse recognised parts and classify them as specific objects. In the case of X-ray images, such objects may be a rib, a lung or an irrelevant object in the background.
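
To give a feel for what a low-level edge filter does, here is a small sketch that slides a 3 x 3 kernel over an image, as a convolutional layer does. One caveat: in a trained network the kernel weights are learned from data, whereas the kernel below is a hand-written Sobel filter that we picked as a stand-in for illustration. The function name and the tiny test image are ours.

```python
def correlate2d(image, kernel):
    """Slide a kernel over a 2D image (no padding) and sum the products at each position."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    height, width = len(image), len(image[0])
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kernel_h)
                for b in range(kernel_w)
            )
            for j in range(width - kernel_w + 1)
        ]
        for i in range(height - kernel_h + 1)
    ]


# Hand-written Sobel kernel that responds to vertical edges.
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# A 3 x 6 image: dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9, 9] for _ in range(3)]

print(correlate2d(image, sobel_x))
```

The response is zero over the flat regions and large where the window straddles the dark-to-bright boundary, which is the kind of signal higher layers combine into more complex shapes.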

Source: researchgate.net

A neural network can see details the average observer cannot, and even specialists would be hard-pressed to find. But such skill requires a significant amount of training and a good dataset.

What does it take to train neural networks?

Data scientists spend a lot of time ensuring their models have the ability to generalise, and can thus deliver accurate predictions from data they didn’t encounter during training. This requires vast knowledge of data preprocessing and augmentation techniques, state-of-the-art network architectures and error-interpreting skills. The iterative process of designing and executing experiments also consumes a great deal of time and computing power, and requires good organisation if it is to be done efficiently. Under these conditions, high prediction accuracy is hard to achieve – deepsense.ai’s teams have been developing this ability for 7 years.

The key difference between a human specialist and a neural network is that the latter is completely domain-agnostic. An algorithm that excelled at segmenting satellite images, or at recognising individual North Atlantic right whales from a population of 447, can just as well be used for medical image recognition after tuning.

AI in medical data

Numerous AI solutions are currently used in medicine: from appointments and digitization of medical records to drug dosing algorithms (see applications of artificial intelligence in health care). However, doctors still have to perform painstaking and repetitive tasks, such as analyzing images.

Images are used across the field of medicine, but they play a particularly important role in radiology. According to IBM estimates, up to 90% of all medical data is in image form, be it X-rays, MRIs or the output of most other diagnostic devices. That is why radiology as a field is so open to using new technologies. Computers, initially used in clinical imaging for administrative work such as image acquisition and storage, have become an indispensable element of the work environment since the advent of picture archiving and communication systems (PACS).

Recently, deep learning has been used with great success in medical imaging thanks to its ability to extract features. In particular, neural networks have been used to detect and differentiate bacterial and viral pneumonia in children’s chest radiographs.

COVID-19 appears to be a similar case. Studies show that 86% of COVID-19 patients have ground-glass opacities (GGO), 64% have mixed GGO and consolidation and 71% have vascular enlargement in the lesion. This can be observed on CT scans as well as chest X-ray images and can be relatively easily spotted by a trained neural network.

There are several advantages of CT and x-ray scans when it comes to diagnosing COVID-19. The speed and noninvasiveness of these methods make them suitable for assisting doctors in determining the development of the infection and making decisions regarding performance of invasive tests. Also, due to the lack of both vaccines and medications, immediately isolating the infected patient is the only way to prevent the spread of the disease.

How deepsense.ai already supports healthcare

deepsense.ai’s first foray into medical data was when we took part in a competition to classify the severity of diabetic retinopathy using images of retinas. The contestants were given over 35,000 images of retinas, each having a severity rating. There were 5 severity classes, and the distribution of classes was fairly imbalanced. Most of the images showed no signs of disease. Only a few percent had the two most severe ratings. After months of hard work, we took 6th place.

As we gained more contact and experience with medical data, our results improved, and after some time we were able to take on challenges such as producing an algorithm that could automatically detect nuclei. With images acquired under a variety of conditions and having different cell types, magnification, and imaging modality (brightfield vs. fluorescence), the main challenge was to ensure the ability to generalise across these conditions.

Another interesting project we did involved automatic stomatological assessment. We trained a model to read an X-ray image and detect and identify teeth, accessories and lesions including laces, implants, cavities, cavity fillings, and parodontosis, among a long list of others. In yet another project, we estimated minimum (end-systolic) and maximum (end-diastolic) volumes of the left ventricle from a set of MRI images taken over one heartbeat. Our results were rated “excellent” by the cardiologists who reviewed our work.

The standardized formats used in medical imaging allow for better transfer of knowledge in modeling different problems. In a recent research project we explored the potential of image preprocessing of CT scans in DICOM format.

Image preprocessing is a vital aspect of computer vision projects. Developing the optimal procedure rests upon the team’s experience in similar projects as well as their ability to explore new ideas. In this case the specialized image preprocessing methods we developed made the image more readable for the model and boosted its performance by 20%.

The deepsense take-away

It is common to think that an epidemic starts and ends, with no further threat to fear. But that’s not true. The Black Death started with the arrival of twelve ships from Genoa, then proceeded to claim the lives of up to 50 million Europeans. The disease still exists today, with 3,248 people infected and 584 dead between 2010 and 2015. That’s right, the disease never really disappeared.

Some 700 years ago, Ragusa (modern Dubrovnik), then a Venice-controlled port city, played a prominent role in slowing the spread of the disease. Learning from the tragic fate of other port cities including Venice, Genoa, Bergen and Weymouth, officials in Ragusa held sailors on their ships for 30 days (trentino) to check whether they were healthy.

COVID-19 is neither the most deadly nor the last pandemic humans will face. The key is to apply the latest knowledge and the most sophisticated solutions available to tackle the challenges pandemics present. AI can support not only the most dramatic life-and-death issues in healthcare, but also more mundane cases. According to an Accenture study, AI can deliver savings of up to $150 billion annually by 2025 by supporting both the front line, with diagnosis augmentation, and the back office, by enhancing document processing or delivering more accurate cost estimates. This translates to potentially significant savings for each hospital that adopts AI.

If you want to know more about the ways AI-powered solutions can support healthcare and tackle modern and future pandemics, contact us through the form below!

Five AI trends 2020 to keep an eye on

March 9, 2020/in Deep learning, Machine learning, Reinforcement learning /by Konrad Budek

While making predictions may be easy, delivering accurate ones is an altogether different story. That’s why in this column we won’t just be looking at the most important trends of 2020, but we’ll also look at how the ideas we highlighted last year have developed.

In summarizing the trends of 2020, one conclusion we’ve come to is that society is getting increasingly interested in AI technology, in terms of both the threats it poses and common knowledge about other problems that need to be addressed.

AI trends 2019 in review – how accurate were our predictions?

In our AI Trends 2019 blogpost we chronicled last year’s most important trends and directions of development to watch. It was shortly after launching the AI Monthly Digest, a monthly summary of the most significant and exciting machine learning news. Here’s a short summary of what we were right and wrong about in our predictions.

Chatbots and virtual assistants – we predicted robust growth in this market, powered by a focus on the development of Natural Language Processing (NLP), and that prediction was accurate. The chatbot market was worth $2.6 billion in 2019 and is predicted to reach up to $9.4 billion by 2024.
The time needed for training would fall – the trend is reflected in larger neural networks being trained in a feasible time, with GPT-2 being the best example.
Autonomous vehicles are on the rise – the best proof is our own contribution to the matter in a joint venture with Volkswagen.
Machine learning and artificial intelligence are being democratized and productionized – according to Gartner, 37% of organizations have implemented AI in some form. That’s a 270% increase over the last four years.
AI and ML responsibility and transparency – the trend encompasses delivering unbiased models and tools. The story of Amazon using an AI-based recruiting tool that turned out to be biased against female applicants made enough waves to highlight the need for further human control and supervision over automated solutions.

Apparently, deepsense.ai’s data science team was up to date and well-informed on these matters.

“It is difficult to make predictions, especially about the future.”
-Niels Bohr

The world is far from slowing down and Artificial Intelligence (AI) appears to be one of the most dominant technologies at work today. The demand for AI talent has doubled in the last two years, with the technology and financial sectors absorbing 60% of talented employees on the market.

The Artificial Intelligence market itself is predicted to reach $390.9 billion by 2025, primarily by automating dull and repetitive tasks. It is predicted that AI will resolve around 20% of unmet healthcare demands.

Considering the impact of AI on people’s daily lives, spotting the right trends to follow is even more important. AI is arguably the most important technology trend of 2020, so enjoy our list!

Natural language processing (NLP) – further development

Whether the world was ready for it or not, GPT-2 was released last year, with balance between safety and progress a guiding motif. Initially, OpenAI refused to make the model and dataset public due to the risk of the technology being used for malicious ends.

The organization released versions of the model throughout 2019, with each confirmed to be “hardened against malicious usage”. The model was considered cutting edge, though like most things in tech, another force soon prevailed. At the end of January 2020, Google Brain took the wraps off of Meena, a 2.6-billion parameter end-to-end neural conversational model trained on 341 GB of online text.

The convenience of NLP solutions is enjoyed by users who have embraced virtual assistants like Google Assistant, Alexa or Siri. According to Adroit Market Research, the market of Intelligent Virtual Assistants is predicted to grow at 33% compound annual growth rate between now and 2025. The market was valued at $2.1 billion in 2019. The increasing use of smartphones and other wearable intelligent devices, among other trends, is predicted to be a driver of the growth.
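
For readers who want to see what such a growth rate implies, here is a small worked calculation. It rests on our own simplifying assumption of six full years of compounding from the 2019 valuation to 2025, and the helper names are hypothetical. The figures come from the forecasts quoted in this post, not from any additional data.

```python
def project_market(value_now, annual_growth, years):
    """Compound a starting value at a fixed annual growth rate."""
    return value_now * (1 + annual_growth) ** years


def implied_cagr(value_start, value_end, years):
    """Compound annual growth rate implied by a start value, an end value and a time span."""
    return (value_end / value_start) ** (1 / years) - 1


# Intelligent virtual assistants: $2.1 billion in 2019, 33% CAGR, assumed 6 years to 2025.
assistants_2025 = project_market(2.1, 0.33, years=6)

# Chatbots: $2.6 billion in 2019 and up to $9.4 billion by 2024, i.e. 5 years.
chatbot_cagr = implied_cagr(2.6, 9.4, years=5)

print(f"Virtual assistants in 2025: ${assistants_2025:.1f} billion")
print(f"Implied chatbot CAGR: {chatbot_cagr:.1%}")
```

Under these assumptions the virtual assistant market would reach roughly $11.6 billion by 2025, while the chatbot figures quoted earlier imply growth of roughly 29% a year.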

Having started with a consumer-centric approach, virtual assistants are predicted to get more involved in business operations, further automating processes as well as tedious and repetitive tasks. According to Computerworld, approximately 40% of business representatives are planning to implement voice technology within 24 months – that is, no later than in 2021. NLP is shaping up to be a major trend not only this year, but well into the future.

Autonomous vehicles

It is 2020 and driverless cars have yet to hit the streets. In hindsight, the Guardian’s prediction that there would be 10 million self-driving cars on the road by 2020 is all too easy to scoff at now.

On the other hand, tremendous progress has been made and with every month the autonomous car gets closer to rolling out.

deepsense.ai has also contributed to the progress, cooperating with Volkswagen on building a reinforcement learning-based model that, when transferred from a simulated to a real environment, managed to safely drive a car.

But deepsense.ai is far from being the only company conducting significant research on autonomous cars and developing the technology in this field. There is also a great difference between seeing an autonomous car on busy city streets and in the slightly less demanding highway environment, where we can expect the automation and semi-automation of driving to arrive first.

According to the US Department of Transportation, 63.3% of the $1,139 billion of goods shipped in 2017 were moved on roads. Had autonomous vehicles been enlisted to do the hauling, the transport could have been organized more efficiently, and the need for human effort vastly diminished. Machines can drive for hours without losing concentration. Road freight is globally the largest producer of emissions and consumes more than 70% of all energy used for freight. Every optimization made to fuel usage and routes will improve both energy and time management.

AI getting popular – beneath the surface

There is a lot of buzz around how AI-powered solutions impact our daily lives. While the most obvious change may be NLP powering virtual assistants like Google Assistant, Siri or Alexa, the impact on our daily lives runs much deeper, even if it’s not all that visible at first glance. Artificial intelligence-powered solutions have a strong influence on manufacturing, impacting prices and supply chains of goods.

Here are a few applications being used without batting an eye:

Demand forecasting – companies collect tremendous amounts of data on their customer relationships and transactional history. Also, with the e-commerce revolution humming along, retail companies have gained access to gargantuan amounts of data about customer service, products and services. deepsense.ai delivers demand forecasting tools that not only process such data but also combine it with external sources to deliver more accurate predictions than standard heuristics. Helping companies avoid overstocking while continuing to satisfy demand is one essential benefit demand forecasting promises.
Quality control – harnessing the power of image recognition enables companies to deliver more accurate and reliable quality control automation tools. Because machines are domain-agnostic, the tools can be applied in various businesses, from fashion to construction to manufacturing. Any product that can be controlled using human sight can also be placed under the supervision of computer vision-powered tools.
Manufacturing processes optimization – The big data revolution impacts all businesses, but with IoT and the building of intelligent solutions, companies get access to even more data to process. But it is not about gathering and endless processing in search of insights – the data is also the fuel for optimization, sometimes in surprising ways. Thanks solely to optimization, Google reduced its cooling bill by 40% without adding any new components to its system. Beyond cutting costs, companies also use process optimization to boost employee safety and reduce the number of accidents.
Office processes optimization – AI-powered tools can also be used to augment the daily tasks done by various specialists, including lawyers or journalists. Ernst & Young is using an NLP tool to review contracts, enabling their specialists to use their time more efficiently. Reuters, a global media corporation and press agency, uses AI-powered video transcription tools to deliver time-coded speech-to-text tools that are compatible with 11 languages.

Thanks to the versatility and flexibility of such AI-powered solutions, business applications are possible even in the most surprising industries and companies. So even if a person were to completely abandon technology (right…), the services and products delivered to them would still be produced or augmented with AI, be they clothing, food or furniture.

AI getting mainstream in culture and society

The motif of AI is prevalent in the arts, though usually not in a good way. Isaac Asimov was among the first writers to hold that autonomous robots would need to follow a moral code in order not to become dangerous to humans. Of course, fiction has offered a number of memorable examples of AI run amok, including the Terminator and HAL 9000 from 2001: A Space Odyssey.

The question of moral principles may once have been elusive and abstract, but autonomous cars have necessitated a legal framework ascribing responsibility for accidents. Amazon learned about the need to control AI models the hard way, albeit in a less mobile environment: a recruiting tool the company was using had to be scrapped due to a bias against women.

The impact of AI applications on people’s daily lives, choices and careers is building pressure to deliver legal regulations on model transparency as well as information not only about outcomes, but also the reasons behind them. Delivering AI in a black-box mode is not the most suitable way to operate, especially as the number of decisions made automatically by AI-powered solutions increases.

Automating the development of AI

Making AI mainstream is not only about making AI systems more common, but also about widening the availability of AI tools and making them accessible to less-skilled individuals. The number of solutions powered by machine and deep learning models will only increase. It should therefore come as no surprise that the people responsible for automating others’ jobs are keen to support their own jobs with automation.

Google enters the field with AutoML, a tool that simplifies the process of developing AI and making it available for a wider audience, one that, presumably, is not going to use ML algorithms in some especially non-standard ways. AutoML joins IBM’s autoAI, which supports data preparation.

Also, there are targeted cloud offerings for companies seeking to harness ready-to-use components in their daily jobs with a view to augmenting their standard procedures with machine learning.

Summary

While the 2020 AI Trends themselves are similar to those of 2019, the details have changed immensely, so refreshing our perspective seemed worthwhile. The world is changing, ML is advancing, and AI is ever more ubiquitous in our daily lives.

A business guide to Natural Language Processing (NLP)

September 24, 2019/in Data science, Deep learning /by Konrad Budek and Artur Zygadlo

With chatbots powering up customer service on one hand and fake news farms on the other, Natural Language Processing (NLP) is getting attention as one of the most impactful branches of Artificial Intelligence (AI).

When Alan Turing proposed his famous test in 1950, he couldn’t, despite the prescience that accompanies brilliance such as his, predict how easy breaking the test would become. And how far from intelligence the machine that broke the test would be!

Modern Natural Language Processing is being used in multiple industries, in both large-scale projects delivered by tech giants and minor tweaks local companies employ to improve the user experience.

The solutions vary from supporting internal business processes in document management to improving customer service by automated responses generated for the most common questions. According to IDC data cited by Deloitte, companies leveraging the information buried in plain sight in documents and other unstructured data can achieve up to $430 billion in productivity gains by 2020.

The biggest problem with NLP is the significant difference between machines mimicking the understanding of text and actually understanding it. The difference is easily shown with ELIZA software (a famous chatbot from the 1960s), which was based on a set of scripts that paraphrased input text to produce credible-looking responses. The technology was sufficient to produce some text, but far from demonstrating real understanding or delivering business value. Things changed, however, once machine learning models came into use.

What is natural language processing?

As the name implies, natural language processing is the act of a machine processing human language, analyzing the queries in it and responding in a human manner. After several decades of NLP research strongly based on a combination of computer science and linguistic expertise, the “deep learning tsunami” (a term coined by Stanford CS and Linguistics professor Christopher Manning) has recently taken over this field of AI as well, similarly to what happened in computer vision.

Many NLP tasks today are tackled with deep neural networks, which are frequently used among various techniques that enable machines to understand a text’s meaning and its author’s intent.

Modern NLP solutions work on text by “reading” it and making a network of connections between each word. Thus, the model gets more information on the context, the sentiment and exactly what the author sought to communicate.

Tackling the context

Context and intent are critical in analyzing text. Analyzing a picture without context can be tricky – is a fist a symbol of violence, or a bro fist-bump?

The challenge grows even further with NLP, as there are multiple social and cultural norms at work in communication. “The cafe is way too cool for me” can refer to a too-groovy atmosphere or the temperature. Depending on the age of the speaker, a “savage” punk rock concert can be either positive or negative. Before the machine learning era, the flatness of traditional, dictionary-based solutions provided information with significantly less accuracy.

The best way to deal with this challenge is to deliver a word-mapping system based on multidimensional vectors (so-called word embeddings) that provide complex information on the words they represent. Following the idea of distributional semantics (“You shall know a word by the company it keeps”), the neural network learns word representations by looking at the neighboring words. A breakthrough moment for neural NLP came in 2013, when the renowned word2vec model was introduced. However, one of the main problems that word2vec could not solve was homonymy, as the model could not distinguish between different meanings of the same word. A way to significantly improve handling the context in which a word is used in a sentence was found in 2018, when more sophisticated word embedding models like BERT and ELMo were introduced.
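To make the idea of word embeddings more concrete, here is a minimal sketch of how similarity between word vectors is usually measured. The three-dimensional vectors below are hand-made toy values, not the output of word2vec or any real model; they only illustrate that words used in similar contexts end up close to each other, and that a static lookup table returns the same vector for a word regardless of the sentence it appears in.

```python
import math

# Toy, hand-made embeddings (an assumption for illustration only;
# real models learn vectors with hundreds of dimensions).
EMBEDDINGS = {
    "cafe": [0.9, 0.1, 0.0],
    "restaurant": [0.8, 0.2, 0.1],
    "guitar": [0.0, 0.1, 0.9],
    "cool": [0.3, 0.6, 0.3],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embed(word):
    """Static lookup: the sentence a word appears in plays no role."""
    return EMBEDDINGS[word]


similar = cosine_similarity(embed("cafe"), embed("restaurant"))
different = cosine_similarity(embed("cafe"), embed("guitar"))

# "cool" gets one vector whether it describes the atmosphere or the temperature.
same_vector = embed("cool") == embed("cool")
```

Contextual models such as ELMo and BERT replace the static lookup with a function of the whole sentence, which is what lets them separate the two meanings of “cool”.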

Natural Language Processing examples

Recent breakthroughs, especially GPT-2, have significantly improved NLP and delivered some very promising use cases, including the ones elaborated below.

Automated translation

One of the most widely used applications of natural language processing is automated translation between two languages, e.g. with Google Translate. The translator delivers increasingly accurate texts, good enough to serve even in court trials. Google Translate was used when a British court failed to deliver an interpreter for a Mandarin speaker.

Machine translation was one of the first successful applications of deep learning in the field of NLP. The neural approach quickly surpassed statistical machine translation, the technology that preceded it. In a translation task, the system’s input and output are sequences of words. The typical neural network architecture used for translation is therefore called seq2seq, and consists of two recurrent neural networks (encoder and decoder).

The first seq2seq paper was published in 2014, and subsequent research led Google Translate to switch from statistical to neural translation in 2016. Later that year, Google announced a single multi-lingual system that could translate between pairs of languages the system had never seen explicitly, suggesting the existence of some interlingua-like representation of sentences in vector space.

Another important development related to recurrent neural networks is the attention mechanism, which allows a model to learn to focus on particular parts of sequences, greatly improving translation quality. Further improvements come from using Transformer architecture instead of Recurrent Neural Networks.
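As a rough illustration of the attention mechanism, the sketch below implements scaled dot-product attention, the variant used in the Transformer, in plain Python. It is one common formulation rather than the exact mechanism of any particular translation system, and the toy encoder states and decoder state are assumptions made up for the example.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Weight each value by how well its key matches the query."""
    depth = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(depth)
              for key in keys]
    weights = softmax(scores)
    context = [sum(w * value[i] for w, value in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context


# Toy example: three encoder states (one per source word) and one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = attend(decoder_state, encoder_states, encoder_states)
```

The weights show which source positions the decoder focuses on at this step; the context vector is the corresponding weighted average of the encoder states.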

Chatbots

Automated interaction with customers can significantly raise their satisfaction with the overall user experience. And that’s not a thing to overlook, as up to 88% of customers are willing to pay more for better customer experience.

A great example of chatbots improving the customer experience comes from Amtrak, a US railway company that transports 31 million passengers yearly and administrates over 21,000 miles of rails across America. The company decided to employ Julie, a chatbot that supports passengers in searching for a convenient commute. She delivered 800% ROI and reduced the cost of customer service by $1 million yearly while also increasing bookings by 25%.

Speech recognition

As much as a company can use a chatbot to perform some customer service, one can have a personal assistant in the pocket. According to eMarketer data, up to 111.8 million people in the US–over a third of its population–will use a voice assistant at least once a month. The voice assistant market is growing rapidly, with companies such as Google, Apple, Amazon and Samsung developing their assistants not only for mobile devices, but also for TVs and home appliances.

Despite the privacy concerns voice assistants are raising, speech is becoming the new interface for human-machine interaction. The interface can also be used to control industrial machines, especially when employees have their hands occupied – a case common across industries from HoReCa to agriculture and construction – assuming that the noise is reduced enough for the machine to register the voice properly.

Thanks to advances in NLP, speech recognition solutions are getting smarter and delivering a better experience for users. As the assistants come to understand speakers’ intentions better and better, they will provide more accurate answers to increasingly complex questions.

An unexpected example of speech recognition comes from deepsense.ai’s project renovating and digitizing classic movies, where the machine delivers an automated transcription. When combined with a facial recognition tool, the system transcribed the dialogue and annotated which actor was speaking in the film.

Sentiment analysis

Social media provides numerous ways of reaching customers, gathering information on their habits and delivering excellence. It’s also a melting pot of perspectives and news, delivering unprecedented insight on public opinion. This insight can be understood using sentiment analysis tools, which check if the context where a brand is exposed in social media is positive, negative or neutral.

Sentiment analysis can be done without the assistance of AI by building up a glossary of positive and negative words and checking their frequency. If there is swearing or words like “broken” near the brand, sentiment is negative. Yet those systems cannot spot irony or more sophisticated hate. The sentence “I would be happy to see you ill” suggests an aggression and possibly hatred, yet there are no slurs or swearing. By supporting the analysis of words in the glossary by checking the relations between words in each sentence, a machine learning model can deliver a better understanding of the text and provide more information on the message’s subjectivity.
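The glossary approach described above can be sketched in a few lines of Python. The word lists are tiny, made-up examples rather than a real sentiment lexicon, and the function name is hypothetical; the point is to show both how the method works and where it breaks.

```python
import re

# Tiny illustrative glossaries (assumptions, not a real lexicon).
POSITIVE = {"happy", "great", "love", "excellent"}
NEGATIVE = {"broken", "awful", "hate", "terrible"}


def glossary_sentiment(text):
    """Count positive and negative glossary words and compare the totals."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


complaint = glossary_sentiment("My new phone arrived broken")
veiled_threat = glossary_sentiment("I would be happy to see you ill")
```

The complaint is correctly labeled negative, but the veiled threat is labeled positive because “happy” is the only glossary word in it. Capturing the relation between “happy” and “see you ill” is exactly what a learned model adds.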

So good can that understanding be, in fact, that deepsense.ai delivered a solution that could spot terrorist propaganda and illicit content in social media in real-time. In the same way, it is possible to deliver a system that spots hate speech and other forms of online harassment. A study from the Pew Research Center shows that up to 41% of adult Americans have experienced some form of online harassment, a number that is likely to increase, mostly due to the rising prevalence of the Internet in people’s daily lives.

Natural language generation

Apart from understanding text, machines are getting better at delivering new texts. According to research published in Foreign Affairs, texts being produced by modern AI software are, for unskilled readers, comparable to those written by journalists. ML models are indeed already writing texts for world media organizations. And while that may seem a fascinating accomplishment, it was the fear of what such advanced abilities might portend that led OpenAI not to make GPT-2 public.

The best-known case of automated journalism comes from the Washington Post, where Heliograf covers sports events. Its journalistic debut came in 2016, when the software was responsible for writing up coverage of the Olympic Games in Rio.

In business, natural language generation is used to produce more polite and humane responses to FAQs. Thus, ironically, automating conventional communication will make it more personal and humane than current, trigger-based solutions.

Text analytics

Apart from delivering real-time monitoring and sentiment analysis, NLP tools can analyze long and complicated texts, as is already being done at EY, PwC and Deloitte, all of which employ machine learning models to review contracts. The same can be applied to analyze emails or other company-owned unstructured data. According to Gartner estimates, up to 80% of all business data is unstructured and thus nonactionable for companies.

A good example of natural language processing in text analytics is a solution deepsense.ai designed for market research giant Nielsen. The company delivered reports on the ingredients in all of the FMCG products available on the market.

The process of gathering the data was time-consuming and riddled with pitfalls: an employee had to manually read a label, check the ingredients and fill out the tables. The entire process took up to 30 minutes per product. The task was also prone to inconsistencies in naming, as the companies delivered the product ingredients in local languages, English and, especially on the beauty and skin care markets, Latin.

deepsense.ai delivered a comprehensive system that processed an image of the product label taken with a smartphone. The solution spotted the ingredients, scanned the text and sorted the ingredients into tables, effectively reducing the work time from 30 minutes to less than two minutes, including the time needed to gather and validate the data.

Another use case of text analytics is Google’s automated question answering, which aims to provide not only search results for a particular query, but a complete answer to the user’s needs, including a link to the referenced website and a description of the matter.

Summary

Natural language processing provides numerous opportunities for companies from multiple industries and segments. Apart from relatively intuitive ways to leverage NLP, such as processing the documents and chatbots, there are multiple other applications, including real time social media analytics and supporting journalism or research work.

NLP models can be used to further augment existing solutions–from supporting the reinforcement learning models behind autonomous cars by providing better sign recognition to augmenting demand forecasting tools with extensions to analyze headlines and deliver more event-based predictions.

Because natural language is the best way to transfer information between humans and machines, the applications NLP makes possible will only increase and will soon be augmenting business processes around the globe.

Satellite images semantic segmentation with deep learning

July 12, 2019/in Deep learning /by Wojciech Mormul and Paweł Chmielak

Building maps to fit a crisis situation provides a challenge even when considering the impact of satellite imaging on modern cartography. Machine learning significantly reduces the time required to prepare an accurate map.

Crisis maps are often prepared by combining crowdsourced data, satellite and aerial imagery. Such mapping was widely used during the recent humanitarian crises brought about by the earthquake in Haiti and the floods in Pakistan in 2010.

Satellite mapping is way easier than traditional cartographic methods, but still, the main challenge is in recognizing particular objects in the image, like roads, buildings and landmarks. Getting up-to-date information about roadblocks and threats is even more essential. And that’s where machine learning-based solutions come into play.

The artificial cartographer

In this blog post we address the problem of satellite imagery semantic segmentation applied to building detection. Unlike many other approaches, we use only RGB color information and no multispectral wavebands.

1. Network architecture overview

A baseline fully-convolutional network uses a simple encoder-decoder framework to solve semantic segmentation tasks. It consists of only convolutional and pooling layers, without any fully connected layers. This allows it to make predictions on arbitrary-sized inputs. By propagating an image through several pooling layers, the resolution of feature maps is downsampled, which, due to information loss during pooling operations, results in low-resolution, coarse segmentation maps.

As an improvement over a baseline fully-convolutional network, we used skip connections from higher resolution feature maps, recreating the U-Net network architecture. Thanks to those connections, fine-grained information about small details isn’t lost in the process, so the network can learn to segment fine details. Combining this architecture with a ResNet core encoder significantly speeds up the training. The architecture of a segmentation neural network with skip connections is presented below. Cross entropy loss with weight regularization is used during training.

2. Network implementation

We present easy-to-understand minimal code fragments which seek to create and train deep neural networks for the semantic segmentation task. We will implement and train the network in PyTorch. Keep in mind that it’s not meant for out-of-box use but rather for educational purposes.

We present our semantic segmentation task in three steps:

Create the network
Train and save the deep learning model
Load the model and make predictions

2.1 Create the network

First we will create a module that performs convolution with ReLU nonlinearity. This is a basic building block in most convolutional neural networks for computer vision tasks. Convolution applies a set of filters to an image in order to extract specific features, while ReLU introduces nonlinearity between the linear layers of a neural network. Convolution with kernel size 3, stride 1 and padding 1 does not change a tensor’s spatial dimensions, but only its depth, while ReLU, as a pointwise operation, does not change any of the tensor’s dimensions.

import torch


class ConvRelu(torch.nn.Module):
    """3x3 convolution followed by ReLU; keeps spatial size, changes depth."""

    def __init__(self, in_depth, out_depth):
        super(ConvRelu, self).__init__()
        self.conv = torch.nn.Conv2d(in_depth, out_depth, kernel_size=3, stride=1, padding=1)
        self.activation = torch.nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.activation(x)
        return x
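To see why a 3×3 kernel with stride 1 and padding 1 leaves the spatial dimensions untouched, it can help to run the operation by hand. The following is a naive, single-channel sketch in plain Python, not how PyTorch computes it; the image and the edge-detecting kernel are made-up toy values.

```python
def conv2d_same(image, kernel):
    """Naive 3x3 convolution (cross-correlation) with stride 1 and zero padding 1."""
    height, width = len(image), len(image[0])
    # Zero padding of 1 pixel on every side.
    padded = ([[0.0] * (width + 2)]
              + [[0.0] + list(row) + [0.0] for row in image]
              + [[0.0] * (width + 2)])
    return [[sum(padded[r + i][c + j] * kernel[i][j]
                 for i in range(3) for j in range(3))
             for c in range(width)]
            for r in range(height)]


def relu(feature_map):
    """Pointwise nonlinearity: negative responses are set to zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


image = [[1.0, 2.0, 3.0, 4.0],
         [4.0, 3.0, 2.0, 1.0],
         [0.0, 0.0, 0.0, 0.0]]
# Responds where intensity grows from left to right.
kernel = [[0.0, 0.0, 0.0],
          [-1.0, 0.0, 1.0],
          [0.0, 0.0, 0.0]]

activated = relu(conv2d_same(image, kernel))
```

The output has the same 3×4 shape as the input, and ReLU keeps only the positive filter responses.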

We next implement a decoder block to increase the spatial size of the tensor. Generating images with neural networks usually involves up-sampling a tensor of low spatial resolution. Transposed convolution with stride greater than one can be imagined as inserting zeros between the elements of the input tensor and sliding a convolution kernel over it. This increases the tensor’s size. Bear in mind that doing this in a straightforward manner is inefficient, but conceptually it is how transposed convolution works. Real implementations avoid useless multiplications by zero and compute it as a sparse matrix multiplication, using the transpose of the weight matrix that represents a regular convolution with the same stride.

Be aware of other methods to increase spatial size used in generative neural networks. These include:

A linear resizing operation that increases the spatial size of a tensor, followed by a convolution operation
A convolution operation to a greater depth than desired, followed by an operation that projects depth elements to spatial dimensions
Fractionally strided convolution, where a kernel is slid over a tensor with fractional strides and linear interpolation is used to share kernel weights over the elements of a tensor

Here we apply an additional convolution with a ReLU nonlinearity module before the transposed convolution. While this step may not be strictly required, it can improve network performance.

class DecoderBlock(torch.nn.Module):
    """ConvRelu followed by a transposed convolution that doubles the spatial size."""

    def __init__(self, in_depth, middle_depth, out_depth):
        super(DecoderBlock, self).__init__()
        self.conv_relu = ConvRelu(in_depth, middle_depth)
        # kernel 4, stride 2, padding 1: output size = 2 * input size
        self.conv_transpose = torch.nn.ConvTranspose2d(middle_depth, out_depth, kernel_size=4, stride=2, padding=1)
        self.activation = torch.nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv_relu(x)
        x = self.conv_transpose(x)
        x = self.activation(x)
        return x
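The “insert zeros, then slide a kernel” picture of transposed convolution can be checked on a one-dimensional toy case. The sketch below is plain Python written for illustration, not the PyTorch implementation: one function follows the direct definition, the other follows the zero-insertion view, and both are cropped by the padding in the same way as the layer configured with kernel size 4, stride 2 and padding 1. The input and kernel values are arbitrary.

```python
def transposed_conv1d_scatter(x, w, stride, padding=0):
    """Direct definition: each input element adds a scaled copy of the kernel."""
    k = len(w)
    out = [0.0] * ((len(x) - 1) * stride + k)
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            out[i * stride + j] += xi * wj
    # Padding in a transposed convolution crops the output borders.
    return out[padding:len(out) - padding]


def transposed_conv1d_zero_insert(x, w, stride, padding=0):
    """Conceptual view: insert zeros, pad, then slide the flipped kernel."""
    k = len(w)
    # Insert (stride - 1) zeros between neighbouring input elements.
    z = [0.0] * ((len(x) - 1) * stride + 1)
    for i, xi in enumerate(x):
        z[i * stride] = xi
    # Pad with k - 1 zeros on both sides so every kernel position is covered.
    zp = [0.0] * (k - 1) + z + [0.0] * (k - 1)
    flipped = w[::-1]
    out = [sum(zp[t + j] * flipped[j] for j in range(k))
           for t in range(len(zp) - k + 1)]
    return out[padding:len(out) - padding]


x = [1.0, 2.0, 3.0]
w = [1.0, 0.5, 0.25, 0.125]
direct = transposed_conv1d_scatter(x, w, stride=2, padding=1)
conceptual = transposed_conv1d_zero_insert(x, w, stride=2, padding=1)
```

Both functions return the same six values for the three-element input, which matches the size formula (n − 1) · stride − 2 · padding + kernel = 2n for these settings.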

Now let’s focus on the main network, which is intended to solve the semantic segmentation task. We follow the encoder-decoder framework with skip connections to recreate a U-Net architecture. We then perform transfer learning using ResNet pre-trained on the ImageNet dataset. Below you can investigate the detailed network architecture, with additional information about the tensor size in every layer, to help you understand how the network propagates the input image to compute the desired output map. It’s important to note that there is no single optimal network architecture; we arrived at something that works reasonably well through many attempts.

The network propagates the input tensor through the encoder while decreasing spatial resolution and increasing depth using layers from the ResNet network. The pooling layer, as well as a convolution operation with stride greater than one, decreases the spatial size of a tensor. However, unlike convolution, pooling does not change a tensor’s depth, and changing the depth is often desirable. In the constructor, we import a pre-trained ResNet-101 model with the torchvision module and keep only the layers that will work as a feature extractor. After processing the image through the encoder, which transforms the input image into meaningful multi-scale representations, the decoder continues the process and transforms it into the desired semantic map. To do this, we use the previously created decoder blocks. Notice that we are building a complex neural network from simpler blocks, which we either define ourselves or take from the PyTorch library. Moreover, we add skip connections – horizontal lines in a graph which connect the encoder and decoder layers with a depth concatenation operation. For each pixel of the input image, the network predicts scores for N classes (including background) with the last convolution operation with kernel size 1, which linearly projects the depth of each spatial element to another desired depth.

Keep in mind that we do not define the loss yet, as we would in TensorFlow, where the entire computational graph needs to be defined up front. In PyTorch, we only define a class which provides the forward function. Operations used in the forward pass are remembered and the backward pass can be run whenever it’s needed.

import torchvision


class UNetResNet(torch.nn.Module):

    def __init__(self, num_classes):

        super(UNetResNet, self).__init__()

        # Pre-trained ResNet-101 used as the feature extractor
        self.encoder = torchvision.models.resnet101(pretrained=True)

        # Encoder: each stage lowers resolution and/or increases depth
        self.pool = torch.nn.MaxPool2d(2, 2)
        self.conv1 = torch.nn.Sequential(self.encoder.conv1, self.encoder.bn1, self.encoder.relu, self.pool)
        self.conv2 = self.encoder.layer1
        self.conv3 = self.encoder.layer2
        self.conv4 = self.encoder.layer3
        self.conv5 = self.encoder.layer4

        # Bottleneck; self.pool is reused before it in forward()
        self.center = DecoderBlock(2048, 512, 256)

        # Decoder: input depth = previous decoder depth + skip connection depth
        self.dec5 = DecoderBlock(2048 + 256, 512, 256)
        self.dec4 = DecoderBlock(1024 + 256, 512, 256)
        self.dec3 = DecoderBlock(512 + 256, 256, 64)
        self.dec2 = DecoderBlock(256 + 64, 128, 128)
        self.dec1 = DecoderBlock(128, 128, 32)
        self.dec0 = ConvRelu(32, 32)
        # 1x1 convolution projecting depth to per-class scores
        self.final = torch.nn.Conv2d(32, num_classes, kernel_size=1)

    def forward(self, x):

        conv1 = self.conv1(x)
        conv2 = self.conv2(conv1)
        conv3 = self.conv3(conv2)
        conv4 = self.conv4(conv3)
        conv5 = self.conv5(conv4)

        pool = self.pool(conv5)
        center = self.center(pool)

        # Skip connections: concatenate along the depth dimension
        dec5 = self.dec5(torch.cat([center, conv5], 1))
        dec4 = self.dec4(torch.cat([dec5, conv4], 1))
        dec3 = self.dec3(torch.cat([dec4, conv3], 1))
        dec2 = self.dec2(torch.cat([dec3, conv2], 1))
        dec1 = self.dec1(dec2)
        dec0 = self.dec0(dec1)

        return self.final(dec0)
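To follow how tensor sizes evolve through this network without running PyTorch, the standard output-size formulas for convolution and transposed convolution can be applied layer by layer. The sketch below does this arithmetic in plain Python for a square input. It assumes the usual ResNet-101 layout (a 7×7 stride-2 stem, then stages with depths 256, 512, 1024 and 2048, the last three halving the resolution); the function names are hypothetical helpers, not part of the network code.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding):
    """Spatial output size of a transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel


def decoder_block(size):
    # ConvRelu (3x3, stride 1, padding 1) keeps the size,
    # ConvTranspose2d (4x4, stride 2, padding 1) doubles it.
    return deconv_out(conv_out(size, 3, 1, 1), 4, 2, 1)


def trace_unet_resnet(size, num_classes=2):
    """Return {layer name: (depth, spatial size)} for a square input."""
    shapes = {}
    size = conv_out(size, 7, 2, 3)      # ResNet stem: 7x7 conv, stride 2
    size = conv_out(size, 2, 2, 0)      # MaxPool2d(2, 2)
    shapes["conv1"] = (64, size)
    shapes["conv2"] = (256, size)       # layer1 keeps the resolution
    for name, depth in (("conv3", 512), ("conv4", 1024), ("conv5", 2048)):
        size = conv_out(size, 3, 2, 1)  # layer2..layer4 each halve it
        shapes[name] = (depth, size)
    size = conv_out(size, 2, 2, 0)      # pooling before the center block
    size = decoder_block(size)
    shapes["center"] = (256, size)
    for name, skip, depth in (("dec5", "conv5", 256), ("dec4", "conv4", 256),
                              ("dec3", "conv3", 64), ("dec2", "conv2", 128)):
        # Concatenation along depth requires equal spatial sizes.
        assert shapes[skip][1] == size, (name, skip)
        size = decoder_block(size)
        shapes[name] = (depth, size)
    size = decoder_block(size)
    shapes["dec1"] = (32, size)
    shapes["final"] = (num_classes, size)  # dec0 and the 1x1 conv keep the size
    return shapes


shapes = trace_unet_resnet(256)
```

For a 256×256 image the encoder shrinks the map down to 8×8 at depth 2048, and the decoder brings it back to 256×256 with one score per class. For a size such as 100, which is not a multiple of 64, the trace hits a mismatch at a skip connection, which is consistent with the divisibility check in the prediction code.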

2.2. Train and save the model

As we train the network, we will be loading data in batches using PyTorch data loaders, which additionally shuffle the training set, normalize input tensors and apply color (random changes in brightness, contrast and saturation) and spatial (random flip, flop and rotation) augmentation.

While creating the main network, we only need to define the output depth of the last convolutional layer. There are two output classes involved in the semantic segmentation of buildings: a pixel either belongs to a building or it does not. Notice that the necessary weights are initialized here and kept by default in CPU memory. After the network has been created, we transfer all the weights to the GPU, then set the network to train mode, which results in batch normalization computing the mean and variance on each batch and updating the statistics with the moving average. Finally, we define cross entropy loss with softmax for use during the training. Notice that the loss function doesn’t have anything in common with the network graph. We won’t freeze any pre-trained ResNet convolutional layers; we train all network weights using the Adam optimizer.

unet_resnet = UNetResNet(num_classes=2)
unet_resnet = unet_resnet.cuda()
unet_resnet.train()
cross_entropy_loss = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(unet_resnet.parameters(), lr=0.0001, weight_decay=0.0001)
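For intuition about what “cross entropy loss with softmax” computes for a segmentation map, here is a small plain-Python sketch of the arithmetic. It mirrors the default behaviour of torch.nn.CrossEntropyLoss (softmax over class scores, negative log-probability of the target class, mean over all pixels) but is written from scratch for illustration; the function names and the 2×2 toy map are assumptions.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_cross_entropy(logits, target):
    """Negative log-probability assigned to the correct class of one pixel."""
    return -math.log(softmax(logits)[target])


def mean_cross_entropy(logit_map, target_map):
    """Average the per-pixel loss over the whole map."""
    losses = [pixel_cross_entropy(logits, target)
              for logit_row, target_row in zip(logit_map, target_map)
              for logits, target in zip(logit_row, target_row)]
    return sum(losses) / len(losses)


# Toy 2x2 map: [background score, building score] per pixel, 1 = building.
logit_map = [[[2.0, -2.0], [0.0, 0.0]],
             [[-3.0, 3.0], [1.0, -1.0]]]
target_map = [[0, 1],
              [1, 1]]
loss = mean_cross_entropy(logit_map, target_map)
undecided = pixel_cross_entropy([0.0, 0.0], 1)
```

A pixel with equal scores for both classes contributes log 2 ≈ 0.693, confident correct pixels contribute close to zero, and the confidently wrong pixel in the bottom-right corner dominates the average.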

Now we are ready to start the training. We will train for a number of epochs. During each epoch we exhaust the data loader, which provides shuffled batches of data from the training set. We first transfer the batch of images and masks to GPU memory, then propagate every loaded batch of data through the network to get an output probability mask, calculate the loss and modify network weights during the backward pass. Notice that only here, during the execution, is there a connection between the network architecture and loss function. Unlike TensorFlow, which requires the entire computational graph up front, PyTorch offers dynamic graph creation during execution.

import numpy as np

# train_dataloader is assumed to be a PyTorch DataLoader yielding (images, masks) batches
for epoch_idx in range(2):

    loss_batches = []
    for batch_idx, data in enumerate(train_dataloader):

        imgs, masks = data
        # Move the batch to GPU memory (Variable wrappers are not needed since PyTorch 0.4)
        imgs = imgs.cuda()
        masks = masks.cuda()

        y = unet_resnet(imgs)
        loss = cross_entropy_loss(y, masks)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        loss_batches.append(loss.item())

    print('epoch: ' + str(epoch_idx) + ' training loss: ' + str(np.sum(loss_batches)))

After the training, it’s time to save the model. We move the weights back to CPU memory, save the model weights and move it back again to GPU memory for further predictions.

model_file = './unet-' + str(epoch_idx)
unet_resnet = unet_resnet.cpu()
torch.save(unet_resnet.state_dict(), model_file)
unet_resnet = unet_resnet.cuda()
print('model saved')

2.3. Load the model and make predictions

We first create a network and load weights from the saved checkpoint. We then set the model to eval mode, so that batch normalization uses the running mean and variance accumulated during training instead of statistics computed over the current batch. We propagate the image through the network without keeping a computational graph, because no backward pass is needed during predictions. To make a prediction, we load and preprocess the test image, move it to GPU memory, predict the output probability mask using softmax (which during training was hidden inside the cross entropy loss function), move the predicted mask back to CPU memory and save it.

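import cv2
import numpy as np

unet_resnet = UNetResNet(num_classes=2)
# Path to a saved checkpoint; adjust it to the file written during training
model_path = './unet-99'
pretrained_model = torch.load(model_path)
unet_resnet.load_state_dict(pretrained_model)

# The input image is moved to the GPU below, so the model has to be there too
unet_resnet = unet_resnet.cuda()
unet_resnet.eval()
softmax2d = torch.nn.Softmax2d()

img = cv2.imread('./img.png')
# Five halvings in the encoder plus the center pooling require sizes divisible by 64
assert img.shape[0] % 64 == 0 and img.shape[1] % 64 == 0
img = np.expand_dims(img, axis=0)
# MEAN and STD are the per-channel normalization constants used during training
# (not defined in this snippet)
img = (img / 255.0 - MEAN) / STD
img = img.transpose(0, 3, 1, 2)  # NHWC -> NCHW
img = torch.FloatTensor(img)
img = img.cuda()

with torch.no_grad():
    pred = unet_resnet(img)
    pred = softmax2d(pred)
    # Keep pixels whose building probability exceeds 0.7
    pred = pred[0, 1, :, :] > 0.7
pred = pred.cpu().numpy()

mask = (pred * 255.0).astype(np.uint8)
cv2.imwrite('./mask.png', mask)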

3. System pipeline

The system propagates the input image through the network, corrects the output mask and performs building segmentation. The processing consists of the following stages (described from left to right, top to bottom):

Input satellite image.
Raw output from network after softmax layer with probability scores.
Probability score map thresholded with removal of small objects and filling of small holes.
Predicted mask overlaid on top of input image.
Segmentation results.
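The mask correction stage listed above (thresholding, removing small objects, filling small holes) can be sketched in plain Python as follows. This is an illustrative version using 4-connected components found with breadth-first search, not the code used in the actual pipeline; the function names, the minimum size of 3 pixels and the toy probability map are assumptions.

```python
from collections import deque


def remove_small_objects(mask, min_size):
    """Clear 4-connected True regions with fewer than min_size pixels."""
    height, width = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < min_size:
                for y, x in component:
                    out[y][x] = False
    return out


def postprocess(prob_map, threshold=0.7, min_size=3):
    """Threshold the probabilities, drop small objects, fill small holes."""
    mask = [[p > threshold for p in row] for row in prob_map]
    mask = remove_small_objects(mask, min_size)
    # A small hole is a small object in the inverted mask.
    inverted = [[not v for v in row] for row in mask]
    inverted = remove_small_objects(inverted, min_size)
    return [[not v for v in row] for row in inverted]


probs = [[0.9, 0.9, 0.9, 0.1, 0.1],
         [0.9, 0.2, 0.9, 0.1, 0.1],
         [0.9, 0.9, 0.9, 0.1, 0.8],
         [0.1, 0.1, 0.1, 0.1, 0.1],
         [0.1, 0.1, 0.1, 0.1, 0.1]]
clean = postprocess(probs)
building_pixels = sum(v for row in clean for v in row)
```

In the toy map the one-pixel hole inside the building is filled and the isolated high-probability pixel on the right is removed, leaving a solid 3×3 building.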

Use cases

The solution is easily extendable to situations with more labels, such as roads, trees or rivers. In such scenarios, there are more classes in the network’s output. Raw output data can be used to speed up map-making, but after simple processing, it can also provide a user with various types of information about an area, such as average building size, occupied percentage of land, street width, number of trees etc. These features can then be used as an input for other ML models, including ones for projecting land value, emergency response or research.

Keras vs. PyTorch: Alien vs. Predator recognition with transfer learning

October 3, 2018/in Deep learning, Machine learning /by Piotr Migdal, Patryk Miziuła and Rafał Jakubanis

In our previous post, we gave you an overview of the differences between Keras and PyTorch, aiming to help you pick the framework that’s better suited to your needs. Now, it’s time for a trial by combat. We’re going to pit Keras and PyTorch against each other, showing their strengths and weaknesses in action. We present a real problem, a matter of life-and-death: distinguishing Aliens from Predators!

Image taken from our dataset. Both Predator and Alien are deeply interested in AI.

We perform image classification, one of the computer vision tasks deep learning shines at. As training from scratch is unfeasible in most cases (as it is very data hungry), we perform transfer learning using ResNet-50 pre-trained on ImageNet. We get as practical as possible, to show both the conceptual differences and conventions.

At the same time we keep the code fairly minimal, to make it clear and easy to read and reuse. See notebooks on GitHub, Kaggle kernels or Neptune versions with fancy charts.

Wait, what’s transfer learning? And why ResNet-50?

In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest.– Andrej Karpathy (Transfer Learning – CS231n Convolutional Neural Networks for Visual Recognition)

Transfer learning is a process of making tiny adjustments to a network trained on a given task to perform another, similar task. In our case we work with the ResNet-50 model trained to classify images from the ImageNet dataset. It is enough to learn a lot of textures and patterns that may be useful in other visual tasks, even as alien as this Alien vs. Predator case. That way, we use much less computing power to achieve a much better result.

In our case we do it the simplest way:

keep the pre-trained convolutional layers (so-called feature extractor), with their weights frozen,
remove the original dense layers, and replace them with brand-new dense layers we will use for training.

So, which network should be chosen as the feature extractor?

ResNet-50 is a popular model for ImageNet image classification (AlexNet, VGG, GoogLeNet, Inception, Xception are other popular models). It is a 50-layer deep neural network architecture based on residual connections, which are connections that add modifications with each layer, rather than completely changing the signal.

ResNet was the state-of-the-art on ImageNet in 2015. Since then, newer architectures with higher scores on ImageNet have been invented. However, they are not necessarily better at generalizing to other datasets (see the Do Better ImageNet Models Transfer Better? arXiv paper).

Ok, it’s time to dive into the code.

Let the match begin!

We do our Alien vs. Predator task in seven steps:

Prepare the dataset
Import dependencies
Create data generators
Create the network
Train the model
Save and load the model
Make predictions on sample test images

We supplement this blog post with Python code in Jupyter Notebooks (Keras-ResNet50.ipynb, PyTorch-ResNet50.ipynb). This environment is more convenient for prototyping than bare scripts, as we can execute it cell by cell and peek into the output.

All right, let’s go!

0. Prepare the dataset

We created a dataset by performing a Google Search with the words “alien” and “predator”. We saved JPG thumbnails (around 250×250 pixels) and manually filtered the results. Here are some examples:

We split our data into two parts:

Training data (347 samples per class) – used for training the network.
Validation data (100 samples per class) – not used during the training, but needed in order to check the performance of the model on previously unseen data.

Keras requires the datasets to be organized in folders in the following way:

|-- train
    |-- alien
    |-- predator
|-- validation
    |-- alien
    |-- predator

If you want to see the process of organizing data into directories, check out the data_prep.ipynb file. You can download the dataset from Kaggle.

1. Import dependencies

First, the technicalities. We assume that you have Python 3.5+, Keras 2.2.2 (with TensorFlow 1.10.1 backend) and PyTorch 0.4.1. Check out the requirements.txt file in the repo.

So, first, we need to import the required modules. We separate the code in Keras, PyTorch and common (one required in both).

COMMON

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
%matplotlib inline

KERAS

import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.applications import ResNet50
from keras.applications.resnet50 import preprocess_input
from keras import Model, layers
from keras.models import load_model, model_from_json

PYTORCH

import torch
from torchvision import datasets, models, transforms
import torch.nn as nn
from torch.nn import functional as F
import torch.optim as optim

We can check the frameworks’ versions by typing keras.__version__ and torch.__version__, respectively.

2. Create data generators

Normally, the images can’t all be loaded at once, as doing so would be too much for the memory to handle. At the same time, we want to benefit from the GPU’s performance boost by processing a few images at once. So we load images in batches (e.g. 32 images at once) using data generators. Each pass through the whole dataset is called an epoch.
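As a rough illustration of batching, the sketch below splits a list of samples into batches of 32 and counts how many batches make up one epoch. The file names are made up; only the sample count (2 × 347 training images) comes from the dataset description above.

```python
import math


def iterate_batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# Stand-in for the 2 x 347 training image paths; the names are made up.
train_samples = ["img_{}.jpg".format(i) for i in range(2 * 347)]

batches = list(iterate_batches(train_samples, batch_size=32))

# One epoch is one pass over all batches; the last batch may be smaller.
steps_per_epoch = math.ceil(len(train_samples) / 32)
```

The real generators additionally shuffle, load and preprocess the images, but the bookkeeping of batches per epoch is the same.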

We also use data generators for preprocessing: we resize and normalize images to make them as ResNet-50 likes them (224 x 224 px, with scaled color channels). And last but not least, we use data generators to randomly perturb images on the fly:

Performing such changes is called data augmentation. We use it to show a neural network which kinds of transformations don’t matter. Or, to put it another way, we train on a potentially infinite dataset by generating new images based on the original dataset.

Almost all visual tasks benefit, to varying degrees, from data augmentation for training. For more info about data augmentation, see how it was applied to plankton photos or how to use it in Keras. In our case, we randomly shear, zoom and horizontally flip our aliens and predators.
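The simplest of these augmentations, the horizontal flip, can be sketched in a few lines of plain Python. This is an illustration only, assuming an image stored as a list of rows; the frameworks below operate on real image objects and tensors.

```python
import random


def random_horizontal_flip(image, rng, p=0.5):
    """Mirror each row of a row-major image with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image


tiny_image = [[1, 2, 3],
              [4, 5, 6]]

# p=1.0 always flips and p=0.0 never does, which makes the effect easy to see.
always_flipped = random_horizontal_flip(tiny_image, random.Random(0), p=1.0)
never_flipped = random_horizontal_flip(tiny_image, random.Random(0), p=0.0)
```

Because the flip is decided anew every time an image is drawn, the network rarely sees exactly the same input twice.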

Here we create generators that:

load data from folders,
normalize data (both train and validation),
augment data (train only).

KERAS

train_datagen = ImageDataGenerator(
    shear_range=10,
    zoom_range=0.2,
    horizontal_flip=True,
    preprocessing_function=preprocess_input)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    batch_size=32,
    class_mode='binary',
    target_size=(224,224))

validation_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input)

validation_generator = validation_datagen.flow_from_directory(
    'data/validation',
    shuffle=False,
    class_mode='binary',
    target_size=(224,224))

PYTORCH

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

data_transforms = {
    'train':
        transforms.Compose([
            transforms.Resize((224,224)),
            transforms.RandomAffine(0, shear=10, scale=(0.8,1.2)),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize]),
    'validation':
        transforms.Compose([
            transforms.Resize((224,224)),
            transforms.ToTensor(),
            normalize])}

image_datasets = {
    'train':
        datasets.ImageFolder('data/train', data_transforms['train']),
    'validation':
        datasets.ImageFolder('data/validation', data_transforms['validation'])}

dataloaders = {
    'train':
        torch.utils.data.DataLoader(
            image_datasets['train'],
            batch_size=32,
            shuffle=True,
            num_workers=4),
    'validation':
        torch.utils.data.DataLoader(
            image_datasets['validation'],
            batch_size=32,
            shuffle=False,
            num_workers=4)}

In Keras, you get built-in augmentations and a preprocess_input function that normalizes images fed to ResNet-50, but you have no control over their order. In PyTorch, you have to normalize images manually, but you can arrange augmentations in any way you like.
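The two normalizations are not the same arithmetic. The sketch below applies both to a single RGB pixel in plain Python. The PyTorch-style constants are the ones from the transforms.Normalize call above. The Keras-style function reflects our understanding that the ResNet-50 preprocess_input uses the "caffe" convention (RGB reordered to BGR, per-channel means subtracted, no scaling), so treat that part as an assumption and check it against your Keras version.

```python
# ImageNet statistics in RGB order, as passed to transforms.Normalize above.
TORCH_MEAN = [0.485, 0.456, 0.406]
TORCH_STD = [0.229, 0.224, 0.225]

# Assumed "caffe"-style channel means in BGR order (no division by std).
CAFFE_MEAN_BGR = [103.939, 116.779, 123.68]


def torch_style(pixel_rgb):
    """Scale 0..255 values to 0..1, then standardize each channel."""
    return [(v / 255.0 - m) / s
            for v, m, s in zip(pixel_rgb, TORCH_MEAN, TORCH_STD)]


def keras_resnet50_style(pixel_rgb):
    """Reorder RGB to BGR and subtract the per-channel means."""
    bgr = pixel_rgb[::-1]
    return [v - m for v, m in zip(bgr, CAFFE_MEAN_BGR)]


white_torch = torch_style([255, 255, 255])
red_keras = keras_resnet50_style([255, 0, 0])
```

Each set of pre-trained weights expects the convention it was trained with, so the two pipelines are not interchangeable.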

There are also other nuances: for example, Keras by default fills the rest of the augmented image with the border pixels (as you can see in the picture above) whereas PyTorch leaves it black. Whenever one framework deals with your task much better than the other, take a closer look to see if they perform preprocessing identically; we bet they don’t.

3. Create the network

The next step is to import a pre-trained ResNet-50 model, which is a breeze in both cases. We freeze all the ResNet-50’s convolutional layers, and only train the last two fully connected (dense) layers. As our classification task has only 2 classes (compared to 1000 classes of ImageNet), we need to adjust the last layer.

Here we:

load pre-trained network, cut off its head and freeze its weights,
add custom dense layers (we pick 128 neurons for the hidden layer),
set the optimizer and loss function.

KERAS

conv_base = ResNet50(include_top=False,
                     weights='imagenet')

for layer in conv_base.layers:
    layer.trainable = False

x = conv_base.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation='relu')(x)
predictions = layers.Dense(2, activation='softmax')(x)
model = Model(conv_base.input, predictions)

optimizer = keras.optimizers.Adam()
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

PYTORCH

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = models.resnet50(pretrained=True).to(device)

for param in model.parameters():
    param.requires_grad = False

model.fc = nn.Sequential(
    nn.Linear(2048, 128),
    nn.ReLU(inplace=True),
    nn.Linear(128, 2)).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters())

We load the ResNet-50 from both Keras and PyTorch without any effort. They also offer many other well-known pre-trained architectures: see Keras’ model zoo and PyTorch’s model zoo. So, what are the differences?

In Keras we may import only the feature-extracting layers, without loading extraneous data (include_top=False). We then create a model in a functional way, using the base model’s inputs and outputs. Then we use model.compile(…) to bake into it the loss function, optimizer and other metrics.

In PyTorch, the model is a Python object. In the case of models.resnet50, dense layers are stored in model.fc attribute. We overwrite them. The loss function and optimizers are separate objects. For the optimizer, we need to explicitly pass a list of parameters we want it to update.
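To get a feel for how little is actually trained, we can count the parameters of the new head by hand. The layer sizes (2048 → 128 → 2) are taken from the code above; dense_params is just a helper name introduced for this sketch.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# The two dense layers we train; everything else in ResNet-50 stays frozen.
head_params = dense_params(2048, 128) + dense_params(128, 2)
```

That comes to 262,530 trainable parameters, which are the only ones the optimizer needs to update.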

Frame from ‘AVP: Alien vs. Predator’: Predators’ wrist computer. We’re pretty sure Predator could use it to compute logsoftmax.

In PyTorch, we should explicitly specify what we want to load to the GPU using .to(device) method. We have to write it each time we intend to put an object on the GPU, if available. Well…

Layer freezing works in a similar way. However, the Batch Normalization layer in Keras is broken (as of the current version; thanks to Przemysław Pobrotyn for bringing this issue up). That is, some layers get modified anyway, even with trainable = False.

Keras and PyTorch deal with log-loss in a different way.

In Keras, a network predicts probabilities (has a built-in softmax function), and its built-in cost functions assume they work with probabilities.

In PyTorch we have more freedom, but the preferred way is to return logits. This is done for numerical reasons: performing softmax and then log-loss means doing unnecessary log(exp(x)) operations. So, instead of using softmax, we use LogSoftmax (and NLLLoss) or combine them into one nn.CrossEntropyLoss loss function.
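The numerical argument is easy to see in plain Python. The sketch below computes log-loss directly from logits by shifting them by their maximum, which is the usual log-sum-exp trick. It illustrates the idea and is not PyTorch's actual implementation.

```python
import math


def log_softmax(logits):
    """Log-probabilities computed without exponentiating large numbers."""
    shift = max(logits)
    log_sum = math.log(sum(math.exp(v - shift) for v in logits))
    return [v - shift - log_sum for v in logits]


def cross_entropy_from_logits(logits, target):
    """Negative log-probability assigned to the target class."""
    return -log_softmax(logits)[target]


# Two equal logits mean a 50/50 prediction, so the loss is log(2).
balanced = cross_entropy_from_logits([0.0, 0.0], target=0)

# A naive softmax would need math.exp(1000.0), which overflows in Python.
confident = cross_entropy_from_logits([1000.0, 0.0], target=0)
```

Working from logits keeps the loss finite even for very confident predictions.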

4. Train the model

OK, ResNet is loaded, so let’s get ready to space rumble!

Frame from ‘AVP: Alien vs. Predator’: the Predators’ Mother Ship. Yes, we’ve heard that there are no rumbles in space, but nothing is impossible for Aliens and Predators.

Now, we proceed to the most important step – model training. We need to pass data, calculate the loss function and modify network weights accordingly. While we already had some differences between Keras and PyTorch in data augmentation, the length of code was similar. For training… the difference is massive. Let’s see how it works!

Here we:

train the model,
measure the loss function (log-loss) and accuracy for both training and validation sets.

KERAS

history = model.fit_generator(
    generator=train_generator,
    epochs=3,
    validation_data=validation_generator)

PYTORCH

def train_model(model, criterion, optimizer, num_epochs=3):
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch+1, num_epochs))
        print('-' * 10)

        for phase in ['train', 'validation']:
            if phase == 'train':
                model.train()
            else:
                model.eval()

            running_loss = 0.0
            running_corrects = 0

            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                outputs = model(inputs)
                loss = criterion(outputs, labels)

                if phase == 'train':
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()

                _, preds = torch.max(outputs, 1)
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(image_datasets[phase])
            epoch_acc = running_corrects.double() / len(image_datasets[phase])

            print('{} loss: {:.4f}, acc: {:.4f}'.format(phase,
                                                        epoch_loss,
                                                        epoch_acc))
    return model

model_trained = train_model(model, criterion, optimizer, num_epochs=3)

In Keras, the model.fit_generator performs the training… and that’s it! Training in Keras is just that convenient. And as you can find in the notebook, Keras also gives us a progress bar and a timing function for free. But if you want to do anything nonstandard, then the pain begins…

Predator’s shuriken returning to its owner automatically. Would you prefer to implement its tracking ability in Keras or PyTorch?

PyTorch is on the other pole. Everything is explicit here. You need more lines to construct the basic training, but you can freely change and customize all you want.

Let’s shift gears and dissect the PyTorch training code. We have nested loops, iterating over:

epochs,
training and validation phases,
batches.

The epoch loop does nothing but repeat the code inside. The training and validation phases are done for three reasons:

Some special layers, like batch normalization (present in ResNet-50) and dropout (absent in ResNet-50), work differently during training and validation. We set their behavior by model.train() and model.eval(), respectively.
We use different images for training and for validation, of course.
The most important and least surprising thing: we train the network during training only. The magic commands optimizer.zero_grad(), loss.backward() and optimizer.step() (in this order) do the job. If you know what backpropagation is, you appreciate their elegance.
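If the three commands feel like magic, the sketch below spells out what they correspond to for a model with a single weight, using plain Python and a hand-derived gradient. The data and learning rate are made up for illustration; PyTorch computes the gradient automatically and for all parameters at once.

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # generated by y = 2 * x

w = 0.0                # the single trainable weight
learning_rate = 0.05

for step in range(100):
    grad = 0.0                                  # like optimizer.zero_grad()
    for x, y in zip(xs, ys):                    # like loss.backward():
        grad += 2 * (w * x - y) * x / len(xs)   # d(mean squared error)/dw
    w -= learning_rate * grad                   # like optimizer.step()
```

After the loop w is very close to 2. In PyTorch, gradients accumulate across backward() calls, so skipping the reset would mix gradients from different batches.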

We take care of computing the epoch losses and prints ourselves.
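One detail worth spelling out is why the training loop multiplies each batch loss by inputs.size(0). The criterion returns the mean loss of a batch, and the last batch is usually smaller, so a plain average over batches would give it too much weight. A minimal sketch with made-up numbers:

```python
def epoch_mean_loss(batch_losses, batch_sizes):
    """Average per-sample loss, weighting each batch by its size."""
    total = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


batch_losses = [0.5, 0.5, 1.0]   # mean loss reported for each batch
batch_sizes = [32, 32, 16]       # the last batch is smaller

weighted = epoch_mean_loss(batch_losses, batch_sizes)
unweighted = sum(batch_losses) / len(batch_losses)
```

Here the weighted value is 0.6, while the naive average of the three batch losses is about 0.67.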

5. Save and load the model

Saving

Once our network is trained, often with high computational and time costs, it’s good to keep it for later. Broadly, there are two ways of saving a model:

saving the whole model architecture and trained weights (and the optimizer state) to a file,
saving the trained weights to a file (keeping the model architecture in the code).
It’s up to you which way you choose.

Here we:

save the model.

KERAS

# architecture and weights to HDF5
model.save('models/keras/model.h5')

# architecture to JSON, weights to HDF5
model.save_weights('models/keras/weights.h5')
with open('models/keras/architecture.json', 'w') as f:
    f.write(model.to_json())

PYTORCH

torch.save(model_trained.state_dict(),'models/pytorch/weights.h5')

Frame from ‘Alien: Resurrection’: Alien is evolving, just like PyTorch.

One line of code is enough in both frameworks. In Keras you can either save everything to an HDF5 file or save the weights to HDF5 and the architecture to a readable JSON file. By the way: you can then load the model and run it in the browser.

Currently, PyTorch creators recommend saving the weights only. They discourage saving the whole model because the API is still evolving.

Loading

Loading models is as simple as saving. You should just remember which saving method you chose and the file paths.

Here we:

load the model.

KERAS

# architecture and weights from HDF5
model = load_model('models/keras/model.h5')

# architecture from JSON, weights from HDF5
with open('models/keras/architecture.json') as f:
    model = model_from_json(f.read())
model.load_weights('models/keras/weights.h5')

PYTORCH

model = models.resnet50(pretrained=False).to(device)
model.fc = nn.Sequential(
    nn.Linear(2048, 128),
    nn.ReLU(inplace=True),
    nn.Linear(128, 2)).to(device)
model.load_state_dict(torch.load('models/pytorch/weights.h5'))

In Keras we can load a model from a JSON file, instead of creating it in Python (at least when we don’t use custom layers). This kind of serialization makes it convenient for transferring models.

PyTorch can use any Python code. So pretty much we have to re-create a model in Python.

Loading model weights is similar in both frameworks.

6. Make predictions on sample test images

All right, it’s finally time to make some predictions! To fairly check the quality of our solution, we ask the model to predict the type of monsters from images not used for training. We can use the validation set, or any other image.

Here we:

load and preprocess test images,
predict image categories,
show images and predictions.

COMMON

validation_img_paths = ["data/validation/alien/11.jpg",
                        "data/validation/alien/22.jpg",
                        "data/validation/predator/33.jpg"]
img_list = [Image.open(img_path) for img_path in validation_img_paths]

KERAS

img_size = 224  # same input size as target_size in the generators above
validation_batch = np.stack([preprocess_input(np.array(img.resize((img_size, img_size))))
                             for img in img_list])

pred_probs = model.predict(validation_batch)

PYTORCH

validation_batch = torch.stack([data_transforms['validation'](img).to(device)
                                for img in img_list])

model.eval()  # inference mode for batch normalization (matters for a freshly loaded model)
pred_logits_tensor = model(validation_batch)
pred_probs = F.softmax(pred_logits_tensor, dim=1).cpu().data.numpy()

COMMON

fig, axs = plt.subplots(1, len(img_list), figsize=(20, 5))
for i, img in enumerate(img_list):
    ax = axs[i]
    ax.axis('off')
    ax.set_title("{:.0f}% Alien, {:.0f}% Predator".format(100*pred_probs[i,0],
                                                          100*pred_probs[i,1]))
    ax.imshow(img)

Prediction, like training, works in batches (here we use a batch of 3, though we could also use a batch of 1). In both Keras and PyTorch we need to load and preprocess the data. A rookie mistake is to forget about the preprocessing step (including color scaling). The code is likely to run, but it results in worse predictions, since the network effectively sees the same shapes but with different colors and contrasts.

In PyTorch there are two more steps, as we need to:

convert logits to probabilities,
transfer data to the CPU and convert to NumPy (fortunately, the error messages are fairly clear when we forget this step).
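The first of these steps is plain arithmetic. The sketch below turns a pair of made-up logits into probabilities and formats them the same way as the plot titles above. It mirrors what F.softmax does for a single image, without any tensors.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for one image: class 0 is alien, class 1 is predator.
probs = softmax([2.0, 0.0])
title = "{:.0f}% Alien, {:.0f}% Predator".format(100 * probs[0],
                                                 100 * probs[1])
```

For these logits the title reads "88% Alien, 12% Predator".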

And this is what we get:

It works!

And how about other images? If you can’t come up with anything (or anyone) else, try using photos of your co-workers. :)

Conclusion

As you can see, Keras and PyTorch differ significantly in terms of how standard deep learning models are defined, modified, trained, evaluated, and exported. For some parts it’s purely about different API conventions, while for others fundamental differences between levels of abstraction are involved.

Keras operates on a much higher level of abstraction. It is much more plug&play, and typically more succinct, but at the cost of flexibility.

PyTorch provides more explicit and detailed code. In most cases it means debuggable and flexible code, with only small overhead. Yet, training is far more verbose in PyTorch. It hurts, but at times provides a lot of flexibility.

Transfer learning is a big topic. Try tweaking your parameters (e.g. dense layers, optimizer, learning rate, augmentation) or choose a different network architecture.

Have you tried transfer learning for image recognition? Consider the list below for some inspiration:

Chihuahua vs. muffin, sheepdog vs. mop, shrew vs. kiwi (already serves as an interesting benchmark for computer vision)
Original images vs. photoshopped ones
Artichoke vs. broccoli vs. cauliflower
Zerg vs. Protoss vs. Orc vs. Elf
Meme or not meme
Is it a picture of a bird?
Is it huggable?

Pick Keras or PyTorch, choose a dataset and let us know how it went in the comments section below :)

AI Monthly digest #1 – AI stock trading & Kaggle record

October 5, 2018/in Data science, Deep learning, Machine learning, AI Monthly Digest /by Konrad Budek and Arkadiusz Nowaczynski

AI-based stock trading, a record-breaking competition on Kaggle and more stories cherry-picked from all the interesting ML- and AI-related news from September. Right here in the AI Monthly Digest.

The Digest gathers machine learning and AI news to spot the most important and interesting events and developments of the past month. The five events below were curated from last month’s events and chosen by Arkadiusz Nowaczyński and Konrad Budek from deepsense.ai team.

Deep learning takes a deep dive into the stock market

Deep reinforcement learning can be applied as a complete AI solution for algorithmic trading.
The authors of “Deep Reinforcement Learning in Portfolio Management” set out to determine whether methods derived primarily for playing Atari games and continuous control would work on the stock market. The algorithm they used, called deep deterministic policy gradient (DDPG), returned promising results in an offline backtest.
The second paper, “Deep Reinforcement Learning in High Frequency Trading,” provides convincing arguments about why AI stock trading is suitable for trading in a timescale below 1 second (High Frequency Trading). The authors did a solid evaluation of their approach with a few noteworthy tips:

Online learning at test time makes it possible to maintain high accuracy over time;
A small neural network is enough for this problem, meaning AI for trading can be developed on laptops;
Predicting the next 100 ticks from the last 500 ticks works best for them.

Progress remains to be made and questions to be answered. Does this algorithm work when deployed on the real market? How much money can you actually make with it? The lack of answers is certainly intriguing, as is the fact that algorithmic trading may soon be powered mostly by Deep RL, if it’s not already. We think that the potential financial reward will push people to develop further breakthroughs in AI. After all, setting high scores in Atari games isn’t as satisfying as having supersmart AI earning you gobs of money.

A record-breaking Kaggle competition

Over 8500 data scientists on no fewer than 7000 teams took part in the record-breaking Kaggle Home Credit Default Risk competition. The goal of the competition was to predict the risk of giving a loan to a particular customer. The teams were provided with rich datasets containing historical and transactional data on the customer’s behavior.

Perfectly designed, the competition attracted teams from far and wide, mostly thanks to the outstanding dataset. It allowed the teams to harvest insights and play with data in often surprising ways. Looking to tune up their models and further polish their skills, participants engaged in discussions and peer-reviews long after the competition had ended.
deepsense.ai took part in the competition, with Paweł Godula leading a team that took 5th place overall and finished first on the public leaderboard.

Volvo trucks introduce Vera, the cabless truck

According to PwC data, by 2030 the transport sector will require 138 million fewer cars in Europe and the US, mostly thanks to the rise of autonomous vehicles and the development of new business models. What’s more, it is predicted that by 2030 autonomous vehicles will be driving 40% of all miles driven.
As a proof of concept, Volvo has brought out Vera, the cabless truck to be used in short-haul transportation at logistics centres or ports. With the fleet of vehicles able to communicate and be supervised by a cloud-based management system, the truck is an interesting glimpse of the driverless future.

DARPA announced $2 billion investment in AI

At its 60th anniversary conference, DARPA (the Defense Advanced Research Projects Agency) announced that it is going to invest $2 billion in artificial intelligence. The agency is known for developing cutting-edge technology, be it ARPANET, which later evolved into the Internet, or the Aspen Movie Map, which was among the predecessors of Google Street View.
According to John Everett, the deputy director of DARPA’s Information Innovation Office (via CNNMoney), the agency’s investment is intended to accelerate the development of AI from 20 years down to five years.
DARPA’s investment is not the first a government has made in AI. The most notable example comes from the United Arab Emirates, which has appointed an AI minister.

NIPS conference sold out in less than 13 minutes

NIPS, hosted in Montreal, Canada, is currently the most important machine learning and AI research conference in the world. Initially held as an interdisciplinary meeting of experts interested in sharing their knowledge on neural networks, it has evolved into the machine learning meeting with thousands of papers sent for review. It is also a place to run competitions with the “Learning to run” in 2017 as an example.

In 2017, the tickets sold out in two weeks, a relative eternity compared to the rock concert-like 12 minutes and 38 seconds in which they flew out this year. Tickets for last year’s Comic-Con, one of the world’s most beloved pop culture events, sold out in a bit more than an hour.
So, when it comes to selling tickets, Marvel superheroes would appear to have nothing on machine learning. This year’s NIPS conference will feature Henryk Michalewski, visiting professor at Oxford University and a researcher at deepsense.ai, as a co-author of “Reinforcement Learning of Theorem Proving” paper.

Summary

September has clearly shown that AI is one of the most dominant trends in modern tech. Selling out venues faster than pop culture events goes a long way to proving that a scientific conference, or at least this one, can be as exciting as a concert or show – so long as it’s about Artificial Intelligence.

What is reinforcement learning in Machine Learning

What is reinforcement learning? deepsense.ai’s complete guide

July 5, 2018/in Deep learning, Machine learning, Reinforcement learning, Popular posts /by Błażej Osiński and Konrad Budek

With an estimated market size of 7.35 billion US dollars, artificial intelligence is growing by leaps and bounds. McKinsey predicts that AI techniques (including deep learning and reinforcement learning) have the potential to create between $3.5T and $5.8T in value annually across nine business functions in 19 industries.

Although machine learning is seen as a monolith, this cutting-edge technology is diversified, with various sub-types including machine learning, deep learning, and the state-of-the-art technology of deep reinforcement learning.

What is reinforcement learning?

Reinforcement learning is the training of machine learning models to make a sequence of decisions. The agent learns to achieve a goal in an uncertain, potentially complex environment. In reinforcement learning, an artificial intelligence faces a game-like situation. The computer employs trial and error to come up with a solution to the problem. To get the machine to do what the programmer wants, the artificial intelligence gets either rewards or penalties for the actions it performs. Its goal is to maximize the total reward.
Although the designer sets the reward policy (that is, the rules of the game), he gives the model no hints or suggestions for how to solve the game. It’s up to the model to figure out how to perform the task to maximize the reward, starting from totally random trials and finishing with sophisticated tactics and superhuman skills. By leveraging the power of search and many trials, reinforcement learning is currently the most effective way to hint at a machine’s creativity. In contrast to human beings, artificial intelligence can gather experience from thousands of parallel gameplays if a reinforcement learning algorithm is run on a sufficiently powerful computer infrastructure.
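To show what learning from rewards alone can look like, here is a toy sketch of tabular Q-learning, one classic reinforcement learning algorithm, on a made-up corridor of five cells. The environment, constants and names are all hypothetical. The agent moves at random, is rewarded only for reaching the last cell, and still ends up with a sensible policy.

```python
import random

N_STATES = 5            # cells 0..4; reaching cell 4 ends the episode
LEFT, RIGHT = 0, 1
GAMMA, ALPHA = 0.9, 0.5  # discount factor and learning rate


def step(state, action):
    """Move one cell; reward 1.0 only when the goal cell is reached."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


rng = random.Random(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]   # estimated value of each action

for episode in range(500):
    state, done = 0, False
    while not done:
        action = rng.choice([LEFT, RIGHT])   # pure trial and error
        next_state, reward, done = step(state, action)
        target = reward if done else reward + GAMMA * max(q[next_state])
        q[state][action] += ALPHA * (target - q[state][action])
        state = next_state

# Acting greedily on the learned values gives the final policy.
greedy_policy = ["right" if q[s][RIGHT] > q[s][LEFT] else "left"
                 for s in range(N_STATES - 1)]
```

Nobody told the agent that walking right is the solution. The preference emerges from the reward signal propagating backwards through the value table.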

Examples of reinforcement learning

Applications of reinforcement learning were in the past limited by weak computer infrastructure. However, as Gerald Tesauro’s backgammon AI superplayer developed in the 1990s shows, progress did happen. That early progress is now rapidly accelerating, with powerful new computational technologies opening the way to completely new, inspiring applications.
Training the models that control autonomous cars is an excellent example of a potential application of reinforcement learning. In an ideal situation, the computer should get no instructions on driving the car. The programmer would avoid hard-wiring anything connected with the task and allow the machine to learn from its own errors. In a perfect situation, the only hard-wired element would be the reward function.

For example, in usual circumstances we would require an autonomous vehicle to put safety first, minimize ride time, reduce pollution, offer passengers comfort and obey the rules of law. With an autonomous race car, on the other hand, we would emphasize speed much more than the driver’s comfort. The programmer cannot predict everything that could happen on the road. Instead of building lengthy “if-then” instructions, the programmer prepares the reinforcement learning agent to be capable of learning from the system of rewards and penalties. The agent (another name for reinforcement learning algorithms performing the task) gets rewards for reaching specific goals.
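A reward function of this kind can be as simple as a weighted sum. The sketch below is purely illustrative: the measurements, weights and names are invented, and a real driving reward would be far more elaborate. It only shows how changing the weights changes which behavior the agent is paid for.

```python
def reward(speed, comfort, rule_violations, weights):
    """Weighted sum of (hypothetical) per-step driving measurements."""
    return (weights["speed"] * speed
            + weights["comfort"] * comfort
            - weights["violation"] * rule_violations)


# Made-up weights: the commuter car values comfort, the race car values speed.
commuter_weights = {"speed": 0.2, "comfort": 1.0, "violation": 10.0}
race_weights = {"speed": 1.0, "comfort": 0.1, "violation": 10.0}

smooth_ride = {"speed": 0.5, "comfort": 0.9, "rule_violations": 0}
fast_ride = {"speed": 1.0, "comfort": 0.2, "rule_violations": 0}

commuter_prefers_smooth = (reward(weights=commuter_weights, **smooth_ride)
                           > reward(weights=commuter_weights, **fast_ride))
race_prefers_fast = (reward(weights=race_weights, **fast_ride)
                     > reward(weights=race_weights, **smooth_ride))
```

Both comparisons come out True: the same two rides are ranked in opposite order depending on the weights.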

Another example: deepsense.ai took part in the “Learning to run” project, which aimed to train a virtual runner from scratch. The runner is an advanced and precise musculoskeletal model designed by the Stanford Neuromuscular Biomechanics Laboratory. Teaching the agent how to run is a first step in building a new generation of prosthetic legs, ones that automatically recognize people’s walking patterns and tweak themselves to make moving easier and more effective. While it is possible and has been done in Stanford’s labs, hard-wiring all the commands and predicting all possible patterns of walking requires a lot of work from highly skilled programmers.

For more real-life applications of reinforcement learning check this article.

Challenges with reinforcement learning

Creating realistic simulation environments

The main challenge in reinforcement learning lies in preparing the simulation environment, which is highly dependent on the task to be performed. When the model has to go superhuman in Chess, Go or Atari games, preparing the simulation environment is relatively simple. When it comes to building a model capable of driving an autonomous car, building a realistic simulator is crucial before letting the car ride on the street. The model has to figure out how to brake or avoid a collision in a safe environment, where sacrificing even a thousand cars comes at a minimal cost. Transferring the model out of the training environment and into the real world is where things get tricky.

Scaling and tweaking neural networks

Scaling and tweaking the neural network controlling the agent is another challenge. There is no way to communicate with the network other than through the system of rewards and penalties. This in particular may lead to catastrophic forgetting, where acquiring new knowledge causes some of the old to be erased from the network (to read up on this issue, see this paper, published during the International Conference on Machine Learning).

Overcoming local optimum and task evasion

Yet another challenge is reaching a local optimum: the agent performs the task, but not in the optimal or required way. A “jumper” jumping like a kangaroo instead of walking, as was expected of it, is a great example, and is also one that can be found in our recent blog post.
Finally, there are agents that will optimize the prize without performing the task they were designed for. An interesting example can be found in the OpenAI video below, where the agent learned to gain rewards, but not to complete the race.

What distinguishes reinforcement learning from deep learning and machine learning?

In fact, there should be no clear divide between machine learning, deep learning and reinforcement learning. It is like a parallelogram – rectangle – square relation, where machine learning is the broadest category and deep reinforcement learning the narrowest one.
In the same way, reinforcement learning is a specialized application of machine and deep learning techniques, designed to solve problems in a particular way.

Although the ideas seem to differ, there is no sharp divide between these subtypes. Moreover, they merge within projects, as the models are designed not to stick to a “pure type” but to perform the task in the most effective way possible. So “what precisely distinguishes machine learning, deep learning and reinforcement learning” is actually a tricky question to answer.

What is machine learning?

Machine learning is a form of AI in which computers are given the ability to progressively improve their performance on a specific task with data, without being directly programmed (this is the definition given by Arthur Lee Samuel, who coined the term “machine learning”). There are two types of machine learning: supervised and unsupervised.

Supervised machine learning happens when a programmer can provide a label for every training input into the machine learning system.
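
As a minimal sketch of what “a label for every training input” means, the snippet below classifies a new reading by copying the label of its closest labelled example (a 1-nearest-neighbour rule). The feature values, the labels and the choice of algorithm are invented for illustration and are not taken from any deepsense.ai project.

```python
# Minimal supervised learning: every training input comes with a label.
# The readings and labels below are invented for illustration.
training_data = [
    ((0.2, 0.1), "safe"),
    ((0.3, 0.2), "safe"),
    ((0.9, 0.8), "dangerous"),
    ((0.8, 0.9), "dangerous"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, label = min(
        training_data, key=lambda pair: squared_distance(pair[0], features)
    )
    return label


print(predict((0.25, 0.15)))
print(predict((0.85, 0.85)))
```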

Example – by analyzing the historical data taken from coal mines, deepsense.ai prepared an automated system for predicting dangerous seismic events up to 8 hours before they occur. The records of seismic events were taken from 24 coal mines that had collected data for several months. The model was able to recognize the likelihood of an explosion by analyzing the readings from the previous 24 hours.

Figure: AAIA16 Data Mining Challenge: Seismic Events, height randomization. Some of the mines can be exactly identified by their main working height values. To obstruct the identification, we added some Gaussian noise.

From the AI point of view, a single model was performing a single task on a clarified and normalized dataset. To get more details on the story, read our article about machine learning models predicting dangerous seismic events.
Unsupervised learning takes place when the model is provided only with the input data, but no explicit labels. It has to dig through the data and find the hidden structure or relationships within. The designer might not know what the structure is or what the machine learning model is going to find.

An example we employed was for churn prediction. We analyzed customer data and designed an algorithm to group similar customers. However, we didn’t choose the groups ourselves. Later on, we could identify high-risk groups (those with a high churn rate) and our client knew which customers they should approach first.
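
The grouping step can be sketched with a tiny k-means on a single feature. This is only an illustration under stated assumptions: the customer records (monthly logins and a churn flag), the choice of k-means and the function name `kmeans_1d` are all hypothetical and do not describe the algorithm used in the actual project.

```python
# Tiny k-means on one feature; the customer records are invented.
def kmeans_1d(values, k=2, iterations=10):
    """Group numbers into k clusters (k >= 2); returns (centres, groups)."""
    # Deterministic start: spread the initial centres over the sorted values.
    ordered = sorted(values)
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups


# (monthly logins, churned?) -- the churn flag is NOT shown to the clustering.
customers = [(1, True), (2, True), (3, False), (20, False), (21, False), (22, False)]
centres, groups = kmeans_1d([logins for logins, _ in customers])

# Only afterwards do we check which discovered group churns more often.
churn_rates = []
for group in groups:
    members = [c for c in customers if c[0] in group]
    churn_rates.append(sum(churned for _, churned in members) / len(members))
print(centres, churn_rates)
```

The groups are found without any labels; the churn flag is used only after clustering, to decide which group deserves attention first.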
Another example of unsupervised learning is anomaly detection, where the algorithm has to spot the element that doesn’t fit in with the group. It may be a flawed product, potentially fraudulent transaction or any other event associated with breaking the norm.
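
A very simple way to spot such an outlier is a z-score rule: flag every value that lies more than a chosen number of standard deviations from the mean. The sketch below assumes invented transaction amounts and a hypothetical threshold of 2; real systems typically use more robust methods.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]


# Invented transaction amounts; one of them clearly breaks the pattern.
amounts = [20, 22, 19, 21, 20, 23, 18, 500]
anomalies = find_anomalies(amounts)
print(anomalies)
```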

What is deep learning?

Deep learning consists of several layers of neural networks, designed to perform more sophisticated tasks. The construction of deep learning models was inspired by the design of the human brain, but simplified. Deep learning models consist of a few neural network layers which are in principle responsible for gradually learning more abstract features about particular data.
Although deep learning solutions are able to provide marvelous results, in terms of scale they are no match for the human brain. Each layer uses the outcome of a previous one as an input and the whole network is trained as a single whole. The core concept of creating an artificial neural network is not new, but only recently has modern hardware provided enough computational power to effectively train such networks by exposing a sufficient number of examples. Extended adoption has brought about frameworks like TensorFlow, Keras and PyTorch, all of which have made building machine learning models much more convenient.
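
To make the idea of stacked layers concrete, here is a minimal sketch in plain Python of a forward pass through two dense layers, where the output of the first layer becomes the input of the second. The weights, biases and the helper names `dense` and `forward` are made up for illustration; a real network would learn its weights and use a framework such as those named above.

```python
def relu(x):
    """Rectified linear unit: pass positive values, zero out the rest."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: weighted sum per neuron, then activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Two layers with hand-picked (not trained) parameters.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),   # 2 inputs -> 2 neurons
    ([[1.0, 1.0]], [0.5], relu),                     # 2 inputs -> 1 neuron
]


def forward(x):
    """Feed x through every layer; each layer consumes the previous output."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


print(forward([2.0, 1.0]))
```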

Example: deepsense.ai designed a deep learning-based model for the National Oceanic and Atmospheric Administration (NOAA). It was designed to recognize Right whales from aerial photos taken by researchers. For further information about this endangered species and deepsense.ai’s work with the NOAA, read our blog post. From a technical point of view, recognizing a particular specimen of whales from aerial photos is pure deep learning. The solution consists of a few machine learning models performing separate tasks. The first one was in charge of finding the head of the whale in the photograph while the second normalized the photo by cutting and turning it, which ultimately provided a unified view (a passport photo) of a single whale.

The third model was responsible for recognizing particular whales from photos that had been prepared and processed earlier. A network composed of 5 million neurons located the blowhead bonnet-tip. Over 941,000 neurons looked for the head and more than 3 million neurons were used to classify the particular whale. That’s over 9 million neurons performing the task, which may seem like a lot, but pales in comparison to the more than 100 billion neurons at work in the human brain. We later used a similar deep learning-based solution to diagnose diabetic retinopathy using images of patients’ retinas.

Reinforcement learning in detail

Reinforcement learning, as stated above, employs a system of rewards and penalties to compel the computer to solve a problem by itself. Human involvement is limited to changing the environment and tweaking the system of rewards and penalties. As the computer maximizes the reward, it is prone to seeking unexpected ways of doing it. Human involvement is focused on preventing it from exploiting the system and motivating the machine to perform the task in the way expected. Reinforcement learning is useful when there is no “proper way” to perform a task, yet there are rules the model has to follow to perform its duties correctly. Take the road code, for example.
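
The reward-and-penalty loop can be sketched with tabular Q-learning on a toy problem. Everything below is an assumption made for illustration: a five-cell corridor, a reward of 1 for reaching the last cell, a small penalty for every other move, and hand-picked learning parameters. It is a classic textbook algorithm, not the deep reinforcement learning setup used in our projects.

```python
import random

N_STATES = 5          # corridor cells 0..4; the goal is the last cell
ACTIONS = ("left", "right")
GOAL_REWARD = 1.0
STEP_PENALTY = -0.01  # small penalty for every move that does not reach the goal


def step(state, action):
    """Environment: move one cell, return (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    if nxt == N_STATES - 1:
        return nxt, GOAL_REWARD, True
    return nxt, STEP_PENALTY, False


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values from rewards alone, with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)              # explore
            else:
                action = max(ACTIONS, key=q[state].get)   # exploit
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt].values())
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


q_table = train()
policy = [max(ACTIONS, key=q_table[s].get) for s in range(N_STATES - 1)]
print(policy)
```

Nobody tells the agent that “right” is the correct direction; the preference emerges only from the rewards it collects.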

Example: By tweaking and seeking the optimal policy for deep reinforcement learning, we built an agent that in just 20 minutes reached a superhuman level in playing Atari games. Similar algorithms in principle can be used to build AI for an autonomous car or a prosthetic leg. In fact, one of the best ways to evaluate the reinforcement learning approach is to give the model an Atari video game to play, such as Arkanoid or Space Invaders. According to Google Brain’s Marc G. Bellemare, who introduced Atari video games as a reinforcement learning benchmark, “although challenging, these environments remain simple enough that we can hope to achieve measurable progress as we attempt to solve them”.

Figure: Breakout and Assault gameplay, each shown at three stages: initial performance, after 15 minutes of training, and after 30 minutes of training.

In particular, if artificial intelligence is going to drive a car, learning to play some Atari classics can be considered a meaningful intermediate milestone. A potential application of reinforcement learning in autonomous vehicles is the following interesting case. A developer is unable to predict all future road situations, so letting the model train itself with a system of penalties and rewards in a varied environment is possibly the most effective way for the AI to broaden the experience it both has and collects.

Reinforcement learning vs deep learning vs machine learning: conclusion

The key distinguishing factor of reinforcement learning is how the agent is trained. Instead of inspecting the data provided, the model interacts with the environment, seeking ways to maximize the reward. In the case of deep reinforcement learning, a neural network is in charge of storing the experiences and thus improves the way the task is performed.

Is reinforcement learning the future of machine learning?

Although reinforcement learning, deep learning, and machine learning are interconnected, none of them is going to replace the others. Yann LeCun, the renowned French scientist and head of research at Facebook, jokes that reinforcement learning is the cherry on a great AI cake, with machine learning the cake itself and deep learning the icing. Without the previous iterations, the cherry would top nothing.
In many use cases, using classical machine learning methods will suffice. Purely algorithmic methods not involving machine learning tend to be useful in business data processing or managing databases.
Sometimes machine learning is only supporting a process being performed in another way, for example by seeking a way to optimize speed or efficiency.
When a machine has to deal with unstructured and unsorted data, or with various types of data, neural networks can be very useful.

Summary

Reinforcement learning is no doubt a cutting-edge technology that has the potential to transform our world. However, it need not be used in every case. Nevertheless, reinforcement learning seems to be the most likely way to make a machine creative – as seeking new, innovative ways to perform its tasks is in fact creativity. This is already happening: DeepMind’s now famous AlphaGo played moves that were first considered glitches by human experts, but in fact secured victory against one of the strongest human players, Lee Sedol.
Thus, reinforcement learning has the potential to be a groundbreaking technology and the next step in AI development.

Keras or PyTorch as your first deep learning framework

June 26, 2018/in Data science, Deep learning, Machine learning /by Piotr Migdal and Rafał Jakubanis

So, you want to learn deep learning? Whether you want to start applying it to your business, base your next side project on it, or simply gain marketable skills – picking the right deep learning framework to learn is the essential first step towards reaching your goal.

What are Keras and PyTorch?

Keras and PyTorch are open-source frameworks for deep learning gaining much popularity among data scientists.

Keras is a high-level API capable of running on top of TensorFlow, CNTK, Theano, or MXNet (or as tf.contrib within TensorFlow). Since its initial release in March 2015, it has gained favor for its ease of use and syntactic simplicity, facilitating fast development. It’s supported by Google.
PyTorch, released in October 2016, is a lower-level API focused on direct work with array expressions. It has gained immense interest in the last year, becoming a preferred solution for academic research, and applications of deep learning requiring optimizing custom expressions. It’s supported by Facebook.

Before we discuss the nitty-gritty details of both frameworks, we want to preemptively disappoint you – there’s no straight answer to the question ‘which one is better?’. The choice ultimately comes down to your technical background, needs, and expectations. This article aims to give you a better idea of which of the two frameworks you should pick as your first.

TL;DR:

Keras may be easier to get into and experiment with standard layers, in a plug & play spirit.
PyTorch offers a lower-level approach and more flexibility for the more mathematically-inclined users.

Ok, but why not any other framework?

TensorFlow is a popular deep learning framework. Raw TensorFlow, however, abstracts computational graph-building in a way that may seem both verbose and not-explicit. Once you know the basics of deep learning, that is not a problem. But for anyone new to it, sticking with Keras as its officially-supported interface should be easier and more productive.
[Edit: Recently, TensorFlow introduced Eager Execution, enabling the execution of any Python code and making the model training more intuitive for beginners (especially when used with tf.keras API).]
While you may find some Theano tutorials, it is no longer in active development. Caffe lacks flexibility, while Torch uses Lua (though its rewrite is awesome :)). MXNet, Chainer, and CNTK are currently not widely popular.

Keras vs. PyTorch: Ease of use and flexibility

Keras and PyTorch differ in terms of the level of abstraction they operate on.
Keras is a higher-level framework wrapping commonly used deep learning layers and operations into neat, lego-sized building blocks, abstracting the deep learning complexities away from the precious eyes of a data scientist.
PyTorch offers a comparatively lower-level environment for experimentation, giving the user more freedom to write custom layers and look under the hood of numerical optimization tasks. Development of more complex architectures is more straightforward when you can use the full power of Python and access the guts of all functions used. This, naturally, comes at the price of verbosity.
Consider this head-to-head comparison of how a simple convolutional network is defined in Keras and PyTorch:

Keras

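from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense

# Two conv + pooling stages followed by a 10-class softmax classifier
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPool2D())
model.add(Conv2D(16, (3, 3), activation='relu'))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(10, activation='softmax'))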

PyTorch

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)      # 3 input channels -> 32 feature maps
        self.conv2 = nn.Conv2d(32, 16, 3)     # 32 -> 16 feature maps
        self.fc1 = nn.Linear(16 * 6 * 6, 10)  # 16 maps of 6x6 -> 10 classes
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 6 * 6)            # flatten before the linear layer
        x = F.log_softmax(self.fc1(x), dim=-1)
        return x

model = Net()

The code snippets above give a little taste of the differences between the two frameworks. As for the model training itself – it requires around 20 lines of code in PyTorch, compared to a single line in Keras. Enabling GPU acceleration is handled implicitly in Keras, while PyTorch requires us to specify when to transfer data between the CPU and GPU.
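
One detail that Keras infers for you and PyTorch asks you to compute by hand is the `16 * 6 * 6` input size of the final linear layer. The short sketch below retraces that arithmetic for a 32×32 input, assuming unpadded 3×3 convolutions with stride 1 and 2×2 pooling with stride 2, as in the snippets above. The helper names `conv_out` and `pool_out` are ours.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of max pooling along one dimension."""
    return (size - kernel) // stride + 1


size = 32                  # input images are 32x32
size = conv_out(size, 3)   # after conv1: 30
size = pool_out(size)      # after pooling: 15
size = conv_out(size, 3)   # after conv2: 13
size = pool_out(size)      # after pooling: 6
flat_features = 16 * size * size   # 16 channels of 6x6 values
print(size, flat_features)
```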
If you’re a beginner, the high-levelness of Keras may seem like a clear advantage. Keras is indeed more readable and concise, allowing you to build your first end-to-end deep learning models faster, while skipping the implementational details. Glossing over these details, however, limits the opportunities for exploration of the inner workings of each computational block in your deep learning pipeline. Working with PyTorch may offer you more food for thought regarding the core deep learning concepts, like backpropagation, and the rest of the training process.
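
To give a flavour of what a hand-written training loop makes explicit, here is a minimal sketch in plain Python (deliberately not PyTorch) that fits a single weight by gradient descent. The data points, the learning rate and the number of epochs are invented; the point is only to show the forward pass, the gradient computation and the parameter update as separate steps.

```python
# Fit y = w * x with gradient descent, spelling out each training step.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # invented (x, y) pairs, true w = 2
w = 0.0
learning_rate = 0.05

for epoch in range(200):
    grad = 0.0
    for x, y in data:
        prediction = w * x       # forward pass
        error = prediction - y   # derivative of 0.5 * error**2 w.r.t. prediction
        grad += error * x        # chain rule: derivative of the loss w.r.t. w
    w -= learning_rate * grad / len(data)   # parameter update

print(round(w, 4))
```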
That said, Keras, being much simpler than PyTorch, is by no means a toy – it’s a serious deep learning tool used by beginners, and seasoned data scientists alike.
For instance, in the Dstl Satellite Imagery Feature Detection Kaggle competition, the 3 best teams used Keras in their solutions, while our deepsense.ai team (4th place) used a combination of PyTorch and (to a lesser extent) Keras.
Whether your applications of deep learning will require flexibility beyond what pure Keras has to offer is worth considering. Depending on your needs, Keras might just be that sweet spot following the rule of least power.

Summary

Keras – more concise, simpler API
PyTorch – more flexible, encouraging deeper understanding of deep learning concepts

Keras vs. PyTorch: Popularity and access to learning resources

A framework’s popularity is not only a proxy of its usability. It is also important for community support – tutorials, repositories with working code, and discussion groups. As of June 2018, Keras and PyTorch are both enjoying growing popularity, both on GitHub and in arXiv papers (note that most papers mentioning Keras also mention its TensorFlow backend). According to a KDnuggets survey, Keras and PyTorch are the fastest growing data science tools.

Unique mentions of deep learning frameworks in arxiv papers (full text) over time, based on 43K ML papers over last 6 years. So far TF mentioned in 14.3% of all papers, PyTorch 4.7%, Keras 4.0%, Caffe 3.8%, Theano 2.3%, Torch 1.5%, mxnet/chainer/cntk <1%. (cc @fchollet) pic.twitter.com/YOYAvc33iN

Andrej Karpathy (@karpathy), March 10, 2018

While both frameworks have satisfactory documentation, PyTorch enjoys stronger community support – their discussion board is a great place to visit if you get stuck (you will get stuck) and the documentation or StackOverflow don’t provide you with the answers you need.
Anecdotally, we found well-annotated beginner level deep learning courses on a given network architecture easier to come across for Keras than for PyTorch, making the former somewhat more accessible for beginners. The readability of code and the unparalleled ease of experimentation Keras offers may make it the more widely covered by deep learning enthusiasts, tutors and hardcore Kaggle winners.
For examples of great Keras resources and deep learning courses, see “Starting deep learning hands-on: image classification on CIFAR-10“ by Piotr Migdał and “Deep Learning with Python” – a book written by François Chollet, the creator of Keras himself. For PyTorch resources, we recommend the official tutorials, which offer a slightly more challenging, comprehensive approach to learning the inner-workings of neural networks. For a concise overview of PyTorch API, see this article.

Summary

Keras – Great access to tutorials and reusable code
PyTorch – Excellent community support and active development

Keras vs. PyTorch: Debugging and introspection

Keras, which wraps a lot of computational chunks in abstractions, makes it harder to pin down the exact line that causes you trouble.
PyTorch, being the more verbose framework, allows us to follow the execution of our script, line by line. It’s like debugging NumPy – we have easy access to all objects in our code and are able to use print statements (or any standard Pythonic debugging) to see where our recipe failed.
A Keras user creating a standard network has an order of magnitude fewer opportunities to go wrong than does a PyTorch user. But once something goes wrong, it hurts a lot and often it’s difficult to locate the actual line of code that breaks. PyTorch offers a more direct, unconvoluted debugging experience regardless of model complexity. Moreover, when in doubt, you can readily look up the PyTorch repo to see its readable code.

Summary

PyTorch – way better debugging capabilities
Keras – (potentially) less frequent need to debug simple networks

Keras vs. PyTorch: Exporting models and cross-platform portability

What are the options for exporting and deploying your trained models in production?
PyTorch saves models in Pickles, which are Python-based and not portable, whereas Keras takes advantage of a safer approach with JSON + H5 files (though saving with custom layers in Keras is generally more difficult). There is also Keras in R, in case you need to collaborate with a data analyst team using R.
Running on TensorFlow, Keras enjoys a wider selection of solid options for deployment to mobile platforms through TensorFlow for Mobile and TensorFlow Lite. Your cool web apps can be deployed with TensorFlow.js or keras.js. As an example, see this deep learning-powered browser plugin detecting trypophobia triggers, developed by Piotr and his students.
Exporting PyTorch models is more taxing due to its Python code, and currently the widely recommended approach is to start by translating your PyTorch model to Caffe2 using ONNX.
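
The portability difference between pickled objects and plain-data descriptions can be seen with the Python standard library alone. The sketch below uses a hypothetical `TinyModel` class as a stand-in for a model object; it does not call Keras or PyTorch, and it only illustrates why a pickle depends on Python code while a JSON description does not.

```python
import json
import pickle


class TinyModel:
    """Hypothetical stand-in for a model object; not a real framework class."""

    def __init__(self, layers, weights):
        self.layers = layers
        self.weights = weights


model = TinyModel(layers=["conv", "pool", "dense"], weights=[0.1, 0.2])

# Pickle stores a reference to the Python class by name, so loading the bytes
# requires the same class to be importable on the other side.
blob = pickle.dumps(model)
print(b"TinyModel" in blob)

# A JSON description of the architecture is plain data any language can parse.
description = json.dumps({"layers": model.layers})
restored = json.loads(description)
print(restored["layers"])
```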

Summary

Keras – more deployment options (directly and through the TensorFlow backend), easier model export.

Keras vs. PyTorch: Performance

Donald Knuth famously said:

Premature optimization is the root of all evil (or at least most of it) in programming.

In most instances, differences in speed benchmarks should not be the main criterion for choosing a framework, especially when it is being learned. GPU time is much cheaper than a data scientist’s time. Moreover, while learning, performance bottlenecks will be caused by failed experiments, unoptimized networks, and data loading; not by the raw framework speed. Yet, for completeness, we feel compelled to touch on this subject. We recommend these two comparisons:

TensorFlow, Keras and PyTorch comparison by Wojtek Rosiński
Comparing Deep Learning Frameworks: A Rosetta Stone Approach by Microsoft (make sure to check notebooks to get the taste of different frameworks). For a detailed explanation of the multi-GPU framework comparisons, see this article.

PyTorch is as fast as TensorFlow, and potentially faster for Recurrent Neural Networks. Keras is consistently slower. As the author of the first comparison points out, gains in computational efficiency of higher-performing frameworks (i.e. PyTorch and TensorFlow) will in most cases be outweighed by the fast development environment and the ease of experimentation Keras offers.

Summary

As far as training speed is concerned, PyTorch outperforms Keras

Keras vs. PyTorch: Conclusion

Keras and PyTorch are both excellent choices for your first deep learning framework to learn.

If you’re a mathematician, researcher, or otherwise inclined to understand what your model is really doing, consider choosing PyTorch. It really shines where more advanced customization (and debugging thereof) is required (e.g. object detection with YOLOv3 or LSTMs with attention) or when we need to optimize array expressions other than neural networks (e.g. matrix decompositions or word2vec algorithms).

Keras is without a doubt the easier option if you want a plug & play framework: to quickly build, train, and evaluate a model, without spending much time on mathematical implementation details.
EDIT: For side-by-side code comparison on a real-life example, see our new article: Keras vs. PyTorch: Alien vs. Predator recognition with transfer learning.

Knowledge of the core concepts of deep learning is transferable. Once you master the basics in one environment, you can apply them elsewhere and hit the ground running as you transition to new deep learning libraries.

We encourage you to try out simple deep learning recipes in both Keras and PyTorch. What are your favourite and least favourite aspects of each? Which framework experience appeals to you more? Let us know in the comment section below!


Learning to run – an example of reinforcement learning

June 22, 2018/in Deep learning, Machine learning /by Konrad Budek

Turns out a walk in the park is not so simple after all. In fact, it is a complex process done by controlling multiple muscles and coordinating who knows how many motions. If carbon-based lifeforms have been developing these aspects of walking for millions of years, can AI recreate it?

This blog will describe:

How reinforcement learning works in practical usage
The process used to learn the model
Challenges in reinforcement learning
How knowledge is transferred between neural networks and why it is important for the development of artificial intelligence

Moving by controlling the muscles attached to bones, as humans do it, is way more complicated and harder to recreate than building a robot that can move with engines and hydraulic cylinders.
Building a model that can run by controlling human muscles recreated in a simulated environment was the goal of a competition organized at the NIPS 2017 conference. Designing the model with reinforcement learning was a part of a scientific project that could potentially be used to build software for sophisticated prostheses, which allow people to live normally after serious injuries.
Software that understands muscle-controlled limb movement would be able to translate the neural signals into instructions for an automated arm or leg. On the other hand, it may also be possible to artificially stimulate the muscles to move in a particular way, allowing paralyzed people to move again.

Why reinforcement learning

Our RL agent had to move the humanoid by controlling 18 muscles attached to bones. The simulation was done in an OpenSim environment. Such environments are used mainly in medicine to determine how changes in physiology are going to affect a human’s ability to move, for example whether a patient with a shorter tendon or bone will still be able to walk or grab something with their hand. The surprising challenge was the environment itself – OpenSim simulations require a lot of computational power.

Building hard-coded software to control a realistic biomechanical model of a human body would be quite a challenge, although researchers from Stanford University have done just that. But training a neural network to perform this task proved to be much more efficient and less time-consuming, and didn’t require domain-specific biomechanical knowledge.

Run Stephen! Run!

Our reinforcement learning algorithm leverages a system of rewards and punishments to acquire useful behaviour. During the first experiments, our agent (whom we called Stephen) performed his actions randomly, with no hints from the designer. His goal was to maximize the rewards involved by learning which actions, done randomly, yielded the best effect. Basically, the model had to figure out how to walk over the course of a few days, a much shorter time than the few billion years it took carbon-based lifeforms.

In this case, Stephen got a reward for every meter he travelled. During the first trials, he frequently fell over, sometimes forward, sometimes backward. With enough trials, he managed to fall only forward, then to jump or take his first step.

The curriculum, or step-by-step learning

After enough trials, Stephen learned that jumping forward is a good way to maximize the future reward. As a jumper, he was not that bad – he got from point A to point B by effectively controlling his muscles. He didn’t fall and was able to move quickly.

But our goal for Stephen was not “learning to hop” – it was “learning to run”. Jumping was a sub-optimal form of locomotion.
This prompted the need for a curriculum, or, in other words, a tutoring program. Instead of training Stephen to avoid obstacles and run at the same time, we would teach him progressively harder skills – first to walk on a straight road, then to run and, finally, to avoid obstacles. Learn to walk before you run, right?
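
The idea of a curriculum can be sketched as a simple scheduler that promotes the agent to the next stage once it performs well enough on the current one. The stage names follow the text above; the promotion threshold, the success rates and the function `run_curriculum` are hypothetical and do not describe the actual training code.

```python
# Curriculum sketch: move to the next stage once the agent is good enough.
STAGES = ["walk on a straight road", "run", "avoid obstacles"]
PROMOTION_THRESHOLD = 0.8   # hypothetical success rate needed to advance


def run_curriculum(success_rates):
    """Return which stage was trained at each evaluation round."""
    stage = 0
    history = []
    for rate in success_rates:
        history.append(STAGES[stage])
        # Advance only when the current skill is mastered and a harder one exists.
        if rate >= PROMOTION_THRESHOLD and stage < len(STAGES) - 1:
            stage += 1
    return history


# Invented evaluation results, one per training round.
history = run_curriculum([0.2, 0.6, 0.85, 0.5, 0.9, 0.4])
print(history)
```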

To reduce his tendency to jump and instead find a way to walk, we had to get Stephen to explore different options such as moving his legs separately.
We opted to use a relatively small neural network that would be able to learn to walk on a path without any obstacles. He succeeded at this, but during the process, he had a Jon Snowesque problem with his knee.

Anyone who has ever aspired to sports stardom will remember a coach admonishing them to bend their knees. Apparently, the failure to do so is common among all walkers, including simulated ones controlled by an artificial neural network. Reshaping the reward function was the only way to communicate with the agent. As the human creators, we of course know just what walking should look like, but the neural network had no clue. So adding a reward for Stephen for bending his knees was a good way to improve his performance and find a better policy.
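
Reward shaping of this kind boils down to adding an extra term to the reward function. The sketch below is a guess at the general shape only: the `knee_bend` measure (assumed to lie between 0 and 1), the weight of the bonus and the function name are assumptions, not the reward actually used in the competition.

```python
def shaped_reward(distance_gained, knee_bend, bend_weight=0.1):
    """Reward for forward progress plus a small bonus for bent knees.

    knee_bend is a hypothetical measure in [0, 1]; values outside are clipped.
    """
    clipped = min(max(knee_bend, 0.0), 1.0)
    return distance_gained + bend_weight * clipped


stiff = shaped_reward(1.0, 0.0)   # same progress, straight legs
bent = shaped_reward(1.0, 0.5)    # same progress, bent knees
print(stiff, bent)
```

With the bonus in place, two gaits that cover the same distance are no longer equally good from the agent’s point of view.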

If any human had his walk from that moment, it would be wise to apply for a government grant to develop it.

When Stephen finally worked out how to walk and run effectively, we added another, bigger neural network to figure out how to avoid obstacles. At that point, one neural network was controlling the running process while the second one figured out how to tweak Stephen’s movement to avoid obstacles and not fall.
This is a novel technique which we called policy blending. The usual way to make a neural network bigger and teach it new skills is behavioral cloning, which is a machine learning interpretation of the master-apprentice relation. The new, bigger deep neural network watches how the smaller one performs its tasks.
For this task, our policy blending method outperformed behavioural cloning. For further information, please read a scientific paper we contributed to, which presents interesting ideas employed during the challenge. After Stephen learned how to move and avoid rocks in his way, we blended in another neural network encouraging him to run even faster.
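
For readers unfamiliar with the baseline, behavioural cloning can be sketched as ordinary supervised learning on the teacher’s decisions. The example below illustrates cloning only, not policy blending, and it is heavily simplified: the teacher is a hypothetical one-line function and the student is a linear model fitted by least squares, whereas in practice both would be neural networks.

```python
# Behavioural cloning sketch: a student fits the teacher's (state, action) pairs.
def teacher_policy(state):
    """Hypothetical trained policy that the student should imitate."""
    return 2.0 * state + 1.0


states = [0.0, 1.0, 2.0, 3.0, 4.0]
actions = [teacher_policy(s) for s in states]   # demonstrations from the teacher

# Student: a linear model action = a * state + b, fitted by least squares.
n = len(states)
mean_s = sum(states) / n
mean_a = sum(actions) / n
a = sum((s - mean_s) * (t - mean_a) for s, t in zip(states, actions)) / sum(
    (s - mean_s) ** 2 for s in states
)
b = mean_a - a * mean_s


def student_policy(state):
    """The cloned policy: reproduces the teacher on states it has never seen."""
    return a * state + b


print(student_policy(10.0), teacher_policy(10.0))
```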

With policy blending and enough computational power, Stephen managed to run in a human way without falling. With 10 random obstacles to navigate, Stephen fell in less than 8% of trials. When he was moving more carefully (about 20% slower), the falls ratio fell (pardon the pun) to below 0.5%.

After the run – the effects of reinforcement learning

The experiment brought a few significant outcomes.
First, it is possible for a computer to perform the tremendously complicated task of walking with separate and coordinated control of the muscles. The agent was able to figure out how to do that using reinforcement learning alone – it did not need to observe human movement.
Moreover, the policy blending method proved effective and outperformed the standard behavioural cloning approach. Although it is not certain that it will be more efficient in every possible case, it is another, sometimes better way to transfer knowledge from one trained network to another.
Finally, we handled the resource-demanding environment by effectively splitting the computations between nodes of a large cluster. So even within the complex and heavy simulator, reinforcement learning may be not only possible, but effective.