Data science Archives - deepsense.ai

Credit risk modelling with Machine Learning

Using machine learning in credit risk modelling

May 5, 2021/in Data science, Machine learning /by deepsense.ai

Cost of risk is one of the biggest components in banks’ cost structure. Thus, even a slight improvement in credit risk modelling can translate into huge savings. That’s why machine learning is often implemented in this area.

We would like to share with you some insights from one of our projects, where we applied machine learning to increase credit scoring performance. To illustrate our insights, we selected a random pool of 10,000 applications.

How the regular process works

Loan applications are usually assessed through a credit score model, which is most often based on a logistic regression (LR). It is trained on historical data, such as credit history. The model assesses the importance of every attribute provided and translates them into a prediction.

The main limitation of such a model is that it can take into account only linear dependencies between input variables and the predicted variable. On the other hand, it is this very property that makes logistic regression so interpretable. LR is in widespread use in credit risk modelling.

Credit scoring from a logistic regression model

What machine learning brings to the table

Machine learning enables the use of more advanced modelling techniques, such as decision trees and neural networks. These introduce non-linearities into the model and make it possible to detect more complex dependencies between the attributes. We decided to use an XGBoost model fed with features selected using a method called permutation importance.

Credit scoring from tree-based model
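The idea behind permutation importance can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project's code: it works with any `predict` function rather than XGBoost specifically, and `toy_predict`, `accuracy` and the toy data are hypothetical. A feature's importance is estimated as the average drop in a quality metric after that feature's column is shuffled.

```python
import random

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Return, per feature, the mean drop in `metric` after shuffling that column."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the dataset with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            score = metric(y, [predict(row) for row in X_perm])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return importances

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy model: only the first feature influences the prediction.
def toy_predict(row):
    return 1 if row[0] > 0.5 else 0

X = [[i / 10, (i * 7 % 10) / 10] for i in range(10)]
y = [toy_predict(row) for row in X]
importances = permutation_importance(toy_predict, X, y, accuracy)
```

Features whose shuffling leaves the metric unchanged (here the second one) are candidates for removal before the final model is trained.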

However, ML models are usually so sophisticated that they are hard to interpret. Since a lack of interpretability would be a serious issue in such a highly regulated field as credit risk assessment, we opted to combine XGBoost and logistic regression.

Combining the models

We used both scoring engines – logistic regression and the ML-based one – to assess all of the loan applications.

With a clear correlation between the two assessment approaches, a high score in one model would likely mean a high score in the other.

Loan applications assessed by 2 models

In the original approach, logistic regression was used to assess applications. The acceptance level was set at around 60%, which resulted in a risk of 1%.

Initial credit application split (acceptance to portfolio risk)

If we lower the threshold by a couple of points, the acceptance level hits 70% while the risk jumps to 1.5%.

Credit applications’ split after lowering the threshold

We next applied a threshold for the ML model, which let us bring the acceptance rate back to the original level (60%) while bringing the risk down to 0.75%, that is, 25% lower than the risk level resulting from the traditional approach alone.

Credit applications’ split after applying Machine Learning
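The trade-off between acceptance and risk described above can be expressed as a short calculation. The sketch below is illustrative only: the function names and the toy scores are hypothetical, and it assumes that a higher score means a safer applicant and that risk is measured as the share of accepted applications that later default.

```python
def portfolio_stats(scores, defaulted, threshold):
    """Return (acceptance rate, default rate among accepted applications)."""
    accepted = [d for s, d in zip(scores, defaulted) if s >= threshold]
    acceptance_rate = len(accepted) / len(scores)
    risk = sum(accepted) / len(accepted) if accepted else 0.0
    return acceptance_rate, risk

def relative_reduction(old_risk, new_risk):
    """Relative drop in risk, e.g. 0.25 for a 25% reduction."""
    return (old_risk - new_risk) / old_risk

# Hypothetical toy data: ten applications, 1 marks a default.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
defaulted = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
acceptance_rate, risk = portfolio_stats(scores, defaulted, threshold=0.5)

# Figures quoted in the text: 1% risk with LR vs 0.75% with the ML threshold.
reduction = relative_reduction(0.01, 0.0075)
```

Sweeping `threshold` over a range of values for each model gives the acceptance-versus-risk curves that the two approaches can be compared on.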

Summary

Machine learning is often seen as difficult to apply in banking due to the sheer amount of regulation the industry faces. The facts don’t necessarily back this up. ML is successfully used in numerous, heavily regulated industries. The example above is just one more example of how. Thanks to this innovative approach it is possible to increase the sustainability of the loans sector and make loans even more affordable to bank customers. There’s nothing artificial about that kind of intelligence.

3D meets AI – an unexplored world of new business opportunities

May 22, 2020/in Data science, Deep learning, Machine learning /by Krzysztof Palczewski, Jarosław Kochanowicz and Michał Tadeusiak

AI has become a powerful force in computer vision and it has unleashed tangible business opportunities for 2D visual data such as images and videos. Applying AI can bring tremendous results in a number of fields. To learn more about this exciting area, read our overview of 2D computer vision algorithms and applications.

Despite its popularity, there is nothing inherent to 2D imagery that makes it uniquely suitable for AI application. In fact, artificial intelligence systems can analyze various forms of information, including volumetric data. In spite of the increasing number of companies already using 3D data gathered by lidar or 3D cameras, AI applications aren’t yet mainstream in their industries.

In this post, we describe how to leverage 3D data across multiple industries with the use of AI. Later in the article we’ll have a closer look at the nuts and bolts of the technology and we’ll also show what it takes to apply AI to 3D data. At the end of the post, you’ll also find an interactive demo to play with.

In the 3D world, there is no Swiss Army Knife

3D data is what we call volumetric information. The most common types include:

2.5D data, including information on depth or the distance to visible objects, but no volumetric information of what’s hidden behind them. Lidar data is an example.
3D data, with full volumetric information. Examples include MRI scans or objects rendered with computer graphics.
4D data, where volumetric information is captured as a sequence, and the outcome is a recording where one can go back and forth in time to see the changes occurring in the volume. We refer to this as 3D + time, which we can treat as the 4th dimension. Such representation enables us to visualize and model dynamic 3D processes, which is especially useful in medical applications such as respiratory or cardiac monitoring.

There are also multiple data representations. These include a compound of 2D images along the normal axis, sparse Point Cloud representation and voxelized representation. Such data could have additional channels, like reflectance in every point of a lidar’s view.

Depending on the business need, there can be different objectives for using AI: object detection and classification, semantic segmentation, instance segmentation and movement parameterization, to name a few. Moreover, every setup has its own characteristics and limitations that should be addressed with a dedicated approach (or, in the case of artificial neural networks, with a sophisticated and thoroughly designed architecture). These are the main reasons our clients come to us to take advantage of our experience in the field. We are responsible for delivering the AI part of specific projects, even though the majority of their competencies are built in-house.

Let us have a closer look at a few examples

1. Autonomous driving

Task: 3D object detection and classification
Data: 2.5D point clouds captured with a lidar: sparse data, large distances between points

Autonomous driving data are very sparse because:

The distances between objects in outdoor environments are significant.
In the majority of cases, lidar rays from the front and rear of the car don’t return to the lidar, since there are no objects to reflect them.
The resolution of objects gets worse the further they are from the laser scanner. Due to the angular expansion of the beam, it’s impossible to determine the precise shape of objects that are far away.

For autonomous driving, we needed a system that can take advantage of data sparsity to infer 3D bounding boxes around objects. One such network is the part-aware and part-aggregation network, i.e. Part-A² net (https://arxiv.org/abs/1907.03670). This is a two-stage network that exploits the high separability of objects, which functions as segmentation information.

In the first stage, the network estimates the position of foreground points of objects inside bounding boxes generated by an anchor-based or anchor-free scheme. Then, in the second stage, the network aggregates local information for box refinement and class estimation. The network output is shown below, with the colors of points in bounding boxes showing their relative location as perceived by the Part-A² net.

Source of image: From Points to Parts: 3D Object Detection from Point Cloud with Part-aware and Part-aggregation Network

2. Indoor scene mapping

Task: Object instance segmentation
Data: Point clouds, sparse data, relatively small distances between points

A different setup is called for when mapping indoor environments, as in our work on instance segmentation of objects in office spaces or shops (see the S3DIS dataset for better intuition). Here we employ a relatively high-density representation of a point cloud and the BoNet architecture.

In this case the space is divided into a grid of 1 x 1 x 1 meter cubes. In each cube, a few thousand points are sampled for further processing. In an autonomous driving scenario, such a grid division would make little sense given the sheer number of cubes produced, many of which are empty and only a few of which contain any relevant information.

The network produces semantic segmentation masks as well as bounding boxes. The inference is a two-stage process. The first stage produces a global feature vector to predict a fixed number of bounding boxes. It also tallies scores to indicate whether some of the predicted classes are inside those boxes. The point-level and global features derived in the first stage are then used to predict a point-level binary mask with the class assignment. The pictures below show a typical scene with the segmentation masks.
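The grid division and per-cube sampling can be sketched as follows. This is a simplified illustration, not the preprocessing used in the project: the function name, the default of 4096 points per cube and the choice to pad sparse cubes by re-drawing existing points are all assumptions.

```python
import math
import random
from collections import defaultdict

def sample_blocks(points, block_size=1.0, points_per_block=4096, seed=0):
    """Group (x, y, z, ...) points into cubes and draw a fixed-size sample per cube."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for p in points:
        # Integer index of the cube containing this point.
        key = tuple(math.floor(c / block_size) for c in p[:3])
        blocks[key].append(p)
    sampled = {}
    for key, pts in blocks.items():
        if len(pts) >= points_per_block:
            sampled[key] = rng.sample(pts, points_per_block)
        else:
            # Too few points: pad by re-drawing existing points with replacement.
            sampled[key] = pts + rng.choices(pts, k=points_per_block - len(pts))
    return sampled
```

Only cubes that contain at least one point appear in the result, which is why this layout suits dense indoor scans better than sparse outdoor ones.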

An example from the S3DIS dataset. From left: input image, semantic segmentation labels, instance segmentation labels

3. Medical diagnosis

Task: 3D Semantic segmentation
Data: Stacked 2D images, dense data, small distance between images

This is a highly controlled setup, where all 2D images are carefully and densely stacked together. Such a representation can be treated as a natural extension of a 2D setup. In such cases, modifying existing 2D approaches will deliver satisfactory results.

An example of a modified 2D approach is the 3D U-Net (https://arxiv.org/abs/1606.06650), where all 2D operations for a classical U-Net are replaced by their 3D counterparts. If you want to know more about AI in medicine, check out how it can be used to help with COVID-19 diagnosis and other challenges.

Source: Head CT scan

4. A 3D-enhanced 2D approach

There is also another case, where luckily, it can be relatively straightforward to apply expertise and technology developed for 2D cases in 3D applications. One such scenario is where there are 2D labels available, but the data and the inference products are in 3D. Another is when 3D information can play a supportive role.

In such a case, a depth map produced by 3D cameras can be treated as an additional image channel beyond regular RGB colors. Such additional information increases the sensitivity of neural networks to edges and thus yields better object boundaries.

Source: Azure Kinect DK depth camera
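Treating depth as a fourth image channel amounts to stacking it onto the RGB values pixel by pixel. The sketch below uses plain nested lists to stay dependency-free. The function name, the scaling to the 0..1 range and the `max_depth` clipping value are assumptions made for illustration.

```python
def add_depth_channel(rgb, depth, max_depth):
    """Combine an H x W grid of (r, g, b) values in 0..255 with an H x W depth
    grid (in meters) into an H x W grid of normalized (r, g, b, d) tuples."""
    rgbd = []
    for rgb_row, depth_row in zip(rgb, depth):
        row = []
        for (r, g, b), d in zip(rgb_row, depth_row):
            d_clipped = min(max(d, 0.0), max_depth)  # clip outliers
            row.append((r / 255, g / 255, b / 255, d_clipped / max_depth))
        rgbd.append(row)
    return rgbd
```

A network that normally takes three input channels would then be configured to accept four.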

Examples of the projects we have delivered in such a setup include:

Defect detection based on 2D and 3D images.

We developed an AI system for a tire manufacturer to detect diverse types of defects. 3D data played a crucial role as it allowed for ultra-precise detection of submillimeter-size bubbles and scratches.

Object detection in a factory

We designed a system to detect and segment industrial assets in a chemical facility that had been thoroughly scanned with high resolution laser scanners. Combining 2D and 3D information allowed us to digitize the topology of the installation and its pipe system.

3D data needs a mix of competencies

At deepsense.ai, we have a team of data scientists and software engineers handling the algorithmic, visualization, and integration capabilities. Our teams are set up to flexibly adapt to specific business cases and provide tailor-made AI solutions. The solutions they produce are an alternative to pre-made, off-the-shelf products, which often prove too rigid and constrained; they fail once user expectations deviate from the assumptions of their designers.

Processing and visualizing data in near real time with an appropriate user experience is no piece of cake. Doing so requires a tough balancing act that combines specific business needs, technical limitations resulting from huge data loads and the need to support multiple platforms.

It is always easier to discuss such matters using an example. The next section shows what it takes to develop an object detection system for autonomous vehicles with outputs accessible from a web browser. The goal is to predict bounding boxes for 3 different classes (car, pedestrian and cyclist) 360 degrees around the car. Such a project can be divided into 4 interconnected components: data processing, algorithms, visualizations and deployment.

Data preprocessing

In our example, we use the KITTI and A2D2 datasets, two common datasets for autonomous driving, and ones our R&D hub relies on heavily. In both datasets, we use data from spinning lidars for inference and cameras for visualization purposes.

Lidars and cameras work independently, capturing data at different rates. To obtain a full picture, all data have to be mapped to a common coordinate system and adjusted for time. This is no easy task. As lidars are constantly spinning, each point is captured at a different time, while simultaneously the position and rotation of the car in relation to world coordinates is changing. Meanwhile, the precise location and angle of the car are not known perfectly due to limitations of geolocation systems such as GPS. These factors make it extremely difficult to precisely and stably determine the absolute positions of objects around the vehicle (SLAM can be used to tackle some of the problems).
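Mapping lidar points into a common coordinate system boils down to applying the vehicle's pose to each point. The sketch below is deliberately simplified: it assumes a single known pose per scan, consisting of a heading angle (`yaw`) and a translation, and it ignores the per-point timing and the pitch and roll that a real pipeline would have to handle. The function name is hypothetical.

```python
import math

def lidar_to_world(points, yaw, translation):
    """Rotate (x, y, z) points about the vertical axis by `yaw` radians,
    then shift them by the vehicle position `translation`."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [
        (c * x - s * y + tx, s * x + c * y + ty, z + tz)
        for x, y, z in points
    ]
```

Correcting for the motion of the car during one lidar revolution would mean using a slightly different pose for each point, interpolated from its capture time.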

Fortunately, absolute positioning of objects around the vehicle is not always required.

Algorithms

There are a vast number of approaches to working with 3D data. However, factors such as the distance to and between objects and high sparsity play an essential role in which algorithm we ultimately settle on. As in the first example above, we used Part-A² net.

Deployment

We have relied on a complete, in-house solution for visualization, data handling, and UI. We have used expertise in the Unity engine to develop a cross-platform, graphically rich and fully flexible solution. In terms of a platform, we opted for maximum availability, which can be satisfied by a popular web browser like Chrome or Firefox and WebGL as Unity’s compilation platform.

Visualization/UI

WebGL, while very comfortable for the user, disables drive access and advanced GPU features, limits available RAM to 2GB and processing to a single thread. Additionally, while standalone solutions in Unity may rely on existing libraries for point cloud visualization, making it possible to visualize hundreds of millions of points (thanks to advanced GPU features), this is not the case in WebGL.

Therefore, we have developed an in-house visualization solution enabling real-time, in-browser visualization of up to 70 million points. Give it a try!

Such visualization could be tailored to the company’s specific needs. In a recent project, we took a different approach: we used AR glasses to visualize a factory in all its complexity. This enabled our client to reach a next-level user experience and see the factory in a whole new light.

Summary

We hope that this post has shed some light on how AI can be used with 3D data. If you have a particular 3D use case in mind or you are just curious about the potential for AI solutions in your field, please reach out to us. We’ll be happy to share our experience and discuss potential ways we can help you apply the power of artificial intelligence in your business. Please drop us an email at contact@deepsense.ai.

deepsense.ai and books-box.com – using machine learning to deliver knowledge pills

May 12, 2020/in Data science /by Oleh Plakhtiy

Modern science is facing a completely new challenge: overload. According to research from the University of Ottawa, the total number of research papers published since 1665 passed the 50 million mark in 2009 and approximately 2.5 million new papers are published every year. In fact, it is nearly impossible to be up-to-date with all this information, at least for a human being. Machine learning tools make it easier and faster to find information in today’s ever vaster trove of publications.

Books-box.com runs a platform with access to a wide variety of science-oriented literature. Its library contains around 5,000 books across multiple categories. But rather than distributing whole books in a digital form, it provides page-level access to required pieces of knowledge. This is often the case in academia and research work, where a particular piece of information is needed to enrich a paper and deliver more credible information.

We had the pleasure of working with books-box.com and providing them with NLP services. The goal of the project was to create a recommendation engine that suggests relevant literature to users based on the content they’re viewing.

Text embedding

To make a book’s text readable to a computer, the words are transformed into vectors. A vector is just a set of real numbers that functions as input for a Machine Learning algorithm. If you would like to learn more about the technologies and techniques we use, click on over to our business guide to Natural Language Processing.

One way to transform a sentence into a vector of numbers is one-hot encoding. This technique transforms a word into an n-vector where “n” equals the number of all unique words the model was taught during training. Unfortunately, this solution isn’t very useful for text because the vector becomes enormous and the word order and context are completely lost.
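A short sketch makes both drawbacks visible. The three-word vocabulary and the function names below are hypothetical. With a real vocabulary, each vector would have tens of thousands of entries.

```python
def one_hot(word, vocabulary):
    """Return a vector with a single 1 at the word's position in the vocabulary."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector

def bag_of_words(sentence, vocabulary):
    """Sum the one-hot vectors of all words in the sentence."""
    total = [0] * len(vocabulary)
    for word in sentence.split():
        for i, bit in enumerate(one_hot(word, vocabulary)):
            total[i] += bit
    return total

vocabulary = ["dog", "bites", "man"]
```

Here `bag_of_words("dog bites man", vocabulary)` and `bag_of_words("man bites dog", vocabulary)` produce the same vector, even though the sentences mean different things.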

Enter embeddings, the state-of-the-art in NLP algorithms. To create a sentence embedding means to assign a vector to a sentence in a vector space that conserves semantics. When two embedding vectors in this space are close to each other, the sentences they represent are similar in meaning. Commonly used vector sizes range from 100 to 1024, which is much smaller than the number of all unique words.

To create sentence embeddings we use the top-shelf neural network-based NLP algorithms ELMo and BERT. ELMo uses deep, bi-directional LSTM recurrent neural networks, while BERT uses the Transformer attention mechanism. The engine uses both algorithms to make final recommendations.

We developed a proprietary aggregation mechanism that allows us to generate aggregate embedding vectors for each book page. They allow us to easily check page similarity, by calculating the cosine similarity of two vectors, a standard metric in multidimensional vector space.
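Cosine similarity itself is straightforward to compute. In the sketch below, simple averaging (`mean_pool`) stands in for the aggregation step purely for illustration. The proprietary mechanism mentioned above is not described in this post, so this is an assumed substitute rather than the actual method.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 means same direction, 0 orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pool(vectors):
    """Average a list of sentence embeddings into one page embedding (assumed stand-in)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]
```

Because cosine similarity depends only on direction, two pages of very different length can still score as close to each other.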

Recommendations

When viewing a page, users will get five other page recommendations that may interest them. But with around 200,000 pages per book category, getting recommendations means calculating 200,000 page embedding comparisons for each request! That’s a lot of computing time. So instead of calculating the similarity online, we calculate it beforehand and store the top recommendations.

With cosine similarity pre-calculated between all pages, books-box’s recommendations can now be given almost instantaneously – the higher the score, the better the recommendation will be.

An alternative solution is to store the raw embeddings in a ball tree, a multidimensional space-partitioning data structure similar to a k-d tree. The beauty of this approach is that it doesn’t require all possible embedding comparisons to be calculated. Instead, the data structure enables an efficient search for the nearest points (embeddings) in multidimensional space. In our case, however, there is one problem with this approach: the business requirement was for each page to have recommendations from a variety of books. But the top k similar pages for a given page (k is a parameter which needs to be chosen during the tree-build phase) would most likely be from the same book. And that was not the solution we were looking for.
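The offline step can be sketched as a brute-force loop that keeps only the best k matches per page. The names below are hypothetical and the tiny `embeddings` dictionary is toy data. The example uses a dot product as the similarity function, which equals cosine similarity when the vectors have unit length.

```python
import heapq

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def precompute_top_k(embeddings, similarity, k=5):
    """Map each page id to its k most similar pages as (score, page_id) pairs."""
    table = {}
    for page_id, vector in embeddings.items():
        scored = (
            (similarity(vector, other), other_id)
            for other_id, other in embeddings.items()
            if other_id != page_id
        )
        table[page_id] = heapq.nlargest(k, scored)
    return table

# Toy unit-length embeddings for three pages.
embeddings = {"p1": [1.0, 0.0], "p2": [0.8, 0.6], "p3": [0.0, 1.0]}
table = precompute_top_k(embeddings, dot, k=2)
```

At request time, serving recommendations is then a single lookup in `table`, at the cost of a quadratic number of comparisons during the offline build.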

Threshold

Each pair of pages comes with a similarity score. In order to achieve the desired quality of recommendations, a threshold has to be selected. The higher the threshold, the higher the quality of the recommendations will be, although fewer recommendations remain available.

It is worth noting that assessing recommendation quality is not a straightforward (binary) task as it takes the subjective opinion of the assessor into account.
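The two constraints discussed in this section, a minimum similarity score and variety across books, can be combined in one selection step. The sketch below is one possible way to do it, not the engine's actual logic: limiting results to one page per book is an assumption, and all names are hypothetical.

```python
def select_recommendations(ranked, page_to_book, k=5, min_score=0.0):
    """Pick up to k pages from `ranked` (a list of (score, page_id) sorted best
    first), skipping scores below `min_score` and repeated books."""
    seen_books = set()
    picked = []
    for score, page_id in ranked:
        if score < min_score:
            break  # the list is sorted, so every later score is lower too
        book = page_to_book[page_id]
        if book in seen_books:
            continue
        seen_books.add(book)
        picked.append((score, page_id))
        if len(picked) == k:
            break
    return picked
```

Raising `min_score` trims the list from the bottom, which is the quality-versus-quantity trade-off described above.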

Cloud deployment

books-box.com regularly adds new books to its library. This entails preprocessing new books (parsing from html etc.), transforming their pages to embedding vectors and then updating the recommendation structure. Such operations require a lot of computing power, especially when we’re talking about thousands of books. To run neural networks for embedding, we need GPU devices for fast parallel computing.

We decided to deploy our recommendation engine on Amazon Web Services (AWS) cloud, which allowed us to control costs and work on the solution’s elasticity, durability and scaling capabilities. AWS also provides a convenient system of spot instances that are available at a discount of up to 90% compared to on-demand pricing.

Our deployment consists of three elements: API server, Simple Storage Service (S3) bucket, Graph updater.

The API server

Exposes a simple API to retrieve recommendations
Provides API for category management
Fetches new books and stores them in the S3 bucket

The S3 bucket

Stores books
Stores embeddings
Stores recommendation graphs

The Graph updater

Processes new books

As new books are ingested into the S3 bucket, a new message is sent to the appropriate queue in the Simple Queue Service (SQS). It contains information about where the book is stored in the bucket. Each message represents one book and each queue represents one category.

The CloudWatch component observes the size of this queue, and it will update the instance count in the Auto Scaling Group (ASG) accordingly – if there are a lot of messages, it will increase the count; otherwise it will decrease it.

The Auto Scaling Group (ASG) keeps track of the number of instances running. If ASG instance count drops to 0, it will terminate all of the running instances. Once Elastic Compute Cloud (EC2) instances come online, they will connect to the queue and start processing jobs. When there are no more jobs, ASG is set back to zero and the instances will terminate.
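The scaling rule can be pictured as a simple function of queue length. The sketch below only illustrates the idea: it does not call any AWS API, and the function name, `books_per_instance` and `max_instances` are hypothetical parameters rather than values from the deployment.

```python
import math

def desired_instances(queue_length, books_per_instance=10, max_instances=20):
    """Number of workers to run for a given number of queued books."""
    if queue_length <= 0:
        return 0  # empty queue: scale down to zero and stop paying
    return min(max_instances, math.ceil(queue_length / books_per_instance))
```

With these assumed defaults, an empty queue maps to zero instances, a single queued book to one instance, and a large backlog is capped at `max_instances`.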

To make our solution cost effective, we went with EC2 spot instances, cutting costs by up to 70% compared to on-demand instances. When using EC2 instances in conjunction with SQS, we can continue processing even if instances are terminated because the price limit has been reached. They will be back on as soon as the price drops again and they will pick up any work that’s still left on the queues.

Each EC2 instance runs dockerized applications that process the books and keep a graph that’s stored and updated on S3. Thankfully, AWS offers data transfer between EC2 and S3 free of charge.

Summary

Scanning through immense amounts of text in the latest scientific publications was a painful and time-consuming process. The ML-powered tools delivered by books-box cut through all the noise and deliver the desired pages straight to the researcher in little to no time.

AI in healthcare – tackling COVID-19 and other future challenges

May 8, 2020/in Data science, Deep learning /by Paulina Knut, Maciej Leoniak and Konrad Budek

Throughout history, tackling pandemics has always been about using the latest knowledge and approaches. Today, with AI-powered solutions, healthcare has new tools to tackle present and future challenges, and the COVID-19 pandemic will prove to be a catalyst of change.

It was probably a typical October day in Messina, a Sicilian port, when 12 Genoese ships docked. People were horrified to discover the dead bodies of sailors aboard, and with them the entrance of the Black Death to Europe. Today, in the age of vaccines and advanced medical treatments, the specter of a pandemic may until recently have seemed a phantom menace. But the COVID-19 pandemic has proved otherwise.

There are currently several challenges regarding COVID-19, including symptoms that can be easily mistaken for those of the common flu. An X-ray or CT image of lungs is a key element in the diagnosis and treatment of COVID-19 – the disease produces several telltale signs that are easy for trained professionals to spot. Or a trained neural network.

Neural networks – a building block for medical AI analysis

Computer scientists have traditionally developed methods that let them find keypoints on images based on defined heuristics, which allow them to tackle a huge array of problems. One example is locating machine parts on a uniform conveyor belt, where simple colour filtration differentiates them from the background. But this is not the case for more sophisticated problems, where extensive domain knowledge is required.

Enter Neural Networks, algorithms inspired by the mathematical model of how the human brain processes signals. In the same way as humans gain knowledge by gathering experience, Neural Networks process data and learn on their own, instead of being manually tuned.

In AI-powered image processing, every pixel is represented as an input node and its value is passed to neurons in the next layer, allowing the interdependencies between pixels to be captured. As seen in the face detection model below, the lower layers develop the ability to filter simple shapes like edges and corners (e.g., eye corners) or color gradients. These are then used by intermediate layers to construct more sophisticated shapes representing the parts of the objects being analysed (in this case eyes, parts of lips or a lung edge etc.). The high layers analyse recognised parts and classify them as specific objects. In the case of X-ray images, such objects may be a rib, a lung or an irrelevant object in the background.

Source: researchgate.net

A neural network can see details the average observer cannot, and even specialists would be hard-pressed to find. But such skill requires a significant amount of training and a good dataset.

What does it take to train neural networks?

Data scientists spend a lot of time ensuring their models have the ability to generalise, and can thus deliver accurate predictions from data they didn’t encounter during training. This requires vast knowledge of data preprocessing and augmentation techniques, state-of-the-art network architectures and error-interpreting skills. The iterative process of designing and executing experiments also consumes a great deal of time and computing power, and requires good organisation if it is to be done efficiently. Under these conditions, high prediction accuracy is hard to achieve – deepsense.ai’s teams have been developing this ability for 7 years.

The key difference between a human specialist and a neural network is that the latter is completely domain-agnostic. An algorithm that excelled in segmenting satellite images or recognising individual North Atlantic right whales from a population of 447 can just as well be used for medical image recognition after tuning.

AI in medical data

Numerous AI solutions are currently used in medicine: from appointments and digitization of medical records to drug dosing algorithms (see applications of artificial intelligence in health care). However, doctors still have to perform painstaking and repetitive tasks, such as analyzing images.

Images are used across the field of medicine, but they play a particularly important role in radiology. According to IBM estimates, up to 90% of all medical data is in image form, be it x-rays, MRIs or most other output from a diagnostic device. That is why radiology as a field is so open to using new technologies. Computers, initially used in clinical imaging for administrative work such as image acquisition and storage, are now becoming an indispensable element of the work environment, starting with picture archiving and communication systems.

Recently, deep learning has been used with great success in medical imaging thanks to its ability to extract features. In particular, neural networks have been used to detect and differentiate bacterial and viral pneumonia in children’s chest radiographs.

COVID-19 appears to be a similar case. Studies show that 86% of COVID-19 patients have ground-glass opacities (GGO), 64% have mixed GGO and consolidation and 71% have vascular enlargement in the lesion. This can be observed on CT scans as well as chest X-ray images and can be relatively easily spotted by a trained neural network.

There are several advantages of CT and x-ray scans when it comes to diagnosing COVID-19. The speed and noninvasiveness of these methods make them suitable for assisting doctors in determining the development of the infection and making decisions regarding performance of invasive tests. Also, due to the lack of both vaccines and medications, immediately isolating the infected patient is the only way to prevent the spread of the disease.

How deepsense.ai already supports healthcare

deepsense.ai’s first foray into medical data was when we took part in a competition to classify the severity of diabetic retinopathy using images of retinas. The contestants were given over 35,000 images of retinas, each having a severity rating. There were 5 severity classes, and the distribution of classes was fairly imbalanced. Most of the images showed no signs of disease. Only a few percent had the two most severe ratings. After months of hard work, we took 6th place.

As we gained more contact and experience with medical data, our results improved, and after some time we were able to take on challenges such as producing an algorithm that could automatically detect nuclei. With images acquired under a variety of conditions and having different cell types, magnification, and imaging modality (brightfield vs. fluorescence), the main challenge was to ensure the ability to generalise across these conditions.

Another interesting project we did involved automatic stomatological assessment. We trained a model to read an x-ray image and detect and identify teeth, accessories and lesions including laces, implants, cavities, cavity fillings, and parodontosis, among a long list of others. In yet another project, we estimated minimum (end-systolic) and maximum (end-diastolic) volumes of the left ventricle from a set of MRI images taken over one heartbeat. Our results were rated “excellent” by cardiologists who reviewed our work.

The standardized formats used in medical imaging allow for better transfer of knowledge in modeling different problems. In a recent research project we explored the potential of image preprocessing of CT scans in DICOM format.

Image preprocessing is a vital aspect of computer vision projects. Developing the optimal procedure rests upon the team’s experience in similar projects as well as their ability to explore new ideas. In this case the specialized image preprocessing methods we developed made the image more readable for the model and boosted its performance by 20%.

The deepsense take-away

It is common to think that an epidemic starts and ends, with no further threat to fear. But that’s not true. The black death started with the arrival of twelve ships from Genoa, then proceeded to claim the lives of up to 50 million Europeans. The disease still exists today, with 3248 people infected and 584 dead between 2010 and 2015. That’s right, the disease never really disappeared.

700 hundred years ago, Ragusa (modern Dubrovnik), then a Venice-controlled port city, played a prominent role in slowing the spread of the disease.. Learning from the tragic fate of other port cities including Venice, Genoa, Bergen and Weymouth, officials in Ragusa hold sailors on their ships for 30 days (trentino) to check if they were healthy and slow the spread of the disease.

COVID-19 is neither the most deadly nor the last pandemic humans will face. The key is to apply the latest knowledge and the most sophisticated solutions available to tackle the challenges they present. AI can support not only the most dramatic life-and-death issues in healthcare, but also more mundane cases. According to an Accenture study, AI can deliver savings of up to $150 billion annually by 2025 by supporting both the front line, with diagnosis augmentation, and the back office, by enhancing document processing or delivering more accurate cost estimates. This translates to potentially significant savings for each hospital that adopts AI.

If you want to know more about the ways AI-powered solutions can support healthcare and tackle modern and future pandemics, contact us through the form below!

A business guide to Natural Language Processing (NLP)

September 24, 2019/in Data science, Deep learning /by Konrad Budek and Artur Zygadlo

With chatbots powering up customer service on one hand and fake news farms on the other, Natural Language Processing (NLP) is getting attention as one of the most impactful branches of Artificial Intelligence (AI).

When Alan Turing proposed his famous test in 1950, he couldn’t, despite the prescience that accompanies brilliance such as his, predict how easy breaking the test would become. And how far from intelligence the machine that broke the test would be!

Modern Natural Language Processing is being used in multiple industries, in both large-scale projects delivered by tech giants and minor tweaks local companies employ to improve the user experience.

The solutions vary from supporting internal business processes in document management to improving customer service by automated responses generated for the most common questions. According to IDC data cited by Deloitte, companies leveraging the information buried in plain sight in documents and other unstructured data can achieve up to $430 billion in productivity gains by 2020.

The biggest problem with NLP is the significant difference between machines mimicking the understanding of text and actually understanding it. The difference is easily shown with ELIZA software (a famous chatbot from the 1960s), which was based on a set of scripts that paraphrased input text to produce credible-looking responses. The technology was sufficient to produce some text, but far from demonstrating real understanding or delivering business value. Things changed, however, once machine learning models came into use.

What is natural language processing?

As the name implies, natural language processing is the act of a machine processing human language, analyzing the queries in it and responding in a human manner. After several decades of NLP research strongly based on a combination of computer science and linguistic expertise, the “deep learning tsunami” (a term coined by Stanford CS and Linguistics professor Christopher Manning) has recently taken over this field of AI as well, similarly to what happened in computer vision.

Many NLP tasks today are tackled with deep neural networks, which are frequently used among various techniques that enable machines to understand a text’s meaning and its author’s intent.

Modern NLP solutions work on text by “reading” it and making a network of connections between each word. Thus, the model gets more information on the context, the sentiment and exactly what the author sought to communicate.

Tackling the context

Context and intent are critical in analyzing text. Analyzing a picture without context can be tricky – is a fist a symbol of violence, or a bro fist-bump?

The challenge grows even further with NLP, as there are multiple social and cultural norms at work in communication. “The cafe is way too cool for me” can refer to a too-groovy atmosphere or the temperature. Depending on the age of the speaker, a “savage” punk rock concert can be either positive or negative. Before the machine learning era, the flatness of traditional, dictionary-based solutions provided information with significantly less accuracy.

The best way to deal with this challenge is to deliver a word-mapping system based on multidimensional vectors (so-called word embeddings) that provide complex information on the words they represent. Following the idea of distributional semantics (“You shall know a word by the company it keeps”), the neural network learns word representations by looking at the neighboring words. A breakthrough moment for neural NLP came in 2013, when the renowned word2vec model was introduced. However, one of the main problems that word2vec could not solve was homonymy, as the model could not distinguish between different meanings of the same word. A way to significantly improve handling the context in which a word is used in a sentence was found in 2018, when more sophisticated word embedding models like BERT and ELMo were introduced.
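The distributional idea can be shown with a toy sketch. Note that this is not word2vec: word2vec trains a neural network to produce dense vectors, whereas the example below simply counts neighboring words and compares the resulting sparse vectors with cosine similarity. The corpus, the window size and the function names are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# A tiny, hypothetical corpus: "cat" and "dog" appear in similar contexts.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the bank approves the loan",
]
vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_loan = cosine(vectors["cat"], vectors["loan"])
```

Because "cat" and "dog" keep similar company, their vectors end up closer to each other than "cat" is to "loan". The sketch also shows the homonymy problem: each word gets exactly one vector, so every sense of "bank" would be folded into the same representation.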

Natural Language Processing examples

Recent breakthroughs, especially GPT-2, have significantly improved NLP and delivered some very promising use cases, including the ones elaborated below.

Automated translation

One of the most widely used applications of natural language processing is automated translation between two languages, e.g. with Google Translate. The translator delivers increasingly accurate texts, good enough to serve even in court trials. Google Translate was used when a British court failed to deliver an interpreter for a Mandarin speaker.

Machine translation was one of the first successful applications of deep learning in the field of NLP. The neural approach quickly surpassed statistical machine translation, the technology that preceded it. In a translation task, the system’s input and output are sequences of words. The typical neural network architecture used for translation is therefore called seq2seq, and consists of two recurrent neural networks (encoder and decoder).

The first seq2seq paper was published in 2014, and subsequent research led Google Translate to switch from statistical to neural translation in 2016. Later that year, Google announced a single multi-lingual system that could translate between pairs of languages the system had never seen explicitly, suggesting the existence of some interlingua-like representation of sentences in vector space.

Another important development related to recurrent neural networks is the attention mechanism, which allows a model to learn to focus on particular parts of sequences, greatly improving translation quality. Further improvements come from using Transformer architecture instead of Recurrent Neural Networks.
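A minimal sketch of the attention idea, assuming the scaled dot-product form used in Transformer models (earlier attention mechanisms for recurrent networks scored positions differently). The vectors below are invented; in a real system they would be learned encoder and decoder states.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Return (weights, context) for one query over a sequence of key/value vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


# Hypothetical encoder states for a three-word input and one decoder query.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_query = [0.0, 2.0]
weights, context = attend(decoder_query, encoder_states, encoder_states)
```

The weights show which input positions the decoder "focuses" on for this step: positions whose state points in the same direction as the query receive more weight, and the context vector is the weighted average of the input states.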

Chatbots

Automated interaction with customers significantly raises their satisfaction with the overall user experience. And that’s not something to overlook, as up to 88% of customers are willing to pay more for better customer experience.

A great example of chatbots improving the customer experience comes from Amtrak, a US railway company that transports 31 million passengers yearly and administrates over 21,000 miles of rails across America. The company decided to employ Julie, a chatbot that supports passengers in searching for a convenient commute. She delivered 800% ROI and reduced the cost of customer service by $1 million yearly while also increasing bookings by 25%.

Speech recognition

As much as a company can use a chatbot to perform some customer service, one can have a personal assistant in the pocket. According to eMarketer data, up to 111.8 million people in the US–over a third of its population–will use a voice assistant at least once a month. The voice assistant market is growing rapidly, with companies such as Google, Apple, Amazon and Samsung developing their assistants not only for mobile devices, but also for TVs and home appliances.

Despite the privacy concerns voice assistants are raising, speech is becoming the new interface for human-machine interaction. The interface can also be used to control industrial machines, especially when employees have their hands occupied – a case common across industries from HoReCa to agriculture and construction – assuming that the noise is reduced enough for the machine to register the voice properly.

Thanks to advances in NLP, speech recognition solutions are getting smarter and delivering a better experience for users. As the assistants come to understand speakers’ intentions better and better, they will provide more accurate answers to increasingly complex questions.

An unexpected example of speech recognition comes from deepsense.ai’s project renovating and digitizing classic movies, where the machine delivers an automated transcription. When combined with a facial recognition tool, the system transcribed and annotated the actor speaking in the film.

Sentiment analysis

Social media provides numerous ways of reaching customers, gathering information on their habits and delivering excellence. It’s also a melting pot of perspectives and news, delivering unprecedented insight on public opinion. This insight can be understood using sentiment analysis tools, which check if the context where a brand is exposed in social media is positive, negative or neutral.

Sentiment analysis can be done without the assistance of AI by building up a glossary of positive and negative words and checking their frequency. If there is swearing or words like “broken” near the brand, sentiment is negative. Yet those systems cannot spot irony or more sophisticated hate. The sentence “I would be happy to see you ill” suggests aggression and possibly hatred, yet there are no slurs or swearing. By supplementing the glossary-based analysis with a check of the relations between words in each sentence, a machine learning model can deliver a better understanding of the text and provide more information on the message’s subjectivity.
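The glossary approach, and its blind spot, can be sketched in a few lines. The word lists and the function name below are made up for the example; a real glossary would be far larger.

```python
import re

# Hypothetical, deliberately tiny glossaries.
POSITIVE = {"happy", "great", "love", "excellent"}
NEGATIVE = {"broken", "awful", "hate", "terrible"}


def glossary_sentiment(text):
    """Label text by counting glossary hits; word order and context are ignored."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


plain_complaint = glossary_sentiment("The screen arrived broken")
veiled_hostility = glossary_sentiment("I would be happy to see you ill")
```

The plain complaint is labeled negative, as expected. The hostile sentence, however, is labeled positive, because the only glossary word it contains is “happy”. This is exactly the kind of error that looking at the relations between words is meant to fix.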

So good can that understanding be, in fact, that deepsense.ai delivered a solution that could spot terrorist propaganda and illicit content in social media in real-time. In the same way, it is possible to deliver a system that spots hate speech and other forms of online harassment. A study from the Pew Research Center shows that up to 41% of adult Americans have experienced some form of online harassment, a number that is likely to increase, mostly due to the rising prevalence of the Internet in people’s daily lives.

Natural language generation

Apart from understanding text, machines are getting better at delivering new texts. According to research published in Foreign Affairs, texts being produced by modern AI software are, for unskilled readers, comparable to those written by journalists. ML models are indeed already writing texts for world media organizations. And while that may seem a fascinating accomplishment, it was the fear of what such advanced abilities might portend that led OpenAI not to make GPT-2 public.

The best-known case of automated journalism comes from the Washington Post, where Heliograf covers sports events. Its journalistic debut came in 2016, when the software was responsible for writing up coverage of the Olympic Games in Rio.

In business, natural language generation is used to produce more polite and humane responses to FAQs. Thus, ironically, automating conventional communication will make it more personal and humane than current, trigger-based solutions.

Text analytics

Apart from delivering real-time monitoring and sentiment analysis, NLP tools can analyze long and complicated texts, as is already being done at EY, PwC and Deloitte, all of which employ machine learning models to review contracts. The same can be applied to analyze emails or other company-owned unstructured data. According to Gartner estimates, up to 80% of all business data is unstructured and thus nonactionable for companies.

A good example of natural language processing in text analytics is a solution deepsense.ai designed for market research giant Nielsen. The company delivered reports on the ingredients in all of the FMCG products available on the market.

The process of gathering the data was time-consuming and riddled with pitfalls: an employee had to manually read a label, check the ingredients and fill out the tables. The entire process took up to 30 minutes per product. The task was also prone to naming inconsistencies, as the companies delivered the product ingredients in local languages, English and, especially on the beauty and skin care markets, Latin.

deepsense.ai delivered a comprehensive system that processed an image of the product label taken with a smartphone. The solution spotted the ingredients, scanned the text and sorted the ingredients into tables, effectively reducing the work time from 30 minutes to less than two minutes, including the time needed to gather and validate the data.

Another use case of text analytics is the automated question response function generated by Google, which aims not only to provide search results for particular queries, but a complete answer to the user’s needs, including a link to the referred website, and a description of the matter.

Summary

Natural language processing provides numerous opportunities for companies from multiple industries and segments. Apart from relatively intuitive ways to leverage NLP, such as document processing and chatbots, there are multiple other applications, including real-time social media analytics and supporting journalism or research work.

NLP models can be used to further augment existing solutions–from supporting the reinforcement learning models behind autonomous cars by providing better sign recognition to augmenting demand forecasting tools with extensions to analyze headlines and deliver more event-based predictions.

Because natural language is the best way to transfer information between humans and machines, the applications NLP makes possible will only increase and will soon be augmenting business processes around the globe.

AI Monthly Digest #12 – the shadow of malicious use

September 6, 2019/in Data science, AI Monthly Digest /by Konrad Budek and Arkadiusz Nowaczynski

With this edition of AI Monthly Digest, we have now for a full year been bringing readers carefully selected and curated news from the world of AI and Machine Learning (ML) that deepsense.ai’s team considers important, inspiring and entertaining.

Our aim is to deliver information that people not necessarily involved in AI and ML may find interesting. Also, the digest is curated by data scientists who ensure that the information included isn’t just hot air or marketing mumbo-jumbo, but significant news that will impact the global machine learning and reinforcement learning world.

This edition focuses on natural language processing, as the GPT-2 model is still an important element of AI-related discourse. This edition also contrasts the enthusiasm of ML-developers with concerns expressed by a renowned professor of Psychology.

1200 questions to ask

With natural language processing, a computer needs to generate natural texts in response to a human. This is at least troublesome, especially if a longer text or speech is required.

While these problems are being tackled in various ways, the gold standard is currently to run the newest solution on a benchmark. Yet delivering one is another challenge, to put it mildly.

To tackle it, researchers from the University of Maryland created a set of over 1200 questions that are easy to answer for a human and nearly impossible for a machine. To jump from “easy” to “impossible” is sometimes a matter of very subtle changes. As the researchers have said:

if the author writes “What composer’s Variations on a Theme by Haydn was inspired by Karl Ferdinand Pohl?” and the system correctly answers “Johannes Brahms,” the interface highlights the words “Ferdinand Pohl” to show that this phrase led it to the answer. Using that information, the author can edit the question to make it more difficult for the computer without altering the meaning of the question. In this example, the author replaced the name of the man who inspired Brahms, “Karl Ferdinand Pohl,” with a description of his job, “the archivist of the Vienna Musikverein,” and the computer was unable to answer correctly. However, expert human quiz game players could still easily answer the edited question correctly.

Capitalizing on this knowledge, researchers will be able to deliver better benchmarking for models and thus determine which part of the question confuses the computer.

Why does it matter

With each and every breakthrough, researchers get closer to delivering human-level natural language processing. At the same time, however, it is increasingly hard to determine if the neural network is understanding the processed text, or is just getting better fitted to the benchmark. Were the latter the case, the model would outperform existing solutions but register no significant improvement in real-life performance.

An example with detailed explanations is available in the video below.

A benchmark updated with those 1200 questions delivers significantly more precise information on the model’s ability to process the language and spot the drawbacks.

Large GPT-2 released

GPT-2 is probably the hottest topic among AI Trends 2019, especially considering the groundbreaking effect and controversial decision to NOT make the model public. Instead, OpenAI, the company behind the model, decided to cooperate with chosen institutions to find a way to harden the model against potential misuse.

And the threat is serious. According to research published in Foreign Affairs, readers consider GPT-2-written texts nearly as credible and convincing as those written by journalists and published in The New York Times (72% compared to 83%). Thus the articles are good enough to be especially dangerous as a weapon of mass disinformation or fake news factory – AI can produce a limitless amount of credible-looking texts with no effort.

To find the balance between supporting the development of the global science of AI and protecting models from being used for malicious ends, OpenAI is releasing the model in iterations, starting with a small one and ultimately aiming to make the model public but with the threat of misuse minimized.

Why does it matter

As research published in Foreign Affairs states, the model produces texts that an unskilled reader will find comparable to journalist-written ones. Image recognition models are already outperforming human experts in their tasks. But with all these cultural contexts, humor and irony, natural language once seemed protected by the unassailable fortress of the human mind.

The GPT-2 model has apparently cracked the gates and, with business applications, it may be on the road to delivering a model that can provide human-like performance. The technology just needs to be controlled so as not to fall into the wrong hands.

What is this GPT-2 all about?

The GPT-2 model is, as stated above, one of the hottest topics of AI in 2019. But even a specialist can find it hard to understand the nitty-gritty of how the model works. To make the matter clearer, Jay Alammar has prepared a comprehensive guide to the technology.

Why does it matter

The guide is good enough to allow a person who has limited to no knowledge on the matter to understand the nuances of the model. For a moderately skilled data scientist given sufficient computing power and a dataset, the guide is sufficient to reproduce the model, for example to support demand forecasting with NLP. It enables a data scientist to broaden his or her knowledge with one comprehensive article – a convenient way indeed.

Doing research is one thing, but sharing the knowledge it affords is a whole different story.

Malicious use, you say?

Jordan Peterson is a renowned professor and psychologist who studies the structure of myth and its role in shaping social behavior. If not a household name, he is certainly a public person and well-known speaker.

Using deep neural networks, AI researcher Chris Vigorito launched a notjordanpeterson.com website that allowed any user to generate any text that was later read with the neural network-generated voice of Jordan Peterson. As was the case with Joe Rogan, the output was highly convincing, mirroring the manner of speaking, breathing and natural pauses.

The network was trained on 20 hours of transcribed Jordan Peterson speeches, an easy number to obtain where a public speaker is concerned. The amount of work was considerable, but not overwhelming.

Why does it matter

The creation of the neural network is not as interesting as Jordan Peterson’s response. He has written a blogpost entitled “I didn’t say that”, where he calls the situation “very strange and disturbing”. In the post, he notes that while it was fun to hear himself singing popular songs, the prospect of being an unwitting part of a scam is more than real. Due to the rising computing power available at affordable prices and algorithms getting better and less data-hungry, the threat of this technology being used for malicious ends is rising. If you’d like to know just how malicious he means, I’ll leave you with this to consider.

I can tell you from personal experience, for what that’s worth, that it is far from comforting to discover an entire website devoted to allowing whoever is inspired to do so produce audio clips imitating my voice delivering whatever content the user chooses—for serious, comic or malevolent purposes. I can’t imagine what the world will be like when we will truly be unable to distinguish the real from the unreal, or exercise any control whatsoever on what videos reveal about behaviors we never engaged in, or audio avatars broadcasting any opinion at all about anything at all. I see no defense, and a tremendously expanded opportunity for unscrupulous troublemakers to warp our personal and collective reality in any manner they see fit.

AI Monthly Digest #11 – From OpenAI partnering with Microsoft to battle.net Blade Runners

August 8, 2019/in Data science, AI Monthly Digest /by Konrad Budek and Arkadiusz Nowaczynski

AI models are skilled in Chess, Go, StarCraft and, since July, six-player Texas Hold’em Poker. But the hunt for inhuman players has begun.

And when AI models get bored with beating humans in games, it’s apparently time for a bike ride. Read on to find out just why.

OpenAI partners with Microsoft

In March, OpenAI shifted to a for-profit paradigm, forging OpenAI LP. The company has now entered into a strategic partnership with Microsoft to build a new computational platform in Azure Cloud. OpenAI will port its codebase into Azure and develop new solutions with tools available there.

The main benefit for Microsoft will be the access it gains to the fruits of OpenAI’s work as a preferred partner for commercializing new AI technologies. OpenAI was formed with the goal of improving people’s lives with technology and has delivered multiple AI breakthrough models including MuseNet, which produces music in various styles, and GPT-2, a natural language processing model and gold standard in text generation.

For more on the joint venture, click on over to the official press release.

Why does it matter?

As the company recently announced, taking a non-profit approach to develop AI proved too daunting even for the likes of Elon Musk and Sam Altman, the company’s CEO. Unlike with most software, developing artificial intelligence requires not only skilled and talented people but also an astonishing amount of computing power. Training a single GPT-2 model is estimated to cost up to $50,000 – and that’s only for one of many experiments run in a given year. So, securing access to computing power is a must. By providing that, Microsoft will be getting access to the base of talented AI developers, significantly increasing its AI development potential.

AI beats skilled human players in six-player poker

The Pluribus bot is the first AI-controlled agent capable of beating human pro players in six-player no-limit Hold’em poker, the most popular format of this game in the world. Unlike chess and Go, poker is a game with hidden information – the player cannot see the hands of other players. The game itself involves bluffing and a great many psychological factors. AI bots were good at beating one opponent, but going up against more than one was a major milestone.

More details are available in the ai.facebook blogpost.

Why does it matter

Dealing with a one-on-one situation, though common in recreational games, is rare when solving real-life problems. Moreover, it is somewhere between difficult and impossible to get all the information one needs, as in chess or Go. Delivering models that work in a limited-information, multiple-agent environment pushes forward the models that support demand forecasting or managing cybersecurity threats.

When an autonomous car is not enough

Cars, be they traditional or autonomous, come with various disadvantages, especially in cities. They get stuck in traffic jams, and require parking, which can be hard to come by.

To provide more sustainable autonomous transport, Chinese scientists have brought out an autonomous bicycle. The machine responds to voice commands, avoids obstacles and maintains balance. It uses a new Tianjic chip, which supports processing neuroscience-inspired algorithms. Check out the video below to see how it works.

Why does it matter

The research itself sounds like a lot of fun, but it also constitutes an excellent foundation for further work on autonomous transportation. A bike can be used for delivering fast food, or modified to work as a motorized rickshaw in our ever more crowded metropolises. Couriers could use them to deliver the mail and other documents, and they could transport individuals incapable of riding a bike.

Autonomous bikes need not be fully autonomous, by the way: AI can support the driving process or provide alerts to riders.

Alan Turing will be featured on 50 pound banknote

Often hailed as the godfather of modern computing, Alan Turing is widely known for his work on cracking the famed German Enigma code and as the leader of the team that enhanced the cracking methods delivered by Polish mathematicians.

Turing is also considered a pioneer of artificial intelligence. He came up with the Turing Test as a first way to determine if a machine mimicking a human in conversation is truly intelligent.

So great was Turing’s contribution to humanity that he will now be featured on Britain’s 50-pound banknote. Chosen from 227,299 nominations covering 989 eligible characters, Turing was ultimately picked by Mark Carney, Bank of England governor.

Why does it matter

The announcement is a sign that computer science is no longer considered a novelty and prominent AI researchers earn the same respect chemists, physicists or life sciences experts do, as representation on a banknote well attests.

BattleNet Blade Runners

DeepMind, in conjunction with Blizzard, has deployed the AlphaStar model on Battle.net to allow players to test their skills and mettle against artificial intelligence. Battle.net is an official platform connecting players from all around the world, enabling them to quickly find opponents for a multiplayer match.

There is just one twist: the famed, reinforcement learning-trained AlphaStar will play anonymously, thus allowing players to compete with the model as they would do in any match with a normal opponent.

AlphaStar has been developed significantly beyond the abilities it commanded in defeating human professional players MaNa and TLO. DeepMind capped the actions-per-minute and actions-per-second rate to make it more accurately approximate human abilities limited by muscles and the need to operate a mouse and keyboard. The model’s perception has also been narrowed to a single frame to come in line with what human players see on the screen.

Finally, the model is able to control and compete in any race given, be it Terran, Protoss or Zerg, representing all the factions available in the game. This represents serious progress: during matches in January, the model could only control Protoss units fighting against other Protoss.

Why does it matter

At the moment, it doesn’t. But let the experiment run its course and tune in later for an update. We anticipate more impressive progress.

Interestingly, this most recent news thrilled the players’ community, which all too clearly remembers the wounds AlphaStar inflicted in dominating renowned pros. Given that any match with the model is counted as a normal encounter and a ranking match, on-guard players try to spot and avoid AlphaStar lurking in the muddy waters of the battle.net rankings.

Players have reported “odd” behavior of some opponents and have been uploading videos on YouTube, where they discuss if the other player is actually AlphaStar incognito. They also advise each other to check if the partner responds to messages. Being called a noob by an opponent even once can be strong evidence that the opponent on the other side of the battlefield isn’t human.

So players are on the hunt for a replicant.

AI Monthly Digest #10 – AI tackles climate change and deciphers long-forgotten languages

July 8, 2019/in Data science, AI Monthly Digest /by Konrad Budek and Arkadiusz Nowaczynski

June brought record-breaking temperatures, perfectly highlighting the global challenge of climate change. Is that AI-related news? Check and see in the latest AI Monthly Digest.

A common misconception about machine learning projects is that they are by definition big. However, any number of AI-powered micro-tweaks and improvements are applied in everyday work. A good example of both micro and macro tweaks that can fix a major problem can be found in the paper described below.

AI tackling climate change

The world witnessed an extraordinarily hot June, with average temperatures 2 degrees Celsius above normal in Europe. According to the World Meteorological Organization, the heatwave is consistent with predictions based on greenhouse gas concentrations and human-induced climate change.

Tackling this challenge will not be easy: according to World Bank data, fossil fuels still account for 79% of total energy consumption. Furthermore, greenhouse gases, particularly methane, are emitted by cattle, with livestock being responsible for 14.5% of total human-induced greenhouse emissions.

The most prominent figures in AI today, including DeepMind CEO Demis Hassabis, Turing award winner Yoshua Bengio, and Google Brain co-founder Andrew Ng, have authored a comprehensive paper on ways that AI can tackle the changing climate.

Their call for collaboration is meant to inspire practitioners, engineers and investors to deliver short- and long-term solutions for measures within our reach. Those include producing low-carbon electricity through better forecasting, scheduling, and control for variable sources of energy, mitigating the damage produced by high-carbon economies through, for example, better predictive maintenance, and helping to minimize energy use in transportation, smart buildings and cities. The applications range from designing grid-wide control systems to optimizing scheduling with more accurate demand forecasting.

Why does it matter

Climate change is one of the greatest challenges mankind faces today, with truly cataclysmic scenarios approaching. Further temperature increases may lead to a variety of disasters, from flooding of coastal regions due to melting ice caps to agricultural crises and conflicts over access to water.

Green energy promises solutions, yet these are not without their challenges, many of which could be solved with machine learning, deep learning or reinforcement learning. Responsibility is among deepsense.ai’s most important AI trends, and being responsible for the planet would be an excellent example of just why we chose to focus on that trend.

We will provide more in-depth content on climate change and AI-powered ways of tackling it. So stay tuned!

Giants racing to produce the best image recognition

If machine learning is today’s equivalent of the steam engine revolution, data and hardware are the coal and engine that power the machines. Facebook and Google are like the coal mines of yesteryear, having access to large amounts of fuel and power to build new models and experiment.

It should come as no surprise that breakthroughs are usually powered by the tech giants. Google’s state-of-the-art image recognition model, EfficientNet, is a recent giant step forward. The model was produced by an automated search procedure that uniformly scales each dimension of the network in order to find the best combination.
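The uniform scaling idea can be sketched in a few lines. The snippet below is a minimal illustration rather than Google's implementation: it assumes the compound-scaling rule and the base coefficients (1.2 for depth, 1.1 for width, 1.15 for input resolution) reported in the EfficientNet paper, and the function names are hypothetical.

```python
# Minimal sketch of compound scaling (illustrative, not Google's code).
# ALPHA, BETA and GAMMA are the base coefficients reported in the
# EfficientNet paper; treat them as assumptions here.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution


def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_multiplier(phi):
    """Approximate compute growth: depth * width^2 * resolution^2."""
    depth, width, resolution = compound_scale(phi)
    return depth * width ** 2 * resolution ** 2


if __name__ == "__main__":
    for phi in range(4):
        d, w, r = compound_scale(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
              f"resolution x{r:.2f}, FLOPs x{flops_multiplier(phi):.2f}")
```

With these coefficients, each increment of `phi` roughly doubles the compute budget, so a single number controls how large the scaled network becomes.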

The result is state-of-the-art in image recognition, at least when it comes to combining efficiency and accuracy. But not when it comes to accuracy alone.

Not even a month later Facebook delivered a model that outperformed Google’s. The key lay in scaling the enormous dataset it was trained on. The social media mogul has access to Instagram’s database, which holds no less than billions of user-tagged images, a dataset ready to be chewed over by a hungry deep learning model.

The neural network was released to the public via PyTorch Hub, a recently launched platform for sharing cutting-edge models.

Why does it matter

Both advances show how important machine learning is for the tech giants and how much effort they invest in pushing their research forward. Every advancement in image recognition brings new breakthroughs closer. For example, models are becoming more accurate in detecting diabetic retinopathy using images of the eye. Every further development delivers new ways to solve problems that would be unsolvable without machine learning (ML). Visual quality control in manufacturing is among the best examples.

XLNet outperforms BERT

As we noted in a past AI Monthly Digest, Google has released Bidirectional Encoder Representations from Transformers (BERT). BERT was, until recently, the state of the art on Natural Language Processing benchmarks. The newly announced XLNet is an autoregressive pretraining method (as opposed to the autoencoder-like BERT) which learns a language model by predicting each word in a sequence from permutations of the surrounding words. An intuitive explanation can be found here.
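The permutation idea can be illustrated with a toy example. The sketch below is not XLNet's training code (the real model realizes this with attention masks inside a Transformer); it only shows, for one randomly sampled prediction order, which words are visible when each target word is predicted. The function name and the example sentence are hypothetical.

```python
import random


def permutation_contexts(tokens, seed=0):
    """For one sampled factorization order, list (target, visible context).

    Each target is predicted from the tokens that precede it in the sampled
    order, which may sit to its left or right in the original sentence.
    """
    order = list(range(len(tokens)))
    random.Random(seed).shuffle(order)
    steps = []
    for step, position in enumerate(order):
        visible = sorted(order[:step])  # positions already "seen"
        steps.append((tokens[position], [tokens[i] for i in visible]))
    return order, steps


if __name__ == "__main__":
    order, steps = permutation_contexts(["New", "York", "is", "a", "city"])
    print("factorization order:", order)
    for target, context in steps:
        print(f"predict {target!r} given {context}")
```

Averaged over many sampled orders, every word ends up being predicted from context on both sides, while each individual prediction remains autoregressive.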

The XLNet model proved more effective than BERT, outperforming it on all 20 benchmark tasks.

Why does it matter

Understanding a natural language was considered a benchmark for intelligence, with Alan Turing’s test being among the best examples. Every push forward delivers new possibilities in building new products and solving problems, be they business ones or something more uncommon, like the example below.

AI-powered archeology? Bring it on!

Deep learning-based models are getting even better at understanding natural language. But what about language that is natural, but has never been deciphered due to lack of knowledge or a frustratingly small amount of extant text?

Recent research from MIT and Google shows that a machine learning approach can deliver major improvements in deciphering ancient texts. Modern natural language processing techniques rest on the assumption that all of the words in a given text are related to each other. The machine itself doesn’t “understand” the text in a human way, but rather forms its own assumptions based on the relations and connotations of each word in a sentence.

Disc of Phaistos, one of the most famous mysteries of archaeology

In this approach, the translation process is not built on understanding the world, but rather on finding words with similar connotations that convey the same message. This is entirely different from the human approach to language.

By making the algorithm less data-hungry, the researchers deliver a model that translates texts from rare and long-lost languages. The approach is described in this paper.

Why does it matter

While there are countless examples of machine learning in business, there are also new horizons to discover in the humanities. Deciphering the secrets of the past is every bit as exciting as building defenses against the challenges of the future.

A more sophisticated approach to unknown languages, and possibly their brute-force decipherment, provides a way to uncover more language-related secrets.

The Disc of Phaistos? Or maybe the Voynich manuscript?

Outsmarting failure. Predictive maintenance powered by machine learning

November 13, 2018/in Data science, Machine learning /by Konrad Budek

Since the days of the coal-powered industrial revolution, manufacturing has become machine-dependent. As the fourth industrial revolution approaches, factories can harness the power of machine learning to reduce maintenance costs.

The internet of things (IoT) is nothing new for industry. The number of cellular-enabled factory automation devices reached 270 000 worldwide in 2012. In 2018 it will rise to a staggering 820 000. Machines are present in every stage of the production process, from assembly to shipment. Although automation makes industry more efficient, with rising complexity it also becomes more vulnerable to breakdowns, as service is both time-consuming and expensive.

Four levels of predictive maintenance

According to PricewaterhouseCoopers, there are four levels of predictive maintenance.

1.	Visual inspection, where the output is entirely based on the inspector’s knowledge and intuition
2.	Instrument inspection, where conclusions are a combination of the specialist’s experience and the instrument’s read-outs
3.	Real-time condition monitoring that is based on constant monitoring with IoT and alerts triggered by predefined conditions
4.	AI-based predictive analytics, where the analysis is performed by self-learning algorithms that continuously tweak themselves to the changing conditions

As the study indicates, a good number of the companies surveyed by PwC (36%) are now on level 2 while more than a quarter (27%) are on level 1. Only 22% had reached level 3 and 11% level 4, which is basically level 3 on machine learning steroids. The PwC report states that only 3% use no predictive maintenance at all.
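The step from level 3 to level 4 can be made concrete with a small sketch. It is only a stand-in, not PwC's definition or any vendor's product: level 3 is modelled as a fixed, predefined limit, while level 4 is approximated by a baseline that is re-estimated from the data as it arrives. The function names, limits and readings are all hypothetical.

```python
import math


def fixed_threshold_alerts(readings, limit):
    """Level 3 style: alert whenever a reading exceeds a predefined limit."""
    return [i for i, value in enumerate(readings) if value > limit]


def adaptive_alerts(readings, z_limit=3.0, warmup=5):
    """Level 4 style stand-in: alert when a reading deviates strongly from a
    baseline re-estimated from all earlier readings (Welford's method)."""
    alerts, count, mean, m2 = [], 0, 0.0, 0.0
    for i, value in enumerate(readings):
        if count >= warmup:
            std = math.sqrt(m2 / (count - 1))
            if std > 0 and abs(value - mean) / std > z_limit:
                alerts.append(i)
        # Update the running mean and variance with the new reading.
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
    return alerts


if __name__ == "__main__":
    temperatures = [60, 61, 59, 60, 62, 61, 60, 75, 61, 60]
    print("fixed limit of 80:", fixed_threshold_alerts(temperatures, 80))
    print("adaptive baseline:", adaptive_alerts(temperatures))
```

In this toy data the jump to 75 stays below the fixed limit of 80, so the predefined rule is silent, while the adaptive baseline flags it as unusual for this particular machine.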

Staying on track

According to the PwC data, the rail sector is the most advanced sector of those surveyed with 42% of companies at level 4, compared to 11% overall.

One of the most prominent examples is Infrabel, the state-owned Belgian company that owns, builds, upgrades and operates a railway network which it makes available to privately-owned transportation companies. The company spends more than a billion euros annually to maintain and develop its infrastructure, which comprises over 3 600 kilometers of railway and some 12 000 civil infrastructure works such as crossings, bridges, and tunnels. The network is used by 4 200 trains every day, transporting both cargo and passengers.

The company faces both technical and structural challenges. Among them is its aging and shrinking technical staff.

At the same time, the density of railroad traffic is increasing: the number of daily passengers has increased by 50% since 2000, reaching 800 000. What’s more, the growing popularity of high-speed trains is putting ever greater strain on the rails and other infrastructure.

To face these challenges, the company has implemented monitoring tools, such as sensors for monitoring overheating tracks, cameras which inspect the pantographs and meters to detect drifts in power consumption, which usually occur before mechanical failures in switches. All of the data is collected and analyzed by a single tool designed to apply predictive maintenance. Machine learning models are a component of that tool.
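Detecting a slow drift in a signal such as power consumption can be sketched with a one-sided CUSUM detector. This is a generic technique chosen for illustration; the source does not describe which method Infrabel's tool uses, and the function name, baseline, slack and threshold values below are hypothetical.

```python
def cusum_drift(readings, baseline, slack=0.5, threshold=4.0):
    """Return the index at which upward drift is first detected, or None.

    One-sided CUSUM: accumulate how far each reading exceeds
    baseline + slack and raise an alarm once the accumulated excess
    passes the threshold. Isolated noisy readings decay back to zero.
    """
    cumulative = 0.0
    for i, value in enumerate(readings):
        cumulative = max(0.0, cumulative + (value - baseline - slack))
        if cumulative > threshold:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical switch-motor power readings (arbitrary units).
    stable = [10.0, 9.5, 10.5, 10.0]
    drifting = [10.0, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5]
    print("stable:", cusum_drift(stable, baseline=10.0))
    print("drifting:", cusum_drift(drifting, baseline=10.0))
```

Because the detector accumulates small excesses over time, it can react to a gradual rise that never crosses a single hard limit, which is the kind of early warning the meters are meant to provide.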

As sounding brass

Mueller Industries (Memphis, Tennessee) is a global manufacturer and distributor of copper, brass, aluminum and plastic products. The predictive maintenance solution the company uses is based on sound analysis. Every machine can be characterized by the sound it makes, and any change in that sound may be a sign of impending malfunction. The analysis of the sound and the vibrations of the machine is done in real time by a cloud-based machine learning solution that seeks patterns in the data gathered.

Both the amount and the nature of the data collected render it impossible for a human to analyze, but a machine learning-powered AI solution handles it with ease. The devices gather data from ultrasonic and vibration sensors and analyze it in real time. Contrary to experience-based analytics, using the devices requires little-to-no training and can be done on the go.
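The principle of spotting a change in a machine's tone can be sketched with a simple frequency analysis. The example below is a toy illustration and not the solution used by Mueller Industries: it finds the strongest frequency in a recording with a naive discrete Fourier transform and compares it with a reference tone. The function names, sample rate and frequencies are hypothetical.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest component via a naive DFT."""
    n = len(samples)
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2):  # skip the DC bin, stay below Nyquist
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * sample_rate / n


def tone_shifted(reference_hz, samples, sample_rate, tolerance_hz=5.0):
    """Flag a recording whose dominant tone moved away from the reference."""
    measured = dominant_frequency(samples, sample_rate)
    return abs(measured - reference_hz) > tolerance_hz


def sine(frequency_hz, sample_rate, n):
    """Generate n samples of a pure tone (stand-in for a sensor recording)."""
    return [math.sin(2 * math.pi * frequency_hz * t / sample_rate)
            for t in range(n)]


if __name__ == "__main__":
    healthy = sine(50, 1000, 200)
    worn = sine(65, 1000, 200)
    print("healthy shifted?", tone_shifted(50.0, healthy, 1000))
    print("worn shifted?", tone_shifted(50.0, worn, 1000))
```

A production system would use a fast Fourier transform and learn the normal spectrum from data rather than rely on a single reference tone, but the comparison step is the same in spirit.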

Endless possibilities

With the power of machine learning enlisted, handling the tremendous amounts of data generated by the sensors in modern factories becomes a much easier task. It allows a company to detect failures before they paralyze production, thus saving time and money. What’s more, the data that is gathered can be used to further optimize the company’s performance, including by searching for bottlenecks and managing workflows.

That’s why 98% of industrial companies expect to increase efficiency with digital technologies.

AI Monthly Digest #9 – the double-edged sword of modern technology

June 7, 2019/in Data science, AI Monthly Digest /by Konrad Budek and Arkadiusz Nowaczynski

This edition is all about AI morality-related themes, with a slight tinge of Talking Heads and Modern Talking.

Earlier this year, deepsense.ai highlighted AI morality and transparency as one of 2019’s dominant AI trends. May bore out our thesis, especially as it relates to potential misuse and malicious intent. At the same time, though, AI provides unique chances to support entertainment and education, as well as deliver new business cases.

A bigger version of GPT-2 released to the public

OpenAI recently presented the GPT-2 model, which set a new gold standard for natural language processing. Despite the acclaimed success of the model, OpenAI opted not to make it public due to the risk of malicious usage, particularly to produce spam and fake news at no cost.

This sparked an uproar. Good practice in the industry is to release AI research work as open-source software, so other researchers can push the boundaries further without having to repeat all the earlier work from scratch. In other words, OpenAI threw up a major hurdle to NLP-model development by keeping GPT-2 under wraps.

To support the scientific side of the equation while reducing the malicious threat, OpenAI is releasing some smaller-scale models to the public. The model it recently released has 345M parameters, while the best original model has 1.5B parameters. Every parameter can be seen as the strength of a connection between virtual neurons inside a neural network, so OpenAI is basically shrinking the brain it designed.
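To give a feel for what a parameter count means, the sketch below counts the learned weights and biases of a small fully connected network. It is a toy illustration only: GPT-2 is a Transformer rather than a plain dense network, and the function name and layer sizes are hypothetical.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of neurons per layer, input layer first.
    """
    total = 0
    for inputs, outputs in zip(layer_sizes, layer_sizes[1:]):
        total += inputs * outputs  # one weight per connection
        total += outputs           # one bias per neuron in the next layer
    return total


if __name__ == "__main__":
    print("toy network:", dense_parameter_count([3, 4, 2]), "parameters")
    # Ratio between the two GPT-2 sizes quoted in the text.
    print("1.5B / 345M =", round(1.5e9 / 345e6, 2))
```

Even a network with a handful of neurons has more parameters than neurons, which is why model sizes are quoted in parameters and why the released model is several times smaller than the full one.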

The original network was released to OpenAI partners currently working on malice-proofing the system. The first independent applications of the downscaled network are already available at talktotransformer.com and onionbot headline generator.

Why does it matter?

OpenAI is currently facing a difficult choice between supporting the global development of AI and the fear of losing control over dangerous technology. In a world facing a potential avalanche of fake news and social media being used to perpetuate propaganda, building a system that writes coherent and convincing texts is undoubtedly dangerous.

This case allows one to see all the AI-related issues in a nutshell, including the technology’s amazing potential and the real threat of misuse or malicious intent. It may therefore serve as a precedent for future cases.

Talking heads unleashed

A group of scientists working for Samsung’s AI Center in Moscow and Skolkovo Institute of Science and Technology designed a model that can produce a convincing video of a talking head from a single image, such as a passport photo or even a painting.

The model renders both the background and the head’s behavior consistently. Most impressively, it builds a convincing video of a talking head from even a single frame.

The solution searches for a similar face among those it has analyzed and extracts facial features, including the nose, chin, mouth and eyes. The movement of those features is then applied to the image, as shown in the video.

The results are undoubtedly impressive.

Why does it matter?

Yet another AI ethics-related issue, the talking-head technology poses the threat of deepfakes, images that show a person making statements that he or she would never make. This raises obvious questions about the malicious ways such technology could be used.

On the other hand, when deepfakes are used for special effects in popular movies, no one seems to complain and critics even weigh in with their acclaim. Some of the better-known examples come from the Star Wars franchise, particularly Rogue One, which features Leia Organa wearing the face of a young Carrie Fisher.

AI has also proved itself useful in promoting art. By leveraging this technology it is possible to deliver the talking head of Girl with a Pearl Earring or the Mona Lisa telling visitors from screens about a painting’s historical context – a great way to put more fun in art lessons for kids. Or just to have some fun seeing what a Stallone-faced Terminator would look like.

Again, AI can be used for both good and evil ends. The ethics are up to the wielder of this double-edged sword.

Modern Talking – recreating the voice of Joe Rogan

Another example of deepfake-related technology is using AI to convincingly recreate Joe Rogan’s voice. Text-to-speech technology is not a new kid on the block, yet it is easy to spot due to its robotic and inhumanly calm style of speaking. Listening to automated text-to-speech was usually boring at best and unintentionally comic at worst, as the robotic speech lacked emotion and inflection.

Dessa engineers have delivered a model that is not only transforming text to speech, but also recreating Joe Rogan’s style of speaking. Joe is a former MMA commentator who went on to become arguably the most popular podcaster in the world. Speaking with great emotion, heavily accenting and delivering power with every word, Rogan is hard to mistake.

Or is he? The team released a quiz that challenges the listener to tell whether a given sample comes from a real podcast or was AI-generated. The details can be found on Dessa’s blog.

Why does it matter?

Hearing a convincing imitation of a public personality’s voice is nearly as unsettling as watching a talking head talk. But the technology can be used for entertainment and educational purposes. For example, delivering a new Frank Sinatra single or presenting Winston Churchill’s comprehensive and detailed speech on reasons behind World War II.

Again, the ethics are in the user’s hands, not in the tool. Despite that, and as we saw with OpenAI’s GPT-2 Natural Language Processing model, researchers have decided NOT to let the model go public.

Machine learning-powered translations increase trade by 10.9%

Researchers at Olin Business School at Washington University in St. Louis have found a direct connection between machine learning-powered translations and business efficiency. The study was conducted on eBay and shows that a moderate improvement in the quality of language translation increased trade between countries on the platform by 10.9%.

The study examined trade between English speakers from the United States and countries speaking other languages in Europe, America and Asia. More on the research can be found on the Washington University in St. Louis website.

Why does it matter?

While there is no doubt that AI provides vital support for business, the evidence, while voluminous, remains largely anecdotal (sometimes called anec-data) with little quantitative research to back up the claim. Until the Olin study, which does provide hard and reliable data. Is justified true belief knowledge? That’s an entirely different question…

A practical approach to AI in Finland

AI Monthly Digest #5 presented a bit about the Finnish way of spreading the word about AI. Long story short: in contrast to the many top-down approaches to building an AI strategy, the Finns have apparently decided to build AI awareness as a grassroots movement.

To support the strategy, the University of Helsinki has released a digital AI course on the foundations and basic principles of AI. It is available for free to everyone interested.

Why does it matter?

AI is gaining attention and the reactions are usually polarised – from fear of losing jobs and machine rebellion to arcadian visions of an automated future with no hunger or pain. The truth is no doubt far from either of those poles. Machine learning, deep learning and reinforcement learning are all built on certain technological foundations that are relatively easy to understand, including their strengths and limitations. The course provides good basic knowledge on these issues, which can do nothing but help our modern world.
