Computer vision enables machines to perform once-unimaginable tasks like diagnosing diabetic retinopathy as accurately as a trained physician or supporting engineers by automating their daily work.
Recent advances in computer vision are providing data scientists with tools to automate an ever-wider range of tasks. Yet companies sometimes don’t know how best to employ machine learning in their particular niche. The most common problem is understanding how a machine learning model will perform its task differently than a human would.
What is computer vision?
Computer vision is an interdisciplinary field that enables computers to understand, process and analyze images. The algorithms it uses can process both videos and static images. Practitioners strive to deliver a computer version of human sight while reaping the benefits of automation and digitization. Sub-disciplines of computer vision include object recognition, anomaly detection, and image restoration. While modern computer vision systems rely first and foremost on machine learning, there are also trigger-based solutions for performing simple tasks.
The following case studies show computer vision in action.
5 popular computer vision applications
1. Diagnosing diabetic retinopathy
Diagnosing diabetic retinopathy usually takes a skilled ophthalmologist. With obesity on the rise globally, so too is the threat of diabetes. As the World Bank indicates, obesity is a threat to world development – among Latin America’s countries, only Haiti has an average adult Body Mass Index below 25 (the upper limit of the healthy weight range). With rising obesity comes a higher risk of diabetes – obesity is believed to account for 80-85% of the risk of developing type 2 diabetes. The result is a skyrocketing need for proper diagnostics.
What is the difference between these two images?
The one on the left has no signs of diabetic retinopathy, while the other one has severe signs of it.
By applying algorithms to analyze digital images of the retina, deepsense.ai delivered a system that diagnosed diabetic retinopathy with the accuracy of a trained human expert. The key was in training the model on a large dataset of healthy and non-healthy retinas.
2. AI movie restoration
The algorithms trained to find the difference between healthy and diseased retinas are equally capable of spotting blemishes on old movies and making the classics shine again.
Recorded on celluloid film, old movies are endangered by two factors – the obsolescence of the equipment needed to play the tapes and the nature of the film stock itself, which degrades with age. Moreover, digitizing a movie is no guarantee of flawlessness, as the transfer process introduces new damage of its own.
However, when trained on two versions of a movie – one with digital noise and one that is perfect – the model learns to spot the disturbances and remove them during the AI movie restoration process.
3. Digitizing industrial documentation
Another example of the push towards digitization comes via industrial installation documentation. Like films, this documentation is riddled with inconsistencies in the symbols used, which can get lost in the myriad of lines and other markings that end up in the documentation – and must be made sense of by humans. Digitizing industrial documentation that once took a skilled engineer up to ten hours of painstaking work can now be done in a mere 30 minutes thanks to machine learning.
4. Building digital maps from satellite images
Despite their seeming similarities, satellite images and fully-functional maps that deliver actionable information are two different things. The differences are never as clear as during a natural disaster such as a flood or hurricane, which can quickly, if temporarily, render maps irrelevant.
deepsense.ai has also used image recognition technology to develop a solution that instantly turns satellite images into maps, replete with roads, buildings, trees and the countless obstacles that emerge during a crisis situation. The model architecture we used to create the maps is similar to those used to diagnose diabetic retinopathy or restore movies.
Check out the demo:
5. Aerial image recognition
Computer vision techniques can work as well on aerial images as they do on satellite images. deepsense.ai delivered a computer vision system that supports the US NOAA in recognizing individual North Atlantic Right whales from aerial images.
With only about 411 whales alive, the species is highly endangered, so it is crucial that each individual be recognizable so its well-being can be reliably tracked. Before deepsense.ai delivered its AI-based system, identification was handled manually using a catalog of the whales. Tracking whales from aircraft above the ocean is monumentally difficult as the whales dive and rise to the surface, the telltale patterns on their heads obscured by rough seas and other forces of nature.
Bounding box produced by the head localizer
These obstacles made the process both time-consuming and prone to error. deepsense.ai delivered an aerial image recognition solution that improves identification accuracy and takes a mere 2% of the time the NOAA once spent on manual tracking.
The deepsense.ai takeaway
As the above examples show, computer vision is today an essential component of numerous AI-based software development solutions. When combined with natural language processing, it can be used to read the ingredients from product labels and automatically sort them into categories. Alongside reinforcement learning, computer vision powers today’s groundbreaking autonomous vehicles. It can also support demand forecasting and function as a part of an end-to-end machine learning manufacturing support system.
The key difference between human vision and computer vision is the domain of knowledge behind data processing. Machines find no difference in the type of image data they process, be it images of retinas, satellite images or documentation – the key is in providing enough training data to allow the model to spot if a given case fits the pattern. The domain is usually irrelevant.
With convolutional neural networks and state-of-the-art image recognition techniques it is possible to make old movie classics shine again. Neural networks polish the image, reduce the noise and apply colors to the aged images.
The first movies were created in the late nineteenth century with celluloid photographic film used in conjunction with motion picture cameras.
Skip ahead to 2018, when the global movie industry was worth $41.7 billion. Serving entertainment, cultural and social purposes, films are a hugely important heritage to protect. And that’s not always easy, especially considering that modern movies are produced and screened digitally, with the technology of celluloid tape fading into obsolescence.
Challenges in film preservation
The challenge and importance of preserving the cultural heritage of old movies has been underscored by numerous organizations, including the European Commission, which noted that a lack of proper devices to play aging formats could make it impossible to watch old films.
In deepsense.ai’s experience with restoring film, the first challenge is to remove distortions. Classics are usually recorded in low resolution while the original tapes are obviously aged and filled with noise and cracks. Also, the transition process from celluloid tape to digital format usually damages the material and results in the loss of quality.
By using AI-driven solutions, specifically supervised learning techniques, deepsense.ai’s team removed the cracks and black spots from the digitized version of a film. The model we produced uses deep neural networks trained on a movie with cracks and flaws added manually for training purposes. With the same films available in both original and damaged form, the system learned to remove the flaws. An example of generated noise added to the classic Polish movie “Rejs” and the neural network’s output is displayed below.
The example clearly shows that our neural network can process and restore even thoroughly damaged source material and make it shine again. The network starts to produce low-quality predictions only when the images are so darkened and blurred that the human eye can barely recognize the people in the film.
How to convert really old movies into HD
A similar training technique was applied to deliver a neural network used to improve the quality of an old movie. The goal was to deliver missing details and “pump up” the resolution from antiquated to HD quality.
The key challenge lay in reproducing details that were never captured in the first place. As display technology has advanced, viewers find it difficult to watch video of lower quality than what they are used to.
The model was trained by downscaling an HD movie and then conducting a supervised training to deliver the missing details.
The model performs well thanks to the wide availability of training data. The team could downscale the resolution of any movie, provide the model with the original version and let the neural network learn how to forge and inject the missing detail into the film.
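To make this concrete, below is a minimal PyTorch sketch of one such training step. The model and variable names are assumptions for illustration, not the production code: an HD frame is downscaled to create the low-resolution input, and the network is penalized for differences between its output and the original frame.
import torch
import torch.nn.functional as F

# Hypothetical names: `sr_model` is any network mapping low-resolution frames to high-resolution ones,
# `hd_frames` is a batch of original HD frames shaped (N, 3, H, W) with values in [0, 1].
def super_resolution_step(sr_model, hd_frames, optimizer, scale=2):
    low_res = F.interpolate(hd_frames, scale_factor=1.0 / scale, mode='bicubic', align_corners=False)
    restored = sr_model(low_res)              # the network tries to re-create the missing details
    loss = F.l1_loss(restored, hd_frames)     # supervised target is the original HD frame
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()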
A key misconception about delivering HD versions of old movies is that the neural network will discover the missing details in the original. In fact, there is no way to reclaim lost details, because they were never registered on the original material. The neural network produces them on the fly, with the same techniques Thispersondoesnotexist and similar Generative Adversarial Networks use.
So, the source material is enriched with details that only resemble reality, but are in fact not real ones. This can be a challenge (or a problem) if the material is to be used for forensic purposes or detailed research. But when it comes to delivering the movies for entertainment or cultural ends, the technique is more than enough.
Coloring old movies
Another challenge comes with producing color versions of movie classics, technically reviving them for newer audiences. The process was long handled by artists applying color to every frame. The first film colored this way was the British silent movie “The Miracle” (1912).
Because there are countless color movies to draw on, providing a rich training set, a deep neural network can vastly reduce the time required to revive black and white classics. Yet the process is not fully automatic. In fact, putting color on the black and white movie is a titanic undertaking. Consider Disney’s “Tron,” which was shot in black and white and then colored by 200 inkers and painters from Taiwan-based Cuckoo’s Nest Studio.
When choosing colors, a neural network tends to play it safe. An example of how this can be problematic would be when the network misinterprets water as a field of grass. It would do that because it is likely more common for fields than for lakes to appear as a backdrop in a film.
By manually applying colored pixels to single frames, an artist can suggest what colors the AI model should choose.
There is no way to determine the real color of a scarf or a shirt an actor or actress was wearing when a film rendered in black and white was shot. After all these years, does it even matter? In any case, neural networks employ the LAB color standard, leveraging lightness (L) to predict the two remaining channels (A and B respectively).
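To make the LAB idea concrete, here is a minimal sketch of preparing a training pair and running one optimization step for a colorization network that predicts the A and B channels from the L channel. The helper names, normalization constants and the use of scikit-image are assumptions for illustration, not deepsense.ai’s implementation.
import numpy as np
import torch
from skimage import color

# Hypothetical sketch: `colorizer` is any network mapping a 1-channel L image to 2 channels (A and B).
def lab_training_pair(rgb_frame):
    # rgb_frame: numpy array of shape (H, W, 3) with values in [0, 1]
    lab = color.rgb2lab(rgb_frame).astype(np.float32)
    L = torch.from_numpy(lab[:, :, :1]).permute(2, 0, 1) / 100.0    # lightness input, (1, H, W)
    ab = torch.from_numpy(lab[:, :, 1:]).permute(2, 0, 1) / 128.0   # A and B targets, (2, H, W)
    return L.unsqueeze(0), ab.unsqueeze(0)                          # add a batch dimension

def colorization_step(colorizer, rgb_frame, optimizer):
    L, ab_target = lab_training_pair(rgb_frame)
    ab_pred = colorizer(L)                                   # predict A and B from lightness alone
    loss = torch.nn.functional.mse_loss(ab_pred, ab_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()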
Transcription and face recognition
Last but not least, transcribing dialogue makes analysis and research much easier – be it for linguistic or cultural studies purposes. With facial recognition software, the solution can attribute all of the lines delivered to the proper characters.
The speech-to-text function processes the sound and transcribes the dialogue while the other network checks which of the people in the video moves his or her lips. When combined with image recognition, the model can both synchronize the subtitles and provide the name of a character or actor speaking.
While the content being produced needs to be supervised, the approach still vastly reduces the time required for transcription. Done traditionally, transcription takes at least as long as the recording itself and then needs to be validated. The machine transcribes an hour-long movie in a few seconds.
Summary
Using machine learning-based techniques to restore movies takes less time and effort than other methods. It also makes efforts to preserve cultural heritage more successful and ensures films remain relevant. Machine learning in business gets huge recognition, but ML-based techniques remain a novel way to serve the needs of culture and art. deepsense.ai’s work has proven that AI in art can serve multiple purposes, including promotion and education. Maybe using it in art and culture will be one of 2020’s AI trends.
Reviving and digitizing classics improves the access to and availability of cultural goods and ensures that those works remain available, so future generations will, thanks to AI, enjoy the Academy Award-winning movies of the past as much as, if not more than, we do now.
Building maps to fit a crisis situation provides a challenge even when considering the impact of satellite imaging on modern cartography. Machine learning significantly reduces the time required to prepare an accurate map.
Crisis maps are often prepared by combining crowdsourced data, satellite and aerial imagery. Such mapping was widely used during the recent humanitarian crises brought about by the earthquake in Haiti and the floods in Pakistan in 2010.
Satellite mapping is way easier than traditional cartographic methods, but still, the main challenge is in recognizing particular objects in the image, like roads, buildings and landmarks. Getting up-to-date information about roadblocks and threats is even more essential. And that’s where machine learning-based solutions come into play.
The artificial cartographer
In this blog post we address the problem of satellite imagery semantic segmentation applied to building detection. Unlike many other approaches, we use only RGB color information and no multispectral wavebands.
Check out the demo:
1. Network architecture overview
A baseline fully-convolutional network uses a simple encoder-decoder framework to solve semantic segmentation tasks. It consists of only convolutional and pooling layers, without any fully connected layers, which allows it to make predictions on arbitrary-sized inputs. As an image is propagated through several pooling layers, the resolution of the feature maps is downsampled, which, due to the information loss during pooling operations, results in low-resolution, coarse segmentation maps.
As an improvement over a baseline fully-convolutional network, we used skip connections from higher-resolution feature maps, recreating the U-Net network architecture. Thanks to those connections, information about small details isn’t lost in the process. Such an architecture makes it possible to learn fine-grained details which, when combined with a ResNet core encoder, significantly speeds up the training. The architecture of a segmentation neural network with skip connections is presented below. Cross entropy loss with weight regularization is used during training.
2. Network implementation
We present easy-to-understand, minimal code fragments that show how to create and train a deep neural network for the semantic segmentation task. We will implement and train the network in PyTorch. Keep in mind that the code is meant for educational purposes rather than out-of-the-box use.
We present our semantic segmentation task in three steps:
Create the network
Train and save the deep learning model
Load the model and make predictions
2.1 Create the network
First we will create a module that performs convolution with ReLU nonlinearity. This is a basic building block in most convolutional neural networks for computer vision tasks. Convolution applies a set of filters to an image in order to extract specific features, while ReLU introduces nonlinearity between the linear layers of a neural network. Convolution with kernel size 3, stride 1 and padding 1 does not change a tensor’s spatial dimensions, but only its depth, while ReLU, as a pointwise operation, does not change any of the tensor’s dimensions.
import torch

class ConvRelu(torch.nn.Module):
    def __init__(self, in_depth, out_depth):
        super(ConvRelu, self).__init__()
        # 3x3 convolution with stride 1 and padding 1 keeps the spatial dimensions unchanged.
        self.conv = torch.nn.Conv2d(in_depth, out_depth, kernel_size=3, stride=1, padding=1)
        self.activation = torch.nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.activation(x)
        return x
We next implement a decoder block to increase the spatial size of the tensor. Generating images with neural networks usually involves up-sampling a tensor of low spatial resolution. Transposed convolution with a stride greater than one can be imagined as inserting zeros between the elements of the input tensor and sliding a convolution kernel over it, which increases the tensor’s size. Bear in mind that doing this in a straightforward manner is inefficient, but conceptually this is how transposed convolution works. Real implementations avoid useless multiplications by zero and compute it as a sparse matrix multiplication, with the weight matrix transposed from the weight matrix representation of a convolution operation with the same stride.
Be aware of other methods to increase spatial size used in generative neural networks. These include:
A linear resizing (interpolation) operation that increases the spatial size of a tensor, followed by a convolution (see the sketch after this list)
A convolution to a greater depth than desired, followed by an operation that rearranges depth elements into spatial dimensions (sub-pixel convolution)
A fractionally strided convolution, where a kernel is slid over a tensor with fractional strides and linear interpolation is used to share kernel weights over the elements of a tensor
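As a quick illustration of the first alternative – a sketch that assumes the ConvRelu module defined above is available, and which is not part of the original implementation – a resize-convolution upsampler could be written as:
class UpsampleConvRelu(torch.nn.Module):
    # Alternative to transposed convolution: resize the tensor first, then convolve.
    def __init__(self, in_depth, out_depth, scale=2):
        super(UpsampleConvRelu, self).__init__()
        self.upsample = torch.nn.Upsample(scale_factor=scale, mode='bilinear', align_corners=False)
        self.conv_relu = ConvRelu(in_depth, out_depth)

    def forward(self, x):
        return self.conv_relu(self.upsample(x))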
Here we apply an additional convolution with a ReLU nonlinearity module before the transposed convolution. While this step may not be strictly required, it can improve network performance.
class DecoderBlock(torch.nn.Module):
    def __init__(self, in_depth, middle_depth, out_depth):
        super(DecoderBlock, self).__init__()
        self.conv_relu = ConvRelu(in_depth, middle_depth)
        self.conv_transpose = torch.nn.ConvTranspose2d(middle_depth, out_depth, kernel_size=4, stride=2, padding=1)
        self.activation = torch.nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv_relu(x)
        x = self.conv_transpose(x)
        x = self.activation(x)
        return x
Now let’s focus on the main network, which is intended to solve the semantic segmentation task. We follow the encoder-decoder framework with skip connections to recreate a U-Net architecture, and perform transfer learning using a ResNet pre-trained on the ImageNet dataset. Below you can investigate the detailed network architecture, with additional information about the tensor size in every layer, to help you understand how the network propagates the input image to compute the desired output map. It’s important to note that there is no single optimal network architecture – we arrived at something that works reasonably well through many attempts.
The network propagates the input tensor through the encoder, decreasing spatial resolution and increasing depth using layers from the ResNet network. The pooling layer, as well as any convolution operation with a stride greater than one, decreases the spatial size of a tensor; pooling itself, however, does not change a tensor’s depth, which is often desired for convolution operations. In the constructor, we import a pre-trained ResNet-101 model with the torchvision module and keep only the layers that will work as a feature extractor. After processing the image through the encoder, which transforms the input image into meaningful multi-scale representations, the decoder continues the process and transforms it into the desired semantic map. To do this, we use the previously created decoder blocks. Notice that we are building a complex neural network from simpler blocks, which we either define ourselves or take from the PyTorch library. Moreover, we add skip connections – the horizontal lines in the graph – which connect the encoder and decoder layers with a depth concatenation operation. For each pixel of the input image, the network predicts N classes (including background) with a final convolution with kernel size 1, which linearly projects the depth of each spatial element to the desired output depth.
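The full network definition is not reproduced here; below is a minimal sketch of such an encoder-decoder with skip connections, reusing the ConvRelu and DecoderBlock modules defined above. The layer depths and variable names are illustrative assumptions rather than the exact production configuration, and the older torchvision API is used for loading the pre-trained weights.
import torchvision

class UNetResNet(torch.nn.Module):
    def __init__(self, num_classes):
        super(UNetResNet, self).__init__()
        encoder = torchvision.models.resnet101(pretrained=True)  # ImageNet weights (older torchvision API)
        # Encoder stages reused as a feature extractor (output depth / downscaling factor in comments).
        self.enc1 = torch.nn.Sequential(encoder.conv1, encoder.bn1, encoder.relu)  # 64, /2
        self.pool = encoder.maxpool                                                # /4
        self.enc2 = encoder.layer1   # 256,  /4
        self.enc3 = encoder.layer2   # 512,  /8
        self.enc4 = encoder.layer3   # 1024, /16
        self.enc5 = encoder.layer4   # 2048, /32
        # Decoder blocks defined earlier; each doubles the spatial size.
        self.center = DecoderBlock(2048, 512, 256)
        self.dec4 = DecoderBlock(256 + 1024, 512, 256)
        self.dec3 = DecoderBlock(256 + 512, 256, 128)
        self.dec2 = DecoderBlock(128 + 256, 128, 64)
        self.dec1 = DecoderBlock(64 + 64, 64, 32)
        self.final = torch.nn.Sequential(ConvRelu(32, 32),
                                         torch.nn.Conv2d(32, num_classes, kernel_size=1))

    def forward(self, x):
        x1 = self.enc1(x)              # /2
        x2 = self.enc2(self.pool(x1))  # /4
        x3 = self.enc3(x2)             # /8
        x4 = self.enc4(x3)             # /16
        x5 = self.enc5(x4)             # /32
        # Skip connections: concatenate decoder output with encoder features by depth.
        d = self.center(x5)                        # /16
        d = self.dec4(torch.cat([d, x4], dim=1))   # /8
        d = self.dec3(torch.cat([d, x3], dim=1))   # /4
        d = self.dec2(torch.cat([d, x2], dim=1))   # /2
        d = self.dec1(torch.cat([d, x1], dim=1))   # /1
        return self.final(d)           # per-pixel class scores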
Keep in mind that we do not yet define the loss, unlike in TensorFlow, where the entire computational graph needs to be defined up front. In PyTorch, we only define a class that provides the forward function. The operations used in the forward pass are remembered, and the backward pass can be run whenever it’s needed.
As we train the network, we will be loading data in batches using PyTorch data generators, which additionally shuffle the training set, normalize input tensors and apply color (random changes in brightness, contrast and saturation) and spatial (random flip, flop and rotation) augmentation.
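A minimal sketch of such a data pipeline is shown below. The dataset class, the train_images and train_masks variables, and the augmentation parameters are assumptions for illustration; note that applying ColorJitter directly to tensors requires a reasonably recent torchvision version.
import random
import torchvision.transforms as T
from torch.utils.data import Dataset, DataLoader

class BuildingsDataset(Dataset):
    # Hypothetical dataset: `images` are (3, H, W) float tensors in [0, 1], `masks` are (H, W) class-index tensors.
    def __init__(self, images, masks):
        self.images, self.masks = images, masks
        self.color_jitter = T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2)
        self.normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img, mask = self.images[idx], self.masks[idx]
        if random.random() < 0.5:   # spatial augmentation: random horizontal flip applied to image and mask alike
            img, mask = torch.flip(img, dims=[2]), torch.flip(mask, dims=[1])
        img = self.color_jitter(img)   # color augmentation applied to the image only
        img = self.normalize(img)
        return img, mask.long()

train_dataloader = DataLoader(BuildingsDataset(train_images, train_masks), batch_size=8, shuffle=True)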
While creating the main network, we only need to define the output depth of the last convolutional layer. There are two output classes involved in the semantic segmentation of buildings, so for each pixel the network predicts the probability of it belonging to a building or to the background. Notice that the necessary weights are initialized here and kept by default in CPU memory. After the output depth has been defined, we transfer all the weights to the GPU, then set the network to train mode, which results in batch normalization computing the mean and variance on each batch and updating the statistics with a moving average. Finally, we define a cross entropy loss with softmax, which is included for further use during the training. Notice that the loss function doesn’t have anything in common with the network graph. We won’t freeze any of the pre-trained ResNet convolutional layers and will train all network weights using the Adam optimizer.
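Concretely, the setup described above might look like the following sketch (the learning rate and variable names are assumptions):
unet_resnet = UNetResNet(num_classes=2)   # building vs. background
unet_resnet = unet_resnet.cuda()          # move the weights from CPU to GPU memory
unet_resnet.train()                       # batch norm uses per-batch statistics and updates the running averages

cross_entropy_loss = torch.nn.CrossEntropyLoss()                 # softmax is applied inside the loss
optimizer = torch.optim.Adam(unet_resnet.parameters(), lr=1e-4)  # all weights are trained, none frozen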
Now we are ready to start the training. We will train for a number of epochs. During each epoch we exhaust the data loader, which provides shuffled batches of data from the training set. We first transfer the batch of images and masks to GPU memory, then propagate every loaded batch of data through the network to get an output probability mask, calculate the loss and modify the network weights during the backward pass. Notice that only here, during execution, is there a connection between the network architecture and the loss function. Unlike TensorFlow, which requires the entire computational graph up front, PyTorch offers dynamic graph creation during execution.
import numpy as np

for epoch_idx in range(2):
    loss_batches = []
    for batch_idx, data in enumerate(train_dataloader):
        imgs, masks = data
        imgs = torch.autograd.Variable(imgs).cuda()    # move the batch to GPU memory
        masks = torch.autograd.Variable(masks).cuda()
        y = unet_resnet(imgs)                          # forward pass: output probability mask
        loss = cross_entropy_loss(y, masks)
        optimizer.zero_grad()
        loss.backward()                                # backward pass
        optimizer.step()                               # update the network weights
        loss_batches.append(loss.data.cpu().numpy())
    print('epoch: ' + str(epoch_idx) + ' training loss: ' + str(np.sum(loss_batches)))
After the training, it’s time to save the model. We move the weights back to CPU memory, save them, and then move the model back to GPU memory for further predictions.
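In code, this could be as simple as the following (the file name is an assumption):
unet_resnet.cpu()                                          # move the weights back to CPU memory
torch.save(unet_resnet.state_dict(), 'unet_resnet.pth')    # save the model weights
unet_resnet.cuda()                                         # return to GPU memory for further predictions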
To make predictions, we first create a network and load the weights from the saved checkpoint. We then set the model to eval mode, so batch normalization now uses the accumulated running mean and variance instead of statistics computed over the batch. We propagate the image through the network without keeping a computational graph, because no backward pass is needed during predictions. To make a prediction, we load and preprocess the test image, move it to GPU memory, predict the output probability mask using softmax (which during training was hidden inside the cross entropy loss function), move the predicted mask back to CPU memory and save it.
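A minimal sketch of the prediction step follows; load_and_preprocess is a hypothetical helper returning a normalized (1, 3, H, W) tensor, not a function from the original code.
unet_resnet = UNetResNet(num_classes=2)
unet_resnet.load_state_dict(torch.load('unet_resnet.pth'))
unet_resnet = unet_resnet.cuda()
unet_resnet.eval()                              # batch norm switches to the stored running statistics

with torch.no_grad():                           # no computational graph is kept during prediction
    img = load_and_preprocess('test_image.png') # hypothetical helper returning a (1, 3, H, W) tensor
    img = img.cuda()
    probs = torch.nn.functional.softmax(unet_resnet(img), dim=1)  # softmax applied explicitly here
    building_probability = probs[0, 1].cpu().numpy()              # probability map for the building class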
The system propagates the input image through the network, corrects the output mask and performs building segmentation. The processing consists of the following stages (described from left to right, top to bottom):
Input satellite image.
Raw output from network after softmax layer with probability scores.
Probability score map thresholded, with removal of small objects and filling of small holes (see the sketch after this list).
Predicted mask overlaid on top of input image.
Segmentation results.
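The thresholding and clean-up step can be sketched as follows; the threshold and minimum object size are assumed values, and the scikit-image functions are one possible way to implement the morphological clean-up rather than the original implementation.
import numpy as np
from skimage import morphology

def postprocess(building_probability, threshold=0.5, min_size=100):
    # Threshold the probability map, then clean it up morphologically.
    mask = building_probability > threshold
    mask = morphology.remove_small_objects(mask, min_size=min_size)       # drop isolated false positives
    mask = morphology.remove_small_holes(mask, area_threshold=min_size)   # fill small holes inside buildings
    return mask.astype(np.uint8)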
Use cases
The solution is easily extendable to situations with more labels, such as roads, trees or rivers. In such scenarios, there are more classes in the network’s output. Raw output data can be used to speed up map-making, but after simple processing, it can also provide a user with various types of information about an area, such as average building size, occupied percentage of land, street width, number of trees etc. These features can then be used as an input for other ML models, including ones for projecting land value, emergency response or research.
June brought record-breaking temperatures, perfectly highlighting the global challenge of climate change. Is that AI-related news? Check and see in the latest AI Monthly Digest.
A common misconception about machine learning projects is that they are by definition big. However, any number of AI-powered micro-tweaks and improvements are applied in everyday work. A good example of both micro and macro tweaks that can fix a major problem can be found in the paper described below.
AI tackling climate change
The world witnessed an extraordinarily hot June, with average temperatures 2 degrees Celsius above normal in Europe. According to the World Meteorological Organization, the heatwave is consistent with predictions based on greenhouse gas concentrations and human-induced climate change.
Tackling this challenge will not be easy: according to World Bank data, fossil fuel energy consumption still accounts for 79% of the total. Furthermore, greenhouse gases, particularly methane, are emitted by cattle, with livestock responsible for 14.5% of total human-induced greenhouse emissions.
The most prominent figures in AI today, including DeepMind CEO Demis Hassabis, Turing award winner Yoshua Bengio, and Google Brain co-founder Andrew Ng, have authored a comprehensive paper on ways that AI can tackle the changing climate.
Their call for collaboration is meant to inspire practitioners, engineers and investors to deliver short- and long-term solutions for measures within our reach. Those include producing low-carbon electricity through better forecasting, scheduling and control for variable sources of energy; mitigating the damage produced by high-carbon economies through, for example, better predictive maintenance; and helping to minimize energy use in transportation, smart buildings and cities. The applications range from designing grid-wide control systems to optimizing scheduling with more accurate demand forecasting.
Why does it matter?
Climate change is one of the greatest challenges mankind faces today, with truly cataclysmic scenarios approaching. Further temperature increases may lead to a variety of disasters, from the flooding of coastal regions due to melting ice caps to agricultural crises and conflicts over access to water.
Green energy promises solutions, yet these are not without their challenges, many of which could be solved with machine learning, deep learning or reinforcement learning. Responsibility is among deepsense.ai’s most important AI trends, and being responsible for the planet would be an excellent example of just why we chose to focus on that trend.
We will provide more in-depth content on climate change and AI-powered ways of tackling it. So stay tuned!
Giants racing to produce the best image recognition
If machine learning is today’s equivalent of the steam engine revolution, data and hardware are the coal and engine that power the machines. Facebook and Google are like the coal mines of yesteryear, having access to large amounts of fuel and power to build new models and experiment.
It should come as no surprise that breakthroughs are usually powered by the tech giants. Google’s state of the art in image recognition, EfficientNet, has been a recent giant step forward. The model was delivered by an automated search procedure that uniformly scales each dimension of the network – depth, width and resolution – to find the best combination.
The result is state-of-the-art image recognition, at least when it comes to combining efficiency and accuracy – though not when it comes to accuracy alone. In that sense, EfficientNet lives up to its name.
Not even a month later, Facebook delivered a model that outperformed Google’s. The key lay in scaling up the enormous dataset it was trained on. The social media giant has access to Instagram’s database, which holds billions of user-tagged images – a dataset ready to be chewed over by a hungry deep learning model.
The neural network was released to the public using the recently launched PyTorch Hub platform for sharing cutting-edge models.
Why does it matter?
Both advances show how important machine learning is for the tech giants and how much effort they invest in pushing their research forward. Every advancement in image recognition brings new breakthroughs closer. For example, models are becoming more accurate in detecting diabetic retinopathy using images of the eye. Every further development delivers new ways to solve problems that would be unsolvable without ML – visual quality control in manufacturing is among the best examples.
XLNet outperforms BERT
As we noted in a past AI Monthly Digest, Google has released Bidirectional Encoder Representations from Transformers (BERT). BERT was, until recently, the state of the art when it comes to Natural Language Processing benchmarks. The newly announced XLNet is an autoregressive pretraining method (as opposed to the autoencoder-like BERT) which learns a language model by predicting the next word in a sequence using permutations of all the surrounding words. An intuitive explanation can be found (here).
The XLNet model proved more effective than BERT, beating it on all 20 benchmark tasks.
Why does it matter?
Understanding natural language has long been considered a benchmark for intelligence, with Alan Turing’s test being among the best-known examples. Every push forward delivers new possibilities for building new products and solving problems, be they business ones or something more uncommon, like the example below.
AI-powered archeology? Bring it on!
Deep learning-based models are getting even better at understanding natural language. But what about language that is natural, but has never been deciphered due to lack of knowledge or a frustratingly small amount of extant text?
Recent research from MIT and Google shows that a machine learning approach can deliver major improvements in deciphering ancient texts. Modern natural language processing techniques assume that all of the words in a given text are related to each other. The machine itself doesn’t “understand” the text in a human way, but rather forms its own assumptions based on the relations and connotations of each word in a sentence.
Disc of Phaistos, one of the most famous mysteries of archaeology
In this approach, the translation process is not built on understanding the world, but rather finding similarly connotated words that transfer the same message. This is entirely different than humans’ approach to language.
By making the algorithm less data-hungry, the researchers deliver a model that translates texts from rare and long-lost languages. The approach is described in this paper.
Why does it matter?
While there are countless examples of machine learning in business, there are also new horizons to discover in the humanities. Deciphering the secrets of the past is every bit as exciting as building defenses against the challenges of the future.
A more sophisticated approach to unknown languages – and the possibility of brute-forcing them – provides a way to uncover more language-related secrets.
Since the days of the coal-powered industrial revolution, manufacturing has become machine-dependent. As the fourth industrial revolution approaches, factories can harness the power of machine learning to reduce maintenance costs.
PwC distinguishes four levels of predictive maintenance maturity:
1. Visual inspection, where the output is entirely based on the inspector’s knowledge and intuition
2. Instrument inspection, where conclusions are a combination of the specialist’s experience and the instrument’s read-outs
3. Real-time condition monitoring, based on constant monitoring with IoT sensors and alerts triggered by predefined conditions
4. AI-based predictive analytics, where the analysis is performed by self-learning algorithms that continuously tweak themselves to the changing conditions
As the study indicates, a good number of the companies surveyed by PwC (36%) are now on level 2 while more than a quarter (27%) are on level 1. Only 22% had reached level 3 and 11% level 4, which is basically level 3 on machine learning steroids. The PwC report states that only 3% use no predictive maintenance at all.
Staying on track
According to the PwC data, the rail sector is the most advanced sector of those surveyed with 42% of companies at level 4, compared to 11% overall.
One of the most prominent examples is Infrabel, the state-owned Belgian company, which owns, builds, upgrades and operates a railway network which it makes available to privately-owned transportation companies. The company spends more than a billion euro annually to maintain and develop its infrastructure, which contains over 3 600 kilometers of railway and some 12 000 civil infrastructure works like crossings, bridges, and tunnels. The network is used by 4 200 trains every day, transporting both cargo and passengers.
The company faces both technical and structural challenges. Among them is its aging technical staff, which is shrinking.
At the same time, the density of railroad traffic is increasing – the number of daily passengers has increased by 50% since 2000, reaching 800 000. What’s more, the growing popularity of high-speed trains is exerting ever greater tension on the rails and other infrastructure.
Both the amount and the nature of the data collected render it impossible for a human to analyze, but a machine learning-powered AI solution handles it with ease. The devices are able to gather data from ultrasonic and vibration sensors and analyze it in real time. Contrary to experience-based analytics, using the devices requires little-to-no training and can be done on the go.
Endless possibilities
With the power of machine learning enlisted, handling the tremendous amounts of data generated by the sensors in modern factories becomes a much easier task. It allows the company to detect failures before they paralyze operations, thus saving time and money. What’s more, the data that is gathered can be used to further optimize the company’s performance, including by searching for bottlenecks and managing workflows.
This edition is all about AI morality-related themes, with a slight tinge of Talking Heads and Modern Talking.
Earlier this year, deepsense.ai highlighted AI morality and transparency as one of 2019’s dominant AI trends. May bore out our thesis, especially as it relates to potential misuse and malicious intent. At the same time, though, AI provides unique chances to support entertainment and education, as well as deliver new business cases.
A bigger version of GPT-2 released to the public
OpenAI recently showed that its GPT-2 model has set a new gold standard for natural language processing. Following the acclaimed success of the model, OpenAI opted not to make it public due to the risk of malicious usage, particularly to produce spam and fake news at no cost.
This sparked an uproar. The industry good practice is to release AI research work as open-source software, so other researchers can push the boundaries further without having to repeat all the work done earlier from scratch. In other words – OpenAI threw up a major hurdle to NLP-model development by keeping GPT-2 under wraps.
To support the scientific side of the equation while reducing the malicious threat, OpenAI has released some smaller-scale models to the public. The model it recently released operates on 345M parameters, while the best original model consists of 1.5B parameters. Every parameter can be seen as a virtual neuron inside a neural network, so OpenAI is basically shrinking the brain it designed.
The original network was released to OpenAI partners currently working on malice-proofing the system. The first independent applications of the downscaled network are already available at talktotransformer.com and onionbot headline generator.
Why does it matter?
OpenAI is currently facing a difficult choice between supporting the global development of AI and the fear of losing control over dangerous technology. In a world facing a potential avalanche of fake news and social media being used to perpetuate propaganda, building a system that writes coherent and convincing texts is undoubtedly dangerous.
This case allows one to see all the AI-related issues in a nutshell, including the technology’s amazing potential and the real threat of misuse or malicious intent. It may well serve as a precedent for future cases.
Talking heads unleashed
A group of scientists working for Samsung’s AI Center in Moscow and Skolkovo Institute of Science and Technology designed a model that can produce a convincing video of a talking head from a single image, such as a passport photo or even a painting.
The model renders both the background and the head’s behavior with consistency. Most impressively, it can build a convincing video of a talking head from even a single source frame.
The solution searches for a similar face that has already been analyzed and extracts facial features, including the nose, chin, mouth and eyes. The movement of those features is then applied to the image, as shown in the video.
The results are undoubtedly impressive.
Why does it matter?
Yet another AI ethics-related issue, the talking-head technology poses the threat of deepfakes, images that show a person making statements that he or she would never make. This raises obvious questions about the malicious ways such technology could be used.
On the other hand, when deepfakes are used for special effects in popular movies, no one seems to complain and critics even weigh in with their acclaim. Some of the better-known examples come from the Star Wars franchise, particularly Rogue One, which features Leia Organa wearing the face of a young Carrie Fisher.
AI has also proved itself useful in promoting art. By leveraging this technology it is possible to deliver the talking head of Girl with a Pearl Earring or the Mona Lisa telling visitors from screens about a painting’s historical context – a great way to put more fun in art lessons for kids. Or just to have some fun seeing what a Stallone-faced Terminator would look like.
Again, AI can be used for both good and evil ends. The ethics are up to the wielder of this double-edged sword.
Modern Talking – recreating the voice of Joe Rogan
Another example of deepfake-related technology is using AI to convincingly recreate Joe Rogan’s voice. Text-to-speech technology is not a new kid on the block, yet it has been easy to spot due to its robotic, inhumanly calm style of speaking. Listening to automated text-to-speech was usually boring at best, with the absence of emotion or inflection producing unintentionally comic effects.
Dessa engineers have delivered a model that not only transforms text to speech, but also recreates Joe Rogan’s style of speaking. Joe is a former MMA commentator who went on to become arguably the most popular podcaster in the world. Speaking with great emotion, heavily accenting and delivering power with every word, Rogan is hard to mistake.
Or is he? The team released a quiz that challenges the listener to distinguish if a given sample comes from a real podcast or was AI-generated. The details can be found on Dessa’s blog.
Why does it matter?
Hearing a convincing imitation of a public personality’s voice is nearly as unsettling as watching a talking head talk. But the technology can be used for entertainment and educational purposes – for example, delivering a new Frank Sinatra single or presenting a comprehensive and detailed Winston Churchill speech on the reasons behind World War II.
Again, the ethics are in the user’s hands, not in the tool. Despite that, and as we saw with OpenAI’s GPT-2 Natural Language Processing model, researchers have decided NOT to let the model go public.
Machine learning-powered translations increase trade by 10.9%
Researchers at the Olin Business School at Washington University in St. Louis have found a direct connection between machine learning-powered translations and business efficiency. The study, conducted on eBay, shows that a moderate improvement in the quality of language translation increased trade between countries on the platform by 10.9%.
The study examined trade between English speakers from the United States and their trade relations with countries speaking other languages in Europe, America and Asia. More on the research can be found on the Washington University in St. Louis website.
Why does it matter?
While there is no doubt that AI provides vital support for business, the evidence, while voluminous, remains largely anecdotal (sometimes called anec-data) with little quantitative research to back up the claim. Until the Olin study, which does provide hard and reliable data. Is justified true belief knowledge? That’s an entirely different question…
A practical approach to AI in Finland
AI Monthly Digest #5 presented a bit about a Finnish way of spreading the word about AI. Long story short: contrary to many approaches of building AI strategy in a top-down model, Finns have apparently decided to build AI-awareness as a grassroots movement.
To support the strategy, the University of Helsinki has released a digital AI course on the foundations and basic principles of AI. It is available for free to everyone interested.
Why does it matter?
AI is gaining attention and the reactions are usually polarized – from fear of job losses and machine rebellion to arcadian visions of an automated future with no hunger or pain. The truth is no doubt far from either of those poles. Machine learning, deep learning and reinforcement learning are all built on certain technological foundations that are relatively easy to understand, including their strengths and limitations. The course provides good basic knowledge on these issues, which can do nothing but help our modern world.
Everything you need to know about demand forecasting – from the purpose and techniques to the goals and pitfalls to avoid.
Essential since the dawn of commerce and business, demand forecasting enters a new era of big-data rocket fuel.
What is demand forecasting?
The term couldn’t be clearer: demand forecasting forecasts demand. The process of predicting the future involves processing historical data to estimate the demand for a product. An accurate forecast can bring significant improvements to supply chain management, profit margins, cash flow and risk assessment.
What is the purpose of demand forecasting?
Demand forecasting is done to optimize processes, reduce costs and avoid losses caused by freezing up cash in stock or being unable to process orders due to being out of stock. In an ideal world, the company would be able to satisfy demand without overstocking.
Demand forecasting techniques
Demand forecasting is an essential component of every form of commerce, be it retail, wholesale, online, offline or multichannel. It has been present since the very dawn of civilization when intuition and experience were used to forecast demand.
More recent techniques combine intuition with historical data. Modern merchants can dig into their data in search of trends and patterns. At the pinnacle of these techniques are demand forecasting machine learning models, including gradient boosting and neural networks, which are currently the most popular ones and outperform classic statistics-based methods.
The basis of more recent demand forecasting techniques is historical data from transactions. These are data that sellers collect and store for fiscal and legal reasons. Because they are also searchable, these data are the easiest to use.
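As a minimal illustration of the machine learning approach – the synthetic data, feature set and model choice below are assumptions for demonstration, not a production pipeline – a gradient boosting model can learn demand from simple calendar and promotion features derived from historical transactions:
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic two-year transaction history: day of week, week of year and whether a promotion ran.
n_days = 730
day_of_week = np.arange(n_days) % 7
week_of_year = (np.arange(n_days) // 7) % 52
promotion = rng.integers(0, 2, n_days)
demand = (50 + 10 * (day_of_week >= 5) + 5 * np.sin(2 * np.pi * week_of_year / 52)
          + 20 * promotion + rng.normal(0, 3, n_days))

X = np.column_stack([day_of_week, week_of_year, promotion])
model = GradientBoostingRegressor().fit(X[:-30], demand[:-30])   # hold out the last 30 days

forecast = model.predict(X[-30:])                                # forecast demand for the held-out month
print('mean absolute error:', np.mean(np.abs(forecast - demand[-30:])))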
How to choose the right demand forecasting method – indicators
As always, selecting the right technique depends on various factors, including:
The scale of operations – the larger the scale, the more challenging processing the data becomes.
The organization’s readiness – even large companies can operate (efficiency aside) on fragmented and messy databases, so the technological and organizational readiness to apply more sophisticated demand forecasting techniques is another challenge.
The product – it is easier to forecast demand for an existing product than for a newly introduced one. When considering the latter, it is crucial to form a set of assumptions to work from. Having as much information about the product as possible is the first step, as it allows the company to spot similarities between particular goods and search for correlations in buying patterns. Spotting an accessory that is frequently bought along with the main product is one example.
How AI-based demand forecasting can help a business
Demand forecasting and following sales forecasting is crucial to shaping a company’s logistics policy and preparing it for the immediate future. Among the main advantages of demand forecasting are:
Loss reduction – any demand that was not fulfilled should be considered a loss. Moreover, the company freezes its cash in stock, thus reducing liquidity.
Supply chain optimization – behind every shop there is an elaborate logistics chain that generates costs and needs to be managed. The bigger the organization, the more sophisticated and complicated its inventory management must be. When demand is forecast precisely, managing and estimating costs is easier.
Increased customer satisfaction – there is no bigger disappointment for consumers than going to the store to buy something only to return empty-handed. For a business, the worst-case scenario is for said consumers to swing over to the competition to make their purchase there. Companies reduce the risk of running out of stock–and losing customers–by making more accurate predictions.
Smarter workforce management – hiring temporary staff to support a demand peak is a smart way for a business to ensure it is delivering a proper level of service.
Better marketing and sales management – depending on the upcoming demand for particular goods, sales and marketing teams can shift their efforts to support cross- and upselling of complementary products.
Supporting expert knowledge – models can be designed to build predictions for every single product, regardless of how many there are. In small businesses, humans handle all predictions, but when the scale of the business and the number of goods rise, this becomes impossible. Machine learning models extend expert knowledge and are proficient at big data processing.
How to start demand forecasting – a short guide
Building a demand forecasting tool or solution requires, first and foremost, data to be gathered.
While the data will eventually need to be organized, simply procuring it is a good first step. It is easier to structure and organize data and make them actionable than to collect enough data fast. The situation is much easier when the company employs an ERP or CRM system, or some other form of automation, in their daily work. Such systems can significantly ease the data gathering process and automate the structuring.
The next step is building testing scenarios that allow the company to test various approaches and their impact on business efficiency. The first solution is usually a simple one, and is a good benchmark for solutions to come. Every next iteration should be tested to see if it is performing better than the previous one.
Historical data is usually all one needs to launch a demand forecasting project. Data about the future is obviously much scarcer, but sometimes it is available, for example:
Short-term weather forecasts – the information about upcoming shifts in weather can be crucial in many businesses, including HoReCa and retail. It is quite intuitive to cross-sell sunglasses or ice cream on sunny days.
The calendar – Black Friday is a day like no other. The same goes for the upcoming holiday season or other events that are tied to a given date.
Sources of data that originate from outside the company make predictions even more accurate and provide better support for making business decisions.
Common pitfalls to avoid when building a demand forecasting solution
There are numerous pitfalls to avoid when building a demand forecasting solution. The most common of them include:
Disconnecting the data from marketing and ad history – a successful promotion results in a significant change in the data, so having information about why it was a success makes predictions more accurate. Without it, a machine learning model could misattribute the changes and make false predictions based on wrong assumptions.
New products with no history – when new products are introduced, demand must still be estimated, but without the help of historical data. The good news here is that great strides have been made in this area, and techniques such as product DNA can help a company uncover similar products in its past or current portfolio. Having data on similar products can boost the accuracy of predictions for new products.
The inability to predict the weather – weather drives demand in numerous contexts and product areas and can sometimes be even more important than the price of a product itself! (yes, classical economists would be very upset). The good news is that even if you are unable to predict the weather, you can still use it in your model to explain historical variations in demand.
Lacking information about changes – in an effort to support both short- and long-term goals, companies constantly change their offering and websites. When information about the changes is not annotated in the data, the model encounters sudden dips and shifts in demand with apparently no reason. In reality, the cause is usually a minor issue like a change in inventory or the removal of a section from the website.
Inconsistent portfolio information – predictions can be done only if the data set is consistent. If any of the goods in a portfolio have undergone a name or ID change, it must be noted in order not to confuse the system or miss out on a valuable insight.
Overfitting the model – a vicious problem in data science. A model is so good at working on the training dataset that it becomes inflexible and produces worse predictions when new data is delivered. Avoiding overfitting is down to the data scientists.
Inflexible logistics chain – the more flexible the logistics process is, the better and more accurate the predictions will be. Even the best demand forecasting model is useless when the company’s logistics is a fixed process that allows no space for changes.
AI in demand forecasting: final thoughts
Demand and sales forecasting is a crucial part of any business. Traditionally it has been done by experts, based on know-how honed through experience. With the power of machine learning it is now possible to combine the astonishing scale of big data with the precision and cunning of a machine-learning model. While the business community must remain aware of the multiple pitfalls it will face when employing machine learning to predict demand, there is no doubt that it will endow demand forecasting with awesome power and flexibility.
Ready to harness the full potential of AI for your business? Opt for our AI consulting services, and let our experts guide you.
The April edition of AI Monthly Digest looks at how AI is used in entertainment, for both research and commercial purposes.
After its recent shift from non-profit to for-profit, OpenAI continues to build a significant presence in the world of AI research. It is involved in two of five stories chosen as April’s most significant.
AI Music – spot the discord…
While machine learning algorithms are getting increasingly better at delivering convincing text or gaining superior accuracy in image recognition, machines struggle to understand the complicated patterns behind music. In its most basic form, music is built upon repetitive motifs that return across sections of various lengths – it may be a recurrent part of one song or the leading theme of an entire movie, opera or computer game.
Machine learning-driven composing is comparable to natural language processing – the short parts are done well but the computer gets lost when it comes to keeping the integrity of the longer ones. April brought us two interesting stories regarding different approaches to ML-driven composition.
OpenAI developed MuseNet, a neural network that produces music in a few different styles. Machine learning algorithms were used to analyze the style of various classical composers, including Chopin, Bach, Beethoven and Rachmaninoff. The model was further fed rock songs by Queen, Green Day and Nine Inch Nails and pop music by Madonna, Adele and Ricky Martin, to name a few. The model learned to mimic the style of a particular artist and infuse it with twists. If the user wants to spice up the Moonlight Sonata with a drum, the road is open.
OpenAI has rolled out an early version of the model and it performs better when the user is trying to produce a consistent piece of music, rather than pair up a disparate coupling of Chopin and Nine Inch Nails-style synthesizers.
OpenAI claims that music is a great tool with which to evaluate a model’s ability to maintain long-term consistency, mainly thanks to how easy it is to spot discord.
…or embrace it
While OpenAI embraces harmony in music, Dadabots has taken the opposite tack. Developed by CJ Carr and Zack Zukowski, the Dadabots model imitates rock, and particularly metal, bands. The team has put their model on YouTube to deliver technical death metal as an endless live stream – the Relentless Doppelganger.
While it is increasingly common to find AI-generated music on Bandcamp, putting a 24/7 death metal stream on YouTube is undoubtedly something new.
Fans of the AI-composed death metal have given the music rave reviews. As The Verge notes, the creation is “Perfectly imperfect” thanks to its blending of various death metal styles, transforming vocals into a choir and delivering sudden style-switching.
It appears that bare-metal has ushered in a new era in technical death metal.
Why does it matter?
Researchers behind the Relentless Doppelganger remark that music-making AI has mainly been developed on classical music, which relies heavily on harmony, while death metal, among other genres, embraces the power of chaos. It stands to reason, then, that when the generated music is not perfect at delivering harmony, the effect is actually more consistent with the genre’s overall sound. What’s more, Dadabots’ model delivers not only instrumentals, but also vocals, which would be unthinkable with classical music. Of course, the special style of metal singing called growl makes most of the lyrics incomprehensible, so little to no sense is actually required here.
From a scientific point of view, OpenAI delivers the more significant work. But AI is working its way into every area of human activity, including politics, social problems, policy and art. From an artistic point of view, AI-produced technical death metal is interesting in its own right.
It appears that when it comes to music, AI likes it brutal.
AI in gaming goes mainstream
Game development has a long and uneasy tradition of delivering computer players to allow users to play in single-player mode. There are many forms of non-ML-based AI present in video games. They are usually based on a set of triggers that initiate a particular action the computer player takes. What’s more, modern, story-driven games rely heavily on scripted events like ambushes or sudden plot twists.
This type of AI delivers an enjoyable level of challenge but lacks the versatility and viciousness of human players coming up with surprising strategies to deal with. Also, the goal of AI in single-player mode is not to dominate the human player in every way possible.
The real challenge in all of this comes from developing bots, or computer-controlled players, that deliver a multiplayer-grade experience in single-player mode. Usually, computer players differ significantly from their human counterparts, and any transfer from single-player to multiplayer ends in shock and an instant knock-out at the hands of experienced players.
Enter A.N.N.A. (Artificial Neural Network Agent), a neural network-based AI for computer riders in Milestone’s MotoGP 19 that is not scripted directly but created through reinforcement learning. This means developers describe an agent’s desired behaviour and then train a neural network to achieve it. Agents created in this way show more skilled and realistic behaviors, which are high on the wish list of MotoGP gamers.
Why does it matter?
Applying ML-based artificial intelligence in a mainstream game is the first step in delivering a more realistic and immersive game experience. Making computer players more human in their playing style makes them less exploitable and more flexible.
The game itself is an interesting example. It is common in RL-related research to apply this paradigm to strategy games, be it chess, Go or StarCraft II. In this case, the neural network controls a digital motorcycle. Racing provides a closed game environment with a limited number of variables to control. Thus, racing in a virtual world is a perfect environment in which to deploy ML-based solutions.
In the end, it isn’t the technology but rather gamers’ experience that is key. Will reinforcement learning bring a new paradigm of embedding AI in games? We’ll see once gamers react.
Bittersweet lessons from OpenAI Five
Defense of The Ancients 2 (DOTA 2) is a highly popular multiplayer online battle arena game in which two teams, each consisting of five players, fight for control over a map. The game blends tactical, strategic and action elements and is one of the most popular esports titles.
OpenAI Five is the neural network that plays DOTA 2, developed by OpenAI.
The AI beat the world champions of Team OG during the OpenAI Five Finals on April 13th. It was the first time an AI-controlled team had beaten a professional team in a live-streamed match.
Why does it matter?
Although the project seems similar to Deepmind’s AlphaStar, there are several significant differences:
The model was trained continuously for almost a year instead of starting from zero knowledge for each new experiment – the common way of developing machine learning models is to design the entire training procedure upfront, launch it and observe the result. Every time a novel idea is proposed, the learning algorithm is modified accordingly and a new experiment is launched from scratch to get a fair comparison between concepts. In this case, the researchers decided not to restart training, but to integrate ideas and changes into the already trained model, sometimes performing elaborate surgery on their artificial neural network. Moreover, the game received a number of updates during the training process, so at some points the model was forced not to learn a new fact, but to update its existing knowledge. And it managed to do so. The approach enabled the team to massively reduce the computing power compared with what had been invested in training previous iterations of the model.
The model effectively cooperated with human players – it was made publicly available as a player, so users could play both with and against it. Despite being trained without human interaction, the model was effective both as an ally and as a foe, clearly showing that AI is a potent tool to support humans in performing their tasks – even when that task is slaying an enemy champion.
The research done was somewhat of a failure – the model performs well, even though building a strong player was not the actual goal. The project was launched to break a previously unbroken game by testing and looking for new approaches. In the end, the best results were achieved by providing more computing power and scaling up the neural network. Despite delivering impressive results for OpenAI, the project did not lead to the expected breakthroughs and the company has hinted that it could be discontinued in its present format. A bitter lesson indeed.
Blurred computer vision
Computer vision techniques deliver astonishing results. They have sped up the diagnosing of diabetic retinopathy, built maps from satellite images and recognized particular whales from aerial photography. Well-trained models often outperform human experts. Given that they don’t get tired and never lose their focus, why shouldn’t they?
But there remains room for improvement in machine vision, as researchers from KU Leuven in Belgium report. They produced a printed image that fooled an algorithm, rendering the person holding the card virtually invisible to a machine learning-based detection system.
Why does it matter?
As readers of William Gibson’s novel Zero History will attest, images devised to fool AI are nothing new as an idea. Delivering a printable image that confounds an algorithm highlights the serious interest malicious actors have in interfering with AI.
Examples may include images produced to fool AI-powered medical diagnostic devices for fraudulent reasons or sabotaging road infrastructure to render it useless for autonomous vehicles.
AI should not be considered a black box and algorithms are not unbreakable. As always, reminders of that are welcome, especially as responsibility and transparency are among the most significant AI trends for 2019.
In our previous post, we gave you an overview of the differences between Keras and PyTorch, aiming to help you pick the framework that’s better suited to your needs. Now, it’s time for a trial by combat. We’re going to pit Keras and PyTorch against each other, showing their strengths and weaknesses in action. We present a real problem, a matter of life-and-death: distinguishing Aliens from Predators!
We perform image classification, one of the computer vision tasks deep learning shines at. As training from scratch is infeasible in most cases (it is very data hungry), we perform transfer learning using ResNet-50 pre-trained on ImageNet. We get as practical as possible, to show both the conceptual differences and the conventions.
Wait, what’s transfer learning? And why ResNet-50?
In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest.– Andrej Karpathy (Transfer Learning – CS231n Convolutional Neural Networks for Visual Recognition)
Transfer learning is a process of making small adjustments to a network trained on one task so that it performs another, similar task. In our case we work with the ResNet-50 model trained to classify images from the ImageNet dataset. Training on ImageNet is enough for the network to learn a lot of textures and patterns that are useful in other visual tasks, even ones as alien as this Alien vs. Predator case. That way, we use much less computing power to achieve a much better result.
In our case we do it the simplest way:
keep the pre-trained convolutional layers (so-called feature extractor), with their weights frozen,
remove the original dense layers, and replace them with brand-new dense layers we will use for training.
So, which network should be chosen as the feature extractor?
ResNet-50 is a popular model for ImageNet image classification (AlexNet, VGG, GoogLeNet, Inception and Xception are other popular models). It is a 50-layer deep neural network architecture based on residual connections, which are connections that add modifications on top of each layer’s input rather than completely replacing the signal.
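To illustrate the residual idea – a simplified sketch only, not the actual ResNet-50 block, which also uses batch normalization and 1×1 bottleneck convolutions – a block learns a correction that is added back to its input instead of replacing it:

from keras import layers

def simple_residual_block(x, filters):
    # Learn a modification F(x) of the input...
    y = layers.Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, (3, 3), padding='same')(y)
    # ...and add it back: output = relu(x + F(x)).
    # (x must already have `filters` channels for the addition to work.)
    return layers.Activation('relu')(layers.add([x, y]))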
We supplement this blog post with Python code in Jupyter Notebooks (Keras-ResNet50.ipynb, PyTorch-ResNet50.ipynb). This environment is more convenient for prototyping than bare scripts, as we can execute it cell by cell and peek into the output.
All right, let’s go!
0. Prepare the dataset
We created a dataset by performing a Google Search with the words “alien” and “predator”. We saved JPG thumbnails (around 250×250 pixels) and manually filtered the results. Here are some examples:
We split our data into two parts:
Training data (347 samples per class) – used for training the network.
Validation data (100 samples per class) – not used during the training, but needed in order to check the performance of the model on previously unseen data.
Keras requires the datasets to be organized in folders in the following way:
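A sketch of the expected layout (the file names are illustrative; the class folders match the alien and predator classes used throughout, and the validation paths reappear in the prediction step later on):

data/
    train/
        alien/
            1.jpg
            2.jpg
            ...
        predator/
            1.jpg
            ...
    validation/
        alien/
            11.jpg
            ...
        predator/
            33.jpg
            ...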
If you want to see the process of organizing data into directories, check out the data_prep.ipynb file. You can download the dataset from Kaggle.
1. Import dependencies
First, the technicalities. We assume that you have Python 3.5+, Keras 2.2.2 (with TensorFlow 1.10.1 backend) and PyTorch 0.4.1. Check out the requirements.txt file in the repo.
So, first, we need to import the required modules. We separate the code in Keras, PyTorch and common (one required in both).
COMMON
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
%matplotlib inline
KERAS
import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.applications import ResNet50
from keras.applications.resnet50 import preprocess_input
from keras import Model, layers
from keras.models import load_model, model_from_json
PYTORCH
import torch
from torchvision import datasets, models, transforms
import torch.nn as nn
from torch.nn import functional as F
import torch.optim as optim
We can check the frameworks’ versions by typing keras.__version__ and torch.__version__, respectively.
2. Create data generators
Normally, the images can’t all be loaded at once, as doing so would be too much for the memory to handle. At the same time, we want to benefit from the GPU’s performance boost by processing a few images at once. So we load images in batches (e.g. 32 images at once) using data generators. Each pass through the whole dataset is called an epoch.
We also use data generators for preprocessing: we resize and normalize images to make them as ResNet-50 likes them (224 x 224 px, with scaled color channels). And last but not least, we use data generators to randomly perturb images on the fly.
Performing such changes is called data augmentation. We use it to show a neural network which kinds of transformations don’t matter. Or, to put it another way, we train on a potentially infinite dataset by generating new images based on the original dataset.
Almost all visual tasks benefit, to varying degrees, from data augmentation during training. For more info about data augmentation, see how we applied it to plankton photos or how to use it in Keras. In our case, we randomly shear, zoom and horizontally flip our aliens and predators.
In Keras, you get built-in augmentations and the preprocess_input method for normalizing images fed to ResNet-50, but you have no control over their order. In PyTorch, you have to normalize images manually, but you can arrange the augmentations in any way you like.
There are also other nuances: for example, Keras by default fills the rest of an augmented image with the border pixels, whereas PyTorch leaves it black. Whenever one framework deals with your task much better than the other, take a closer look to see if they perform preprocessing identically; we bet they don’t.
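For reference, here is a minimal sketch of what such generators could look like (the augmentation parameters and batch size are illustrative, not the exact settings from the notebooks; the names train_generator, validation_generator, data_transforms, image_datasets and dataloaders match those used in the training and prediction code below):

KERAS
train_datagen = ImageDataGenerator(shear_range=10,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   preprocessing_function=preprocess_input)
train_generator = train_datagen.flow_from_directory('data/train',
                                                    target_size=(224, 224),
                                                    batch_size=32,
                                                    class_mode='sparse')
validation_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
validation_generator = validation_datagen.flow_from_directory('data/validation',
                                                              target_size=(224, 224),
                                                              batch_size=32,
                                                              class_mode='sparse',
                                                              shuffle=False)
PYTORCH
from torch.utils.data import DataLoader

# In PyTorch the ImageNet normalization has to be spelled out explicitly.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
data_transforms = {
    'train': transforms.Compose([transforms.Resize((224, 224)),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(),
                                 normalize]),
    'validation': transforms.Compose([transforms.Resize((224, 224)),
                                      transforms.ToTensor(),
                                      normalize])}
image_datasets = {phase: datasets.ImageFolder('data/' + phase, data_transforms[phase])
                  for phase in ['train', 'validation']}
dataloaders = {phase: DataLoader(image_datasets[phase],
                                 batch_size=32,
                                 shuffle=(phase == 'train'))
               for phase in ['train', 'validation']}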
3. Create the network
The next step is to import a pre-trained ResNet-50 model, which is a breeze in both cases. We freeze all the ResNet-50’s convolutional layers, and only train the last two fully connected (dense) layers. As our classification task has only 2 classes (compared to 1000 classes of ImageNet), we need to adjust the last layer.
Here we:
load pre-trained network, cut off its head and freeze its weights,
add custom dense layers (we pick 128 neurons for the hidden layer),
set the optimizer and loss function.
KERAS
conv_base = ResNet50(include_top=False,
                     weights='imagenet')
for layer in conv_base.layers:
    layer.trainable = False
x = conv_base.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation='relu')(x)
predictions = layers.Dense(2, activation='softmax')(x)
model = Model(conv_base.input, predictions)
optimizer = keras.optimizers.Adam()
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])
PYTORCH
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = models.resnet50(pretrained=True).to(device)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Sequential(
    nn.Linear(2048, 128),
    nn.ReLU(inplace=True),
    nn.Linear(128, 2)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters())
We load the ResNet-50 from both Keras and PyTorch without any effort. They also offer many other well-known pre-trained architectures: see Keras’ model zoo and PyTorch’s model zoo. So, what are the differences?
In Keras we may import only the feature-extracting layers, without loading extraneous data (include_top=False). We then create a model in a functional way, using the base model’s inputs and outputs. Then we use model.compile(…) to bake into it the loss function, optimizer and other metrics.
In PyTorch, the model is a Python object. In the case of models.resnet50, dense layers are stored in model.fc attribute. We overwrite them. The loss function and optimizers are separate objects. For the optimizer, we need to explicitly pass a list of parameters we want it to update.
In PyTorch, we should explicitly specify what we want to load to the GPU using .to(device) method. We have to write it each time we intend to put an object on the GPU, if available. Well…
Layer freezing works in a similar way. However, the Batch Normalization layer of Keras is broken (as of the current version; thanks to Przemysław Pobrotyn for bringing up this issue) – that is, some layers get modified anyway, even with trainable = False.
Keras and PyTorch deal with log-loss in a different way.
In Keras, a network predicts probabilities (has a built-in softmax function), and its built-in cost functions assume they work with probabilities.
In PyTorch we have more freedom, but the preferred way is to return logits. This is done for numerical reasons: performing softmax and then log-loss means doing unnecessary log(exp(x)) operations. So, instead of using softmax, we use LogSoftmax (with NLLLoss) or combine the two into one nn.CrossEntropyLoss function.
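A quick, purely illustrative check of that equivalence:

import torch
import torch.nn as nn
from torch.nn import functional as F

logits = torch.randn(4, 2)            # raw network outputs for 4 samples, 2 classes
targets = torch.tensor([0, 1, 1, 0])  # ground-truth class indices
loss_combined = nn.CrossEntropyLoss()(logits, targets)
loss_two_step = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
# loss_combined and loss_two_step are numerically identical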
4. Train the model
OK, ResNet is loaded, so let’s get ready to space rumble!
Now, we proceed to the most important step – model training. We need to pass data, calculate the loss function and modify network weights accordingly. While we already had some differences between Keras and PyTorch in data augmentation, the length of code was similar. For training… the difference is massive. Let’s see how it works!
Here we:
train the model,
measure the loss function (log-loss) and accuracy for both training and validation sets.
KERAS
history = model.fit_generator(
    generator=train_generator,
    epochs=3,
    validation_data=validation_generator)
PYTORCH
def train_model(model, criterion, optimizer, num_epochs=3):
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch+1, num_epochs))
        print('-' * 10)
        for phase in ['train', 'validation']:
            if phase == 'train':
                model.train()
            else:
                model.eval()
            running_loss = 0.0
            running_corrects = 0
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                if phase == 'train':
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                _, preds = torch.max(outputs, 1)
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            epoch_loss = running_loss / len(image_datasets[phase])
            epoch_acc = running_corrects.double() / len(image_datasets[phase])
            print('{} loss: {:.4f}, acc: {:.4f}'.format(phase,
                                                        epoch_loss,
                                                        epoch_acc))
    return model
model_trained = train_model(model, criterion, optimizer, num_epochs=3)
In Keras, the model.fit_generator performs the training… and that’s it! Training in Keras is just that convenient. And as you can find in the notebook, Keras also gives us a progress bar and a timing function for free. But if you want to do anything nonstandard, then the pain begins…
PyTorch is at the other end of the spectrum. Everything is explicit here. You need more lines to construct the basic training loop, but you can freely change and customize all you want.
Let’s shift gears and dissect the PyTorch training code. We have nested loops, iterating over:
epochs,
training and validation phases,
batches.
The epoch loop does nothing but repeat the code inside. The training and validation phases are done for three reasons:
Some special layers, like batch normalization (present in ResNet-50) and dropout (absent in ResNet-50), work differently during training and validation. We set their behavior by model.train() and model.eval(), respectively.
We use different images for training and for validation, of course.
The most important and least surprising thing: we train the network during the training phase only. The magic commands optimizer.zero_grad(), loss.backward() and optimizer.step() (in this order) do the job. If you know what backpropagation is, you’ll appreciate their elegance.
We take care of computing the epoch losses and printing them ourselves.
5. Save and load the model
Saving
Once our network is trained, often at a high computational and time cost, it’s good to keep it for later. Broadly, there are two ways to save a model:
saving the whole model architecture and trained weights (and the optimizer state) to a file,
saving the trained weights to a file (keeping the model architecture in the code).
It’s up to you which way you choose.
Here we:
save the model.
KERAS
# architecture and weights to HDF5
model.save('models/keras/model.h5')
# architecture to JSON, weights to HDF5
model.save_weights('models/keras/weights.h5')
with open('models/keras/architecture.json', 'w') as f:
    f.write(model.to_json())
One line of code is enough in both frameworks. In Keras you can either save everything to an HDF5 file, or save the weights to HDF5 and the architecture to a readable JSON file. By the way: you can then load the model and run it in the browser.
Currently, PyTorch creators recommend saving the weights only. They discourage saving the whole model because the API is still evolving.
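For completeness, a minimal PyTorch saving sketch (the weights file path here is just an example, not the notebook’s actual path):

PYTORCH
# weights only – the recommended approach
torch.save(model_trained.state_dict(), 'models/pytorch/weights.pth')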
Loading
Loading models is as simple as saving. You should just remember which saving method you chose and the file paths.
Here we:
load the model.
KERAS
# architecture and weights from HDF5
model = load_model('models/keras/model.h5')
# architecture from JSON, weights from HDF5
with open('models/keras/architecture.json') as f:
    model = model_from_json(f.read())
model.load_weights('models/keras/weights.h5')
In Keras we can load a model from a JSON file, instead of creating it in Python (at least when we don’t use custom layers). This kind of serialization makes it convenient for transferring models.
PyTorch can use any Python code, so we pretty much have to re-create the model in Python ourselves.
Loading model weights is similar in both frameworks.
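A minimal sketch of the PyTorch side, re-creating the architecture from step 3 (using the imports and device defined earlier) and loading the example weights file from the saving step:

PYTORCH
model = models.resnet50(pretrained=False).to(device)
model.fc = nn.Sequential(
    nn.Linear(2048, 128),
    nn.ReLU(inplace=True),
    nn.Linear(128, 2)).to(device)
model.load_state_dict(torch.load('models/pytorch/weights.pth'))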
6. Make predictions on sample test images
All right, it’s finally time to make some predictions! To fairly check the quality of our solution, we ask the model to predict the type of monsters from images not used for training. We can use the validation set, or any other image.
Here we:
load and preprocess test images,
predict image categories,
show images and predictions.
COMMON
validation_img_paths = ["data/validation/alien/11.jpg",
"data/validation/alien/22.jpg",
"data/validation/predator/33.jpg"]
img_list = [Image.open(img_path) for img_path in validation_img_paths]
KERAS
img_size = 224  # input size expected by ResNet-50
validation_batch = np.stack([preprocess_input(np.array(img.resize((img_size, img_size))))
                             for img in img_list])
pred_probs = model.predict(validation_batch)
PYTORCH
validation_batch = torch.stack([data_transforms['validation'](img).to(device)
for img in img_list])
pred_logits_tensor = model(validation_batch)
pred_probs = F.softmax(pred_logits_tensor, dim=1).cpu().data.numpy()
COMMON
fig, axs = plt.subplots(1, len(img_list), figsize=(20, 5))
for i, img in enumerate(img_list):
    ax = axs[i]
    ax.axis('off')
    ax.set_title("{:.0f}% Alien, {:.0f}% Predator".format(100*pred_probs[i,0],
                                                          100*pred_probs[i,1]))
    ax.imshow(img)
Prediction, like training, works in batches (here we use a batch of 3, though we could just as well use a batch of 1). In both Keras and PyTorch we need to load and preprocess the data. A rookie mistake is to forget the preprocessing step (including color scaling). The model is still likely to work, but with worse predictions, since it effectively sees the same shapes with different colors and contrasts.
In PyTorch there are two more steps, as we need to:
convert logits to probabilities,
transfer data to the CPU and convert to NumPy (fortunately, the error messages are fairly clear when we forget this step).
And this is what we get:
It works!
And how about other images? If you can’t come up with anything (or anyone) else, try using photos of your co-workers. :)
Conclusion
As you can see, Keras and PyTorch differ significantly in terms of how standard deep learning models are defined, modified, trained, evaluated, and exported. For some parts it’s purely about different API conventions, while for others fundamental differences between levels of abstraction are involved.
Keras operates on a much higher level of abstraction. It is much more plug&play, and typically more succinct, but at the cost of flexibility.
PyTorch provides more explicit and detailed code. In most cases that means debuggable and flexible code, with only a small overhead. Training is way more verbose in PyTorch; it hurts, but at times it provides a lot of flexibility.
Transfer learning is a big topic. Try tweaking your parameters (e.g. dense layers, optimizer, learning rate, augmentation) or choose a different network architecture.
Have you tried transfer learning for image recognition? If you’re looking for inspiration, pick Keras or PyTorch, choose a dataset and let us know how it went in the comments section below :)
Artificial intelligence is advancing various industries, including healthcare and the pharmaceutical industry. According to Accenture data, key clinical health AI applications can potentially create $150 billion in annual savings for the United States healthcare sector by 2026.
The numbers show that the healthcare industry will heavily leverage the possibilities provided by machine learning. That’s why AI companies are getting involved in various activities in the treatment process, from diagnosis to therapy and drug development.
Another healthcare segment that is heavily dependent on data is drug discovery.
The potential of AI in drug discovery
Computational solutions in drug discovery help significantly reduce the cost of introducing drugs to the market. Grand View Research’s 2018 report estimates that the global drug discovery informatics market was worth $713.4 million in 2016 and anticipates it will grow at a CAGR (Compound Annual Growth Rate) of 12.6% through 2025. With artificial intelligence being used in drug discovery, the market’s value is growing rapidly. In its Global Artificial Intelligence in Drug Discovery Market Size Analysis, 2018-2028, Bekryl indicates that AI has the potential to create $70 billion in savings in the drug discovery process by 2028.
The technological and paradigm shift toward machine learning seen in the pharmaceutical industry enables researchers to use novel computational algorithms to support the process. As machine learning can handle highly complex biomedical data, using algorithms to design new drugs has become more feasible than ever. Machine learning can enhance many stages of the drug discovery process:
investigating the effect of a drug – both in basic preclinical research and in clinical trials, which produce a lot of biomedical data. Finding new patterns in those data can be facilitated by machine learning.
There are different kinds of data, including genetic and imaging ones. Each of them can be analyzed with machine learning and further used to build novel solutions for drug discovery.
Challenges in machine learning for drug discovery
Ensuring drug safety is one of the main challenges in the drug discovery process. Interpreting information of the known effects of drugs and predicting their side effects are complex tasks. Scientists and engineers from research institutions and pharmaceutical companies like Roche and Pfizer have been trying to use machine learning to get meaningful information from clinical data obtained in clinical trials. Interpretation of this data in the context of drug safety is an active area of research.
Clinical trials are the most expensive stage of drug development. To reduce their costs, it is crucial to use the experience gained during previous clinical trials in the early stages of drug development. This can be achieved in two steps:
biomedical data from research experiments could be analyzed and interpreted using machine learning to predict a drug’s effects and side effects;
data from clinical trials analyzed with machine learning should support the interpretation of biological data.
With those two approaches developed simultaneously, it is possible to design better preclinical experiments to come up with the most effective therapies with the fewest side effects.
Integrating biomedical data with computational approaches
Machine learning could help optimize therapy by integrating biomedical and clinical data with computational models, and can be used to build software to test drugs and combinatorial therapies. Some computational models and approaches which support the integration of clinical data are still under development but there are also a few very good examples of successful data integration in biology and medicine.
For example, there are a number of machine learning methods for integrating genetic regulatory networks with pathway information, which can be used to predict biological function. There are also efficient Python-based implementations of bioinformatic tools and approaches that are easy to interface with broadly used machine learning packages.
Genetic data analysis and personalized medicine
Many pharmaceutical companies and startups are focused on genetic data interpretation and personalized medicine. Understanding the patient’s genetic profile helps to offer appropriate drugs and therapy. Building computational approaches to analyze genetic data and propose novel therapies could be advanced with machine learning. Only a few machine learning-based solutions have made it into current clinical practice, but they show the huge potential for personalized medicine and drug discovery. They include discovering novel biomarkers of drug response and machine learning-based computational tools used in clinical practice to estimate resistance to individual drugs and to combinatorial therapies based on genotype analysis.
One of the possible approaches is based on interpreting the genetic code as a one-dimensional image and then applying standard machine learning algorithms. The data is then scoured for patterns and anomalies, just as has been done in various other deepsense.ai image recognition projects. Analyzing genomic data can in fact be done in much the same way image recognition is applied to classical paintings, for example when looking for a hand or any other element. For the algorithm, the nature or shape of the image is irrelevant, so the machine is equally effective at analyzing a one-dimensional DNA chain as any other type of image data. A minimal sketch of this idea follows below.
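The sketch below is purely illustrative and is not deepsense.ai’s actual pipeline; it shows how a DNA string could be one-hot encoded into a four-channel, one-dimensional “image” and fed to a small convolutional classifier (here in Keras, with an assumed fragment length of 1,000 bases and a binary label such as “contains the motif of interest”):

import numpy as np
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, GlobalMaxPooling1D, Dense

NUCLEOTIDE_INDEX = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def one_hot_encode(sequence):
    # Turn a DNA string into a (length, 4) array – a 1D image with 4 channels.
    encoded = np.zeros((len(sequence), 4), dtype=np.float32)
    for position, base in enumerate(sequence):
        encoded[position, NUCLEOTIDE_INDEX[base]] = 1.0
    return encoded

# A small 1D convolutional classifier for fixed-length fragments.
model = Sequential([
    Conv1D(32, kernel_size=8, activation='relu', input_shape=(1000, 4)),
    MaxPooling1D(pool_size=4),
    Conv1D(64, kernel_size=8, activation='relu'),
    GlobalMaxPooling1D(),
    Dense(1, activation='sigmoid')])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])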
Because genomic data is usually presented as a string of letters, it is also possible to apply Natural Language Processing techniques. One advantage of doing so is that it broadens the area the algorithm is able to process. That may be important when particular changes or patterns are being sought, or the pattern to find consists of a longer sequence of genes.
Innovative startups, like Cambridge Cancer Genomics, use machine learning to analyze data gained from liquid biopsy, a diagnostic technology in which circulating tumor cells or cell-free DNA is collected from blood samples. Although it is not a fully standardized approach for cancer therapy monitoring, it is highly anticipated in personalized medicine due to its ability to acquire genetic data in time series during treatment. Applying machine learning to better understand those data and to answer the question of why cancer evolves could help scientists design less toxic therapies.
Building and getting insight from databases and datasets
Scientists use public repositories of clinical data to tackle big problems in clinics to help medical doctors in their everyday work, as medical knowledge can be extracted from public repositories. These repositories could also be used for drug discovery purposes to include clinical information in the early stage of drug development.
Attempts have been made to represent medical knowledge using deep neural networks. Data mapped with machine learning might also be easier to integrate with biomedical data analyzed with machine learning, thanks to better compatibility in the data structures generated with similar approaches.
New achievements in building databases for machine learning purposes are also promising. For example, the authors of the paper “Integrative analysis and machine learning on cancer genomics data using the Cancer Systems Biology Database (CancerSysDB)” developed a database that allows highly flexible queries and analysis of cancer-related data across multiple data types and multiple studies. However, many problems in medicine and drug discovery are very difficult to answer on the basis of public data alone, and such data remain scarce if better machine learning models and approaches are to be developed.
If proper datasets are to be built to answer specific scientific questions, what must be understood is not only how the data is preprocessed, but also the principles of using different bioinformatic tools, along with interdisciplinary knowledge of biomedicine and of where computer science and medicine converge. Teams with this knowledge and these skills can make better use of even limited amounts of data from public repositories. Machine learning engineers usually get data from scientists, medical doctors, pharmaceutical companies and hospitals, so the amount is limited – and models must be strong for results to be achieved.
One of the best examples of designing a model of superior strength, one that can deal with the lack of proper data, was deepsense.ai’s Right Whale Recognition engine. The model was designed to recognize an individual Right Whale in a photograph, even if there were only a few photos provided in the dataset.
To get deep insight from data, close cooperation and a mutual understanding of the different languages and disciplines involved are needed. That is difficult to achieve through only occasional consultation.
Standard machine learning approaches for genetics and genomics
Standard supervised, semi-supervised and unsupervised machine learning algorithms are applied to analyze genetic data such as microarray or RNA-seq expression data. To understand how, read “Machine learning in genetics and genomics”. These algorithms can distinguish disease from healthy phenotypes and could be further used to uncover the mechanisms of action of drugs. In any application of machine learning methods, the researcher must decide which data to provide as input to the algorithm to answer complex biomedical questions.
There are a number of comprehensive reviews summarizing the use of large-scale analysis of genomic data and machine learning strategies to solve genomic sequencing problems, like finding specific regions in sequences and recognizing locations of transcriptomic sites. It is one of the biggest challenges in genomics with practical applications.
Machine learning has potential for this application, though the results produced with machine learning algorithms should be validated with data from laboratory experiments or clinical trials. Deep learning algorithms could be useful in genome interpretation and analysis of genetic variants, a complex task that requires a combination of robust biological data and clinical knowledge.
Recently scientists and engineers have taken a step toward better understanding the human genome thanks to machine learning. Supervised heterogeneous ensemble methods can significantly improve our ability to address difficult biomedical prediction problems. Still, the application of machine learning algorithms to genomic problems is in a nascent stage. After all, genomic and genetic data are multidimensional and there remains a need to develop probabilistic machine learning algorithms for their analysis.
Machine learning approaches for network analysis of biomedical data
Analysis of genetic data could be helpful in elucidating genetic networks, which can reveal a drug’s mechanism of action and help understand how diseases work. This falls within the scope of an emerging new discipline called network medicine. The Barabasi group, a pioneer in network medicine, states that an unsupervised network-based approach enables the prediction of novel drug-disease associations, which offer significant opportunities for finding new applications for drugs and predicting potential side effects.
The group also found that the therapeutic effect of drugs might be localized in a small network neighborhood. This means that several genes in close network proximity of genes related to the mechanism of a disease could be targeted to effectively treat the disease.
Analyzing genetic network data with machine learning could help in finding novel targets for drugs and predict the optimal combination of drugs. There are research papers that explain how to benchmark machine learning for biological network analysis. One is “machine learning-assisted network inference approach to identify a new class of genes that coordinate the functionality of cancer networks.” This study shows usage of support vector machine (SVM) models combined with machine learning-assisted network inference (MALANI) to identify cancer-associated gene pairs. These can be used to reconstruct cancer networks to identify key cancer genes in high-dimensional data space that would otherwise go undetected by conventional approaches. These algorithms should be equally applicable to other machine learning and feature selection approaches. There is also a tutorial by Stanford lecturers which shows the basics of how to use deep learning approaches to analyze biological networks. However, for analysis of complex biological networks, non-standard machine learning algorithms are still being developed and network and machine learning approaches need better integration.
Machine learning algorithms in image analysis for drug discovery
The article Machine learning and image-based profiling in drug discovery presents how image-based screening of high-throughput experiments, in which cells are treated with drugs, could help elucidate a drug’s mechanism of action. It notes that unsupervised and simple statistical inference methods seem to be favored for analyzing image data from large-scale profiling experiments, but complex biological phenotypes and single-cell experiments can be successfully classified with supervised algorithms.
The recently explored application of supervised learning to image-based profiling, particularly with deep neural networks, could serve as a novelty detection framework for identifying unexpected phenotypes revealed in the drug discovery process. With deep learning it is possible to predict the properties of a molecule from its structure alone. The technique requires a convolutional neural network that extracts the shape of a molecule and then compares it with the information gathered about its properties.
Novel machine learning algorithms under way
Research on quantum machine learning shows that this approach should be useful for finding complex patterns in data. As biological and medical data are complex, probabilistic quantum machine learning algorithms represent a real opportunity to understand them better. Innovative pharmaceutical companies like Amgen and startups like ProteinQure have moved to apply quantum computing and quantum machine learning to drug discovery, focusing these efforts mainly on predicting the structure of new drugs. Finally, genomics and systems biology are two important areas in which novel machine learning algorithms can be applied with a view to producing less toxic drugs based on profound analysis of biomedical data.
This text was written in collaboration with Anna Kornakiewicz, an independent data scientist and researcher, who acted as a consultant.