How to perform self-supervised learning on high-dimensional data


October 28, 2022/in Computer vision, Artificial Intelligence, Machine learning /by Łukasz Kuśmierz


Introduction

As data sets grow and tasks become more complicated, supervised learning and reinforcement learning approaches turn out to be harder to apply efficiently. The reason is that the feedback signal needed during training is becoming increasingly hard to obtain.

In this post, I present another learning paradigm which is free of this problem – self-supervised learning. This method of training machine learning models has been gaining traction rapidly, especially for high-dimensional data.

In order to focus the attention of this article, we will only work on examples from the computer vision area. However, the methods presented are general and may be successfully used for problems from other domains as well.

The challenge of labeling in supervised learning and reinforcement learning

Current deep learning techniques work great in tasks where there is plenty of data that comes with a feedback signal. In supervised tasks, the feedback comes in the form of error signals computed based on the labels attached to each data point. Thus, we have to specify the correct output for a given input.

In reinforcement learning, a scalar feedback signal is provided from the environment. In this case we do not have to state how the agent should behave, but we should be able to assess how the agent has performed. This is usually done in the context of tasks that have a temporal aspect — the AI agent acting in its environment over multiple time steps must adjust its behavior based on possibly delayed and sparse reward signals.

The problem with the supervised approach is that it does not scale very well. It may be relatively easy to label 100 images, but usually we need thousands or millions of labeled examples before the model can learn the nuances of the task. Most practitioners will confirm that there are hardly ever enough labeled data points.


Source: SOLO paper. If we want to train a good model to segment sheep using supervised learning, we have to color every single sheep manually in hundreds of pictures.

In the context of reinforcement learning, it may be relatively easy to obtain many samples (episodes) from virtual environments, like with chess, Go, or Atari games. However, this is becoming much more difficult in the “wild” with an actual physical agent interacting with the real world. Not only is the environment richer and noisier, but it may not be feasible to obtain many episodes with bad actions (think about self-driving cars or AI controlling nuclear power plants).

This is one of the reasons why we almost always use transfer learning. Here, we take a model that was pre-trained on another dataset (e.g., in computer vision the standard practice is to use ImageNet) and use it as a starting point for training on our dataset. Note that ImageNet is in itself a fairly large  labeled dataset. But there is so much more digital data available!

Could we somehow benefit from that data without laborious and time-consuming labeling? I will try to answer this question later in this article.

Pretext tasks

In the absence of labels, we do not have any clear task that can be written in terms of a cost function. How can we then learn from data itself?

The general strategy is to define an auxiliary, pretext task that gives rise to a self-supervised learning (SSL) signal. To do so, in general we must ask the model to predict some aspects of the input data, possibly given a corrupted version of that data. Perhaps the most straightforward idea is to work in the input space and to ask the model to generate part of the input tensor given a version of the input in which that part has been masked or replaced. Such SSL methods are known as generative methods and have been a big hit in the context of natural language processing, where the masked words (or tokens) are predicted given the context defined by the surrounding text.

Similar ideas have been developed in the context of computer vision and other modalities.

For example, deep belief networks are generative models that jointly learn to map inputs to latent representations and to generate those inputs back from the latent representation. Masked autoencoders, in turn, are tasked with reconstructing patches (pixels) that have been randomly masked out of the input image, using the remaining, non-masked pixels as context. Although such methods can be effective, generating images, sounds, videos, and other high-dimensional objects that feature a lot of variability is a rather difficult task. It would be great if we could come up with a simpler pretext task that achieves the same goal…

Useful representations

Wait, but what do we want to achieve anyway? As mentioned before, we want to pretrain our model on unlabeled data and then use it in other (“downstream”) tasks where available labeled data is limited. Therefore, the model should use the unlabeled data to learn something useful about that data: something that is transferable to other similar datasets.

One way of looking at this is to say that we want the model to take an input \(x\) (e.g., an image or a video clip) and output a vector \(z\) that represents the input in some useful way. Following the literature, we will refer to this vector as representation or embedding. Of course, the crucial thing here is to specify what “useful” means; in general, this will depend on the downstream task.

Fortunately, many tasks share some common features that can be utilized to assess whether a given representation is useful. Indeed, classification, detection, and segmentation tasks in computer vision or speech recognition are characterized by important invariances (e.g., I can recognize an object regardless of its position in the image) and equivariances (e.g., a detection box should encompass the entire object regardless of its position in the image). Notice that I did not specify whether the object is a dog or a pineapple — in this sense these are very general features that do not depend on many other details of the task.

Augmentations

Ok, but how can we translate invariances into good representations? The basic idea is to ensure that the representation features the desired invariances. To do so, we should first state the invariances – the easiest way of doing it is to list, in the form of a procedural definition, transformations that our desired representation should be invariant to.

These transformations can be implemented in the form of a function that takes an input \(x\) and outputs a new (possibly random) view \(x'\). Note that the same procedure is used as part of a standard supervised learning pipeline, where it is referred to as data augmentation. In practice, when working with images one could, for example, use albumentations or the torchvision.transforms module of PyTorch.
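To make this concrete, below is a minimal sketch of a SimCLR-style augmentation pipeline producing two views of the same image, built with torchvision.transforms; the specific transformations and parameters are illustrative choices rather than a recipe from any particular paper.

```python
# A minimal sketch: sample two independent augmented views of one image.
import torchvision.transforms as T

augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.2, 1.0)),
    T.RandomHorizontalFlip(),
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=23),
    T.ToTensor(),
])

def two_views(image):
    """Return two independently augmented views x^A, x^B of the same input."""
    return augment(image), augment(image)
```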

In this article I focus on computer vision where input is a static image, but the same methods can be adapted to other modalities including cross-modal self-supervision with, say, videos that contain both sound and a series of images. The crucial difference in how to deal with these other inputs lies in defining a good set of augmentations.

Invariance term: squeeze representations together

The next step is to formalize our intuition described above in the form of a cost function. As a reminder, our goal is to ensure that the representation (output of the model) is invariant under our chosen set of augmentations. To this end, let us take two views of the same input image, \(x^A\) and \(x^B\), and pass them through the model, obtaining a pair of joint embeddings \(z^A\) and \(z^B\). Next, we calculate the cosine similarity between these two representations $$\mathrm{sim}(z^A, z^B) \equiv \cos{(\phi)} =\frac{z^A \cdot z^B}{\Vert z^A\Vert \Vert z^B \Vert}.$$ Ideally, we should have \(\phi=0\) (hence, \(\cos{\phi}=1\)), so we want to minimize the cost function of the form \(l_{sim} = -\mathrm{sim}(z^A, z^B)\), which we will call the invariance or similarity cost. As usual, this cost should be averaged over all images in the batch, leading to $$\mathcal{L}_{sim} = -\frac{1}{N}\sum_{i=1}^N \mathrm{sim}(z_i^A, z_i^B).$$ As the similarity cost decreases, representations of different views of the same image are pressed together, ultimately leading to a model that produces representations that are invariant under the set of transformations used to augment our dataset. However, this alone will not yield a good representation extractor.
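As a quick sketch, assuming the embeddings of the two views are stacked into tensors of shape \((N, D)\), this invariance cost can be written in a few lines of PyTorch:

```python
import torch
import torch.nn.functional as F

def similarity_loss(z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
    """Invariance cost L_sim: negative cosine similarity averaged over the batch."""
    return -F.cosine_similarity(z_a, z_b, dim=1).mean()
```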

Collapse

It is easy to understand why the similarity cost by itself is not enough. Take the following rather boring model

$$
f(x) = z_0,
$$

which ignores the input and always outputs the same representation (say, \(z_0 = [1,1,…,1]\)). Since this is the simplest solution to the optimization problem defined by \(\mathcal{L}_{sim}\) and “everything which is not forbidden is allowed”, we can expect such undesirable solutions to appear frequently as we optimize \(\mathcal{L}_{sim}\).

This is indeed what happens and this phenomenon is called a (representation) collapse. It is useful to think about the current self-supervised learning techniques in terms of how they avoid collapse. From this perspective, there are two main categories of SSL methods: contrastive and regularization-based. Below we describe in detail two popular examples, each relatively simple but still representative of its category.

Projection head

There is an additional detail: it is not actually \(z\) that is being used in downstream tasks. Instead, as it turns out, it is more beneficial to use an intermediate representation \(h\), see Fig. 1. In other words, the projection head \(g\) that is used to calculate \(z = g(h)\) is thrown away after the training. The intuition behind this trick is that the full invariance is actually detrimental in some tasks. For example, it is great if our model can report “dog” even if only the dog’s tail is visible in the image, but the same model should also be able to output “tail” or “dog tail” if it is asked to do so.
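For intuition, an encoder plus projection head might look roughly like the sketch below: a ResNet-50 backbone producing \(h\), topped with a small MLP producing \(z = g(h)\). The layer sizes here are illustrative assumptions, not values taken from any specific paper.

```python
import torch.nn as nn
import torchvision.models as models

# Base encoder f: a ResNet-50 backbone with its classification layer removed,
# so that f(x) returns the representation h used in downstream tasks.
backbone = models.resnet50()
feat_dim = backbone.fc.in_features  # 2048 for ResNet-50
backbone.fc = nn.Identity()

# Projection head g: a small MLP producing z = g(h); it is discarded after
# self-supervised pretraining and only the backbone is kept.
projection_head = nn.Sequential(
    nn.Linear(feat_dim, 2048),
    nn.ReLU(inplace=True),
    nn.Linear(2048, 128),
)
```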

Fig. 1: A schematic diagram of a self-supervised training pipeline based on augmentations and joint embeddings. Although this image is taken from the SimCLR paper, the same overall strategy is employed in both contrastive and non-contrastive SSL methods.


Contrastive learning (SimCLR)

Contrastive learning methods can be thought of as generating supervision signals from a pretext discriminative task. In the past few years there has been an explosion of interest in contrastive learning and many similar methods have been developed. Here, let us focus on a famous example, SimCLR, which stands for “a simple framework for contrastive learning of visual representations”.

Indeed, the algorithm is pretty straightforward.

  • First, take a batch of images \((x_i)_{i\in\{1,\ldots,N\}}\), where the batch size \(N\) should be large.
  • Second, for a given input image \(x_k\) generate (sample) two views, \(\tilde{x}_i\) and \(\tilde{x}_j\). Note that this gives us a new, extended batch of augmented images of size \(2 N\).
  • Third, apply the same base encoder \(f\) and projection head \(g\) to each sample in the extended batch obtaining “useful” representations \(h_i = f(\tilde{x}_i)\) and “invariant” representations \(z_i = g(h_i)\).
  • Fourth, optimize \(f\) and \(g\) jointly by minimizing the contrastive loss \(\mathcal{L}_{InfoNCE}\).
  • Last, throw away \(g\) and use \(f\) in the downstream task(s).

But what is \(\mathcal{L}_{InfoNCE}\)? In the original paper this loss function was termed NT-Xent for the “normalized temperature-scaled cross entropy loss”, but it is basically a version of InfoNCE loss introduced in the Contrastive Predictive Coding paper, which in itself is a special case of noise-contrastive estimation. The main idea here is to split the batch into positive and negative pairs. Positive pairs are two different views of the same image and, as discussed above, their representations should be close to each other. The crucial idea is that all the other (“negative”) pairs are treated as non-matching pairs whose representations should be pulled apart. Note that this approximation makes sense only if the dataset is rich enough and contains many categories. In this case the likelihood that two randomly chosen images represent the same object (or two very similar objects) is small.

How to pull negative pairs apart? In SimCLR this is achieved by the following loss

$$
\mathcal{L}_{InfoNCE} = -\frac{1}{N}\sum_{i,\, j=P(i)} \log\frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}
{\sum_{k\neq i}\exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)},
$$

where \(P(i)\) returns the index of the other view of the same image (its positive “partner”) and \(\tau\) is a “temperature” hyperparameter introduced to adjust how strongly hard negative examples are weighted. One can think of this loss as a cross-entropy loss for multi-class classification with a softmax layer or, in other words, as a multinomial logistic regression.
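For illustration, here is a compact PyTorch sketch of this loss. It assumes the \(2N\) projections are stacked so that rows \(i\) and \(i+N\) correspond to the two views of the same image; it is a sketch of the idea, not the reference SimCLR implementation.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """NT-Xent / InfoNCE sketch for a (2N, D) tensor of projections."""
    z = F.normalize(z, dim=1)           # cosine similarity becomes a dot product
    sim = z @ z.t() / tau               # (2N, 2N) scaled similarity matrix
    sim.fill_diagonal_(float("-inf"))   # exclude self-similarity from the denominator
    n = z.shape[0] // 2
    # Index of the positive partner P(i): the other view of the same image.
    pos = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    # Cross-entropy with the positive partner treated as the "correct class";
    # note that this averages over all 2N views in the extended batch.
    return F.cross_entropy(sim, pos)
```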

The pretext task can then be summarized as follows: given a view of an image \(\tilde{x}_i\), find the other view of the same image among the set containing all the other \(2N - 1\) views of images in the extended batch. It is also easy to see that this loss function can be decomposed as

$$
\mathcal{L}_{InfoNCE} = \mathcal{L}_{sim} + \mathcal{L}_{con},
$$

where \(\mathcal{L}_{sim}\) is the familiar similarity term discussed above and \(\mathcal{L}_{con}\) is a contrastive term that pulls all representations in the batch apart.

Additional details:

  • The base encoder can be any differentiable model. The authors of the original paper have opted for variants of ResNet-50 as this neural network has emerged as the standard architecture used to compare different methods.
  • In SimCLR the projection head is a simple multilayer perceptron with a single hidden layer. The dimensionality of \(z\) does not have to be very large, but it is important for the projection head to be nonlinear.
  • In the original paper the authors have presented the results of systematic experiments aiming to find the best set of augmentations: see Fig. 2 that shows the augmentations studied in the paper. They found that no single transformation is enough to learn good representations. The best results among pairs of transformations were obtained by combining random cropping with random color distortion. Interestingly, the authors have also included random Gaussian blur in their standard pipeline.
  • This method strongly benefits from relatively large batch sizes and long training sessions.
  • Some interesting limitations of this and other contrastive methods are discussed in this paper. If many objects are present in images, the dominant object may suppress the learning of statistics of smaller objects. Similarly, easy-to-learn shared features may suppress the learning of other features.

Fig. 2: Augmentations studied in the SimCLR paper.

Noncontrastive methods (Barlow Twins)

Noncontrastive methods avoid collapse without relying on negative pairs. This class is quite diverse and includes methods such as BYOL and SimSiam, which break the symmetry between two branches that generate two views (and their representations) of input images, as well as methods based on clustering like ClusterFit or SwAV.

Another idea is to minimize the redundancy between components of \(z\). The reduction of redundancy is the cornerstone of the efficient coding hypothesis, a theory of sensory coding in the brain proposed by Horace Barlow, hence the name Barlow twins. Here, the cost function is based upon the cross-correlation matrix \(\mathcal{C}\) of size \(M\times M\), where \(M\) is the number of representation neurons (dimensionality of \(z\)). As before, two views of each image in the batch are passed through the network leading to two representations per image, \(z^A\) and \(z^B\).

Previously we have used the notation \(z_i\) to denote the representation of an image \(i\). To better understand Barlow Twins, we have to extend our notation. Let \(z_{i,\alpha}\) denote the \(\alpha\)-th component (neuron) of vector \(z_{i}\) and \(\overline{z_{\alpha}}=(1/N)\sum_{i}z_{i,\alpha}\) the batch average of that component. Each component can be z-scored (normalized) over the batch

$$
u_{i,\alpha} = \frac{z_{i,\alpha} - \overline{z_{\alpha}}}{\sqrt{\overline{z_\alpha^2}-\overline{z_{\alpha}}^2}}.
$$

The cross-correlation matrix is defined as

$$
\mathcal{C}_{\alpha\beta}
=
\overline{u^A_{\alpha} u^B_{\beta}}.
$$

Note that only positive pairs are averaged over the batch here.

The loss is then defined as

$$
\mathcal{L}_{BT}
=
\sum_{\alpha} \left( (1 - \mathcal{C}_{\alpha\alpha})^2
+
\lambda \sum_{\beta\neq\alpha} \left(\mathcal{C}_{\alpha\beta}\right)^2
\right).
$$

The contrastive term is absent from this loss and collapse is avoided due to a different mechanism, which can be understood by analyzing two terms in the Barlow Twins loss. The first, invariance term is trying to push all the diagonal terms of the cross-correlation matrix towards \(1\) (perfect correlation).

Components of \(z^A\) and \(z^B\) are perfectly correlated when \(z^A\) and \(z^B\) are identical, as desired from the invariance principle. The second, redundancy reduction term is trying to decorrelate different neurons (components of \(z\)). This encourages the output neurons to carry non-redundant information about the inputs, leading to non-trivial representations.
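A rough PyTorch sketch of this loss for \((N, M)\) batches of projections \(z^A\) and \(z^B\) is given below; the value of the trade-off parameter \(\lambda\) and the exact normalization details are assumptions, so consult the original paper for the precise recipe.

```python
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                      lambd: float = 5e-3) -> torch.Tensor:
    """Barlow Twins loss sketch for (N, M) projections of two views."""
    n = z_a.shape[0]
    # Z-score each representation component (neuron) over the batch.
    u_a = (z_a - z_a.mean(dim=0)) / z_a.std(dim=0)
    u_b = (z_b - z_b.mean(dim=0)) / z_b.std(dim=0)
    # Cross-correlation matrix of size (M, M), averaged over positive pairs only.
    c = (u_a.t() @ u_b) / n
    diag = torch.diagonal(c)
    on_diag = (1.0 - diag).pow(2).sum()            # invariance term
    off_diag = c.pow(2).sum() - diag.pow(2).sum()  # redundancy reduction term
    return on_diag + lambd * off_diag
```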

Additional details:

  • Barlow Twins does not need batch sizes nearly as large as SimCLR does.
  • Unlike SimCLR, Barlow Twins benefits very strongly from a high-dimensional output (invariant) representation \(z\).

Summary and additional reading

Self-supervised learning is here to stay, complementing supervised learning and reinforcement learning whenever getting enough labels or feedback signals from the environment becomes troublesome. As we saw, the key to successful self-supervised training is to define the pretext task smartly and to design the loss function carefully.

For those who would like to deepen their knowledge of this topic, I recommend the blog article on self-supervised learning written by Yann LeCun and Ishan Misra.

Five solid reasons to outsource your AI software development


August 21, 2022/in Artificial Intelligence /by deepsense.ai

Customized AI software development is one of the most powerful approaches to leveraging AI in business. It brings a true competitive advantage by implementing solutions tailored to the specific challenges and business needs of the enterprise. Undoubtedly, the main difficulty of this approach is the ability to successfully develop and implement AI projects. Cooperation with an experienced AI vendor, who is responsible for end-to-end delivery, is one of the most effective solutions, bringing with it a number of benefits.

Cost reduction

One of the main drivers influencing the popularization of AI software outsourcing is the significant cost reduction. The experience and know-how of the AI vendor ensure a transparent approach to project planning, implementation and related costs. Moreover, savings also come from reducing expenditure on hardware, workspace, training and employment. The client receives full support starting from needs identification to commercial deployment, as well as access to top tech talents.
At deepsense.ai we are flexible when it comes to the cooperation model. The first two options, fixed price and time & material, are designed for customers who have clearly defined goals and requirements related to customized AI software development. In this case, the deepsense.ai team is responsible for the implementation of a specific solution or a component thereof. Where less specific projects are concerned – where the business use case first needs to be identified and described in the data science language – the team augmentation approach is a better solution. It allows business owners to control costs at every stage and flexibly decide on the further development of the project. deepsense.ai’s people support the client’s internal team on a daily basis, as regular peers, and provide knowledge transfer. Such cooperation significantly increases the in-house software development capacity.

Flexibility

The diversity of AI and ML applications requires a plethora of experience and skills. Outsourcing provides a great deal of flexibility and confidence in testing new approaches. Even companies with in-house data science teams which are willing to execute multiple ML projects would need to spend a lot of time preparing and training for each implementation. Instead, they can work with an experienced vendor providing know-how about the latest technologies and possible solutions. At deepsense.ai, with over 100 world-class full stack developers, data scientists and software engineers on board, we provide effective support in the area of AI software development. Our people are equipped with the full tech stack required to help clients reach their milestones quicker, and ready to work hand-in-hand with internal business and technical teams.
Flexibility is also provided by our agile approach to developing AI software. At deepsense.ai we work on the basis of the CRISP methodology – a process model comprising six stages of running AI projects that, when repeated over and over again, leads to the right solution. This concept involves an adaptable attitude, where priorities and pathways change as project development progresses.

Access to top tech talents

The majority of enterprises are faced with a huge shortage of skilled AI talent. In particular, companies whose core operation is not related to AI may have a problem attracting world-class experts. Cooperation with an experienced AI vendor gives enterprises access to top tech talents without incurring the costs of recruitment and employment. Moreover, people with highly specific tech competencies can jump on board only when needed, for example just to streamline the process of data analysis or model training. At deepsense.ai we put great emphasis on acquiring and retaining the most talented AI/ML and data science experts. Our team members have backgrounds from top European technical faculties and have won international awards. Moreover, they develop tech competences not only through the implementation of commercial projects, but also through their involvement in the scientific development of AI/ML, academic activity and participation in prestigious data science competitions.

Focus on business value delivery

At deepsense.ai, the overriding goal of customized AI software development is not to deliver the software itself, but above all to provide the client with real business value. The continued success of a project depends on how deeply the given business use case is understood. That’s why at deepsense.ai we focus on close relations with business owners. From the very beginning of cooperation with clients, the entire team dedicated to the project participates in the discussion on business needs, possible solutions, and available data. This ensures that all team members have a broader awareness of the purpose, know the limitations, and are able to contribute both technically and conceptually. This approach makes it possible to efficiently determine the scope of cooperation, focus on low-hanging fruit first, and set up success metrics. The external team is independent and can objectively assess the chances of delivering business value in a given area. Such confidence combined with experience and agility in training ML models offers the best chance of success. As a result, clients remain focused on the core elements of their business and in the meantime get a customized solution tailored to their needs.

Faster results

While cooperating with an AI vendor, there is no need to take care of the necessary infrastructure or training, freeing up time for analyzing data, testing various approaches and building models. This significantly shortens the project implementation time. The vendor’s proven track record of building and implementing AI guarantees the delivery of first-class solutions. deepsense.ai’s portfolio includes more than 150 commercial projects for clients from the USA and Europe. Our commitment and know-how are valued by global companies including Nielsen, L’Oréal, Intel, Nvidia, United Nations, BNP Paribas, Santander, Hitachi and Brainly.

Summary

Close cooperation with an AI vendor allows enterprises to maximize the potential of state-of-the-art technologies and focus on the industry-related aspects of building a competitive advantage. We might go so far as to say that truly remarkable results are not possible without combining in-house and outsourcing models – at least not without investing a great deal of time and significant financial outlays. Cooperation with a reliable AI vendor ensures a number of benefits, and also brings a fresh perspective on the data which is being analyzed.

6 steps to successfully implement AI project


May 18, 2022/in Artificial Intelligence /by deepsense.ai

While enterprises recognize the measurable business benefits of AI adoption, they don’t necessarily see the path to get there. As the Everest Group survey indicates, 3 out of 5 enterprises fail to adopt AI and don’t achieve meaningful business outcomes. Let’s look for the missing key to harnessing the full potential of AI implementation.

In many cases, AI projects in enterprises are approached in the same way as other IT implementations, where focusing on selecting an experienced vendor and creating a precisely defined proof of concept are the key milestones. Projects carried out in this way usually end in the PoC phase and never reach deployment. The search for efficient solutions using data science or machine learning requires a different approach. The CRISP-DM (CRoss Industry Standard Process for Data Mining) methodology becomes a helpful tool. CRISP methodology assumes six stages of running AI projects that when repeated over and over again, lead to the right solution. This concept involves an agile approach, where priorities and pathways change as the project development progresses.

CRISP-DM model


Phase 1 – business understanding

Understanding business needs is a key element of data science and the greatest challenge for AI projects. The further success of a project will depend on how deeply the given business use case is understood. The basis is a well-defined business problem that can be described in the language of data science. That is why at deepsense.ai we focus on close contact with business owners. From the very beginning of cooperation with clients, the entire team dedicated to the project participates in the discussion on business needs, possible solutions, and available data. This ensures that all team members have a broader awareness of the purpose, know the limitations, and are able to contribute both technically and conceptually.

Phase 2 – data understanding

On the basis of in-depth knowledge of the business problem that needs to be solved with the use of AI, we can proceed to the data review phase. This phase requires not only special commitment, but also self-confidence because the data received from the client will not always be able to provide valuable answers to a specific business use case. At deepsense.ai, we focus on full transparency and the readiness to modify the project scope if we see that the data will not provide valuable solutions in a given area.

It is difficult to assume in advance how long the data exploration phase will last, but both our experience and intuition are helpful here.

Phase 3 – data preparation

Once we make sure that the data provided by the client can deliver the expected business value, we are able to start preparing the data set. A very common challenge is not so much the quality of the data set, but problems related to its labeling. That is why at deepsense.ai we pay special attention to data labels and always try to support the customer in this area.

Especially in computer vision, data labeling is an extremely subjective task. It sometimes happens that the subject matter experts do not agree with each other on how to label, or they do it in an unsystematic way. An interesting example was a quality assurance project for visual defect detection that we did for one of our clients. Unfortunately, even the experts involved in the project had difficulty in classifying the precise area and type of defects uniformly. At deepsense.ai we deal with such situations and solve them based on the specificity of the problem or labeling costs, e.g. by preparing a detailed specification, using multiple labeling to obtain consensus, or creating a dedicated application that interactively supports the labeling process.

Phase 4 – modeling

A mixture of thorough experience, efficient exploration, and agile approach is what matters most in this phase. At deepsense.ai we are always focused on optimizing the modeling phase by either proposing a proven approach or – in case of unusual challenges – by exploring many solutions in parallel and their quick selection. The best results are achieved by using a cascade approach, which allows us to spend more time refining the most promising models. We also keep in mind that “the best model is no model” and always try to make things simple whenever possible.

In this phase, we specify milestones and work within a strict regime, so as to provide a solution skeleton in the shortest possible time, the functionalities of which we will then further deepen.

In addition to that, we use neptune.ai – a tool that allows us to monitor experiments and make accurate decisions by tracing all the improvements (such as data preprocessing, model selection and training, validation strategy) and tracking their impact on the end results.

Phase 5 – evaluation

At this stage, we return to the business owners, proposing specific solutions and approaches for selected use cases. It is also the best time to review the project scope from a business perspective. It very often occurs that thanks to the experience and knowledge we have gained, together with the business owners, we discover new possibilities for AI implementation with significant business value.

A good example is a project carried out by deepsense.ai for one of the leading European fashion retailers. As part of our cooperation on online sales prediction, we were able to propose a number of UI solutions for the online shop to better track customers’ experience and preferences.

Phase 6 – deployment

From the client’s point of view, the deployment phase is the most crucial; at this stage the proposed solution delivers tangible business results. Here, we try to actively support the client not only during implementation, but also in monitoring and maintenance operations.

We always have a flexible approach to project maintenance. Very often the volume of data to be analyzed increases over time or the project is extended to other departments in the organization and requires, for example, transferring it from on-prem to the cloud.

Summary

In the most advanced AI projects – which therefore have a chance to become the most innovative – there are no ready-made solutions and approaches. The data science team has to try novel methods, take risks and trust their experience. These are the elements we love the most about our work at deepsense.ai.

Overview of explainable AI methods in NLP


March 29, 2022/in Artificial Intelligence /by Kamil Pluciński

Introduction

In recent years, we have seen rapid development in the field of artificial intelligence, which has led to increased interest in areas that have not often been previously addressed. As AI becomes more and more advanced, beyond model effectiveness, experts are being challenged to understand and retrace how the algorithms came up with their results, and how the models are reasoning and why [Samek and Muller, 2019].

Such knowledge is necessary for many reasons: one is ensuring that AI-driven solutions comply with regulations, particularly in finance. Another is a better understanding of how a model works, which translates into reducing errors, anticipating the strengths and weaknesses of the model, and avoiding unexpected behavior in production. Last but not least, it allows for the creation of models that are inclusive and eliminate the impact of social biases which can appear in the training data. The use of explainable AI (xAI) translates into increased trust and confidence when deploying an AI-powered solution.

The need for an explanation of AI models is especially noticeable in natural language processing tasks. A common part of many solutions in this domain is the use of vector representations of words. As it turns out, these representations also model human biases that are found in the data. Examples of such biases include: gender bias, racial bias, or social bias towards people with disabilities [Hutchinson et al., 2020].

Categorization of Explanations

The approaches used in explainable artificial intelligence can be divided in several ways. Two of the most useful divisions, in both practice and research, are by what is being explained and by how (and at which stage of model usage) the explanation is created.

The first one tells us what we are explaining:

  • local explanation – presents an explanation of one particular decision, e.g. by showing which words in the input example were important
  • global explanation – shows the entire model’s behavior

In this article, we focus on local methods.

The second possible split is based on how the explanation is created:

  • explanation by design – the models that are intrinsically explainable, like decision trees
  • post-hoc explanation – the model is a black-box; however, with post-processing methods it is possible to determine how the decision was made

This taxonomy allows us to organize and characterize the available methods.

Methods walk-through

LIME and SHAP

Let me start by describing the LIME [Ribeiro et al., 2016] and SHAP [Lundberg and Lee, 2017] AI explanation methods, which are examples of post-hoc local explanation algorithms. The idea behind LIME is to create a simpler, interpretable model that approximates the behavior of a complex model in the neighborhood of the analyzed example, and is visualized in the picture below:

Figure 1. Linear approximation of the complex decision boundary.
Source: https://www.kdnuggets.com/2019/12/interpretability-part-3-lime-shap.html

The simpler model is trained on the data coming from the perturbations of the input data, where the ground truth is the result returned by the complex model. In the case of textual data, perturbations are usually made by removing words from the analyzed example. SHAP, on the other hand, leverages game theory and using Shapley Values determines the importance of perturbed examples.

They are very handy to use as they can work with any black-box model and with most data types. Another reason for the growing popularity of this approach is the great SHAP library (https://github.com/slundberg/shap) which is easy to use and provides well designed visualizations. Below, we can see an example of an explanation from SHAP documentation presenting why the model predicted the sentiment of a movie review to be positive.

Firstly, we initialize the pretrained model from Huggingface for predicting sentiment:
[Code screenshot 1: loading the pretrained sentiment model]

Next, we import the SHAP library and initialize the model explainer as well as give it an example to predict and analyze:
[Code screenshot 2: building the SHAP explainer and running it on an example]

Finally, we use the visualization module to transform the resulting values into a graph showing the marginal contribution of each word (in simplified terms, we can think of this as feature importance):
[Code screenshot 3: visualizing the SHAP values]
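For readers without access to the code screenshots, the three steps above correspond roughly to the following sketch, which follows the natural-language example in the SHAP documentation; the pipeline name, arguments, and indexing are taken from that example and may differ across library versions.

```python
import transformers
import shap

# Step 1: a pretrained sentiment model from Hugging Face.
model = transformers.pipeline("sentiment-analysis", return_all_scores=True)

# Step 2: build the explainer and run it on the example sentence.
explainer = shap.Explainer(model)
shap_values = explainer(["What a great movie! ...if you have no taste."])

# Step 3: visualize each word's marginal contribution to the POSITIVE class.
shap.plots.text(shap_values[0, :, "POSITIVE"])
```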

And the result is presented below:

Figure 2. Sentiment analysis explanation from SHAP documentation
Source: https://github.com/slundberg/shap#natural-language-example-transformers

We can see here that the phrase ‘great movie’ had the highest contribution to predicting the sentiment as positive, and that the model was unable to capture the sarcasm.

Unfortunately, no approach is without its flaws. Researchers point out that the explanation actually comes from another (surrogate) model, which may have fidelity problems in reproducing the behavior of the actual model.

Gradient-based explanations

Another group of approaches used in explaining NLP model predictions are methods that analyze the gradients in a neural network. More precisely, they analyze how the score of the selected decision class changes with respect to the input example. Since one backward pass is enough to create a saliency map, and they do not use a surrogate model, they are free of the problem that plagued the perturbation approaches, because they analyze the model explicitly.
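To illustrate the idea, here is a minimal gradient-times-input sketch in PyTorch; `model` and `embeddings` are placeholders for a classifier operating on token embeddings, so this is a generic illustration rather than any particular library’s API.

```python
import torch

def gradient_saliency(model, embeddings: torch.Tensor, target_class: int) -> torch.Tensor:
    """Return one saliency score per token for a classifier over embeddings.

    `embeddings` has shape (seq_len, emb_dim); `model` maps a batch of
    embeddings to class logits. Both are placeholders for illustration.
    """
    embeddings = embeddings.clone().detach().requires_grad_(True)
    logits = model(embeddings.unsqueeze(0))   # (1, num_classes)
    logits[0, target_class].backward()        # a single backward pass is enough
    # Gradient x input, aggregated over the embedding dimension.
    return (embeddings.grad * embeddings).sum(dim=-1).abs()
```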

A great tool that allows simultaneous decision analysis using multiple gradient approaches (and also the LIME algorithm described earlier) is LIT – Language Interpretability Tool (https://pair-code.github.io/lit). It also provides a comprehensive analysis of the model and dataset. The concept of this tool is a little different from SHAP. It works as a standalone web service, which can also be run in Jupyter Notebook.

Below, you can see the script that loads the dataset together with a pretrained model and runs the service in Jupyter Notebook:
[Code screenshot 4: launching LIT in a Jupyter Notebook]
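As a rough reconstruction of the screenshot above (the module paths and the model path are assumptions based on the LIT examples available at the time of writing, so check the current LIT documentation for the exact API):

```python
from lit_nlp import notebook
from lit_nlp.examples.datasets import glue
from lit_nlp.examples.models import glue_models

# SST-2 validation split and a fine-tuned sentiment model (path is a placeholder).
datasets = {"sst_dev": glue.SST2Data("validation")}
models = {"sst_model": glue_models.SST2Model("/path/to/finetuned-sst2-model")}

# Render the LIT UI inside the notebook.
widget = notebook.LitWidget(models, datasets, height=800)
widget.render()
```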

We run the same example as in the SHAP section. The outcome is presented in figure 3:

Figure 3. The result of explaining positive sentiment prediction in text “What a great movie! …if you have no taste.” returned by LIT.

The result clearly shows that the key phrases in the prediction were ’great’ and ’no taste’. Unfortunately, even in this example we can observe a divergence of the contribution of the word ’taste’, which seems to be relevant for correct prediction.

Recent research shows that explanations generated by gradient methods should be used with caution because the largest gradient tends to be concentrated in high frequency areas. In other words, where there is a lot going on in the input example (this is easier to understand in an image example where we have a background and an object in the foreground – the higher frequency will occur in areas of the object contours) [Adebayo et al., 2018]. As an alternative to such approaches, the intrinsically explainable models could be used, in which the decision mechanism is understandable by humans [Rudin, 2019].

Using attention to generate explanation

Recently, approaches that use an attention mechanism have become very popular for xAI in NLP. The attention indicates to the network which words it should focus on. It can be incorporated into many networks, e.g., by adding an attention module between the encoder and decoder in an LSTM, or one can build an entire network based on this mechanism, as in Transformer which is based on self-attention blocks. As it requires a specific architecture it will not work with every model, so it represents a group of methods in which the models explain themselves.

The Language Interpretability Tool also allows you to visualize all the heads and attention layers in the Transformer-based model (BERT) used in the earlier example. An example can be seen below:

Figure 4. The result of explaining attention layers in BERT trained for sentiment classification in the input example “What a great movie! …if you have no taste.” returned by LIT.

Already in this single example, we observe that this mechanism pays a lot of attention to punctuation marks. Researchers also point this out and indicate that the attention mechanism, despite its great performance, does not always focus on data parts that are meaningful from a human’s point of view, and thus may be less plausible [Jain & Wallace, 2019; Wiegreffe & Pinter, 2019]. If you would like to explore a method that uses the attention mechanism but creates a saliency map like previous methods and simultaneously addresses the problem of too much focus on punctuation marks, we encourage you to check out the publication Towards Transparent and Explainable Attention Models by Akash Kumar Mohankumar et al.

Example-based explanation

The example-based explanations are the last group of methods mentioned in this article. There are many approaches here, but what they have in common is showing a data point that is in some way relevant to explaining a particular example. The simplest example might be the Nearest Neighbor algorithm, which identifies the most similar example from the training set and presents it as an explanation. An extension of this approach is prototypes, where the most representative examples in the training set for each class are found and a prediction is made, as in Nearest Neighbor, by analyzing which prototype is most similar to the input example. On the other hand, there are counterfactual explanations, which attempt to change the input example as little as possible so that the classifier’s decision changes, i.e., they implicitly explain the decision boundary.

At this time, to the best of our knowledge, there is no production-quality library for example-based explanation. The simplest approach would be to use the Nearest Neighbor algorithm while encoding texts with the Universal Sentence Encoder [Cer et al., 2018] or another pretrained model. For more advanced examples, we have to turn to more research-oriented methods.
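A minimal sketch of that simplest approach might look as follows; the TF Hub handle and the toy texts are assumptions for illustration, and any sentence encoder could be substituted.

```python
import tensorflow_hub as hub
from sklearn.neighbors import NearestNeighbors

# A toy "training set" standing in for the labeled data.
train_texts = [
    "What a fantastic film, I loved every minute.",
    "Boring plot and terrible acting.",
    "The soundtrack was nice but the story made no sense.",
]

# Universal Sentence Encoder loaded from TF Hub.
encoder = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
train_emb = encoder(train_texts).numpy()

index = NearestNeighbors(n_neighbors=1, metric="cosine").fit(train_emb)

def explain_by_example(text: str) -> str:
    """Return the most similar training example as an explanation."""
    _, idx = index.kneighbors(encoder([text]).numpy())
    return train_texts[idx[0][0]]
```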

An example could be ProSeNet [Ming et al., 2019]. The authors train the neural network to learn to find prototypes that best represent the entire dataset while being very diverse. The model makes predictions like the Nearest Neighbor algorithm, making it intrinsically explainable. For an example of explanation, see the image below:

Figure 5. Explanation presented by ProSeNet [Ming et al., 2019]. It contains an input example, three most similar prototypes together with their similarity to the input example and assignment to a positive class.

Here we can see an input example together with most similar prototypes that were used to make a prediction.

The advantage of example-based approaches is that they usually work with numerous models and explain decisions very well. On the other hand, these methods are usually very simple and do not fit easily into complex pipelines; and in the case of models adapted to be explainable in such approaches (e.g., neural prototype learning like ProSeNet or concept learning), they offer poorer performance than their corresponding black boxes.

Summary

Explaining artificial intelligence is still a developing area, and the available tools are not always ready for use in production. In each project we are driven by individual customer needs, and we tailor solutions to the requirements. In this article, I have shown several methods that are often used to explain NLP models. Each of them has its advantages, but also its limitations. Knowing them allows us to use the appropriate methods depending on our needs.

For those who would like to learn more about xAI, I recommend checking out the book Interpretable Machine Learning by Christoph Molnar.

References

Wojciech Samek and Klaus-Robert Müller. Towards explainable artificial intelligence. CoRR, abs/1909.12072, 2019. URL http://arxiv.org/abs/1909.12072.

Ben Hutchinson, Vinodkumar Prabhakaran, Emily Denton, Kellie Webster, Yu Zhong, and Stephen Denuyl. Social biases in NLP models as barriers for persons with disabilities. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5491–5501, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.487. URL https://aclanthology.org/2020.acl-main.487.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, page 1135–1144, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450342322. doi: 10.1145/2939672.2939778. URL https://doi.org/10.1145/2939672.2939778.

Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, page 4768–4777, Red Hook, NY, USA, 2017. Curran Associates Inc. ISBN 9781510860964.

Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, and Been Kim. Sanity checks for saliency maps. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. URL https://proceedings.neurips.cc/paper/2018/file/294a8ed24b1ad22ec2e7efea049b8737-Paper.pdf.

Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. ISSN 25225839. doi: 10.1038/s42256-019-0048-x. URL https://doi.org/10.1038/s42256-019-0048-x.

Sarthak Jain and Byron C. Wallace. Attention is not Explanation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3543–3556, Stroudsburg, PA, USA, 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1357. URL http://aclweb.org/anthology/N19-1357.

Sarah Wiegreffe and Yuval Pinter. Attention is not not explanation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 11–20, Hong Kong, China, November 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1002. URL https://aclanthology.org/D19-1002.

Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil. Universal sentence encoder. CoRR, abs/1803.11175, 2018. URL http://arxiv.org/abs/1803.11175.

Yao Ming, Panpan Xu, Huamin Qu, and Liu Ren. Interpretable and steerable sequence learning via prototypes. In Ankur Teredesai, Vipin Kumar, Ying Li, Rómer Rosales, Evimaria Terzi, and George Karypis, editors, KDD, pages 903–913. ACM, 2019. ISBN 978-1-4503-6201-6.

AI challenges in retail and manufacturing


February 2, 2022/in Artificial Intelligence /by deepsense.ai

Technological progress and the use of Big Data in business make AI-based solutions increasingly important, but implementing AI in retail and manufacturing is still perceived as walking the cutting edge of innovation. We discuss the main challenges with Ireneusz Prus, Director of Data Monetization at Maspex Group.

What convinced Maspex to implement AI solutions?

Maspex is one of the largest food producers in Central and Eastern Europe. We have long been aware of the business potential hidden in our data, and we have also known that only with advanced AI algorithms would we be able to fully harness it. Supporting our experts with the knowledge provided by artificial intelligence is another step forward in building our competitive advantage.

How did you start the process of implementing AI?

AI can enhance nearly every aspect of our business, so it was pretty challenging to choose one use case that could quickly deliver results and present real business value. As we didn’t have an internal Data Science team, finding an experienced technology partner was the first challenge. We decided to work with deepsense.ai, a leader in AI. Together, we conducted a data audit, which helped us to prepare for the ML implementation.

This led us to develop our first use case around our marketing activities and build a tool for simulating marketing campaigns and measuring their effectiveness. Maspex performs thousands of promotional activities spread across the whole supply chain. So far, promotional campaigns have been carried out solely on the basis of our experts’ knowledge. We suspected that there was a lot of room for optimization, but there was no tool that would identify these campaigns.

What did the cooperation between Maspex and deepsense.ai look like?

Implementing enterprise AI is very much a cross-departmental process. So, in the first stage we discussed – together with different business stakeholders – which external and internal data should be used to achieve our goals. The biggest challenge was to combine available data sources and to establish a uniform set of variables that would best allow us to predict high-impact promotional activities.

Various data sources were integrated to establish the dataset: transaction data, data on promotions and their coexistence, data on competition activities and market trends, and other external factors such as seasonality of sales, weather conditions and variable restrictions related to the pandemic.

deepsense.ai then used that dataset to start training the predictive models. In parallel, models were built based on five different algorithms and ultimately the algorithm that gave the best results was selected. In addition to the predictions related to promotional campaigns, adjunctive algorithms were created to analyze the periods immediately before and after promotions and to evaluate the interactions between different types of promotions.

With deepsense.ai’s support we have developed a system that allows us to define promotion parameters using lists and sliders. We can define promotion, product and target audiences. The parameters are analyzed by the system taking into account the list of evaluated promotions, 12-week-ahead uplift and the impact on market share. As an output, users receive precise predictions of campaign effectiveness and are able to choose the most successful scenario.

What are the next steps?

We plan to further develop the AI implementation process and address the needs of various internal clients. As a market leader, we have multiple sources of data covering the whole retail value chain. It therefore made sense for us to develop our own Data Science team of experts specialized in advanced analytics, which we are currently doing with guidance from deepsense.ai. They will not only have a real impact on our operations but will also have an opportunity to create an innovative AI mindset around our company. The road ahead is full of exciting challenges.

Five questions to answer before hiring an AI vendor


January 17, 2022/in Artificial Intelligence /by deepsense.ai

Many enterprises that consider AI implementation don’t know how to go about it successfully. Companies complain that they don’t have the necessary in-house knowledge or skills, and building a team from scratch is too time-consuming and risky. Because the effective implementation of AI projects requires building interdisciplinary teams – both from business and technology –  executing the process successfully is quite a challenge.

The answer to such a challenge may be to combine in-house and outsourcing models. In one respect, an in-house team provides crucial know-how and knowledge about their company’s operations. An AI vendor, meanwhile, provides cutting-edge knowledge of AI technology and a flexible, innovative approach to realizing the use cases. Ultimately, the vendor’s team becomes an integrated part of the client’s team.  Sounds promising… but before starting cooperation some key questions should be answered.

Where is the business value?

The first element that needs to be decided is which business case the cooperation with the vendor should address. While enterprises are more and more certain about AI’s potential, there is still a lot of hype around it, making it difficult to decide which use case to implement first. It is not only about not losing money on an AI project, but above all about identifying business processes where automation will bring visible business value.

The unique approach and work methodology developed by deepsense.ai allows us to quickly identify the most relevant use cases. Our cooperation with a client always begins with an opportunity discovery phase that helps map and understand the business needs. Joint ideation sessions define the main processes and data sources that can become part of the implementation. This approach makes it possible to efficiently determine the scope of cooperation and set up success metrics.

Who from my team will be involved in the project?

Effective implementation of AI projects requires building interdisciplinary teams. Ones that command extensive expertise in available technologies and approaches to data analysis, while at the same time possessing a thorough understanding of the business processes taking place in the enterprise. They must be able to recognize the factors that influence the core operations. As business challenges go, this one certainly qualifies as daunting. However, it is important to consider who – within the organization – will be able to become a partner for cooperation with an AI vendor. This person, or persons, should not only take care of the smooth development & implementation of the project, but also have a full picture of the business situation within the analyzed use case.

What kind of data can we share with the vendor?

The garbage in/garbage out principle is well known in data science, but it should also be considered from a business stakeholder's perspective. Certainly, most enterprises possess data that can provide valuable insights. However, it is important to review whether the dataset covers all data sources available in the organization, and to assess data quality and consistency. Usually the process of collecting and preparing data for analysis – to the customer's surprise – takes more time than expected. To speed up the process, the deepsense.ai team always begins by reviewing and analyzing the data in close cooperation with the customer's domain experts. At this stage, it is crucial to discuss basic questions and check features and labels.

An additional element worth paying attention to is data security. When working with an external vendor, knowledge of core business processes inevitably extends beyond the enterprise. That's why we always implement adequate security measures, such as encryption, network isolation, and strict data access policies.

How much time should I allow to see results?

Valuable data analysis requires patience. Often, before reaching the final solution, several models are trained simultaneously using different perspectives and approaches. The duration of the cooperation depends on the specifics of the project. Usually it takes several months from the data audit and development plan to the operational rollout and commercialization of the entire project. The first working proofs-of-concept are usually presented within 4 weeks from the start of the project.

In what model do I want to work with a vendor?

At deepsense.ai we have the flexibility to work along two different models. The first one is focused on team augmentation. We work in cooperation with our customers' teams to expand their technical skill set within a broad set of roles. Our data scientists, data engineers, and software engineers are ready to adapt to any specific needs. Our people are equipped with the full tech stack required to help clients reach their milestones quicker, and are ready to work hand in hand with internal business and technical teams.

The second model provides end-to-end project delivery: from identifying the needs to commercial deployment. We also specialize in handling highly customized challenges for which ready-made solutions don’t exist.

Summary

The dynamic development of AI makes it difficult for companies whose core operations are not related to high-end technology to keep up with the latest solutions. Close cooperation with an AI vendor allows them to maximize the potential of state-of-the-art technologies and focus on the industry-related aspects of building competitive advantage. One might go so far as to say that truly remarkable results are not possible without combining the in-house and outsourcing models. At least not without a great deal of time and much larger financial outlays. Cooperation with a reliable AI vendor ensures flexibility and cost control, and also brings a fresh perspective on the data being analyzed.


deepsense.ai for MIT Sloan Management Review about AI in service of balanced growth

November 26, 2021/in Artificial Intelligence /by deepsense.ai

The pursuit of sustainable enterprise growth has accompanied the rising interest in optimizing the use of global resources. Companies know that the proper management of global resources today isn’t strictly a question of ethics, but can also have a very real impact on building competitiveness. Sustainable growth is becoming a leading trend in the business world, one that can be supported by AI and advanced data analysis.

In business, basing decisions on reliable, real data can be a key means to achieving sustainable growth. AI has made it possible to monitor and analyze huge amounts of data, from various sources, more or less in real time. This provides a full view of the current situation and enables flexible action.

In the pursuit of sustainable resource use, advanced algorithms can promote better decision-making and process optimization in numerous key areas including not only supply chains, transport, and production processes, but also building value for society and the environment.

Balanced supply chains and transport

Lack of control over extensive global supply chains takes a toll on the environment and sustainable trade, and also costs companies money. Fortunately, AI has made it possible to monitor and optimize, on an ongoing basis, all aspects of the acquisition and transport of resources. For its part, deepsense.ai helps its clients optimize their supply chains in a number of ways, including providing systems that help minimize empty mileage in road transport. Machine learning models analyze the location of transport loads, selecting optimal routes, maximizing the load capacity of trucks and minimizing the number of kilometers they travel. AI not only increases transport efficiency, but also helps companies reduce their carbon footprint.

Optimizing production resources

The optimal use of production resources is becoming another key aspect of sustainability. It offers measurable benefits in the form of higher operational efficiency. Factories that have adopted a zero-waste policy – completely eliminating waste from production lines or their use in subsequent production phases – meet the demand for resource optimization in its most extensive form. But a sustainable approach to production doesn’t have to be so revolutionary in scope. The deepsense.ai team works with clients on various aspects of production and resource optimization through intelligent quality monitoring. One area of focus is real-time monitoring of production lines using a combination of computer vision technology and advanced data analysis. The systems analyze products for defects and impurities, appropriate shape and distribution and a range of other anomalies. This helps companies identify and minimize losses in the early stages of production while maximizing their resource use.

Boosting employee safety is yet another important aspect of sustainable production. In response to this challenge, deepsense.ai created a system for monitoring the safety of factory employees based on the analysis of camera images. Trained neural networks analyze the situation on an ongoing basis, identifying potential threats such as violations of security rules or the sudden emergence of new threats in high-danger zones. When a threat is detected, the system automatically sends alerts and suggests specific preventive actions.

Building value for society and the environment

Another way companies engage in sustainable development is by building the company's value through contributions to society. Here advanced data analysis can be used in a number of ways. An interesting example is a challenge deepsense.ai took up in 2016 as part of an analytical competition announced by the US National Institute of Justice. The task involved predicting the locations of possible crimes in the city of Portland, Oregon, by identifying crime hotspots – small areas with the highest predicted crime rate. Our team's results were impressive enough to take first place.

AI-based systems can also be employed to help protect the environment. The image recognition-based early warning system deepsense.ai created for detecting potential forest fires is a good example of such a system. The machine learning model automatically recognizes smoke in photos from cameras placed in the forest and provides predictions about the locations of possible fires. The solution not only increases the accuracy of the early warning system for fire hazards, but also enables a much faster reaction from fire departments, potentially eliminating the threat at an early stage.

Summary

The use of advanced data analysis and artificial intelligence can help companies and other organizations maximize their achievement of sustainable development goals, build competitive advantage and create a better, safer world. The COVID-19 pandemic has highlighted how the lack of a sustainable approach can disrupt business continuity. The ideas that define sustainable development have gained in importance and will become a permanent feature of companies' strategies. However, in the pursuit of optimization through the use of the latest technologies, striking a balance between the outcomes we anticipate and the environmental and economic costs of their implementation is crucial. Ever aware of the importance of this balance, deepsense.ai produces solutions that draw on the latest technologies and an optimized, agile model training process.

The article was published in the Polish edition of the MIT Sloan Management Review, in November 2021.

deepsense.ai for AI Trends about successful enterprise AI implementation

July 12, 2021/in Artificial Intelligence /by deepsense.ai

There is no question that the enterprises that successfully leverage AI’s potential will be the ones to get ahead. While businesses are more and more certain about the goals they want to use AI for, how to execute and deploy it successfully remains a difficult question.

Enterprise AI – evolution not revolution

In today's world, the dynamics and efficiency of operations are crucial factors, and data is the key to flourishing in the rapidly changing business reality. Data provides insights from various areas of business activity. Solutions based on artificial intelligence or machine learning not only support the analysis of huge amounts of data, but also provide a new approach to optimizing and automating core processes. Enterprises across the full spectrum of industries can build competitive advantage based on AI. Such a strategy need not come down to one-off AI implementations. Rather, the most successful will take a holistic approach, treating data and high-end technologies as core organizational assets. From this perspective, enterprise AI can be considered the next, natural stage in the digital transformation of enterprises.

New approach to AI team augmentation 

Many companies want to implement the enterprise AI approach, but don’t know how to go about it successfully. Companies complain that they don’t have the necessary in-house knowledge or skills, and building a team from scratch is too time-consuming and risky. In fact, the effective implementation of AI projects requires building interdisciplinary teams – ones that command extensive expertise in available technologies and approaches to data analysis, while at the same time possessing a thorough understanding of the business processes taking place in the enterprise. They must be able to recognize the factors that influence the core processes. As business challenges go, this one certainly qualifies as daunting.

The answer to such a challenge may be to combine in-house and outsourcing models. In one respect, an in-house team provides crucial know-how and knowledge about the company's operations. The outsourcing partner, meanwhile, provides cutting-edge knowledge of AI technology and a flexible, innovative approach to realizing the use cases. Ultimately, the vendor's team becomes an equal member of the client's team. Such an approach may seem to have a drawback, as knowledge of core business processes inevitably extends beyond the enterprise. However, the benefits far outweigh that disadvantage. One might go so far as to say that truly remarkable results are not possible without combining the in-house and outsourcing models. At least not without a great deal of time and much larger financial outlays. Cooperation with a reliable outsourcing partner ensures flexibility and cost control, and it also brings a fresh perspective to the data being analyzed.

deepsense.ai’s holistic approach to enterprise AI transformation 

The unique approach and work methodology developed by deepsense.ai maximizes synergy. Our cooperation with the in-house client team always begins with workshops that help map and understand the business needs. Joint ideation sessions define the main processes, data sources and use cases that can become part of the implementation. The deepsense.ai team then reviews and analyzes the data in close daily cooperation with the customer's team. This model was implemented in cooperation with our client, the retail research leader Nielsen. Working alongside Nielsen's experts across the globe maximized the synergy between our two companies. Together we developed an advanced AI solution to automate the extraction of content from images with varying lighting conditions, viewing angles and quality. After implementing the solution, we conducted technical workshops for Nielsen's team to transfer the know-how and code. We also provided assistance with cloud readiness and scaling.

Summary

The dynamic development of technology makes it difficult for companies whose core operations are not related to high-end technology to keep up with the latest solutions. Close cooperation with a technology partner allows them to maximize the potential of state-of-the-art technologies and focus on the industry-related aspects of building competitive advantage.


deepsense.ai for MIT Sloan Management Review about AI in platform businesses

June 22, 2021/in Artificial Intelligence /by deepsense.ai

Businesses based on the platform model are gaining in popularity. Their potential lies not only in a geographically unlimited, multi-level network of value exchange. The platforms also provide enormous amounts of data on customer purchasing preferences, vendors' sales opportunities and new market trends. In-depth analysis of this data allows for the proper implementation of a development strategy and facilitates scaling.

The key to fully exploiting the potential of platform businesses is to build a unique customer experience that translates into a growing base of loyal users.

Advanced analytics based on artificial intelligence and machine learning enables this unique customer experience through hyper-personalization in the most crucial areas including individualized offers and content, dynamic recommendations and service automation.

Hyper-personalization of sales and dynamic recommendations

A key consideration in building a competitive advantage in the platform model is sales personalization. However, customers expect more than traditional personalization based on historical data analysis.

To maintain customers’ interest, they must be provided with the highest quality interactions with the platform. This can be done by adjusting the offer to their individual purchasing preferences “here and now”. The availability of tools based on cognitive technologies and artificial intelligence provides a transition from basic personalization to hyper-personalization, enabling the real-time analysis of customer needs. An example of an innovative approach to personalized recommendations is the intelligent, virtual fitting room deepsense.ai developed. Customers can upload their photo to the application, which simulates their appearance in the garments of their choice. The system can also propose a specific size and adjust complementary styling elements. Multi-stage data analysis based on generative adversarial neural networks takes the customer to a new level of interaction with the offer.

Advanced machine learning models also make it possible to optimize recommendation engines and transform recommendations into a personalized offer tailored to the needs of a specific customer.

Data analysis facilitates not only tracking customers during their journey through the successive stages of the sales funnel, but also predicting the specific moment of conversion. This gives a business the opportunity to immediately respond to the client's needs and propose a highly personalized offer. An example of this approach is an AI solution developed by deepsense.ai for one of the leading European banks, where a machine learning model creates personalized offers for customers with an efficiency exceeding forty times that of the baseline approach. The algorithm automatically searches the database to identify customers matching a specific profile, individually selects the elements of the offer and analyzes customers' readiness to buy.

Content hyper-personalization and intelligent search

Highly personalized content makes it possible to build a customer relationship with the platform and maintain customer loyalty, providing each user with a unique experience.

Leading brands like Facebook, Amazon, Spotify and Starbucks have long been taking content personalization to a new, higher level using predictive personalization. AI has made it possible to extract additional information about customer preferences in real time. Taking into account the relevant data, the models are able to predict what type of content a given client will most likely engage with.

Intelligent search also supports content hyper-personalization. Deep learning elevates search from a simple understanding of keywords to an understanding of intent and context.

Understanding the semantics and real meaning of user queries is essential to intelligent search. Neural networks learn to correctly interpret queries and surface the information users expect. Such solutions facilitate effective content search, help users navigate platforms with complex structures and ensure that customers quickly reach the topics they are interested in. An example application of this technology is a system created by deepsense.ai for a platform offering access to scientific literature. To equip users with efficient content search, natural language processing models that recommend items with similar topics are applied. Thanks to this, the platform's clients quickly find the materials they are interested in.
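
The models behind that platform are not public, but the underlying idea of representing texts as vectors and recommending the nearest items can be shown with a minimal sketch. The snippet below uses TF-IDF features and cosine similarity from scikit-learn; the document list, the recommend helper and its parameters are illustrative assumptions rather than the production setup.

```python
# Minimal topic-based recommendation sketch (assumed setup, not the
# production system): represent documents as TF-IDF vectors and return
# the items most similar to a free-text query.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Graph neural networks for molecular property prediction",
    "Self-supervised pretraining of vision transformers",
    "Bayesian optimization for hyperparameter tuning",
]

vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(documents)

def recommend(query: str, top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors).ravel()
    return [documents[i] for i in scores.argsort()[::-1][:top_k]]

print(recommend("transformer models for images"))
```

In practice, TF-IDF would typically be replaced by learned sentence embeddings and an approximate nearest-neighbor index, but the retrieval logic stays the same.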

Process automation

Intelligent tools that automate the sales process keep the user engaged and support conversion. But it is not only about simple solutions for improving customer service.

Algorithms make it possible to efficiently guide the customer through the purchase process by providing dedicated information learned from how previous users were served. Additionally, by eliminating manual operations, they enable real-time customer service. An instance of such automation can be found in the deep neural networks deepsense.ai created to power an intelligent system for reporting and assessing motor insurance claims. Thanks to advanced image recognition algorithms, the system identifies car parts and classifies those that have been damaged. It then automatically assesses the damage and estimates the cost of repair. This solution helps estimate the value of the claim within a few seconds based on photo documentation sent by the customer.

Summary

Platform businesses are a response to the pandemic reality that has forced customers to push deeper into the world of online shopping. The platforms have developed explosively. Going forward, the dynamics of this development will be dictated by the speed with which platforms respond to individual customer needs and advances in data analysis.

The article was published in the Polish edition of the MIT Sloan Management Review, in June 2021.

AI-based applications in the insurance industry

May 17, 2021/in Artificial Intelligence /by deepsense.ai

Innovative leaders from the insurance industry are starting to recognize the benefits of machine learning and deep learning applications. As business approaches to using AI-based solutions evolve, concerns about the transparency and explainability of AI models are also diminishing. As trust in these solutions grows, AI offers a new look at the product and service portfolio, as well as ways to improve the sales process itself to best suit customer needs. As predicted by McKinsey experts, the technological revolution in the insurance industry has already started, and its progressive development will lead to a massive technological shift over the next decade. [1]

AI-based customer service automation

One area where insurance leaders look to implement AI solutions is customer service automation. Chatbots and biometric identification are becoming permanent features of the customer service landscape.
Chatbots automate customer service, improving customer experience and satisfaction. Behind the scenes, automation significantly reduces customer service workloads. As insurance industry leaders have indicated, over 90% of customer service engagements follow basic patterns, leaving excellent potential for automation while retaining high customer satisfaction.
Biometric identification supports explicit or unnoticed identity verification within remote channels. This can include voice identity verification in call centers or typing-pattern verification in online channels. Similar solutions are poised to gain traction, particularly in the post-COVID world.

Back-office optimization

Another area where AI-based applications are highly effective is back-office optimization. The documents and forms processed in the insurance industry are voluminous, so improvements in this area quickly bring noticeable results. With this in mind, deepsense.ai developed an insurance claims assessment system for one of the largest financial institutions in Eastern Europe. Employing advanced computer vision algorithms, the system recognizes car parts, classifies which are broken, assesses the severity of the damage and estimates the cost of repairs. It enables nearly real-time assessment of claim value based on image documentation sent by the client. With 70% of claims handled automatically, the manual work required in the process is significantly reduced, while the time needed to assess damage falls from days to seconds.

deepsense.ai claims assessment system
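
The production pipeline is not public, but its core image-classification building block can be sketched. Below is a hypothetical damaged/undamaged classifier built on a pretrained torchvision ResNet-18; the two-class setup, class meanings and helper function are assumptions for illustration, and the new head would still need fine-tuning on labeled claim photos before it could be used.

```python
# Sketch of a claim-photo classifier (illustrative assumptions, not the
# deployed system): a pretrained backbone with a replaced two-class head.
import torch
import torch.nn as nn
from PIL import Image
from torchvision import models, transforms

NUM_CLASSES = 2  # assumed labels: damaged / undamaged

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new, untrained head
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def classify(photo_path: str) -> int:
    """Return the predicted class index for a single claim photo."""
    image = preprocess(Image.open(photo_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return int(model(image).argmax(dim=1))
```

A full solution would add part detection (e.g. an object-detection or segmentation model) and a repair-cost estimator on top of such a classifier.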

Automating the analysis of text and image documentation can be widely applied in insurance, including in cash operations, trade finance and insurance application processing, as well as in classifying incoming emails for routing to the appropriate department and in customer segmentation.

Customer insights

Applying deep learning to customer analytics makes it easier to combine insights from various data sources (e.g. transactions, online banking logs, call center interactions), build a 360° customer view and personalize sales activities. This helps to better understand customers and to build personalized recommendations, making the business more responsive and efficient. For one client in the banking sector, deepsense.ai designed a machine learning model that created personalized recommendations, showing what should be proposed to a given client based on the type of customer they are. The model had a 93% recall rate, which is 60% more effective than traditional product recommendation techniques.
Thanks to accurate AI algorithms, we can also improve customer retention by predicting churn probability. This is important as customers often churn without obvious warning signs. The roots of deepsense.ai's experience with anti-churn use cases go back to the banking industry. We developed an intelligent anti-churn system based on historical internal data and external databases for a leading financial institution in Central Europe. To identify risk groups that were more likely to churn, we leveraged multiple data sources and gradient boosting trees, and also evaluated customer lifetime value. The results were used to prioritize communication with highlighted clients, 50% of whom confirmed they were considering leaving the bank.
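
The feature set and model configuration from that engagement are not public; the sketch below only illustrates churn scoring with gradient boosting trees on a synthetic, placeholder dataset, with the top-risk cut-off chosen arbitrarily.

```python
# Churn-scoring sketch with gradient boosting trees (synthetic placeholder
# data; a real system would use engineered features from transactions,
# product holdings, activity logs, and external databases).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 8))      # placeholder customer features
y = rng.integers(0, 2, size=1_000)   # placeholder churn labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(n_estimators=200, max_depth=3)
model.fit(X_train, y_train)

# Probability of churn per customer, used to prioritize retention outreach.
churn_risk = model.predict_proba(X_test)[:, 1]
top_risk = churn_risk.argsort()[::-1][:50]  # e.g. the 50 riskiest customers
```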

Insurance risk management

Machine learning enables nearly fully automated fraud detection, adapted to individual patterns and changing behaviors. It can be applied in areas where a large volume of events needs to be analyzed in real time. Our client, a leading CEE insurance company, suspected that some end customers were abusing access to private healthcare. deepsense.ai was tasked with analyzing data in search of anomalies and spotting the hallmarks of fraudulent transactions. With the knowledge it gathered, the team developed algorithms identifying common schemes and techniques of private healthcare abuse. The schemes included excessive medical diagnostics and exploiting flaws in billing systems. The team also identified potential fraud committed by service providers abusing their agreements with the health insurance company. The model we delivered spots suspicious activities and has enabled the company to reduce losses by over 3M EUR annually.
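
The features and algorithms used in that project are likewise not public; as one simple way to illustrate anomaly-based screening, the sketch below runs scikit-learn's IsolationForest over a placeholder matrix of claim features (amounts, visit counts, procedure counts). The contamination rate and feature set are assumptions for illustration only.

```python
# Anomaly-based fraud screening sketch (illustrative only): flag claim or
# billing records that look unusual compared with the bulk of the data.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
claims = rng.normal(size=(5_000, 5))  # placeholder claim feature matrix

detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(claims)

scores = detector.decision_function(claims)             # lower = more anomalous
flagged = np.where(detector.predict(claims) == -1)[0]   # records for review
print(f"{len(flagged)} records flagged for manual review")
```

Flagged records would then go to domain experts, whose feedback can later feed a supervised model once enough confirmed fraud cases accumulate.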

Summary

A wide range of ML applications is increasingly being used to solve real business problems in insurance. As AI becomes more popular, these applications will become the market standard. Nevertheless, successfully leveraging the new opportunities offered by cutting-edge technologies will require insurers to undertake a shift in corporate governance. If applied successfully, AI-based solutions will benefit insurers and clients alike.

[1] https://www.mckinsey.com/industries/financial-services/our-insights/insurance-2030-the-impact-of-ai-on-the-future-of-insurance
