Key findings from the International Conference on Learning Representations (ICLR)

Trends and fads in machine learning – topics on the rise and in decline in ICLR submissions

October 24, 2019/in Blog posts, Machine learning /by Michał Kustosz and Błażej Osiński

ICLR (The International Conference on Learning Representations) is one of the most important international machine learning conferences. Its popularity is growing fast, putting it on a par with conferences such as ICML, NeurIPS and CVPR.

The 2020 conference is slated for April 26th, but the submission deadline has already come and gone: 2585 publicly available papers were submitted, about a thousand more than were featured at the 2019 conference.

The second law of paper-dynamics tells us that the number of submitted papers will reach 100,000 within 24 years. That's some serious growth!

We analyzed abstracts and keywords of all the ICLR papers submitted within the last three years to see what’s trending and what’s dying out. Brace yourselves! This year, 28% of the papers used or claimed to introduce state-of-the-art algorithms, so be prepared for a great deal of solid machine learning work!

“Deep learning” – have you heard about it?

To say you use deep learning in Computer Vision or Natural Language Processing is like saying fish live in water. Deep learning has revolutionized machine learning and become its underpinning. It's present in almost all fields of ML, including less obvious ones like time series analysis or demand forecasting. This may be why the number of references to deep learning in keywords actually fell – from 19% in ‘18 to just 11% in ‘20. It's just too obvious to acknowledge.

Deep learning

A revolution in network architecture?

One of the hottest topics this year turned out to be Graph Neural Networks. A GNN is a deep learning architecture for graph-structured data. These networks have proved tremendously helpful in applications in medicine, social network classification and modeling the behavior of dynamic interacting objects. The rise of GNNs is unprecedented, from 12 papers mentioning them in ‘18 to 111 in ‘20!
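To give a flavor of what such architectures do, here is a minimal, generic message-passing layer in PyTorch – an illustration of the idea only, not any particular ICLR submission. Each node aggregates its neighbors' feature vectors through the adjacency matrix and passes the result through a learned transformation:

import torch
import torch.nn as nn

class SimpleGraphConv(nn.Module):
    """A minimal message-passing layer: aggregate neighbor features, then transform."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, node_features, adjacency):
        # node_features: (num_nodes, in_features), adjacency: (num_nodes, num_nodes)
        adjacency = adjacency + torch.eye(adjacency.size(0))   # add self-loops
        degree = adjacency.sum(dim=1, keepdim=True)            # normalize by node degree
        aggregated = (adjacency @ node_features) / degree
        return torch.relu(self.linear(aggregated))

# Toy graph: 4 nodes with 3 features each.
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 1.],
                    [0., 1., 0., 0.],
                    [0., 1., 0., 0.]])
x = torch.randn(4, 3)
layer = SimpleGraphConv(3, 8)
print(layer(x, adj).shape)  # torch.Size([4, 8])

Stacking a few such layers lets information flow across the graph, which is what makes GNNs useful for the applications mentioned above.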

graph neural network

All Quiet on the GAN Front

The next topic has been extremely popular in recent years. But what has been called ‘the coolest idea in machine learning in the last twenty years’ has quickly become heavily exploited. Generative Adversarial Networks can learn to mimic any distribution of data, creating impressive, never-before-seen artificial images. Yet they are on the decline, despite being prevalent in the media (deep fakes).
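For readers new to the idea, the adversarial setup itself is compact: a generator turns noise into samples while a discriminator learns to tell real from fake. Below is a toy sketch of that training loop on one-dimensional data – an illustration only, far from a production image GAN:

import torch
import torch.nn as nn

# Toy task: learn to mimic samples drawn from a normal distribution centered at 4.
def real_samples(n):
    return 4 + 1.5 * torch.randn(n, 1)

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = real_samples(64)
    fake = generator(torch.randn(64, 8))

    # The discriminator learns to label real samples 1 and generated samples 0.
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # The generator learns to make the discriminator output 1 for its fakes.
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

# After training, generated samples should cluster around 4.
print(generator(torch.randn(1000, 8)).mean().item())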

generative adversarial networks

Leave designing your machine learning to… machines

Finding the right architecture for your neural network can be a pain in the neck. Fear not, though: Neural Architecture Search (NAS) will save you. NAS is a method of building network architecture automatically rather than handcrafting it. It has been used in several state-of-the-art algorithms improving image classification, object detection or segmentation models. The number of papers on NAS increased from a mere five in ‘18 to 47 in ‘20!

neural architecture

Reinforcement learning – keeping stable

The percentage of papers on reinforcement learning has remained more or less constant. Interest in the topic remains significant – autonomous vehicles, AlphaStar’s success in playing StarCraft, and advances in robotics were all widely discussed this year. RL is a stable branch of machine learning, and for good reason: future progress is widely anticipated.

reinforcement learning

What’s next?

That was just a sample of machine learning trends. What will be on top next year? Even the deepest neural network cannot predict it. But interest in machine learning is still on the rise, and the researchers are nothing if not creative. We shouldn’t be surprised to hear about groundbreaking discoveries next year and a 180-degree change in the trends.

To see a full analysis of the trends in papers across the last three conferences, click the image below:

ICLR most popular keywords papers

 

AI Monthly Digest #13 – an unexpected twist for the stock image market

October 7, 2019/in Blog posts, Machine learning, AI Monthly Digest /by Konrad Budek and Arkadiusz Nowaczynski

September brought us two interesting AI-related stories, both with a surprising social context.

Despite its enormous impact on our daily lives, Artificial Intelligence (AI) is often still regarded as too hermetic and obscure for ordinary people to understand. As a result, an increasing number of people use Natural Language Processing-powered personal assistants, yet only a tiny fraction try to understand how they work and how to use them effectively. This makes them somewhat of a black box.

Making the field more comprehensible and accessible is one aspect of  AI researchers’ mission. That’s why research recently done by OpenAI is so interesting.

Hide-and-Seek – the reinforcement learning way

Reinforcement learning has delivered inspiring and breathtaking results. The technique is used to train the models behind autonomous cars and to control sophisticated devices like automated arms and robots.

Unlike in supervised learning, a reinforcement learning model learns by interacting with the environment. The scientist can shape its behavior by applying a policy of rewards and punishments. The mechanism is close to that which humans use to learn.
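As a bare-bones illustration of that loop – a toy, made-up one-dimensional environment with tabular Q-learning, not the setup used in the research described below – rewards and punishments gradually shape which action the agent prefers in each state:

import random

class ToyEnvironment:
    """A made-up 1-D world: start at 0, reach +5 for a reward, hit -5 and fail."""
    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):  # action: -1 (step left) or +1 (step right)
        self.position += action
        done = abs(self.position) >= 5
        reward = 1.0 if self.position >= 5 else (-1.0 if done else 0.0)
        return self.position, reward, done

q = {}                                # estimated value of each (state, action) pair
alpha, gamma, epsilon = 0.5, 0.9, 0.1
env = ToyEnvironment()

for episode in range(500):
    state, done = env.reset(), False
    while not done:
        # Mostly exploit current estimates, sometimes explore.
        if random.random() < epsilon:
            action = random.choice([-1, 1])
        else:
            action = max([-1, 1], key=lambda a: q.get((state, a), 0.0))
        next_state, reward, done = env.step(action)
        # Q-learning update: the reward (or punishment) nudges the value estimate.
        best_next = max(q.get((next_state, a), 0.0) for a in [-1, 1])
        target = reward + (0.0 if done else gamma * best_next)
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (target - old)
        state = next_state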

Reinforcement learning has been used to create superhuman agents that go toe-to-toe against human masters in chess, Go and StarCraft. Now OpenAI, the company behind the GPT-2 model and several other breakthroughs in AI, has created agents that play a version of hide-and-seek, that most basic and ageless of children’s games.

OpenAI researchers divided the agents into two teams, hiders and seekers, and provided them with a closed environment containing walls and movable objects like boxes and ramps. Either team could “lock” these items to make them unmovable for the opposing team. The teams developed a set of strategies and counter-strategies in a bid to successfully hide from or seek out the other team. The strategies included:

  • Running – the first and least sophisticated ability, enabling one to avoid the seekers.
  • Blocking passages – the hider could block passages with a box in order to build a safe shelter.
  • Using a ramp – to get over a wall or a box, the seeker team learned to use a ramp to jump over an obstacle or climb onto a box and spot the hider.
  • Blocking the ramp – to prevent the seekers from using the ramp to climb the box, the hiders could block access to the ramp. The process required a great deal of teamwork, which was not supported by the researchers in any way.
  • Box surfing – a strategy developed by seekers who were basically exploiting a bug in the system. The seekers not only jumped on a box using a ramp that had been blocked by the hiders, but also devised a way to move it while standing on it.
  • All-block – the ultimate hider-team teamwork strategy of blocking all the objects on the map and building a shelter.

The research delivered, among other benefits, a mesmerizing visual of little agents running around.

Why does it matter?

The research itself is neither groundbreaking nor breathtaking. From a scientific and developmental point of view, it looks like little more than elaborate fun. Yet it would be unwise to consider the project insignificant.

AI is still considered a hermetic and difficult field. Showing the results of training in the form of friendly, entertaining animations is a way to educate society on the significance of modern AI research.

Also, the animations give journalists something inspiring to write about and may lead young people to take an interest in AI-related career paths. So while the research has brought little if any new knowledge, it could well end up spreading knowledge of what we already know.

AI-generated stock photos available for free

Generative Adversarial Networks have proved to be insanely effective at delivering convincing images not only of hamburgers and dogs, but also of human faces. The pace of progress is breathtaking indeed. Not even a year ago, the eerie “first AI-generated portrait” was sold at auction for nearly half a million dollars.

Now, generating faces of non-existent people is as easy as generating any other fake image – a cat, hamburger or landscape. To prove that the technology works, the team behind the 100K faces project delivered a hundred thousand AI-generated faces to use for any stock purpose, from business brochures to flyers to presentations. Future use cases could include on-the-go image generators that, powered by a demand forecasting tool, provide the image that best suits demand.

More information on the project can be found on the team’s Medium page.

Why does it matter?

The images added to the free images bank are not perfect. With visible flaws in a model’s hair, teeth or eyes, some are indeed far from it. But that’s nothing a skilled graphic designer can’t handle. Also, there are multiple images that look nearly perfect – especially when there are no teeth visible in the smile.

Many photos are good enough to provide a stock photo as a “virtual assistant” image or to fulfill any need for a random face. This is an early sign that professional models and photographers will see the impact of AI in their daily work sooner than expected.

5 examples of the versatility of computer vision algorithms and applications

July 25, 2019/in Blog posts, Machine learning /by Konrad Budek

Computer vision enables machines to perform once-unimaginable tasks like diagnosing diabetic retinopathy as accurately as a trained physician or supporting engineers by automating their daily work. 

Recent advances in computer vision are providing data scientists with tools to automate an ever-wider range of tasks. Yet companies sometimes don’t know how best to employ machine learning in their particular niche. The most common problem is understanding how a machine learning model will perform its task differently than a human would.

What is computer vision?

Computer vision is an interdisciplinary field that enables computers to understand, process and analyze images. The algorithms it uses can process both videos and static images. Practitioners strive to deliver a computer version of human sight while reaping the benefits of automation and digitization. Sub-disciplines of computer vision include object recognition, anomaly detection, and image restoration. While modern computer vision systems rely first and foremost on machine learning, there are also trigger-based solutions for performing simple tasks.

The following case studies show computer vision in action.

Diagnosing diabetic retinopathy

Diagnosing diabetic retinopathy usually takes a skilled ophthalmologist. Obesity is on the rise globally, and with it the threat of diabetes. As the World Bank indicates, obesity is a threat to world development – among Latin America’s countries, only Haiti has an average adult Body Mass Index below 25 (the upper limit of the healthy weight range). With rising obesity comes a higher risk of diabetes – obesity is believed to account for 80-85% of the risk of developing type 2 diabetes. The result is a skyrocketing need for proper diagnostics.

What is the difference between these two images?


The one on the left has no signs of diabetic retinopathy, while the other one has severe signs of it.

By applying algorithms to analyze digital images of the retina, deepsense.ai delivered a system that diagnosed diabetic retinopathy with the accuracy of a trained human expert. The key was in training the model on a large dataset of healthy and non-healthy retinas.

AI movie restoration

The algorithms trained to find the difference between healthy and diseased retinas are equally capable of spotting blemishes on old movies and making the classics shine again.

Recorded on celluloid film, old movies are endangered by two factors: the fading technology needed to play the tapes and the nature of the tape itself, which degrades with age. Moreover, digitizing a movie is no guarantee of flawlessness, as the process itself introduces new damage.

However, when trained on two versions of a movie – one with digital noise and one that is perfect – the model learns to spot the disturbances and remove them during the AI movie restoration process.
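A minimal sketch of that idea – an illustrative toy with a made-up damage function, not deepsense.ai's production model – pairs artificially damaged frames with their clean originals and trains a small convolutional network to undo the damage:

import torch
import torch.nn as nn

# A small fully convolutional network: damaged frame in, restored frame out.
restorer = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(restorer.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def add_synthetic_damage(frames):
    """Simulate noise and scratches; real pipelines use richer degradations."""
    noisy = frames + 0.1 * torch.randn_like(frames)
    noisy[:, :, :, ::16] = 1.0          # a white vertical "scratch" every 16 columns
    return noisy.clamp(0, 1)

for step in range(100):
    clean = torch.rand(8, 3, 64, 64)    # stand-in for a batch of clean frames
    damaged = add_synthetic_damage(clean)
    loss = loss_fn(restorer(damaged), clean)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()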

Digitizing industrial installation documentation

Another example of the push towards digitization comes via industrial installation documentation. Like films, this documentation is riddled with inconsistencies in the symbols used, which can get lost in the myriad of lines and annotations that end up on the drawings and must be made sense of by humans. Digitizing a document that takes a skilled engineer up to ten hours of painstaking work can be reduced to a mere 30 minutes thanks to machine learning.

Building digital maps from satellite images

Despite their seeming similarities, satellite images and fully-functional maps that deliver actionable information are two different things. The differences are never as clear as during a natural disaster such as a flood or hurricane, which can quickly, if temporarily, render maps irrelevant.

deepsense.ai has also used image recognition technology to develop a solution that instantly turns satellite images into maps, replete with roads, buildings, trees and the countless obstacles that emerge during a crisis situation. The model architecture we used to create the maps is similar to those used to diagnose diabetic retinopathy or restore movies.

Check out the demo:


Click below to play with our demo:

https://sat-segmentation.demo.deepsense.ai/

Aerial image recognition

Computer vision techniques can work as well on aerial images as they do on satellite images. deepsense.ai delivered a computer vision system that supports the US NOAA in recognizing individual North Atlantic Right whales from aerial images.

With only about 411 whales alive, the species is highly endangered, so it is crucial that each individual be recognizable so its well-being can be reliably tracked. Before deepsense.ai delivered its AI-based system, identification was handled manually using a catalog of the whales. Tracking whales from aircraft above the ocean is monumentally difficult as the whales dive and rise to the surface, the telltale patterns on their heads obscured by rough seas and other forces of nature.

Bounding box produced by the head localizer

These obstacles made the process both time-consuming and prone to error. deepsense.ai delivered an aerial image recognition solution that improves identification accuracy and takes a mere 2% of the time the NOAA once spent on manual tracking.

The deepsense.ai takeaway

As the above examples show, computer vision is today an essential component of numerous AI-based solutions. When combined with natural language processing, it can be used to read the ingredients from product labels and automatically sort them into categories. Alongside reinforcement learning, computer vision powers today’s groundbreaking autonomous vehicles. It can also support demand forecasting and function as a part of an end-to-end machine learning manufacturing support system.

The key difference between human vision and computer vision is the domain of knowledge behind data processing. Machines find no difference in the type of image data they process, be it images of retinas, satellite images or documentation – the key is in providing enough training data to allow the model to spot if a given case fits the pattern. The domain is usually irrelevant.


AI movie restoration – Scarlett O’Hara HD

July 18, 2019/in Blog posts, Machine learning /by Konrad Budek

With convolutional neural networks and state-of-the-art image recognition techniques it is possible to make old movie classics shine again. Neural networks polish the image, reduce the noise and apply colors to the aged images. 

The first movies were created in the late nineteenth century with celluloid photographic film used in conjunction with motion picture cameras.

Skip ahead to 2018, when the global movie industry was worth $41.7 billion. Serving entertainment, cultural and social purposes, films are a hugely important heritage to protect. And that’s not always easy, especially considering that modern movies are produced and screened digitally, with the technology of celluloid tape fading into obsolescence.

Challenges in film preservation

The challenge and importance of preserving the cultural heritage of old movies has been underscored by numerous organizations, including the European Commission, which noted that a lack of proper devices to play aging technology on could make it impossible to watch old films.

In deepsense.ai’s experience with restoring film, the first challenge is to remove distortions. Classics are usually recorded in low resolution while the original tapes are obviously aged and filled with noise and cracks. Also, the transition process from celluloid tape to digital format usually damages the material and results in the loss of quality.

By using AI-driven solutions, specifically supervised learning techniques, deepsense.ai’s team removed the cracks and black spots from the digitized version of a film. The model we produced uses deep neural networks trained on a movie with cracks and flaws added manually for training purposes. With films available in both original and artificially damaged form, the system learned to remove the flaws. An example of generated noise applied to the classic Polish movie “Rejs”, along with the neural network’s output, is displayed below.

The example clearly shows that our neural network can process and restore even thoroughly damaged source material and make it shine again. The network starts to produce low-quality predictions only when the images are so darkened and blurred that the human eye can barely recognize people in the film.

How to convert really old movies into HD

A similar training technique was applied to deliver a neural network used to improve the quality of an old movie. The goal was to deliver missing details and “pump up” the resolution from antiquated to HD quality.

The key challenge lay in reproducing the details, which was nearly impossible. Due to technological development, people find it difficult to watch video of lower quality than what they are used to.

The model was trained by downscaling an HD movie and then conducting a supervised training to deliver the missing details.

Move your mouse cursor over the image to see the difference.


The model performs well thanks to the wide availability of training data. The team could downscale the resolution of any movie, provide the model with the original version and let the neural network learn how to forge and inject the missing detail into the film.
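A sketch of how such training pairs can be produced – purely illustrative, with assumed tensor shapes – is to downscale each original frame and keep the original as the target:

import torch
import torch.nn.functional as F

def make_training_pair(hd_frame, factor=4):
    """hd_frame: (1, 3, H, W) tensor in [0, 1]; returns (degraded input, original target)."""
    # Downscale, then upscale back to the original size: fine details are irreversibly lost.
    low_res = F.interpolate(hd_frame, scale_factor=1 / factor,
                            mode='bilinear', align_corners=False)
    degraded = F.interpolate(low_res, size=hd_frame.shape[-2:],
                             mode='bilinear', align_corners=False)
    return degraded, hd_frame

frame = torch.rand(1, 3, 256, 256)
blurred, target = make_training_pair(frame)
# A super-resolution network is then trained to map `blurred` back toward `target`.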

A key misconception about delivering HD versions of old movies is that the neural network will recover the missing details from the original. In fact, there is no way to reclaim lost details, because they were never on the originally registered material. The neural network produces them on the fly, with the same techniques that Thispersondoesnotexist and similar Generative Adversarial Network-based tools use.

So, the source material is enriched with details that only resemble reality, but are in fact not real ones. This can be a challenge (or a problem) if the material is to be used for forensic purposes or detailed research. But when it comes to delivering the movies for entertainment or cultural ends, the technique is more than enough.

Coloring old movies

Another challenge comes with producing color versions of movie classics, technically reviving them for newer audiences. The process was long handled by artists applying color to every frame. The first film colored this way was the British silent movie “The Miracle” (1912).

Because there are countless color movies to draw on, providing a rich training set, a deep neural network can vastly reduce the time required to revive black and white classics. Yet the process is not fully automatic. In fact, putting color on the black and white movie is a titanic undertaking. Consider Disney’s “Tron,” which was shot in black and white and then colored by 200 inkers and painters from Taiwan-based Cuckoo’s Nest Studio.

When choosing colors, a neural network tends to play it safe. An example of how this can be problematic would be when the network misinterprets water as a field of grass. It would do that because it is likely more common for fields than for lakes to appear as a backdrop in a film. 

By manually applying colored pixels to single frames, an artist can suggest what colors the AI model should choose.

There is no way to determine the real color of a scarf or a shirt an actor or actress was wearing when a film rendered in black and white was shot. After all these years, does it even matter? In any case, neural networks employ the LAB color standard, leveraging lightness (L) to predict the two remaining channels (A and B respectively).
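In code, that amounts to feeding the network a single L channel and asking it for the two missing ones. A minimal sketch (assuming the RGB-to-LAB conversion is done beforehand, for instance with scikit-image's rgb2lab; real colorization networks are far larger):

import torch
import torch.nn as nn

# Input: the lightness (L) channel only; output: the predicted A and B channels.
colorizer = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 2, kernel_size=3, padding=1),
)

lightness = torch.rand(4, 1, 128, 128)     # stand-in for L channels of 4 frames
predicted_ab = colorizer(lightness)        # shape: (4, 2, 128, 128)

# Training minimizes the distance between predicted and true A/B channels.
true_ab = torch.rand(4, 2, 128, 128)
loss = nn.MSELoss()(predicted_ab, true_ab)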

Transcription and face recognition

Last but not least, transcribing dialogue makes analysis and research much easier – be it for linguistic or cultural studies purposes. With facial recognition software, the solution can attribute all of the lines delivered to the proper characters.

The speech-to-text function processes the sound and transcribes the dialogue while the other network checks which of the people in the video moves his or her lips. When combined with image recognition, the model can both synchronize the subtitles and provide the name of a character or actor speaking.

While the output still needs to be reviewed by a human, the approach vastly reduces the time required for transcription. Done traditionally, transcription takes at least the duration of the recording and then needs to be validated. The machine transcribes an hour-long movie in a few seconds.

Summary

Using machine learning-based techniques to restore movies takes less time and effort than other methods. It also makes efforts to preserve our cultural heritage more successful and ensures films remain relevant. Machine learning in business gets huge recognition, but ML-based techniques remain a novel way to serve the needs of culture and art. deepsense.ai’s work has proven that AI in art can serve multiple purposes, including promotion and education. Maybe using it in art and culture will be one of 2020’s AI trends.

Reviving and digitizing classics improves access to cultural goods and ensures those works remain available, so future generations will, thanks to AI, enjoy the Academy Award-winning movies of the past as much as, if not more than, we do now.

Outsmarting failure. Predictive maintenance powered by machine learning

November 13, 2018/in Blog posts, Data science, Machine learning /by Konrad Budek

Since the days of the coal-powered industrial revolution, manufacturing has become machine-dependent. As the fourth industrial revolution approaches, factories can harness the power of machine learning to reduce maintenance costs.

The internet of things (IoT) is nothing new for industry. The number of cellular-enabled factory automation devices reached 270 000 worldwide in 2012 and is expected to rise to a staggering 820 000 in 2018. Machines are present in every stage of the production process, from assembly to shipment. Although automation makes industry more efficient, with rising complexity it also becomes more vulnerable to breakdowns, as service is both time-consuming and expensive.

Four levels of predictive maintenance

According to PricewaterhouseCoopers, there are four levels of predictive maintenance.

1. Visual inspection, where the output is entirely based on the inspector’s knowledge and intuition
2. Instrument inspection, where conclusions are a combination of the specialist’s experience and the instrument’s read-outs
3. Real-time condition monitoring that is based on constant monitoring with IoT and alerts triggered by predefined conditions
4. AI-based predictive analytics, where the analysis is performed by self-learning algorithms that continuously tweak themselves to the changing conditions

As the study indicates, a good number of the companies surveyed by PwC (36%) are now on level 2 while more than a quarter (27%) are on level 1. Only 22% had reached level 3 and 11% level 4, which is basically level 3 on machine learning steroids. The PwC report states that only 3% use no predictive maintenance at all.

Staying on track

According to the PwC data, the rail sector is the most advanced sector of those surveyed with 42% of companies at level 4, compared to 11% overall.

One of the most prominent examples is Infrabel, the state-owned Belgian company, which owns, builds, upgrades and operates a railway network which it makes available to privately-owned transportation companies. The company spends more than a billion euro annually to maintain and develop its infrastructure, which contains over 3 600 kilometers of railway and some 12 000 civil infrastructure works like crossings, bridges, and tunnels. The network is used by 4 200 trains every day, transporting both cargo and passengers.


The company faces both technical and structural challenges. Among them is its aging technical staff, which is shrinking.

At the same time, the density of railroad traffic is increasing – the number of daily passengers has risen by 50% since 2000, reaching 800 000. What’s more, the growing popularity of high-speed trains is exerting ever greater strain on the rails and other infrastructure.

To face these challenges, the company has implemented monitoring tools, such as sensors for monitoring overheating tracks, cameras which inspect the pantographs and meters to detect drifts in power consumption, which usually occur before mechanical failures in switches. All of the data is collected and analyzed by a single tool designed to apply predictive maintenance. Machine learning models are a component of that tool.

As sounding brass

Mueller Industries (Memphis, Tennessee) is a global manufacturer and distributor of copper, brass, aluminum and plastic products. The predictive maintenance solution the company uses is based on sound analysis. Every machine can be characterized by the sound it makes, and any change in that sound may be a sign of impending malfunction. The machine’s sound and vibrations are analyzed in real time by a cloud-based machine learning solution that seeks patterns in the gathered data.

Both the amount and the nature of the data collected render it impossible for a human to analyze, but a machine learning-powered AI solution handles it with ease. The devices gather data from ultrasonic and vibration sensors and analyze them in real time. Unlike experience-based analytics, using the devices requires little to no training and can be done on the go.
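A rough sketch of that kind of pipeline – a generic illustration on synthetic signals, not Mueller Industries' actual system – is to summarize each recording with a few spectral features and flag recordings that look unlike the machine's normal sound:

import numpy as np
from sklearn.ensemble import IsolationForest

def spectral_features(signal):
    """Summarize a vibration/sound clip by the energy in a few coarse frequency bands."""
    spectrum = np.abs(np.fft.rfft(signal))
    return np.array([band.mean() for band in np.array_split(spectrum, 8)])

# Recordings of a healthy machine (here just synthetic sine waves plus noise).
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 10_000)
healthy = [np.sin(2 * np.pi * 50 * t) + 0.1 * rng.normal(size=t.size) for _ in range(200)]

detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit([spectral_features(s) for s in healthy])

# A new clip with an unexpected high-frequency component should be flagged as -1 (anomaly).
suspect = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 900 * t)
print(detector.predict([spectral_features(suspect)]))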

Endless possibilities

With the power of machine learning enlisted, handling the tremendous amounts of data generated by the sensors in modern factories becomes a much easier task. It allows companies to detect failures before they paralyze production, thus saving time and money. What’s more, the data gathered can be used to further optimize performance, including by searching for bottlenecks and managing workflows.

That’s why 98% of industrial companies expect to increase efficiency with digital technologies.

A comprehensive guide to demand forecasting

May 28, 2019/in Blog posts, Data science, Machine learning, Popular posts /by Konrad Budek and Piotr Tarasiewicz

Everything you need to know about demand forecasting – from the purpose and techniques to the goals and pitfalls to avoid.

Essential since the dawn of commerce and business, demand forecasting enters a new era of big-data rocket fuel.

What is demand forecasting?

The term couldn’t be clearer: demand forecasting forecasts demand. The process of predicting the future involves processing historical data to estimate the demand for a product. An accurate forecast can bring significant improvements to supply chain management, profit margins, cash flow and risk assessment.

What is the purpose of demand forecasting?

Demand forecasting is done to optimize processes, reduce costs and avoid losses caused by freezing up cash in stock or being unable to process orders due to being out of stock. In an ideal world, the company would be able to satisfy demand without overstocking.

Demand forecasting techniques

Demand forecasting is an essential component of every form of commerce, be it retail, wholesale, online, offline or multichannel. It has been present since the very dawn of civilization when intuition and experience were used to forecast demand.


More recent techniques combine intuition with historical data. Modern merchants can dig into their data in a search for trends and patterns. At the pinnacle of these techniques are machine learning models for demand forecasting, including gradient boosting and neural networks, which are currently the most popular approaches and outperform classic statistics-based methods.

The basis of more recent demand forecasting techniques is historical data from transactions. These are data that sellers collect and store for fiscal and legal reasons. Because they are also searchable, these data are the easiest to use.
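As an illustrative baseline on synthetic data – not Sybilla itself, and with column names and features chosen purely for the example – the usual recipe is to aggregate sales per day, derive lag and calendar features, and fit a gradient boosting regressor:

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Toy transaction history: daily units sold for a single product, busier on weekends.
dates = pd.date_range("2018-01-01", periods=500, freq="D")
rng = np.random.default_rng(0)
units = 20 + 5 * (dates.dayofweek >= 5) + rng.poisson(3, size=len(dates))
history = pd.DataFrame({"date": dates, "units": units})

# Feature engineering: calendar features plus lagged demand.
history["dayofweek"] = history["date"].dt.dayofweek
history["month"] = history["date"].dt.month
for lag in (1, 7, 14):
    history[f"lag_{lag}"] = history["units"].shift(lag)
history = history.dropna()

features = ["dayofweek", "month", "lag_1", "lag_7", "lag_14"]
train, test = history.iloc[:-30], history.iloc[-30:]

model = GradientBoostingRegressor(random_state=0)
model.fit(train[features], train["units"])
forecast = model.predict(test[features])
print("MAE on the last 30 days:", np.abs(forecast - test["units"]).mean())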


How to choose the right demand forecasting method – indicators

As always, selecting the right technique depends on various factors, including:

  • The scale of operations – the larger the scale, the more challenging processing the data becomes.
  • The organization’s readiness – even large companies can operate (efficiency aside) on fragmented and messy databases, so the technological and organizational readiness to apply more sophisticated demand forecasting techniques is another challenge.
  • The product – it is easier to forecast demand for an existing product than for a newly introduced one. When considering the latter, it is crucial to form a set of assumptions to work from. Gathering as much information about the product as possible is the first step, as it allows the company to spot the similarities between particular goods and search for correlations in the buying patterns. Spotting an accessory that is frequently bought along with the main product is one example.

How demand forecasting can help a business

Demand forecasting, and the sales forecasting that follows from it, is crucial to shaping a company’s logistics policy and preparing it for the immediate future. Among the main advantages of demand forecasting are:

  • Loss reduction – any demand that was not fulfilled should be considered a loss. Moreover, the company freezes its cash in stock, thus reducing liquidity.
  • Supply chain optimization – behind every shop there is an elaborate logistics chain that generates costs and needs to be managed. The bigger the organization, the more sophisticated and complicated its inventory management must be. When demand is forecast precisely, managing and estimating costs is easier.
  • Increased customer satisfaction – there is no bigger disappointment for consumers than going to the store to buy something only to return empty-handed. For a business, the worst-case scenario is for said consumers to swing over to the competition to make their purchase there. Companies reduce the risk of running out of stock–and losing customers–by making more accurate predictions.
  • Smarter workforce management – hiring temporary staff to support a demand peak is a smart way for a business to ensure it is delivering a proper level of service.
  • Better marketing and sales management – depending on the upcoming demand for particular goods, sales and marketing teams can shift their efforts to support cross- and upselling of complementary products.
  • Supporting expert knowledge – models can be designed to build predictions for every single product, regardless of how many there are. In small businesses, humans handle all predictions, but when the scale of the business and the number of goods rises, this becomes impossible. Machine learning models extend expert knowledge and are proficient at big data processing.

How to start demand forecasting – a short guide

Building a demand forecasting tool or solution requires, first and foremost, data to be gathered.

While the data will eventually need to be organized, simply procuring it is a good first step. It is easier to structure and organize data and make them actionable than to collect enough data fast. The situation is much easier when the company employs an ERP or CRM system, or some other form of automation, in their daily work. Such systems can significantly ease the data gathering process and automate the structuring.


The next step is building testing scenarios that allow the company to test various approaches and their impact on business efficiency. The first solution is usually a simple one, and is a good benchmark for solutions to come. Each subsequent iteration should be tested to see whether it performs better than the previous one.

Historical data is usually all one needs to launch a demand forecasting project; obviously, there is significantly less data about the future. But sometimes it is available, for example:

  • Short-term weather forecasts – the information about upcoming shifts in weather can be crucial in many businesses, including HoReCa and retail. It is quite intuitive to cross-sell sunglasses or ice cream on sunny days.
  • The calendar – Black Friday is a day like no other. The same goes for the upcoming holiday season or other events that are tied to a given date.

Sources of data that originate from outside the company make predictions even more accurate and provide better support for making business decisions.

Common pitfalls to avoid when building a demand forecasting solution

There are numerous pitfalls to avoid when building a demand forecasting solution. The most common of them include:

  • Data not connected with marketing and ad history – a successful promotion results in a significant change in the data, so having information about why it was a success makes predictions more accurate. Without that information, a machine learning model could misattribute the changes and make false predictions based on wrong assumptions.
  • New products with no history – when new products are introduced, demand must still be estimated, but without the help of historical data. The good news here is that great strides have been made in this area, and techniques such as product DNA can help a company uncover similar products in its past or current portfolio. Having data on similar products can boost the accuracy of predictions for new products.
  • The inability to predict the weather – weather drives demand in numerous contexts and product areas and can sometimes be even more important than the price of a product itself! (yes, classical economists would be very upset). The good news is that even if you are unable to predict the weather, you can still use it in your model to explain historical variations in demand.
  • Lacking information about changes – in an effort to support both short- and long-term goals, companies constantly change their offering and websites. When information about those changes is not annotated in the data, the model encounters sudden dips and shifts in demand with no apparent reason. In reality, the cause is usually a minor change like reorganizing the inventory or removing a section from the website.
  • Inconsistent portfolio information – predictions can be done only if the data set is consistent. If any of the goods in a portfolio have undergone a name or ID change, it must be noted in order not to confuse the system or miss out on a valuable insight.
  • Overfitting the model – a vicious problem in data science. A model is so good at working on the training dataset that it becomes inflexible and produces worse predictions when new data is delivered. Avoiding overfitting is down to the data scientists.
  • Inflexible logistics chain – the more flexible the logistics process is, the better and more accurate the predictions will be. Even the best demand forecasting model is useless when the company’s logistics is a fixed process that allows no space for changes.


Summary

Demand and sales forecasting is a crucial part of any business. Traditionally it has been done by experts, based on know-how honed through experience. With the power of machine learning it is now possible to combine the astonishing scale of big data with the precision and cunning of a machine-learning model. While the business community must remain aware of the multiple pitfalls it will face when employing machine learning to predict demand, there is no doubt that it will endow demand forecasting with awesome power and flexibility.

AI Monthly Digest #8 – new AI applications for music and gaming

May 9, 2019/in Blog posts, Machine learning, AI Monthly Digest /by Konrad Budek and Arkadiusz Nowaczynski

The April edition of AI Monthly Digest looks at how AI is used in entertainment, for both research and commercial purposes.

After its recent shift from non-profit to for-profit, OpenAI continues to build a significant presence in the world of AI research. It is involved in two of five stories chosen as April’s most significant.

AI Music – spot the discord…

While machine learning algorithms are getting increasingly better at delivering convincing text or gaining superior accuracy in image recognition, machines struggle to understand the complicated patterns behind music. In its most basic form, music is built upon repetitive motifs that return across sections of various lengths – a recurring part of one song or the leading theme of an entire movie, opera or computer game.

Machine learning-driven composing is comparable to natural language processing – the short parts are done well but the computer gets lost when it comes to keeping the integrity of the longer ones. April brought us two interesting stories regarding different approaches to ML-driven composition.

OpenAI developed MuseNet, a neural network that produces music in a few different styles. Machine learning algorithms were used to analyze the style of various classical composers, including Chopin, Bach, Beethoven and Rachmaninoff. The model was further fed rock songs by Queen, Green Day and Nine Inch Nails and pop music by Madonna, Adele and Ricky Martin, to name a few. The model learned to mimic the style of a particular artist and infuse it with twists. If the user wants to spice up the Moonlight Sonata with a drum, the road is open.

OpenAI has rolled out an early version of the model and it performs better when the user is trying to produce a consistent piece of music, rather than pair up a disparate coupling of Chopin and Nine Inch Nails-style synthesizers.

OpenAI claims that music is a great tool with which to evaluate a model’s ability to maintain long-term consistency, mainly thanks to how easy it is to spot discord.

…or embrace it

While OpenAI embraces harmony in music, Dadabots has taken the opposite tack. Developed by CJ Carr and Zack Zukowski, the Dadabots model imitates rock, particularly metal bands. The team has put their model on YouTube to deliver technical death metal as an endless live stream – the Relentless Doppelganger.

While it is increasingly common to find AI-generated music on Bandcamp, putting a 24/7 death metal stream on YouTube is undoubtedly something new.

Fans of the AI-composed death metal have given the music rave reviews. As The Verge notes, the creation is “Perfectly imperfect” thanks to its blending of various death metal styles, transforming vocals into a choir and delivering sudden style-switching.

It appears that bare-metal has ushered in a new era in technical death metal.

Why does it matter?

Researchers behind the Relentless Doppelganger remark that developing music-making AI has mainly been based on classical music, which is heavily reliant on harmony, while death metal, among others, embraces the power of chaos. It stands to reason, then, that the music generated is not perfect when it comes to delivering harmony. The effect is actually more consistent with the genre’s overall sound. What’s more, Dadabots’ model delivers not only instrumentals, but also vocals, which would be unthinkable with classical music. Of course, the special style of metal singing called growl makes most of the lyrics incomprehensible, so little to no sense is actually required here.


From a scientific point of view, OpenAI delivers much more significant work. But AI is working its way into all human activity, including politics, social problems and policy and art. From an artistic point of view, AI-produced technical death metal is interesting.

It appears that when it comes to music, AI likes it brutal.

AI in gaming goes mainstream

Game development has a long and uneasy tradition of delivering computer players to allow users to play in single-player mode. There are many forms of non-ML-based AI present in video games. They are usually based on a set of triggers that initiate a particular action the computer player takes. What’s more, modern, story-driven games rely heavily on scripted events like ambushes or sudden plot twists.

This type of AI delivers an enjoyable level of challenge but lacks the versatility and viciousness of human players coming up with surprising strategies to deal with. Also, the goal of AI in single-player mode is not to dominate the human player in every way possible.


The real challenge in all of this comes from developing bots, or the computer-controlled players, to deliver a multiplayer experience in single-player mode. Usually, the computer players significantly differ from their human counterparts and any transfer from single to multiplayer ends with shock and an instant knock-out from experienced players.

To deliver bots that behave in a more human way yet provide a bigger challenge, Milestone, the company behind MotoGP 19, turned to reinforcement learning to build computer players to race against human counterparts. The artificial intelligence controlling opponents is codenamed A.N.N.A. (Artificial Neural Network Agent).

A.N.N.A. is a neural network-based AI that is not scripted directly but created through reinforcement learning. This means developers describe an agent’s desired behaviour and then train a neural network to achieve it. Agents created in this way show more skilled and realistic behaviors, which are high on the wish list of Moto GP gamers.

Why does it matter?

Applying ML-based artificial intelligence in a mainstream game is the first step in delivering a more realistic and immersive game experience. Making computer players more human in their playing style makes them less exploitable and more flexible.

The game itself is an interesting example. It is common in RL-related research to apply this paradigm to strategy games, be it chess, Go or StarCraft II. In this case, the neural network controls a digital motorcycle. Racing provides a closed game environment with a limited number of variables to control. Thus, racing in a virtual world is a perfect environment in which to deploy ML-based solutions.

In the end, it isn’t the technology but rather gamers’ experience that is key. Will reinforcement learning bring a new paradigm of embedding AI in games? We’ll see once gamers react.

Bittersweet lessons from OpenAI Five

Defense of the Ancients 2 (DOTA 2) is a highly popular multiplayer online battle arena game in which two teams, each consisting of five players, fight for control of a map. The game blends tactical, strategic and action elements and is one of the most popular esports titles.

OpenAI Five is the neural network that plays DOTA 2, developed by OpenAI.

The AI agent beat world champions from Team OG during the OpenAI Five Finals on April 13th. It was the first time an AI-controlled player had beaten a pro player team during a live stream.

Why does it matter?

Although the project seems similar to Deepmind’s AlphaStar, there are several significant differences:

  • The model was trained continuously for almost a year instead of starting from zero knowledge for each new experiment – the common way of developing machine learning models is to design the entire training procedure upfront, launch it and observe the result. Every time a novel idea is proposed, the learning algorithm is modified accordingly and a new experiment is launched starting from scratch to get a fair comparison between various concepts. In this case, researchers decided not to run training from scratch, but to integrate ideas and changes into the already trained model, sometimes doing elaborate surgery on their artificial neural network. Moreover, the game received a number of updates during the training process. Thus, the model was forced at some points not to learn a new fact, but to update its knowledge. And it managed to do so. The approach enabled the team to massively reduce the computing power over the amount it had invested in training previous iterations of the model.
  • The model effectively cooperated with human players – The model was available publicly as a player, so users could play with it, both as ally and foe. Despite being trained without human interaction, the model was effective both as an ally and foe, clearly showing that AI is a potent tool to support humans in performing their tasks — even when that task is slaying an enemy champion.
  • The research done was somewhat of a failure – The model performs well, even if building it was not the actual goal. The project was launched to break a previously unbroken game by testing and looking for new approaches. The best results were achieved by providing more computing power and upscaling the neural network. Despite delivering impressive results for OpenAI, the project did not lead to the expected breakthroughs and the company has hinted that it could be discontinued in its present format. A bitter lesson indeed.

Blurred computer vision

Computer vision techniques deliver astonishing results. They have sped up the diagnosing of diabetic retinopathy, built maps from satellite images and recognized particular whales from aerial photography. Well-trained models often outperform human experts. Given that they don’t get tired and never lose their focus, why shouldn’t they?

But there remains room for improvement in machine vision, as researchers from KU Leuven University in Belgium report. They delivered an image that fooled an algorithm, rendering the person holding a card with the printed image virtually invisible to a machine learning-based detection system.

Why does it matter?

As readers of William Gibson’s novel Zero History will attest, images devised to fool AI are nothing new. Delivering a printable image that confounds an algorithm highlights the serious interest malicious players have in interfering with AI.

Examples may include images produced to fool AI-powered medical diagnostic devices for fraudulent reasons or sabotaging road infrastructure to render it useless for autonomous vehicles.

AI should not be considered a black box and algorithms are not unbreakable. As always, reminders of that are welcome, especially as responsibility and transparency are among the most significant AI trends for 2019.


Keras vs. PyTorch: Alien vs. Predator recognition with transfer learning

October 3, 2018/in Blog posts, Deep learning, Machine learning /by Piotr Migdal, Patryk Miziuła and Rafał Jakubanis

In our previous post, we gave you an overview of the differences between Keras and PyTorch, aiming to help you pick the framework that’s better suited to your needs. Now, it’s time for a trial by combat. We’re going to pit Keras and PyTorch against each other, showing their strengths and weaknesses in action. We present a real problem, a matter of life-and-death: distinguishing Aliens from Predators!

Predator and Alien
Image taken from our dataset. Both Predator and Alien are deeply interested in AI.

We perform image classification, one of the computer vision tasks deep learning shines at. As training from scratch is unfeasible in most cases (as it is very data hungry), we perform transfer learning using ResNet-50 pre-trained on ImageNet. We get as practical as possible, to show both the conceptual differences and conventions.

At the same time we keep the code fairly minimal, to make it clear and easy to read and reuse. See notebooks on GitHub, Kaggle kernels or Neptune versions with fancy charts.

Wait, what’s transfer learning? And why ResNet-50?

In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest.– Andrej Karpathy (Transfer Learning – CS231n Convolutional Neural Networks for Visual Recognition)

Transfer learning is a process of making tiny adjustments to a network trained on a given task so that it can perform another, similar task. In our case we work with the ResNet-50 model trained to classify images from the ImageNet dataset. That is enough to learn a lot of textures and patterns that may be useful in other visual tasks, even ones as alien as this Alien vs. Predator case. That way, we use much less computing power to achieve a much better result.

In our case we do it the simplest way:

  • keep the pre-trained convolutional layers (so-called feature extractor), with their weights frozen,
  • remove the original dense layers, and replace them with brand-new dense layers we will use for training.

So, which network should be chosen as the feature extractor?

ResNet-50 is a popular model for ImageNet image classification (AlexNet, VGG, GoogLeNet, Inception, Xception are other popular models). It is a 50-layer deep neural network architecture based on residual connections, which are connections that add modifications with each layer, rather than completely changing the signal.
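The core building block is simple enough to show in a few lines: the block's output is its input plus a learned correction, so each layer refines the signal rather than replacing it. A simplified sketch (ResNet-50 itself stacks deeper "bottleneck" variants of this idea):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Simplified residual block: output = input + learned correction."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        correction = self.bn2(self.conv2(torch.relu(self.bn1(self.conv1(x)))))
        return torch.relu(x + correction)   # the skip connection

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])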

ResNet was the state-of-the-art on ImageNet in 2015. Since then, newer architectures with higher scores on ImageNet have been invented. However, they are not necessarily better at generalizing to other datasets (see the Do Better ImageNet Models Transfer Better? arXiv paper).

Ok, it’s time to dive into the code.

Let the match begin!

We do our Alien vs. Predator task in seven steps:

  0. Prepare the dataset
  1. Import dependencies
  2. Create data generators
  3. Create the network
  4. Train the model
  5. Save and load the model
  6. Make predictions on sample test images

We supplement this blog post with Python code in Jupyter Notebooks (Keras-ResNet50.ipynb, PyTorch-ResNet50.ipynb). This environment is more convenient for prototyping than bare scripts, as we can execute it cell by cell and peek into the output.

All right, let’s go!

0. Prepare the dataset

We created a dataset by performing a Google Search with the words “alien” and “predator”. We saved JPG thumbnails (around 250×250 pixels) and manually filtered the results. Here are some examples:

We split our data into two parts:

  • Training data (347 samples per class) – used for training the network.
  • Validation data (100 samples per class) – not used during the training, but needed in order to check the performance of the model on previously unseen data.

Keras requires the datasets to be organized in folders in the following way:

|-- train
    |-- alien
    |-- predator
|-- validation
    |-- alien
    |-- predator

If you want to see the process of organizing data into directories, check out the data_prep.ipynb file. You can download the dataset from Kaggle.

1. Import dependencies

First, the technicalities. We assume that you have Python 3.5+, Keras 2.2.2 (with TensorFlow 1.10.1 backend) and PyTorch 0.4.1. Check out the requirements.txt file in the repo.

So, first, we need to import the required modules. We separate the code in Keras, PyTorch and common (one required in both).

COMMON

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
%matplotlib inline

KERAS

import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.applications import ResNet50
from keras.applications.resnet50 import preprocess_input
from keras import Model, layers
from keras.models import load_model, model_from_json

PYTORCH

import torch
from torchvision import datasets, models, transforms
import torch.nn as nn
from torch.nn import functional as F
import torch.optim as optim

We can check the frameworks’ versions by typing keras.__version__ and torch.__version__, respectively.

2. Create data generators

Normally, the images can’t all be loaded at once, as doing so would be too much for the memory to handle. At the same time, we want to benefit from the GPU’s performance boost by processing a few images at once. So we load images in batches (e.g. 32 images at once) using data generators. Each pass through the whole dataset is called an epoch.

We also use data generators for preprocessing: we resize and normalize images to make them as ResNet-50 likes them (224 x 224 px, with scaled color channels). And last but not least, we use data generators to randomly perturb images on the fly:

Performing such changes is called data augmentation. We use it to show a neural network which kinds of transformations don’t matter. Or, to put it another way, we train on a potentially infinite dataset by generating new images based on the original dataset.

Almost all visual tasks benefit, to varying degrees, from data augmentation during training. For more info about data augmentation, see how it can be applied to plankton photos or how to use it in Keras. In our case, we randomly shear, zoom and horizontally flip our aliens and predators.

Here we create generators that:

  • load data from folders,
  • normalize data (both train and validation),
  • augment data (train only).

KERAS

train_datagen = ImageDataGenerator(
    shear_range=10,
    zoom_range=0.2,
    horizontal_flip=True,
    preprocessing_function=preprocess_input)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    batch_size=32,
    class_mode='binary',
    target_size=(224,224))

validation_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input)

validation_generator = validation_datagen.flow_from_directory(
    'data/validation',
    shuffle=False,
    class_mode='binary',
    target_size=(224,224))

PYTORCH

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

data_transforms = {
    'train':
        transforms.Compose([
            transforms.Resize((224,224)),
            transforms.RandomAffine(0, shear=10, scale=(0.8,1.2)),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize]),
    'validation':
        transforms.Compose([
            transforms.Resize((224,224)),
            transforms.ToTensor(),
            normalize])}

image_datasets = {
    'train':
        datasets.ImageFolder('data/train', data_transforms['train']),
    'validation':
        datasets.ImageFolder('data/validation', data_transforms['validation'])}

dataloaders = {
    'train':
        torch.utils.data.DataLoader(
            image_datasets['train'],
            batch_size=32,
            shuffle=True,
            num_workers=4),
    'validation':
        torch.utils.data.DataLoader(
            image_datasets['validation'],
            batch_size=32,
            shuffle=False,
            num_workers=4)}

In Keras, you get built-in augmentations and the preprocess_input function, which normalizes images fed to ResNet-50, but you have no control over their order. In PyTorch, you have to normalize images manually, but you can arrange the augmentations in any way you like.

There are also other nuances: for example, Keras by default fills the rest of the augmented image with the border pixels (as you can see in the picture above) whereas PyTorch leaves it black. Whenever one framework deals with your task much better than the other, take a closer look to see if they perform preprocessing identically; we bet they don’t.
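
If you want the two pipelines to behave more alike, Keras lets you control this particular behavior explicitly – a small sketch of the train generator with black padding (matching PyTorch), assuming the rest of the settings stay as above:

train_datagen = ImageDataGenerator(
    shear_range=10,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='constant',   # fill newly revealed pixels with a constant value...
    cval=0,                 # ...namely black, instead of repeating border pixels
    preprocessing_function=preprocess_input)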

3. Create the network

The next step is to import a pre-trained ResNet-50 model, which is a breeze in both cases. We freeze all the ResNet-50’s convolutional layers, and only train the last two fully connected (dense) layers. As our classification task has only 2 classes (compared to 1000 classes of ImageNet), we need to adjust the last layer.

Here we:

  • load pre-trained network, cut off its head and freeze its weights,
  • add custom dense layers (we pick 128 neurons for the hidden layer),
  • set the optimizer and loss function.

KERAS

conv_base = ResNet50(include_top=False,
                     weights='imagenet')

for layer in conv_base.layers:
    layer.trainable = False

x = conv_base.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation='relu')(x)
predictions = layers.Dense(2, activation='softmax')(x)
model = Model(conv_base.input, predictions)

optimizer = keras.optimizers.Adam()
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

PYTORCH

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = models.resnet50(pretrained=True).to(device)

for param in model.parameters():
    param.requires_grad = False

model.fc = nn.Sequential(
    nn.Linear(2048, 128),
    nn.ReLU(inplace=True),
    nn.Linear(128, 2)).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters())

We load the ResNet-50 from both Keras and PyTorch without any effort. They also offer many other well-known pre-trained architectures: see Keras’ model zoo and PyTorch’s model zoo.  So, what are the differences?

In Keras we may import only the feature-extracting layers, without loading extraneous data (include_top=False). We then create a model in a functional way, using the base model’s inputs and outputs. Then we use model.compile(…) to bake into it the loss function, optimizer and other metrics.

In PyTorch, the model is a Python object. In the case of models.resnet50, dense layers are stored in model.fc attribute. We overwrite them. The loss function and optimizers are separate objects. For the optimizer, we need to explicitly pass a list of parameters we want it to update.
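
A quick way to confirm which parameters the PyTorch optimizer will actually update is to list those with gradients enabled (a small diagnostic sketch):

trainable = [name for name, param in model.named_parameters()
             if param.requires_grad]
print(trainable)   # should list only the fc.* parameters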

Predator's wrist computer
Frame from ‘AVP: Alien vs. Predator’: Predators’ wrist computer. We’re pretty sure Predator could use it to compute logsoftmax.

In PyTorch, we have to explicitly specify what we want to load onto the GPU using the .to(device) method. We have to write it each time we intend to put an object on the GPU (if one is available). Well…

Layer freezing works in a similar way in both frameworks. However, the Batch Normalization layer of Keras is broken (as of the current version; thanks to Przemysław Pobrotyn for bringing up this issue). That is – some layers get modified anyway, even with trainable = False.

Keras and PyTorch deal with log-loss in a different way.

In Keras, a network predicts probabilities (has a built-in softmax function), and its built-in cost functions assume they work with probabilities.

In PyTorch we have more freedom, but the preferred way is to return logits. This is done for numerical reasons: performing softmax and then log-loss means doing unnecessary log(exp(x)) operations. So, instead of using softmax, we use LogSoftmax (with NLLLoss) or combine the two into the single nn.CrossEntropyLoss loss function.
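
The relationship between these PyTorch pieces is easy to verify on random data – a small sketch:

logits = torch.randn(4, 2)            # a batch of 4 predictions for 2 classes
labels = torch.tensor([0, 1, 1, 0])

loss_combined = nn.CrossEntropyLoss()(logits, labels)
loss_split = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)
print(loss_combined.item(), loss_split.item())   # the two values match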

4. Train the model

OK, ResNet is loaded, so let’s get ready to space rumble!

Predators' mother ship
Frame from ‘AVP: Alien vs. Predator’: the Predators’ Mother Ship. Yes, we’ve heard that there are no rumbles in space, but nothing is impossible for Aliens and Predators.

Now, we proceed to the most important step – model training. We need to pass data, calculate the loss function and modify network weights accordingly. While we already had some differences between Keras and PyTorch in data augmentation, the length of code was similar. For training… the difference is massive. Let’s see how it works!

Here we:

  • train the model,
  • measure the loss function (log-loss) and accuracy for both training and validation sets.

KERAS

history = model.fit_generator(
    generator=train_generator,
    epochs=3,
    validation_data=validation_generator)

PYTORCH

def train_model(model, criterion, optimizer, num_epochs=3):
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch+1, num_epochs))
        print('-' * 10)

        # Each epoch runs a training phase followed by a validation phase.
        for phase in ['train', 'validation']:
            if phase == 'train':
                model.train()   # e.g. batch norm uses batch statistics
            else:
                model.eval()    # e.g. batch norm uses running statistics

            running_loss = 0.0
            running_corrects = 0

            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                outputs = model(inputs)
                loss = criterion(outputs, labels)

                # Backpropagation and weight updates happen only during training.
                if phase == 'train':
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()

                # Accumulate statistics for the epoch summary.
                _, preds = torch.max(outputs, 1)
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(image_datasets[phase])
            epoch_acc = running_corrects.double() / len(image_datasets[phase])

            print('{} loss: {:.4f}, acc: {:.4f}'.format(phase,
                                                        epoch_loss,
                                                        epoch_acc))
    return model

model_trained = train_model(model, criterion, optimizer, num_epochs=3)

In Keras, the model.fit_generator performs the training… and that’s it! Training in Keras is just that convenient. And as you can find in the notebook, Keras also gives us a progress bar and a timing function for free. But if you want to do anything nonstandard, then the pain begins…
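
To be fair, many common extras still fit the high-level API via callbacks – for example, keeping the best model seen during training (a sketch; the checkpoint path is just an example):

checkpoint = keras.callbacks.ModelCheckpoint(
    'models/keras/best.h5',     # example path
    monitor='val_loss',
    save_best_only=True)

history = model.fit_generator(
    generator=train_generator,
    epochs=3,
    validation_data=validation_generator,
    callbacks=[checkpoint])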

Predators' shuriken
Predator’s shuriken returning to its owner automatically. Would you prefer to implement its tracking ability in Keras or PyTorch?

PyTorch is at the other end of the spectrum. Everything is explicit here. You need more lines of code to construct the basic training loop, but you can freely change and customize everything you want.

Let’s shift gears and dissect the PyTorch training code. We have nested loops, iterating over:

  • epochs,
  • training and validation phases,
  • batches.

The epoch loop does nothing but repeat the code inside. The training and validation phases are done for three reasons:

  • Some special layers, like batch normalization (present in ResNet-50) and dropout (absent in ResNet-50), work differently during training and validation. We set their behavior by model.train() and model.eval(), respectively.
  • We use different images for training and for validation, of course.
  • The most important and least surprising thing: we train the network during the training phase only. The magic commands optimizer.zero_grad(), loss.backward() and optimizer.step() (in this order) do the job. If you know what backpropagation is, you'll appreciate their elegance.

We take care of computing and printing the epoch losses and accuracies ourselves.

5. Save and load the model

Saving

Once our network is trained, often with high computational and time costs, it's good to keep it for later. Broadly, there are two ways of saving a model:

  • saving the whole model architecture and trained weights (and the optimizer state) to a file,
  • saving the trained weights to a file (keeping the model architecture in the code).

It's up to you which way you choose.

Here we:

  • save the model.

KERAS

# architecture and weights to HDF5
model.save('models/keras/model.h5')

# architecture to JSON, weights to HDF5
model.save_weights('models/keras/weights.h5')
with open('models/keras/architecture.json', 'w') as f:
    f.write(model.to_json())

PYTORCH

torch.save(model_trained.state_dict(),'models/pytorch/weights.h5')
Alien is evolving
Frame from ‘Alien: Resurrection’: Alien is evolving, just like PyTorch.

One line of code is enough in both frameworks. In Keras you can either save everything to an HDF5 file, or save the weights to HDF5 and the architecture to a readable JSON file. By the way: you can then load the model and run it in the browser.

Currently, PyTorch creators recommend saving the weights only. They discourage saving the whole model because the API is still evolving.
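
For completeness, saving the whole PyTorch model is also a one-liner, though, as noted, it is discouraged – the entire object is pickled, so it stays tied to your current code and library version (the path is just an example):

# Discouraged: serializes the full model object, not just the weights.
torch.save(model_trained, 'models/pytorch/model_full.pth')
model_restored = torch.load('models/pytorch/model_full.pth')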

Loading

Loading models is as simple as saving. You should just remember which saving method you chose and the file paths.

Here we:

  • load the model.

KERAS

# architecture and weights from HDF5
model = load_model('models/keras/model.h5')

# architecture from JSON, weights from HDF5
with open('models/keras/architecture.json') as f:
    model = model_from_json(f.read())
model.load_weights('models/keras/weights.h5')

PYTORCH

model = models.resnet50(pretrained=False).to(device)
model.fc = nn.Sequential(
    nn.Linear(2048, 128),
    nn.ReLU(inplace=True),
    nn.Linear(128, 2)).to(device)
model.load_state_dict(torch.load('models/pytorch/weights.h5'))

In Keras we can load a model from a JSON file, instead of creating it in Python (at least when we don't use custom layers). This kind of serialization makes it convenient for transferring models.

In PyTorch, the model is defined by plain Python code, so we pretty much have to re-create it in code before loading the weights.

Loading model weights is similar in both frameworks.

6. Make predictions on sample test images

All right, it’s finally time to make some predictions! To fairly check the quality of our solution, we ask the model to predict the type of monsters from images not used for training. We can use the validation set, or any other image.

Here we:

  • load and preprocess test images,
  • predict image categories,
  • show images and predictions.

COMMON

validation_img_paths = ["data/validation/alien/11.jpg",
                        "data/validation/alien/22.jpg",
                        "data/validation/predator/33.jpg"]
img_list = [Image.open(img_path) for img_path in validation_img_paths]

KERAS

validation_batch = np.stack([preprocess_input(np.array(img.resize((224, 224))))
                             for img in img_list])

pred_probs = model.predict(validation_batch)

PYTORCH

validation_batch = torch.stack([data_transforms['validation'](img).to(device)
                                for img in img_list])

pred_logits_tensor = model(validation_batch)
pred_probs = F.softmax(pred_logits_tensor, dim=1).cpu().data.numpy()

COMMON

fig, axs = plt.subplots(1, len(img_list), figsize=(20, 5))
for i, img in enumerate(img_list):
    ax = axs[i]
    ax.axis('off')
    ax.set_title("{:.0f}% Alien, {:.0f}% Predator".format(100*pred_probs[i,0],
                                                          100*pred_probs[i,1]))
    ax.imshow(img)

Prediction, like training, works in batches (here we use a batch of 3, though we could just as well use a batch of 1). In both Keras and PyTorch we need to load and preprocess the data. A rookie mistake is to forget the preprocessing step (including color scaling). The model is still likely to work, but its predictions will be worse (since it effectively sees the same shapes, only with different colors and contrasts).

In PyTorch there are two more steps, as we need to:

  • convert logits to probabilities,
  • transfer data to the CPU and convert to NumPy (fortunately, the error messages are fairly clear when we forget this step).

And this is what we get:

It works!

And how about other images? If you can’t come up with anything (or anyone) else, try using photos of your co-workers. :)
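
For a single arbitrary image, the same pipeline works with a batch of one – a minimal Keras sketch (the file name is a placeholder):

img = Image.open('my_photo.jpg').convert('RGB')   # hypothetical image path
batch = np.expand_dims(
    preprocess_input(np.array(img.resize((224, 224)))), axis=0)
probs = model.predict(batch)
print("{:.0f}% Alien, {:.0f}% Predator".format(100 * probs[0, 0],
                                               100 * probs[0, 1]))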

Conclusion

As you can see, Keras and PyTorch differ significantly in terms of how standard deep learning models are defined, modified, trained, evaluated, and exported. For some parts it’s purely about different API conventions, while for others fundamental differences between levels of abstraction are involved.

Keras operates on a much higher level of abstraction. It is much more plug&play, and typically more succinct, but at the cost of flexibility.

PyTorch provides more explicit and detailed code. In most cases that means debuggable and flexible code, with only a small overhead. Yet, training is way more verbose in PyTorch. It hurts, but at times it provides a lot of flexibility.

Transfer learning is a big topic. Try tweaking your parameters (e.g. dense layers, optimizer, learning rate, augmentation) or choose a different network architecture.
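
As a starting point for such tweaks, the learning rate is set in a single place in each framework (a sketch; 1e-4 is just an example value):

# Keras
optimizer = keras.optimizers.Adam(lr=1e-4)

# PyTorch
optimizer = optim.Adam(model.fc.parameters(), lr=1e-4)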

Have you tried transfer learning for image recognition? Consider the list below for some inspiration:

  • Chihuahua vs. muffin, sheepdog vs. mop, shrew vs. kiwi (already serves as an interesting benchmark for computer vision)
  • Original images vs. photoshopped ones
  • Artichoke vs. broccoli vs. cauliflower
  • Zerg vs. Protoss vs. Orc vs. Elf
  • Meme or not meme
  • Is it a picture of a bird?
  • Is it huggable?

Pick Keras or PyTorch, choose a dataset and let us know how it went in the comments section below :)


AI Monthly Digest #7 – machine mistakes, and the hard way to profit from non-profit

April 11, 2019/in Blog posts, Machine learning, AI Monthly Digest /by Konrad Budek and Arkadiusz Nowaczynski

March saw some major events concerning top figures of the ML world, including OpenAI, Yann LeCun, Geoffrey Hinton and Yoshua Bengio.

The past month was also the backdrop for inspiring research on how machines think and how different they can be from humans when provided with the same conditions and problem to solve.

OpenAI goes from non-profit to for-profit

OpenAI was initially a non-profit organization focused on pushing the boundaries of artificial intelligence, in the same way that open-source organizations are able to deliver the highest-class software. The Mozilla Foundation and the Linux Foundation, the non-profit powerhouses behind popular products, are the best examples.

Yet unlike popular software, whose development is powered by human talent alone, AI requires not only brilliant minds but also a gargantuan amount of computing power. The cost of reproducing the GPT-2 model is estimated at around $50,000 – and that is just a single experiment. Getting the job done requires a small battalion of research scientists tuning hyperparameters, debugging and testing approaches and ideas.


Staying on technology's cutting edge pushed the organization toward a for-profit model to fund the computing power and attract top talent, as the company notes on its website.

Why does it matter?

First of all, OpenAI was a significant player on the global AI map despite being a non-profit organization. Establishing a for-profit arm creates a new strong player that can develop commercial AI projects.

Moreover, the problem lies in the need for computing power, marking a new age of development challenges. In the traditional software development world, a team of talented coders is everything one needs. When it comes to delivering AI models, that is apparently not enough.

The bitter lesson of ML’s development

OpenAI's transition could be seen as a single event concerning only one organization. It could be, that is, were it not being discussed by the godfathers of modern machine learning.


Richard Sutton is one of the most renowned and influential researchers of reinforcement learning. In a recent essay, he remarked that most advances in AI development are powered by access to computing power, while the importance of expert knowledge and creative input from human researchers is losing significance.

Moreover, numerous attempts have been made to enrich machine learning with expert knowledge. Usually, such efforts yielded short-term gains of little significance in the broader context of AI's development.

Why does it matter?

The opinion would seem to support the general observation that computing power is the gateway to pushing the boundaries of machine learning and artificial intelligence. That power, combined with relatively simple machine learning techniques, frequently challenges the established ways of solving problems. The RL-based agents playing Go, chess or StarCraft are only the most obvious examples.

Yann LeCun, Geoffrey Hinton and Yoshua Bengio awarded the Turing Award

The Association for Computing Machinery, the world’s largest organization of computing professionals, announced that this year’s Turing Award went to three researchers for their work on advancing and popularizing neural networks. Currently, the researchers split their time between academia and the private sector, with Yann LeCun being employed by Facebook and New York University, Geoffrey Hinton working for Google and the University of Toronto, and Yoshua Bengio splitting his time between the University of Montreal and his company Element AI.

Why does it matter?

Named after Alan Turing, a giant of mathematics and the godfather of modern computer science, the Turing Award has been called IT's Nobel Prize. While there is no actual Nobel Prize in IT, its specialists have the Turing Award instead.

Nvidia creates a wonder brush – AI that turns a doodle into a landscape

Nvidia has shown how an AI-powered editor swiftly transforms simple, childlike images into near-photorealistic landscapes. While the technology isn’t exactly new, this time the form is interesting. It uses Generative Adversarial Networks and amazes with the details it can muster – if the person drawing adds a lake near a tree, the water will reflect it.

Why does it matter?

Nvidia does a great job of spreading knowledge about machine learning. Further applications in image editing will no doubt be forthcoming, automating the work of illustrators and graphic designers. But for now, it is amazing to behold.

So do you think like a computer?

While machine learning models are superhumanly effective at image recognition, when they do fail, their predictions are usually surprising, to say the least. Until recently, it was believed that people are unable to predict how a computer will misinterpret an image. Moreover, this thoroughly inhuman way of recognizing images is prone to mistakes – it is possible to prepare an artificial image that effectively fools the AI behind the image recognition and, for example, convinces the model that a car is in fact a bush.


The confusion about the machines identifying objects usually comes from the fact that most AI models are narrow AI. The systems are designed to work in a closed environment and solve a narrow problem, like identifying cars or animals. Consequently, the machine has a narrow catalog of entities to name.

To check if humans are able to understand how the machine is making its mistakes, the researchers provided volunteers with images that had already fooled AI models together with the names the machines were able to choose from for those images. In those conditions, people provided the same answers as the machines 75% of the time.

Why does it matter?

A recent study from Johns Hopkins University shows that computers are becoming increasingly human even in their mistakes, and that surprising outcomes are a consequence of the extreme narrowness of the artificial mind. A typical preschooler has an incomparably larger vocabulary and store of experience than even the most powerful neural network, so the likelihood of a human finding a more accurate association for an image is many times greater.

Again, the versatility and flexibility of the human mind is the key to its superiority.


With your head in the clouds – how to harness the power of artificial intelligence

April 4, 2019/in Blog posts, Machine learning /by Paweł Osterreicher and Konrad Budek

To fully leverage the potential of artificial intelligence in business, access to computing power is key. That doesn’t mean, however, that supercomputers are necessary for everyone to take advantage of the AI revolution.

In January 2019 the world learned of the AI model known as AlphaStar, which had vanquished the world’s reigning Starcraft II champion. Training the model by having it square off against a human would have required around 400 years of non-stop play. But thanks to enormous computing power and some neural networks, AlphaStar’s creators needed a mere two weeks.

AlphaStar and similar models of artificial intelligence can be compared to a hypothetical chess player that achieves a superhuman level by playing millions of opponents – a number humans, due to our biological limitations, cannot hope to match. So that they need not spend years training their models, programmers provide them access to thousands of chessboards and allow them to play huge numbers of games at the same time. This parallelization can be achieved by properly building the architecture of the solution on a supercomputer, but it can also be much simpler: by ordering all these chess boards as instances (machines) in the cloud.


Building such models comes with a price tag few can afford as easily as Google, the owner of AlphaStar and AlphaGo. And given these high-profile experiments, it should come as no surprise that the business world still believes technological experiments require specialized infrastructure and are expensive. Sadly, this perception extends to AI solutions.

A study done by McKinsey and Company, "AI adoption advances, but foundational barriers remain", provides practical confirmation of this thesis. 25% of the companies McKinsey surveyed indicated that a lack of adequate infrastructure was a key obstacle to developing AI in their own company. Only the lack of an AI strategy and of people and support in the organization were bigger obstacles – that is, fundamental issues that precede any infrastructure discussion.

So how does one reap the benefits of artificial intelligence and machine learning without having to run multi-million dollar projects? There are three ways.

  1. Precisely determine the scope of your AI experiments. A precise selection of business areas to test models and clearly defined success rates is necessary. By scaling down the case being tested to a minimum, it is possible to see how a solution works while avoiding excessive hardware requirements.
  2. Use mainly cloud resources, which generate costs only when used and, thanks to economies of scale, keep prices down.
  3. Create a simplified model using cloud resources, thus combining the advantages of the first two approaches.

Feasibility study – even on a laptop

Even the most complicated problem can be tackled by reducing it to a feasibility study, the maximum simplification of all elements and assumptions. Limiting unnecessary complications and variables will remove the need for supercomputers to perform the calculations – even a decent laptop will get the job done.

An example of a challenge

In a bid to boost its cross-sell/up-sell efforts, a company that sells a wide variety of products in a range of channels needs an engine that recommends its most relevant products to customers.

A sensible approach to this problem would be to first identify a specific business area or line, analyze the products then select the distribution channel. The channel should provide easy access to structured data. The ideal candidate here is online/e-commerce businesses, which could limit the group of customers to a specific segment or location and start building and testing models on the resulting database.

Such an approach would enable the company to avoid gathering, ordering, standardizing and processing the huge amounts of data that complicate the process and account for most of the infrastructure costs. Preparing a prototype of the model still makes it possible to efficiently enter the testing and verification phase of the solution's operation.


So narrowed down, the problem can be solved by even a single data scientist equipped with a good laptop. Another practical application of this approach comes via Uber. Instead of creating a centralized analysis and data science department, the company hired a machine learning specialist for each of its teams. This led to small adjustments being made to automate and speed up daily work in all of its departments. AI is perceived here not as another corporate-level project, but as a useful tool employees turn to every day.

This approach fits into the widely used (and for good reason) agile paradigm, which aims to create value iteratively by collecting feedback as quickly as possible and including it in the 'agile' product construction process.

When is power really necessary?

Higher computing power is required in the construction of more complex solutions whose tasks go far beyond the standard and which will be used on a much larger scale. The nature of the data being processed can pose yet another obstacle, as the model deepsense.ai developed for the National Oceanic and Atmospheric Administration (NOAA) will illustrate.

The company was tasked by the NOAA with recognizing individual North Atlantic right whales in aerial photographs. With only 450 individual whales left, the species is on the brink of extinction, and only by tracking the fate of each animal separately is it possible to protect them and provide timely help to injured whales.

Is that Thomas and Maddie in the picture?

The solution called for three neural networks that would perform different tasks consecutively – finding the whale’s head in the photo, cropping and framing the photo, and recognizing exactly which individual was shown.

These tasks required surgical precision and consideration of all of the available data – which, as you'd rightly imagine, came from a very limited number of photos. Such demands complicated the models and required considerable computing power. The solution ultimately automated the process of identifying individuals swimming the high seas, cutting the human work time from a few hours spent poring over the catalog of whale photos to about thirty minutes of checking the results the model returned.

Reinforcement learning (RL) is another demanding category of AI. RL models are taught through interactions with their environment and a set of punishments and rewards. This means that neural networks can be trained to solve open problems and operate in variable, unpredictable environments based on pre-established principles, the best example being controlling an autonomous car.


In addition to the power needed to train the model, autonomous cars require a simulated environment to be run in, and creating that is often even more demanding than the neural network itself. Researchers from deepsense.ai and Google Brain are working on solving this problem through artificial imagination, and the results are more than promising.

Travel in the cloud

Each of the above examples requires access to large-scale computing power. When the demand for such power jumps, cloud computing holds the answer. The cloud offers three essential benefits in supporting the development of machine learning models:

  • A pay-per-use model keeps costs low and removes barriers to entry – as the name suggests, the user pays only for the resources actually used, which drives down costs significantly. The cloud is estimated to be as much as 30% cheaper to implement and maintain than owned infrastructure. After all, it removes the need to invest in equipment that can handle complicated calculations. Instead, you pay only for a 'training session' for a neural network. Once the calculations are done, the network is ready for operation, and running machine learning-based solutions isn't all that demanding. Consider, for example, that AI is used in mobile phones to adjust technical settings and make photos the best they can be – even a phone's processor can handle running AI models.
  • Easy development and expansion opportunities – transferring company resources to the cloud brings a wide range of complementary products within reach. Cloud computing makes it possible to easily share resources with external platforms, and thus even more efficiently use the available data through the visualization and in-depth analysis those platforms offer. According to a RightScale report, State of the Cloud 2018, as an organization’s use of the cloud matures, its interest in additional services grows. Among enterprises describing themselves as cloud computing “beginners,” only 18 percent are interested in additional services. The number jumps to 40 percent among the “advanced” cohort. Whatever a company’s level of interest, there is a wide range of solutions supporting machine learning development awaiting them – from simple processor and memory hiring to platforms and ready-made components for the automatic construction of popular solutions.
  • Permanent access to the best solutions – by purchasing specific equipment, a company will be associated with a particular set of technologies. Companies looking to develop machine learning may need to purchase special GPU-based computers to efficiently handle the calculations. At the same time, equipment is constantly being developed, which may mean that investment in a specific infrastructure is not the optimal choice. For example, Google’s Tensor Processing Unit (TPU) may be a better choice than a GPU for machine learning applications. Cloud computing allows every opportunity to employ the latest available technologies – both in hardware and software.

Risk in the skies?

The difficulties that have put a question mark over migrating to the cloud are worth elaborating here. In addition to the organizational maturity mentioned earlier and the fear of high costs, the biggest problem is that regulations in some industries and jurisdictions limit full control over one's own data: not all data can be freely sent (even for a moment) to foreign data centers. Mixed solutions that simultaneously use both the cloud and the organization's own infrastructure are useful here. The data are divided into those that must remain in the company and those that can be sent to the cloud.

In Poland, this problem will be addressed by the recently appointed Operator Chmury Krajowej (National Cloud Operator), which will offer cloud computing services fully implemented in data centers located in Poland. This could well convince some industries and the public sector to open up to the cloud.

Matching solutions

Concerns that a company lacks the computational resources needed to use machine learning are in most cases unjustified. Many models can be built using ordinary commodity hardware that is already in the enterprise. And when a system is to be built that calls for long-term training of a model and large volumes of data, the power and infrastructure needed can be easily obtained from the cloud.

Machine learning is currently one of the hottest business trends. And for good reason. Easy access to the cloud democratizes computing power and thus also access to AI – small companies can use the enormous resources available every bit as quickly as the market’s behemoths. Essentially, there’s a feedback loop between cloud computing and machine learning that cannot be ignored.

The article was originally published in Harvard Business Review.
