Logo detection and brand visibility analytics

Monitoring brand visibility and measuring ROI on marketing campaigns are important business challenges, especially in ad-driven industries. Brands frequently have to compete in cluttered advertising spaces, whether outdoors or on websites, with limited exposure times. In this post we describe our deep learning solution for automated logo detection and visibility analytics.

How many people does your brand reach?

Brands are often promoted through sponsorship campaigns at sports and cultural events. These events attract large numbers of viewers both directly and via media reports, allowing brands to get favorable positioning. However, sponsorship contracts often come at a steep price, so brand owners are naturally more than a little interested in finding out how effectively their outlays are working for them. The problem is, it’s difficult to assess quantitatively just how great the brand exposure is.

The current approach to computing such statistics is based on manually annotating broadcast material, which is tedious and expensive. In order to address this problem, we have developed an automated tool for logo detection and visibility analysis, providing both raw detections and a rich set of statistics.

Solution overview

We decided to break down the problem into two steps: logo detection with convolutional neural nets and an analytics step where summary statistics are computed.

[Figure: Logo detection system overview]

The main advantage of this approach is that it is straightforward to swap the analytics module for a different one if different types of statistics are called for, or even if the neural net is to be trained for a completely different task (we had plenty of fun modifying this system to spot and count coins – stay tuned for a future blog post on that).
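
To make the interface between the two steps concrete, here is a minimal Python sketch of the idea; the class and field names are illustrative rather than taken from our production code. A detector produces per-frame bounding boxes, and a pluggable analytics module turns them into statistics:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Detection:
    frame_idx: int     # index of the video frame
    brand: str         # detected brand label
    x0: float          # box corners, normalized to [0, 1]
    y0: float
    x1: float
    y1: float
    score: float       # detection confidence


class LogoDetector:
    """Wraps the neural network; returns bounding boxes for one frame."""
    def detect(self, frame, frame_idx: int) -> List[Detection]:
        raise NotImplementedError


class AnalyticsModule:
    """Turns raw detections into statistics, charts and rankings."""
    def summarize(self, detections: Iterable[Detection]) -> dict:
        raise NotImplementedError


def run_pipeline(frames, detector: LogoDetector, analytics: AnalyticsModule) -> dict:
    detections: List[Detection] = []
    for idx, frame in enumerate(frames):
        detections.extend(detector.detect(frame, idx))
    return analytics.summarize(detections)
```

Because the analytics module only sees a list of detections, it can be swapped out – or the detector retrained for a different object class – without touching the rest of the pipeline.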

Logo detection with deep learning

There are two principal approaches to object detection with convolutional neural networks: region-based methods and fully convolutional methods.

Region-based methods, such as R-CNN and its descendants, first identify image regions which are likely to contain objects (region proposals), then extract these regions and process them individually with an image classifier. This can be quite slow, a problem that can be remedied to some extent with Fast R-CNN, where the image is processed by the convolutional network as a whole and then region representations are extracted from high-level feature maps. Faster R-CNN is a further improvement where region proposals are also computed from high-level CNN features, which accelerates the region proposal step.
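
As a quick illustration of the region-based family (not the model we ended up using), recent versions of torchvision ship a pretrained Faster R-CNN that can be run on an image in a few lines:

```python
import torch
import torchvision

# Pretrained Faster R-CNN with a ResNet-50 FPN backbone (COCO classes).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)   # stand-in for a real video frame, values in [0, 1]
with torch.no_grad():
    output = model([image])[0]    # the model returns one dict per input image

boxes = output["boxes"]           # (N, 4) refined region proposals
labels = output["labels"]         # (N,)  predicted class indices
scores = output["scores"]         # (N,)  confidence scores
```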

Fully convolutional methods, such as SSD, do away with processing individual region proposals and instead predict class labels and box offsets directly from the feature maps, in a single pass. This approach can be much faster, since there is no need to extract and process region proposals individually. In order to make this work for objects of very different sizes, the SSD network has several detection layers attached to feature maps of different resolutions.

[Figure: Logo detection convolutional net]

Since real-time video processing is one of the requirements of our system, we decided to go with the SSD method rather than Faster R-CNN. Our network also uses ResNet-50 as its convnet backbone, rather than the default VGG-16. This made it much less memory-hungry, while also helping to stabilize the training process.
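
The sketch below shows the general shape of such a network: intermediate ResNet-50 feature maps at several resolutions, each with a small convolutional detection head attached, SSD-style. The number of classes and anchors are placeholders, and the anchor generation and box-matching logic are omitted, so this is an illustration of the idea rather than our exact architecture:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50


class SSDLikeDetector(nn.Module):
    """Multi-scale detection heads on top of ResNet-50 feature maps."""

    def __init__(self, num_classes: int = 10, num_anchors: int = 6):
        super().__init__()
        backbone = resnet50(weights=None)   # randomly initialized for this sketch
        # Keep the stages that produce stride-8, stride-16 and stride-32 feature maps.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu,
                                  backbone.maxpool, backbone.layer1)
        self.stages = nn.ModuleList([backbone.layer2, backbone.layer3, backbone.layer4])
        out_channels = num_anchors * (num_classes + 4)   # class scores + 4 box offsets
        self.heads = nn.ModuleList([
            nn.Conv2d(c, out_channels, kernel_size=3, padding=1)
            for c in (512, 1024, 2048)   # channel widths of layer2/3/4 outputs
        ])

    def forward(self, x):
        x = self.stem(x)
        predictions = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            # High-resolution maps catch small logos, coarse maps catch large ones.
            predictions.append(head(x))
        return predictions


if __name__ == "__main__":
    maps = SSDLikeDetector()(torch.rand(1, 3, 300, 300))
    print([m.shape for m in maps])   # one prediction map per feature map resolution
```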

Model training

In the process of refining the SSD architecture for our requirements, we ran dozens of experiments. This was an iterative process with a large delay between start and completion of an experiment (typically 1-2 days). In order to run numerous experiments in parallel, we used Neptune, our machine learning experiment manager. While the experiment is running, Neptune captures the values of the loss function and other statistics, displaying them in a friendly web UI. Additionally, it can capture images via image channels and display them, which helped greatly with troubleshooting the different variations of data augmentation that we tested.
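
For illustration, here is roughly what the logging side of a training loop looks like with the current neptune-client API; the project name, parameters and the dummy train_step below are placeholders, and the API we used at the time differed in the details:

```python
import random

import neptune


def train_step() -> float:
    """Placeholder for one optimization step; returns the batch loss."""
    return random.random()


# Assumes a Neptune account and API token are already configured.
run = neptune.init_run(project="my-workspace/logo-detection")   # placeholder project name
run["parameters"] = {"backbone": "resnet50", "lr": 1e-4, "batch_size": 16}

for step in range(1000):
    loss = train_step()
    run["train/loss"].append(loss)   # values show up live in the web UI

# Images (e.g. augmented samples with predicted boxes drawn on them) can be
# logged through image channels in a similar way:
# run["train/augmented_samples"].append(neptune.types.File.as_image(image_array))

run.stop()
```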

[Figure: Logo detection - Neptune screenshot]

Logo detection analytics

The model we produced generates detections just fine. However, even for a short video, the raw detection output can span thousands of lines. To help humans analyze the results, we created software that translates these detections into a series of statistics, charts, rankings and visualizations that can be assembled into a concise report.

The statistics are calculated globally and per brand. Some of them, like brand display time, are meant to be displayed directly, but many are there to fuel the visual representations. The charts are particularly expressive for this task: they include brand exposure size over time, a heatmap of a logo’s position on the screen, and bar charts that make it easy to compare various statistics across brands. Last but not least, we have a module for creating highlights – visualizations of the bounding boxes detected by the model. This module serves a double purpose, because such visualizations are also a valuable source of information for data scientists tweaking the model.
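
To give a flavour of this step, below is a rough sketch – illustrative, not our report code – of two of the statistics mentioned above, computed from raw detections shaped like the Detection objects sketched earlier:

```python
from collections import defaultdict

import numpy as np


def brand_display_time(detections, fps: float) -> dict:
    """Seconds of screen time per brand, counting each frame with at least one box once."""
    frames_per_brand = defaultdict(set)
    for det in detections:                      # objects with frame_idx / brand / box fields
        frames_per_brand[det.brand].add(det.frame_idx)
    return {brand: len(frames) / fps for brand, frames in frames_per_brand.items()}


def position_heatmap(detections, brand: str, grid: int = 32) -> np.ndarray:
    """Accumulate the box centers of one brand on a grid covering the screen."""
    heat = np.zeros((grid, grid))
    for det in detections:
        if det.brand != brand:
            continue
        cx = int(min((det.x0 + det.x1) / 2 * grid, grid - 1))
        cy = int(min((det.y0 + det.y1) / 2 * grid, grid - 1))
        heat[cy, cx] += 1
    return heat
```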

Results

We processed a short video featuring a rivalry between Coca-Cola and Pepsi to see which brand received more exposure in quantitative terms. You can watch it on YouTube by following this link. Which logo has better visibility?

Below, you can compare your guesses with what our model reported:

[Figure: Logo detection report]

Possible extensions

We found that our system can easily be adapted for other object detection tasks. This is a major advantage because object detection often appears in the context of computer vision projects, either as a goal in itself or as part of a longer chain of processing steps.

There are many business problems where object detection can be helpful and at deepsense.ai we have worked on a number of them. Here’s a partial list of work we’ve done:

  • For one of our clients we have developed a solution for extracting information about ingredients from photographs of FMCG products, using object detection networks to locate the list of ingredients on the product photograph. This helps our client make data collection more efficient and automatic.
  • Recently, we took part in Kaggle’s iMaterialist challenge, building a machine learning model that recognizes product attributes based on images. This system can be used to automatically assign tags to products in online retail, helping customers find more products they may be interested in (a cross-selling use case). You can read more about this solution in our blog post.
  • Our team also placed 4th in Kaggle’s Satellite Imagery Feature Detection challenge. The goal was to segment satellite images into 10 different classes of objects. Possible applications include agriculture, disaster relief and environmental studies. See our blog post for more details.

Comments
      Michal Romaniuk says:

      Hi Reev,

      Thanks for reading. For the initial feasibility study we had a dataset with a bunch of popular brands annotated in a few dozen images each, but since then we have switched to generating artificial images by pasting template logos, with some color distortion, onto background images which were verified to contain none of the logos we’re interested in (inspired by https://arxiv.org/abs/1612.09322). The main advantage of this second approach is that we can train for the 100 most valuable brands and then add new ones without having to go out and collect a lot of data. It also reduces the chance of overfitting to image context (e.g. car logos appearing on a car grille). The main drawback is that the data you get this way is not too realistic. It may also be necessary to check your background images again when you add new logos, unless you were strict enough to only include images with no logos at all.
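
      A rough sketch of that pasting step, using Pillow – the scale range, the jitter ranges and the assumption that the logo template has an alpha channel are illustrative here, not exactly what we used:

```python
import random

from PIL import Image, ImageEnhance


def paste_logo(background: Image.Image, logo_template: Image.Image):
    """Paste a color-jittered logo onto a background; return the image and the box."""
    logo = logo_template.convert("RGBA")
    # Random scale relative to the background width.
    scale = random.uniform(0.1, 0.4)
    w = max(1, int(background.width * scale))
    h = max(1, int(logo.height * w / logo.width))
    logo = logo.resize((w, h))
    # Mild color distortion on the RGB channels; the alpha channel stays as the paste mask.
    rgb = ImageEnhance.Color(logo.convert("RGB")).enhance(random.uniform(0.7, 1.3))
    rgb = ImageEnhance.Brightness(rgb).enhance(random.uniform(0.8, 1.2))
    logo = Image.merge("RGBA", (*rgb.split(), logo.getchannel("A")))
    # Random position; the pasted box doubles as the ground-truth annotation.
    x = random.randint(0, max(0, background.width - w))
    y = random.randint(0, max(0, background.height - h))
    out = background.convert("RGB")
    out.paste(logo, (x, y), logo)
    return out, (x, y, x + w, y + h)
```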

    Max says:

    Hello,

    I was wondering if I could use your model to detect and recognize brand logos.

    Is there an API or another way I can use your model?

    Kind Regards,

    Max Teeuwen

      Michal Romaniuk says:

      Hi Max,

      We’re not making the code publicly available, but it’s a fairly typical SSD on top of ResNet with a few tweaks. As of mid-2018, I would probably suggest using some standard implementation of Retinanet or Mask R-CNN (there are quite a few on Github – look for ones that are stable and show some test results to prove they really do work). Most of the effort is then in getting hold of some data, feeding it into the training process in the right format and making sure the data augmentation doesn’t mess anything up.

      If you have any specific questions, feel free to ask.
