Monitoring brand visibility and measuring the ROI of marketing campaigns are important business challenges, especially in ad-driven industries. Brands often have to compete in cluttered advertising spaces, whether outdoors or on websites, with limited exposure times. In this post we describe our deep learning solution for automated logo detection and visibility analytics.
How many people does your brand reach?
Brands are often promoted through sponsorship campaigns at sports and cultural events. These events attract large numbers of viewers both directly and via media reports, allowing brands to get favorable positioning. However, sponsorship contracts often come at a steep price, so brand owners are naturally more than a little interested in finding out how effectively their outlays are working for them. The problem is, it’s difficult to assess quantitatively just how great the brand exposure is.
The current approach to computing such statistics is based on manually annotating broadcast material, which is tedious and expensive. In order to address this problem, we have developed an automated tool for logo detection and visibility analysis, providing both raw detections and a rich set of statistics.
We decided to break down the problem into two steps: logo detection with convolutional neural nets and an analytics step where summary statistics are computed.
The main advantage of this approach is that it is straightforward to swap the analytics module for a different one if different types of statistics are called for, or even if the neural net is to be trained for a completely different task (we had plenty of fun modifying this system to spot and count coins – stay tuned for a future blog post on that).
Logo detection with deep learning
There are two principal approaches to object detection with convolutional neural networks: region-based methods and fully convolutional methods.
Region-based methods, such as R-CNN and its descendants, first identify image regions which are likely to contain objects (region proposals), then extract these regions and process them individually with an image classifier. This can be quite slow, a problem that can be remedied to some extent with Fast R-CNN, where the image is processed by the convolutional network as a whole and then region representations are extracted from high-level feature maps. Faster R-CNN is a further improvement where region proposals are also computed from high-level CNN features, which accelerates the region proposal step.
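The key trick behind Fast R-CNN is that each region is cropped from a shared feature map and pooled to a fixed size, instead of being passed through the CNN separately. A minimal single-channel sketch of this RoI max-pooling idea (the function name and toy sizes are ours, not from any library):

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=2):
    """Max-pool one region of a shared feature map down to a fixed
    output_size x output_size grid - the core idea of RoI pooling."""
    x0, y0, x1, y1 = roi
    region = feature_map[y0:y1, x0:x1]
    h, w = region.shape
    out = np.zeros((output_size, output_size))
    for i in range(output_size):
        for j in range(output_size):
            # split the region into a coarse grid, max-pool each cell
            ys = slice(i * h // output_size, max((i + 1) * h // output_size, i * h // output_size + 1))
            xs = slice(j * w // output_size, max((j + 1) * w // output_size, j * w // output_size + 1))
            out[i, j] = region[ys, xs].max()
    return out

fmap = np.arange(64, dtype=float).reshape(8, 8)  # toy feature map
pooled = roi_pool(fmap, roi=(2, 2, 6, 6))
print(pooled.shape)  # (2, 2)
```

Because every region pools from the same feature map, the expensive convolutional forward pass runs only once per image, no matter how many region proposals there are.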
Fully convolutional methods, such as SSD, do away with processing individual region proposals and instead aim to output class labels at the region proposal step. This approach can be much faster, since there is no need to extract and process region proposals individually. In order to make this work for objects with very different sizes, the SSD network has several detection layers attached to feature maps of different resolutions.
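Concretely, each detection layer predicts offsets relative to a grid of default boxes tiled over its feature map, with box scale tied to the map's resolution. A simplified sketch of how such default boxes are generated (the map sizes, scales and aspect ratios below are illustrative, not the exact SSD configuration):

```python
import numpy as np

def default_boxes(feature_map_sizes, scales):
    """Tile center-form default boxes (cx, cy, w, h), normalized to
    [0, 1], over several detection feature maps of an SSD-style net."""
    boxes = []
    for fm, scale in zip(feature_map_sizes, scales):
        for i in range(fm):
            for j in range(fm):
                cx = (j + 0.5) / fm  # box center on the grid
                cy = (i + 0.5) / fm
                for ar in (1.0, 2.0, 0.5):  # aspect ratios
                    boxes.append((cx, cy, scale * np.sqrt(ar), scale / np.sqrt(ar)))
    return np.array(boxes)

# Coarser maps get larger scales: small objects are matched on the
# high-resolution maps, large objects on the low-resolution ones.
boxes = default_boxes(feature_map_sizes=(38, 19, 10), scales=(0.1, 0.2, 0.4))
print(boxes.shape)  # (5715, 4): (38*38 + 19*19 + 10*10) boxes x 3 aspect ratios
```

At inference time the network outputs a class score and a coordinate offset for every one of these boxes in a single forward pass, which is what makes the method fast.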
Since real-time video processing is one of the requirements of our system, we decided to go with the SSD method rather than Faster R-CNN. Our network also uses ResNet-50 as its convnet backbone, rather than the default VGG-16. This made it much less memory-hungry, while also helping to stabilize the training process.
In the process of refining the SSD architecture for our requirements, we ran dozens of experiments. This was an iterative process with a large delay between start and completion of an experiment (typically 1-2 days). In order to run numerous experiments in parallel, we used Neptune, our machine learning experiment manager. While the experiment is running, Neptune captures the values of the loss function and other statistics, displaying them in a friendly web UI. Additionally, it can capture images via image channels and display them, which helped greatly with troubleshooting the different variations of data augmentation that we tested.
Logo detection analytics
The model we produced generates detections just fine. However, even for a short video the raw detection output can span thousands of lines. To help humans analyze the results, we created software that translates this output into a series of statistics, charts, rankings and visualizations that can be assembled into a concise report.
The statistics are calculated globally and per brand. Some of them, like brand display time, are meant to be reported directly, but many exist to feed the visual representations. The charts are particularly expressive for this task: they include brand exposure size over time, a heatmap of a logo’s position on the screen, and bar charts that let you easily compare statistics across brands. Last but not least, we have a module for creating highlights – visualizations of the bounding boxes detected by the model. This module serves a double purpose, because such visualizations are also a valuable source of information for data scientists tweaking the model.
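To give a flavor of the analytics step, here is a toy sketch of how per-frame detections can be aggregated into per-brand display time. The detection tuple layout, function name and sample values are ours, chosen for illustration; the real pipeline computes many more statistics:

```python
from collections import defaultdict

def brand_display_time(detections, fps=25.0):
    """Aggregate raw per-frame detections into per-brand screen time
    in seconds. detections: (frame_index, brand, bbox) tuples."""
    frames_seen = defaultdict(set)
    for frame, brand, _bbox in detections:
        frames_seen[brand].add(frame)  # count each frame once per brand
    return {brand: len(frames) / fps for brand, frames in frames_seen.items()}

# Toy input: two logos detected over a handful of frames
raw = [
    (0, "coca-cola", (10, 10, 50, 20)),
    (0, "pepsi", (200, 40, 40, 40)),
    (1, "coca-cola", (12, 11, 50, 20)),
    (2, "coca-cola", (14, 12, 50, 20)),
]
times = brand_display_time(raw, fps=25.0)
print(times)  # {'coca-cola': 0.12, 'pepsi': 0.04}
```

The same grouping-by-brand pattern underlies the other statistics, e.g. summing bounding-box areas instead of counting frames yields exposure size over time.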
We processed a short video featuring a rivalry between Coca-Cola and Pepsi to see which brand received more exposure in quantitative terms. You can watch it on YouTube by following this link. Which logo has better visibility?
Below, you can compare your guesses with what our model reported:
We found that our system can easily be adapted for other object detection tasks. This is a major advantage because object detection often appears in the context of computer vision projects, either as a goal in itself or as part of a longer chain of processing steps.
There are many business problems where object detection can be helpful and at deepsense.ai we have worked on a number of them. Here’s a partial list of work we’ve done:
- For one of our clients we developed a solution for extracting information about ingredients from photographs of FMCG products, using object detection networks to locate the list of ingredients on the product photograph. This helps our client make data collection more efficient and more automated.
- Recently, we took part in Kaggle’s iMaterialist challenge, and rose to the task of building a machine learning model that recognizes product attributes based on images. This system can be used to automatically assign tags to products in online retail, helping customers to find more products they may be interested in (a cross-selling use case). You can read more about this solution in our blog post.
- Our team also placed 4th in Kaggle’s Satellite Imagery Feature Detection challenge. The goal was to segment satellite images into 10 different classes of objects. Possible applications include agriculture, disaster relief and environmental studies. See our blog post for more details.