Logo detection in sports sponsorship
Consumers love brands that bring them closer to sporting events. This has compelled the largest brands to jump headlong into sports sponsorship. While the benefits of sponsorship are undeniable, measuring ROI precisely remains a challenge.
With machine learning-powered tools, brands can evaluate a campaign’s ROI by analyzing video coverage of sponsored events. Thanks to computer vision, it is possible to determine precisely when and where a brand appeared during a sporting event. Image analysis also lets companies observe competitors’ branding activities and compare brand visibility. Having precise information on brand positioning additionally makes it possible to calculate advertising equivalent value and identify the most impactful events. Such analyses would be extremely time-consuming and far less accurate if performed manually. Automated analysis based on advanced machine learning algorithms delivers valuable new insights quickly and boosts the effectiveness of marketing campaigns. To address these needs, deepsense.ai developed an automated tool for logo detection and visibility analysis that provides both raw detections and a rich set of statistics.
Solution overview
deepsense.ai’s solution is based on a combination of two approaches – supervised object detection and one-shot learning.
Supervised object detection approach
In this approach, one of many well-tested architectures can be used to train models on a labelled data set. These include fully convolutional detectors such as YOLO and SSD, or the R-CNN family (region-based convolutional neural networks). Since video streaming is essential to logo detection during sports broadcasts, a fully convolutional model is the best choice. Such a model does not have to process each proposal region separately and has a constant inference time, independent of the number of objects detected. This enables the model to run in real time.
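To illustrate the constant-cost property, here is a minimal sketch (not the actual model; random values stand in for a real network’s output, and the grid size, anchor count and threshold are illustrative) of how a YOLO-style fully convolutional head maps a whole frame to a fixed-size prediction grid in a single pass:

```python
import numpy as np

def detect_frame(frame, grid=13, anchors=3, num_classes=20):
    """Sketch of a YOLO-style fully convolutional detection head.

    One forward pass maps the whole frame to a fixed-size grid of
    predictions, so the cost per frame is constant no matter how many
    logos appear. Each grid cell predicts `anchors` boxes, each with
    4 coordinates + 1 objectness score + per-class scores.
    """
    rng = np.random.default_rng(0)  # stands in for the real network
    preds = rng.random((grid, grid, anchors, 5 + num_classes))
    objectness = preds[..., 4]
    kept = preds[objectness > 0.9]  # keep only confident boxes
    return preds.shape, kept.shape[0]

shape, n_boxes = detect_frame(np.zeros((416, 416, 3)))
```

Because the output tensor has a fixed shape, the per-frame latency is predictable, which is what makes real-time stream processing feasible.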
The advantages of this approach include:
- Simplicity of operation and maturity (many open, tested, ready-to-use implementations).
But there are also disadvantages:
- It’s impossible to quickly add a new version of a logo without obtaining a large amount of training data;
- The system is very sensitive to changes in the appearance of the logo, and updating a model trained as described above would require a large amount of new data.
Therefore, to increase the efficiency of the system, we combined the two approaches (supervised object detection + one-shot learning).
One-shot learning approach
This approach effectively solves the problem of dynamic logos and allows us to add new logos to the database without collecting large amounts of data. All we need is a reference set of template vectors for each supported logo and a model that detects logo region proposals without classifying them. The model is trained using triplet loss.
During training, each example in a mini-batch is a three-element tuple:
- An example of company A’s logo (anchor),
- An image containing regions with company A’s logo (positive),
- An image containing regions with a different brand’s logo (negative).
For the architecture, we use the fully convolutional YOLOv3 model, which both embeds the template set of logos into a relatively low-dimensional vector space and detects objects in images (without assigning them specific classes).
During training, the template logo (anchor) is encoded by the same model used for object detection. The one difference is that we apply average pooling to the feature map extracted for the anchor, obtaining a single feature vector (the anchor vector).
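The pooling step can be sketched as follows (the feature-map dimensions here are illustrative, not those of the actual YOLOv3 backbone):

```python
import numpy as np

def anchor_vector(feature_map):
    """Average-pool an (H, W, C) feature map into a single C-dim anchor vector."""
    return feature_map.mean(axis=(0, 1))

# Toy 2x2 spatial map with 4 channels; a real map would be much larger.
fmap = np.arange(2 * 2 * 4, dtype=float).reshape(2, 2, 4)
vec = anchor_vector(fmap)  # shape (4,): one value per channel
```

Pooling discards the spatial layout, leaving a compact embedding that can be compared directly with region vectors.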
An approximate diagram of this process is presented in the figure below.
The optimized target function in this case is triplet margin loss – a differentiable function that yields small values if the vector representing the positive region is close to the anchor pattern vector (the logos are similar) and the negative region vector is far away from it (they are not similar).
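A minimal sketch of this loss on plain vectors (using Euclidean distance and an illustrative margin of 1.0):

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Small (zero) when the positive embedding is closer to the anchor
    than the negative embedding by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([0.0, 0.0])   # anchor vector
p = np.array([0.1, 0.0])   # positive region vector, close to the anchor
n = np.array([3.0, 0.0])   # negative region vector, far from the anchor
loss = triplet_margin_loss(a, p, n)
```

Here the positive is already closer than the negative by more than the margin, so the loss is zero; swapping the roles of `p` and `n` yields a large loss that would push the embeddings apart during training.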
After training, the model processes the available template logos to create a template vector database for each supported class. At inference time, once a logo region is detected, we extract the vector representing that region and compare it with the templates. The label of the most similar template is selected as the class of the detection.
Updating a model trained this way requires only adding new elements to the reference vector database, or replacing an “old” logo with a “new” one, without retraining the model.
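The nearest-template lookup can be sketched like this (the brand names, two-dimensional embeddings, and cosine similarity metric are all illustrative assumptions):

```python
import numpy as np

# Hypothetical template database: one embedding per supported brand.
templates = {
    "brand_a": np.array([1.0, 0.0]),
    "brand_b": np.array([0.0, 1.0]),
}

def classify_region(region_vec, db):
    """Assign the label of the most similar template vector."""
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return max(db, key=lambda label: cos(region_vec, db[label]))

label = classify_region(np.array([0.9, 0.1]), templates)
```

Note that supporting a new brand amounts to inserting one more entry into `templates`; no retraining is involved.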
Parallelization of the stream
To speed up the system, we parallelized the stream rather than processing it frame by frame. Since the streaming data flows in gradually over time, we opted not to use “batch” inference with a single model instance. In this setting, it is also important to synchronize the processes so that the processed stream elements are returned in chronological order.
- We initialize n workers (a parameter, a natural number). Each worker is simply a pair of YOLOv3 detection networks (one trained with the supervised method, the other with one-shot learning).
- We create a FIFO queue to which we throw the incoming data from the stream and from which workers collect frames for processing.
- To preserve chronology, workers push processed frames onto a heap.
- A separate, looped process checks whether the heap is non-empty. If it is, the element with the smallest id is popped; if that id is exactly one greater than the id of the last returned frame, we update this variable and return the processed frame; otherwise the frame and its id are pushed back onto the heap.
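The steps above can be sketched in Python (with string “frames” and `str.upper` standing in for the two detection networks, and `None` as an assumed end-of-stream sentinel):

```python
import heapq
import queue
import threading

def worker(in_q, heap, lock):
    """Pop (frame_id, frame) pairs from the FIFO queue, 'process' them,
    and push results onto a shared min-heap keyed by frame id."""
    while True:
        item = in_q.get()
        if item is None:              # sentinel: no more frames
            return
        frame_id, frame = item
        result = frame.upper()        # stand-in for the detection networks
        with lock:
            heapq.heappush(heap, (frame_id, result))

def drain_in_order(heap, lock, total):
    """Emit processed frames chronologically: release the smallest id
    only when it directly follows the last emitted frame."""
    out, last = [], -1
    while len(out) < total:
        with lock:
            if heap and heap[0][0] == last + 1:
                last, result = heapq.heappop(heap)
                out.append(result)
    return out

frames = [(i, f"frame{i}") for i in range(5)]
in_q, heap, lock = queue.Queue(), [], threading.Lock()
workers = [threading.Thread(target=worker, args=(in_q, heap, lock))
           for _ in range(3)]
for w in workers:
    w.start()
for item in frames:
    in_q.put(item)
for _ in workers:
    in_q.put(None)
ordered = drain_in_order(heap, lock, len(frames))
for w in workers:
    w.join()
```

Because the heap always surfaces the smallest frame id, out-of-order completions from the workers are buffered until their predecessors have been emitted.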
The diagram below presents an approximate scheme of the system.
This approach significantly improves performance and enables live processing.
Logo detection analytics
Automated logo detection analytics helps advertisers evaluate the results of sponsorships by providing a series of statistics, charts, rankings and visualizations that can be assembled into a concise report. The statistics can be calculated globally and per brand. Features include brand exposure over time, heatmaps of a logo’s position on the screen, and bar charts that make it easy to compare statistics across brands. Last but not least, we have a module for creating highlights – visualizations of the bounding boxes detected by the model. This module serves a double purpose: in addition to making the analysis easy to track, such visualizations are also a source of valuable information for data scientists tweaking the model.
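As a sketch of one such statistic, per-brand exposure time can be derived directly from the per-frame detections (the input format and frame rate here are illustrative assumptions, not the tool’s actual API):

```python
def exposure_seconds(detections, fps=25.0):
    """Per-brand exposure time: count the frames in which each brand's
    logo was detected and convert to seconds at the given frame rate.

    `detections` maps frame_id -> set of brand labels seen in that frame.
    """
    counts = {}
    for labels in detections.values():
        for label in labels:
            counts[label] = counts.get(label, 0) + 1
    return {label: n / fps for label, n in counts.items()}

# Toy 3-frame clip at 1 frame per second.
sample = {0: {"brand_a"}, 1: {"brand_a", "brand_b"}, 2: {"brand_b"}}
stats = exposure_seconds(sample, fps=1.0)
```

The same per-frame detection records also feed the heatmaps (by accumulating bounding-box positions) and the cross-brand bar charts.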