Consumers love brands that bring them closer to sporting events. This has compelled the largest brands to jump headlong into sports sponsorship. While the benefits of sponsorship are undeniable, measuring ROI precisely remains a challenge.
With machine learning-powered tools, brands can evaluate a campaign’s ROI by analyzing video coverage of sponsored events. Computer vision makes it possible to determine precisely when and where a brand appeared during a sporting event. Image analysis also lets companies track competitors’ branding activities and compare brand visibility. With precise information on brand positioning, companies can also calculate advertising equivalent value and identify the most impactful events. Performed manually, such analyses would be extremely time-consuming and far less accurate. Automated analysis based on advanced machine learning algorithms quickly delivers valuable insights and boosts the effectiveness of marketing campaigns. To address these needs, deepsense.ai developed an automated tool for logo detection and visibility analysis that provides both raw detections and a rich set of statistics.
Solution overview
deepsense.ai’s solution is based on a combination of two approaches – supervised object detection and one-shot learning.

Supervised object detection approach
In this approach, one of many well-tested architectures can be trained on a labelled data set. These include fully convolutional networks such as YOLO and SSD, or the R-CNN family (region-based convolutional neural networks). Since video streaming is essential to logo detection during sports broadcasts, a fully convolutional model is the best choice. Such a model does not have to process each proposal region separately and has a constant inference time, independent of the number of objects detected, which enables it to run in real time.

The advantages of this approach include:
- Simplicity of operation and well-developed use cases (many open, tested and ready-to-use implementations).

Its disadvantages include:
- It’s impossible to quickly add a new version of a logo without obtaining a large amount of training data;
- The system is very sensitive to changes in the appearance of the logo, and updating a model trained as described above would require a large amount of new data.
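The constant-inference-time argument can be illustrated with a toy cost model (the function names and numbers below are ours, for illustration only, not measured timings):

```python
def two_stage_inference_cost(num_proposals, backbone_cost=1.0, head_cost=0.05):
    """R-CNN-style cost model: the backbone runs once, but the
    classification head runs once per region proposal."""
    return backbone_cost + head_cost * num_proposals

def fully_convolutional_cost(num_proposals, forward_cost=1.2):
    """YOLO/SSD-style cost model: a single forward pass predicts a fixed
    grid of boxes, so cost does not depend on the number of objects."""
    return forward_cost

# Cost grows with crowded frames for the two-stage model only.
for n in (1, 10, 100):
    print(n, two_stage_inference_cost(n), fully_convolutional_cost(n))
```

For a live broadcast with an unpredictable number of on-screen logos, the flat cost of the single-pass model is what makes real-time operation feasible.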
One-shot learning approach
This approach effectively solves the problem of dynamic logos and allows us to add new logos to the database without collecting large amounts of data. All we need is a reference set of template vectors for each supported logo and a model that detects logo region proposals without classifying them. The model is trained with a triplet loss. During training, each mini-batch consists of three-element tuples:
- an example of company A’s logo (the anchor),
- an image with regions containing the same company’s logo (the positive),
- an image with regions containing a different brand’s logo (the negative).
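A minimal sketch of the two ingredients described above, a triplet loss on embedding vectors and nearest-neighbour matching against the reference template vectors, might look as follows (the function names and the toy 2-D embeddings are our own assumptions, not the production model):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull the anchor embedding towards the positive example and push it
    at least `margin` further away from the negative one."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def match_logo(embedding, templates):
    """Classify a detected region by its nearest reference template vector."""
    return min(templates, key=lambda name: np.linalg.norm(embedding - templates[name]))

# Reference template vectors for two supported logos (toy 2-D embeddings).
templates = {
    "brand_a": np.array([1.0, 0.0]),
    "brand_b": np.array([0.0, 1.0]),
}
region_embedding = np.array([0.9, 0.1])
print(match_logo(region_embedding, templates))  # -> brand_a
```

Adding a new logo then amounts to adding one more template vector to the dictionary, with no retraining of the detector.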
Parallelization of the stream
To speed up the system, we parallelized the processing of the stream instead of handling it frame by frame. Because the streaming data flows in gradually over time, we opted not to use “batch” inference with a single model instance. In this context, it is also important to synchronize the processes so that the processed stream elements are returned in chronological order.
- We initialize n workers (a parameter, a natural number). Each worker is simply a pair of YOLOv3 detection networks (one trained with the supervised method and the other with one-shot learning).
- We create a FIFO queue into which the incoming frames from the stream are pushed and from which the workers collect frames for processing.
- To preserve the chronological order, workers push the processed frames onto a heap.
- A separate, looped process checks whether the heap is empty. If it is not, the element with the smallest id is taken from it; if that id is 1 greater than the id of the last processed frame, we update this variable and return the processed frame; otherwise the frame and its id are pushed back onto the heap.
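The steps above can be sketched as a simplified simulation (our own sketch: `detect_logos` stands in for the pair of detection networks, and peeking at the heap top replaces the push-back step, since popping and re-pushing an element is equivalent to inspecting the minimum in place):

```python
import heapq
import queue
import random
import threading
import time

def detect_logos(frame):
    """Stand-in for the pair of YOLOv3 detectors; the random sleep
    simulates variable per-frame inference time."""
    time.sleep(random.uniform(0.001, 0.004))
    return ("detections", frame)

def process_stream(frames, n_workers=4):
    in_q = queue.Queue()   # FIFO queue fed by the incoming stream
    heap = []              # workers push (frame_id, result) tuples here
    lock = threading.Lock()
    ordered = []

    def worker():
        while True:
            item = in_q.get()
            if item is None:     # sentinel: stream finished
                return
            frame_id, frame = item
            result = detect_logos(frame)
            with lock:
                heapq.heappush(heap, (frame_id, result))

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()

    total = 0
    for frame_id, frame in enumerate(frames):
        in_q.put((frame_id, frame))
        total += 1
    for _ in workers:
        in_q.put(None)

    # The looped "emitter" step: release the smallest id from the heap
    # only when it directly follows the last emitted frame.
    last_emitted = -1
    while len(ordered) < total:
        with lock:
            if heap and heap[0][0] == last_emitted + 1:
                frame_id, result = heapq.heappop(heap)
                ordered.append(result)
                last_emitted = frame_id
                continue
        time.sleep(0.0005)

    for w in workers:
        w.join()
    return ordered
```

Even though the workers finish frames out of order, the heap-based emitter guarantees that downstream consumers always see the stream in chronological order.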