How to create a product recognition solution

August 22, 2017 / in Data science, Deep learning, Machine learning, Neptune / by Krzysztof Dziedzic and Patryk Miziuła

Product recognition is a challenging area that offers great financial promise. Automatically detected product attributes in photos should be easy to monetize, e.g., as a basis for cross-selling and upselling.

However, product recognition is a tough task because the same product can be photographed from different angles, in different lighting, with varying levels of occlusion, etc. Also, different fine-grained product labels, such as ones in royal blue or turquoise, may prove difficult to distinguish visually. Fortunately, properly tuned convolutional neural networks can effectively resolve these problems.
In this post, we discuss our solution for the iMaterialist challenge announced by CVPR and Google and hosted on Kaggle in order to show our approach to product recognition.

The problem

Data and goal

The iMaterialist organizer provided us with hyperlinks to more than 50,000 pictures of shoes, dresses, pants and outerwear. Some tasks were attached to every picture and some labels were matched to every task. Here are some examples:

[Image: example picture of a dress]
task | labels
dress: occasion | wedding party, cocktail party, cocktail, party, formal, prom
dress: length | knee
dress: color | dark red, red

[Image: example picture of outerwear]
task | labels
outerwear: age | adult
outerwear: type | blazers
outerwear: gender | men
pants: color | brown

[Image: example picture of pants]
task | labels
pants: material | jeans, denim, denim jeans
pants: color | blue, blue jeans, denim blue, light blue, light, denim
pants: type | jeans
pants: age | adult
pants: decoration | men jeans
pants: gender | men

[Image: example picture of shoes]
task | labels
shoe: color | dark brown
shoe: up height | kneehigh
pants: color | black

Our goal was to match a proper label to every task for every picture from the test set. From the machine learning perspective this was a multi-label classification problem.

There were 45 tasks in total (roughly a dozen per clothing type), and we had to predict a label for each of them for every picture. However, tasks not attached to a particular test image were skipped during evaluation, and typically only a few tasks were relevant to any given picture.

Problems with data

There were two main problems with data:

  • We weren’t given the pictures themselves, only the hyperlinks. Around 10% of them had expired, so our dataset was significantly smaller than the organizer had intended. Moreover, the hyperlinks were a potential source of a data leak. One could use text-classification techniques to take advantage of leaked features hidden in the hyperlinks, though we opted not to do that.
  • Some labels with the same meaning were treated by the organizer as different, for example “gray” and “grey”, “camo” and “camouflage”. This introduced noise in the training data and distorted the training itself. Also, we had no choice but to guess if a particular picture from the test set was labeled by the organizer as either “camo” or “camouflage”.

Evaluation

The evaluation score function was the average error over all test pictures and relevant tasks. A score value of 0 meant that all the relevant tasks for all the test pictures were properly labeled, while a score of 1 implied that no relevant task for any picture was labeled correctly. A random sample submission provided by the organizer yielded a score greater than 0.99. Hence we knew that a good result couldn’t be achieved by accident and we would need a model that could actually learn how to solve the problem.
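
The score described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the data layout (nested dictionaries), the function name `average_error` and the rule of one ground-truth label per relevant task are hypothetical and are not taken from the competition's evaluation code.

```python
def average_error(predictions, ground_truth):
    """Average error over all (picture, relevant task) pairs.

    predictions:  {picture_id: {task: predicted_label}}, one entry per task
    ground_truth: {picture_id: {task: true_label}}, relevant tasks only
    (Both layouts are assumptions made for this sketch.)
    """
    total = 0
    wrong = 0
    for picture_id, relevant_tasks in ground_truth.items():
        for task, true_label in relevant_tasks.items():
            total += 1
            # Tasks absent from ground_truth are never visited, so
            # predictions for irrelevant tasks do not affect the score.
            if predictions.get(picture_id, {}).get(task) != true_label:
                wrong += 1
    return wrong / total if total else 0.0


ground_truth = {
    "img1": {"dress: length": "knee", "dress: color": "red"},
    "img2": {"pants: type": "jeans"},
}
predictions = {
    "img1": {"dress: length": "knee", "dress: color": "blue", "pants: type": "jeans"},
    "img2": {"dress: length": "knee", "dress: color": "red", "pants: type": "jeans"},
}
score = average_error(predictions, ground_truth)
print(score)  # one of three relevant tasks is wrong
```

In this toy example only three (picture, task) pairs are relevant, so the extra predictions for irrelevant tasks are ignored.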

Our solution

A bunch of convolutional neural networks

Our solution consisted of about 20 convolutional neural networks. We used the following architectures in several variants:

  • DenseNet,
  • ResNet,
  • Inception,
  • VGG.

All of them were initialized with weights pretrained on the ImageNet dataset. Our models also differed in terms of the data preprocessing (cropping, normalizing, resizing, switching of color channels) and augmentation applied (random flips, rotations, color perturbations from Krizhevsky’s AlexNet paper). All the neural networks were implemented using the PyTorch framework.

Choosing the training loss function

Which loss function to choose for the training stage was one of the major problems we faced. The training data contained 576 unique task/label pairs, so the outputs of our networks were 576-dimensional. On the other hand, typically only a few labels were matched to a picture’s tasks. The ground truth vector was therefore very sparse (only a few of its 576 coordinates were nonzero), which made the choice of training loss function difficult.
Assume that \((z_1,\ldots,z_{576})\in\mathbb{R}^{576}\) is a model output and
\[y_i=\left\{\begin{array}{ll}1, & \text{if task/label pair }i\text{ matches the picture,}\\ 0, & \text{elsewhere,}\end{array}\right.\quad\text{for } i=1,2,\ldots,576.\]
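
As a rough illustration of how such a sparse ground truth vector can be built, the sketch below assigns an index to every distinct task/label pair and encodes one picture as a 0/1 vector. The helper names `build_pair_index` and `encode_target` and the toy annotations are hypothetical; the real data had 576 pairs rather than five.

```python
def build_pair_index(training_annotations):
    """Assign a fixed output index to every distinct (task, label) pair."""
    distinct = sorted(
        {pair for picture_pairs in training_annotations for pair in picture_pairs}
    )
    return {pair: i for i, pair in enumerate(distinct)}


def encode_target(picture_pairs, pair_index):
    """Return the 0/1 vector y for one picture."""
    y = [0.0] * len(pair_index)
    for pair in picture_pairs:
        y[pair_index[pair]] = 1.0
    return y


# Toy annotations: one list of (task, label) pairs per picture.
training_annotations = [
    [("dress: length", "knee"), ("dress: color", "red")],
    [("pants: type", "jeans"), ("pants: color", "blue")],
    [("dress: color", "red"), ("outerwear: gender", "men")],
]
pair_index = build_pair_index(training_annotations)
y = encode_target(training_annotations[0], pair_index)
print(len(pair_index), y)
```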

  • As this was a multi-label classification problem, choosing the popular crossentropy loss function:
    \[\sum_{i=1}^{576}-y_i\log p_i,\quad \text{where } p_i=\frac{\exp(z_i)}{\sum_{j=1}^{576}\exp(z_j)},\]
    wouldn’t be a good idea. This loss function tries to distinguish only one class from others.
  • Also, for the ‘element-wise binary crossentropy’ loss function:
    \[\sum_{i=1}^{576}-y_i\log q_i-(1-y_i)\log(1-q_i),\quad \text{where } q_i=\frac{1}{1+\exp(-z_i)},\]
    the sparsity caused the models to end up constantly predicting no labels for any picture.
  • In our solution, we used the ‘weighted element-wise crossentropy’ given by:
    \[\sum_{i=1}^{576}-\bigg(\frac{576}{\sum_{j=1}^{576}y_j}\bigg)\cdot y_i\log q_i-(1-y_i)\log(1-q_i),\quad \text{where } q_i=\frac{1}{1+\exp(-z_i)}.\]
    This loss function focused the optimization on positive cases.
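
To make the difference between the last two loss functions concrete, here is a minimal plain-Python sketch of both, written so the arithmetic is visible. It is not our PyTorch training code: the function name `elementwise_crossentropy`, the `eps` guard and the fallback for pictures without any positive pair are assumptions added for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def elementwise_crossentropy(z, y, weighted=True, eps=1e-12):
    """Element-wise binary crossentropy over one output vector.

    z: raw model outputs; y: 0/1 ground truth vector of equal length.
    With weighted=True, positive terms are scaled by len(y) / sum(y).
    """
    n_positive = sum(y)
    # Assumption: fall back to weight 1 when a picture has no positive pair.
    pos_weight = len(y) / n_positive if weighted and n_positive > 0 else 1.0
    loss = 0.0
    for z_i, y_i in zip(z, y):
        q_i = sigmoid(z_i)
        # eps guards against log(0); it is not part of the formulas above.
        loss -= pos_weight * y_i * math.log(q_i + eps)
        loss -= (1 - y_i) * math.log(1 - q_i + eps)
    return loss


# Toy case: 4 outputs instead of 576, a single positive pair.
z = [0.0, 0.0, 0.0, 0.0]
y = [1, 0, 0, 0]
plain = elementwise_crossentropy(z, y, weighted=False)
weighted = elementwise_crossentropy(z, y, weighted=True)
print(plain, weighted)
```

In this toy case the single positive term is multiplied by 4 (the vector length divided by the number of positives), so it outweighs the three negative terms combined. With 576 outputs and only a few positives, the factor is far larger.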

Ensembling

Predictions from the individual networks were averaged with equal weights. Unfortunately, we didn’t have enough time to try more sophisticated ensembling techniques, such as xgboost ensembling.
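
Equal-weight averaging can be sketched as follows. The step that turns the averaged scores into one label per task (taking the highest-scoring label within each task) is an assumption made for this example, as are the function names and the toy labels.

```python
def average_predictions(model_outputs):
    """Element-wise mean of per-model score vectors, all with equal weight."""
    n_models = len(model_outputs)
    return [sum(scores) / n_models for scores in zip(*model_outputs)]


def pick_labels(avg_scores, pairs):
    """For each task, keep the label whose averaged score is highest."""
    best = {}
    for score, (task, label) in zip(avg_scores, pairs):
        if task not in best or score > best[task][0]:
            best[task] = (score, label)
    return {task: label for task, (_, label) in best.items()}


# Output coordinates correspond to (task, label) pairs; labels are illustrative.
pairs = [
    ("dress: color", "red"),
    ("dress: color", "blue"),
    ("dress: length", "knee"),
    ("dress: length", "maxi"),
]
# One score vector per model for a single picture.
model_outputs = [
    [0.9, 0.1, 0.4, 0.6],
    [0.2, 0.8, 0.7, 0.3],
    [0.7, 0.3, 0.6, 0.4],
]
avg = average_predictions(model_outputs)
labels = pick_labels(avg, pairs)
print(labels)
```

Note that the first model alone would have chosen a different label for "dress: length"; the average follows the other two models.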

Other techniques tested

We also tested other approaches, though they proved less successful:

  • Training the triplet network and then training xgboost models on features extracted via embedding (different models for different tasks).
  • Mapping semantically equivalent labels like “gray” and “grey” to a common new label and remapping those to the original ones during postprocessing.
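
The label-merging idea from the second point can be sketched like this. The synonym table is a hand-written illustration, and restoring each merged label to its most frequent original spelling is just one possible remapping rule, assumed here for the example.

```python
from collections import Counter

# Illustrative synonym table, written by hand for this sketch.
SYNONYMS = {"grey": "gray", "camouflage": "camo"}


def canonical(label):
    """Map a label to its merged (canonical) form."""
    return SYNONYMS.get(label, label)


def build_restore_map(training_labels):
    """For each canonical label, remember its most frequent original spelling."""
    counts = {}
    for label in training_labels:
        counts.setdefault(canonical(label), Counter())[label] += 1
    return {canon: c.most_common(1)[0][0] for canon, c in counts.items()}


training_labels = ["gray", "grey", "grey", "camo", "camouflage", "camo", "red"]
merged = [canonical(label) for label in training_labels]  # used for training
restore = build_restore_map(training_labels)              # used in postprocessing
print(merged)
print(restore)
```

A model trained on the merged labels would predict "gray", and postprocessing would then submit the spelling that was more common in the training data.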

Neptune

We managed all of our experiments using Neptune, deepsense.ai’s Machine Learning Lab. Thanks to that, we were easily able to track the tuning of our models, compare them and recreate them.
[Image: Neptune dashboard]

Results

We achieved a score of 0.395, which means that we correctly predicted more than 60% of all the labels matched to relevant tasks.
[Image: Kaggle leaderboard]
We are pleased with this result, though we could have improved on it significantly if the competition had lasted longer than one month.

Summary

Challenges like iMaterialist are a good opportunity to create product recognition models. The most important tools and tricks we used in this project were:

  • Playing with training loss functions. Choosing the proper training loss function was a real breakthrough as it boosted accuracy by over 20%.
  • A custom training-validation split. The organizer provided us with a ready-made training-validation split. However, we believed we could use more data for training so we prepared our own split with more training data while maintaining sufficient validation data.
  • Using the PyTorch framework instead of the more popular TensorFlow. TensorFlow doesn’t provide an official repository of pretrained models, whereas PyTorch does. Hence working in PyTorch was more time-efficient. Moreover, we determined empirically that, much to our surprise, the same architectures yielded better results when implemented in PyTorch than in TensorFlow.

We hope you have enjoyed this post and if you have any questions, please don’t hesitate to ask!
