At deepsense.ai, we’re doing our best to make our mark in state‑of‑the‑art data science. For many years, we have been competing in machine learning challenges, gaining both conceptual and technical expertise. Now, we have decided to open source an end‑to‑end image classification sample solution for the ongoing Cdiscount Kaggle competition. In so doing, we believe we’ll encourage data scientists both seasoned and new to compete on Kaggle and test their neural nets.
Introduction
Competing in machine learning challenges is fun, but also a lot of work. Participants must design and implement end-to-end solutions, test neural architectures and run dozens of experiments to train deep models properly. But this is only a small part of the story. Strong Kaggle competition solutions also include advanced data pre- and post-processing, ensembling and validation routines, to name just a few. At this point, competing effectively becomes complex and difficult to manage, which may discourage some data scientists from rolling up their sleeves and jumping in. Here at deepsense.ai we believe that Kaggle is a great platform for advanced data science training at any level of expertise. So great, in fact, that we felt compelled to open-source an image classification sample solution to the currently open Cdiscount challenge. Below, we describe what we have prepared.
Image classification sample solution overview
When we say our solution is end-to-end, we mean that we start with raw input data downloaded directly from the Kaggle site (in the BSON format) and finish with a ready-to-upload submission file. Here are the components:
data loader
Keras custom iterator for bson file
label encoder representing product IDs to fit the Keras API
neural network training on n classes and k examples per class. We use the following architectures:
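To illustrate the label encoder component from the list above: Keras classifiers expect class targets as contiguous indices (or one-hot rows), while product category IDs are large, arbitrary integers. The sketch below shows one way such a mapping could work. It is not the code from our repository; the class name `ProductLabelEncoder` and the sample IDs are hypothetical and exist only for this example.

```python
class ProductLabelEncoder:
    """Map arbitrary category IDs to contiguous indices 0..n-1 and back."""

    def fit(self, category_ids):
        # Sort the unique IDs so the mapping is deterministic across runs.
        self.classes_ = sorted(set(category_ids))
        self.index_ = {cid: i for i, cid in enumerate(self.classes_)}
        return self

    def transform(self, category_ids):
        # Raises KeyError for an ID that was not seen during fit().
        return [self.index_[cid] for cid in category_ids]

    def to_one_hot(self, category_ids):
        # One row per example, one column per class.
        n_classes = len(self.classes_)
        rows = []
        for idx in self.transform(category_ids):
            row = [0.0] * n_classes
            row[idx] = 1.0
            rows.append(row)
        return rows

    def inverse_transform(self, indices):
        # Turn predicted class indices back into the original category IDs.
        return [self.classes_[i] for i in indices]


# Hypothetical category IDs, used only for illustration.
ids = [1000010653, 1000004079, 1000010653, 1000018290]
encoder = ProductLabelEncoder().fit(ids)
encoded = encoder.transform(ids)
decoded = encoder.inverse_transform(encoded)
```

The inverse mapping matters at submission time: the network predicts class indices, and these have to be translated back into the original category IDs before the submission file is written. In practice the lists would typically be converted to NumPy arrays before being handed to Keras.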
What if I want to use my network architecture?
You are encouraged to replace our network with your own. Below you can find a short snippet of code that you simply place in the models.py file:
class MyModel(BasicKerasClassifier):
    def _build_model(self, params):
        # Build your network here; Model stands for the Keras model you return.
        return Model
Otherwise, I would suggest extending BasicKerasClassifier or KerasDataLoader with custom augmentations, learning rate schedules and other tricks of your choice.
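As an example of one such trick, here is a minimal sketch of a step-decay learning rate schedule. The factory name `make_step_decay` and its default values are assumptions made for this illustration, not part of our repository. The returned function accepts an optional second argument so that it works whether Keras calls it with the epoch alone or with the epoch and the current learning rate.

```python
def make_step_decay(initial_lr=0.01, drop=0.5, epochs_per_drop=10):
    """Return a schedule that multiplies the rate by `drop` every `epochs_per_drop` epochs."""

    def schedule(epoch, lr=None):
        # `lr` (the current rate) is ignored: the new rate depends only on the epoch index.
        return initial_lr * (drop ** (epoch // epochs_per_drop))

    return schedule


schedule = make_step_decay(initial_lr=0.01, drop=0.5, epochs_per_drop=10)

# With Keras installed, the schedule can be wrapped in the standard callback:
#   from keras.callbacks import LearningRateScheduler
#   lr_callback = LearningRateScheduler(schedule)
```

How the callback is passed to the training loop depends on how your subclass of BasicKerasClassifier calls fit, so that part is left out here.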
modify this image classification sample solution to fit your needs
have fun competing on Kaggle!
Image classification sample solution running in Neptune. Live charts present log-loss and accuracy for the running experiment.
Final remarks
Feel free to use, modify and run this code for your own purposes. We run many of our own experiments on Neptune, which you may find useful for managing yours.