Starting deep learning hands-on: image classification on CIFAR-10

November 20, 2017 / in Deep learning, Neptune / by Piotr Migdal

So, you want to start practicing deep learning? I wrote Learning Deep Learning with Keras as a general overview of using neural networks for image classification, and it got quite popular. Yet I think it is missing one crucial element – practical, hands-on exercises. This post tries to bridge that gap.

Practical deep learning

Deep learning has one dirty secret – regardless of how much you know, there is always a lot of trial and error. You need to test various network architectures, data preprocessing approaches, parameters, optimizers and so on. Even the top deep learning experts cannot just write a neural network, run it and call it a day.
Each time you see a state-of-the-art neural network and ask yourself “why are there 6 convolutional layers?” or “why did they set the dropout rate to 0.3?”, the answer is that they tried various settings and kept the ones that worked best empirically. Still, knowledge of other solutions does give us a good starting point, and theoretical knowledge builds an intuition for which ideas are worth trying and which are unlikely to improve a neural network.
A fairly general approach to solving any deep learning problem is:

  • use some state-of-the-art architecture for a given class of problems,
  • modify it to optimize performance for your particular problem.

Modification involves both changing the architecture (e.g. the number of layers, adding or removing auxiliary layers like dropout or batch normalization) and tuning its parameters. The only performance measure that matters is the validation score, i.e. whether a network trained on one dataset can make good predictions on new data it has never encountered. Everything else boils down to experimentation and tweaking.

Kaggle Leaderboard for the Right Whale Recognition competition

I like bringing up the example of Right Whale Recognition – a Kaggle competition which our deepsense.ai team won by a large margin. All top teams used convolutional neural networks, and I was surprised to see that the other winners used very similar architectures (clearly, that was a starting point without which it would have been hard to accomplish much). What gave our network its edge were the many, many small optimizations we made.


A good dataset – CIFAR-10 for image classification

Many introductions to image classification with deep learning start with MNIST, a standard dataset of handwritten digits. This is unfortunate. Not only does it fail to produce a “Wow!” effect or show where deep learning shines, but it can also be solved with shallow machine learning techniques. In this case, plain k-Nearest Neighbors produces more than 97% accuracy (or even 99.5% with some data preprocessing!). Moreover, MNIST is not a typical image dataset – and mastering it is unlikely to teach you transferable skills that would be useful for other classification problems.
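
If you are curious, the kNN claim is easy to check with scikit-learn. The sketch below is my addition, not part of this tutorial’s repo; plain kNN prediction on 70k images is slow, but it works:

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Download MNIST (~55 MB) as flat 784-dimensional pixel vectors.
X, y = fetch_openml('mnist_784', return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10000, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # around 0.97 with no preprocessing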

“Many good ideas will not work well on MNIST (e.g. batch norm). Inversely[,] many bad ideas may work on MNIST and no[t] transfer to real [computer vision]” – a tweet by François Chollet (creator of Keras).

If you really need to stick with a 28×28 grayscale image dataset, there is notMNIST (letters A-J from strange fonts) and a MNIST-like dataset of fashion products (Fashion-MNIST). They are slightly better, and harder. However, I think there is no excuse to avoid using actual photos.
We will work on CIFAR-10, a classic dataset of small color images. It consists of 60k 32×32 pixel images, each belonging to one of ten classes. 50k of them form the training set (i.e. the one we use to train our neural network) and 10k the validation set. Have a look at these sample pictures:

CIFAR-10 classes with example images
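
To poke at the data yourself, you can load CIFAR-10 straight from Keras and check the shapes (a quick sketch; the first call downloads the dataset, around 170 MB):

from keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape)  # (50000, 32, 32, 3) - training images
print(x_test.shape)   # (10000, 32, 32, 3) - validation images
print(y_train[:5])    # integer class labels from 0 to 9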

Getting our hands dirty

I really encourage you to do the exercises. Sure, it is much faster just to read. But with data science (and programming in general), how much you write matters more than how much you read. After all, if you want to learn to swim, you won’t master it unless you actually dip your toes in the water.
Before we get started:

  • Create a Neptune account (we give you $5 for computing, so no worries – you are unlikely to use up even that, and this tutorial won’t cost you a cent).
  • Clone or copy the repository https://github.com/deepsense-ai/hands-on-deep-learning/ – all scripts we use need to be run from its cifar_image_classification directory.
  • On Neptune, click on projects and create a new one – CIFAR-10 (with code: CIF).

The code is in Keras, a high-level Python neural network library. We will use Python 3 and the TensorFlow backend. The only Neptune-specific part of the code is logging; if you want to run it on other infrastructure, just change a few lines.
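
For instance, a plain Keras callback can stand in for the Neptune channels. The sketch below is my illustration of the idea, not the actual logging code from the repo:

from keras.callbacks import Callback

class SimpleLogger(Callback):
    # Prints per-epoch metrics; swap this for Neptune channels
    # or any other logging backend.
    def on_epoch_end(self, epoch, logs=None):
        print('epoch {}: {}'.format(epoch, logs or {}))

# later: model.fit(..., callbacks=[SimpleLogger()])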

Architectures and blocks (in Keras)

One thing that differentiates deep learning from classical machine learning is its compositional architecture. Instead of using a one-step classifier (be it Logistic Regression, Random Forest or XGBoost) we create a network out of blocks (called layers).

Deep Learning metaphors - ConvNet layers as Jenga blocks

Deep Learning metaphors: ConvNet layers as Jenga blocks

Logistic regression

Let’s start with something simple – a multi-class logistic regression. It is a “shallow” machine learning technique, yet it can be expressed in the language of neural networks: its architecture consists of only one meaningful layer. In Keras, we write the following:

from keras.models import Sequential
from keras.layers import Dense, Flatten, Activation

model = Sequential()
model.add(Flatten(input_shape=(32, 32, 3)))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

If we want to see, step by step, what happens to our data – the dimensions and the number of weights to be optimized – we can use my keras-sequential-ascii script:

           OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)
               Input   #####     32   32    3
             Flatten   ||||| -------------------         0      0.0%
                       #####        3072
               Dense   XXXXX -------------------     30730    100.0%
             softmax   #####          10

The Flatten layer just transforms (x, y, channels) into a flat vector of pixel values, the Dense layer connects all inputs to all outputs, and softmax then turns real numbers into probabilities.
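
For reference, training this model outside Neptune takes only a few extra lines. The preprocessing below (scaling pixels to [0, 1] and one-hot encoding the labels) is my sketch of what such a script needs, not necessarily line-for-line what lr.py does:

from keras.datasets import cifar10
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model.fit(x_train, y_train,
          epochs=10,
          batch_size=128,
          validation_data=(x_test, y_test))
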
To run the actual script on Neptune, just type in the terminal:

$ neptune send lr.py

This opens a browser tab in which you can keep track of the training process. You can even look up misclassified images. However, this linear model will mostly pick up on colors and their locations in the image.

Neptune channels dashboard showing misclassified images

The overall score is not impressive: I got 41% accuracy on the training set and, more importantly, 37% on validation. Note that 10% is the baseline for random guessing.

Multilayer perceptron

Old-school neural networks consist of a few dense layers. Between the layers we need an activation function. This function, applied to each component separately, makes the network non-linear, so it can capture much more complex patterns than logistic regression. The historical choice (motivated by an abstraction of biological neural networks) is the sigmoid.

model = Sequential()
model.add(Flatten(input_shape=(32, 32, 3)))
model.add(Dense(128, activation='sigmoid'))
model.add(Dense(128, activation='sigmoid'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

What does this mean for our data?

          OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)
              Input   #####     32   32    3
            Flatten   ||||| -------------------         0     0.0%
                      #####        3072
              Dense   XXXXX -------------------    393344    95.7%
            sigmoid   #####         128
              Dense   XXXXX -------------------     16512     4.0%
            sigmoid   #####         128
              Dense   XXXXX -------------------      1290     0.3%
            softmax   #####          10

We used two additional (so-called hidden) layers, each with sigmoid as its activation function. Let’s run it!

$ neptune send mlp.py

I suggest creating a custom chart combining both training and validation channels on one plot.

Accuracy and log-loss for training and validation sets, live

In principle, even a network with a single hidden layer can approximate any function (see the universal approximation theorem). That does not mean it works well in practice, with a finite amount of data. If the hidden layer is too small, it cannot approximate the target function well. If it gets too big, the network can easily overfit – i.e. memorize the training data without generalizing to other images. Any time your training score goes up at the cost of the validation score, your network is overfitting.
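
A simple way to watch for overfitting is to compare the per-epoch metrics that Keras collects in the History object (a sketch; in Keras 2 of that era the keys are 'acc' and 'val_acc', in newer versions 'accuracy' and 'val_accuracy'):

history = model.fit(x_train, y_train,
                    epochs=20,
                    batch_size=128,
                    validation_data=(x_test, y_test))

for epoch, (acc, val_acc) in enumerate(zip(history.history['acc'],
                                           history.history['val_acc'])):
    # A growing gap between training and validation accuracy
    # signals overfitting.
    print(epoch, acc, val_acc, acc - val_acc)
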
We can get to around 45% accuracy on the validation set, which is an improvement over logistic regression. Yet we can easily do much better. If you want to play with this kind of network, edit the file, run it from the command line (I suggest adding --tags my-experiment) and see if you can do better. Make a few attempts, and see how it goes.
Hints:

  • Use more than 20 epochs.
  • In practice, neural networks use 2-3 dense layers.
  • Make big changes to see a difference. In this case, change the hidden layer size by 2× or even 10×.

Just because you could in theory create any picture (or even any photograph) in MS Paint by drawing pixel by pixel, it does not mean this will work in practice. We need to take advantage of the spatial structure and use a convolutional neural network (often abbreviated as ConvNet or CNN).

Convolutional neural networks

Instead of trying to connect everything with everything, we can process images in a smarter way. Convolution is an operation which performs the same local operation on each part of the image. Some examples of what convolution can do include blurring, amplifying edges or detecting color gradients – see Image Kernels – Visually Explained.
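
To see convolution in the simplest possible setting, here is a sketch using SciPy; the 3×3 kernel below is a classic edge detector of my choosing:

import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(32, 32)           # a toy grayscale "image"
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

# The same local operation is applied around every pixel.
edges = convolve2d(image, edge_kernel, mode='same')
print(edges.shape)  # (32, 32)
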
Each convolution layer produces new channels based on those which preceded it. First, we start with 3 channels for red, green and blue (RGB) components. Next, channels get more and more abstract. To get some idea of what is going on, visit How neural networks build up their understanding of images to see patterns that activate subsequent layers – from simple colors and gradients to much more complex patterns.
As we create channels representing various properties of the image, we need to reduce the resolution (usually with max-pooling). Also, modern networks typically use ReLU as the activation function as it works much better for deeper models.

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu',
                 input_shape=(32, 32, 3)))
model.add(MaxPool2D())
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

The network architecture looks like this:

          OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)
              Input   #####     32   32    3
             Conv2D    |/  -------------------       896     2.1%
               relu   #####     30   30   32
       MaxPooling2D   Y max -------------------         0     0.0%
                      #####     15   15   32
             Conv2D    |/  -------------------     18496    43.6%
               relu   #####     13   13   64
       MaxPooling2D   Y max -------------------         0     0.0%
                      #####      6    6   64
            Flatten   ||||| -------------------         0     0.0%
                      #####        2304
              Dense   XXXXX -------------------     23050    54.3%
            softmax   #####          10

To run it, we type:

$ neptune send cnn_simple.py

Even with this simple neural network we get 70% accuracy on validation. That is much more than we got with logistic regression or a multilayer perceptron!
Now, feel free to experiment.
Hints:

  • Play with the number of channels and how they grow.
  • Usually 3×3 convolutions work best; stick to them (and to 1×1 convolutions, which only mix channels).
  • You can have 1-3 convolutional layers before each MaxPool operation.
  • Adding a Dense layer may help.
  • Between dense layers you can use Dropout to reduce overfitting (i.e. when you see that training accuracy is higher than validation accuracy) – see the sketch right after this list.
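
A minimal sketch of that last hint (the layer size and the 0.5 rate are illustrative, not tuned):

from keras.layers import Dropout

model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10))
model.add(Activation('softmax'))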

So, is that it? No! This is only the beginning.
To compare results, click on the project name. You will see the whole list of experiments. In Manage columns, tick all the accuracy (and possibly log-loss) scores. You can order your results by validation accuracy. You get your own personal Kaggle leaderboard!

Your personal, configurable Kaggle Leaderboard

In addition to the architecture (which is a big deal), the choice of optimizer significantly affects the final accuracy. Very often, we get better results by adding more epochs (i.e. more passes over the whole training dataset) while reducing the learning rate at the same time.
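
In Keras, lowering the learning rate means passing an optimizer object instead of a string; the value below is an example, not a tuned setting:

from keras.optimizers import Adam

model.compile(optimizer=Adam(lr=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          epochs=50,
          batch_size=128,
          validation_data=(x_test, y_test))
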
For example, try this network I wrote:

          OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)
              Input   #####     32   32    3
             Conv2D    |/  -------------------       896     0.1%
               relu   #####     32   32   32
             Conv2D    |/  -------------------      1056     0.2%
               relu   #####     32   32   32
       MaxPooling2D   Y max -------------------         0     0.0%
                      #####     16   16   32
 BatchNormalization    μ|σ  -------------------       128     0.0%
                      #####     16   16   32
            Dropout    | || -------------------         0     0.0%
                      #####     16   16   32
             Conv2D    |/  -------------------     18496     2.9%
               relu   #####     16   16   64
             Conv2D    |/  -------------------      4160     0.6%
               relu   #####     16   16   64
       MaxPooling2D   Y max -------------------         0     0.0%
                      #####      8    8   64
 BatchNormalization    μ|σ  -------------------       256     0.0%
                      #####      8    8   64
            Dropout    | || -------------------         0     0.0%
                      #####      8    8   64
             Conv2D    |/  -------------------     73856    11.5%
               relu   #####      8    8  128
             Conv2D    |/  -------------------     16512     2.6%
               relu   #####      8    8  128
       MaxPooling2D   Y max -------------------         0     0.0%
                      #####      4    4  128
 BatchNormalization    μ|σ  -------------------       512     0.1%
                      #####      4    4  128
            Dropout    | || -------------------         0     0.0%
                      #####      4    4  128
            Flatten   ||||| -------------------         0     0.0%
                      #####        2048
              Dense   XXXXX -------------------    524544    81.6%
               relu   #####         256
            Dropout    | || -------------------         0     0.0%
                      #####         256
              Dense   XXXXX -------------------      2570     0.4%
            softmax   #####          10

$ neptune send cnn_adv.py

It will take around half an hour, but the results will be much better. Patience pays off – validation accuracy should be around 83%!
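
If you would rather type the architecture in yourself, here is one possible Keras reconstruction of the diagram above. The layer sizes follow from the weight counts; the dropout rates are my guesses, so cnn_adv.py may differ in details:

from keras.models import Sequential
from keras.layers import (Conv2D, MaxPool2D, BatchNormalization,
                          Dropout, Flatten, Dense, Activation)

model = Sequential()

model.add(Conv2D(32, (3, 3), padding='same', activation='relu',
                 input_shape=(32, 32, 3)))
model.add(Conv2D(32, (1, 1), activation='relu'))
model.add(MaxPool2D())
model.add(BatchNormalization())
model.add(Dropout(0.25))  # rate is my guess

model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(64, (1, 1), activation='relu'))
model.add(MaxPool2D())
model.add(BatchNormalization())
model.add(Dropout(0.25))  # rate is my guess

model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(128, (1, 1), activation='relu'))
model.add(MaxPool2D())
model.add(BatchNormalization())
model.add(Dropout(0.25))  # rate is my guess

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))   # rate is my guess
model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
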
You can try other examples of networks for CIFAR-10: one from the Keras repository (though I had trouble reproducing its score) and one from this blog post. Both of them train in around 1.5h.

$ neptune send cnn_fchollet.py
$ neptune send cnn_pkaur.py

Can you do better? :)
Maybe you can beat 83%? Or create a network which achieves the same score but is much simpler? Or one that trains much faster? If you do better, I encourage you to post your validation score in the comments below, with a link to the network architecture (e.g. via a link to your GitHub repo or a gist).

