An internal validation leaderboard in Neptune

January 19, 2017 / in Data science, Deep learning, Machine learning, Neptune / by Patryk Miziuła

Internal validation is a useful tool for comparing the results of experiments performed by team members on any business or research task. It can also be a valuable complement to the public leaderboards attached to machine learning competitions on platforms like Kaggle.

In this post, we present how to build an internal validation leaderboard using Python scripts and the Neptune environment. As an example use case, we take the well-known classification dataset CIFAR-10, which we study using the deep convolutional neural network provided in the TensorFlow tutorial.

Why an internal leaderboard?

Whenever we solve the same problem in many ways, we want to know which way is the best. Therefore we validate, compare and order the solutions. In this way, we naturally create the ranking of our solutions – the leaderboard.
We usually care about the privacy of our work. We want to keep the techniques used and the results of our experiments confidential. Hence, our validation should remain undisclosed as well – it should be internal.
If we keep improving the models and produce new solutions at a fast pace, at some point we are no longer able to manage the internal validation leaderboard manually. Then we need a tool which will do that for us automatically and will present the results to us in a readable form.

Business and research projects

In any business or research project you are probably interested in the productivity of team members. You would like to know who submits a solution to the problem, when they submit it, what kind of model they use and how good the solution is.
A good internal leaderboard stores all that information. It also allows you to search for submissions sent by a specific user, submitted within a given time window or using a particular model. Finally, you can sort the submissions with respect to the accuracy metric to find the best one.

Machine learning competitions

The popular machine learning platform Kaggle offers a readable public leaderboard for every competition. Each contestant can follow their position in the ranking and try to improve it several times a day.
However, an internal validation would be very useful for every competing team. A good internal leaderboard has many advantages over a public one:

  • the results remain exclusive,
  • there is no limit on the number of daily submissions,
  • metrics other than those chosen by the competition organizers can be evaluated as well,
  • the submissions can be tagged, for example to indicate the model used.

Note that in every official competition the ground truth labels for the test data are not provided. Hence, to produce the internal validation we are forced to split the available public training data: one part is used to tune the model, the other to evaluate it internally. This division can be a source of unexpected problems (e.g., data leaks), so perform it carefully!
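For illustration, a reproducible random split can be as simple as the following NumPy sketch; the validation fraction and the random seed are arbitrary example values, not something prescribed by this project:

import numpy as np

def split_train_validation(images, labels, validation_fraction=0.2, seed=0):
    # Shuffle once with a fixed seed so the split is reproducible
    # and the validation images never leak into training.
    rng = np.random.RandomState(seed)
    indices = rng.permutation(len(images))
    n_val = int(len(images) * validation_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return (images[train_idx], labels[train_idx],
            images[val_idx], labels[val_idx])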

Why Neptune?

Neptune was designed to manage multiple experiments. Among many features, it supports storing parameters, logs and metric values from various experiment executions. The results are accessible through an aesthetic Web UI.
In Neptune you can:

  • gather experiments from various projects in groups,
  • add tags to experiments and filter by them,
  • sort experiments by users, date of creation, or – most importantly for us – by metric values.

This makes Neptune a handy tool for creating an internal validation leaderboard for your team.

[Screenshot: Tracking a TensorFlow experiment in Neptune]

Let’s do it!

Let’s build an example internal validation leaderboard in Neptune.

CIFAR-10 dataset

We use the well-known classification dataset CIFAR-10. Every image in this dataset belongs to one of 10 classes, labeled by numbers from 0 to 9. Using the training data, we build a model that predicts the labels of the images from the test data. CIFAR-10 is designed for educational purposes, so the ground truth labels for the test data are provided.

Evaluating functions

Let’s fix the notation:

  • \(N\) – the number of images we have to classify.
  • \(c_i\) – the class to which the \(i\)-th image belongs; \(i \in \{0,\ldots,N-1\}\), \(c_i \in \{0,\ldots,9\}\).
  • \(p_{ij}\) – the estimated probability that the \(i\)-th image belongs to class \(j\); \(i \in \{0,\ldots,N-1\}\), \(j \in \{0,\ldots,9\}\), \(p_{ij} \in [0,1]\).

We evaluate our submission with two metrics. The first metric is the classification accuracy, given by
\(\frac{1}{N}\sum_{i=0}^{N-1}\mathbb{1}\Big(\operatorname{argmax}_j\, p_{ij}=c_i\Big)\)
This is the fraction of labels that are predicted correctly. We would like to maximize it; the optimal value is 1. The second metric is the average cross entropy, given by
\(-\frac{1}{N}\sum_{i=0}^{N-1}\log p_{i c_i}\)
This formula is simpler than the general cross-entropy formula because the classes are mutually exclusive, so only the probability assigned to the true class contributes. We would like to minimize it, preferably to 0.
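For concreteness, the two formulas translate directly into NumPy as follows (a minimal sketch; submission holds the probabilities \(p_{ij}\) row by row and true_labels holds the integer classes \(c_i\), as defined above):

import numpy as np

def accuracy(submission, true_labels):
    # Fraction of images whose most probable class equals the true class.
    return np.mean(np.argmax(submission, axis=1) == true_labels.ravel())

def average_cross_entropy(submission, true_labels):
    # Mean negative log-probability assigned to the true class of each image.
    n = submission.shape[0]
    return -np.mean(np.log(submission[np.arange(n), true_labels.ravel()]))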

Implementation details

Prerequisites

To run the code we provide, you need the following software:

  • Neptune: apply for our Early Adopters Program or try it immediately with Neptune Go,
  • TensorFlow 1.0.

Repository

The code we use is based on the TensorFlow convolutional neural networks tutorial. You can download our code from our GitHub repository. It consists of the following files:

  • main.py – the script to execute.
  • cifar10_submission.py – computes the submission for a CIFAR-10 model.
  • evaluation.py – contains the functions required to create the leaderboard in Neptune.
  • config.yaml – the Neptune configuration file.

Description

When you run main.py, it first trains a neural network using the function cifar10_train provided by TensorFlow. We hard-coded the number of training steps; this could be made dynamic using a Neptune action, but for the sake of brevity we skip this topic in the blog post. Thanks to the TensorFlow integration, you can track the tuning of the network in Neptune. Moreover, the parameters of the tuned network are stored in a file managed by TensorFlow Saver objects.
Then the function cifar10_submission is called. It restores the parameters of the network from the file created by cifar10_train. Next, it forward-propagates the images from the test set through the network to obtain a submission. The submission is stored as a NumPy array submission of shape \(N \times 10\); the \(i\)-th row contains the estimated probabilities \(p_{i0},\ldots,p_{i9}\). The ground truth labels form a NumPy array true_labels of shape \(N \times 1\); the \(i\)-th row contains the label \(c_i\).
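Condensed, the idea behind cifar10_submission.py looks roughly like the sketch below. It assumes the tutorial's cifar10 module is on the PYTHONPATH (see the cloning step later in this post); the checkpoint directory is illustrative, and the batch size must match the tutorial's FLAGS.batch_size:

import numpy as np
import tensorflow as tf
import cifar10  # the module from the TensorFlow models repository

def compute_submission(checkpoint_dir, num_examples=10000, batch_size=128):
    with tf.Graph().as_default():
        # Build the test-set input pipeline and the inference graph
        # provided by the tutorial.
        images, labels = cifar10.inputs(eval_data=True)
        logits = cifar10.inference(images)
        probabilities = tf.nn.softmax(logits)

        saver = tf.train.Saver()
        with tf.Session() as sess:
            # Restore the parameters saved by cifar10_train.
            ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
            saver.restore(sess, ckpt.model_checkpoint_path)

            # The tutorial's input pipeline is queue-based.
            coord = tf.train.Coordinator()
            threads = tf.train.start_queue_runners(sess=sess, coord=coord)

            probs, trues = [], []
            num_batches = int(np.ceil(num_examples / float(batch_size)))
            for _ in range(num_batches):
                p, t = sess.run([probabilities, labels])
                probs.append(p)
                trues.append(t)

            coord.request_stop()
            coord.join(threads)

    submission = np.concatenate(probs)[:num_examples]                  # shape (N, 10)
    true_labels = np.concatenate(trues)[:num_examples].reshape(-1, 1)  # shape (N, 1)
    return submission, true_labels
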
Finally, for the given submission and true_labels arrays, the function evaluate_and_send_to_neptune from evaluation.py computes the metric values and sends them to Neptune.
The file config.yaml is a Neptune job configuration file, essential for running Neptune jobs. Please download all the files and place them in the same folder.

Step by step

We create a validation leaderboard in Neptune in 4 easy steps:

  1. Creating a Neptune group
  2. Creating an evaluation module
  3. Sending submissions to Neptune
  4. Customizing a view in Neptune’s Web UI

1. Creating a Neptune group

We create the Neptune group where all the submissions will be stored. We do this as follows:

  1. Enter the Neptune home screen.
  2. Click "+" in the lower left corner, enter the name “CIFAR-10 leaderboard”, click "+" again.
  3. Choose “project” “is”, type “CIFAR-10”, and click “Apply”.

Our new group appears in the left column. We can edit or delete it by clicking the wrench icon next to the group name.


2. Creating an evaluation module

We created the module evaluation.py consisting of 5 functions:

  1. _evaluate_accuracy and _evaluate_cross_entropy compute the respective metrics,
  2. _prepare_neptune adds tags to the Neptune job (if specified – see Step 4) and creates the Neptune channels used to send the evaluated metrics,
  3. _send_to_neptune sends metrics to channels,
  4. evaluate_and_send_to_neptune calls the above functions.

You can easily adapt this script to evaluate and send any other metrics.
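For illustration, the metric helpers follow the NumPy formulas shown earlier, and the Neptune-specific part of such a module could look like the sketch below. The channel and tag calls (ctx.job.tags, ctx.job.create_channel, channel.send) are our assumption based on the early deepsense Neptune client; check them against the client version you use:

from deepsense import neptune

def _prepare_neptune(ctx, tags):
    # Attach the keywords to the job and create one numeric channel
    # per metric (API assumed from the early Neptune client).
    for tag in tags:
        ctx.job.tags.append(tag)
    accuracy_channel = ctx.job.create_channel(
        name='accuracy', channel_type=neptune.ChannelType.NUMERIC)
    cross_entropy_channel = ctx.job.create_channel(
        name='cross entropy', channel_type=neptune.ChannelType.NUMERIC)
    return accuracy_channel, cross_entropy_channel

def _send_to_neptune(accuracy_channel, cross_entropy_channel,
                     accuracy, cross_entropy):
    # A single point per metric is enough for a leaderboard entry.
    accuracy_channel.send(x=0, y=accuracy)
    cross_entropy_channel.send(x=0, y=cross_entropy)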

3. Sending submissions to Neptune

To place our submissions in the Neptune group, we need to specify project: CIFAR-10 in the Neptune config file config.yaml. This file is three lines long; it also contains the project name and a description.
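Based on that description, config.yaml could look like this (assuming the three fields are name, project and description; the name and description values here are illustrative, while project: CIFAR-10 is what routes the submission to our group):

name: CIFAR-10 leaderboard
project: CIFAR-10
description: CNN from the TensorFlow tutorial, evaluated internally
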
Assume that the files from our repository are placed in a folder named leaderboard. The last preparation step is to clone the CIFAR-10 scripts from the TensorFlow models repository. To do so, we go to the parent folder of leaderboard and type:

git clone https://github.com/tensorflow/models/
export PYTHONPATH="$PWD/models/tutorials/image/cifar10:$PYTHONPATH"

Now we are ready to send our results to the leaderboard created in Neptune! We run the script main.py from the parent folder of leaderboard by typing

neptune run leaderboard/main.py --config leaderboard/config.yaml --dump-dir-url leaderboard/dump --paths-to-dump leaderboard

using the Neptune CLI. The script executes for about half an hour on a modern laptop; training would be significantly faster on a GPU.
There are only 5 lines related to Neptune in the main.py script. First we load the library:

from deepsense import neptune

Then we initialize a Neptune context:

ctx = neptune.Context()

Next, the command

ctx.integrate_with_tensorflow()

automatically creates and manages Neptune channels related to TensorFlow SummaryWriter objects. This lets us observe the progress of our network in the Neptune dashboard. Finally, in the lines

tags = ["tensorflow", "tutorial"]
evaluation.evaluate_and_send_to_neptune(submission, true_labels, ctx, tags)

we evaluate our submission and send the metric values to dedicated Neptune channels. tags is a list of keywords attached to the Neptune job; we can easily filter jobs by these tags in the Neptune Web UI.

4. Customizing a view in Neptune’s Web UI

If the job has executed successfully, we can see our submission in the Neptune group we created. One more thing worth doing is configuring the visible columns.

  1. Click “Show/hide columns” in the upper part of the Neptune Web UI.
  2. Check/uncheck the names. You should:
    • uncheck “Project” since all the submissions in this group come from the same project CIFAR-10,
    • check channel names “accuracy” and “cross entropy” because you want to sort with respect to them.

You can sort submissions by accuracy or cross entropy value by clicking the triangle over the respective column.

Summary

That’s all! Your internal validation leaderboard in Neptune is now all set up. You and your team members can compare your models tuned on the CIFAR-10 dataset. You can also filter your results by dates, users or custom tags.
Of course, CIFAR-10 is not the only possible application of the provided code. You can easily adapt it to other settings, such as contests, research or business intelligence. Feel free to use an internal validation leaderboard in Neptune wherever and whenever you need it.
