Starting deep learning hands-on: image classification on CIFAR-10

November 20, 2017/in Deep learning, Neptune /by Piotr Migdal

So, you want to start practicing deep learning? I wrote Learning Deep Learning with Keras as a general overview for using neural networks for image classification. It got quite popular. Yet, I think it is missing one crucial element – practical, hands-on exercises. This post tries to bridge that gap.

Practical deep learning

Deep learning has one dirty secret – regardless of how much you know, there is always a lot of trial and error. You need to test various network architectures, data preprocessing approaches, parameters, optimizers and so on. Even the top deep learning experts cannot just write a neural network, run it and call it a day.
Each time you see a state-of-the-art neural network and ask yourself “why are there 6 convolutional layers?” or “why do they set the dropout rate to 0.3?”, the answer is that the authors tried many variants and kept, on an empirical basis, the ones that worked best. Still, knowledge of other solutions does give us a good starting point. Theoretical knowledge builds an intuition of which ideas are worth trying and which are unlikely to improve a neural network.
A fairly general approach to solving any deep learning problem is:

  • use some state-of-the-art architecture for a given class of problems,
  • modify it to optimize performance for your particular problem.

Modification involves both changing the architecture (e.g. the number of layers, adding or removing auxiliary layers like dropout or batch normalization) and tuning the parameters. The only performance measure that matters is the validation score, i.e. whether a network trained on one dataset makes good predictions on new data it has never encountered. Everything else boils down to experimentation and tweaking.

Kaggle Leaderboard for the Right Whale Recognition competition

I like bringing up the example of Right Whale Recognition – a Kaggle competition which our deepsense.ai team won by a large margin. All top teams used convolutional neural networks. I was surprised to see that other winners used very similar architectures (clearly, it was a starting point without which it would be hard to accomplish a lot). The many small optimizations we made added up to a huge difference in the performance of our network.


A good dataset – CIFAR-10 for image classification

Many introductions to image classification with deep learning start with MNIST, a standard dataset of handwritten digits. This is unfortunate. Not only does it fail to produce a “Wow!” effect or show where deep learning shines, but it can also be solved with shallow machine learning techniques. In this case, plain k-Nearest Neighbors produces more than 97% accuracy (or even 99.5% with some data preprocessing!). Moreover, MNIST is not a typical image dataset – and mastering it is unlikely to teach you transferable skills that would be useful for other classification problems.
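If you want to check the k-NN claim yourself, here is a minimal sketch using scikit-learn (my addition, not part of this tutorial’s repository); fitting on a 10k subsample keeps prediction time reasonable, at the cost of a slightly lower score than on the full dataset:

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Download MNIST (70k images, 784 pixels each) and take a subsample for speed.
X, y = fetch_openml('mnist_784', return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=10000, test_size=2000, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # roughly 0.94-0.95 on this subsample; above 0.97 with the full training set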

“Many good ideas will not work well on MNIST (e.g. batch norm). Inversely[,] many bad ideas may work on MNIST and no[t] transfer to real [computer vision]” – a tweet by François Chollet (creator of Keras).

If you really need to stick with a 28×28 grayscale image dataset, there is notMNIST (A-J letters from strange fonts) and a MNIST-like dataset with fashion products. They are slightly better, and harder. However, I think there is no excuse for not using actual photos.
We will work on CIFAR-10, a classic dataset of small color images. It consists of 60k 32×32 pixel images, each belonging to one of ten classes. 50k images form the training set (i.e. the one we use to train our neural network) and 10k form the validation set. Have a look at these sample pictures:

CIFAR-10 classes with example images

Getting our hands dirty

I really encourage you to do the exercises. Sure, it is much faster to just read. But with data science (and programming in general) it matters more how much you write than read. After all, if you want to learn to swim you won’t master it unless you actually dip your toes in the water.
Before we get started:

  • Create a Neptune account (we give you $5 for computing, so no worries – this tutorial won’t cost you a cent; you’re not likely to use up more than $5 worth of your credit).
  • Clone or copy the repository https://github.com/deepsense-ai/hands-on-deep-learning/ – all scripts we use need to be run from its cifar_image_classification directory.
  • On Neptune, click on projects and create a new one – CIFAR-10 (with code: CIF).

The code is in Keras, a high-level Python neural network library. We will use Python 3 and TensorFlow backend. The only Neptune-specific part of this code is logging. If you want to run it on another infrastructure, just change a few lines.

Architectures and blocks (in Keras)

One thing that differentiates deep learning from classical machine learning is its compositional architecture. Instead of using a one-step classifier (be it Logistic Regression, Random Forest or XGBoost) we create a network out of blocks (called layers).

Deep Learning metaphors: ConvNet layers as Jenga blocks

Logistic regression

Let’s start with something simple – a multi-class logistic regression. It is a “shallow” machine learning technique, yet can be expressed in the language of neural networks. Its architecture consists of only one meaningful layer. In Keras, we write the following:

model = Sequential()
model.add(Flatten(input_shape=(32, 32, 3)))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

If we want to see step-by-step what happens with our data flow, with respect to dimensions and the number of weights to be optimized, we can use my keras-sequential-ascii script:

           OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)
               Input   #####     32   32    3
             Flatten   ||||| -------------------         0      0.0%
                       #####        3072
               Dense   XXXXX -------------------     30730    100.0%
             softmax   #####          10

The flatten layer just transforms (x, y, channels) into a flat vector of pixel values. The dense layer connects all inputs to all outputs. Softmax then changes real numbers into probabilities.
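To make the last step concrete, here is a tiny standalone illustration (my addition, not part of the tutorial scripts) of how softmax turns arbitrary real-valued scores into probabilities that sum to one:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtracting the max improves numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, -1.0])
print(softmax(scores))        # approximately [0.705, 0.259, 0.035]
print(softmax(scores).sum())  # 1.0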
To run it, just type in the terminal:

$ neptune send lr.py

This opens a browser tab in which you can keep track of the training process. You can even look up misclassified images. However, this linear model will look mostly for colors and their locations on the image.

Neptune channels dashboard showing misclassified images

The overall score is not impressive. I got 41% accuracy on the training set and, more importantly, 37% on validation. Note that 10% is a baseline for making random guesses.

Multilayer perceptron

Old-school neural networks consist of a few dense layers. Between the layers we need to use an activation function. Applied to each component separately, this function makes the network non-linear, so it can capture much more complex patterns than logistic regression. The historical approach (motivated by an abstraction of biological neural networks) is to use a sigmoid.

model = Sequential()
model.add(Flatten(input_shape=(32, 32, 3)))
model.add(Dense(128, activation='sigmoid'))
model.add(Dense(128, activation='sigmoid'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

What does this mean for our data?

          OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)
              Input   #####     32   32    3
            Flatten   ||||| -------------------         0     0.0%
                      #####        3072
              Dense   XXXXX -------------------    393344    95.7%
            sigmoid   #####         128
              Dense   XXXXX -------------------     16512     4.0%
            sigmoid   #####         128
              Dense   XXXXX -------------------      1290     0.3%
            softmax   #####          10

We used two additional (so-called hidden) layers, each with sigmoid as its activation function. Let’s run it!

$ neptune send mlp.py

I suggest creating a custom chart combining both training and validation channels on one plot.

Accuracy and log-loss for training and validation sets, live

In principle, even with a single hidden layer it is possible to approximate any function (see the universal approximation theorem). However, that does not mean it works well in practice, with a finite amount of data. If the hidden layer is too small, it cannot approximate complex functions. When it gets too big, the network easily overfits – i.e. it memorizes the training data but does not generalize to other images. Any time your training score goes up at the cost of the validation score, your network is overfitting.
We can get to around 45% accuracy on the validation set, which is an improvement over logistic regression. Yet we can easily do much better. If you want to play with this kind of network – edit the file, run it (I suggest adding --tags my-experiment in the command line) and see if you can do better. Make a few attempts and see how it goes; one possible variation is sketched after the hints below.
Hints:

  • Use more than 20 epochs.
  • In practice, neural networks use 2-3 dense layers.
  • Make big changes to see a difference. In this case change the hidden layer size by 2x or even 10x.
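For illustration, here is a hedged sketch of one such variation (wider hidden layers, more epochs); x_train, y_train, x_valid and y_valid stand for the CIFAR-10 arrays loaded by the tutorial scripts, and the exact values are only a starting point, not a tuned recommendation:

model = Sequential()
model.add(Flatten(input_shape=(32, 32, 3)))
model.add(Dense(512, activation='sigmoid'))  # 4x wider than before
model.add(Dense(512, activation='sigmoid'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=50, validation_data=(x_valid, y_valid))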

Just because you should in theory be able to create any picture (or even any photograph) with MS Paint, drawing pixel-by-pixel, it does not mean it will work in practice. We need to take advantage of the spatial structure and use a convolutional neural network (often abbreviated as ConvNet or CNN).

Convolutional neural networks

Instead of trying to connect everything with everything, we can process images in a smarter way. Convolution is an operation which performs the same local operation on each part of the image. Some examples of what convolution can do include blurring, amplifying edges or detecting color gradients – see Image Kernels – Visually Explained.
Each convolution layer produces new channels based on those which preceded it. First, we start with 3 channels for red, green and blue (RGB) components. Next, channels get more and more abstract. To get some idea of what is going on, visit How neural networks build up their understanding of images to see patterns that activate subsequent layers – from simple colors and gradients to much more complex patterns.
As we create channels representing various properties of the image, we need to reduce the resolution (usually with max-pooling). Also, modern networks typically use ReLU as the activation function as it works much better for deeper models.

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu',
                 input_shape=(32, 32, 3)))
model.add(MaxPool2D())
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

The network architecture looks like this:

          OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)
              Input   #####     32   32    3
             Conv2D    |/  -------------------       896     2.1%
               relu   #####     30   30   32
       MaxPooling2D   Y max -------------------         0     0.0%
                      #####     15   15   32
             Conv2D    |/  -------------------     18496    43.6%
               relu   #####     13   13   64
       MaxPooling2D   Y max -------------------         0     0.0%
                      #####      6    6   64
            Flatten   ||||| -------------------         0     0.0%
                      #####        2304
              Dense   XXXXX -------------------     23050    54.3%
            softmax   #####          10

To run it, we type:

$ neptune send cnn_simple.py

Even with this simple neural network we get 70% accuracy on validation. That is much more than we got with logistic regression or a multilayer perceptron!
Now, feel free to experiment.
Hints:

  • Play with the number of channels and how they grow.
  • Usually 3×3 convolutions work the best; stick to them (and 1×1 convolutions which only mix channels).
  • You can have 1-3 convolutional layers before each MaxPool operation.
  • Adding a Dense layer may help.
  • Between dense layers you can use Dropout to reduce overfitting (i.e. if you see that training accuracy is higher than validation accuracy); see the sketch after this list.
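As a hedged illustration of that last hint, the classification head of the network could look like this (the dropout rate of 0.5 is only an example value to tune, and Dropout needs to be imported from keras.layers):

# ... convolutional and MaxPool2D layers as before ...
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))  # randomly zeroes half of the activations during training
model.add(Dense(10))
model.add(Activation('softmax'))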

So, is that it? No! This is only the beginning.
To compare results, click on the project name. You will see the whole list of experiments. In Manage columns, tick all accuracy (and possibly log-loss) scores. You can order your results by validation accuracy. You get a sort of personal Kaggle leaderboard!

Your personal, configurable Kaggle Leaderboard

In addition to architecture (which is a big deal), the choice of optimizer significantly changes the overall results. Very often, we get better results by adding more epochs (i.e. more passes over the whole training dataset) while reducing the learning rate at the same time.
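In Keras this boils down to passing a configured optimizer object instead of the string name; a hedged example (the learning rate and epoch count below are illustrative, not tuned values):

from keras.optimizers import Adam

model.compile(optimizer=Adam(lr=1e-4),  # smaller than the default 1e-3
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=100,  # more epochs compensate for the slower learning
          validation_data=(x_valid, y_valid))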
For example, try this network I wrote:

          OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)
              Input   #####     32   32    3
             Conv2D    |/  -------------------       896     0.1%
               relu   #####     32   32   32
             Conv2D    |/  -------------------      1056     0.2%
               relu   #####     32   32   32
       MaxPooling2D   Y max -------------------         0     0.0%
                      #####     16   16   32
 BatchNormalization    μ|σ  -------------------       128     0.0%
                      #####     16   16   32
            Dropout    | || -------------------         0     0.0%
                      #####     16   16   32
             Conv2D    |/  -------------------     18496     2.9%
               relu   #####     16   16   64
             Conv2D    |/  -------------------      4160     0.6%
               relu   #####     16   16   64
       MaxPooling2D   Y max -------------------         0     0.0%
                      #####      8    8   64
 BatchNormalization    μ|σ  -------------------       256     0.0%
                      #####      8    8   64
            Dropout    | || -------------------         0     0.0%
                      #####      8    8   64
             Conv2D    |/  -------------------     73856    11.5%
               relu   #####      8    8  128
             Conv2D    |/  -------------------     16512     2.6%
               relu   #####      8    8  128
       MaxPooling2D   Y max -------------------         0     0.0%
                      #####      4    4  128
 BatchNormalization    μ|σ  -------------------       512     0.1%
                      #####      4    4  128
            Dropout    | || -------------------         0     0.0%
                      #####      4    4  128
            Flatten   ||||| -------------------         0     0.0%
                      #####        2048
              Dense   XXXXX -------------------    524544    81.6%
               relu   #####         256
            Dropout    | || -------------------         0     0.0%
                      #####         256
              Dense   XXXXX -------------------      2570     0.4%
            softmax   #####          10
$ neptune send cnn_adv.py

It will take around 0.5h, but the results will be much better. Patience pays off – validation accuracy should be around 83%!
You can try other examples of networks for CIFAR-10: one from the Keras repository (though I had trouble reproducing their score) and one from this blog post. Both of them train in around 1.5h.

$ neptune send cnn_fchollet.py
$ neptune send cnn_pkaur.py

Can you do better? :)
Maybe you can beat 83%? Or create a network which achieves the same goal, but is much simpler? Or one that trains much faster? If you do better, I encourage you to post your validation score in the comments below, with a link to network architecture (e.g. via a link to your GitHub repo or a gist).

Image classification sample solution for Kaggle competition

October 24, 2017/in Deep learning, Neptune /by Jakub Czakon

At deepsense.ai, we’re doing our best to make our mark in state‑of‑the‑art data science. For many years, we have been competing in machine learning challenges, gaining both conceptual and technical expertise. Now, we have decided to open source an end‑to‑end image classification sample solution for the ongoing Cdiscount Kaggle competition. In so doing, we believe we’ll encourage data scientists both seasoned and new to compete on Kaggle and test their neural nets.

Introduction

Competing in machine learning challenges is fun, but also a lot of work. Participants must design and implement end‑to‑end solutions, test neural architectures and run dozens of experiments to train deep models properly. But this is only a small part of the story. Strong Kaggle competition solutions have advanced data pre‑ and post‑processing, ensembling and validation routines, to name just a few. At this point, competing effectively becomes really complex and difficult to manage, which may discourage some data scientists from rolling up their sleeves and jumping in. Here at deepsense.ai we believe that Kaggle is a great platform for advanced data science training at any level of expertise. So great, in fact, that we felt compelled to open‑source an image classification sample solution to the currently open Cdiscount challenge. Below, we describe what we have prepared.


Image classification sample solution overview

When we say our solution is end‑to‑end, we mean that it starts with raw input data downloaded directly from the Kaggle site (in the BSON format) and finishes with a ready‑to‑upload submission file. Here are the components:

  1. data loader
    1. Keras custom iterator for bson file
    2. label encoder representing product IDs to fit the Keras API
  2. neural network training on n classes and k examples per class. We use the following architectures:
    1. MobileNet (Howard et al. ’17)
    2. Inception v3
    3. ensembles of the models mentioned above
  3. model predictions
    1. single-model prediction
    2. ensembling (by averaging) for multiple models
  4. submit generation

For instance, the image classification pipeline with a MobileNet ensemble would be defined as follows:

@register_pipeline
def MobilenetEnsemblePipeline(num_classes, epochs, workers, models_dir):
	pipe_legs_params = {'mobilenet_128_{}'.format(num_classes): (128, 128),
	                    'mobilenet_160_{}'.format(num_classes): (160, 64),
	                    'mobilenet_192_{}'.format(num_classes): (192, 32),
	                    }
	pipe_legs = []
	for name, (target_size, batch_size) in pipe_legs_params.items():
		leg = DeepPipeline([('loader', KerasDataLoader(num_classes, target_size, batch_size)),
		                    ('model', KerasMobileNet(
			                    architecture_cfg={'input_size': target_size, 'classes': num_classes},
			                    training_cfg={'epochs': epochs, 'workers': workers, 'verbose': 1},
			                    callbacks_cfg={'models_dir': models_dir, 'model_name': name}))])
		pipe_legs.append((name, leg))
	pipe_avg = PredictionAverage(pipe_legs)
	pipeline = LabelEncoderWrapper(pipe_avg)
	return pipeline

What if I want to use my network architecture?

You are encouraged to replace our network with your own. Below you can find a short snippet of code that you simply place in the models.py file:

class MyModel(BasicKerasClassifier):
    def _build_model(self, params):
        # build, compile and return your Keras model here
        return Model

Otherwise, I would suggest extending BasicKerasClassifier or KerasDataLoader with custom augmentations, learning rate schedules and other tricks of your choice.
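For example, a hedged sketch of such a subclass could look like the one below; the keys I read from params ('input_size', 'classes') are my assumption – check models.py in the starter repository for the fields that are actually passed in:

from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense
from keras.models import Model

class MySimpleConvNet(BasicKerasClassifier):
    def _build_model(self, params):
        # Tiny example architecture; replace with your own network.
        inputs = Input(shape=(params['input_size'], params['input_size'], 3))
        x = Conv2D(32, (3, 3), activation='relu')(inputs)
        x = Conv2D(64, (3, 3), activation='relu')(x)
        x = GlobalAveragePooling2D()(x)
        outputs = Dense(params['classes'], activation='softmax')(x)
        model = Model(inputs=inputs, outputs=outputs)
        model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
        return model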

How to get started?

To start using our pipeline, follow these steps:

  1. download the source code from https://github.com/deepsense-ai/cdiscount-starter
  2. follow the README instructions to run the code
  3. modify this image classification sample solution to fit your needs
  4. have fun competing on Kaggle!

Image classification sample solution running in Neptune. Live charts present log-loss and accuracy for the running experiment.


Final remarks

Feel free to use, modify and run this code for your own purposes. We ran multiple experiments of this kind on Neptune, which you may find useful for managing your own experiments.

Logo detection and brand visibility analytics – example

August 29, 2019/in Data science, Deep learning, Machine learning, Neptune /by Michal Romaniuk and Konrad Budek

Companies pay astonishing amounts of money to sponsor events and raise brand visibility. Machine learning-powered tools can augment the calculation of ROI from such sponsorship and deliver more accurate results.

Event sponsoring is a well-established marketing strategy to build brand awareness. Despite being one of the most recognizable brands in the automotive industry, Chevrolet pays $71.4 million each year to put its brand on Manchester United shirts.

How many people does your brand reach?

According to Eventmarketer’s study, 72% of consumers positively view brands that provide them with positive experiences, be it a great sports game or another cultural event, such as a music festival. Such events attract large numbers of viewers both directly and via media reports, allowing brands to get favorable positioning and work on their word-of-mouth recognition. 

Sponsorship contracts often come at a steep price, so brand owners are naturally more than a little interested in finding out how effectively their outlays are working for them. However, it’s difficult to assess quantitatively just how great the brand exposure is in a given campaign. The information on brand exposure can further support demand forecasting efforts, as the company gains information on expected demand peaks that result from greater brand exposure in media coverage. 

The current approach to computing such statistics has involved manually annotating broadcast material, which is tedious and expensive. To address these problems, we have developed an automated tool for logo detection and visibility analysis that provides both raw detection and a rich set of statistics.


Solution overview

We decided to break the problem down into two steps: logo detection with convolutional neural networks and an analytics module for computing summary statistics.
Logo detection system overview

The main advantage of this approach is that swapping the analytics module for a different one is straightforward. This is essential when different types of statistics are called for, or even if the neural net is to be trained for a completely different task (we had plenty of fun modifying this system to spot and count coins – stay tuned for a future blog post on that).

Logo detection with deep learning

There are two principal approaches to object detection with convolutional neural networks: region-based methods and fully convolutional methods.

Region-based methods, such as R-CNN and its descendants, first identify image regions which are likely to contain objects (region proposals). They then extract these regions and process them individually with an image classifier. This process tends to be quite slow, but can be sped up to some extent with Fast R-CNN, where the image is processed by the convolutional network as a whole and then region representations are extracted from high-level feature maps. Faster R-CNN is a further improvement where region proposals are also computed from high-level CNN features, which accelerates the region proposal step.

Fully convolutional methods, such as SSD, do away with processing individual region proposals and instead aim to output class labels where the region proposal step would be. This approach can be much faster, since there is no need to extract and process region proposals individually. In order to make this work for objects with very different sizes, the SSD network has several detection layers attached to feature maps of different resolutions.

Logo detection convolutional net

Since real-time video processing is one of the requirements of our system, we decided to go with the SSD method rather than Fast R-CNN. Our network also uses ResNet-50 as its convnet backbone, rather than the default VGG-16. This made it much less memory-hungry, while also helping to stabilize the training process.


Model training

In the process of refining the SSD architecture for our requirements, we ran dozens of experiments. This was an iterative process with a large delay between the start and finish of an experiment (typically 1-2 days). In order to run numerous experiments in parallel, we used Neptune, our machine learning experiment manager. Neptune captures the values of the loss function and other statistics while an experiment is running, displaying them in a friendly web UI. Additionally, it can capture images via image channels and display them, which really helped us troubleshoot the different variations of the data augmentation we tested.
Logo detection - Neptune screenshot

Logo detection analytics

The model we produced detects logos very well. However, when even a short video is analyzed, the raw description can span thousands of lines. To help humans analyze the results, we created software that translates these descriptions into a series of statistics, charts, rankings and visualizations that can be assembled into a concise report.

The statistics are calculated globally and per brand. Some of them, like brand display time, are meant to be displayed, but many are there to fuel the visual representation. Speaking of which, the charts are really expressive in this task. Some features include brand exposure size in time, heatmaps of a logo’s position on the screen and bar charts to allow you to easily compare various statistics across the brands. Last but not least, we have a module for creating highlights – visualizations of the bounding boxes detected by the model. This module serves a double purpose: in addition to making the analysis easy to track, such visualizations are also a source of valuable information for data scientists tweaking the model.
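To give a feel for what the analytics module computes, here is a hedged sketch of deriving per-brand display time and average on-screen size from raw detections with pandas; the column names, example values and frame rate are purely illustrative assumptions, not the format of our internal tool:

import pandas as pd

# Hypothetical raw detections: one row per detected logo per frame.
detections = pd.DataFrame({
    'frame': [0, 0, 1, 2, 2, 3],
    'brand': ['coca-cola', 'pepsi', 'coca-cola', 'coca-cola', 'pepsi', 'pepsi'],
    'box_area_fraction': [0.02, 0.01, 0.03, 0.02, 0.01, 0.02],
})

fps = 25.0  # assumed frame rate of the broadcast
display_time = detections.groupby('brand')['frame'].nunique() / fps      # seconds on screen
mean_exposure = detections.groupby('brand')['box_area_fraction'].mean()  # average on-screen size
print(display_time)
print(mean_exposure)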


Results

We processed a short video featuring a competition between rivals Coca-Cola and Pepsi to see which brand received more exposure in quantitative terms. You can watch it on YouTube by following this link. Which logo has better visibility?

Below, you can compare your guesses with what our model reported:

Logo detection report

Possible extensions

There are many business problems where object detection can be helpful. Here at deepsense.ai, we have worked on a number of them. 

We developed a solution for Nielsen that extracts information about ingredients from photographs of FMCG products, using object detection networks to locate the list of ingredients in photographs of products. This made Nielsen’s data collection more efficient and automatic. In its bid to save the gravely endangered North Atlantic right whale, the NOAA used a related technique to spot whales in aerial photographs. Similar techniques are used when the reinforcement learning-based models behind autonomous vehicles learn to recognize road signs.

With logo detection technology, companies can evaluate a campaign’s ROI by analyzing any media coverage of a sponsored event. With the information on brand positioning in hand, it is easy to calculate the advertising equivalent value or determine the most impactful events to sponsor. 

With further extrapolation, companies can monitor the context of media coverage and track whether their brand is shown with positive or negative information, providing even more knowledge for the marketing team.

Fall 2017 release – launching Neptune 2.1 today!

October 12, 2017/in Data science, Deep learning, Machine learning, Neptune /by Mariusz Gądarowski

We’re thrilled today to announce the latest version of Neptune: Machine Learning Lab. This release will allow data scientists using Neptune to take some giant steps forward. Here we take a quick look at each of them.

Cloud support

One of the biggest differences between Neptune 1.x and 2.x is that 2.x supports Google Cloud Platform. If you want to use NVIDIA® Tesla® K80 GPUs to train your deep learning models or Google’s infrastructure for your computations, you can just select your machine type and easily send your computations to the cloud. Of course, you can still run experiments on your own hardware as before. We currently support only GCP – but stay tuned, as we will not only be bringing more clouds and GPUs into the Neptune fold, but also offering them at even better prices!
With cloud support, we are also changing our approach to managing data. Neptune uses shared storage to store data about each experiment, for both the source code and the results (channel values, logs, output files, e.g. trained models). On top of that, you can upload any data to a project and use it in your experiments. As you execute your experiments, you’ve got all your sources at your fingertips, in the /neptune directory, which is available on a fast drive for reading and writing. It is also your current working directory – just as if you ran it on your local machine. Alongside this feature, Neptune can still keep your original sources so you can easily reproduce your experiments. For more details, please read the documentation.

Interactive Notebooks

Engineers love how interactive and easy to use Notebooks are, so it should come as no surprise that they’re among the most frequently used data science tools. Neptune now allows you to prototype faster and more easily using Jupyter Notebooks in the cloud, which is fully integrated with Neptune. You can choose from among many environments with different libraries (Keras, TensorFlow, Pytorch, etc) and Neptune will save your code and outputs automatically.

New Leaderboard

Use Neptune’s new leaderboard to organize data even more easily.
You can change the width of all columns and reorder them by simply dragging and dropping their headings.

You can also edit the name, tags and notes directly in the table and display metadata including running time, worker type, environment, git hash, source code size and md5sum.

The experiments are now presented with their Short ID. This allows you to identify an experiment among those with identical names.

Sometimes you may want to see the same type of data throughout the entire project. You can now fix chosen columns on the left for quick reference as you scroll horizontally through the other sections of the table.

Parameters

Neptune comes with new, lightweight and yet more expressive parameters for experiments.
This means you no longer need to define parameters in configuration files. Instead, you just write them in the command line!
Let’s assume you have a script named main.py and you want to have 2 parameters: x=5 and y=foo. You need to pass them in the neptune send command:

neptune send -- '--x 5 --y foo'

Under the hood, Neptune will run python main.py --x 5 --y foo, so your parameters are placed in sys.argv. You can then parse these arguments using the library of your choice.
An example using argparse:

import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--x', type=int)
parser.add_argument('--y')
params = parser.parse_args() # params.x = 5, params.y = 'foo'

If you want Neptune to track a parameter, just write ‘%’ in front of its value — it’s as simple as that!

neptune send -- '--x %5 --y %foo'

The parameters you track will be displayed on the experiment’s dashboard in the UI. You will be able to sort your experiments by parameter values.

The new parameter syntax supports grid search for numeric and string parameters:

neptune send -- '--x %[1, 10, 100] --y %(0.0, 10.0, 0.1)'
neptune send -- '--classifier %["SVM", "Naive Bayes", "Random Forest"]'

You can read more about the new parameters in our documentation.

Try Neptune 2.1

If you still haven’t tried Neptune, give it a go today! Sign up for free! It takes just 2 minutes to get started! Neptune’s community forum and detailed documentation will help you navigate the process.

Solving Atari games with distributed reinforcement learning

October 4, 2017/in Data science, Deep learning, Neptune /by Igor Adamski

At deepsense.ai, we strive to make our mark on the cutting-edge research leading towards intelligent machines by providing practical machine learning tools and designs that make it much easier for scientists to track their experiments and verify novel ideas.

One particular step towards achieving this ideal was distributing a state-of-the-art Reinforcement Learning algorithm on a large CPU cluster, allowing super-fast training of agents that learned to master a wide range of Atari 2600 games. This post contains a brief description of our Distributed Deep Reinforcement Learning experiments. For a more in-depth look you can read our paper on the matter here.

Distributed reinforcement learning

Atari games are a widely accepted benchmark for deep reinforcement learning (RL). One common characteristic of these games is that they are very easy for humans to crack conceptually. Comparing the time it takes humans and computers to master these games can provide a clear indication of the capabilities of modern artificial intelligence. The first approaches to teach an agent to play Atari were developed by DeepMind and required around a week of training. The A3C algorithm developed later was able to achieve human performance in most games and did so with a similar amount of training time. But could computers ever learn faster than us?
Creating such a quick and bright Atari games learner would mean that computers outpaced us in understanding a game environment. The techniques that said agent would use to quickly develop a good grasp of the game could be studied to further develop our understanding of the cognitive features of a human brain. Moreover, faster training would give researchers considerably more flexibility in terms of experimenting and thus make verifying various RL approaches much quicker. Today, we present a Distributed Reinforcement Learning algorithm that efficiently trains on a large cluster of 64 12-core CPUs (768 cores in total). Our design enables agents to learn to play Atari games in as little as 20 minutes. We’re making our implementation available here.

Gameplay clips for Breakout, Assault and Boxing, each comparing the agent’s initial performance with its performance after 15 and after 30 minutes of training.

Our achievement and results

By distributing the BA3C (details of single-machine implementation here) reinforcement learning algorithm, we were able to make an agent teach itself to play a wide range of Atari games rapidly, by just looking at a raw pixel output (game screen) from the game emulator. Our best experiments were distributed across 64 machines, each of which had 12 Intel CPU cores. In the game of Breakout, our agent achieves a superhuman score in just 20 minutes, which is a significant reduction of the single machine implementation learning time.
Training for Breakout on a single computer takes around 15 hours, bringing our implementation very close to the theoretical scaling (assuming computational power is fully utilized, using 64 times more CPUs should yield a 64-fold speed-up, i.e. roughly 15 hours / 64 ≈ 14 minutes, versus the about 20 minutes we achieved). The graph below shows the scaling of our implementation for different numbers of machines. Moreover, our algorithm exhibits robust results on many Atari environments, meaning that it is not only fast, but also adaptable to various learning tasks.


Graph showing the mean time of our algorithm (DBA3C) to achieve a score of 300 in the game of Breakout (average score of 300 needs to be obtained in 50 consecutive tries). The green line shows the theoretical scaling in reference to a single machine implementation.

Using Neptune, a tool developed here at deepsense.ai, we were able to proactively track the performance of our agents. This enabled us to instantly verify if a certain feature of the algorithm works as expected. In Neptune, we could observe our agents’ real-time scores along with many other experiment-related metrics that we later used to optimize the algorithm. The graph below shows training curves from 10 different experiments on the Breakout game. Graphs were updated live in Neptune as the training went on.


A plot showing the live mean score obtained by the agent in 50 consecutive trials of Breakout

We managed to achieve very competitive training times. As we hope to inspire further research in the RL domain, we decided to open-source the implementation of our distributed reinforcement learning algorithm.

Details of the implementation

In the following section we describe the technicalities of our distributed set-up, aiming primarily to address a more advanced audience. To get the most out of our description, we recommend readers familiarize themselves with this study by the Google Brain team.
For parallelization we chose the synchronous paradigm. Synchronizing all our workers yielded much faster training times than the asynchronous set-up, where each node works for itself. Using a synchronous design prevented our model from using stale gradients in the updates, but at the same time introduced a problem known as slow stragglers. As suggested in the Google study linked above, deploying a few more backup workers can significantly reduce the impact of the slow stragglers, and doing just that has worked very well for us.
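In TensorFlow 1.x terms, synchronous updates with a few backup workers can be expressed roughly as below; this is a hedged sketch with made-up numbers, not our exact training code, and loss and global_step are assumed to be defined elsewhere in the graph:

import tensorflow as tf

opt = tf.train.AdamOptimizer(learning_rate=1e-4)
# Aggregate gradients from 60 workers even though 64 are running:
# each step simply ignores the slowest stragglers.
sync_opt = tf.train.SyncReplicasOptimizer(opt,
                                          replicas_to_aggregate=60,
                                          total_num_replicas=64)
train_op = sync_opt.minimize(loss, global_step=global_step)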
One of the biggest challenges that arises when dealing with largely distributed training is the cluster interconnect congestion on the parameter server nodes. Sending the gradients from multiple workers to a single parameter server bottlenecks the pipeline, effectively slowing down the training process.
To deal with that, we first reduced the model’s size. We noticed that a contraction of the neural network did not affect the accuracy of the algorithm, but did significantly increase the number of points processed per second, and hence also its speed.
Since the communication overhead between the workers and parameter server was the biggest impeding factor to the speed of learning, we decided to balance the pressure on the pipeline by adding more parameter servers. This way, with the model weights distributed uniformly on multiple parameter servers, our training times began to pick up speed. The increase in processed data points per second for a different number of parameter servers can be seen below.
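With TensorFlow’s device placement helpers, spreading the model weights over several parameter servers is roughly a one-line change; again a hedged sketch, with the cluster definition and the network-building code (build_model, states) left out:

# Variables created inside this scope are assigned round-robin to the parameter server tasks,
# so no single PS node has to handle all of the gradient traffic.
with tf.device(tf.train.replica_device_setter(ps_tasks=4)):
    logits = build_model(states)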


Graph showing the relation between the number of parameter servers and processed data-points per second – we can see that using more parameter servers significantly increases the dp/s

Related work

The distributed paradigm has been a topic of extensive research. Parallelization on 256 concurrent GPUs recently enabled a Facebook team to efficiently train the ResNet-50 model in one hour. Later developments from UC Berkeley reduced the time of training ImageNet to merely 24 minutes. The development of a distributed evolution strategy (ES) algorithm has led researchers from OpenAI to train agents to play Atari games in one hour by using 720 parallel CPUs. Since none of these designs have ever been applied to classical RL, the work done here can be considered pioneering in the field of distributed reinforcement learning.

Acknowledgements

The work on this distributed reinforcement learning design would not have been possible without the services of the PL-Grid supercomputing infrastructure, which provided us with all the computational power needed to conduct this research. We would like to thank Henryk Michalewski from the University of Warsaw for supervising the project and granting us access to the PL-Grid. We also used tensorpack, developed by Yuxin Wu, a very efficient open-source implementation of the A3C algorithm.

How to create a product recognition solution

August 22, 2017/in Data science, Deep learning, Machine learning, Neptune /by Krzysztof Dziedzic and Patryk Miziuła

Product recognition is a challenging area that offers great financial promise. Automatically detected product attributes in photos should be easy to monetize, e.g., as a basis for cross-selling and upselling.

However, product recognition is a tough task because the same product can be photographed from different angles, in different lighting, with varying levels of occlusion, etc. Also, different fine-grained product labels, such as ones in royal blue or turquoise, may prove difficult to distinguish visually. Fortunately, properly tuned convolutional neural networks can effectively resolve these problems.
In this post, we discuss our solution for the iMaterialist challenge announced by CVPR and Google and hosted on Kaggle in order to show our approach to product recognition.

The problem

Data and goal

The iMaterialist organizer provided us with hyperlinks to more than 50,000 pictures of shoes, dresses, pants and outerwear. Some tasks were attached to every picture and some labels were matched to every task. Here are some examples:

Product recognition: example picture of a dress
  • dress: occasion – wedding party, cocktail party, cocktail, party, formal, prom
  • dress: length – knee
  • dress: color – dark red, red

Product recognition: example picture of outerwear
  • outerwear: age – adult
  • outerwear: type – blazers
  • outerwear: gender – men
  • pants: color – brown

Product recognition: example picture of pants
  • pants: material – jeans, denim, denim jeans
  • pants: color – blue, blue jeans, denim blue, light blue, light, denim
  • pants: type – jeans
  • pants: age – adult
  • pants: decoration – men jeans
  • pants: gender – men

Product recognition: example picture of shoes
  • shoe: color – dark brown
  • shoe: up height – kneehigh
  • pants: color – black

Our goal was to match a proper label to every task for every picture from the test set. From the machine learning perspective this was a multi-label classification problem.

There were 45 tasks in total (a dozen or so per clothing type) and we had to predict a label for all of them for every picture. However, tasks not attached to a particular test image were skipped during the evaluation. In fact, usually only a few tasks were relevant to a picture.

Problems with data

There were two main problems with data:

  • We weren’t given the pictures themselves, but only the hyperlinks. Around 10% of them were expired, so our dataset was significantly smaller than the organizer had intended. Moreover, the hyperlinks were a potential source of a data leak. One could use text-classification techniques to take advantage of leaked features hidden in hyperlinks, though we opted not to do that.
  • Some labels with the same meaning were treated by the organizer as different, for example “gray” and “grey”, “camo” and “camouflage”. This introduced noise in the training data and distorted the training itself. Also, we had no choice but to guess if a particular picture from the test set was labeled by the organizer as either “camo” or “camouflage”.

Evaluation

The evaluation score function was the average error over all test pictures and relevant tasks. A score value of 0 meant that all the relevant tasks for all the test pictures were properly labeled, while a score of 1 implied that no relevant task for any picture was labeled correctly. A random sample submission provided by the organizer yielded a score greater than 0.99. Hence we knew that a good result couldn’t be achieved by accident and we would need a model that could actually learn how to solve the problem.

Our solution

A bunch of convolutional neural networks

Our solution consisted of about 20 convolutional neural networks. We used the following architectures in several variants:

  • DenseNet,
  • ResNet,
  • Inception,
  • VGG.

All of them were initialized with weights pretrained on the ImageNet dataset. Our models also differed in terms of the data preprocessing (cropping, normalizing, resizing, switching of color channels) and augmentation applied (random flips, rotations, color perturbations from Krizhevsky’s AlexNet paper). All the neural networks were implemented using the PyTorch framework.
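As a hedged illustration of that setup in PyTorch (torchvision’s model zoo and standard transforms), the snippet below loads an ImageNet-pretrained backbone and defines a training-time augmentation pipeline; the chosen architecture, input size and augmentation parameters are only examples – the roughly 20 models we actually trained differed in these respects:

import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms

# ImageNet-pretrained backbone with the classifier replaced by a 576-way output layer.
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 576)

train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # illustrative values
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])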

Choosing the training loss function

Which loss function to choose for the training stage was one of the major problems we faced. 576 unique pairs of task/label occurred in the training data so the outputs of our networks were 576-dimensional. On the other hand, typically only a few labels were matched to  a picture’s tasks. Therefore the ground truth vector was very sparse – only a few of its 576 coordinates were nonzero – so we struggled to choose the right training loss function.
Assume that \((z_1,\ldots,z_{576})\in\mathbb{R}^{576}\) is a model output and
\[y_i=\left\{\begin{array}{ll}1, & \text{if task/label pair } i \text{ matches the picture,}\\ 0, & \text{elsewhere,}\end{array}\right.\quad\text{for } i=1,2,\ldots,576.\]

  • As this was a multi-label classification problem, choosing the popular crossentropy loss function:
    \[\sum_{i=1}^{576}-y_i\log p_i,\quad \text{where } p_i=\frac{\exp(z_i)}{\sum_{j=1}^{576}\exp(z_j)},\]
    wouldn’t be a good idea. This loss function tries to distinguish only one class from the others.
  • Also, for the ‘element-wise binary crossentropy’ loss function:
    \[\sum_{i=1}^{576}-y_i\log q_i-(1-y_i)\log(1-q_i),\quad \text{where } q_i=\frac{1}{1+\exp(-z_i)},\]
    the sparsity caused the models to end up constantly predicting no labels for any picture.
  • In our solution, we used the ‘weighted element-wise crossentropy’ given by:
    \[\sum_{i=1}^{576}-\bigg(\frac{576}{\sum_{j=1}^{576}y_j}\bigg)\cdot y_i\log q_i-(1-y_i)\log(1-q_i),\quad \text{where } q_i=\frac{1}{1+\exp(-z_i)}.\]
    This loss function focused the optimization on positive cases (a PyTorch sketch follows the list below).
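A minimal PyTorch sketch of this weighted loss is shown below; it is a re-implementation for illustration only, not our competition code, and the clamp is just a simple guard against log(0):

import torch

def weighted_elementwise_crossentropy(z, y):
    # z: raw model outputs (logits) of shape (576,), y: 0/1 target vector of the same shape
    q = torch.sigmoid(z).clamp(1e-7, 1 - 1e-7)
    positive_weight = 576.0 / y.sum()  # the 576 / sum(y_j) factor from the formula above
    loss = -positive_weight * y * torch.log(q) - (1 - y) * torch.log(1 - q)
    return loss.sum()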

Ensembling

Predictions from the individual networks were averaged, all with equal weights. Unfortunately, we didn’t have enough time to try more sophisticated ensembling techniques, such as xgboost ensembling.
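Averaging itself is a one-liner; a hedged sketch, assuming each model’s outputs have already been collected into arrays of shape (n_images, 576):

import numpy as np

# predictions_per_model: list of arrays, one per network, each of shape (n_images, 576)
ensemble_prediction = np.mean(np.stack(predictions_per_model), axis=0)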

Other techniques tested

We also tested other approaches, though they proved less successful:

  • Training the triplet network and then training xgboost models on features extracted via embedding (different models for different tasks).
  • Mapping semantically equivalent labels like “gray” and “grey” to a common new label and remapping those to the original ones during postprocessing.

Neptune

We managed all of our experiments using Neptune, deepsense.ai’s Machine Learning Lab. Thanks to that, we were easily able to track the tuning of our models, compare them and recreate them.
Product recognition: Neptune dashboard

Results

We achieved a score of 0.395, which means that we correctly predicted more than 60% of all the labels matched to relevant tasks.
Product recognition: Kaggle leaderboard
We are pleased with this result, though we could have improved on it significantly if the competition had lasted longer than only one month.

Summary

Challenges like iMaterialist are a good opportunity to create product recognition models. The most important tools and tricks we used in this project were:

  • Playing with training loss functions. Choosing the proper training loss function was a real breakthrough as it boosted accuracy by over 20%.
  • A custom training-validation split. The organizer provided us with a ready-made training-validation split. However, we believed we could use more data for training so we prepared our own split with more training data while maintaining sufficient validation data.
  • Using the PyTorch framework instead of the more popular TensorFlow. TensorFlow doesn’t provide the official pretrained models repository, whereas PyTorch does. Hence working in PyTorch was more time-efficient. Moreover, we determined empirically that, much to our surprise, the same architectures yielded better results when implemented in PyTorch than in TensorFlow.

We hope you have enjoyed this post and if you have any questions, please don’t hesitate to ask!


Region of interest pooling in TensorFlow – example

April 25, 2017/in Data science, Deep learning, Machine learning, Neptune /by Krzysztof Dziedzic, Patryk Miziuła and Błażej Osiński

In the previous post we explained what region of interest pooling (RoI pooling for short) is. In this one, we present an example of applying RoI pooling in TensorFlow. We base it on our custom RoI pooling TensorFlow operation. We also use Neptune as a support in our experiment performance tracking.

Example overview

Our goal is to detect cars in the images. We’d like to construct a network that is able to automatically draw a box around every car.
In our example we deal with car images from the Pascal VOC 2007 dataset. For simplicity we choose only cars not marked as truncated.

Exemplary images from Pascal VOC 2007 dataset

Neptune

We manage our experiment using Neptune. It’s a pretty handy tool:

  • We track the tuning in real time. In particular, we preview the currently estimated bounding boxes.
  • We can change model hyperparameters on the fly.
  • We can easily integrate Neptune with TensorFlow and get all the charts, graphs and summary objects from the TensorFlow graph.
  • We store the executed experiments in an aesthetic list.

Network architecture

In our example we use the Fast R-CNN architecture.
The network has two inputs:

  1. Batch of images
  2. Batch of potential bounding boxes – RoI proposals
    In the Fast R-CNN model RoI proposals are generated via an external algorithm, for example selective search. In our example, we take ground truth bounding boxes from the Pascal annotations and generate more negative bounding boxes ourselves.

The network has two outputs:

  1. Batch of RoI proposals not classified as background (with corrected coordinates)
  2. Probabilities that RoI proposals consist of objects of the consecutive categories

The network consists of three main parts:

  1. Deep convolutional neural network
    • Input: images
    • Output: feature map

    We use the popular VGG16 network pretrained on the ImageNet dataset.

  2. RoI pooling layer
    • Input: feature map, RoI proposals resized to a feature map
    • Output: max-pooled RoI proposals
  3. Fully connected layer with RoI features
    • Input: max-pooled RoI proposals
    • Output: corrected RoI proposals, probabilities
RoI pooling in TensorFlow scheme
Fast R-CNN architecture

We note that our detection task could also be solved with the Faster R-CNN architecture, which works significantly faster :). However, Faster R-CNN requires writing much more code, so we chose the simpler Fast R-CNN.

Loss function

We tune the network to minimize the loss given by
\(loss = \frac{1}{n}\sum_{i=1}^n \frac{1}{k_i} \sum_{j=1}^{k_i} loss_{ij}\)
where:

  • \(n\) is the number of images in a batch,
  • \(k_i\) is the number of RoI proposals for image \(i\),
  • \(loss_{ij}\) is the loss for RoI proposal \(j\) of image \(i\).

For a single RoI proposal, \(loss_{ij}\) is the sum of the classification and regression loss, where:

  • classification loss is the common cross entropy,
  • regression loss is a smooth L1 distance between the rescaled coordinates of the RoI proposal and the ground-truth box. The regression loss is computed only if the ground-truth box is not categorized as background; otherwise it is defined as 0 (a sketch of this per-proposal loss follows below).
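A hedged sketch of this per-proposal loss in TensorFlow 1.x is given below; tensor names and shapes are my own choice for illustration – the actual graph is built in fast_rcnn.py in our repository:

import tensorflow as tf

def smooth_l1(x):
    # Standard smooth L1 loss: quadratic near zero, linear further away.
    abs_x = tf.abs(x)
    return tf.where(abs_x < 1.0, 0.5 * tf.square(x), abs_x - 0.5)

def roi_loss(class_logits, class_labels, box_pred, box_target, is_background):
    # class_logits/class_labels: per-proposal classification logits and one-hot labels
    # box_pred/box_target: rescaled box coordinates, shape (num_proposals, 4)
    # is_background: per-proposal 0/1 flag
    cls_loss = tf.nn.softmax_cross_entropy_with_logits(labels=class_labels, logits=class_logits)
    reg_loss = tf.reduce_sum(smooth_l1(box_pred - box_target), axis=1)
    reg_loss = tf.where(is_background > 0, tf.zeros_like(reg_loss), reg_loss)  # skip background boxes
    return cls_loss + reg_loss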

Implementation details

Prerequisites

To run the code we provide, you need the following software:

  • CUDA 8,
  • TensorFlow 1.0 with GPU support,
  • our custom RoI pooling TensorFlow operation,
  • OpenCV,
  • Neptune (version 1.5): apply for our Early Adopters Program or try it immediately with Neptune Go.

Repository

You can download our code from our GitHub repository. It consists of two folders with the following content:

code folder:
  • main.py – the script to execute
  • fast_rcnn.py – builds the TensorFlow graph
  • trainer.py – preprocesses data and trains the network
  • neptune_handler.py – contains Neptune utilities
  • config.yaml – Neptune configuration file
  • get_data.py – downloads images from the Pascal VOC 2007 dataset

data folder:
  • vgg16-20160129.tfmodel.torrent – references the weights of the pretrained network

Description

When we run main.py, the script trainer.py first restores the VGG16 network with the pretrained weights. Then it adds the RoI pooling layer and the fully connected layer. Finally, it begins tuning the entire network using the provided images and RoI proposals. It also sends information to Neptune, so we can track the tuning progress in real time.
After cloning the repository, please download the file vgg16-20160129.tfmodel referred to by the torrent file vgg16-20160129.tfmodel.torrent and save it in the data directory. Also, please run the script get_data.py to download the needed images:

python get_data.py

Let’s test our RoI pooling in TensorFlow!

We run the script main.py  from the code folder by typing:

neptune run --
            --im_folder $PWD/../data/images
            --roidb $PWD/../data/roidb
            --pretrained_path $PWD/../data/vgg16-20160129.tfmodel

If we also want to use a non-default learning rate or number of epochs, we can add:

--learning_rate 1e-03 --num_epochs 200

to the end of the command.
After a while, we can start observing the tuning progress in Neptune:

RoI pooling in TensorFlow - tuning
Tracking the network tuning in Neptune

Moreover, we can display the RoIs fitted to the cars by our network. We could simply stream all the processed images, but this would consume a lot of resources. That’s why we decided to make this feature switchable with a simple Neptune action.
To do that, we can go to the Actions tab and click ‘RUN’ to start sending the images.

RoI pooling in TensorFlow - turning on the image sending in Neptune
Turning on the image sending

After that, we can go to the Channels tab and expand the channels ‘region proposals for RoI pooling’ and ‘network detections’ by clicking ‘+’ signs.

Roi pooling in TensorFlow - expanding image channels in Neptune
Expanding image channels

Now we can see the RoIs in real time!

RoI pooling in TensorFlow - RoI preview
RoI proposals preview in Neptune

We can click on the pictures to zoom in on them. If we want Neptune to stop sending new images, we go to the Actions tab and click ‘RUN’ again.
An exemplary NeptuneGo execution of our script can be found here.

Related:  Logo detection and brand visibility analytics - example

Summary

We hope you enjoyed our example of RoI pooling in TensorFlow and the experiment management features offered by Neptune. If you would like to comment on our work, don’t hesitate to leave us feedback!

References

  • R. Girshick, Fast R-CNN, IEEE International Conference on Computer Vision (ICCV), 2015.
  • S. Ren, K. He, R. Girshick & J. Sun, Faster R-CNN: towards real-time object detection with Region Proposal Networks, Neural Information Processing Systems (NIPS), 2015.
  • deepsense.ai, Region of interest pooling explained, 2017.

Neptune 1.5 – Python 3 support, simplified CLI, compact view

April 21, 2017/in Data science, Deep learning, Machine learning, Neptune /by Rafał Hryciuk

At the end of April 2017, deepsense.ai released a new version of Neptune, the DevOps platform for data scientists. Neptune 1.5 introduces a range of new features and improvements, including support for Python 3, a simplified Neptune CLI, offline execution, a compact view, improved channels and charts, and a number of user experience refinements.

Python 3.5 Support

One of the most upvoted requests on our feedback channel has been to add support for Python 3. We put it on our roadmap and now, in version 1.5, you can run Neptune experiments using both Python 2.7 and 3.5.
We encourage you to stay active in our feedback forum and vote for features you need in your data science work. This is how you will influence where Neptune goes and how it develops – and ultimately make it more convenient for you.

Simplification of Neptune CLI

Until now, Neptune CLI’s commands were long and complex. With version 1.5, however, convenience has taken center stage as we’ve introduced a host of improvements and simplifications. Click over and have a look at the simplified CLI commands and configuration file in our documentation.
To see how this change could work for you, compare the commands for running our “Flower Species Prediction” example.
In version 1.4:

neptune run flower-species-prediction/main.py --config flower-species-prediction/config.yaml --storage-url /tmp/neptune-iris --paths-to-dump flower-species-prediction

In version 1.5:

neptune run

Offline Execution

Our users often run parts of their experiments using Jupyter Notebook, but the Neptune client library requires communication with our server. Thanks to offline execution, users can disable communication with the server and run their experiments without the CLI. Read more about this convenient development here.

Compact View

To make comparing your experiments easier we have introduced a compact view of the experiments table. You can now display more experiment results on your screen, and draw conclusions even faster and more confidently.

Improved Channels and Charts

Neptune 1.5 comes with the new API for channels and charts. Thanks to the new API you will be able to send and display even more data points. Your charts will load faster and more seamlessly. We encourage you to give the improved channels and charts a go.

The Neptune Pipeline

We are already working on the next version of Neptune, which is slated for a May release and will focus on better displaying the experiments list.
We hope you will enjoy working with our DevOps platform for data scientists. Neptune 1.5 will help you manage and monitor your machine learning experiments even more conveniently.
Would you like to test drive Neptune? Visit NeptuneGo!, have a look around and run your first experiments.

NeptuneGo!


Training XGBoost with R and Neptune

March 27, 2017/in Machine learning, Neptune /by Jan Lasek

In this blog post we present the R library for Neptune – the DevOps platform for data scientists. We demonstrate it using the powerful XGBoost library and a bank marketing dataset (available at the UCI Machine Learning Repository).

The goal is to build a model that predicts how likely a given customer is to subscribe to a bank deposit. Such a model can be used as a basis for a recommendation system or for more efficient allocation of resources in a call center. The model is built by using XGBoost: a state-of-the-art library for training predictive models. XGBoost has a long legacy of successful applications in data science – here you can find a list of use cases in which it was used to win open machine learning challenges. If you are interested in more details and other modeling approaches to the problem under consideration we refer to this publication.
Let’s start with describing the dataset!

Bank customer data

The data we are dealing with here is a set of over 41K customer records from a bank. They comprise various features describing each customer, including, among others, the customer’s age, marital status and other features regarding previous purchase history. We are also given a set of macroeconomic indicators, for example the consumer confidence index. Finally, we are given the binary information whether a given customer subscribed to a bank deposit – about 11.3% of all customers decided to do so. Our goal is to build a model that estimates the probability of this event.
Inquiries about subscribing to a bank deposit were made over the phone. Along with the data described above, we are also given information about how long such a call with each customer lasted. In the analysis, this feature should be disregarded, as it would constitute a data leak: we want a model that estimates the probability of subscribing to a deposit before any action is taken (in particular, before a phone call is made). At the end of the day, we want to save resources and the time spent on calling customers in vain.
We will employ relatively few preprocessing steps before feeding the data into the model. We will use R’s model.matrix() function to encode categorical attributes. After this step, the data comprise 53 numeric attributes and a single target column. Loading and preprocessing the data is done with the code below. We also load all the necessary libraries that we will use in this example: xgboost, neptune and ModelMetrics.

library(xgboost)
library(neptune)
library(ModelMetrics)
customer_data <- read.csv('https://s3-us-west-2.amazonaws.com/deepsense.neptune/data/bank-additional/bank-additional-full.csv', sep = ';')
customer_data$duration <- NULL
y <- customer_data$y == 'yes'
x <- model.matrix(y~.-1, data = customer_data)

 

Training XGBoost model

XGBoost is a powerful library for building ensemble machine learning models via the algorithm called gradient boosting. Training an XGBoost model is an iterative process. In each iteration, a new tree (or a forest) is built, which improves the accuracy of the current (ensemble) model.
In order to train and evaluate the model, we will split the data into three parts: a training set, a validation set and a test set. The training set will be used to build our model. With the validation set we will monitor the model’s performance on a different dataset than the training one. Finally, the test set will serve as a sanity check for the model’s final performance on a previously unseen holdout dataset. Here we decide to devote 60% of the data for training, 20% for validation and the remaining 20% for testing. For reproducibility we set a seed here.

set.seed(999)
train_valid_test <- sample(1:3, prob = c(0.6, 0.2, 0.2), replace = T, size = nrow(x))
train_idx <- train_valid_test == 1
valid_idx <- train_valid_test == 2
test_idx <- train_valid_test == 3
y_train <- y[train_idx]
y_valid <- y[valid_idx]
y_test <- y[test_idx]
x_train <- x[train_idx,]
x_valid <- x[valid_idx,]
x_test <- x[test_idx,]

At this point we should introduce the accuracy metric that we will employ. First, we note that there is some class imbalance in the response rate: only about 1 out of 9 responses is positive (which is typical for marketing data). In such applications, the area under the ROC curve (abbreviated as AUC) is often the metric of choice because it handles classification with imbalanced classes well. The rare class – customers subscribing to a deposit – is of special interest to us in this application.
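For intuition, the AUC can be read as the probability that the model ranks a randomly chosen positive case above a randomly chosen negative one:
\(AUC = P(\hat{p}_{+} > \hat{p}_{-})\)
where \(\hat{p}_{+}\) and \(\hat{p}_{-}\) are the predicted probabilities for a random subscriber and a random non-subscriber. A value of 0.5 corresponds to random guessing and 1.0 to a perfect ranking.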
To monitor the training process in Neptune, we need to specify the appropriate Neptune channels. They are used for keeping track of metrics that are important to us. We will use two numeric channels – for the training and validation AUC scores. Finally, we are going to record the test AUC score. We can also keep track of other things, like the training time for each iteration. On top of the channels, we can create custom charts, which can later be viewed in Neptune’s Dashboard:
Neptune-charts
Below we set up the Neptune channels and Neptune charts:

createNumericChannel('train_auc')
createNumericChannel('valid_auc')
createNumericChannel('test_auc')
createChart(chartName = 'Train & validation auc', series = list('train_auc', 'valid_auc'))
createNumericChannel('execution_time')
createChart(chartName = 'Total execution time', series = list('execution_time'))

Neptune facilitates monitoring computations that are organized in a process called a job. So far we have defined four numeric channels and two charts; after executing the job, you can view these channels in Neptune’s UI:
Neptune's channels
Yet another useful feature of XGBoost is the possibility of calling a custom callback function after each iteration of the boosting algorithm. Callbacks are useful functions for debugging and online performance monitoring of your models. We will write our own function to keep track of the training process. Here you can see some examples of callback functions. We will specifically overwrite the function cb.print.evaluation() from the repository so that we can track the progress of learning in Neptune. This function is presented below. The variable start_time is a global variable that allows us to monitor the total execution time (we will create it in the next step).

cb.print.evaluation <- function (period = 1) {
  callback <- function(env = parent.frame()) {
    if (length(env$bst_evaluation) == 0 || period == 0)
      return()
    i <- env$iteration
    if ((i - 1)%%period == 0 || i == env$begin_iteration || i == env$end_iteration) {
      channelSend('train_auc', i, env$bst_evaluation[1])
      channelSend('valid_auc', i, env$bst_evaluation[2])
      channelSend('execution_time', i, as.numeric(Sys.time() - start_time))
    }
  }
  attr(callback, 'call') <- match.call()
  attr(callback, 'name') <- 'cb.print.evaluation'
  callback
}

Note that the function refers to the parent environment it is called from – via the parent.frame() function. This allows us to access the objects created during the training process (in this case we choose to monitor both the training and validation AUC scores). We access them all simply by referring to the parent environment, as shown above.
The conditional instructions in the code above check whether any monitoring was set and, if so, perform it every period iterations (this can be changed via the print.every.n parameter of the xgb.train() function), as well as for the first and last iteration.
Next, we train our model. To start, we create a start_time variable to monitor the execution time.

start_time <- Sys.time()
model <- xgb.train(
  params = list(
    objective = 'binary:logistic',
    eval_metric = 'auc',
    max_depth = 4),
  data = xgb.DMatrix(x_train, label = y_train),
  nrounds = 50,
  watchlist = list(
    train = xgb.DMatrix(x_train, label = y_train),
    validation = xgb.DMatrix(x_valid, label = y_valid)),
  callbacks = list(cb.print.evaluation()))

Let’s have a closer look at what is going on here. Above, the training and validation sets for performance monitoring are submitted to the model as the parameter named watchlist. Some extra configuration needs to be done. We set:

  • evaluation metric to be AUC (parameter eval_metric = ‘auc’)
  • learning task as a binary classification with logistic loss (objective = ‘binary:logistic’)
  • maximal depth of an individual tree to 4 – an arbitrary value (max_depth = 4)
  • number of boosting iterations to 50 (nrounds = 50).

To finalize the preparations we need to specify a configuration file for Neptune with some metadata. For now it may be as simple as the exemplary file below.

name: Bank Marketing
project: Predicting Deposit Subscription

We can run our code with Neptune from the command line with:

$ neptune run bank_marketing.R --config xgb_config.yaml --dump-dir-url my_dump_dir

The file bank_marketing.R contains our model’s R code, xgb_config.yaml is Neptune’s configuration file, and the dump dir is a directory where all of the job’s output and source code will be stored. This is useful for the reproducibility of the experiment.
In Neptune’s dashboard we can see both the training and validation AUC scores on a plot.
Training XGBoost
XGBoost has a useful early stopping parameter. It stops further training when the evaluation metric on the validation set does not improve for a given number of consecutive iterations. In machine learning, this is a common way to prevent overfitting a model. However, it is not known in advance what value this parameter should be set to. The idea here is to plot the training and validation loss and observe the moment when further training no longer helps. From the visualization of the training process above, it appears that 10 is a sufficient number of iterations (trees) in our ensemble model. In this way, we also arrive at a less complex model with no loss of accuracy.
We also changed the random seed above to observe if we arrive at a stable solution for different data shuffles (see train/validation/test split above). In general, based on our experimentation, the ensemble of 10 trees (that is, running the model for 10 iterations) appears to produce a decent and stable model overall. Here, Neptune helps us diagnose a proper early stopping time via presentation of accuracy scores. Finally, it is convenient to set the number of iterations as the job parameter and extend the configuration file as discussed here. For example, in Neptune’s R library, the command line argument nrounds can be accessed in the job via the nrounds <- params(‘nrounds’) command.
It’s time to inspect the model’s performance on the remaining set of records.

Model evaluation and the lift curve

Finally, we make predictions and evaluate the model using the holdout test set. Using the ntreelimit parameter, we specify that the model built after 10 iterations should be used for predicting new data (as discussed above).

predictions_test <- predict(model, xgb.DMatrix(x_test), ntreelimit = 10)
auc_test <- auc(y_test, predictions_test)

Evaluation yields an AUC of 0.80. This score is close to the one obtained on the validation set, which is good: we created a stable model with no sign of overfitting – it is ready to be used!
Let’s measure its effectiveness on the holdout test set with a lift chart. This is a tool for getting insight into the expected performance of our marketing campaign if we target it at the customers the model ranks as most likely to subscribe.
To produce the lift chart, we sort the true responses (y_test) by the predicted probability of deposit subscription in decreasing order and compute the cumulative fraction of positive responses. This fraction is normalized against the baseline level, equal to the fraction of positive responses in the test data (11.7%). We create an extra numeric channel and a chart for the lift curve and plot it for the top 10%, 20%, …, 100% of customers. This is accomplished using the function below.

lift_chart <- function(responses, predictions) {
  baseline <- mean(responses)
  responses_ordered <- responses[order(predictions, decreasing = TRUE)]
  lift <- cumsum(responses_ordered) / 1:length(responses_ordered) / baseline
  createNumericChannel('lift')
  createChart(chartName = 'Lift chart', series = list('lift'))
  n <- length(lift)
  for(x in seq(0.1, 1, by = 0.1)) {
    # max(., 1) assures a proper index >= 1
    channelSend('lift', x, lift[max(round(x * n), 1)])
  }
}
lift_chart(y_test, predictions_test)

In Neptune’s dashboard this produces an interactive plot named “Lift chart” presented below.
Lift-chart
Based on this chart, we may analyze our results in greater detail. For example, we see that contacting the top 20% of customers (as selected by our model) translates to a more than 3-fold increase in the hit rate (that is, the fraction of subscriptions in the selected group) compared to the baseline level for the test data. This results in more efficient targeting of our campaign. In practice, the desired number of contacted customers depends on the resources available and the costs associated with contacting them.

The end

That’s all! We trained a model to predict how likely a customer is to subscribe to a bank deposit, and used R, XGBoost and Neptune to track its learning process. There is still room for improving the model’s accuracy, and playing with the parameters described above would be a good starting point. You can give it a try and access the complete workflow in our Github repository (for the 1.4 version) or explore the job in NeptuneGo!.
This is our first attempt at integrating Neptune with R. We will be very grateful for your feedback here! We would like to make sure that it fits well with your experimentation pipeline. Moreover, we hope that it can improve your pipeline as dramatically as Neptune’s Python API changed our experience.
NeptuneGo!


Neptune machine learning platform: grid search, R & Java support

March 6, 2017/in Data science, Deep learning, Machine learning, Neptune /by Rafał Hryciuk

In February we released a new version of Neptune, our machine learning platform for data scientists, supporting them in more efficient experiment management and monitoring. The latest 1.4 release introduces new features such as grid search – a hyperparameter optimization method – and support for the R and Java programming languages.

Grid Search

The first major feature introduced in Neptune 1.4 is support for grid search, which is one of the most popular hyperparameter optimization methods. You can read more about grid search here.
In version 1.4, you can pass a list or a range of values for a numeric parameter of your Neptune experiment instead of a single specific value. Neptune will create a grid search experiment and run a job for every combination of the parameters’ values. Neptune groups the results within the grid search experiment and helps you manage them. You can define custom metrics for evaluation, and Neptune will automatically select the combination of hyperparameter values that gives the best value of the metric. Read an example.

R Support and Java Support

Neptune exposes a REST API, so it is completely language and platform agnostic. deepsense.ai provides high-level client libraries for the programming languages most popular among data scientists (according to the poll taken in the community – see the results). Thanks to the client libraries, users don’t have to implement communication via the REST API themselves; instead they can invoke high-level functions. Until version 1.4 we only provided a client library for Python. In version 1.4 we introduced support for R and Java (which also covers Scala users). Thanks to the new client libraries you can run, monitor and manage experiments written in R or Java in the Neptune machine learning platform. You can get the client libraries for R and Java here.

Future Plans

We have already been working on the next version of Neptune, which will be released at the beginning of April 2017. The next release will contain:

  • Architectural and API changes that will improve user experience.
  • New approach for handling snapshots of the experiments’ code.
  • Neptune Offline Context — the user will be able to run the code that uses Neptune API offline.

I hope you will enjoy working with our machine learning platform, now with grid search support and client libraries for R and Java. If you’d like to give us feedback, feel free to use our forum at https://community.neptune.ml.
Do you want to check out Neptune? Visit NeptuneGo!, look around and run your first experiments.
