Blog posts Archives - Page 13 of 18

Region of interest pooling explained

Region of interest pooling (also known as RoI pooling) is an operation widely used in object detection tasks using convolutional neural networks. For example, to detect multiple cars and pedestrians in a single image. Its purpose is to perform max pooling on inputs of nonuniform sizes to obtain fixed-size feature maps (e.g. 7×7).

We’ve just released an open-source implementation of RoI pooling layer for TensorFlow (you can find it here). In this post, we’re going to say a few words about this interesting neural network layer. But first, let’s start with some background.
Two major tasks in computer vision are object classification and object detection. In the first case the system is supposed to correctly label the dominant object in an image. In the second case it should provide correct labels and locations for all objects in an image. Of course there are other interesting areas of computer vision, such as image segmentation, but today we’re going to focus on detection. In this task we’re usually supposed to draw bounding boxes around any object from a previously specified set of categories and assign a class to each of them. For example, let’s say we’re developing an algorithm for self-driving cars and we’d like to use a camera to detect other cars, pedestrians, cyclists, etc. — our dataset might look like this.
In this case we’d have to draw a box around every significant object and assign a class to it. This task is more challenging than classification tasks such as MNIST or CIFAR. On each frame of the video, there might be multiple objects, some of them overlapping, some poorly visible or occluded. Moreover, for such an algorithm, performance can be a key issue. In particular for autonomous driving we have to process tens of frames per second.
So how do we solve this problem?

Typical architecture

The object detection architecture we’re going to be talking about today is broken down in two stages:

Region proposal: Given an input image find all possible places where objects can be located. The output of this stage should be a list of bounding boxes of likely positions of objects. These are often called region proposals or regions of interest. There are quite a few methods for this task, but we’re not going to talk about them in this post.
Final classification: for every region proposal from the previous stage, decide whether it belongs to one of the target classes or to the background. Here we could use a deep convolutional network.

Object detection pipeline wit region of interest pooling

Object detection pipeline with region of interest pooling

Usually in the proposal phase we have to generate a lot of regions of interest. Why? If an object is not detected during the first stage (region proposal), there’s no way to correctly classify it in the second phase. That’s why it’s extremely important for the region proposals to have a high recall. And that’s achieved by generating very large numbers of proposals (e.g., a few thousands per frame). Most of them will be classified as background in the second stage of the detection algorithm.
Some problems with this architecture are:

Generating a large number of regions of interest can lead to performance problems. This would make real-time object detection difficult to implement.
It’s suboptimal in terms of processing speed. More on this later.
You can’t do end-to-end training, i.e., you can’t train all the components of the system in one run (which would yield much better results)

That’s where region of interest pooling comes into play.

What is region of interest pooling?

Region of interest pooling is a neural-net layer used for object detection tasks. It was first proposed by Ross Girshick in April 2015 (the article can be found here) and it achieves a significant speedup of both training and testing. It also maintains a high detection accuracy. The layer takes two inputs:

A fixed-size feature map obtained from a deep convolutional network with several convolutions and max pooling layers.
An N x 5 matrix of representing a list of regions of interest, where N is a number of RoIs. The first column represents the image index and the remaining four are the coordinates of the top left and bottom right corners of the region.

An image from the Pascal VOC dataset annotated with region proposals (the pink rectangles)

What does the RoI pooling actually do? For every region of interest from the input list, it takes a section of the input feature map that corresponds to it and scales it to some pre-defined size (e.g., 7×7). The scaling is done by:

Dividing the region proposal into equal-sized sections (the number of which is the same as the dimension of the output)
Finding the largest value in each section
Copying these max values to the output buffer

The result is that from a list of rectangles with different sizes we can quickly get a list of corresponding feature maps with a fixed size. Note that the dimension of the RoI pooling output doesn’t actually depend on the size of the input feature map nor on the size of the region proposals. It’s determined solely by the number of sections we divide the proposal into. What’s the benefit of RoI pooling? One of them is processing speed. If there are multiple object proposals on the frame (and usually there’ll be a lot of them), we can still use the same input feature map for all of them. Since computing the convolutions at early stages of processing is very expensive, this approach can save us a lot of time.

Region of interest pooling — example

Let’s consider a small example to see how it works. We’re going to perform region of interest pooling on a single 8×8 feature map, one region of interest and an output size of 2×2. Our input feature map looks like this:

Let’s say we also have a region proposal (top left, bottom right coordinates): (0, 3), (7, 8). In the picture it would look like this:
Normally, there’d be multiple feature maps and multiple proposals for each of them, but we’re keeping things simple for the example.
By dividing it into (2×2) sections (because the output size is 2×2) we get:

Notice that the size of the region of interest doesn’t have to be perfectly divisible by the number of pooling sections (in this case our RoI is 7×5 and we have 2×2 pooling sections).
The max values in each of the sections are:

And that’s the output from the Region of Interest pooling layer. Here’s our example presented in form of a nice animation:

What are the most important things to remember about RoI Pooling?

It’s used for object detection tasks
It allows us to reuse the feature map from the convolutional network
It can significantly speed up both train and test time
It allows to train object detection systems in an end-to-end manner

If you need an opensource implementation of RoI pooling in TensorFlow you can find our version here.
In the next post, we’re going to show you some examples on how to use region of interest pooling with Neptune and TensorFlow.

References

Girshick, Ross. “Fast r-cnn.” Proceedings of the IEEE International Conference on Computer Vision. 2015.
Girshick, Ross, et al. “Rich feature hierarchies for accurate object detection and semantic segmentation.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.
Sermanet, Pierre, et al. “Overfeat: Integrated recognition, localization and detection using convolutional networks.” arXiv preprint arXiv:1312.6229. 2013.

Machine Learning for Applications in Manufacturing

June 22, 2019/in Data science, Machine learning /by Michal Romaniuk, Barbara Rutkowska and Konrad Budek

While modern manufacturing technology is starting to incorporate machine learning throughout the production process, predictive algorithms are being used to plan machine maintenance adaptively rather than on a fixed schedule. And these are only the tip of the iceberg.

Machine learning models can enhance nearly every aspect of a business, from marketing to sales to maintenance. In manufacturing, the rise of IoT, and the unprecedented amounts of data it throws off, has ushered in numerous opportunities to utilize machine learning. The computerization of industrial machinery is also undergoing rapid computerization. IDC data indicates that spending on IoT platforms will rise from $745 billion annually in 2019 to over $1 trillion in 2022.

According to a Global Market Insights report, global machine learning in manufacturing is going to skyrocket from $1 billion in 2018 to $16 billion by 2025. Alongside this, there will be a continuous need to reduce costs and grow the adoption of industry 4.0 technologies, including the predictive maintenance and machine inspection done by AI.

Examples of ML-based predictive maintenance

Machine learning enables predictive monitoring, with machine learning algorithms forecasting equipment breakdowns before they occur and scheduling timely maintenance. With the work it did on predictive maintenance in medical devices, deepsense.ai reduced downtime by 15%.

But it isn’t just in straightforward failure prediction where Machine learning supports maintenance. In another recent application, our team delivered a system that automates industrial documentation digitization, effectively reducing workflow time by up to 90%.

An automotive plant implemented a predictive maintenance solution for a hydraulic press used in vehicle panel production. Detailed studies of the maintenance process showed that engineers were spending far too much time attending to breakdowns instead of allocating resources for planned maintenance. The new solution enabled them to predict equipment failure with an accuracy of 92%, plan maintenance more effectively and offer greater asset reliability and product quality. Overall equipment efficiency increased from 65% (the industry average) to 85%.

Machine Learning for Applications in Manufacturing - Img 1

An automotive plant implemented a predictive maintenance solution for a hydraulic press used in vehicle panel production.

A beverage industry manufacturer of industrial equipment fit their machines with a monitoring and prediction system to help engineers plan better preventative maintenance. This solved the problem of inefficient, reactive customer service and helped to optimize equipment maintenance schedules, which had always been based on defined time intervals, rather than real needs. The application of machine learning increased business scalability and optimized the company’s cost structure.

Predictive maintenance is also expected to become an important technological component of autonomous vehicles. Self-driving cars are likely to follow the -as-a-service business model (rather than the ownership model of today’s car industry). That means cars will have to monitor their own condition rather than rely on their owner-driver to spot problems and take the vehicle to a service station.

Quality control

Artificial intelligence (AI) is also being adopted for product inspection and quality control. ML-based computer vision algorithms can learn from a set of samples to distinguish the “good” from the flawed. In particular, semi-supervised anomaly detection algorithms only require “good” samples in their training set, making a library of possible defects unnecessary. Alternatively, a solution can be developed that compares samples to typical cases of defects.

The savings machine learning offers in visual quality control in manufacturing vary by niche. In semiconductor manufacturing, the cost of testing and failures account for up to 30% of overall product costs. According to Forbes, automated quality testing done with machine learning can increase detection rates by up to 90%.

A manufacturer of agricultural product packing equipment has recently introduced a high-performance fruit sorting machine that uses computer vision and machine learning to classify skin defects. The operator can teach the sorting platform to distinguish between different types of defects and sort the fruit into sophisticated pack grades. The solution combines hardware, software and operational optimization to reduce the complexity of the sorting process.

Logistics and inventory management

It isn’t only on the assembly line and production plant where large strides have been made. Industry also requires an astonishing amount of logistics to power the entire production process. Employing machine learning-based solutions to handle logistics-related issues boosts efficiency and slashes costs.

According to Material, Handling and Logistics Magazine, the average US business loses $171,340 per year due to repetitive, mundane and time-wasting tasks like searching for order numbers, processing papers and calculating the value of orders. In any manufacturing company, logistics and production-related paperwork sap thousands of man-hours annually.

Resource management is another strength of machine learning-based algorithms. To see just how strong it can be, look no further than the power-consumption optimization algorithm Google applied in its data center cooling systems to reduce its electric bills–by up to 40%. That it was done without any infrastructure modernization or modification – the big data flowing through the system itself was enough – makes the feat all the more impressive.

Rising interest in machine learning applications in the manufacturing industry

Manufacturing companies now sponsor competitions for data scientists to see how well their specific problems can be solved with machine learning. A recent one, hosted by Kaggle, the most popular global platform for data science contests, challenged competitors to predict which manufactured parts would fail quality control. The participants needed to base their predictions on thousands of measurements and tests that had been done earlier on each component along the assembly line.

The competition was sponsored by Bosch, which is striving to trace causes of manufacturing defects back to specific steps in the production process, as well as support waste reduction by rejecting faulty components at early stages. The results of the competition are expected to improve product quality and lower costs.

The near future – robotics-powered manufacturing

Modern manufacturing, despite being largely automated, is still heavily reliant on the human workforce. Machines perform tasks to perfection–when they are in a closed environment–but performing a precise job in a more changing environment requires a human specialist.

Machine Learning for Applications in Manufacturing- Img 3

In the near future, a large part of manufacturing could be taken over by robots that are flexible enough to cooperate with humans

In the near future, a large part of manufacturing could be taken over by robots that are flexible enough to cooperate with humans and perform tasks in a more human way. They will be able to adapt to both the changing environment and the object being produced. Democratization and productization are among today’s leading AI trends.

Robotics provides a great opportunity for reinforcement learning. This machine learning technique is currently the only way for machines to adapt to a changing environment and build more complex strategies for achieving their goals, which are unachievable with explicitly programmed solutions.

deepsense.ai’s Research and Development Hub currently has several scientific projects supporting the application of AI in robotics, with expert augmented reinforcement learning and artificial imagination as a training environment.

GeoJson Operations in Apache Spark with Seahorse SDK

February 7, 2017/in Big data & Spark, Seahorse /by Adam Jakubowski

A few days ago we released Seahorse 1.4, an enhanced version of our machine learning, Big Data manipulation and data visualization product.

This release also comes with an SDK – a Scala toolkit for creating new custom operations to be used in Seahorse.
As a showcase, we will create a custom Geospatial operation with GeoJson and make a simple Seahorse workflow using it. This use-case is inspired by Example 8 from book Advanced Analytics with Spark.

Geospatial data in GeoJson format

GeoJson can encode many geographic data types. For example:

Location Point
Line
Region described by a polygon

GeoJson encodes Geospatial data using Json.

{
    "type": "Point",
    "coordinates": [125.6, 10.1]
}

New York City Taxi Trips

In our workflow we will use a New York Taxi Trip dataset with pickup and drop-off location points.
Let’s say that we want to know how many Taxi Trips started on Manhattan and ended up in Bronx. To achieve this, we could use an operation filtering dataset by location contained inside some geographic region. Let’s call this operation ‘Filter inside Polygon’.

Manhattan GeoJson Polygon visualization using geojson.io

Bronx GeoJson Polygon visualization using geojson.io

You can download GeoJson data with New York boroughs from https://github.com/dwillis/nyc-maps.

Implementing custom operation using Seahorse SDK steps:

Clone Seahorse SDK Example Git Repository.
Implement custom Geospatial operation.
Assembly jar with custom Geospatial operation.
Add assembled jar to Seahorse instance.
Use custom Geospatial operation in a Workflow.

Seahorse SDK Example Git Repository

The fastest way to start developing a custom Seahorse operations is by cloning our SDK Example Git Repository and write your code from there.

git clone --branch 1.4.0 https://github.com/deepsense-io/seahorse-sdk-example

The Seahorse SDK Example Repository has all Seahorse and Apache Spark dependencies already defined in an SBT build file definition.
Let’s add the geospatial data library to our dependencies in the build.sbt file:

libraryDependencies += "com.esri.geometry" % "esri-geometry-api" % "1.0"

Now we can implement the FilterInsidePolygon operation:

@Register // 1
final class FilterInsidePolygon
 extends DOperation1To1[DataFrame, DataFrame] { // 2
 override val id: Id = "48fa3638-bc8d-4430-909f-85d4ece824a3" // 3
 override val name: String = "Filter Location Inside Polygon"
 override val description: String = "Filter out rows " +
   "for which location is contained within a specified GeoJson polygon"
 lazy val geoJsonPointColumnSelector = SingleColumnSelectorParam(
  name = "GeoJson Point - Column Name",
  description = Some("Column name containing " +
    "location written as Point object in GeoJson"),
  portIndex = 0
 )
 lazy val geoJsonPolygon = StringParam( // 5
   name = "GeoJson Polygon",
   description = Some("Polygon written in GeoJson format used for filtering out the rows")
 )
 override def params = Array(geoJsonPointColumnSelector, geoJsonPolygon)

We need to annotate the operation with @Register so that Seahorse knows that it should be registered in the operation catalogue.
We extend DOperation1To1[DataFrame, DataFrame] because our operation takes one DataFrame as an input and returns one DataFrame as the output.
UUID is used to uniquely identify this operation.
After making changes in operation (e.g. name is changed), idshould not be changed – it is used by Seahorse to recognize that it’s the same operation as before.
Parameter geoJsonPointColumnSelector will be used for selecting a column with Point Location data.

Parameter geoJsonPolygon represents a Polygon we will be checking location points against.
Now we can implement an execute method with actual GeoSpatial and Apache Spark dataframe code.

override protected def execute(input: DataFrame)(context: ExecutionContext): DataFrame = {
 // 1 - Parse polygon
 val polygon = GeometryEngine.geometryFromGeoJson(
   $(geoJsonPolygon),
   GeoJsonImportFlags.geoJsonImportDefaults,
   Geometry.Type.Polygon
 )
 val columnName = DataFrameColumnsGetter.getColumnName(
   input.schema.get,
   $(geoJsonPointColumnSelector)
 )
 val filtered = input.sparkDataFrame.filter(row => {
   try {
     val pointGeoJson = row.getAs[String](columnName)
     // 2 - Parse location point
     val point = GeometryEngine.geometryFromGeoJson(
       pointGeoJson,
       GeoJsonImportFlags.geoJsonImportDefaults,
       Geometry.Type.Point
     )
     // EPSG:4326 Spatial Reference
     val standardCoordinateFrameForEarth = SpatialReference.create(4326)
     // 3 - Test if polygon contains point
     GeometryEngine.contains(
       polygon.getGeometry,
       point.getGeometry,
       standardCoordinateFrameForEarth
     )
   } catch {
     case _ => false // ignore invalid rows for now
   }
 })
 DataFrame.fromSparkDataFrame(filtered)
}

We parse our polygon data using GeometryEngine class from esri-geometry-api library.
Inside the Apache Spark dataFrame filter we use GeometryEngine class again to parse location points in each row.
We use GeometryEngine class to test whether the point is contained inside the specified polygon.

Now that the implementation is finished, you can build a JAR and add it to Seahorse:

Run sbt assembly. This produces a JAR in the target/scala-2.11 directory.
Put this JAR in $SEAHORSE/jars, where $SEAHORSE is the directory with docker-compose.yml or Vagrantfile (depending whether you run Docker or Vagrant).
Restart Seahorse (By either `stop`ing and `start`ing `docker-compose` or `halt`ing and `up`ing `vagrant`).
Operations are now visible and ready to use in Seahorse Workflow Editor.

And now that we have Filter Inside Polygon, we can start implementing our workflow:

First we need to specify our datasource. You can download the whole 26GB dataset from http://www.andresmh.com/nyctaxitrips.
Since we are using a Local Apache Spark Master we sample 100k rows from the whole dataset.
Additionally we also transformed the latitude and longitude column into one GeoJson column for both pickup and drop-off locations.
You can download the final preprocessed dataset used for this experiment from here.

Building our Workflow

Let’s start by defining our Datasource in Seahorse:

Next, we use NYC borough GeoJson polygon values from https://github.com/dwillis/nyc-maps as attributes in our Filter Inside Polygon operations:

Finally, let’s run our workflow and see how many Taxi trips started on Manhattan and ended in Bronx:

As we can see from our 100k data sample, only 383 trips started on Manhattan and ended in Bronx.

Summary

We cloned the Seahorse SDK Example repository and starting from it we implemented a custom Filter Inside Polygon operation using Scala. Then we built a JAR file with our operation and we added it to Seahorse. After that we built our Workflow and used custom Filter Inside Polygon operation working in Apache Spark dataframes with GeoJson data.

Scheduling Spark jobs in Seahorse

January 30, 2017/in Big data & Spark, Seahorse /by Michal Szostek

In the latest Seahorse release we introduced the scheduling of Spark jobs. We will show you how to use it to regularly collect data and send reports generated from that data via email.

Use case

Let’s say that we have a local meteo station and the data from this station is uploaded automatically to Google Sheets. That is, it contains only one row (i.e. row number 2) of data which always contains current weather characteristics.

We would like to generate some graphs from this data every now and then and do it without any repeated effort.

Solution

Collecting data

The first workflow that we create collects a snapshot of live data and appends it to a historical data set. It will be run at regular intervals.
Let’s start by creating two data sources in Seahorse. The first data source represents the Google Spreadsheet with live-updated weather data:

For the historical weather data, we create a data source representing a file stored locally within Seahorse. Initially, it only contains a header and one row – current data.

Now we can create our first Spark job which reads both data sources, concatenates them and writes back to the “Historical weather data” data source. In this way, there will be a new data row in “Historical weather data” every time the workflow is run.

There is one more step for us to do – this Spark job needs to be scheduled. It is run every half an hour and after every scheduled job execution an email is sent.

In the email there is a link to that workflow, where we are able to see node reports from the execution.

Sending a weather report

The next workflow is very simple – it consists of a Read DataFrame operation and a Python Notebook. It is done separately from collecting the data because we want to send reports at a much lower rate.

In a notebook, first we transform the data a little:

import pandas
import numpy
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
data = dataframe().toPandas()
data['time'] = pandas.to_datetime(data['time'])
data = data.sort('time')

Next, we prepare a function that will be used to line-plot a time series of weather characteristics like temperature, air pressure and humidity:

def plot(col_to_plot):
   ax = plt.axes()
   ax.spines["top"].set_visible(False)
   ax.spines["right"].set_visible(False)
   ax.get_xaxis().tick_bottom()
   ax.get_yaxis().tick_left()
   y_formatter = matplotlib.ticker.ScalarFormatter(useOffset=False)
   ax.yaxis.set_major_formatter(y_formatter)
   plt.locator_params(nbins=5, axes='x')
   plt.locator_params(nbins=5, axes='y')
   plt.yticks(fontsize=14)
   plt.xticks(fontsize=14)
   plt.plot(data['time'], data[col_to_plot])
   plt.gcf().autofmt_xdate()
   plt.show()

After editing the notebook, we set up its parameters so that reports are sent to our email address after each run.

Finally, we can set up a schedule for our report. It is similar to the previous one, only sent less often– we send a report every day in the evening. After each execution two emails are sent – one, as previously, with a link to the executed workflow and the second one containing a notebook execution result.

Result

After some time, we have enough data to plot it. Here is a sample from the “Historical weather data” data source…

… and graphs that are in the executed notebook:

Summary

We created a data processing and email reporting pipeline using two simple workflows and some Python code. Seahorse works well not only for big data, but also for scenarios when you need to integrate data from different sources and periodically generate reports – Seahorse data sources and Spark job scheduling are the right tool for the job.

An internal validation leaderboard in Neptune

January 19, 2017/in Data science, Deep learning, Machine learning, Neptune /by Patryk Miziuła

Internal validation is a useful tool for comparing results of experiments performed by team members in any business or research task. It can also be a valuable complement of public leaderboards attached to machine learning competitions on platforms like Kaggle.

In this post, we present how to build an internal validation leaderboard using Python scripts and the Neptune environment. As an example of a use case, we will take the well known classification dataset CIFAR-10. We study it using a deep convolutional neural network provided in the TensorFlow tutorial.

Why internal leaderboard?

Whenever we solve the same problem in many ways, we want to know which way is the best. Therefore we validate, compare and order the solutions. In this way, we naturally create the ranking of our solutions – the leaderboard.
We usually care about the privacy of our work. We want to keep the techniques used and the results of our experiments confidential. Hence, our validation should remain undisclosed as well – it should be internal.
If we keep improving the models and produce new solutions at a fast pace, at some point we are no longer able to manage the internal validation leaderboard manually. Then we need a tool which will do that for us automatically and will present the results to us in a readable form.

Business and research projects

In any business or research project you are probably interested in the productivity of team members. You would like to know who and when submits his or her solution to the problem, what kind of model they use and how good the solution is.
A good internal leaderboard stores all that information. It also allows you to search for submissions sent by specific user, defined in some time window or using a particular model. Finally, you can sort the submissions with respect to the accuracy metric to find the best one.

Machine learning competitions

The popular machine learning platform, Kaggle, offers a readable public leaderboard for every competition. Each contestant can follow his position in the ranking and try to improve several times a day.
However, an internal validation would be very useful for every competing team. A good internal leaderboard has many advantages over a public one:

the results remain exclusive,
there is no limit on the number of daily submissions,
metrics other than those chosen by the competition organizers can be evaluated as well,
the submissions can be tagged, for example to indicate the used model.

Note that in every official competition the ground truth labels for the test data are not provided. Hence, to produce the internal validation we are forced to split the available public training data. One part is used to tune the model, the other is needed to evaluate it internally. This division can be an origin of unexpected problems (e.g., data leaks) so perform it carefully!

Why Neptune?

Neptune was designed to manage multiple experiments. Among many features, it supports storing parameters, logs and metric values from various experiment executions. The results are accessible through an aesthetic Web UI.
In Neptune you can:

gather experiments from various projects in groups,
add tags to experiments and filter by them,
sort experiments by users, date of creation, or – most importantly for us – by metric values.

Due to that, Neptune is a handy tool for creating an internal validation leaderboard for your team.

An internal validation leaderboard in Neptune

Tracking a TensorFlow experiment in Neptune

Let’s do it!

Let’s build an exemplary internal validation leaderboard in Neptune.

CIFAR-10 dataset

We use the well-known classification dataset CIFAR-10. Every image in this dataset is a member of one of 10 classes, labeled by numbers from 0 to 9. Using the train data we build a model which allows us to predict the labels of images from test data. CIFAR-10 is designed for educational purposes, therefore the ground truth labels for test data are provided.

Evaluating functions

Let’s fix the notation:

$N$ – number of images we have to classify.
$c_i$ – class to which the $i$th image belongs; $iin{0,ldots,N-1}$, $c_iin{0,ldots,9}$.
$p_{ij}$ – estimated probability that the $i$th image belongs to the class $j$; $iin{0,ldots,N-1}$, $jin{0,ldots,9}$, $p_{ij}in[0,1]$.

We evaluate our submission with two metrics. The first metric is the classification accuracy given by
$frac 1Nsum_{i=0}^{N-1}mathbb{1}Big(argmax_j p_{ij}=c_iBig)$
This is the percentage of labels that are predicted correctly. We would like to maximize it, the optimal value is 1. The second metric is the average cross entropy given by
$-frac 1Nsum_{i=0}^{N-1}log p_{ic_i}$
This formula is simpler than the principal entropy since the classes are completely mutually exclusive. We would like to minimize it, preferably to 0.

Implementation details

Prerequisites

To run the code we provide you need the following software:

Neptune: apply for our Early Adopters Program or try it immediately with Neptune Go,
TensorFlow 1.0.

Repository

The code we use is based on that available in the TensorFlow convolutional neural networks tutorial. You can download our code from our GitHub repository. It consists of the following files:

File	Purpose
main.py	The script to execute.
cifar10_submission.py	Computes submission for a CIFAR-10 model.
evaluation.py	Contains functions required to create the leaderboard in Neptune.
config.yaml	Neptune configuration file.

Description

When you run main.py, you first train a neural network using function cifar10_train provided by TensorFlow. We hard-coded the number of training steps. This could be enhanced to dynamic using Neptune action, but for the sake of brevity we skip this topic in the blog post. Due to TensorFlow Integration you can track the tuning of the network in Neptune. Moreover, the parameters of the tuned network are stored in a file manageable by TensorFlow saver objects.
Then function cifar10_submission is called. It restores parameters of the network from the file created by cifar10_train. Next, it forward-propagates the images from the test set through the network to obtain a submission. The submission is stored as a Python Numpy array submission of the shape $Ntimes 10$, the $i$th row contains estimated probabilities $p_{i0},ldots,p_{i9}$. The ground truth labels forms a Python Numpy array true_labels of the shape $Ntimes 1$, the $i$th row contains label $c_i$.
Ultimately, for given submission and true_labels arrays function evaluate_and_send_to_neptune from script evaluation.py computes metric values and sends them to Neptune.
File config.yaml is a Neptune job configuration file, essential for running Neptune jobs. Please download all the files and place them in the same folder.

Step by step

We create a validation leaderboard in Neptune in 4 easy steps:

Creating a Neptune group
Creating an evaluation module
Sending submissions to Neptune
Customizing a view in Neptune’s Web UI

1. Creating a Neptune group

We create the Neptune group where all the submissions will be stored. We do this as follows:

Enter the Neptune home screen.
Click in the lower left corner, enter the name “CIFAR-10 leaderboard”, click again.
Choose “project” “is” and type “CIFAR-10”, click “Apply”.

Our new group appears in the left column. We can edit or delete it by clicking the icon next to the group name.

An internal validation leaderboard in Neptune 4

2. Creating an evaluation module

We created the module evaluation.py consisting of 5 functions:

_evaluate_accuracy and _evaluate_cross_entropy compute the respective metrics,
_prepare_neptune adds tags to the Neptune job (if specified – see Step 4) and create Neptune channels to send evaluated metrics,
_send_to_neptune sends metrics to channels,
evaluate_and_send_to_neptune calls the above functions.

You can easily adapt this script to evaluate and send any other metrics.

3. Sending submissions to Neptune

To place our submissions in the Neptune group, we need to specify project: CIFAR-10 in a Neptune config file config.yaml . This is a three-line-long file, it also contains project name and a description.
Assume that the files from our repository are placed in the folder named leaderboard . The last preparation step we have to do is clone CIFAR-10 scripts from the TensorFlow repository. To do it, we go to the folder above folder leaderboard and type:

git clone https://github.com/tensorflow/models/
export PYTHONPATH="$PWD/models/tutorials/image/cifar10:$PYTHONPATH"

Now we are ready to send our results to the leaderboard created in Neptune! We run the script main.py from the folder above folder leaderboard by typing

neptune run leaderboard/main.py --config leaderboard/config.yaml --dump-dir-url leaderboard/dump --paths-to-dump leaderboard

using Neptune CLI. The script executes for about half an hour on a modern laptop. Training would be significantly faster on a GPU.
There are only 5 lines related to Neptune in the main.py script. First we load the library:

from deepsense import neptune

Then we initialize a Neptune context:

ctx = neptune.Context()

Next, command

ctx.integrate_with_tensorflow()

automatically creates and manages Neptune channels related to TensorFlow SummaryWriter objects. Thereby, we can observe the progress of our network in the Neptune Dashboard. Finally, in lines

tags = ["tensorflow", "tutorial"]
evaluation.evaluate_and_send_to_neptune(submission, true_labels, ctx, tags)

we evaluate our submission and send metric values to dedicated Neptune channels. tags is a list of tags which we can add to the Neptune job. In this way, we attach some keywords to the Neptune job. We can easily filter jobs by tags in the Neptune Web UI.

4. Customizing a view in Neptune’s Web UI

If the job has been successfully executed, we can see our submission in the Neptune group we created. One more thing worth doing is setting up the view of columns.

Click “Show/hide columns” in the upper part of the Neptune Web UI.
Check/uncheck the names. You should:
- uncheck “Project” since all the submissions in this group come from the same project CIFAR-10,
- check channel names “accuracy” and “cross entropy” because you want to sort with respect to them.

You can sort submissions by accuracy or cross entropy value by clicking the triangle over the respective column.

Summary

That’s all! Now your internal validation leaderboard in Neptune is all set up. You and your team members can compare your models tuned up to the CIFAR-10 dataset. You can also filter your results by dates, users or custom tags.
Of course, CIFAR-10 is not the only possible application of the provided code. You can easily adapt it for other applications like: contests, research or business intelligence. Feel free to use an internal validation leaderboard in Neptune wherever and whenever you need.

Neptune 1.3 with TensorFlow integration and experiments in Docker

December 30, 2016/in Data science, Deep learning, Machine learning, Neptune /by Rafał Hryciuk

We’re happy to announce that a new version of Neptune became available this month. The latest 1.3 release of deepsense.ai’s machine learning platform introduces powerful new features and improvements. This release’s key added features are: integration with TensorFlow and running Neptune experiments in Docker containers (see complete release notes).

TensorFlow Integration

The first major feature introduced in Neptune 1.3 is TensorFlow integration. We think that TensorFlow will become a leading technology for deep learning problems. TensorFlow comes with its own monitoring tool: TensorBoard. We don’t want to compete with TensorBoard, instead we want to incorporate TensorBoard’s well known functionalities into Neptune. Starting with Neptune 1.3, data scientist can see all available TensorBoard metrics and graphs in Neptune. Read more.

Running Neptune Experiments in Docker Containers

Neptune creates a snapshot of code for every experiment execution. Thanks to this users can easily recreate the results of every experiment. The problem is that the technology world is changing very quickly and saving the source code is often not enough. We also need to save our execution environment because the source code depends on specific versions of libraries. Neptune 1.3 gives users the option to run a Neptune experiment in a Docker container. A Docker container is an encapsulation of the execution environment. Thanks to this the user can have containers with different versions of the libraries and use them on the same host to recreate the experiment’s results.
Running Neptune experiments in Docker containers is also important for Windows users. The suggested way of running TensorFlow experiments on Windows is to run them in Docker containers. Now, a data scientist can use TensorFlow with Neptune on Windows.
Follow the link to read more about running Neptune experiments in docker containers.

Future Plans

We are already working on the next version of Neptune which will be released at the end of January 2017. The next release will contain:

Client Library for R and Java; and
Support for hyperparameter optimization, grid search method.

We hope you will enjoy working with our machine learning platform, which now features TensorFlow integration and enables running experiments in Docker containers. If you’d like to provide us with any feedback, feel free to use our forum at https://community.neptune.ml/.
Don’t have Neptune yet? Join our Early Adopters Program and get free access.

Machine Learning Models Predicting Dangerous Seismic Events

October 27, 2016/in Data science, Machine learning /by Michał Tadeusiak

Underground mining poses a number of threats including fires, methane outbreaks or seismic tremors and bumps. An automatic system for predicting and alerting against such dangerous events is of utmost importance – and also a great challenge for data scientists and their machine learning models. This was the inspiration for the organizers of AAIA’16 Data Mining Challenge: Predicting Dangerous Seismic Events in Active Coal Mines.
Our solutions topped the final leaderboard by taking the first two places. In this post, we present the competition and describe our winning approach. For a more detailed description please see our publication.

The Challenge

The datasets provided by organizers comprised of seismic activity records from Polish coal mines (24 working sites) collected throughout a period of several months. The goal was to develop a prediction model that, based on recordings from a 24-hour period, would predict whether an energy warning level (50k Joules) was going to be exceeded in the following 8 hours.
The participants were presented with two datasets: one for training and the other for testing purposes. The training set consisted of 133,151 observations accompanied by labels – binary values indicating warning level excess. The test set was of a moderate size of 3,860 observations. The prediction model should automatically assess the danger for each event, i.e. evaluate the likelihood of exceeding the energy warning level for each observation.
The model’s quality was to be judged by the accuracy of the likelihoods measured with respect to the Area Under ROC curve (AUC) metric. This is a common metric for evaluating binary predictions, particularly when the classes are imbalanced (only about 2% of events in the training dataset were labeled as dangerous). The AUC measure can be interpreted as the probability that a randomly drawn observation from the danger class will have a higher likelihood assigned to it than a randomly drawn observation from the safe class. Knowing the interpretation it is easy to assess the quality of the results: a score of 1.0 means perfect prediction and 0.5 is the equivalent of random guessing. (Bonus question for the readers: what would a score of 0 mean?).

Overview of the data

What constitutes an observation in this dataset? An observation is not a single number but a list of 541 values. There are 13 general features describing the entire 24 hour-long period, such as:

ID of the working site;
Overall energies of bumps, tremors, destressing blasts and total seismic energy;
Latest experts’ hazard assessments.

The remaining features are 24 hour-long time series of the number of seismic events and their energies.
Apart from the above data, the participants were provided with metadata about each working site. This was mostly information specific to a given working site, e.g. its name.

Our insights

The very first and one of the major tasks facing a data scientist is getting familiar with the dataset. One should assess its coherence, check for missing values and, in many cases, modify it to obtain a more meaningful representation of underlying phenomena. In other words, one would perform feature engineering. Below we present insights which we found interesting from a machine learning perspective.

Sparse information

The majority of the values in the training set were just zeros – around 66 percent! A sensible approach is to aggregate related sparse features to produce denser representations. An example could be to sum hourly counts of seismic events. By summing values in all 24 hours, the last 5 hours and in the very last hour, we would end up with 3 features instead of 24 – eight times less. We would not only reduce the size of the dataset – we would also have a greater chance that our models would see some interesting patterns more easily. At the same time, we would prevent the models from getting fooled by temporary artifacts – their impact will fade down after aggregation.

Multiple locations

An interesting difficulty – or rather a challenge – are the observations’ origins. The seismic records were collected at different coal mines with varying seismic activity. In some of them dangerous events were much more frequent than in others, ranging from 0 to 17% of observations. Ideally, we would produce a single specialized model for each working site. However, there is a high probability that these models would be useless when confronted with a new location. They would also be useless in this data mining challenge – the table below presents coal mines along with the number of available observations – notice that not all of them appear in both training and test sets and that the number of observations vary considerably between the sites. Thus, there was a need to build a location-agnostic model that would learn from general seismic activity phenomena.

Working site ID	Training observations	Test observations	Warnings frequency
373	31236	–	1.1%
264	20533	–	0.4%
725	14777	330	9.4%
777	13437	330	0.0%
437	11682	–	0.4%
541	6429	5	0.9%
146	5591	98	0.1%
575	4891	253	0.5%
765	4578	329	0.0%
149	4248	98	7.3%
155	3839	98	17.2%
583	3552	215	0.2%
479	2488	35	0.0%
793	2346	330	0.0%
607	2328	209	0.0%
599	1196	363	1.9%
799	–	317	–
470	–	258	–
490	–	160	–
703	–	145	–
641	–	97	–
689	–	83	–
508	–	58	–
171	–	49	–

Working site proxies

As mentioned above, our goal was to build a model which generalizes well and is able to infer predictions from a broad variety of clues (the features we extracted) without getting biased toward particular locations. Therefore, a necessary step is to exclude the mine IDs from the features set. We have to be careful though, as the information identifying a particular site can leak through other features, turning them into working site proxies. We have already mentioned them – these were the coal mines’ metadata and the majority of them were excluded straight away (various names specific only to particular locations). We only kept the main working height, as this information was valuable. However, as you can see in the figure below, some of the mines can be exactly identified by their main working height values. To obstruct the identification, we added some Gaussian noise:

After injecting some randomness, it is no longer possible to exactly identify location based on a longwall’s height. However, the essential information is still conveyed.

Features

The above insights allowed us to significantly reduce the number of features. We ended up with 133 features – 4 times less than the original set.
This was not the end of feature extraction. Often some valuable information is not stored directly in the features, but rather in the interactions between them (e.g. pairwise multiplications). We selected around 1,000 of the most promising interactions and treated them as additional features.

Validation

The condition – that the model has to generalize well – sets particular requirements on the validation procedure. It has to favor the models that make accurate predictions for the known working sites, as well as for previously unseen sites. One possible validation procedure which satisfies these conditions is to, for every train-test split, divide the working sites into three groups: first group will be used solely for training, second only for testing, in the third the observations will be split. For the sake of the results’ stability, we would generate many of these configurations (25 worked well in our case) and average the outcomes.

Model selection

Having settled on the features set and the validation procedure we focused on model selection. The process was quite complex, so let’s go through it step-by-step:

We decided to use tree-based classifiers as they work particularly well with unbalanced classes and possibly correlated features. Namely, we used ExtraTreesClassifier (ETC) from scikit-learn and XGBoost (XGB) classifier from xgboost;
We ran a preliminary grid search to find well-performing meta-parameters;
The models were used to select promising features subsets. We ran hundreds of evaluations, where in each run we made use of a random selection of features (between 20 and 40) and interactions (up to 10).

Model ensembling

The above procedure was meant to reveal relatively small feature subspaces. All of them were reasonably accurate. Moreover, they were also (to some extent) independent. The models based on them could see only a small part of the entire features space, therefore they were inferring their predictions based on diverse clues. And what can one do with a number of varied models? Let them vote! The final submission to the contest was a blend of predictions generated by multiple models (2 major ones which saw all features and 40 minor ones, which saw only subspaces).

Final results

The final submission reached 0.94 AUC and turned out to be the best in the competition, outperforming 200 other machine learning teams. This meant that deepsense.ai retained its top position in the AAIA Data Mining Challenges (you can read about our success last year here).
I would like to thank all the competition participants for a challenging environment, as well as the organizers from the University of Warsaw and the Institute of Innovative Technologies EMAG for preparing such an interesting and meaningful dataset. I would also like to thank my colleagues from the deepsense.ai Machine Learning Team: Robert Bogucki, Jan Kanty Milczek and Jan Lasek for the great time we had tackling the problem.

Playing Atari with deep reinforcement learning - deepsense.ai’s approach

Playing Atari on RAM with Deep Q-learning

September 27, 2016/in Data science, Deep learning, Machine learning /by Henryk Michalewski

In 2013 the Deepmind team invented an algorithm called deep Q-learning. It learns to play Atari 2600 games using only the input from the screen. Following a call by OpenAI, we adapted this method to deal with a situation where the playing agent is given not the screen, but rather the RAM state of the Atari machine. Our work was accepted to the Computer Games Workshop accompanying the IJCAI 2016 conference. This post describes the original DQN method and the changes we made to it. You can re-create our experiments using a publicly available code.

Atari games

Atari 2600 is a game console released in the late 1970s. If you were a lucky teenager at that time, you would connect the console to the TV-set, insert a cartridge containing a ROM with a game and play using the joystick. Even though the graphics were not particularly magnificent, the Atari platform was popular and there are currently around $400$ games available for it. This collection includes immortal hits such as Boxing, Breakout, Seaquest and Space Invaders.

Atari 2600 console

Reinforcement Learning

We will approach the Atari games through a general framework called reinforcement learning. It differs from supervised learning (e.g. prediction what is represented in an image using Alexnet) and unsupervised learning (e.g. clustering, like in the nearest neighbours algorithm) because it utilizes two separate entities to drive the learning:

agent (which sees states and rewards and decides on actions) and
environment (which sees actions, changes states and gives rewards).

Richard S. Sutton and Andrew G. Barto: Reinforcement Learning: An Introduction

R. Sutton and A. Barto: Reinforcement Learning: An Introduction

In our case, the environment is the Atari machine, the agent is the player and the states are either the game screens or the machine’s RAM states. The agent’s goal is to maximize the discounted sum of rewards during the game. In our context, “discounted” means that rewards received earlier carry more weight: the first reward has a weight of $1$, the second some $gamma$ (close to $1$), the third $gamma^2$ and so on.

Q-values

Q-value (also called action-value) measures how attractive a given action is in a given state. Let’s assume that the agent’s strategy (the choice of the action in a given state) is fixed. Then a Q-value of a state-action pair $(s, a)$ is the cumulative discounted reward the agent will get if it is in a state $s$, executes the action $a$ and follows his strategy from there on.
The Q-value function has an interesting property – if a strategy is optimal, the following holds:
$$
Q(s_t, a_t) = r_t + gamma max_a Q(s_{t+1}, a)
$$
One can mathematically prove that the reverse is also true. Namely, any strategy which satisfies the property for all state-action pairs is optimal. This fact is not restricted to deterministic strategies. For stochastic strategies, you have to add some expectation value signs and all the results still hold.

Q-learning

The above concept of Q-value leads to an algorithm learning a strategy in a game. Let’s slowly update the estimates of the state-action pairs to the ones that locally satisfy the property and change the strategy, so that in each state it will choose an action with the highest sum of expected reward (estimated as an average reward received in a given state after following given action) and biggest Q-value of the subsequent state.

for all (s,a):
    Q[s, a] = 0 #initialize q-values of all state-action pairs
for all s:
    P[s] = random_action() #initialize strategy
# assume we know expected rewards for state-action pairs R[s, a] and
# after making action a in state s the environment moves to the state next_state(s, a)
# alpha : the learning rate - determines how quickly the algorithm learns;
#                             low values mean more conservative learning behaviors,
#                             typically close to 0, in our experiments 0.0002
# gamma : the discount factor - determines how we value immediate reward;
#                               higher gamma means more attention given to far-away goals
#                               between 0 and 1, typically close to 1, in our experiments 0.95
repeat until convergence:
  1. for all s:
         P[s] = argmax_a (R[s, a] + gamma * max_b(Q[next_state(s, a), b]))
  2. for all (s, a):
         Q[s, a] = alpha*(R[s, a] + gamma * max_b Q[next_state(s, a), b]) + (1 - alpha)Q[s, a]

This algorithm is called Q-learning. It converges in the limit to an optimal strategy. For simple games like Tic-Tac-Toe, this algorithm, without any further modifications, solves them completely not only in theory but also in practice.

Deep Q-learning

Q-learning is a correct algorithm, but not an efficient one. The number of states which need to be visited multiple times to learn their action-values is too big. We need some form of generalization: when we learn about the value of one state-action pair, we can also improve our knowledge about other similar state-actions.
The deep Q-learning algorithm uses the convolutional neural network as a function approximating the Q-value function. It accepts the screen (after some transformations) as an input. The algorithm transforms the input with a couple of layers of nonlinear functions. Then it returns an up to $18$ dimensional vector. Entries of the vector denote the approximated Q-values of the current state and each of the possible actions. The action to choose is the one with the highest Q-value.
Training consists of playing the episodes of the game, observing transitions from state to state (doing the currently best actions) and rewards. Having all this information, we can estimate the error, which is the square of the difference between the left- and right-hand side of the Q-learning property above:
$$
error = (Q(s_t, a_t) – (r_t + gamma max_a Q(s_{t+1}, a)))^2
$$
We can calculate the gradient of this error according to the network parameters and update them to decrease the error using one of the many gradient descent optimization algorithms (we used RMSProp).

DQN+RAM

In our work, we adapted the deep Q-learning algorithm so that its input are not game screens, but the RAM states of the Atari machine. Atari 2600 has only $128$ bytes of RAM. On one hand, this makes our task easier, as our input is much smaller than the full screen of the console. On the other hand, the information about the game may be hard to retrieve. We tried two network architectures. One with $2$ hidden ReLU layers, $128$ nodes each, and the other with $4$ such layers. We obtained results comparable (in two games higher, in one lower) to those achieved with the screen input in the original DQN paper. Admittedly, higher scores can be achieved using more computational resources and some additional tricks.

Tricks

The method of learning Atari games, as presented above and even with neural networks employed to approximate the Q-values, would not yield good results. To make it work, the original DQN paper’s authors and we in our experiments, employed a few improvements to the basic algorithm. In this section, we discuss some of them.

Epsilon-greedy strategy

When we begin our agent’s training, it has little information about the value of particular game states. If we were to completely follow the learned strategy at the start of the training we’d be nearly randomly choosing some actions to follow in the first game states. As the training continues, we’d stick to these actions for the first states, as their value estimation would be positive (and for the other actions would be nearly zero). The value of the first-chosen action would improve and we’d only pick these, without even testing the other possibilities. The first decisions, made with little information, would be reinforced and followed in the future.
We say that such a policy doesn’t have a good exploration-exploitation tradeoff. On one hand, we’d like to focus on the actions that led to reasonable results in the past, but on the other ahnd, we prefer our policies to extensively explore the state-action space.
The solution to this problem used in DQN is using an epsilon-greedy strategy during training. This means that at any time with some small probability $varepsilon$ the agent chooses a random action, instead of always choosing the action with the best Q-value in the given state. Then, every action will get some attention and its state-action value estimation will be based on some (possibly limited) experience and not the initialization values.
We join this method with epsilon decay — at the beginning of training we set $varepsilon$ to a high value ($1.0$), meaning that we prefer to explore the various actions and gradually decrease $varepsilon$ to a small value, that indicate the preference to exploit the well-learned action-values.

Experience replay

Another trick used in DQN is called experience replay. The process of training a neural network consist of training epochs; in each epoch we pass all the training data, in batches, as the network input and update the parameters based on the calculated gradients.
When training reinforcement learning models, we don’t have an explicit dataset. Instead, we experience some states, actions, and rewards. We pass them to the network, so that our statistical model can learn what to do in similar game states. As we want to pass a particular state/action/reward tuple multiple times to the network, we save them in memory when they are seen. To fit this data to the RAM of our GPU, we store at most $100 000$ recently observed state/action/reward/next state tuples.
When the dataset is queried for a batch of training data, we don’t return consecutive states, but a set of random tuples. This reduces correlation between states processed in a given step of learning. As a result, this improves the statistical properties of the learning process.
For more details about experience replay you can see Section 3.5 of Lin’s thesis. As quite a few other tricks in reinforcement learning, this method was invented back in 1993 – significantly before the current deep learning boom.

Frameskip

Atari 2600 was designed to use an analog TV as the output device. The console generated $60$ new frames appearing on the screen every second. To simplify the search space we imposed a rule that one action is repeated over a fixed number of frames. This fixed number is called the frame skip. The standard frame skip used in the original work on DQN is $4$. For this frame skip the agent makes a decision about the next move every $ 4cdot frac{1}{60} = frac{1}{15}$ of a second. Once a decision is made, it remains unchanged during the next $4$ frames. A low frame skip allows the network to learn strategies based on a super-human reflex. A high frame skip will limit the complexity of a strategy. Hence learning may be faster and more successful whenever strategy matters over tactic. In our work , we tested frame skips equal to $4,8$ and $30$.

Further research

We are currently testing other strategy-learning algorithms on a Xeon Phi architecture.

Neptune – Machine Learning Platform

September 26, 2016/in Data science, Deep learning, Machine learning, Neptune /by Rafał Hryciuk

In January 2016, deepsense.ai won the Right Whale Recognition contest on Kaggle. The competition’s goal was to automate the right whale recognition process using a dataset of aerial photographs of individual whales. The terms and conditions for the competition stated that to collect the prize, the winning team had to provide source code and a description of how to recreate the winning solution. A fair request, but as it turned out, the winning solution’s authors spent about three weeks recreating all of the steps that led them to the winning machine learning model.

When data scientists work on a problem, they need to test many different approaches – various algorithms, neural network structures, numerous hyperparameter values that can be optimized etc. The process of validating one approach can be called an experiment. The inputs for every experiment include: source code, data sets, hyperparameter values and configuration files. The outputs of every experiment are: output model weights (definition of the model), metric values (used for comparing different experiments), generated data and execution logs. As we can see, that’s a lot of different artifacts for each experiment. It is crucial to save all of these artifacts to keep track of the project – comparing different models, determining which approaches were already tested, expanding research from some experiment from the past etc. Managing the process of experiment executions is a very hard task and it is easy to make a mistake and lose an important artifact.
To make the situation even more complicated, experiments can depend on each other. For example, we can have two different experiments training two different models and a third experiment that takes these two models and creates a hybrid to generate predictions. Recreating the best solution means finding the path from the original data set to the model that gives the best results.

Recreating the path that led to the best model

The deepsense.ai research team performed around 1000 experiments to find the competition-winning solution. Knowing all that, it becomes clear why recreating the solution was such a difficult and time consuming task.
The problem of recreating a machine learning solution is present not only in an academic environment. Businesses struggle with the same problem. The common scenario is that the research team works to find the best machine learning model to solve a business problem, but then the software engineering team has to put the model into a production environment. The software engineering team needs a detailed description of how to recreate the model.
Our research team needed a platform that would help them with these common problems. They defined the properties of such a platform as:

Every experiment and the related artifacts are registered in the system and accessible for browsing and comparing;
Experiment execution can be monitored via real-time metrics;
Experiment execution can be aborted at any time;
Data scientists should not be concerned with the infrastructure for the experiment execution.

deepsense.ai decided to build Neptune – a brand new machine learning platform that organizes data science processes. This platform relieves data scientists of the manual tasks related to managing their experiments. It helps with monitoring long-running experiments and supports team collaboration. All these features are accessible through the powerful Neptune Web UI and useful for scripting CLI.

Neptune is already used in all machine learning projects at deepsense.ai. Every week, our data scientists execute around 1000 experiments using this machine learning platform. Thanks to that, the machine learning team can focus on data science and stop worrying about process management.

Experiment Execution in Neptune

Main Concepts of the Machine Learning Platform

Job

A job is an experiment registered in Neptune. It can be registered for immediate execution or added to a queue. The job is the main concept in Neptune and contains a complete set of artifacts related with the experiment:

source code snapshot: Neptune creates a snapshot of the source code for every job. This allows a user to revert to any job from the past and get the exact version of the code that was executed;
metadata: name, description, project, owner, tags;
parameters: customly defined by a user. Neptune supports boolean, numeric and string types of parameters;
data and logs generated by the job;
metric values represented as channels.

Neptune is library and framework agnostic. Users can leverage their favorite libraries and frameworks with Neptune. At deepsense.ai we currently execute Neptune jobs that use: TensorFlow, Theano, Caffe, Keras, Lasagne or scikit-learn.

Channel

A channel is a mechanism for real-time job monitoring. In the source code, a user can create channels, send values through them and then monitor these values live using the Neptune Web UI. During job execution, a user can see how his or her experiment is performing. The Neptune machine learning platform supports three types of channels:

Numeric: used for monitoring any custom-defined metric. Numeric channels can be displayed as charts. Neptune supports dynamic chart creation from the Neptune Web UI with multiple channels displayed in one chart. This is particularly useful for comparing various metrics;
Text: used for logs;
Image: used for sending images. A common use case for this type of channel is checking the behavior of an applied augmentation when working with images.

Neptune - Machine Learning Platform, Comparing two Neptune image channels — Comparing two Neptune image channels

Queue

A queue is a very simple mechanism that allows a user to execute his or her job on remote infrastructure. A common setup for many research teams is that data scientists develop their code on local machines (laptops), but due to hardware requirements (powerful GPU, large amount of RAM, etc) code has to be executed on a remote server or in a cloud. For every experiment, data scientists have to move source code between the two machines and then log into the remote server to execute the code and monitor logs. Thanks to our machine learning platform, a user can enqueue a job from a local machine (the job is created in Neptune, all metadata and parameters are saved, source code copied to users’ shared storage). Then, on a remote host that meets the job requirements the user can execute the job with a single command. Neptune takes care of copying the source code, setting parameters etc.
The queue mechanism can be used to write a simple script that queries Neptune for enqueued jobs and execute the first job from the queue. If we run this script on a remote server in an infinite loop, we don’t have to log to the server ever again because the script executes all the jobs from the queue and reports the results to the machine learning platform.

Creating a Job

Neptune is language and framework agnostic. A user can communicate with Neptune using REST API and Web Sockets from his or her source code written in any language. To make the communication easier, we provide a high-level client library for Python (other languages are going to be supported soon).
Let’s examine a simple job that provided with amplitude and sampling_rate generates sine and cosine as functions of time (in seconds).

import math
import time
from deepsense import neptune
ctx = neptune.Context()
amplitude = ctx.params.amplitude
sampling_rate = ctx.params.sampling_rate
sin_channel = ctx.job.create_channel(name='sin', channel_type=neptune.ChannelType.NUMERIC)
cos_channel = ctx.job.create_channel(name='cos', channel_type=neptune.ChannelType.NUMERIC)
logging_channel = ctx.job.create_channel(name='logging', channel_type=neptune.ChannelType.TEXT)
ctx.job.create_chart(name='sin & cos chart', series={'sin': sin_channel, 'cos': cos_channel})
ctx.job.finalize_preparation()
# The time interval between samples.
period = 1.0 / sampling_rate
# The initial timestamp, corresponding to x = 0 in the coordinate axis.
zero_x = time.time()
iteration = 0
while True:
    iteration += 1
    # Computes the values of sine and cosine.
    now = time.time()
    x = now - zero_x
    sin_y = amplitude * math.sin(x)
    cos_y = amplitude * math.cos(x)
    # Sends the computed values to the defined numeric channels.
    sin_channel.send(x=x, y=sin_y)
    cos_channel.send(x=x, y=cos_y)
    # Formats a logging entry.
    logging_entry = "sin({x})={sin_y}; cos({x})={cos_y}".format(x=x, sin_y=sin_y, cos_y=cos_y)
    # Sends a logging entry.
    logging_channel.send(x=iteration, y=logging_entry)
    time.sleep(period)

The first thing that we can see is that we need to import Neptune library and create a neptune.Context object. The Context object is an entrypoint for Neptune integration. Afterwards, using the context we obtain values for job parameters: amplitude and sampling_rate.
Then, using neptune.Context.job we create numeric channels for sending sine and cosine values and a text channel for sending logs. We want to display sin_channel and cos_channel on a chart, so we use neptune.Context.job.create_chart to define a chart with two series named sin and cos. After that, we need to tell Neptune that the preparation phase is over and we are starting the proper computation. That is what: ctx.job.finalize_preparation() does.
In an infinite loop we calculate sine and cosine functions values and send these values to Neptune using the channel.send method. We also create a human-readable log and send it through logging_channel.
To run main.py as a Neptune job we need to create a configurtion file – a descriptor file with basic metadata for the job.

name: Sine-Cosine Generator
project: Trigonometry
owner: Your Name
parameters:
  - name: amplitude
    type: double
    default: 1.0
    required: false
  - name: sampling_rate
    type: double
    default: 2
    required: false

config.yaml contains basic information about the job: name, project, owner and parameter definitions. For our simple Sine-Cosine Generator we need two parameters of double type: amplitude and sampling_rate (we already saw in the main.py how to obtain parameter values in the code).
To run the job we need to use the Neptune CLI command:
neptune run main.py –config config.yaml –dump-dir-url my_dump_dir — –amplitude 5 –sampling_rate 2.5
For neptune run we specify: the script that we want to execute, the configuration for the job and a path to a directory where snapshot of the code will be copied to. We also pass values of the custom-defined parameters.

Job Monitoring

Every job executed in the machine learning platform can be monitored in the Neptune Web UI. A user can see all useful information related to the job:

metadata (name, description, project, owner);
job status (queued, running, failed, aborted, succeeded);
location of the job source code snapshot;
location of the job execution logs;
parameter schema and values.

Neptune – Machine Learning Platform, Parameters for Sine-Cosine Generator — Parameters for Sine-Cosine Generator

A data scientist can monitor custom metrics sent to Neptune through the channel mechanism. Values of the incoming channels are displayed in the Neptune Web UI in real time. If the metrics are not satisfactory, the user can decide to abort the job. Aborting the job can also be done from the Neptune Web UI.

Neptune – Machine Learning Platform, Channels for Sine-Cosine Generator — Channels for Sine-Cosine Generator

Neptune – Machine Learning Platform, Comparing values of multiple metrics using Neptune channels — Comparing values of multiple metrics using Neptune channels

Numeric channels can be displayed graphically as charts. A chart representation is very useful to compare various metrics and to track changes of metrics during job execution.

Neptune – Machine Learning Platform, Chart for Sine-Cosine Generator — Chart for Sine-Cosine Generator

Neptune – Machine Learning Platform, Charts displaying custom metrics — Charts displaying custom metrics

For every job a user can define a set of tags. Tags are useful for marking significant differences between jobs and milestones in the project (i.e if we are doing a MINST project, we can start our research by running the job with a well known and publicly available algorithm and tag it ‘benchmark’).

Comparing Results and Collaboration

Every job executed in the Neptune machine learning platform is registered and available for browsing. Neptune’s main screen shows a list of all executed jobs. User can filter jobs using job metadata, execution time and tags.

Neptune – Machine Learning Platform, Neptune jobs list — Neptune jobs list

A user can select custom-defined metrics to show as columns on the list. The job list can be sorted using values from every column. That way, a user can select which metric he or she wants to use for comparison, sort all jobs using this metric and then find the job with the best score.
Thanks to a complete history of job executions, data scientists can compare their jobs with jobs executed by their teammates. They can compare results, metrics values, charts and even get access to the snapshot of code of a job they’re interested in.
Thanks to Neptune, the machine learning team at deepsense.ai was able to:

get rid off spreadsheets for keeping history of executed experiments and their metrics values;
eliminate sharing source code across the team as an email attachment or other innovative tricks;
limit communication required to keep track of project progress and achieved milestones;
unify visualisation for metrics and generated data.

Join the Early Adopters Program

Apply for our Early Adopters Program and get early access to Neptune – Machine Learning Platform.
Benefits of joining the program include:

You will be one of the first to get full access to this innovative product designed especially for data scientists for FREE;
You will have direct impact on future product features;
You will get support from our team of engineers;
You can share your ideas with our experts and the community of the world’s leading data scientists.

R Notebook and Custom R Operations in the new Seahorse release

September 23, 2016/in Seahorse /by Jan Lasek

In this post, we present new features that come with Seahorse, Release 1.3 – the availability of writing custom operations, data exploration and ad-hoc analysis in R language using Jupyter Notebook. Along with Python (included in the previous Seahorse releases), R is one of the most popular languages in data science. Now, its users can embrace the new release of Seahorse with their favorite R operations included. Spark and R integration is obtained thanks to the SparkR package.

What’s new: R Notebook and custom R operations

There are several ways users can leverage R in their workflows. These include interactive Jupyter R Notebooks, custom R operations for processing data frames and tailor-made R evaluation functions. To demonstrate some of these new features, we are going to work with a dataset regarding credit card clients’ defaults available at the UCI Machine Learning Repository. We will train a model to predict the probability of a default of a loan. The dataset consists of 25 columns and 30K rows. Each row describes a customer of a bank along with information on his/her loan status. The last column – default_payment_next_month is our target variable. For a more detailed description of the dataset, please consult its description page and this paper.
Having loaded the data, we proceed to its initial exploration. To this end, let’s use the R Notebook block from Seahorse’s operations palette.

We open the Jupyter Notebook, and start exploring the data. With the use of the dataframe() function we are able to connect to data on the Spark cluster and operate on it. We can invoke it like this: df <- dataframe().

Some of the variables are categorical, for example: Education. In fact, reviewing the dataset documentation, we find that it is an ordinal variable where 0 and 6 denote the lowest and the highest education levels, respectively. In the notebook, after printing out the first three lines of data (the head() command), we performed a sample query for grouping the response variable with respect to education levels and compute the fraction of defaults within the groups. We observe some variability in default fraction with respect to education. Moreover, some of the categories are rare. In total, across all customers, the default fraction is equal about 22%.
Note the use of collect() functions here. This is a Spark-specific operation – it invokes the computation of an appropriate command and returns results to the driver node in a Spark application (in particular, it can be your desktop).
For later analysis we will drop the ID variable as it is irrelevant for prediction. We also see that there are relatively few occurrences of extreme values of the Education variable: 0 and 6 which denote the lowest and highest education levels, respectively. Our decision is to decrease the granularity of this variable and truncate its values to a range of [1, 5]. This can be achieved by a custom R Column Transformation as shown below.

Along with the Education variable, there are also other categorical variables in the dataset such as: marital status or payment statuses. We will not employ any special preprocessing for them since the model that we plan to use – Random Forest – is relatively insensitive to the encoding of categorical variables. Since most of the categories are ordinal, encoding them with consecutive integers is actually a natural choice here. For some other methods, like the logistic regression model, we would need to handle categorical data in a special way, for example, using one-hot-encoder.

To default or not to default

At this point we are ready to train a model. Let’s use the introduced Random Forest classifier, an off-the-shelf ensemble trees machine learning model. We devote 2/3 of data for tuning the model’s parameters – the number of trees in the forest and depth of a single tree – by Grid Search operation available on the palette. The other part of the data will be used for final model evaluation. Since the distribution of the target variable is slightly imbalanced toward “good” labels – about 78% customers in database paid their dues – we will use Area Under ROC as the evaluation metric. This metric is applicable when one class (label) dominates the other. It is more sensitive to retrieval of defaults in data rather than accuracy score (that is, the number of correctly classified instances regardless of their actual class).
For demonstration purposes, we varied the parameters on a relatively small grid: 20, 40, 80 for the number of trees in the forest and 5, 10, 20 for tree depth. The results of the grid search are presented below:

We see that the optimal values for the number of trees and a tree depth are 80 and 10, respectively. Since there is little difference between choosing 20, 40 or 80 as the number of trees, we will stick with the smallest (and less complex) forest consisting of 20 trees as our final model.

Final check

Finally, we evaluate the model on the remaining 1/3 part of data. It achieves a score of 0.7795 AUC on this hold-out dataset. This means that our model learned reasonably well to distinguish between defaulting and non-defaulting customers. For the sake of more interpretable results, the confusion matrix may be helpful. In order to compute it, we will create another R notebook and attach it to the final data with the computed predictions.

The confusion matrix is computed by retrieving two columns from the data frame and aggregating them. This, once again, is done using the collect() function. Here, it fetches the two selected columns to the driver node. Such operations should be performed with care – the result needs to fit into the driver node’s RAM. Here, we only retrieved two columns and there are merely 30K rows in the dataset so the operation succeeds. For larger datasets, we would order the computations to be performed by Spark. Finally, observe that we explicitly called the table() function from R’s base package via base::table(). This is necessary since the R native function is masked once SparkR is loaded.
Let’s proceed to the analysis of the confusion matrix. For example, its bottom left entry denotes that in 359 cases the model falsely predicted default while in fact there wasn’t one. Based on this matrix we may compute the precision and recall statistics. Precision is the fraction of correct predictions (true positives) out of all defaults predicted by our model. Recall is defined as the fraction of correctly predicted defaults divided by the total number of defaults in the data. In our application, precision and recall and equal about 67% and 37%, respectively.

The end

That’s it for today! We presented an analysis of prediction of defaults of a bank’s customers. Throughout the process, we used R for custom operations and exploratory data analysis. We trained the model, optimized and evaluated it. Based on costs associated with customers’ defaults, we can tweak precision and recall by, for example, adjusting the threshold level for probability of the default prediction and the final label (no default vs default) produced by the model. We encourage you to try out your own ideas – the complete workflow is available for download here.

Typical architecture

What is region of interest pooling?

Region of interest pooling — example

References

Examples of ML-based predictive maintenance

Quality control

Logistics and inventory management

Rising interest in machine learning applications in the manufacturing industry

The near future – robotics-powered manufacturing

Geospatial data in GeoJson format

New York City Taxi Trips

Implementing custom operation using Seahorse SDK steps:

Seahorse SDK Example Git Repository

Building our Workflow

Summary

Links

Use case

Solution

Collecting data

Sending a weather report

Result

Summary

Why internal leaderboard?

Business and research projects

Machine learning competitions

Why Neptune?

Let’s do it!

CIFAR-10 dataset

Evaluating functions

Implementation details

Prerequisites

Repository

Description

Step by step

1. Creating a Neptune group

2. Creating an evaluation module

3. Sending submissions to Neptune

4. Customizing a view in Neptune’s Web UI

Summary

TensorFlow Integration

Running Neptune Experiments in Docker Containers

Future Plans

The Challenge

Overview of the data

Our insights

Sparse information

Multiple locations

Working site proxies

Features

Validation

Model selection

Model ensembling

Final results

Atari games

Reinforcement Learning

Q-values

Q-learning

Deep Q-learning

DQN+RAM

Tricks

Epsilon-greedy strategy

Experience replay

Frameskip

Further research

Experiment Execution in Neptune

Main Concepts of the Machine Learning Platform

Job

Channel

Queue

Creating a Job

Job Monitoring

Comparing Results and Collaboration

Join the Early Adopters Program

What’s new: R Notebook and custom R operations

To default or not to default

Final check

The end

Contact us

Locations

Let us know how we can help