Some time ago our herd expanded to include a guinea pig called Hugo. It turns out that the presence of a pet at home is a great pretext for discussing with children the concepts of randomness, distribution functions and distributions in general.

And this is how it started:
— Dad, how long do guinea pigs live?
— On average from 5 to 7 years (google).
— And do other pets live longer or shorter?
— Mice have the shortest lives: they live from one to three years, but then, rodents are short-lived in general. Parrots, on the other hand, may live up to twenty years, even forty. And tortoises live up to seventy years.
— What about dogs, how long do they live?
— It depends on the breed: bigger ones live shorter and smaller ones live longer, but usually it is between eight and sixteen years.
— Does it mean that all dogs die by their sixteenth year of life?
— Well… (a longer pause at google+R+ggplot), no, that is only a typical lifespan. Some dogs live longer, some even up to twenty-four years.
— And do cats or dogs live longer?
— Well… (a longer pause at google+R+ggplot). Among cats there are many more specimens who live longer than 20 years.
— So who will live longer, Bromba (our friends’ dog) or Mufinka (our friends’ cat)?
— Nobody knows how long a particular cat or dog will live: it depends on the quality of care from their owners and on many other factors. We can say more about larger groups of animals. For example, if one gathered 100 dogs, it could be expected that half of them would live longer than 12 years. But if one gathered 100 cats, it would turn out that half of them would live over 14 years. And every fifth cat would live over 18 years, which is longer than the lifespan of practically any dog. The numerical data presented here is based on the VetCompass database. It concerns animals visiting the vet, that is animals “better taken care of”.
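The claim that “half of 100 dogs would live longer than 12 years” is simply the sample median, and “every fifth cat would live over 18 years” is the 80th percentile. A minimal sketch with simulated, entirely hypothetical lifespans (the real figures come from VetCompass) shows the quantities being compared:

```r
set.seed(1)
# Hypothetical lifespans in years, simulated for illustration only
dogs <- rnorm(100, mean = 12, sd = 3)
cats <- rnorm(100, mean = 14, sd = 3.5)

median(dogs)         # half of the dogs live longer than this
median(cats)         # half of the cats live longer than this
quantile(cats, 0.8)  # every fifth cat lives longer than this
```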

Do cats or dogs live longer? (Przemyslaw Biecek, 2015-10-15)

Children bring home from school strange assignments, like for example the question: what is your dad’s job similar to? After several guesses (a cosmonaut, a Formula 1 driver, a firefighter) it turns out that the work of a statistician is very much like the work of a shoemaker. Why? I don’t mean that a shoemaker drinks a lot and swears when something doesn’t work out (read: swears a lot). These are stereotypes. I mean that we can identify similar subgroups within both craft guilds.

A craftsman – seller. He gets a machine for making shoes (or ready-made shoes) and he sells them. He knows a bit about shoes: for example, that there is a right and a left shoe, and that if the client puts them on the wrong feet, he will not like it. He can show the shoes from their best side, and when he lacks the right size, he claims that after some period of wear the shoes will loosen, or shrink, etc. Who cares that the method assumes normality while the data are as skewed as the Tower of Pisa? Surely it is very robust and the impact of unmet assumptions is marginal. Too few observations and the power of the test barely exceeds the significance level? So what? It rains very rarely in this area, so the shoe will last long.

A craftsman – artist. He knows best what his client needs. The client brought a beautiful rabbit skin and wanted high winter boots, but the artist made a beautiful cap. No, please, believe me, such data can serve as a source for such analyses. No, you don’t need a test for two means. In this case PCA will serve ideally. You will be satisfied, and the cap looks fantastic.

A craftsman – theoretician. He does not make shoes, but he examines what would happen to shoes in untypical or impossible scenarios. For example, how would a shoe’s lifespan change if one ran at a speed approaching infinity, or jumped on the surface of Mars, or had not two but 314 feet?

A craftsman – handyman. Clients come to him with holes in their soles and he patches them up, sews them up. He patches in medians or adds some Wilcoxon test results. Maybe the final result is a little scary for people with taste, but the clients are able to walk home and do not have to look for leather for a new pair of shoes.

A craftsman – man of the future. He notices that people have grown bored with shoes and is looking for new, more attractive niches. He does shoe science, and is interested only in large shoe sizes, because such shoes pose technological challenges. The wall of his workshop is decorated with a black belt in six shoe sigma and he commutes riding an elephant.

In general, they make a funny group which meets from time to time to exchange notes and decide which leather makes nicer, more comfortable, more stable or easier-to-make shoes. Have I omitted some subgroups of this guild?

Statistician like a shoemaker (Przemyslaw Biecek, 2015-10-09)

I faced an interesting problem last week. Playing with data from The Cancer Genome Atlas (full genetic and clinical data for thousands of patients), I was building a classifier that predicts the type of cancer based on sets of genetic signatures. In the PANCAN33 subset there are samples for 33 different types of cancer, and the classifier should be able to assign a new sample to one of these 33 classes. I tried different methods, like random forest, svm, bgmm and a few others, and ended up with a collection of classifiers. How to choose the best one? We need a method that computes the agreement between classifier predictions and the true labels/cancer types. For binary classifiers there are many commonly used metrics, like precision, recall, accuracy etc. But here we have 33 classes, so the confusion matrix is 33×33 cells large – a lot of numbers to compare. Of course there are some straightforward solutions, like the fraction of samples for which the classifier correctly guesses the true label. But such easy solutions suffer when the class distribution is unequal (which is quite common): they may score highly for a dummy classifier that always votes for the most common class. It is better to avoid such metrics. Are there other measures of agreement that we can use? I used two interesting ones – Cohen’s kappa and Krippendorff’s alpha. They take into account the distribution of votes for each rater, and Krippendorff’s alpha additionally handles missing data (find more information here). Both coefficients are widely used by psychometricians (e.g. to assess how well two psychiatrists agree on a diagnosis); here we use them to estimate the performance of a classifier. Both coefficients are implemented in the irr package. Below you will find an example application.
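As a self-contained sketch, Cohen’s kappa can be computed from scratch in base R; irr::kappa2() performs the same computation for two raters. The six labels below are a toy illustration (the class codes are real TCGA cancer types):

```r
# Cohen's kappa: chance-corrected agreement between two sets of labels
cohen_kappa <- function(truth, pred) {
  classes <- union(truth, pred)
  cm <- table(factor(truth, levels = classes),
              factor(pred,  levels = classes))
  n  <- sum(cm)
  po <- sum(diag(cm)) / n                     # observed agreement
  pe <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}

truth <- c("BRCA", "BRCA", "LUAD", "LUAD", "KIRC", "KIRC")
pred  <- c("BRCA", "BRCA", "LUAD", "KIRC", "KIRC", "KIRC")
cohen_kappa(truth, pred)  # 0.75; 1 means perfect agreement, 0 chance level
```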

I was looking for biplots created with the ggplot2 library (because they look good and are customisable). It turns out that there are some nice solutions for PCA (like sinhrks/ggfortify, kassambara/factoextra, vqv/ggbiplot, fawda123/ggord), but I could not find a suitable solution for correspondence analysis. So I created one. It is available in the pbiecek/ggplotit package and works for both the CA{FactoMineR} and ca{ca} functions. You will find the source of this function below, but let’s start with an example.

I’m going to use data about car sale offers from the PogromcyDanych package. Let’s see what the relation between a brand and a type of fuel is. Can you guess for which brands diesel is more common than petrol? Let’s see. Porsche, Mini and Smart – these brands are mostly petrol only. Daewoo, Dodge – here you will find LPG-fuelled cars. Land Rover, Audi, Volkswagen – here diesel is most common. This example was created by these lines:
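The original listing did not survive in this archive, so here is a rough sketch of what a correspondence analysis of a brand × fuel table computes (ca::ca() and FactoMineR::CA() wrap the same decomposition); the counts below are made up for illustration:

```r
# Correspondence analysis "by hand" via SVD of standardized residuals
counts <- matrix(c(50,  5,  1,
                   30, 20, 10,
                   10, 40,  2),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Porsche", "Volkswagen", "Daewoo"),
                                 c("petrol", "diesel", "LPG")))
P  <- counts / sum(counts)  # correspondence matrix
rs <- rowSums(P)
cs <- colSums(P)
S  <- diag(1 / sqrt(rs)) %*% (P - rs %o% cs) %*% diag(1 / sqrt(cs))
dec <- svd(S)
# Principal coordinates of the rows (brands); a CA biplot displays these points
row_coords <- diag(1 / sqrt(rs)) %*% dec$u %*% diag(dec$d)
rownames(row_coords) <- rownames(counts)
row_coords[, 1:2]
```

A handy sanity check of this decomposition: the total inertia, sum(dec$d^2), equals the chi-square statistic of the table divided by the total count.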

1st September was just a few days ago. After the reform lowering the age at which children start school, the second group of six- and seven-year-old children started attending first-year classes. And since we are in ‘pre-election’ mode, there are calls for a reform reestablishing the previous school starting age. We are going to use R, Wikipedia and the packages htmltab, ggplot2 and archivist to juxtapose these ideas with demographic data. We will use easily accessible data, that is, birth statistics. Let’s use the htmltab package to download from Wikipedia the birth statistics for Poland from the last 50 years. Then let’s use the ggplot2 package and bar charts to present the birth rate (chart on the right-hand side, click to enlarge). There is a clear decreasing tendency, but some local ‘waves’ are also visible on the chart: echoing the more numerous age cohorts of 1975-1985, more children were born around 2008.

library(archivist); library(ggplot2); library(scales)
# access the dataset through an archivist hook
births <- archivist::aread("pbiecek/graphGallery/5a6c2a732c20d5a1bebe6507ebf09afa")
ggplot(births, aes(x = rok, ymin = 0, ymax = urodzenia)) +
  geom_linerange(size = 2) + theme_bw() +
  scale_y_continuous(labels = comma) +
  scale_x_continuous(limits = c(1965, 2015)) +
  ggtitle("Number of births in the last 50 years")

Since we are interested in the ‘present’, we will limit the period to the last 15 years (chart on the right-hand side, click to enlarge). We can easily see that more children were born between 2008 and 2010. Let us look at this data through the prism of the education reform. If we want to lower the age at which children start school, we must accumulate x+1 age cohorts in x years. Which x would you choose, and which cohorts would you accumulate this way? Unfortunately (in reality) the accumulation concerned the cohorts of 2007-2009, the cohorts from the slight demographic boom.

The chart on the right-hand side (click to enlarge) presents a theoretical result of such accumulation. Theoretical, because due to a certain mortality rate and migration, not all of the children born in Poland go to school in Poland. What is more, for several years the parents of six-year-olds were allowed to decide whether they wanted to send their child to school a year earlier. A year before the compulsory lowering of the school age, only 20% of parents decided to take that step. All of that means that the estimations presented here diverge to some extent from the actual number of children who will attend the first year. These numbers are not completely known at the moment, as some of them belong to the future.

All right then, what about reestablishing the previous school starting age? Let us assume that in the referendum (we know it will not happen soon, but let’s do a mind exercise) most of the voters decide that the school age should be increased, and the government follows the majority vote and quickly adopts a relevant act. It would mean that one age cohort now has to be divided in two. The chart on the right-hand side (click to enlarge) displays a theoretical result of such an action (with the same reservation that the estimation is very rough). If we have another reform, it will unfortunately hit a demographic decline. After six years of schools almost coming apart at the seams, accommodating over 150% of the average number of students and working in shifts, suddenly the first year will have 3.3 times fewer pupils.

What does this look like in other countries? Let’s download data from the World Bank on changes in the age of entering primary education and use the great networkD3 package to visualize how different countries behave. We can see a general tendency: more and more countries decide to lower the school starting age to six.

Are you in favour of abolition of compulsory education for six-year-old children and return to compulsory education for seven-year-old children? (Przemyslaw Biecek, 2015-09-10)

The one on the left has no signs of diabetic retinopathy, while the other one has severe signs of it.

If you are not a trained clinician, the chances are, you will find it quite hard to correctly identify the signs of this disease. So, how well can a computer program do it? In July, we took part in a Kaggle competition, where the goal was to classify the severity of diabetic retinopathy in the supplied images of retinas. As we’ve learned from the organizers, this is a very important task. Diabetic retinopathy is the leading cause of blindness in the working-age population of the developed world. It is estimated to affect over 93 million people. The contest started in February, and over 650 teams took part in it, fighting for the prize pool of $100,000. The contestants were given over 35,000 images of retinas, each having a severity rating. There were 5 severity classes, and the distribution of classes was fairly imbalanced. Most of the images showed no signs of the disease. Only a few percent had the two most severe ratings. The metric with which the predictions were rated was a quadratic weighted kappa, which we will describe later. The contest lasted till the end of July. Our team scored 0.82854 in the private standing, which gave us 6th place. Not too bad, given our quite late entry. You can see our progress on this plot:

Also, you can read more about the competition here.

Solution overview

As should come as no surprise for an image recognition task, most of the top contestants used deep convolutional neural networks (CNNs), and so did we. Our solution consisted of multiple steps:

image preprocessing

training multiple deep CNNs

eye blending

kappa score optimization

We briefly describe each of these steps below. Throughout the contest we used multiple methods for image preprocessing and trained many nets with different architectures. When ensembled together, the gain over the best preprocessing method and the best network architecture was small. We therefore limit ourselves to describing the single best model. If you are not familiar with convolutional networks, check out this great introduction by Andrej Karpathy: http://cs231n.github.io/convolutional-networks/.

Preprocessing

The input images, as provided by the organizers, were produced by very different equipment, had different sizes and very different colour spectra. Most of them were also far too large to perform any non-trivial model fitting on. The minimum preprocessing needed to make network training possible is to standardize the dimensions, but ideally one would want to normalize all the other characteristics as well. Initially, we used the following simple preprocessing steps:

Crop the image to the rectangular bounding box containing all pixels above a certain threshold

Scale it to 256×256 while maintaining the aspect ratio and padding with black background (the raw images have black background as well, more or less)

For each RGB component separately, remap the colour intensities so that the CDF (cumulative distribution function) looks as close to linear as possible (this is called “histogram normalization”)
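The remapping in the last step can be sketched in a few lines of R; the 8-bit range and the toy channel below are assumptions for illustration (ImageMagick's -equalize performs a similar per-image remap):

```r
# Histogram normalization: push each intensity to (levels - 1) * its CDF value,
# so the resulting empirical CDF is approximately linear
equalize <- function(channel, levels = 256) {
  round((levels - 1) * ecdf(channel)(channel))
}

set.seed(1)
# A toy channel crowded near black, as in an underexposed photo
dark <- pmin(255, round(rexp(10000, rate = 1 / 30)))
eq   <- equalize(dark)
range(eq)  # the equalized channel spans (almost) the full 0-255 range
```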

All these steps can be achieved in a single call to ImageMagick’s command line tool. In time, we realized that some of the input images contain regions of rather intense noise. With the simple bounding-box cropping described above, this leads to very bad crops, i.e. the actual eye occupies an arbitrary and rather small part of the image (you can see grey noise at the top of the image). Using state-of-the-art edge detectors, e.g. Canny, did not help much. Eventually, we developed a dedicated cropping procedure. This procedure chooses the threshold adaptively, exploiting two assumptions based on an analysis of the provided images:

There always exists a threshold level separating noise from the outline of the eye

The outline of the eye has an ellipsoidal shape, close to a circle, possibly truncated at the top and bottom. In particular it is a rather smooth curve, and one can use this smoothness to recognize the best values for the threshold

The resulting cropper produced almost ideal crops for all images, and is what we used for our final solutions. We also changed the target resolution to 512×512, as it seemed to significantly improve the performance of our neural networks compared to the smaller 256×256 resolution. Here is how the preprocessed image looks. Just before passing the images to the next stage, we transformed them so that the mean of each channel (R, G, B) over all images is approximately 0 and the standard deviation approximately 1.
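That last normalization step, sketched on a toy batch (the array shape is an assumption; any image container works the same way):

```r
# Standardize each RGB channel over the whole batch to mean ~0, sd ~1
set.seed(1)
imgs <- array(runif(4 * 8 * 8 * 3, min = 0, max = 255),
              dim = c(4, 8, 8, 3))  # 4 toy images, 8x8 pixels, 3 channels
for (ch in 1:3) {
  v <- imgs[, , , ch]
  imgs[, , , ch] <- (v - mean(v)) / sd(v)
}
apply(imgs, 4, mean)  # ~0 for each channel
apply(imgs, 4, sd)    # ~1 for each channel
```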

Convnet architecture

The core of our solution was a deep convolutional neural network. Although we started with fairly shallow models (4 convolutional layers), we quickly discovered that adding more layers, and more filters inside the layers, helps a lot. Our best single model consisted of 9 convolutional layers. The details of the architecture are as follows.

All Conv layers have a 3×3 kernel, stride 1 and padding 1. That way the size (height, width) of the output of the convolution is the same as the size of the input. Each convolutional layer is followed by a batch normalization layer and a ReLU activation. Batch normalization is a simple but powerful method to normalize the pre-activation values in the neural net so that their distribution does not change too much during the training. One often standardizes the data to zero mean and unit variance; batch normalization takes it a step further. Check this paper by Google to learn more.

Our Pool layers always use max pooling. The size of the pooling window is 3×3, and the stride is 2. That way the height and width of the image are roughly halved by each pooling layer. In the FC (fully connected) layers we again use ReLU as the activation function. The first fully connected layer, FC1, also employs batch normalization. For regularization we used a Dropout layer before the first fully connected layer, and L2 regularization applied to some of the parameters. Overall, the net has 925,013 parameters.

We trained the net using stochastic gradient descent with momentum and multiclass logloss as the loss function. Moreover, the learning rate was adjusted manually a few times during the training. We used our own implementation based on Theano and Nvidia cuDNN. To further regularize the network, we augmented the data during training by taking random 448×448 crops of the images and flipping them horizontally and vertically, independently, each with probability 0.5. At test time, we took a few such random crops and flips for each eye and averaged our predictions over them. Predictions were also averaged over multiple epochs. It took quite a long time to train and compute test predictions even for a single network: on a g2.2xlarge AWS instance (using an Nvidia GRID K520) it took around 48 hours.
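The spatial bookkeeping above follows the standard output-size formula; a quick sketch (the layer sizes are the ones mentioned in the text):

```r
# Spatial output size of a convolution/pooling layer
out_size <- function(in_size, kernel, stride, pad = 0) {
  floor((in_size + 2 * pad - kernel) / stride) + 1
}

out_size(512, kernel = 3, stride = 1, pad = 1)  # 3x3/s1/p1 conv keeps size: 512
out_size(512, kernel = 3, stride = 2)           # 3x3/s2 max pool: 255, roughly half
```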

Eye blending

At some point we realized that the correlation between the scores of the two eyes in a pair was quite high. For example, the percentage of eye pairs for which the score for the left eye is the same as for the right one is 87.2%. For 95.7% of pairs the scores differ by at most 1, and for 99.8% by at most 2. There are two likely reasons for this kind of correlation. The first is that the retinas of both eyes were exposed to the damaging effects of diabetes for the same amount of time and are similar in structure, so the conjecture is that they should develop retinopathy at a similar rate. The less obvious reason is that the ground truth labels were produced by humans, and it is conceivable that a human expert is more likely to give an image a particular score depending on the score of the other image of the pair.

Interestingly, one can exploit this correlation between the scores of a pair of eyes to produce a better predictor. One simple way is to take the predicted distributions D_L and D_R, for the left and right eye respectively, and produce new distributions using linear blending, as follows: for the left eye we predict c⋅D_L + (1-c)⋅D_R, and similarly c⋅D_R + (1-c)⋅D_L for the right eye, for some c in [0, 1]. We tried c = 0.7 and a few other values. Even this simple blending produced a significant increase in our kappa score. However, a much bigger improvement was gained when, instead of an ad hoc linear blend, we trained a neural network. This network takes two distributions (i.e. 10 numbers) as inputs and returns the new “blended” versions of the first 5 inputs. It can be trained using predictions on validation sets. As for the architecture, we decided to go with a very strongly regularized (by dropout) one, with two inner layers of 500 rectified linear nodes each.

One obvious idea is to integrate the convolutional networks and the blending network into a single network. Intuitively, this could lead to stronger results, but such a network might also be significantly harder to train. Unfortunately, we did not manage to try this idea before the contest deadline.
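The linear blend described above is a one-liner; the two 5-class distributions below are hypothetical:

```r
# Blend per-eye predicted class distributions; c = 0.7 was one value we tried
blend_eyes <- function(d_left, d_right, c = 0.7) {
  list(left  = c * d_left  + (1 - c) * d_right,
       right = c * d_right + (1 - c) * d_left)
}

d_left  <- c(0.70, 0.15, 0.10, 0.04, 0.01)  # hypothetical left-eye prediction
d_right <- c(0.50, 0.30, 0.10, 0.07, 0.03)  # hypothetical right-eye prediction
b <- blend_eyes(d_left, d_right)
b$left  # still a probability distribution (sums to 1)
```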

Kappa optimization

Quadratic weighted kappa (QWK), the loss function proposed by the organizers, seems to be standard in the area of retinopathy diagnosis, but from the point of view of mainstream machine learning it is very unusual. The score of a submission is defined as one minus the ratio between the total square error of the submission (TSE) and the expected square error (ESE) of an estimator that answers randomly with the same distribution as the submission (look here for a more detailed description). This is a rather hard loss function to optimize directly. Therefore, instead of trying to do that, we used a two-step procedure. We first optimized our models for multiclass logloss. This gives a probability distribution for each image. We then chose a label for each image using a simulated-annealing-based optimizer. Of course, we cannot really optimize QWK without knowing the actual labels. Instead, we define and optimize a proxy for QWK in the following way. Recall that QWK = 1 - TSE/ESE. We estimate both TSE and ESE by assuming that the true labels are drawn from the distribution described by our prediction, and then plug these estimates into the QWK formula instead of the true values. Note that both TSE and ESE are underestimated by this procedure. The two effects cancel each other out to some extent, but our predicted QWK was still off by quite a lot compared to the leaderboard scores. That said, we found no better way of producing submissions. In particular, the optimizer described above outperformed all the ad hoc methods we tried, such as integer-rounded expectation, mode, etc.
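For label vectors, QWK itself is easy to compute; a base-R sketch (the toy labels are illustrative):

```r
# Quadratic weighted kappa: 1 - weighted observed disagreement over
# weighted disagreement expected for a random rater with the same marginals
qwk <- function(truth, pred, n_classes = 5) {
  lv <- 0:(n_classes - 1)
  O  <- table(factor(truth, levels = lv), factor(pred, levels = lv))
  O  <- O / sum(O)
  E  <- rowSums(O) %o% colSums(O)                # expected under independence
  W  <- outer(lv, lv, function(i, j) (i - j)^2)  # quadratic penalty
  1 - sum(W * O) / sum(W * E)
}

truth <- c(0, 0, 1, 2, 3, 4, 4, 1)
qwk(truth, truth)                      # perfect agreement gives 1
qwk(truth, c(0, 0, 1, 2, 3, 4, 3, 1))  # one off-by-one error is penalized mildly
```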

We would like to thank California Healthcare Foundation for being a sponsor, EyePACS for providing the images, and Kaggle for setting up this competition. We learned a lot and were happy to take part in the development of tools that can potentially help diagnose diabetic retinopathy. We are looking forward to solving the next challenge.

deepsense.ai Machine Learning Team

Diagnosing diabetic retinopathy with deep learning (Robert Bogucki, 2015-09-03)

I recently came across presentations of the results of surveys concerning participation in the planned referenda, published by the portal Gazeta.pl.

My attention was caught by the diagram presented below, which displays the distribution of answers to the question: “Are you going to vote in the referendum scheduled for 6th September?” [source] I guess that most of us would like to check on such a diagram whether the group of eager voters is bigger than the group of reluctant voters or not. Is it easy to read this information from the graph? Absolutely not. At least three mistakes should be corrected.

Mistake 1. Choice of colours. The two most similar colours on the diagram are yellow and orange, yet they correspond to the two most extreme answers: ‘definitely not’ and ‘definitely yes’. Remember: when you present ordered values, the colour scale should be ordered as well.

Mistake 2. Order of the elements in the legend and on the diagram. The right-hand side of the graph is devoted to the answers ‘rather yes’ / ‘definitely yes’, while the right-hand side of the legend concerns the answers ‘rather no’ / ‘definitely no’. Remember: make your legend easy to understand; it should closely match the diagram.

Mistake 3. Position of the ‘don’t know’ element. As a neutral element, ‘don’t know’ should be placed between ‘rather yes’ and ‘rather no’. Remember: use symmetries. If ‘definitely yes’ and ‘definitely no’ began exactly at the top point of the diagram, it would be much easier to determine whether more people answered ‘yes’ or ‘no’.

You’re doing it wrong: surveys concerning the referendum which is to take place on 6th September (Przemyslaw Biecek, 2015-08-27)

I have been working recently on the visualization of genetic data. In that field a popular method of presentation is the circular plot generated by the circlize library.

Turning a blind eye to the problem of reading information from circles, I must say that the possibilities offered by that library are impressive. To test them, I downloaded data on movie premieres that took place in 2014 and 2015 from the FilmWeb website. I presented these premieres on a circle, assigning different colours to movies of various genres and adding information on the average rating. The two graphs below present all the premieres of 2014 and 2015; you can see many more details when you click to enlarge them. You can access both datasets with the following instructions.

# Year 2014
archivist::aread("pbiecek/graphGallery/88a0baa22b4165fc9356f86d85afdece")
# Year 2015
archivist::aread("pbiecek/graphGallery/b8fc8cff338fa865af0604f7d22e3840")

And here you can find the source code for these plots. It turns out that premieres are least frequent in June and July – nothing strange about this, everybody is on holiday then. Interestingly, the anticipated premieres (the second part of 2015) enjoy higher ratings than the films already showing in cinemas ;-). Is the best saved for last, or are our expectations simply better than the reality?

Circles and films from FilmWeb (Przemyslaw Biecek, 2015-08-20)

I am working on a short introduction to the Grammar of Graphics and its implementation in the ggplot2 package. The process of systematizing the elements of its syntax reveals various ‘spices’ of ggplot2, and today I will talk about one of them, namely applying transformations to diagrams.

Let us start with a chart without any transformations at all. As an example we’ll use the famous iris dataset. On our diagram we’ll draw points (geom_point) and the linear regression line (geom_smooth).

Let us check how this relationship looks after a logarithmic transformation. The ggplot2 package allows for transformations at three levels: transformations of the variables, of the scale, and of the coordinate system. In a moment we will see what the differences between these transformations are and how to perform them.

Transformations of the variables

A logarithmic transformation of the variables can be performed either within the aes() function that defines the mapping, or outside the ggplot() function. Below you can see an example of the first possibility. The diagram displays the relation between the log-length of sepals and the log-length of petals: the axes present log-values, and a linear trend is fitted to the logarithmized values.

The second option is a transformation of the coordinate system. We substitute logarithmic axes for the traditional ones and then present the values and statistics (such as the linear trend) from the original chart in the new coordinate system. In the new coordinate system the linear trend may not look linear at all, as you may notice in the example below: statistics based on the data are calculated before the transformation of the axes. You may use the coord_trans function to transform the coordinate system.

The third possibility is a transformation of the scale. The points on the chart and its axes look the same as in the case of the coordinate-system transformation. The difference is that the statistics are calculated after the transformation. This means that in the example below the linear trend was determined for the logarithmized data, so it looks like a straight line (thus becoming a multiplicative trend on the original scale). Scale transformations can be performed with the scale-description functions, such as scale_y_log10.
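The scale-versus-coordinate distinction can be seen without any plotting at all: fitting a trend after the log transform (what scale_y_log10 shows) gives a different line than log-transforming a trend fitted on the raw data (what coord_trans shows). A base-R sketch on the iris data:

```r
# Trend fitted on raw data, then transformed (what coord_trans displays)
fit_raw <- lm(Petal.Length ~ Sepal.Length, data = iris)
# Trend fitted on already-logarithmized data (what scale_*_log10 displays)
fit_log <- lm(log10(Petal.Length) ~ log10(Sepal.Length), data = iris)

newdata <- data.frame(Sepal.Length = 6)
p_coord <- log10(predict(fit_raw, newdata))  # statistics before transformation
p_scale <- predict(fit_log, newdata)         # statistics after transformation
c(p_coord, p_scale)  # the two approaches give different trend values
```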

Three different approaches. Is it too much? If not, note that you can combine them together ;-)

Transformations of variables, scales and coordinates in ggplot2 (Przemyslaw Biecek, 2015-08-13)

I’ve been wondering whether this year’s useR conference foreshadowed some gigantic, groundbreaking change in the world of R. The previous useR conference was a sort of catalyst for the dplyr package and the %>% operator. The profession (especially in California) had been familiar with both solutions for several months, but useR 2014 transferred the admiration for pipes from early adopters to (almost) the whole community.

Did anything of that sort happen during this year’s useR? The problem with revolutions is that when they take place, nobody notices them. However, looking back in time we see them quite clearly. With that in mind, I would like to take the risk and play fortune-teller. In my opinion the dark horse of the conference was the htmlwidgets package. Why? First I will explain what it actually is (and if the explanation does not explain, you may visit the project’s website and see for yourself).

The htmlwidgets package is a framework for creating interactive, JavaScript-based widgets in R. Such a widget can be embedded in HTML5 (which means in presentation slides, knitr documents and shiny applications). At the same time, the widgets do not need R on the server side to remain interactive, as they are written in JavaScript (while shiny requires R, and as a result one has to find a server, maintain the necessary libraries, …).

It doesn’t seem like much: just a short layer of code that makes it easier to embed JavaScript in an HTML site created from R. On the other hand, the popularity of the shiny package has been soaring recently. In tandem with Docker it forms a brilliant tool for easy creation of complex websites for data exploration. Yet it has one huge drawback: its reliance on an R server for interactivity. This requires a lot of resources, may fail when the server has a ‘weak moment’, and while useful for prototyping it is risky in production (unless we have some scalable backend on AWS or another infinite machine). htmlwidgets removes the need for R on the server side: objects from R are rendered into JavaScript. It offers practically free access to a growing base of libraries, including almost everything based on D3.js. For instance, the following two lines of code in R generate a picture which can be published on a blog (the figure is interactive, feel free to rotate/zoom/flip).