Two days ago Hadley Wickham tweeted a link to an introduction to his new package, multidplyr. Basically, it’s a tool for taking advantage of multiple cores in dplyr operations. Let’s see how to play with it.
When you execute an action on an RDD, Apache Spark runs a job that in turn triggers tasks, using DAGScheduler and TaskScheduler, respectively. These are low-level details, but they are often useful to understand when a simple transformation is no longer simple performance-wise and takes ages to complete.
The lives of brave firemen are threatened during dangerous emergency missions while they try to save other people and their property. In this post I would like to share my experiences and winning strategy for the AAIA’15 Data Mining Competition: Tagging Firefighter Activities at a Fire Scene, in which I took first place.
The 7th term of the Sejm has come to an end. It would be nice to see how the Members of the Polish Parliament voted over these last 4 years! In total they took part in over 6000 votes. Did representatives of the same clubs vote more similarly to each other? Did Members who changed clubs vote differently from the Members of their former clubs? Let’s see!
Some time ago our herd grew by one guinea pig called Hugo. It turns out that having a pet at home is a great pretext for discussing the concepts of randomness, distribution functions and distributions in general with children.
Children bring strange homework assignments home from school, for example the question: What is your dad’s job similar to? After several attempts (a cosmonaut, a Formula 1 driver, a firefighter) it turns out that the work of a statistician is very similar to the work of a shoemaker. Why?
I faced an interesting problem last week. Playing with data from The Cancer Genome Atlas (full genetic and clinical data for thousands of patients), I was building a classifier that predicts the type of cancer based on sets of genetic signatures.
I was looking for biplots created with the ggplot2 library (because they look good and are customisable).
It turns out that there are some nice solutions for PCA (like sinhrks/ggfortify, kassambara/factoextra, vqv/ggbiplot and fawda123/ggord), but I could not find a suitable solution for correspondence analysis.
So I created one.
1st September was just a few days ago. After the reform ‘lowering the age at which children start their school education’, the second group of 6- and 7-year-old children started attending first-year classes. And since we are in ‘pre-election’ mode, there are some voices calling for a reform reestablishing the previous age for starting school education.
What is the difference between these 2 images? The one on the left has no signs of diabetic retinopathy, while the other one has severe signs of it. If you are not a trained clinician, chances are you will find it quite hard to correctly identify the signs of this disease.