What color car is the fastest?
RECOMB 2015, a conference devoted to computational molecular biology (with emphasis on computational), came to an end yesterday. Many interesting papers were presented, yet this post was inspired by a conversation that I had the pleasure of having during a dinner break.
We were discussing statistical relationships; one thought led to another until we reached a hypothesis that the fastest cars are red (obviously we did not mean that color has any influence on speed but rather we entertained the idea that the relation results from color and engine power-related preferences of car owners). I had my doubts but my interlocutor was rather strongly convinced of the truth of the hypothesis. As would be expected of men believing in data, we decided to explore this issue.
A package called PogromcyDanych includes a data set, auta2012, with information about around 200 thousand car sale offers published in 2012. Most offers specify the color and the engine power of the car.
All right, let us then check what color cars have the highest engine power.
We will calculate the median engine power within each color group.
library(PogromcyDanych)
library(dplyr)

# setLang() converts the Polish variable names to English ones
setLang()

# median engine power (HP) per color; keep the color with the highest median
auta2012 %>%
  group_by(Color) %>%
  summarise(mKM = median(HP, na.rm = TRUE)) %>%
  arrange(desc(mKM)) %>%
  slice(1)
Hmm, white?
I did not associate this color with high speed.
Maybe we are making a mistake by comparing the median of each color? The median describes a typical car in the group. Maybe it would be better to check how powerful the most powerful cars of each color are instead?
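To see why the two summaries can point to different colors, here is a minimal sketch on made-up numbers (the `cars` data frame and its values are invented for illustration and have nothing to do with auta2012). One group is mostly weak cars plus a few very strong ones, the other is uniformly moderate.

```r
# Made-up engine powers (HP) for two hypothetical color groups.
cars <- data.frame(
  Color = rep(c("white", "red"), each = 10),
  HP = c(60, 60, 70, 70, 75, 75, 80, 90, 300, 400,          # white: mostly weak, two very strong
         100, 100, 105, 105, 110, 110, 115, 115, 120, 120)  # red: uniformly moderate
)

# Median and 90th percentile of HP within each color.
med <- tapply(cars$HP, cars$Color, median)
q90 <- tapply(cars$HP, cars$Color, quantile, probs = 0.9)

# Color with the highest value of each summary.
names(which.max(med))  # "red"
names(which.max(q90))  # "white"
```

In this toy example the median favors the uniformly moderate group, while the 90th percentile picks up the handful of very powerful cars in the other group.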
Let’s see what the result will be if we choose the 90th percentile instead of the median.
# 90th percentile of engine power (HP) per color; keep the top color
auta2012 %>%
  group_by(Color) %>%
  summarise(mKM = quantile(HP, 0.9, na.rm = TRUE)) %>%
  arrange(desc(mKM)) %>%
  slice(1)
White metallic?
It seems that the greatest number of cars with a very high engine power can be found in the group of white cars.
But the other thing is that many of the fast white cars are also relatively old. Maybe we should analyze old and new cars separately?
We will then find the fastest color for each production year and check which color comes out on top most often.
# Note: median_KM holds the 90th percentile of HP, not the median.
auta2012 %>%
  filter(Color != "") %>%
  group_by(Color, Year) %>%
  summarise(median_KM = quantile(HP, 0.9, na.rm = TRUE), count = n()) %>%
  filter(count > 20) %>%                 # skip color/year groups with few offers
  group_by(Year) %>%
  arrange(desc(median_KM)) %>%
  slice(1) %>%                           # top color within each year
  ungroup() %>%
  arrange(desc(Year)) %>%
  as.data.frame()
##                Color Year median_KM count
## 1              black 2012     396.0    22
## 2     white-metallic 2011     317.2   411
## 3               grey 2010     435.0    58
## 4     white-metallic 2009     290.8   139
## 5               grey 2008     321.8    59
## 6               grey 2007     314.5    50
## 7     white-metallic 2006     279.0    73
## 8     white-metallic 2005     311.4    68
## 9              black 2004     232.7   343
## 10             black 2003     220.0   299
## 11    brown-metallic 2002     199.0    51
## 12    brown-metallic 2001     240.4    43
## 13    black-metallic 2000     194.7  1058
## 14    brown-metallic 1999     214.4    39
## 15    brown-metallic 1998     203.0    29
## 16 graphite-metallic 1997     194.4   156
## 17    black-metallic 1996     187.5   251
## 18    black-metallic 1995     218.0   188
## 19     navy-metallic 1994     220.0    27
## 20   silver-metallic 1993     231.0    88
## 21    black-metallic 1992     217.3   102
## 22    black-metallic 1991     218.2    64
## 23    black-metallic 1990     310.0    44
## 24    black-metallic 1989     203.4    21
## 25             black 1988     246.0    23
## 26             white 1987     136.0    21
The list presented above is dominated by three colors: black, white and grey. Red is completely absent.
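One way to check which color wins most often is to count the entries of the Color column in the per-year table. The sketch below is not part of the original analysis: the vector `winners` is typed in by hand from that output (years 2012 down to 1987), and merging metallic and plain shades is an assumption about how the colors should be grouped.

```r
# Winning color for each year, 2012 down to 1987, copied from the table above.
winners <- c("black", "white-metallic", "grey", "white-metallic", "grey", "grey",
             "white-metallic", "white-metallic", "black", "black",
             "brown-metallic", "brown-metallic", "black-metallic",
             "brown-metallic", "brown-metallic", "graphite-metallic",
             "black-metallic", "black-metallic", "navy-metallic",
             "silver-metallic", "black-metallic", "black-metallic",
             "black-metallic", "black-metallic", "black", "white")

# Number of years won by each color, most frequent first.
wins <- sort(table(winners), decreasing = TRUE)
print(wins)

# Same tally with the "-metallic" suffix dropped, so that metallic and
# plain variants of a color are counted together (an assumption).
base_wins <- sort(table(sub("-metallic$", "", winners)), decreasing = TRUE)
print(base_wins)
```

On this table, black-metallic wins 7 of the 26 years, and black in either variant wins 11; no shade of red appears at all.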
But let us also check which color is the most popular among cars with engine power exceeding 500 HP.
# which() drops offers with missing HP before counting colors
auta2012[which(auta2012$HP > 500), "Color"] %>%
  table() %>%
  sort() %>%
  tail(1)
Black metallic.
Looking at the data, we have to admit that the cars with the most powerful engines are not red at all, but rather black, grey or white.
Przemyslaw Biecek