Last week we tried to find out what color the cars with the highest engine power tend to be. It turned out that black and black metallic are the most popular colors among the fastest cars. Yet engine power is not everything. We can still explore the relation between color and brand.
Our data set, auta2012, includes as many as 37 colors and 106 makes. How can we present the relations between so many of them in a readable way?
We will use correspondence analysis, which is available in R through the ca() function from the ca package.
Correspondence analysis places the profiles of the rows (colors) and the columns (brands) of the contingency table in one common space (two-dimensional in this case). A color lying close to a brand suggests that this color is relatively more popular among that brand's offers for sale than among offers overall.
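To make "profiles" concrete, here is a minimal base-R sketch. The counts in `tab` are made up for illustration (they are not taken from auta2012) and the brand names are placeholders. A row profile is a color's distribution over brands; its chi-square distance from the average profile measures how atypical that color is. The mass-weighted sum of squared distances, called the total inertia, equals the chi-square statistic divided by the number of offers.

```r
# Hypothetical counts for illustration only (NOT from auta2012)
tab <- matrix(c(120, 30, 60, 40,
                 80, 20, 70, 30,
                 40, 90, 50, 60,
                 10, 60, 40, 20,
                 15, 35, 25, 70),
              nrow = 5, byrow = TRUE,
              dimnames = list(Color = c("black", "black-metallic", "silver", "red", "blue"),
                              Brand = c("brand_A", "brand_B", "brand_C", "brand_D")))

row_mass <- rowSums(tab) / sum(tab)   # weight of each color
col_mass <- colSums(tab) / sum(tab)   # average row profile (overall brand shares)
row_prof <- prop.table(tab, 1)        # each color's distribution over brands

# Chi-square distance of each color profile from the average profile
dev      <- sweep(row_prof, 2, col_mass)                 # profile minus average
chi_dist <- sqrt(rowSums(sweep(dev^2, 2, col_mass, "/")))

# Total inertia: mass-weighted squared distances
total_inertia <- sum(row_mass * chi_dist^2)
print(round(chi_dist, 3))
print(total_inertia)
```

Colors with a large `chi_dist` deviate most from the average brand mix, so they tend to end up farther from the origin of the map.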
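```r
library(PogromcyDanych)
library(ca)

# translate variable names into English
setLang()

# contingency table: colors in rows, brands in columns
contingency <- table(auta2012$Color, auta2012$Brand)

# keep only colors and brands with more than 500 offers
tab <- contingency[rowSums(contingency) > 500, colSums(contingency) > 500]

# arrows = c(TRUE, FALSE): arrows for rows (colors), points for columns (brands)
plot(ca(tab), arrows = c(TRUE, FALSE))
```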
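A 2D map shows only part of the structure, so it is worth checking how much it retains. The sketch below again uses made-up counts and a hypothetical helper, `ca_sketch()`, written in base R (it is not the ca package's own code). It computes correspondence analysis as a singular value decomposition of the standardized residuals and reports the share of total inertia carried by each dimension.

```r
# Hypothetical counts for illustration only (NOT from auta2012)
tab <- matrix(c(120, 30, 60, 40,
                 80, 20, 70, 30,
                 40, 90, 50, 60,
                 10, 60, 40, 20,
                 15, 35, 25, 70),
              nrow = 5, byrow = TRUE,
              dimnames = list(Color = c("black", "black-metallic", "silver", "red", "blue"),
                              Brand = c("brand_A", "brand_B", "brand_C", "brand_D")))

# Minimal CA via SVD of standardized residuals (a sketch, not the ca package)
ca_sketch <- function(tab) {
  P  <- tab / sum(tab)
  r  <- rowSums(P)                               # row masses
  cm <- colSums(P)                               # column masses
  S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))  # standardized residuals
  dec <- svd(S)
  k   <- min(dim(tab)) - 1                       # number of non-trivial dimensions
  sv  <- dec$d[seq_len(k)]
  list(
    sv    = sv,
    share = sv^2 / sum(sv^2),                    # share of total inertia per dimension
    # principal coordinates: D^(-1/2) %*% singular vectors %*% diag(sv)
    rows  = sweep(dec$u[, seq_len(k), drop = FALSE], 1, sqrt(r), "/") %*% diag(sv, nrow = k),
    cols  = sweep(dec$v[, seq_len(k), drop = FALSE], 1, sqrt(cm), "/") %*% diag(sv, nrow = k)
  )
}

fit <- ca_sketch(tab)
print(round(fit$share, 3))
print(sum(fit$share[1:2]))   # fraction of inertia visible in a 2D map
```

On the real table, `summary(ca(tab))` from the ca package should report the same kind of percentages as principal inertias. The signs of the axes are arbitrary, so a mirrored map is an equally valid solution.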
The most interesting direction is the one set out by black and black metallic. As the plot shows, these colors are used more frequently for Lexus and Porsche cars than for other, more colorful brands.
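A reading of the map can be cross-checked against the counts themselves. The hypothetical helper `over_rep()` below divides a color's share within each brand by its share among all offers; values above 1 mean the color is over-represented for that brand, which is roughly what draws a color and a brand together on the map. The counts are made up. With the real data one could call `over_rep(tab, "black")` on the filtered table built earlier, assuming the label for black in auta2012 is exactly `"black"` (check `rownames(tab)` first).

```r
# Hypothetical counts for illustration only (NOT from auta2012)
tab <- matrix(c(120, 30, 60, 40,
                 80, 20, 70, 30,
                 40, 90, 50, 60,
                 10, 60, 40, 20,
                 15, 35, 25, 70),
              nrow = 5, byrow = TRUE,
              dimnames = list(Color = c("black", "black-metallic", "silver", "red", "blue"),
                              Brand = c("brand_A", "brand_B", "brand_C", "brand_D")))

# Ratio of a color's share within each brand to its share among all offers
over_rep <- function(tab, color) {
  share_in_brand <- prop.table(tab, 2)[color, ]  # share of the color within each brand
  overall        <- sum(tab[color, ]) / sum(tab) # share of the color across all offers
  sort(share_in_brand / overall, decreasing = TRUE)
}

res <- over_rep(tab, "black")
print(round(res, 2))
```

Unlike the map, these ratios are exact for a single color, but they do not show how all colors and brands relate at once.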
You may find more information on correspondence analysis at http://www.jstatsoft.org/v20/i03/paper.