Biplots, correspondence analysis and ggplot2

I was looking for biplots created with the ggplot2 library (because they look good and are easy to customise). It turns out that there are some nice solutions for PCA (like sinhrks/ggfortify, kassambara/factoextra, vqv/ggbiplot and fawda123/ggord), but I could not find a suitable solution for correspondence analysis. So I created one. It’s available in the pbiecek/ggplotit package and works with the results of both the CA{FactoMineR} and ca{ca} functions. You will find the source of the function below, but let’s start with an example.

I’m going to use data about car sale offers from the PogromcyDanych package. What is the relation between a car brand and its type of fuel? Can you guess in which brands diesel (labelled "Oil" in the plot) is more common than gas?

[Figure: correspondence analysis biplot of car brands and fuel types]

Let’s see. Porsche, Mini and Smart: these brands offer mostly gas-only cars. Daewoo, Dodge: here you will find LPG-fuelled cars. LandRover, Audi, Volkswagen: here diesel is most common.

This example was created by these lines:
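library(PogromcyDanych)
library(ggplotit)
library(FactoMineR)
# contingency table: car brand x fuel type
tab = table(auta2012$Marka, auta2012$Rodzaj.paliwa)
# keep brands with more than 300 offers and three fuel types (columns 1, 2 and 6)
tab = tab[rowSums(tab) > 300, c(1, 2, 6)]
# correspondence analysis (graph = FALSE skips FactoMineR's default plot)
obj = CA(tab, graph = FALSE)
# arrows for the columns (fuel types) only; custom labels for rows and columns
ggplotit(obj, arrows = c(FALSE, TRUE),
         names = list(rownames(tab), c("Gas", "Gas+LPG", "Oil")))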
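To see which numbers the biplot actually shows, here is a small sketch of correspondence analysis computed by hand in base R. It uses the built-in HairEyeColor data instead of the car offers (so it runs without PogromcyDanych), and the variable names are my own. It follows the textbook recipe: take the singular value decomposition of the standardized residuals of the contingency table, then rescale the singular vectors by the row and column masses. As far as I know, the resulting principal coordinates should match CA(N, graph = FALSE)$row$coord and $col$coord from FactoMineR up to the sign of each dimension.

```r
# Correspondence analysis "by hand": SVD of the standardized residuals.
# Uses the built-in HairEyeColor data (hair colour x eye colour counts).
N = margin.table(HairEyeColor, c(1, 2))
P = N / sum(N)                 # correspondence matrix
r = rowSums(P)                 # row masses
cm = colSums(P)                # column masses
# standardized residuals D_r^(-1/2) (P - r c') D_c^(-1/2)
S = diag(1 / sqrt(r)) %*% (P - r %o% cm) %*% diag(1 / sqrt(cm))
dec = svd(S)
k = min(dim(N)) - 1            # number of non-trivial dimensions
# principal coordinates (the sign of each dimension is arbitrary)
row_coord = diag(1 / sqrt(r)) %*% dec$u[, 1:k] %*% diag(dec$d[1:k], k)
col_coord = diag(1 / sqrt(cm)) %*% dec$v[, 1:k] %*% diag(dec$d[1:k], k)
rownames(row_coord) = rownames(N)
rownames(col_coord) = colnames(N)
eig = dec$d[1:k]^2             # principal inertias (eigenvalues)
round(row_coord[, 1:2], 3)
```

The first two columns of row_coord and col_coord play the role of the row and column coordinate matrices that ggplotit draws as blue and red labels, and the eigenvalues in eig add up to the total inertia (the chi-square statistic divided by the number of observations).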
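And here is the full source of the function. As written, it reads the row and column coordinates from the x$row$coord and x$col$coord components of a CA{FactoMineR} result: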
library(ggplot2)

ggplotit = function(x, arrows = c(FALSE, FALSE), names = NULL, ...) {
  # arrows: draw arrows from the origin to the row / column points
  # names:  optional list of two label vectors (row labels, column labels)
  stopifnot(length(arrows) == 2)
  stopifnot(is.null(names) || length(names) == 2)
  # first two dimensions of the row and column coordinates
  X = as.data.frame(x$row$coord[, 1:2])
  Y = as.data.frame(x$col$coord[, 1:2])
  if (!is.null(names)) {
    X$Names = names[[1]]
    Y$Names = names[[2]]
  } else {
    X$Names = rownames(x$row$coord)
    Y$Names = rownames(x$col$coord)
  }
  colnames(X) = c("x.Dim1", "x.Dim2", "x.Names")
  colnames(Y) = c("y.Dim1", "y.Dim2", "y.Names")
  # rows in blue, columns in red, axes through the origin
  pl = ggplot() +
    geom_text(data = X, aes(x.Dim1, x.Dim2, label = x.Names),
              color = "blue", size = 3) +
    geom_text(data = Y, aes(y.Dim1, y.Dim2, label = y.Names),
              color = "red", size = 3) +
    geom_hline(yintercept = 0, alpha = 0.5) +
    geom_vline(xintercept = 0, alpha = 0.5) +
    theme_bw()
  if (arrows[1]) {
    pl = pl + geom_segment(data = X, aes(x = 0, xend = x.Dim1,
                                         y = 0, yend = x.Dim2),
                           color = "blue", arrow = arrow(angle = 15))
  }
  if (arrows[2]) {
    pl = pl + geom_segment(data = Y, aes(x = 0, xend = y.Dim1,
                                         y = 0, yend = y.Dim2),
                           color = "red", arrow = arrow(angle = 15))
  }
  pl
}
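If you prefer a legend to hard-coded colours, one possible variation is sketched below. It is a hypothetical helper (ca_biplot_long is my own name, not part of ggplotit) that stacks the row and column coordinates into one long data frame and maps the point type to colour, so ggplot2 builds the legend itself. Like the function above, it assumes FactoMineR-style x$row$coord and x$col$coord matrices.

```r
library(ggplot2)

# Hypothetical variation: one long data frame, colour mapped to point type.
# `x` is assumed to have FactoMineR-style x$row$coord and x$col$coord.
ca_biplot_long = function(x, dims = c(1, 2)) {
  rows = x$row$coord[, dims, drop = FALSE]
  cols = x$col$coord[, dims, drop = FALSE]
  # unname() avoids duplicated row names when a label occurs on both sides
  df = data.frame(
    Dim1  = unname(c(rows[, 1], cols[, 1])),
    Dim2  = unname(c(rows[, 2], cols[, 2])),
    label = c(rownames(rows), rownames(cols)),
    type  = rep(c("row", "column"), c(nrow(rows), nrow(cols)))
  )
  ggplot(df, aes(Dim1, Dim2, label = label, color = type)) +
    geom_hline(yintercept = 0, alpha = 0.5) +
    geom_vline(xintercept = 0, alpha = 0.5) +
    geom_text(size = 3) +
    scale_color_manual(values = c(row = "blue", column = "red")) +
    labs(x = paste("Dim", dims[1]), y = paste("Dim", dims[2])) +
    theme_bw()
}
```

For example, ca_biplot_long(CA(tab, graph = FALSE)) should draw the car data from the example with a legend for rows and columns. Keeping everything in one data frame also makes it easy to filter or facet the labels before plotting.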