Transformations of variables, scales and coordinates in ggplot2
I am working on a short introduction to the Grammar of Graphics and its implementation in the ggplot2 package. Systematizing the elements of the syntax reveals various ‘spices’ of ggplot2, and today I will talk about one of them: applying transformations to charts.
Let us start with a chart without any transformations at all. As an example we’ll use the famous iris dataset. On our diagram we’ll draw points (geom_point) and the linear regression line (geom_smooth).
library(ggplot2)
# Hook:
# archivist::aread("pbiecek/Eseje/arepo/0fc9e4e43559336a44598117911f2e4f")
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  geom_smooth(se=FALSE, method="lm") +
  theme_bw()
Let us check what this relationship looks like after a logarithmic transformation. The ggplot2 package allows for transformations at three levels: of the variables, of the scale and of the coordinate system. In a moment we will see how these transformations differ and how to perform them.
Transformations of the variables
A logarithmic transformation of the variables can be performed either inside the aes() function, when defining the mapping, or outside the ggplot() function, by transforming the data beforehand. Below you can see an example of the first possibility. The chart displays the relation between the log-length of sepals and the log-length of petals. The axes show log-values, and the linear trend is fitted to the log-transformed values.
# Hook:
# archivist::aread("pbiecek/Eseje/arepo/b3c778ffbff4e9256ddc94cb423bc58a")
ggplot(iris, aes(log10(Sepal.Length), log10(Petal.Length))) +
  geom_point() +
  geom_smooth(se=FALSE, method="lm") +
  theme_bw()
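The second possibility, transforming the data outside ggplot(), could look roughly like the sketch below. The names iris_log, log_sepal and log_petal are invented for this illustration and are not part of the original example.

```r
library(ggplot2)

# Assumption: iris_log, log_sepal and log_petal are hypothetical names
# introduced only for this sketch.
iris_log <- transform(iris,
                      log_sepal = log10(Sepal.Length),
                      log_petal = log10(Petal.Length))

# The plot now maps the pre-computed columns directly.
p_var <- ggplot(iris_log, aes(log_sepal, log_petal)) +
  geom_point() +
  geom_smooth(se = FALSE, method = "lm", formula = y ~ x) +
  theme_bw()
p_var
```

The chart should match the aes() version; only the axis titles differ, because they are taken from the column names.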
Transformations of the coordinate system
The second option is a transformation of the coordinate system. We replace the traditional axes with logarithmic ones and then present the values and statistics (such as the linear trend) from the original chart in the new coordinate system. In the new coordinate system the linear trend may not look linear at all, as you may notice in the example below. Statistics based on the data are calculated before the axes are transformed.
You may use the function coord_trans to transform the coordinate system.
# Hook:
# archivist::aread("pbiecek/Eseje/arepo/3d12e66a581545ac99cb0d5e273487a5")
# Note: recent ggplot2 releases use the arguments x and y here;
# the older names xtrans and ytrans were deprecated.
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  coord_trans(y="log", x="log") +
  geom_smooth(se=FALSE, method="lm") +
  theme_bw()
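One way to convince yourself that the trend is computed before the axes are transformed is to pull the computed trend out of the plot and compare it with an ordinary lm() fit on the raw variables. This is only a sketch: it assumes a ggplot2 release in which coord_trans() takes the arguments x and y and in which layer_data() is available. The object names p_coord, trend_coord, fit_raw and pred_raw are made up for the illustration.

```r
library(ggplot2)

p_coord <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  coord_trans(x = "log", y = "log") +
  geom_smooth(se = FALSE, method = "lm", formula = y ~ x)

# layer_data() returns the data computed for one layer;
# layer 2 is geom_smooth (layer 1 is geom_point).
trend_coord <- layer_data(p_coord, 2)

# Reference fit on the untransformed variables.
fit_raw  <- lm(Petal.Length ~ Sepal.Length, data = iris)
pred_raw <- predict(fit_raw,
                    newdata = data.frame(Sepal.Length = trend_coord$x))

# Largest gap between the plotted trend and the raw-scale fit.
max(abs(trend_coord$y - pred_raw))
```

If the gap is numerically zero, the trend drawn on the chart is the straight line fitted on the original scale, merely displayed on logarithmic axes.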
Transformations of the scale
The third possibility is a transformation of the scale. The points on the chart and its axes look the same as in the case of a coordinate system transformation.
What is different is that the statistics are calculated after the transformation. This means that in the example below the linear trend was fitted to the log-transformed data, so it looks like a straight line (and corresponds to a multiplicative trend on the original scale).
Scale transformations are performed with the functions that describe the scales, for example scale_y_log10.
# Hook:
# archivist::aread("pbiecek/Eseje/arepo/2500b1e21379508414d41ca88d93c6bb")
ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  scale_y_log10(breaks=1:10) +
  scale_x_log10(breaks=1:10) +
  geom_smooth(se=FALSE, method="lm") +
  theme_bw()
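The same kind of check can be sketched for the scale transformation. Here the computed trend is expected to agree with a linear model fitted to the log10 values, because the statistic runs after the scale has transformed the data. The names p_scale, trend_scale, logged, fit_log and pred_log are hypothetical helpers for this illustration.

```r
library(ggplot2)

p_scale <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  geom_smooth(se = FALSE, method = "lm", formula = y ~ x)

# For a transformed scale, the computed layer data is expressed
# in transformed (log10) units.
trend_scale <- layer_data(p_scale, 2)

# Reference fit on the log10-transformed variables.
logged  <- data.frame(lx = log10(iris$Sepal.Length),
                      ly = log10(iris$Petal.Length))
fit_log  <- lm(ly ~ lx, data = logged)
pred_log <- predict(fit_log, newdata = data.frame(lx = trend_scale$x))

# Largest gap between the plotted trend and the log-scale fit.
max(abs(trend_scale$y - pred_log))
```

With intercept a and slope b taken from coef(fit_log), the fitted line reads Petal.Length ≈ 10^a * Sepal.Length^b on the original scale, which is the multiplicative trend mentioned above.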
Three different approaches. Is it too much?
If not, note that you can combine them together ;-)
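As a small illustration of combining approaches, the sketch below transforms one variable inside aes() and the other through its scale. The choice of which axis gets which treatment is arbitrary, and p_mix, trend_mix, fit_mix and pred_mix are names invented for the example.

```r
library(ggplot2)

# x: variable transformation in aes(); y: scale transformation.
p_mix <- ggplot(iris, aes(log10(Sepal.Length), Petal.Length)) +
  geom_point() +
  scale_y_log10() +
  geom_smooth(se = FALSE, method = "lm", formula = y ~ x) +
  theme_bw()
p_mix

# Both transformations take effect before the statistic is computed,
# so the trend should agree with a fit on the log10 values.
trend_mix <- layer_data(p_mix, 2)
fit_mix   <- lm(log10(Petal.Length) ~ log10(Sepal.Length), data = iris)
pred_mix  <- predict(fit_mix,
                     newdata = data.frame(Sepal.Length = 10^trend_mix$x))
max(abs(trend_mix$y - pred_mix))
```

In this combination the x axis is labelled with log-values, while the y axis keeps the original units on a logarithmic scale.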