Sapkowski, Dukaj and the wikipediatrend package

Recently I tested a quite nice package for R: wikipediatrend (available on CRAN). With just a few lines of code, it can download and visualize daily Wikipedia page view statistics. It is a great package, so let’s take a closer look.

I’ve just finished Season of Storms (Andrzej Sapkowski, part of The Witcher saga) and The Old Axolotl (Jacek Dukaj). Let’s see if there is any relation between page view statistics and publication dates for these two books.

First, download the data. With the function wp_trend it’s enough to set the page names (here: The Witcher and Axolotl), the languages (here: Polish and English) and the time period (let’s see what has happened since the beginning of 2013). Then use the ggplot2 package to plot the data. Note that you can load the figure presented below directly into R with the hook: archivist::aread("pbiecek/graphGallery/25fbc8bc66bbf02fe66b7715ff53b083").
library(wikipediatrend)
wp = wp_trend(page = c("Aksolotl","Axolotl","The_Witcher", "Wiedzmin"),
               from = "2013-01-01",
               to   = Sys.Date(),  # Sys.Date() is base R, so no extra package is needed
               lang = c("pl","en","en","pl"))
head(wp)
##   date       count lang page     rank month  title
## 1 2013-08-26 70    pl   Aksolotl  -1  201308 Aksolotl
## 2 2013-08-27 74    pl   Aksolotl  -1  201308 Aksolotl
## 3 2013-08-28 69    pl   Aksolotl  -1  201308 Aksolotl
## 4 2013-08-19 83    pl   Aksolotl  -1  201308 Aksolotl
## 5 2013-08-18 71    pl   Aksolotl  -1  201308 Aksolotl
## 6 2013-08-31 87    pl   Aksolotl  -1  201308 Aksolotl
library(ggplot2)
# note that the y axis is sqrt transformed
ggplot(wp, aes(date, count, group=page, color = page)) +
  geom_point(alpha=0.5) +
  geom_smooth(size=1.5, se=FALSE, span=0.1) +
  theme_bw() + scale_y_sqrt(limits=c(0,20000)) +
  facet_grid(lang~.)
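The comment in the plotting code notes that the y axis is square-root transformed. The likely reason is that daily counts range from under a hundred to many thousands. As a rough illustration with made-up numbers (not taken from the downloaded data), this base R snippet shows how much the transform compresses that range:

```r
# Hypothetical daily page view counts: a quiet page and a busy one
counts <- c(quiet = 70, busy = 20000)

# On a linear axis the busy value is about 286 times higher than the quiet one
linear_ratio <- unname(counts["busy"] / counts["quiet"])

# After the square-root transform the gap shrinks to about 17 times,
# so both series stay readable on the same panel
sqrt_ratio <- unname(sqrt(counts["busy"]) / sqrt(counts["quiet"]))

round(c(linear = linear_ratio, sqrt = sqrt_ratio), 1)
```

One thing to keep in mind: limits set on a ggplot2 scale censor values outside the range by default, so any day with more than 20000 views would be dropped from the plot above rather than clipped.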
[Figure: page views over time for the four pages, faceted by language]

A few important dates may help to understand this figure. Season of Storms was published on Nov 6, 2013, and you can notice a small bump around this date. In 2014 it was translated into a few languages. But clearly the largest impact on page view statistics comes from the release of the computer game The Witcher 3: Wild Hunt (release: May 19, 2015). The Old Axolotl was published in 2015, yet it is hard to see any larger change in page view statistics.

What about the statistics for the Sapkowski and Dukaj Wikipedia pages? Again, it takes just two lines of R code to download and plot the required data. In the figure below it is easier to spot bumps around the publication dates for both books and the game.
wp = wp_trend(page = c("Andrzej_Sapkowski","Andrzej_Sapkowski","Jacek_Dukaj", "Jacek_Dukaj"),
               from = "2013-01-01",
               to   = Sys.Date(),  # Sys.Date() is base R, so no extra package is needed
               lang = c("pl","en","pl","en"))
head(wp)
##   date       count lang page              rank month  title
## 1 2013-01-12 464   pl   Andrzej_Sapkowski 1366 201301 Andrzej_Sapkowski
## 2 2013-01-13 538   pl   Andrzej_Sapkowski 1366 201301 Andrzej_Sapkowski
## 3 2013-01-10 536   pl   Andrzej_Sapkowski 1366 201301 Andrzej_Sapkowski
## 4 2013-01-11 457   pl   Andrzej_Sapkowski 1366 201301 Andrzej_Sapkowski
## 5 2013-01-16 540   pl   Andrzej_Sapkowski 1366 201301 Andrzej_Sapkowski
## 6 2013-01-17 541   pl   Andrzej_Sapkowski 1366 201301 Andrzej_Sapkowski
ggplot(wp, aes(date, count, group=page, color = page)) +
  geom_point(alpha=0.5) +
  geom_smooth(size=1.5, se=FALSE, span=0.1) +
  theme_bw() + scale_y_sqrt() +
  facet_grid(lang~.)
[Figure: page views over time for the Andrzej Sapkowski and Jacek Dukaj pages, faceted by language]
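The bumps discussed above are judged by eye. Here is a minimal sketch of a numeric check, assuming a data frame with the date and count columns shown in the head(wp) output: compare the mean daily views in a window after an event with the mean in a window of the same length before it. The helper bump_ratio and the synthetic data are hypothetical illustrations, not part of wikipediatrend.

```r
# Hypothetical helper (not part of wikipediatrend): ratio of mean daily views
# in the `window` days starting at `event` to the mean in the `window` days before it
bump_ratio <- function(df, event, window = 14) {
  event  <- as.Date(event)
  dates  <- as.Date(df$date)
  before <- df$count[dates >= event - window & dates < event]
  after  <- df$count[dates >= event & dates < event + window]
  mean(after) / mean(before)
}

# Synthetic stand-in for one page of wp_trend() output
set.seed(1)
days <- seq(as.Date("2013-10-01"), as.Date("2013-12-01"), by = "day")
fake <- data.frame(date = days, count = rpois(length(days), lambda = 500))

# Inject an artificial bump: views double for two weeks from Nov 6, 2013
bumped <- days >= as.Date("2013-11-06") & days < as.Date("2013-11-20")
fake$count[bumped] <- fake$count[bumped] * 2

bump_ratio(fake, "2013-11-06")  # a value near 2 indicates a doubling
```

With real data, the same call could be applied to one page and language at a time, for example to wp[wp$page == "Andrzej_Sapkowski" & wp$lang == "pl", ]. A ratio close to 1 would suggest no visible bump around that date.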