Sapkowski, Dukaj and the wikipediatrend package
Recently I tested quite a nice package for R: wikipediatrend (available on CRAN). With just a few lines of code, it can download and visualize daily Wikipedia page view statistics. It is a great package, so we are going to take a closer look. I’ve just finished Season of Storms (Andrzej Sapkowski, part of The Witcher saga) and The Old Axolotl (Jacek Dukaj). Let’s see if there is any relation between page view statistics and publication dates for these two books.
First: download the data. With the function wp_trend it’s enough to set page names (here: The Witcher and Axolotl), languages (here: Polish and English) and the time period (let’s see what has happened since the beginning of 2013).
Then: use the ggplot2 package to plot the data. Note that you can load the figure presented below directly into R with the hook: archivist::aread("pbiecek/graphGallery/25fbc8bc66bbf02fe66b7715ff53b083").
library(wikipediatrend)
# Sys.Date() is the base R way to get today's date
wp = wp_trend(page = c("Aksolotl","Axolotl","The_Witcher", "Wiedzmin"),
              from = "2013-01-01", to = Sys.Date(),
              lang = c("pl","en","en","pl"))
head(wp)
##         date count lang     page rank  month    title
## 1 2013-08-26    70   pl Aksolotl   -1 201308 Aksolotl
## 2 2013-08-27    74   pl Aksolotl   -1 201308 Aksolotl
## 3 2013-08-28    69   pl Aksolotl   -1 201308 Aksolotl
## 4 2013-08-19    83   pl Aksolotl   -1 201308 Aksolotl
## 5 2013-08-18    71   pl Aksolotl   -1 201308 Aksolotl
## 6 2013-08-31    87   pl Aksolotl   -1 201308 Aksolotl

library(ggplot2)
# note that the y axis is sqrt transformed
ggplot(wp, aes(date, count, group=page, color = page)) +
  geom_point(alpha=0.5) +
  geom_smooth(size=1.5, se=FALSE, span=0.1) +
  theme_bw() +
  scale_y_sqrt(limits=c(0,20000)) +
  facet_grid(lang~.)
A few important dates may help to understand this figure. Season of Storms was published on Nov 6, 2013, and you can notice a small bump around this date. In 2014 it was translated into a few languages. But the largest impact on page view statistics clearly comes from the release of the computer game The Witcher 3: Wild Hunt (released May 19, 2015). The Old Axolotl was published in 2015, yet it is hard to see any larger change in page view statistics.
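To make these dates easier to locate in the figure, one option is to draw them as vertical lines. The snippet below is only a sketch: the events data frame and the add_event_lines helper are hypothetical names introduced here, the two dates are the ones quoted above, and it assumes that the date column returned by wp_trend is of class Date.

```r
library(ggplot2)

# Dates quoted in the text; the labels are for reference only.
events <- data.frame(
  date  = as.Date(c("2013-11-06", "2015-05-19")),
  label = c("Season of Storms", "The Witcher 3: Wild Hunt")
)

# Hypothetical helper: adds a dashed vertical line at each event date.
# geom_vline() does not inherit the plot-level aesthetics, so the lines
# are repeated in every facet of the plot.
add_event_lines <- function(p, events) {
  p + geom_vline(data = events, aes(xintercept = date),
                 linetype = "dashed", colour = "grey40")
}

# Usage, assuming `wp` was downloaded with wp_trend():
# p <- ggplot(wp, aes(date, count, group = page, color = page)) +
#   geom_point(alpha = 0.5) + scale_y_sqrt() + facet_grid(lang ~ .)
# add_event_lines(p, events)
```

Because events has no lang column, the same lines show up in both the Polish and the English panel.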
What about statistics for the Wikipedia pages of Sapkowski and Dukaj?
Again, it takes just two lines of R code to download and plot the required data.
In the figure below it is easier to spot bumps around publication dates for both books and the game.
# Sys.Date() is the base R way to get today's date
wp = wp_trend(page = c("Andrzej_Sapkowski","Andrzej_Sapkowski","Jacek_Dukaj", "Jacek_Dukaj"),
              from = "2013-01-01", to = Sys.Date(),
              lang = c("pl","en","pl","en"))
head(wp)
##         date count lang              page rank  month             title
## 1 2013-01-12   464   pl Andrzej_Sapkowski 1366 201301 Andrzej_Sapkowski
## 2 2013-01-13   538   pl Andrzej_Sapkowski 1366 201301 Andrzej_Sapkowski
## 3 2013-01-10   536   pl Andrzej_Sapkowski 1366 201301 Andrzej_Sapkowski
## 4 2013-01-11   457   pl Andrzej_Sapkowski 1366 201301 Andrzej_Sapkowski
## 5 2013-01-16   540   pl Andrzej_Sapkowski 1366 201301 Andrzej_Sapkowski
## 6 2013-01-17   541   pl Andrzej_Sapkowski 1366 201301 Andrzej_Sapkowski

ggplot(wp, aes(date, count, group=page, color = page)) +
  geom_point(alpha=0.5) +
  geom_smooth(size=1.5, se=FALSE, span=0.1) +
  theme_bw() +
  scale_y_sqrt() +
  facet_grid(lang~.)
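If you would rather put a number on a bump than judge it by eye, a rough option is to compare average daily views just after a date with the average just before it. The helper below, bump_ratio, is a hypothetical name introduced for this sketch and is not part of wikipediatrend. It assumes the data has the date, count, lang and page columns shown in the output above, and the 30-day window is an arbitrary choice.

```r
# Hypothetical helper: ratio of mean daily views in the `days` days starting
# at an event date to the mean in the `days` days before it, per page and
# language.
bump_ratio <- function(wp, event_date, days = 30) {
  wp <- as.data.frame(wp)
  event_date <- as.Date(event_date)
  d <- as.Date(wp$date)

  before <- wp[d >= event_date - days & d < event_date, ]
  after  <- wp[d >= event_date & d < event_date + days, ]

  # Mean daily count for each page/language pair in both windows
  m_before <- aggregate(count ~ page + lang, data = before, FUN = mean)
  m_after  <- aggregate(count ~ page + lang, data = after,  FUN = mean)

  out <- merge(m_before, m_after, by = c("page", "lang"),
               suffixes = c("_before", "_after"))
  out$ratio <- out$count_after / out$count_before
  out
}

# Usage, assuming `wp` was downloaded with wp_trend():
# bump_ratio(wp, "2015-05-19")
```

A ratio well above 1 points to a bump after the date. Keep in mind that this simple comparison ignores weekly patterns and longer trends in the data.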