Exploration of data from iPhone motion coprocessor (2)
Last week we downloaded data from a fitness tracker (the motion coprocessor in an iPhone) and loaded it into R. Then, with just a few lines of R code, we decomposed the data into a weekly seasonal component and a trend. Today we are going to see how to plot the number of steps per hour for different days of the week. The same data will then be used to check how often there was any activity at a given time.
So, first, let’s load the data from the XML file.
library(ggplot2)
library(dplyr)
library(lubridate)
library(XML)

# Parse the exported XML file and convert it to a nested list
data <- xmlParse("eksport.xml")
xml_data <- xmlToList(data)

# Keep only the step-count records
xml_dataSel <- xml_data[sapply(xml_data, function(x)
  (!is.null(x["type"])) & x["type"] == "HKQuantityTypeIdentifierStepCount")]

# Extract unit, day (characters 1-10 of startDate), hour (characters 12-13) and value
xml_dataSelDF <- data.frame(
  unit  = unlist(sapply(xml_dataSel, function(x) x["unit"])),
  day   = substr(unlist(sapply(xml_dataSel, function(x) x["startDate"])), 1, 10),
  hour  = substr(unlist(sapply(xml_dataSel, function(x) x["startDate"])), 12, 13),
  value = as.numeric(as.character(unlist(sapply(xml_dataSel, function(x) x["value"]))))
)

# Day of the week as a labelled factor
xml_dataSelDF$wday <- wday(ymd(as.character(xml_dataSelDF$day)), label = TRUE)
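To make the substr() positions less mysterious, here is a minimal sketch on a single made-up timestamp. The string format is an assumption inferred from the positions used above (date in characters 1-10, hour in characters 12-13); check it against your own export file.

```r
library(lubridate)

# Hypothetical startDate value; the exact format in your export may differ
start_date <- "2016-01-04 16:25:00 +0100"

day  <- substr(start_date, 1, 10)    # the date part
hour <- substr(start_date, 12, 13)   # the two-digit hour

# Numeric day of the week; with lubridate's default settings Sunday is 1
weekday_num <- wday(ymd(day))

# The same day as a labelled factor (the label text depends on the locale)
weekday_lab <- wday(ymd(day), label = TRUE)
```

If the timestamps in your file have a different layout, only the two substr() position pairs need to change.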
To plot the activity we need to do some aggregation. Here we will count the number of steps per hour for each day of the week.
dataDF <- xml_dataSelDF %>%
  group_by(wday, hour) %>%
  summarise(sum = sum(value))

head(dataDF)
# Source: local data frame [6 x 3]
# Groups: wday [1]
#
#     wday   hour   sum
#   (fctr) (fctr) (dbl)
# 1    Sun     00  2013
# 2    Sun     01  2475
# 3    Sun     02   347
# 4    Sun     03   919
# 5    Sun     04  2158
# 6    Sun     05  4062
Great. How should we present this data? With ggplot2, of course!
Here the geom_bar geometry will be used to present the number of steps per hour for different days of the week.
As we can see, the highest activity is around 4pm (time to collect the kids from school).
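# The plotted value is the total over all recorded days, so the axis is labelled accordingly
ggplot(dataDF, aes(x = hour, y = sum, fill = wday)) +
  geom_bar(stat = "identity") +
  facet_wrap(~wday) +
  coord_flip() +
  theme(legend.position = "none") +
  ylab("total steps per hour") +
  xlab("hour")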
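The aggregation above sums the steps over all recorded days. If a per-day average is wanted instead, one possible approach is to divide each total by the number of distinct days observed for that weekday. The sketch below uses a small made-up data frame (`toy`, `n_days` and `avgDF` are illustrative names, not part of the original analysis) with the same columns as xml_dataSelDF.

```r
library(dplyr)

# Made-up data: two Mondays and one Tuesday
toy <- data.frame(
  day   = c("2016-01-04", "2016-01-04", "2016-01-11", "2016-01-05"),
  wday  = c("Mon", "Mon", "Mon", "Tue"),
  hour  = c("08", "16", "16", "16"),
  value = c(100, 300, 500, 200),
  stringsAsFactors = FALSE
)

# Number of distinct days observed for each weekday
n_days <- toy %>%
  group_by(wday) %>%
  summarise(n_days = n_distinct(day))

# Total steps per weekday and hour, divided by the number of such weekdays
avgDF <- toy %>%
  group_by(wday, hour) %>%
  summarise(total = sum(value)) %>%
  ungroup() %>%
  left_join(n_days, by = "wday") %>%
  mutate(avg = total / n_days)
```

Dividing by the number of observed days, rather than taking the mean of the recorded rows, means that days with no steps in a given hour still count as zeros.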
We can use this data to try other things.
For example, for each day and hour we can check whether there was any activity at all. This will not be 100% accurate, since the phone is not always in a pocket.
Then we will compute, for each day of the week, the fraction of days on which there was any activity during a given hour.
Let’s see these patterns.
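library(tidyr)  # provides spread() and gather()

dataDF <- xml_dataSelDF %>%
  group_by(day, wday, hour) %>%
  # 1 if any steps were recorded in this day and hour; the parentheses are
  # needed because + binds more tightly than > in R
  summarise(sum = (sum(value) > 0) + 0) %>%
  ungroup() %>%
  # spread + gather fills the hours that have no records on a given day with 0
  spread(hour, sum, fill = 0) %>%
  gather(hour, value, -day, -wday) %>%
  group_by(wday, hour) %>%
  summarise(mv = mean(value))

head(dataDF)
# Source: local data frame [6 x 3]
# Groups: wday [1]
#
#     wday   hour         mv
#   (fctr) (fctr)      (dbl)
# 1    Sun     00 0.17021277
# 2    Sun     01 0.17021277
# 3    Sun     02 0.14893617
# 4    Sun     03 0.06382979
# 5    Sun     04 0.06382979
# 6    Sun     05 0.06382979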
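The fill step matters because an hour with no steps simply has no rows in the export, so without it the mean would be taken only over the days that have a record. As a rough illustration, here is the same idea on made-up data, using tidyr's complete() as an alternative to the spread/gather round trip (this is a swapped-in technique, not the code used above; `toy` and `activeDF` are illustrative names).

```r
library(dplyr)
library(tidyr)

# Made-up data: the second Monday has no record for hour "08"
toy <- data.frame(
  day   = c("2016-01-04", "2016-01-04", "2016-01-11"),
  wday  = c("Mon", "Mon", "Mon"),
  hour  = c("08", "16", "16"),
  value = c(100, 300, 500),
  stringsAsFactors = FALSE
)

activeDF <- toy %>%
  group_by(day, wday, hour) %>%
  summarise(active = as.numeric(sum(value) > 0)) %>%
  ungroup() %>%
  # Add the missing day/hour combinations and mark them as inactive
  complete(nesting(day, wday), hour, fill = list(active = 0)) %>%
  group_by(wday, hour) %>%
  summarise(mv = mean(active)) %>%
  ungroup()
```

Here hour "08" ends up with a fraction of 0.5 (active on one of two Mondays) instead of 1. Note that an hour that never appears on any day is still absent from the result in both approaches.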
And the ggplot.
ggplot(dataDF, aes(x = hour, y = mv, fill = wday)) +
  geom_bar(stat = "identity") +
  facet_wrap(~wday) +
  coord_flip() +
  theme(legend.position = "none") +
  ylab("percent of days with any activity") +
  xlab("hour") +
  # percent() comes from the scales package, referenced explicitly here
  scale_y_continuous(labels = scales::percent)