Exploration of data from iPhone motion coprocessor (2)

Last week we downloaded data from a fitness tracker (the motion coprocessor in an iPhone) and loaded it into R. With just a few lines of R code we then decomposed the data into a weekly seasonal component and a trend. Today we are going to plot the number of steps per hour for different days of the week, and then use the same data to check how often there was any activity at a given hour. So, first, let's load the data from the XML file.
library(ggplot2)
library(dplyr)
library(tidyr)      # spread() and gather(), used further down
library(lubridate)
library(XML)
# Parse the export file and convert it to a nested R list
data <- xmlParse("eksport.xml")
xml_data <- xmlToList(data)
# Keep only the step-count records
xml_dataSel <- xml_data[sapply(xml_data, function(x) (!is.null(x["type"])) & x["type"]== "HKQuantityTypeIdentifierStepCount")]
# Characters 1-10 of startDate are taken as the day, characters 12-13 as the hour
xml_dataSelDF <- data.frame(
  unit=unlist(sapply(xml_dataSel, function(x) x["unit"])),
  day=substr(unlist(sapply(xml_dataSel, function(x) x["startDate"])),1,10),
  hour=substr(unlist(sapply(xml_dataSel, function(x) x["startDate"])),12,13),
  value=as.numeric(as.character(unlist(sapply(xml_dataSel, function(x) x["value"]))))
)
# Day of the week as a labelled factor (Sun, Mon, ...)
xml_dataSelDF$wday <- wday(ymd(as.character(xml_dataSelDF$day)), label=TRUE)
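To make the string slicing above concrete, here is a minimal sketch on two made-up startDate values. The exact timestamp layout is an assumption inferred from the substr() positions used in this post; check it against your own export file.

```r
library(lubridate)

# Hypothetical startDate strings in the layout the substr() calls assume
startDate <- c("2016-02-01 08:15:00 +0100", "2016-02-06 16:40:00 +0100")

day  <- substr(startDate, 1, 10)   # date part, e.g. "2016-02-01"
hour <- substr(startDate, 12, 13)  # hour part, e.g. "08"

# Numeric day of the week; with lubridate's default settings Sunday is 1
wday_num <- wday(ymd(day))

print(data.frame(day, hour, wday_num))
```

With label=TRUE, as in the post, wday() returns an ordered factor of day names instead of numbers, which is what makes the facets below come out in weekday order.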
To plot the activity we need to do some aggregation. Here we will sum the number of steps per hour for each day of the week.
dataDF <- xml_dataSelDF %>%
   group_by(wday, hour) %>%
   summarise(sum = sum(value))
head(dataDF)
# Source: local data frame [6 x 3]
# Groups: wday [1]
#
#     wday   hour   sum
#   (fctr) (fctr) (dbl)
# 1    Sun     00  2013
# 2    Sun     01  2475
# 3    Sun     02   347
# 4    Sun     03   919
# 5    Sun     04  2158
# 6    Sun     05  4062
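Note that these sums add up every day that falls on the same weekday, so they grow with the number of weeks in the export. If an average per day is preferred, one possible sketch is to divide by the number of distinct days per weekday. The toy data frame and the names daysPerWday, ndays and avg are made up for illustration; days without any step record at all are not counted in this version.

```r
library(dplyr)

# Made-up records: two Mondays and one Sunday
toy <- data.frame(
  day   = c("2016-02-01", "2016-02-01", "2016-02-08", "2016-02-07"),
  wday  = c("Mon", "Mon", "Mon", "Sun"),
  hour  = c("08", "08", "09", "08"),
  value = c(100, 50, 30, 40),
  stringsAsFactors = FALSE
)

# Number of distinct days observed for each weekday
daysPerWday <- toy %>%
  group_by(wday) %>%
  summarise(ndays = n_distinct(day))

# Total steps per weekday and hour, divided by the number of such days
avgDF <- toy %>%
  group_by(wday, hour) %>%
  summarise(sum = sum(value)) %>%
  ungroup() %>%
  inner_join(daysPerWday, by = "wday") %>%
  mutate(avg = sum / ndays)

print(avgDF)
```

Here Monday at 08 has 150 steps over two Mondays, so the average is 75, even though only one of the two Mondays had any steps in that hour.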
Great. How do we present this data? With ggplot2, of course! Here the geom_bar geometry is used to show the number of steps per hour for each day of the week. As we can see, the highest activity is around 4 pm (time to collect the kids from school).
ggplot(dataDF, aes(x=hour, y=sum, fill=wday)) +
  geom_bar(stat="identity") +
  facet_wrap(~wday) +
  coord_flip() + theme(legend.position="none") +
  ylab("total steps per hour") + xlab("hour")
We can use this data for other things as well. For example, we can check whether there was any activity during a given hour. This will not be 100% accurate, since the phone is not always in the pocket. We then compute, for each weekday and hour, the fraction of days on which there was any activity. Let's see these patterns.
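dataDF <- xml_dataSelDF %>%
  group_by(day, wday, hour) %>%
  # 1 if any steps were recorded in this hour of this day
  summarise(sum = as.numeric(sum(value) > 0)) %>%
  # hours without records are missing for a given day; spread + gather fills them with 0
  spread(hour, sum, fill=0) %>%
  gather(hour, value, -day, -wday) %>%
  group_by(wday, hour) %>%
  # fraction of days with any activity, per weekday and hour
  summarise(mv = mean(value))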
head(dataDF)
# Source: local data frame [6 x 3]
# Groups: wday [1]
#
#     wday   hour         mv
#   (fctr) (fctr)      (dbl)
# 1    Sun     00 0.17021277
# 2    Sun     01 0.17021277
# 3    Sun     02 0.14893617
# 4    Sun     03 0.06382979
# 5    Sun     04 0.06382979
# 6    Sun     05 0.06382979
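The spread/gather round trip in the pipeline above exists only to add rows for hours in which a given day has no records, so that those hours count as 0 in the mean. A minimal sketch of the same idea on made-up data, using tidyr's complete() as an alternative (this is a swapped-in technique, not what the post uses; the column name active is hypothetical):

```r
library(dplyr)
library(tidyr)

# Made-up data: the first Monday has activity at 08 and 09, the second only at 08
toy <- data.frame(
  day   = c("2016-02-01", "2016-02-01", "2016-02-08"),
  wday  = c("Mon", "Mon", "Mon"),
  hour  = c("08", "09", "08"),
  value = c(120, 30, 75),
  stringsAsFactors = FALSE
)

fracDF <- toy %>%
  group_by(day, wday, hour) %>%
  summarise(active = as.numeric(sum(value) > 0)) %>%
  ungroup() %>%
  # add the missing (day, hour) combinations and mark them as inactive
  complete(nesting(day, wday), hour, fill = list(active = 0)) %>%
  group_by(wday, hour) %>%
  summarise(mv = mean(active)) %>%
  ungroup()

print(fracDF)
```

Without the filling step, the 09 hour would come out as 1 instead of 0.5, because the second Monday would simply be absent from the mean. As in the original pipeline, an hour with no activity on any day never appears at all.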
And the corresponding plot.
ggplot(dataDF, aes(x=hour, y=mv, fill=wday)) +
  geom_bar(stat="identity") +
  facet_wrap(~wday) +
  coord_flip() + theme(legend.position="none") +
  ylab("percent of days with any activity") + xlab("hour") +
  scale_y_continuous(labels=scales::percent)