IMDB + ggvis, a happy couple
/in Data science /by Przemyslaw Biecek
Two weeks ago we showed how to scrape data from the IMDB database with the rvest package. Last week we showed a Shiny application that compares ratings from two selected groups of users. Today we are going to finish the IMDB trilogy. This time I am going to show how to create a ggvis plot based on IMDB data.
You should not watch these movies with your wife / girl
/in Data science /by Przemyslaw Biecek /on 2015-03-19
Last week’s post showed how to download data on ratings of over 200 television series. The ratings were broken down by the gender and age of the user. The application presented below lets you select any two age/gender groups of users and compare their ratings…
R, rvest and web-harvesting
/in Data science /by Przemyslaw Biecek
Data harvested from web pages is a source of interesting information. Pulling data used to require quite a lot of resilience and misshapen Perl scripts struggling with messy page sources. Today’s web pages more and more frequently meet the standards. There are also more and more civilized tools for parsing websites.
Canonical discriminant analyses and HE plots
/in Data science /by Przemyslaw Biecek /on 2015-02-26
Last week we wrote about multivariate linear models. We discussed a case in which a k-dimensional vector of dependent variables is related to a grouping variable. We looked at the matrices E and H in order to find out whether there is any relationship (see the previous post).
HE plots
/in Data science /by Przemyslaw Biecek
GPS helps drivers avoid traffic jams, yet in more advanced uses it allows for fleet management or remote drone strikes. It is just the same with visualization. Bars and dots can be used to present a set of several means, but there are also more advanced uses…
Spark + R = SparkR
/in Data science /by Przemyslaw Biecek
Spark is winning more and more hearts. And no wonder: comments from different sources report a significant speed-up (by an order of magnitude) in the analysis of big datasets. A well-developed system for caching objects in memory allows us to avoid torturing hard disks during iterative operations performed on the same data.
Pretty heat maps
/in Data science /by Przemyslaw Biecek
Do you know where Kamil Stoch earned most of his points in the 2013/2014 season? Some time ago I came across the pheatmap package for R, which generates much nicer heat maps than the standard heatmap() function. This is why the package is named…
Is a simple linear regression able to knock spots off SVM and Random Forest?
/in Data science /by Przemyslaw Biecek /on 2015-01-15
A friend of mine took part in a project in which he had to predict future values of a characteristic Y. The problem was that Y showed an increasing trend over time. For the purposes of this post, let us assume that Y was energy demand, or the milk yield of cows, or any other characteristic with a positive trend over time.