eXtreme Gradient Boosting vs Random Forest [and the caret package for R]

November 27, 2015 / Data science / by Przemyslaw Biecek

Decision trees are cute. They are easy to visualize, easy to explain, easy to apply and even easy to construct. Unfortunately, they are quite unstable, particularly for large sets of correlated features.

Fortunately, there are some solutions that may help. One of the most popular is the random forest: an ensemble of trees that vote independently, where each tree is built on a bootstrap sample of observations and a random subset of features. Another interesting approach is gradient boosting, which builds a collection of trees in which each new tree focuses on the cases that were badly predicted by the previous ones. One may also use bagging instead of boosting, so there are plenty of choices.
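To make the contrast concrete, here is a minimal sketch of both approaches on the built-in iris data. The package choices (randomForest and xgboost called directly) are my own illustration and are not part of the comparison below.

library(randomForest)
library(xgboost)

# Random forest: many trees grown independently, each on a bootstrap sample
# of observations, with a random subset of features considered at every split
rf_fit <- randomForest(Species ~ ., data = iris, ntree = 500)

# Gradient boosting: trees added sequentially, each fitted to correct the
# errors made by the ensemble built so far
X <- as.matrix(iris[, 1:4])
y <- as.integer(iris$Species) - 1   # xgboost expects 0-based class labels
gb_fit <- xgboost(data = X, label = y, nrounds = 50,
                  objective = "multi:softmax", num_class = 3, verbose = 0)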
For me, the random forest is one of my favorite tools when it comes to genetic data (because of OOB error estimates, proximity scores and feature importance scores). But recently, more and more discussions have been pointing to eXtreme Gradient Boosting as the new sheriff in town.
So, let’s compare these two methods.
The literature shows that something is going on. For example, Trevor Hastie has argued that
Boosting    >    Random Forest    >    Bagging    >    Single Tree
You will find more details on the slides, and if you prefer videos to slides with math, you can watch this example instead.
I am going to use the caret package (a really, really great package) to compare both methods. In caret, random forest has the tag "rf", while gradient boosting has "xgbTree".
I am going to use data from The Cancer Genome Atlas project (next-generation sequencing, mRNA expression, 33 different tumor types, 17,000+ features, 300+ cases), and the classifier should predict the type of cancer (33 classes) based on gene expression. Actually, I am interested in genetic signatures, but classification is the first step.
The dataset will be divided into training and testing parts, and the whole procedure will be replicated hundreds of times to see the variability in model performance.
With the caret package, the training is so easy that I've added boosted logistic regression and SVM "just in case".

library(caret)

# Fit four classifiers on the same training set:
# boosted logistic regression, gradient-boosted trees, random forest, radial SVM
mat <- lapply(c("LogitBoost", "xgbTree", "rf", "svmRadial"),
              function(met) {
                train(subClasTrain ~ ., method = met, data = smallSetTrain)
              })
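The snippet above only shows the training call. A sketch of the full repeated split-train-evaluate loop described earlier might look as follows; the object smallSet with a subClas label column is my assumption, since the post does not show how smallSetTrain was prepared.

# Repeat: split the data, train a model, measure accuracy on the held-out part
# (smallSet and its subClas column are assumed, not shown in the original post)
evaluate_once <- function(met) {
  inTrain  <- createDataPartition(smallSet$subClas, p = 0.75, list = FALSE)
  trainSet <- smallSet[inTrain, ]
  testSet  <- smallSet[-inTrain, ]
  fit <- train(subClas ~ ., method = met, data = trainSet)
  mean(predict(fit, newdata = testSet) == testSet$subClas)
}

# One column of accuracies per replication, one row per method
accs <- replicate(100, sapply(c("LogitBoost", "xgbTree", "rf", "svmRadial"),
                              evaluate_once))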

So, what are the results?
Last week I compared these two methods on the Walmart Recruiting Trip Type data from Kaggle, where the goal was to classify a shopping trip into one of 34 trip types. There it was easier to get good results with random forest than with gradient boosting. But let's see what happens with the cancer data.
Below you can see the distribution of accuracies (not a perfect measure, but not a bad one here either) over random splits into training and testing datasets.
[Figure: distribution of classification accuracies for the four methods across random training/testing splits]
So, it looks like for this dataset the random forest is doing better.
The ‘train’ function has a great argument, ‘tuneGrid’, with which you can specify a grid of parameters to be tested. Results may differ for different parameter settings and, of course, for different datasets.
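For example, a minimal sketch of tuneGrid for the random forest (whose only tunable parameter in caret is mtry, the number of features tried at each split) could look like this; the grid values are illustrative.

# Try several values of mtry with 5-fold cross-validation
rfGrid <- expand.grid(mtry = c(10, 50, 100, 500))
fit_rf <- train(subClasTrain ~ ., method = "rf", data = smallSetTrain,
                tuneGrid  = rfGrid,
                trControl = trainControl(method = "cv", number = 5))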
