Is a simple linear regression able to knock spots off SVM and Random Forest?

January 15, 2015/in Data science /by Przemyslaw Biecek

A friend of mine took part in a project in which he had to forecast future values of a characteristic Y. The problem was that Y showed an increasing trend over time. For the purposes of this post, let us assume that Y was energy demand, the milk yield of cows, or any other characteristic with a positive trend over time.

So we discussed possible approaches to this problem. As a benchmark we used the techniques that heat up processors, such as random forest and SVM. However, it turns out (and in hindsight it matches intuition) that if we deal with a generally stable trend, the range of values observed in the future may differ from the range observed in the past. In that case, techniques such as simple linear regression may give better results than the aforementioned SVM and RF (which, more or less, look for similar cases in the past and average them).
Let us consider the following example. We have N predictors at our disposal and we want to predict the evolution of Y. In reality, Y depends only on the first predictor. We will compare SVM, random forest, simple regression, and lasso-type regularised regression.
This example is purely simulation based. We start with a small random dataset of 100 observations and N = 25 predictors (results are similar for larger datasets). The testing set lies beyond the domain of the training set, i.e. we increase all predictor values by +1.
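The point about averaging similar past cases can be illustrated with a minimal sketch, here in plain Python rather than R (a hypothetical toy, not the experiment below): a nearest-neighbour "averager" can never predict outside the range of training targets, while a fitted line extrapolates the trend.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b in one dimension."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def predict_nn(xs, ys, x_new):
    """Predict with the training point closest to x_new."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x_new))
    return ys[i]

xs = [0.1 * i for i in range(11)]   # training domain: [0, 1]
ys = [5 * x for x in xs]            # y = 5x, no noise for clarity

a, b = fit_line(xs, ys)
print(a * 2 + b)                    # linear fit at x = 2: ~10, follows the trend
print(predict_nn(xs, ys, 2))        # nearest neighbour: 5.0, capped at the training range
```

Any method that averages training targets is similarly capped, which is exactly what hurts SVM and RF once the test domain shifts.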

library(dplyr)
library(lasso2)
library(e1071)
library(randomForest)
library(ggplot2)

# will be useful in simulations
getData <- function(n = 100, N = 25) {
  x <- runif(N * n) %>%
    matrix(n, N)
  # artificial out-of-domain x: shift every value by +1
  x_test <- x + 1
  list(x = x,
       y = x[, 1] * 5 + rnorm(n),
       x_test = x_test,
       y_test = x_test[, 1] * 5 + rnorm(n))
}

# let's draw a dataset
gdata <- getData()
head(gdata$y)
# [1] -0.5331184 3.1140116 4.9557897 3.2433499 2.8986888 5.2478431
dim(gdata$x)
# [1] 100 25

In the generated data there is a linear relationship between the first predictor and Y. We added a small random noise to avoid being too tendentious.

with(gdata,
     qplot(x[, 1], y) +
       geom_smooth(se = FALSE, method = "lm")
)

[Figure: scatterplot of x[,1] against y with a fitted linear trend]
Let us fit a model with each approach and calculate the MSE for each of them.

fitModels <- function(x, y) {
  ndata <- data.frame(y, x)
  list(
    model_lasso = l1ce(y ~ ., data = ndata),
    model_lm = lm(y ~ ., data = ndata),
    model_svm = svm(x, y),
    model_rf = randomForest(x, y))
}

testModels <- function(models, x_test, y_test) {
  predict_lasso <- predict(models$model_lasso, data.frame(x_test))
  predict_lm <- predict(models$model_lm, data.frame(x_test))
  predict_svm <- predict(models$model_svm, x_test)
  predict_rf <- predict(models$model_rf, x_test)
  c(
    lasso = mean((predict_lasso - y_test)^2),
    lm = mean((predict_lm - y_test)^2),
    rf = mean((predict_rf - y_test)^2),
    svm = mean((predict_svm - y_test)^2))
}

# time for fitting
models <- fitModels(gdata$x, gdata$y)
testModels(models, gdata$x_test, gdata$y_test)
#     lasso        lm         rf        svm
# 0.8425946 1.4672156 15.7713529 25.0271363

This time the lasso wins. Now let us repeat the random drawing and model fitting 100 times, and pipe the results directly into boxplots.

N <- 25
replicate(100, {
  gdata <- getData(N = N)
  models <- fitModels(gdata$x, gdata$y)
  testModels(models, gdata$x_test, gdata$y_test)
}) %>%
  t() %>%
  boxplot(main = paste("MSE for", N, "variables"))

[Figure: boxplots of MSE for 25 variables]
The lower the MSE, the better.
The boxplots summarise the whole simulation. We did not perform variable selection, so plain linear regression suffers from the random noise in the remaining predictors; lasso regularisation helps, as expected.
Of course, methods such as SVM and random forest had to lose this competition: in the 'future' the value of Y is still X1*5, but X1 now ranges over [1, 2] instead of [0, 1].
The picture is similar for N = 5 variables (where plain regression still has an advantage) and N = 50 variables (where it no longer does).
[Figure: boxplots of MSE for 5 variables]
[Figure: boxplots of MSE for 50 variables]
What are the conclusions?
– SVM and random forest work within a 'domain'. If a monotonic trend is present and future values are likely to lie far outside the training set, the trend should be removed in advance.
– With many variables, it is advisable to do some variable selection before fitting a regression, or to choose a method that does it for us (like the lasso, for example). Random forest deals with this problem on its own.
– It is much more difficult to predict the future than the past ;)
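The detrending advice in the first conclusion can be sketched as follows (a hypothetical minimal example in Python, not code from the experiment): fit and subtract a linear time trend, model the residuals with any domain-bound method, then add the extrapolated trend back when forecasting.

```python
def fit_line(ts, ys):
    """Ordinary least squares for y = a*t + b."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    a = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / \
        sum((t - mt) ** 2 for t in ts)
    return a, my - a * mt

ts = list(range(10))
ys = [3 * t + 1 for t in ts]        # trending series (no noise, for clarity)

a, b = fit_line(ts, ys)
residuals = [y - (a * t + b) for t, y in zip(ts, ys)]
# the residuals now occupy a stable range, so a domain-bound model
# (RF, SVM, ...) of the residuals is no longer asked to extrapolate;
# forecast = residual-model prediction + extrapolated trend
forecast_t = 15
print(a * forecast_t + b)           # trend component at t = 15: 46.0
```

On noise-free data the residuals are all zero, so the forecast reduces to the extrapolated trend; with real data the residual model would add its correction on top.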

