Variable Selection

Bastian Goldluecke, Variational Method, Velvety Reflectance, Video Mosaicing, Zhigang Zhu, Cees G. M. Snoek, A. Smeulders, Vasu Parameswaran, Ashok Veeraraghavan

Published 2019 in Model-Based Clustering and Classification for Data Science

ABSTRACT

In our discussion of regression to date, we have assumed that all the explanatory variables included in the model are chosen in advance. In many situations, however, the set of explanatory variables is not predetermined, and selecting them becomes part of the analysis. There are two main approaches to variable selection: the all-possible-regressions approach and automatic methods.

The all-possible-regressions approach fits a model for every subset of the pool of explanatory variables and identifies the model that best fits the data according to some criterion (e.g. adjusted R², AIC or BIC). Each criterion assigns a score to every model, and we choose the model with the best score: the largest adjusted R², or the smallest AIC or BIC. With k candidate variables there are 2^k - 1 non-empty subsets, so the five variables in the example below give 31 candidate models.

The function regsubsets() in the R package leaps performs this subset selection, and plotting the object it returns displays the models ranked according to different scoring criteria. Before using the function for the first time, you need to install the package, either through the R GUI or with the command install.packages("leaps"), and then load it with library(leaps).

Example. Data were collected on 100 homes recently sold in a city: the sale price (in $), house size (in square feet), number of bedrooms, number of bathrooms, lot size (in square feet) and annual real estate tax (in $). Using price as the response variable, determine which of the five explanatory variables should be included in the regression model using the all-possible-regressions approach.

Here leaps denotes the object returned by regsubsets() for this data. To view the ranked models according to the adjusted R-squared criterion and BIC, respectively, type:

> plot(leaps, scale="adjr2")
> plot(leaps, scale="bic")
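The all-possible-regressions search can also be written out directly in base R, which makes explicit what regsubsets() automates. The sketch below is an illustration under stated assumptions, not the author's code: the 100-home data set is not reproduced in the source, so it simulates a synthetic stand-in with hypothetical column names (size, bedrooms, bathrooms, lot, tax, price), and the helper all_subsets() is likewise hypothetical. It fits one lm() per non-empty subset and records adjusted R², AIC and BIC for each.

```r
# Base-R sketch of the all-possible-regressions approach.
# ASSUMPTION: the 100-home data set is not available here, so we simulate a
# synthetic stand-in; the column names below are hypothetical.
set.seed(1)
n <- 100
homes <- data.frame(
  size      = round(runif(n, 1000, 3500)),   # house size (sq ft)
  bedrooms  = sample(2:5, n, replace = TRUE),
  bathrooms = sample(1:3, n, replace = TRUE),
  lot       = round(runif(n, 4000, 20000)),  # lot size (sq ft)
  tax       = round(runif(n, 1000, 6000))    # annual real estate tax ($)
)
# Synthetic price that depends on size and lot only, plus noise
homes$price <- 50000 + 80 * homes$size + 5 * homes$lot + rnorm(n, sd = 20000)

# Hypothetical helper: fit lm() on every non-empty subset of `predictors`
# and score each fit by adjusted R^2, AIC and BIC.
all_subsets <- function(response, predictors, data) {
  rows <- list()
  for (k in seq_along(predictors)) {
    for (vars in combn(predictors, k, simplify = FALSE)) {
      fit <- lm(reformulate(vars, response = response), data = data)
      rows[[length(rows) + 1]] <- data.frame(
        model = paste(vars, collapse = " + "),
        p     = k,
        adjr2 = summary(fit)$adj.r.squared,
        aic   = AIC(fit),
        bic   = BIC(fit)
      )
    }
  }
  do.call(rbind, rows)
}

preds  <- c("size", "bedrooms", "bathrooms", "lot", "tax")
scores <- all_subsets("price", preds, homes)   # 2^5 - 1 = 31 models

# Best model under each criterion: largest adjusted R^2, smallest AIC/BIC
best_adjr2 <- scores$model[which.max(scores$adjr2)]
best_aic   <- scores$model[which.min(scores$aic)]
best_bic   <- scores$model[which.min(scores$bic)]

# Five best models ranked by BIC
print(head(scores[order(scores$bic), ], 5))
```

Because every subset is fitted, the number of models grows as 2^k, so this brute-force version is only practical for a modest number of candidate variables. The absolute values returned by AIC() and BIC() need not match the scale shown by plot(leaps, scale="bic"), so compare the rankings rather than the raw numbers.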

PUBLICATION RECORD

  • Publication date

    2019-07-31