An introduction to Statistical Learning

A book by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, and Jonathan Taylor.

  • Statistical learning refers to a set of tools to understand data.
  • Supervised statistical learning invlves building a model for predicting or estimating an output based on inputs.
  • In unsupervised there are inputs but no supervising output. We learn the relationships from the data. Wage data
  • What is the association between an employee’s age, education and year, on their wage?
  • Ideally we should predict wage in a way that accounts for the non-linear relationship between wage and age. Stock Market Data
  • Sometimes we want to predict a non-numerical value, categorical or qualitative.
  • The goal is to predict if the index will decrease or increase. Gene Expression Data
  • We may want to know what types of customers are similar to each other. This is a clustering problem.
  • Deciding the number of clusters is often a difficult problem.

    A brief history of statistical learning

  • Linear regression is used to predict quantitative values.
  • In this book:
    • n is the number of data points.
    • p is the number of variables available.

P.20