An Introduction to Statistical Learning
A book by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, and Jonathan Taylor.
- Statistical learning refers to a set of tools to understand data.
- Supervised statistical learning involves building a model for predicting or estimating an output based on one or more inputs.
- In unsupervised statistical learning there are inputs but no supervising output; instead, we learn relationships and structure from the data.
Wage Data
- How are an employee’s age, education, and the calendar year associated with their wage?
- Ideally, we should predict wage in a way that accounts for the non-linear relationship between wage and age.
Stock Market Data
- Sometimes we want to predict a non-numerical value, that is, a categorical or qualitative output.
- The goal is to predict whether the index will increase or decrease.
Gene Expression Data
- We may want to know what types of customers are similar to each other. This is a clustering problem.
- Deciding the number of clusters is often a difficult problem.
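To make the clustering idea concrete, here is a minimal sketch of k-means, one common clustering method, in plain Python. k-means is used only as an illustration; these notes do not name a method. The customer features, the data values, and the helper names `sq_dist`, `kmeans`, and `within_ss` are all made up for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct observations
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


def within_ss(points, labels, centers):
    """Total within-cluster sum of squared distances."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


# Hypothetical customers described by two made-up numeric features.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

labels, centers = kmeans(points, k=2)
print("cluster labels:", labels)

# Compare the within-cluster sum of squares for several choices of k.
ss_by_k = {k: within_ss(points, *kmeans(points, k)) for k in (1, 2, 3)}
print("within-cluster sum of squares by k:", ss_by_k)
```

The within-cluster sum of squares typically keeps shrinking as k grows, so it cannot pick the number of clusters on its own, which is one reason that choice is hard.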
A brief history of statistical learning
- Linear regression is used to predict quantitative values.
- In this book:
  - n is the number of data points.
  - p is the number of variables available.
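As a small illustration of this notation, the sketch below stores a made-up data set as n rows of p values and then fits a simple linear regression of a quantitative response on the first variable by least squares. The data, the variable names, and the helper `fit_simple_regression` are assumptions for this example, not taken from the book.

```python
def fit_simple_regression(x, y):
    """Least-squares intercept and slope for the line y = b0 + b1 * x."""
    n_obs = len(x)
    x_bar = sum(x) / n_obs
    y_bar = sum(y) / n_obs
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data: each row is one data point, each column is one variable.
X = [
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 5.0],
]
# Quantitative response, constructed here as exactly 1 + 2 * (first variable).
y = [3.0, 5.0, 7.0, 9.0, 11.0]

n = len(X)     # number of data points
p = len(X[0])  # number of variables available

# Regress y on the first variable only, to keep the sketch short.
x1 = [row[0] for row in X]
intercept, slope = fit_simple_regression(x1, y)

print(f"n = {n}, p = {p}")
print(f"fitted line: y = {intercept:.2f} + {slope:.2f} * x1")
```

Using only one of the p variables keeps the arithmetic in closed form; a fit on all p variables would need a matrix solve, which is left out here.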