Chapter 2 Notes
$Y = f(X) + \epsilon$. Here $f$ is a function we don't know but want to estimate, $X$ is a regressor vector, $Y$ is our output, and $\epsilon$ is our irreducible error, which results either from regressors missing from $X$ or from unmeasurable variance in nature.
Two classes of problems
Prediction: Estimate $Y$ with $\hat{Y} = \hat{f}(X)$.
Inference: see how $Y$ changes with changes in $X$
Classes of Problems
Regression problems deal with quantitative responses.
- parametric: reduces the problem of estimating $f$ to a problem of estimating parameters
  - i.e.: when we want to use the linear model $f(X) \approx \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$, we only have to estimate the coefficients $\beta_0, \beta_1, \dots, \beta_p$
  - More restrictive but generally more interpretable than non-parametric models
- non-parametric: no assumption about the shape/form of $f$
  - tries to find an $\hat{f}$ that is as close to the data points as possible
  - Generally more flexible, though very high flexibility risks overfitting, taking $\hat{f}$ further from the true form of $f$
  - Needs a very large number of observations
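As a hypothetical sketch (the data, the sine-shaped true $f$, and the function names are all invented for illustration, not from the notes), the contrast can be shown with NumPy: the parametric approach estimates just two coefficients of an assumed linear form, while the non-parametric approach assumes no form and simply averages the responses of nearby points.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: true f(x) = sin(x), plus irreducible noise epsilon
x = rng.uniform(0, 6, 100)
y = np.sin(x) + rng.normal(0, 0.3, 100)

# Parametric: assume f(x) = b0 + b1*x, so estimating f reduces to
# estimating the two parameters b0 and b1 by least squares
X = np.column_stack([np.ones_like(x), x])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# Non-parametric: no assumed form; predict at x0 by averaging the
# responses of the k training points closest to x0
def local_average(x0, k=5):
    nearest = np.argsort(np.abs(x - x0))[:k]
    return y[nearest].mean()

print(b0 + b1 * 3.0)      # parametric prediction at x = 3
print(local_average(3.0)) # non-parametric prediction at x = 3
```

On this curved $f$, the rigid linear model misses the shape while the local average tracks it, at the cost of needing many observations near each prediction point.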
Classification problems deal with categorical or qualitative responses. Note that the type of response (quantitative vs. qualitative) indicates what type of problem we are trying to solve.
Assessing Model Accuracy
Measuring Quality of Fit - to quantify how "off" predictions are from true response data
We can use the mean squared error: $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{f}(x_i)\right)^2$
- MSE is defined for both training and test data sets.
- Note: we can't just minimize training MSE to minimize test MSE. Most learning methods work to minimize training MSE, which does not guarantee the smallest test MSE (the model might overfit the data). Flexible methods have a higher chance of "overfitting" the training data, resulting in a higher test MSE.
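A minimal simulation of this (invented data, not from the notes): polynomials of increasing degree fit the same training set. Since higher-degree polynomials nest lower-degree ones, the training MSE can only fall as flexibility grows, while the test MSE typically bottoms out and then rises again as the fit starts chasing noise.

```python
import numpy as np

rng = np.random.default_rng(1)
# True f(x) = x^2; training and test sets share the same noise level
x = rng.uniform(-1, 1, 30)
y = x**2 + rng.normal(0, 0.2, 30)
x_test = rng.uniform(-1, 1, 200)
y_test = x_test**2 + rng.normal(0, 0.2, 200)

for degree in [1, 2, 5, 10]:
    coefs = np.polyfit(x, y, degree)          # minimize training MSE
    train_mse = np.mean((y - np.polyval(coefs, x))**2)
    test_mse = np.mean((y_test - np.polyval(coefs, x_test))**2)
    print(degree, round(train_mse, 3), round(test_mse, 3))
```

Degree 2 matches the true form; degree 10 drives training MSE lower still, but that improvement does not carry over to the test set.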
Bias: Error introduced by approximating a real-life problem with a simple model (e.g., a linear model).
Variance: the amount by which $\hat{f}$ would change if it was estimated using a different training data set; ideally, $\hat{f}$ shouldn't vary much across training sets
- As model flexibility increases, bias generally decreases and variance increases
- As we increase flexibility, the bias initially decreases faster than the variance increases. At some point, the variance increases faster than the bias declines and thus the test MSE begins to increase.
- Note that the expected test MSE will always be above $\mathrm{Var}(\epsilon)$, the irreducible error
- test MSE is minimized when the sum of variance and squared bias is lowest: $E\big[(y_0 - \hat{f}(x_0))^2\big] = \mathrm{Var}(\hat{f}(x_0)) + \big[\mathrm{Bias}(\hat{f}(x_0))\big]^2 + \mathrm{Var}(\epsilon)$
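The trade-off can be simulated directly (a sketch with invented data: true $f(x) = \sin(2\pi x)$, polynomial fits standing in for "inflexible" vs. "flexible" models). Refitting on many fresh training sets and predicting at one point $x_0$ lets us estimate squared bias and variance separately: the rigid degree-1 fit has high squared bias and low variance, the flexible degree-9 fit the reverse.

```python
import numpy as np

rng = np.random.default_rng(2)
x0 = 0.25
true_f = np.sin(2 * np.pi * x0)   # true f(x0) = 1 for f(x) = sin(2*pi*x)

def fit_and_predict(degree):
    # Draw a fresh training set, fit a degree-d polynomial, predict at x0
    x = rng.uniform(0, 1, 40)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 40)
    return np.polyval(np.polyfit(x, y, degree), x0)

for degree in [1, 9]:
    preds = np.array([fit_and_predict(degree) for _ in range(200)])
    bias_sq = (preds.mean() - true_f) ** 2   # how far off on average
    variance = preds.var()                   # how much f-hat moves across sets
    print(degree, round(bias_sq, 3), round(variance, 3))
```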
Training Error Rate
Use the training error rate to quantify the accuracy of $\hat{f}$: $\frac{1}{n}\sum_{i=1}^{n} I(y_i \neq \hat{y}_i)$
where $I(y_i \neq \hat{y}_i)$ is an indicator variable that is 1 if $y_i \neq \hat{y}_i$ and 0 otherwise
Basically, training error rate averages all misclassifications across observations
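The averaging above is one line of NumPy (the labels below are made up for illustration): a boolean comparison gives the indicator values, and their mean is the error rate.

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])   # observed classes y_i
y_hat  = np.array([0, 1, 0, 0, 1, 1, 0, 1])   # classifier's predictions

# (y_true != y_hat) is the indicator I(y_i != y_hat_i): True (1) per
# misclassification, False (0) otherwise; the mean is the error rate
error_rate = np.mean(y_true != y_hat)
print(error_rate)   # 2 of 8 misclassified -> 0.25
```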
Test Error Rate
- the test error rate is $\mathrm{Ave}\big(I(y_0 \neq \hat{y}_0)\big)$, averaged over test observations $(x_0, y_0)$; a good classifier minimizes this
The Bayes classifier
Assign a test observation $x_0$ the class $j$ for which $\Pr(Y = j \mid X = x_0)$ is the highest.
Bayes Error Rate: $1 - E\left[\max_j \Pr(Y = j \mid X)\right]$ - the expectation just averages the probability over all possible $X$
- Bayes Error Rate is analogous to the irreducible error
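The Bayes classifier is only computable when $\Pr(Y = j \mid X = x)$ is known, so here is a sketch on a simulated two-class problem where we invent that conditional probability (the logistic form of `p1` is an assumption for illustration). The classifier picks the class whose probability exceeds 1/2, and Monte Carlo sampling of $X$ approximates the expectation in the Bayes error rate.

```python
import numpy as np

rng = np.random.default_rng(3)

def p1(x):
    # Assumed known conditional probability Pr(Y = 1 | X = x)
    return 1 / (1 + np.exp(-4 * x))

x = rng.normal(0, 1, 100_000)   # sample X to approximate the expectation
p = p1(x)

# Bayes classifier: assign class 1 wherever Pr(Y = 1 | X = x) > 0.5
bayes_pred = (p > 0.5).astype(int)

# Bayes error rate: 1 - E[ max_j Pr(Y = j | X) ]
bayes_error = 1 - np.mean(np.maximum(p, 1 - p))
print(round(bayes_error, 3))
```

Even this ideal classifier has nonzero error, because near the decision boundary both classes remain probable; that floor is the irreducible-error analogue noted above.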
K-Nearest Neighbors (KNN)
Brief Description: For a test observation $x_0$, we look at the classes of its $K$ nearest neighbors. The class with the highest proportion wins and we assign $x_0$ that class.
Classifier that classifies according to $\Pr(Y = j \mid X = x_0) = \frac{1}{K} \sum_{i \in \mathcal{N}_0} I(y_i = j)$, where $\mathcal{N}_0$ is the set of the $K$ training observations nearest to $x_0$
- Note that low K means high model flexibility and high K means low model flexibility
- We also observe the characteristic U shape in the test error rate as we increase model flexibility (decrease K) in KNN
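The description above can be sketched in a few lines of NumPy (a toy implementation with invented training points, not an optimized one): find the $K$ closest training observations, estimate each class's conditional probability by its proportion among them, and assign the majority class.

```python
import numpy as np

def knn_classify(X_train, y_train, x0, k):
    # Distances from x0 to every training observation (Euclidean)
    dists = np.linalg.norm(X_train - x0, axis=1)
    # Classes of the K nearest neighbors, the set N_0
    neighbors = y_train[np.argsort(dists)[:k]]
    # Estimate Pr(Y = j | X = x0) by the proportion of neighbors in
    # class j, then assign the class with the highest proportion
    classes, counts = np.unique(neighbors, return_counts=True)
    return classes[np.argmax(counts)]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8], [0.2, 0.1]])
y_train = np.array([0, 0, 1, 1, 0])

print(knn_classify(X_train, y_train, np.array([0.95, 0.9]), k=3))  # -> 1
```

With `k=1` every prediction follows a single neighbor (maximum flexibility, jagged boundary); raising `k` smooths the boundary toward less flexible fits, which is the U-shape trade-off noted above.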