Chapter 2 Notes

Fundamental Problem

$Y = f(X) + \epsilon$, where $f$ is a fixed but unknown function we want to estimate. $X$ is a vector of regressors (predictors), $Y$ is our output (response), and $\epsilon$ is the irreducible error, which results either from regressors missing from $X$ or from unmeasurable variation in nature.

Two Classes of Problems

Prediction: Estimate $Y$ with $\hat{Y} = \hat{f}(X)$. We drop $\epsilon$ because it averages to zero and cannot be predicted.

Inference: understand how $Y$ changes as $X_1, \ldots, X_p$ change.

Regression vs. Classification

Regression problems deal with quantitative responses.

1. Parametric: reduces the problem of estimating $f$ to a problem of estimating a set of parameters.
   1. e.g., if we assume the linear model $f(X) = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p$, we only need to estimate the $p + 1$ coefficients $\beta_0, \beta_1, \ldots, \beta_p$.
   2. Generally less flexible: the assumed functional form will usually not match the true form of $f$.
   3. More restrictive but generally more interpretable than non-parametric models.
2. Non-parametric:
   1. No assumption about the shape/form of $f$.
   2. Seeks an $\hat{f}$ that gets as close to the data points as possible.
   3. Needs a very large number of observations to produce an accurate estimate.
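
To make the parametric idea concrete, here is a minimal sketch (the simulated data, coefficient values, and noise scale are all hypothetical) showing that fitting a linear model only requires estimating the $p + 1$ coefficients:

```python
import numpy as np

# Hypothetical illustration: a parametric (linear) fit reduces estimating f
# to estimating the p + 1 coefficients beta_0, ..., beta_p.
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5, 3.0])    # [beta_0, beta_1, ..., beta_p]
y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=0.5, size=n)  # Y = f(X) + eps

# Least squares: solve directly for the coefficient estimates.
X_design = np.column_stack([np.ones(n), X])    # prepend an intercept column
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta_hat)                                # should be close to beta_true
```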

Classification problems deal with categorical or qualitative responses. Note that it is the type of the response (quantitative vs. qualitative) that indicates which type of problem we are trying to solve; the types of the predictors matter less.

Assessing Model Accuracy

Measuring Quality of Fit - quantifies how "off" predictions are from the true response data.

We can use the mean squared error:

$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{f}(x_i))^2$

• MSE is defined for both the training and test data sets.

• Note: We can't just minimize training MSE to minimize test MSE. Most learning methods work by minimizing training MSE, which does not guarantee the smallest test MSE (the model might overfit the training data). Flexible methods have a higher chance of overfitting the training data, which results in a higher test MSE.
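
A quick illustration (hypothetical simulated data; the true $f$ is taken to be $\sin$ and the polynomial degree is arbitrary) of why a low training MSE does not imply a low test MSE:

```python
import numpy as np

# Sketch: MSE = (1/n) * sum (y_i - f_hat(x_i))^2, computed on both splits.
def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=200)
y = np.sin(x) + rng.normal(scale=0.3, size=200)   # Y = f(X) + eps with f = sin

x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

# Fit a flexible degree-9 polynomial: training MSE is driven down,
# but that does not guarantee a small test MSE.
coefs = np.polyfit(x_train, y_train, deg=9)
print("train MSE:", mse(y_train, np.polyval(coefs, x_train)))
print("test  MSE:", mse(y_test, np.polyval(coefs, x_test)))
```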

$E[(y_0 - \hat{f}(x_0))^2] = \mathrm{Var}(\hat{f}(x_0)) + [\mathrm{Bias}(\hat{f}(x_0))]^2 + \mathrm{Var}(\epsilon)$, the expected test MSE at a point $x_0$.

Bias: error introduced by approximating a real-life problem with a simpler model (e.g., a linear model).

Variance: the amount by which $\hat{f}$ would change if it were estimated using a different training data set; ideally, $\hat{f}$ shouldn't vary much across training sets.

• As model flexibility increases, bias generally decreases and variance increases.
• As we increase flexibility, the bias initially decreases faster than the variance increases. At some point the variance starts increasing faster than the bias declines, and so the test MSE begins to rise.
• Note that the expected test MSE will always be above $\mathrm{Var}(\epsilon)$, the irreducible error.
• Test MSE is minimized when the sum of the variance and the squared bias is lowest.
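
A sketch of the U shape (hypothetical simulation; $f = \sin$ with noise standard deviation 0.3, so $\mathrm{Var}(\epsilon) = 0.09$ is the floor the test MSE cannot beat on average), sweeping polynomial degree as the flexibility knob:

```python
import numpy as np

# Sketch: sweep flexibility (polynomial degree) and watch test MSE trace a U.
rng = np.random.default_rng(2)

def sample(n):
    x = rng.uniform(-2, 2, size=n)
    return x, np.sin(x) + rng.normal(scale=0.3, size=n)

x_train, y_train = sample(100)
x_test, y_test = sample(1000)

for deg in [1, 3, 5, 9, 15]:
    coefs = np.polyfit(x_train, y_train, deg)
    test_mse = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    print(f"degree {deg:2d}: test MSE = {test_mse:.3f}")   # never below ~0.09
```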

Classification

Training Error Rate

Use the training error rate to quantify the accuracy of $\hat{f}$:

$\frac{1}{n}\sum_{i=1}^{n} I(y_i \neq \hat{y}_i)$

where $I$ is an indicator variable that equals 1 if $y_i \neq \hat{y}_i$ and 0 otherwise.

Basically, the training error rate is the fraction of the $n$ training observations that are misclassified.
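
A one-liner sketch (the labels here are a made-up toy example) showing that the error rate is just the mean of the indicator:

```python
import numpy as np

# Error rate = average of the indicator I(y_i != y_hat_i).
y = np.array([0, 1, 1, 0, 1])
y_hat = np.array([0, 1, 0, 0, 0])
error_rate = np.mean(y != y_hat)   # 2 misclassifications / 5 observations = 0.4
print(error_rate)
```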

Test Error Rate

$\mathrm{Ave}(I(y_0 \neq \hat{y}_0))$, averaged over test observations $(x_0, y_0)$ not used in training - a good classifier minimizes this.

The Bayes Classifier

Assign a test observation with predictor vector $x_0$ to the class $j$ for which $P(Y = j \mid X = x_0)$ is highest.

Bayes Error Rate: $1 - E\left[\max_j \Pr(Y = j \mid X)\right]$ - the expectation averages the probability over all possible values of $X$.

• The Bayes error rate is analogous to the irreducible error.
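
A tiny worked example (the distribution of $X$ and the conditional probabilities are made up) where $P(Y = j \mid X = x)$ is known exactly, so both the Bayes classifier and its error rate can be computed directly:

```python
import numpy as np

# Hypothetical two-class setup with known conditional probabilities.
x_values = np.array([0, 1, 2])
p_x = np.array([0.5, 0.3, 0.2])            # marginal distribution of X
p_y1_given_x = np.array([0.9, 0.6, 0.2])   # P(Y = 1 | X = x)

# Bayes classifier: predict the class with the highest conditional probability.
bayes_pred = (p_y1_given_x > 0.5).astype(int)
print("Bayes predictions per x:", bayes_pred)

# Bayes error rate: 1 - E[max_j P(Y = j | X)], averaging over X.
max_prob = np.maximum(p_y1_given_x, 1 - p_y1_given_x)
print("Bayes error rate:", 1 - np.sum(p_x * max_prob))   # 0.21 here
```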

KNN

Brief Description: For an observation $i$, we look at the classes of its $K$ nearest neighbors. The class $j$ with the highest proportion among those neighbors wins, and we assign $i$ to class $j$.

KNN classifies according to

$\Pr(Y = j \mid X = x_0) = \frac{1}{K}\sum_{i \in \mathcal{N}_0} I(y_i = j)$

where $\mathcal{N}_0$ is the set of the $K$ training points nearest to $x_0$.

• Note that low $K$ means high model flexibility and high $K$ means low model flexibility.

• We also observe the characteristic U shape in the test error rate as we increase model flexibility (decrease $K$) in KNN, just as in regression.
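
A minimal from-scratch KNN sketch (the simulated data and the true decision rule $x_1 + x_2 > 0$ are hypothetical) implementing the estimate above:

```python
import numpy as np

# Minimal KNN: estimate P(Y = j | X = x0) as the fraction of the K nearest
# training points with label j, then predict the majority class.
def knn_predict(X_train, y_train, x0, K):
    dists = np.linalg.norm(X_train - x0, axis=1)   # Euclidean distances to x0
    nearest = np.argsort(dists)[:K]                # indices of the K nearest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]               # highest-proportion class wins

rng = np.random.default_rng(3)
X_train = rng.normal(size=(50, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), K=5))    # small K: flexible
print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), K=25))   # large K: smooth
```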