*irreducible error* that results from either missing regressors in the model or unmeasurable random variation.

Prediction: Estimate $Y$ with $\\hat{Y} = \\hat f(X)$; since $\\epsilon$ averages to zero, it is not part of the prediction.

Inference: see how $Y$ changes with changes in $X\_1 ... X\_p$

*Regression* problems deal with quantitative responses.

- parametric: reduces problem of estimating $f$ to a problem of estimating parameters
    - i.e: when we want to use the linear model $f(X) = B\_0 + B\_1X\_1 + ... + B\_pX\_p$, we only have to estimate $B\_0, B\_1, ..., B\_p$
    - The assumed form will generally not match the true form of $f$; choosing a more flexible parametric model gets closer to it but risks overfitting
    - More restrictive but generally more interpretable than non-parametric models
- non-parametric:
    - no assumption about shape/form of $f$
    - tries to use an $\\hat f$ that is as close to the data points as possible
    - Generally more flexible than parametric approaches
    - Needs a very large number of observations
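As a concrete sketch of the parametric approach (synthetic data, NumPy assumed, made-up coefficients): once we assume a linear form, estimating $f$ reduces to estimating a handful of coefficients by least squares.

```python
import numpy as np

# Synthetic data from a known linear f(X) = 2 + 3*X1 - 1*X2 plus noise (epsilon)
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 2 + 3 * X[:, 0] - 1 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Parametric step: estimating f reduces to estimating B_0, B_1, B_2
X_design = np.column_stack([np.ones(n), X])  # prepend intercept column
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print(beta_hat)  # estimates close to the true [2, 3, -1]
```

With only three numbers to estimate, even a modest sample pins down $\\hat f$ well, which is exactly the simplification the parametric approach buys.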

*Classification* problems deal with categorical or qualitative responses. Note that the type of response indicates what type of problem we are trying to solve.

Measuring Quality of Fit - to quantify how "off" predictions are from true response data

We can use *mean squared error*

- MSE is defined for both training and test data sets.
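For reference, the standard MSE formula this refers to is:

$MSE = \\frac{1}{n}\\sum\_{i=1}^{n}(y\_i - \\hat f(x\_i))^2$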

**Note:** Can't just minimize training MSE to minimize test MSE. Most learning models work to minimize training MSE, which does not guarantee a smallest test MSE (might overfit to data). Flexible methods tend to have a higher chance of "overfitting" training data, resulting in a higher test MSE.
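A small simulation of this overfitting behavior (a sketch assuming NumPy; polynomial degree is used here as a stand-in for model flexibility, with all data synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    return np.sin(2 * x)

# Small noisy training set plus a separate test set from the same process
x_train = rng.uniform(0, 3, size=30)
y_train = true_f(x_train) + rng.normal(scale=0.3, size=30)
x_test = rng.uniform(0, 3, size=300)
y_test = true_f(x_test) + rng.normal(scale=0.3, size=300)

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

# Fit polynomials of increasing degree (increasing flexibility)
train_mse_by_deg, test_mse_by_deg = {}, {}
for degree in (1, 3, 10):
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse_by_deg[degree] = mse(y_train, np.polyval(coefs, x_train))
    test_mse_by_deg[degree] = mse(y_test, np.polyval(coefs, x_test))

# Training MSE always falls as flexibility grows; test MSE need not
print(train_mse_by_deg)
print(test_mse_by_deg)
```

Training MSE decreases monotonically with degree, but the high-degree fit chases the noise, so its test MSE is not guaranteed to keep improving.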

Bias: Error introduced by approximating real-life problem with simple models (i.e: linear model).

Variance: the amount by which

$\\hat{f}$ would change if it was estimated using a different training data set; ideally, $\\hat f$ shouldn't vary much across training sets

- As model flexibility increases, bias generally decreases and variance increases
- As we increase flexibility, the bias initially decreases faster than the variance increases. At some point, the variance increases faster than the bias declines and thus the test MSE begins to increase.
- Note that the expected test MSE can never fall below the irreducible error $Var(\\epsilon)$
- test MSE is minimized when the sum of variance and squared bias is lowest
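This tradeoff comes from the bias-variance decomposition of the expected test MSE at a point $x\_0$ (a standard identity, stated here for reference):

$E(y\_0 - \\hat f(x\_0))^2 = Var(\\hat f(x\_0)) + [Bias(\\hat f(x\_0))]^2 + Var(\\epsilon)$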

Use *training error rate* to quantify accuracy of a classifier $\\hat f$:

$\\frac{1}{n}\\sum\_{i=1}^{n} I(y\_i \\neq \\hat{y}\_i)$

where $I(y\_i \\neq \\hat{y}\_i)$ is an indicator that equals 1 if observation $i$ is misclassified and 0 otherwise.

Basically, training error rate averages all misclassifications across the $n$ training observations.

Bayes Classifier: Assign a test observation with predictor vector $x\_0$ to the class $j$ for which $Pr(Y = j | X = x\_0)$ is largest.

Bayes Error Rate:

$1 - E(max\_jPr(Y = j | X))$ - expectation just averages probability over all possible X

- Bayes Error Rate is analogous to irreducible error
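A quick Monte Carlo sanity check of the Bayes error rate formula, using a hypothetical one-dimensional setup (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy setup: X ~ Uniform(0, 1) with Pr(Y = 1 | X = x) = x.
# The Bayes classifier picks class 1 when x > 0.5; its pointwise error is
# 1 - max(x, 1 - x) = min(x, 1 - x), and averaging over X gives the Bayes
# error rate (here the exact integral of min(x, 1 - x) over [0, 1] is 0.25).
x = rng.uniform(0, 1, size=1_000_000)
bayes_error = np.mean(1 - np.maximum(x, 1 - x))
print(bayes_error)  # approx 0.25
```

Even the best possible classifier is wrong a quarter of the time here, because the classes genuinely overlap; no method can beat this floor.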

KNN Brief Description: For an observation $x\_0$, find the $K$ training points nearest to $x\_0$ and estimate $Pr(Y = j | X = x\_0)$ as the fraction of those $K$ neighbors belonging to class $j$.

The classifier then classifies $x\_0$ to the class with the largest estimated probability, mimicking the Bayes classifier with estimated (rather than true) conditional probabilities.

- Note that low K means high model flexibility and high K means low model flexibility

- We also observe the characteristic U shape as we increase model flexibility in the error rates in KNN
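A minimal from-scratch KNN sketch (NumPy assumed; the data and function name are made up for illustration):

```python
import numpy as np

def knn_predict(X_train, y_train, x0, k):
    """Classify x0 by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x0, axis=1)
    nearest = np.argsort(dists)[:k]
    # Estimated Pr(Y = j | X = x0) is the fraction of the k neighbors in class j
    classes, counts = np.unique(y_train[nearest], return_counts=True)
    return classes[np.argmax(counts)]

# Tiny synthetic example: class 0 clustered near the origin, class 1 near (3, 3)
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [3.0, 3.0], [3.1, 2.9], [2.9, 3.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.15, 0.1]), k=3))  # 0
print(knn_predict(X_train, y_train, np.array([3.05, 3.0]), k=3))  # 1
```

Shrinking `k` toward 1 makes the decision boundary track individual points (high flexibility, low bias, high variance); growing `k` smooths it out (low flexibility, higher bias, lower variance).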