Lecture 21: Bias Variance Tradeoff
 the true relation between x and y is \( Y = g(x) + \epsilon \)
 we never observe \( g \) directly; the random error \( \epsilon \) corrupts every observation
 we only see \( Y \), the data
 our prediction is \( \hat{Y} \)
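This setup can be sketched in a few lines; the true function \( g(x) = \sin x \) and noise level \( \sigma = 0.3 \) are assumptions for illustration, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    # hypothetical true relation (assumed for illustration)
    return np.sin(x)

sigma = 0.3  # assumed standard deviation of the random error
x = np.linspace(0, 2 * np.pi, 100)
epsilon = rng.normal(0, sigma, size=x.shape)  # random error we never see directly
Y = g(x) + epsilon  # the data we actually observe
```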
Prediction Error
 g is the true model and \( \epsilon \) is random error, so \( Y = g(x) + \epsilon \)
 the red curve \( \hat{Y} \) on the slide is our prediction
Model Risk

 model risk is the expectation of the squared difference, \( E[(Y - \hat{Y}(x))^2] \)
 in practice we estimate it by taking a sample and averaging the squared errors
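A Monte Carlo sketch of estimating model risk at a fixed x; the true function, noise level, and the deliberately crude constant model are all assumptions, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.3          # assumed noise level
g = np.sin           # hypothetical true function
x0 = 1.0             # fixed input

# simulate many draws of Y at x0 and average the squared error
y = g(x0) + rng.normal(0, sigma, size=100_000)
y_hat = 0.0          # a deliberately crude constant prediction
risk = np.mean((y - y_hat) ** 2)
# risk is close to sigma**2 + g(x0)**2:
# observation variance plus the squared bias of this constant model
```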

Chance Error
 chance error comes from the randomness of \( \epsilon \); even the true model cannot predict it

Bias
 bias arises when our model systematically misses the true relation
Observation Variance
 \( \epsilon \) is random with expectation zero and variance \( \sigma^2 \)
 so \( \mathrm{Var}(Y) = \sigma^2 \): g(x) is constant, so the variance comes only from \( \epsilon \)
 this is called observation variance
 it comes from measurement error and missing information
 it is irreducible error: no model can remove it
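A quick simulation of the point above: at a fixed x the only randomness in Y is \( \epsilon \), so the sample variance of Y lands near \( \sigma^2 \) (g and \( \sigma \) assumed, as before):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.3   # assumed noise level
g = np.sin    # hypothetical true function

# at a fixed x, g(x) is a constant, so Var(Y) = Var(epsilon) = sigma**2
x0 = 1.0
y = g(x0) + rng.normal(0, sigma, size=200_000)
print(np.var(y))  # close to sigma**2 = 0.09
```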
Chance Error
 our predictions vary a little from sample to sample
 because each model is fit on a random sample
Model Variance
 model variance is the variance of \( \hat{Y}(x) \) around the average prediction \( E[\hat{Y}(x)] \)
 a complex model can overfit into the data, chasing the noise
 to reduce model variance, reduce model complexity
 so that we don't fit the noise
 but lowering complexity too far increases bias
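One way to see model variance directly is to refit the same model on many fresh samples and watch how much the prediction at one point moves. This sketch compares a low-degree and a high-degree polynomial; the true function, noise level, and degrees are assumptions for illustration:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(3)
sigma = 0.3                 # assumed noise level
g = np.sin                  # hypothetical true function
x_train = np.linspace(0, 2 * np.pi, 20)
x0 = 1.0                    # fixed point where we compare predictions

def predictions(degree, n_trials=500):
    """Fit a polynomial on many fresh samples; return the prediction at x0 each time."""
    preds = []
    for _ in range(n_trials):
        y = g(x_train) + rng.normal(0, sigma, size=x_train.shape)
        p = Polynomial.fit(x_train, y, deg=degree)
        preds.append(p(x0))
    return np.array(preds)

low = predictions(degree=1)
high = predictions(degree=9)
print(np.var(low), np.var(high))  # the high-degree model has larger model variance
```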
Our Model Vs the Truth
 the green curve is the true g; the red curve is our model fit on one fixed sample
Model Bias
 model bias is \( E[\hat{Y}(x)] - g(x) \): the average model prediction minus the true g, at a fixed x
 it is not random, since the expectation averages out the sampling randomness
 underfitting, e.g. from a lack of domain knowledge, gives large bias
 overfitting gives small bias but large model variance
 the comparison is the average prediction \( E[\hat{Y}(x)] \) against the actual value \( g(x) \)
Decomposition of Error and Risk
 model risk is the expected squared difference \( E[(Y - \hat{Y}(x))^2] \)
 it decomposes into three terms: \( \sigma^2 \) (observation variance), the square of the model bias, and the model variance
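The decomposition can be checked numerically: estimate the risk directly, then compare it to \( \sigma^2 + \text{bias}^2 + \text{variance} \) computed from repeated fits. The true function, noise level, and quadratic model are assumptions for the sketch:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(4)
sigma = 0.3                 # assumed noise level
g = np.sin                  # hypothetical true function
x_train = np.linspace(0, 2 * np.pi, 20)
x0 = 1.0
n_trials = 10_000

risks, preds = [], []
for _ in range(n_trials):
    # fit a fresh model on a fresh training sample
    y = g(x_train) + rng.normal(0, sigma, size=x_train.shape)
    p = Polynomial.fit(x_train, y, deg=2)
    y_hat = p(x0)
    preds.append(y_hat)
    # a fresh observation at x0, independent of the training sample
    y_new = g(x0) + rng.normal(0, sigma)
    risks.append((y_new - y_hat) ** 2)

risk = np.mean(risks)
bias = np.mean(preds) - g(x0)
model_var = np.var(preds)
print(risk, sigma**2 + bias**2 + model_var)  # the two sides agree closely
```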
Bias Variance Decomposition
Predicting by a Function with Parameters
 we predict with a parameterized function \( f \); its output plays the role of \( \hat{Y} \), an estimate of y