# Lecture 11: Optimization

## From Last Time

• Graph of averages
• Minimizing average loss
• It makes no assumption about the shape of the data.
• There is always a line of best fit, but it might not be the right model; Anscombe's Quartet is the classic example.

### Transformation

• Linear models are easy to interpret.

#### Log transformation

$$y = a^x \implies \log(y) = x \log(a)$$

$$y = ax^k \implies \log(y) = \log(a) + k\,\log(x)$$

### Simple Linear Regression: Interpreting the Slope

$$slope = r~\frac{\sigma_y}{\sigma_x}$$

Regression describes association, not causation.

For a slope of 0.09 inches per pound, we say 0.09 is the estimated difference in height between two people whose weights are one pound apart.
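The slope formula above can be checked numerically: for simple linear regression, the least-squares slope equals $$r\,\sigma_y/\sigma_x$$. A small sketch (the height/weight numbers below are made up for illustration):

```python
import numpy as np

# Hypothetical height-vs-weight data, for illustration only.
rng = np.random.default_rng(0)
weight = rng.normal(150, 20, 200)                     # pounds
height = 60 + 0.09 * weight + rng.normal(0, 1, 200)   # inches

# Slope via the correlation formula: slope = r * (sigma_y / sigma_x).
r = np.corrcoef(weight, height)[0, 1]
slope_formula = r * height.std() / weight.std()

# Slope via least squares; the two agree.
slope_lstsq = np.polyfit(weight, height, 1)[0]
print(slope_formula, slope_lstsq)
```

Both approaches yield the same value because the least-squares slope is $$\mathrm{cov}(x,y)/\mathrm{var}(x) = r\,\sigma_y/\sigma_x$$.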

## Recap on Modeling

• For engineers, the goal is making predictions: accuracy.
• For scientists, the goal is interpretability.
• Physical laws such as $$F = ma$$ are interpretable models with meaningful parameters.

## Steps for Modeling

### Squared Loss vs. Absolute Loss

$$L^2$$ loss has nice optimization properties (it is differentiable everywhere) but is sensitive to outliers; $$L^1$$ loss is robust to outliers but not differentiable at zero.
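The outlier sensitivity is easy to see with a constant model: the minimizer of average squared loss is the mean, while the minimizer of average absolute loss is the median. A minimal sketch (the data values are illustrative):

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # 100.0 is an outlier

# Evaluate both average losses over a grid of candidate constants theta.
thetas = np.linspace(0, 100, 100001)
sq_loss = ((data[:, None] - thetas) ** 2).mean(axis=0)
abs_loss = np.abs(data[:, None] - thetas).mean(axis=0)

print(thetas[sq_loss.argmin()], data.mean())       # mean is dragged by the outlier
print(thetas[abs_loss.argmin()], np.median(data))  # median ignores it
```

The squared-loss minimizer lands at 22.0 (the mean), far from the bulk of the data, while the absolute-loss minimizer lands at 3.0 (the median).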

### Calculus for Loss Minimization

$$h(x)=f(g(x))$$

$$\frac{\partial h}{\partial x} = \frac{\partial f}{\partial g} \frac{\partial g}{\partial x}$$
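The chain rule can be sanity-checked numerically by comparing the analytic derivative against a finite difference (the choices $$f(u)=\sin(u)$$ and $$g(x)=x^2$$ below are illustrative, not from the lecture):

```python
import math

# h(x) = f(g(x)) with f(u) = sin(u), g(x) = x**2.
def g(x): return x ** 2
def f(u): return math.sin(u)
def h(x): return f(g(x))

def dh_chain(x):
    # df/dg * dg/dx = cos(g(x)) * 2x
    return math.cos(g(x)) * 2 * x

def dh_numeric(x, eps=1e-6):
    # Central finite difference approximation.
    return (h(x + eps) - h(x - eps)) / (2 * eps)

print(dh_chain(1.3), dh_numeric(1.3))  # the two values agree closely
```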

The intermediate term cancels out: the derivative of the outer function times the derivative of the inner function. Repeat for deeper compositions.

### Numerical Optimization

• First order: gradients, the slope of our loss landscape.
• Second order: the Hessian; computing it is slow.

A loss function $$f$$ takes a vector and returns a scalar.

The gradient takes a vector and returns a vector. Gradient descent is like falling down a well: in the one-dimensional example, we repeatedly update the weights $$\theta$$ by stepping against the gradient, where $$\rho(\tau)$$ is the learning rate.
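The one-dimensional update can be sketched as follows (the quadratic objective and the constant learning rate below are illustrative assumptions, not from the lecture):

```python
# Minimal 1-D gradient descent: minimize f(theta) = (theta - 3)**2.
def f(theta):
    return (theta - 3.0) ** 2

def grad_f(theta):
    return 2.0 * (theta - 3.0)

def gradient_descent(theta0, lr=0.1, steps=100):
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * grad_f(theta)  # theta <- theta - rho * gradient
    return theta

print(gradient_descent(theta0=0.0))  # converges to the minimizer, 3.0
```

With a fixed learning rate of 0.1, each step shrinks the distance to the minimizer by a constant factor, so the iterates converge to 3.0.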