# Lecture 11: Optimization

## From Last Time

• Graph of averages
• Minimizing average loss
• Least squares makes no assumption about the shape of the data.
• There is always a line of best fit, but it might not be a good fit (Anscombe's Quartet).

### Transformation

• Linear relationships are easy to interpret, so we transform variables to make relationships linear.

#### Log transformation

$$y=a^x \implies \log(y)=x\log(a)$$

$$y=ax^k \implies \log(y)=\log(a)+k\log(x)$$
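As a sketch of the second transformation, a power law becomes a straight line in log-log space, so an ordinary line fit recovers its parameters. The data values here are made up for illustration:

```python
import numpy as np

# Hypothetical power-law data: y = a * x^k with a = 2, k = 3
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x ** 3

# Taking logs linearizes the model: log(y) = log(a) + k*log(x),
# so a straight-line fit in log-log space recovers k (slope) and log(a) (intercept)
k_hat, log_a_hat = np.polyfit(np.log(x), np.log(y), 1)
a_hat = np.exp(log_a_hat)

print(k_hat)  # recovers k = 3
print(a_hat)  # recovers a = 2
```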

### Simple Linear Regression: Interpreting the Slope

$$\text{slope} = r~\frac{\sigma_y}{\sigma_x}$$

Regression captures association, not causation.

For a slope of 0.09 inches per pound, we say 0.09 is the estimated difference in height between two people whose weights are one pound apart.
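A quick check of the slope formula, using made-up weight/height data (the variable names and values are illustrative, not from the lecture): the formula $$r~\sigma_y/\sigma_x$$ agrees with the least-squares slope from a direct line fit.

```python
import numpy as np

# Hypothetical weight (lb) and height (in) data
weight = np.array([120.0, 150.0, 170.0, 200.0, 140.0])
height = np.array([62.0, 66.0, 68.0, 71.0, 64.0])

# slope = r * (sigma_y / sigma_x)
r = np.corrcoef(weight, height)[0, 1]
slope = r * height.std() / weight.std()

# Matches the least-squares slope from a direct line fit
slope_lstsq = np.polyfit(weight, height, 1)[0]
print(np.isclose(slope, slope_lstsq))  # True
```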

## Recap on Modeling

• For engineers, the goal is making accurate predictions.
• For scientists, the goal is interpretability.
• Parameters can carry physical meaning, as in $$F = ma$$.

## Steps for Modeling

### Squared Loss vs. Absolute Loss

Squared ($$L^2$$) loss has nice optimization properties (it is differentiable everywhere) but is sensitive to outliers; absolute ($$L^1$$) loss is robust to outliers but is not differentiable at zero.
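The outlier sensitivity can be seen with a constant model: the squared-loss minimizer is the mean, which an outlier drags far away, while the absolute-loss minimizer is the median, which barely moves. A small sketch with made-up data:

```python
import numpy as np

# Data with one large outlier
data = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])

# Evaluate both average losses for a constant prediction theta over a grid
thetas = np.linspace(0.0, 1000.0, 100001)
sq_loss = ((data[None, :] - thetas[:, None]) ** 2).mean(axis=1)
abs_loss = np.abs(data[None, :] - thetas[:, None]).mean(axis=1)

print(thetas[np.argmin(sq_loss)])   # 202.0, the mean: dragged by the outlier
print(thetas[np.argmin(abs_loss)])  # 3.0, the median: robust to the outlier
```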

### Calculus for Loss Minimization

$$h(x)=f(g(x))$$

$$\frac{\partial h}{\partial x} = \frac{\partial f}{\partial g} \frac{\partial g}{\partial x}$$

The $$\partial g$$ terms cancel as if they were fractions.

Take the derivative of the outside times the derivative of the inside. Repeat for deeper nesting!
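The chain rule can be sanity-checked numerically. Here the functions $$f = \sin$$ and $$g(x) = x^2$$ are arbitrary choices for illustration; the analytic derivative from the chain rule matches a central finite-difference approximation:

```python
import numpy as np

# h(x) = f(g(x)) with f = sin and g(x) = x**2,
# so by the chain rule h'(x) = cos(x**2) * 2x
# (derivative of the outside times derivative of the inside)
def h(x):
    return np.sin(x ** 2)

def h_prime(x):
    return np.cos(x ** 2) * 2 * x

# Compare against a central finite-difference approximation
x0, eps = 1.3, 1e-6
numeric = (h(x0 + eps) - h(x0 - eps)) / (2 * eps)
print(np.isclose(h_prime(x0), numeric))  # True
```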

### Numerical Optimization

• First order: gradients, the slope of our loss landscape.
• Second order: the Hessian; it gives curvature information, but computing it is slow in high dimensions.

The loss function $$f$$ takes a vector and returns a scalar. Gradient descent updates the weights $$\theta$$ by stepping against the gradient, where $$\rho(\tau)$$ is the learning rate at step $$\tau$$:

$$\theta^{(\tau+1)} = \theta^{(\tau)} - \rho(\tau)\,\nabla_\theta f(\theta^{(\tau)})$$
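A minimal gradient descent sketch, with a constant learning rate and a simple quadratic loss chosen for illustration (the function names and values are assumptions, not from the lecture):

```python
import numpy as np

# theta <- theta - rho * grad_f(theta), with a constant learning rate rho
def gradient_descent(grad_f, theta0, rho=0.1, steps=100):
    theta = np.asarray(theta0, dtype=float)
    for tau in range(steps):
        theta = theta - rho * grad_f(theta)  # step downhill along the gradient
    return theta

# Example loss: f(theta) = ||theta - c||^2, minimized at c,
# with gradient 2 * (theta - c)
c = np.array([3.0, -1.0])
grad_f = lambda theta: 2.0 * (theta - c)

print(gradient_descent(grad_f, np.zeros(2)))  # converges to ~ [3., -1.]
```

In practice the learning rate often decays with the step $$\tau$$ rather than staying constant; a too-large rate diverges, a too-small one converges slowly.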