Webcast

Slides

- Compute the slope (gradient) at the current point

- The update (drawn in red on the slide) is calculated from the gradient

- Start from an initial vector of weights
- Scale each update by the learning rate
- Converge when the gradient reaches 0, or stop early
- We can do better! Calculating the full gradient is slow

- Computing the gradient over the whole population (the entire dataset) is expensive!!!
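The loop described above (compute the slope, step opposite it scaled by the learning rate, stop when the gradient is near 0) can be sketched as follows. This is a minimal illustration, not the course's code; the function name and the quadratic objective are made up for the example.

```python
import numpy as np

def gradient_descent(grad, w0, lr=0.1, steps=100, tol=1e-8):
    """Repeatedly step opposite the gradient, scaled by the learning rate."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        g = grad(w)
        if np.linalg.norm(g) < tol:  # converged: gradient is (near) 0
            break
        w = w - lr * g               # update scaled by the learning rate
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2(w - 3); optimum at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_star)  # close to 3.0
```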

- **Sample** instead! The sample is called a **batch**, with size given by the \( B \) term
- This assumes the loss is decomposable

- *Decomposable loss*: one that can be written as a sum of per-example losses, \( L(w) = \sum_{i} \ell(x_i, y_i; w) \)

- SGD is noisier per step, but
**on average** the batch gradient equals the full gradient (an unbiased estimate), so it converges!!!
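A minimal sketch of minibatch SGD on a decomposable loss. The toy problem (fitting \( y = 2x + 1 \) with MSE), the batch size, and the step count are all illustrative choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: y = 2x + 1 plus a little noise.
X = rng.uniform(-1, 1, size=200)
y = 2 * X + 1 + 0.01 * rng.normal(size=200)

w, b, lr, B = 0.0, 0.0, 0.1, 16
for step in range(500):
    idx = rng.choice(len(X), size=B, replace=False)  # sample a batch of size B
    xb, yb = X[idx], y[idx]
    err = (w * xb + b) - yb
    # Because MSE is decomposable, the batch gradient is just the average of
    # per-example gradients -- an unbiased estimate of the full gradient.
    w -= lr * 2 * np.mean(err * xb)
    b -= lr * 2 * np.mean(err)
print(w, b)  # close to 2 and 1
```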

- The gradient is calculated using values stored during the forward pass
- The **chain rule** applied to **individual calculus operations** on a **computation graph**

- The graph records each individual operation
- **Backward differentiation** walks the graph in reverse, applying the chain rule!!
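The idea above can be worked by hand on a tiny, hypothetical computation graph for \( f(w) = (wx - y)^2 \): the forward pass stores intermediates, and the backward pass applies each operation's local derivative in reverse order via the chain rule.

```python
# Inputs (hypothetical values for the example).
x, w, y = 2.0, 3.0, 5.0

# Forward pass: each node is one primitive operation; intermediates are kept.
a = w * x    # a = 6
e = a - y    # e = 1
L = e * e    # L = 1

# Backward pass: local derivatives, chained from the output back to w.
dL_dL = 1.0
dL_de = dL_dL * 2 * e   # d(e^2)/de = 2e
dL_da = dL_de * 1.0     # d(a - y)/da = 1
dL_dw = dL_da * x       # d(w*x)/dw = x
print(dL_dw)  # 4.0, which matches 2(wx - y)x = 2 * 1 * 2
```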

- Can run on GPUs, and autodiff does the differentiation automatically

- Line of best fit and residuals

- Mean Square Error Loss Surface

- \( L^{1} \) loss surface for comparison!
- Sharp (non-differentiable) at the minimum
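The shape difference shows up in the gradients: the MSE gradient shrinks smoothly as residuals approach zero (a smooth bowl), while the \( L^{1} \) gradient keeps magnitude 1 all the way in, which is why its surface is sharp at the minimum. The residual values below are made up for illustration:

```python
import numpy as np

r = np.array([-2.0, -0.5, 0.5, 2.0])  # hypothetical residuals (prediction - target)

mse_grad = 2 * r       # shrinks toward 0 with the residual: smooth bowl
l1_grad = np.sign(r)   # always magnitude 1: kink at the minimum

print(mse_grad)  # [-4. -1.  1.  4.]
print(l1_grad)   # [-1. -1.  1.  1.]
```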

- PyTorch: subclass `nn.Module`, can add parameters
- Only need a `forward` function!
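A minimal sketch of such a subclass. The class name echoes the SimpleLinearModel mentioned below, but this particular definition (scalar `w` and `b`) is an assumption, not the course's actual model:

```python
import torch
import torch.nn as nn

class SimpleLinearModel(nn.Module):
    """Subclassing nn.Module lets parameters be registered automatically."""
    def __init__(self):
        super().__init__()
        # Wrapping tensors in nn.Parameter adds them to model.parameters().
        self.w = nn.Parameter(torch.zeros(1))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # Only forward needs defining; autograd derives the backward pass.
        return self.w * x + self.b

model = SimpleLinearModel()
print(len(list(model.parameters())))  # 2 registered parameters: w and b
```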

- For N steps: get the loss, call `loss.backward()`, then update the weights inside `with torch.no_grad():`
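Those steps assemble into a training loop like the sketch below. The data and hyperparameters are hypothetical; the point is the `loss.backward()` / `torch.no_grad()` pattern:

```python
import torch

# Hypothetical noise-free data for y = 2x + 1.
X = torch.linspace(-1, 1, 64)
y = 2 * X + 1

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1

for step in range(200):                   # N steps
    loss = ((w * X + b - y) ** 2).mean()  # get the loss
    loss.backward()                       # fills in w.grad and b.grad
    with torch.no_grad():                 # update without tracking gradients
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()                    # gradients accumulate, so reset them
        b.grad.zero_()
print(w.item(), b.item())  # close to 2.0 and 1.0
```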

- Visualization: SimpleLinearModel

- `nepochs` sets how many times to walk through the data; `loader` sets the batch size
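A sketch of how `nepochs` and `loader` fit together, using an off-the-shelf `nn.Linear` in place of the course's model; the dataset and hyperparameters are again made up for the example:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Hypothetical dataset; the loader controls the batch size.
X = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * X + 1
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

model = torch.nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

nepochs = 100                 # how many times to walk through the data
for epoch in range(nepochs):
    for xb, yb in loader:     # one minibatch per inner iteration
        loss = ((model(xb) - yb) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
print(model.weight.item(), model.bias.item())  # close to 2.0 and 1.0
```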