Lecture 13: Review Modeling and Optimization, Intro to Regression

Human Contexts and Ethics

Imagine you are a Data Scientist on Twitter's "Trust and Safety" team.

  1. Question/Problem Formulation
    • Fake news is a problem
    • It doesn't have to be framed as an engineering-focused problem!
  2. Data Acquisition and Cleaning
    • What data do we have, and what do we need to collect?
    • Example: the President's tweets
  3. Exploratory Data Analysis
    • Example: classify tweets as healthy or unhealthy
    • Think about the context of your problem
    • Note biases and anomalies
  4. Predictions and Inference
    • What is the story? What is the social good?
    • Think about who is listening and what kind of power you have

Think about your social context.

Review Modeling and Optimization


  • A model is a function f that maps inputs X to outputs Y.

  • Parametric Models
    • Have parameters, often represented as a vector
    • Linear Models

  • Non-Parametric Models
    • Nearest neighbor: copy the prediction from the closest data point
    • The "model" is really big! It grows with the size of the data

  • A Kernel Density Estimator has a parameter (the bandwidth), but it behaves more like a hyperparameter
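A nearest-neighbor predictor takes only a few lines; the function name and toy data below are illustrative, not from the lecture:

```python
import torch

def nearest_neighbor_predict(x_new, X_train, y_train):
    """Non-parametric prediction: copy the label of the closest training point."""
    dists = torch.abs(X_train - x_new)       # distance to every training point
    return y_train[torch.argmin(dists)]      # label of the nearest neighbor

# toy 1-D training data
X_train = torch.tensor([0.0, 1.0, 2.0, 3.0])
y_train = torch.tensor([10.0, 20.0, 30.0, 40.0])

nearest_neighbor_predict(torch.tensor(1.2), X_train, y_train)  # tensor(20.)
```

Note there are no learned parameters: prediction requires keeping the entire training set around, which is why the "model" grows with the data.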

Tradeoffs in Modeling

  • Example: predicting midterm grades from homework scores
    • A simple model is interpretable and summarizes the data
    • A complex model can fit the data more closely but is harder to interpret

Loss Functions

  • Loss: how close is our model's prediction to the actual value?

  • Average loss: the mean of the per-observation losses over the dataset

  • Solve it with optimization: find the \( \theta \) (parameters) that minimize the loss

  • \( f_\theta(x) \) is our model; \( L(\theta) \) is the loss function

  • F.l1_loss is the equivalent built-in
  • Keep everything as tensors so autograd can track gradients
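A quick check that the hand-written average L1 loss matches the built-in `F.l1_loss` (which averages by default); the tensor values here are made up:

```python
import torch
import torch.nn.functional as F

y_hat = torch.tensor([1.0, 2.0, 3.0])   # model predictions
y     = torch.tensor([1.5, 2.0, 2.0])   # actual values

# average L1 loss written out by hand...
manual = torch.mean(torch.abs(y_hat - y))
# ...matches the built-in, which defaults to reduction='mean'
builtin = F.l1_loss(y_hat, y)
assert torch.isclose(manual, builtin)   # both are 0.5 here
```

Because both values are tensors built from tensor operations, autograd can differentiate the loss with respect to any parameters that produced `y_hat`.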

  • When building a model, define class ExponentialModel(nn.Module)
  • Weights: self.w = nn.Parameter(torch.ones(2, 1))
    • The initial weights are a 2x1 tensor of ones, [1, 1]
  • forward defines how the model makes a prediction
  • to evaluate:
m = ExponentialModel()
m(0) # returns tensor of 2
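A runnable sketch of the class described above. The notes don't give the body of forward; the form w0 + w1·e^x is an assumption, chosen so that with initial weights [1, 1], evaluating at 0 returns 2 as stated:

```python
import torch
import torch.nn as nn

class ExponentialModel(nn.Module):
    def __init__(self):
        super().__init__()
        # initial weights: a 2x1 tensor of ones
        self.w = nn.Parameter(torch.ones(2, 1))

    def forward(self, x):
        # assumed functional form: w0 + w1 * e^x (the notes don't specify it);
        # with w = [1, 1], forward(0) = 1 + 1 * e^0 = 2
        return self.w[0] + self.w[1] * torch.exp(x)

m = ExponentialModel()
m(torch.tensor(0.0))  # → tensor([2.], grad_fn=...)
```

Because the weights are an nn.Parameter, the output carries a grad_fn and autograd can compute gradients of a loss with respect to them.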

  • In the 3D plot, the axes are w0, w1, and the loss. Find the point that minimizes the loss.

  • Example of the orange vs. yellow line and its location on the loss landscape

Optimization of the Model

  • Once you know your loss, compute the gradient (it tells you how to improve the loss)
    • The gradient maps a scalar function to a vector; each entry is the derivative with respect to one parameter

  • Take the derivative and evaluate it at the current parameter values

  • Automatic differentiation reuses intermediate computations when calculating gradients
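The steps above (compute the loss, let autograd produce the gradient vector, step the parameters downhill) can be sketched as a small training loop; the toy data, learning rate, and model form w0 + w1·e^x are all assumptions for illustration:

```python
import torch
import torch.nn.functional as F

# made-up toy data, just to make the loop runnable
x = torch.tensor([0.0, 1.0, 2.0])
y = torch.tensor([2.0, 4.0, 9.0])

# parameters theta = (w0, w1), tracked by autograd
w = torch.ones(2, requires_grad=True)
lr = 0.01
losses = []

for _ in range(100):
    y_hat = w[0] + w[1] * torch.exp(x)   # assumed exponential model
    loss = F.l1_loss(y_hat, y)           # average L1 loss L(theta)
    loss.backward()                      # autograd fills w.grad with dL/dw
    with torch.no_grad():
        w -= lr * w.grad                 # gradient step: move downhill
        w.grad.zero_()                   # clear grads so they don't accumulate
    losses.append(float(loss))
```

The `torch.no_grad()` block matters: the parameter update itself should not be recorded by autograd, and `w.grad.zero_()` is needed because backward() accumulates gradients by default.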