Imagine you are a Data Scientist on Twitter's "Trust and Safety" team.
- Question/Problem Formulation
- Fake News is a problem
- Doesn't have to be an Engineering-Focused problem!
- Data Acquisition and Cleaning
- What data do we have, and what do we need to collect?
- President's Tweet
- Exploratory Data Analysis
- Example: classify tweets as healthy or unhealthy
- Think about the context of your problem
- Note biases and anomalies
- Predictions and Inference
- What is the story? What is the social good?
- Think about who is listening and what kind of power you have
- Think about your social context
- A model is a function \( f \) that maps from \( X \) to \( Y \).
- Parametric Models
- Have parameters, often represented as a vector
- Linear Models
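A minimal sketch of a parametric linear model (not from the notes; the parameter values are hypothetical): the parameters live in a fixed-size vector \( \theta \), no matter how much data we have.

```python
import torch

# hypothetical parameter vector: [intercept, slope]
theta = torch.tensor([0.5, 2.0])

def linear_model(theta, x):
    # a parametric model: prediction depends only on the
    # fixed-size parameter vector theta, not on the dataset
    return theta[0] + theta[1] * x

print(linear_model(theta, torch.tensor(3.0)))  # tensor(6.5000)
```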
- Non-Parametric Models
- Nearest Neighbor
- copy the prediction from the closest datapoint
- Really big! The model grows with the size of the data
- A Kernel Density Estimator has a parameter (the bandwidth), but it's more like a hyperparameter
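The nearest-neighbor idea above can be sketched in a few lines (the training data here is made up): the "model" is the entire dataset, which is why it grows with the data.

```python
import torch

# hypothetical training data: inputs and labels
train_x = torch.tensor([0.0, 1.0, 2.0, 5.0])
train_y = torch.tensor([10.0, 12.0, 11.0, 30.0])

def nearest_neighbor_predict(x):
    # copy the prediction from the closest training point
    idx = torch.argmin(torch.abs(train_x - x))
    return train_y[idx]

print(nearest_neighbor_predict(torch.tensor(1.4)))  # tensor(12.)
```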
- Example: predict midterm grades from homework scores
- Simple model: interpretable, summarizes the data
- Complex model: fits the data more closely, but harder to interpret
- Loss: how close is our model's prediction to the actual value?
- Average Loss
- Solve it with optimization: find the \( \theta \) (parameters) that minimizes the loss
- \( f_\theta(x) \) is our model; \( L(\theta) \) is the loss function
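A concrete sketch of average loss and its minimizer (the data and the one-parameter model \( f_\theta(x) = \theta x \) are illustrative choices, not from the notes):

```python
import torch

# hypothetical data: homework score -> midterm grade
x = torch.tensor([70.0, 80.0, 90.0])
y = torch.tensor([72.0, 85.0, 88.0])

def f(theta, x):
    # an illustrative one-parameter model: f_theta(x) = theta * x
    return theta * x

def average_loss(theta):
    # L(theta): average squared distance between predictions and actual values
    return torch.mean((f(theta, x) - y) ** 2)

# optimization problem: find the theta that minimizes L(theta);
# this particular model has a closed form: theta* = sum(x*y) / sum(x*x)
theta_star = (x * y).sum() / (x * x).sum()
print(theta_star, average_loss(theta_star))
```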
- Keep everything as tensors so we can do autograd
- When building a model, do:
self.w = nn.Parameter(torch.ones(2, 1))
- Our initial weights are a 2x1 tensor of ones, [1, 1]
- forward is how to make a prediction
- To evaluate:
m = ExponentialModel()
m(0)  # returns a tensor containing 2
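The snippet above can be fleshed out into a runnable module. The notes don't give the functional form of ExponentialModel, so the forward below assumes \( f(x) = w_0 + w_1 e^x \), which matches m(0) returning 2 with the all-ones initialization.

```python
import torch
import torch.nn as nn

class ExponentialModel(nn.Module):
    def __init__(self):
        super().__init__()
        # initial weights: a 2x1 tensor of ones
        self.w = nn.Parameter(torch.ones(2, 1))

    def forward(self, x):
        # assumed form: f(x) = w0 + w1 * exp(x)
        x = torch.as_tensor(x, dtype=torch.float32)
        return self.w[0] + self.w[1] * torch.exp(x)

m = ExponentialModel()
print(m(0))  # a tensor containing 2.0, with grad_fn attached
```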
- In the 3D plot, we have the loss surface over the parameters. Find the point that minimizes the loss.
- Example: the orange vs. yellow line and its location on the loss landscape
- Once you know your loss, compute the gradient (how to improve the loss)
- Gradient: scalar-valued function -> vector; each entry is the derivative with respect to one parameter
- Take the derivative and evaluate it at the current parameter values
- Auto Diff reuses gradient computations
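The bullets above can be sketched as a gradient descent loop with PyTorch autograd (the data, the one-parameter model, and the learning rate are all hypothetical):

```python
import torch

# hypothetical data for a one-parameter model f(x) = theta * x
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.1, 3.9, 6.2])

theta = torch.tensor(0.0, requires_grad=True)

for _ in range(100):
    loss = torch.mean((theta * x - y) ** 2)  # average squared loss L(theta)
    loss.backward()                          # autograd computes dL/dtheta
    with torch.no_grad():
        theta -= 0.05 * theta.grad           # step against the gradient
        theta.grad.zero_()                   # clear the gradient for the next step

print(theta.item())  # converges to the least-squares slope, about 2.04
```

Each backward() call evaluates the derivative at the current parameter values, which is why the gradient must be zeroed between steps.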