Lecture 24: Logistic Regression Part 2

  • using a constant model as a good baseline

  • partitioned the x-axis into 7(?) intervals and calculated the proportion of positive labels in each interval (a small sketch follows)
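
  • a minimal sketch of both ideas on toy data (the data and the choice of 7 bins here are just for illustration):

import numpy as np
import pandas as pd

# toy 1-D data: larger x values are more likely to be labeled 1
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = (x + rng.normal(scale=0.5, size=200) > 0).astype(int)

# constant model: always predict the overall proportion of 1s
baseline = y.mean()

# partition x into 7 equal-width intervals and compute the
# proportion of positive labels within each interval
proportions = pd.Series(y).groupby(pd.cut(x, 7)).mean()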

Bonus: K-Nearest Neighbors

  • the prediction is the average label of the k nearest points (stored in a heap?)

  • the resulting curve is kind of bumpy (see the sketch below)
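
  • a rough sketch of the k-NN estimate, reusing the toy x and y from the constant-model sketch above (k = 10 is an arbitrary choice):

from sklearn.neighbors import KNeighborsClassifier

# predict_proba averages the labels of the k nearest stored points,
# which produces the bumpy, step-like curve noted above
knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(x.reshape(-1, 1), y)
p_knn = knn.predict_proba(x.reshape(-1, 1))[:, 1]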

Logistic Regression

  • \( f_\theta(x) = \frac{1}{1+\exp(-t)} \)
  • \( t=\sum_{k=0}^d \theta_k x_k \) (with \( x_0 = 1 \) for the intercept)

One Dimensional Logistic Regression Model

  • different coefficients change the shape of the fitted curve

  • slope and intercept: a larger slope makes the transition steeper, and a lower intercept shifts the curve to the right (see the sketch below)
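
  • a small sketch of how the two coefficients shape the one-dimensional model (the values are arbitrary, just for illustration):

import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

xs = np.linspace(-5, 5, 100)
p1 = sigmoid(0 + 1 * xs)     # theta_0 = 0,  theta_1 = 1
p2 = sigmoid(0 + 3 * xs)     # larger slope: steeper transition
p3 = sigmoid(-2 + 1 * xs)    # lower intercept: curve shifted to the right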

Loss

  • cross-entropy loss
    • \( -\frac{1}{n}\sum_{i=1}^{n} \left[ y_i \log f_\theta(x_i) + (1-y_i) \log\big(1-f_\theta(x_i)\big) \right] \)
  • there is no closed-form solution, so we have to use an iterative method such as gradient descent (numpy sketch below)
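
  • a numpy sketch of the average cross-entropy loss (the clipping is just to avoid log(0)):

import numpy as np

def cross_entropy_loss(y, p_hat):
    # y: 0/1 labels, p_hat: predicted probabilities P(Y = 1 | x)
    p_hat = np.clip(p_hat, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))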

  • the code for fitting this with gradient descent (rough sketch after this list)
  • forward applies the model (computes the predictions)
  • the loss is cross-entropy loss
  • zero_gradient resets the accumulated gradients so we can take the gradient again on the next step
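
  • a minimal PyTorch-style sketch of that loop (not the lecture's exact code), reusing the toy x and y from above:

import torch

X_t = torch.tensor(x, dtype=torch.float32).reshape(-1, 1)
y_t = torch.tensor(y, dtype=torch.float32)

model = torch.nn.Linear(1, 1)             # forward pass computes t = theta_0 + theta_1 * x
loss_fn = torch.nn.BCEWithLogitsLoss()    # cross-entropy loss on the raw linear output
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

for _ in range(2000):
    optimizer.zero_grad()                 # zero out old gradients so we can take them again
    loss = loss_fn(model(X_t).squeeze(), y_t)
    loss.backward()                       # compute gradients of the loss
    optimizer.step()                      # one gradient descent update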

  • it's sexy: sklearn does the same thing in a few lines
from sklearn.linear_model import LogisticRegression

# X: feature matrix, y: 0/1 labels (assumed to be defined already)
lr_model = LogisticRegression(solver='lbfgs')
lr_model.fit(X, y)

lr_model.coef_       # fitted slope(s) theta_1, ..., theta_d
lr_model.intercept_  # fitted intercept theta_0

lr_model.predict(X)        # hard 0/1 predictions (0.5 threshold)
lr_model.predict_proba(X)  # predicted probabilities for each class

  • if \( \theta \) goes to infinity, the predicted probabilities become exactly 0 or 1 and the model is completely certain; instead we want some regularization to keep the weights finite
  • e.g. add a penalty like 1.0e-5 * theta ** 2 to the loss (sklearn equivalent sketched below)
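
  • in scikit-learn the L2 penalty is on by default and is controlled by C, the inverse of the regularization strength (the value below is just an example):

from sklearn.linear_model import LogisticRegression

# smaller C = stronger regularization, which keeps the coefficients
# from blowing up toward infinity on separable data
lr_reg = LogisticRegression(C=0.1)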

The Decision Rule

  • predict 1 if \( P(Y=1 \mid x) > 0.5 \)
  • but we can choose a threshold other than 0.5 (see the sketch below)
  • accuracy: fraction of correct predictions out of all predictions
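
  • a quick sketch of a non-default threshold, assuming lr_model, X, and y from the scikit-learn snippet above (the 0.7 threshold is arbitrary):

import numpy as np

p_hat = lr_model.predict_proba(X)[:, 1]   # P(Y = 1 | x)
y_hat = (p_hat > 0.7).astype(int)         # decision rule with a 0.7 threshold
accuracy = np.mean(y_hat == y)            # fraction of correct predictions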

Confusion Matrix

  • False-Positive: the true label is 0 (false) but the algorithm predicts 1 (true)
  • False-Negative: the true label is 1 (true) but the algorithm predicts 0 (false)
  • Precision: true positives over (true positives + false positives)
    • how many selected items are relevant
  • Recall: true positives over (true positives + false negatives)
    • how many relevant items are selected (sklearn sketch below)
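
  • these are all available in sklearn.metrics; a sketch assuming y and y_hat from the threshold example above:

from sklearn.metrics import confusion_matrix, precision_score, recall_score

confusion_matrix(y, y_hat)   # rows = true class, columns = predicted class
precision_score(y, y_hat)    # TP / (TP + FP)
recall_score(y, y_hat)       # TP / (TP + FN)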

  • say we want to ensure 95% of malignant tumors are classified as malignant (recall \( \geq 0.95 \))
  • np.argmin(recall > .95) - 1 picks the last index whose recall is still above 0.95, assuming the recall array is sorted from highest to lowest (full sketch below)
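
  • one way to get such a recall array (this may differ from the lecture's exact code) is sklearn's precision_recall_curve, using the p_hat probabilities from above:

import numpy as np
from sklearn.metrics import precision_recall_curve

# recall is non-increasing along the returned arrays (thresholds increase),
# so the expression grabs the last threshold keeping recall above 0.95
precision, recall, thresholds = precision_recall_curve(y, p_hat)
idx = np.argmin(recall > 0.95) - 1
chosen_threshold = thresholds[idx]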

  • at that threshold, the pathologist would have to verify 61.1% of the samples
  • and falsely diagnosing the remaining 5% of malignant tumors as benign would still be unacceptable in practice!