# Lecture 24: Logistic Regression Part 2

• using a constant model as a good baseline

• partitioned the x-axis into intervals (7 of them?) and calculated the proportion of 1s in each interval
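The constant-model baseline and binned proportions could be sketched like this (the data and the number of intervals here are made up for illustration; the lecture used a different bin count):

```python
import numpy as np

# Hypothetical 1-D feature and binary labels
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([0,   0,   0,   1,   0,   1,   1,   1])

# Constant model: always predict the overall proportion of 1s
baseline = y.mean()

# Partition x into equal-width intervals, take the proportion of 1s in each
bins = np.linspace(x.min(), x.max(), 5)   # 4 intervals for this toy data
which = np.digitize(x, bins[1:-1])        # interval index for each point
props = np.array([y[which == b].mean() for b in np.unique(which)])
```

The per-interval proportions `props` are a crude estimate of P(Y=1 | x) that the logistic curve will smooth out.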

## Bonus K-nearest Neighbor

• average the labels of the k nearest neighbors (nearest neighbors found with a heap?)

• the resulting curve is kind of bumpy
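A minimal sketch of the k-NN probability estimate (toy data; the heap remark above refers to how the nearest neighbors might be found efficiently — here a plain sort is used instead):

```python
import numpy as np

def knn_prob(x_train, y_train, x_query, k=3):
    """Estimate P(Y=1 | x) as the mean label of the k nearest neighbors."""
    dists = np.abs(x_train - x_query)
    nearest = np.argsort(dists)[:k]   # a heap could find these in O(n log k)
    return y_train[nearest].mean()

x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_train = np.array([0,   0,   0,   1,   1,   1])
p = knn_prob(x_train, y_train, 3.4, k=3)
```

The estimate is bumpy because it jumps each time the set of k nearest neighbors changes as the query point moves.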

## Logistic Regression

• $$\frac{1}{1+\exp(-t)}$$
• $$t=\sum_{k=0}^{d} \theta_k x_k$$
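The two formulas above compose into the model; a minimal sketch (the function names are my own):

```python
import numpy as np

def sigmoid(t):
    """The logistic function 1 / (1 + e^{-t})."""
    return 1 / (1 + np.exp(-t))

def model(theta, x):
    """t = theta . x, pushed through the sigmoid.
    x includes a leading 1 so theta[0] acts as the intercept."""
    return sigmoid(x @ theta)
```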

### One Dimensional Logistic Regression Model

• different coefficients change the shape of the curve

• slope and intercept control the curve (a lower intercept moves the curve right)
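One way to see the intercept effect: the 1-D curve sigmoid(theta0 + theta1·x) crosses 0.5 exactly where theta0 + theta1·x = 0, i.e. at x = −theta0/theta1, so lowering theta0 pushes the midpoint right. A small sketch:

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

def midpoint(theta0, theta1):
    """x-value where sigmoid(theta0 + theta1 * x) = 0.5."""
    return -theta0 / theta1
```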

### Loss

• cross-entropy loss
• $$-\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log f(x_i) + (1-y_i)\log\big(1-f(x_i)\big)\right]$$
• no closed-form solution, so we have to use an iterative method

• in the demo code:
• `forward` computes the model
• combined with cross-entropy loss
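The iterative method could be plain gradient descent on the average cross-entropy loss; a self-contained sketch with made-up toy data (not the lecture's actual demo code):

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

def cross_entropy(y, p):
    """Average cross-entropy loss between labels y and probabilities p."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy data; the first column of 1s is the intercept feature
X = np.column_stack([np.ones(6), np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

theta = np.zeros(2)
lr = 0.5
for _ in range(2000):
    p = sigmoid(X @ theta)
    theta -= lr * (X.T @ (p - y)) / len(y)   # gradient of the average loss
```

The gradient X.T @ (p − y) / n follows from differentiating the cross-entropy loss through the sigmoid.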

• it's sexy
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# toy 1-D data
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

lr_model = LogisticRegression(solver='lbfgs')
lr_model.fit(X, y)

lr_model.coef_          # fitted slope(s)
lr_model.intercept_     # fitted intercept
lr_model.predict(X)     # hard 0/1 labels, vs lr_model.predict_proba(X) for probabilities
```


• if theta goes to infinity, the model becomes absolutely certain; instead we want some regularization
• add a small L2 penalty like `1.0e-5 * theta ** 2` to the loss
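A sketch of the regularized loss (function names and toy data are mine): on perfectly separable data, pushing theta toward infinity drives the cross-entropy to zero, but the L2 penalty makes huge theta worse, so the minimizer stays finite.

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

def regularized_loss(theta, X, y, lam=1.0e-5):
    """Average cross-entropy plus a small L2 penalty that keeps theta finite."""
    p = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)   # clip to avoid log(0)
    ce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return ce + lam * np.sum(theta ** 2)

# Perfectly separable toy data
X = np.column_stack([np.ones(6), np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
```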

## The Decision Rule

• predict 1 if P(Y=1 | x) > .5
• but can choose a value other than .5
• accuracy: fraction of correct predictions out of all predictions
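The decision rule and accuracy in code (the probabilities and labels below are hypothetical):

```python
import numpy as np

p = np.array([0.1, 0.4, 0.6, 0.8, 0.3, 0.9])   # hypothetical P(Y=1 | x) outputs
y = np.array([0,   0,   1,   1,   1,   1])

y_hat = (p > 0.5).astype(int)       # default decision rule: threshold at 0.5
accuracy = np.mean(y_hat == y)
```

Lowering the threshold below 0.5 would flip some predictions to 1, trading false positives for fewer false negatives.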

### Confusion Matrix

• False-Positive when it is 0 (false) but the algorithm predicts 1 (true)
• False-Negative when it is 1 (true) but the algorithm predicts 0 (false)
• Precision: true positives over true positives + false positives
• how many selected items are relevant
• Recall: true positives over true positives + false negatives
• how many relevant items are selected
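Computing the confusion-matrix counts and both metrics directly (toy predictions):

```python
import numpy as np

y     = np.array([0, 0, 1, 1, 1, 1])   # true labels
y_hat = np.array([0, 1, 1, 1, 0, 1])   # hypothetical predictions

tp = np.sum((y_hat == 1) & (y == 1))   # true positives
fp = np.sum((y_hat == 1) & (y == 0))   # false positives
fn = np.sum((y_hat == 0) & (y == 1))   # false negatives

precision = tp / (tp + fp)   # how many selected items are relevant
recall    = tp / (tp + fn)   # how many relevant items are selected
```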

• say we want to ensure 95% of malignant tumors are classified as malignant
• `np.argmin(recall > .95) - 1` finds the last threshold index where recall still exceeds 95%
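How that indexing trick works, with made-up recall values (recall falls as the threshold rises):

```python
import numpy as np

# Hypothetical recall at each of 6 increasing thresholds
recall = np.array([1.0, 0.99, 0.97, 0.96, 0.94, 0.90])

# (recall > .95) is [True, True, True, True, False, False]; argmin returns
# the first False, so subtracting 1 gives the last index still above 0.95
idx = np.argmin(recall > 0.95) - 1
```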

• the pathologist would have to verify 61.1% of the samples.
• falsely diagnosing 5% of malignant tumors as benign is unacceptable in practice!