# Lecture 24: Logistic Regression Part 2

• using a constant model as a good baseline
• partitioned the x-axis into 7 intervals and calculated the proportion of 1s in each interval

## Bonus: K-Nearest Neighbors

• predict by averaging the labels of the k nearest stored points (a heap can be used to find the k nearest efficiently)
• the resulting prediction curve is kind of bumpy

## Logistic Regression

• $$\sigma(t)=\frac{1}{1+e^{-t}}$$
• $$t=\sum_{k=0}^d \theta_k x_k$$
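The two formulas above fit in a few lines; the particular `theta` and `x` values here are made up for illustration.

```python
import numpy as np

def sigmoid(t):
    """Logistic function: maps any real t to (0, 1)."""
    return 1 / (1 + np.exp(-t))

# t = theta_0 * x_0 + theta_1 * x_1 + ...  (x_0 = 1 carries the intercept)
theta = np.array([0.5, 2.0])   # made-up [intercept, slope]
x = np.array([1.0, 3.0])       # x_0 = 1, x_1 = 3
t = theta @ x
print(sigmoid(t))              # a probability strictly between 0 and 1
```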

### One-Dimensional Logistic Regression Model

• different coefficients give different curves
• slope and intercept (a lower intercept moves the curve to the right)
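A quick check of the intercept claim, with hypothetical coefficients: the curve crosses 0.5 where slope·x + intercept = 0, i.e. at x = −intercept/slope, so lowering the intercept moves that midpoint to the right.

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

slope, intercept = 2.0, 1.0    # hypothetical coefficients
lower_intercept = -1.0

# midpoint: where sigmoid(slope * x + intercept) == 0.5
midpoint_high = -intercept / slope        # with the higher intercept
midpoint_low = -lower_intercept / slope   # lower intercept -> midpoint further right
print(midpoint_high, midpoint_low)
```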

### Loss

• cross-entropy loss:
• $$-\frac{1}{n}\sum_{i=1}^{n}\left[\,y_i \log f_\theta(x_i) + (1 - y_i) \log\big(1 - f_\theta(x_i)\big)\,\right]$$
• no closed-form solution, so we have to use an iterative method
• in the code, forward computes the model's prediction
• paired with cross-entropy loss
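One standard iterative method is gradient descent. This is a sketch on a tiny made-up dataset, not the lecture's code; the "forward" step computes the model's predictions, and the gradient of the average cross-entropy loss for logistic regression works out to Xᵀ(p − y)/n.

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

def cross_entropy(y, p):
    """Average negative log-likelihood."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# tiny made-up dataset; first column of X is all 1s for the intercept
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = np.zeros(2)
lr = 0.5
for _ in range(2000):              # iterative method: gradient descent
    p = sigmoid(X @ theta)         # forward pass: the model's predictions
    grad = X.T @ (p - y) / len(y)  # gradient of the cross-entropy loss
    theta -= lr * grad

print(theta, cross_entropy(y, sigmoid(X @ theta)))
```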
from sklearn.linear_model import LogisticRegression

lr_model = LogisticRegression(solver='lbfgs')

lr_model.coef_

lr_model.intercept_

lr_model.predict # hard 0/1 labels, vs lr_model.predict_proba for class probabilities
• if theta goes to infinity the model becomes certain of its predictions; instead we want some regularization
• e.g. add a penalty like 1.0e-5 * theta ** 2 to the loss
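Putting the sklearn calls together on a tiny made-up dataset. Note that sklearn regularizes by default, and its `C` parameter is the *inverse* regularization strength, so a small penalty like 1e-5 · θ² corresponds (roughly) to a large `C`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# tiny made-up dataset
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# large C = weak regularization (C is inverse regularization strength)
lr_model = LogisticRegression(solver='lbfgs', C=1e5)
lr_model.fit(X, y)

print(lr_model.coef_, lr_model.intercept_)
print(lr_model.predict(X))        # hard 0/1 labels
print(lr_model.predict_proba(X))  # class probabilities, one column per class
```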

## The Decision Rule

• predict 1 if P(Y=1 | x) > 0.5
• but we can choose a threshold other than 0.5
• accuracy: fraction of correct predictions out of all predictions
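The decision rule and the accuracy computation in a few lines; the probabilities and labels here are invented for illustration.

```python
import numpy as np

# suppose these are model probabilities P(Y=1 | x) and the true labels
probs = np.array([0.1, 0.4, 0.6, 0.8, 0.55])
labels = np.array([0, 0, 1, 1, 0])

threshold = 0.5                      # can be any cutoff, not just 0.5
preds = (probs > threshold).astype(int)

accuracy = np.mean(preds == labels)  # fraction of correct predictions
print(preds, accuracy)
```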

### Confusion Matrix

• False-Positive when it is 0 (false) but the algorithm predicts 1 (true)
• False-Negative when it is 1 (true) but the algorithm predicts 0 (false)
• Precision: true positives over (true positives + false positives)
• how many selected items are relevant
• Recall: true positives over (true positives + false negatives)
• how many relevant items are selected
• say we want to ensure 95% of malignant tumors are classified as malignant
• np.argmin(recall > .95) - 1 picks the corresponding threshold
• at that threshold the pathologist would have to verify 61.1% of the samples
• falsely diagnosing 5% of malignant tumors as benign is unacceptable in practice!
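One way to pick a threshold achieving at least 95% recall, sketched with sklearn's precision_recall_curve on made-up labels and scores. The returned recall array is decreasing (with a final 0 appended), so the argmin trick from above finds the last index where recall is still above the target.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# made-up labels (1 = malignant, 0 = benign) and model scores
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.65, 0.7, 0.8, 0.9])

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# recall decreases along the array: argmin finds the first index where
# recall drops to .95 or below; -1 steps back to the last acceptable one
idx = np.argmin(recall > 0.95) - 1
print(thresholds[idx], precision[idx], recall[idx])
```

At that index, `thresholds[idx]` is the most aggressive cutoff that still catches at least 95% of the malignant tumors; the low precision there is what forces the pathologist to verify so many samples.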