library(tidyverse)

# Load and clean HMDA loan-level application data for one county, producing
# the `mortgage` data set used for the approval-prediction models below.
# You may need to change the file path (this one is for macs)
mortgage <- read_csv("~/Downloads/county_06065.csv") %>%
  # Keep consumer-purpose (code 2), single-unit, site-built single-family
  # applications that reached a final decision (action_taken 1, 2, or 3),
  # dropping negative and extreme incomes.
  filter(
    business_or_commercial_purpose == 2,
    total_units == 1,
    derived_dwelling_category == "Single Family (1-4 Units):Site-Built",
    action_taken %in% c(1, 2, 3),
    income >= 0,
    income <= 800
  ) %>%
  mutate(
    # Outcome: action_taken codes 1 and 2 count as an approval, code 3 as a
    # denial (this is the truth label for the classifiers later on).
    approved = if_else(action_taken %in% c(1, 2), 1, 0),
    # Recode HMDA numeric/raw codes into readable labels; any code not
    # listed becomes NA.
    loan_purpose = case_when(
      loan_purpose == 1 ~ "Home purchase",
      loan_purpose == 2 ~ "Home improvement",
      loan_purpose == 31 ~ "Refinance",
      loan_purpose == 32 ~ "Cash-out refinance",
      loan_purpose == 4 ~ "Other",
      .default = NA_character_
    ),
    derived_ethnicity = case_when(
      derived_ethnicity == "Hispanic or Latino" ~ "Hispanic",
      derived_ethnicity == "Not Hispanic or Latino" ~ "Not Hispanic",
      derived_ethnicity == "Joint" ~ "Joint",
      .default = NA_character_
    ),
    # "Sex Not Available" is a missing-data sentinel; treat it as NA.
    derived_sex = na_if(derived_sex, "Sex Not Available"),
    derived_race = case_when(
      derived_race == "White" ~ "White",
      derived_race == "Black or African American" ~ "Black",
      derived_race == "Asian" ~ "Asian",
      derived_race == "Joint" ~ "Joint",
      derived_race %in% c(
        "American Indian or Alaska Native",
        "Native Hawaiian or Other Pacific Islander",
        "2 or more minority races"
      ) ~ "Other",
      .default = NA_character_
    ),
    # "8888" is a sentinel code in the age field; treat it as missing.
    applicant_age = na_if(applicant_age, "8888"),
    # Map binned debt-to-income ratios to rough numeric midpoints (as
    # strings for now); exact integer values 36-49 pass through unchanged.
    # Everything is converted to numeric further down.
    debt_to_income_ratio = case_when(
      debt_to_income_ratio == "<20%" ~ "15",
      debt_to_income_ratio == "20%-<30%" ~ "25",
      debt_to_income_ratio == "30%-<36%" ~ "33",
      debt_to_income_ratio == "50%-60%" ~ "55",
      debt_to_income_ratio == ">60%" ~ "65",
      debt_to_income_ratio %in% as.character(36:49) ~ debt_to_income_ratio,
      .default = NA_character_
    ),
    loan_type = case_when(
      loan_type == 1 ~ "Conventional",
      loan_type == 2 ~ "FHA",
      loan_type == 3 ~ "VA",
      loan_type == 4 ~ "USDA/FSA",
      .default = NA_character_
    ),
    lien_status = case_when(
      lien_status == 1 ~ "First lien",
      lien_status == 2 ~ "Subordinate lien",
      .default = NA_character_
    ),
    occupancy_type = case_when(
      occupancy_type == 1 ~ "Principal residence",
      occupancy_type == 2 ~ "Second residence",
      occupancy_type == 3 ~ "Investment property",
      .default = NA_character_
    ),
    # Rescale dollar amounts to thousands and coerce text columns to numeric.
    loan_amount = loan_amount / 1000,
    property_value = as.numeric(property_value) / 1000,
    debt_to_income_ratio = as.numeric(debt_to_income_ratio),
    loan_term = as.numeric(loan_term),
    interest_rate = as.numeric(interest_rate),
    # Convert categorical variables to factors for modeling.
    loan_purpose = factor(loan_purpose),
    derived_ethnicity = factor(derived_ethnicity),
    derived_race = factor(derived_race),
    derived_sex = factor(derived_sex),
    applicant_age = factor(
      applicant_age,
      levels = c("<25", "25-34", "35-44", "45-54", "55-64", "65-74", ">74")
    ),
    conforming_loan_limit = factor(conforming_loan_limit),
    loan_type = factor(loan_type),
    lien_status = factor(lien_status),
    occupancy_type = factor(occupancy_type)
  ) %>%
  # Keep only the modeling variables, with shorter names.
  select(
    approved,
    income,
    debt_to_income_ratio,
    race = derived_race,
    ethnicity = derived_ethnicity,
    sex = derived_sex,
    age = applicant_age,
    loan_amount,
    loan_type,
    loan_purpose,
    lien_status,
    conforming_loan_limit,
    property_value,
    occupancy_type
  ) %>%
  # Set reference levels so regression coefficients are measured against
  # the most common / baseline group for each factor.
  mutate(
    race = fct_relevel(race, "White"),
    ethnicity = fct_relevel(ethnicity, "Not Hispanic"),
    sex = fct_relevel(sex, "Joint"),
    age = fct_relevel(age, "35-44"),
    loan_type = fct_relevel(loan_type, "Conventional"),
    loan_purpose = fct_relevel(loan_purpose, "Home purchase"),
    lien_status = fct_relevel(lien_status, "First lien"),
    conforming_loan_limit = fct_relevel(conforming_loan_limit, "C"),
    occupancy_type = fct_relevel(occupancy_type, "Principal residence")
  )

7 Confusion Matrices
Last week, we built two models to predict mortgage approval:
- a linear probability model (LPM)
- a logistic regression model (logit)
Both models produce predicted probabilities, but they behave differently. The LPM can produce invalid probabilities below 0 or above 1, while the logit model always produces probabilities between 0 and 1. In the LPM, marginal effects are constant. In the logit model, marginal effects can vary across observations.
In this classwork, we’ll think about the LPM and the logit from the point of view of a lender. The lender needs to turn a model’s predicted probabilities into an approval decision through a cutoff or a threshold. These classification decisions are what we’ll explore today.
Part 1: Add lpm_prediction and logit_prediction to the data set
Generate predicted probabilities from LPM and logit. Verify that the predicted probabilities look right: LPM predictions may fall below 0 or above 1, while logit predictions should always be strictly between 0 and 1.
lpm <- mortgage %>% ___
logit <- mortgage %>% ___
mortgage <- mortgage %>%
mutate(
lpm_prediction = predict(___),
logit_prediction = predict(___)
) %>%
select(lpm_prediction, logit_prediction, approved, everything())

Part 2: Build Classifiers
Next, turn predicted probabilities into decisions by using mutate() with if_else() at the end of the code chunk above. Name the new variables lpm_classifier and logit_classifier.
Use a threshold of 0.7 and classify an application as approved if the predicted probability is greater than 0.7.
Part 3: Confusion Matrices
Assume that approved reflects the truth about whether the applicant should receive the loan.
Explain each of the following in words from the perspective of a lender:
- A true positive is when we (approve/deny) (good/bad) borrowers.
- A true negative is when we (approve/deny) (good/bad) borrowers.
- A false positive is when we (approve/deny) (good/bad) borrowers.
- A false negative is when we (approve/deny) (good/bad) borrowers.
Which error is worse for a lender: a false positive or a false negative? Explain your reasoning.
# Remember in the output below, `approved` is the truth
# and `lpm_classifier` or `logit_classifier` is the model’s
# decision.
# LPM Confusion Matrix
mortgage %>%
drop_na(lpm_classifier) %>%
count(approved, lpm_classifier)
# The LPM has:
# ____ true positives.
# ____ true negatives.
# ____ false positives.
# ____ false negatives.
# Logit Confusion Matrix
mortgage %>%
drop_na(logit_classifier) %>%
count(approved, logit_classifier)
# The logit has:
# ____ true positives.
# ____ true negatives.
# ____ false positives.
# ____ false negatives.

Accuracy is the share of all predictions that are correct: \[\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}\]
The model’s accuracy tells us how often the model is right overall, but it does not tell us whether the model is making the right kind of mistakes.
- The LPM has an accuracy of ___.
- The logit has an accuracy of ___.
Based on accuracy alone, the ____ model appears to perform better. But if one kind of mistake is especially costly, accuracy may not be the most important metric.
The False Positive Rate answers: “Out of all bad applicants, how many did we mistakenly approve?” \(\mathrm{FPR} = \frac{FP}{FP + TN}\).
- The LPM has a FPR of ___.
- The logit has a FPR of ___.
Based on FPR alone, the ____ appears to be the best model.
The False Negative Rate answers: “Out of all good applicants, how many did we mistakenly deny?” \(\mathrm{FNR} = \frac{FN}{FN + TP}\).
- The LPM has a FNR of ___.
- The logit has a FNR of ___.
Based on FNR alone, the ____ appears to be the best model.
A “stricter” model will approve fewer people overall, which brings (FPR/FNR) up but (FPR/FNR) down. A more “lenient” model will approve more people overall, which brings (FPR/FNR) up but (FPR/FNR) down. In this case, the (logit/LPM) seems to be the stricter model. Of course, you can also manipulate the strictness of the model by changing the threshold.
Part 4: Threshold Sensitivity
A threshold of 0.7 is not special. In this part, we’ll change the threshold to 0.6 or 0.8 and notice the changes to the FPR and the FNR.
Next class, instead of choosing one threshold, we will evaluate all possible thresholds at once using the ROC curve.
# Compute accuracy, false positive rate, and false negative rate for the
# `mortgage` data at a given probability threshold.
#
# `threshold`      — cutoff: classify as approved when prediction > threshold.
# `prediction_var` — an unquoted column name (e.g. lpm_prediction), forwarded
#                    into mutate() via the tidy-eval embrace operator {{ }}.
#
# The ___ blanks are for you to fill in: each of TP/TN/FP/FN counts rows by
# comparing the truth (`approved`) with the model's decision (`classifier`).
confusion <- function(threshold, prediction_var) {
mortgage %>%
mutate(
# Decision rule: approve when the predicted probability exceeds the threshold.
classifier = if_else({{ prediction_var }} > threshold, 1, 0)
) %>%
# Drop rows where the prediction (and hence the classification) is missing.
drop_na(classifier) %>%
summarise(
TP = sum(approved == ___ & classifier == ___),
TN = sum(approved == ___ & classifier == ___),
FP = sum(approved == ___ & classifier == ___),
FN = sum(approved == ___ & classifier == ___)
) %>%
mutate(
# Accuracy = share of all decisions that are correct.
accuracy = round((TP + TN) / (TP + TN + FP + FN), 2),
false_positive_rate = round(__ / (__ + __), 2),
false_negative_rate = round(__ / (__ + __), 2)
) %>%
select(accuracy, false_positive_rate, false_negative_rate)
}
confusion(0.6, lpm_prediction)
confusion(0.6, logit_prediction)
confusion(0.8, lpm_prediction)
confusion(0.8, logit_prediction)

What threshold do you recommend for the LPM and for the logit to balance high accuracy and a low FPR or FNR (whichever you think is more important)? Explain.
Download this assignment
Here’s a link to download this assignment.