13 Regularization

Handout

Ridge and LASSO on the Mortgage Problem

First, showing that \(\hat{\beta}_1 = \frac{cov(x, y)}{var(x)}\):

mortgage %>%
  summarize(
    lm_beta1 = cov(approved, income) / var(income)
    )
# A tibble: 1 × 1
  lm_beta1
     <dbl>
1 0.000785
mortgage %>%
  lm(approved ~ income, data = .) %>%
  broom::tidy()
# A tibble: 2 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept) 0.665    0.00437       152.  0        
2 income      0.000785 0.0000250      31.5 2.80e-214
# CV AUC for LPM, logit, and KNN
mortgage %>%
  select(-race, -ethnicity, -sex, -age) %>%
  mutate(fold = sample(1:5, size = nrow(.), replace = T)) %>%
  {
    c(
      lpm = cross_validate(., "lpm"),
      ridge_lpm = cross_validate(., "lpm ridge"),
      lasso_lpm = cross_validate(., "lpm LASSO"),
      logit = cross_validate(., "logit"),
      ridge_logit = cross_validate(., "logit ridge"),
      lasso_logit = cross_validate(., "logit LASSO")
      )
  }

# CV AUC:
# LPM:              ___
# LPM with ridge:   ___
# LPM with LASSO:   ___
# Logit:            ___
# Logit with ridge: ___
# Logit with LASSO: ___

If Ridge or LASSO help our CV AUC a lot, that’s an indicator that our models are overfitted. Is that a problem for us?

Compare Coefficients

Inspect the coefficients for the LPM, LPM with ridge regularization, and LPM with LASSO. Which coefficients shrink or disappear?

mortgage <- mortgage %>% 
  select(-race, -ethnicity, -sex, -age)

# No regularization
lpm <- mortgage %>%
  lm(approved ~ ., data = .) %>%
  broom::tidy() %>%
  select(term, ols_estimate = estimate)

# Ridge
ridge <- cv.glmnet(
  x = model.matrix(approved ~ ., data = mortgage)[, -1],
  y = mortgage %>% pull(approved),
  alpha = 0, # 0 = ridge; 1 = lasso
  family = "gaussian"
) %>% 
  coef(s = "lambda.1se") %>%
  as.matrix() %>%
  as_tibble(rownames = "term") %>%
  rename(ridge_estimate = 2)

# Lasso
lasso <- cv.glmnet(
  x = model.matrix(approved ~ ., data = mortgage)[, -1],
  y = mortgage %>% pull(approved),
  alpha = 1, # 0 = ridge; 1 = lasso
  family = "gaussian"
) %>% 
  coef(s = "lambda.1se") %>%
  as.matrix() %>%
  as_tibble(rownames = "term") %>%
  rename(lasso_estimate = 2)

# View coefficients from all three:
lpm %>% 
  left_join(ridge) %>% 
  left_join(lasso)
# A tibble: 50 × 4
   term                           ols_estimate ridge_estimate lasso_estimate
   <chr>                                 <dbl>          <dbl>          <dbl>
 1 (Intercept)                       -1.68         0.808              0.0842
 2 income                             0.000145    -0.000230           0     
 3 debt_to_income_ratio               0.0588       0.00189            0.0440
 4 loan_amount                       -0.00101     -0.00000199         0     
 5 loan_typeFHA                       0.0410       0.0622             0.0496
 6 loan_typeUSDA/FSA                 -0.140       -0.155             -0.106 
 7 loan_typeVA                        0.101        0.0976             0.102 
 8 loan_purposeCash-out refinance    -0.142       -0.106             -0.125 
 9 loan_purposeHome improvement      -0.143       -0.111             -0.122 
10 loan_purposeOther                 -0.186       -0.150             -0.167 
# ℹ 40 more rows

Reflection Questions:

  1. Compare the coefficients across OLS, ridge, and lasso. List the variables whose coefficients remained stable and then those that changed dramatically. Are the coefficients that remained stable the variables that might be less correlated with other explanatory variables?

  2. Did LASSO force any variables to totally disappear? Does it make sense that these variables are not as important to our prediction?

  3. Ridge and LASSO changed many coefficients, but CV AUC barely changed. What does this suggest about interpreting coefficients in predictive models?

Download this assignment

Here’s a link to download this assignment.