class: center, middle, inverse, title-slide .title[ # Lecture .mono[006] ] .subtitle[ ## Classification ] .author[ ### Edward Rubin ] --- exclude: true --- layout: true # Admin --- class: inverse, middle --- name: admin-today ## Material .b[Last time] - `tidymodels` - Shrinkage methods - Ridge regression 🏔 - (The) lasso 🤠 - Elasticnet 🥅 .b[Today] Classification methods - Introduction to classification - Linear probability models - Logistic regression --- name: admin-soon ## Upcoming .b[Readings] .note[Today] .it[ISL] Ch. 4 .b[Problem sets] - .it[Cross validation and shrinkage] Due next week. - .it[Classification] After that... .b[Project] Topic due Sunday! --- layout: true # Classification --- class: inverse, middle --- name: intro ## Intro .attn[Regression problems] seek to predict the number an outcome will take—integers (_e.g._, number of cats), reals (_e.g._, home/cat value), _etc._ .super[.pink[†]] .footnote[ .pink[†] Maybe: Binary indicators... ] -- .attn[Classification problems] instead seek to predict the category of an outcome - .b[Binary outcomes]<br>success/failure; true/false; A or B; cat or .it[not cat]; _etc._ - .b[Multi-class outcomes]<br>yes, no, .it[or maybe]; colors; letters; type of cat;.super[.pink[††]] _etc._ .footnote[ .tran[† Maybe: Binary indicators...] .pink[††] It turns out, all of machine learning is about cats. ] This type of outcome is often called a .it[qualitative] or .it[categorical] response. --- name: examples ## Examples For the past few weeks, we've been immersed in regression problems. It's probably helpful to mention a few .hi[examples of classification problems]. -- - Using life/criminal history (and demographics?):<br>Can we predict whether a defendant is .b[granted bail]? -- - Based upon a set of symptoms and observations:<br>Can we predict a patient's .b[medical condition](s)? -- - From the pixels in an image:<br>Can we classify images as .b[bagel, puppy, or other]? --- ## Approach One can imagine two.super[.pink[†]] related .hi[approaches to classification] .footnote[ .pink[†] At least. ] 1. Predict .b[which category] the outcome will take. 1. Estimate the .b[probability of each category] for the outcome. -- That said, the general approach will - Take a set of training observations `\((x_1,y_1),\, (x_2,y_2),\,\ldots,\,(x_n,y_n)\)` - Build a classifier `\(\hat{y}_o=\mathop{f}(x_o)\)` all while balancing bias and variance..super[.pink[††]] .footnote[ .tran[† At least.] .pink[††] Sound familiar? ] --- layout: false class: clear, middle .qa[Q] If everything is so similar, can't we use regression methods? .white[No] --- class: clear, middle .qa[Q] If everything is so similar, can't we use regression methods? .qa[A] .it[Sometimes.] -- .it[Other times:] No. -- Plus you still need new tools. --- layout: true # Classification ## Why not regression? --- name: no-regress Regression methods are not made to deal with .b[multiple categories]. .ex[Ex.] Consider three medical diagnoses: .pink[stroke], .purple[overdose], and .orange[seizure]. Regression needs a numeric outcome—how should we code our categories? 
-- .left-third[ .center.note[Option 1] `$$Y=\begin{cases} \displaystyle 1 & \text{if }\color{#e64173}{\text{ stroke}} \\ \displaystyle 2 & \text{if }\color{#6A5ACD}{\text{ overdose}} \\ \displaystyle 3 & \text{if }\color{#FFA500}{\text{ seizure}} \\ \end{cases}$$` ] -- .left-third[ .center.note[Option 2] `$$Y=\begin{cases} \displaystyle 1 & \text{if }\color{#6A5ACD}{\text{ overdose}} \\ \displaystyle 2 & \text{if }\color{#e64173}{\text{ stroke}} \\ \displaystyle 3 & \text{if }\color{#FFA500}{\text{ seizure}} \\ \end{cases}$$` ] -- .left-third[ .center.note[Option 3] `$$Y=\begin{cases} \displaystyle 1 & \text{if }\color{#FFA500}{\text{ seizure}} \\ \displaystyle 2 & \text{if }\color{#e64173}{\text{ stroke}} \\ \displaystyle 3 & \text{if }\color{#6A5ACD}{\text{ overdose}} \\ \end{cases}$$` ] -- The categories' ordering is unclear—let alone the actual valuation. <br> The choice of ordering and valuation can affect predictions. 😿 --- As we've seen, .b[binary outcomes] are simpler. -- .ex[Ex.] If we are only choosing between .pink[stroke] and .purple[overdose], .left-wide[ .center.note[Option 1] `$$Y=\begin{cases} \displaystyle 0 & \text{if }\color{#e64173}{\text{ stroke}} \\ \displaystyle 1 & \text{if }\color{#6A5ACD}{\text{ overdose}} \\ \end{cases}$$` ] .left-thin.center[<br><br>.center[and]] .left-wide[ .center.note[Option 2] `$$Y=\begin{cases} \displaystyle 0 & \text{if }\color{#6A5ACD}{\text{ overdose}} \\ \displaystyle 1 & \text{if }\color{#e64173}{\text{ stroke}} \\ \end{cases}$$` ] .clear-up[ will provide the same results. ] --- name: lpm In these .b[binary outcome] cases, we .it[can] apply linear regression. These models are called .attn[linear probability models] (LPMs). The .b[predictions] from an LPM 1. estimate the conditional probability that `\(y_o = 1\)`, _i.e._, `\(\mathop{\text{Pr}}\left(y_o = 1 \mid x_o\right)\)` 1. are not restricted to being between 0 and 1.super[.pink[†]] 1. provide an ordering—and a reasonable estimate of probability .footnote[ .pink[†] Some people get very worked up about this point. ] -- .note[Other benefits:] Coefficients are easily interpreted + we know how OLS works. --- layout: true class: clear, middle --- Let's consider an example: the `Default` dataset from `ISLR`.
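---

.note[Setup] One way to build the data we use below. This is a sketch: the `default_df` and `i_default` names match the code on later slides, but the construction itself is an assumption.

``` r
# Load the Default data from the ISLR package
library(ISLR)
library(dplyr)
# Add a 0/1 indicator for default (the outcome we model below)
default_df = Default %>% mutate(i_default = 1 * (default == "Yes"))
# An LPM is then just OLS on the binary outcome
est_lpm = lm(i_default ~ balance, data = default_df)
```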
--- exclude: true --- .hi-purple[The data:] The outcome, `default`, only takes two values (only 3.3% default). <img src="slides_files/figure-html/boxplot-default-balance-1.svg" style="display: block; margin: auto;" /> --- .hi-purple[The data:] The outcome, `default`, only takes two values (only 3.3% default). <img src="slides_files/figure-html/plot-default-points-1.svg" style="display: block; margin: auto;" /> --- .hi-pink[The linear probability model] struggles with prediction in this setting. <img src="slides_files/figure-html/plot-default-lpm-1.svg" style="display: block; margin: auto;" /> --- .hi-orange[Logistic regression] .it[appears] to offer an improvement. <img src="slides_files/figure-html/plot-default-logistic-1.svg" style="display: block; margin: auto;" /> --- So... what's logistic regression? --- layout: true # Logistic regression --- class: inverse, middle --- name: logistic-intro ## Intro .attn[Logistic regression] .b[models the probability] that our outcome `\(Y\)` belongs to a .b[specific category] (often whichever category we think of as `TRUE`). -- For example, we just saw a graph where $$ `\begin{align} \mathop{\text{Pr}}\left(\text{Default} = \text{Yes} \mid \text{Balance}\right) = p(\text{Balance}) \end{align}` $$ _i.e._, we are modeling the probability of `default` as a function of `balance`. -- We use the .b[estimated probabilities] to .b[make predictions], _e.g._, - if `\(p(\text{Balance})\geq 0.5\)`, we could predict "Yes" for Default - to be conservative, we could predict "Yes" if `\(p(\text{Balance})\geq0.1\)` --- name: logistic-logistic ## What's .it[logistic]? We want to model probability as a function of the predictors `\(\left(\beta_0 + \beta_1 X\right)\)`. .col-centered[ .hi-pink[Linear probability model] <br> .pink[linear] transform. of predictors $$ `\begin{align} p(X) = \beta_0 + \beta_1 X \end{align}` $$ ] .col-centered[ .hi-orange[Logistic model] <br> .orange[logistic] transform. of predictors $$ `\begin{align} p(X) = \dfrac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \end{align}` $$ ] .clear-up[ What does this .it[logistic function] `\(\left(\frac{e^x}{1+e^x}\right)\)` do? ] 1. ensures predictions are between 0 `\((x\rightarrow-\infty)\)` and 1 `\((x\rightarrow\infty)\)` 1. forces an S-shaped curve through the data (not linear) --- ## What's .it[logistic]? With a little math, you can show $$ `\begin{align} p(X) = \dfrac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \implies \color{#e64173}{\log \left( \dfrac{p(X)}{1-p(X)}\right)} = \color{#6A5ACD}{\beta_0 + \beta_1 X} \end{align}` $$ .note[New definition:] .hi-pink[log odds].super[.pink[†]] on the LHS and .hi-purple[linear predictors] on the RHS. .footnote[ .pink[†] The "log odds" is sometimes called "logit". ] -- 1. .b[interpretation] of `\(\beta_j\)` is about .pink[log odds]—not probability -- 1. .b[changes in probability] due to `\(X\)` depend on level of `\(X\)`.super[.pink[††]] .footnote[ .tran[† The "log odds" is sometimes called "logit".] .pink[††] It's nonlinear! ] --- name: logistic-estimation ## Estimation Before we can start predicting, we need to estimate the `\(\beta_j\)`s. $$ `\begin{align} p(X) = \dfrac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \implies \color{#e64173}{\log \left( \dfrac{p(X)}{1-p(X)}\right)} = \color{#6A5ACD}{\beta_0 + \beta_1 X} \end{align}` $$ We estimate logistic regression using .attn[maximum likelihood estimation].
-- .attn[Maximum likelihood estimation] (MLE) searches for the `\(\beta_j\)`s that make our data "most likely" given the model we've written. --- name: logistic-mle ## Maximum likelihood .attn[MLE] searches for the `\(\beta_j\)`s that make our data "most likely" using our model. $$ `\begin{align} \color{#e64173}{\log \left( \dfrac{p(X)}{1-p(X)}\right)} = \color{#6A5ACD}{\beta_0 + \beta_1 X} \end{align}` $$ -- 1. `\(\color{#6A5ACD}{\beta_j}\)` tells us how `\(x_j\)` affects the .pink[log odds] -- 1. odds `\(= \dfrac{p(X)}{1-p(X)}\)`. -- If `\(p(X) > 0.5\)`, then odds `\(>1\)` and .pink[log odds] `\(> 0\)`. -- So we want to choose `\(\color{#6A5ACD}{\beta_j}\)` such that - .pink[log odds] are above zero for observations where `\(y_i=1\)` - .pink[log odds] are even larger for areas of `\(x_j\)` where most `\(i\)`s have `\(y_i=1\)` --- ## Formally: The likelihood function We estimate logistic regression by maximizing .attn[the likelihood function].super[.pink[†]] .footnote[ .pink[†] Generally, we actually will maximize the .it[log] of the likelihood function. ] $$ `\begin{align} \mathop{\ell}(\beta_0,\beta_1) = \prod_{i:y_i=1} \mathop{p}(x_i) \prod_{i:y_i=0} (1-\mathop{p}(x_i)) \end{align}` $$ The likelihood function is maximized by - making `\(p(x_i)\)` large for individuals with `\(y_i = 1\)` - making `\(p(x_i)\)` small for individuals with `\(y_i = 0\)` .it[Put simply:] Maximum likelihood maximizes predictive performance, conditional on the model we have written down. --- name: logistic-r ## In R In R, you can run logistic regression using the `glm()` function. Also: `logistic_reg()` in the `tidymodels` galaxy (with the `"glm"` engine). -- .note[Aside:] Related to `lm`, `glm` stands for .it[generalized] (linear model). -- "Generalized" essentially means that we're applying some transformation to `\(\beta_0 + \beta_1 X\)`, as logistic regression applies the logistic function. More generally: $$\color{#FFA500}{\mathbf{y}} = \color{#20B2AA}{g}^{-1} \left( \color{#6A5ACD}{\mathbf{X}} \color{#e64173}{\beta} \right) \iff \color{#20B2AA}{g}(\color{#FFA500}{\mathbf{y}}) = \color{#6A5ACD}{\mathbf{X}} \color{#e64173}{\beta} $$ --- ## In R In R, you can run logistic regression using the `glm()` function. .b[Key arguments] (very similar to `lm()`) - specify a `formula`,.super[.pink[†]] _e.g._, `y ~ .` or `y ~ x + I(x^2)` - define `family = "binomial"` (so R knows to run logistic regression) - give the function some `data` .footnote[ .pink[†] Notice that we're back in the world of needing to select a model... ] --
``` r
est_logistic = glm(
  i_default ~ balance,
* family = "binomial",
  data = default_df
)
```
--- layout: false class: clear
``` r
est_logistic |> summary()
```
```
#> 
#> Call:
#> glm(formula = i_default ~ balance, family = "binomial", data = default_df)
#> 
*#> Coefficients:
*#>               Estimate Std. Error z value Pr(>|z|)    
*#> (Intercept) -1.065e+01  3.612e-01  -29.49   <2e-16 ***
*#> balance      5.499e-03  2.204e-04   24.95   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 2920.6  on 9999  degrees of freedom
#> Residual deviance: 1596.5  on 9998  degrees of freedom
#> AIC: 1600.5
#> 
#> Number of Fisher Scoring iterations: 8
```
--- layout: true # Logistic regression --- name: logistic-prediction ## Estimates and predictions Thus, our estimates are `\(\hat{\beta}_0 \approx -10.65\)` and `\(\hat{\beta}_1 \approx 0.0055\)`. .note[Remember:] These coefficients are for the .b[log odds].
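--

.note[Aside] A quick check in R: `coef()` pulls these estimates (on the log-odds scale) straight from the fitted `glm` object.

``` r
# Estimated coefficients; matches the summary output above
coef(est_logistic)
```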
-- If we want .hi[to make predictions] for `\(y_i\)` (whether or not `\(i\)` defaults), <br>then we first must .hi[estimate the probability] `\(\mathop{p}(\text{Balance})\)` $$ `\begin{align} \hat{p}(\text{Balance}) = \dfrac{e^{\hat{\beta}_0 + \hat{\beta}_1 \text{Balance}}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 \text{Balance}}} \approx \dfrac{e^{-10.65 + 0.0055 \cdot \text{Balance}}}{1 + e^{-10.65 + 0.0055 \cdot \text{Balance}}} \end{align}` $$ -- - If `\(\text{Balance} = 0\)`, we then estimate `\(\mathop{\hat{p}} \approx 0.000024\)` - If `\(\text{Balance} = 2,000\)`, we then estimate `\(\mathop{\hat{p}} \approx 0.586\)` - If `\(\text{Balance} = 3,000\)`, we then estimate `\(\mathop{\hat{p}} \approx 0.997\)` .super[.pink[†]] .footnote[ .pink[†] You get a sense of the nonlinearity of the predictors' effects. ] --- layout: false class: clear, middle .hi-orange[Logistic regression]'s predictions of `\(\mathop{p}(\text{Balance})\)` <img src="slides_files/figure-html/plot-default-logistic-2-1.svg" style="display: block; margin: auto;" /> --- class: clear, middle .note[Note:] Everything we've done so far extends to models with many predictors. --- layout: true # Logistic regression ## Prediction -- .note[Old news:] You can use `predict()` to get predictions out of `glm` objects. .b[New and important:] `predict()` produces multiple `type`.small[s] of predictions 1. `type = "response"` predicts .it[on the scale of the response variable] <br>for logistic regression, this means .b[predicted probabilities] (0 to 1) 1. `type = "link"` predicts .it[on the scale of the linear predictors] <br>for logistic regression, this means .b[predicted log odds] (-∞ to ∞) .attn[Beware:] The default is `type = "link"`, which you may not want. --- Putting it all together, we can get (estimated) probabilities `\(\hat{p}(X)\)`
``` r
# Predictions on the scale of the response (outcome) variable
p_hat = predict(est_logistic, type = "response")
```
which we can use to make predictions on `\(y\)`
``` r
# Predict '1' if p_hat is greater than or equal to 0.5
y_hat = as.numeric(p_hat >= 0.5)
```
--- layout: false class: clear, middle So how did we do? --- layout: true # Assessment --- class: inverse, middle --- name: how ## How did we do? We guessed 97.25% of the observations correctly. -- .qa[Q] 97.25% is pretty good, right? -- .qa[A] It depends... -- Remember that 3.33% of the observations actually defaulted. -- <br>So we would get 96.67% right by guessing "No" for everyone..super[.pink[†]] .footnote[ .pink[†] This idea is called the .it[null classifier]. ] -- We .it[did] guess 30.03% of the defaults -- , which is clearly better than 0%. -- .qa[Q] How can we more formally assess our model's performance? -- .qa[A] All roads lead to the .attn[confusion matrix]. --- name: confusion ## The confusion matrix The .attn[confusion matrix] gives us a convenient way to display <br>.hi-orange[correct] and .hi-purple[incorrect] predictions for each class of our outcome.
|                     | Truth: No           | Truth: Yes          |
|:--------------------|:--------------------|:--------------------|
| .b[Prediction: No]  | True Negative (TN)  | False Negative (FN) |
| .b[Prediction: Yes] | False Positive (FP) | True Positive (TP)  |
-- The .attn[accuracy] of a method is the share of .orange[correct] predictions, _i.e._, .center[ .b[Accuracy] = (.hi-orange[TN] + .hi-orange[TP]) / (.hi-orange[TN] + .hi-orange[TP] + .hi-purple[FN] + .hi-purple[FP]) ] -- This matrix also helps display many other measures of assessment. --- ## The confusion matrix .attn[Sensitivity:] the share of positive outcomes `\((Y=1)\)` that we correctly predict. .center[ .b[Sensitivity] = .hi-orange[TP] / (.hi-orange[TP] + .hi-purple[FN]) ]
|                     | Truth: No           | Truth: Yes          |
|:--------------------|:--------------------|:--------------------|
| .b[Prediction: No]  | True Negative (TN)  | False Negative (FN) |
| .b[Prediction: Yes] | False Positive (FP) | True Positive (TP)  |
Sensitivity is also called .attn[recall] and the .attn[true-positive rate]. One minus sensitivity is the .attn[type-II error rate]. --- ## The confusion matrix .attn[Specificity:] the share of neg. outcomes `\((Y=0)\)` that we correctly predict. .center[ .b[Specificity] = .hi-orange[TN] / (.hi-orange[TN] + .hi-purple[FP]) ]
|                     | Truth: No           | Truth: Yes          |
|:--------------------|:--------------------|:--------------------|
| .b[Prediction: No]  | True Negative (TN)  | False Negative (FN) |
| .b[Prediction: Yes] | False Positive (FP) | True Positive (TP)  |
One minus specificity is the .attn[false-positive rate] or .attn[type-I error rate]. --- ## The confusion matrix .attn[Precision:] the share of predicted positives `\((\hat{Y}=1)\)` that are correct. .center[ .b[Precision] = .hi-orange[TP] / (.hi-orange[TP] + .hi-purple[FP]) ]
|                     | Truth: No           | Truth: Yes          |
|:--------------------|:--------------------|:--------------------|
| .b[Prediction: No]  | True Negative (TN)  | False Negative (FN) |
| .b[Prediction: Yes] | False Positive (FP) | True Positive (TP)  |
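--

.note[Aside] `yardstick` can also compute these measures directly. A minimal sketch, where `cm_df` is a hypothetical tibble of factor truth/prediction columns (like the one we build for `conf_mat()` shortly):

``` r
library(yardstick)
# cm_df (hypothetical): tibble with factor columns y (truth) and y_hat (predictions)
# Note: yardstick treats the *first* factor level as the "positive" event,
# so with 0/1 factors we flag the second level
precision(cm_df, truth = y, estimate = y_hat, event_level = "second")
```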
--- ## The confusion matrix .attn[Negative predictive value:]<br>the share of predicted negatives `\((\hat{Y}=0)\)` that are correct. .center[ .b[NPV] = .hi-orange[TN] / (.hi-orange[TN] + .hi-purple[FN]) ]
|                     | Truth: No           | Truth: Yes          |
|:--------------------|:--------------------|:--------------------|
| .b[Prediction: No]  | True Negative (TN)  | False Negative (FN) |
| .b[Prediction: Yes] | False Positive (FP) | True Positive (TP)  |
.note[Note:] NPV is not commonly used. --- ## Which assessment? .qa[Q] So .it[which] criterion should we use? -- .qa[A] You should use the .it[right] criterion for your context. - Are true positives more valuable than true negatives? -- <br>.note[Sensitivity] will be key. -- - Do you want to have high confidence in predicted positives? -- <br>.note[Precision] is your friend. -- - Are all errors equal? -- <br> .note[Accuracy] is perfect. -- [There's a lot more](https://yardstick.tidymodels.org/reference/index.html), _e.g._, the .attn[F.sub[1] score] combines precision and sensitivity. --- name: cm-r ## Confusion in R [`conf_mat()` from `yardstick`](https://yardstick.tidymodels.org/reference/conf_mat.html) (`tidymodels`) calculates the confusion matrix. - `data`: a dataset (`factor` variables) of true values and predictions - `truth`: the name of the column (in `data`) of the truth values - `estimate`: the name of the column (in `data`) of our predictions --
``` r
cm_logistic = conf_mat(
  # Create a dataset of truth and predictions
  data = tibble(
    y_hat = y_hat %>% as.factor(),
    y = default_df$i_default %>% as.factor()
  ),
  truth = y,
  estimate = y_hat
)
```
--- ## Confusion in R [`conf_mat()` from `yardstick`](https://yardstick.tidymodels.org/reference/conf_mat.html) (`tidymodels`) calculates the confusion matrix. - `data`: a dataset (`factor` variables) of true values and predictions - `truth`: the name of the column (in `data`) of the truth values - `estimate`: the name of the column (in `data`) of our predictions
```
#>           Truth
#> Prediction    0    1
#>          0 9625  233
#>          1   42  100
```
--- layout: true # Assessment --- ## Thresholds Your setting also dictates the "optimal" threshold that moves a prediction from one class (_e.g._, Default = No) to another class (Default = Yes). The Bayes classifier suggests a probability threshold of 0.5. It can't be beaten in terms of .note[accuracy], but if you have goals other than accuracy, you should consider other thresholds. --- name: thresholds layout: false class: clear As we vary the threshold, our error rates (types .hi-purple[I], .hi-orange[II], and .hi-slate[overall]) change. <img src="slides_files/figure-html/plot-threshold-1.svg" style="display: block; margin: auto;" /> --- name: roc class: clear The .attn[ROC curve] plots the true- (TP/P) and the false-positive rates (FP/N). <img src="slides_files/figure-html/plot-roc-1.svg" style="display: block; margin: auto;" /> -- "Best performance" means the .pink[ROC curve] hugs the top-left corner. --- class: clear The .hi-orange[AUC] gives the .orange[area under the (ROC) curve]. <img src="slides_files/figure-html/plot-auc-1.svg" style="display: block; margin: auto;" /> -- "Best performance" means the .orange[AUC] is near 1. Random chance: 0.5 --- class: clear, middle .qa[Q] So what information is AUC telling us? .tran[.b[A] Nothing] --- class: clear, middle .qa[Q] So what information is AUC telling us? .qa[A] AUC tells us how much we've .b[separated] the .it[positive] and .it[negative] labels. --- layout: true class: clear --- exclude: true --- .ex[Example:] Distributions of probabilities for .hi-orange[negative] and .hi-purple[positive] outcomes.
<img src="slides_files/figure-html/roc-ex1-d-1.svg" style="display: block; margin: auto;" /> --- For any given .hi-pink[threshold] <img src="slides_files/figure-html/roc-ex1-threshold-1.svg" style="display: block; margin: auto;" /> --- For any given .hi-pink[threshold], we get .hi-yellow[false positives] <img src="slides_files/figure-html/roc-ex1-threshold2-1.svg" style="display: block; margin: auto;" /> --- For any given .hi-pink[threshold], we get false positives and .hi-yellow[true positives]. <img src="slides_files/figure-html/roc-ex1-threshold3-1.svg" style="display: block; margin: auto;" /> --- ... moving through all possible thresholds generates the .hi-pink[ROC] (.hi-orange[AUC] ≈ 0.872). <img src="slides_files/figure-html/roc-ex1-roc-1.svg" style="display: block; margin: auto;" /> --- Increasing separation between .hi-orange[negative] and .hi-purple[positive] outcomes... <img src="slides_files/figure-html/roc-ex2-d-1.svg" style="display: block; margin: auto;" /> --- ... reduces error (shifts .hi-pink[ROC]) and increases .hi-orange[AUC] (≈ 0.994). <img src="slides_files/figure-html/roc-ex2-roc-1.svg" style="display: block; margin: auto;" /> --- Further increasing separation between .hi-orange[negative] and .hi-purple[positive] outcomes... <img src="slides_files/figure-html/roc-ex3-d-1.svg" style="display: block; margin: auto;" /> --- ... reduces error (shifts .hi-pink[ROC]) and increases .hi-orange[AUC] (≈ 1). <img src="slides_files/figure-html/roc-ex3-roc-1.svg" style="display: block; margin: auto;" /> --- Tiny separation ("guessing") between .hi-orange[negative] and .hi-purple[positive] outcomes... <img src="slides_files/figure-html/roc-ex4-d-1.svg" style="display: block; margin: auto;" /> --- ... increases error (shifts .hi-pink[ROC]) and pushes .hi-orange[AUC] toward 0.5 (here ≈ 0.523). <img src="slides_files/figure-html/roc-ex4-roc-1.svg" style="display: block; margin: auto;" /> --- Getting .hi-orange[negative] and .hi-purple[positive] outcomes backwards... <img src="slides_files/figure-html/roc-ex5-d-1.svg" style="display: block; margin: auto;" /> --- ... increases error (shifts .hi-pink[ROC]) and pushes .hi-orange[AUC] toward 0 (here ≈ 0.012). <img src="slides_files/figure-html/roc-ex5-roc-1.svg" style="display: block; margin: auto;" /> --- name: extras layout: false # R extras .b[AUC] You can calculate AUC in R using the [`roc_auc()` function from `yardstick`](https://yardstick.tidymodels.org/reference/roc_auc.html). See the documentation for examples. .b[Logistic elasticnet] `glmnet()` (for ridge, lasso, and elasticnet) extends to logistic regression.super[.pink[†]] by specifying the `family` argument of `glmnet`, _e.g._,
``` r
# Example of logistic regression with lasso
logistic_lasso = glmnet(
  y = y,
  x = x,
  family = "binomial",
  alpha = 1,
  lambda = best_lambda
)
```
You can also use the `"glmnet"` engine for `logistic_reg()` in `parsnip`. .footnote[ .pink[†] Or many other generalized linear models.
] --- name: sources layout: false # Sources These notes draw upon - [An Introduction to Statistical Learning](http://faculty.marshall.usc.edu/gareth-james/ISL/) (*ISL*)<br>James, Witten, Hastie, and Tibshirani - *[Receiver Operating Characteristic Curves Demystified (in Python)](https://towardsdatascience.com/receiver-operating-characteristic-curves-demystified-in-python-bd531a4364d0)* --- # Table of contents .col-left[ .smallest[ #### Admin - [Today](#admin-today) - [Upcoming](#admin-soon) #### Classification - [Introduction](#intro) - [Introductory examples](#examples) - [Why not linear regression](#no-regress) - [Linear probability models](#lpm) #### Logistic regression - [Intro](#logistic-intro) - [The logistic function](#logistic-logistic) - [Estimation](#logistic-estimation) - [Maximum likelihood](#logistic-mle) - [In R](#logistic-r) - [Prediction](#logistic-prediction) ] ] .col-right[ .smallest[ #### Assessment - [How did we do?](#how) - [The confusion matrix](#confusion) - [In R](#cm-r) - [Thresholds](#thresholds) - [ROC curves and AUC](#roc) #### Other - [Extras](#extras) - [Sources/references](#sources) ] ] --- exclude: true