--- title: "Lecture .mono[008]" subtitle: "Ensembles 🌲.smallest[🌲]🌲.smallest[🎄]🌲" author: "Edward Rubin" #date: "`r format(Sys.time(), '%d %B %Y')`" date: "25 February 2020" output: xaringan::moon_reader: css: ['default', 'metropolis', 'metropolis-fonts', 'my-css.css'] # self_contained: true nature: highlightStyle: github highlightLines: true countIncrementalSlides: false --- exclude: true ```{R, setup, include = F} library(pacman) p_load( ISLR, broom, tidyverse, ggplot2, ggthemes, ggforce, ggridges, cowplot, scales, rayshader, latex2exp, viridis, extrafont, gridExtra, plotly, ggformula, DiagrammeR, kableExtra, DT, huxtable, data.table, dplyr, snakecase, janitor, lubridate, knitr, caret, rpart, rpart.plot, rattle, here, magrittr, parallel ) # Define colors red_pink = "#e64173" turquoise = "#20B2AA" orange = "#FFA500" red = "#fb6107" blue = "#3b3b9a" green = "#8bb174" grey_light = "grey70" grey_mid = "grey50" grey_dark = "grey20" purple = "#6A5ACD" slate = "#314f4f" # Knitr options opts_chunk$set( comment = "#>", fig.align = "center", fig.height = 7, fig.width = 10.5, warning = F, message = F ) opts_chunk$set(dev = "svg") options(device = function(file, width, height) { svg(tempfile(), width = width, height = height) }) options(knitr.table.format = "html") ``` --- name: admin # Admin ## Today - .note[Mini-survey] What are you missing? - .note[Topic] Ensembles (applied to decision trees) ## Upcoming .b[Readings] - .note[Today] .it[ISL] Ch. 8.2 - .note[Next] .it[ISL] Ch. 9 .b[Project] Project topic was due Friday. --- class: inverse, middle # Decision trees ## Review --- name: tree-review-fundamentals # Decision trees ## Fundamentals .attn[Decision trees] - split the .it[predictor space] (our $\mathbf{X}$) into regions - then predict the most-common value within a region -- .col-left[ .hi-purple[Regression trees] - .hi-slate[Predict:] Region's mean - .hi-slate[Split:] Minimize RSS - .hi-slate[Prune:] Penalized RSS ] -- .col-right[ .hi-pink[Classification trees] - .hi-slate[Predict:] Region's mode - .hi-slate[Split:] Min. Gini or entropy.super - .hi-slate[Prune:] Penalized error rate.super[🌴] ] .footnote[ 🌴 ... or Gini index or entropy ] -- .clear-up[ An additional nuance for .attn[classification trees:] we typically care about the .b[proportions of classes in the leaves]—not just the final prediction. ] --- class: clear ```{R, data-tree-example, include = F, cache = T} # Data rec_dt = rbindlist(list( data.table(1, 000, 010, 000, 100), data.table(2, 010, 035, 000, 043), data.table(3, 010, 035, 043, 100), data.table(4, 035, 090, 000, 085), data.table(5, 035, 090, 085, 100), data.table(6, 090, 100, 000, 027), data.table(7, 090, 100, 027, 055), data.table(8, 090, 100, 055, 100) )) setnames(rec_dt, c("r", "xmin", "xmax", "ymin", "ymax")) set.seed(13) rec_dt[, val := runif(8)] # Add labels rec_dt[, r_label := paste0("R[", r, "]")] # Remove ex_dt from memory rm(ex_dt) ``` .ex[Example] Each split in our tree creates .hi-purple[regions]. ```{R, plot-tree-example, echo = F, cache = T, dependson = "data-tree-example"} # Plot ggplot( data = rec_dt, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax) ) + geom_rect(fill = NA, color = purple) + xlab(expression(x[1])) + ylab(expression(x[2])) + geom_text( aes(x = (xmin + xmax)/2, y = (ymin + ymax)/2, label = r_label), size = 6.5, family = "Fira Sans Book", color = purple, parse = T ) + theme_minimal(base_size = 18, base_family = "Fira Sans Book") ``` --- class: clear .ex[Example] Each region has its own .b[predicted value]. 
```{R, plot-tree-example2, echo = F, cache = T, dependson = "data-tree-example"} # Plot ggplot( data = rec_dt, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax) ) + geom_rect(aes(fill = val), color = "grey85", size = 0.5) + xlab(expression(x[1])) + ylab(expression(x[2])) + geom_text( aes(x = (xmin + xmax)/2, y = (ymin + ymax)/2, label = r_label, color = val > 0.75), size = 6.5, family = "Fira Sans Book", parse = T ) + scale_fill_viridis_c(option = "magma") + scale_color_manual(values = c("white", "black")) + theme_minimal(base_size = 18, base_family = "Fira Sans Book") + theme(legend.position = "none") ``` --- class: clear ```{R, plot-tree-example3, echo = F, cache = T, message = F, dependson = "data-tree-example"} # Plot gg_regions = ggplot( data = rec_dt, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax) ) + geom_rect(aes(fill = val, color = val), size = 0.5) + xlab(expression(x[1])) + ylab(expression(x[2])) + scale_fill_viridis_c(option = "magma") + scale_color_viridis_c(option = "magma") + theme_minimal(base_size = 18, base_family = "Fira Sans Book") + theme(legend.position = "none") # Pass to rayshader plot_gg( gg_regions, zoom = 0.55, theta = -15, phi = 45, width = 6, windowsize = c(1400, 866), # sunangle = 225, multicore = T ) render_snapshot(clear = TRUE) ``` --- name: tree-review-tradeoff # Decision trees ## Strengths and weaknesses As with any method, decision trees have tradeoffs. -- .col-left.purple.small[ .b[Strengths]
.b[+] Easily explained/interpreted
.b[+] Offer several graphical display options
.b[+] Mirror human decision making?
.b[+] Handle num. or cat. on LHS/RHS.super[🌳] ] .footnote[ 🌳 Without needing to create lots of dummy variables!
.tran[🌴 Blank] ] -- .col-right.pink.small[ .b[Weaknesses]
.b[-] Generally outperformed by other methods
.b[-] Struggle with linearity
.b[-] Can be very "non-robust" ] .clear-up[ .attn[Non-robust:] Small data changes can cause huge changes in our tree. ] -- .footnote[ .tran[🌴 Blank]
🌲 Forests! ] .note[Next:] Create ensembles of trees.super[🌲] to strengthen these weaknesses. -- .super[🌴] .footnote[ .tran[🌴 Blank]
.tran[🌲 Forests!] 🌴 Which will also weaken some of the strengths. ] --- layout: true # Ensemble methods --- class: inverse, middle --- name: intro ## Intro Rather than focusing on training a .b[single], highly accurate model,
.attn[ensemble methods] combine .b[many] low-accuracy models into a .it[meta-model]. -- .note[Today:] Three common methods for .b[combining individual trees] 1. .attn[Bagging] 1. .attn[Random forests] 1. .attn[Boosting] -- .b[Why?] While individual trees may be highly variable and inaccurate,
a combination of trees is often quite stable and accurate. -- .super[🌲] .footnote[ 🌲 We will lose interpretability. ] --- name: bag-intro ## Bagging .attn[Bagging] creates additional samples via [.hi[bootstrapping]](https://raw.githack.com/edrubin/EC524W20/master/lecture/003/003-slides.html#62). -- .qa[Q] How does bootstrapping help? -- .qa[A] .note[Recall:] Individual decision trees suffer from variability (.it[non-robust]). -- This .it[non-robustness] means trees can change .it[a lot] based upon which observations are included/excluded. -- We're essentially using many "draws" instead of a single one..super[🌴] .footnote[ 🌴 Recall that an estimator's variance typically decreases as the sample size increases. ] --- name: bag-algorithm ## Bagging .attn[Bootstrap aggregation] (bagging) reduces this type of variability. 1. Create $B$ bootstrapped samples 1. Train an estimator (tree) $\color{#6A5ACD}{\mathop{\hat{f^b}}(x)}$ on each of the $B$ samples 1. Aggregate across your $B$ bootstrapped models: $$ \begin{align} \color{#e64173}{\mathop{\hat{f}_{\text{bag}}}(x)} = \dfrac{1}{B}\sum_{b=1}^{B}\color{#6A5ACD}{\mathop{\hat{f^b}}(x)} \end{align} $$ This aggregated model $\color{#e64173}{\mathop{\hat{f}_{\text{bag}}}(x)}$ is your final model. --- ## Bagging trees When we apply bagging to decision trees, - we typically .hi-pink[grow the trees deep and do not prune] - for .hi-purple[regression], we .hi-purple[average] across the $B$ trees' regions - for .hi-purple[classification], we have more options—but often take .hi-purple[plurality] -- .hi-pink[Individual] (unpruned) trees will be very .hi-pink[flexible] and .hi-pink[noisy],
but their .hi-purple[aggregate] will be quite .hi-purple[stable]. -- The number of trees $B$ is generally not critical with bagging.
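---
## Bagging trees

To make the aggregation concrete, here is a minimal by-hand sketch of bagging regression trees—.it[illustration only]. It assumes a generic data frame `fake_df` with a numeric outcome `y`; in practice, we let `caret` do this work (next slides).

```{R, ex-bag-by-hand, eval = F}
# Illustrative sketch: bagging regression trees "by hand"
library(rpart)
# Number of bootstrapped samples (and trees)
B = 100
# Grow one deep, unpruned tree on each of the B bootstrapped samples
tree_list = lapply(X = 1:B, FUN = function(b) {
  boot_df = fake_df[sample.int(nrow(fake_df), replace = T), ]
  rpart(y ~ ., data = boot_df, control = rpart.control(cp = 0, minsplit = 2))
})
# Aggregate: average the B trees' predictions for each observation
y_hat_bag = rowMeans(sapply(tree_list, predict, newdata = fake_df))
```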
$B=100$ often works fine. --- name: bag-oob ## Out-of-bag error estimation Bagging also offers a convenient method for evaluating performance. -- For any bootstrapped sample, we omit ∼n/3 observations. .attn[Out-of-bag (OOB) error estimation] estimates the test error rate using observations .b[randomly omitted] from each bootstrapped sample. -- For each observation $i$: 1. Find all samples $S_i$ in which $i$ was omitted from training. 1. Aggregate the $|S_i|$ predictions $\color{#6A5ACD}{\mathop{\hat{f^b}}(x_i)}$, _e.g._, using their mean or mode 1. Calculate the error, _e.g._, $y_i - \mathop{\hat{f}_{i,\text{OOB},i}}(x_i)$ --- ## Out-of-bag error estimation When $B$ is big enough, the OOB error rate will be very close to LOOCV. -- .qa[Q] Why use OOB error rate? -- .qa[A] When $B$ and $n$ are large, cross validation—with any number of folds—can become pretty computationally intensive. --- name: bag-r ## Bagging in R We can use our old friend, the `caret` package, for bagging trees. -- .col-left[ .b[Option 1:] `method = "treebag"` - Applied to `train()` - No tuning parameter ] .col-right[ ```{R, eval = F} # Train a bagged tree model train( y ~ ., data = fake_df, method = "treebag", nbagg = 100, keepX = T, trControl = trainControl( method = "oob" ) ) ``` ] --- count: false ## Bagging in R We can use our old friend, the `caret` package, for bagging trees. .col-left[ .b[Option 1:] `method = "treebag"` - Applied to `train()` - No tuning parameter - `nbagg` = number of trees ] .col-right[ ```{R, eval = F} # Train a bagged tree model train( y ~ ., data = fake_df, method = "treebag", nbagg = 100, #<< keepX = T, trControl = trainControl( method = "oob" ) ) ``` ] --- count: false ## Bagging in R We can use our old friend, the `caret` package, for bagging trees. .col-left[ .b[Option 1:] `method = "treebag"` - Applied to `train()` - No tuning parameter - `nbagg` = number of trees - `keepX = T` is necessary ] .col-right[ ```{R, eval = F} # Train a bagged tree model train( y ~ ., data = fake_df, method = "treebag", nbagg = 100, keepX = T, #<< trControl = trainControl( method = "oob" ) ) ``` ] --- count: false ## Bagging in R We can use our old friend, the `caret` package, for bagging trees. .col-left[ .b[Option 1:] `method = "treebag"` - Applied to `train()` - No tuning parameter - `nbagg` = number of trees - `keepX = T` is necessary - `method = "oob"` for OOB error ] .col-right[ ```{R, eval = F} # Train a bagged tree model train( y ~ ., data = fake_df, method = "treebag", nbagg = 100, keepX = T, trControl = trainControl( #<< method = "oob" #<< ) #<< ) ``` ] -- .clear-up[ .b[Option 2:] `caret`'s `bag()` function extends bagging to many methods. ] --- ## Example: Bagging in R ```{R, load-data-heart, include = F, cache = T} # Read data heart_df = read_csv("Heart.csv") %>% dplyr::select(-X1) %>% rename(HeartDisease = AHD) %>% clean_names() # Impute missing values heart_df %<>% preProcess(method = "medianImpute") %>% predict(newdata = heart_df) %>% mutate(thal = if_else(is.na(thal), "normal", thal)) ``` .col-left[
With OOB-based error ```{R, ex-bag-oob, cache = T, dependson = "load-data-heart"} # Set the seed set.seed(12345) # Train the bagged trees heart_bag = train( heart_disease ~ ., data = heart_df, method = "treebag", nbagg = 100, keepX = T, trControl = trainControl( method = "oob" #<< ) ) ``` ] .col-right[
With CV-based error ```{R, ex-bag-cv, eval = F} # Set the seed set.seed(12345) # Train the bagged trees heart_bag_cv = train( heart_disease ~ ., data = heart_df, method = "treebag", nbagg = 100, keepX = T, trControl = trainControl( method = "cv", #<< number = 5 #<< ) ) ``` ] --- exclude: true ```{R, sim-bag-size, cache = T} # Set the seed set.seed(12345) # Train the bagged trees bag_oob = mclapply( X = 2:300, mc.cores = 12, FUN = function(n) { train( heart_disease ~ ., data = heart_df, method = "treebag", nbagg = n, keepX = T, trControl = trainControl( method = "oob" ) )$results$Accuracy %>% data.frame(accuracy = ., n_trees = n) } ) %>% bind_rows() # Train the bagged trees bag_cv = mclapply( X = 2:300, mc.cores = 12, FUN = function(n) { train( heart_disease ~ ., data = heart_df, method = "treebag", nbagg = n, keepX = T, trControl = trainControl( method = "cv", number = 5 ) )$results$Accuracy %>% data.frame(accuracy = ., n_trees = n) } ) %>% bind_rows() ``` --- layout: false class: clear .b[Bagging and the number of trees] ```{R, plot-bag, echo = F, cache = T} ggplot( data = bind_rows( bag_oob %>% mutate(type = "Bagged, OOB"), bag_cv %>% mutate(type = "Bagged, CV") ), aes(x = n_trees, y = accuracy, color = type) ) + geom_line() + scale_y_continuous("Accuracy", labels = scales::percent) + scale_x_continuous("Number of trees") + scale_color_manual("[Method, Estimate]", values = c(red_pink, purple)) + theme_minimal(base_size = 20, base_family = "Fira Sans Book") + theme(legend.position = "bottom") + coord_cartesian(ylim = c(0.60, 0.90)) ``` --- name: bag-var # Ensemble methods ## Variable importance While ensemble methods tend to .hi[improve predictive performance],
they also tend to .hi[reduce interpretability]. -- We can illustrate .attn[variables' importance] by considering their splits' reductions in the model's performance metric (RSS, Gini, entropy, _etc._)..super[🌳] .footnote[ 🌳 This idea isn't exclusive to bagging/ensembles—we can (and do) apply it to a single tree. ] -- In R, we can use `caret`'s `varImp()` function to calculate variable importance. .note[Note] By default, `varImp()` will scale importance between 0 and 100. --- class: clear ```{R, ex-var-importance, include = F, cache = T, dependson = "ex-bag-oob"} # Get importance bag_imp = varImp(heart_bag, scale = F) # Convert to data frame imp_df = tibble( variable = row.names(bag_imp$importance), importance = bag_imp$importance ) %>% mutate( variable = if_else(str_detect(variable, "thal"), "thal", variable), variable = if_else(str_detect(variable, "chest_pain"), "chest_pain", variable) ) %>% group_by(variable) %>% summarize(importance = sum(importance)) %>% mutate(importance = importance - min(importance)) %>% mutate(importance = 100 * importance / max(importance)) ``` .hi-pink[Variable importance] from our bagged tree model. ```{R, plot-var-importance, echo = F, dependson = "ex-var-importance"} # Plot importance ggplot( data = imp_df, aes(x = reorder(variable, -importance), y = importance) ) + geom_col(fill = red_pink) + geom_hline(yintercept = 0) + xlab("Variable") + ylab("Importance (scaled)") + # scale_fill_viridis_c(option = "magma", direction = -1) + theme_minimal(base_size = 20, base_family = "Fira Sans Book") + theme(legend.position = "none") + coord_flip() ``` --- name: bag-weak # Ensemble methods ## Bagging Bagging has one additional shortcoming... If one variable dominates other variables, the .hi[trees will be very correlated]. -- If the trees are very correlated, then bagging loses its advantage. -- .note[Solution] We should make the trees less correlated. --- layout: true # Ensemble methods --- name: rf-intro ## Random forests .attn[Random forests] improve upon bagged trees by .it[decorrelating] the trees. -- In order to decorrelate its trees, a .attn[random forest] only .pink[considers a random subset of] $\color{#e64173}{m\enspace (\approx\sqrt{p})}$ .pink[predictors] when making each split (for each tree). -- Restricting the variables our tree sees at a given split -- - nudges trees away from always using the same variables, -- - increasing the variation across trees in our forest, -- - which potentially reduces the variance of our estimates. -- If our predictors are very correlated, we may want to shrink $m$. --- ## Random forests Random forests thus introduce .b[two dimensions of random variation] 1. the .b[bootstrapped sample] 2. the $m$ .b[randomly selected predictors] Everything else about random forests works just as it did with bagging..super[🎄] .footnote[ 🎄 And just as it did with plain, old decision trees. ] --- name: rf-r ## Random forests in R You have .it[many] [options](http://topepo.github.io/caret/train-models-by-tag.html#Random_Forest) for training random forests in R.
_E.g._, `party`, `Rborist`, `ranger`, `randomForest`. `caret` offers access to each of these packages via `train`. -- - _E.g._, `method = "rf"` or `method = "ranger"` -- - The argument `mtry` gives the number of predictors at each split..super[🌲] .footnote[ 🌲 `predFixed` for `Rborist`. ] -- - Some methods have additional parameters, _e.g._, `ranger` needs - minimal node size `min.node.size` - a splitting rule `splitrule`. --- layout: true # Ensemble methods Training a random forest in R using `caret`... --- .col-left[ ... and `ranger` ] .col-right[ ```{R, ex-ranger, cache = T} # Set the seed set.seed(12345) # Train the random forest heart_forest = train( heart_disease ~ ., data = heart_df, method = "ranger", num.trees = 100, trControl = trainControl( method = "oob" ), tuneGrid = expand.grid( "mtry" = 2:13, "splitrule" = "gini", "min.node.size" = 1:10 ) ) ``` ] --- count: false .col-left[ ... and `ranger` - Specify `"ranger"` for method ] .col-right[ ```{R, eval = F} # Set the seed set.seed(12345) # Train the random forest heart_forest = train( heart_disease ~ ., data = heart_df, method = "ranger", #<< num.trees = 100, trControl = trainControl( method = "oob" ), tuneGrid = expand.grid( "mtry" = 2:13, "splitrule" = "gini", "min.node.size" = 1:10 ) ) ``` ] --- count: false .col-left[ ... and `ranger` - Specify `"ranger"` for method - Number of trees: `num.trees` ] .col-right[ ```{R, eval = F} # Set the seed set.seed(12345) # Train the random forest heart_forest = train( heart_disease ~ ., data = heart_df, method = "ranger", num.trees = 100, #<< trControl = trainControl( method = "oob" ), tuneGrid = expand.grid( "mtry" = 2:13, "splitrule" = "gini", "min.node.size" = 1:10 ) ) ``` ] --- count: false .col-left[ ... and `ranger` - Specify `"ranger"` for method - Number of trees: `num.trees` - We can still use OOB for error ] .col-right[ ```{R, eval = F} # Set the seed set.seed(12345) # Train the random forest heart_forest = train( heart_disease ~ ., data = heart_df, method = "ranger", num.trees = 100, trControl = trainControl( method = "oob" #<< ), tuneGrid = expand.grid( "mtry" = 2:13, "splitrule" = "gini", "min.node.size" = 1:10 ) ) ``` ] --- count: false .col-left[ ... and `ranger` - Specify `"ranger"` for method - Number of trees: `num.trees` - We can still use OOB for error - Parameters to choose/train 1. $m$, # of predictors at a split ] .col-right[ ```{R, eval = F} # Set the seed set.seed(12345) # Train the random forest heart_forest = train( heart_disease ~ ., data = heart_df, method = "ranger", num.trees = 100, trControl = trainControl( method = "oob" ), tuneGrid = expand.grid( "mtry" = 2:13, #<< "splitrule" = "gini", "min.node.size" = 1:10 ) ) ``` ] --- count: false .col-left[ ... and `ranger` - Specify `"ranger"` for method - Number of trees: `num.trees` - We can still use OOB for error - Parameters to choose/train 1. $m$, # of predictors at a split 1. the rule for splitting ] .col-right[ ```{R, eval = F} # Set the seed set.seed(12345) # Train the random forest heart_forest = train( heart_disease ~ ., data = heart_df, method = "ranger", num.trees = 100, trControl = trainControl( method = "oob" ), tuneGrid = expand.grid( "mtry" = 2:13, "splitrule" = "gini", #<< "min.node.size" = 1:10 ) ) ``` ] --- count: false .col-left[ ... and `ranger` - Specify `"ranger"` for method - Number of trees: `num.trees` - We can still use OOB for error - Parameters to choose/train 1. $m$, # of predictors at a split 1. the rule for splitting 1. 
minimum size for a leaf ] .col-right[ ```{R, eval = F} # Set the seed set.seed(12345) # Train the random forest heart_forest = train( heart_disease ~ ., data = heart_df, method = "ranger", num.trees = 100, trControl = trainControl( method = "oob" ), tuneGrid = expand.grid( "mtry" = 2:13, "splitrule" = "gini", "min.node.size" = 1:10 #<< ) ) ``` ] --- layout: false class: clear .b[Accuracy] (OOB) across the grid of our parameters. ```{R, plot-rf-parameters, echo = F} ggplot( data = heart_forest$results, aes(x = mtry, y = min.node.size, fill = Accuracy) ) + geom_tile(color = "white", size = 0.3) + xlab("Number of variables at split (m)") + ylab("Min. leaf size") + scale_fill_viridis_c("Accuracy", option = "magma", labels = percent) + theme_minimal(base_size = 20, base_family = "Fira Sans Book") + theme( legend.position = "bottom", legend.key.width = unit(3, "cm") ) ``` --- class: clear exclude: true .col-left[ ```{R, sim-forest-size, cache = T} # Set the seed set.seed(12345) # Train the bagged trees rf_oob = mclapply( X = 2:300, mc.cores = 12, FUN = function(n) { train( heart_disease ~ ., data = heart_df, method = "ranger", num.trees = n, trControl = trainControl( method = "oob" ), tuneGrid = data.frame( "mtry" = 2, "splitrule" = "gini", "min.node.size" = 4 ) )$finalModel$prediction.error %>% subtract(1, .) %>% data.frame(accuracy = ., n_trees = n) } ) %>% bind_rows() ``` ] .col-right[ ```{R, sim-forest-size2, cache = T} # Set seed set.seed(6789) # Train the bagged trees rf_cv = mclapply( X = 2:300, mc.cores = 12, FUN = function(n) { train( heart_disease ~ ., data = heart_df, method = "ranger", num.trees = n, trControl = trainControl( method = "cv", number = 5 ), tuneGrid = data.frame( "mtry" = 2, "splitrule" = "gini", "min.node.size" = 4 ) )$finalModel$prediction.error %>% subtract(1, .) %>% data.frame(accuracy = ., n_trees = n) } ) %>% bind_rows() ``` ] --- class: clear .b[Tree ensembles and the number of trees] ```{R, plot-bag-rf, echo = F} ggplot( data = bind_rows( bag_oob %>% mutate(type = "Bagged, OOB"), bag_cv %>% mutate(type = "Bagged, CV"), rf_oob %>% mutate(type = "Random forest, OOB"), rf_cv %>% mutate(type = "Random forest, CV") ), aes(x = n_trees, y = accuracy, color = type) ) + geom_line() + scale_y_continuous("Accuracy", labels = scales::percent) + scale_x_continuous("Number of trees") + scale_color_manual( "[Method, Estimate]", values = c(red_pink, purple, orange, slate) ) + theme_minimal(base_size = 20, base_family = "Fira Sans Book") + theme(legend.position = "bottom") + coord_cartesian(ylim = c(0.60, 0.90)) ``` --- layout: true # Ensemble methods --- name: boost-intro ## Boosting So far, the elements of our ensembles have been acting independently:
any single tree knows nothing about the rest of the forest. -- .attn[Boosting] allows trees to pass on information to each other. -- Specifically, .attn[boosting] trains its trees.super[🌲] .it[sequentially]—each new tree trains on the residuals (mistakes) from its predecessors. .footnote[ 🌲 As with bagging, boosting can be applied to many methods (in addition to trees). ] -- - We add each new tree to our model $\hat{f}$ (and update our residuals). - Trees are typically small—slowly improving $\hat{f}$ .it[where it struggles]. --- name: boost-param ## Boosting Boosting has three .hi[tuning parameters]. 1. The .hi[number of trees] $\color{#e64173}{B}$ is important—unlike bagging, boosting .it[can] overfit when $B$ is too large. -- 1. The .hi[shrinkage parameter] $\color{#e64173}{\lambda}$, which controls boosting's .it[learning rate] (often 0.01 or 0.001). -- 1. The .hi[number of splits] $\color{#e64173}{d}$ in each tree (trees' complexity). -- - Individual trees are typically short—often $d=1$ ("stumps"). - .note[Remember] Trees learn from predecessors' mistakes,
so no single tree needs to offer a perfect model. --- name: boost-alg ## How to boost .hi-purple[Step 1:] Set $\color{#6A5ACD}{\mathop{\hat{f}}}(x) = 0$, which yields residuals $r_i = y_i$ for all $i$. -- .hi-pink[Step 2:] For $\color{#e64173}{b} = 1,\,2,\,\ldots,\,B$ do: .move-right[ .b[A.] Fit a tree $\color{#e64173}{\hat{f^b}}$ with $d$ splits to the current residuals $r$. ] -- .move-right[ .b[B.] Update the model $\color{#6A5ACD}{\hat{f}}$ with a "shrunken version" of the new tree $\color{#e64173}{\hat{f^b}}$ ] $$ \begin{align} \color{#6A5ACD}{\mathop{\hat{f}}}(x) \leftarrow \color{#6A5ACD}{\mathop{\hat{f}}}(x) + \lambda \mathop{\color{#e64173}{\hat{f^b}}}(x) \end{align} $$ -- .move-right[ .b[C.] Update the residuals: $r_i \leftarrow r_i - \lambda \mathop{\color{#e64173}{\hat{f^b}}}(x_i)$. ] -- .hi-orange[Step 3:] Output the boosted model: $\mathop{\color{#6A5ACD}{\hat{f}}}(x) = \sum_{b} \lambda \mathop{\color{#e64173}{\hat{f^b}}}(x)$. --- name: boost-r ## Boosting in R We will use `caret`'s `method = "gbm"` to train boosted trees..super[🌴] .footnote[ 🌴 This method uses the `gbm` package. ] `gbm` needs the three standard parameters of boosted trees—plus one more: 1. `n.trees`, the number of trees $(B)$ 1. `interaction.depth`, trees' depth (max. splits from top) 1. `shrinkage`, the learning rate $(\lambda)$ 1. `n.minobsinnode`, minimum observations in a terminal node --- exclude: true ```{R, ex-boost, cache = T, message = F} # Set the seed set.seed(12345) # Train the random forest heart_boost = train( heart_disease ~ ., data = heart_df, method = "gbm", trControl = trainControl( method = "cv", number = 5 ), tuneGrid = expand.grid( "n.trees" = seq(1, 300, by = 1), "interaction.depth" = 1:3, "shrinkage" = c(0.1, 0.01, 0.001), "n.minobsinnode" = 5 ) ) ``` --- ## Boosting in R .col-left.pad-top[ ```{R, eval = F} # Set the seed set.seed(12345) # Train the random forest heart_boost = train( heart_disease ~ ., data = heart_df, method = "gbm", trControl = trainControl( method = "cv", number = 5 ), tuneGrid = expand.grid( "n.trees" = seq(25, 200, by = 25), "interaction.depth" = 1:3, "shrinkage" = c(0.1, 0.01, 0.001), "n.minobsinnode" = 5 ) ) ``` ] --- count: false ## Boosting in R .col-left.pad-top[ ```{R, eval = F} # Set the seed set.seed(12345) # Train the random forest heart_boost = train( heart_disease ~ ., data = heart_df, method = "gbm", #<< trControl = trainControl( method = "cv", number = 5 ), tuneGrid = expand.grid( "n.trees" = seq(25, 200, by = 25), "interaction.depth" = 1:3, "shrinkage" = c(0.1, 0.01, 0.001), "n.minobsinnode" = 5 ) ) ``` ] .col-right.pad-top[
- boosted trees via `gbm` package ] --- count: false ## Boosting in R .col-left.pad-top[ ```{R, eval = F} # Set the seed set.seed(12345) # Train the random forest heart_boost = train( heart_disease ~ ., data = heart_df, method = "gbm", trControl = trainControl( method = "cv", #<< number = 5 #<< ), tuneGrid = expand.grid( "n.trees" = seq(25, 200, by = 25), "interaction.depth" = 1:3, "shrinkage" = c(0.1, 0.01, 0.001), "n.minobsinnode" = 5 ) ) ``` ] .col-right.pad-top[
- boosted trees via `gbm` package - cross validation now (no OOB) ] --- count: false ## Boosting in R .col-left.pad-top[ ```{R, eval = F} # Set the seed set.seed(12345) # Train the random forest heart_boost = train( heart_disease ~ ., data = heart_df, method = "gbm", trControl = trainControl( method = "cv", number = 5 ), tuneGrid = expand.grid( "n.trees" = seq(25, 200, by = 25), #<< "interaction.depth" = 1:3, "shrinkage" = c(0.1, 0.01, 0.001), "n.minobsinnode" = 5 ) ) ``` ] .col-right.pad-top[
- boosted trees via `gbm` package - cross validation now (no OOB) - CV-search of parameter grid - number of trees ] --- count: false ## Boosting in R .col-left.pad-top[ ```{R, eval = F} # Set the seed set.seed(12345) # Train the random forest heart_boost = train( heart_disease ~ ., data = heart_df, method = "gbm", trControl = trainControl( method = "cv", number = 5 ), tuneGrid = expand.grid( "n.trees" = seq(25, 200, by = 25), "interaction.depth" = 1:3, #<< "shrinkage" = c(0.1, 0.01, 0.001), "n.minobsinnode" = 5 ) ) ``` ] .col-right.pad-top[
- boosted trees via `gbm` package - cross validation now (no OOB) - CV-search of parameter grid - number of trees - tree depth (complexity) ] --- count: false ## Boosting in R .col-left.pad-top[ ```{R, eval = F} # Set the seed set.seed(12345) # Train the random forest heart_boost = train( heart_disease ~ ., data = heart_df, method = "gbm", trControl = trainControl( method = "cv", number = 5 ), tuneGrid = expand.grid( "n.trees" = seq(25, 200, by = 25), "interaction.depth" = 1:3, "shrinkage" = c(0.1, 0.01, 0.001), #<< "n.minobsinnode" = 5 ) ) ``` ] .col-right.pad-top[
- boosted trees via `gbm` package - cross validation now (no OOB) - CV-search of parameter grid - number of trees - tree depth (complexity) - shrinkage (learning rate) ] --- count: false ## Boosting in R .col-left.pad-top[ ```{R, eval = F} # Set the seed set.seed(12345) # Train the random forest heart_boost = train( heart_disease ~ ., data = heart_df, method = "gbm", trControl = trainControl( method = "cv", number = 5 ), tuneGrid = expand.grid( "n.trees" = seq(25, 200, by = 25), "interaction.depth" = 1:3, "shrinkage" = c(0.1, 0.01, 0.001), "n.minobsinnode" = 5 #<< ) ) ``` ] .col-right.pad-top[
- boosted trees via `gbm` package - cross validation now (no OOB) - CV-search of parameter grid - number of trees - tree depth (complexity) - shrinkage (learning rate) - minimum leaf size
(not searching here) ] --- layout: false class: clear .b[Comparing boosting parameters]—notice the rates of learning ```{R, plot-boost-param, echo = F} ggplot( data = heart_boost$results %>% mutate(grp = paste(shrinkage, interaction.depth, sep = ", ")), aes( x = n.trees, y = Accuracy, color = as.character(interaction.depth), linetype = as.character(shrinkage) ) ) + geom_vline(xintercept = 204, size = 1.3, alpha = 0.3, color = red_pink) + geom_line(size = 0.4) + scale_y_continuous("Accuracy", labels = percent) + scale_x_continuous("Number of trees") + scale_color_viridis_d("Tree depth", option = "magma", end = 0.85) + scale_linetype_manual("Shrinkage", values = c("longdash", "dotted", "solid")) + theme_minimal(base_size = 18, base_family = "Fira Sans Book") ``` --- class: clear .b[Tree ensembles and the number of trees] ```{R, plot-bag-rf-boost, echo = F} ggplot( data = bind_rows( bag_oob %>% mutate(type = "Bagged, OOB"), bag_cv %>% mutate(type = "Bagged, CV"), rf_oob %>% mutate(type = "RF, OOB"), rf_cv %>% mutate(type = "RF, CV"), heart_boost$results %>% filter( shrinkage == 0.1 & interaction.depth == 1 & between(n.trees, 2, 300) ) %>% transmute(accuracy = Accuracy, n_trees = n.trees, type = "Boosted, CV") ), aes(x = n_trees, y = accuracy, color = type, size = type) ) + geom_line() + scale_y_continuous("Accuracy", labels = scales::percent) + scale_x_continuous("Number of trees") + scale_color_manual( "[Method, Estimate]", values = c(red_pink, purple, turquoise, orange, slate) ) + scale_size_manual( "[Method, Estimate]", values = c(0.25, 0.25, 0.7, 0.25, 0.25) ) + theme_minimal(base_size = 18, base_family = "Fira Sans Book") + theme(legend.position = "bottom") + coord_cartesian(ylim = c(0.60, 0.90)) ``` --- name: sources layout: false # Sources These notes draw upon - [An Introduction to Statistical Learning](http://faculty.marshall.usc.edu/gareth-james/ISL/) (*ISL*)
James, Witten, Hastie, and Tibshirani --- # Table of contents .col-left[ .smallest[ #### Admin - [Today and upcoming](#admin) #### Decision trees 1. [Fundamentals](#tree-review-fundamentals) 1. [Strengths and weaknesses](#tree-review-tradeoff) #### Other - [Sources/references](#sources) ] ] .col-right[ .smallest[ #### Ensemble methods 1. [Introduction](#intro) 1. [Bagging](#bag-intro) - [Introduction](#bag-intro) - [Algorithm](#bag-algorithm) - [Out-of-bag](#bag-oob) - [In R](#bag-r) - [Variable importance](#bag-var) 1. [Random forests](#rf-intro) - [Introduction](#rf-intro) - [In R](#rf-r) 1. [Boosting](#boost-intro) - [Introduction](#boost-intro) - [Parameters](#boost-param) - [Algorithm](#boost-alg) - [In R](#boost-r) ] ]
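---
layout: false
class: clear

.b[Boosting by hand] A minimal sketch of the [boosting algorithm](#boost-alg), for illustration only—it assumes a generic data frame `fake_df` with a numeric outcome `y`. In practice, we train boosted trees with `gbm` via `caret`.

```{R, ex-boost-by-hand, eval = F}
# Illustrative sketch: boosting shallow regression trees "by hand"
library(rpart)
# Tuning parameters: number of trees, learning rate, tree depth
B = 1000; lambda = 0.01; d = 1
# Step 1: Start with f_hat(x) = 0, so the residuals begin at r_i = y_i
f_hat = rep(0, nrow(fake_df))
resid_df = fake_df
# Step 2: Sequentially fit small trees to the residuals
for (b in 1:B) {
  # A. Fit a shallow tree (a stump when d = 1) to the current residuals
  tree_b = rpart(y ~ ., data = resid_df, control = rpart.control(maxdepth = d, cp = 0))
  pred_b = predict(tree_b, newdata = fake_df)
  # B. Add a shrunken version of the new tree to the model
  f_hat = f_hat + lambda * pred_b
  # C. Update the residuals
  resid_df$y = resid_df$y - lambda * pred_b
}
# Step 3: f_hat now holds the boosted model's fitted values
```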