class: center, middle, inverse, title-slide

# Lecture .mono[001]
## Statistical learning: Foundations
### Edward Rubin
### 14 January 2020

---
exclude: true

---
layout: true
# Admin

---
class: inverse, middle

---
name: admin-today
## Today

.hi-slate[In-class]
- .note[Course website:] [https://github.com/edrubin/EC524W20/](https://github.com/edrubin/EC524W20/)
- .note[Resources]
  - [RStudio](https://education.rstudio.com/learn/) cheatsheets, books, and tutorials
  - [UO library](http://uoregon.libcal.com/calendar/dataservices/?cid=11979&t=g&d=0000-00-00&cal=11979,11173)
  - See course page for more...
- Formalizing statistical learning, notation, goals (and problems)

---
layout: false
class: clear, middle

<img src="images/eugene-r.png" width="2373" style="display: block; margin: auto;" />

.smaller[[Tweet](https://twitter.com/ryann_crowley/status/1216880767072002048); [h/t: Grant McDermott](https://grantmcdermott.com/)]

---
name: admin-soon
# Admin
## Upcoming

.hi-slate[Readings]
- .note[Today]
  - .it[ISL] Ch. 1–2
  - [Prediction Policy Problems](https://www.aeaweb.org/articles?id=10.1257/aer.p20151023) by Kleinberg .it[et al.] (2015)
- .note[Next]
  - .it[ISL] Ch. 3–4

.hi-slate[Problem set] Likely assigned Thursday and due Tuesday.

---
layout: true
# Statistical learning

---
class: inverse, middle

---
name: sl-definition
## What is it?

--

.hi[Statistical learning] is a .attn[set of tools] developed .attn[to understand/model data].

--

Examples

- .hi-slate[Regression analysis] quantifies the relationship between an outcome and a set of explanatory variables—most usefully in a causal setting.

--

- .hi-slate[Exploratory data analysis] (EDA) is a preliminary, often graphical, "exploration" of data to understand levels, variation, missingness, *etc.*

--

- .hi-slate[Classification trees] search through explanatory variables, splitting along the most "predictive" dimensions (random forests extend trees).

--

- .hi-slate[Regression trees] extend *classification trees* to numerical outcomes (random forests extend, as well).

--

- .hi-slate[K-means clustering] partitions observations into K groups (clusters) based upon a set of variables.

---
name: sl-classes
## What is it good for?

--

A lot of things.

--

We tend to break statistical learning into two(-ish) classes:

1. .hi-slate[Supervised learning] builds ("learns") a statistical model for predicting an .hi-orange[output] `\(\left( \color{#FFA500}{\mathbf{y}} \right)\)` given a set of .hi-purple[inputs] `\(\left( \color{#6A5ACD}{\mathbf{x}_{1},\, \ldots,\, \mathbf{x}_{p}} \right)\)`,

--

_i.e._, we want to build a model/function `\(\color{#20B2AA}{f}\)`

`$$\color{#FFA500}{\mathbf{y}} = \color{#20B2AA}{f}\!\left( \color{#6A5ACD}{\mathbf{x}_{1},\, \ldots,\, \mathbf{x}_{p}} \right)$$`

that accurately describes `\(\color{#FFA500}{\mathbf{y}}\)` given some values of `\(\color{#6A5ACD}{\mathbf{x}_{1},\, \ldots,\, \mathbf{x}_{p}}\)`.

--

2. .hi-slate[Unsupervised learning] learns relationships and structure using only .hi-purple[inputs] `\(\left( \color{#6A5ACD}{\mathbf{x}_{1},\, \ldots,\, \mathbf{x}_{p}} \right)\)` without any *supervising* output

--

—letting the data "speak for itself."

---
layout: false
class: clear, middle

.hi-slate[Semi-supervised learning] falls somewhere between supervised and unsupervised learning—generally applied to supervised tasks when labeled .hi-orange[outputs] are incomplete.

.grey-light[A short .mono[R] sketch of the supervised/unsupervised distinction follows.]
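---
class: clear

To make the supervised/unsupervised distinction concrete, here is a minimal .mono[R] sketch (an illustrative addition, not the example we develop later): a supervised learner is handed an .hi-orange[output] to predict, while an unsupervised learner sees only .hi-purple[inputs]. The built-in `mtcars` and `iris` data are just stand-ins.

```r
# Supervised: we provide an output (mpg) and inputs (wt, hp);
# the model learns a function that predicts mpg.
supervised_fit = lm(mpg ~ wt + hp, data = mtcars)
head(predict(supervised_fit))

# Unsupervised: we provide only inputs (no output);
# k-means looks for structure (here, K = 3 clusters).
set.seed(12345)
unsupervised_fit = kmeans(iris[, 1:4], centers = 3)
table(unsupervised_fit$cluster)
```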
---
class: clear, middle

<img src="images/comic-learning.jpg" width="5461" style="display: block; margin: auto;" />

.it[.smaller[[Source](https://twitter.com/athena_schools/status/1063013435779223553)]]

---
layout: true
# Statistical learning

---
## Output

We tend to further break .hi-slate[supervised learning] into two groups, based upon the .hi-orange[output] (the .orange[outcome] we want to predict):

--

1. .hi-slate[Classification tasks] for which the values of `\(\color{#FFA500}{\mathbf{y}}\)` are discrete categories. <br>*E.g.*, race, sex, loan default, hazard, disease, flight status

2. .hi-slate[Regression tasks] in which `\(\color{#FFA500}{\mathbf{y}}\)` takes on continuous, numeric values. <br>*E.g.*, price, arrival time, number of emails, temperature

.note[Note.sub[1]] .it[Regression] here names the task (predicting a numeric outcome); it is broader than .it[linear regression], which is one specific model for that task.

--

.note[Note.sub[2]] Don't get tricked: Not all numbers represent continuous, numerical values—_e.g._, zip codes, industry codes, social security numbers..super[.pink[†]]

.footnote[
.pink[†] .qa[Q] Where would you put responses to 5-item Likert scales?
]

---
## Why *Learning*?

.qa[Q] What puts the "learning" in statistical/machine learning?

--

.qa[A] Most learning models/algorithms will .attn[tune model parameters] based upon the observed dataset—learning from the data.

---
layout: true
# Notation

---
name: notation-source
class: inverse, middle

Our class will typically follow the notation and definitions of [.it[ISL]](http://faculty.marshall.usc.edu/gareth-james/ISL/).

---
name: notation-data
## Data

`\(\color{#e64173}{n}\)` gives the .pink[number of observations]

`\(\color{#6A5ACD}{p}\)` represents the .purple[number of variables] available for prediction

--

`\(\mathbf{X}\)` is our `\(\color{#e64173}{n}\times\color{#6A5ACD}{p}\)` matrix of predictors

- .note[Other names] ***features***, *inputs*, *independent/explanatory variables*, ...
- `\(x_{\color{#e64173}{i},\color{#6A5ACD}{j}}\)` is observation `\(\color{#e64173}{i}\)` (in `\(\color{#e64173}{1,\ldots,n}\)`) on variable `\(\color{#6A5ACD}{j}\)` (for `\(\color{#6A5ACD}{j}\)` in `\(\color{#6A5ACD}{1,\ldots,p}\)`)

--

$$
`\begin{align}
  \mathbf{X} =
  \begin{bmatrix}
    x_{1,1} & x_{1,2} & \cdots & x_{1,\color{#6A5ACD}{p}} \\
    x_{2,1} & x_{2,2} & \cdots & x_{2,\color{#6A5ACD}{p}} \\
    \vdots & \vdots & \ddots & \vdots \\
    x_{\color{#e64173}{n},1} & x_{\color{#e64173}{n},2} & \cdots & x_{\color{#e64173}{n},\color{#6A5ACD}{p}}
  \end{bmatrix}
\end{align}`
$$

---
name: notation-dimensions
## Dimensions of `\(\mathbf{X}\)`

Now let us split our `\(\mathbf{X}\)` matrix of predictors by its two dimensions.
--

.col-left[
.hi-pink[Observation] `\(\color{#e64173}{i}\)` is a `\(\color{#6A5ACD}{p}\)`-length vector

$$
`\begin{align}
  x_{\color{#e64173}{i}} =
  \begin{bmatrix}
    x_{\color{#e64173}{i},\color{#6A5ACD}{1}} \\
    x_{\color{#e64173}{i},\color{#6A5ACD}{2}} \\
    \vdots \\
    x_{\color{#e64173}{i},\color{#6A5ACD}{p}}
  \end{bmatrix}
\end{align}`
$$
]

--

.col-right[
.hi-purple[Variable] `\(\color{#6A5ACD}{j}\)` is an `\(\color{#e64173}{n}\)`-length vector

$$
`\begin{align}
  \mathbf{x}_{\color{#6A5ACD}{j}} =
  \begin{bmatrix}
    x_{\color{#e64173}{1},\color{#6A5ACD}{j}} \\
    x_{\color{#e64173}{2},\color{#6A5ACD}{j}} \\
    \vdots \\
    x_{\color{#e64173}{n},\color{#6A5ACD}{j}}
  \end{bmatrix}
\end{align}`
$$
]

--

Applied to .mono[R]:

- `dim(x_df)` `\(= \left( \color{#e64173}{n},\, \color{#6A5ACD}{p} \right)\)`
- `nrow(x_df)` `\(= \color{#e64173}{n}\)`; `ncol(x_df)` `\(= \color{#6A5ACD}{p}\)`
- `x_df[1,]` `\(\left( \color{#e64173}{i = 1} \right)\)`; `x_df[,1]` `\(\left( \color{#6A5ACD}{j = 1} \right)\)`

.grey-light[(We will revisit these commands with this lecture's example data.)]

---
name: notation-outcomes
## Outcomes

In supervised settings, we will denote our .hi-orange[outcome variable] as `\(\color{#FFA500}{\mathbf{y}}\)`.

.note[Synonyms] *output*, *outcome*, *dependent/response variable*, ...

--

The .orange[outcome] for our .pink[i.super[th]] observation is `\(\color{#FFA500}{y}_{\color{#e64173}{i}}\)`. Together the `\(\color{#e64173}{n}\)` observations form

$$
`\begin{align}
  \color{#FFA500}{\mathbf{y}} =
  \begin{bmatrix}
    y_{\color{#e64173}{1}} \\
    y_{\color{#e64173}{2}} \\
    \vdots \\
    y_{\color{#e64173}{n}}
  \end{bmatrix}
\end{align}`
$$

--

and our full dataset is composed of `\(\bigg\{ \left( x_{\color{#e64173}{1}},\color{#FFA500}{y}_{\color{#e64173}{1}} \right),\, \left( x_{\color{#e64173}{2}},\color{#FFA500}{y}_{\color{#e64173}{2}} \right),\, \ldots,\, \left( x_{\color{#e64173}{n}},\color{#FFA500}{y}_{\color{#e64173}{n}} \right) \bigg\}\)`.

---
layout: false
class: clear, middle

Back to the problem of (supervised) statistical learning...

---
layout: true
# Statistical learning

---
name: sl-goal
## The goal

As defined before, we want to *learn* a model to understand our data.

--

1. Take our (numeric) .orange[output] `\(\color{#FFA500}{\mathbf{y}}\)`.
2. Imagine there is a .turquoise[function] `\(\color{#20B2AA}{f}\)` that takes .purple[inputs] `\(\color{#6A5ACD}{\mathbf{X}} = \color{#6A5ACD}{\mathbf{x}_1}, \ldots, \color{#6A5ACD}{\mathbf{x}_p}\)` <br>and maps them, plus a random, mean-zero .pink[error term] `\(\color{#e64173}{\varepsilon}\)`, to the .orange[output].

`$$\color{#FFA500}{\mathbf{y}} = \color{#20B2AA}{f} \! \left( \color{#6A5ACD}{\mathbf{X}} \right) + \color{#e64173}{\varepsilon}$$`

--

.qa[Q] What is `\(\color{#20B2AA}{f}\)`?

--

<br>.qa[A] .note[ISL:] `\(\color{#20B2AA}{f}\)` represents the *systematic* information that `\(\color{#6A5ACD}{\mathbf{X}}\)` provides about `\(\color{#FFA500}{\mathbf{y}}\)`.

--

.qa[Q] How else can you describe `\(\color{#20B2AA}{f}\)`?

---
## Our missing `\(f\)`

`$$\color{#FFA500}{\mathbf{y}} = \color{#20B2AA}{f} \! \left( \color{#6A5ACD}{\mathbf{X}} \right) + \color{#e64173}{\varepsilon}$$`

.qa[Q] `\(\color{#20B2AA}{f}\)` is unknown (as is `\(\color{#e64173}{\varepsilon}\)`). What should we do?

--

<br> .qa[A] Use the observed data to learn/estimate `\(\color{#20B2AA}{f}(\cdot)\)`, _i.e._, construct `\(\widehat{\color{#20B2AA}{f}}\)`..super[.pink[†]]

.footnote[
.pink[†] More notation: hats `\(\left( \hat{} \right)\)` are estimators/estimates.
]

--

.qa[Q] Okay. How?

--

<br> .qa[A] .it[How do I estimate] `\(\color{#20B2AA}{f}\)`.it[?]
is one way to phrase *all questions* that underlie statistical learning—model selection, cross-validation, evaluation, *etc.*

--

All of the techniques, algorithms, and tools of stat. learning attempt to accurately recover `\(\color{#20B2AA}{f}\)` based upon the setting's goals/limitations.

--

.grey-light[You'll have to wait on any real/specific answers...]

---
## Learning from `\(\hat{f}\)`

There are two main reasons we want to learn about `\(\color{#20B2AA}{f}\)`:

1. .hi-slate[*Causal* inference settings] How do changes in `\(\color{#6A5ACD}{\mathbf{X}}\)` affect `\(\color{#FFA500}{\mathbf{y}}\)`? <br> .grey-light[The territory of .mono[EC523] and .mono[EC525].]

--

1. .hi-slate[Prediction problems] Predict `\(\color{#FFA500}{\mathbf{y}}\)` using our estimate of `\(\color{#20B2AA}{f}\)`, _i.e._,

`$$\hat{\color{#FFA500}{\mathbf{y}}} = \hat{\color{#20B2AA}{f}}\!(\color{#6A5ACD}{\mathbf{X}})$$`

our *black-box setting* where we care less about `\(\color{#20B2AA}{f}\)` than `\(\hat{\color{#FFA500}{\mathbf{y}}}\)`..super[.pink[†]]

.footnote[
.pink[†] You shouldn't actually treat your prediction methods as total black boxes.
]

--

Similarly, in causal-inference settings, we don't particularly care about `\(\hat{\color{#FFA500}{\mathbf{y}}}\)`.

---
name: sl-prediction
## Prediction errors

As tends to be the case in life, you will make errors in predicting `\(\color{#FFA500}{\mathbf{y}}\)`.

The accuracy of `\(\hat{\color{#FFA500}{\mathbf{y}}}\)` depends upon .hi-slate[two errors]:

--

1. .hi-slate[Reducible error] The error due to `\(\hat{\color{#20B2AA}{f}}\)` imperfectly estimating `\(\color{#20B2AA}{f}\)`. <br>*Reducible* in the sense that we could improve `\(\hat{\color{#20B2AA}{f}}\)`.

--

1. .hi-slate[Irreducible error] The error component that is outside of the model `\(\color{#20B2AA}{f}\)`. <br>*Irreducible* because we defined an error term `\(\color{#e64173}{\varepsilon}\)` unexplained by `\(\color{#20B2AA}{f}\)`.

--

.note[Note] As its name implies, you can't get rid of .it[irreducible] error—but we can try to get rid of .it[reducible] errors.

---
## Prediction errors

Why we're stuck with .it[irreducible] error

$$
`\begin{aligned}
  \mathop{E}\left[ \left\{ \color{#FFA500}{\mathbf{y}} - \hat{\color{#FFA500}{\mathbf{y}}} \right\}^2 \right]
  &= \mathop{E}\left[ \left\{ \color{#20B2AA}{f}(\color{#6A5ACD}{\mathbf{X}}) + \color{#e64173}{\varepsilon} - \hat{\color{#20B2AA}{f}}(\color{#6A5ACD}{\mathbf{X}}) \right\}^2 \right] \\
  &= \underbrace{\left[ \color{#20B2AA}{f}(\color{#6A5ACD}{\mathbf{X}}) - \hat{\color{#20B2AA}{f}}(\color{#6A5ACD}{\mathbf{X}}) \right]^2}_{\text{Reducible}} + \underbrace{\mathop{\text{Var}} \left( \color{#e64173}{\varepsilon} \right)}_{\text{Irreducible}}
\end{aligned}`
$$

In less math:

- If `\(\color{#e64173}{\varepsilon}\)` exists, then `\(\color{#6A5ACD}{\mathbf{X}}\)` cannot perfectly explain `\(\color{#FFA500}{\mathbf{y}}\)`.
- So even if `\(\hat{\color{#20B2AA}{f}} = \color{#20B2AA}{f}\)`, we still have irreducible error.

--

Thus, to form our .hi-slate[best predictors], we will .hi-slate[minimize reducible error].

---
name: sl-parameters
## Which type of `\(\hat{f}\)`?

Once you have your .purple[inputs] `\(\left(\color{#6A5ACD}{\mathbf{X}} \right)\)` and .orange[output] `\(\left( \color{#FFA500}{\mathbf{y}} \right)\)` data, you still need to decide how parametric your `\(\hat{\color{#20B2AA}{f}}\)` should be..super[.pink[†]]

.footnote[
.pink[†] I'm saying "how parametric" b/c some methods are much more parametric than others.
]

--

.hi-slate[Parametric methods] make an assumption about the functional form of `\(\color{#20B2AA}{f}\)` and typically involve two steps:

1. Select a functional form (shape) to represent `\(\color{#20B2AA}{f}\)`
2. Train your selected model on your data `\(\color{#FFA500}{\mathbf{y}}\)` and `\(\color{#6A5ACD}{\mathbf{X}}\)`.

--

.hi-slate[Non-parametric methods] avoid explicit assumptions about the shape of `\(\color{#20B2AA}{f}\)`. <br> They instead attempt to .pink[flexibly fit] the data, while trying to .pink[avoid overfitting].

---
## Which type of `\(\hat{f}\)`?

Methods' parametric assumptions come with tradeoffs.

.hi-slate[Parametric methods]
<br> .pink.mono[+] Simpler to estimate and interpret.
<br> .purple.mono[-] If the assumed functional form is bad, model performance will suffer.

.hi-slate[Non-parametric methods]
<br> .pink.mono[+] Fewer assumptions. More flexibility.
<br> .purple.mono[-] Lower interpretability. Susceptible to overfitting. Want lots of data.

---
layout: true
class: clear, middle

---

.hi-slate[Example:] Let's start with a pretty funky, nonlinear function.

---
exclude: true

```r
# Load packages: tidyverse (tibble, dplyr, tidyr), magrittr (%<>%), caret (knnreg)
library(tidyverse)
library(magrittr)
library(caret)
# Sample size
n = 70
# Set seed
set.seed(12345)
# Define function
f = function(x1, x2, e) x1 + x2 - x1 * x2 + (x1 > x2) * x1 + (x1 < x2) * x2^2 + e
# Generate data
sample_df = tibble(
  x1 = runif(n = n, max = 10),
  x2 = runif(n = n, max = 10),
  e = rnorm(n = n, sd = 1),
  y = f(x1, x2, e)
)
# Estimate linear-regression model
est_lm = lm(y ~ x1 * x2, data = sample_df)
# Estimate kNN models: k=10,5,1
est_knn10 = knnreg(
  y = sample_df$y,
  x = sample_df[, c("x1", "x2")],
  k = 10
)
est_knn5 = knnreg(
  y = sample_df$y,
  x = sample_df[, c("x1", "x2")],
  k = 5
)
est_knn1 = knnreg(
  y = sample_df$y,
  x = sample_df[, c("x1", "x2")],
  k = 1
)
# Add each model's predictions to the sample
sample_df %<>% mutate(
  y_lm = est_lm$fitted.values,
  y_knn10 = predict(est_knn10, newdata = sample_df[, c("x1", "x2")]),
  y_knn5 = predict(est_knn5, newdata = sample_df[, c("x1", "x2")]),
  y_knn1 = predict(est_knn1, newdata = sample_df[, c("x1", "x2")])
)
# True data frame: f() evaluated (without noise) on a grid, plus each model's predictions
truth_df = tibble(x1 = seq(0, 10, 0.1), x2 = seq(0, 10, 0.1)) %>% expand(x1, x2)
truth_df %<>% mutate(
  y = f(x1, x2, 0),
  y_lm = predict(est_lm, newdata = truth_df),
  y_knn10 = predict(est_knn10, newdata = truth_df[, c("x1", "x2")]),
  y_knn5 = predict(est_knn5, newdata = truth_df[, c("x1", "x2")]),
  y_knn1 = predict(est_knn1, newdata = truth_df[, c("x1", "x2")])
)
# Find range of x, y, and prediction errors
range_x = c(0, 10)
range_y = c(
  min(
    sample_df %>% select(starts_with("y")),
    truth_df %>% select(starts_with("y"))
  ),
  max(
    sample_df %>% select(starts_with("y")),
    truth_df %>% select(starts_with("y"))
  )
)
range_error = c(
  min(sample_df %>% transmute(y - y_lm, y - y_knn10, y - y_knn5, y - y_knn1)),
  max(sample_df %>% transmute(y - y_lm, y - y_knn10, y - y_knn5, y - y_knn1))
)
```

---
name: ex-truth

.hi-slate[Truth:] The (nonlinear) `\(f(\mathbf{X})\)` that we hope to recover.
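---
class: clear

The figure code for this example is not shown above; as a rough guide, here is a minimal sketch (an illustrative addition, not the deck's original plotting code) of how the .hi-slate[truth] surface could be drawn with .mono[ggplot2], assuming the setup chunk has run (`truth_df` holds `\(f\)` evaluated on a grid with `\(\varepsilon = 0\)`).

```r
# Hypothetical sketch: plot the true surface f(x1, x2) over the 0-10 grid
ggplot(data = truth_df, aes(x = x1, y = x2, fill = y)) +
  geom_raster() +
  scale_fill_viridis_c("f(x1, x2)") +
  labs(x = "x1", y = "x2") +
  theme_minimal()
```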
--- .hi-slate[The sample:] `\(n=70\)` randomly drawn observations for `\(\mathbf{y} = f(\mathbf{x}_1,\, \mathbf{x}_2) + \varepsilon\)`
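---

A quick aside connecting the notation slides' .mono[R] commands to this example (a minimal added sketch; it assumes the setup chunk above has run).

```r
# The simulated sample: rows are observations; columns are x1, x2, e, y, and fitted values
dim(sample_df)
nrow(sample_df)    # n = 70 observations
sample_df[1, ]     # observation i = 1 (a row)
sample_df[, "x1"]  # variable x1 (a column)
```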
--- name: ex-lm .hi-slate[Estimated linear-regression model:] `\(\hat{\mathbf{y}} = \hat\beta_0 + \hat\beta_1 \mathbf{x}_1 + \hat\beta_2 \mathbf{x}_2 + \hat\beta_3 \mathbf{x}_1 \mathbf{x}_2\)`
--- .hi-slate[Prediction error] from our fitted linear regression model
--- name: ex-knn .hi-slate[k-nearest neighbors] (kNN) using k=5 .grey-light[(a *non-parametric* method)]
--- .hi-slate[k-nearest neighbors] (kNN) using k=10 .grey-light[(notice increased smoothness)]
--- .hi-slate[k-nearest neighbors] (kNN) using k=1 .grey-light[(notice decreased smoothness)]
--- .hi-slate[Prediction error] from our fitted kNN (k=5) model
--- .hi-slate[Prediction error] from our fitted kNN (k=10) model
--- .hi-slate[Prediction error] from our fitted kNN (k=1) model
--- .note[Recall] .hi-slate[Prediction error] from our fitted linear regression model
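---

Before turning to model accuracy, a minimal "by-hand" sketch of what kNN regression computes (an illustrative addition, not the deck's code; it assumes `sample_df` from the setup chunk): the prediction at a point is simply the average outcome of its k nearest training points, which previews why k = 1 fits the training data so well.

```r
# kNN regression by hand: average y over the k nearest training points
# (Euclidean distance in the x1-x2 plane).
knn_by_hand = function(x1_new, x2_new, k) {
  dist = sqrt((sample_df$x1 - x1_new)^2 + (sample_df$x2 - x2_new)^2)
  mean(sample_df$y[order(dist)[1:k]])
}
# At a training point, k = 1 returns that point's own y: zero training error.
knn_by_hand(sample_df$x1[1], sample_df$x2[1], k = 1)
sample_df$y[1]
```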
--- layout: true # Model accuracy --- name: accuracy-questions ## Questions 1. Which of the methods was the most flexible? Inflexible? 1. Why do you think kNN with k=1 had such low prediction error? 1. How could we (better) assess model/predictive performance? 1. Why would we ever want to choose a less flexible model? --- ## Measurement You probably will not be surprised to know that there is no one-size-fits-all solution in statistical learning. .qa[Q] How do we choose between competing models? -- .qa[A] We're a few steps away, but before we do anything, we need a way to .hi-slate[define model performance]. --- name: accuracy-subtlety ## Subtlety Defining performance can actually be quite tricky... .note[Regression setting, 1] Which do you prefer? 1. Lots of little errors and a few really large errors. 1. Medium-sized errors for everyone. .note[Regression setting, 2] Is a 1-unit error (*e.g.*, $1,000) equally bad for everyone? --- ## Subtlety Defining performance can actually be quite tricky... .note[Classification setting, 1] Which is worse? 1. False positive (*e.g.*, incorrectly diagnosing cancer) 1. False negative (*e.g.*, missing cancer) .note[Classification setting, 2] Which is more important? 1. True positive (*e.g.*, correct diagnosis of cancer) 1. True negative (*e.g.*, correct diagnosis of "no cancer") --- name: mse ## MSE .attn[Mean squared error (MSE)] is the most common.super[.pink[†]] way to measure model performance in a regression setting. .footnote[ .pink[†] *Most common* does not mean best—it just means lots of people use it. ] `$$\text{MSE} = \dfrac{1}{n} \sum_{i=1}^n \left[ \color{#FFA500}{y}_i - \hat{\color{#20B2AA}{f}}(\color{#6A5ACD}{x}_i) \right]^2$$` .note[Recall:] `\(\color{#FFA500}{y}_i - \hat{\color{#20B2AA}{f}}(\color{#6A5ACD}{x}_i) = \color{#FFA500}{y}_i - \hat{\color{#FFA500}{y}}_i\)` is our prediction error. -- Two notes about MSE 1. MSE will be (relatively) very small when .hi-slate[prediction error] is nearly zero. 1. MSE .hi-slate[penalizes] big errors more than little errors (the squared part). --- name: training-testing ## Training or testing? Low MSE (accurate performance) on the data that trained the model isn't actually impressive—maybe the model is just overfitting our data..super[.pink[†]] .footnote[ .pink[†] Recall the kNN performance for k=1. ] .note[What we want:] How well does the model perform .hi-slate[on data it has never seen]? -- This introduces an important distinction: 1. .hi-slate[Training data]: The observations `\((\color{#FFA500}{y}_i,\color{#e64173}{x}_i)\)` used to .hi-slate[train] our model `\(\hat{\color{#20B2AA}{f}}\)`. 1. .hi-slate[Testing data]: The observations `\((\color{#FFA500}{y}_0,\color{#e64173}{x}_0)\)` that our model has yet to see—and which we can use to evaluate the performance of `\(\hat{\color{#20B2AA}{f}}\)`. -- .hi-slate[Real goal: Low test-sample MSE] (not the training MSE from before). --- layout: false class: clear, middle .hi-slate[Next time:] model performance, the variance-bias tradeoff, and kNN --- name: sources layout: false # Sources These notes draw upon - [An Introduction to Statistical Learning](http://faculty.marshall.usc.edu/gareth-james/ISL/) (*ISL*)<br>James, Witten, Hastie, and Tibshirani - [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)<br>Jake VanderPlas I pulled the comic from [Twitter](https://twitter.com/athena_schools/status/1063013435779223553/photo/1). 
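---
class: clear

As a closing illustration of the training-*vs.*-testing distinction from the model-accuracy slides (an added sketch with made-up data, not from .it[ISL] or the problem sets): a flexible model evaluated on its own training data tends to look better than it does on data it has never seen.

```r
# Illustrative sketch: training MSE vs. test MSE
set.seed(12345)
# Simulate a training sample and a separate test sample from the same process
train_df = data.frame(x = runif(100, 0, 10))
train_df$y = sin(train_df$x) + rnorm(100, sd = 0.3)
test_df = data.frame(x = runif(100, 0, 10))
test_df$y = sin(test_df$x) + rnorm(100, sd = 0.3)
# Train a very flexible model (degree-15 polynomial) on the training data only
est_flex = lm(y ~ poly(x, 15), data = train_df)
# MSE = mean of squared prediction errors
mse = function(y, y_hat) mean((y - y_hat)^2)
mse(train_df$y, predict(est_flex, newdata = train_df)) # training MSE: flattering
mse(test_df$y, predict(est_flex, newdata = test_df))   # test MSE: typically larger
```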
--- # Table of contents .col-left[ .smallest[ #### Admin - [Today](#admin-today) - [Upcoming](#admin-soon) #### Statistical learning - [Definition](#sl-definition) - [Classes](#sl-classes) #### Notation - [Source](#notation-source) - [Data](#notation-data) - [Dimensions of `\(\mathbf{X}\)`](#notation-dimensions) - [Outcomes](#notation-outcomes) #### Statistical learning, continued - [The goal](#sl-goal) - [Prediction](#sl-prediction) - [Parameterization](#sl-parameters) ] ] .col-right[ .smallest[ #### Example - [Data-generating process (truth)](#ex-truth) - [Regression model](#ex-lm) - [kNN model](#ex-knn) #### Model accuracy - [Questions](#accuracy-questions) - [Subtlety](#accuracy-subtlety) - [MSE](#mse) - [Training *vs.* testing](#training-testing) #### Other - [Sources/references](#sources) ] ] --- exclude: true