class: center, middle, inverse, title-slide

.title[
# .b[Simple Linear Regression]
]
.subtitle[
## .b[.green[EC 339]]
]
.author[
### Marcio Santetti
]
.date[
### Fall 2022
]

---

class: inverse, middle

# Motivation

---

# On notation

In our course, we will adopt the following .hi[notation] for a regression model:

<br>

$$
`\begin{align}
y_i = \beta_0 + \beta_1 x_i + u_i
\end{align}`
$$

--

<br>

- where:
  - `\(y_i\)`: .hi[dependent variable]'s value for the `\(i^{th}\)` individual;
  - `\(x_i\)`: .hi-orange[independent variable]'s value for the `\(i^{th}\)` individual;
  - `\(\beta_0\)`: .hi[intercept] term;
  - `\(\beta_1\)`: .hi-orange[slope] coefficient;
  - `\(u_i\)`: .hi[residual/error] term (the `\(i^{th}\)` individual's .hi-orange[random] deviation from the population parameters).

---

layout: false
class: inverse, middle

# Motivating regression models

---

# Data are fuzzy

.small[Life expectancy _vs._ GDP per capita (1952–2007):<sup>*</sup>]

<img src="001-simple-regression_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" />

.pull-left[.footnote[
[*]: Data from [`Gapminder`](https://www.gapminder.org).
]]

---

# Data are fuzzy

.small[Now, including .hi[regression lines]:]

<img src="001-simple-regression_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />

---

# Data are fuzzy

.small[Narrowing down to the Americas:]

<img src="001-simple-regression_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />

---

# Data are fuzzy

Now, for the US...

<img src="001-simple-regression_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />

---

layout: false
class: inverse, middle

# Which method to use?

---

# Ordinary Least Squares (OLS)

<br>

The .hi[Ordinary Least Squares (OLS) Estimator]:

<br>

- OLS .hi-orange[minimizes] the .it[squared distance] between the data points and the regression line it generates.
- This way, we are .hi[minimizing] _error_ (_ignorance_) about our data and the relationship we are trying to understand.

- In addition, it is .hi-orange[easy] to estimate and interpret.

---

# Ordinary Least Squares (OLS)

The .hi[Ordinary Least Squares (OLS) Estimator]:

.center[
`\(\text{SSR} = \sum_{i = 1}^{n} u_i^2\quad\)` where `\(\quad u_i = y_i - \hat{y}_i\)`
]

--

<br>

- Why .hi-orange[square] these residuals?

--

- Bigger errors, bigger .hi[penalties].

--

$$
`\begin{align}
\min_{\hat{\beta}_0,\, \hat{\beta}_1} \ \text{SSR} \\
\min_{\hat{\beta}_0,\, \hat{\beta}_1} \ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\min_{\hat{\beta}_0,\, \hat{\beta}_1} \ \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2
\end{align}`
$$

---

# Ordinary Least Squares (OLS)

The .hi[Ordinary Least Squares (OLS) Estimator]:

<br>

- .hi[Slope coefficient]:

$$
\hat{\beta}_1 = \dfrac{\sum_i (x_i - \overline{x})(y_i - \overline{y})}{\sum_i (x_i - \overline{x})^2} = \dfrac{Cov(x,y)}{Var(x)}
$$

--

- .hi-orange[Intercept coefficient]:

$$
\hat{\beta}_0 = \overline{y} - \hat{\beta}_1 \overline{x}
$$

---

# "Best" regression lines

<img src="001-simple-regression_files/figure-html/ols vs lines 1-1.svg" style="display: block; margin: auto;" />

---

# "Best" regression lines

For any line `\(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\)`

<img src="001-simple-regression_files/figure-html/vs lines 2-1.svg" style="display: block; margin: auto;" />

---

# "Best" regression lines

For any line `\(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\)`, we can calculate residuals: `\(u_i = y_i - \hat{y}_i\)`

<img src="001-simple-regression_files/figure-html/ols vs lines 3-1.svg" style="display: block; margin: auto;" />

---

# "Best" regression lines

For any line `\(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\)`, we can calculate residuals: `\(u_i = y_i - \hat{y}_i\)`

<img src="001-simple-regression_files/figure-html/ols vs lines 4-1.svg" style="display: block; margin: auto;" />

---

# "Best" regression lines

For any line
`\(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\)`, we can calculate residuals: `\(u_i = y_i - \hat{y}_i\)`

<img src="001-simple-regression_files/figure-html/ols vs lines 5-1.svg" style="display: block; margin: auto;" />

---

# "Best" regression lines

SSR squares the errors `\(\left(\sum u_i^2\right)\)`: bigger errors get bigger penalties.

<img src="001-simple-regression_files/figure-html/ols vs lines 6-1.svg" style="display: block; margin: auto;" />

---

# "Best" regression lines

The OLS estimate is the combination of `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)` that minimizes SSR.

<img src="001-simple-regression_files/figure-html/ols vs lines 7-1.svg" style="display: block; margin: auto;" />

---

layout: false
class: inverse, middle

# Interpretation

---

# Interpreting OLS coefficients

<br>

- .hi-orange[Slope] coefficient: the change (increase or decrease) in the dependent variable `\((y)\)` associated with a 1-unit increase in the independent variable `\((x)\)`.

- .hi[Intercept] term: the value of the dependent variable `\((y)\)` when `\(x=0\)`.

--

<br>

.hi[Example]:

- Interpret the following estimated regression models:

$$
`\begin{align}
\widehat{wage_i} = 10 + 2.65 \ educ_i
\end{align}`
$$

--

$$
`\begin{align}
\widehat{sleep_i} = 6.5 - 0.65 \ kids_i
\end{align}`
$$

---

layout: false
class: inverse, middle

# Next time: Simple regression in practice

---

exclude: true
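---

# Appendix: OLS by hand in code

The closed-form formulas `\(\hat{\beta}_1 = Cov(x,y)/Var(x)\)` and `\(\hat{\beta}_0 = \overline{y} - \hat{\beta}_1 \overline{x}\)` are easy to verify numerically. Below is a minimal sketch (in Python, though the same arithmetic carries over to any language) using hypothetical toy data, not the course's Gapminder data:

```python
# A minimal OLS sketch using the closed-form formulas from the slides.
# The data below are hypothetical toy values, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: beta1_hat = Cov(x, y) / Var(x)
beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
        / sum((xi - x_bar) ** 2 for xi in x)

# Intercept: beta0_hat = y_bar - beta1_hat * x_bar
beta0 = y_bar - beta1 * x_bar

# Residuals u_i = y_i - y_hat_i, and the SSR that OLS minimizes
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
ssr = sum(u ** 2 for u in residuals)

print(beta1, beta0, ssr)
```

A useful sanity check: OLS residuals always sum to (numerically) zero, because the intercept absorbs any average error.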