class: center, middle, inverse, title-slide

.title[
# .b[Simple Linear Regression]
]
.subtitle[
## .b[.green[EC 339]]
]
.author[
### Marcio Santetti
]
.date[
### Fall 2022
]

---

class: inverse, middle

# Motivation

---

# On notation

In our course, we will adopt the following .hi[notation] for a regression model:

<br>

$$
`\begin{align}
y_i = \beta_0 + \beta_1 x_i + u_i
\end{align}`
$$

--

<br>

- where:
  - `\(y_i\)`: .hi[dependent variable]'s value for the `\(i^{th}\)` individual;
  - `\(x_i\)`: .hi-orange[independent variable]'s value for the `\(i^{th}\)` individual;
  - `\(\beta_0\)`: .hi[intercept] term;
  - `\(\beta_1\)`: .hi-orange[slope] coefficient;
  - `\(u_i\)`: .hi[residual/error] term (the `\(i^{th}\)` individual's .hi-orange[random] deviation from the population parameters).

---

layout: false
class: inverse, middle

# Motivating regression models

---

# Data are fuzzy

.small[Life expectancy _vs._ GDP per capita (1952–2007):<sup>*</sup>]

<img src="001-simple-regression_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" />

.pull-left[.footnote[
[*]: Data from [`Gapminder`](https://www.gapminder.org).
]]

---

# Data are fuzzy

.small[Now, including .hi[regression lines]:]

<img src="001-simple-regression_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />

---

# Data are fuzzy

.small[Narrowing down to the Americas:]

<img src="001-simple-regression_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />

---

# Data are fuzzy

Now, for the US...

<img src="001-simple-regression_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />

---

layout: false
class: inverse, middle

# Which method to use?

---

# Ordinary Least Squares (OLS)

<br>

The .hi[Ordinary Least Squares (OLS) Estimator]:

<br>

- OLS .hi-orange[minimizes] the .it[squared distance] between the data points and the regression line it generates.
- This way, we are .hi[minimizing] _error_ (_ignorance_) about our data and the relationship we are trying to understand.

- In addition, it is .hi-orange[easy] to estimate and interpret.

---

# Ordinary Least Squares (OLS)

The .hi[Ordinary Least Squares (OLS) Estimator]:

.center[
`\(\text{SSR} = \sum_{i = 1}^{n} u_i^2\quad\)` where `\(\quad u_i = y_i - \hat{y}_i\)`
]

--

<br>

- Why .hi-orange[square] these residuals?

--

- Bigger errors, bigger .hi[penalties].

--

$$
`\begin{align}
\min_{\hat{\beta}_0,\, \hat{\beta}_1} \ \text{SSR} \\
\min_{\hat{\beta}_0,\, \hat{\beta}_1} \ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\min_{\hat{\beta}_0,\, \hat{\beta}_1} \ \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2
\end{align}`
$$

---

# Ordinary Least Squares (OLS)

The .hi[Ordinary Least Squares (OLS) Estimator]:

<br>

- .hi[Slope coefficient]:

$$
\hat{\beta}_1 = \dfrac{\sum_i (x_i - \overline{x})(y_i - \overline{y})}{\sum_i (x_i - \overline{x})^2} = \dfrac{Cov(x,y)}{Var(x)}
$$

--

- .hi-orange[Intercept coefficient]:

$$
\hat{\beta}_0 = \overline{y} - \hat{\beta}_1 \overline{x}
$$

---

# "Best" regression lines

<img src="001-simple-regression_files/figure-html/ols vs lines 1-1.svg" style="display: block; margin: auto;" />

---

# "Best" regression lines

For any line `\(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\)`

<img src="001-simple-regression_files/figure-html/vs lines 2-1.svg" style="display: block; margin: auto;" />

---

# "Best" regression lines

For any line `\(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\)`, we can calculate residuals: `\(u_i = y_i - \hat{y}_i\)`

<img src="001-simple-regression_files/figure-html/ols vs lines 3-1.svg" style="display: block; margin: auto;" />

---

# "Best" regression lines

For any line `\(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\)`, we can calculate residuals: `\(u_i = y_i - \hat{y}_i\)`

<img src="001-simple-regression_files/figure-html/ols vs lines 4-1.svg" style="display: block; margin: auto;" />

---

# "Best" regression lines

For any line
`\(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\)`, we can calculate residuals: `\(u_i = y_i - \hat{y}_i\)`

<img src="001-simple-regression_files/figure-html/ols vs lines 5-1.svg" style="display: block; margin: auto;" />

---

# "Best" regression lines

SSR squares the errors `\(\left(\sum u_i^2\right)\)`: bigger errors get bigger penalties.

<img src="001-simple-regression_files/figure-html/ols vs lines 6-1.svg" style="display: block; margin: auto;" />

---

# "Best" regression lines

The OLS estimate is the combination of `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)` that minimizes SSR.

<img src="001-simple-regression_files/figure-html/ols vs lines 7-1.svg" style="display: block; margin: auto;" />

---

layout: false
class: inverse, middle

# Interpretation

---

# Interpreting OLS coefficients

<br>

- .hi-orange[Slope] coefficient: the change (increase or decrease) in the dependent variable `\((y)\)` associated with a 1-unit increase in the independent variable `\((x)\)`.

- .hi[Intercept] term: the value of the dependent variable `\((y)\)` when `\(x=0\)`.

--

<br>

.hi[Example]:

- Interpret the following estimated regression models:

$$
`\begin{align}
\widehat{wage_i} = 10 + 2.65 \ educ_i
\end{align}`
$$

--

$$
`\begin{align}
\widehat{sleep_i} = 6.5 - 0.65 \ kids_i
\end{align}`
$$

---

layout: false
class: inverse, middle

# Next time: Simple regression in practice

---

exclude: true
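---

# Appendix: OLS by hand in code

The closed-form formulas `\(\hat{\beta}_1 = Cov(x,y)/Var(x)\)` and `\(\hat{\beta}_0 = \overline{y} - \hat{\beta}_1 \overline{x}\)` are easy to verify numerically. Below is a minimal sketch (in Python, though the same arithmetic carries over to any language) using hypothetical toy data, not the course's Gapminder data:

```python
# A minimal OLS sketch using the closed-form formulas from the slides.
# The data below are hypothetical toy values, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: beta1_hat = Cov(x, y) / Var(x)
beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
        / sum((xi - x_bar) ** 2 for xi in x)

# Intercept: beta0_hat = y_bar - beta1_hat * x_bar
beta0 = y_bar - beta1 * x_bar

# Residuals u_i = y_i - y_hat_i, and the SSR that OLS minimizes
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
ssr = sum(u ** 2 for u in residuals)

print(beta1, beta0, ssr)
```

A useful sanity check: OLS residuals always sum to (numerically) zero, because the intercept absorbs any average error.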