class: center, inverse, middle

<style type="text/css">
.pull-left {
  float: left;
  width: 44%;
}
.pull-right {
  float: right;
  width: 44%;
}
.pull-right ~ p {
  clear: both;
}
.pull-left-wide {
  float: left;
  width: 66%;
}
.pull-right-wide {
  float: right;
  width: 66%;
}
.pull-right-wide ~ p {
  clear: both;
}
.pull-left-narrow {
  float: left;
  width: 30%;
}
.pull-right-narrow {
  float: right;
  width: 30%;
}
.tiny123 {
  font-size: 0.40em;
}
.small123 {
  font-size: 0.80em;
}
.large123 {
  font-size: 2em;
}
.red { color: red }
.orange { color: orange }
.green { color: green }
</style>

# Statistics
## Simple linear regression
### (Chapter 17)
### Christian Vedel,<br>Department of Economics<br>University of Southern Denmark
### Email: [christian-vs@sam.sdu.dk](mailto:christian-vs@sam.sdu.dk)
### Updated 2026-04-27

---
class: middle

# Today's lecture

.pull-left-wide[
**Modelling the conditional mean of one variable as a linear function of another, and testing whether the relationship is real**

- **Section 1:** The regression function
- **Section 2:** Simple linear regression
- **Section 3:** OLS estimation
- **Section 4:** Hypothesis testing and confidence intervals
]

.pull-right-narrow[

]

---
class: inverse, middle, center

# The regression function

---
# Motivation

.pull-left-wide[
The relationship between two random variables `\(X\)` and `\(Y\)` is specified by their joint probability distribution `\(f(x,y)\)`.

We can also study the relationship between a given value of `\(X\)` and the distribution of `\(Y\)`, which is given by the **conditional distribution** `\(f_{Y|X}(y \mid x)\)`.

If `\(X\)` and `\(Y\)` are not independent, knowing the value of `\(X\)` gives us information about the distribution of `\(Y\)`.
]

---
# Regression function

.pull-left-wide[
**Regression analysis** focuses on the conditional mean `\(E(Y \mid X = x)\)`:

- if `\(Y\)` is discrete: `\(\displaystyle E(Y \mid X = x) = \sum_i y_i \cdot f_{Y|X}(y_i \mid x)\)`
- if `\(Y\)` is continuous: `\(\displaystyle E(Y \mid X = x) = \int_{-\infty}^\infty y \cdot f_{Y|X}(y \mid x)\,dy\)`

In either case, `\(E(Y \mid X = x)\)` is a function of `\(x\)`, called the **regression function** of `\(Y\)` on `\(X\)`.
]

--

.pull-left-wide[
The two variables have different names:

- `\(Y\)` is the **dependent (explained)** variable
- `\(X\)` is the **independent (explanatory)** variable
]

---
# Error term

.pull-left-wide[
The regression function gives the **expected** value of `\(Y\)` for a given `\(X\)`. The actual value rarely equals this expectation.

The **error term** `\(U\)` is the difference:

`$$U = Y - E(Y \mid X = x)$$`

By construction, this has zero conditional mean:

`$$E(U \mid X = x) = 0$$`
]

--

.pull-left-wide[
> The regression function describes a **statistical** relationship between `\(Y\)` and `\(X\)`; it does not imply causality.
]

---
class: inverse, middle, center

# Simple linear regression

---
# Linear regression model

.pull-left-wide[
In practice, we often assume a **linear** relationship between `\(X\)` and `\(Y\)`:

> **Simple linear regression:** `\(E(Y \mid X = x) = \beta_0 + \beta_1 x\)`

The two coefficients:

- `\(\beta_0\)` is the **intercept**
- `\(\beta_1\)` is the **slope coefficient**

The slope `\(\beta_1\)` measures the change in `\(E(Y)\)` for a one-unit increase in `\(X\)`.
]
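
---
# The regression function by simulation

A minimal R sketch, not from the textbook: the data and the coefficients 2 and 0.5 are made up for illustration. Averaging `\(Y\)` within narrow bins of `\(X\)` approximates the regression function `\(E(Y \mid X = x)\)`, which here is linear by construction.

```r
# Simulate pairs (X, Y) whose true regression function is E(Y | X = x) = 2 + 0.5 x
set.seed(1)
n <- 10000
x <- runif(n, 0, 10)
u <- rnorm(n)                       # error term with E(U | X = x) = 0
y <- 2 + 0.5 * x + u

# Approximate E(Y | X = x) by the average of Y within narrow bins of X
bins <- cut(x, breaks = 0:10)
round(tapply(y, bins, mean), 2)     # close to 2 + 0.5 * (bin midpoint)
```

The binned averages trace out (approximately) the straight line `\(2 + 0.5x\)`, which is exactly the shape the simple linear regression model assumes for `\(E(Y \mid X = x)\)`.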

---
# Population coefficients

.pull-left-wide[
It can be shown that the two coefficients can be expressed as:

`$$\begin{align*} \beta_1 & = \frac{Cov(X,Y)}{Var(X)} \\ \beta_0 & = E(Y) - \beta_1 E(X) \end{align*}$$`

Implications:

- the **sign** of `\(\beta_1\)` equals the sign of the correlation between `\(X\)` and `\(Y\)`
- the **magnitude** of `\(\beta_1\)` gives the effect of a one-unit change in `\(X\)` on `\(E(Y)\)`
]

.pull-right-narrow[
.small123[
If `\(Cov(X,Y) = 0\)`, then `\(\beta_1 = 0\)` and knowing `\(X\)` does not help predict `\(E(Y)\)`.
]
]

---
class: inverse, middle, center

# OLS estimation

---
# Analogy principle

.pull-left-wide[
Given a simple random sample `\(((X_1,Y_1),\ldots,(X_n,Y_n))\)`, replace the population quantities with sample counterparts:

`$$\begin{align*} Cov(X,Y) & \;\to\; \widehat{Cov}(X,Y) = \frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y}) \\ Var(X) & \;\to\; S_X^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2 \\ E(Y) & \;\to\; \bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i \end{align*}$$`
]

---
# OLS estimators

.pull-left-wide[
Substituting gives the **ordinary least squares (OLS)** estimators:

`$$\begin{align*} \hat{\beta}_1 & = \frac{\widehat{Cov}(X,Y)}{S_X^2} = \frac{\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^n(X_i-\bar{X})^2} \\ \hat{\beta}_0 & = \bar{Y} - \hat{\beta}_1\bar{X} \end{align*}$$`

These are the solution to minimising the **sum of squared residuals**:

`$$\sum_{i=1}^n\left[Y_i - (b_0 + b_1 X_i)\right]^2$$`

> OLS finds the "best-fitting" straight line through the sample points.
]
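
---
# OLS in R (sketch)

A minimal sketch, not from the textbook, using made-up simulated data: the OLS estimates computed directly from the sample counterparts `\(\widehat{Cov}(X,Y)\)`, `\(S_X^2\)`, `\(\bar{X}\)` and `\(\bar{Y}\)` coincide with the output of R's built-in `lm()`.

```r
# Made-up data for illustration
set.seed(1)
x <- runif(200, 0, 10)
y <- 2 + 0.5 * x + rnorm(200)

# OLS estimates from the analogy principle
beta1_hat <- cov(x, y) / var(x)               # sample Cov(X,Y) / sample Var(X)
beta0_hat <- mean(y) - beta1_hat * mean(x)    # Y-bar minus slope times X-bar
c(intercept = beta0_hat, slope = beta1_hat)

# The same numbers from R's least-squares routine
coef(lm(y ~ x))
```

Both routes minimise the same sum of squared residuals, so the two sets of estimates agree.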

---
# .red[Raise your hand 1: Regression function and OLS]

.pull-left-wide[
**Q1.** In `\(E(Y \mid X=x) = \beta_0 + \beta_1 x\)`, the slope `\(\beta_1 = 2.5\)` means:

A: When `\(X\)` increases by 1 unit, the expected value of `\(Y\)` increases by 2.5 units

B: When `\(X\)` increases by 1 unit, `\(Y\)` always increases by 2.5 units

C: The correlation between `\(X\)` and `\(Y\)` is 2.5

D: For every unit increase in `\(Y\)`, `\(X\)` increases by 2.5 units
]

--

.pull-left-wide[
**Q2.** The OLS estimator is `\(\hat{\beta}_1 = \widehat{Cov}(X,Y)/S_X^2\)`. The sign of `\(\hat{\beta}_1\)` is determined by:

A: The sign of `\(\widehat{Cov}(X,Y)\)`, since `\(S_X^2 > 0\)` always

B: The sign of `\(S_X^2\)`

C: The sign of `\(\bar{Y}\)`

D: Whether `\(n\)` is odd or even
]

---
# .red[Practice 1: Computing OLS estimates]

.pull-left-wide[
A sample of `\(n=5\)` observations gives:

`$$\bar{X}=3, \quad \bar{Y}=7, \quad \sum_{i=1}^5(X_i-\bar{X})^2=10, \quad \sum_{i=1}^5(X_i-\bar{X})(Y_i-\bar{Y})=15$$`

1. Compute `\(\hat{\beta}_1\)` and `\(\hat{\beta}_0\)`.
2. Interpret `\(\hat{\beta}_1\)`.
3. What is the fitted value `\(\hat{Y}\)` when `\(X=5\)`?
]

---
class: inverse, middle, center

# Hypothesis testing and confidence intervals

---
# Hypothesis test for `\(\beta_1\)`

.pull-left-wide[
To test whether `\(X\)` and `\(Y\)` are linearly related, test:

`$$\begin{align*} H_0 & : \beta_1 = 0 \\ H_1 & : \beta_1 \not= 0 \end{align*}$$`

More generally, test `\(H_0: \beta_1 = \beta_1^0\)` for a value `\(\beta_1^0\)` suggested by theory.

Since `\(\hat{\beta}_1\)` is an **unbiased estimator** of `\(\beta_1\)` (i.e. `\(E(\hat{\beta}_1) = \beta_1\)`), the test is a test of the mean of `\(\hat{\beta}_1\)`:

`$$Z = \frac{\hat{\beta}_1 - \beta_1^0}{\sqrt{S_{\hat{\beta}_1}^2}} \overset{a}{\sim} \mathcal{N}(0,1)$$`

Decision rule:

- do not reject `\(H_0\)` if `\(z_{\alpha/2} \leq Z \leq z_{1-\alpha/2}\)`
- reject `\(H_0\)` if `\(Z < z_{\alpha/2}\)` or `\(Z > z_{1-\alpha/2}\)`
]

.pull-right-narrow[
.small123[
Testing `\(\beta_0\)` uses the same approach with `\(\hat{\beta}_0\)` and its standard error.
]
]

---
# Confidence interval for `\(\beta_1\)`

.pull-left-wide[
A confidence interval at confidence level `\(1-\alpha\)`:

`$$\hat{I} = \left[\hat{\beta}_1 - z_{1-\alpha/2}\sqrt{S_{\hat{\beta}_1}^2},\;\; \hat{\beta}_1 + z_{1-\alpha/2}\sqrt{S_{\hat{\beta}_1}^2}\right]$$`

Interpretation: if we were to draw many samples, a fraction `\(1-\alpha\)` of the intervals constructed this way (e.g. 95\% when `\(\alpha = 0.05\)`) would contain the true `\(\beta_1\)`.

Confidence intervals for `\(\beta_0\)` are constructed analogously.
]
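
---
# Inference for `\(\beta_1\)` in R (sketch)

A minimal sketch, not from the textbook, using made-up simulated data. Note that R's `summary()` and `confint()` report `\(t\)`-based inference, which is close to the normal-based test and interval from the previous slides when `\(n\)` is reasonably large.

```r
# Made-up data for illustration
set.seed(1)
x <- runif(200, 0, 10)
y <- 2 + 0.5 * x + rnorm(200)

fit <- lm(y ~ x)
est <- summary(fit)$coefficients
beta1_hat <- est["x", "Estimate"]
se_beta1  <- est["x", "Std. Error"]

# Test H0: beta1 = 0 against H1: beta1 != 0 at alpha = 0.05
z <- (beta1_hat - 0) / se_beta1
abs(z) > qnorm(0.975)                            # TRUE here: reject H0

# Approximate 95% confidence interval for beta1, then R's t-based interval
beta1_hat + c(-1, 1) * qnorm(0.975) * se_beta1
confint(fit, "x", level = 0.95)
```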

---
# .red[Raise your hand 2: Tests for regression coefficients]

.pull-left-wide[
**Q1.** To test whether `\(X\)` and `\(Y\)` are **linearly related**, we test:

A: `\(H_0: \beta_1=0\)` vs `\(H_1: \beta_1 \neq 0\)`

B: `\(H_0: \beta_0=0\)` vs `\(H_1: \beta_0 \neq 0\)`

C: `\(H_0: \mu_X=\mu_Y\)` vs `\(H_1: \mu_X \neq \mu_Y\)`

D: `\(H_0: Var(X)=Var(Y)\)` vs `\(H_1: Var(X)\neq Var(Y)\)`
]

--

.pull-left-wide[
**Q2.** The test statistic `\(Z = (\hat{\beta}_1 - \beta_1^0)/\sqrt{S_{\hat{\beta}_1}^2}\)` follows approximately `\(\mathcal{N}(0,1)\)` because:

A: `\(\hat{\beta}_1\)` is unbiased for `\(\beta_1\)`, so `\(\hat{\beta}_1 - \beta_1^0\)` is centred; the CLT gives approximate normality

B: The OLS estimator is exactly normally distributed for any sample size

C: The `\(t\)`-distribution always equals `\(\mathcal{N}(0,1)\)` regardless of sample size

D: Population parameters such as `\(\beta_1\)` are normally distributed by definition
]

---
# .red[Practice 2: Hypothesis test and CI for `\(\beta_1\)`]

.pull-left-wide[
Using the estimates from Practice 1 (`\(\hat{\beta}_1 = 1.5\)`, `\(\hat{\beta}_0 = 2.5\)`), suppose the estimated standard error is `\(\sqrt{S_{\hat{\beta}_1}^2} = 0.5\)`.

1. Test `\(H_0: \beta_1=0\)` vs `\(H_1: \beta_1 \neq 0\)` at `\(\alpha=0.05\)`.
2. Construct a 95\% confidence interval for `\(\beta_1\)`.
3. Does the CI contain 0? Is this consistent with your test decision?
]

---
# Before next time

.pull-left[
- Read the assigned reading
- Next time: Further topics in regression (Chapter 18)
]

.pull-right[

]