class: center, middle, inverse, title-slide .title[ # ECON 4050: Introduction to Econometrics ] .subtitle[ ## Linear Regression Extensions ] .author[ ### Adam Soliman, PhD ] .date[ ### Clemson University ] --- # Today - Linear Regression Extensions Depending on the data and the relationships between the variables of interest, you may need to move away from the baseline model. -- We will focus on 3 important variations: 1. ***Non-linear relationships*** 1. ***Interactions*** between variables 1. ***Standardized*** regression -- In each case, the way we estimate these coefficients does not change (i.e., OLS). -- Empirical applications: * *college tuition* and *earnings potential*, * *wage*, *education* and *gender*, * *class size* and *student performance* --- layout: false class: title-slide-section-red, middle # Non-Linear Relationships --- # Accounting for Non-Linear Relationships There are two main "methods": -- 1. ***Log*** models 1. ***Polynomial*** models (not the focus of this lecture) --- # Log Models * The models we have seen so far can be called ***level-level*** specifications. Both the dependent and the independent variables have been measured in levels. -- * The *level* can be dollars, years, number of students...even a percentage. -- * Taking the *natural* logarithm of the dependent and/or the independent variable(s) leads us to define 3 other types of regressions (with a common abuse of notation where `\(\ln(x) = \log_{e}(x)=\log(x)\)`): * ***Log - level***: `\(\quad \log(y_i) = b_0 + b_1 x_{1,i} + ... + e_i\)` * ***Level - log***: `\(\quad \textrm{y}_i = b_0 + b_1 \log(x_{1,i}) + ... + e_i\)` * ***Log - log***: `\(\quad \log(y_i) = b_0 + b_1 \log(x_{1,i}) + ... 
+ e_i\)` --- # The (natural) log Function: A Primer 😉 <img src="chapter_regext_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" /> --- # The (natural) log Function: A Primer 😉 * The [natural log function](https://en.wikipedia.org/wiki/Natural_logarithm) is the inverse function of the exponential function, i.e. `\(\log(\exp(x))=x\)` * Since for all `\(x\)`, `\(\exp(x)>0 \implies\)` natural log function is only defined for ***strictly positive values***! (It is not defined at 0!) ⚠️ Note you can only log your variables if they don't take 0 or negative values! Always think about this when taking the log of your dependent or independent variable(s) --- # The (natural) log Function: A Primer 😉 If a variable has a very ***right-skewed distribution***, taking the log will often make it closer to ***normally distributed*** -- .pull-left[ <img src="chapter_regext_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" /> ] -- .pull-right[ <img src="chapter_regext_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" /> ] --- # Log Models: Simplified Interpretations | Specification | Model | Interpretation of `\(b_1\)` | |--------------------|:---------:|:-----------------------------------:| | Level - Level | `\(y = b_0 + b_1 x + e\)` | .small[A **one unit** increase in ] `\(x\)` .small[ is associated, on average, with a ] `\(b_1\)` .small[**unit change** in y] | | Log - Level | `\(\log(y) = b_0 + b_1 x + e\)` | .small[A **one unit** increase in ] `\(x\)` .small[ is associated, on average, with a] `\(b_1 \times 100\)` .small[ **percent change** in y] | | Level - Log | `\(y = b_0 + b_1 \log(x) + e\)` | .small[A **one percent** increase in ] `\(x\)` .small[ is associated, on average, with a ] `\(b_1 / 100\)` .small[**unit change** in y] | | Log - Log | `\(\log(y) = b_0 + b_1 \log(x) + e\)` | .small[A **one percent** increase in ] `\(x\)` .small[ is associated, on average, with a] `\(b_1\)` .small[**percent change** in y] 
| * These may look like cooking recipes, but they can be [derived with some relatively simple math](https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faqhow-do-i-interpret-a-regression-model-when-some-variables-are-log-transformed/). * ⚠️ These interpretations are only accurate for ***small*** changes in `\(x\)` and/or small `\(b_1\)`. What happens if we want to know the change in `\(y\)` for big changes in `\(x\)` or when `\(b_1\)` is large? --- name: gen_log # Log Models: General Interpretations For ***any increase in `\(x\)`, `\(\Delta x,\)` and any `\(b_1\)`*** (e.g., `\(\Delta x = 5\% = 0.05 \implies 1 + \Delta x = 1.05\)`): | Specification | Model | Interpretation of `\(b_1\)` | |--------------------|:---------:|:-----------------------------------:| | Level - Level | `\(y = b_0 + b_1 x + e\)` | .small[A **one unit** increase in ] `\(x\)` .small[ is associated, on average, with a ] `\(b_1\)` .small[**unit change** in y] | | Log - Level | `\(\log(y) = b_0 + b_1 x + e\)` | .small[A **one unit** increase in ] `\(x\)` .small[ is associated, on average, with a] `\((e^{b_1} - 1) \times 100\)` .small[ **percent change** in y] | | Level - Log | `\(y = b_0 + b_1 \log(x) + e\)` | .small[A ] ** `\(\Delta x\)`** .small[**percent** increase in ] `\(x\)` .small[ is associated, on average, with a ] `\(b_1 \times \log(1 + \Delta x)\)` .small[**unit change** in y] | | Log - Log | `\(\log(y) = b_0 + b_1 \log(x) + e\)` | .small[A ] ** `\(\Delta x\)`** .small[**percent** increase in ] `\(x\)` .small[ is associated, on average, with a] `\(((1 + \Delta x)^{b_1} - 1) \times 100\)` .small[**percent change** in y] | ([*Appendix:*](#log_approx) Why are the previously shown approximations true?) --- # When Should You Use log Models? 1. 
If the relationship between `\(x\)` and `\(y\)` looks like a log or exponential function -- .pull-left[ <img src="chapter_regext_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" /> ] -- .pull-right[ <img src="chapter_regext_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" /> ] --- # When Should You Use log Models? 1. If the relationship between `\(x\)` and `\(y\)` looks like a log or exponential function .pull-left[ <img src="chapter_regext_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" /> ] -- .pull-right[ <img src="chapter_regext_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" /> ] --- # When Should You Use log Models? 1. If the relationship between `\(x\)` and `\(y\)` looks like a log or exponential function. 1. To easily interpret coefficients as <a href="https://en.wikipedia.org/wiki/Elasticity_(economics)">***elasticities***</a> which play a central role in economic theory ***Elasticity of `\(y\)` with respect to `\(x\)`:*** percent change in `\(y\)` following a 1% increase in `\(x\)`. --- # An example of interpretation of a log model - We estimate the association between fertility and GDP in the **Gapminder** dataset using $$ \log(\text{gdp}) = \beta_0 + \beta_1 \cdot \text{fertility} + \epsilon $$ -- ``` r model <- lm(log(gdp) ~ fertility, gapminder) model ``` ``` ## ## Call: ## lm(formula = log(gdp) ~ fertility, data = gapminder) ## ## Coefficients: ## (Intercept) fertility ## 25.4027 -0.5899 ``` - Approximate interpretation for slope, which works for small changes: A one-unit increase in the fertility rate is associated with an approximately 100×(−0.5899) = −58.99% change in GDP, i.e. a decrease of about 59% (a rough approximation here, because `\(|\beta_1|\)` is not small) - Exact value for slope change: `\((e^{\beta_1}-1)\times 100 = (e^{-0.5899}-1)\times 100 \approx -44.56\%\)` - The intercept (25.4027) is the predicted log(gdp) when the fertility rate is 0. 
This doesn't have a meaningful real-world interpretation, but you can exponentiate it to get the predicted GDP in levels at a fertility rate of 0, i.e. exp(25.4027) --- # Task 1: Non-linear relationships 1. Load the data [here](https://www.dropbox.com/scl/fi/68taafng8vjpzmub6q1xj/college_tuition_income.csv?rlkey=0c59a2kj0cwmcecjbg20jrkva&dl=0) and call the object `college`. This dataset contains information about tuition and estimated incomes of graduates for universities in the US. 2. Create a scatter plot of estimated mid-career pay (`mid_career_pay`) `\((y-axis)\)` as a function of out-of-state tuition (`out_of_state_tuition`) `\((x-axis)\)`. Would you say the relationship is broadly linear or rather non-linear? 3. Filter the variable `type` to keep only the values "For Profit", "Private", and "Public". Call this object `college_clean`. USE THIS DATA OBJECT FROM NOW ON! 4. Regress mid-career pay on university type. Interpret each coefficient. What is the reference category? 5. Regress mid-career pay on the logarithm of out-of-state tuition. You can either create a new variable using `mutate`, or incorporate the logarithm directly into the regression. Interpret the slope coefficient. 6. Regress the logarithm of mid-career pay on out-of-state tuition in levels (`out_of_state_tuition`). Interpret the slope coefficient. 7. Regress the logarithm of mid-career pay on the logarithm of out-of-state tuition. Interpret the slope coefficient. --- # Teaser for the Next Few Topics * You may have noticed that since the beginning we always work with **samples** drawn from the overall population. * Each time, imagine we could draw another sample from the population: * Would we obtain the same results? * In other words, how confident can we be that our estimates (sign, magnitude) are not just driven by randomness? * We will answer these kinds of questions: * We'll present the notion of **sampling**, and * Understand what **statistical inference** is and how to do it. --- # On the way to causality ✅ How to manage data? 
Read it, tidy it, visualise it! ✅ **How to summarise relationships between variables?** Simple and multiple linear regression, non-linear regressions, interactions... ✅ What is causality? ❌ What if we don't observe an entire population? ❌ Are our findings just due to randomness? ❌ How to find exogeneity in practice? --- # Bonus Material ## Accounting for Other Types of Non-Linear Relationships What if the relationship between `\(x\)` and `\(y\)` is not exponential/log? `\(\rightarrow\)` ***polynomial*** regressions: just take a polynomial function of the regressor! --- # Polynomial Wut? 😟 .pull-left[ <img src="chapter_regext_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" /> ] .pull-right[ <img src="chapter_regext_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" /> ] --- # Polynomial Regressions What does this mean in practice? `\(\rightarrow\)` add higher-order terms of the regressor to the regression, depending on the visual (or expected) relationship -- There are several ways of doing this in `R`; here are two equivalent ones: .pull-left[ ``` r lm(y ~ x + I(x^2) + I(x^3), data) ``` ] .pull-right[ ``` r lm(y ~ poly(x, 3, raw = TRUE), data) ``` ] --- # Polynomial Regressions .pull-left[ ***2nd order:*** <img src="chapter_regext_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" /> ] -- .pull-right[ ***3rd order:*** <img src="chapter_regext_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" /> ] --- <img src="chapter_regext_files/figure-html/curve_fitting.png" width="378px" height="600px" style="display: block; margin: auto;" /> --- layout: false class: title-slide-section-red, middle # Interaction Terms --- # Interacting Regressors * In a linear model *without* interaction terms, we assume that the effect of each predictor on the dependent variable (target) does not depend on the values of the other predictors in the model. 
* We interact two regressors when we believe ***the effect of one depends on the value of the other***. * *Example:* The wage returns to education may vary by gender. -- * In practice, if we interact `\(x_1\)` and `\(x_2\)`, we would write our model like this: `$$y_i = b_0 + b_1 x_{1,i} + b_2 x_{2,i} + b_3x_{1,i} \times x_{2,i} + ... + e_i$$` -- * The interpretation of `\(b_1\)`, `\(b_2\)`, and `\(b_3\)` will depend on the type of `\(x_1\)` and `\(x_2\)`. * We will focus on the case where one regressor is a ***dummy/categorical*** variable and the other is ***continuous***. * It will give you the intuition for the other cases: * Both regressors are dummies/categorical variables, * Both regressors are continuous variables. --- # Interacting Regressors Let's go back to the *STAR* experiment data. How does the effect of being in a small vs regular class vary with the experience of the teacher? Our regression model becomes: $$ \textrm{score}_i = \color{#d96502}{b_0} + \color{#027D83}{b_1} \textrm{small}_i + \color{#02AB0D}{b_2} \textrm{experience}_i + \color{#d90502}{b_3} \textrm{small}_i \times \textrm{experience}_i + e_i$$ -- Let's say we estimate it in `R` with real data, and we get the following coefficients for each `\(b\)`: `$$\hat{score}_i = \color{#d96502}{535} + \color{#027D83}{16} \textrm{small}_i + \color{#02AB0D}{1.3} \textrm{experience}_i + \color{#d90502}{-0.3} \textrm{small}_i \times \textrm{experience}_i$$` -- Because `\(\textrm{small}_i\)` is a binary variable, we can split up the previous equation for each class type. 
* For those in regular classes, or when `\(small_i = 0\)`, this would look like: `$$\hat{score}_i = \color{#d96502}{535} + \color{#02AB0D}{1.3} \textrm{experience}_i,$$` * and for those in small classes, or when `\(small_i = 1\)`, we would get: `$$\hat{score}_i = \color{#d96502}{535} + \color{#027D83}{16} + (\color{#02AB0D}{1.3} - \color{#d90502}{0.3}) \textrm{experience}_i = 551 + \textrm{experience}_i$$` --- # Interacting Regressors Running the regression for the `math` score (for all grades), we obtain: ``` r lm(math ~ small+ experience + small*experience, star_df) ``` ``` ## ## Call: ## lm(formula = math ~ small + experience + small * experience, ## data = star_df) ## ## Coefficients: ## (Intercept) smallTRUE experience ## 534.1919 15.8906 1.3305 ## smallTRUE:experience ## -0.3034 ``` * The interaction term allows the impact of being in a small class to vary with teacher experience. * In particular, we still observe a ***positive impact of being in a small class*** on math score, but this ***effect is decreasing in the experience of the teacher***. 
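
The split into one line per class type can be checked numerically. Below is a minimal sketch on *simulated* data: the data frame `sim_df` and the parameter values used to generate it are made up for illustration and are not the real `star_df`. It fits the interacted model, reads off the two slopes, and compares them with the slopes from separate regressions on each class type.

```r
# Simulated stand-in for the STAR data (hypothetical values, NOT the real star_df)
set.seed(1)
n <- 500
sim_df <- data.frame(
  small      = rep(c(FALSE, TRUE), each = n / 2),
  experience = runif(n, min = 0, max = 30)
)
sim_df$math <- 535 + 16 * sim_df$small + 1.3 * sim_df$experience -
  0.3 * sim_df$small * sim_df$experience + rnorm(n, sd = 5)

# Interacted model: small * experience expands to
# small + experience + small:experience
fit <- lm(math ~ small * experience, data = sim_df)
b <- coef(fit)

# Slope of the regular-class line: b2
slope_regular <- unname(b["experience"])
# Slope of the small-class line: b2 + b3
slope_small <- unname(b["experience"] + b["smallTRUE:experience"])

# Cross-check: separate regressions by class type give the same slopes
slope_regular_sep <- unname(coef(lm(math ~ experience, data = subset(sim_df, !small)))["experience"])
slope_small_sep   <- unname(coef(lm(math ~ experience, data = subset(sim_df, small)))["experience"])

print(c(regular = slope_regular, small = slope_small))
```

Because the model is fully interacted with the dummy, it is equivalent to fitting one line per class type, which is why the two sets of slopes agree.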
--- # Interacting Regressors (more formally) Running the regression for the `math` score (for all grades), we obtain: ``` r lm(math ~ small+ experience + small*experience, star_df) ``` ``` ## ## Call: ## lm(formula = math ~ small + experience + small * experience, ## data = star_df) ## ## Coefficients: ## (Intercept) smallTRUE experience ## 534.1919 15.8906 1.3305 ## smallTRUE:experience ## -0.3034 ``` * `\(\color{#d96502}{b_0}\)` : 534.1919 is the predicted average math score for students in *regular* classes whose teacher has zero years of experience * `\(\color{#027D83}{b_1}\)` : 15.8906 is the predicted difference in average math scores between *small* and regular classes when teacher experience is zero (the gap changes by `\(\color{#d90502}{b_3}\)` for each additional year of experience) * `\(\color{#02AB0D}{b_2}\)` : 1.3305 means that for those in the *regular* class, average math scores increase by 1.3305 per year of teacher experience (i.e., it is the slope of the regular class line...think when `\(small = 0\)`) * `\(\color{#02AB0D}{b_2} + \color{#d90502}{b_3}\)` : 1.3305 - 0.3034 = 1.0271 is the slope of the small class line, i.e., for those in the *small* class, average math scores increase by 1.0271 per year of teacher experience --- # How is this different than the base case? 
``` r lm(math ~ small+ experience, star_df) ``` ``` ## ## Call: ## lm(formula = math ~ small + experience, data = star_df) ## ## Coefficients: ## (Intercept) smallTRUE experience ## 535.813 12.336 1.188 ``` ``` r lm(math ~ small+ experience + small*experience, star_df) ``` ``` ## ## Call: ## lm(formula = math ~ small + experience + small * experience, ## data = star_df) ## ## Coefficients: ## (Intercept) smallTRUE experience ## 534.1919 15.8906 1.3305 ## smallTRUE:experience ## -0.3034 ``` --- # Interacting Regressors: Visually $$ \textrm{score}_i = \color{#d96502}{b_0} + \color{#027D83}{b_1} \textrm{small}_i + \color{#02AB0D}{b_2} \textrm{experience}_i + \color{#d90502}{b_3} \textrm{small}_i * \textrm{experience}_i + e_i$$ <img src="chapter_regext_files/figure-html/graph_base.png" width="90%" style="display: block; margin: auto;" /> --- # Interacting Regressors: Visually $$ \textrm{score}_i = \color{#d96502}{b_0} + \color{#027D83}{b_1} \textrm{small}_i + \color{#02AB0D}{b_2} \textrm{experience}_i + \color{#d90502}{b_3} \textrm{small}_i * \textrm{experience}_i + e_i$$ <img src="chapter_regext_files/figure-html/graph_reg.png" width="90%" style="display: block; margin: auto;" /> --- # Interacting Regressors: Visually $$ \textrm{score}_i = \color{#d96502}{b_0} + \color{#027D83}{b_1} \textrm{small}_i + \color{#02AB0D}{b_2} \textrm{experience}_i + \color{#d90502}{b_3} \textrm{small}_i * \textrm{experience}_i + e_i$$ <img src="chapter_regext_files/figure-html/graph_reg_b0.png" width="90%" style="display: block; margin: auto;" /> --- # Interacting Regressors: Visually $$ \textrm{score}_i = \color{#d96502}{b_0} + \color{#027D83}{b_1} \textrm{small}_i + \color{#02AB0D}{b_2} \textrm{experience}_i + \color{#d90502}{b_3} \textrm{small}_i * \textrm{experience}_i + e_i$$ <img src="chapter_regext_files/figure-html/graph_reg_b0_b1.png" width="90%" style="display: block; margin: auto;" /> --- # Interacting Regressors: Visually $$ \textrm{score}_i = \color{#d96502}{b_0} + 
\color{#027D83}{b_1} \textrm{small}_i + \color{#02AB0D}{b_2} \textrm{experience}_i + \color{#d90502}{b_3} \textrm{small}_i * \textrm{experience}_i + e_i$$ <img src="chapter_regext_files/figure-html/graph_reg_b0_b1_b2.png" width="90%" style="display: block; margin: auto;" /> --- # Interacting Regressors: Visually $$ \textrm{score}_i = \color{#d96502}{b_0} + \color{#027D83}{b_1} \textrm{small}_i + \color{#02AB0D}{b_2} \textrm{experience}_i + \color{#d90502}{b_3} \textrm{small}_i * \textrm{experience}_i + e_i$$ <img src="chapter_regext_files/figure-html/graph_reg_b0_b1_b2_b3.png" width="90%" style="display: block; margin: auto;" /> --- # Task 2: Interacted models 1. Using the same dataset as before, `college_clean`, filter `type` to keep only "Private" and "Public", so that you only have private or public universities, and call the object `college_clean_really`. Also, create a variable `out_state` equal to out-of-state tuition divided by 1000 (note this means a one unit change in this variable is now a $1000 change). 2. Run the following regression model: `lm(mid_career_pay ~ out_state + type + type*out_state, data = college_clean_really)`. Interpret the coefficients in the context of the model. Specifically, what does each coefficient represent, and how do the intercept and the slope on `out_state` differ between public and private universities? Keep in mind which category is the reference category. --- name: log_approx # Log Models: Approximations Why are the approximations shown previously true? -- ***Log-Level*** *General interpretation:* A **one unit** increase in `\(x\)` is associated, on average, with a `\((e^{b_1} - 1) \times 100\)` **percent change** in y. *Simplified interpretation:* A **one unit** increase in `\(x\)` is associated, on average, with a `\(b_1 \times 100\)` **percent change** in y. 
-- This is because, for small `\(b_1\)`, `\(e^{b_1} \approx 1+ b_1 \iff b_1 \approx e^{b_1} - 1\)` -- `\(\rightarrow\)` for `\(b_1 = \color{#d90502}{0.04}\)`, `\(e^{b_1} - 1 = e^{0.04} - 1 = 0.0408\)` -- `\(\rightarrow\)` for `\(b_1 = \color{#d90502}{0.5}\)`, `\(e^{b_1} - 1 = e^{0.5} - 1 = 0.6487\)` --- # Log Models: Approximations Why are the approximations shown previously true? ***Level-Log*** *General interpretation:* A ** `\(\Delta x\)`** **percent** increase in `\(x\)` is associated, on average, with a `\(b_1 \times log(1 + \Delta x)\)` **unit change** in y. *Simplified interpretation:* A **one percent** increase in `\(x\)` is associated, on average, with a `\(b_1 / 100\)` **unit change** in y. -- This is because for small `\(\Delta x\)`, `\(log(1 + \Delta x) \approx \Delta x\)` -- `\(\rightarrow\)` for `\(\Delta x = \color{#d90502}{1\%}=0.01\)`, `\(log(1+\Delta x) = log(1.01) = 0.01\)` (hence the `\(/100\)` in the simplified interpretation) -- `\(\rightarrow\)` for `\(\Delta x = \color{#d90502}{20\%}=0.20\)`, `\(log(1+\Delta x) = log(1.20) = 0.18\)` --- # Log Models: Approximations Why are the approximations shown previously true? ***Log-Log*** *General interpretation:* A ** `\(\Delta x\)`** **percent** increase in `\(x\)` is associated, on average, with a `\(((1 + \Delta x)^{b_1} - 1) \times 100\)` **percent change** in y. *Simplified interpretation:* A **one percent** increase in `\(x\)` is associated, on average, with a `\(b_1\)` **percent change** in y. 
-- This is because for small `\(|b_1|\times \Delta x\)`, `\((1 + \Delta x)^{b_1} \approx 1 + b_1 \times \Delta x \iff b_1 \times \Delta x \times 100 \approx ((1 + \Delta x)^{b_1} - 1) \times 100\)` -- `\(\rightarrow\)` for `\(\Delta x = \color{#d90502}{1\%}=0.01\)` and `\(b_1 = \color{#d90502}{0.5}\)`, `\(((1+\Delta x)^{b_1} - 1) \times 100 = (1.01^{0.5} - 1) \times 100 = 0.5\)` -- `\(\rightarrow\)` for `\(\Delta x = \color{#d90502}{10\%}=0.10\)` and `\(b_1 = \color{#d90502}{10}\)`, `\(((1+\Delta x)^{b_1} - 1) \times 100= (1.1^{10} - 1) \times 100 = 159.37\)` [back](#gen_log) --- # Standardized Regression Let's define what *standardizing* a variable means. > ***Standardizing*** a variable `\(z\)` means to *demean* the variable and to divide the demeaned value by its own standard deviation: $$ z_i^{stand} = \frac{z_i - \bar z}{\sigma(z)}$$ where `\(\bar z\)` is the mean of `\(z\)` and `\(\sigma(z)\)` is the standard deviation of `\(z\)`, i.e. `\(\sigma(z) = \sqrt{\textrm{Var}(z)}\)`. -- `\(z^{stand}\)` now has mean 0 and standard deviation 1, i.e. `\(\overline{z^{stand}} = 0\)` and `\(\sigma(z^{stand}) = 1\)` -- Intuitively, standardizing ***puts variables on the same scale*** so we can compare them. In our class size and student performance example, it will help to interpret: * The **magnitude** of the effects * The **relative importance of each variable** --- # Standardized Regression: Graphically .pull-left[ <img src="chapter_regext_files/figure-html/graph_before.png" width="3200" style="display: block; margin: auto;" /> ] -- .pull-right[ <img src="chapter_regext_files/figure-html/graph_after.png" width="3200" style="display: block; margin: auto;" /> ] --- # Standardized Regression: Interpretation If the ***dependent*** variable `\(y\)` is standardized, i.e. 
the model is `\(\color{#d90502}{y^{stand}} = b_0 + \sum_{k=1}^Kb_kx_k +e\)`: -- * By definition, `\(b_k\)` measures the predicted change in ** `\(y^{stand}\)` ** associated with a one unit increase in `\(x_k\)`. * If `\(y^{stand}\)` increases by one, it means that `\(y\)` increases by one standard deviation. So `\(b_k\)` measures the change in `\(y\)` **as a share of `\(y\)`'s standard deviation**. -- If the ***regressor*** `\(x_k\)` is standardized, i.e. the model is `\(y = b_0 + \sum_{k=1}^Kb_k\color{#d90502}{x_k^{stand}} +e\)`: -- * By definition, `\(b_k\)` measures the predicted change in `\(y\)` associated with a one unit increase in ** `\(x_k^{stand}\)` **. * If `\(x_k^{stand}\)` increases by one unit, it means that `\(x_k\)` increases by one standard deviation. So `\(b_k\)` measures the predicted change in `\(y\)` **associated with an increase in `\(x_k\)` by one standard deviation**. --- # Task 3: Standardized regression Let's go back to our [grades](https://www.dropbox.com/scl/fi/77u2r8xznh63jg7aisq4f/grade5_isl.csv?rlkey=uclo7ng24so9fct41kcjm8k0l&dl=0) dataset. Please download it from the previous link and read it in using the appropriate function. Below are the estimates we got from regressing average math test scores on the full set of regressors. ``` ## (Intercept) classize disadvantaged school_enrollment ## 78.560725298 0.003320773 -0.389333008 0.000758258 ## female religious ## 0.923710499 2.876146701 ``` 1. Create a new variable `avgmath_stand` equal to the standardized math score. You can use the `scale()` function (combined with `mutate`) or do it by hand with base `R`. 1. Run the full regression using the standardized math test score as the dependent variable. Interpret the coefficients and their magnitude. 1. Create the standardized variables for each *continuous* regressor as `<regressor>_stand`. * Would it make sense to standardize the `religious` variable? 1. Regress `avgmath_stand` on the full set of standardized regressors and `religious`. 
Discuss the relative influence of the regressors.
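
As a rough illustration of the mechanics (not a solution to the task), the sketch below standardizes variables on *simulated* data. The data frame `sim_grades`, its values, and the helper `standardize()` are hypothetical and only borrow variable names from the grades dataset. It also checks that standardizing the dependent variable simply rescales each slope by the standard deviation of `\(y\)`.

```r
# Simulated stand-in for the grades data (hypothetical values, NOT the real dataset)
set.seed(42)
n <- 200
sim_grades <- data.frame(
  classize  = round(runif(n, min = 15, max = 40)),
  religious = rbinom(n, size = 1, prob = 0.3)
)
sim_grades$avgmath <- 70 - 0.1 * sim_grades$classize +
  3 * sim_grades$religious + rnorm(n, sd = 8)

# Standardize by hand: demean, then divide by the standard deviation
standardize <- function(z) (z - mean(z)) / sd(z)

sim_grades$avgmath_stand  <- standardize(sim_grades$avgmath)
sim_grades$classize_stand <- standardize(sim_grades$classize)

# scale() does the same computation but returns a one-column matrix,
# hence the as.numeric()
avgmath_scaled <- as.numeric(scale(sim_grades$avgmath))

# Same regressors, raw vs standardized dependent variable
fit_raw   <- lm(avgmath ~ classize + religious, data = sim_grades)
fit_stand <- lm(avgmath_stand ~ classize + religious, data = sim_grades)

# The standardized-y slope equals the raw slope divided by sd(y)
ratio_check <- unname(coef(fit_raw)["classize"]) / sd(sim_grades$avgmath)
print(c(stand_slope = unname(coef(fit_stand)["classize"]), raw_over_sd = ratio_check))
```

Note that `sd()` and `scale()` both use the sample standard deviation (denominator `n - 1`), so the two approaches give identical standardized values.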