.b[More on functional forms]

.title[
# .b[More on functional forms]
]
.subtitle[
## .b[.green[EC 339]]
]
.author[
### Marcio Santetti
]
.date[
### Fall 2022
]

---

# Motivation

---

# New functional forms

There is more to OLS than .hi[linear-in-variables] models or .hi-orange[log-transformed] models.

But do these models .hi[preserve] OLS _Classical Assumptions_?

- They do!
  
  - But under what conditions?
  
--

As long as the model remains .hi[linear in parameters], everything is fine.
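
As an illustration (these two equations are examples added here, not models from the course), the first specification is nonlinear in `$x_{1i}$` but still linear in the parameters, so OLS applies. The second is nonlinear in `$\beta_1$`, so it falls outside this framework:

$$
`\begin{align}
y_i &= \beta_0 + \beta_1 x_{1i}^2 + u_i \\
y_i &= \beta_0 + x_{1i}^{\beta_1} + u_i
\end{align}`
$$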

---

# New functional forms

**1**. Regression through the __origin__

**2**. Regression with __quadratic__ terms

**3**. __Inverse__ forms

**4**. __Interaction__ terms

**5**. __Binary__ (*dummy*) variables

---

# Regression through the origin

---

# Regression through the origin

It is used whenever we need to impose the .hi[restriction] that, when `$x=0$`, the expected value of `$y$` is also zero.

It should be applied .hi[only] when theory recommends doing so.

$$
`\begin{align}
y_i = \beta_1x_{1i} + u_i
\end{align}`
$$

---

# Regression through the origin

$$
`\begin{align}
Cons_i = \beta_1Inc_i + u_i
\end{align}`
$$
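
A minimal sketch of how the restricted slope could be computed, assuming made-up income and consumption figures (the data and the helper name `slope_through_origin` are hypothetical). With no intercept, the OLS slope reduces to `$\sum x_i y_i / \sum x_i^2$`:

```python
# Hypothetical (income, consumption) pairs, invented for illustration only.
income = [10.0, 20.0, 30.0, 40.0]
consumption = [8.0, 17.0, 23.0, 33.0]


def slope_through_origin(x, y):
    """OLS slope when the intercept is restricted to zero: sum(x*y) / sum(x^2)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi ** 2 for xi in x)
    return sxy / sxx


beta1_hat = slope_through_origin(income, consumption)

# The fitted value at zero income is zero by construction.
fitted_at_zero = beta1_hat * 0.0

print(f"beta1_hat = {beta1_hat:.4f}, fitted value at Inc = 0: {fitted_at_zero}")
```

Because the line is forced through the origin, predicted consumption at zero income is always zero, whatever the data say.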

---

# Using quadratic terms

---

# Using quadratic terms

Many times, the effect of a variable `$x_i$` on `$y$` also depends on the .hi[level] of that independent variable.

We can also apply quadratic terms when the effect of `$x_i$` on `$y$` .hi[changes] after a given threshold.

$$
`\begin{align}
y_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{1i}^2 + \cdots + \beta_kx_{ki} + u_i
\end{align}`
$$

---

# Using quadratic terms

---

# Using quadratic terms

$$
`\begin{align}
wage_i = \beta_0 + \beta_1exper_{i} + \beta_2exper_{i}^2 +  u_i
\end{align}`
$$


---

# Using quadratic terms

$$
`\begin{align}
wage_i = \beta_0 + \beta_1educ_{i} + \beta_2educ_{i}^2 +  u_i
\end{align}`
$$


---

# Using quadratic terms

$$
`\begin{align}
y_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{1i}^2 +  u_i
\end{align}`
$$
--

$$
`\begin{align}
\dfrac{\partial \ y}{\partial \ x_1} = \beta_1 + 2 \ \cdot \ \beta_2 \ \cdot \  x_1
\end{align}`
$$

$$
`\begin{align}
wage_i = \beta_0 + \beta_1educ_{i} + \beta_2educ_{i}^2 +  u_i
\end{align}`
$$

$$
`\begin{align}
\dfrac{\partial \ wage}{\partial \ educ} = \beta_1 + 2 \ \cdot \ \beta_2 \ \cdot \  educ
\end{align}`
$$
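
The derivative above can be evaluated at different education levels. The sketch below assumes invented coefficients (`b1` and `b2` are hypothetical, not estimates from any data set) and also reports the turning point `$-\beta_1 / (2\beta_2)$`, where the marginal effect changes sign:

```python
def marginal_effect_quadratic(beta1, beta2, x):
    """Marginal effect of x on y in y = b0 + b1*x + b2*x^2 + u."""
    return beta1 + 2.0 * beta2 * x


def turning_point(beta1, beta2):
    """Value of x at which the marginal effect equals zero."""
    return -beta1 / (2.0 * beta2)


# Hypothetical coefficients, chosen only to illustrate the formulas.
b1, b2 = 0.30, -0.006

for educ in (5, 10, 20):
    effect = marginal_effect_quadratic(b1, b2, educ)
    print(f"educ = {educ:2d}: marginal effect = {effect:.3f}")

print(f"turning point at educ = {turning_point(b1, b2):.1f}")
```

With a negative `b2`, the marginal effect shrinks as education rises and turns negative past the turning point.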

---

# Inverse forms

---

# Inverse forms

Inverse forms are used whenever the effect of an independent variable on `$y_i$` is expected to approach .hi[zero] as that variable's value approaches .hi-orange[infinity].

As always, but especially for this category, .hi[economic theory] should *strongly recommend* the use of such a functional form.

---

# Inverse forms

$$
`\begin{align}
qchicken_i = \beta_0 + \beta_1\dfrac{1}{pchicken_{i}} +  u_i
\end{align}`
$$


---

# Inverse forms

$$
`\begin{align}
y_i = \beta_0 + \beta_1\dfrac{1}{x_{1i}} +  u_i
\end{align}`
$$

$$
`\begin{align}
\dfrac{\partial \ y}{\partial \ x_1} = \dfrac{-\beta_1}{x_1^2}
\end{align}`
$$

$$
`\begin{align}
qchicken_i = \beta_0 + \beta_1\dfrac{1}{pchicken_{i}} +  u_i
\end{align}`
$$

$$
`\begin{align}
\dfrac{\partial \ qchicken}{\partial \ pchicken} = \frac{-\beta_1}{pchicken^2}
\end{align}`
$$
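
To see the effect dying out as the regressor grows, the sketch below evaluates `$-\beta_1 / x_1^2$` at several prices. The coefficient `beta1` and the price grid are hypothetical values picked for illustration:

```python
def marginal_effect_inverse(beta1, x):
    """Marginal effect of x on y in y = b0 + b1*(1/x) + u."""
    return -beta1 / x ** 2


beta1 = 50.0  # hypothetical coefficient, not an estimate
prices = [1.0, 5.0, 10.0, 100.0]  # hypothetical chicken prices

effects = [marginal_effect_inverse(beta1, p) for p in prices]

for p, e in zip(prices, effects):
    print(f"pchicken = {p:6.1f}: marginal effect = {e:.4f}")
```

The magnitude of the effect falls with the square of the price, so it approaches zero as the price becomes large.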

---

# Interaction terms

---

# Interaction terms

Whenever the effect of one variable on `$y$` depends on the .hi[level of another variable], the best .hi-orange[modeling strategy] is to use _interaction terms_.

For example, do we believe that an individual's .hi[wage] depends on their .hi-orange[education]?

- If so, is this effect the .hi[same] or .hi-orange[different] for two individuals with, e.g., a _college_ degree, but with different years of experience on the job market?
  
--

- If the effect differs, we can represent the model by

$$
`\begin{align}
wage_i = \beta_0 + \beta_1educ_{i} + \beta_2exper_{i} + \beta_3 educ_i \cdot exper_i +  u_i
\end{align}`
$$
---

# Interaction terms

In more general terms, regression estimates `$(\hat{\beta}_i)$` describe .hi[average effects].

Some of these average effects may "hide" .hi[heterogeneous effects] that differ by .hi-orange[group] or by the .hi[level of another variable].

Interaction terms help us in modeling such .hi-orange[heterogeneous] effects.

- For instance, it is plausible that returns to education differ by .it[gender], .it[race], .it[region], etc.

---

# Interaction terms

$$
`\begin{align}
y_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i} + \beta_3x_{1i}x_{2i} +  u_i
\end{align}`
$$
--

$$
`\begin{align}
\dfrac{\partial \ y}{\partial \ x_1} = \beta_1 + \beta_3 \ \cdot \ x_2
\end{align}`
$$
--

$$
`\begin{align}
wage_i = \beta_0 + \beta_1educ_{i} + \beta_2exper_{i} + \beta_3 educ_i \cdot exper_i +  u_i
\end{align}`
$$
--
$$
`\begin{align}
\dfrac{\partial \ wage}{\partial \ exper} = \beta_2 + \beta_3 \ \cdot \ educ
\end{align}`
$$
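
A short sketch of the last derivative, assuming invented values for `$\beta_2$` and `$\beta_3$` (both hypothetical), shows how the return to experience varies with education:

```python
def marginal_effect_exper(beta2, beta3, educ):
    """Marginal effect of exper on wage when educ and exper are interacted."""
    return beta2 + beta3 * educ


# Hypothetical coefficients, used only to illustrate the formula.
b2, b3 = 0.10, 0.02

effect_hs = marginal_effect_exper(b2, b3, 12)       # 12 years of education
effect_college = marginal_effect_exper(b2, b3, 16)  # 16 years of education

print(f"educ = 12: {effect_hs:.2f}")
print(f"educ = 16: {effect_college:.2f}")
```

When `b3` is zero the two numbers coincide, which is the case of no interaction.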

---

# Binary variables

---

# Binary variables

Categorical variables are used to translate .hi[qualitative information] into .hi-orange[numbers].

- For instance, .it[race], .it[gender], .it[being employed or not], .it[enrolled in EC 339 or not], etc.
  
--

The .hi[easiest] way to work with qualitative information is by using .hi-orange[binary (*dummy*)] variables.

For example,

$$
`\begin{align}
y_i = \beta_0 + \beta_1D_{i} +  u_i
\end{align}`
$$

where `$D_i=1$` if the criterion is fulfilled, and `$D_i=0$` otherwise.

---

# Binary variables

When .hi[interpreting] regression coefficients associated with *dummy* variables, the .it[intercept]'s interpretation changes slightly.

Moreover, the .hi-orange[slope] coefficient on `$D_i$` is not interpreted in the same way we are used to.

Consider:

$$
`\begin{align}
interviews_i = \beta_0 + \beta_1graduate_{i} +  u_i
\end{align}`
$$

where 
  
  - `$interviews_i$` is the number of interviews a candidate is called for in a given period;
  - `$graduate_i$` equals 1 if she has graduated from college, and 0 otherwise.
  
---

# Binary variables

$$
`\begin{align}
interviews_i = \beta_0 + \beta_1graduate_{i} +  u_i
\end{align}`
$$

For this model,

- `$\beta_0$` is the expected number of interviews when `$graduate_i=0$` (non-graduates);
- `$\beta_1$` is the expected .hi[difference] in interview calls between graduates and non-graduates;
- `$\beta_0 + \beta_1$` is the expected number of interviews for graduates (when `$graduate_i=1$`).
  
--

- In this case, .it[non-graduates] are the .hi[reference group].
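
The interpretation above can be checked numerically. The sketch below uses a tiny made-up sample (the data and the helper `ols_simple` are hypothetical) and compares the OLS estimates with the two group means:

```python
# Hypothetical sample: three non-graduates followed by three graduates.
graduate = [0, 0, 0, 1, 1, 1]
interviews = [1, 2, 3, 4, 5, 6]


def ols_simple(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def group_mean(y, d, value):
    """Mean of y among observations whose dummy equals `value`."""
    selected = [yi for yi, di in zip(y, d) if di == value]
    return sum(selected) / len(selected)


b0_hat, b1_hat = ols_simple(graduate, interviews)
mean_non_grad = group_mean(interviews, graduate, 0)
mean_grad = group_mean(interviews, graduate, 1)

print(f"b0_hat = {b0_hat:.2f}, non-graduate mean = {mean_non_grad:.2f}")
print(f"b1_hat = {b1_hat:.2f}, difference in means = {mean_grad - mean_non_grad:.2f}")
```

The intercept matches the reference-group mean, and the coefficient on the dummy matches the difference between the two group means.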

---

# Binary variables

$$
`\begin{align}
interviews_i = \beta_0 + \beta_1graduate_{i} +  u_i
\end{align}`
$$

The model above is an example of an .hi-orange[intercept] *dummy* variable.

--
  
  - We only have different .hi[intercepts] when comparing two groups, but .hi-orange[slopes] are the same.
  
--

To allow for different .hi[slopes], we use interaction terms involving categorical variables, i.e., .hi-orange[slope] *dummy* variables.

---

# Log-Level Model

.note[Important!] If you have a .hi[log-level] model with a .it[binary] variable, the interpretation of the coefficient on that variable .hi-orange[changes].

Example:

$$ \log(y_i) = \beta_0 + \beta_1 D_i + u_i $$

with `$D$` being a *dummy* variable.

Interpretation of `$\beta_1$`:

- Moving from `$D=0$` to `$D=1$`, `$y$` changes by `$100 \times \left( e^{\beta_1}-1 \right)$` percent (an increase if `$\beta_1 > 0$`).
- Moving from `$D=1$` to `$D=0$`, `$y$` changes by `$100 \times \left( e^{-\beta_1}-1 \right)$` percent (a decrease if `$\beta_1 > 0$`).
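
The two percentages are not mirror images of each other. A minimal sketch, assuming a hypothetical coefficient of 0.10 (the function name `pct_change_from_dummy` is also made up here):

```python
import math


def pct_change_from_dummy(beta1):
    """Exact percent change in y implied by a dummy coefficient in a log-level model."""
    return 100.0 * (math.exp(beta1) - 1.0)


beta1 = 0.10  # hypothetical coefficient, not an estimate

switch_on = pct_change_from_dummy(beta1)    # comparing D = 1 with D = 0
switch_off = pct_change_from_dummy(-beta1)  # comparing D = 0 with D = 1

print(f"D: 0 -> 1 changes y by {switch_on:.2f} percent")
print(f"D: 1 -> 0 changes y by {switch_off:.2f} percent")
```

For small coefficients both numbers are close to `100 * beta1` in absolute value, which is why the approximation is often used, but the gap widens as the coefficient grows.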

---
# Log-Level Example

Binary explanatory variable: `inlf`

- `inlf == 1` if the `$i^{th}$` individual is in the labor force.
- `inlf == 0` if the `$i^{th}$` individual is not in the labor force.

$$
`\begin{aligned}
\widehat{\log(sleep_i)} = 8.08 - 0.00365 \ inlf_i
\end{aligned}`
$$

- How do we interpret the coefficient on `inlf`?

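- Since `$100 \times \left( e^{-0.00365}-1 \right) \approx -0.36$`, labor force participants sleep about 0.36% less than non-participants.

- Since `$100 \times \left( e^{0.00365}-1 \right) \approx 0.37$`, individuals who are not in the labor force sleep about 0.37% more than participants.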

---

# Slope *dummy* variables

---

# Slope *dummy* variables

---

# Slope *dummy* variables

$$
`\begin{align}
y_i = \beta_0 + \beta_1x_{1i} + \beta_2D_{i} + \beta_3D_ix_{1i} +  u_i
\end{align}`
$$
--

$$
`\begin{align}
\dfrac{\partial \ y}{\partial \ x_1} = \beta_1 + \beta_3 \ \cdot \ D
\end{align}`
$$
--

$$
`\begin{align}
\dfrac{\partial \ y}{\partial \ D} = \beta_2 + \beta_3 \ \cdot \ x_1
\end{align}`
$$

$$
`\begin{align}
wage_i = \beta_0 + \beta_1educ_{i} + \beta_2female_{i} + \beta_3 educ_i \cdot female_i +  u_i
\end{align}`
$$

$$
`\begin{align}
\dfrac{\partial \ wage}{\partial \ educ} = \beta_1 + \beta_3 \ \cdot \ female
\end{align}`
$$
--

$$
`\begin{align}
\dfrac{\partial \ wage}{\partial \ female} = \beta_2 + \beta_3 \ \cdot \ educ
\end{align}`
$$
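
Put differently, the model implies one regression line per group. The sketch below assumes invented coefficients (all four values are hypothetical and say nothing about actual wage data) and returns the intercept and slope for each value of the dummy:

```python
def group_line(beta0, beta1, beta2, beta3, d):
    """Intercept and slope of the regression line for the group with dummy value d."""
    intercept = beta0 + beta2 * d
    slope = beta1 + beta3 * d
    return intercept, slope


# Hypothetical coefficients, chosen only to illustrate the algebra.
b0, b1, b2, b3 = 2.0, 0.5, -1.0, 0.25

line_d0 = group_line(b0, b1, b2, b3, 0)  # reference group (D = 0)
line_d1 = group_line(b0, b1, b2, b3, 1)  # group with D = 1

print(f"D = 0: intercept = {line_d0[0]}, slope = {line_d0[1]}")
print(f"D = 1: intercept = {line_d1[0]}, slope = {line_d1[1]}")
```

Here `b2` shifts the intercept and `b3` shifts the slope, so setting `b3` to zero brings back the intercept-only dummy model.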

---

# Next time: Functional forms in practice
