Take-home rubric/key, EC421

Q01

Load whichever packages you think you’ll need and then load the dataset (data-final.csv).

Answer: (5 points)
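One possible setup, as a sketch only. It assumes data-final.csv sits in the working directory and that fixest is the regression package (the output printed in this key has the feols() layout). The object name pollution_df is hypothetical.

```r
# Load fixest if it is installed; base R's lm() is a fallback for the regressions.
has_fixest <- requireNamespace("fixest", quietly = TRUE)
if (has_fixest) library(fixest)

# Assumed location: the CSV is in the current working directory.
data_path <- "data-final.csv"
if (file.exists(data_path)) {
  pollution_df <- read.csv(data_path)  # hypothetical object name
  str(pollution_df)                    # quick check of variable names and types
}
```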

Q02

Is this dataset an example of “cross-sectional data” or “time-series data”? Explain your answer.

Answer: (5 points) Right or wrong.

Q03

Create histograms for the variables pop_total (total population), pct_nonwhite (the percent of the population that does not identify as ‘white’), and pm_24hr (measured PM pollution).

Make sure the histograms are well labeled.

Answer: (15 points) 5 points per histogram. -5 for no labels.
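A minimal base-R sketch of what "well labeled" could look like: a title plus both axis labels on each histogram. The data frame demo_df is simulated stand-in data (not the course dataset) so the block runs on its own, and the label text is only a suggestion.

```r
# Stand-in data so the sketch runs on its own; replace with the loaded dataset.
set.seed(421)
demo_df <- data.frame(
  pop_total    = round(rlnorm(200, meanlog = 11, sdlog = 1)),
  pct_nonwhite = runif(200, min = 0, max = 1),
  pm_24hr      = rlnorm(200, meanlog = 3, sdlog = 0.5)
)

# One labeled histogram per variable: title, x-axis label, and y-axis label.
hist_labels <- c(
  pop_total    = "Total population",
  pct_nonwhite = "Non-white share of population",
  pm_24hr      = "Measured PM pollution (pm_24hr)"
)
for (v in names(hist_labels)) {
  hist(demo_df[[v]],
       main = paste("Distribution of", v),
       xlab = hist_labels[[v]],
       ylab = "Number of observations")
}
```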

Q04

Create a scatterplot where the percent of the population that does not identify as white (pct_nonwhite) is on the x axis and the log of PM pollution (log(pm_24hr)) is on the y axis.

Answer: (5 points)
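A base-R sketch of the requested plot, again on simulated stand-in data (demo_df and its made-up 0.5 slope are assumptions for illustration only). The point is the axis assignment: pct_nonwhite on x, log(pm_24hr) on y.

```r
# Simulated stand-in data; replace demo_df with the loaded dataset.
set.seed(421)
demo_df <- data.frame(pct_nonwhite = runif(200))
demo_df$pm_24hr <- exp(3 + 0.5 * demo_df$pct_nonwhite + rnorm(200, sd = 0.5))

# x axis: pct_nonwhite; y axis: the log of pm_24hr.
plot(x = demo_df$pct_nonwhite, y = log(demo_df$pm_24hr),
     xlab = "Non-white share of population (pct_nonwhite)",
     ylab = "log(pm_24hr)",
     main = "Log PM pollution vs. non-white population share")
```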

Q05

Regress the log of PM pollution (the log of pm_24hr) on the non-white percent of the population (pct_nonwhite). Report your results (no interpretation needed).

Answer: (10 points) feols(log(pm_24hr) ~ pct_nonwhite, data); lm() with the same formula gives the same estimates.

  • -5 for wrong numbers but right code.
  • -7.5 for flipping the log-linear specification (estimating linear-log instead).

## OLS estimation, Dep. Var.: log(pm_24hr)
## Observations: 543 
## Standard-errors: IID 
##              Estimate Std. Error  t value   Pr(>|t|)    
## (Intercept)  2.962885   0.038832 76.29991  < 2.2e-16 ***
## pct_nonwhite 0.555403   0.144404  3.84619 0.00013426 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.538776   Adj. R2: 0.024817
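For reference, a sketch of the log-linear specification in base R. The data are simulated (the 0.5 slope is made up), so the numbers will not match the output above; only the formula layout carries over to the real dataset.

```r
set.seed(421)
n <- 500
demo_df <- data.frame(pct_nonwhite = runif(n))
# Simulated log-linear relationship; the 0.5 slope is made up for illustration.
demo_df$pm_24hr <- exp(3 + 0.5 * demo_df$pct_nonwhite + rnorm(n, sd = 0.5))

# Log-linear: logged outcome on the left, untransformed regressor on the right.
fit_q05 <- lm(log(pm_24hr) ~ pct_nonwhite, data = demo_df)
coef(summary(fit_q05))  # estimates, standard errors, t values, p-values
```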

Q06

Interpret the coefficient on pct_nonwhite.

Answer: (10 points) A 100-percentage-point increase in pct_nonwhite increases pollution by approximately 55.5 percent (holding everything else constant).

  • -3 for “one unit change in pct_nonwhite”.
  • -3 for “change” without “increase” or “decrease”.
  • -5 for “1 percent” instead of “100-percentage-point”.
  • -5 for ignoring log-linear percent interpretation.
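The arithmetic behind the log-linear interpretation, as a worked sketch. It assumes pct_nonwhite is recorded as a share between 0 and 1 (which the key's "100-percentage-point" wording implies), so a one-unit change is 100 percentage points. The 10-percentage-point line is an added illustration, not part of the required answer.

```latex
\begin{aligned}
\log(\text{pm\_24hr}) &= \beta_0 + \beta_1\,\text{pct\_nonwhite} + u \\
\%\Delta\,\text{pm\_24hr} &\approx 100 \cdot \hat{\beta}_1 \cdot \Delta\text{pct\_nonwhite} \\
\Delta\text{pct\_nonwhite} = 1 &\;\Rightarrow\; 100 \times 0.5554 \times 1 \approx 55.5\% \\
\Delta\text{pct\_nonwhite} = 0.10 &\;\Rightarrow\; 100 \times 0.5554 \times 0.10 \approx 5.6\%
\end{aligned}
```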

Q07

How would measurement error in pct_nonwhite affect the estimated coefficient on pct_nonwhite?

Answer: (5 points)
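A small simulation sketch of the standard textbook case, assuming classical measurement error (noise in the regressor that is unrelated to the true value and to the disturbance). Under that assumption the estimated slope is attenuated, i.e. biased toward zero. All names and parameter values here are made up for illustration.

```r
set.seed(421)
n <- 5000
true_share  <- runif(n)                                # error-free regressor
log_pm      <- 3 + 0.5 * true_share + rnorm(n, sd = 0.5)
noisy_share <- true_share + rnorm(n, sd = 0.3)         # classical measurement error

slope_true  <- coef(lm(log_pm ~ true_share))[["true_share"]]
slope_noisy <- coef(lm(log_pm ~ noisy_share))[["noisy_share"]]
c(true = slope_true, noisy = slope_noisy)  # the noisy slope is pulled toward zero
```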

Q08

Suppose urban areas have (1) more pollution and (2) larger non-white population percentages. Will omitting the variable “urban” from our regression cause us to over-estimate, under-estimate, or have no effect on the coefficient on pct_nonwhite? Explain your answer.

Answer: (10 points) Over-estimate (upward bias): urban is positively correlated with both pollution and pct_nonwhite.

  • -7.5 for the wrong type of bias.
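The sign of the bias follows from the usual omitted-variable-bias formula, sketched here with urban as the omitted regressor. The symbols are notation introduced for this sketch.

```latex
\begin{aligned}
\text{True model:}\quad \log(\text{pm\_24hr}) &= \beta_0 + \beta_1\,\text{pct\_nonwhite} + \beta_2\,\text{urban} + u \\
\text{Omitting urban:}\quad \operatorname{plim}\,\hat{\beta}_1 &= \beta_1 + \beta_2\,
  \frac{\operatorname{Cov}(\text{pct\_nonwhite},\,\text{urban})}{\operatorname{Var}(\text{pct\_nonwhite})} \\
\beta_2 > 0 \text{ (1)},\quad \operatorname{Cov}(\text{pct\_nonwhite},\,\text{urban}) > 0 \text{ (2)}
  &\;\Rightarrow\; \text{bias} > 0 \text{ (over-estimate)}
\end{aligned}
```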

Q09

Regress the log of PM pollution (the log of pm_24hr) on the non-white percent of the population (pct_nonwhite) and the log of total population (the log of pop_total). Report your results (no interpretation needed).

Answer: (10 points) feols(log(pm_24hr) ~ pct_nonwhite + log(pop_total), data)

## OLS estimation, Dep. Var.: log(pm_24hr)
## Observations: 543 
## Standard-errors: IID 
##                Estimate Std. Error  t value  Pr(>|t|)    
## (Intercept)    2.532558   0.204246 12.39953 < 2.2e-16 ***
## pct_nonwhite   0.400855   0.160941  2.49070  0.013049 *  
## log(pop_total) 0.038961   0.018157  2.14578  0.032336 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.536493   Adj. R2: 0.031271

Q10

Interpret the coefficient on log(pop_total).

Answer: (10 points) A one-percent increase in population leads to a 0.039-percent increase in pollution (holding all else constant).

  • -3 for “change” without “increase” or “decrease”.
  • -5 for wrong numbers but right code.
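The log-log arithmetic, as a worked sketch using the estimate reported in Q09. With both variables in logs, the coefficient is an elasticity. The 10-percent line is an added illustration.

```latex
\begin{aligned}
\log(\text{pm\_24hr}) &= \beta_0 + \beta_1\,\text{pct\_nonwhite} + \beta_2 \log(\text{pop\_total}) + u \\
\%\Delta\,\text{pm\_24hr} &\approx \hat{\beta}_2 \cdot \%\Delta\,\text{pop\_total} \\
\%\Delta\,\text{pop\_total} = 1 &\;\Rightarrow\; 0.039 \times 1 = 0.039\% \\
\%\Delta\,\text{pop\_total} = 10 &\;\Rightarrow\; 0.039 \times 10 = 0.39\%
\end{aligned}
```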

Q11

Based on the change in the coefficient on pct_nonwhite between the regressions in Q05 and Q09, do you think omitting population was causing bias? Explain your answer.

Answer: (10 points) Reasonableness.

Q12

Should we be concerned about autocorrelation in this setting? Briefly explain your answer.

Answer: (10 points) No. (Right or wrong)

Q13

Should we be concerned about heteroskedasticity and/or correlated disturbances? Briefly explain your answer.

Answer: (10 points) Yes.
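If heteroskedasticity is the concern, one common response is to report heteroskedasticity-robust standard errors. The base-R sketch below computes HC1 standard errors by hand on simulated data (all names and values are made up for illustration). With fixest, feols(..., vcov = "hetero") is the analogous option. This sketch does not address correlated disturbances, which would call for clustering instead.

```r
set.seed(421)
n <- 500
x <- runif(n)
y <- 3 + 0.5 * x + rnorm(n, sd = 0.2 + x)  # error variance grows with x
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- residuals(fit)
bread <- solve(crossprod(X))   # (X'X)^{-1}
meat  <- crossprod(X * e)      # X' diag(e^2) X: each row of X scaled by its residual
k <- ncol(X)
vcov_hc1 <- (n / (n - k)) * bread %*% meat %*% bread  # HC1 small-sample scaling

se_iid    <- sqrt(diag(vcov(fit)))
se_robust <- sqrt(diag(vcov_hc1))
rbind(iid = se_iid, robust = se_robust)  # compare conventional and robust SEs
```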