Load whichever packages you think you’ll need and then load the dataset (data-final.csv).
Answer: (5 points)
Is this dataset an example of “cross-sectional data” or “time-series data”? Explain your answer.
Answer: (5 points) Right or wrong.
Create histograms for the variables pop_total (total population), pct_nonwhite (the percent of the population that does not identify as ‘white’), and pm_24hr (measured PM pollution).
Make sure the histograms are well labeled.
Answer: (15 points) 5 points per histogram. -5 for no labels.
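One way the labeled histograms could be produced is sketched below in base R. The data frame here is simulated purely so the snippet runs on its own; the distributions, the sample size, and the axis wording are assumptions, not properties of data-final.csv.

```r
# Simulated stand-in for the course dataset (illustrative values only).
# With the real file, replace this block with: data <- read.csv("data-final.csv")
set.seed(1)
n <- 500
data <- data.frame(
  pop_total    = round(exp(rnorm(n, mean = 11, sd = 1.5))),
  pct_nonwhite = rbeta(n, 2, 8),
  pm_24hr      = exp(rnorm(n, mean = 3, sd = 0.5))
)

# Draw to a null device so the script also runs without a display.
pdf(NULL)

# One histogram per variable, each with a title and axis labels.
h_pop <- hist(data$pop_total,
              main = "Distribution of total population",
              xlab = "Total population", ylab = "Number of observations")
h_nw  <- hist(data$pct_nonwhite,
              main = "Distribution of non-white population share",
              xlab = "Share of population not identifying as white",
              ylab = "Number of observations")
h_pm  <- hist(data$pm_24hr,
              main = "Distribution of measured PM pollution",
              xlab = "PM pollution (pm_24hr)", ylab = "Number of observations")

invisible(dev.off())
```

In an interactive session the `pdf(NULL)` and `dev.off()` lines can be dropped so the plots appear on screen.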
Create a scatterplot where the percent of the population who does not identify as white is on the x axis (pct_nonwhite) and the log of PM pollution is on the y axis (log(pm_24hr)).
Answer: (5 points)
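A minimal base R sketch of such a scatterplot follows. The data are simulated so the snippet is self-contained; the variable names match the question, but the values and axis wording are assumptions.

```r
# Simulated stand-in for the course dataset (illustrative values only).
# With the real file, replace this block with: data <- read.csv("data-final.csv")
set.seed(2)
n <- 500
data <- data.frame(
  pct_nonwhite = rbeta(n, 2, 8),
  pm_24hr      = exp(rnorm(n, mean = 3, sd = 0.5))
)

# The y variable is the natural log of PM pollution.
log_pm <- log(data$pm_24hr)

pdf(NULL)  # null device so the script runs without a display
plot(x = data$pct_nonwhite, y = log_pm,
     main = "PM pollution and non-white population share",
     xlab = "Share of population not identifying as white (pct_nonwhite)",
     ylab = "Log of PM pollution, log(pm_24hr)")
invisible(dev.off())
```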
Regress the log of PM pollution (the log of pm_24hr) on the non-white percent of the population (pct_nonwhite). Report your results (no interpretation needed).
Answer: (10 points) feols(log(pm_24hr) ~ pct_nonwhite, data)
## OLS estimation, Dep. Var.: log(pm_24hr)
## Observations: 543
## Standard-errors: IID
##              Estimate Std. Error  t value   Pr(>|t|)
## (Intercept)  2.962885   0.038832 76.29991  < 2.2e-16 ***
## pct_nonwhite 0.555403   0.144404  3.84619 0.00013426 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.538776 Adj. R2: 0.024817
Interpret the coefficient on pct_nonwhite.
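Answer: (10 points) A 100-percentage-point (one-unit) increase in pct_nonwhite is associated with an approximately 55.5 percent increase in PM pollution (holding everything else constant). Equivalently, a one-percentage-point increase is associated with roughly a 0.56 percent increase.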
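The arithmetic behind this log-linear interpretation can be sketched as follows. The snippet assumes pct_nonwhite is measured as a share between 0 and 1 (which is what "100-percentage-point increase" implies); the helper function names are hypothetical.

```r
# Coefficient on pct_nonwhite from the reported regression output.
b <- 0.555403

# In a log-linear model, changing x by dx changes log(y) by b * dx.
# First-order approximation of the percent change in y:
approx_pct_change <- function(b, dx) 100 * b * dx
# Exact percent change in y implied by the same model:
exact_pct_change <- function(b, dx) 100 * (exp(b * dx) - 1)

# Full one-unit change (share goes from 0 to 1, i.e., 100 percentage points).
approx_full <- approx_pct_change(b, 1)
exact_full  <- exact_pct_change(b, 1)

# One-percentage-point change (dx = 0.01).
approx_1pp <- approx_pct_change(b, 0.01)
exact_1pp  <- exact_pct_change(b, 0.01)

print(c(approx_full = approx_full, exact_full = exact_full,
        approx_1pp = approx_1pp, exact_1pp = exact_1pp))
```

The approximation and the exact figure nearly coincide for a one-percentage-point change, while for a full one-unit change the exact figure is larger than the approximate 55.5 percent.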
How would measurement error in pct_nonwhite affect the estimated coefficient on pct_nonwhite?
Answer: (5 points)
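For intuition, a small simulation can show what happens to an OLS slope when the regressor is measured with noise. This is only a sketch under the assumption of classical measurement error (noise that is independent of the true regressor and of the disturbance); all data and parameter values are simulated and hypothetical, not taken from data-final.csv.

```r
set.seed(42)
n <- 10000

# True regressor and outcome; the true slope is 0.5.
x_true <- rnorm(n)
y <- 1 + 0.5 * x_true + rnorm(n)

# Observed regressor = truth + independent noise with the same variance.
x_noisy <- x_true + rnorm(n)

# Slope using the true regressor vs. the mismeasured one.
b_true  <- unname(coef(lm(y ~ x_true))["x_true"])
b_noisy <- unname(coef(lm(y ~ x_noisy))["x_noisy"])

print(c(b_true = b_true, b_noisy = b_noisy))
```

Under these assumptions the slope on the noisy regressor shrinks toward zero by the factor var(x_true) / (var(x_true) + var(noise)), which is one half in this setup.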
Suppose urban areas have (1) more pollution and (2) larger non-white population percentages. Will omitting the variable “urban” from our regression cause us to over-estimate, under-estimate, or have no effect on the coefficient on pct_nonwhite? Explain your answer.
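Answer: (10 points) Bias upward. Because “urban” is positively correlated with both pollution and pct_nonwhite, omitting it causes us to over-estimate the coefficient on pct_nonwhite.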
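The direction of this omitted-variable bias can be checked with a small simulation. The sketch below uses entirely simulated data with hypothetical parameter values, set up so that urban raises both pollution and the non-white share, as the question supposes.

```r
set.seed(7)
n <- 10000

# Hypothetical data-generating process.
urban        <- rbinom(n, 1, 0.5)
pct_nonwhite <- runif(n, 0, 0.4) + 0.3 * urban   # higher share in urban areas
log_pm       <- 3 + 0.2 * pct_nonwhite + 0.5 * urban + rnorm(n, sd = 0.5)

# Short regression omits urban; long regression includes it.
b_short <- unname(coef(lm(log_pm ~ pct_nonwhite))["pct_nonwhite"])
b_long  <- unname(coef(lm(log_pm ~ pct_nonwhite + urban))["pct_nonwhite"])

print(c(b_short = b_short, b_long = b_long))
```

In this setup the long regression recovers a slope near the true value of 0.2, while the short regression's slope is larger, since it also picks up the effect of urban.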
Regress the log of PM pollution (the log of pm_24hr) on the non-white percent of the population (pct_nonwhite) and the log of total population (pop_total). Report your results (no interpretation needed).
Answer: (10 points) feols(log(pm_24hr) ~ pct_nonwhite + log(pop_total), data)
## OLS estimation, Dep. Var.: log(pm_24hr)
## Observations: 543
## Standard-errors: IID
##                Estimate Std. Error  t value  Pr(>|t|)
## (Intercept)    2.532558   0.204246 12.39953 < 2.2e-16 ***
## pct_nonwhite   0.400855   0.160941  2.49070  0.013049 *
## log(pop_total) 0.038961   0.018157  2.14578  0.032336 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.536493 Adj. R2: 0.031271
Interpret the coefficient on log(pop_total).
Answer: (10 points) A one-percent increase in population is associated with an approximately 0.039-percent increase in pollution (holding all else constant).
Based on the change in the coefficient on pct_nonwhite between the regressions in Q05 and Q10, do you think omitting population was causing bias? Explain your answer.
Answer: (10 points) Reasonableness.
Should we be concerned about autocorrelation in this setting? Briefly explain your answer.
Answer: (10 points) No. (Right or wrong)
Should we be concerned about heteroskedasticity and/or correlated disturbances? Briefly explain your answer.
Answer: (10 points) Yes.
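If heteroskedasticity is a concern, the usual response is to report heteroskedasticity-robust standard errors rather than the IID ones shown in the output above. With fixest this can be requested via feols(..., vcov = "hetero"). The base R sketch below computes HC1 robust standard errors by hand on simulated data, to show what that adjustment does; the data-generating process and all variable names are hypothetical.

```r
set.seed(123)
n <- 5000

# Simulated data in which the disturbance variance depends on x.
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n) * (0.2 + abs(x))

fit <- lm(y ~ x)

# Sandwich estimator: (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1
X       <- model.matrix(fit)
e       <- residuals(fit)
k       <- ncol(X)
XtX_inv <- solve(crossprod(X))
meat    <- crossprod(X * e)   # each row of X scaled by its own residual

# HC1 applies the degrees-of-freedom correction n / (n - k).
vcov_hc1 <- (n / (n - k)) * XtX_inv %*% meat %*% XtX_inv

se_iid <- sqrt(diag(vcov(fit)))
se_hc1 <- sqrt(diag(vcov_hc1))

print(rbind(se_iid = se_iid, se_hc1 = se_hc1))
```

In this simulated setup the robust standard error on x is larger than the IID one, so inference based on the IID standard errors would be too optimistic. Correlated disturbances (for example, within geographic groups) would call for clustered standard errors instead, which this sketch does not cover.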