Take-home final
EC 421: Introduction to Econometrics
1 Academic honesty
You are not allowed to work with anyone else. Working with anyone else will be considered cheating. You will receive a zero for both parts of the exam and may fail the class.
You can use online materials, books, notes, solutions, etc. However, you still must put all of your answers in your own words. Copying other people’s words will be considered cheating.
Kyu and Ed will not help you debug your code. Do not ask.
2 Instructions
Due: Upload your answers to Canvas before 11:59 pm (Pacific) on Thursday, 15 June 2023.
Important: You must submit your answers as an HTML or PDF file, built from an RMarkdown (.RMD) or Quarto (.qmd) file. Do not submit the .RMD or .qmd file. You will not receive credit for it.
If we ask you to create a figure or run a regression, then the figure or the regression results should be in the document that you submit (not just the code—we want the actual figure or regression output with coefficients, standard errors, etc.).
3 Setup and data
Source: Data come from the 2021 American Community Survey (ACS) public-use microdata, downloaded from the US Census (with codebooks).
We are using a subset of the sample that focuses on approximately 19 thousand individuals between the ages of 25 and 65 (in 2021) who rent their housing in California.
The table below describes each variable in the data.
Variable name | Variable description |
---|---|
id | Household ID |
year_built | The year the housing was built |
cost_rent | The cost of rent (2021 USD) |
cost_electricity | Amount paid for electricity (USD, in a year) |
cost_gas | Amount paid for natural gas (USD, in a year) |
cost_water | Amount paid for water (USD, in a year) |
i_heat_electric | Indicator for whether the household heats with electricity |
i_heat_gas | Indicator for whether the household heats with natural gas |
i_heat_wood | Indicator for whether the household heats with wood |
income_100k | The individual’s income (unit is hundreds of thousands of USD) |
i_health_ins | Indicator for whether the individual has health insurance |
age | The individual’s age |
i_female | Indicator for whether the individual identified as female |
i_born_usa | Indicator for whether the individual was born in the US |
i_employed | Indicator for whether the individual is employed |
i_amin | Indicator for whether the individual identified as American Indian |
i_asian | Indicator for whether the individual identified as Asian |
i_black | Indicator for whether the individual identified as Black |
i_white | Indicator for whether the individual identified as White |
i_hispanic | Indicator for whether the individual identified as Hispanic |
yrs_education | The individual’s (approximate) years of education |
4 General instructions
Data: You will need the data contained in exam-data.csv.
Points: There are 60 points available on this portion of the final. The in-class portion of the final is worth 140 points. Your total final-exam grade will be the sum of the points you earned on the two parts, divided by 200 (= 140 + 60).
5 Prompts
[01] (15 points) Some cities and states are looking for ways to transition away from fossil-fuel-based heating. One concern with this policy is that it could disproportionately affect poorer or historically marginalized communities.
Let’s use our data on renters in California to explore this concern.
Regress the indicator for whether the individual has natural-gas heating (i_heat_gas) on the individual’s income (income_100k). Interpret the intercept and the coefficient. Then discuss whether the results provide evidence for or against (or neither) the concern that natural-gas bans will disproportionately affect poorer individuals.
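For reference, one way to write the regression in [01] is as a linear probability model. The symbols below (the coefficients, the error term, and the subscript for individual i) are our own notation for illustration and do not appear in the data.

```latex
\text{i\_heat\_gas}_i = \beta_0 + \beta_1 \, \text{income\_100k}_i + u_i
```

Because the outcome is a 0/1 indicator, fitted values from this model are read as probabilities, and a one-unit change in income_100k corresponds to a 100,000 USD change in income.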
[02] (5 points) Does omitted variable bias matter for the regression in [01]? Explain your answer.
[03] (5 points) When I asked ChatGPT about omitted-variable bias, it was concerned about the age of the house. Specifically, it said:
> Older houses are more likely to use natural gas heating because they were built when natural gas was the most common heating source. If older houses are also more likely to be rented by lower-income individuals (perhaps because they’re cheaper), then housing age is correlated with both income and natural gas heating. If we don’t include housing age in our model, then we may falsely conclude that lower-income individuals are more likely to have natural gas heating, when in reality, it’s because they’re more likely to live in older homes.
Run two regressions and report the results (you don’t need to interpret the estimates):
- Regress the indicator for gas heating on the year the building was built (year_built).
- Regress the individual’s income on the year the building was built (year_built).
[04] (5 points) Based upon the estimates from the regressions in [03], how would omitting building age bias the coefficient estimate on income in [01]? Explain your answer.
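As a reminder of the general logic behind [04], the textbook omitted-variable bias result for a single omitted variable can be sketched as follows. The generic symbols here (y for the outcome, x for the included regressor, z for the omitted variable, and the Greek coefficients) are our own notation, not variable names from the data.

```latex
\begin{aligned}
\text{Long model:} \quad & y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + u_i \\
\text{Auxiliary regression:} \quad & z_i = \gamma_0 + \gamma_1 x_i + v_i \\
\text{Short model:} \quad & y_i = \alpha_0 + \alpha_1 x_i + e_i \\
\text{Result:} \quad & \operatorname{plim} \hat{\alpha}_1 = \beta_1 + \beta_2 \gamma_1
\end{aligned}
```

The direction of the bias therefore depends on the signs of the two terms in the product. Note that the second regression in [03] puts income on the left-hand side rather than the right; its slope still has the same sign as the auxiliary slope above, because both share the sign of the covariance between income and year_built.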
[05] (15 points) Add the indicator for whether the individual is Asian (i_asian) and its interaction with income to the regression from [01]. Interpret the intercept and all coefficients.
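One way to write the model described in [05] is shown below. As before, the coefficient and error symbols are our own notation for illustration.

```latex
\text{i\_heat\_gas}_i = \beta_0 + \beta_1 \, \text{income\_100k}_i + \beta_2 \, \text{i\_asian}_i
  + \beta_3 \left( \text{income\_100k}_i \times \text{i\_asian}_i \right) + u_i
```

In a model of this form, the intercept and income slope for the group with i_asian = 0 are the first two coefficients, while the group with i_asian = 1 has intercept equal to the sum of the first and third coefficients and income slope equal to the sum of the second and fourth.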
[06] (10 points) Add a new variable to the dataset that equals the sum of the costs of electricity (cost_electricity) and natural gas (cost_gas). Call this new variable cost_utilities. Now regress this new variable (cost_utilities) on the indicator for whether the individual has natural-gas heating (i_heat_gas) and income (income_100k). Interpret the coefficients and discuss whether renters with natural-gas heating pay less, more, or about the same in total utility costs.
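The variable-construction and regression steps in [06] might look roughly like the following in base R. This is only a sketch on simulated data: the object names sim_df and fit and every simulated value are hypothetical stand-ins, so the output says nothing about the real exam data.

```r
# Simulated stand-in for exam-data.csv (all values are made up for illustration)
set.seed(421)
n <- 500
sim_df <- data.frame(
  cost_electricity = round(runif(n, min = 200, max = 3000)),
  cost_gas         = round(runif(n, min = 0, max = 1500)),
  i_heat_gas       = rbinom(n, size = 1, prob = 0.6),
  income_100k      = round(rexp(n, rate = 2), 2)
)

# New variable: total utility cost = electricity cost + natural-gas cost
sim_df$cost_utilities <- sim_df$cost_electricity + sim_df$cost_gas

# Regress total utility cost on the gas-heating indicator and income
fit <- lm(cost_utilities ~ i_heat_gas + income_100k, data = sim_df)

# Print the coefficient table: estimates, standard errors, t statistics, p-values
print(summary(fit)$coefficients)
```

With the real data, you would replace the simulated data frame with the one read from exam-data.csv and keep the remaining steps the same.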
[07] (5 points) Based upon your analyses in [01–06], what are your conclusions about the equity concerns about shifting from natural gas to electricity? Are there additional analyses you think should be done before policymakers make big decisions? Explain your answers.