Exam 1

Question

The daily expenses of summer tourists in Turin are analyzed. A survey with $84$ tourists is conducted. This shows that the tourists spend on average $116.7$ EUR. The sample variance $s^2_{n-1}$ is equal to $164.3$.

You are asked to determine a $95\%$ confidence interval for the average daily expenses (in EUR) of a tourist. Using your computations, tell me which of the following statements is correct:
1. The lower bound of the confidence interval is 113.959
2. The upper bound of the confidence interval is 119.008
3. None of the statements are correct.
Solution

The $95\%$ confidence interval for the average expenses $\mu$ is given by: $\begin{align} & & \left[\bar{y} \, - \, 1.96\sqrt{\frac{s_{n-1}^2}{n}}, \; \bar{y} \, + \, 1.96\sqrt{\frac{s_{n-1}^2}{n}}\right] \\ & = & \left[ 116.7 \, - \, 1.96\sqrt{\frac{164.3}{84}}, \; 116.7 \, + \, 1.96\sqrt{\frac{164.3}{84}}\right] \\ & = & \left[113.959, \, 119.441\right]. \end{align}$
1. True
2. False
3. False
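The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the computation in the solution; the variable names are illustrative, and `1.96` is the usual normal approximation used in the formula above.

```python
import math

n = 84          # sample size
y_bar = 116.7   # sample mean in EUR
s2 = 164.3      # sample variance s^2_{n-1}
z = 1.96        # approximate 97.5% quantile of the standard normal

# Half-width of the interval: z times the estimated standard error of the mean
half_width = z * math.sqrt(s2 / n)
lower = y_bar - half_width
upper = y_bar + half_width

print(f"[{lower:.3f}, {upper:.3f}]")  # [113.959, 119.441]
```

The printed upper bound differs from the $119.008$ offered in statement 2, which is why that statement is false.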
Question

Which of the following statements about the regression model $y = b_0 + b_1 x + e$ and its associated predicted values $\hat{y}_i$ are correct?
1. On average, if $\hat{y}_i$ is greater than $\bar{y}$, then the corresponding $e_i$ is smaller than its mean
2. On average, if $\hat{y}_i$ is smaller than $\bar{y}$, then the corresponding $e_i$ is also smaller than its mean
3. On average, if $\hat{y}_i$ is smaller than $\bar{y}$, then the corresponding $e_i$ is greater than its mean
4. On average, if $\hat{y}_i$ is greater than $\bar{y}$, then the corresponding $e_i$ is also greater than its mean
5. None of the provided answers is correct.
Solution

We discussed this solution in class with a picture and some algebra. You can consult the solution under Task 4 here.
1. False. We have seen that $\hat{y}$ and $e$ are orthogonal, i.e. $Cov(\hat{y},e) = 0$.
2. False. We have seen that $\hat{y}$ and $e$ are orthogonal, i.e. $Cov(\hat{y},e) = 0$.
3. False. We have seen that $\hat{y}$ and $e$ are orthogonal, i.e. $Cov(\hat{y},e) = 0$.
4. False. We have seen that $\hat{y}$ and $e$ are orthogonal, i.e. $Cov(\hat{y},e) = 0$.
5. True. We have seen that $\hat{y}$ and $e$ are orthogonal, i.e. $Cov(\hat{y},e) = 0$, so on average the residual does not move with the fitted value in either direction.
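The orthogonality of $\hat{y}$ and $e$ can also be checked numerically. The sketch below uses simulated data; the sample size, the coefficients `2` and `0.5`, and the seed are arbitrary assumptions chosen only for illustration.

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility only

# Simulated data (illustrative assumption, not taken from the exam)
x = [random.gauss(0, 1) for _ in range(200)]
y = [2 + 0.5 * xi + random.gauss(0, 1) for xi in x]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance with the n - 1 denominator."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

# OLS estimates for y = b0 + b1 x + e
b1 = cov(x, y) / cov(x, x)
b0 = mean(y) - b1 * mean(x)

y_hat = [b0 + b1 * xi for xi in x]          # fitted values
e = [yi - fi for yi, fi in zip(y, y_hat)]   # residuals

# The sample covariance is zero up to floating-point rounding
print(abs(cov(y_hat, e)) < 1e-10)
```

Changing the seed or the coefficients should not change the result, because the zero covariance follows from the OLS first-order conditions rather than from the particular data.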
Question

Which of the following statements about the regression model $y = b_0 + b_1 x + e$ and its associated residual values $e_i$ are correct?
1. The mean of the residuals from a linear regression is zero only if we include a slope.
2. None of the provided answers is correct.
3. The mean of the residuals from a linear regression is always zero, regardless of whether there is an intercept or not.
4. The mean of the residuals from a linear regression is zero only if we include an intercept.
Solution

The solution to this question is on slide 32/40 of lecture 5.
1. False.
2. False.
3. False.
4. True.
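A small numerical check can make the role of the intercept concrete. The sketch below fits the same simulated data once with an intercept and once through the origin; the data-generating values (`3`, `0.5`, uniform $x$ on $[0, 10]$) are assumptions made purely for illustration.

```python
import random

random.seed(2)  # arbitrary seed, for reproducibility only

# Simulated data (illustrative assumption)
x = [random.uniform(0, 10) for _ in range(200)]
y = [3 + 0.5 * xi + random.gauss(0, 1) for xi in x]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Regression with intercept: y = b0 + b1 x + e
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
mean_e_with = sum(yi - b0 - b1 * xi for xi, yi in zip(x, y)) / n

# Regression through the origin: y = b x + e (no intercept)
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
mean_e_without = sum(yi - b * xi for xi, yi in zip(x, y)) / n

print(f"mean residual with intercept:    {mean_e_with:.6f}")
print(f"mean residual without intercept: {mean_e_without:.6f}")
```

With the intercept, the mean residual is zero up to rounding error. Without it, nothing forces the residuals to average to zero, and in this setup they do not.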
Question

Consider the following dataset and the fitted line:

[Figure: dataset with fitted line (chunk nonlinearplot)]
1. The equation for the OLS fitted line looks like $y = b_0 + b_1 x + e$
2. OLS cannot represent nonlinear data. As the name states, it is ordinary linear squares.
3. None of the provided answers is correct.
4. The equation for the OLS fitted line looks like $y = b_0 + b_1 x + b_2 x^2 + e$
5. The equation for the OLS fitted line looks like $y = b_0 + b_1 x + b_2 x + e$
Solution
1. False.
2. False.
3. False.
4. True.
5. False.
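OLS only needs the model to be linear in the coefficients, so a squared regressor is allowed. The following sketch fits $y = b_0 + b_1 x + b_2 x^2$ by solving the normal equations $(X'X)b = X'y$. The data are hypothetical and noise-free (generated from $y = 1 + 2x + 3x^2$), and the small `solve` helper is written here only to keep the example free of external libraries.

```python
def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Hypothetical noise-free data from y = 1 + 2x + 3x^2
x = [i / 2 for i in range(-6, 7)]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]

# Design matrix with columns 1, x, x^2
X = [[1.0, xi, xi ** 2] for xi in x]

# Normal equations: (X'X) b = X'y
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(3)]

b0, b1, b2 = solve(XtX, Xty)
print(b0, b1, b2)  # close to 1, 2 and 3 up to rounding error
```

The fitted curve is nonlinear in $x$, yet the estimation problem is an ordinary linear least squares problem in $(b_0, b_1, b_2)$.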

Question

You are analysing the determinants of pay, using 420 observations. Each of the columns displays the OLS estimates for one of two models, where male is a binary indicator equal to 1 if an individual is male. All models were run on the same input dataset. Which of the following statements is correct?

[Figure: chunk interaction-plot]

	(1)	(2)
(Intercept)	34.012***	22.040***
	(0.854)	(1.205)
education	-1.134***	0.373+
	(0.173)	(0.191)
maleTRUE		10.656***
		(0.854)
Num.Obs.	420	420
R2	0.094	0.340
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

1. The plot corresponds to model (2)
2. The estimate for (Intercept) in model (2) is statistically significant at least at the 5% level
3. None of the statements is correct.

Solution

1. True
2. True
3. False
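The significance statement can be checked from the table by dividing each estimate by its standard error, which is given in parentheses. The sketch below does this for model (2); comparing against `1.96` is a large-sample normal approximation and is an assumption here, since the exact critical value depends on the degrees of freedom.

```python
# Estimates and standard errors of model (2), copied from the table
model_2 = {
    "(Intercept)": (22.040, 1.205),
    "education": (0.373, 0.191),
    "maleTRUE": (10.656, 0.854),
}

critical_value = 1.96  # approximate two-sided 5% critical value (assumption)

t_stats = {name: est / se for name, (est, se) in model_2.items()}
for name, t in t_stats.items():
    verdict = "significant" if abs(t) > critical_value else "not significant"
    print(f"{name}: t = {t:.2f} ({verdict} at the 5% level)")
```

The intercept has a t-statistic of roughly 18, far above the critical value, which matches its three stars. The education coefficient falls just short of `1.96`, consistent with its `+` marker.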

Question

For 64 firms the number of employees $X$ and the amount of expenses for continuing education $Y$ (in EUR) were recorded. The statistical summary of the data set is given by:

	Variable $X$	Variable $Y$
Mean	53.38	235.8
Variance	109.86	2826.75

The covariance between $X$ and $Y$ is equal to 481.65.

Estimate the expected amount of money spent for continuing education by a firm with 47 employees using least squares regression. Your solution should be rounded to 2 digits. Which of the following statements are correct?
1. The expected amount of money spent is 209.81
2. The intercept of your regression is 1.77
3. None of the above statements is correct.
Solution

First, the regression line $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ is determined. The regression coefficients are given by: $\begin{eqnarray*} && \hat \beta_1 = \frac{cov(x,y)}{var(x)} = \frac{481.65}{109.86} = 4.38422, \\ && \hat \beta_0 = \bar y - \hat \beta_1 \cdot \bar x = 235.8 - 4.38422 \cdot 53.38 = 1.77054. \end{eqnarray*}$

The estimated amount of money spent by a firm with 47 employees is then given by: $\begin{eqnarray*} \hat y = 1.77054 + 4.38422 \cdot 47 = 207.83. \end{eqnarray*}$
1. False
2. True
3. False
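The computation in this solution needs only the summary statistics, which the following sketch makes explicit. The variable names are illustrative.

```python
# Summary statistics from the question
x_bar, y_bar = 53.38, 235.8   # means of X and Y
var_x = 109.86                # variance of X
cov_xy = 481.65               # covariance of X and Y

# Least squares slope and intercept
b1 = cov_xy / var_x
b0 = y_bar - b1 * x_bar

# Prediction for a firm with 47 employees
y_hat = b0 + b1 * 47

print(round(b1, 5), round(b0, 5), round(y_hat, 2))  # 4.38422 1.77054 207.83
```

Note that the intercept is computed with the unrounded slope; plugging in the rounded value $4.38422$ would shift the last digits slightly.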
Question

The following figure shows a scatterplot. Notice that you can visually estimate the standard deviation of a normally distributed random variable by dividing its range by 6. Both $y$ and $x$ are normally distributed in this example. Which of the following statements are correct?

[Figure: scatterplot (chunk scatterplot)]
1. The standard deviation of $Y$ is at least $6$.
2. For $X = 28$, $Y$ can be expected to be about $36.1$.
3. The mean of $Y$ is at least $30$.
4. The absolute value of the correlation coefficient is at most $0.8$.
5. The scatterplot is standardized.
Solution
1. True. The standard deviation of $Y$ is about equal to $20$ and is therefore larger than $6$.
2. True. The regression line at $X=28$ implies a value of about $Y = 36.1$.
3. True. The mean of $Y$ is about equal to $40$ and hence is larger than $30$.
4. False. A strong association between the variables is given in the scatterplot. Hence the absolute value of the correlation coefficient is close to $1$ and therefore larger than $0.8$.
5. False. The scatterplot is not standardized, because $X$ and $Y$ do not both have mean $0$ and variance $1$.
Question

Below we show the summary of running two models (A and B) 300 times. Each time we generate a dataset containing $Y,X_1,X_2,e$ where $e \sim N(0,1)$ and we run the regression

$Y = b_0 + b_1 X_1 + b_2 X_2 + e$

Models A and B differ in how strongly $X_1$ and $X_2$ are correlated with each other. The true population parameters are $\beta_1 = 4$ and $\beta_2 = 1$.

	Model A	Model B
Mean $b_1$	4	3.99
Mean $b_2$	1	1.01
Mean SE of $b_1$	0.04	0.29
Mean SE of $b_2$	0.04	0.29

Which of the following statements are true? When I say below that two numbers differ significantly, I mean that one is at least twice the other, i.e. $a$ and $b$ differ significantly if $b = n \times a$ for some $n \geq 2$.
1. The average point estimates from both models for both slope coefficients do not differ significantly.
2. The averages of the standard error of estimates from both models for both slope coefficients do not differ significantly.
3. Given this evidence, Model B can be described as a situation of multicollinearity.
4. Given this evidence, under multicollinearity, OLS is biased.
Solution
1. True
2. False
3. True
4. False
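The pattern in the table (similar average point estimates, much larger standard errors) can be reproduced with a small simulation. The sketch below is not the code that produced the table: the sample size `n=100`, the correlations `0.0` and `0.99`, the zero intercept, and the seed are all assumptions, so the numbers will differ from those reported above.

```python
import math
import random

def fit_two_regressors(x1, x2, y):
    """OLS of y on an intercept, x1 and x2. Returns (b1, b2, se_b1, se_b2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Demeaning all variables absorbs the intercept
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    rss = sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(d1, d2, dy))
    sigma2 = rss / (n - 3)  # three estimated parameters
    return b1, b2, math.sqrt(sigma2 * s22 / det), math.sqrt(sigma2 * s11 / det)

def simulate(rho, reps=300, n=100, seed=0):
    """Average (b1, b2, se_b1, se_b2) over `reps` simulated datasets."""
    rng = random.Random(seed)
    sums = [0.0] * 4
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        # x2 has unit variance and population correlation rho with x1
        x2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x1]
        # True slopes are 4 and 1, as in the question; e ~ N(0, 1)
        y = [4 * a + 1 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        for i, v in enumerate(fit_two_regressors(x1, x2, y)):
            sums[i] += v
    return [s / reps for s in sums]

low = simulate(rho=0.0)    # regressors roughly uncorrelated
high = simulate(rho=0.99)  # regressors highly correlated

print("low correlation: ", [round(v, 2) for v in low])
print("high correlation:", [round(v, 2) for v in high])
```

In both settings the average estimates stay close to the true values of 4 and 1, while the average standard errors are several times larger under high correlation. This is the sense in which multicollinearity inflates variance without introducing bias.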
Question

The university introduces mandatory Moodle quizzes in 2024 for one course (treated), while another similar course (control) continues without them. The table below reports the average exam scores recorded in each of the cases:

$\begin{array}{lcc} \textbf{Group} & \textbf{2023 (Before)} & \textbf{2024 (After)} \\ \hline \text{Treated (with Moodle)} & 23.5 & 26.9 \\ \text{Control (no Moodle)} & 21.7 & 21.9 \\ \hline \end{array}$

Compute the Difference-in-Differences (DiD) estimate of the effect of introducing Moodle quizzes. Which of the following answers is correct?
1. The DiD estimate is equal to: -3.6
2. The DiD estimate is equal to: 3.2
3. The DiD estimate is equal to: 3.6
4. None of the above statements is correct.
Solution
1. False
2. True
3. False
4. False
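The DiD estimate is the change in the treated group minus the change in the control group. A minimal sketch of that arithmetic, with illustrative variable names:

```python
# Average exam scores from the table
treated_before, treated_after = 23.5, 26.9
control_before, control_after = 21.7, 21.9

# Change over time within each group
treated_change = treated_after - treated_before   # about 3.4
control_change = control_after - control_before   # about 0.2

# Difference-in-differences: treated change net of the control change
did = treated_change - control_change
print(round(did, 1))  # 3.2
```

Subtracting the control group's change removes the part of the improvement that would plausibly have happened without the quizzes, under the usual parallel-trends assumption.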
Question

Which of the following statements about the abbreviation of BLUE in the context of the OLS estimator is correct? BLUE means…
1. Best Linear Unbiased Exception.
2. Biased Linear Unconditional Estimator.
3. Best Linear Unbiased Estimator.
4. Best Linear Unconditional Estimator.
5. Binary Linear Unbiased Estimator.
Solution
1. False.
2. False.
3. True.
4. False.
5. False.
