Exam 1

  1. Question

    The daily expenses of summer tourists in Turin are analyzed. A survey of 84 tourists is conducted. It shows that the tourists spend on average 116.7 EUR. The sample variance $s^2_{n-1}$ is equal to 164.3.

    You are asked to determine a 95% confidence interval for the average daily expenses (in EUR) of a tourist. Using your computations, tell me which of the following statements is correct:


    1. The lower bound of the confidence interval is 113.959
    2. The upper bound of the confidence interval is 119.008
    3. None of the statements are correct.

    Solution

    The 95% confidence interval for the average expenses $\mu$ is given by:
    \begin{align}
    & \left[\bar{y} - 1.96\sqrt{\frac{s_{n-1}^2}{n}}, \; \bar{y} + 1.96\sqrt{\frac{s_{n-1}^2}{n}}\right] \\
    &= \left[116.7 - 1.96\sqrt{\frac{164.3}{84}}, \; 116.7 + 1.96\sqrt{\frac{164.3}{84}}\right] \\
    &= \left[113.959, \, 119.441\right].
    \end{align}


    1. True
    2. False
    3. False
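
    The interval above can be checked numerically. The following is a minimal sketch using only the numbers given in the question; the variable names are illustrative, and the normal quantile 1.96 is the one used in the solution.

```python
import math

n = 84          # sample size
mean = 116.7    # sample mean of daily expenses (EUR)
var = 164.3     # sample variance s^2_{n-1}
z = 1.96        # normal quantile for a 95% interval, as in the solution

# Half-width of the interval: z * sqrt(s^2 / n)
half_width = z * math.sqrt(var / n)
lower = mean - half_width
upper = mean + half_width

print(round(lower, 3), round(upper, 3))
```

    The lower bound matches statement 1, while the upper bound differs from the value claimed in statement 2.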

  2. Question

    Which of the following statements about the regression model $y = b_0 + b_1 x + e$ and its associated predicted values $\hat{y}_i$ are correct?


    1. On average, if $\hat{y}_i$ is greater than $\bar{y}$, then the corresponding $e_i$ is smaller than its mean
    2. On average, if $\hat{y}_i$ is smaller than $\bar{y}$, then the corresponding $e_i$ is also smaller than its mean
    3. On average, if $\hat{y}_i$ is smaller than $\bar{y}$, then the corresponding $e_i$ is greater than its mean
    4. On average, if $\hat{y}_i$ is greater than $\bar{y}$, then the corresponding $e_i$ is also greater than its mean
    5. None of the provided answers is correct.

    Solution

    We discussed this solution in class using a picture and some algebra. You can consult the solution under Task 4 here.


    1. False. We have seen that $\hat{y}$ and $e$ are orthogonal, i.e. $Cov(\hat{y}, e) = 0$.
    2. False. We have seen that $\hat{y}$ and $e$ are orthogonal, i.e. $Cov(\hat{y}, e) = 0$.
    3. False. We have seen that $\hat{y}$ and $e$ are orthogonal, i.e. $Cov(\hat{y}, e) = 0$.
    4. False. We have seen that $\hat{y}$ and $e$ are orthogonal, i.e. $Cov(\hat{y}, e) = 0$.
    5. True. We have seen that $\hat{y}$ and $e$ are orthogonal, i.e. $Cov(\hat{y}, e) = 0$.
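
    The orthogonality of fitted values and residuals can be illustrated on simulated data. This is only a sketch: the data-generating process (intercept 1, slope 2, standard normal noise) and the helper names `mean` and `cov` are assumptions made for the illustration, not part of the exam.

```python
import random

random.seed(0)
# Hypothetical data: y = 1 + 2x + noise
x = [random.gauss(0, 1) for _ in range(200)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance with the n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

# OLS estimates for a simple regression with intercept
b1 = cov(x, y) / cov(x, x)
b0 = mean(y) - b1 * mean(x)

y_hat = [b0 + b1 * xi for xi in x]          # fitted values
e = [yi - yh for yi, yh in zip(y, y_hat)]   # residuals

cov_fit_resid = cov(y_hat, e)
print(abs(cov_fit_resid) < 1e-9)
```

    Up to floating-point error the covariance is zero, so knowing whether $\hat{y}_i$ lies above or below $\bar{y}$ says nothing, on average, about the sign of $e_i$.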

  3. Question

    Which of the following statements about the regression model $y = b_0 + b_1 x + e$ and its associated residual values $e_i$ are correct?


    1. The mean of the residuals from a linear regression is zero only if we include a slope.
    2. None of the provided answers is correct.
    3. The mean of the residuals from a linear regression is always zero, regardless of whether there is an intercept or not.
    4. The mean of the residuals from a linear regression is zero only if we include an intercept.

    Solution

    The solution to this question is on slide 32/40 of lecture 5.


    1. False.
    2. False.
    3. False.
    4. True.
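
    The role of the intercept can be illustrated by fitting the same data with and without one. This is a sketch on simulated data; the true intercept 5 and slope 0.5 are assumptions chosen so that the effect is visible.

```python
import random

random.seed(1)
# Hypothetical data with a clearly nonzero intercept
x = [random.uniform(0, 10) for _ in range(100)]
y = [5.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Regression with intercept: y = b0 + b1 * x + e
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx
mean_resid_with = sum(yi - b0 - b1 * xi for xi, yi in zip(x, y)) / n

# Regression through the origin: y = b * x + e
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
mean_resid_without = sum(yi - b * xi for xi, yi in zip(x, y)) / n

print(mean_resid_with, mean_resid_without)
```

    With the intercept, the residual mean is zero up to floating-point error. Without it, nothing forces the residuals to average to zero, and in this example they do not.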

  4. Question

    Consider the following dataset and the fitted line:

    plot of chunk nonlinearplot

    1. The equation for the OLS fitted line looks like $y = b_0 + b_1 x + e$
    2. OLS cannot represent nonlinear data. As the name states, it's ordinary linear squares.
    3. None of the provided answers is correct.
    4. The equation for the OLS fitted line looks like $y = b_0 + b_1 x + b_2 x^2 + e$
    5. The equation for the OLS fitted line looks like $y = b_0 + b_1 x + b_2 x + e$

    Solution

    1. False.
    2. False.
    3. False.
    4. True.
    5. False.
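
    OLS only needs the model to be linear in the coefficients, so a squared regressor is allowed. The sketch below fits $y = b_0 + b_1 x + b_2 x^2 + e$ by solving the normal equations on simulated data. The true coefficients (1, 0.5, -2) and the helper `solve` are assumptions for the illustration; the exam's own dataset is not available here.

```python
import random

random.seed(2)
# Hypothetical curved data: y = 1 + 0.5 x - 2 x^2 + noise
x = [random.uniform(-3, 3) for _ in range(200)]
y = [1.0 + 0.5 * xi - 2.0 * xi ** 2 + random.gauss(0, 0.5) for xi in x]

# Design matrix with columns: constant, x, x^2
X = [[1.0, xi, xi ** 2] for xi in x]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

# Normal equations: (X'X) b = X'y
k = 3
XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
b0, b1, b2 = solve(XtX, Xty)

print(b0, b1, b2)
```

    The estimates land close to the coefficients used to simulate the data, which is why statement 2 is false: the curvature is captured by adding $x^2$ as a regressor.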

  5. Question

    You are analysing the determinants of pay, using 420 observations. Each of the columns displays the OLS regression associated with one of the following models, where male is a binary indicator equal to 1 if an individual is male. All models were run on the same input dataset. Which of the following statements is correct?

    plot of chunk interaction-plot
                   (1)          (2)
    (Intercept)    34.012***    22.040***
                   (0.854)      (1.205)
    education      -1.134***    0.373+
                   (0.173)      (0.191)
    maleTRUE                    10.656***
                                (0.854)
    Num.Obs.       420          420
    R2             0.094        0.340
    + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

    1. The plot corresponds to model (2)
    2. The estimate for (Intercept) in model (2) is statistically significant at least at the 5% level
    3. None of the statements is correct.

    Solution

    1. True
    2. True
    3. False
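
    Statement 2 can be checked by hand from the table: the t-statistic is the estimate divided by its standard error. The sketch below does this for model (2), assuming the usual large-sample critical value of 1.96 for a two-sided test at the 5% level; the dictionary layout is just for illustration.

```python
# (estimate, standard error) pairs read off column (2) of the table
model_2 = {
    "(Intercept)": (22.040, 1.205),
    "education": (0.373, 0.191),
    "maleTRUE": (10.656, 0.854),
}

critical_5pct = 1.96  # assumed large-sample two-sided 5% critical value

t_stats = {name: est / se for name, (est, se) in model_2.items()}
significant_5pct = {name: abs(t) > critical_5pct for name, t in t_stats.items()}

for name in model_2:
    print(name, round(t_stats[name], 2), significant_5pct[name])
```

    The intercept is far beyond the 5% threshold, consistent with its three stars, while education in model (2) falls just short, consistent with its `+` marker.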

  6. Question

    For 64 firms the number of employees $X$ and the amount of expenses for continuing education $Y$ (in EUR) were recorded. The statistical summary of the data set is given by:

                Variable X    Variable Y
    Mean        53.38         235.8
    Variance    109.86        2826.75

    The covariance between $X$ and $Y$ is equal to 481.65.

    Estimate the expected amount of money spent for continuing education by a firm with 47 employees using least squares regression. Your solution should be rounded to 2 digits. Which of the following statements are correct?


    1. The expected amount of money spent is 209.81
    2. The intercept of your regression is 1.77
    3. None of the above statements is correct.

    Solution

    First, the regression line $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ is determined. The regression coefficients are given by:
    \begin{eqnarray*}
    && \hat \beta_1 = \frac{cov(x,y)}{var(x)} = \frac{481.65}{109.86} = 4.38422, \\
    && \hat \beta_0 = \bar y - \hat \beta_1 \cdot \bar x = 235.8 - 4.38422 \cdot 53.38 = 1.77054.
    \end{eqnarray*}

    The estimated amount of money spent by a firm with 47 employees is then given by:
    \begin{eqnarray*}
    \hat y = 1.77054 + 4.38422 \cdot 47 = 207.83.
    \end{eqnarray*}


    1. False
    2. True
    3. False
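
    The computation above can be reproduced directly from the summary statistics. This is a minimal sketch with illustrative variable names; it keeps full precision for the slope, which is why the intercept agrees with 1.77054 rather than with a value computed from the rounded slope.

```python
mean_x, mean_y = 53.38, 235.8   # sample means of X and Y
var_x = 109.86                  # sample variance of X
cov_xy = 481.65                 # sample covariance of X and Y

b1 = cov_xy / var_x             # slope estimate
b0 = mean_y - b1 * mean_x       # intercept estimate
y_hat = b0 + b1 * 47            # prediction for a firm with 47 employees

print(round(b1, 5), round(b0, 2), round(y_hat, 2))
```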

  7. Question

    The following figure shows a scatterplot. Notice that you can visually estimate the standard deviation of a normally distributed random variable by dividing its range by 6. Both $y$ and $x$ are normally distributed in this example. Which of the following statements are correct?

    plot of chunk scatterplot

    1. The standard deviation of $Y$ is at least 6.
    2. For $X = 28$, $Y$ can be expected to be about 36.1.
    3. The mean of $Y$ is at least 30.
    4. The absolute value of the correlation coefficient is at most 0.8.
    5. The scatterplot is standardized.

    Solution

    1. True. The standard deviation of $Y$ is about equal to 20 and is therefore larger than 6.
    2. True. The regression line at $X = 28$ implies a value of about $Y = 36.1$.
    3. True. The mean of $Y$ is about equal to 40 and hence is larger than 30.
    4. False. A strong association between the variables is given in the scatterplot. Hence the absolute value of the correlation coefficient is close to 1 and therefore larger than 0.8.
    5. False. The scatterplot is not standardized, because $X$ and $Y$ do not both have mean 0 and variance 1.

  8. Question

    Below we summarize the results of running two models (A and B) 300 times each. Each time we generate a dataset containing $Y, X_1, X_2, e$ where $e \sim N(0,1)$ and we run the regression

    $Y = b_0 + b_1 X_1 + b_2 X_2 + e$

    Models A and B differ in how strongly $X_1$ and $X_2$ are correlated with each other. The true population parameters are $\beta_1 = 4$ and $\beta_2 = 1$.

                        Model A    Model B
    Mean $b_1$          4          3.99
    Mean $b_2$          1          1.01
    Mean SE of $b_1$    0.04       0.29
    Mean SE of $b_2$    0.04       0.29

    Which of the following statements are true? When I say below that two numbers differ significantly, I mean that they differ by several multiples, i.e. $a$ and $b$ differ if $b = n \times a$ for $n \geq 2$.


    1. The average point estimates from both models for both slope coefficients do not differ significantly.
    2. The averages of the standard error of estimates from both models for both slope coefficients do not differ significantly.
    3. Given this evidence, Model B can be described as a situation of multicollinearity.
    4. Given this evidence, under multicollinearity, OLS is biased.

    Solution

    1. True
    2. False
    3. True
    4. False
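
    The pattern in the table can be reproduced with a small simulation. This is a sketch under assumed settings, not the setup used to produce the exam's numbers: the sample size `n = 200` and the correlations 0 and 0.99 between $X_1$ and $X_2$ are hypothetical, as are the function names.

```python
import math
import random

def fit_two_regressors(x1, x2, y):
    """OLS of y on a constant, x1 and x2; returns (b1, standard error of b1)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    rss = sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(d1, d2, dy))
    sigma2 = rss / (n - 3)              # three estimated coefficients
    se_b1 = math.sqrt(sigma2 * s22 / det)
    return b1, se_b1

def simulate(rho, reps=300, n=200, seed=0):
    """Average b1 and its average SE when corr(X1, X2) is roughly rho."""
    rng = random.Random(seed)
    b1_sum, se_sum = 0.0, 0.0
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x1]
        y = [4.0 * a + 1.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        b1, se_b1 = fit_two_regressors(x1, x2, y)
        b1_sum += b1
        se_sum += se_b1
    return b1_sum / reps, se_sum / reps

mean_b1_low, mean_se_low = simulate(rho=0.0)
mean_b1_high, mean_se_high = simulate(rho=0.99)
print(mean_b1_low, mean_se_low)
print(mean_b1_high, mean_se_high)
```

    In both settings the average of $b_1$ stays close to the true value 4, so OLS remains unbiased. Only the standard errors grow several-fold when the regressors are highly correlated, which is the signature of multicollinearity.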

  9. Question

    The university introduces mandatory Moodle quizzes in 2024 for one course (treated), while another similar course (control) continues without them. The table below reports the average exam scores recorded in each of the cases:

    $$
    \begin{array}{lcc}
    \textbf{Group} & \textbf{2023 (Before)} & \textbf{2024 (After)} \\
    \hline
    \text{Treated (with Moodle)} & 23.5 & 26.9 \\
    \text{Control (no Moodle)} & 21.7 & 21.9 \\
    \hline
    \end{array}
    $$

    Compute the Difference-in-Differences (DiD) estimate of the effect of introducing Moodle quizzes. Which of the following answers is correct?


    1. The DiD estimate is equal to: -3.6
    2. The DiD estimate is equal to: 3.2
    3. The DiD estimate is equal to: 3.6
    4. None of the above statements is correct.

    Solution

    1. False
    2. True
    3. False
    4. False
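
    The DiD estimate is the change in the treated group minus the change in the control group. A minimal sketch with the four averages from the table (variable names are illustrative):

```python
treated_before, treated_after = 23.5, 26.9   # course with Moodle quizzes
control_before, control_after = 21.7, 21.9   # course without Moodle quizzes

change_treated = treated_after - treated_before   # 26.9 - 23.5
change_control = control_after - control_before   # 21.9 - 21.7

did = change_treated - change_control
print(round(did, 1))
```

    The treated course improves by 3.4 points and the control course by 0.2 points, so the estimate is 3.2.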

  10. Question

    Which of the following statements about the abbreviation BLUE in the context of the OLS estimator is correct? BLUE means…


    1. Best Linear Unbiased Exception.
    2. Biased Linear Unconditional Estimator.
    3. Best Linear Unbiased Estimator.
    4. Best Linear Unconditional Estimator.
    5. Binary Linear Unbiased Estimator.

    Solution

    1. False.
    2. False.
    3. True.
    4. False.
    5. False.