Statistics Review I

class: center, middle, inverse, title-slide

# Statistics Review I
## EC 320: Introduction to Econometrics
### Winter 2022

---

class: inverse, middle

# Prologue

---
# Housekeeping

.small[
.hi-pink[Office hours]
  - Kyu's: T 1500-1600 & R 1400-1500
  - Mine : MW 1400-1500

.hi-pink[Lab today] at 4 p.m. MCK 442

.hi-pink[Exercise 1] this Friday by 11:59 p.m. 
  - Just this once. Later exercises are due on Wednesday, not Friday.
  - No need to worry. The assigned exercises shouldn't take longer than the lab time. If you attend the lab, you'll be able to complete the exercise on the spot.
  - **[SUBMIT FORMAT]** Please have your work knitted in **html** format.

.hi-pink[Problem Set 1] will be posted by the end of this week and is due next Friday at 11:59 p.m.
 
.hi-pink[Issues with .mono[R]?]
- Talk to me after class.
- I have office hours today right after class (14:00-15:00).
]

---
# Motivation

The focus of our course is __regression analysis__, a useful toolkit for learning from data.

To understand regression, its mechanics, and its pitfalls, __we need to understand the underlying statistical theory.__

- Insights from theory can help us become better practitioners and savvier consumers of science.

Today, we will review important concepts you learned in Math 243.

- Maybe some you missed, too.

---
class: inverse, middle

# A Brief Math Review

---
# Notation

__Data__ on a variable `$X$` __are__<sup>*</sup> a sequence of `$n$` observations, indexed by `$i$`: `$$\{x_i: i = 1, \dots, n \}.$$`

.pull-left[

<div id="htmlwidget-8513e4377bf99cb83a41" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-8513e4377bf99cb83a41">{"x":{"filter":"none","vertical":false,"caption":"<caption>Example: \$n = 5\$<\/caption>","data":[[1,2,3,4,5],[8,9,4,7,2]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th>\$i\$<\/th>\n      <th>\$x_i\$<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"dom":"t","ordering":false,"columnDefs":[{"className":"dt-center","targets":"_all"}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>

]

.pull-right[

- `$i$` indicates the row number.

- `$n$` is the number of rows.

- `$x_i$` is the value of `$X$` for row `$i$`.

]

.footnote[
<sup>*</sup> _Data_ .mono[=] __plural__ of _datum_.
]

---
# Summation

The __summation operator__ adds a sequence of numbers over an index:

`$$\sum_{i=1}^{n} x_i \equiv x_1 + x_2 + \dots + x_n.$$`

- "The sum of `$x_i$` from 1 to `$n$`."

.pull-left[

<div id="htmlwidget-60a130a718c5cb49aa43" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-60a130a718c5cb49aa43">{"x":{"filter":"none","vertical":false,"caption":"<caption>Example: \$n = 4\$<\/caption>","data":[[1,2,3,4],[7,4,10,2]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th>\$i\$<\/th>\n      <th>\$x_i\$<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"dom":"t","ordering":false,"columnDefs":[{"className":"dt-center","targets":"_all"}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>

]

.pull-right[

$$
`\begin{aligned}
 \sum_{i=1}^{4} x_i &= 7 + 4 + 10 + 2 \\
               &= 23
\end{aligned}`
$$

]
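
A minimal Python sketch of the same calculation, using the values from the example table. The variable names are illustrative choices, not part of the course materials.

```python
# Values x_1, ..., x_4 from the n = 4 example table.
x = [7, 4, 10, 2]

# Explicit loop that mirrors the definition x_1 + x_2 + ... + x_n.
total = 0
for x_i in x:
    total += x_i

# The built-in sum() performs the same accumulation.
assert total == sum(x)
print(total)  # 23
```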

---
# Summation

## Rule 1

For any constant `$c$`, `$$\sum_{i=1}^{n} c = nc.$$`

.pull-left[

<div id="htmlwidget-1ca579f9e3e6722f381e" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-1ca579f9e3e6722f381e">{"x":{"filter":"none","vertical":false,"caption":"<caption>Example: \$n = 4\$<\/caption>","data":[[1,2,3,4],[2,2,2,2]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th>\$i\$<\/th>\n      <th>\$c\$<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"dom":"t","ordering":false,"columnDefs":[{"className":"dt-center","targets":"_all"}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>

]

.pull-right[

$$
`\begin{aligned}
 \sum_{i=1}^{4} 2 &= 4 \times 2 \\
                  &= 8
\end{aligned}`
$$

]
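
Rule 1 can be checked numerically. The short Python sketch below uses the slide's values, `$n = 4$` and `$c = 2$`; the names `lhs` and `rhs` are just labels for the two sides of the rule.

```python
n, c = 4, 2

# Left-hand side: add the constant c to itself n times.
lhs = sum(c for _ in range(n))

# Right-hand side: n times c.
rhs = n * c

print(lhs, rhs)  # 8 8
```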

---
# Summation

## Rule 2

For any constant `$c$`, `$$\sum_{i=1}^{n} cx_i = c \sum_{i=1}^{n} x_i.$$`

.pull-left[

<div id="htmlwidget-2a3ad0e91ba064ce3643" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-2a3ad0e91ba064ce3643">{"x":{"filter":"none","vertical":false,"caption":"<caption>Example: \$n = 3\$<\/caption>","data":[[1,2,3],[2,2,2],[7,4,10]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th>\$i\$<\/th>\n      <th>\$c\$<\/th>\n      <th>\$x_i\$<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"dom":"t","ordering":false,"columnDefs":[{"className":"dt-center","targets":"_all"}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>

]

.pull-right[

$$
`\begin{aligned}
 \sum_{i=1}^{3} 2x_i &= 2\times7 + 2\times4 + 2 \times10\\
               &= 14 + 8 + 20\\
               &= 42
\end{aligned}`
$$

$$
`\begin{aligned}
 2 \sum_{i=1}^{3} x_i &= 2(7 + 4 + 10) \\
               &= 42
\end{aligned}`
$$

]
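
As a quick check of Rule 2, this Python sketch computes both sides with the slide's values (`$c = 2$` and `$x = 7, 4, 10$`). The variable names are illustrative.

```python
c = 2
x = [7, 4, 10]

# Left-hand side: multiply each x_i by c, then add up.
lhs = sum(c * x_i for x_i in x)

# Right-hand side: add up the x_i first, then multiply by c.
rhs = c * sum(x)

print(lhs, rhs)  # 42 42
```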

---
# Summation

## Rule 3

If `$\{(x_i, y_i): i = 1, \dots, n \}$` is a set of `$n$` pairs, and `$a$` and `$b$` are constants, then `$$\sum_{i=1}^{n} (ax_i + by_i) = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} y_i.$$`

.pull-left[

<div id="htmlwidget-9bf4504ac15a8f7ccd71" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-9bf4504ac15a8f7ccd71">{"x":{"filter":"none","vertical":false,"caption":"<caption>Example: \$n = 2\$<\/caption>","data":[[1,2],[2,2],[7,4],[1,1],[4,2]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th>\$i\$<\/th>\n      <th>\$a\$<\/th>\n      <th>\$x_i\$<\/th>\n      <th>\$b\$<\/th>\n      <th>\$y_i\$<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"dom":"t","ordering":false,"columnDefs":[{"className":"dt-center","targets":"_all"}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>

]

.pull-right[

$$
`\begin{aligned}
 \sum_{i=1}^{2} (2x_i + y_i) &= 18 + 10 \\
               &= 28
\end{aligned}`
$$

$$
`\begin{aligned}
 2 \sum_{i=1}^{2} x_i + \sum_{i=1}^{2} y_i &= 2 \times 11 + 6 \\
               &= 28
\end{aligned}`
$$

]
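
Rule 3 can be verified the same way. This Python sketch takes the values from the example table (`$a = 2$`, `$b = 1$`, and two `$(x_i, y_i)$` pairs); the names are illustrative.

```python
a, b = 2, 1
x = [7, 4]
y = [4, 2]

# Left-hand side: combine each pair first, then add up.
lhs = sum(a * x_i + b * y_i for x_i, y_i in zip(x, y))

# Right-hand side: add up each variable separately, then combine.
rhs = a * sum(x) + b * sum(y)

print(lhs, rhs)  # 28 28
```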

---
# Summation

## __Caution__

The .hi-purple[sum of the ratios] __is not__ the .hi-green[ratio of the sums]: `$$\color{#9370DB}{\sum_{i=1}^{n} x_i / y_i} \neq \color{#007935}{\left(\sum_{i=1}^{n} x_i \right) \Bigg/ \left(\sum_{i=1}^{n} y_i \right)}.$$`

- If `$n = 2$`, then in general `$\frac{x_1}{y_1} + \frac{x_2}{y_2} \neq \frac{x_1 + x_2}{y_1 + y_2}$`.

The .hi-purple[sum of squares] __is not__ the .hi-green[square of the sum]: `$$\color{#9370DB}{\sum_{i=1}^{n} x_i^2} \neq \color{#007935}{\left(\sum_{i=1}^{n} x_i \right)^2}.$$`

- If `$n = 2$`, then in general `$x_1^2 + x_2^2 \neq (x_1 + x_2)^2 = x_1^2 + 2x_1x_2 + x_2^2$`.
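
Both cautions are easy to confirm with numbers. The values of `x` and `y` in this Python sketch are arbitrary, chosen only for illustration.

```python
# Arbitrary illustrative values with n = 2.
x = [7, 4]
y = [4, 2]

# Sum of the ratios versus ratio of the sums.
sum_of_ratios = sum(x_i / y_i for x_i, y_i in zip(x, y))  # 7/4 + 4/2
ratio_of_sums = sum(x) / sum(y)                            # 11/6

# Sum of squares versus square of the sum.
sum_of_squares = sum(x_i ** 2 for x_i in x)  # 49 + 16
square_of_sum = sum(x) ** 2                  # 11 squared

print(sum_of_ratios, ratio_of_sums)
print(sum_of_squares, square_of_sum)  # 65 121
```

The gap between the last two numbers is the cross term `$2x_1x_2 = 56$`.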

---
class: inverse, middle

# Probability Review

---
# Random Variables

__Experiment:__ Any procedure that is _infinitely repeatable_ and has a _well-defined set of outcomes_.

__Sample space:__ The set of all possible outcomes an experiment could generate.

__Event:__ A subset of the .purple[sample space], _i.e._, a combination of outcomes.

__Random Variable:__ A variable with _numerical values determined by an experiment or a random phenomenon_. It can also be viewed as a function that maps elements of the sample space to another set (often the set of real numbers).

---

# Random Variables

.pull-left[
Example 1:

__Experiment:__ Roll two six-sided dice

__Sample space:__ {11, 12,  ... ,  21, 22, ... , 66}

__Event:__ The sum of the two top numbers equals 10, which corresponds to the set of outcomes {46, 55, 64}

__Random variable:__ The sum of the two top numbers, which takes values in {2, 3, ..., 12}
]

.pull-right[
Example 2:

__Experiment:__ Toss a coin twice

__Sample space:__ {HH, HT, TH, TT}

__Event:__ The first toss lands heads, {HH, HT}

__Random variable:__ The number of heads, {0, 1, 2}
]
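
The dice example can be enumerated directly. This Python sketch is one way to do it: outcomes are stored as ordered pairs such as `(4, 6)` rather than the shorthand 46, and the function name `dice_sum` is an illustrative choice.

```python
from itertools import product

# Sample space: every ordered pair (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))

# Random variable: a function that maps each outcome to a number.
def dice_sum(outcome):
    return outcome[0] + outcome[1]

# Event: the subset of outcomes whose sum equals 10.
event_sum_10 = [o for o in sample_space if dice_sum(o) == 10]

# Set of values the random variable can take.
values = sorted({dice_sum(o) for o in sample_space})

print(len(sample_space))  # 36
print(event_sum_10)       # [(4, 6), (5, 5), (6, 4)]
print(values)             # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```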

---
# Random Variables

__Notation:__ capital letters for random variables (_e.g._, `$X$`, `$Y$`, or `$Z$`) and lowercase letters for particular outcomes (_e.g._, `$x$`, `$y$`, or `$z$`).

__Example 1:__ Flipping a coin.

- Two outcomes: heads or tails.
- Quantify the outcomes: Define a random variable `$\text{Heads}$` such that `$\text{Heads}=1$` if heads and `$\text{Heads}=0$` if tails.

__Example 2:__ Flipping a coin 10 times.

- Several outcomes: 10 heads and 0 tails, 9 heads and 1 tail, 8 heads and 2 tails, _etc_.
- The number of heads is a random variable: `$\{\text{Heads}: 0,1,2,3,4,5,6,7,8,9,10\}.$`

---
# Discrete Random Variables

__Discrete Random Variable:__ A random variable that takes a countable set of values.

A __Bernoulli__ (or binary) random variable takes values of either 1 or 0.

- Characterized by `$\mathop{\mathbb{P}}(X=1)$`, "the probability of success."

- Probabilities sum to 1: `$\mathop{\mathbb{P}}(X=1) + \mathop{\mathbb{P}}(X=0) = 1$`.

- For a "fair" coin, `$\mathop{\mathbb{P}}(\text{Heads}=1)=\frac{1}{2} \implies \mathop{\mathbb{P}}(\text{Heads}=0)=\frac{1}{2}$`.

- More generally, if `$\mathop{\mathbb{P}}(X=1) = \theta$` for some `$\theta \in [0,1]$`, then `$\mathop{\mathbb{P}}(X=0) = 1 - \theta$`.

- If the probability of passing this class is 75%, then the probability of not passing is 25%.
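
To see a Bernoulli random variable in action, the Python sketch below simulates many draws with `$\theta = 0.75$`, the pass probability from the last bullet. The seed, the number of draws, and the helper name `bernoulli_draw` are arbitrary choices made for illustration.

```python
import random

theta = 0.75              # P(X = 1)
rng = random.Random(320)  # arbitrary seed so the run is reproducible

def bernoulli_draw(theta, rng):
    """Return 1 with probability theta and 0 otherwise."""
    return 1 if rng.random() < theta else 0

draws = [bernoulli_draw(theta, rng) for _ in range(100_000)]

# Shares of ones and zeros in the simulated sample.
share_ones = sum(draws) / len(draws)
share_zeros = 1 - share_ones

print(share_ones, share_zeros)
```

With this many draws, the share of ones should land close to 0.75 and the share of zeros close to 0.25.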

---
# Discrete Random Variables

## Probabilities

We describe a discrete random variable by listing its possible values with associated probabilities.

If `$X$` takes on `$k$` possible values `$\{x_1, \dots, x_k\}$`, then the probabilities `$p_1, p_2, \dots, p_k$` are defined by `$$p_j = \mathop{\mathbb{P}}(X=x_j), \quad j = 1,2, \dots, k,$$` where `$$p_j \in [0,1]$$` and `$$p_1 + p_2 + \dots + p_k = 1.$$`
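
The two conditions on `$p_1, \dots, p_k$` can be written as a small check. This Python sketch uses a hypothetical helper, `is_valid_pmf`, and two made-up examples (a fair six-sided die and a list that fails the conditions). Exact fractions avoid floating-point rounding.

```python
from fractions import Fraction

def is_valid_pmf(probs):
    """Return True if every p_j lies in [0, 1] and the p_j sum to 1."""
    p = list(probs.values())
    return all(0 <= p_j <= 1 for p_j in p) and sum(p) == 1

# Hypothetical examples: keys are values x_j, entries are p_j.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
not_valid = {0: Fraction(1, 2), 1: Fraction(3, 4)}  # sums to 5/4

print(is_valid_pmf(fair_die))   # True
print(is_valid_pmf(not_valid))  # False
```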

---
# Discrete Random Variables

## Probability density function

The __probability density function__ (__pdf__) of `$X$` summarizes possible outcomes and associated probabilities: `$$f(x_j)=p_j, \quad j=1,2,\dots,k.$$`

## Example

2020 Presidential election: 538 electoral votes at stake.

- `$\{X:0,1, \dots, 538\}$` is the number of electoral votes won by the Democratic candidate. 
- Extremely unlikely that the candidate will win 0 votes or all 538 votes: `$f(0) \approx 0$` and `$f(538) \approx 0$`.
- Nonzero probability of winning exactly 270 votes, the smallest possible majority: `$f(270) > 0$`.

---
# Discrete Random Variables

## Example

Basketball player goes to the foul line to shoot two free throws.

- `$X$` is the number of shots made (either 0, 1, or 2).

- Suppose the pdf of `$X$` is `$f(0)= 0.3$`, `$f(1) = 0.4$`, `$f(2) = 0.3$`.

- __Note:__ the probabilities sum to 1.

Use the pdf to calculate the probability of the .hi-green[event] that the player makes _at least one shot_, _i.e._, `$\mathop{\mathbb{P}}(X \geq 1)$`.

- `$\mathop{\mathbb{P}}(X \geq 1) = \mathop{\mathbb{P}}(X=1) + \mathop{\mathbb{P}}(X=2)= 0.4 + 0.3 = 0.7$`.
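
The same calculation as a Python sketch, storing the pdf as a dictionary. The names `f`, `p_at_least_one`, and `p_complement` are illustrative, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# pdf from the slide: f(0) = 0.3, f(1) = 0.4, f(2) = 0.3.
f = {0: Fraction(3, 10), 1: Fraction(4, 10), 2: Fraction(3, 10)}

# P(X >= 1): add the probabilities of every value in the event.
p_at_least_one = sum(p for x, p in f.items() if x >= 1)

# Equivalent route through the complement: 1 - P(X = 0).
p_complement = 1 - f[0]

print(p_at_least_one)  # 7/10
print(p_complement)    # 7/10
```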