class: center, inverse, middle

<style type="text/css">
.pull-left-wide {
  float: left;
  width: 66%;
}
.pull-right-wide {
  float: right;
  width: 66%;
}
.pull-right-wide ~ p {
  clear: both;
}

.pull-left-narrow {
  float: left;
  width: 30%;
}
.pull-right-narrow {
  float: right;
  width: 30%;
}

.tiny123 {
  font-size: 0.40em;
}

.small123 {
  font-size: 0.80em;
}

.large123 {
  font-size: 2em;
}

.red {
  color: red
}

.orange {
  color: orange
}

.green {
  color: green
}
</style>

# Statistics
## Random variables
### (Chapter 4)

### Christian Vedel,<br>Department of Economics<br>University of Southern Denmark

### Email: [christian-vs@sam.sdu.dk](mailto:christian-vs@sam.sdu.dk)

### Updated 2026-02-23

.footnote[
.left[
.small123[
*Please beware. I work on these slides until the last minute before the lecture and push most changes along the way. Until the actual lecture, this is just a draft*
]
]
]

---
# This lecture

.pull-left-wide[

- Random variables: definition and types
- Discrete random variables:
  - probability and cumulative distribution functions
  - joint, marginal, and conditional probabilities
  - Bayes' theorem and independence
- Continuous random variables:
  - cumulative distribution and density functions
  - relationships between two continuous random variables
- Practice tasks throughout

]

---
# From events to random variables

.pull-left-wide[
- Last time: outcomes `$\omega \in \Omega$`, events `$A \subseteq \Omega$`, and probabilities `$P(A)$`.

- In applications we usually care about a *numerical* quantity (wage, price, number of children, etc.).

- A **random variable** assigns a number to each outcome: `$X:\Omega \to \mathbb{R}$`.

- Then we ask questions like:
  - discrete: `$P(X = x)$`
  - continuous: `$P(a < X \le b)$`
]

---
class: inverse, middle, center
# Definition of a random variable

---
# Random variable

.pull-left-wide[
- Start with a probability model: outcomes `$\omega \in \Omega$` and probabilities `$P(\cdot)$`.

- A random variable simply *labels each outcome with a number*.

> A **random variable** is a function `$X:\Omega\to\mathbb{R}$`.

- Examples:
  - coin toss (`$\Omega=\{H,T\}$`): `$$X(H)=0,\quad X(T)=1$$`
  - die roll (`$\Omega=\{1,2,3,4,5,6\}$`): `$$X(k)=k \text{ for } k=1,\dots,6$$`
]
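
---
# Random variable: Code sketch

A random variable is just a function from outcomes to numbers, so the two examples above can be written as ordinary Python functions. This is a minimal sketch; the names `x_coin` and `x_die` are illustrative and not part of the lecture notation.

```python
# Sample spaces from the slide.
coin_outcomes = ["H", "T"]
die_outcomes = [1, 2, 3, 4, 5, 6]


def x_coin(outcome):
    """Coin toss coding from the slide: X(H) = 0, X(T) = 1."""
    return {"H": 0, "T": 1}[outcome]


def x_die(outcome):
    """Die roll coding from the slide: X(k) = k."""
    return outcome


coin_values = [x_coin(w) for w in coin_outcomes]
die_values = [x_die(w) for w in die_outcomes]
print(coin_values)  # [0, 1]
print(die_values)   # [1, 2, 3, 4, 5, 6]
```

Any other coding (say `X(H) = 1`, `X(T) = 0`) would be an equally valid random variable; only the labels change.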

---
# Random variable

.pull-left-wide[
- As a general rule, we use letters at the end of the alphabet:
  - uppercase to denote random variables (e.g., `$X$`, `$Y$`)
  - lowercase for the specific values a random variable can take (e.g., `$x$`, `$y$`)
]

.pull-left-wide[
- Note that we can choose the values assigned to the various outcomes, but sometimes certain values may be more suited to the problem at hand
]

.pull-left-wide[
- For example, suppose that the experiment is "choose a person at random, ask her age"
  - outcomes: values of "age of the person" (say, 43 years)
  - random variable: assign a number to the outcome "age of person = 43 years"
  - "natural" value: 43
]

---
# Types of random variables

.pull-left-wide[
- There are two types of random variables based on the number of values they can take:
]

.pull-left-wide[
  - **discrete random variables** can take a *countable* (finite or infinite) number of values
    - examples: toss of a coin, roll of a die, pick an integer
    - if an experiment has a finite number of outcomes, then the corresponding random variable is *necessarily* discrete (it cannot take more values than there are outcomes)
]

.pull-left-wide[
  - **continuous random variables** take an *uncountable* number of values
    - examples: exact quantity of rainfall over a year, exact profit corresponding to a share (profit divided by number of shares)
    - note that the same experiment can produce a discrete or a continuous random variable, depending on how "accurately" the outcome is measured
]

---
# .red[Practice: Random variables and types]

.pull-left-wide[

1. Define a random variable `$X$` for the experiment "draw one card from a standard deck".
2. Is your `$X$` discrete or continuous? Explain in one sentence.
3. Give one alternative coding of the same experiment and explain why it is still a valid random variable.

]

---
class: middle
# Why this can feel confusing

.pull-left-wide[
- We introduce *two* functions that both output numbers — so it’s easy to mix them up:

- **Probability measure:** `$P:\mathcal{F}\to[0,1]$`  
    input = an **event** `$A\subseteq\Omega$` (a set of outcomes)  
    output = a **probability**

- **Random variable:** `$X:\Omega\to\mathbb{R}$`  
    input = an **outcome** `$\omega\in\Omega$`  
    output = a **numerical value** (wage, indicator, count, etc.)

> *Probability measures* are the functions that assign probabilities to *events*, while *random variables* are just a way of describing those events, **which we then assign probabilities to using the probability measure.**

]
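
---
# Why this can feel confusing: Code sketch

The difference in inputs can be made concrete in code. The sketch below assumes one roll of a fair die (equally likely outcomes, an assumption made for illustration): `prob` takes a *set* of outcomes, while `x` takes a *single* outcome. The function names are hypothetical.

```python
from fractions import Fraction

OMEGA = frozenset({1, 2, 3, 4, 5, 6})  # outcomes of one fair die roll


def prob(event):
    """Probability measure P: the input is an event (a set of outcomes)."""
    event = frozenset(event)
    if not event <= OMEGA:
        raise ValueError("an event must be a subset of Omega")
    return Fraction(len(event), len(OMEGA))


def x(outcome):
    """Random variable X: the input is a single outcome, the output a number."""
    return outcome


# "X <= 4" describes an event: the set of outcomes w with X(w) <= 4.
event_x_le_4 = {w for w in OMEGA if x(w) <= 4}
p_x_le_4 = prob(event_x_le_4)
print(sorted(event_x_le_4), p_x_le_4)  # [1, 2, 3, 4] 2/3
```

So `$P(X \le 4)$` is shorthand: `$X$` picks out the event, and `$P$` assigns it a probability.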

---
# Overview of concepts thus far

### Concept hierarchy (Lectures 02–03)

.pull-left[
**World / research question** `$\rightarrow$` **uncertainty**

**Population / super-population**
  - we observe a **sample** via a **selection mechanism**
  - (population + selection + sample) = **experiment**

**Probability model** `$(\Omega,\mathcal{F},P)$`
  - outcomes `$\omega \in \Omega$`
  - events `$A \subseteq \Omega$`, with `$A \in \mathcal{F}$`
  - probabilities `$P(A)$`
]

.pull-right[
**Derived tools**
  - conditional probability `$P(A\mid B)$`
  - independence `$A \perp B$`
  - conditional independence `$A \perp B \mid C$`

.red[
**Random variables**: turn outcomes into numbers
  - `$X:\Omega\to\mathbb{R}$`
  - `$P(X\le x)=P(\{\omega: X(\omega)\le x\})$`
  - discrete: `$P(X=x)$`, continuous: `$P(a<X\le b)$`
]

]

---
class: inverse, middle, center
# Discrete random variables

---
# Probability function

.pull-left-wide[
- One advantage of using random variables is that now we work with numbers instead of abstract events
]

.pull-left-wide[
- So, we can use mathematical concepts built for numbers, such as functions
]

.pull-left-wide[
> The **probability function** `$f(\cdot)$` of a discrete random variable `$X$` is defined as:
> `$$f(x) = P(X = x)$$`
]

.pull-left-wide[
- When working with several random variables, say `$X$` and `$Y$`, it is sometimes useful to distinguish their probability functions by indexing them: `$f_X(x)$` and `$f_Y(y)$`
]

---
# Probability function

.pull-left-wide[
- Examples:
  - in the experiment of tossing a coin, we would have:
`$$f(0) = P(X = 0) = P(\text{"heads"}) = \frac{1}{2}$$`
`$$f(1) = P(X = 1) = P(\text{"tails"}) = \frac{1}{2}$$`
]

.pull-left-wide[
> **Properties of the probability function:**
>
> (i) `$0 \leq f(x) \leq 1$` for all `$x$`
>
> (ii) `$\displaystyle\sum_{i=1}^N f(x_i) = f(x_1) + f(x_2) + \ldots + f(x_N) = 1$`, where `$x_1, x_2, \ldots, x_N$` are the possible values of `$X$`
]
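
---
# Probability function: Code sketch

Properties (i) and (ii) are easy to check mechanically when the probability function is stored as a table. The sketch below uses exact fractions and assumes a fair coin and a fair die; the helper name `is_valid_probability_function` is made up for this example.

```python
from fractions import Fraction

# Probability function of a fair six-sided die: f(x) = 1/6 for x = 1, ..., 6.
f_die = {x: Fraction(1, 6) for x in range(1, 7)}


def is_valid_probability_function(f):
    """Check properties (i) and (ii) for a dict mapping values to probabilities."""
    in_unit_interval = all(0 <= p <= 1 for p in f.values())
    sums_to_one = sum(f.values()) == 1
    return in_unit_interval and sums_to_one


coin_ok = is_valid_probability_function({0: Fraction(1, 2), 1: Fraction(1, 2)})
die_ok = is_valid_probability_function(f_die)
# A table whose entries sum to 5/6 violates property (ii).
bad_ok = is_valid_probability_function({0: Fraction(1, 2), 1: Fraction(1, 3)})
print(coin_ok, die_ok, bad_ok)  # True True False
```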

---
# Cumulative distribution function

.pull-left-wide[
> The **cumulative distribution function** `$F(\cdot)$` of a discrete random variable `$X$` is defined as:
> `$$F(x) = P(X \leq x) = \sum_{x_i \leq x} f(x_i)$$`
]

.pull-left-wide[
- Note that, by construction, the cumulative distribution function is *non-decreasing*: it never falls as `$x$` grows, and it stays flat between the possible values of `$X$`
]

.pull-left-wide[
- Example:
  - in the experiment of rolling a die, we would have:
`$$f(4) = P(X = 4) = P(\text{"die shows 4"}) = \frac{1}{6}$$`
`$$F(4) = P(X \leq 4) = P(\text{"die shows at most 4"})$$`
`$$= P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = \frac{4}{6} = \frac{2}{3}$$`
]

---
# Cumulative distribution function

.center[
![Discrete CDF](Figures/4. Discrete CDF.png)
]

---
# Cumulative distribution function

.pull-left-wide[
- The cumulative distribution function also allows us to calculate the probability that `$X$` will take on a value in a given range:
]

.pull-left-wide[
`$$P(a < X \leq b) = F(b) - F(a)$$`
]

.pull-left-wide[
and:

`$$P(a \leq X \leq b) = F(b) - F(a) + f(a)$$`
]
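
---
# Cumulative distribution function: Code sketch

The two range formulas can be checked on the fair die. This is a sketch under the assumption `$f(x) = 1/6$`; the helper names are illustrative.

```python
from fractions import Fraction

f = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die


def F(x):
    """CDF: sum f(x_i) over all possible values x_i <= x."""
    return sum((p for xi, p in f.items() if xi <= x), Fraction(0))


def prob_open_closed(a, b):
    """P(a < X <= b) = F(b) - F(a)."""
    return F(b) - F(a)


def prob_closed_closed(a, b):
    """P(a <= X <= b) = F(b) - F(a) + f(a)."""
    return F(b) - F(a) + f.get(a, Fraction(0))


print(F(4))                      # 2/3
print(prob_open_closed(1, 3))    # 1/3
print(prob_closed_closed(1, 3))  # 1/2
```

The two results differ by exactly `$f(1) = 1/6$`, the probability mass sitting on the left endpoint.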

---
# .red[Practice: Discrete probability and CDF]

.pull-left-wide[

Suppose `$X$` is the outcome of a fair six-sided die.

1. Compute `$f(3)$`.
2. Compute `$F(4)$`.
3. Compute `$P(2 < X \leq 5)$` using the CDF.
4. Compute `$P(2 \leq X \leq 5)$` and explain the difference from question 3.

]

---
# Probability functions and relative frequencies

.pull-left-wide[
- Suppose the frequency of a particular outcome `$z$` in the population is given by the function `$g(z)$` (e.g., in an urn including 3 red balls and 5 blue balls, red balls have a relative frequency of "3 in 8" and blue balls of "5 in 8")
]

.pull-left-wide[
- Let `$Z$` be a random variable whose values indicate the outcome in this population (i.e., the color of the ball)
]

.pull-left-wide[
- If all elements in the population have an equal chance of being selected, then the probability function of `$Z$` is:
`$$f(z) = g(z)$$`
]

.pull-left-wide[
- In our example, if `$Z = 1$` indicates "red ball" and `$Z = 2$` indicates "blue ball," then:
`$$f(1) = \frac{3}{8} \qquad f(2) = \frac{5}{8}$$`
]

---
class: inverse, middle, center
# Relationships between discrete random variables

---
# Relationships between random variables

.pull-left-wide[
- Sometimes, we are interested in studying the relationship between two types of events
]

.pull-left-wide[
- For example, suppose that a bank wants to assess the risk of bankruptcy of a company asking for a loan
]

.pull-left-wide[
- The bank knows that this will depend on the state of the economy: booming or in recession
]

.pull-left-wide[
- Hence, the bank would like to know how risky the loan is as a function of:
  - how likely the company is to go bankrupt
  - how likely the economy is to be in recession
]

---
# Joint probability

.pull-left-wide[
> The **joint probability function** `$f(\cdot, \cdot)$` for two discrete random variables `$X$` and `$Y$` is defined as:
> `$$f(x, y) = P(X = x \text{ and } Y = y)$$`
]

.pull-left-wide[
- Example:

| | `$X = 0$` (bankrupt) | `$X = 1$` (not bankrupt) |
|---|:---:|:---:|
| `$Y = 0$` (recession) | 0.2 | 0.2 |
| `$Y = 1$` (boom) | 0.1 | 0.5 |
]

---
# Joint probability

.pull-left-wide[
> **Properties of the joint probability function:**
>
> 1. `$0 \leq f(x, y) \leq 1$` for all `$x, y$`
>
> 2. `$\displaystyle \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} f(x_i, y_j) = f(x_1, y_1) + f(x_1, y_2) + \ldots + f(x_{N_x}, y_{N_y}) = 1$`
>
> where `$x_1, x_2, \ldots, x_{N_x}$` are all the possible values of `$X$`, and `$y_1, y_2, \ldots, y_{N_y}$` are all the possible values of `$Y$`
]

.pull-left-wide[
- In our example, what we need (for the second property) is:
`$$0.2 + 0.2 + 0.1 + 0.5 = 1$$`
]

---
# Marginal probability

.pull-left-wide[
> The **marginal probability function** `$f_X(\cdot)$` of a discrete random variable `$X$` is defined as:
> `$$f_X(x) = \sum_{j=1}^{N_y} f(x, y_j) = f(x, y_1) + f(x, y_2) + \ldots + f(x, y_{N_y})$$`
]

---
# Marginal probability

.pull-left-wide[
- In other words, the marginal probability function of `$X$` is the column sum of probabilities (and the marginal probability function of `$Y$` is the row sum):

| | `$X = 0$` (bankrupt) | `$X = 1$` (not bankrupt) | `$f_Y(\cdot)$` |
|---|:---:|:---:|:---:|
| `$Y = 0$` (recession) | 0.2 | 0.2 | 0.4 |
| `$Y = 1$` (boom) | 0.1 | 0.5 | 0.6 |
| `$f_X(\cdot)$` | 0.3 | 0.7 | |
]
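
---
# Marginal probability: Code sketch

The row and column sums in the table can be reproduced directly from the joint probability function. The sketch stores the bank example as a dictionary keyed by `(x, y)` and uses exact fractions to avoid rounding noise; the function name `marginal` is illustrative.

```python
from fractions import Fraction

# Joint probability function from the bank example, keyed by (x, y).
# x = 0 bankrupt, x = 1 not bankrupt; y = 0 recession, y = 1 boom.
joint = {
    (0, 0): Fraction("0.2"), (1, 0): Fraction("0.2"),
    (0, 1): Fraction("0.1"), (1, 1): Fraction("0.5"),
}


def marginal(joint, axis):
    """Sum the joint over the other variable (axis 0 gives f_X, axis 1 gives f_Y)."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], Fraction(0)) + p
    return out


f_X = marginal(joint, 0)
f_Y = marginal(joint, 1)
print({x: float(p) for x, p in f_X.items()})  # {0: 0.3, 1: 0.7}
print({y: float(p) for y, p in f_Y.items()})  # {0: 0.4, 1: 0.6}
```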

---
# Conditional probability

.pull-left-wide[
- Recall the definition of the conditional probability:
`$$P(A | B) = \frac{P(A \cap B)}{P(B)}$$`
]

.pull-left-wide[
- Now suppose that the event `$A$` is represented by `$X = x$` and the event `$B$` by `$Y = y$`
]

.pull-left-wide[
- We can then write the conditional probability as:
`$$P(X = x | Y = y) = \frac{P(X = x \text{ and } Y = y)}{P(Y = y)}$$`
]

.pull-left-wide[
- But note that, by definition:
  - the numerator is the joint probability function `$f(x, y)$`
  - the denominator is the marginal probability function `$f_Y(y)$`
]

---
# Conditional probability

.pull-left-wide[
> The **conditional probability function** `$f_{X|Y}(\cdot \mid \cdot)$` of a discrete random variable `$X$` given that `$Y = y$` is defined as:
> `$$f_{X|Y}(x | y) = \frac{f(x, y)}{f_Y(y)}$$`
> if `$f_Y(y) > 0$`
]

.pull-left-wide[
- Example:
  - what is the probability of bankruptcy ( `$X = 0$` ) given that we know we are in a boom ( `$Y = 1$` )?
`$$f_{X|Y}(0 | 1) = \frac{f(0, 1)}{f_Y(1)} = \frac{0.1}{0.6} \approx 0.167 = 16.7\%$$`
]
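
---
# Conditional probability: Code sketch

The same calculation in code, again as a sketch on the bank table with exact fractions. The guard mirrors the condition `$f_Y(y) > 0$` in the definition; the function names are illustrative.

```python
from fractions import Fraction

# Joint probability function from the bank example, keyed by (x, y).
joint = {
    (0, 0): Fraction("0.2"), (1, 0): Fraction("0.2"),
    (0, 1): Fraction("0.1"), (1, 1): Fraction("0.5"),
}


def f_Y(y):
    """Marginal probability function of Y."""
    return sum(p for (_, yj), p in joint.items() if yj == y)


def f_X_given_Y(x, y):
    """f_{X|Y}(x | y) = f(x, y) / f_Y(y), defined only when f_Y(y) > 0."""
    denom = f_Y(y)
    if denom == 0:
        raise ValueError("cannot condition on an event with probability zero")
    return joint[(x, y)] / denom


p = f_X_given_Y(0, 1)  # bankruptcy given a boom
print(p, round(float(p), 3))  # 1/6 0.167
```

For a fixed `$y$`, the conditional probabilities sum to one: here `$1/6 + 5/6 = 1$`.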

---
# Bayes' theorem

.pull-left-wide[
- In practice, sometimes we know the conditional probability of `$X$` given `$Y$`, but we are interested in the conditional probability of `$Y$` given `$X$`
]

.pull-left-wide[
- For example, you may know from a car dealer friend the probability that a "lemon" (bad car) has a low price, but what you actually want to know is the probability that a cheap car is a lemon
]

.pull-left-wide[
- We can use the definition of the conditional probability to write:
`$$f_{Y|X}(y | x) = \frac{f(x, y)}{f_X(x)}$$`
]

.pull-left-wide[
- From here it is easy to prove the following theorem
]

---
# Bayes' theorem

.pull-left-wide[
> **Bayes' theorem:**
>
> 1. `$f_{X|Y}(x | y) = f_{Y|X}(y | x) \cdot \displaystyle\frac{f_X(x)}{f_Y(y)}$`
>
> 2. `$f_{X|Y}(x | y) = f_{Y|X}(y | x) \cdot \displaystyle\frac{f_X(x)}{\displaystyle\sum_{i=1}^{N_x} \left\{ f_{Y|X}(y | x_i) \cdot f_X(x_i)\right\}}$`
]

---
# Bayes' theorem: Example

.pull-left-wide[
- Let `$X = 1$` if the car is a lemon ( `$X = 0$` otherwise) and `$Y = 1$` if the price is low ( `$Y = 0$` otherwise)
]

.pull-left-wide[
- A car dealer friend tells you: 75% chance of a low price if the car is a lemon, 20% if not:
`$$f_{Y|X}(1 | 1) = 0.75, \quad f_{Y|X}(0 | 1) = 0.25$$`
`$$f_{Y|X}(1 | 0) = 0.20, \quad f_{Y|X}(0 | 0) = 0.80$$`
]

.pull-left-wide[
- Technical reports say 25% of cars on the market are lemons:
`$$f_X(1) = 0.25 \quad f_X(0) = 0.75$$`
]

---
# Bayes' theorem: Example

.pull-left-wide[
- From here you only need to apply Bayes' theorem to find the conditional probability of a lemon given that the price is low:

`$$f_{X|Y}(1 | 1) = f_{Y|X}(1 | 1) \cdot \frac{f_X(1)}{\displaystyle\sum_{i=1}^{N_x} \left\{ f_{Y|X}(1 | x_i) \cdot f_X(x_i)\right\}}$$`
]

.pull-left-wide[
`$$= f_{Y|X}(1 | 1) \cdot \frac{f_X(1)}{f_{Y|X}(1 | 0) \cdot f_X(0) + f_{Y|X}(1 | 1) \cdot f_X(1)}$$`
]

.pull-left-wide[
`$$= 0.75 \cdot \frac{0.25}{0.20 \cdot 0.75 + 0.75 \cdot 0.25} = \frac{0.1875}{0.3375} \approx 0.556$$`
]

.pull-left-wide[
- Therefore, there is about a 55.6% chance that a car is a lemon if it has a low price
]
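
---
# Bayes' theorem: Code sketch

The lemon calculation can be reproduced with the second form of Bayes' theorem, where the denominator is built from the law of total probability. This is a sketch using the numbers from the example; the dictionary layout and the name `posterior` are choices made here for illustration.

```python
# Inputs from the example: X = 1 lemon, Y = 1 low price.
# Conditional probabilities f_{Y|X}(y | x), keyed by (y, x).
f_Y_given_X = {
    (1, 1): 0.75, (0, 1): 0.25,
    (1, 0): 0.20, (0, 0): 0.80,
}
f_X = {1: 0.25, 0: 0.75}  # share of lemons on the market


def posterior(x, y):
    """Bayes' theorem (form 2): f_{X|Y}(x | y)."""
    # Denominator f_Y(y) via the law of total probability.
    f_y = sum(f_Y_given_X[(y, xi)] * f_X[xi] for xi in f_X)
    return f_Y_given_X[(y, x)] * f_X[x] / f_y


p_lemon_given_low = posterior(1, 1)
print(round(p_lemon_given_low, 3))  # 0.556
```

Observing a low price raises the probability of a lemon from the prior 25% to roughly 56%.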

---
# Bayes' theorem: Name classification

.pull-left-wide[
- A database records how often each name appears per gender (frequency per 10,000):

| Name | Male | Female |
|------|:----:|:------:|
| Carl | 1,023 | 3 |
| Carla | 5 | 2,148 |
| Chris | 850 | 820 |
]

.pull-left-wide[
- These frequencies tell us `$P(\text{Name} \mid \text{Gender})$` — how common a name is *within* a gender
]

.pull-left-wide[
- But what if we only observe the name and want to infer the gender? Can we turn this around to get `$P(\text{Gender} \mid \text{Name})$`?
]

---
# .red[Practice: Bayes' theorem — Name classification]

.pull-left-wide[

A historical census lists a person named "Carla" but no gender. Assume `$P(\text{Female}) = 0.5$`.

1. Extract `$P(\text{"Carla"} \mid \text{Female})$` and `$P(\text{"Carla"} \mid \text{Male})$` from the frequency table.
2. Compute `$P(\text{"Carla"})$` using the law of total probability.
3. Apply Bayes' theorem to find `$P(\text{Female} \mid \text{"Carla"})$`.
4. Would the result be as clear-cut for "Chris"? Why?

.small123[*This is the idea behind the **Naive Bayes classifier** — a simple but powerful method also used for spam filtering, sentiment analysis, and medical diagnosis.*]

]

---
# Independence

.pull-left-wide[
- Recall the definition of independence between two events `$A$` and `$B$`:
`$$P(A \cap B) = P(A) \cdot P(B)$$`
]

.pull-left-wide[
- Now suppose that the event `$A$` is represented by `$X = x$` and the event `$B$` by `$Y = y$`
]

.pull-left-wide[
- We can then write the independence condition as:
`$$P(X = x \text{ and } Y = y) = P(X = x) \cdot P(Y = y)$$`
]

.pull-left-wide[
- But note that, by definition:
  - the left-hand side is the joint probability function `$f(x, y)$`
  - the right-hand side is the product of the marginal probability functions `$f_X(x)$` and `$f_Y(y)$`
]

---
# Independence

.pull-left-wide[
> Two discrete random variables `$X$` and `$Y$` are **independent** if and only if:
> `$$f(x, y) = f_X(x) \cdot f_Y(y)$$`
> for all `$x$` and `$y$`. This implies that:
>
> 1. `$f_{X|Y}(x | y) = f_X(x)$` for all values `$y$` such that `$f_Y(y) > 0$`
>
> 2. `$f_{Y|X}(y | x) = f_Y(y)$` for all values `$x$` such that `$f_X(x) > 0$`
]

.pull-left-wide[
- Examples: tossing a coin twice, rolling a die twice
]
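
---
# Independence: Code sketch

Independence requires the factorisation to hold for *every* pair `$(x, y)$`, which is simple to test on a table. The sketch compares two fair coin tosses (assumed fair, each pair with probability 1/4) with the bank example from the earlier slides; `is_independent` is an illustrative helper.

```python
from fractions import Fraction


def is_independent(joint):
    """True iff f(x, y) == f_X(x) * f_Y(y) for every pair (x, y)."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    f_X = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    f_Y = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(
        joint.get((x, y), 0) == f_X[x] * f_Y[y] for x in xs for y in ys
    )


# Two fair coin tosses: every pair has probability 1/4.
two_coins = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}

# Bank example, keyed by (x, y).
bank = {
    (0, 0): Fraction("0.2"), (1, 0): Fraction("0.2"),
    (0, 1): Fraction("0.1"), (1, 1): Fraction("0.5"),
}

print(is_independent(two_coins))  # True
print(is_independent(bank))       # False
```

In the bank table a single cell is enough to reject independence: `$f(0,0) = 0.2$` but `$f_X(0) \cdot f_Y(0) = 0.3 \cdot 0.4 = 0.12$`.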

---
# .red[Practice: Joint, marginal, conditional]

.pull-left-wide[

Using the table below,

| | `$X = 0$` | `$X = 1$` |
|---|:---:|:---:|
| `$Y = 0$` | 0.25 | 0.15 |
| `$Y = 1$` | 0.35 | 0.25 |

1. Compute `$f_X(1)$` and `$f_Y(0)$`.
2. Compute `$f_{X|Y}(1|0)$`.
3. Check whether `$X$` and `$Y$` are independent.

]

---
class: inverse, middle, center
# Continuous random variables

---
# Continuous random variables

.pull-left-wide[
- Imagine that your company wants to predict its output next year
]

.pull-left-wide[
- It knows that it will produce between 10 and 20 tons of concrete, but it can be *any* number between 10 and 20 (with equal probability)
]

.pull-left-wide[
- What is the probability that it will produce *exactly* 15 tons? Basically zero
  - if it produces 14.999999 or 15.000001, this is not exactly 15
  - it would be extremely hard to get to exactly 15 tons
]

.pull-left-wide[
- Using the same argument, the probability of producing *exactly* any particular quantity is zero `$\Rightarrow$` the probability function `$f(x) = P(X = x)$` would be zero everywhere and tells us nothing
]

.pull-left-wide[
- Since continuous random variables have an uncountable number of values, we cannot use the exact same concepts as in the case of discrete random variables
]

---
# Cumulative distribution function

.pull-left-wide[
> The **cumulative distribution function** `$F(\cdot)$` of a continuous random variable `$X$` is defined as:
> `$$F(x) = P(X \leq x)$$`
]

.pull-left-wide[
- This is the same definition as in the case of a discrete random variable
]

.pull-left-wide[
- However, note that in this case it does not make a difference if the inequality is strict or not:
`$$P(X \leq x) = P(X < x \text{ or } X = x) = P(X < x) + P(X = x) = P(X < x)$$`
because `$P(X = x) = 0$`
]

---
# Cumulative distribution function

.center[
![Continuous CDF](Figures/4. Continuous CDF.png)
]

---
# Probability density function

.pull-left-wide[
> The **probability density function** `$f(\cdot)$` of a continuous random variable `$X$` is defined as:
> `$$f(x) = \frac{dF(x)}{dx}$$`
]

.pull-left-wide[
  - the area under the probability density function is always equal to one:
`$$\int_{-\infty}^\infty f(x) \, dx = 1$$`
]

.pull-left-wide[
  - the cumulative distribution function is the integral of the probability density function:
`$$F(x) = \int_{-\infty}^x f(z) \, dz$$`
]

---
# Note: Summation vs. integration

.pull-left[
- In the case of discrete random variables, we had:
`$$P(X \leq x) = \sum_{x_i \leq x} f(x_i)$$`
- Here we have:
`$$P(X \leq x) = \int_{-\infty}^x f(z) \, dz$$`
*Essentially the same idea: as you know from calculus, integration is continuous summation. This is one of the reasons you needed to learn it.*
]
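
---
# Note: Summation vs. integration (code sketch)

To see that the integral really behaves like a sum, one can add up `$f(z)\,dz$` over many small steps. The density `$f(z) = 2z$` on `$[0, 1]$` is an assumption chosen for illustration (it is not from the slides); its exact CDF is `$F(x) = x^2$`.

```python
def density(z):
    """Illustrative density (an assumption): f(z) = 2z on [0, 1], zero elsewhere."""
    return 2.0 * z if 0.0 <= z <= 1.0 else 0.0


def cdf_by_summation(x, n_steps=100_000):
    """Approximate F(x) by summing f(z) * dz over small steps (midpoint rule)."""
    lower = 0.0  # the density is zero below 0, so the sum can start there
    if x <= lower:
        return 0.0
    dz = (x - lower) / n_steps
    return sum(density(lower + (i + 0.5) * dz) * dz for i in range(n_steps))


approx = cdf_by_summation(0.5)
print(round(approx, 6))  # 0.25
```

The sum reproduces `$F(0.5) = 0.5^2 = 0.25$`: the same "add up the probabilities to the left of `$x$`" logic as in the discrete case.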

---
# Example

.pull-left-wide[
- In our example, every value between 10 and 20 is equally likely
]

.pull-left-wide[
- In other words, the density function takes the same value `$c$` for all `$x$` between 10 and 20 (and is zero elsewhere):
`$$f(x) = c$$`
]

.pull-left-wide[
- We can then use the properties of the probability density function to calculate the exact value of `$c$`:
`$$\int_{-\infty}^\infty f(x) \, dx = \int_{10}^{20} c \, dx = (20 - 10) c = 1 \; \Rightarrow \; c = 0.1$$`
]

.pull-left-wide[
- We can now calculate the cumulative distribution function:
`$$F(x) = \int_{-\infty}^x f(z) \, dz = \int_{10}^x 0.1 \, dz = 0.1 (x - 10)$$`
for all `$x$` such that `$10 \leq x \leq 20$`
]

---
# Example

.pull-left-wide[
- Now we can write explicitly the probability density function:
`$$f(x) = \begin{cases} 0, & \text{if } x < 10 \\ 0.1, & \text{if } 10 \leq x \leq 20 \\ 0, & \text{if } x > 20 \end{cases}$$`
]

.red[Note: *The output of `$f$` is a density, not a probability. The probability of any particular value is zero.*]

.pull-left-wide[
- The cumulative distribution function is:
`$$F(x) = \begin{cases} 0, & \text{if } x < 10 \\ 0.1(x - 10), & \text{if } 10 \leq x \leq 20 \\ 1, & \text{if } x > 20 \end{cases}$$`
]
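
---
# Example: Code sketch

The piecewise density and CDF translate directly into code. This is a minimal sketch of the concrete-output example; the helper `prob_between` is an illustrative name.

```python
def f(x):
    """Density of the output example: 0.1 on [10, 20], zero elsewhere."""
    return 0.1 if 10 <= x <= 20 else 0.0


def F(x):
    """CDF of the output example."""
    if x < 10:
        return 0.0
    if x > 20:
        return 1.0
    return 0.1 * (x - 10)


def prob_between(a, b):
    """P(a <= X <= b) = F(b) - F(a); the endpoints carry zero probability."""
    return F(b) - F(a)


print(round(prob_between(11, 14), 10))  # 0.3
print(prob_between(15, 15))             # 0.0
```

Note that `f(15)` returns 0.1 while `prob_between(15, 15)` returns 0: the density is positive at 15, yet the probability of hitting exactly 15 is zero.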

---
# Probability density function

.center[
![Continuous PDF](Figures/4. Continuous PDF.png)
]

---
# Probability density and cumulative distribution functions

.center[
![Continuous PDF total area](Figures/4. Continuous PDF total area.png)
]

---
# Probability density and cumulative distribution functions

.center[
![Continuous PDF partial area](Figures/4. Continuous PDF partial area.png)
]

---
# .red[Practice: Continuous random variables]

.pull-left-wide[

Assume `$X \sim U[10,20]$`.

1. Write the density `$f(x)$`.
2. Compute `$P(12 \leq X \leq 16)$`.
3. Compute `$F(18)$`.
4. Explain why `$P(X=15)=0$` even though 15 is in the support.

]

---
# .red[Raise your hand: The density function]

.pull-left-wide[
**Q1.** `$X \sim U[0, 0.5]$`. What is the value of `$f(x)$` for `$x=0.05$`?

- **A)** 0.5 — the upper bound of the support
- **B)** 1 — `$f$` is probability-related, so it can't exceed 1
- **C)** 2 — the PDF must integrate to 1 over an interval of length 0.5

]

.pull-left-wide[

**Q2.** `$X$` is a continuous random variable. What is `$P(X \leq 5) - P(X < 5)$`?

- **A)** `$f(5)$` — read off the density at 5
- **B)** `$F(5)$` — the cumulative probability up to 5
- **C)** 0 — single-point probability is always zero for continuous RVs

]

---
class: inverse, middle, center
# Relationships between continuous random variables

---
# Joint probability density

.pull-left-wide[
- We can use the same notions defined in the case of discrete random variables, but replacing the probability function with the probability density function and summations with integrals
]

.pull-left-wide[
> The **joint probability density function** `$f(\cdot, \cdot)$` for two continuous random variables `$X$` and `$Y$` is written as `$f(x, y)$`.
]

.pull-left-wide[
- Since this is a probability density function, the volume under it must "sum up" to one:
`$$\int_{-\infty}^\infty \int_{-\infty}^\infty f(x, y) \, dx \, dy = 1$$`
]

---
# Marginal probability density

.pull-left-wide[
> The **marginal probability density function** `$f_X(\cdot)$` of a continuous random variable `$X$` is defined as:
> `$$f_X(x) = \int_{-\infty}^\infty f(x, y) \, dy$$`
]

.pull-left-wide[
- Since this is a probability density function, the area under it must "sum up" to one:
`$$\int_{-\infty}^\infty f_X(x) \, dx = 1$$`
]

---
# Conditional probability density

.pull-left-wide[
> The **conditional probability density function** `$f_{X|Y}(x|y)$` of a continuous random variable `$X$` given that `$Y = y$` is defined as:
> `$$f_{X|Y}(x | y) = \frac{f(x, y)}{f_Y(y)} \text{ if } f_Y(y) > 0.$$`
]

.pull-left-wide[
- Again, the area under this function must "sum up" to one because it is a probability density function:
`$$\int_{-\infty}^\infty f_{X|Y}(x | y) \, dx = 1$$`
for all `$y$` such that `$f_Y(y) > 0$`
]
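
---
# Conditional probability density: Code sketch

Marginal and conditional densities can be approximated numerically by replacing each integral with a fine sum. The joint density `$f(x, y) = x + y$` on the unit square is an assumption chosen for illustration; its exact marginal is `$f_Y(y) = y + 1/2$`. The helper names are hypothetical.

```python
def joint(x, y):
    """Illustrative joint density (an assumption): x + y on the unit square."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def integrate(g, lower=0.0, upper=1.0, n_steps=2_000):
    """Midpoint-rule approximation of the integral of g over [lower, upper]."""
    h = (upper - lower) / n_steps
    return sum(g(lower + (i + 0.5) * h) for i in range(n_steps)) * h


def f_Y(y):
    """Marginal density of Y: integrate the joint density over x."""
    return integrate(lambda x: joint(x, y))


def f_X_given_Y(x, y):
    """Conditional density f_{X|Y}(x | y) = f(x, y) / f_Y(y), if f_Y(y) > 0."""
    denom = f_Y(y)
    if denom <= 0:
        raise ValueError("f_Y(y) must be positive")
    return joint(x, y) / denom


y0 = 0.25
marginal_at_y0 = f_Y(y0)  # exact value: y0 + 1/2 = 0.75
area = integrate(lambda x: f_X_given_Y(x, y0), n_steps=200)
print(round(marginal_at_y0, 6), round(area, 6))  # 0.75 1.0
```

The second number confirms that the conditional density integrates to one over `$x$` for the chosen `$y$`.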

---
# Bayes' theorem

.pull-left-wide[
- Bayes' theorem applies to the continuous case in a similar way to the discrete case
]

.pull-left-wide[
> **Bayes' theorem:**
>
> 1. `$f_{X|Y}(x | y) = f_{Y|X}(y | x) \cdot \displaystyle\frac{f_X(x)}{f_Y(y)}$`
>
> 2. `$f_{X|Y}(x | y) = f_{Y|X}(y | x) \cdot \displaystyle\frac{f_X(x)}{\displaystyle\int_{-\infty}^{\infty} f_{Y|X}(y | z) f_X(z) \, dz}$`
]

---
# Independence

.pull-left-wide[
- Finally, the concept of independence is similar in the continuous case to the discrete case
]

.pull-left-wide[
> Two continuous random variables `$X$` and `$Y$` are **independent** if and only if:
> `$$f(x, y) = f_X(x) \cdot f_Y(y)$$`
> for all `$x$` and `$y$`. This implies that:
>
> 1. `$f_{X|Y}(x | y) = f_X(x)$` for all values `$y$` such that `$f_Y(y) > 0$`
>
> 2. `$f_{Y|X}(y | x) = f_Y(y)$` for all values `$x$` such that `$f_X(x) > 0$`
]
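
---
# Independence: Code sketch (continuous case)

A rough numerical check of the factorisation condition compares `$f(x, y)$` with `$f_X(x) \cdot f_Y(y)$` on a grid of points. Both densities below are assumptions chosen for illustration and live on the unit square. Passing the grid check is only suggestive, while failing it at any point is enough to rule out independence.

```python
def integrate(g, n_steps=1_000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    h = 1.0 / n_steps
    return sum(g((i + 0.5) * h) for i in range(n_steps)) * h


def looks_independent(joint, grid_size=9, tol=1e-6):
    """Compare f(x, y) with f_X(x) * f_Y(y) on a grid inside the unit square."""
    points = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    f_X = {x: integrate(lambda y: joint(x, y)) for x in points}
    f_Y = {y: integrate(lambda x: joint(x, y)) for y in points}
    return all(
        abs(joint(x, y) - f_X[x] * f_Y[y]) <= tol
        for x in points
        for y in points
    )


def product_density(x, y):
    """Illustrative density 4xy, which equals (2x) * (2y)."""
    return 4.0 * x * y


def sum_density(x, y):
    """Illustrative density x + y, which does not factorise."""
    return x + y


print(looks_independent(product_density))  # True
print(looks_independent(sum_density))      # False
```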

---
# .red[Practice: Continuous relationships]

.pull-left-wide[

Let
`$$f(x,y)=2, \quad 0<y<x<1,$$`
and `$f(x,y)=0$` otherwise.

1. Find `$f_X(x)$`.
2. Find `$f_{Y|X}(y|x)$` for `$0<y<x<1$`.
3. Are `$X$` and `$Y$` independent? Justify.

]

---
# .red[Raise your hand: Joint and marginal densities]

.pull-left-wide[
**Q1.** You have the joint PDF `$f(x, y)$`. How do you find the marginal PDF `$f_X(x)$`?

- **A)** Set `$y = x$`: `$\quad f_X(x) = f(x,\, x)$`
- **B)** Integrate out `$y$`: `$\quad f_X(x) = \displaystyle\int_{-\infty}^{\infty} f(x, y)\, dy$`
- **C)** Differentiate: `$\quad f_X(x) = \partial f(x, y) / \partial y$`
]

.pull-left-wide[

**Q2.** You verify that `$f(x,y) = f_X(x) \cdot f_Y(y)$` for all `$x$` and `$y$`. What follows?

- **A)** `$X$` and `$Y$` have the same marginal distribution
- **B)** `$X$` and `$Y$` are independent
- **C)** The conditional density `$f_{X|Y}(x|y)$` does not exist

]