News and Market Sentiment Analytics

.title[
# News and Market Sentiment Analytics
]
.subtitle[
## Lecture 5: Causality
]
.author[
### Christian Vedel,<br> Department of Economics<br><br> Email: <a href="mailto:christian-vs@sam.sdu.dk" class="email">christian-vs@sam.sdu.dk</a>
]
.date[
### Updated 2025-12-05
]

---

.pull-left-wide {
  float: left;
  width: 66%;
}
.pull-right-wide {
  float: right;
  width: 66%;
}
.pull-right-wide ~ p {
  clear: both;
}

.pull-left-narrow {
  float: left;
  width: 30%;
}
.pull-right-narrow {
  float: right;
  width: 30%;
}

.small123 {
  font-size: 0.80em;
}

.large123 {
  font-size: 2em;
}

.red {
  color: red
}
</style>

# Last time

.pull-left[
- Tokenization and embedding spaces  
- *Coding challenge:* zero-shot classifier with embeddings
]

![Trees](Figures/Trees.jpg)

]

---

# Today's lecture

.pull-left[
- Why **causality** in news & markets?  
- Directed Acyclic Graphs (DAGs)  
- **d-separation** and conditional independence  
- Brief(est ever) into to time series Econometrics:
  + AR and MA
  + *Granger causality*
- Tools to reaon about data
]

???

- You can say: “Today we’ll mostly stay on intuition + a few key equations.”
- Next: three concrete news shocks to motivate why causality matters.

---

.pull-left[
1. Trump’s **“Liberation Day”** tariff announcement  
2. The **DeepSeek** AI “Sputnik moment”  
3. Sam Altman’s **AI bubble** comments

Each looks like *news → market reaction*.

Question we’ll keep asking:

> What is *actually* causing what in these pictures?

]

- [`Code/liberation_day.py`](https://github.com/christianvedels/News_and_Market_Sentiment_Analytics/blob/main/Lecture%205%20-%20Causality/Code/liberation_day.py)  
- [`Code/deepseek.py`](https://github.com/christianvedels/News_and_Market_Sentiment_Analytics/blob/main/Lecture%205%20-%20Causality/Code/deepseek.py)
- [`Code/technews_and_AI_stocks.py`](https://github.com/christianvedels/News_and_Market_Sentiment_Analytics/blob/main/Lecture%205%20-%20Causality/Code/technews_and_AI_stocks.py)
]
]

???

- Briefly say when each happened and what the headlines were.
- Tell them: “All three plots you’re about to see are generated by those Python scripts in the repo.”

---

- Code: [`Code/liberation_day.py`](https://github.com/christianvedels/News_and_Market_Sentiment_Analytics/blob/main/Lecture%205%20-%20Causality/Code/liberation_day.py)

]

???

- Ask: "If I only show you this picture, what story do you tell?"
- Likely answer: tariffs → future growth fears → indices drift down.
- Flag that we don’t yet know if it’s *tariffs directly* or global conditions, Fed expectations, etc.

---

- Code: [`Code/deepseek.py`](https://github.com/christianvedels/News_and_Market_Sentiment_Analytics/blob/main/Lecture%205%20-%20Causality/Code/deepseek.py)
]

???

- Point to NVDA (thick line) vs others.
- Ask: “Does this plot prove DeepSeek *caused* the drop? What else might be going on?”
- Mention we’ll later think about common shocks, expectations, and feedback.

---

.pull-left[
- Sam Altman publicly worries about an **AI bubble**  
- Magnificent 7 (AI-heavy tech) around that date

- Code: [`Code/technews_and_AI_stocks.py`](https://github.com/christianvedels/News_and_Market_Sentiment_Analytics/blob/main/Lecture%205%20-%20Causality/Code/technews_and_AI_stocks.py)
]

???

- Highlight that here prices don’t obviously crash on the quote.
- Useful contrast: strong narrative, weaker *visual* reaction.
- Sets up the idea that news and prices can move together, but causality isn’t obvious.

---

> "News moved the market."

But many rival stories fit the same plots:

- News `$\to$` prices  
- Prices `$\to$` news  
- Macro / liquidity `$\to$` both

Today we’ll:

- Use **DAGs** and **d-separation** to formalise these stories  
- Use **AR/MA** models and **Granger tests** to ask:

`$$\text{Do past headlines help predict returns?}$$`
]

???

- Tie back explicitly to the three examples:
  - “Liberation Day tariffs”, “DeepSeek shock”, “Altman bubble quote”.
- Say you’ll revisit them when talking about DAGs and Granger causality.

---
<br>
<br>
# Toy example: Ice cream and drowning

- `$X_t$`: ice cream sales  
  - `$Y_t$`: drowning accidents

- In the toy data:
  - As `$X_t$` goes up, `$Y_t$` tends to go up

- Question:

> Does buying ice cream **cause** drowning?
  
  
- Give it a try: [Data available here](https://raw.githubusercontent.com/christianvedels/News_and_Market_Sentiment_Analytics/refs/heads/main/Lecture%205%20-%20Causality/Code/icecream_kills.csv)

]

]

---
# Simple regression results

``` r
library(fixest)
data = read.csv("Code/Icecream_kills.csv")
mod1 = feols(drownings ~ icecream, data=data)
mod2 = feols(drownings ~ icecream | month, data=data)
etable(mod1, mod2)
```

```
##                               mod1            mod2
## Dependent Var.:          drownings       drownings
##                                                   
## Constant        0.9622*** (0.0340)                
## icecream        0.0847*** (0.0040) 0.0015 (0.0043)
## Fixed-Effects:  ------------------ ---------------
## month                           No             Yes
## _______________ __________________ _______________
## S.E. type                      IID       by: month
## Observations                 1,000           1,000
## R2                         0.31504         0.66796
## Within R2                       --         0.00015
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---
class: middle, inverse

# Causality: Why care?

- If you get good enough you can do **Causal Inference**
- With the introduction: You can do a sniff test on stories:
  + "Does it make sense that X causes Y?"
  + "Could there be something else going on?"
  
---
class: middle
# Causal hierarchy (Pearl’s ladder) 
### *Importantly: You cannot move up the ladder without assumptions*

2. **Intervention**  
   - `$P(Y \mid \text{do}(X))$`  
   - "What happens if we **force** `$X$`?"

3. **Counterfactuals**  
   - `$P(Y_x \mid X', Y')$`  
   - "What **would have** happened if ...?"
]

]

`$$P(Y \mid X)$$`

But we *want*:

`$$P(Y \mid \text{do}(X))$$`

In words:

> If we forced ice cream sales up,  
  > would drownings go up?
  
]

???
- Stress: statistics alone gives you level 1 by default.
- Causal inference is about moving from `$P(Y\mid X)$` to `$P(Y\mid \text{do}(X))$` using assumptions and structure.
  
---
class: middle
# Enter DAGs

- Nodes = variables  
  - Arrows = causal influence

- For ice cream:

- `$M \rightarrow X$`  
  - `$M \rightarrow Y$`  
  - No arrow `$X \rightarrow Y$`
  
  
- *Lacking arrows encode our assumptions*
]

???
- Say explicitly: “This is our *story* about the world, not something the data automatically know.”
- Highlight that **missing** arrows are strong assumptions too.
- Point out that `$M$` is a confounder: common cause of both `$X$` and `$Y$`.

---
class: middle
# Canonical structures

`$$X \rightarrow Z \rightarrow Y$$`

2. **Fork** (common cause)

`$$X \leftarrow Z \rightarrow Y$$`

3. **Collider** (common effect)

`$$X \rightarrow Z \leftarrow Y$$`
]

`$$X \leftarrow M \rightarrow Y$$`

- This is a **fork**:

- `$M$` causes both `$X$` and `$Y$`  
  - Confounding if `$M$` is ignored
]

???
- Quickly sketch the three patterns on the board.
- Ask: "Which one fits ice cream & drowning?" --> the fork.

---
class: middle
# Central trick

> *Under some conditions we can estimate `$P(Y \mid \text{do}(X))$` from observational quantities.*

- Idea: **block the backdoor paths** from `$X$` to `$Y$`

- Find a set of variables `$Z$` such that:
  - `$Z$` d-separates `$X$` and `$Y$` when we ignore arrows  
    **out of** `$X$` (no open backdoor paths)

- Then we can identify:

`$$P(Y \mid \text{do}(X)) = \sum_z P(Y \mid X, Z=z)\,P(Z=z)$$`

- Ice cream example:
  - `$Z = \{M\}$` (month)
  - Controlling for `$M$` removes the spurious  
    `$X \leftarrow M \rightarrow Y$` path
]

???

- Call this (informally) the **backdoor criterion**.
- Link back to the fixed-effects regression:
  - With month FE, the ice-cream coefficient moves toward the *causal* effect (≈0 here).
- Flag that in the news & markets setting, our main challenge is choosing a sensible `$Z$`.

---
class: middle
# d-separation (graphical independence)

We say `$X$` and `$Y$` are d-separated by `$Z$`
if **every path** between any node in `$X$` and any node in `$Y$` is **blocked** when we condition on `$Z$`.

A path is **blocked** if it contains at least one node that blocks it, using these rules:

.small123[
- **Chain / fork**  
  `$$X \rightarrow M \rightarrow Y, \quad X \leftarrow M \rightarrow Y$$`  
  Path is blocked if we **condition on `$M$`**.

- **Collider**

`$$X \rightarrow M \leftarrow Y$$`

Path is blocked if we do **not** condition on `$M$`  
  (or any descendant of `$M$`). Conditioning on a collider **opens** the path.
]

]

> d-sep'ed nodes are conditionally independent:

`$$X \perp Y \mid Z$$`
  
> *Guides us in choosing adjustment sets for valid inference*

]

???

- Emphasise: d-separation is a purely **graphical** criterion.
- If the DAG is correct, d-separation corresponds to (conditional) **independence** in the data.
- This is what lets us use the graph to choose adjustment sets for causal effects.

---
class: middle
# Takeaways (so far)

- Every story about you data implies a DAG
   - Data can support or contradict it indirectly,  
     but you can’t “prove” the arrows from correlations alone  
   - (Speculation: some causal structure might **emerge** in very large ML systems, but we’re not there yet in this course)

2. **DAGs are thinking tools**

- Force you to say what you believe causes what  
   - Let you spot **confounders**, **colliders**, and bad controls  
   - With d-separation, you can find sensible adjustment sets to turn some `$P(Y \mid X, Z)$` into `$P(Y \mid do(X))$`
]

- Judea Pearl (2009): *Causality* (technical)
   - Judea Pearl, Madelyn Glymour, Nicholas P. Jewell (2016): *Causal Inference in Statistics: A Primer* (intro)
   - Judea Pearl (2018): *The Book of Why* (popular science)
   - Scott Cunningham (2021): *The Mixtape* (applied econ focus)

.small123[
Same core ideas you’ve seen today:  
DAGs, d-separation, and the gap between `$P(Y\mid X)$` and `$P(Y\mid do(X))$`.
]
]

???

- Stress point 1: the graph is never “estimated” by OLS here; it’s a structured way to write down assumptions.
- Point 2: connect back to ice cream + drownings and the three market news stories.
- Point 3: say which book is closest to this course (probably Mixtape + What If for their style).

---
class: middle
# Exercise 1: Confounder (fork)

- `$X$`: Ice cream sales  
- `$Y$`: Drowning accidents  
- `$M$`: Month

`$M$` affects both `$X$` and `$Y$`.

**Tasks**

1. Draw a DAG for `$(X, Y, M)$`  
2. Find a d-sep of *X* and *Y*
]

???
- Expected DAG: `$X \leftarrow M \rightarrow Y$` (fork).
- Answer:  
  - (a) Not d-separated → associated.  
  - (b) d-separated by `$M$` → `$X \perp Y \mid M$`.
  
  
---
class: middle
# Exercise 2: Mediator (chain)

- `$X$`: New surprisingly good numbers on inflation
- `$M$`: Investor optimism  
- `$Y$`: Stock index return

You believe:

- News changes optimism
- Optimism moves the index

**Tasks**

1. Draw a DAG for `$(X, M, Y)$`  
2. Decide if `$X$` and `$Y$` are d-separated:

- (a) Given **nothing**  
   - (b) Given `$M$`
]

- Do `$X$` and `$Y$` remain associated  
  when we **adjust for** `$M$`?
- Would we get the effect of `$X$` on `$Y$`?
]

???
- Expected DAG: `$X \rightarrow M \rightarrow Y$` (chain).
- Answer:  
  - (a) Not d-separated → `$X$` and `$Y$` associated.  
  - (b) d-separated by `$M$` → `$X \perp Y \mid M$`.
- Good moment to mention that conditioning on a mediator “blocks” total effect and leaves direct effect (if any).

---
class: middle
# Exercise 3: Collider

- `$X$`: true product quality  
- `$Y$`: marketing spend  
- `$Z$`: "featured on TechNews front page"

Editors feature firms that are:

- very high quality **or**  
- very heavy on marketing

]

- Does more marketing spending *cause* better/worse products?
]

???

So both `$X$` and `$Y$` increase `$P(Z=1)$`:

`$$X \rightarrow Z \leftarrow Y$$`

**What happens if we condition on `$Z=1$`?**

- In the full population:  
  `$X$` and `$Y$` are (roughly) independent

- Among featured firms (`$Z=1$`):

- High `$X$` firms don't need big `$Y$`  
  - High `$Y$` firms can get by with low `$X$`

- We see a **negative correlation** and might conclude:

> "Among featured firms, more marketing means worse products."

This is a spurious conclusion from  
**conditioning on a collider**.

- On the board:  
  1. Draw a round cloud for `$(X,Y)$` (no correlation).  
  2. Draw a diagonal threshold, keep only points above it (`$Z=1$`).  
  3. Show that the remaining points line up with a negative slope.
- Emphasise: this is about **selection**; many empirical datasets are implicitly "Z=1 only".

---
class: middle
# From DAGs to time series

.pull-left[
- So far:
  - Static DAGs  
  - Ice cream vs drownings  
  - News vs returns as **cross-sectional snapshots**

- Next step:
  - Markets and news are **time series**  
  - Yesterday’s values matter for today
]

> Add **time** to our causal thinking and see how to test "X helps predict Y"

- Yet again, only an introduction

]

???

- Transition line: "Now let’s add the `$t$`-index back in and think about sequences, not just one-day plots."

---
class: middle
# AR and MA models: what they do

- `$Y_t$` depends on its **own past**

`$$Y_t = \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \varepsilon_t$$`

- Captures **momentum / mean reversion**

**Moving average (MA)**

- `$Y_t$` depends on **past shocks**

`$$Y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots$$`

- Captures **short-lived shock effects**
]

- Show a simple AR(1) and MA(1) in code  
- Fit them to a returns series  
- **Please** use this as only a starting point. There is **much** more to this. (Including reinventing LSTM and Transformers) 
]

???

- Emphasise: "These are just glorified linear regressions on lags."
- Mention that you’ll keep it at AR(1)/MA(1) for intuition, not full ARIMA theory.

---
class: middle
# Granger causality: idea

> Does series `$X_t$` help predict `$Y_t$`  
> **beyond** what past `$Y$` already tells us?

Definition (informal):

- `$X$` **Granger-causes** `$Y$` if  
  including past `$X$` improves forecasts of `$Y$`.

We’ll:

- Compare models with and without lagged `$X$`  
- Use `statsmodels`’ Granger test
]

- DAG story: “news `$\to$` returns” vs “returns `$\to$` news”
- Granger story:
  - Does past news predict returns?  
  - Does past returns predict news?

- Granger `$\neq$` full causal proof,  
  but it’s a **useful diagnostic** in time series.
  
- The idea extends to fancier prediction models
]

???

- Flag explicitly: "This is a *forecast-based* notion of causality."
- Good time to remind them: "Cause must come before effect in time."

---
class: inverse, middle
# Coding challenge: 
## [News and Market Sentiment Analysis](https://github.com/christianvedels/News_and_Market_Sentiment_Analytics/blob/main/Lecture%205%20-%20Causality/coding_challenge_5.md)