# RBERT: Cutting-Edge NLP in R

## Cutting-Edge NLP in R

.Large[Jon Harmon | DCR | 9 November 2019]

---

# RBERT Lead Author

---

# Outline

.Large[
**Follow along/like/reply/retweet at [bit.ly/jonvsemily](http://bit.ly/jonvsemily) and help me win headphones!**

* Transfer Learning
* BERT
* RBERT & RBERTviz
* Attention
* Layer Outputs
]

---

# Transfer Learning

---

# Transfer Learning

<div width="50%">
<p style="padding-left:10px">Credit: <a href="http://yosinski.com/deepvis">deepvis</a> by Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson</p>
<div width="50%" style="float:left">
<img src="img/deep_viz/deep_viz_toolbox-dark_to_light.png" width="130px"/>
<img src="img/deep_viz/deep_viz_toolbox-light_to_dark.png" width="130px"/>
</div>
<div width="50%" style="float:right">
</div>
</div>

---

# Transfer Learning

.pull-left[
.Large[
* Task: Classify images (ImageNet)
* Early layers: simple features (e.g., edges)
* Later layers: complex features (e.g., faces)
* Features can transfer to many tasks
]
]

---

# Transfer Learning: NLP

.Large[
* Word embeddings
  * word2vec
  * GloVe
* king − man + woman ≅ queen
]

---

# Transfer Learning: NLP

.Large[
* Word embeddings
  * word2vec
  * GloVe
* king − man + woman ≅ queen
* Problem: Each word has *one* embedding vector.
  * "I saw the *branch* on the *bank*" vs.
  * "I saw the *branch* of the *bank*"
]
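
The analogy above is plain vector arithmetic followed by a nearest-neighbor lookup. Here is a minimal base-R sketch of that idea. The 3-dimensional vectors are made up for illustration (real word2vec or GloVe vectors have hundreds of dimensions), and `cosine_similarity()` is a helper defined here, not part of any package.

```r
# Toy "embeddings": one made-up vector per word (illustrative numbers only).
embeddings <- rbind(
  king  = c(0.9, 0.8, 0.1),
  queen = c(0.9, 0.1, 0.8),
  man   = c(0.1, 0.9, 0.1),
  woman = c(0.1, 0.2, 0.9),
  taco  = c(0.0, 0.1, 0.1)
)

# Cosine similarity: 1 means same direction, 0 means orthogonal.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# king - man + woman
target <- embeddings["king", ] - embeddings["man", ] + embeddings["woman", ]

# Compare the result to every word that was not part of the query.
candidates <- setdiff(rownames(embeddings), c("king", "man", "woman"))
similarities <- vapply(
  candidates,
  function(word) cosine_similarity(target, embeddings[word, ]),
  numeric(1)
)

nearest_word <- names(which.max(similarities))
nearest_word
```

With these toy numbers the nearest remaining word is "queen". Because a table like this holds exactly one row per word, a lookup for "bank" returns the same vector whether the sentence is about a river or about money.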

---

# BERT

.Large[
* **B**idirectional **E**ncoder **R**epresentations from **T**ransformers
* Initially released October 11, 2018
  * Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova from Google AI Language
* Trained on a very large corpus (BooksCorpus and English Wikipedia)
* Transferable!
]

---

# RBERT

.pull-left[
.Large[
* `remotes::install_github("jonathanbratt/RBERT")`
* Implementation of BERT in R
* Use for:
  * Feature extraction (text to high-dimensional vectors)
  * Soon: Fine-tuning
]
]

---

# RBERTviz

.pull-left[
.Large[
* `remotes::install_github("jonathanbratt/RBERTviz")`
* Helper package
* Visualize how BERT "thinks"
  * `visualize_attention`
  * `display_pca`
]
]

---

# Attention

.pull-left[

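```r
library(magrittr)  # provides the %>% pipe

# Download the pre-trained BERT checkpoint.
RBERT::download_BERT_checkpoint(
  "bert_base_uncased"
)

# Extract the attention weights for layers 1-12 and visualize them.
RBERT::extract_features(
  "I love tacos.",
  model = "bert_base_uncased",
  layer_indexes = 1:12,
  features = "attention"
)$attention %>%
  RBERTviz::visualize_attention()
```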

Based on Jesse Vig's [bertviz](https://github.com/jessevig/bertviz) tool.
]
.pull-right[
<img src="img/attention/tacos.png" width="500px"/>
]

[Live demo](tacos_viz.html)
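
For context, the attention weights in a Transformer such as BERT come from a softmax over scaled dot products between query and key vectors. The base-R sketch below computes one head's weights for three tokens. It is an illustration under stated assumptions, not RBERT's implementation: the query and key numbers are random placeholders (in BERT they are learned projections of the token vectors), and `softmax_rows()` is a helper defined here.

```r
# Row-wise softmax: each row becomes non-negative weights that sum to 1.
softmax_rows <- function(scores) {
  # Subtract each row's maximum first for numerical stability.
  shifted <- scores - apply(scores, 1, max)
  exp_scores <- exp(shifted)
  exp_scores / rowSums(exp_scores)
}

tokens <- c("i", "love", "tacos")
d_k <- 4  # size of each query/key vector (tiny, for illustration)

# Placeholder query and key vectors, one row per token.
set.seed(42)
queries <- matrix(
  rnorm(length(tokens) * d_k),
  nrow = length(tokens),
  dimnames = list(tokens, NULL)
)
keys <- matrix(
  rnorm(length(tokens) * d_k),
  nrow = length(tokens),
  dimnames = list(tokens, NULL)
)

# Scaled dot-product scores: row = attending token, column = attended-to token.
scores <- queries %*% t(keys) / sqrt(d_k)
attention_weights <- softmax_rows(scores)

round(attention_weights, 3)
```

Each row of `attention_weights` shows how one token distributes its attention over all tokens in the sequence, which is the quantity the visualization draws as lines of varying intensity.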

---

# Attention

.Large[
Sentences:

* The chicken didn't cross the road because it was too tired.
* The chicken didn't cross the road because it was too wide.
* The dog fetched the ball. It was excited.
* The dog fetched the ball. It was blue.
]

---

# Attention

.Large[
Sentences:

* The **chicken** didn't cross the road because **it** was too **tired.**
* The chicken didn't cross the **road** because **it** was too **wide.**
* The **dog** fetched the ball. **It** was **excited.**
* The dog fetched the **ball.** **It** was **blue.**
]

---

# Attention

.Large[Early: Position (≈ edge detector)]

---

# Attention

.Large[Later: Pronoun Resolution (≈ face detector)]

---

# Layer Outputs

.pull-left[.Large[
* "A single sentence about learning, WITH the word 'train' (or 'trained', 'training', etc., meaning 'teach')."
* "A single sentence about travel, WITH the word 'train' (as in the vehicle)."
]]

.pull-right[

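```r
library(magrittr)  # provides the %>% pipe

# Load the labeled sentences and number them so they can be joined later.
trains_data <- readRDS("trains_data.rds") %>%
  dplyr::mutate(
    sequence_index = dplyr::row_number()
  )

# Extract the output vectors for layer 0 (initial vectors) through layer 12.
trains_output <- RBERT::extract_features(
  trains_data$sentence,
  model = "bert_base_uncased",
  layer_indexes = 0:12,
  features = "output"
)$output

# Attach each sentence's label to its token vectors.
trains_output_labeled <- trains_output %>%
  dplyr::left_join(
    dplyr::select(
      trains_data,
      sequence_index, label
    ),
    by = "sequence_index"
  )
```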
]

---

# Layer Outputs

.pull-left[

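```r
# Plot a PCA of the output vectors for tokens starting with "train".
trains_output_labeled %>%
  RBERTviz::display_pca(
    token_filter = "^train",
    # Layer 0 holds the initial vectors
    layer_index = 0,
    # Just show one example of each unique word
    distinct_tokens = TRUE
  )
```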
]
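
As the function name suggests, the plot is a PCA projection of high-dimensional token vectors onto two dimensions. The sketch below shows that idea with base R's `prcomp()` on made-up data; it is not the RBERTviz implementation. The two clusters stand in for hypothetical "teach" and "vehicle" vectors, and all names (`token_vectors`, `sense`, `projected`) are illustrative.

```r
set.seed(2019)
n_per_sense <- 20  # made-up number of example tokens per sense
n_dims <- 16       # far fewer dimensions than BERT's vectors, for illustration

# Two made-up clusters of vectors; the second is shifted in every dimension.
teach_vectors <- matrix(rnorm(n_per_sense * n_dims, mean = 0), ncol = n_dims)
vehicle_vectors <- matrix(rnorm(n_per_sense * n_dims, mean = 3), ncol = n_dims)

token_vectors <- rbind(teach_vectors, vehicle_vectors)
sense <- rep(c("teach", "vehicle"), each = n_per_sense)

# Center the data and rotate it onto its principal components.
pca <- prcomp(token_vectors, center = TRUE, scale. = FALSE)

# Keep the first two components for plotting.
projected <- data.frame(
  pc1 = pca$x[, 1],
  pc2 = pca$x[, 2],
  sense = sense
)

# Share of total variance captured by the first two components.
variance_share <- sum(pca$sdev[1:2]^2) / sum(pca$sdev^2)
```

A call such as `plot(projected$pc1, projected$pc2, col = factor(projected$sense))` draws the scatter. When two senses occupy different regions of the vector space, as in this toy data, they separate along the first component.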

---

# Layer Outputs

Layer 0 (initial vectors)

<img src="dcrbert_files/figure-html/layer-0-labeled-1.png" width="1080" />
---
count: false

# Layer Outputs

Layer 1

<img src="dcrbert_files/figure-html/layer-1-1.png" width="1080" />
---
count: false

# Layer Outputs

Layer 2

<img src="dcrbert_files/figure-html/layer-2-1.png" width="1080" />
---
count: false

# Layer Outputs

Layer 3

<img src="dcrbert_files/figure-html/layer-3-1.png" width="1080" />
---
count: false

# Layer Outputs

Layer 4

<img src="dcrbert_files/figure-html/layer-4-1.png" width="1080" />
---
count: false

# Layer Outputs

Layer 5

<img src="dcrbert_files/figure-html/layer-5-1.png" width="1080" />
---
count: false

# Layer Outputs

Layer 6

<img src="dcrbert_files/figure-html/layer-6-1.png" width="1080" />
---
count: false

# Layer Outputs

Layer 7

<img src="dcrbert_files/figure-html/layer-7-1.png" width="1080" />
---
count: false

# Layer Outputs

Layer 8

<img src="dcrbert_files/figure-html/layer-8-1.png" width="1080" />
---
count: false

# Layer Outputs

Layer 9

<img src="dcrbert_files/figure-html/layer-9-1.png" width="1080" />
---
count: false

# Layer Outputs

Layer 10

<img src="dcrbert_files/figure-html/layer-10-1.png" width="1080" />
---
count: false

# Layer Outputs

Layer 11

<img src="dcrbert_files/figure-html/layer-11-1.png" width="1080" />
---
count: false

# Layer Outputs

Wait a minute...

<img src="dcrbert_files/figure-html/highlight-bad-1.png" width="1080" />
---
count: false

# Layer Outputs

Wait a minute...

<img src="dcrbert_files/figure-html/highlight-bad2-1.png" width="1080" />
---

# To Do

.Large[
* RBERT is usable *now*...
]

---
count: false

# To Do

.Large[
* RBERT is usable *now*...
* ...but it can be *better!*
]

---
count: false

# To Do

.Large[
* RBERT is usable *now*...
* ...but it can be *better!*
* Goal: CRAN by end of 2019
]

---
count: false

# To Do

.Large[
* RBERT is usable *now*...
* ...but it can be *better!*
* Goal: CRAN by end of 2019
  * tensorflow 2.0 (in testing)
  * More Rtful, less pythonic
  * Recipe: `step_bert_features()`
]

---
count: false

# To Do

.Large[
* RBERT is usable *now*...
* ...but it can be *better!*
* Goal: CRAN by end of 2019
  * tensorflow 2.0 (in testing)
  * More Rtful, less pythonic
  * Recipe: `step_bert_features()`
* `rstudio::conf(2020L)` e-poster
* Poster ideas? [bit.ly/rbertposter](http://bit.ly/rbertposter)
]

---

# Contact

.Large[
* Twitter: [@jonthegeek](https://twitter.com/JonTheGeek) (like/retweet/reply to #rstatsDC tweets!)
* [github.com/jonathanbratt/RBERT](https://github.com/jonathanbratt/RBERT)
* [github.com/jonathanbratt/RBERTviz](https://github.com/jonathanbratt/RBERTviz)
* [github.com/jonthegeek](https://github.com/jonthegeek)
* R4DS Online Learning Community: [r4ds.online](http://r4ds.online)
* TidyTuesday Podcast: [tidytuesday.com](http://tidytuesday.com)

]