name: title
class: inverse, center, middle

<img src="img/rbert_hex.png" width="300px"/>

## Cutting-Edge NLP in R

.Large[Jon Harmon | DCR | 9 November 2019]

---
name: acknowledgement

# RBERT Lead Author

.pull-left[
![Jonathan Bratt](img/bratt-headshot.png)
]

.pull-right[
## Jonathan Bratt
### Macmillan Learning
### github.com/jonathanbratt
]

---
name: outline

# Outline

.Large[
**Follow along/like/reply/retweet at [bit.ly/jonvsemily](http://bit.ly/jonvsemily) and help me win headphones!**

* Transfer Learning
* BERT
* RBERT & RBERTviz
* Attention
* Layer Outputs
]

---

# Transfer Learning

.pull-left[
.Large[
* Task: Classify images (ImageNet)
]
]

---
count: false

# Transfer Learning

.pull-left[
.Large[
* Task: Classify images (ImageNet)
* Early layers: simple features.
]]

<div width="50%">
<p style="padding-left:10">Credit: <a href="http://yosinski.com/deepvis">deepvis</a> by Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson</p>
<div width="50%" style="float:left">
<img src="img/deep_viz/deep_viz_toolbox-dark_to_light.png" width="130px"/>
<img src="img/deep_viz/deep_viz_toolbox-light_to_dark.png" width="130px"/>
</div>
<div width="50%" style="float:right">
</div>
</div>

---
count: false

# Transfer Learning

.pull-left[
.Large[
* Task: Classify images (ImageNet)
* Early layers: simple features.
* Later layers: complex features.
* Features can transfer to many tasks.
]
]

<div width="50%">
<p style="padding-left:10">Credit: <a href="http://yosinski.com/deepvis">deepvis</a> by Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson</p>
<div width="50%" style="float:left">
<img src="img/deep_viz/deep_viz_toolbox-dark_to_light.png" width="130px"/>
<img src="img/deep_viz/deep_viz_toolbox-light_to_dark.png" width="130px"/>
</div>
<div width="50%" style="float:right">
<img src="img/deep_viz/deep_viz_toolbox-faces.png" width="130px"/>
<img src="img/deep_viz/deep_viz_toolbox-text.png" width="130px"/>
</div>
</div>

---

# Transfer Learning: NLP

.Large[
* Word embeddings
  * word2vec
  * GloVe
  * king − man + woman ≅ queen
]

---
count: false

# Transfer Learning: NLP

.Large[
* Word embeddings
  * word2vec
  * GloVe
  * king − man + woman ≅ queen
* Problem: Each word has *one* embedding vector.
  * "I saw the *branch* on the *bank*" vs
  * "I saw the *branch* of the *bank*"
]

---

# BERT

.Large[
* **B**idirectional **E**ncoder **R**epresentations from **T**ransformers
* Initially released October 11, 2018
* Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova from Google AI Language
* Trained with a very large corpus
* Transferable!
]

---

# RBERT

.pull-left[
.Large[
* `install_github("jonathanbratt/RBERT")`
* Implementation of BERT in R
* Use for:
  * Feature extraction (text to high-dimensional vectors)
  * Soon: Fine-tuning
]
]

.pull-right[
<img src="img/rbert_hex.png" width="400px">
]

---

# RBERTviz

.pull-left[
.Large[
* `install_github("jonathanbratt/RBERTviz")`
* Helper package
* Visualize how BERT "thinks"
* `visualize_attention`
* `display_pca`
]
]

.pull-right[
<img src="img/RBERTviz.png" width="400px">
]

---

# Attention

.pull-left[
```r
RBERT::download_BERT_checkpoint(
  "bert_base_uncased"
)
RBERT::extract_features(
  "I love tacos.",
  model = "bert_base_uncased",
  layer_indexes = 1:12,
  features = "attention"
)$attention %>%
  RBERTviz::visualize_attention()
```

Based on Jesse Vig's [bertviz](https://github.com/jessevig/bertviz) tool.
]

.pull-right[
<img src="img/attention/tacos.png" width="500px"/>
]

[Live demo](tacos_viz.html)

---

# Attention

.Large[
Sentences:

* The chicken didn't cross the road because it was too tired.
* The chicken didn't cross the road because it was too wide.
* The dog fetched the ball. It was excited.
* The dog fetched the ball. It was blue.
]

---
count: false

# Attention

.Large[
Sentences:

* The **chicken** didn't cross the road because **it** was too **tired.**
* The chicken didn't cross the **road** because **it** was too **wide.**
* The **dog** fetched the ball. **It** was **excited.**
* The dog fetched the **ball.** **It** was **blue.**
]

---

# Attention

<img src="img/attention/3_1-chicken_tired.png" width="250px">
<img src="img/attention/3_1-chicken_wide.png" width="250px">
<img src="img/attention/3_1-dog_excited.png" width="250px">
<img src="img/attention/3_1-dog_blue.png" width="250px">

.Large[Early: Position (≈ edge detector)]

---

# Attention

<img src="img/attention/9_5-chicken_tired.png">

.Large[Later: Pronoun Resolution (≈ face detector)]

---
count: false

# Attention

<img src="img/attention/9_5-chicken_tired.png" width="250px">
<img src="img/attention/9_5-chicken_wide.png" width="250px">
<img src="img/attention/9_5-dog_excited.png" width="250px">
<img src="img/attention/9_5-dog_blue.png" width="250px">

.Large[Later: Pronoun Resolution (≈ face detector)]

---

# Layer Outputs

.pull-left[
.Large[
Online survey:

* "A single sentence about learning, WITH the word 'train' (or 'trained', 'training', etc, meaning 'teach')."
* "A single sentence about travel, WITH the word 'train' (as in the vehicle)."
]]

.pull-right[
```r
trains_data <- readRDS("trains_data.rds") %>%
  dplyr::mutate(
    sequence_index = dplyr::row_number()
  )
trains_output <- RBERT::extract_features(
  trains_data$sentence,
  model = "bert_base_uncased",
  layer_indexes = 0:12,
  features = "output"
)$output
trains_output_labeled <- trains_output %>%
  dplyr::left_join(
    dplyr::select(
      trains_data,
      sequence_index, label
    ),
    by = "sequence_index"
  )
```
]

---

# Layer Outputs

.pull-left[
```r
trains_output_labeled %>%
  RBERTviz::display_pca(
    token_filter = "^train",
    layer_index = 0,
    # Just show one example of each unique word
    distinct_tokens = TRUE
  )
```
]

.pull-right[
<img src="dcrbert_files/figure-html/layer-0-plot-1.png" width="504" />
]

---

# Layer Outputs

Layer 0 (initial vectors)

<img src="dcrbert_files/figure-html/layer-0-labeled-1.png" width="1080" />

---
count: false

# Layer Outputs

Layer 1

<img src="dcrbert_files/figure-html/layer-1-1.png" width="1080" />

---
count: false

# Layer Outputs

Layer 2

<img src="dcrbert_files/figure-html/layer-2-1.png" width="1080" />

---
count: false

# Layer Outputs

Layer 3

<img src="dcrbert_files/figure-html/layer-3-1.png" width="1080" />

---
count: false

# Layer Outputs

Layer 4

<img src="dcrbert_files/figure-html/layer-4-1.png" width="1080" />

---
count: false

# Layer Outputs

Layer 5

<img src="dcrbert_files/figure-html/layer-5-1.png" width="1080" />

---
count: false

# Layer Outputs

Layer 6

<img src="dcrbert_files/figure-html/layer-6-1.png" width="1080" />

---
count: false

# Layer Outputs

Layer 7

<img src="dcrbert_files/figure-html/layer-7-1.png" width="1080" />

---
count: false

# Layer Outputs

Layer 8

<img src="dcrbert_files/figure-html/layer-8-1.png" width="1080" />

---
count: false

# Layer Outputs

Layer 9

<img src="dcrbert_files/figure-html/layer-9-1.png" width="1080" />

---
count: false

# Layer Outputs

Layer 10

<img src="dcrbert_files/figure-html/layer-10-1.png" width="1080" />

---
count: false

# Layer Outputs

Layer 11

<img src="dcrbert_files/figure-html/layer-11-1.png"
width="1080" />

---
count: false

# Layer Outputs

wait a minute...

<img src="dcrbert_files/figure-html/highlight-bad-1.png" width="1080" />

---
count: false

# Layer Outputs

wait a minute...

<img src="dcrbert_files/figure-html/highlight-bad2-1.png" width="1080" />

---

# To Do

.Large[
* RBERT is usable *now*...
]

---
count: false

# To Do

.Large[
* RBERT is usable *now*...
* ...but it can be *better!*
]

---
count: false

# To Do

.Large[
* RBERT is usable *now*...
* ...but it can be *better!*
* Goal: CRAN by end of 2019
]

---
count: false

# To Do

.Large[
* RBERT is usable *now*...
* ...but it can be *better!*
* Goal: CRAN by end of 2019
* tensorflow 2.0 (in testing)
* More Rtful, less pythonic
* Recipe: `step_bert_features()`
]

---
count: false

# To Do

.Large[
* RBERT is usable *now*...
* ...but it can be *better!*
* Goal: CRAN by end of 2019
* tensorflow 2.0 (in testing)
* More Rtful, less pythonic
* Recipe: `step_bert_features()`
* `rstudio::conf(2020L)` e-poster
* Poster ideas? [bit.ly/rbertposter](http://bit.ly/rbertposter)
]

---

# Contact

.Large[
* Twitter: [@jonthegeek](https://twitter.com/JonTheGeek) (like/retweet/reply to #rstatsDC tweets!)
* [github.com/jonathanbratt/RBERT](https://github.com/jonathanbratt/RBERT)
* [github.com/jonathanbratt/RBERTviz](https://github.com/jonathanbratt/RBERTviz)
* [github.com/jonthegeek](https://github.com/jonthegeek)
* R4DS Online Learning Community: [r4ds.online](http://r4ds.online)
* TidyTuesday Podcast: [tidytuesday.com](http://tidytuesday.com)
]