Lecture Lab 3

Leon Eyrich Jessen

Recap of Lab 2

What is wrong with this visualisation?

The Cancer Data

# A tibble: 168 × 2,909
   pt_id   age age_group event_label    g2E09    g7F07    g1A01   g3C09    g3H08
   <dbl> <dbl> <chr>     <chr>          <dbl>    <dbl>    <dbl>   <dbl>    <dbl>
 1     1  34.2 (30,40]   good        -0.00144 -0.00144 -0.0831  -0.0475  1.58e-2
 2     2  47   (40,50]   good        -0.0604   0.0129  -0.00144  0.0104  3.16e-2
 3     3  60.3 (60,70]   good         0.0398   0.0524  -0.0786   0.0635 -3.95e-2
 4     4  57.8 (50,60]   good         0.0101   0.0314  -0.0218   0.0215  8.68e-2
 5     5  54.9 (50,60]   good         0.0496   0.0201   0.0370   0.0311  2.07e-2
 6     6  58.8 (50,60]   good        -0.0664   0.0468   0.00720 -0.370   2.88e-3
 7     7  52.9 (50,60]   good        -0.00289 -0.0816  -0.0291  -0.0249 -1.74e-2
 8     8  74.5 (70,80]   good        -0.198   -0.0499  -0.0634  -0.0298  3.00e-2
 9     9  47.6 (40,50]   good         0.00288  0.0201   0.0272   0.0174 -7.89e-5
10    10  55.8 (50,60]   good        -0.0574  -0.0574  -0.0831  -0.0897 -1.01e-1
# ℹ 158 more rows
# ℹ 2,900 more variables: g1A08 <dbl>, g1B01 <dbl>, g1int1 <dbl>, g1E11 <dbl>,
#   g8G02 <dbl>, g1H04 <dbl>, g1C01 <dbl>, g1F11 <dbl>, g3F05 <dbl>,
#   g3B09 <dbl>, g1int2 <dbl>, g2C01 <dbl>, g1A05 <dbl>, g1E01 <dbl>,
#   g1B05 <dbl>, g3C05 <dbl>, g3A07 <dbl>, g1F01 <dbl>, g2D01 <dbl>,
#   g1int3 <dbl>, g1int4 <dbl>, g1D05 <dbl>, g1E05 <dbl>, g1G05 <dbl>,
#   g1C05 <dbl>, g1G11 <dbl>, g2D08 <dbl>, g2E06 <dbl>, g3H09 <dbl>, …

Dimensions of data

dim(cancer_data)
[1]  168 2909
nrow(cancer_data)
[1] 168
ncol(cancer_data)
[1] 2909

ggplot - The Very Basics

ggplot(data = my_data,
       mapping = aes(x = v1,
                     y = v2)) +
  geom_point()

Basic Examples

Scatter-plot: geom_point()

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = g2E09,
                                y = g7F07)) +
  geom_point()

Basic Examples

Line-plot: geom_line()

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = g2E09,
                                y = g7F07)) +
  geom_line()

Basic Examples

Box-plot: geom_boxplot()

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = g2E09)) +
  geom_boxplot()

Basic Examples

Histogram-plot: geom_histogram()

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = g2E09)) +
  geom_histogram(binwidth = 0.1)

Note! Here, you’ll get a message if you do not state your choice of bins, e.g. via binwidth or bins
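As a minimal sketch of the two ways to state your choice of bins: the data frame toy_data below is a simulated, hypothetical stand-in for cancer_data (not the real data), and the values 0.05 and 20 are arbitrary choices for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for cancer_data: 168 simulated expression values
set.seed(1)
toy_data <- data.frame(g2E09 = rnorm(n = 168,
                                     mean = 0,
                                     sd = 0.1))

# Option 1: state the width of each bin
pl_binwidth <- ggplot(data = toy_data,
                      mapping = aes(x = g2E09)) +
  geom_histogram(binwidth = 0.05)

# Option 2: state the number of bins instead
pl_bins <- ggplot(data = toy_data,
                  mapping = aes(x = g2E09)) +
  geom_histogram(bins = 20)

# Both histograms contain all observations, only the binning differs
total_binwidth <- sum(layer_data(pl_binwidth)$count)
total_bins <- sum(layer_data(pl_bins)$count)
```

Either argument silences the message. Which one reads best depends on whether you think in units of the variable (binwidth) or in resolution of the plot (bins).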

Basic Examples

Density-plot: geom_density()

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = g2E09)) +
  geom_density()

Extended Examples

Boxplot of expression levels stratified on the variable event_label

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = event_label,
                                y = g2E09)) +
  geom_boxplot()

Extended Examples

GROUP ASSIGNMENT

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = g2E09,
                                fill = event_label)) +
  geom_density()

Extended Examples

GROUP ASSIGNMENT - Customisation

Same data in these plots! If you can think it, you can build it in ggplot
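One possible customisation of the density plot could look like the sketch below. It is not the code behind the plots on the slide: toy_data is a simulated, hypothetical stand-in for cancer_data, and the labels, theme and transparency are arbitrary choices.

```r
library(ggplot2)

# Hypothetical stand-in for cancer_data (simulated, not the real data)
set.seed(1)
toy_data <- data.frame(
  event_label = rep(c("good", "poor"),
                    each = 84),
  g2E09 = rnorm(n = 168,
                mean = 0,
                sd = 0.1))

# Same geom as before, now with transparency, labels and a theme
my_plot <- ggplot(data = toy_data,
                  mapping = aes(x = g2E09,
                                fill = event_label)) +
  geom_density(alpha = 0.5) +
  labs(x = "Expression level of g2E09",
       y = "Density",
       fill = "Event label") +
  theme_minimal() +
  theme(legend.position = "bottom")
```

The data and the mapping are untouched. Only the layers added after geom_density() change how the plot looks.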

Extended Examples

Plot Recreation

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = event_label,
                                y = g1CNS507,
                                fill = event_label)) +
  geom_boxplot(alpha = 0.5,
               show.legend = FALSE) +
  coord_flip() +
  labs(x = "Event After Diagnosis",
       y = str_c("Expression level of g1CNS507 ",
                 "(log2 transformed)"),
       title = str_c("A prognostic DNA signature ",
                     "for T1T2 node-negative ",
                     "breast cancer patients"),
       subtitle = str_c("Labelling: good = no ",
                        "event, poor = early ",
                        "metastasis"),
       caption = "Data from Gravier et al. (2010)")

A Few Tips and Tricks

  • Use a pen and paper to doodle your thoughts on a good visualisation

  • When googling, write e.g. “Custom legend labels” and do an image search to find a plot that looks somewhat like what you are after. Then go to the page; most often, there will be some code

  • Stick to the following syntactical best practices:

ggplot(data = my_data,
       mapping = aes(x = v1,
                     y = v2,
                     fill = v3)) +
  geom_something() +
  labs(x = "My x",
       y = "My y")
  • NOTE! You are expected to adhere to a specific code style moving forward in this course
  • Because… Yes, syntax matters!

Data Visualisation II

First: Make sure you’re on track!

Lab II Learning Objectives

  • Explain the basic theory of data visualisation

  • Decipher the components of a ggplot

  • Use ggplot to do basic data visualisation

Making sure you are in tune with the LOs as we progress is essential for tracking your learning!

Customisation

  • If you can think it, ggplot can plot it

  • The trick is to build experience: practice makes perfect!

  • Let us go over a few examples

Customisation Examples

Adding text to data points

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = age,
                                y = g7E05,
                                label = pt_id)) +
  geom_text()

Customisation Examples

Adding labels to data points

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = age,
                                y = g7E05,
                                label = pt_id)) +
  geom_label()

Customisation Examples

age_group stratification

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = age_group,
                                y = g7E05)) +
  geom_boxplot()

Customisation Examples

event_label stratification

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = age_group,
                                y = g7E05,
                                fill = event_label)) +
  geom_boxplot()

Customisation Examples

Adding a theme (A collection of settings)

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = age_group,
                                y = g7E05,
                                fill = event_label)) +
  geom_boxplot() +
  theme_classic()

Customisation Examples

Changing the font size

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = age_group,
                                y = g7E05,
                                fill = event_label)) +
  geom_boxplot() +
  theme_classic(base_size = 18)

Customisation Examples

Changing the font

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = age_group,
                                y = g7E05,
                                fill = event_label)) +
  geom_boxplot() +
  theme_classic(base_size = 18,
                base_family = "Avenir")

Customisation Examples

Adding some labels

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = age_group,
                                y = g7E05,
                                fill = event_label)) +
  geom_boxplot() +
  labs(x = "Patient Age Group",
       y = "log2 expression of g7E05") +
  theme_classic(base_size = 18,
                base_family = "Avenir")

Customisation Examples

Relocating the legend

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = age_group,
                                y = g7E05,
                                fill = event_label)) +
  geom_boxplot() +
  labs(x = "Patient Age Group",
       y = "log2 expression of g7E05") +
  theme_classic(base_size = 18,
                base_family = "Avenir") +
  theme(legend.position = "bottom")

Customisation Examples

Customising the legend

  • Note the colour mapping: green is good, red is bad
my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = age_group,
                                y = g7E05,
                                fill = event_label)) +
  geom_boxplot() +
  scale_fill_manual(labels = c("No", "Yes"),
                    values = c("green", "red")) +
  labs(x = "Patient Age Group",
       y = "log2 expression of g7E05",
       fill = "Early Metastasis") +
  theme_classic(base_size = 18,
                base_family = "Avenir") +
  theme(legend.position = "bottom")
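The unnamed vectors above are matched to the levels of event_label by position. A slightly more defensive variant, sketched below, names the colours by level so the mapping survives a change in level order. Here toy_data is a simulated, hypothetical stand-in for cancer_data, and the reversed factor levels are an assumption made only to show the effect.

```r
library(ggplot2)

# Hypothetical stand-in for cancer_data, with "poor" as the FIRST level
set.seed(1)
toy_data <- data.frame(
  age_group = "(50,60]",
  event_label = factor(rep(c("good", "poor"),
                           each = 20),
                       levels = c("poor", "good")),
  g7E05 = rnorm(n = 40,
                mean = 0,
                sd = 0.1))

# Named values: each colour is tied to a level, not to a position
my_plot <- ggplot(data = toy_data,
                  mapping = aes(x = age_group,
                                y = g7E05,
                                fill = event_label)) +
  geom_boxplot() +
  scale_fill_manual(breaks = c("good", "poor"),
                    labels = c("No", "Yes"),
                    values = c(good = "green",
                               poor = "red")) +
  labs(fill = "Early Metastasis")

# Inspect the colours actually assigned to each group
fill_data <- layer_data(my_plot)
```

With unnamed values, the reversed level order in this toy data would have coloured "poor" green. Naming the values keeps green as good and red as bad.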

Customisation Examples

Adding transparency

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = age_group,
                                y = g7E05,
                                fill = event_label)) +
  geom_boxplot(alpha = 0.5) +
  scale_fill_manual(labels = c("No", "Yes"),
                    values = c("green", "red")) +
  labs(x = "Patient Age Group",
       y = "log2 expression of g7E05",
       fill = "Early Metastasis") +
  theme_classic(base_size = 18,
                base_family = "Avenir") +
  theme(legend.position = "bottom")

Customisation Examples

Marking a threshold

my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = age_group,
                                y = g7E05,
                                fill = event_label)) +
  geom_hline(yintercept = 0,
             linetype = "dashed") +
  geom_boxplot(alpha = 0.5) +
  scale_fill_manual(labels = c("No", "Yes"),
                    values = c("green", "red")) +
  labs(x = "Patient Age Group",
       y = "log2 expression of g7E05",
       fill = "Early Metastasis") +
  theme_classic(base_size = 18,
                base_family = "Avenir") +
  theme(legend.position = "bottom")

Note! Layer order matters!
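To see why, here is a small sketch comparing the two possible orders. toy_data is a simulated, hypothetical stand-in for cancer_data, and layer_order() is a hypothetical helper written for this illustration, not part of ggplot2.

```r
library(ggplot2)

# Hypothetical stand-in for cancer_data (simulated, not the real data)
set.seed(1)
toy_data <- data.frame(
  age_group = rep(c("(40,50]", "(50,60]"),
                  each = 20),
  g7E05 = rnorm(n = 40,
                mean = 0,
                sd = 0.1))

# Line added first: the boxes are drawn on top of the line
line_below <- ggplot(data = toy_data,
                     mapping = aes(x = age_group,
                                   y = g7E05)) +
  geom_hline(yintercept = 0,
             linetype = "dashed") +
  geom_boxplot()

# Line added last: the line is drawn on top of the boxes
line_above <- ggplot(data = toy_data,
                     mapping = aes(x = age_group,
                                   y = g7E05)) +
  geom_boxplot() +
  geom_hline(yintercept = 0,
             linetype = "dashed")

# Hypothetical helper: list the geoms of a plot in drawing order
layer_order <- function(pl) {
  unname(vapply(pl$layers,
                function(l) class(l$geom)[1],
                character(1)))
}

order_below <- layer_order(line_below)
order_above <- layer_order(line_above)
```

Layers are stored, and drawn, in the order they are added, so later layers cover earlier ones.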

Plotting multiple plots

pl1 <- ggplot(data = cancer_data,
              mapping = aes(x = age_group,
                            y = g7E05,
                            fill = event_label)) +
  geom_hline(yintercept = 0,
             linetype = "dashed") +
  geom_boxplot(alpha = 0.5) +
  scale_fill_manual(labels = c("No", "Yes"),
                    values = c("green", "red")) +
  labs(x = "Patient Age Group",
       y = "log2 expression of g7E05",
       fill = "Early Metastasis") +
  theme_classic(base_size = 12,
                base_family = "Avenir") +
  theme(legend.position = "bottom")
pl2 <- ggplot(data = cancer_data,
              mapping = aes(x = age_group,
                            y = g8C05,
                            fill = event_label)) +
  geom_hline(yintercept = 0,
             linetype = "dashed") +
  geom_boxplot(alpha = 0.5) +
  scale_fill_manual(labels = c("No", "Yes"),
                    values = c("green", "red")) +
  labs(x = "Patient Age Group",
       y = "log2 expression of g8C05",
       fill = "Early Metastasis") +
  theme_classic(base_size = 12,
                base_family = "Avenir") +
  theme(legend.position = "bottom")

If you had not noticed before, everything is an object, even a plot!
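A short sketch of what that means in practice: a stored plot can be inspected and extended like any other object. toy_data is a simulated, hypothetical stand-in for cancer_data, and the object names are made up for this illustration.

```r
library(ggplot2)

# Hypothetical stand-in for cancer_data (simulated, not the real data)
set.seed(1)
toy_data <- data.frame(
  age_group = rep(c("(40,50]", "(50,60]"),
                  each = 20),
  g7E05 = rnorm(n = 40,
                mean = 0,
                sd = 0.1))

# Store a plot in a variable; nothing is drawn yet
base_plot <- ggplot(data = toy_data,
                    mapping = aes(x = age_group,
                                  y = g7E05)) +
  geom_boxplot()

# Extend the stored object; base_plot itself is left unchanged
with_line <- base_plot +
  geom_hline(yintercept = 0,
             linetype = "dashed")

# Count the layers held by each object
n_base <- length(base_plot$layers)
n_line <- length(with_line$layers)
```

Typing base_plot or with_line in the console prints, i.e. draws, the plot. Until then, it is just an object you can pass around.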

Plotting multiple plots

library("patchwork")
pl1 + pl2

Plotting multiple plots

pl1 / pl2

Plotting multiple plots

pl1 / (pl2 + pl1)
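When the combined plots share the same legend, repeating it in every panel is redundant. A possible refinement, sketched below, collects the legends into one and tags the panels. toy_data is a simulated, hypothetical stand-in for cancer_data, and the two plots are simplified versions of pl1 and pl2.

```r
library(ggplot2)
library(patchwork)

# Hypothetical stand-in for cancer_data (simulated, not the real data)
set.seed(1)
toy_data <- data.frame(
  age_group = rep(c("(40,50]", "(50,60]"),
                  each = 40),
  event_label = rep(c("good", "poor"),
                    times = 40),
  g7E05 = rnorm(n = 80,
                mean = 0,
                sd = 0.1),
  g8C05 = rnorm(n = 80,
                mean = 0,
                sd = 0.1))

pl1 <- ggplot(data = toy_data,
              mapping = aes(x = age_group,
                            y = g7E05,
                            fill = event_label)) +
  geom_boxplot(alpha = 0.5)

pl2 <- ggplot(data = toy_data,
              mapping = aes(x = age_group,
                            y = g8C05,
                            fill = event_label)) +
  geom_boxplot(alpha = 0.5)

# One shared legend, panels tagged A and B, theme applied to all panels
combined <- pl1 + pl2 +
  plot_layout(guides = "collect") +
  plot_annotation(tag_levels = "A") &
  theme(legend.position = "bottom")
```

Note the & operator: it applies the theme() call to every plot in the patchwork, whereas + would only modify the last one.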

Summary

  • ggplot is very flexible

  • Practice makes perfect

  • Compare it with learning a language: initially there is a vocabulary barrier

  • Return to these slides, but also remember that a meta learning objective is to seek out solutions online

  • A VERY good place to go to get input is the RStudio community pages: https://community.rstudio.com

  • Also, have fun!

Break, then exercises!