# A tibble: 168 × 2,909
pt_id age age_group event_label g2E09 g7F07 g1A01 g3C09 g3H08
<dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 34.2 (30,40] good -0.00144 -0.00144 -0.0831 -0.0475 1.58e-2
2 2 47 (40,50] good -0.0604 0.0129 -0.00144 0.0104 3.16e-2
3 3 60.3 (60,70] good 0.0398 0.0524 -0.0786 0.0635 -3.95e-2
4 4 57.8 (50,60] good 0.0101 0.0314 -0.0218 0.0215 8.68e-2
5 5 54.9 (50,60] good 0.0496 0.0201 0.0370 0.0311 2.07e-2
6 6 58.8 (50,60] good -0.0664 0.0468 0.00720 -0.370 2.88e-3
7 7 52.9 (50,60] good -0.00289 -0.0816 -0.0291 -0.0249 -1.74e-2
8 8 74.5 (70,80] good -0.198 -0.0499 -0.0634 -0.0298 3.00e-2
9 9 47.6 (40,50] good 0.00288 0.0201 0.0272 0.0174 -7.89e-5
10 10 55.8 (50,60] good -0.0574 -0.0574 -0.0831 -0.0897 -1.01e-1
# ℹ 158 more rows
# ℹ 2,900 more variables: g1A08 <dbl>, g1B01 <dbl>, g1int1 <dbl>, g1E11 <dbl>,
# g8G02 <dbl>, g1H04 <dbl>, g1C01 <dbl>, g1F11 <dbl>, g3F05 <dbl>,
# g3B09 <dbl>, g1int2 <dbl>, g2C01 <dbl>, g1A05 <dbl>, g1E01 <dbl>,
# g1B05 <dbl>, g3C05 <dbl>, g3A07 <dbl>, g1F01 <dbl>, g2D01 <dbl>,
# g1int3 <dbl>, g1int4 <dbl>, g1D05 <dbl>, g1E05 <dbl>, g1G05 <dbl>,
# g1C05 <dbl>, g1G11 <dbl>, g2D08 <dbl>, g2E06 <dbl>, g3H09 <dbl>, …
geom_point()
geom_line()
geom_boxplot()
geom_histogram()
Note! Here, you'll get a message if you do not state your choice of bins (set bins or binwidth)
geom_density()
Same data in all of these plots! If you can think it, you can build it in ggplot
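As a minimal sketch of the point above, the same data frame can be passed to several geoms. The data frame `toy_data` and its column `expr` are hypothetical stand-ins for `cancer_data` and one of its gene columns; they are not part of the original slides. The histogram sets `binwidth` explicitly, which avoids the message mentioned for geom_histogram().

```r
library(ggplot2)

# Hypothetical toy data standing in for cancer_data (not the Gravier data)
set.seed(1)
toy_data <- data.frame(
  event_label = rep(c("good", "poor"), each = 50),
  expr        = c(rnorm(50, mean = 0,    sd = 0.05),
                  rnorm(50, mean = 0.05, sd = 0.05))
)

# One shared starting point: the data is the same for every plot
base_plot <- ggplot(data = toy_data)

# Different geoms give different views of the same data
p_box  <- base_plot +
  geom_boxplot(mapping = aes(x = event_label, y = expr))
p_hist <- base_plot +
  geom_histogram(mapping = aes(x = expr), binwidth = 0.02)
p_dens <- base_plot +
  geom_density(mapping = aes(x = expr, fill = event_label), alpha = 0.5)
```

Printing `p_box`, `p_hist` or `p_dens` draws the respective plot.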
my_plot <- ggplot(data = cancer_data,
                  aes(x = event_label,
                      y = g1CNS507,
                      fill = event_label)) +
  geom_boxplot(alpha = 0.5,
               show.legend = FALSE) +
  coord_flip() +
  labs(x = "Event After Diagnosis",
       y = str_c("Expression level of g1CNS507 ",
                 "(log2 transformed)"),
       title = str_c("A prognostic DNA signature ",
                     "for T1T2 node-negative ",
                     "breast cancer patients"),
       subtitle = str_c("Labelling: good = no ",
                        "event, poor = early ",
                        "metastasis"),
       caption = "Data from Gravier et al. (2010)")
Use a pen and paper to doodle your thoughts on a good visualisation
When googling, search for e.g. “Custom legend labels” and then do an image search to find a plot that looks somewhat like what you are looking for. Then go to the page; most often, there will be some code
Stick to the following syntactic best practices:
Explain the basic theory of data visualisation
Decipher the components of a ggplot
Use ggplot to do basic data visualisation
Making sure you are in tune with the learning objectives (LOs) as we progress is essential for tracking your learning!
If you can think it, ggplot can plot it
The trick is to build experience: practice makes perfect!
Let us go over a few examples
age_group stratification
event_label stratification
my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = age_group,
                                y = g7E05,
                                fill = event_label)) +
  geom_boxplot() +
  scale_fill_manual(labels = c("No", "Yes"),
                    values = c("green", "red")) +
  labs(x = "Patient Age Group",
       y = "log2 expression of g7E05",
       fill = "Early Metastasis") +
  theme_classic(base_size = 18,
                base_family = "Avenir") +
  theme(legend.position = "bottom")
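In the plot above, the unnamed `labels` and `values` vectors are matched to the levels of `event_label` by position, so "No"/"green" ends up on "good" only because "good" sorts before "poor". A hedged sketch of a more explicit alternative is to name the `values` vector and state the `breaks`. The data frame `toy_data` and its column `expr` are hypothetical stand-ins for the real data.

```r
library(ggplot2)

# Hypothetical toy data standing in for cancer_data
toy_data <- data.frame(
  age_group   = rep(c("(40,50]", "(50,60]"), times = 20),
  event_label = rep(c("good", "poor"), each = 20),
  expr        = seq(-0.1, 0.1, length.out = 40)
)

p_named <- ggplot(data = toy_data,
                  mapping = aes(x = age_group,
                                y = expr,
                                fill = event_label)) +
  geom_boxplot() +
  # Named values are matched to the data values by name, not by position;
  # labels follow the order given in breaks
  scale_fill_manual(breaks = c("good", "poor"),
                    labels = c("No", "Yes"),
                    values = c(good = "green", poor = "red")) +
  labs(fill = "Early Metastasis")
```

With this form, reordering the factor levels does not silently swap the colours.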
my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = age_group,
                                y = g7E05,
                                fill = event_label)) +
  geom_boxplot(alpha = 0.5) +
  scale_fill_manual(labels = c("No", "Yes"),
                    values = c("green", "red")) +
  labs(x = "Patient Age Group",
       y = "log2 expression of g7E05",
       fill = "Early Metastasis") +
  theme_classic(base_size = 18,
                base_family = "Avenir") +
  theme(legend.position = "bottom")
my_plot <- ggplot(data = cancer_data,
                  mapping = aes(x = age_group,
                                y = g7E05,
                                fill = event_label)) +
  geom_hline(yintercept = 0,
             linetype = "dashed") +
  geom_boxplot() +
  scale_fill_manual(labels = c("No", "Yes"),
                    values = c("green", "red")) +
  labs(x = "Patient Age Group",
       y = "log2 expression of g7E05",
       fill = "Early Metastasis") +
  theme_classic(base_size = 18,
                base_family = "Avenir") +
  theme(legend.position = "bottom")
Note! Layer order matters!
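Layers are drawn in the order they are added, so a layer added later is painted on top of the earlier ones. The sketch below illustrates this with the dashed reference line placed before and after the boxplot. The data frame `toy_data` and its column `expr` are hypothetical stand-ins for the real data.

```r
library(ggplot2)

# Hypothetical toy data standing in for cancer_data
toy_data <- data.frame(
  age_group = rep(c("(40,50]", "(50,60]"), each = 20),
  expr      = rep(seq(-0.1, 0.1, length.out = 20), times = 2)
)

base_plot <- ggplot(data = toy_data,
                    mapping = aes(x = age_group,
                                  y = expr))

# Line added first: it is drawn underneath the boxes
line_below <- base_plot +
  geom_hline(yintercept = 0,
             linetype = "dashed") +
  geom_boxplot()

# Line added last: it is drawn on top of the boxes
line_above <- base_plot +
  geom_boxplot() +
  geom_hline(yintercept = 0,
             linetype = "dashed")
```

With opaque boxes, the line in `line_below` is hidden wherever a box covers it, which is one reason to combine this order with `alpha = 0.5`.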
pl1 <- ggplot(data = cancer_data,
              mapping = aes(x = age_group,
                            y = g7E05,
                            fill = event_label)) +
  geom_hline(yintercept = 0,
             linetype = "dashed") +
  geom_boxplot(alpha = 0.5) +
  scale_fill_manual(labels = c("No", "Yes"),
                    values = c("green", "red")) +
  labs(x = "Patient Age Group",
       y = "log2 expression of g7E05",
       fill = "Early Metastasis") +
  theme_classic(base_size = 12,
                base_family = "Avenir") +
  theme(legend.position = "bottom")
pl2 <- ggplot(data = cancer_data,
              mapping = aes(x = age_group,
                            y = g8C05,
                            fill = event_label)) +
  geom_hline(yintercept = 0,
             linetype = "dashed") +
  geom_boxplot(alpha = 0.5) +
  scale_fill_manual(labels = c("No", "Yes"),
                    values = c("green", "red")) +
  labs(x = "Patient Age Group",
       y = "log2 expression of g8C05",
       fill = "Early Metastasis") +
  theme_classic(base_size = 12,
                base_family = "Avenir") +
  theme(legend.position = "bottom")
If you had not noticed before, everything is an object, even a plot!
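Because a plot is an ordinary object, it can be stored in a variable, inspected, and extended later without retyping the earlier layers. The following is a minimal sketch of that idea; the names `toy_data`, `expr`, `pl_base` and `pl_final` are hypothetical and not from the original slides.

```r
library(ggplot2)

# Hypothetical toy data standing in for cancer_data
toy_data <- data.frame(
  age_group = rep(c("(40,50]", "(50,60]"), each = 20),
  expr      = rep(seq(-0.1, 0.1, length.out = 20), times = 2)
)

# Nothing is drawn here: the plot is only stored in a variable
pl_base <- ggplot(data = toy_data,
                  mapping = aes(x = age_group,
                                y = expr)) +
  geom_boxplot(alpha = 0.5)

# The stored object can be inspected like any other R object
class(pl_base)

# ... and extended later by adding more components to it
pl_final <- pl_base +
  labs(x = "Patient Age Group",
       y = "log2 expression") +
  theme_classic(base_size = 12) +
  theme(legend.position = "bottom")
```

Typing `pl_final` at the console, or calling `print(pl_final)`, is what actually draws the plot; `pl_base` is left unchanged.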
ggplot is very flexible
Practice makes perfect
Compare it with learning a language: initially, there is a vocabulary barrier
Return to these slides, but also remember that a meta learning objective is to seek out solutions online
A VERY good place to go to get input is the RStudio community pages: https://community.rstudio.com
Also, have fun!
R for Bio Data Science