Data visualization with ggplot2

Nyamisi Peter & Masumbuko Semba

2024-04-09

library(tidyverse)
library(patchwork)
library(magrittr)

knitr::knit_hooks$set(crop = knitr::hook_pdfcrop)

Learning Agenda

  1. Get familiar with R and RStudio
  2. Data structures and data types
  3. Reading and writing data in RStudio
  4. Tidying with tidyverse
  5. Plotting and Visualization
  6. Data manipulation with tidyverse
  7. Descriptive Statistics
  8. Inferential Statistics
  9. Modelling and simulation
  10. Spatial Handling and Analysis

GGPLOT

  • The first function in building a graph is ggplot()

  • It specifies the data frame to be used and the aesthetics

  • The mappings are placed within the aes() function

  • Load the package

library(tidyverse)
  • Import the data with read_csv()
chinook <- read_csv("chinook_lw.csv")
ggplot(
  data = chinook, 
  aes(x = tl, y = w)
  )

Exercise 1 Why is the graph empty?

Solution. The graph is empty because we have not yet specified a geom, so ggplot has nothing to draw beyond the axes
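One way to check this, as a minimal sketch: a plot object created by ggplot() alone carries no layers, and adding a geom adds the first one. The data frame `fish` is a made-up stand-in for chinook, and its values are invented for illustration only.

```r
library(ggplot2)

# Hypothetical stand-in for the chinook data (values are made up)
fish <- data.frame(
  tl = c(30, 45, 60, 75, 90, 105),
  w  = c(0.3, 1.0, 2.4, 4.8, 8.1, 13.0)
)

# ggplot() only sets up the data and the aesthetic mappings
p_empty <- ggplot(data = fish, aes(x = tl, y = w))
n_empty <- length(p_empty$layers)    # layers in the empty plot

# Adding a geom creates a layer, so there is something to draw
p_points <- p_empty + geom_point()
n_points <- length(p_points$layers)  # layers after adding geom_point()

print(c(empty = n_empty, with_points = n_points))
```

Printing the two counts side by side shows that the geom is what turns the blank canvas into a plot.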

GGPLOT (continued)

  • Geoms are the geometric objects (points, lines, polygons, etc.) that represent the data
  • They are added using the geom_*() functions
  • In the ggplot2 framework, layers are added using the + sign
  • We add points with the geom_point() function to create a plot
ggplot(
  data = chinook, 
  aes(x = tl, y = w)
  )+
  geom_point()

Exercise 2 What type of plot is this?

Solution. A scatterplot, which shows the relationship between two continuous variables

Polish and Decorate Plots

  • Several parameters can be specified in a geom
  • These include options for the color, size, shape, alpha, etc. of a geom
  • color controls color, size controls size, shape controls shape, and alpha controls transparency
ggplot(
  data = chinook, 
  aes(x = tl, y = w)
  )+
  geom_point(
    color = "cornflowerblue",
    alpha = .7,
    size = 2
)

Add a Best-fit line

  • Since this is a scatterplot, we can add a best-fit line
  • That is done by adding a geom_smooth() layer
  • Its method and formula arguments control the type of line (linear, quadratic, non-parametric)
  • Other options control the thickness and color of the line and the presence of the standard-error band
ggplot(
  data = chinook, 
  aes(x = tl, y = w)
  )+
  geom_point(
    color = "cornflowerblue",
    alpha = .7,
    size = 2
) +
  geom_smooth(
    method = "lm", 
    se = TRUE, 
    color = "cornflowerblue", 
    fill = "cornflowerblue"
    )
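The three line types listed above can be sketched as follows. This is an illustrative example rather than part of the original analysis: the data frame `fish` and its values are made up, and the `linewidth` argument assumes ggplot2 3.4.0 or newer.

```r
library(ggplot2)

# Hypothetical stand-in for the chinook data (values are made up)
fish <- data.frame(
  tl = c(30, 40, 50, 60, 70, 80, 90, 100, 110, 120),
  w  = c(0.3, 0.7, 1.4, 2.4, 3.9, 5.8, 8.1, 11.2, 14.9, 19.4)
)

base <- ggplot(fish, aes(x = tl, y = w)) + geom_point()

# Linear fit, with the standard-error band switched off
p_linear <- base +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Quadratic fit: still method = "lm", but with a polynomial formula
# and a thicker line
p_quadratic <- base +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), linewidth = 1.5)

# Non-parametric fit using local regression (loess)
p_loess <- base +
  geom_smooth(method = "loess", formula = y ~ x)
```

Printing any of the three objects, for example `p_quadratic`, draws the corresponding plot.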

Distinguish plot by variable

  • You can also map a third variable in aes(), in addition to x and y
  • The third variable can be used to differentiate the color, size, shape, etc. of the geometry
  • This allows groups of observations to be superimposed in a single graph
  • Let's add loc, which holds the sampling sites of the dataset
ggplot(
  data = chinook, 
  aes(
    x = tl, 
    y = w, 
    color = loc, 
    fill = loc)
  )+
  geom_point(
    alpha = .7,
    size = 2
) +
  geom_smooth(
    method = "lm", 
    se = TRUE
    )
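The difference between setting an aesthetic to a constant and mapping it to a variable can be sketched as below. The data frame `fish` and its site labels "A", "B" and "C" are invented for illustration and do not come from the chinook data.

```r
library(ggplot2)

# Hypothetical stand-in for the chinook data (values and sites are made up)
fish <- data.frame(
  tl  = c(30, 45, 60, 75, 90, 105),
  w   = c(0.3, 1.0, 2.4, 4.8, 8.1, 13.0),
  loc = c("A", "A", "B", "B", "C", "C")
)

# Setting: a constant given outside aes() applies to every point
p_set <- ggplot(fish, aes(x = tl, y = w)) +
  geom_point(color = "cornflowerblue", size = 2)

# Mapping: a variable given inside aes() gives each site its own color
# and adds a legend automatically
p_map <- ggplot(fish, aes(x = tl, y = w)) +
  geom_point(aes(color = loc), size = 2)
```

A constant placed inside aes() by mistake is treated as a one-level variable, which is a common source of unexpected colors and legends.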

Transform axis with scale_

  • Scales control how variables are mapped to visual properties
  • They help in visual presentation, for example by setting axis breaks and names
  • Their function names all start with scale_
ggplot(
  data = chinook, 
  aes(x = tl, y = w)
  )+
  geom_point(
    alpha = .7,
    size = 3)+
  geom_smooth(
    method = "lm", 
    se = TRUE
    )+
  scale_x_continuous(breaks = c(25,75,105), name = "Total length (mm)")+
  scale_y_continuous(breaks = c(5,15), name = "Weight (g)")
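Scales can also transform an axis, not only set its breaks and name. As a hedged sketch, length and weight are often related by a power law, so a log-log view may make the pattern look closer to a straight line. The data frame `fish` is made up for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the chinook data (values are made up)
fish <- data.frame(
  tl = c(30, 40, 50, 60, 70, 80, 90, 100, 110, 120),
  w  = c(0.3, 0.7, 1.4, 2.4, 3.9, 5.8, 8.1, 11.2, 14.9, 19.4)
)

# Log-transform both axes; the data frame itself is left unchanged
p_log <- ggplot(fish, aes(x = tl, y = w)) +
  geom_point() +
  scale_x_log10(name = "Total length (mm)") +
  scale_y_log10(name = "Weight (g)")
```

Because the transformation happens in the scale, the axis labels still show the original units rather than log values.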

Facetting plots

  • Facets reproduce a graph for each level of a given variable
  • Facets are created using the facet_*() functions, such as facet_wrap()
ggplot(
  data = chinook, 
  aes(x = tl, y = w)
  )+
  geom_point(
    alpha = .7,
    size = 3)+
  geom_smooth(
    method = "lm", 
    se = TRUE
    )+
  scale_x_continuous(breaks = c(25,75,105), name = "Total length (mm)")+
  scale_y_continuous(breaks = c(5,15), name = "Weight (g)")+
  facet_wrap(~loc, nrow = 2)
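By default every panel shares the same axes. A minimal sketch of letting each panel choose its own y-axis range is shown below; the data frame `fish` and its site labels are invented for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the chinook data (values and sites are made up)
fish <- data.frame(
  tl  = c(30, 45, 60, 75, 90, 105),
  w   = c(0.3, 1.0, 2.4, 4.8, 8.1, 13.0),
  loc = c("A", "A", "B", "B", "C", "C")
)

# One panel per site in a single row; scales = "free_y" lets each panel
# use its own y-axis range while the x-axis stays shared
p_free <- ggplot(fish, aes(x = tl, y = w)) +
  geom_point() +
  facet_wrap(~loc, nrow = 1, scales = "free_y")
```

Free scales make small groups easier to read, at the cost of making panels harder to compare directly.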

Label Titles

  • Graphs should be well labelled and titled
  • Labels help the reader interpret the graph easily
  • The labs() function provides customized labels for the axes
  • labs() also accepts arguments to customize the title, subtitle and caption
ggplot(
  data = chinook, 
  aes(x = tl, y = w)
  )+
  geom_point(
    alpha = .7,
    size = 3)+
  geom_smooth(
    method = "lm", 
    se = TRUE
    )+
  scale_x_continuous(breaks = c(25,75,105), name = "Total length (mm)")+
  scale_y_continuous(breaks = c(5,15), name = "Weight (g)")+
  labs(
    title = "Relationship between length and weight of Chinook",
    subtitle = "Sampled at three sites in Argentina",
    caption = "https://semba-blog.netlify.app/"
  )
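Axis labels can also be set directly in labs(), as an alternative to the name argument of the scale functions. A minimal sketch with a made-up data frame `fish`:

```r
library(ggplot2)

# Hypothetical stand-in for the chinook data (values are made up)
fish <- data.frame(
  tl = c(30, 45, 60, 75, 90, 105),
  w  = c(0.3, 1.0, 2.4, 4.8, 8.1, 13.0)
)

# x and y in labs() set the axis titles without touching the scales
p_labs <- ggplot(fish, aes(x = tl, y = w)) +
  geom_point() +
  labs(
    x = "Total length (mm)",
    y = "Weight (g)",
    title = "Length and weight of fish"
  )
```

Using labs() keeps all text labels in one place, which may be convenient when the scales need no other changes.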

Change the appearance

  • The appearance of a graph can be customized with the theme_*() functions
  • These functions control background colors, fonts, grid lines, legend placement, etc.
ggplot(
  data = chinook, 
  aes(x = tl, y = w)
  )+
  geom_point(
    alpha = .7,
    size = 3)+
  geom_smooth(
    method = "lm", 
    se = TRUE
    )+
  scale_x_continuous(breaks = c(25,75,105), name = "Total length (mm)")+
  scale_y_continuous(breaks = c(5,15), name = "Weight (g)")+
  labs(
    title = "Relationship between length and weight of Chinook",
    subtitle = "Sampled at three sites in Argentina",
    caption = "https://semba-blog.netlify.app/"
  )+
  theme_minimal(base_size = 40)
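Individual elements can be adjusted with theme() on top of a complete theme. The sketch below moves the legend and removes the minor grid lines; the data frame `fish`, its site labels and the base size of 14 are assumptions chosen for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the chinook data (values and sites are made up)
fish <- data.frame(
  tl  = c(30, 45, 60, 75, 90, 105),
  w   = c(0.3, 1.0, 2.4, 4.8, 8.1, 13.0),
  loc = c("A", "A", "B", "B", "C", "C")
)

# Start from a complete theme, then override single elements with theme()
p_theme <- ggplot(fish, aes(x = tl, y = w, color = loc)) +
  geom_point() +
  theme_minimal(base_size = 14) +
  theme(
    legend.position = "bottom",          # legend below the panel
    panel.grid.minor = element_blank()   # drop the minor grid lines
  )
```

The theme() call has to come after theme_minimal(), because a complete theme added later would overwrite the individual settings.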

Bonus Video

Figure 1: The video “CERN: The Journey of Discovery”

In Figure 1

Thank You for Attending
