Examples

In the following section, you can found examples on how to use each functions with an brief explanation of the function, and the requirements for the inputs.

gather

gather: restructure the data gather(data, key, value, column_names)

Example 1: stack Qrt1, Qrt2,Qrt3,Qrt4 into a new column Quarter.

df_gathered <- df %>% 
  gather(Quarter, Revenue, Qtr1:Qtr4)

head(df_gathered)

Example 2: stack X, Y, Z into a new column stock

stock_gathered <- stocks %>% 
  gather(stock, price, X:Z)
head(stock_gathered)

spread

spread: undo gather() spread(column_to_split_on, value)

Example 1: convert stock_gathered back to original format

stock_gathered %>% spread(stock,price)# split different values within column

#           stock into different columns, fill in corresponding price values.

Example 2: spread the data set out base on time

stock_gathered %>% spread(time, price)# split different values within column

      #time into different columns, fill in corresponding price values.

separate

separate: turn a single character column into multiple columns separate(data = data, col = target_column_name, into = c(‘new_col_name_1’,‘new_col_name2’)) Note: by default, it is separated by non alphanumeric values, like ‘.’ and ‘-’

Create a toy data set with only one column x

df1 <- data.frame(x = c(NA, "a.x", "b.y", "c.z"))
df1

Example 1: separate the values in column x into two columns ABC and XYZ. when argument sep is not specified, it is separated by non alphanumeric values, like ‘.’ and ‘-’. See example 2.

df1 %>% separate(col = x, into = c("ABC", "XYZ"))

Example 2: separate by ‘+’

#create another toy data set
df2 <- data.frame(x= c('1+0', "1+0.5", "1+1", "1+2"))
df2

df2_se<- separate(data = df2, col=x, into = c("abc", "xyz"), sep = "([+])")# separate on +
df2_se

unite

unite: undo separate; combine values in multiple columns into one unite(data = data, col_names_to_be_joined_separated_by_commas, sep = separator to use between joined values.)

Example 1: convert df2_se back to df2

df2_se %>% 
  unite(new_joined, abc,xyz, sep = '+' )

Join tables

inner_join(data1, data2) keeps the common rows and drops other rows
left_join(data1, data2) keeps all the rows in data1 add the extra columns from data2 if they are in common, put NA when no data available. Note: right_join() is similar to left. Same result returns when switch data1 and data 2.
full_join(data1, data2) keeps all rows from both data set, use NA when no data is available.
anti_join: shows the rows in the left table that don’t have a match in the right one. Note: Most commonly you’d use it to check that rows weren’t dropped from the left table.For instance, you could check that the number of rows in an anti join is zero.

# create two tables with a common column name x1

df1 <- tibble(x1 = c('A','B','C','E','A'),
              x2 = c(1,2,3,4,5))
df2 <- tibble(x1 = c('A','B','D'),
              x3 = c(T,F,T))
df1

df2

inner_join(df1,df2)

## Joining, by = "x1"

left_join(df1,df2)

## Joining, by = "x1"

left_join(df2, df1)

## Joining, by = "x1"

full_join(df1, df2)

## Joining, by = "x1"

anti_join(df1,df2)#shows the rows that have been dropped by inner_join()

## Joining, by = "x1"

nrow(anti_join(df1,df2))==0

## Joining, by = "x1"

## [1] FALSE

To be continued…

Examples on using functions from tidyr package

Xiaona Zhou

4/19/2020

Overview