Data structures: matrices, data.frames, and lists


Ex. 3.1

Initialize a matrix with 2 rows and 3 columns that has all 1’s on the first row and all 2’s on the second row.

matrix <- matrix( c(1,1,1,2,2,2), 
                  nrow = 2, 
                  ncol = 3, 
                  byrow = TRUE)

Give it column and row names as you wish.

rownames(matrix) <- paste("Gene", seq(1,nrow(matrix)), sep="_")
colnames(matrix) <- paste("Sample", seq(1,ncol(matrix)), sep="_")
matrix
##        Sample_1 Sample_2 Sample_3
## Gene_1        1        1        1
## Gene_2        2        2        2

Check its dimensions.

dim(matrix)
## [1] 2 3
nrow(matrix)
## [1] 2
ncol(matrix)
## [1] 3
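
As a possible alternative, the row and column names can be supplied when the matrix is created, through the dimnames argument of the matrix function. The sketch below is only illustrative: the object name m is an assumption, chosen to avoid reusing the name of the matrix function itself.

```r
# Illustrative alternative: set the names at creation time via dimnames.
# The object name `m` is hypothetical, not part of the exercise.
m <- matrix(rep(c(1, 2), each = 3),   # 1 1 1 2 2 2
            nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(paste("Gene", 1:2, sep = "_"),
                            paste("Sample", 1:3, sep = "_")))
m
dim(m)
```

The result should be the same named 2x3 matrix as above, built in a single call.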


Ex. 3.2

Initialize a numeric vector to store the year in which you got your driving license and first car, respectively (NA values are possible).

driver <- c(2019, NA)

Set appropriate names.

names(driver) <- c("license", "car")

Initialize a character vector to store the names of your favorite cities in Europe (as many as you like).

favcities <- c("Venice", "Innsbruck", "Rome")

Save both vectors in a list using meaningful names.

myInfo <- list(car = driver, cities = favcities)
myInfo
## $car
## license     car 
##    2019      NA 
## 
## $cities
## [1] "Venice"    "Innsbruck" "Rome"

Extract the information about your favorite cities, first using a numeric index and then using the list names.

myInfo[[2]]
## [1] "Venice"    "Innsbruck" "Rome"
myInfo[["cities"]]
## [1] "Venice"    "Innsbruck" "Rome"
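
It may help to compare the double-bracket extraction with the other two common operators. The sketch below rebuilds the list so that it is self-contained; the object names sub and vec are assumptions introduced only for illustration.

```r
# Rebuild the list from this exercise so the example runs on its own
myInfo <- list(car = c(license = 2019, car = NA),
               cities = c("Venice", "Innsbruck", "Rome"))

sub <- myInfo["cities"]   # single bracket: returns a list of length 1
class(sub)
length(sub)

vec <- myInfo$cities      # $ extracts the element itself, like [["cities"]]
class(vec)
identical(vec, myInfo[["cities"]])
```

Single brackets keep the list wrapper, whereas double brackets and $ return the stored vector.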


Ex. 3.3

Imagine you have an experiment with 4 lung cancer cell lines: 2 from adenocarcinomas (LUAD) and 2 from squamous cell carcinomas (LUSC).

One cell line from each cancer subtype is treated with a drug, the other one is untreated, as shown in the figure below.

Save in a data.frame the info about the experiment: cell line identifier, lung-cancer sub-type, and treatment.

experiment <- data.frame(id = paste("CL", seq(1,4), sep=""),
                         cancer = rep(c("LUAD", "LUSC"), each=2),
                         treatment = rep(c("drug", "control"), times=2))
experiment
##    id cancer treatment
## 1 CL1   LUAD      drug
## 2 CL2   LUAD   control
## 3 CL3   LUSC      drug
## 4 CL4   LUSC   control

Check how many rows and columns the data.frame has.

dim(experiment)
## [1] 4 3
nrow(experiment)
## [1] 4
ncol(experiment)
## [1] 3
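
One way to double-check the design is to cross-tabulate the two annotation columns and to subset the treated cell lines. This is a sketch under the assumption that the data.frame is built as above; the names design and treated are hypothetical.

```r
# Rebuild the data.frame so the example runs on its own
experiment <- data.frame(id = paste0("CL", 1:4),
                         cancer = rep(c("LUAD", "LUSC"), each = 2),
                         treatment = rep(c("drug", "control"), times = 2))

# Cross-tabulate subtype against treatment: every cell should contain 1
design <- table(experiment$cancer, experiment$treatment)
design

# Logical indexing on rows keeps only the treated cell lines
treated <- experiment[experiment$treatment == "drug", ]
treated$id
```

If each combination of subtype and treatment appears exactly once, the design matches the description of the experiment.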


Ex. 3.4

Initialize the following matrix in R:

M <- matrix(1:6, nrow=3, byrow=FALSE) 
colnames(M) <- c("Sample.A", "Sample.B")
rownames(M) <- c("Gene.1", "Gene.2", "Gene.3")

Save the second row of the M matrix into:

  • A vector called v
  • A 1x2 matrix called N

Try to use both:

  • A numeric index

v <- M[2,]
v
## Sample.A Sample.B 
##        2        5
N <- M[2,,drop=FALSE]
N
##        Sample.A Sample.B
## Gene.2        2        5

  • The matrix row names

v <- M["Gene.2",]
v
## Sample.A Sample.B 
##        2        5
N <- M["Gene.2",,drop=FALSE]
N
##        Sample.A Sample.B
## Gene.2        2        5

Assess the length of v and the number of columns of N.

length(v)
## [1] 2
ncol(N)
## [1] 2
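
The difference between v and N comes from the drop argument: by default, R drops dimensions of length 1 when subsetting a matrix. A minimal sketch that makes this visible, reusing the objects defined in this exercise:

```r
M <- matrix(1:6, nrow = 3, byrow = FALSE)
colnames(M) <- c("Sample.A", "Sample.B")
rownames(M) <- c("Gene.1", "Gene.2", "Gene.3")

v <- M[2, ]                 # default drop = TRUE: result is a named vector
N <- M[2, , drop = FALSE]   # drop = FALSE: result stays a 1x2 matrix

dim(v)        # NULL, because a plain vector has no dim attribute
dim(N)        # 1 2
is.matrix(v)  # FALSE
is.matrix(N)  # TRUE
```

This is why length is the natural query for v, while ncol applies to N.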


Ex. 3.5

Initialize a 10x5 matrix named M1 composed of all 1’s.

M1 <- matrix(1, nrow=10, ncol=5)

Initialize a 5x6 matrix named M2 composed of all 2’s.

M2 <- matrix(2, nrow=5, ncol=6)

Transpose M2 using the t function.

M2 <- t(M2)

Create a matrix named M3 by stacking the rows of M1 and M2 (row-wise concatenation).

M3 <- rbind(M1,M2)

Use the head function to extract the first 10 rows of M3 corresponding to M1.

head(M3, nrow(M1))
##       [,1] [,2] [,3] [,4] [,5]
##  [1,]    1    1    1    1    1
##  [2,]    1    1    1    1    1
##  [3,]    1    1    1    1    1
##  [4,]    1    1    1    1    1
##  [5,]    1    1    1    1    1
##  [6,]    1    1    1    1    1
##  [7,]    1    1    1    1    1
##  [8,]    1    1    1    1    1
##  [9,]    1    1    1    1    1
## [10,]    1    1    1    1    1

Similarly, use the tail function to extract the rows of M3 corresponding to M2.

tail(M3, nrow(M2))
##       [,1] [,2] [,3] [,4] [,5]
## [11,]    2    2    2    2    2
## [12,]    2    2    2    2    2
## [13,]    2    2    2    2    2
## [14,]    2    2    2    2    2
## [15,]    2    2    2    2    2
## [16,]    2    2    2    2    2

Count the occurrences of 1’s and 2’s in M3.

table(M3)
## M3
##  1  2 
## 50 30
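
The transposition step matters because rbind requires the matrices to have the same number of columns. The sketch below illustrates this with the original 5x6 version of M2; the variable failed is a hypothetical helper used only to record whether the call raised an error.

```r
M1 <- matrix(1, nrow = 10, ncol = 5)
M2 <- matrix(2, nrow = 5, ncol = 6)

# 5 columns vs 6 columns: rbind stops with an error before transposing
failed <- inherits(try(rbind(M1, M2), silent = TRUE), "try-error")
failed

# After transposing, M2 is 6x5 and the rows can be stacked
M3 <- rbind(M1, t(M2))
dim(M3)
```

Checking dim on each input before binding is a cheap way to catch this kind of mismatch.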


Ex. 3.6

Create a vector a containing the first 100 positive integers.

a <- seq(1,100)

Create a vector b containing the first 200 even numbers.

Tip: to consider only even numbers, type help(seq) to learn how to use its by and length.out parameters.

b <-  seq(2, by=2, length.out=200)

Verify that all elements in b are multiples of 2 and that its length is 200.

Tip: the all function can be used to check that all the elements of a vector satisfy a condition.

all((b %% 2) == 0 )
## [1] TRUE
length(b)
## [1] 200

Initialize a vector c with the square root of b.

c <- sqrt(b)

Save a, b, and c in a list and compute their means and standard deviations using the lapply function.

abc <- list(a,b,c)
lapply(abc, mean)
## [[1]]
## [1] 50.5
## 
## [[2]]
## [1] 201
## 
## [[3]]
## [1] 13.38188
lapply(abc, sd)
## [[1]]
## [1] 29.01149
## 
## [[2]]
## [1] 115.7584
## 
## [[3]]
## [1] 4.694183
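
If a compact result is preferred, sapply can be used in place of lapply: it applies the same function to each element but simplifies the output to a vector when possible. This is a sketch, not part of the requested solution; naming the list elements and the objects means and sds are assumptions made for readability.

```r
a <- seq(1, 100)
b <- seq(2, by = 2, length.out = 200)
c <- sqrt(b)

# Naming the list elements labels the results
abc <- list(a = a, b = b, c = c)

means <- sapply(abc, mean)   # named numeric vector instead of a list
sds <- sapply(abc, sd)
means
sds
```

The values should match the lapply output above, only returned as named numeric vectors.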


Functions


Ex. 3.7

Write a function that takes as argument a numeric vector, computes its mean, variance, minimum, and maximum, and prints to screen the results together with some explanatory messages (e.g. “The mean of the vector is:…\n”) using the cat function.

Tip: the special character “\n” can be used at the end of the messages to go to a new line.

vectorstats <- function (myvec) {
  
  v.mean <- mean(myvec)
  v.var <- var(myvec)
  v.min <- min(myvec)
  v.max <- max(myvec)
  
  cat("The mean of the vector is:", v.mean, "\n")
  cat("The variance of the vector is:", v.var, "\n")
  cat("The minimum of the vector is:", v.min, "\n")
  cat("The maximum of the vector is:", v.max, "\n")
  
}

Apply your function to the vector a from Ex. 3.6.

vectorstats(a)
## The mean of the vector is: 50.5 
## The variance of the vector is: 841.6667 
## The minimum of the vector is: 1 
## The maximum of the vector is: 100
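
A possible extension, beyond what the exercise asks for, is to tolerate missing values and to return the statistics so that they can be reused. The sketch below is one way to do it; the function name vectorstats2 and its na.rm parameter are hypothetical.

```r
# Hypothetical variant of vectorstats: optional NA removal, and the
# statistics are returned invisibly in addition to being printed.
vectorstats2 <- function(myvec, na.rm = FALSE) {
  stats <- c(mean = mean(myvec, na.rm = na.rm),
             variance = var(myvec, na.rm = na.rm),
             minimum = min(myvec, na.rm = na.rm),
             maximum = max(myvec, na.rm = na.rm))
  for (s in names(stats)) {
    cat("The", s, "of the vector is:", stats[[s]], "\n")
  }
  invisible(stats)
}

res <- vectorstats2(c(1, 2, 3, NA), na.rm = TRUE)
res
```

Returning the values with invisible keeps the printed messages as the visible output while still allowing an assignment such as res above.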


Ex. 3.8

Write a function that takes as arguments a numeric matrix and a parameter called dim that can have a value of either 1 or 2.

The function should compute the median of the input matrix by rows when dim equals 1, or by columns when dim equals 2, using the apply function.

It should then round the medians to two decimal digits and return the results in the form of a numeric vector.

getmedian <- function (mymatrix, dim) {
  
  res <- apply(mymatrix, dim, median)
  res.round <- round(res, digits = 2)
  return(res.round)
  
}

Apply this function to the M3 matrix of Ex. 3.5 specifying once dim = 1 and once dim = 2.

getmedian(M3, dim=1)
##  [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
getmedian(M3, dim=2)
## [1] 1 1 1 1 1
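
Since dim is only meant to be 1 or 2, the function could also check its inputs before calling apply. The following is a minimal sketch of that idea; the function name getmedian2, the test matrix X, and the helper variable bad are assumptions introduced for illustration.

```r
# Hypothetical variant of getmedian with basic input validation
getmedian2 <- function(mymatrix, dim) {
  stopifnot(is.matrix(mymatrix), is.numeric(mymatrix), dim %in% c(1, 2))
  round(apply(mymatrix, dim, median), digits = 2)
}

# Small test matrix with non-integer values, filled by column
X <- matrix(c(1.111, 2.222, 3.333,
              4.444, 5.551, 6.666,
              7.777, 8.888, 9.999), nrow = 3)

getmedian2(X, dim = 1)   # medians by row
getmedian2(X, dim = 2)   # medians by column

# An unsupported value of dim is rejected by stopifnot
bad <- inherits(try(getmedian2(X, dim = 3), silent = TRUE), "try-error")
bad
```

Using a matrix with decimals also makes the effect of round visible, which the all-integer M3 does not.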