Data Structures in R

Mastering Vectors and Data Frames for Data Manipulation

Masumbuko Semba

2024-02-01

Introduction

  • R is a powerful language for data analysis and statistics.
  • Vectors and data frames are fundamental data structures in R.
  • Understanding them is essential for any R user.

What is a Vector?

  • A vector is the most basic data structure in R: a one-dimensional collection of data elements.
  • All elements in a vector must be of the same data type (for example numeric, character, or logical).
sst_vector <- c(18.5, 19.2, 20.1, 18.8, 19.5)
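
The same-type rule can be sketched with a few illustrative values (the mixed values below are assumptions chosen for the example, not data from the slides). When types are mixed inside c(), R does not raise an error; it coerces every element to the most flexible type present.

```r
# Mixing types in c() triggers implicit coercion (illustrative values)
mixed <- c(18.5, "Tanga", TRUE)
class(mixed)          # "character": every element became a string

# Logical values become numbers when combined with numerics
num_lgl <- c(18.5, TRUE, FALSE)
num_lgl               # 18.5 1.0 0.0

# Basic inspection of the numeric vector defined on this slide
sst_vector <- c(18.5, 19.2, 20.1, 18.8, 19.5)
length(sst_vector)    # 5
mean(sst_vector)      # 19.22
```

Checking class() after building a vector is a cheap way to catch accidental coercion, for example a stray quoted value in a numeric column.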

Creating Vectors

  • Vectors are commonly created using the c() function.
  • Other functions can also be used, such as seq() and rep().
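
The three functions named above can be compared side by side in a short sketch. The variable names and values are hypothetical and chosen only for illustration.

```r
# c() combines individual values into a vector
depths <- c(0, 10, 20)                       # hypothetical depths

# seq() generates regular sequences
steps <- seq(from = 0, to = 100, by = 25)    # 0 25 50 75 100
index <- seq_len(5)                          # 1 2 3 4 5

# rep() repeats values, either the whole vector or element by element
site    <- rep("Tanga", times = 3)           # "Tanga" "Tanga" "Tanga"
seasons <- rep(c("wet", "dry"), each = 2)    # "wet" "wet" "dry" "dry"
```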

Types of Vectors

  • Integer vector (the L suffix stores whole numbers as integers; without it they are stored as doubles)
id <- c(101L, 102L, 103L, 104L, 105L)
  • Numeric vector
sst_vector <- c(18.5, 19.2, 20.1, 18.8, 19.5)
  • Character vector
coastal.cities <- c("Mombasa", "Tanga", "Dar es Salaam", "Pwani", "Mtwara")
  • Logical vector
cities <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
  • Date vector (quoted dates are character strings until converted with as.Date())
became <- as.Date(c("1975-01-25", "2009-07-01", "1963-12-09", NA, NA))
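
The type of a vector can be verified with typeof() and class(). The sketch below uses shortened versions of the vectors above. It illustrates two points that are easy to miss: whole numbers typed without the L suffix are stored as doubles, and quoted dates remain character strings until converted with as.Date().

```r
# Whole numbers are doubles unless marked with the L suffix
t_plain <- typeof(c(101, 102))             # "double"
t_int   <- typeof(c(101L, 102L))           # "integer"

# Other atomic types
t_num <- typeof(c(18.5, 19.2))             # "double"
t_chr <- typeof(c("Mombasa", "Tanga"))     # "character"
t_lgl <- typeof(c(TRUE, FALSE))            # "logical"

# A quoted date is a character string until it is converted
d_chr <- c("1975-01-25", NA)
class(d_chr)                               # "character"
d <- as.Date(d_chr)
class(d)                                   # "Date"
is.na(d)                                   # FALSE TRUE
```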

What is a Data Frame?

  • A data frame is a two-dimensional table of data.
  • It has rows and columns, similar to a spreadsheet.
  • Each column represents a variable, and each row represents a data point (record).
  • Each column can be of a different data type.
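
A minimal sketch of these properties, using two rows borrowed from the deck's WIO example plus a hypothetical logical column (Coastal) added only to show a third data type:

```r
# Each column is a vector of a single type; different columns may differ in type
df <- data.frame(
  Country = c("Kenya", "Tanzania"),
  SST     = c(28.2, 27.9),
  Coastal = c(TRUE, TRUE),        # hypothetical column for illustration
  stringsAsFactors = FALSE        # keep text as character in older R versions
)

col_types <- sapply(df, class)
col_types   # Country "character", SST "numeric", Coastal "logical"
dim(df)     # 2 3 (rows, columns)
```

In effect, a data frame is a set of equal-length vectors held side by side, which is why the same-type rule applies within a column but not across columns.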

Creating Data Frames

  • Data frames can be created from vectors using the data.frame() function.
  • Alternatively, values can be supplied directly to each column inside data.frame(), and further columns can be added later by assignment with $.
# Define the data
countries <- c("Comoros", "Kenya", "Madagascar", "Mauritius", "Mozambique",
              "Réunion", "Seychelles", "Somalia", "South Africa", "Tanzania")
sst <- c(27.5, 28.2, 27.8, 26.9, 27.3, 26.7, 28.1, 28.4, 26.2, 27.9)
chl_a <- c(0.12, 0.15, 0.18, 0.20, 0.17, 0.19, 0.14, 0.11, 0.13, 0.16)
salinity <- c(35.2, 35.4, 35.3, 35.1, 35.2, 35.0, 35.3, 35.5, 35.1, 35.2)
surface_current <- c(0.5, 0.7, 0.6, 0.8, 0.7, 0.9, 0.6, 0.5, 0.8, 0.7)
wind_speed <- c(5.2, 5.8, 5.5, 6.0, 5.7, 6.2, 5.4, 5.1, 5.6, 5.3)



# Create the data frame
wio_data <- data.frame(
  Country = countries,
  SST = sst,
  Chl_a = chl_a,
  Salinity = salinity,
  Surface_Current = surface_current,
  Wind_Speed = wind_speed
)

# Print the data frame
print(wio_data)
        Country  SST Chl_a Salinity Surface_Current Wind_Speed
1       Comoros 27.5  0.12     35.2             0.5        5.2
2         Kenya 28.2  0.15     35.4             0.7        5.8
3    Madagascar 27.8  0.18     35.3             0.6        5.5
4     Mauritius 26.9  0.20     35.1             0.8        6.0
5    Mozambique 27.3  0.17     35.2             0.7        5.7
6       Réunion 26.7  0.19     35.0             0.9        6.2
7    Seychelles 28.1  0.14     35.3             0.6        5.4
8       Somalia 28.4  0.11     35.5             0.5        5.1
9  South Africa 26.2  0.13     35.1             0.8        5.6
10     Tanzania 27.9  0.16     35.2             0.7        5.3
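
Once a data frame exists, columns can be extracted, rows filtered, and new columns added. The sketch below rebuilds a three-row subset of the table above so that it runs on its own. The names wio_small, warm, and SST_anomaly are hypothetical and introduced only for this example.

```r
# A three-row subset of the WIO table, rebuilt so the example is self-contained
wio_small <- data.frame(
  Country = c("Kenya", "Somalia", "South Africa"),
  SST     = c(28.2, 28.4, 26.2),
  stringsAsFactors = FALSE
)

# Extract one column as a vector with $
wio_small$SST                        # 28.2 28.4 26.2

# Keep only the rows that satisfy a logical condition
warm <- wio_small[wio_small$SST > 28, ]
warm$Country                         # "Kenya" "Somalia"

# Add a new column by assignment: deviation from the mean of these three rows
wio_small$SST_anomaly <- wio_small$SST - mean(wio_small$SST)
names(wio_small)                     # "Country" "SST" "SST_anomaly"
```

The row filter works because wio_small$SST > 28 is itself a logical vector, one value per row, which ties the data frame operations back to the vector types introduced earlier.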

Summary

  • Vectors and data frames are essential building blocks for data analysis in R.
  • Understanding their features and operations is crucial for efficient data manipulation.