class: center, middle, inverse, title-slide

# .hi-grey[Online Training Workshop on Indigenous Peoples Data Analysis: R Basics and Applications]
## .smallest.hi-slate[Week 2]
### .hi-slate[Kacing 廖彥傑]
### .smallest[PhD Candidate, University of Essex, UK]

---
exclude: true

---
layout: true
# Data Structures

---
name:

<img src="./images/image01.png" width="60%" style="display: block; margin: auto;" />

##### Reference: R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics by J. D. Long and Paul Teetor

---
layout: true
# Vectors

---
name:

Create a vector and name its elements with `names()`.

```r
v <- c(20, 20, NA)
names(v) <- c("Kacing", "Labak", "Truku")
print(v)
```

```
#> Kacing  Labak  Truku 
#>     20     20     NA
```

--

Extract an element by name. `[[ ]]` returns the bare value:

```r
v[["Kacing"]]
```

```
#> [1] 20
```

--

`[ ]` keeps the name attached:

```r
v["Kacing"]
```

```
#> Kacing 
#>     20
```

---
layout: true
# Lists

---
name:

Create a list object with `list()`

```r
a.list <- list(x = 1:6, y = "a", z = c(TRUE, FALSE))
```

--

Extract an element with `$`

```r
a.list$x
```

```
#> [1] 1 2 3 4 5 6
```

--

Another way to extract it, with `[[ ]]`

```r
a.list[["x"]]
```

```
#> [1] 1 2 3 4 5 6
```

---

Compare `a.list[["x"]]` with `a.list["x"]`: the first returns the element itself, the second returns a list that contains it.

```r
a.list[["x"]]
```

```
#> [1] 1 2 3 4 5 6
```

```r
a.list["x"]
```

```
#> $x
#> [1] 1 2 3 4 5 6
```

--

Use `c()` inside `[ ]` to pick specific **vectors** by position

```r
a.list[c(1,3)]
```

```
#> $x
#> [1] 1 2 3 4 5 6
#> 
#> $z
#> [1]  TRUE FALSE
```

---
layout: true
# Matrices

---
name:

A matrix with `ncol = 1`

```r
one.col.matrix <- matrix(1:6, ncol = 1)
```

--

A matrix with `ncol = 2`

```r
two.col.matrix <- matrix(1:6, ncol = 2)
```

--

Dimensions (rows, then columns)

```r
dim(two.col.matrix)
```

```
#> [1] 3 2
```

---

Create a matrix with 4 columns

```r
A <- matrix(1:20, ncol = 4)
```

--

Arithmetic on a matrix is element-wise: the scalar 1 is recycled and added to every entry

```r
A + 1
```

```
#>      [,1] [,2] [,3] [,4]
#> [1,]    2    7   12   17
#> [2,]    3    8   13   18
#> [3,]    4    9   14   19
#> [4,]    5   10   15   20
#> [5,]    6   11   16   21
```

---
layout: true
# Factors (Categorical Variables)

---
name:

```r
my.vector <- c(1, 1, 0, 0, 0, 1)
my.factor <- factor(x = my.vector, levels = c(1, 0), labels = c("treated", "control"))
```

Use `levels()` to see which categories the factor has

```r
levels(my.factor)
```

```
#> [1] "treated" "control"
```

---
layout: true
# Built-in Functions

---
name:

Inspect the contents of **a.list** with `str()`

```r
str(a.list)
```

```
#> List of 3
#>  $ x: int [1:6] 1 2 3 4 5 6
#>  $ y: chr "a"
#>  $ z: logi [1:2] TRUE FALSE
```

--

Use `append()` to add more elements to an existing list

```r
another.list <- append(a.list, list(yy = 1:10, zz = letters[5:1]))
```

--

Inspect the result with `str()`

```r
str(another.list)
```

```
#> List of 5
#>  $ x : int [1:6] 1 2 3 4 5 6
#>  $ y : chr "a"
#>  $ z : logi [1:2] TRUE FALSE
#>  $ yy: int [1:10] 1 2 3 4 5 6 7 8 9 10
#>  $ zz: chr [1:5] "e" "d" "c" "b" ...
```

---

Delete a vector from a list by assigning `NULL` to it

```r
a.list$y <- NULL
```

--

A nested list (a list whose elements are themselves lists)

```r
nested.list <- list(A = list("a", "aa", "aaa"), B = list("b", "bb"))
```

--

Inspect it with `str()`

```r
str(nested.list, max.level = 1)
```

```
#> List of 2
#>  $ A:List of 3
#>  $ B:List of 2
```

---

Use `is.list()` to check whether **nested.list** is a list

```r
is.list(nested.list)
```

```
#> [1] TRUE
```

--

Use `unlist()` to flatten the **list** into a plain vector

```r
c.vec <- unlist(nested.list)
```

--

Use `mode()` to check the mode of **nested.list**

```r
mode(nested.list)
```

```
#> [1] "list"
```

--

```r
names(nested.list)
```

```
#> [1] "A" "B"
```

--

```r
mode(c.vec)
```

```
#> [1] "character"
```

---
layout: true
# Exercise: Extracting from a Nested List

---
name:

First, create a nested list, then try swapping in other index values.

```r
nested.list <- list(A = list("a", "aa", "aaa"), B = list("b", "bb"))
```

--

```r
str(nested.list)
```

```
#> List of 2
#>  $ A:List of 3
#>   ..$ : chr "a"
#>   ..$ : chr "aa"
#>   ..$ : chr "aaa"
#>  $ B:List of 2
#>   ..$ : chr "b"
#>   ..$ : chr "bb"
```

--

Run the following examples yourself

```r
nested.list[1]
nested.list[[1]][2]
nested.list[[1]][[2]]
nested.list[2]
nested.list[2][[1]]
```

---
layout: true
# Data Frames

---
name:

Create a data frame. The shorter columns `y` and `z` are recycled to match the six rows of `x`.

```r
a.df <- data.frame(x = 1:6, y = "a", z = c(TRUE, FALSE))
```

--

Use `is.data.frame()` to check whether **a.df** is a **data frame**

```r
is.data.frame(a.df)
```

```
#> [1] TRUE
```

--

Use `is.list()` to check whether **a.df** is also a **list**

```r
is.list(a.df)
```

```
#> [1] TRUE
```

---

`length()` shows how many variables (columns) there are

```r
length(a.df)
```

```
#> [1] 3
```

--

`colnames()` shows the variable names

```r
colnames(a.df)
```

```
#> [1] "x" "y" "z"
```

--

Extract the data in a variable

```r
a.df$x
```

```
#> [1] 1 2 3 4 5 6
```

--

Another way to extract the data in a variable

```r
a.df[["x"]]
```

```
#> [1] 1 2 3 4 5 6
```

--

How does **a.df["x"]** differ? Run it yourself.

```r
a.df["x"]
```

---

Select the first column

```r
a.df[ , 1]
```

```
#> [1] 1 2 3 4 5 6
```

```r
a.df[[1]]
```

```
#> [1] 1 2 3 4 5 6
```

Select the first row

```r
a.df[1, ]
```
| x | y | z |
|---|---|---|
| 1 | a | TRUE |
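
The column and row selections above return different kinds of objects. A minimal sketch, assuming a freshly built copy of the data frame (called `demo.df` here, a name that is not in the original slides): a single column is simplified to a vector, a single row stays a data frame, and `drop = FALSE` keeps the data frame structure for a column as well.

```r
# demo.df is a hypothetical copy of a.df, rebuilt so the block runs on its own
demo.df <- data.frame(x = 1:6, y = "a", z = c(TRUE, FALSE))

first.col    <- demo.df[ , 1]                # one column: simplified to a vector
first.row    <- demo.df[1, ]                 # one row: still a data frame
first.col.df <- demo.df[ , 1, drop = FALSE]  # drop = FALSE keeps the data frame

is.data.frame(first.col)     # FALSE
is.data.frame(first.row)     # TRUE
is.data.frame(first.col.df)  # TRUE
```

Using `drop = FALSE` is one way to keep later code from depending on whether one or several columns happened to be selected.
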
---

Select rows 1 to 2

```r
a.df[1:2, ]
```
| x | y | z |
|---|---|---|
| 1 | a | TRUE |
| 2 | a | FALSE |
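--

Replace the value at `a.df[1, 1]` with 99

```r
a.df[1, 1] <- 99
```

--

Set every value in `a.df[ , 1]` to -99

```r
a.df[ , 1] <- -99
```

---

Use `subset()` to keep the rows where x is greater than 3. Every x is now -99, so no rows match and only the column headers come back.

```r
subset(a.df, x > 3)
```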
| x | y | z |
|---|---|---|
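
For contrast, a minimal sketch on a freshly built data frame where `x` still holds 1 to 6. The names `fresh.df` and `overwritten` are introduced here for illustration and are not part of the original slides. The same `subset()` call keeps three rows on the fresh data and none once the column has been overwritten with -99.

```r
# fresh.df is a hypothetical, untouched copy of the original data frame
fresh.df <- data.frame(x = 1:6, y = "a", z = c(TRUE, FALSE))

kept <- subset(fresh.df, x > 3)
nrow(kept)   # 3
kept$x       # 4 5 6

# Repeat the overwrite from the earlier slide on a copy
overwritten <- fresh.df
overwritten[ , 1] <- -99
nrow(subset(overwritten, x > 3))   # 0
```
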
--

```r
subset(a.df, x > 5)[ , -3]
```
| x | y |
|---|---|
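
The row filter followed by `[ , -3]` can also be written as a single call. A minimal sketch, assuming a freshly built data frame (`fresh.df`, a name introduced here for illustration) and the `select` argument of base R's `subset()`:

```r
# fresh.df is a hypothetical, untouched copy of the original data frame
fresh.df <- data.frame(x = 1:6, y = "a", z = c(TRUE, FALSE))

# Two steps: filter the rows, then drop the third column by position
two.step <- subset(fresh.df, x > 5)[ , -3]

# One step: filter the rows and drop column z by name
one.step <- subset(fresh.df, x > 5, select = -z)

colnames(one.step)   # "x" "y"
nrow(one.step)       # 1
```

Dropping by name rather than by position keeps working if the column order changes.
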
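--

```r
subset(a.df, x > 3)$x
```

```
#> numeric(0)
```

---

Use `which()` to find the position of a specific value

```r
which(colnames(a.df) == "y")
```

```
#> [1] 2
```

--

A negative index (`-`) drops the matching columns; a positive index selects them

```r
a.df[ , -which(colnames(a.df) == "y")]
```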
| x | z |
|---|---|
| -99 | TRUE |
| -99 | FALSE |
| -99 | TRUE |
| -99 | FALSE |
| -99 | TRUE |
| -99 | FALSE |
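
One caveat with the `-which()` pattern: when no column name matches, `which()` returns an empty vector, and the negative index then selects zero columns instead of all of them. A minimal sketch, assuming a freshly built data frame (`demo.df`, a hypothetical name) and using `setdiff()` on the column names as one possible alternative:

```r
# demo.df is a hypothetical copy of the data frame, rebuilt so the block runs on its own
demo.df <- data.frame(x = 1:6, y = "a", z = c(TRUE, FALSE))

# Drop column "y" by position, as on the slide
by.position <- demo.df[ , -which(colnames(demo.df) == "y")]

# Pitfall: no column is called "w", so which() returns integer(0)
# and the result has zero columns
no.match <- demo.df[ , -which(colnames(demo.df) == "w")]

# Alternative: keep every column whose name is not in the drop list
by.name <- demo.df[ , setdiff(colnames(demo.df), "y")]
safe    <- demo.df[ , setdiff(colnames(demo.df), "w")]

colnames(by.position)  # "x" "z"
ncol(no.match)         # 0
ncol(safe)             # 3
```
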