class: center, middle, inverse, title-slide # MULTIVARIJATNE STATISTIČKE METODE ## Predavanje 4: Analiza glavnih komponenti ### Luka Šikić, PhD ### Fakultet hrvatskih studija |
Github MV
--- <style type="text/css"> @media print { .has-continuation { display: block !important; } } remark-slide-content { font-size: 22px; padding: 20px 80px 20px 80px; } .remark-code, .remark-inline-code { background: #f0f0f0; } .remark-code { font-size: 16px; } .mid. remark-code { /*Change made here*/ font-size: 60% !important; } .tiny .remark-code { /*Change made here*/ font-size: 40% !important; } </style> # Pregled predavanja <br> <br> <br> 1. [Opis analize glavnih komponenti (PCA)](#opis) 2. [Primjeri](#primjer) 3. [Praktična primjena](#primjena) --- class: inverse, center, middle name: opis # ANALIZA GLAVNIH KOMPONENTI <html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html> (Definicija) --- # Osnovne karakteristike PCA <br> <br> - PCA uvodi K. Pearson (1901), a metode izračuna u praksi Hotelling (1933) <br> - Jedna od jednostavnijih metoda multivarijatne analize <br> - Polazište za PCA je skup (visoko) koreliranih varijabli <br> - Cilj PCA je redukcija mnoštva *koreliranih* varijabli na mali broj (2-3) međusobno nekoreliranih varijabli, odnosno smanjenje dimenzionalnosti podataka <br> - Redukcija se provodi linearnom projekcijom početnog skupa varijabli na nove, sintetičke varijable (glavne komponente:GK) koje su poredane prema veličini varijance <br> - Nove varijable (GK) od interesa su one sa najvećom varijancom <br> - U praktičnom smislu je najčešće riječ o dvije ili tri komponente <br> - Nove varijable (GK) se najčešće koriste za dvodimenzionalnu vizualizaciju, odnosno pojednostavljenje velikog skupa podataka <br> - Nove varijable (GK) se također mogu koristiti kao imputi za daljnju analizu (primjerice regresijski model) --- # Proceura provedbe PCA <br> <br> 1. Provedi skaliranje originalnih varijabli tako da imaju prosjek 0 i standardnu devijaciju 1. <br> 2. Izračunaj kovarijančnu matricu. U slučaju da je prvi korak proveden, dovoljno je izračunati korelacijsku matricu! <br> 3. Izračunaj svojstvene vrijednosti i pripadajuće svojstvene vektore matrice iz prethodnog koraka. Koeficijenti glavnih komponenti su svojstveni vektori, a svojstvene vrijednosti su varijanca glavnih kmponenti. <br> 4. Zadrži onaj dio glavnih komponenti koji sadržava najveći stupanj varijabilnosti (2-3 komponente). --- class: inverse, center, middle name: primjer # PRIMJERI PCA <html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html> (Gdje se koristi PCA!?) --- # Ptice i vremenska nepogoda ##### Pregled podataka <img src="../Foto/sparrow_dta_0.png" width="300px" style="display: block; margin: auto;" /> --- # Ptice i vremenska nepogoda ##### ...nastavak <img src="../Foto/sparrow_dta.png" width="400px" style="display: block; margin: auto;" /> --- # Ptice i vremenska nepogoda ##### Korelacija između pet mjera veličine ptica <br> <br> <br> <img src="../Foto/sparrow_cor.png" width="500px" style="display: block; margin: auto;" /> --- # Ptice i vremenska nepogoda ##### Svojstvene vrijednosti i vektori <br> <br> <br> <img src="../Foto/sparrow_eigen.png" width="500px" style="display: block; margin: auto;" /> --- # Ptice i vremenska nepogoda ##### Provedena PCA <br> <br> <br> <img src="../Foto/sparrow_compare.png" width="500px" style="display: block; margin: auto;" /> --- # Ptice i vremenska nepogoda ##### PCA vizualizacija (Crne točkice označavaju umrle ptice) <img src="../Foto/sparrow_PCA.png" width="500px" style="display: block; margin: auto;" /> --- # Struktura zaposlenosti u europskim zemljama ##### Pregled podataka <img src="../Foto/european_dta.png" width="300px" style="display: block; margin: auto;" /> --- # Struktura zaposlenosti u europskim zemljama ##### Korelacijska matrica <br> <br> <img src="../Foto/european_cor.png" width="500px" style="display: block; margin: auto;" /> --- # Struktura zaposlenosti u europskim zemljama ##### PCA vizualizacija <img src="../Foto/european_PCA.png" width="500px" style="display: block; margin: auto;" /> --- # Elementi razvoja osobnosti kod djece ##### PCA opterećenja <br> <br> <img src="../Foto/djeca_PCA.png" width="450px" style="display: block; margin: auto;" /> --- # Uspjeh u srednjoj školi ##### Korelacija između postignuća na različitim predmetima <br> <br> <img src="../Foto/grades_cor.png" width="500px" style="display: block; margin: auto;" /> --- # Uspjeh u srednjoj školi ##### Provedena PCA <br> <br> <img src="../Foto/grades_PCA.png" width="500px" style="display: block; margin: auto;" /> --- # Uspjeh u srednjoj školi ##### Vizualizacija dvije dominantne PCA komponente <img src="../Foto/grades_PCA_gg.png" width="500px" style="display: block; margin: auto;" /> --- # Uspjeh u srednjoj školi ##### Opterećenja za prve dvije PCA komponente <img src="../Foto/grades_loadings.png" width="400px" style="display: block; margin: auto;" /> --- # Zaposlenost u EU ##### Opis varijabli <br> <br> <img src="../Foto/zaposlenost_sifranik.png" width="500px" style="display: block; margin: auto;" /> --- # Zaposlenost u EU ##### Korelacijska matrica <img src="../Foto/zaposlenost_cor.png" width="450px" style="display: block; margin: auto;" /> --- # Zaposlenost u EU ##### Varijanca pripadajućih PCA komponenti <img src="../Foto/zaposlenost_PCA.png" width="500px" style="display: block; margin: auto;" /> --- # Zaposlenost u EU ##### PCA opterećenja komponenti <img src="../Foto/zaposlenost_loadings.png" width="500px" style="display: block; margin: auto;" /> --- #Zaposlenost u EU ##### Scree grafikon za PCA analizu <img src="../Foto/zaposlenost_scree.png" width="450px" style="display: block; margin: auto;" /> --- # Zaposlenost u EU ##### Vizualizacija dvije glavne PCA komponente po sektorima <img src="../Foto/zaposlenost_PCA_gg.png" width="450px" style="display: block; margin: auto;" /> --- # Zaposlenost u EU ##### Vizualizacija dvije glavne PCA komponente po zemljama <img src="../Foto/zaposlenost_zemlje.png" width="450px" style="display: block; margin: auto;" /> --- # Socijalna mobilnost u UK ##### Pregled varijabli <img src="../Foto/socmob_sifranik.png" width="500px" style="display: block; margin: auto;" /> --- # Socijalna mobilnost u UK ##### Korelacijska matrica <img src="../Foto/socmob_cor.png" width="500px" style="display: block; margin: auto;" /> --- # Socijalna mobilnost u UK ##### PCA opterećenja <img src="../Foto/socmob_loadings.png" width="500px" style="display: block; margin: auto;" /> --- # Gledanje televizije u UK ##### Korelacijska matrica <img src="../Foto/tv_cor.png" width="500px" style="display: block; margin: auto;" /> --- # Gledanje televizije u UK ##### Scree grafikon <img src="../Foto/tv_scree.png" width="500px" style="display: block; margin: auto;" /> --- # Gledanje televizije u UK ##### PCA vizualizacija <img src="../Foto/tv_pca_gg.png" width="500px" style="display: block; margin: auto;" /> --- class: inverse, center, middle name: primjena # PRAKTIČNA PRIMJENA <html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html> (PCA u praksi) --- # Primjer 1 (Kriminal u SAD) ##### Podatci ```r # vidi https://uc-r.github.io/pca data("USArrests") # Učitaj podatke head(USArrests,10) # Pregledaj podatke ``` ``` ## Murder Assault UrbanPop Rape ## Alabama 13.2 236 58 21.2 ## Alaska 10.0 263 48 44.5 ## Arizona 8.1 294 80 31.0 ## Arkansas 8.8 190 50 19.5 ## California 9.0 276 91 40.6 ## Colorado 7.9 204 78 38.7 ## Connecticut 3.3 110 77 11.1 ## Delaware 5.9 238 72 15.8 ## Florida 15.4 335 80 31.9 ## Georgia 17.4 211 60 25.8 ``` --- # Primjer 1 (Kriminal u SAD) ##### Skaliranje varijabli ```r apply(USArrests, 2, var) # Pogledaj varijance verijabli ``` ``` ## Murder Assault UrbanPop Rape ## 18.97047 6945.16571 209.51878 87.72916 ``` ```r skalirano_dta <- apply(USArrests,2, scale) # Standardiziraj varijable str(skalirano_dta) # Pogledaj novi objekt ``` ``` ## num [1:50, 1:4] 1.2426 0.5079 0.0716 0.2323 0.2783 ... ## - attr(*, "dimnames")=List of 2 ## ..$ : NULL ## ..$ : chr [1:4] "Murder" "Assault" "UrbanPop" "Rape" ``` ```r head(skalirano_dta,4) # pogledaj podatke ``` ``` ## Murder Assault UrbanPop Rape ## [1,] 1.24256408 0.7828393 -0.5209066 -0.003416473 ## [2,] 0.50786248 1.1068225 -1.2117642 2.484202941 ## [3,] 0.07163341 1.4788032 0.9989801 1.042878388 ## [4,] 0.23234938 0.2308680 -1.0735927 -0.184916602 ``` --- # Primjer 1 (Kriminal u SAD) ##### Prilagodba podataka ```r # Za izračun glavnih komponenti: ## 1. izračunaj kovarijančnu matricu kov_dta <- cov(skalirano_dta) ## 2. Izračunaj svojstvene vrijednosti km eig_kov_dta <- eigen(kov_dta) str(eig_kov_dta) # Pogledaj objekt ``` ``` ## List of 2 ## $ values : num [1:4] 2.48 0.99 0.357 0.173 ## $ vectors: num [1:4, 1:4] -0.536 -0.583 -0.278 -0.543 0.418 ... ## - attr(*, "class")= chr "eigen" ``` ```r ## 3. Spremi opterećenja u novi objekt opt <- eig_kov_dta$vectors[,1:2] ## 4. Okreni smjer svojstvenih vektora opt <- -opt ## 5. Pripiši nazive row.names(opt) <- c("Murder", "Assault", "UrbanPop", "Rape") colnames(opt) <- c("PC1", "PC2") ``` --- # Primjer 1 (Kriminal u SAD) ##### Prilagodba podataka ```r head(opt) # Pogledaj objekt ``` ``` ## PC1 PC2 ## Murder 0.5358995 -0.4181809 ## Assault 0.5831836 -0.1879856 ## UrbanPop 0.2781909 0.8728062 ## Rape 0.5434321 0.1673186 ``` --- # Primjer 1 (Kriminal u SAD) ##### Prilagodba podataka (nastavak) ```r ## 6. Izračunaj koeficijente glavnih komponenti PC1 <- as.matrix(skalirano_dta) %*% opt[,1] PC2 <- as.matrix(skalirano_dta) %*% opt[,2] ## 7. Poveži u podatkovni okvir PC <- data.frame(GEO = row.names(USArrests), PC1, PC2) head(PC) ``` ``` ## GEO PC1 PC2 ## 1 Alabama 0.9756604 -1.1220012 ## 2 Alaska 1.9305379 -1.0624269 ## 3 Arizona 1.7454429 0.7384595 ## 4 Arkansas -0.1399989 -1.1085423 ## 5 California 2.4986128 1.5274267 ## 6 Colorado 1.4993407 0.9776297 ``` --- # Primjer 1 (Kriminal u SAD) ##### Prikaži grafički ```r # Prikaži prve dvije PC komponente grafički ggplot2:: ggplot(PC, aes(PC1,PC2)) + modelr:: geom_ref_line(h=0) + modelr:: geom_ref_line(v= 0) + geom_text(aes(label = GEO), size = 1.5) + xlab("PC1") + ylab("PC2") + ggtitle("Prve dvije glavne komponente za USArrests podatkovni okvir") + theme_grey(base_size = 10) -> USArViz ``` --- # Primjer 1 (Kriminal u SAD) ##### Prikaži grafički ```r USArViz ``` <img src="04_PCA_files/figure-html/unnamed-chunk-35-1.png" style="display: block; margin: auto;" /> --- # Primjer 1 (Kriminal u SAD) ##### Scree plot ```r # Izračunaj varijabilnost vezanu uz glavne komponente PCvar <- eig_kov_dta$values / sum(eig_kov_dta$values) print(round(PCvar,2)) # Prikaži podatke ``` ``` ## [1] 0.62 0.25 0.09 0.04 ``` ```r # Prikaži grafički ## Scree PVEplot <- qplot(c(1:4), PCvar) + geom_line() + xlab("PC") + ylab("PVE") + ggtitle("Scree") + ylim(0, 1) ``` --- # Primjer 1 (Kriminal u SAD) ##### Scree plot ```r PVEplot ``` <img src="04_PCA_files/figure-html/unnamed-chunk-37-1.png" style="display: block; margin: auto;" /> --- # Primjer 1 (Kriminal u SAD) ##### Kumulativni zbroj ```r ## CumSum scree cumPVE <- qplot(c(1:4), cumsum(PCvar)) + geom_line() + xlab("PC") + ylab(NULL) + ggtitle("Cumulative Sum Scree") + ylim(0,1) ``` --- # Primjer 1 (Kriminal u SAD) ##### Kumulativni zbroj ```r cumPVE ``` <img src="04_PCA_files/figure-html/unnamed-chunk-39-1.png" style="display: block; margin: auto;" /> --- # Primjer 1 (Kriminal u SAD) ##### PCA pomoću funkcija .tiny[ ```r ## PROVEDI PROCEDURU PUTEM FORMULA ## PCA_fun <- stats::prcomp(USArrests, scale = T) names(PCA_fun) # Pregledaj objekt ``` ``` ## [1] "sdev" "rotation" "center" "scale" "x" ``` ```r # Pogledaj prosjek komponenti po varijablama PCA_fun$center ``` ``` ## Murder Assault UrbanPop Rape ## 7.788 170.760 65.540 21.232 ``` ```r # Pogledaj stdev komponenti po varijablama PCA_fun$scale ``` ``` ## Murder Assault UrbanPop Rape ## 4.355510 83.337661 14.474763 9.366385 ``` ```r # Pogledaj opterećenja PCA_fun$rotation ``` ``` ## PC1 PC2 PC3 PC4 ## Murder -0.5358995 0.4181809 -0.3412327 0.64922780 ## Assault -0.5831836 0.1879856 -0.2681484 -0.74340748 ## UrbanPop -0.2781909 -0.8728062 -0.3780158 0.13387773 ## Rape -0.5434321 -0.1673186 0.8177779 0.08902432 ``` ] --- # Primjer 1 (Kriminal u SAD) ##### PCA pomoću funkcija ```r # Promjeni predznak PCA_fun$rotation <- -PCA_fun$rotation PCA_fun$rotation ``` ``` ## PC1 PC2 PC3 PC4 ## Murder 0.5358995 -0.4181809 0.3412327 -0.64922780 ## Assault 0.5831836 -0.1879856 0.2681484 0.74340748 ## UrbanPop 0.2781909 0.8728062 0.3780158 -0.13387773 ## Rape 0.5434321 0.1673186 -0.8177779 -0.08902432 ``` ```r # Izračunaj koeficijente PCA_fun$x <- -PCA_fun$x head(PCA_fun$x) ``` ``` ## PC1 PC2 PC3 PC4 ## Alabama 0.9756604 -1.1220012 0.43980366 -0.154696581 ## Alaska 1.9305379 -1.0624269 -2.01950027 0.434175454 ## Arizona 1.7454429 0.7384595 -0.05423025 0.826264240 ## Arkansas -0.1399989 -1.1085423 -0.11342217 0.180973554 ## California 2.4986128 1.5274267 -0.59254100 0.338559240 ## Colorado 1.4993407 0.9776297 -1.08400162 -0.001450164 ``` --- # Primjer 1 (Kriminal u SAD) ##### PCA pomoću funkcija ```r # Vizualiziraj # biplot(PCA_fun, scale = 0) # standardna funkcija PCbiplot(PCA_fun) # custom made funkcija; vidi .Rmd ``` <img src="04_PCA_files/figure-html/unnamed-chunk-43-1.png" style="display: block; margin: auto;" /> --- # Primjer 1 (Kriminal u SAD) ##### PCA pomoću funkcija ```r VE <- PCA_fun$sdev^2 PCv <- VE / sum(VE) print(round(PCv, 2)) ``` ``` ## [1] 0.62 0.25 0.09 0.04 ``` --- # Primjer 2 (Sportski rezultati) ##### Pregled podataka ```r # Vidi niže za detaljnjiji opis # Podatci data(decathlon2) head(decathlon2, 5) ``` ``` ## X100m Long.jump Shot.put High.jump X400m X110m.hurdle Discus ## SEBRLE 11.04 7.58 14.83 2.07 49.81 14.69 43.75 ## CLAY 10.76 7.40 14.26 1.86 49.37 14.05 50.72 ## BERNARD 11.02 7.23 14.25 1.92 48.93 14.99 40.87 ## YURKOV 11.34 7.09 15.19 2.10 50.42 15.31 46.26 ## ZSIVOCZKY 11.13 7.30 13.48 2.01 48.62 14.17 45.67 ## Pole.vault Javeline X1500m Rank Points Competition ## SEBRLE 5.02 63.19 291.7 1 8217 Decastar ## CLAY 4.92 60.15 301.5 2 8122 Decastar ## BERNARD 5.32 62.77 280.1 4 8067 Decastar ## YURKOV 4.72 63.44 276.4 5 8036 Decastar ## ZSIVOCZKY 4.42 55.37 268.0 7 8004 Decastar ``` .footnote[ [*]Za detalje pogledaj: [analiza 1](https://raw.githack.com/clemonster/decathlon-pca/master/decathlon_full_analysis.html) i [analiza 2](https://husson.github.io/MOOC_GB/RMarkdown_PCA_Decathlon.pdf).] --- # Primjer 2 (Sportski rezultati) ##### Pregled podataka (nastavak) ```r str(decathlon2) ``` ``` ## 'data.frame': 27 obs. of 13 variables: ## $ X100m : num 11 10.8 11 11.3 11.1 ... ## $ Long.jump : num 7.58 7.4 7.23 7.09 7.3 7.31 6.81 7.56 6.97 7.27 ... ## $ Shot.put : num 14.8 14.3 14.2 15.2 13.5 ... ## $ High.jump : num 2.07 1.86 1.92 2.1 2.01 2.13 1.95 1.86 1.95 1.98 ... ## $ X400m : num 49.8 49.4 48.9 50.4 48.6 ... ## $ X110m.hurdle: num 14.7 14.1 15 15.3 14.2 ... ## $ Discus : num 43.8 50.7 40.9 46.3 45.7 ... ## $ Pole.vault : num 5.02 4.92 5.32 4.72 4.42 4.42 4.92 4.82 4.72 4.62 ... ## $ Javeline : num 63.2 60.1 62.8 63.4 55.4 ... ## $ X1500m : num 292 302 280 276 268 ... ## $ Rank : int 1 2 4 5 7 8 9 10 11 12 ... ## $ Points : int 8217 8122 8067 8036 8004 7995 7802 7733 7708 7651 ... ## $ Competition : Factor w/ 2 levels "Decastar","OlympicG": 1 1 1 1 1 1 1 1 1 1 ... ``` --- # Primjer 2 (Sportski rezultati) ##### Prilagodba podataka ```r # Definiraj podatke za analizu decathlon2.active <- decathlon2[1:23, 1:10] head(decathlon2.active[, 1:6], 10) ``` ``` ## X100m Long.jump Shot.put High.jump X400m X110m.hurdle ## SEBRLE 11.04 7.58 14.83 2.07 49.81 14.69 ## CLAY 10.76 7.40 14.26 1.86 49.37 14.05 ## BERNARD 11.02 7.23 14.25 1.92 48.93 14.99 ## YURKOV 11.34 7.09 15.19 2.10 50.42 15.31 ## ZSIVOCZKY 11.13 7.30 13.48 2.01 48.62 14.17 ## McMULLEN 10.83 7.31 13.76 2.13 49.91 14.38 ## MARTINEAU 11.64 6.81 14.57 1.95 50.14 14.93 ## HERNU 11.37 7.56 14.41 1.86 51.10 15.06 ## BARRAS 11.33 6.97 14.09 1.95 49.48 14.48 ## NOOL 11.33 7.27 12.68 1.98 49.20 15.29 ``` --- # Primjer 2 (Sportski rezultati) ##### Provedi PCA .tiny[ ```r # Provedi PCA procjena_PCA <- FactoMineR::PCA(decathlon2.active, graph = F) print(procjena_PCA) ``` ``` ## **Results for the Principal Component Analysis (PCA)** ## The analysis was performed on 23 individuals, described by 10 variables ## *The results are available in the following objects: ## ## name description ## 1 "$eig" "eigenvalues" ## 2 "$var" "results for the variables" ## 3 "$var$coord" "coord. for the variables" ## 4 "$var$cor" "correlations variables - dimensions" ## 5 "$var$cos2" "cos2 for the variables" ## 6 "$var$contrib" "contributions of the variables" ## 7 "$ind" "results for the individuals" ## 8 "$ind$coord" "coord. for the individuals" ## 9 "$ind$cos2" "cos2 for the individuals" ## 10 "$ind$contrib" "contributions of the individuals" ## 11 "$call" "summary statistics" ## 12 "$call$centre" "mean of the variables" ## 13 "$call$ecart.type" "standard error of the variables" ## 14 "$call$row.w" "weights for the individuals" ## 15 "$call$col.w" "weights for the variables" ``` ] --- # Primjer 2 (Sportski rezultati) ##### Svojstvene vrijednosti ```r # Izvuci svojstvene vrijednosti svojstvene_vrijednosti <- factoextra::get_eigenvalue(procjena_PCA) print(svojstvene_vrijednosti) ``` ``` ## eigenvalue variance.percent cumulative.variance.percent ## Dim.1 4.1242133 41.242133 41.24213 ## Dim.2 1.8385309 18.385309 59.62744 ## Dim.3 1.2391403 12.391403 72.01885 ## Dim.4 0.8194402 8.194402 80.21325 ## Dim.5 0.7015528 7.015528 87.22878 ## Dim.6 0.4228828 4.228828 91.45760 ## Dim.7 0.3025817 3.025817 94.48342 ## Dim.8 0.2744700 2.744700 97.22812 ## Dim.9 0.1552169 1.552169 98.78029 ## Dim.10 0.1219710 1.219710 100.00000 ``` --- # Primjer 2 (Sportski rezultati) ##### Svojstvene vrijednosti ```r # Prikaži grafički factoextra::fviz_eig(procjena_PCA, addlabels = T, ylim = c(0,45)) ``` <img src="04_PCA_files/figure-html/unnamed-chunk-50-1.png" style="display: block; margin: auto;" /> --- # Primjer 2 (Sportski rezultati) ##### Analiza varijabli ```r ## VARIJABLE ## vars <- factoextra::get_pca_var(procjena_PCA) print(vars) ``` ``` ## Principal Component Analysis Results for variables ## =================================================== ## Name Description ## 1 "$coord" "Coordinates for the variables" ## 2 "$cor" "Correlations between variables and dimensions" ## 3 "$cos2" "Cos2 for the variables" ## 4 "$contrib" "contributions of the variables" ``` --- # Primjer 2 (Sportski rezultati) ##### Analiza varijabli ```r head(vars$coord) # Pogledaj koordinate za varijable ``` ``` ## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 ## X100m -0.8506257 -0.17939806 0.3015564 0.03357320 -0.1944440 ## Long.jump 0.7941806 0.28085695 -0.1905465 -0.11538956 0.2331567 ## Shot.put 0.7339127 0.08540412 0.5175978 0.12846837 -0.2488129 ## High.jump 0.6100840 -0.46521415 0.3300852 0.14455012 0.4027002 ## X400m -0.7016034 0.29017826 0.2835329 0.43082552 0.1039085 ## X110m.hurdle -0.7641252 -0.02474081 0.4488873 -0.01689589 0.2242200 ``` --- # Primjer 2 (Sportski rezultati) ##### Analiza varijabli ```r head(vars$cos2) # Reprezentativnost varijabli ``` ``` ## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 ## X100m 0.7235641 0.0321836641 0.09093628 0.0011271597 0.03780845 ## Long.jump 0.6307229 0.0788806285 0.03630798 0.0133147506 0.05436203 ## Shot.put 0.5386279 0.0072938636 0.26790749 0.0165041211 0.06190783 ## High.jump 0.3722025 0.2164242070 0.10895622 0.0208947375 0.16216747 ## X400m 0.4922473 0.0842034209 0.08039091 0.1856106269 0.01079698 ## X110m.hurdle 0.5838873 0.0006121077 0.20149984 0.0002854712 0.05027463 ``` --- # Primjer 2 (Sportski rezultati) ##### Analiza varijabli ```r head(vars$contrib) # Doprinos varijabli komponentama ``` ``` ## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 ## X100m 17.544293 1.7505098 7.338659 0.13755240 5.389252 ## Long.jump 15.293168 4.2904162 2.930094 1.62485936 7.748815 ## Shot.put 13.060137 0.3967224 21.620432 2.01407269 8.824401 ## High.jump 9.024811 11.7715838 8.792888 2.54987951 23.115504 ## X400m 11.935544 4.5799296 6.487636 22.65090599 1.539012 ## X110m.hurdle 14.157544 0.0332933 16.261261 0.03483735 7.166193 ``` --- # Primjer 2 (Sportski rezultati) ##### Analiza varijabli ```r # Prikaži varijable u prostoru glavnih komponenti factoextra::fviz_pca_var(procjena_PCA, col.var = "contrib", labelsize = 2) ``` <img src="04_PCA_files/figure-html/unnamed-chunk-55-1.png" style="display: block; margin: auto;" /> --- # Primjer 2 (Sportski rezultati) ##### Analiza varijabli .tiny[ ```r # Prikaži kvalitetu reprezentacije varijabli u faktorskom prostoru corrplot::corrplot(vars$cos2, is.corr = F) ``` <img src="04_PCA_files/figure-html/unnamed-chunk-56-1.png" style="display: block; margin: auto;" /> ] --- # Primjer 2 (Sportski rezultati) ##### Analiza varijabli ```r # Prikaži doprinos na stupčastom grafikonu factoextra::fviz_cos2(procjena_PCA, choice = "var", axes = 1:2) ``` <img src="04_PCA_files/figure-html/unnamed-chunk-57-1.png" style="display: block; margin: auto;" /> --- # Primjer 2 (Sportski rezultati) ##### Analiza varijabli ```r # Prikaži kvalitetu reprezentacije varijabli u PC prostoru corrplot::corrplot(vars$contrib, is.corr = F) ``` <img src="04_PCA_files/figure-html/unnamed-chunk-58-1.png" style="display: block; margin: auto;" /> --- # Primjer 2 (Sportski rezultati) ##### Analiza varijabli ```r # Prikaži na stupčastom grafikonu factoextra::fviz_contrib(procjena_PCA, choice = "var", axses = 1, top = 10) ``` <img src="04_PCA_files/figure-html/unnamed-chunk-59-1.png" style="display: block; margin: auto;" /> --- # Primjer 2 (Sportski rezultati) ##### Proširena analiza ```r set.seed(123) # Grupiranje na osnovi Kmeans algoritma # group_km <- stats::kmeans(vars$coord, centers = 3, nstart = 25) group <- as.factor(group_km$cluster) factoextra::fviz_pca_var(procjena_PCA, col.var = group, labelsize = 2, palette = c("#0073C2FF", "#EFC000FF", "#868686FF"), legend.title = "cluster") ``` <img src="04_PCA_files/figure-html/unnamed-chunk-60-1.png" style="display: block; margin: auto;" /> --- # Primjer 2 (Sportski rezultati) ##### Analiza PCA dimenzija .tiny[ ```r opis_PCA <- FactoMineR::dimdesc(procjena_PCA, axes = c(1,2), proba = 0.05) head(opis_PCA,10) ``` ``` ## $Dim.1 ## $quanti ## correlation p.value ## Long.jump 0.7941806 6.059893e-06 ## Discus 0.7432090 4.842563e-05 ## Shot.put 0.7339127 6.723102e-05 ## High.jump 0.6100840 1.993677e-03 ## Javeline 0.4282266 4.149192e-02 ## X400m -0.7016034 1.910387e-04 ## X110m.hurdle -0.7641252 2.195812e-05 ## X100m -0.8506257 2.727129e-07 ## ## attr(,"class") ## [1] "condes" "list" ## ## $Dim.2 ## $quanti ## correlation p.value ## Pole.vault 0.8074511 3.205016e-06 ## X1500m 0.7844802 9.384747e-06 ## High.jump -0.4652142 2.529390e-02 ## ## attr(,"class") ## [1] "condes" "list" ## ## $call ## $call$num.var ## [1] 1 ## ## $call$proba ## [1] 0.05 ## ## $call$weights ## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ## ## $call$X ## Dim.1 X100m Long.jump Shot.put High.jump X400m X110m.hurdle ## SEBRLE 0.1955047 11.04 7.58 14.83 2.07 49.81 14.69 ## CLAY 0.8078795 10.76 7.40 14.26 1.86 49.37 14.05 ## BERNARD -1.3591340 11.02 7.23 14.25 1.92 48.93 14.99 ## YURKOV -0.8889532 11.34 7.09 15.19 2.10 50.42 15.31 ## ZSIVOCZKY -0.1081216 11.13 7.30 13.48 2.01 48.62 14.17 ## McMULLEN 0.1212195 10.83 7.31 13.76 2.13 49.91 14.38 ## MARTINEAU -2.4461206 11.64 6.81 14.57 1.95 50.14 14.93 ## HERNU -1.9335505 11.37 7.56 14.41 1.86 51.10 15.06 ## BARRAS -1.8143379 11.33 6.97 14.09 1.95 49.48 14.48 ## NOOL -2.8394182 11.33 7.27 12.68 1.98 49.20 15.29 ## BOURGUIGNON -4.5129309 11.36 6.80 13.46 1.86 51.16 15.67 ## Sebrle 3.5290188 10.85 7.84 16.36 2.12 48.36 14.05 ## Clay 3.3907555 10.44 7.96 15.23 2.06 49.19 14.13 ## Karpov 4.1618361 10.50 7.81 15.93 2.09 46.81 13.97 ## Macey 1.8900060 10.89 7.47 15.73 2.15 48.97 14.56 ## Warners 1.4185318 10.62 7.74 14.48 1.97 47.97 14.01 ## Zsivoczky 0.4821513 10.91 7.14 15.31 2.12 49.40 14.95 ## Hernu 0.2825218 10.97 7.19 14.65 2.03 48.73 14.25 ## Bernard 1.3979877 10.69 7.48 14.80 2.12 49.13 14.17 ## Schwarzl -0.7262410 10.98 7.49 14.01 1.94 49.76 14.25 ## Pogorelov -0.2191699 10.95 7.31 15.10 2.06 50.79 14.21 ## Schoenbeck -0.5064487 10.90 7.30 14.77 1.88 50.30 14.34 ## Barras -0.3229862 11.14 6.99 14.91 1.94 49.41 14.37 ## Discus Pole.vault Javeline X1500m ## SEBRLE 43.75 5.02 63.19 291.70 ## CLAY 50.72 4.92 60.15 301.50 ## BERNARD 40.87 5.32 62.77 280.10 ## YURKOV 46.26 4.72 63.44 276.40 ## ZSIVOCZKY 45.67 4.42 55.37 268.00 ## McMULLEN 44.41 4.42 56.37 285.10 ## MARTINEAU 47.60 4.92 52.33 262.10 ## HERNU 44.99 4.82 57.19 285.10 ## BARRAS 42.10 4.72 55.40 282.00 ## NOOL 37.92 4.62 57.44 266.60 ## BOURGUIGNON 40.49 5.02 54.68 291.70 ## Sebrle 48.72 5.00 70.52 280.01 ## Clay 50.11 4.90 69.71 282.00 ## Karpov 51.65 4.60 55.54 278.11 ## Macey 48.34 4.40 58.46 265.42 ## Warners 43.73 4.90 55.39 278.05 ## Zsivoczky 45.62 4.70 63.45 269.54 ## Hernu 44.72 4.80 57.76 264.35 ## Bernard 44.75 4.40 55.27 276.31 ## Schwarzl 42.43 5.10 56.32 273.56 ## Pogorelov 44.60 5.00 53.45 287.63 ## Schoenbeck 44.41 5.00 60.89 278.82 ## Barras 44.83 4.60 64.55 267.09 ``` ] --- # Primjer 2 (Sportski rezultati) ##### Analiza PCA dimenzija ```r # Prva dimenzija opis_PCA$Dim.1 ``` ``` ## $quanti ## correlation p.value ## Long.jump 0.7941806 6.059893e-06 ## Discus 0.7432090 4.842563e-05 ## Shot.put 0.7339127 6.723102e-05 ## High.jump 0.6100840 1.993677e-03 ## Javeline 0.4282266 4.149192e-02 ## X400m -0.7016034 1.910387e-04 ## X110m.hurdle -0.7641252 2.195812e-05 ## X100m -0.8506257 2.727129e-07 ## ## attr(,"class") ## [1] "condes" "list" ``` --- # Primjer 2 (Sportski rezultati) ##### Analiza PCA dimenzija ```r # Druga dimenzija opis_PCA$Dim.2 ``` ``` ## $quanti ## correlation p.value ## Pole.vault 0.8074511 3.205016e-06 ## X1500m 0.7844802 9.384747e-06 ## High.jump -0.4652142 2.529390e-02 ## ## attr(,"class") ## [1] "condes" "list" ``` --- # Primjer 2 (Sportski rezultati) ##### Analiza PCA dimenzija ```r ## INDIVIDUALNI ELEMENTI ## inds <- factoextra::get_pca_ind(procjena_PCA) # Stvori IE objekt print(inds) # Pregledaj ``` ``` ## Principal Component Analysis Results for individuals ## =================================================== ## Name Description ## 1 "$coord" "Coordinates for the individuals" ## 2 "$cos2" "Cos2 for the individuals" ## 3 "$contrib" "contributions of the individuals" ``` --- # Primjer 2 (Sportski rezultati) ##### Analiza PCA dimenzija ```r head(inds$coord) # Kordinate za IE ``` ``` ## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 ## SEBRLE 0.1955047 1.5890567 0.6424912 0.08389652 1.16829387 ## CLAY 0.8078795 2.4748137 -1.3873827 1.29838232 -0.82498206 ## BERNARD -1.3591340 1.6480950 0.2005584 -1.96409420 0.08419345 ## YURKOV -0.8889532 -0.4426067 2.5295843 0.71290837 0.40782264 ## ZSIVOCZKY -0.1081216 -2.0688377 -1.3342591 -0.10152796 -0.20145217 ## McMULLEN 0.1212195 -1.0139102 -0.8625170 1.34164291 1.62151286 ``` --- # Primjer 2 (Sportski rezultati) ##### Analiza PCA dimenzija ```r head(inds$cos2) # Reprezentativnost IE ``` ``` ## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 ## SEBRLE 0.007530179 0.49747323 0.081325232 0.001386688 0.2689026575 ## CLAY 0.048701249 0.45701660 0.143628117 0.125791741 0.0507850580 ## BERNARD 0.197199804 0.28996555 0.004294015 0.411819183 0.0007567259 ## YURKOV 0.096109800 0.02382571 0.778230322 0.061812637 0.0202279796 ## ZSIVOCZKY 0.001574385 0.57641944 0.239754152 0.001388216 0.0054654972 ## McMULLEN 0.002175437 0.15219499 0.110137872 0.266486530 0.3892621478 ``` --- # Primjer 2 (Sportski rezultati) ##### Analiza PCA dimenzija ```r head(inds$contrib) # Doprinos IE ``` ``` ## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 ## SEBRLE 0.04029447 5.9714533 1.4483919 0.03734589 8.45894063 ## CLAY 0.68805664 14.4839248 6.7537381 8.94458283 4.21794385 ## BERNARD 1.94740183 6.4234107 0.1411345 20.46819433 0.04393073 ## YURKOV 0.83308415 0.4632733 22.4517396 2.69663605 1.03075263 ## ZSIVOCZKY 0.01232413 10.1217143 6.2464325 0.05469230 0.25151025 ## McMULLEN 0.01549089 2.4310854 2.6102794 9.55055888 16.29493304 ``` --- # Primjer 2 (Sportski rezultati) ##### Analiza PCA dimenzija ```r # Prikaži doprinos IE factoextra::fviz_pca_ind(procjena_PCA, col.ind = "cos2",gradient.cols = c("#00AFBB", "#E7B800","#FC4E07"),repel = TRUE, labelsize = 3 ) ``` <img src="04_PCA_files/figure-html/unnamed-chunk-68-1.png" style="display: block; margin: auto;" /> --- # Primjer 2 (Sportski rezultati) ##### Analiza PCA dimenzija ```r # Prikaži doprinos IE na stupčastom grafikonu factoextra::fviz_cos2(procjena_PCA, choice = "ind") ``` <img src="04_PCA_files/figure-html/unnamed-chunk-69-1.png" style="display: block; margin: auto;" /> --- # Primjer 2 (Sportski rezultati) ##### Analiza PCA dimenzija ```r factoextra::fviz_cos2(procjena_PCA, choice = "ind", axses = 1:2) # Prve dvije dimenzije ``` <img src="04_PCA_files/figure-html/unnamed-chunk-70-1.png" style="display: block; margin: auto;" /> --- # Primjer 3 (Sportski rezultati) ##### Analiza PCA dimenzija ```r factoextra::fviz_cos2(procjena_PCA, choice = "ind", axses = 1:2) # Prve dvije dimenzije ``` <img src="04_PCA_files/figure-html/unnamed-chunk-71-1.png" style="display: block; margin: auto;" /> --- # Primjer 3 (Cvijeće) ##### Podatci ```r data(iris) head(iris,10) # Pogledaj podatke ``` ``` ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3.0 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ## 4 4.6 3.1 1.5 0.2 setosa ## 5 5.0 3.6 1.4 0.2 setosa ## 6 5.4 3.9 1.7 0.4 setosa ## 7 4.6 3.4 1.4 0.3 setosa ## 8 5.0 3.4 1.5 0.2 setosa ## 9 4.4 2.9 1.4 0.2 setosa ## 10 4.9 3.1 1.5 0.1 setosa ``` --- # Primjer 3 (Cvijeće) ##### Provedi PCA .tiny[ ```r # Provedi PCA iris_PCA <- FactoMineR::PCA(iris[,-5], graph = F) iris_PCA ``` ``` ## **Results for the Principal Component Analysis (PCA)** ## The analysis was performed on 150 individuals, described by 4 variables ## *The results are available in the following objects: ## ## name description ## 1 "$eig" "eigenvalues" ## 2 "$var" "results for the variables" ## 3 "$var$coord" "coord. for the variables" ## 4 "$var$cor" "correlations variables - dimensions" ## 5 "$var$cos2" "cos2 for the variables" ## 6 "$var$contrib" "contributions of the variables" ## 7 "$ind" "results for the individuals" ## 8 "$ind$coord" "coord. for the individuals" ## 9 "$ind$cos2" "cos2 for the individuals" ## 10 "$ind$contrib" "contributions of the individuals" ## 11 "$call" "summary statistics" ## 12 "$call$centre" "mean of the variables" ## 13 "$call$ecart.type" "standard error of the variables" ## 14 "$call$row.w" "weights for the individuals" ## 15 "$call$col.w" "weights for the variables" ``` ] --- # Primjer 3 (Cvijeće) ##### Grafički prikaz ```r # Vizualizacija 1 factoextra::fviz_pca_ind(iris_PCA, geom.ind = "point", col.ind = iris$Species, addEllipses = T, legend.title = "Grupa", palette = c("#00AFBB", "#E7B800", "#FC4E07")) ``` <img src="04_PCA_files/figure-html/unnamed-chunk-74-1.png" style="display: block; margin: auto;" /> --- # Primjer 3 (Cvijeće) ##### Grafički prikaz ```r # Vizualizacija 2 factoextra::fviz_pca_biplot(iris_PCA, geom.ind = "point", col.ind = iris$Species, addEllipses = T, legend.title = "Grupa", palette = "jco", col.var = "black", label = "var", repel = T, labelsize = 3) ``` <img src="04_PCA_files/figure-html/unnamed-chunk-75-1.png" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Hvala na pažnji <html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html> (Nastavak: Faktorska analiza)