class: center, middle, inverse, title-slide

# APPLIED STATISTICS
## Lecture 3: Descriptive statistics
### Luka Sikic, PhD
### Fakultet hrvatskih studija | Github PS
---
class: inverse, middle

# LECTURE OVERVIEW

---
layout: true

# LECTURE OVERVIEW

---

<br>
<br>
<br>

- Getting to know the data
<br>
- Measures of central tendency
<br>
- Measures of variability
<br>
- Measures of skewness and kurtosis
<br>
- Summaries of variables and data frames
<br>
- Standard scores
<br>
- Correlation

---
layout: true

# DATA

---

.hi[Load the data]

```r
# Load the packages
library(lsr)    # who(), modeOf(), maxFreq()
library(psych)  # skew(), kurtosi(), describe()

# Set the working directory to your own project folder if needed
# setwd("...")
getwd()  # Check the current working directory

load("../Podatci/aflsmall.Rdata")  # Load the data into the workspace
# who()                            # List the loaded objects
str(afl.finalists)                 # Structure of the data
```

```
#>  Factor w/ 17 levels "Adelaide","Brisbane",..: 9 10 3 10 9 3 10 3 9 10 ...
```

```r
str(afl.margins)  # Structure of the data
```

```
#>  num [1:176] 56 31 56 8 32 14 36 56 19 1 ...
```

---

.hi[Inspect the data]

<br>
<br>
<br>

```r
# First 11 winning margins
print(afl.margins[1:11])
```

```
#>  [1] 56 31 56  8 32 14 36 56 19  1  3
```

--

```r
# First 5 finalist teams
print(afl.finalists[1:5])
```

```
#> [1] Hawthorn  Melbourne Carlton   Melbourne Hawthorn 
#> 17 Levels: Adelaide Brisbane Carlton Collingwood Essendon Fitzroy ... Western Bulldogs
```

---

.hi[Visualization]

<br>

<img src="03_DESKRIPTIVNA_STATISTIKA_xar_files/figure-html/histogram1-1.svg" style="display: block; margin: auto;" />

.footnote[[*]Histogram of the winning margins (`afl.margins`) for the 176 games of the 2010 AFL (Australian Football League) season. Wins by larger margins occur less often.]
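---

.hi[What the histogram counts]

A histogram sorts the values into bins and draws one bar per bin. Below is a minimal base-R sketch of that counting step. To keep it self-contained it uses only the first 11 margins printed above, typed in by hand; the variable name `margins_11` and the bin edges at 0, 20, 40 and 60 points are illustrative assumptions, not the settings behind the plot.

```r
# First 11 winning margins, copied from the printout above (illustrative subset)
margins_11 <- c(56, 31, 56, 8, 32, 14, 36, 56, 19, 1, 3)

# Arbitrary bin edges chosen for illustration.
# plot = FALSE returns the binning without drawing anything;
# bins are right-closed, (a, b], and the first bin also includes 0
h <- hist(margins_11, breaks = seq(0, 60, by = 20), plot = FALSE)

h$breaks  # bin edges
h$counts  # number of games in each bin
```

With these edges the counts are 5, 3 and 3, so the lowest bin holds the most games.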
---
layout: false
class: middle, inverse

# MEASURES OF CENTRAL TENDENCY

---
layout: true

# MEASURES OF CENTRAL TENDENCY

---

<br>
<br>
<br>
<br>

- Arithmetic mean
<br>
- Median
<br>
- Mode

---
layout: true

# ARITHMETIC MEAN

---

.hi[Definition]

`$$\bar{X} = \frac {X_{1}+X_{2}+\cdots +X_{N}}{N}$$`

--

.hi[Summation notation]

$$ \sum_{i=1}^5 X_i = X_1 + X_2 + X_3 + X_4 + X_5 $$

--

.hi[Compact form]

$$ \bar{X} = \frac{1}{N} \sum_{i=1}^N X_i $$

--

.hi[By hand, for the first five margins]

$$ \frac{56 + 31 + 56 + 8 + 32}{5} = \frac{183}{5} = 36.60 $$

---

<br>
<br>

.hi[R as a calculator]

```r
(56 + 31 + 56 + 8 + 32) / 5
```

```
#> [1] 36.6
```

--

<br>
<br>

.hi[Using a function]

```r
sum(afl.margins[1:5]) / 5  # Same as mean(afl.margins[1:5])
```

```
#> [1] 36.6
```

---
layout: true

# MEDIAN

---

<br>
<br>

.hi[Odd number of values: the middle value]

$$ 8, 31, \mathbf{32}, 56, 56 $$

--

.hi[Even number of values: the mean of the two middle values]

$$ 8, 14, \mathbf{31}, \mathbf{32}, 56, 56 \quad \Rightarrow \quad \frac{31 + 32}{2} = 31.5 $$

--

.hi[Using a function]

```r
# Median of the whole data set
median(x = afl.margins)
```

```
#> [1] 30.5
```

---
layout: true

# EXTREME VALUES

---

```r
# A vector of 10 numbers with one extreme value (-15)
vektor_10 <- c(-15, 2, 3, 4, 5, 6, 7, 8, 9, 12)
```

--

```r
mean(x = vektor_10)  # Mean: pulled down by the outlier
```

```
#> [1] 4.1
```

--

```r
median(x = vektor_10)  # Median: much less sensitive to the outlier
```

```
#> [1] 5.5
```

--

.hi[Correction: the trimmed mean]

```r
# Drop the lowest 10% and the highest 10% of values, then average
mean(x = vektor_10, trim = .1)
```

```
#> [1] 5.5
```

--

```r
# Drop 5% of values at each end
mean(x = afl.margins, trim = .05)
```

```
#> [1] 33.75
```

---
layout: true

# MODE

---

<br>
<br>

```r
# Frequency table of the data
table(afl.finalists)
```

```
#> afl.finalists
#>         Adelaide         Brisbane          Carlton      Collingwood
#>               26               25               26               28
#>         Essendon          Fitzroy        Fremantle          Geelong
#>               32                0                6               39
#>         Hawthorn        Melbourne  North Melbourne    Port Adelaide
#>               27               28               28               17
#>         Richmond         St Kilda           Sydney       West Coast
#>                6               24               26               38
#> Western Bulldogs
#>               24
```

---

```r
# Mode: the most frequent value
modeOf(x = afl.finalists)
```

```
#> [1] "Geelong"
```

--

```r
# Modal frequency: how often the mode occurs
maxFreq(x = afl.finalists)
```

```
#> [1] 39
```

--

```r
# The same for the afl.margins data
modeOf(afl.margins)  # Mode
```

```
#> [1] 3
```

--

```r
maxFreq(afl.margins)  # Modal frequency
```

```
#> [1] 8
```

---
layout: false
class: middle, inverse

# MEASURES OF VARIABILITY

---
layout: true

# MEASURES OF VARIABILITY

---

<br>
<br>
<br>
<br>

- Range / min-max
<br>
- Quartiles
<br>
- Mean absolute deviation
<br>
- Variance
<br>
- Standard deviation
<br>
- Median absolute deviation

---
layout: true

# RANGE / MIN-MAX

---

```r
# Maximum value
max(afl.margins)
```

```
#> [1] 116
```

--

```r
# Minimum value
min(afl.margins)
```

```
#> [1] 0
```

--

```r
# range() returns the minimum and the maximum;
# the range as a single number is max - min = 116
range(afl.margins)
```

```
#> [1]   0 116
```

---
layout: true

# QUARTILES

---

```r
# 50th percentile = 2nd quartile = median
quantile(x = afl.margins, probs = .5)
```

```
#>  50% 
#> 30.5
```

--

```r
# 25th and 75th percentiles = 1st and 3rd quartiles
quantile(afl.margins, probs = c(.25, .75))
```

```
#>   25%   75% 
#> 12.75 50.50
```

--

```r
# Interquartile range: 75th minus 25th percentile (50.50 - 12.75)
IQR(x = afl.margins)
```

```
#> [1] 37.75
```

---
layout: true

# MEAN ABSOLUTE DEVIATION

---

.hi[Formula]

$$ \mbox{AAD}(X) = \frac{1}{N} \sum_{i = 1}^N |X_i - \bar{X}| $$

--

.hi[Table for computing the mean absolute deviation by hand (first five margins, `\(\bar{X} = 36.6\)`)]

<table>
<caption></caption>
<thead>
<tr>
<th style="text-align:right;"> `\(i\)` </th>
<th style="text-align:right;"> `\(X_i\)` </th>
<th style="text-align:right;"> `\(X_i - \bar{X}\)` </th>
<th style="text-align:right;"> `\(|X_i - \bar{X}|\)` </th>
</tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 56 </td> <td style="text-align:right;"> 19.4 </td> <td style="text-align:right;"> 19.4 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 31 </td> <td style="text-align:right;"> -5.6 </td> <td style="text-align:right;"> 5.6 </td> </tr>
<tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 56 </td> <td style="text-align:right;"> 19.4 </td> <td style="text-align:right;"> 19.4 </td> </tr>
<tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> -28.6 </td> <td style="text-align:right;"> 28.6 </td> </tr>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 32 </td> <td style="text-align:right;"> -4.6 </td> <td style="text-align:right;"> 4.6 </td> </tr>
</tbody>
</table>

---

.hi[By hand]

$$ \frac{19.4 + 5.6 + 19.4 + 28.6 + 4.6}{5} = \frac{77.6}{5} = 15.52 $$

--

.hi[Using functions]

```r
X <- c(56, 31, 56, 8, 32)  # Create the vector
X.bar <- mean(X)           # Step 1: compute the mean
AD <- abs(X - X.bar)       # Step 2: absolute deviations from the mean
AAD <- mean(AD)            # Step 3: mean of the absolute deviations
```

--

```r
print(AAD)  # Show the result
```

```
#> [1] 15.52
```

---
layout: true

# VARIANCE

---

<br>
<br>

.hi[Formula 1]

$$ \mbox{Var}(X) = \frac{1}{N} \sum_{i=1}^N \left( X_i - \bar{X} \right)^2 $$

<br>
<br>

.hi[Formula 2 (the same, written as a single fraction)]

`$$\mbox{Var}(X) = \frac{\sum_{i=1}^N \left( X_i - \bar{X} \right)^2}{N}$$`

---

.hi[Computing the variance by hand]

<table>
<caption></caption>
<thead>
<tr>
<th style="text-align:right;"> `\(i\)` </th>
<th style="text-align:right;"> `\(X_i\)` </th>
<th style="text-align:right;"> `\(X_i - \bar{X}\)` </th>
<th style="text-align:right;"> `\((X_i - \bar{X})^2\)` </th>
</tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 56 </td> <td style="text-align:right;"> 19.4 </td> <td style="text-align:right;"> 376.36 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 31 </td> <td style="text-align:right;"> -5.6 </td> <td style="text-align:right;"> 31.36 </td> </tr>
<tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 56 </td> <td style="text-align:right;"> 19.4 </td> <td style="text-align:right;"> 376.36 </td> </tr>
<tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> -28.6 </td> <td style="text-align:right;"> 817.96 </td> </tr>
<tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 32 </td> <td style="text-align:right;"> -4.6 </td> <td style="text-align:right;"> 21.16 </td> </tr>
</tbody>
</table>

--

.hi[R as a calculator]

```r
# Mean of the squared deviations
(376.36 + 31.36 + 376.36 + 817.96 + 21.16) / 5
```

```
#> [1] 324.64
```

---

.hi[Using functions]

```r
# Variance with N in the denominator
mean((X - mean(X))^2)
```

```
#> [1] 324.64
```

--

```r
# var() divides by N - 1 instead: 1623.2 / 4
var(X)
```

```
#> [1] 405.8
```

--

```r
## The same with the full data set
# Variance with N in the denominator
mean((afl.margins - mean(afl.margins))^2)
```

```
#> [1] 675.9718
```

--

```r
var(afl.margins)  # Divides by N - 1 (here 175)
```

```
#> [1] 679.8345
```

---
layout: true

# STANDARD DEVIATION

---

<br>
<br>

.hi[Formula 1 (divide by N)]

$$ s = \sqrt{ \frac{1}{N} \sum_{i=1}^N \left( X_i - \bar{X} \right)^2 } $$

<br>
<br>

.hi[Formula 2 (divide by N - 1, used by R's `sd()`)]

$$ \hat\sigma = \sqrt{ \frac{1}{N-1} \sum_{i=1}^N \left( X_i - \bar{X} \right)^2 } $$

--

```r
# Compute with a function: the square root of var(afl.margins)
sd(afl.margins)
```

```
#> [1] 26.07364
```

---
layout: true

# MEDIAN ABSOLUTE DEVIATION

---

<br>
<br>

```r
# Mean absolute deviation from the mean
mean(abs(afl.margins - mean(afl.margins)))
```

```
#> [1] 21.10124
```

--

```r
# *Median* absolute deviation from the *median*
median(abs(afl.margins - median(afl.margins)))
```

```
#> [1] 19.5
```

--

```r
# Using a function; constant = 1 returns the raw value
# (by default mad() multiplies the result by 1.4826)
mad(x = afl.margins, constant = 1)
```

```
#> [1] 19.5
```

---
layout: true

# SKEWNESS

---

<br>
<br>
<br>

<img src="03_DESKRIPTIVNA_STATISTIKA_xar_files/figure-html/skewness-1.svg" style="display: block; margin: auto;" />

---

.hi[Formula]

$$ \mbox{skewness}(X) = \frac{1}{N \hat{\sigma}^3} \sum_{i=1}^N (X_i - \bar{X})^3 $$

--

.hi[Using a function]

```r
# Compute on the real data (skew() comes from the psych package)
skew(x = afl.margins)
```

```
#> [1] 0.7671555
```

A positive value means a longer right tail: a few games were won by very large margins.

---
layout: true

# KURTOSIS

---

<br>
<br>
<br>
<br>

<img src="03_DESKRIPTIVNA_STATISTIKA_xar_files/figure-html/kurtosis-1.svg" style="display: block; margin: auto;" />

---

<br>
<br>

.hi[Formula]

<br>
<br>

$$ \mbox{kurtosis}(X) = \frac{1}{N \hat\sigma^4} \sum_{i=1}^N \left( X_i - \bar{X} \right)^4 - 3 $$

--

<br>
<br>

.hi[Using a function]

```r
# Compute on the real data (kurtosi() comes from the psych package)
kurtosi(x = afl.margins)
```

```
#> [1] 0.02962633
```

A value close to 0 means the tails are about as heavy as those of a normal distribution.

---
layout: true

# SUMMARIZING A VARIABLE

---

.hi[Numeric variable]

```r
# Summary of a numeric variable
summary(object = afl.margins)  # Descriptive statistics
```

```
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    0.00   12.75   30.50   35.30   50.50  116.00
```

--

.hi[Logical variable]

```r
# Summary of a logical variable
ekstremi <- afl.margins > 50  # TRUE for wins by more than 50 points
```

--

```r
head(ekstremi, 5)  # Inspect the data
```

```
#> [1]  TRUE FALSE  TRUE FALSE FALSE
```

```r
summary(ekstremi)  # Descriptive statistics
```

```
#>    Mode   FALSE    TRUE 
#> logical     132      44
```

---

.hi[Factor variable]

```r
# Summary of a factor variable
summary(object = afl.finalists)  # Descriptive statistics
```

```
#>         Adelaide         Brisbane          Carlton      Collingwood
#>               26               25               26               28
#>         Essendon          Fitzroy        Fremantle          Geelong
#>               32                0                6               39
#>         Hawthorn        Melbourne  North Melbourne    Port Adelaide
#>               27               28               28               17
#>         Richmond         St Kilda           Sydney       West Coast
#>                6               24               26               38
#> Western Bulldogs
#>               24
```

--

```r
# Summary of a character (text) variable
txt <- as.character(afl.finalists)  # Create a text variable
summary(object = txt)               # Only length, class and mode
```

```
#>    Length     Class      Mode 
#>       400 character character
```

---
layout: true

# A NEW DATA SET

---

<br>
<br>

.hi[Data]

<br>

```r
rm(list = ls())                         # Clear the workspace
load("../Podatci/clinicaltrial.Rdata")  # Load the data
who(TRUE)                               # List objects, expanding data frame columns
```

```
#>    -- Name --    -- Class --   -- Size --
#>    clin.trial    data.frame    18 x 3    
#>     $drug        factor        18        
#>     $therapy     factor        18        
#>     $mood.gain   numeric       18
```

---
layout: true

# DESCRIPTIVE STATISTICS FOR A DATA FRAME

---

<br>
<br>

.hi[Basic summary]

<br>

```r
# Descriptive statistics for a data frame
summary(clin.trial)
```

```
#>        drug         therapy     mood.gain     
#>  placebo :6   no.therapy:9   Min.   :0.1000  
#>  anxifree:6   CBT       :9   1st Qu.:0.4250  
#>  joyzepam:6                  Median :0.8500  
#>                              Mean   :0.8833  
#>                              3rd Qu.:1.3000  
#>                              Max.   :1.8000
```

---

<br>
<br>

.hi[An alternative function]

<br>

```r
# Descriptive statistics for a data frame with describe() from psych
describe(clin.trial)
```

```
#>           vars  n mean   sd median trimmed  mad min max range skew kurtosis
#> drug*        1 18 2.00 0.84   2.00    2.00 1.48 1.0 3.0   2.0 0.00    -1.66
#> therapy*     2 18 1.50 0.51   1.50    1.50 0.74 1.0 2.0   1.0 0.00    -2.11
#> mood.gain    3 18 0.88 0.53   0.85    0.88 0.67 0.1 1.8   1.7 0.13    -1.44
#>             se
#> drug*     0.20
#> therapy*  0.12
#> mood.gain 0.13
```

Factor columns are marked with `*`: `describe()` converts them to numeric codes, so their statistics are not meaningful.

---

.hi[Grouped summary]

```r
# Summarize separately for each therapy group
by(data = clin.trial,             # Data source
   INDICES = clin.trial$therapy,  # Grouping variable
   FUN = summary)                 # Function to apply
```

```
#> clin.trial$therapy: no.therapy
#>        drug         therapy     mood.gain     
#>  placebo :3   no.therapy:9   Min.   :0.1000  
#>  anxifree:3   CBT       :0   1st Qu.:0.3000  
#>  joyzepam:3                  Median :0.5000  
#>                              Mean   :0.7222  
#>                              3rd Qu.:1.3000  
#>                              Max.   :1.7000  
#> ------------------------------------------------------------
#> clin.trial$therapy: CBT
#>        drug         therapy     mood.gain    
#>  placebo :3   no.therapy:0   Min.   :0.300  
#>  anxifree:3   CBT       :9   1st Qu.:0.800  
#>  joyzepam:3                  Median :1.100  
#>                              Mean   :1.044  
#>                              3rd Qu.:1.300  
#>                              Max.   :1.800
```

---

.hi[Grouped summary]

<br>
<br>

```r
# Mean mood gain for each combination of drug and therapy
aggregate(formula = mood.gain ~ drug + therapy,  # Outcome ~ grouping variables
          data = clin.trial,                      # Data
          FUN = mean)                             # Mean
```

```
#>       drug    therapy mood.gain
#> 1  placebo no.therapy  0.300000
#> 2 anxifree no.therapy  0.400000
#> 3 joyzepam no.therapy  1.466667
#> 4  placebo        CBT  0.600000
#> 5 anxifree        CBT  1.033333
#> 6 joyzepam        CBT  1.500000
```

---

.hi[Grouped summary]

<br>

```r
# Standard deviation of mood gain for each combination of drug and therapy
aggregate(mood.gain ~ drug + therapy,  # Outcome ~ grouping variables
          clin.trial,                  # Data
          sd)                          # Standard deviation
```

```
#>       drug    therapy mood.gain
#> 1  placebo no.therapy 0.2000000
#> 2 anxifree no.therapy 0.2000000
#> 3 joyzepam no.therapy 0.2081666
#> 4  placebo        CBT 0.3000000
#> 5 anxifree        CBT 0.2081666
#> 6 joyzepam        CBT 0.2645751
```

---
layout: true

# STANDARD SCORES

---

.hi[Formula]

$$ \mbox{standard score} = \frac{\mbox{raw score} - \mbox{mean}}{\mbox{standard deviation}} $$

--

.hi[z-score]

$$ z_i = \frac{X_i - \bar{X}}{\hat\sigma} $$

--

.hi[By hand: a score of 35 in a sample with mean 17 and standard deviation 5]

$$ z = \frac{35 - 17}{5} = 3.6 $$

--

.hi[Position in the distribution]

```r
# Share of a standard normal distribution that lies below z = 3.6
pnorm(3.6)
```

```
#> [1] 0.9998409
```

Only about 0.016% of a normal distribution lies above this score, so it is very unusual.

---
layout: true

# A NEW DATA SET

---

```r
rm(list = ls())  # Clear the workspace
# Load the data
load("../Podatci/parenthood.Rdata")
who(TRUE)        # List objects, expanding data frame columns
```

```
#>    -- Name --     -- Class --   -- Size --
#>    parenthood     data.frame    100 x 4   
#>     $dan.sleep    numeric       100       
#>     $baby.sleep   numeric       100       
#>     $dan.grump    numeric       100       
#>     $day          integer       100
```

--

```r
# Inspect the data
head(parenthood, 7)  # First 7 rows
```

```
#>   dan.sleep baby.sleep dan.grump day
#> 1      7.59      10.18        56   1
#> 2      7.91      11.66        60   2
#> 3      5.14       7.92        82   3
#> 4      7.71       9.61        55   4
#> 5      6.68       9.75        67   5
#> 6      5.99       5.04        72   6
#> 7      8.19      10.45        53   7
```

---

<br>
<br>

```r
# Descriptive statistics
describe(parenthood)
```

```
#>            vars   n  mean    sd median trimmed   mad   min    max range  skew
#> dan.sleep     1 100  6.97  1.02   7.03    7.00  1.09  4.84   9.00  4.16 -0.29
#> baby.sleep    2 100  8.05  2.07   7.95    8.05  2.33  3.25  12.07  8.82 -0.02
#> dan.grump     3 100 63.71 10.05  62.00   63.16  9.64 41.00  91.00 50.00  0.43
#> day           4 100 50.50 29.01  50.50   50.50 37.06  1.00 100.00 99.00  0.00
#>            kurtosis   se
#> dan.sleep     -0.72 0.10
#> baby.sleep    -0.69 0.21
#> dan.grump     -0.16 1.00
#> day           -1.24 2.90
```

---

.hi[Visualization]

<img src="03_DESKRIPTIVNA_STATISTIKA_xar_files/figure-html/parenthood-1.svg" style="display: block; margin: auto;" />

.footnote[[*]Plots of the variables in the `parenthood` data set.]

---
layout: true

# CORRELATION

---

.hi[The data (*once more*)]

```r
head(parenthood[, c("dan.sleep", "baby.sleep", "dan.grump")], 10)
```

```
#>    dan.sleep baby.sleep dan.grump
#> 1       7.59      10.18        56
#> 2       7.91      11.66        60
#> 3       5.14       7.92        82
#> 4       7.71       9.61        55
#> 5       6.68       9.75        67
#> 6       5.99       5.04        72
#> 7       8.19      10.45        53
#> 8       7.19       8.27        60
#> 9       7.40       6.06        60
#> 10      6.58       7.09        71
```

.footnote[[*]See this [tutorial](http://www.sthda.com/english/wiki/correlation-test-between-two-variables-in-r) on correlation.]

---

.hi[Correlation in a scatter plot]

<br>

<img src="03_DESKRIPTIVNA_STATISTIKA_xar_files/figure-html/scatterparent1a-1.svg" style="display: block; margin: auto;" />

.footnote[[*]Scatter plot of the parent's hours of sleep (`dan.sleep`) and the parent's mood (grumpiness, `dan.grump`).]

---

.hi[Correlation in a scatter plot]

<img src="03_DESKRIPTIVNA_STATISTIKA_xar_files/figure-html/scatterparent2-1.svg" style="display: block; margin: auto;" />

.footnote[[*]Scatter plot of the baby's hours of sleep and the parent's hours of sleep.]
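---

.hi[Covariance and correlation by hand]

The scatter plots show the direction of each relationship; the covariance and Pearson's r turn it into a number. Below is a minimal sketch of both, following this lecture's formulas: sum the products of deviations, divide by N - 1, then divide by the two standard deviations. It uses only the first 10 rows of `dan.sleep` and `dan.grump` printed above, typed in by hand (the names `sleep` and `grump` are illustrative), so it illustrates the calculation rather than reproducing the full-data correlation.

```r
# First 10 rows of parenthood, copied from the printout above (illustrative subset)
sleep <- c(7.59, 7.91, 5.14, 7.71, 6.68, 5.99, 8.19, 7.19, 7.40, 6.58)
grump <- c(56, 60, 82, 55, 67, 72, 53, 60, 60, 71)
N <- length(sleep)

# Covariance: sum of products of deviations from the means, divided by N - 1
cov_manual <- sum((sleep - mean(sleep)) * (grump - mean(grump))) / (N - 1)

# Pearson's r: covariance divided by both standard deviations
# (sd() also divides by N - 1, so the denominators are consistent)
r_manual <- cov_manual / (sd(sleep) * sd(grump))

# Compare with R's built-in functions
c(cov_manual = cov_manual, cov_builtin = cov(sleep, grump))
c(r_manual = r_manual, r_builtin = cor(sleep, grump))
```

The two values in each pair should agree up to rounding, and r comes out negative: on these days, more sleep goes with less grumpiness.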
---

<br>
<br>
<br>
<br>

.hi[Covariance]

$$ \mbox{Cov}(X,Y) = \frac{1}{N-1} \sum_{i=1}^N \left( X_i - \bar{X} \right) \left( Y_i - \bar{Y} \right) $$

--

<br>
<br>
<br>

.hi[Pearson correlation coefficient: the standardized covariance]

$$ r_{XY} = \frac{\mbox{Cov}(X,Y)}{ \hat{\sigma}_X \ \hat{\sigma}_Y} $$

.footnote[[*][See this thread](https://datascience.stackexchange.com/questions/64260/pearson-vs-spearman-vs-kendall/64261) for a discussion of Pearson vs Spearman vs Kendall.]

---
layout: true

# DIRECTION AND STRENGTH OF CORRELATION

---

<img src="03_DESKRIPTIVNA_STATISTIKA_xar_files/figure-html/corr-1.svg" style="display: block; margin: auto;" />

---
layout: true

# COMPUTING CORRELATION IN R

---

.hi[Using a function: one pair of variables]

```r
# Correlation between the parent's sleep and grumpiness
cor(x = parenthood$dan.sleep, y = parenthood$dan.grump)
```

```
#> [1] -0.903384
```

--

.hi[Using a function: the whole data frame]

```r
# Correlation matrix of all variables
cor(x = parenthood)
```

```
#>              dan.sleep  baby.sleep   dan.grump         day
#> dan.sleep   1.00000000  0.62794934 -0.90338404 -0.09840768
#> baby.sleep  0.62794934  1.00000000 -0.56596373 -0.01043394
#> dan.grump  -0.90338404 -0.56596373  1.00000000  0.07647926
#> day        -0.09840768 -0.01043394  0.07647926  1.00000000
```

---
layout: true

# INTERPRETING CORRELATION

---

.hi[Rough guidelines for interpreting correlations]

<table>
<caption></caption>
<thead>
<tr>
<th style="text-align:left;"> Correlation </th>
<th style="text-align:left;"> Strength </th>
<th style="text-align:left;"> Direction </th>
</tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> -1.0 to -0.9 </td> <td style="text-align:left;"> Very strong </td> <td style="text-align:left;"> Negative </td> </tr>
<tr> <td style="text-align:left;"> -0.9 to -0.7 </td> <td style="text-align:left;"> Strong </td> <td style="text-align:left;"> Negative </td> </tr>
<tr> <td style="text-align:left;"> -0.7 to -0.4 </td> <td style="text-align:left;"> Moderate </td> <td style="text-align:left;"> Negative </td> </tr>
<tr> <td style="text-align:left;"> -0.4 to -0.2 </td> <td style="text-align:left;"> Weak </td> <td style="text-align:left;"> Negative </td> </tr>
<tr> <td style="text-align:left;"> -0.2 to 0 </td> <td style="text-align:left;"> Negligible </td> <td style="text-align:left;"> Negative </td> </tr>
<tr> <td style="text-align:left;"> 0 to 0.2 </td> <td style="text-align:left;"> Negligible </td> <td style="text-align:left;"> Positive </td> </tr>
<tr> <td style="text-align:left;"> 0.2 to 0.4 </td> <td style="text-align:left;"> Weak </td> <td style="text-align:left;"> Positive </td> </tr>
<tr> <td style="text-align:left;"> 0.4 to 0.7 </td> <td style="text-align:left;"> Moderate </td> <td style="text-align:left;"> Positive </td> </tr>
<tr> <td style="text-align:left;"> 0.7 to 0.9 </td> <td style="text-align:left;"> Strong </td> <td style="text-align:left;"> Positive </td> </tr>
<tr> <td style="text-align:left;"> 0.9 to 1.0 </td> <td style="text-align:left;"> Very strong </td> <td style="text-align:left;"> Positive </td> </tr>
</tbody>
</table>

---
layout: true

# A NEW DATA SET

---

<br>
<br>
<br>

```r
rm(list = ls())                  # Clear the workspace
load("../Podatci/effort.Rdata")  # Load the data
who(TRUE)                        # Inspect the data
```

```
#>    -- Name --   -- Class --   -- Size --
#>    effort       data.frame    10 x 2    
#>     $hours      numeric       10        
#>     $grade      numeric       10
```

---

.hi[Inspect the data]

```r
head(effort, 10)  # Inspect the data
```

```
#>    hours grade
#> 1      2    13
#> 2     76    91
#> 3     40    79
#> 4      6    14
#> 5     16    21
#> 6     28    74
#> 7     27    47
#> 8     59    85
#> 9     46    84
#> 10    68    88
```

--

```r
cor(effort$hours, effort$grade)  # Compute the correlation
```

```
#> [1] 0.909402
```

---

.hi[Visualization]

<img src="03_DESKRIPTIVNA_STATISTIKA_xar_files/figure-html/rankcorrpic-1.svg" style="display: block; margin: auto;" />

.footnote[[*]The relationship between hours studied and grade (each point is one student). The dashed line shows the linear relationship. The correlation between the two variables is high, `\(r = .91\)`. Note that more hours of study always bring a higher grade, which is reflected in a Spearman correlation of `\(\rho = 1\)`.]

---
layout: true

# SPEARMAN CORRELATION

---

```r
korelacije <- data.frame(
  sati       = effort$hours,
  ocjena     = effort$grade,
  satiRang   = rank(effort$hours),   # Rank of hours
  ocjenaRang = rank(effort$grade))   # Rank of grades
korelacije <- korelacije[order(korelacije$sati), ]  # Sort rows by hours
rownames(korelacije) <- NULL                         # Renumber rows 1..N
korelacije  # Inspect the data (with the rank variables)
```

```
#>    sati ocjena satiRang ocjenaRang
#> 1     2     13        1          1
#> 2     6     14        2          2
#> 3    16     21        3          3
#> 4    27     47        4          4
#> 5    28     74        5          5
#> 6    40     79        6          6
#> 7    46     84        7          7
#> 8    59     85        8          8
#> 9    68     88        9          9
#> 10   76     91       10         10
```

.footnote[[*]Spearman's method computes the Pearson correlation between the ranks of the two variables.]

---

<br>
<br>

.hi[Using a function]

```r
# Pearson correlation of the raw values
cor(korelacije$sati, korelacije$ocjena, method = "pearson")
```

```
#> [1] 0.909402
```

--

```r
# Pearson correlation of the ranks
cor(korelacije$satiRang, korelacije$ocjenaRang, method = "pearson")
```

```
#> [1] 1
```

--

```r
# Set method = "spearman": the ranking is done internally
cor(korelacije$sati, korelacije$ocjena, method = "spearman")
```

```
#> [1] 1
```

---
layout: true

# KENDALL CORRELATION

---

```r
# Set method = "kendall"
cor(korelacije$sati, korelacije$ocjena, method = "kendall")
```

```
#> [1] 1
```

.footnote[[*]Kendall's method measures how well the orderings of the two variables agree, by comparing the numbers of concordant and discordant pairs of observations.]
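---

.hi[Kendall's tau by hand]

Kendall's tau compares every pair of students: a pair is concordant if the student who studied more also got the higher grade, and discordant if not. Below is a minimal sketch of the simplest version (tau-a), which assumes there are no tied values; it uses the 10 `effort` rows printed above, typed in by hand (the names `hours`, `grade`, `concordant` and `discordant` are illustrative). If there were ties, R's `cor()` would adjust for them and this simple count would no longer match.

```r
# effort data, copied from the printout above (assumes no ties)
hours <- c(2, 76, 40, 6, 16, 28, 27, 59, 46, 68)
grade <- c(13, 91, 79, 14, 21, 74, 47, 85, 84, 88)
n <- length(hours)

concordant <- 0
discordant <- 0
# Visit every pair (i, j) with i < j exactly once
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    # +1 if both variables move in the same direction, -1 if opposite
    s <- sign(hours[i] - hours[j]) * sign(grade[i] - grade[j])
    if (s > 0) concordant <- concordant + 1
    if (s < 0) discordant <- discordant + 1
  }
}

# tau-a: (concordant - discordant) / number of pairs
tau_manual <- (concordant - discordant) / choose(n, 2)
c(tau_manual = tau_manual, tau_builtin = cor(hours, grade, method = "kendall"))
```

All 45 pairs are concordant, so tau is exactly 1, the same as the value from `cor()` on the previous slide.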
---
layout: true

# CORRELATION (another example)

---

.hi[Data: the first rows of R's built-in `mtcars` data set]

```
#>                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
```

---

.hi[Visualization]

<img src="03_DESKRIPTIVNA_STATISTIKA_xar_files/figure-html/unnamed-chunk-66-1.svg" style="display: block; margin: auto;" />

---

.hi[Checking normality]

```r
my_data <- mtcars  # Built-in data set used in the tests below

# Shapiro-Wilk normality test for mpg
shapiro.test(my_data$mpg)  # => p = 0.1229
```

```
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  my_data$mpg
#> W = 0.94756, p-value = 0.1229
```

```r
# Shapiro-Wilk normality test for wt
shapiro.test(my_data$wt)  # => p = 0.09
```

```
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  my_data$wt
#> W = 0.94326, p-value = 0.09265
```

Both p-values are above 0.05, so neither test gives evidence against normality.

---

.hi[Checking normality]

<img src="03_DESKRIPTIVNA_STATISTIKA_xar_files/figure-html/unnamed-chunk-68-1.svg" style="display: block; margin: auto;" />

---

.hi[Checking normality]

<img src="03_DESKRIPTIVNA_STATISTIKA_xar_files/figure-html/unnamed-chunk-69-1.svg" style="display: block; margin: auto;" />

---

.hi[Correlation test]

```r
# Pearson correlation with a significance test and a confidence interval
pers <- cor.test(my_data$wt, my_data$mpg, method = "pearson")
pers
```

```
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  my_data$wt and my_data$mpg
#> t = -9.559, df = 30, p-value = 1.294e-10
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.9338264 -0.7440872
#> sample estimates:
#>        cor 
#> -0.8676594
```

---

.hi[Correlation test]

```r
# Spearman rank correlation test
sper <- cor.test(my_data$wt, my_data$mpg, method = "spearman")
sper
```

```
#> 
#>  Spearman's rank correlation rho
#> 
#> data:  my_data$wt and my_data$mpg
#> S = 10292, p-value = 1.488e-11
#> alternative hypothesis: true rho is not equal to 0
#> sample estimates:
#>       rho 
#> -0.886422
```

---

.hi[Correlation test]

```r
# Kendall rank correlation test
kend <- cor.test(my_data$wt, my_data$mpg, method = "kendall")
kend
```

```
#> 
#>  Kendall's rank correlation tau
#> 
#> data:  my_data$wt and my_data$mpg
#> z = -5.7981, p-value = 6.706e-09
#> alternative hypothesis: true tau is not equal to 0
#> sample estimates:
#>        tau 
#> -0.7278321
```

---
layout: false
class: middle, inverse

# THANK YOU FOR YOUR ATTENTION!

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(Next lecture: Charts and visualizations)