class: center, middle, inverse, title-slide

# APPLIED STATISTICS
## Lecture 9: Comparing means
### Luka Sikic, PhD
### Fakultet hrvatskih studija | Github PS
---
class: inverse, middle

# LECTURE OVERVIEW

---
layout: true

# LECTURE OVERVIEW

---

<br>

## OBJECTIVES

- z-test
- t-test
- t-test for independent samples
- t-test for dependent samples
- One-sided tests
- Running t-tests in R
- Effect size
- Checking normality of the distribution

---
layout:false
class: middle, inverse

# z test
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(A warm-up for the t test!)

---
layout:true

# z test

---

<style type="text/css">
@media print {
  .has-continuation {
    display: block !important;
  }
}
remark-slide-content {
  font-size: 22px;
  padding: 20px 80px 20px 80px;
}
.remark-code, .remark-inline-code {
  background: #f0f0f0;
}
.remark-code {
  font-size: 16px;
}
.huge .remark-code { /*Change made here*/
  font-size: 200% !important;
}
.mid .remark-code { /*Change made here*/
  font-size: 70% !important;
}
.tiny .remark-code { /*Change made here*/
  font-size: 50% !important;
}
</style>

```r
# Load the data
load( file.path( "../Podatci/zeppo.Rdata" ))
print( grades )   # Inspect the data
```

```
#> [1] 50 60 60 64 66 66 67 69 70 74 76 76 77 79 79 79 81 82 82 89
```

```r
# Compute the mean
mean(grades)
```

```
#> [1] 72.3
```

.hi[**State the hypotheses**]

- Do sociology students' grades differ from the faculty-wide average?

`$$\begin{array}{ll} H_0: & \mu = 67.5 \\ H_1: & \mu \neq 67.5 \end{array}$$`

---

.hi[Show the hypotheses graphically]

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/ztesthyp-1.svg" style="display: block; margin: auto;" />

.footnote[[*]Graphical illustration of the null and alternative hypotheses assumed by the one-sample z-test (two-sided version). Both hypotheses assume that the population follows a normal distribution and that its standard deviation `\(\sigma_0\)` is known. Under the null hypothesis the population mean `\(\mu\)` equals an a priori specified value `\(\mu_0\)`. Under the alternative hypothesis the population mean differs from that value, `\(\mu \neq \mu_0\)`.]
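---

The hypotheses above can be turned into a concrete decision rule before any test statistic is computed. What follows is a minimal sketch, assuming a population standard deviation of 9.5 (the value this lecture uses in its step-by-step z test) and a two-sided 5% significance level. The names `mu0`, `sigma0`, `alpha`, `accept` and `reject` are illustrative only.

```r
# Grades as printed on the earlier slide
grades <- c(50, 60, 60, 64, 66, 66, 67, 69, 70, 74,
            76, 76, 77, 79, 79, 79, 81, 82, 82, 89)

mu0    <- 67.5   # mean under H0
sigma0 <- 9.5    # ASSUMED known population standard deviation
alpha  <- 0.05   # ASSUMED two-sided significance level

# Standard error of the sample mean under H0
se <- sigma0 / sqrt(length(grades))

# Range of sample means that would NOT lead to rejecting H0
accept <- mu0 + c(-1, 1) * qnorm(1 - alpha / 2) * se
print(accept)

# TRUE if the observed mean falls outside that range
reject <- mean(grades) < accept[1] || mean(grades) > accept[2]
print(reject)
```

Under these assumptions the observed mean of 72.3 lies outside the range, which agrees with the two-sided p-value below .05 obtained in the step-by-step test.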
---

.hi[Plot the data]

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/zeppo-1.svg" style="display: block; margin: auto;" />

.footnote[[*]The solid line is the theoretical distribution under the null hypothesis from which the sociology students' grades would have been "generated".]

---

.hi[Construct the test statistic]

- `$$\bar{X} - \mu_0$$`

- `$$X \sim \mbox{Normal}(\mu_0,\sigma^2)$$`

- `$$\mbox{SE}({\bar{X}}) = \frac{\sigma}{\sqrt{N}}$$`

- `$$\bar{X} \sim \mbox{Normal}(\mu_0,\mbox{SE}({\bar{X}})^2)$$`

---

- `$$z_{\bar{X}} = \frac{\bar{X} - \mu_0}{\mbox{SE}({\bar{X}})}$$`

- `$$z_{\bar{X}} = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{N}}$$`

- `$$z_{\bar{X}} \sim \mbox{Normal}(0,1)$$`

---

.hi[**Critical regions for the two-sided test**]

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/ztest2-1.svg" style="display: block; margin: auto;" />

.footnote[[*]Critical regions for the two-sided z-test.]

---

.hi[**Critical region for the one-sided test**]

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/ztest1-1.svg" style="display: block; margin: auto;" />

.footnote[[*]Critical region for the one-sided z-test.]

---

.hi[**Run the test in R (step by step)**]

```r
# Store the sample mean of the grades
sample.mean <- mean( grades )
print( sample.mean )   # Inspect the value
```

```
#> [1] 72.3
```

```r
# Hypothesised population mean
mu.null <- 67.5
# Assumed (known) population standard deviation
sd.true <- 9.5
```

```r
# Sample size
N <- length( grades )
print( N )   # Inspect the value
```

```
#> [1] 20
```

---

.hi[**Run the test in R (step by step)**]

```r
# Standard error of the sampling distribution of the sample mean
sem.true <- sd.true / sqrt(N)
print(sem.true)   # Inspect the value
```

```
#> [1] 2.124265
```

```r
# Store the test statistic
z.score <- (sample.mean - mu.null) / sem.true
print( z.score )   # Inspect the value
```

```
#> [1] 2.259606
```

---

.hi[**Compute the p-value of the test**]

```r
# Probability in the upper tail of the distribution
upper.area <- pnorm( q = z.score, lower.tail = FALSE )
print( upper.area )   # Inspect the value
```

```
#> [1] 0.01192287
```

```r
# Probability in the lower tail of the distribution
lower.area <- pnorm( q = -z.score, lower.tail = TRUE )
print( lower.area )   # Inspect the value
```

```
#> [1] 0.01192287
```

```r
# Two-sided p-value: sum of both tails
p.value <- lower.area + upper.area
print( p.value )   # Inspect the value
```

```
#> [1] 0.02384574
```

---

<br>
<br>

.hi[**Assumptions of the test**]

<br>
<br>

1. Normality of the distribution
<br>
2. Independence of the observations in the sample
<br>
3. Known standard deviation

---
layout:false
class: middle, inverse

# t test
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(A more realistic variant...)

---
layout:true

# t test

---

.hi[**Show the null and alternative hypotheses of the t test graphically**]

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/ttesthyp_onesample-1.svg" style="display: block; margin: auto;" />

.footnote[[*]Graphical illustration of the null and alternative hypotheses for the one-sample t test. Note the similarity to the z test. Under the null hypothesis the population mean `\(\mu\)` equals some a priori specified value `\(\mu_0\)`, and under the alternative it does not. As with the z test we assume a normal distribution; the difference is that the t test does not assume the standard deviation `\(\sigma\)` is known in advance.]

---

.hi[What if we do not know the population standard deviation?]

```r
# Check the sample standard deviation in our example
sd( grades )
```

```
#> [1] 9.520615
```

.hi[**Distribution of the t statistic**]

`$$t = \frac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{N} }$$`

---

.hi[**Show the distribution graphically**]

<img src="../Foto/t_dist.png" width="2665" style="display: block; margin: auto;" />

<br>
<br>
<br>

.footnote[[*]The t distribution with 2 degrees of freedom (left) and 10 degrees of freedom (right); the standard normal distribution (mean 0, standard deviation 1) is shown as a dashed line. The t distribution has heavier tails (higher kurtosis) than the normal distribution. The difference is pronounced when the degrees of freedom are few and negligible when they are many, where the t distribution is almost identical to the normal distribution.]

---

.hi[**Run the test in R**]

.tiny[

```r
library(lsr)   # Load the package
# Run the test
lsr::oneSampleTTest( x=grades, mu=67.5 )
```

```
#> 
#>    One sample t-test 
#> 
#> Data variable:   grades 
#> 
#> Descriptive statistics: 
#>             grades
#>    mean     72.300
#>    std dev.  9.521
#> 
#> Hypotheses: 
#>    null:        population mean equals 67.5 
#>    alternative: population mean not equal to 67.5 
#> 
#> Test results: 
#>    t-statistic:  2.255 
#>    degrees of freedom:  19 
#>    p-value:  0.036 
#> 
#> Other information: 
#>    two-sided 95% confidence interval:  [67.844, 76.756] 
#>    estimated effect size (Cohen's d):  0.504
```
]

---

<br>
<br>

.hi[**Reporting the test result**]

> t(19) = 2.25, p < .05, CI_{95} = [67.8, 76.8]

<br>
<br>

.hi[**Assumptions of the t test**]

<br>
<br>

1. Normality of the distribution
<br>
2. Independence

---
layout:false
class: middle, inverse

# t test FOR INDEPENDENT SAMPLES (Student)
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(Student's solution...)

---
layout:true

# t test FOR INDEPENDENT SAMPLES (Student)

---

```r
load (file.path("../Podatci/harpo.Rdata" ))   # Load the data
str(harpo)                                    # Inspect the data
```

```
#> 'data.frame':	33 obs. of  2 variables:
#>  $ grade: num  65 72 66 74 73 71 66 76 69 79 ...
#>  $ tutor: Factor w/ 2 levels "Anastasia","Bernadette": 1 2 2 1 1 2 2 2 2 2 ...
```

```r
head( harpo,4 )   # Inspect the first rows
```

```
#>   grade      tutor
#> 1    65  Anastasia
#> 2    72 Bernadette
#> 3    66 Bernadette
#> 4    74  Anastasia
```

---

<br>
<br>

```r
# Descriptive statistics by tutor (prosjek = mean, stDev = standard deviation)
library(tidyverse)
harpo %>%
  dplyr::group_by(tutor) %>%
  dplyr::summarise(prosjek = mean(grade), stDev = sd(grade))
```

```
#> # A tibble: 2 x 3
#>   tutor      prosjek stDev
#>   <fct>        <dbl> <dbl>
#> 1 Anastasia     74.5  9.00
#> 2 Bernadette    69.1  5.77
```

---

.hi[**Histogram of grades**]

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/harpohistanastasia-1.svg" style="display: block; margin: auto;" />

.footnote[[*]Histogram of the grade distribution in Anastasia's class.]

---

.hi[**Histogram of grades**]

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/harpohistbernadette-1.svg" style="display: block; margin: auto;" />

.footnote[[*]Histogram of the grade distribution in Bernadette's class.]

---

.hi[**State the null and alternative hypotheses**]

`$$\begin{array}{ll} H_0: & \mu_1 = \mu_2 \\ H_1: & \mu_1 \neq \mu_2 \end{array}$$`

.hi[**Show the test graphically**]

.tiny[
<img src="../Foto/t_promjena.png" width="2667" style="display: block; margin: auto;" />
]

---

.hi[**Test statistic**]

`$$t = \frac{\bar{X}_1 - \bar{X}_2}{\mbox{SE}}$$`

.hi[**Graphical illustration of the hypotheses**]

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/ttesthyp-1.svg" style="display: block; margin: auto;" />

.footnote[[*] Graphical illustration of the null and alternative hypotheses for Student's t test. The null hypothesis assumes that both groups have the same mean, `\(\mu_1 = \mu_2\)`, whereas under the alternative the two means differ. Note the assumption that the population distributions are normal and share the same standard deviation.]
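---

Because Student's test assumes a common standard deviation, the two sample standard deviations can be combined into one pooled estimate, weighting each group's variance by its degrees of freedom. What follows is a minimal sketch from summary statistics. The means and standard deviations are the values the lecture's test output reports for these data, while the group sizes 15 and 18 are an assumption (they add up to the 33 observations). All variable names are illustrative.

```r
m1 <- 74.533; s1 <- 8.999; n1 <- 15   # Anastasia  (n1 is ASSUMED)
m2 <- 69.056; s2 <- 5.775; n2 <- 18   # Bernadette (n2 is ASSUMED)

# Pooled variance: each group's variance weighted by N - 1
var.pooled <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
sd.pooled  <- sqrt(var.pooled)

# Standard error of the difference between the two means
se.diff <- sd.pooled * sqrt(1 / n1 + 1 / n2)

# t statistic, degrees of freedom and two-sided p-value
t.stat  <- (m1 - m2) / se.diff
df      <- n1 + n2 - 2
p.value <- 2 * pt(abs(t.stat), df = df, lower.tail = FALSE)

print(round(c(t = t.stat, df = df, p = p.value), 3))
```

Under these assumptions the sketch gives a t statistic of about 2.1 on 31 degrees of freedom.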
---

.hi[**Estimating the pooled standard deviation**]

- Define the weights

`$$\begin{array}{rcl} w_1 &=& N_1 - 1\\ w_2 &=& N_2 - 1 \end{array}$$`

- Pooled estimate of the variance

`$$\hat\sigma^2_p = \frac{w_1 {\hat\sigma_1}^2 + w_2 {\hat\sigma_2}^2}{w_1 + w_2}$$`

- Pooled estimate of the standard deviation

`$$\hat\sigma_p = \sqrt{\frac{w_1 {\hat\sigma_1}^2 + w_2 {\hat\sigma_2}^2}{w_1 + w_2}}$$`

---

.hi[**An alternative way**]

- Compute each observation's deviation from its group mean

`$$X_{ik} - \bar{X}_k$$`

- Sum the squared deviations over all observations and divide by N

`$$\frac{\sum_{ik} \left( X_{ik} - \bar{X}_k \right)^2}{N}$$`

- Correct the denominator for the two estimated group means

`$$\hat\sigma^2_p = \frac{\sum_{ik} \left( X_{ik} - \bar{X}_k \right)^2}{N -2}$$`

---

.hi[**An alternative way**]

<br>

- Compute the standard error of the difference between the means

`$$\mbox{SE}({\bar{X}_1 - \bar{X}_2}) = \hat\sigma_p \sqrt{\frac{1}{N_1} + \frac{1}{N_2}}$$`

- The t statistic for our test

`$$t = \frac{\bar{X}_1 - \bar{X}_2}{\mbox{SE}({\bar{X}_1 - \bar{X}_2})}$$`

---

.hi[**Running the test in R**]

<br>
<br>

```r
# Inspect the data
head( harpo )
```

```
#>   grade      tutor
#> 1    65  Anastasia
#> 2    72 Bernadette
#> 3    66 Bernadette
#> 4    74  Anastasia
#> 5    73  Anastasia
#> 6    71 Bernadette
```

---

.hi[**Running the test in R**]

.tiny[

```r
# Run the test
independentSamplesTTest(
  formula = grade ~ tutor,   # Outcome variable ~ grouping variable
  data = harpo,              # Data
  var.equal = TRUE           # Assume equal variances (Student's test)
)
```

```
#> 
#>    Student's independent samples t-test 
#> 
#> Outcome variable:   grade 
#> Grouping variable:  tutor 
#> 
#> Descriptive statistics: 
#>             Anastasia Bernadette
#>    mean        74.533     69.056
#>    std dev.     8.999      5.775
#> 
#> Hypotheses: 
#>    null:        population means equal for both groups
#>    alternative: different population means in each group
#> 
#> Test results: 
#>    t-statistic:  2.115 
#>    degrees of freedom:  31 
#>    p-value:  0.043 
#> 
#> Other information: 
#>    two-sided 95% confidence interval:  [0.197, 10.759] 
#>    estimated effect size (Cohen's d):  0.74
```
]

---

.hi[**Concise statistical report of the test result**]

<br>
<br>

> t(31) = 2.1, p<.05, CI_{95} = [0.2, 10.8], d = .74

<br>
<br>

.hi[**Assumptions of the test**]

<br>
<br>

1. Normality of the distribution
<br>
2. Independence
<br>
3. Homogeneity of variance

---
layout:false
class: middle, inverse

# t test FOR INDEPENDENT SAMPLES (Welch)
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(Welch's variant...)

---
layout:true

# t test FOR INDEPENDENT SAMPLES (Welch)

---

.hi[Test statistic]

`$$t = \frac{\bar{X}_1 - \bar{X}_2}{\mbox{SE}({\bar{X}_1 - \bar{X}_2})}$$`

.hi[Standard error of the statistic]

`$$\mbox{SE}({\bar{X}_1 - \bar{X}_2}) = \sqrt{\frac{{\hat\sigma_1}^2}{N_1} + \frac{{\hat\sigma_2}^2}{N_2}}$$`

.hi[Degrees of freedom]

`$$\mbox{df} = \frac{\left( {\hat\sigma_1}^2 / N_1 + {\hat\sigma_2}^2 / N_2 \right)^2}{\left( {\hat\sigma_1}^2 / N_1 \right)^2 / (N_1 - 1) + \left( {\hat\sigma_2}^2 / N_2 \right)^2 / (N_2 - 1)}$$`

---

.hi[**Show the null and alternative hypotheses graphically**]

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/ttesthyp2-1.svg" style="display: block; margin: auto;" />

.footnote[[*]Graphical illustration of the null and alternative hypotheses for the Welch t test. As with Student's t test we assume normal distributions, but note that the two populations are no longer required to have equal variances.]

---

.hi[**Run the test in R**]

.tiny[

```r
independentSamplesTTest(
  formula = grade ~ tutor,   # Outcome variable ~ grouping variable
  data = harpo               # Data
)
```

```
#> 
#>    Welch's independent samples t-test 
#> 
#> Outcome variable:   grade 
#> Grouping variable:  tutor 
#> 
#> Descriptive statistics: 
#>             Anastasia Bernadette
#>    mean        74.533     69.056
#>    std dev.     8.999      5.775
#> 
#> Hypotheses: 
#>    null:        population means equal for both groups
#>    alternative: different population means in each group
#> 
#> Test results: 
#>    t-statistic:  2.034 
#>    degrees of freedom:  23.025 
#>    p-value:  0.054 
#> 
#> Other information: 
#>    two-sided 95% confidence interval:  [-0.092, 11.048] 
#>    estimated effect size (Cohen's d):  0.724
```
]

---

<br>

.hi[**Assumptions of the test**]

<br>
<br>

1. Normality of the distribution
<br>
2. Independence

---
layout:false
class: middle, inverse

# t test FOR DEPENDENT SAMPLES
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(With dependent samples...)

---
layout:true

# t test FOR DEPENDENT SAMPLES

---

```r
# Load the data
load( file.path("../Podatci/chico.Rdata" ))
str(chico)   # Inspect the data
```

```
#> 'data.frame':	20 obs. of  3 variables:
#>  $ id         : Factor w/ 20 levels "student1","student10",..: 1 12 14 15 16 17 18 19 20 2 ...
#>  $ grade_test1: num  42.9 51.8 71.7 51.6 63.5 58 59.8 50.8 62.5 61.9 ...
#>  $ grade_test2: num  44.6 54 72.3 53.4 63.8 59.3 60.8 51.6 64.3 63.2 ...
```

```r
# Inspect the first rows
head( chico,4 )
```

```
#>         id grade_test1 grade_test2
#> 1 student1        42.9        44.6
#> 2 student2        51.8        54.0
#> 3 student3        71.7        72.3
#> 4 student4        51.6        53.4
```

---

```r
library( psych )            # Load the package
psych::describe( chico )    # Descriptive statistics
```

```
#>             vars  n  mean   sd median trimmed  mad  min  max range  skew
#> id*            1 20 10.50 5.92   10.5   10.50 7.41  1.0 20.0  19.0  0.00
#> grade_test1    2 20 56.98 6.62   57.7   56.92 7.71 42.9 71.7  28.8  0.05
#> grade_test2    3 20 58.38 6.41   59.7   58.35 6.45 44.6 72.3  27.7 -0.05
#>             kurtosis   se
#> id*            -1.38 1.32
#> grade_test1    -0.35 1.48
#> grade_test2    -0.39 1.43
```

---

.hi[**Plot the data**]

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/pairedta-1.svg" style="display: block; margin: auto;" />

.footnote[[*]Mean grade on test 1 and test 2, with the accompanying 95% confidence intervals.]
---

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/pairedtb-1.svg" style="display: block; margin: auto;" />

.footnote[[*]Scatter plot of grades on the first test against grades on the second test.]

---

.hi[Compute the difference in grades]

```r
# Create a vector of differences: second test minus first test
chico$improvement <- chico$grade_test2 - chico$grade_test1
```

```r
# Inspect the data
head( chico,5 )
```

```
#>         id grade_test1 grade_test2 improvement
#> 1 student1        42.9        44.6         1.7
#> 2 student2        51.8        54.0         2.2
#> 3 student3        71.7        72.3         0.6
#> 4 student4        51.6        53.4         1.8
#> 5 student5        63.5        63.8         0.3
```

---

.hi[**Plot the differences**]

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/pairedtc-1.svg" style="display: block; margin: auto;" />

.footnote[[*]Histogram of the individual improvements in grade between the first and the second test. Note that almost the entire distribution lies above 0: the vast majority of students improved on the second test.]

---

```r
# Confidence limits for the mean improvement
ciMean( x = chico$improvement )
```

```
#>           2.5%    97.5%
#> [1,] 0.9508686 1.859131
```

---

.hi[**Technical aspects of the test**]

- Define the difference (second test minus first test, as in the code)

`$$D_{i} = X_{i2} - X_{i1}$$`

- Formulate the hypotheses

`$$\begin{array}{ll} H_0: & \mu_D = 0 \\ H_1: & \mu_D \neq 0 \end{array}$$`

- Test statistic

`$$t = \frac{\bar{D}}{\mbox{SE}({\bar{D}})}$$`

---

.hi[**Running the test in R**]

Method I

.tiny[

```r
lsr::oneSampleTTest( chico$improvement, mu=0 )
```

```
#> 
#>    One sample t-test 
#> 
#> Data variable:   chico$improvement 
#> 
#> Descriptive statistics: 
#>             improvement
#>    mean           1.405
#>    std dev.       0.970
#> 
#> Hypotheses: 
#>    null:        population mean equals 0 
#>    alternative: population mean not equal to 0 
#> 
#> Test results: 
#>    t-statistic:  6.475 
#>    degrees of freedom:  19 
#>    p-value:  <.001 
#> 
#> Other information: 
#>    two-sided 95% confidence interval:  [0.951, 1.859] 
#>    estimated effect size (Cohen's d):  1.448
```
]

---

.hi[**Running the test in R**]

Method II

.tiny[

```r
lsr::pairedSamplesTTest(
  formula = ~ grade_test2 + grade_test1,   # One-sided formula naming the two repeated measurements
  data = chico                             # Data
)
```

```
#> 
#>    Paired samples t-test 
#> 
#> Variables:  grade_test2 , grade_test1 
#> 
#> Descriptive statistics: 
#>             grade_test2 grade_test1 difference
#>    mean          58.385      56.980      1.405
#>    std dev.       6.406       6.616      0.970
#> 
#> Hypotheses: 
#>    null:        population means equal for both measurements
#>    alternative: different population means for each measurement
#> 
#> Test results: 
#>    t-statistic:  6.475 
#>    degrees of freedom:  19 
#>    p-value:  <.001 
#> 
#> Other information: 
#>    two-sided 95% confidence interval:  [0.951, 1.859] 
#>    estimated effect size (Cohen's d):  1.448
```
]

---

.hi[**Running the test in R**]

Method II (long-format data)

```r
# Reshape the data from wide to long format
chico2 <- wideToLong( chico, within="time" )
head( chico2 )   # Inspect the data
```

```
#>         id improvement  time grade
#> 1 student1         1.7 test1  42.9
#> 2 student2         2.2 test1  51.8
#> 3 student3         0.6 test1  71.7
#> 4 student4         1.8 test1  51.6
#> 5 student5         0.3 test1  63.5
#> 6 student6         1.3 test1  58.0
```

---

Method II (long-format data)

```r
# Sort the data by student id
chico2 <- sortFrame( chico2, id )
head( chico2 )   # Inspect the data
```

```
#>           id improvement  time grade
#> 1   student1         1.7 test1  42.9
#> 21  student1         1.7 test2  44.6
#> 10 student10         1.3 test1  61.9
#> 30 student10         1.3 test2  63.2
#> 11 student11         1.4 test1  50.4
#> 31 student11         1.4 test2  51.8
```

---

Method II (long-format data)

.tiny[

```r
# Run the test
lsr::pairedSamplesTTest(
  formula = grade ~ time,   # Outcome variable ~ within-subject factor
  data = chico2,            # Data
  id = "id"                 # Name of the id variable
)
```

```
#> 
#>    Paired samples t-test 
#> 
#> Outcome variable:   grade 
#> Grouping variable:  time 
#> ID variable:        id 
#> 
#> Descriptive statistics: 
#>              test1  test2 difference
#>    mean     56.980 58.385     -1.405
#>    std dev.  6.616  6.406      0.970
#> 
#> Hypotheses: 
#>    null:        population means equal for both measurements
#>    alternative: different population means for each measurement
#> 
#> Test results: 
#>    t-statistic:  -6.475 
#>    degrees of freedom:  19 
#>    p-value:  <.001 
#> 
#> Other information: 
#>    two-sided 95% confidence interval:  [-1.859, -0.951] 
#>    estimated effect size (Cohen's d):  1.448
```
]

---

.hi[**Alternative specification of the test**]

<br>
<br>

```
pairedSamplesTTest( formula = grade ~ time + (id), data = chico2 )
```

or

```
pairedSamplesTTest( grade ~ time + (id), chico2 )
```

---
layout:false
class: middle, inverse

# ONE-SIDED TESTS
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(When we know the direction of the relationship...)

---
layout:true

# ONE-SIDED TESTS

---

.hi[**Test the alternative hypothesis that the population mean is greater than the hypothesised value**]

.tiny[

```r
library(psych)
library(lsr)
# Run the test
lsr::oneSampleTTest(
  x = grades,
  mu = 67.5,
  one.sided = "greater"   # Alternative: population mean greater than mu
)
```

```
#> 
#>    One sample t-test 
#> 
#> Data variable:   grades 
#> 
#> Descriptive statistics: 
#>             grades
#>    mean     72.300
#>    std dev.  9.521
#> 
#> Hypotheses: 
#>    null:        population mean less than or equal to 67.5 
#>    alternative: population mean greater than 67.5 
#> 
#> Test results: 
#>    t-statistic:  2.255 
#>    degrees of freedom:  19 
#>    p-value:  0.018 
#> 
#> Other information: 
#>    one-sided 95% confidence interval:  [68.619, Inf] 
#>    estimated effect size (Cohen's d):  0.504
```
]

---

.tiny[

```r
# One-sided test for independent samples
lsr::independentSamplesTTest(
  formula = grade ~ tutor,
  data = harpo,
  one.sided = "Anastasia"   # Group expected to have the larger mean
)
```

```
#> 
#>    Welch's independent samples t-test 
#> 
#> Outcome variable:   grade 
#> Grouping variable:  tutor 
#> 
#> Descriptive statistics: 
#>             Anastasia Bernadette
#>    mean        74.533     69.056
#>    std dev.     8.999      5.775
#> 
#> Hypotheses: 
#>    null:        population means are equal, or smaller for group 'Anastasia' 
#>    alternative: population mean is larger for group 'Anastasia' 
#> 
#> Test results: 
#>    t-statistic:  2.034 
#>    degrees of freedom:  23.025 
#>    p-value:  0.027 
#> 
#> Other information: 
#>    one-sided 95% confidence interval:  [0.863, Inf] 
#>    estimated effect size (Cohen's d):  0.724
```
]

---

.tiny[

```r
# One-sided test for dependent samples
pairedSamplesTTest(
  formula = ~ grade_test2 + grade_test1,
  data = chico,
  one.sided = "grade_test2"   # Measurement expected to have the larger mean
)
```

```
#> 
#>    Paired samples t-test 
#> 
#> Variables:  grade_test2 , grade_test1 
#> 
#> Descriptive statistics: 
#>             grade_test2 grade_test1 difference
#>    mean          58.385      56.980      1.405
#>    std dev.       6.406       6.616      0.970
#> 
#> Hypotheses: 
#>    null:        population means are equal, or smaller for measurement 'grade_test2' 
#>    alternative: population mean is larger for measurement 'grade_test2' 
#> 
#> Test results: 
#>    t-statistic:  6.475 
#>    degrees of freedom:  19 
#>    p-value:  <.001 
#> 
#> Other information: 
#>    one-sided 95% confidence interval:  [1.03, Inf] 
#>    estimated effect size (Cohen's d):  1.448
```
]

---

<br>

```r
# Alternative specifications of the test
pairedSamplesTTest( formula = grade ~ time, data = chico2, id = "id", one.sided = "test2" )
pairedSamplesTTest( formula = grade ~ time + (id), data = chico2, one.sided = "test2" )
```

---
layout:false
class: middle, inverse

# THE STANDARD WAY OF RUNNING A t-test IN R
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(The test in practice...)
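---

The one-sided tests shown with `lsr` can also be requested from base R through the `alternative` argument of `t.test()`. What follows is a minimal sketch for the one-sample case, using the grades printed at the start of the lecture; `res` is just an illustrative name.

```r
# Grades as printed at the start of the lecture
grades <- c(50, 60, 60, 64, 66, 66, 67, 69, 70, 74,
            76, 76, 77, 79, 79, 79, 81, 82, 82, 89)

# H1: the population mean is greater than 67.5
res <- t.test( x = grades, mu = 67.5, alternative = "greater" )
print( res )
```

The same argument is available for two-sample and paired calls; with a formula, the direction refers to the first factor level compared with the second.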
---
layout:true

# THE STANDARD WAY OF RUNNING A t-test IN R

---

```r
# Run the standard one-sample test of a mean
t.test( x = grades,   # Data
        mu = 67.5     # Hypothesised mean
)
```

```
#> 
#> 	One Sample t-test
#> 
#> data:  grades
#> t = 2.2547, df = 19, p-value = 0.03615
#> alternative hypothesis: true mean is not equal to 67.5
#> 95 percent confidence interval:
#>  67.84422 76.75578
#> sample estimates:
#> mean of x 
#>      72.3
```

---

```r
# Run the test for independent samples (Welch by default)
t.test( formula = grade ~ tutor,   # Outcome variable ~ grouping variable
        data = harpo )             # Data
```

```
#> 
#> 	Welch Two Sample t-test
#> 
#> data:  grade by tutor
#> t = 2.0342, df = 23.025, p-value = 0.05361
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -0.09249349 11.04804904
#> sample estimates:
#>  mean in group Anastasia mean in group Bernadette 
#>                 74.53333                 69.05556
```

---

```r
# Run the test for dependent samples
t.test( x = chico$grade_test2,   # First measurement
        y = chico$grade_test1,   # Second measurement
        paired = TRUE            # Dependent samples
)
```

```
#> 
#> 	Paired t-test
#> 
#> data:  chico$grade_test2 and chico$grade_test1
#> t = 6.4754, df = 19, p-value = 3.321e-06
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  0.9508686 1.8591314
#> sample estimates:
#> mean of the differences 
#>                   1.405
```

---
layout:false
class: middle, inverse

# EFFECT SIZE
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(Strength of the estimated relationship...)
---
layout:true

# EFFECT SIZE

---

.hi[**Basic idea**]

`$$d = \frac{\mbox{(mean 1)} - \mbox{(mean 2)}}{\mbox{std dev}}$$`

.hi[**Interpretation**]

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> `\(d\)`-value </th>
   <th style="text-align:left;"> rough interpretation </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> about 0.2 </td>
   <td style="text-align:left;"> small effect </td>
  </tr>
  <tr>
   <td style="text-align:left;"> about 0.5 </td>
   <td style="text-align:left;"> moderate effect </td>
  </tr>
  <tr>
   <td style="text-align:left;"> about 0.8 </td>
   <td style="text-align:left;"> large effect </td>
  </tr>
</tbody>
</table>

.footnote[[*] A rough guide to interpreting Cohen's d. The d statistic expresses the difference between two means in units of standard deviation.]

---

.hi[**One sample**]

`$$d = \frac{\bar{X} - \mu_0}{\hat{\sigma}}$$`

```r
# Compute Cohen's d for one sample
lsr::cohensD( x = grades,   # Data
              mu = 67.5     # Compare with a mean of 67.5
)
```

```
#> [1] 0.5041691
```

```r
# Compute it "by hand" (without the function)
(mean(grades) - 67.5 ) / sd(grades)
```

```
#> [1] 0.5041691
```

---

.hi[**Effect size for Student's t test**]

`$$\delta = \frac{\mu_1 - \mu_2}{\sigma}$$`

`$$d = \frac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma}_p}$$`

```r
# Compute it in R
lsr::cohensD( formula = grade ~ tutor,   # Outcome variable ~ grouping variable
              data = harpo,              # Data
              method = "pooled"          # Use the pooled standard deviation
)
```

```
#> [1] 0.7395614
```

---

.hi[**Effect size for Welch's t test**]

`$$\delta^\prime = \frac{\mu_1 - \mu_2}{\sigma^\prime}$$`

`$$\sigma^\prime = \sqrt{\displaystyle{\frac{ {\sigma_1}^2 + {\sigma_2}^2}{2}}}$$`

`$$d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\displaystyle{\frac{ {\hat\sigma_1}^2 + {\hat\sigma_2}^2}{2}}}}$$`

```r
# Compute it in R
lsr::cohensD( formula = grade ~ tutor,   # Outcome variable ~ grouping variable
              data = harpo,              # Data
              method = "unequal"         # Do not assume equal variances
)
```

```
#> [1] 0.7244995
```

---

.hi[**Effect size for dependent samples**]

`$$d = \frac{\bar{D}}{\hat{\sigma}_D}$$`

```r
# Compute it in R
lsr::cohensD( x = chico$grade_test2,   # First variable
              y = chico$grade_test1,   # Second variable
              method = "paired"        # Use the standard deviation of the differences
)
```

```
#> [1] 1.447952
```

---
layout:false
class: middle, inverse

# CHECKING NORMALITY OF THE DISTRIBUTION
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(Are the assumptions satisfied?)

---
layout:true

# CHECKING NORMALITY OF THE DISTRIBUTION

---

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/qq1a-1.svg" style="display: block; margin: auto;" />

```
#> Normally Distributed Data 
#> Skew= -0.02936155 
#> Kurtosis= -0.06035938 
#> 
#> 	Shapiro-Wilk normality test
#> 
#> data:  data
#> W = 0.99108, p-value = 0.7515
```

<br>

.footnote[[*]Histogram of normally distributed data; the plot is based on a simulation of 100 observations.]

---

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/qq1b-1.svg" style="display: block; margin: auto;" />

.footnote[[*]QQ plot of normally distributed data; the plot is based on a simulation of 100 observations.]

---

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/qq2a-1.svg" style="display: block; margin: auto;" />

```
#> Gamma-Distributed Data 
#> Skew= 1.889475 
#> Kurtosis= 4.4396 
#> 
#> 	Shapiro-Wilk normality test
#> 
#> data:  data
#> W = 0.81758, p-value = 8.908e-10
```

.footnote[[*]Histogram of 100 observations of skewed data.]

---

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/qq2b-1.svg" style="display: block; margin: auto;" />

.footnote[[*]QQ plot of the skewed (non-normal) data; the plot is based on a simulation of 100 observations.]
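---

A check like the one above for skewed data can be sketched in base R. The gamma parameters, the seed and the helper `sample.skew()` are assumptions made for illustration, not the settings used to produce the figures.

```r
set.seed(1)                                            # ASSUMED seed, for reproducibility
skewed.data <- rgamma( n = 100, shape = 1, scale = 2 ) # ASSUMED gamma parameters

# Sample skewness: third central moment divided by the cubed standard deviation
sample.skew <- function(x) {
  mean( (x - mean(x))^3 ) / sd(x)^3
}
print( sample.skew( skewed.data ) )

# Shapiro-Wilk test: a small p-value indicates a departure from normality
sw <- shapiro.test( x = skewed.data )
print( sw )

par(mfrow=c(1,2))            # 1 x 2 plotting grid
hist( x = skewed.data )      # Histogram
qqnorm( y = skewed.data )    # QQ plot
qqline( y = skewed.data )    # Reference line through the quartiles
par(mfrow=c(1,1))            # Reset the grid
```

Points that bend away from the reference line in the QQ plot, together with a clearly positive skewness, point to the same conclusion as the test.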
---

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/qq2c-1.svg" style="display: block; margin: auto;" />

```
#> Heavy-Tailed Data 
#> Skew= -0.05308273 
#> Kurtosis= 7.508765 
#> 
#> 	Shapiro-Wilk normality test
#> 
#> data:  data
#> W = 0.83892, p-value = 4.718e-09
```

.footnote[[*]Histogram of 100 observations from a distribution with a lot of mass in the tails.]

---

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/qq2d-1.svg" style="display: block; margin: auto;" />

.footnote[[*]QQ plot of 100 observations from a distribution with a lot of mass in the tails.]

---

```r
normal.data <- rnorm( n = 100 )   # Generate 100 normally distributed numbers
par(mfrow=c(1,2))                 # Set a 1 x 2 plotting grid
hist( x = normal.data )           # Draw a histogram
qqnorm( y = normal.data )         # Draw a QQ plot
```

<img src="09_USPOREDBA_PROSJEKA_files/figure-html/unnamed-chunk-46-1.svg" style="display: block; margin: auto;" />

```r
par(mfrow=c(1,1))                 # Reset the plotting grid
```

---

.hi[**Shapiro-Wilk test**]

`$$W = \frac{ \left( \sum_{i = 1}^N a_i X_i \right)^2 }{ \sum_{i = 1}^N (X_i - \bar{X})^2}$$`

```r
# Run the test on a vector of normally distributed data
shapiro.test( x = normal.data )
```

```
#> 
#> 	Shapiro-Wilk normality test
#> 
#> data:  normal.data
#> W = 0.98654, p-value = 0.4076
```

---
layout:false
class: middle, inverse

# NON-NORMAL DISTRIBUTION
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(When the assumptions are not satisfied...)
---
layout:true

# NON-NORMAL DISTRIBUTION

---

.hi[**Wilcoxon test for two samples**]

```r
load(file.path("../Podatci/awesome.Rdata"))   # Load the data
print( awesome )                              # Show the data
```

```
#>    scores group
#> 1     6.4     A
#> 2    10.7     A
#> 3    11.9     A
#> 4     7.3     A
#> 5    10.0     A
#> 6    14.5     B
#> 7    10.4     B
#> 8    12.9     B
#> 9    11.7     B
#> 10   13.0     B
```

---

```r
# Run the Wilcoxon rank sum test
wilcox.test( formula = scores ~ group, data = awesome)
```

```
#> 
#> 	Wilcoxon rank sum exact test
#> 
#> data:  scores by group
#> W = 3, p-value = 0.05556
#> alternative hypothesis: true location shift is not equal to 0
```

---

```r
# Load the same data organised as two separate vectors
load( file.path("../Podatci/awesome2.Rdata" ))
score.A
```

```
#> [1]  6.4 10.7 11.9  7.3 10.0
```

```r
score.B
```

```
#> [1] 14.5 10.4 12.9 11.7 13.0
```

```r
# Run the Wilcoxon rank sum test
wilcox.test( x = score.A, y = score.B )
```

```
#> 
#> 	Wilcoxon rank sum exact test
#> 
#> data:  score.A and score.B
#> W = 3, p-value = 0.05556
#> alternative hypothesis: true location shift is not equal to 0
```

---

.hi[**Wilcoxon test for one sample**]

```r
# Load the data
load( file.path("../Podatci/happy.Rdata" ))
print( happiness )   # Show the data
```

```
#>    before after change
#> 1      30     6    -24
#> 2      43    29    -14
#> 3      21    11    -10
#> 4      24    31      7
#> 5      23    17     -6
#> 6      40     2    -38
#> 7      29    31      2
#> 8      56    21    -35
#> 9      38     8    -30
#> 10     16    21      5
```

---

```r
# Wilcoxon signed rank test for one sample
wilcox.test( x = happiness$change, mu = 0 )
```

```
#> 
#> 	Wilcoxon signed rank exact test
#> 
#> data:  happiness$change
#> V = 7, p-value = 0.03711
#> alternative hypothesis: true location is not equal to 0
```

---

```r
# Wilcoxon signed rank test for dependent samples
wilcox.test( x = happiness$after, y = happiness$before, paired = TRUE )
```

```
#> 
#> 	Wilcoxon signed rank exact test
#> 
#> data:  happiness$after and happiness$before
#> V = 7, p-value = 0.03711
#> alternative hypothesis: true location shift is not equal to 0
```

---
layout:false
class: middle, inverse

# Thank you for your attention!
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(Next lecture: Comparing more than two means.)