class: center, middle, inverse, title-slide

# MULTIVARIATE STATISTICAL METHODS
## Lecture 7: Multivariate regression analysis
### dr.sc. Luka Šikić
### Fakultet hrvatskih studija |
Github MV
---
<style type="text/css">
@media print { .has-continuation { display: block !important; } }
remark-slide-content { font-size: 22px; padding: 20px 80px 20px 80px; }
.remark-code, .remark-inline-code { background: #f0f0f0; }
.remark-code { font-size: 16px; }
.mid. remark-code { /*Change made here*/ font-size: 60% !important; }
.tiny .remark-code { /*Change made here*/ font-size: 40% !important; }
/* custom.css */
.left-code { color: #777; width: 38%; height: 92%; float: left; }
.right-plot { width: 60%; float: right; padding-left: 1%; }
.plot-callout { height: 225px; width: 450px; bottom: 5%; right: 5%; position: absolute; padding: 0px; z-index: 100; }
.plot-callout img { width: 100%; border: 4px solid #23373B; }
</style>

# Lecture outline

<br> <br> <br>

1. [Practical example](#exemplar)
2. [Univariate regression analysis](#uni)
3. [Prediction and model elements](#pred)
4. [Quality of the fit](#qual)
5. [Multiple regression](#multi)

---
class: inverse, center, middle
name: exemplar

# PRACTICAL EXAMPLE
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(Motivation)

---
# Data

```r
# We use the "marketing" data set from the "datarium" package
library(datarium) # load the package
data("marketing") # load the data
# The data describe the effect on sales of advertising spend on youtube, facebook and newspaper ads
head(marketing,10) # inspect the data
```

```
##    youtube facebook newspaper sales
## 1   276.12    45.36     83.04 26.52
## 2    53.40    47.16     54.12 12.48
## 3    20.64    55.08     83.16 11.16
## 4   181.80    49.56     70.20 22.20
## 5   216.96    12.96     70.08 15.48
## 6    10.44    58.68     90.00  8.64
## 7    69.00    39.36     28.20 14.16
## 8   144.24    23.52     13.92 15.84
## 9    10.32     2.52      1.20  5.76
## 10  239.76     3.12     25.44 12.72
```

---
# Model

<br> <br>

- the goal is to estimate the effect of advertising on youtube, facebook and in newspapers on sales

<br>

- we estimate the model: `sales = b0 + b1*youtube + b2*facebook + b3*newspaper`

<br> <br>

```r
# Estimate the model in R
model <- lm(sales ~ youtube
  + facebook + newspaper, data = marketing)
```

---
# Interpretation

```r
summary(model)$coefficients
```

```
##                 Estimate  Std. Error    t value     Pr(>|t|)
## (Intercept)  3.526667243 0.374289884  9.4222884 1.267295e-17
## youtube      0.045764645 0.001394897 32.8086244 1.509960e-81
## facebook     0.188530017 0.008611234 21.8934961 1.505339e-54
## newspaper   -0.001037493 0.005871010 -0.1767146 8.599151e-01
```

---
# Model

```r
# Model results
summary(model)
```

```
##
## Call:
## lm(formula = sales ~ youtube + facebook + newspaper, data = marketing)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -10.5932  -1.0690   0.2902   1.4272   3.3951
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3.526667   0.374290   9.422   <2e-16 ***
## youtube      0.045765   0.001395  32.809   <2e-16 ***
## facebook     0.188530   0.008611  21.893   <2e-16 ***
## newspaper   -0.001037   0.005871  -0.177     0.86
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.023 on 196 degrees of freedom
## Multiple R-squared: 0.8972, Adjusted R-squared: 0.8956
## F-statistic: 570.3 on 3 and 196 DF, p-value: < 2.2e-16
```

---
# Improving the model

```r
# drop the non-significant variable
model <- lm(sales ~ youtube + facebook, data = marketing)
summary(model)
```

```
##
## Call:
## lm(formula = sales ~ youtube + facebook, data = marketing)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -10.5572  -1.0502   0.2906   1.4049   3.3994
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3.50532    0.35339   9.919   <2e-16 ***
## youtube      0.04575    0.00139  32.909   <2e-16 ***
## facebook     0.18799    0.00804  23.382   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
0.1 ' ' 1
##
## Residual standard error: 2.018 on 197 degrees of freedom
## Multiple R-squared: 0.8972, Adjusted R-squared: 0.8962
## F-statistic: 859.6 on 2 and 197 DF, p-value: < 2.2e-16
```

---
# Model quality

##### Confidence intervals

```r
confint(model)
```

```
##                  2.5 %     97.5 %
## (Intercept) 2.80841159 4.20222820
## youtube     0.04301292 0.04849671
## facebook    0.17213877 0.20384969
```

##### R-sq

```r
summary(model)$r.squared
```

```
## [1] 0.8971943
```

##### RSE (relative to mean sales)

```r
sigma(model)/mean(marketing$sales)
```

```
## [1] 0.1199045
```

---
# Alternative specification

<br> <br>

```r
# All predictors in one function call
model <- lm(sales ~., data = marketing)
```

<br>

```r
# Predictor selection; without the newspaper variable
model <- lm(sales ~. -newspaper, data = marketing)
```

<br>

```r
# The update() syntax can also be used
model1 <- update(model, ~. -newspaper)
```

---
class: inverse, center, middle
name: uni

# UNIVARIATE REGRESSION
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(The basic model)

---
# Data

```r
library(dplyr)
# Load the data
SMID <- read.csv2("../Podatci/SLR.csv", header = TRUE) %>% # data available on GitHub
  rename(n_claims = X, total_payment_sek = Y) # rename the variables
str(SMID) # structure
```

```
## 'data.frame': 63 obs. of 2 variables:
##  $ n_claims         : int 108 19 13 124 40 57 23 14 45 10 ...
##  $ total_payment_sek: num 392.5 46.2 15.7 422.2 119.4 ...
```

```r
head(SMID, 5) # inspect the data
```

```
##   n_claims total_payment_sek
## 1      108             392.5
## 2       19              46.2
## 3       13              15.7
## 4      124             422.2
## 5       40             119.4
```

---
# Descriptive statistics

<br> <br>

.pull-left[
```r
library(dplyr)
SMID %>% summarize_all(mean)
```

```
##   n_claims total_payment_sek
## 1 22.90476           98.1873
```
]

.pull-right[
```r
SMID %>% summarize(correlation = cor(n_claims, total_payment_sek))
```

```
##   correlation
## 1   0.9128782
```
]

---
# What is regression?
<br> <br>

> A statistical model that explains the relationship between a dependent and an independent variable.

> Given a value of the independent variable, what is the value of the dependent variable?

<br> <br>

```
## # A tibble: 6 x 2
##   n_claims total_payment_sel
##   <chr>    <chr>
## 1 108      392.5
## 2 19       46.2
## 3 13       15.7
## 4 124      422.2
## 5 40       119.4
## 6 200      ???
```

---
# Basic terminology

###### JARGON

- the **response** variable, or dependent variable, is the one we want to predict
- the **explanatory** variable, or independent variable, is the one that explains changes in the dependent variable

<br> <br>

###### DEFINITIONS

- Linear regression: the dependent variable is numeric
- Logistic regression: the dependent variable is logical (0,1)
- Univariate regression: only one independent variable
- Multivariate regression: more than one independent variable

<br> <br>

---
# Visualizing the relationship

<br> <br>

.left-code[
```r
library(ggplot2)
ggplot(SMID, aes(n_claims, total_payment_sek)) +
  geom_point() +
  ggtitle("Odnos isplata i zahtijeva")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label-out-1.png" style="display: block; margin: auto;" />
]

---
# Add the regression line

<br> <br>

.left-code[
```r
library(ggplot2)
ggplot(SMID, aes(n_claims, total_payment_sek)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  ggtitle("Odnos isplata i zahtijeva")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label2-out-1.png" style="display: block; margin: auto;" />
]

---
# The regression line

<br> <br> <br>

- the *intercept* is the value of y at which the line crosses the y axis (x = 0)
- the *slope* is the amount by which y increases when x increases by one unit
- equation of the line: `y = intercept + slope*x`

---
# Estimating the slope

<br> <br>

<img src="../Foto/reg1.png" width="500px" style="display: block; margin: auto;" />

<br>

---
# Estimating the slope

<br> <br>

<img src="../Foto/reg2.png" width="500px" style="display: block; margin: auto;" />

<br>

---
# Estimating the slope

<br> <br>

<img src="../Foto/reg3.png" width="500px" style="display: block; margin: auto;" />

<br>
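The figures in this sequence build up the slope geometrically. As a minimal numerical sketch (not the lecturer's code), the least-squares slope and intercept can also be computed by hand and compared with `lm()`. The example assumes only the five rows printed by `head(SMID, 5)` earlier, so its coefficients differ from the full-sample fit; the names `claims_demo`, `slope_hat`, `intercept_hat` and `fit_demo` are hypothetical.

```r
# Hypothetical five-row sample copied from the head(SMID, 5) output
claims_demo <- data.frame(
  n_claims          = c(108, 19, 13, 124, 40),
  total_payment_sek = c(392.5, 46.2, 15.7, 422.2, 119.4)
)

x <- claims_demo$n_claims
y <- claims_demo$total_payment_sek

# Least-squares slope: covariance of x and y divided by the variance of x
slope_hat <- cov(x, y) / var(x)

# The fitted line passes through the point of means (mean(x), mean(y))
intercept_hat <- mean(y) - slope_hat * mean(x)

# lm() should return the same two numbers
fit_demo <- lm(total_payment_sek ~ n_claims, data = claims_demo)
print(c(intercept = intercept_hat, slope = slope_hat))
print(coef(fit_demo))
```

The two printed vectors should agree, which shows that `lm()` with a single predictor is doing nothing more than this covariance-over-variance calculation.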
---
# Estimating the slope

<br> <br>

<img src="../Foto/reg4.png" width="500px" style="display: block; margin: auto;" />

<br>

---
# Estimating the slope

<br> <br>

<img src="../Foto/reg5.png" width="500px" style="display: block; margin: auto;" />

<br>

---
# Estimating the slope

<br> <br>

<img src="../Foto/reg6.png" width="500px" style="display: block; margin: auto;" />

<br>

---
# Fit the model

<br> <br>

```r
lm(total_payment_sek ~ n_claims, data = SMID)
```

```
##
## Call:
## lm(formula = total_payment_sek ~ n_claims, data = SMID)
##
## Coefficients:
## (Intercept)     n_claims
##      19.994        3.414
```

---
# Interpreting the coefficients

<br> <br>

##### Model

```r
lm(total_payment_sek ~ n_claims, data = SMID)
```

```
##
## Call:
## lm(formula = total_payment_sek ~ n_claims, data = SMID)
##
## Coefficients:
## (Intercept)     n_claims
##      19.994        3.414
```

##### Equation

`total_payment_sek = 19.994 + 3.414*n_claims`

---
class: inverse, center, middle

# Categorical independent variable
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(Prediction from categories)

---
# Data

```r
head(sample_n(fish,10),10) # inspect the data (detailed description in the GitHub repo)
```

```
##         species mass_g length_cm
## 1         Perch 1000.0      39.8
## 2  Common Bream 1000.0      33.5
## 3         Smelt   19.7      13.2
## 4         Smelt   10.0      11.3
## 5  Common Bream  975.0      37.4
## 6          Pike  567.0      43.2
## 7         Smelt    6.7       9.3
## 8  Silver Bream  300.0      24.0
## 9         Roach  150.0      20.4
## 10 Silver Bream   60.0      14.3
```

<br> <br>

- each row refers to one fish
- the sample consists of 71 fish
- the sample contains seven fish species

---
# Visualize the data

.left-code[
```r
ggplot(fish, aes(mass_g)) +
  geom_histogram(bins = 9) +
  facet_wrap(vars(species)) +
  ggtitle("Histogram prema vrstama riba")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label3-out-1.png" style="display: block; margin: auto;" />
]

---
# Descriptive statistics

<br> <br>

```r
fish %>%
  group_by(species) %>%
  summarize(mean_mass_g = round(mean(mass_g),2))
```

```
## # A
tibble: 7 x 2
##   species      mean_mass_g
## * <fct>              <dbl>
## 1 Common Bream       802.
## 2 Whitefish          800
## 3 Roach              165.
## 4 Silver Bream       165.
## 5 Smelt               11.2
## 6 Pike               864.
## 7 Perch              581.
```

---
# Regression model

<br>

```r
lm(mass_g ~ species, data = fish)
```

```
##
## Call:
## lm(formula = mass_g ~ species, data = fish)
##
## Coefficients:
##         (Intercept)     speciesWhitefish         speciesRoach
##              802.50                -2.50              -637.88
## speciesSilver Bream         speciesSmelt          speciesPike
##             -637.07              -791.32                61.17
##        speciesPerch
##             -221.94
```

---
# Regression model without an intercept

<br>

```r
lm(mass_g ~ species + 0, data = fish)
```

```
##
## Call:
## lm(formula = mass_g ~ species + 0, data = fish)
##
## Coefficients:
## speciesCommon Bream     speciesWhitefish         speciesRoach
##              802.50               800.00               164.62
## speciesSilver Bream         speciesSmelt          speciesPike
##              165.43                11.18               863.67
##        speciesPerch
##              580.56
```

---
class: inverse, center, middle
name: pred

# PREDICTION AND MODEL ELEMENTS
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(After estimation...)

---
# The fish data set

<br>

```r
library(stringr) # provides str_detect()
# Keep only the "Bream" species
bream <- fish %>% filter(str_detect(species,"Bream"))
head(bream,12)
```

```
##         species mass_g length_cm
## 1  Common Bream    600      29.4
## 2  Common Bream    700      30.4
## 3  Common Bream    575      31.3
## 4  Common Bream    725      31.8
## 5  Common Bream   1000      33.5
## 6  Common Bream    920      35.0
## 7  Common Bream    925      36.2
## 8  Common Bream    975      37.4
## 9  Silver Bream     60      14.3
## 10 Silver Bream     90      16.3
## 11 Silver Bream    120      17.5
## 12 Silver Bream    170      19.0
```

---
# Visualize weight vs.
length

<br> <br>

.left-code[
```r
ggplot(bream, aes(length_cm, mass_g)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  ggtitle("Odnos duljine i težine")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label4-out-1.png" style="display: block; margin: auto;" />
]

---
# Fit the regression model

<br> <br>

```r
tezina_vs_visina <- lm(mass_g ~ length_cm, data = bream)
tezina_vs_visina
```

```
##
## Call:
## lm(formula = mass_g ~ length_cm, data = bream)
##
## Coefficients:
## (Intercept)    length_cm
##     -669.92        44.19
```

---
# Prediction

<br>

> If we set the independent variables to given values, what will the value of the dependent variable be?

```r
nezavisne_vars <- tibble(length_cm = 20:40)
```

```r
predict(tezina_vs_visina, nezavisne_vars)
```

```
##         1         2         3         4         5         6         7         8
##  213.8509  258.0393  302.2278  346.4162  390.6046  434.7931  478.9815  523.1700
##         9        10        11        12        13        14        15        16
##  567.3584  611.5468  655.7353  699.9237  744.1122  788.3006  832.4890  876.6775
##        17        18        19        20        21
##  920.8659  965.0544 1009.2428 1053.4312 1097.6197
```

---
# Predictions in a `data.frame` object

<br>

```r
prediction_data <- nezavisne_vars %>%
  mutate(mass_g = predict(tezina_vs_visina, nezavisne_vars))
prediction_data
```

```
## # A tibble: 21 x 2
##    length_cm mass_g
##        <int>  <dbl>
##  1        20   214.
##  2        21   258.
##  3        22   302.
##  4        23   346.
##  5        24   391.
##  6        25   435.
##  7        26   479.
##  8        27   523.
##  9        28   567.
## 10        29   612.
## # ...
with 11 more rows
```

---
# Visualize the predictions

<br> <br>

.left-code[
```r
ggplot(bream, aes(length_cm, mass_g)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_point(
    data = prediction_data,
    color = "blue") +
  ggtitle(" Odnos duljine i težine")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label5-out-1.png" style="display: block; margin: auto;" />
]

---
# Extrapolation

<br>

- *Extrapolation* means predicting outside the range of the observed data

<br>

```r
mali_bream <- tibble(length_cm = 10)
mali_bream %>%
  mutate(mass_g = predict(tezina_vs_visina, mali_bream))
```

```
## # A tibble: 1 x 2
##   length_cm mass_g
##       <dbl>  <dbl>
## 1        10  -228.
```

---
class: inverse, center, middle

# Model elements
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(What is inside a regression object?)

---
# Regression coefficients

<br> <br>

```r
tezina_vs_visina <- lm(mass_g ~ length_cm, data = bream)
tezina_vs_visina
```

```
##
## Call:
## lm(formula = mass_g ~ length_cm, data = bream)
##
## Coefficients:
## (Intercept)    length_cm
##     -669.92        44.19
```

```r
coefficients(tezina_vs_visina)
```

```
## (Intercept)   length_cm
##  -669.91790    44.18844
```

---
# *Fitted* values

- these are the predictions on the original data

```r
fitted(tezina_vs_visina)
```

```
##         1         2         3         4         5         6         7         8
## 629.22222 673.41066 713.18026 735.27447 810.39482 876.67748 929.70361 982.72974
##         9        10        11        12        13        14        15
## -38.02322  50.35366 103.37979 169.66245 205.01320 346.41621 390.60465
```

```r
# equivalent to:
explanatory_data <- bream %>% select(length_cm)
predict(tezina_vs_visina, explanatory_data)
```

```
##         1         2         3         4         5         6         7         8
## 629.22222 673.41066 713.18026 735.27447 810.39482 876.67748 929.70361 982.72974
##         9        10        11        12        13        14        15
## -38.02322  50.35366 103.37979 169.66245 205.01320 346.41621 390.60465
```

---
# Residuals

- the difference between the observed values of the dependent variable and the fitted values

```r
residuals(tezina_vs_visina)
```

```
##
           1            2            3            4            5            6
##  -29.2222201   26.5893405 -138.1802550  -10.2744748  189.6051782   43.3225190
##            7            8            9           10           11           12
##   -4.7036084   -7.7297357   98.0232157   39.6463368   16.6202095    0.3375503
##           13           14           15
##  -60.0132013  -73.4162076  -90.6046470
```

```r
# equivalent to:
bream$mass_g - fitted(tezina_vs_visina)
```

```
##            1            2            3            4            5            6
##  -29.2222201   26.5893405 -138.1802550  -10.2744748  189.6051782   43.3225190
##            7            8            9           10           11           12
##   -4.7036084   -7.7297357   98.0232157   39.6463368   16.6202095    0.3375503
##           13           14           15
##  -60.0132013  -73.4162076  -90.6046470
```

---
# Full model summary

```r
summary(tezina_vs_visina)
```

```
##
## Call:
## lm(formula = mass_g ~ length_cm, data = bream)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -138.180  -44.618   -4.704   33.118  189.605
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -669.918     77.182   -8.68 9.06e-07 ***
## length_cm     44.188      2.791   15.83 7.08e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 81.96 on 13 degrees of freedom
## Multiple R-squared: 0.9507, Adjusted R-squared: 0.9469
## F-statistic: 250.6 on 1 and 13 DF, p-value: 7.081e-10
```

---
# Additional R skills

<br> <br>

```r
library(broom) # see the documentation at https://opr.princeton.edu/workshops/Downloads/2016Jan_BroomRobinson.pdf
tidy(tezina_vs_visina) # turn the estimates into a tibble (data frame)
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -670.      77.2      -8.68 9.06e- 7
## 2 length_cm       44.2      2.79     15.8  7.08e-10
```

---
# Additional R skills

```r
augment(tezina_vs_visina)
```

```
## # A tibble: 15 x 8
##    mass_g length_cm .fitted   .resid   .hat .sigma    .cooksd .std.resid
##     <dbl>     <dbl>   <dbl>    <dbl>  <dbl>  <dbl>      <dbl>      <dbl>
##  1    600      29.4   629.   -29.2   0.0758   84.9 0.00564      -0.371
##  2    700      30.4   673.    26.6   0.0835   84.9 0.00523       0.339
##  3    575      31.3   713.  -138.    0.0924   74.3 0.159        -1.77
##  4    725      31.8   735.   -10.3   0.0981   85.2 0.000948     -0.132
##  5   1000      33.5   810.   190.
0.122 62.2 0.423 2.47
##  6    920      35     877.    43.3   0.149    84.2 0.0286        0.573
##  7    925      36.2   930.    -4.70  0.174    85.3 0.000419     -0.0631
##  8    975      37.4   983.    -7.73  0.202    85.3 0.00141      -0.106
##  9     60      14.3   -38.0   98.0   0.242    78.9 0.301         1.37
## 10     90      16.3    50.4   39.6   0.190    84.4 0.0338        0.537
## 11    120      17.5   103.    16.6   0.163    85.1 0.00477       0.222
## 12    170      19     170.     0.338 0.134    85.3 0.00000151    0.00442
## 13    145      19.8   205.   -60.0   0.120    83.3 0.0416       -0.781
## 14    273      23     346.   -73.4   0.0816   82.4 0.0388       -0.935
## 15    300      24     391.   -90.6   0.0745   80.9 0.0531       -1.15
```

---
# Additional R skills

<br> <br>

```r
glance(tezina_vs_visina)
```

```
## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.951         0.947  82.0      251. 7.08e-10     1  -86.3  179.  181.
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
```

---
class: inverse, center, middle

# Nonlinearities
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(The world is not linear!)

---
# The *Perch* species

```r
perch <- fish %>% filter(species == "Perch")
head(perch,12)
```

```
##    species mass_g length_cm
## 1    Perch    125      19.0
## 2    Perch    130      19.3
## 3    Perch    120      20.0
## 4    Perch    110      20.0
## 5    Perch    130      20.5
## 6    Perch    150      20.5
## 7    Perch    180      23.0
## 8    Perch    300      25.2
## 9    Perch    260      25.4
## 10   Perch    250      25.4
## 11   Perch    300      26.9
## 12   Perch    320      27.8
```

---
# Nonlinearity

```r
ggplot(perch, aes(length_cm, mass_g)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  ggtitle("Odnos duljine i težine")
```

<img src="07_REG_files/figure-html/unnamed-chunk-46-1.png" style="display: block; margin: auto;" />

---
# Bream vs. perch species

<br> <br>

<img src="../Foto/reg8.png" width="600px" style="display: block; margin: auto;" />

---
# Weight vs.
length **cubed (^3)**

<br> <br>

.left-code[
```r
ggplot(perch, aes(length_cm ^ 3, mass_g)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  ggtitle("Odnos duljine i težine")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label6-out-1.png" style="display: block; margin: auto;" />
]

---
# Regression model

<br> <br>

```r
mdl_perch <- lm(mass_g ~ I(length_cm ^ 3), data = perch)
mdl_perch
```

```
##
## Call:
## lm(formula = mass_g ~ I(length_cm^3), data = perch)
##
## Coefficients:
##    (Intercept)  I(length_cm^3)
##       10.80230         0.01677
```

---
# Prediction

<br> <br>

```r
explanatory_data <- tibble(length_cm = seq(10, 40, 5))
prediction_data <- explanatory_data %>%
  mutate(mass_g = predict(mdl_perch, explanatory_data))
head(prediction_data)
```

```
## # A tibble: 6 x 2
##   length_cm mass_g
##       <dbl>  <dbl>
## 1        10   27.6
## 2        15   67.4
## 3        20  145.
## 4        25  273.
## 5        30  464.
## 6        35  730.
```

---
# Visualize (^3)

.left-code[
```r
ggplot(perch, aes(length_cm ^ 3, mass_g)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_point(data = prediction_data, color = "blue") +
  ggtitle("Odnos duljine i težine")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label7-out-1.png" style="display: block; margin: auto;" />
]

---
# Visualize

.left-code[
```r
ggplot(perch, aes(length_cm, mass_g)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_point(data = prediction_data, color = "blue") +
  ggtitle("Odnos duljine i težine")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label8-out-1.png" style="display: block; margin: auto;" />
]

---
class: inverse, center, middle
name: qual

# QUALITY OF THE FIT
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(How good is the model?)
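As a bridge from the nonlinear example to the question of fit quality, the effect of the cubic transformation can be illustrated on simulated data. This is only a sketch under stated assumptions: the `fish` data are not reproduced here, so `sim_fish`, its coefficients (borrowed from the rounded `mdl_perch` output) and the noise level are hypothetical.

```r
set.seed(7)  # arbitrary seed so the simulation is reproducible

# Hypothetical data: mass grows with the cube of length, plus noise
sim_fish <- data.frame(length_cm = seq(10, 40, by = 0.5))
sim_fish$mass_g <- 10.8 + 0.0168 * sim_fish$length_cm^3 +
  rnorm(nrow(sim_fish), mean = 0, sd = 20)

# A straight line in length vs. a straight line in length cubed
fit_linear <- lm(mass_g ~ length_cm, data = sim_fish)
fit_cubic  <- lm(mass_g ~ I(length_cm^3), data = sim_fish)

r2_linear <- summary(fit_linear)$r.squared
r2_cubic  <- summary(fit_cubic)$r.squared
print(c(linear = r2_linear, cubic = r2_cubic))
```

Under these assumptions the cubic specification should reach the higher R-squared, which is the same pattern the perch plots suggest.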
---
# Bream vs perch model

<img src="07_REG_files/figure-html/unnamed-chunk-50-1.png" style="display: block; margin: auto;" />

---
# Coefficient of determination

<br>

> The proportion of the variance in the dependent variable that is explained by the independent variable.

- `1` indicates a perfect fit
- `0` indicates the worst possible fit

.tiny[
```r
mdl_bream <- lm(mass_g ~ length_cm, data = bream)
summary(mdl_bream) # See R-squared
```

```
##
## Call:
## lm(formula = mass_g ~ length_cm, data = bream)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -138.180  -44.618   -4.704   33.118  189.605
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -669.918     77.182   -8.68 9.06e-07 ***
## length_cm     44.188      2.791   15.83 7.08e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 81.96 on 13 degrees of freedom
## Multiple R-squared: 0.9507, Adjusted R-squared: 0.9469
## F-statistic: 250.6 on 1 and 13 DF, p-value: 7.081e-10
```
]

---
# An alternative way

```r
# Inspect the model results
mdl_bream %>% glance()
```

```
## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.951         0.947  82.0      251. 7.08e-10     1  -86.3  179.  181.
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
```

```r
# Extract the coefficient of determination
mdl_bream %>% glance() %>% pull(r.squared)
```

```
## [1] 0.9506922
```

```r
# With a single predictor this is simply the squared correlation coefficient :-)
bream %>% summarize(coeff_determination = cor(length_cm, mass_g) ^ 2)
```

```
##   coeff_determination
## 1           0.9506922
```

---
# Residual standard error (RSE)

> The typical difference between a prediction and the observed value of the dependent variable.
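Both quality measures introduced in this section can be computed directly from the residuals. The sketch below is illustrative only: `toy` is a hypothetical data set built from the eight Common Bream rows printed by `head(bream, 12)`, so its numbers differ from those of `mdl_bream`, which uses all 15 bream.

```r
# Hypothetical subset: the eight Common Bream rows shown in head(bream, 12)
toy <- data.frame(
  length_cm = c(29.4, 30.4, 31.3, 31.8, 33.5, 35.0, 36.2, 37.4),
  mass_g    = c(600, 700, 575, 725, 1000, 920, 925, 975)
)
fit_toy <- lm(mass_g ~ length_cm, data = toy)

rss <- sum(residuals(fit_toy)^2)               # residual sum of squares
tss <- sum((toy$mass_g - mean(toy$mass_g))^2)  # total sum of squares

r2_manual  <- 1 - rss / tss                    # share of variance explained
rse_manual <- sqrt(rss / df.residual(fit_toy)) # divide by n minus number of coefficients

print(c(r2 = r2_manual, rse = rse_manual))
```

The same values should be returned by `summary(fit_toy)$r.squared` and `sigma(fit_toy)`, so the manual route is mainly useful for seeing what the reported statistics measure.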
```r
# See the residual standard error
summary(mdl_bream)
```

```
##
## Call:
## lm(formula = mass_g ~ length_cm, data = bream)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -138.180  -44.618   -4.704   33.118  189.605
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -669.918     77.182   -8.68 9.06e-07 ***
## length_cm     44.188      2.791   15.83 7.08e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 81.96 on 13 degrees of freedom
## Multiple R-squared: 0.9507, Adjusted R-squared: 0.9469
## F-statistic: 250.6 on 1 and 13 DF, p-value: 7.081e-10
```

---
# Residual standard error (RSE)

<br> <br>

```r
mdl_bream %>% glance()
```

```
## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.951         0.947  82.0      251. 7.08e-10     1  -86.3  179.  181.
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
```

```r
# Extract the RSE
mdl_bream %>% glance() %>% pull(sigma)
```

```
## [1] 81.95992
```

---
# Computing the RSE: squared residuals

```r
bream %>% mutate(residuals_sq = residuals(mdl_bream) ^ 2)
```

```
##         species mass_g length_cm residuals_sq
## 1  Common Bream    600      29.4 8.539381e+02
## 2  Common Bream    700      30.4 7.069930e+02
## 3  Common Bream    575      31.3 1.909378e+04
## 4  Common Bream    725      31.8 1.055648e+02
## 5  Common Bream   1000      33.5 3.595012e+04
## 6  Common Bream    920      35.0 1.876841e+03
## 7  Common Bream    925      36.2 2.212393e+01
## 8  Common Bream    975      37.4 5.974881e+01
## 9  Silver Bream     60      14.3 9.608551e+03
## 10 Silver Bream     90      16.3 1.571832e+03
## 11 Silver Bream    120      17.5 2.762314e+02
## 12 Silver Bream    170      19.0 1.139402e-01
## 13 Silver Bream    145      19.8 3.601584e+03
## 14 Silver Bream    273      23.0 5.389940e+03
## 15 Silver Bream    300      24.0 8.209202e+03
```

---
# Computing the RSE: sum

<br> <br>

```r
bream %>%
  mutate(residuals_sq = residuals(mdl_bream) ^ 2) %>%
  summarize(resid_sum_of_sq = sum(residuals_sq))
```

```
##
  resid_sum_of_sq
## 1        87326.57
```

---
# Computing the RSE: degrees of freedom

<br> <br>

> The degrees of freedom are the number of observations minus the number of model coefficients.

```r
bream %>%
  mutate(residuals_sq = residuals(mdl_bream) ^ 2) %>%
  summarize(
    resid_sum_of_sq = sum(residuals_sq),
    deg_freedom = n() - 2)
```

```
##   resid_sum_of_sq deg_freedom
## 1        87326.57          13
```

---
# Computing the RSE: square root of the ratio

```r
bream %>%
  mutate(residuals_sq = residuals(mdl_bream) ^ 2) %>%
  summarize(resid_sum_of_sq = sum(residuals_sq),
            deg_freedom = n() - 2,
            rse = sqrt(resid_sum_of_sq / deg_freedom))
```

```
##   resid_sum_of_sq deg_freedom      rse
## 1        87326.57          13 81.95992
```

---
# Interpreting the RSE

<br> <br> <br>

- `mdl_bream` has an RSE of about `82`

<br>

> This can be read as follows: the predicted mass of a bream typically differs from its observed mass (in the data) by about 82 g.

---
# RMSE

<br>

###### Residual standard error (RSE)

.pull-left[
```r
bream %>%
  mutate(residuals_sq = residuals(mdl_bream) ^ 2) %>%
  summarize(resid_sum_of_sq = sum(residuals_sq),
            deg_freedom = n() - 2,
            rse = sqrt(resid_sum_of_sq / deg_freedom))
```

```
##   resid_sum_of_sq deg_freedom      rse
## 1        87326.57          13 81.95992
```
]

###### Root mean squared error (RMSE)

.pull-right[
```r
bream %>%
  mutate(residuals_sq = residuals(mdl_bream) ^ 2) %>%
  summarize(
    resid_sum_of_sq = sum(residuals_sq),
    n_obs = n(),
    rmse = sqrt(resid_sum_of_sq / n_obs))
```

```
##   resid_sum_of_sq n_obs     rmse
## 1        87326.57    15 76.30053
```
]

---
# Desirable properties of residuals

<br> <br>

- Normally distributed

<br> <br>

- The mean of the residuals is 0

---
# A good model: Bream

```r
mdl_bream <- lm(mass_g ~ length_cm, data = bream)
```

<img src="07_REG_files/figure-html/unnamed-chunk-65-1.png" style="display: block; margin: auto;" />

---
# A poor model: Perch

```r
mdl_perch <- lm(mass_g ~ length_cm, data = perch)
```

<img src="07_REG_files/figure-html/unnamed-chunk-67-1.png" style="display: block; margin: auto;" />

---
# Residuals vs.
fitted values

###### Bream

<br>

.pull-left[
<img src="07_REG_files/figure-html/unnamed-chunk-68-1.png" style="display: block; margin: auto;" />
]

###### Perch

.pull-right[
<img src="07_REG_files/figure-html/unnamed-chunk-69-1.png" style="display: block; margin: auto;" />
]

---
# QQ

###### Bream

.pull-left[
<img src="07_REG_files/figure-html/unnamed-chunk-70-1.png" style="display: block; margin: auto;" />
]

###### Perch

.pull-right[
<img src="07_REG_files/figure-html/unnamed-chunk-71-1.png" style="display: block; margin: auto;" />
]

---
# Scale-location

###### Bream

.pull-left[
<img src="07_REG_files/figure-html/unnamed-chunk-72-1.png" style="display: block; margin: auto;" />
]

###### Perch

.pull-right[
<img src="07_REG_files/figure-html/unnamed-chunk-73-1.png" style="display: block; margin: auto;" />
]

---
# The autoplot() function

.left-code[
```r
library(ggfortify) # provides autoplot() for lm objects
autoplot(
  mdl_perch,
  which = 1:3,
  nrow = 3,
  ncol = 1
)
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label9-out-1.png" style="display: block; margin: auto;" />
]

---
class: inverse, center, middle

# Extreme values
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(*Outliers*)

---
# The Roach species

<br> <br>

```r
roach <- fish %>% filter(species == "Roach")
roach %>% head(8)
```

```
##   species mass_g length_cm
## 1   Roach    110      19.1
## 2   Roach    120      19.4
## 3   Roach    150      20.4
## 4   Roach    145      20.5
## 5   Roach    160      20.5
## 6   Roach    160      21.1
## 7   Roach    200      22.1
## 8   Roach    272      25.0
```

---
# Extreme values

<br> <br>

.left-code[
```r
ggplot(roach, aes(length_cm, mass_g)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  ggtitle("Odnos duljine i težine")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label10-out-1.png" style="display: block; margin: auto;" />
]

---
# Extreme values

<br> <br>

.left-code[
```r
roach %>%
  mutate(has_extreme_length = length_cm < 15 | length_cm > 26) %>%
  ggplot(aes(length_cm, mass_g)) +
  geom_point(aes(color =
    has_extreme_length)) +
  geom_smooth(method = "lm", se = FALSE) +
  ggtitle("Odnos duljine i težine")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label11-out-1.png" style="display: block; margin: auto;" />
]

---
# Extreme values

<br> <br>

.left-code[
```r
roach %>%
  mutate(has_extreme_length = length_cm < 15 | length_cm > 26,
         has_extreme_mass = mass_g < 1) %>%
  ggplot(aes(length_cm, mass_g)) +
  geom_point(aes(color = has_extreme_length,
                 shape = has_extreme_mass)) +
  geom_smooth(method = "lm", se = FALSE) +
  ggtitle("Odnos duljine i težine")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label12-out-1.png" style="display: block; margin: auto;" />
]

---
# Leverage

<br> <br>

> "Leverage" measures how extreme the values of the independent variables are for each observation.

```r
mdl_roach <- lm(mass_g ~ length_cm, data = roach)
hatvalues(mdl_roach)
```

```
##         1         2         3         4         5         6         7         8
## 0.2758390 0.2322285 0.1404712 0.1358317 0.1358317 0.1253157 0.1737718 0.7807103
```

---
# The .hat column

<br>

```r
augment(mdl_roach)
```

```
## # A tibble: 8 x 8
##   mass_g length_cm .fitted .resid  .hat .sigma .cooksd .std.resid
##    <dbl>     <dbl>   <dbl>  <dbl> <dbl>  <dbl>   <dbl>      <dbl>
## 1    110      19.1    112. -2.25  0.276   6.43 0.0375     -0.444
## 2    120      19.4    120. -0.468 0.232   6.53 0.00121    -0.0894
## 3    150      20.4    148.  2.15  0.140   6.45 0.0123      0.388
## 4    145      20.5    151. -5.59  0.136   5.96 0.0798     -1.01
## 5    160      20.5    151.  9.41  0.136   4.71 0.226       1.70
## 6    160      21.1    167. -7.02  0.125   5.61 0.113      -1.26
## 7    200      22.1    194.  5.59  0.174   5.93 0.112       1.03
## 8    272      25      274.
-1.82 0.781 6.30 0.756 -0.651
```

---
# High leverage (*Roach species*)

<br>

```r
mdl_roach %>%
  augment() %>%
  select(mass_g, length_cm, leverage = .hat) %>%
  arrange(desc(leverage)) %>%
  head()
```

```
## # A tibble: 6 x 3
##   mass_g length_cm leverage
##    <dbl>     <dbl>    <dbl>
## 1    272      25      0.781
## 2    110      19.1    0.276
## 3    120      19.4    0.232
## 4    200      22.1    0.174
## 5    150      20.4    0.140
## 6    145      20.5    0.136
```

---
# Influence

<br> <br>

> "Influence" measures how much the model would change if the observation were excluded from the sample.

```r
cooks.distance(mdl_roach)
```

```
##           1           2           3           4           5           6
## 0.037476486 0.001210061 0.012320303 0.079838727 0.226176142 0.113403425
##           7           8
## 0.111893500 0.755523739
```

---
# The .cooksd column

<br> <br>

```r
augment(mdl_roach)
```

```
## # A tibble: 8 x 8
##   mass_g length_cm .fitted .resid  .hat .sigma .cooksd .std.resid
##    <dbl>     <dbl>   <dbl>  <dbl> <dbl>  <dbl>   <dbl>      <dbl>
## 1    110      19.1    112. -2.25  0.276   6.43 0.0375     -0.444
## 2    120      19.4    120. -0.468 0.232   6.53 0.00121    -0.0894
## 3    150      20.4    148.  2.15  0.140   6.45 0.0123      0.388
## 4    145      20.5    151. -5.59  0.136   5.96 0.0798     -1.01
## 5    160      20.5    151.  9.41  0.136   4.71 0.226       1.70
## 6    160      21.1    167. -7.02  0.125   5.61 0.113      -1.26
## 7    200      22.1    194.  5.59  0.174   5.93 0.112       1.03
## 8    272      25      274.
-1.82 0.781 6.30 0.756 -0.651
```

---
# Observations with the highest *influence*

```r
mdl_roach %>%
  augment() %>%
  select(mass_g, length_cm, cooks_dist = .cooksd) %>%
  arrange(desc(cooks_dist)) %>%
  head()
```

```
## # A tibble: 6 x 3
##   mass_g length_cm cooks_dist
##    <dbl>     <dbl>      <dbl>
## 1    272      25       0.756
## 2    160      20.5     0.226
## 3    160      21.1     0.113
## 4    200      22.1     0.112
## 5    145      20.5     0.0798
## 6    110      19.1     0.0375
```

---
# Removing observations

```r
# Remove observations with high leverage
# (note: among the 8 roach printed earlier, the highest leverage is at
#  length_cm = 25; the value 12.9 does not occur in that output)
roach_not_short <- roach %>% filter(length_cm != 12.9)
```

.left-code[
```r
ggplot(roach, aes(length_cm, mass_g)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(method = "lm", se = FALSE,
              data = roach_not_short, color = "red") +
  ggtitle("Odnos duljine i težine")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label13-out-1.png" style="display: block; margin: auto;" />
]

---
# Inspect the residuals

.left-code[
```r
autoplot(
  mdl_roach,
  which = 4:6,
  nrow = 3,
  ncol = 1)
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label14-out-1.png" style="display: block; margin: auto;" />
]

---
class: inverse, center, middle
name: multi

# MULTIPLE LINEAR REGRESSION
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(More than one independent variable)

---
# Inspect the data

```r
head(sample_n(fish,10),12)
```

```
##         species mass_g length_cm height_cm
## 1  Common Bream 1000.0      33.5      44.5
## 2         Perch 1000.0      39.8      26.4
## 3         Perch  300.0      25.2      29.0
## 4  Common Bream  725.0      31.8      40.0
## 5         Smelt    6.7       9.3      16.1
## 6         Smelt   12.2      12.1      16.5
## 7         Roach  110.0      19.1      26.7
## 8         Perch  260.0      25.4      24.8
## 9         Perch 1000.0      41.1      26.8
## 10        Perch 1015.0      37.0      29.2
```

---
# 3D visualization

.left-code[
```r
library(plot3D)
scatter3D(fish$length_cm, fish$height_cm, fish$mass_g)
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label15-out-1.png" style="display: block; margin: auto;" />
]

---
# 2D visualization

.left-code[
```r
ggplot(fish, aes(length_cm, height_cm,
                 color = mass_g)) +
  geom_point() +
  ggtitle("Relationship between length and weight")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label16-out-1.png" style="display: block; margin: auto;" />
]

---

# 2D visualization (*improved*)

.left-code[
```r
ggplot(fish, aes(length_cm, height_cm,
                 color = mass_g)) +
  geom_point() +
  scale_color_viridis_c(option = "inferno") +
  ggtitle("Relationship between length and weight")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label17-out-1.png" style="display: block; margin: auto;" />
]

---

# Regression model (*2 variables*)

```r
mdl_mass_vs_both <- lm(mass_g ~ length_cm + height_cm, data = fish)
mdl_mass_vs_both
```

```
##
## Call:
## lm(formula = mass_g ~ length_cm + height_cm, data = fish)
##
## Coefficients:
## (Intercept)    length_cm    height_cm
##     -558.59        33.64         4.18
```

---

# Prediction

.pull-left[
```r
explanatory_data <- expand_grid(
  length_cm = seq(5, 60, 5),
  height_cm = seq(2, 20, 2))
head(explanatory_data, 4)
```

```
## # A tibble: 4 x 2
##   length_cm height_cm
##       <dbl>     <dbl>
## 1         5         2
## 2         5         4
## 3         5         6
## 4         5         8
```
]

.pull-right[
```r
prediction_data <- explanatory_data %>%
  mutate(mass_g = predict(mdl_mass_vs_both, explanatory_data))
head(prediction_data, 4)
```

```
## # A tibble: 4 x 3
##   length_cm height_cm mass_g
##       <dbl>     <dbl>  <dbl>
## 1         5         2  -382.
## 2         5         4  -374.
## 3         5         6  -365.
## 4         5         8  -357.
```
]

---

# Visualize the prediction

.left-code[
```r
ggplot(fish, aes(length_cm, height_cm,
                 color = mass_g)) +
  geom_point() +
  scale_color_viridis_c(option = "inferno") +
  geom_point(data = prediction_data,
             shape = 15, size = 3) +
  ggtitle("Relationship between length and weight")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label18-out-1.png" style="display: block; margin: auto;" />
]

---

# Include the interaction between variables

```r
mdl_mass_vs_both_inter <- lm(mass_g ~ length_cm * height_cm, data = fish)
mdl_mass_vs_both_inter
```

```
##
## Call:
## lm(formula = mass_g ~ length_cm * height_cm, data = fish)
##
## Coefficients:
##         (Intercept)            length_cm            height_cm
##            -54.0334              14.6141             -20.0871
## length_cm:height_cm
##              0.8967
```

---

# Prediction

.pull-left[
```r
explanatory_data <- expand_grid(
  length_cm = seq(5, 60, 5),
  height_cm = seq(2, 20, 2))
head(explanatory_data, 4)
```

```
## # A tibble: 4 x 2
##   length_cm height_cm
##       <dbl>     <dbl>
## 1         5         2
## 2         5         4
## 3         5         6
## 4         5         8
```
]

.pull-right[
```r
prediction_data <- explanatory_data %>%
  mutate(mass_g = predict(mdl_mass_vs_both_inter, explanatory_data))
head(prediction_data, 4)
```

```
## # A tibble: 4 x 3
##   length_cm height_cm mass_g
##       <dbl>     <dbl>  <dbl>
## 1         5         2  -12.2
## 2         5         4  -43.4
## 3         5         6  -74.6
## 4         5         8 -106.
```
]

---

# Visualize the prediction

.left-code[
```r
ggplot(fish, aes(length_cm, height_cm,
                 color = mass_g)) +
  geom_point() +
  scale_color_viridis_c(option = "inferno") +
  geom_point(data = prediction_data,
             shape = 15, size = 3) +
  ggtitle("Relationship between length and weight")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label19-out-1.png" style="display: block; margin: auto;" />
]

---

# More than two independent variables

.left-code[
```r
ggplot(fish, aes(length_cm, height_cm,
                 color = mass_g)) +
  geom_point() +
  scale_color_viridis_c(option = "inferno") +
  facet_wrap(vars(species)) +
  ggtitle("Relationship between length and weight")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label20-out-1.png" style="display: block; margin: auto;" />
]

---

# Interaction options

###### No interactions

```r
lm(mass_g ~ length_cm + height_cm + species + 0, data = fish)
```

###### Two-way interactions

```r
lm(mass_g ~ length_cm + height_cm + species +
     length_cm:height_cm + length_cm:species + height_cm:species + 0,
   data = fish)
```

###### Three-way interaction

```r
lm(mass_g ~ length_cm + height_cm + species +
     length_cm:height_cm + length_cm:species + height_cm:species +
     length_cm:height_cm:species + 0,
   data = fish)
```

---

# Interaction options (*continued*)

###### All interactions

```r
# One option: write out every term
lm(mass_g ~ length_cm + height_cm + species +
     length_cm:height_cm + length_cm:species + height_cm:species +
     length_cm:height_cm:species + 0,
   data = fish)
```

```r
# Another option: the * operator expands to the same terms
lm(mass_g ~ length_cm * height_cm * species + 0, data = fish)
```

---

# Interaction options (*continued*)

###### Two-way interactions

```r
lm(mass_g ~ length_cm + height_cm + species +
     length_cm:height_cm + length_cm:species + height_cm:species + 0,
   data = fish)
```

```r
# In a formula, ^ 2 means "all interactions up to second order"
lm(mass_g ~ (length_cm + height_cm + species) ^ 2 + 0, data = fish)
```

```r
# For an arithmetic square, the power must sit inside I()
lm(mass_g ~ I(length_cm ^ 2) + height_cm + species + 0, data = fish)
```

---

# Prediction

```r
# Fit the model
mdl_mass_vs_all <- lm(mass_g ~ length_cm
                      * height_cm * species + 0, data = fish)
# Build the data for prediction
explanatory_data <- expand_grid(length_cm = seq(5, 60, 6),
                                height_cm = seq(2, 20, 2),
                                species = unique(fish$species))
# Run the prediction
prediction_data <- explanatory_data %>%
  mutate(mass_g = predict(mdl_mass_vs_all, explanatory_data))
```

---

# Visualizing the prediction

.left-code[
```r
ggplot(fish, aes(length_cm, height_cm,
                 color = mass_g)) +
  geom_point() +
  scale_color_viridis_c(option = "inferno") +
  facet_wrap(vars(species)) +
  geom_point(
    data = prediction_data,
    size = 3, shape = 15) +
  ggtitle("Relationship between length and weight")
```
]

.right-plot[
<img src="07_REG_files/figure-html/plot-label21-out-1.png" style="display: block; margin: auto;" />
]

---

class: inverse, center, middle

# Thank you for your attention

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

(Next lecture: Survival analysis)
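
---

# Appendix: checking the predictions by hand

As a rough sanity check on the two prediction tables, the coefficients printed for `mdl_mass_vs_both` and `mdl_mass_vs_both_inter` can be plugged in by hand for the first grid point (`length_cm = 5`, `height_cm = 2`). The symbols $\hat{m}$ (predicted mass), $\ell$ (length) and $h$ (height) are notation assumed here for illustration; they do not appear in the slides.

$$\begin{aligned}
\hat{m}_{\text{additive}} &= -558.59 + 33.64\,\ell + 4.18\,h \\
&= -558.59 + 33.64 \cdot 5 + 4.18 \cdot 2 = -382.03 \\[6pt]
\hat{m}_{\text{interaction}} &= -54.0334 + 14.6141\,\ell - 20.0871\,h + 0.8967\,\ell h \\
&= -54.0334 + 73.0705 - 40.1742 + 8.9670 = -12.1701
\end{aligned}$$

Both values agree with the first rows of the prediction tables (`-382.` and `-12.2`), up to rounding of the printed coefficients. A negative mass is not physically meaningful; it most likely reflects that this grid point lies outside the range of fish in the data, so predictions of this kind should be read with caution.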