a broad class of statistical models that extend linear and generalized linear models to handle data where observations are measured within discrete groups such as field sites; years or other temporal blocks; individuals that are observed multiple times; genotypes; species; etc. They can be thought of (equivalently) as (1) accounting for the correlation among observations from the same group; (2) estimating the variability among groups, or (3) parsimoniously estimating the effects of groups. They are most useful when the experimental or observational design includes a large number of groups with varying numbers of observations per group.
g
(must be discrete!) and a varying term f
(f | g)
in most R MM packagesf
(e.g. slope) varies across groups defined by g
(e.g. species; f
= 1
→ intercept)f
for each group g
via shrinkage estimation/partial pooling (empirical Bayes, joint Bayesian prior, …)Some rules of thumb
cf. Crawley (2002), Gelman (2005)
lme4
; gamm4
to deal with spatial autocorrelationggplot2
, maybe some tidyversebroom.mixed
, DHARMa
, car
(using ::
notation as appropriate)Nested: sub-unit IDs only measured within a single larger unit. e.g.: Class1 in School1 independent of Class1 in School2: (1|School/Class)
Crossed: sub-unit IDs can be measured in multiple larger units. e.g. class = “history”, “geography”, “mathematics”: (1|School) + (1|Class)
Unique coding: removes ambiguity
Robert Long, Cross Validated
n
parameters (intercept + slope = 2)
n*(n+1)/2
parametersn
=5 → 15 for this examplen
independent effects (||
shortcut); only n
parameters
knitr::include_graphics("pix/gelman_hill_complexity.png")
Gelman and Hill (2006)
knitr::include_graphics("pix/uriarte_yackulic_complexity.png")
Uriarte and Yackulic (2009)
Load packages up front, note what they’re used for …
library(tidyverse); theme_set(theme_bw())
library(lme4)
library(gamm4)
## diagnostics
library(DHARMa)
library(car) ## influencePlot
## extraction/graphics
library(broom.mixed)
library(emmeans)
library(dotwhisker)
library(gridExtra)
## from GitHub: see utils.R
library(r2glmm) ## bbolker/r2glmm
library(remef) ## hohenstein/remef
source("utils.R")
source("gamm4_utils.R")
dd <- readRDS("data/ecoreg.rds")
Single-level model (biomes), intercept variation only. All pairwise interactions of main variables ((...)^2
), plus (log of) ecoregion area:
m1 <- lmer(mbirds_log ~ scale(log(area_km2)) + (Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc)^2 +
(1 | biome),
data = dd)
## "Feat" = "fire eaten"
## may get
## Warning message:
## Some predictor variables are on very different scales: consider rescaling
Best practice: check diagnostics as early as possible (before summary()
, coeff plots) to reduce snooping.
plot(m1, type = c("p", "smooth"))
## heteroscedasticity
plot(m1, sqrt(abs(resid(.))) ~ fitted(.), type = c("p", "smooth")) ## slight heteroscedasticity?
car::influencePlot(m1)
plot(DHARMa::simulateResiduals(m1)) ## we'll get back to this
## basic coefficient plot
dotwhisker::dwplot(m1, effects="fixed") + geom_vline(xintercept = 0, lty = 2)
## ordered coefficient plot (from utils.R)
dwplot_ordered(m1, effects = "fixed")
update()
is your friendm2 <- update(m1, . ~ . - (1|biome) + (1 + Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc | biome))
## compare plots
dwplot_ordered(list(intercept_only = m1, random_slopes = m2), effects = "fixed")
Now go to the (almost) maximal model; variation of main effects at all three geographic levels
## ~ 30 seconds
max_model <- lmer(mbirds_log ~ scale(log(area_km2)) + (Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc)^2 +
(Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc | biome) +
(Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc | flor_realms) +
(Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc | biome_FR),
data = dd,
## for speed/skip convergence warnings
control = lmerControl(calc.derivs = FALSE))
isSingular(max_model)
## [1] TRUE
lwr <- getME(max_model, "lower"); theta <- getME(max_model, "theta"); min(theta[lwr == 0])
## [1] 0
VarCorr(max_model)
## Groups Name Std.Dev. Corr
## biome_FR (Intercept) 0.108634
## Feat_log_sc 0.028956 -0.998
## Feat_cv_sc 0.058083 0.968 -0.975
## NPP_log_sc 0.112312 -0.011 -0.027 0.239
## NPP_cv_sc 0.036536 -0.987 0.991 -0.996 -0.152
## biome (Intercept) 0.126970
## Feat_log_sc 0.011065 -1.000
## Feat_cv_sc 0.050601 0.936 -0.936
## NPP_log_sc 0.148002 -0.468 0.468 -0.749
## NPP_cv_sc 0.091323 -0.621 0.621 -0.856 0.983
## flor_realms (Intercept) 0.229352
## Feat_log_sc 0.130501 0.561
## Feat_cv_sc 0.072461 0.396 0.440
## NPP_log_sc 0.119123 -0.358 -0.964 -0.512
## NPP_cv_sc 0.031699 0.249 -0.660 -0.093 0.791
## Residual 0.193239
allFit()
, diagnose/evaluate differences in effects of interestfor
loop over tablereformulate()
is usefulall_vars <- "1 + Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc"
v1 <- expand.grid(c("1 | ", paste(all_vars, "|"), paste(all_vars, "||")),
c("biome", "flor_realms", "biome_FR"))
v2 <- sprintf("(%s)", apply(v1, 1, paste, collapse = " "))
## use cross_df instead of expand.grid, want chars
v3 <- cross_df(list(biome = v2[1:3], FR = v2[4:6], biome_FR = v2[7:9]))
v3[1,]
## # A tibble: 1 × 3
## biome FR biome_FR
## <chr> <chr> <chr>
## 1 (1 | biome) (1 | flor_realms) (1 | biome_FR)
model_list <- list()
p1 <- proc.time()
for (i in 1:nrow(v3)) {
cat(i, unlist(v3[i,]), "\n")
form <- reformulate(
c(sprintf("(%s)^2", all_vars), ## main effects and 2-way interax
"log(area_km2)", ## plus ecoregion area
unlist(v3[i,])), ## plus specified REs
response = "mbirds_log")
model_list[[i]] <- lmer(form, data = dd) ## fit model
}
saveRDS(model_list, file = "data/model_list.rds")
proc.time() - p1 ## about 10 minutes
model_list <- readRDS("data/model_list.rds")
aic_vec <- sapply(model_list, AIC)
is_sing <- sapply(model_list, isSingular)
conv_warn <- sapply(model_list, has_warning)
tibble(model=1:27, aic_vec, is_sing, conv_warn) %>% arrange(aic_vec)
## # A tibble: 27 × 4
## model aic_vec is_sing conv_warn
## <int> <dbl> <lgl> <lgl>
## 1 19 35.7 FALSE FALSE
## 2 16 38.6 TRUE FALSE
## 3 25 39.7 TRUE FALSE
## 4 10 40.9 TRUE FALSE
## 5 21 43.2 TRUE FALSE
## 6 27 46.2 TRUE FALSE
## 7 18 46.8 TRUE FALSE
## 8 12 48.4 TRUE FALSE
## 9 9 49.5 FALSE FALSE
## 10 13 51.2 FALSE TRUE
## # … with 17 more rows
best_index <- which(aic_vec == min(aic_vec) & !is_sing & !conv_warn)
best_model <- model_list[[best_index]]
best_model
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## mbirds_log ~ (1 + Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc)^2 +
## log(area_km2) + (1 | biome) + (1 | flor_realms) + ((1 | biome_FR) +
## (0 + Feat_log_sc | biome_FR) + (0 + Feat_cv_sc | biome_FR) +
## (0 + NPP_log_sc | biome_FR) + (0 + NPP_cv_sc | biome_FR))
## Data: dd
## REML criterion at convergence: -4.3321
## Random effects:
## Groups Name Std.Dev.
## biome_FR NPP_cv_sc 0.07758
## biome_FR.1 NPP_log_sc 0.13922
## biome_FR.2 Feat_cv_sc 0.08691
## biome_FR.3 Feat_log_sc 0.05442
## biome_FR.4 (Intercept) 0.13560
## biome (Intercept) 0.10908
## flor_realms (Intercept) 0.23103
## Residual 0.19400
## Number of obs: 620, groups: biome_FR, 55; biome, 14; flor_realms, 6
## Fixed Effects:
## (Intercept) Feat_log_sc Feat_cv_sc
## 5.714624 0.091608 -0.006491
## NPP_log_sc NPP_cv_sc log(area_km2)
## 0.260378 0.066486 -0.029917
## Feat_log_sc:Feat_cv_sc Feat_log_sc:NPP_log_sc Feat_log_sc:NPP_cv_sc
## -0.011307 -0.051931 -0.047096
## Feat_cv_sc:NPP_log_sc Feat_cv_sc:NPP_cv_sc NPP_log_sc:NPP_cv_sc
## 0.013720 -0.036622 0.006426
plot(best_model, type = c("p", "smooth"))
## heteroscedasticity
plot(best_model, sqrt(abs(resid(.))) ~ fitted(.), type = c("p", "smooth"))
car::influencePlot(best_model)
plot(sr <- simulateResiduals(best_model))
Computes residuals at population level (usually a good default but not necessarily appropriate here?)
DHARMa::plotResiduals(sr, dd$NPP_log_sc)
## https://www.stat.ubc.ca/~jenny/STAT545A/block16_colorsLattice.html
line_style <- list(plot.line = list(alpha=1, col = "red", lty = 1, lwd = 2))
pop_resids <- model.response(model.frame(best_model)) - predict(best_model, re.form = NA)
p1 <- lattice::xyplot(pop_resids ~ dd$NPP_log_sc, cex = 1, type = c("p", "smooth"),
main = "pop-level resids (à la DHARMa)",
par.settings = line_style)
p2 <- plot(best_model, resid(.) ~ NPP_log_sc, type = c("p", "smooth"),
main = "unit-level resids (à la lme4)",
par.settings = line_style)
gridExtra::grid.arrange(p1, p2, nrow = 1)
(lowest resid is Palearctic, lon=24, lat=
Easiest to explore spatial correlations graphically:
dd$res1 <- residuals(best_model)
ggplot(dd, aes(x, y, colour = res1, size = abs(res1))) +
geom_point() +
scale_colour_gradient2() +
scale_size(range=c(2,7))
gamm4
Quickest (not necessarily best) fix for spatial autocorr; bs="sos"
is a spherical smooth
gamm4_form <- update(formula(best_model), . ~ . + s(y, x, bs="sos"))
best_gamm4 <- gamm4(formula = nobars(gamm4_form),
random = as.formula(reOnly(gamm4_form)),
data = dd)
class(best_gamm4) <- c("gamm4", "list") ## so we can tidy etc. (using gamm4_utils.R)
dwplot_ordered(list(gamm4 = best_gamm4, lme4 = best_model), effects="fixed")
dotwhisker::dwplot()
as shown before, or broom.mixed::tidy()
+ ggplot
tweaking
rsqvals <- r2glmm::r2beta(best_model, method = "sgv", partial = TRUE)
## set (reverse)factor order (or forcats::fct_inorder())
rsqvals$Effect <- factor(rsqvals$Effect, levels = rev(rsqvals$Effect))
ggplot(rsqvals, aes(x=Rsq, xmin = lower.CL, xmax = upper.CL, y = Effect)) +
geom_pointrange()
emmeans
, effects
and ggeffects
package are useful for extracting predictionsre.form
)NPPvec <- with(dd, seq(min(NPP_log_sc), max(NPP_log_sc), length.out = 51))
pred <- emmeans::emmeans(best_model, ~NPP_log_sc, at = list(NPP_log_sc = NPPvec))
## NOTE: Results may be misleading due to involvement in interactions
gg0 <- ggplot(as.data.frame(pred),
aes(NPP_log_sc, emmean)) + geom_line() +
geom_ribbon(aes(ymin=lower.CL, ymax = upper.CL), alpha = 0.2, colour = NA)
gg1 <- gg0 + geom_point(data = dd, aes(y = mbirds_log, colour = biome))
print(gg1)
Can also back-transform, unscale, etc. (see e.g. Stack Overflow Q1, Stack Overflow Q2)
Compute partial residuals and display:
dd$remef <- remef::remef(best_model, fix = "NPP_log_sc")
grid.arrange(gg1 + ggtitle("full"),
gg0 + geom_point(data = dd, aes(y = remef, colour = biome)) +
scale_y_continuous(limits = range(dd$mbirds_log)) +
ggtitle("partial"),
nrow=1)
dotplot
, qqmath
methodsas.data.frame(ranef())
or broom.mixed::tidy(model, effects = "ran_vals")
re_plots <- lattice::dotplot(ranef(best_model))
print(re_plots$biome_FR)
print(grid.arrange(grobs = re_plots[2:3], nrow = 1))
glmer
glmmTMB
, brms
GLMMadaptive
: or go Bayesian)blme
, MCMCglmm
, rstanarm
, brms
…beyond intercept-only and diagonal structures
(1|g/f)
instead of (f|g)
for factor termscs()
in glmmTMB
, other possibilities in nlme::lme
, MCMCglmm
, brms
, …mgcv
: Markov random field, etc.brms
: can use all mgcv
smoothsINLA
: probably best for spatial/temporal, but a whole other worldglmmTMB
: simple space/time stuff with complex GLMMssystem.time(af <- allFit(model_list[[13]],
parallel = "multicore",
ncpus = 8))
saveRDS(af, "allfit.rds")
if (file.exists("allfit.rds")) {
af <- readRDS("allfit.rds")
summary(af)
}
Barr, Dale J., Roger Levy, Christoph Scheepers, and Harry J. Tily. 2013. “Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal.” Journal of Memory and Language 68 (3): 255–78. https://doi.org/10.1016/j.jml.2012.11.001.
Bates, Douglas, Reinhold Kliegl, Shravan Vasishth, and Harald Baayen. 2015. “Parsimonious Mixed Models.” arXiv:1506.04967 [Stat], June. http://arxiv.org/abs/1506.04967.
Bolker, Benjamin M. 2015. “Linear and Generalized Linear Mixed Models.” In Ecological Statistics: Contemporary Theory and Application, edited by Gordon A. Fox, Simoneta Negrete-Yankelevich, and Vinicio J. Sosa. Oxford University Press.
Bolker, Benjamin M., Mollie E. Brooks, Connie J. Clark, Shane W. Geange, John R. Poulsen, M. Henry H. Stevens, and Jada-Simone S. White. 2009. “Generalized Linear Mixed Models: A Practical Guide for Ecology and Evolution.” Trends in Ecology & Evolution 24: 127–35. https://doi.org/10.1016/j.tree.2008.10.008.
Crawley, Michael J. 2002. Statistical Computing: An Introduction to Data Analysis Using S-PLUS. John Wiley & Sons.
Gelman, Andrew. 2005. “Analysis of Variance: Why It Is More Important Than Ever.” Annals of Statistics 33 (1): 1–53. https://doi.org/doi:10.1214/009053604000001048.
Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge, England: Cambridge University Press. http://www.stat.columbia.edu/~gelman/arm/.
Matuschek, Hannes, Reinhold Kliegl, Shravan Vasishth, Harald Baayen, and Douglas Bates. 2017. “Balancing Type I Error and Power in Linear Mixed Models.” Journal of Memory and Language 94: 305–15. https://doi.org/10.1016/j.jml.2017.01.001.
McElreath, Richard. 2015. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Boca Raton: Chapman; Hall/CRC.
Olson, D. M., E. Dinerstein, E. D. Wikramanayake, N. D. Burgess, G. V. Powell, E. C. Underwood, J. A. D’Amico, et al. 2001. “Terrestrial Ecoregions of the World: A New Map of Life on Earth: A New Global Map of Terrestrial Ecoregions Provides an Innovative Tool for Conserving Biodiversity.” BioScience 51 (11): 933–38.
Schielzeth, Holger. 2010. “Simple Means to Improve the Interpretability of Regression Coefficients.” Methods in Ecology and Evolution 1: 103–13. https://doi.org/10.1111/j.2041-210X.2010.00012.x.
Schielzeth, Holger, and Wolfgang Forstmeier. 2009. “Conclusions Beyond Support: Overconfident Estimates in Mixed Models.” Behavioral Ecology 20 (2): 416–20. https://doi.org/10.1093/beheco/arn145.
Uriarte, María, and Charles B. Yackulic. 2009. “Preaching to the Unconverted.” Ecological Applications 19 (3): 592–96. http://www.jstor.org/stable/27646001.
Wood, Simon. 2017. Generalized Additive Models: An Introduction with R. 2d ed. CRC Texts in Statistical Science. Chapman & Hall.
Zuur, Alain F., Elena N. Ieno, Neil J. Walker, Anatoly A. Saveliev, and Graham M. Smith. 2009. Mixed Effects Models and Extensions in Ecology with R. Springer.
“relatively large units of land containing a distinct assemblage of natural communities and species, with boundaries that approximate the original extent of natural communities prior to major land-use change”↩︎