Overview (mixed models essentials and subject-matter intro)

Mixed models essentials

What are mixed models good for?

a broad class of statistical models that extend linear and generalized linear models to handle data where observations are measured within discrete groups such as field sites; years or other temporal blocks; individuals that are observed multiple times; genotypes; species; etc. They can be thought of (equivalently) as (1) accounting for the correlation among observations from the same group; (2) estimating the variability among groups, or (3) parsimoniously estimating the effects of groups. They are most useful when the experimental or observational design includes a large number of groups with varying numbers of observations per group.

What is a random effect anyway?

  • A grouping variable g (must be discrete!) and a varying term f
  • denoted as (f | g) in most R MM packages
  • "the effect of f (e.g. slope) varies across groups defined by g (e.g. species; f = 1 → intercept)
  • effects of f for each group g via shrinkage estimation/partial pooling (empirical Bayes, joint Bayesian prior, …)

When should you use a random effect?

Some rules of thumb

  • don’t want to test hypotheses about differences between groups
  • do want to quantify the variability across groups
  • do want to make predictions for unobserved groups
  • do want to combine information across groups
  • do have variation in information per group (samples, noise)
  • do have groups randomly sampled from a population
  • do have a categorical nuisance variable
  • do have exchangeable groups
  • do have “many” (> 5-6) groups; small \(n\) per group; unbalanced groups

cf. Crawley (2002), Gelman (2005)

What tools?

  • Mostly lme4; gamm4 to deal with spatial autocorrelation
  • Some ggplot2, maybe some tidyverse
  • diagnostics etc.; broom.mixed, DHARMa, car (using :: notation as appropriate)
  • Other (G)LMM-adjacent packages listed here

Science intro

What is this project?

  • Paper in progress/review with Max Moritz (UCSB) and Enric Batllori Presas (Univ Barcelona)
  • Use synthetic global-scale databases of species richness, primary productivity, wildfire to quantify relationships between (NPP, fire consumption) and richness

Pictures?

Analytical goals?

  • estimate effects of NPP (g C m\(^2\)/year), fire consumption (% of NPP), interannual CV of NPP and fire consumption, and their interactions, on species richness
  • at the global level and possibly variation across different geographic scales

need for mixed models: geographic variation

  • realms (large-scale) (e.g. “Neotropics”)
  • biomes (medium-scale, environmental) (e.g. “tropical grassland”)
  • biome × realm interaction (“tropical grasslands in the Neotropics”)
  • “ecoregion”: sampling unit (Olson et al. 2001)1

nesting and crossing of random effects

Nested: sub-unit IDs only measured within a single larger unit. e.g.: Class1 in School1 independent of Class1 in School2: (1|School/Class)

Crossed: sub-unit IDs can be measured in multiple larger units. e.g. class = “history”, “geography”, “mathematics”: (1|School) + (1|Class)

Unique coding: removes ambiguity

Robert Long, Cross Validated

random effects terms

  • “random effect of X” usually means intercept variation only, by default (but see Schielzeth and Forstmeier (2009))
    • one parameter (variance/std dev of intercept across groups)
  • RE terms with n parameters (intercept + slope = 2)
    • n*(n+1)/2 parameters
    • n=5 → 15 for this example
  • RE terms with n independent effects (|| shortcut); only n parameters
    • 5 for this example

more preliminaries

  • we’ll work with log-scaled NPP/fire, raw CVs, all centered (Schielzeth 2010)
  • main effects all evaluated at geometric mean of other variables
  • coefficients are approximately elasticities

before we start coding …

knitr::include_graphics("pix/gelman_hill_complexity.png")

Gelman and Hill (2006)


knitr::include_graphics("pix/uriarte_yackulic_complexity.png")

Uriarte and Yackulic (2009)

Coding

load packages

Load packages up front, note what they’re used for …

library(tidyverse); theme_set(theme_bw())
library(lme4)
library(gamm4)
## diagnostics
library(DHARMa)
library(car) ## influencePlot
## extraction/graphics
library(broom.mixed)
library(emmeans)
library(dotwhisker)
library(gridExtra)
## from GitHub: see utils.R
library(r2glmm) ## bbolker/r2glmm
library(remef)  ## hohenstein/remef
source("utils.R")
source("gamm4_utils.R")

load data

dd <- readRDS("data/ecoreg.rds")

simplest model

Single-level model (biomes), intercept variation only. All pairwise interactions of main variables ((...)^2), plus (log of) ecoregion area:

m1 <- lmer(mbirds_log ~ scale(log(area_km2)) + (Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc)^2 +
             (1 | biome),
           data = dd)
## "Feat" = "fire eaten"
## may get
## Warning message:
## Some predictor variables are on very different scales: consider rescaling 

diagnostics

Best practice: check diagnostics as early as possible (before summary(), coeff plots) to reduce snooping.

plot(m1, type = c("p", "smooth"))

## heteroscedasticity
plot(m1, sqrt(abs(resid(.))) ~ fitted(.), type = c("p", "smooth")) ## slight heteroscedasticity?

car::influencePlot(m1)

plot(DHARMa::simulateResiduals(m1))  ## we'll get back to this

coefficient plots

## basic coefficient plot
dotwhisker::dwplot(m1, effects="fixed") + geom_vline(xintercept = 0, lty = 2)

## ordered coefficient plot (from utils.R)
dwplot_ordered(m1, effects = "fixed")

add RE terms

  • allow the main effects to vary across biomes (in a correlated way)
  • update() is your friend
m2 <- update(m1, . ~ . - (1|biome) + (1 + Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc | biome))
## compare plots
dwplot_ordered(list(intercept_only = m1, random_slopes = m2), effects = "fixed")

three-level model

Now go to the (almost) maximal model; variation of main effects at all three geographic levels

## ~ 30 seconds
max_model <- lmer(mbirds_log ~ scale(log(area_km2)) + (Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc)^2 +
                    (Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc | biome) +
                    (Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc | flor_realms) +
                    (Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc | biome_FR),
                  data = dd,
                  ## for speed/skip convergence warnings
                  control = lmerControl(calc.derivs = FALSE))

checking the covariance parameter estimates

isSingular(max_model)
## [1] TRUE
lwr <- getME(max_model, "lower"); theta <- getME(max_model, "theta"); min(theta[lwr == 0])
## [1] 0
VarCorr(max_model)
##  Groups      Name        Std.Dev. Corr                       
##  biome_FR    (Intercept) 0.108634                            
##              Feat_log_sc 0.028956 -0.998                     
##              Feat_cv_sc  0.058083  0.968 -0.975              
##              NPP_log_sc  0.112312 -0.011 -0.027  0.239       
##              NPP_cv_sc   0.036536 -0.987  0.991 -0.996 -0.152
##  biome       (Intercept) 0.126970                            
##              Feat_log_sc 0.011065 -1.000                     
##              Feat_cv_sc  0.050601  0.936 -0.936              
##              NPP_log_sc  0.148002 -0.468  0.468 -0.749       
##              NPP_cv_sc   0.091323 -0.621  0.621 -0.856  0.983
##  flor_realms (Intercept) 0.229352                            
##              Feat_log_sc 0.130501  0.561                     
##              Feat_cv_sc  0.072461  0.396  0.440              
##              NPP_log_sc  0.119123 -0.358 -0.964 -0.512       
##              NPP_cv_sc   0.031699  0.249 -0.660 -0.093  0.791
##  Residual                0.193239

maximal model

model simplification

  • avoid singularity/non-convergence (Barr et al. 2013; Schielzeth and Forstmeier 2009)
  • data-driven (AIC, p-value) (Bates et al. 2015; Matuschek et al. 2017)

non-convergence vs singularity

  • convergence warnings: historical reasons
  • very unreliable (and slow!) for large data sets (>10,000 observations)
  • gold standard: run allFit(), diagnose/evaluate differences in effects of interest

AIC table/strategy

  • for loop over table
  • fitting many models is a code smell but sometimes hard to avoid
  • reformulate() is useful
all_vars <- "1 + Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc"
v1 <- expand.grid(c("1 | ", paste(all_vars, "|"), paste(all_vars, "||")),
                  c("biome", "flor_realms", "biome_FR"))
v2 <- sprintf("(%s)", apply(v1, 1, paste, collapse = " "))
## use cross_df instead of expand.grid, want chars
v3 <- cross_df(list(biome = v2[1:3], FR = v2[4:6], biome_FR = v2[7:9]))
v3[1,]
## # A tibble: 1 × 3
##   biome        FR                 biome_FR       
##   <chr>        <chr>              <chr>          
## 1 (1 |  biome) (1 |  flor_realms) (1 |  biome_FR)
model_list <- list()
p1 <- proc.time()
for (i in 1:nrow(v3)) {
  cat(i, unlist(v3[i,]), "\n")
  form <- reformulate(
      c(sprintf("(%s)^2", all_vars), ## main effects and 2-way interax
        "log(area_km2)",             ## plus ecoregion area
        unlist(v3[i,])),             ## plus specified REs
      response = "mbirds_log")
  model_list[[i]] <- lmer(form, data = dd) ## fit model
}
saveRDS(model_list, file = "data/model_list.rds")
proc.time() - p1 ## about 10 minutes

extract summary info from models, take a look

model_list <- readRDS("data/model_list.rds")
aic_vec <- sapply(model_list, AIC)
is_sing <- sapply(model_list, isSingular)
conv_warn <- sapply(model_list, has_warning)
tibble(model=1:27, aic_vec, is_sing, conv_warn) %>% arrange(aic_vec)
## # A tibble: 27 × 4
##    model aic_vec is_sing conv_warn
##    <int>   <dbl> <lgl>   <lgl>    
##  1    19    35.7 FALSE   FALSE    
##  2    16    38.6 TRUE    FALSE    
##  3    25    39.7 TRUE    FALSE    
##  4    10    40.9 TRUE    FALSE    
##  5    21    43.2 TRUE    FALSE    
##  6    27    46.2 TRUE    FALSE    
##  7    18    46.8 TRUE    FALSE    
##  8    12    48.4 TRUE    FALSE    
##  9     9    49.5 FALSE   FALSE    
## 10    13    51.2 FALSE   TRUE     
## # … with 17 more rows

find best-AIC, non-singular model

best_index <- which(aic_vec == min(aic_vec) & !is_sing & !conv_warn)
best_model <- model_list[[best_index]]

check it out

best_model
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## mbirds_log ~ (1 + Feat_log_sc + Feat_cv_sc + NPP_log_sc + NPP_cv_sc)^2 +  
##     log(area_km2) + (1 | biome) + (1 | flor_realms) + ((1 | biome_FR) +  
##     (0 + Feat_log_sc | biome_FR) + (0 + Feat_cv_sc | biome_FR) +  
##     (0 + NPP_log_sc | biome_FR) + (0 + NPP_cv_sc | biome_FR))
##    Data: dd
## REML criterion at convergence: -4.3321
## Random effects:
##  Groups      Name        Std.Dev.
##  biome_FR    NPP_cv_sc   0.07758 
##  biome_FR.1  NPP_log_sc  0.13922 
##  biome_FR.2  Feat_cv_sc  0.08691 
##  biome_FR.3  Feat_log_sc 0.05442 
##  biome_FR.4  (Intercept) 0.13560 
##  biome       (Intercept) 0.10908 
##  flor_realms (Intercept) 0.23103 
##  Residual                0.19400 
## Number of obs: 620, groups:  biome_FR, 55; biome, 14; flor_realms, 6
## Fixed Effects:
##            (Intercept)             Feat_log_sc              Feat_cv_sc  
##               5.714624                0.091608               -0.006491  
##             NPP_log_sc               NPP_cv_sc           log(area_km2)  
##               0.260378                0.066486               -0.029917  
## Feat_log_sc:Feat_cv_sc  Feat_log_sc:NPP_log_sc   Feat_log_sc:NPP_cv_sc  
##              -0.011307               -0.051931               -0.047096  
##  Feat_cv_sc:NPP_log_sc    Feat_cv_sc:NPP_cv_sc    NPP_log_sc:NPP_cv_sc  
##               0.013720               -0.036622                0.006426

diagnostics

basics

plot(best_model, type = c("p", "smooth"))
## heteroscedasticity
plot(best_model, sqrt(abs(resid(.))) ~ fitted(.), type = c("p", "smooth"))
car::influencePlot(best_model)
plot(sr <- simulateResiduals(best_model))

What’s going on with DHARMa?

Computes residuals at population level (usually a good default but not necessarily appropriate here?)

DHARMa::plotResiduals(sr, dd$NPP_log_sc)

## https://www.stat.ubc.ca/~jenny/STAT545A/block16_colorsLattice.html
line_style <- list(plot.line = list(alpha=1, col = "red", lty = 1, lwd = 2))
pop_resids <- model.response(model.frame(best_model)) - predict(best_model, re.form = NA)
p1 <- lattice::xyplot(pop_resids ~ dd$NPP_log_sc, cex = 1, type = c("p", "smooth"),
                      main = "pop-level resids (à la DHARMa)",
                      par.settings = line_style)
p2 <- plot(best_model, resid(.) ~ NPP_log_sc, type = c("p", "smooth"),
           main = "unit-level resids (à la lme4)",
           par.settings = line_style)
gridExtra::grid.arrange(p1, p2, nrow = 1)

(lowest resid is Palearctic, lon=24, lat=

spatial correlation

Easiest to explore spatial correlations graphically:

dd$res1 <- residuals(best_model)
ggplot(dd, aes(x, y, colour = res1, size = abs(res1))) +
  geom_point() +
  scale_colour_gradient2() +
  scale_size(range=c(2,7))

  • Can see sub-realm geographic variation
  • Could work on spatial variograms, Moran’s I statistic, etc..

refit with gamm4

Quickest (not necessarily best) fix for spatial autocorr; bs="sos" is a spherical smooth

gamm4_form  <- update(formula(best_model), . ~ . + s(y, x, bs="sos"))
best_gamm4 <- gamm4(formula = nobars(gamm4_form),
                    random = as.formula(reOnly(gamm4_form)),
                    data = dd)
class(best_gamm4) <- c("gamm4", "list") ## so we can tidy etc. (using gamm4_utils.R)

compare results

dwplot_ordered(list(gamm4 = best_gamm4, lme4 = best_model), effects="fixed")

display/description

coefficient plots

  • with dotwhisker::dwplot() as shown before, or broom.mixed::tidy() + ggplot tweaking
  • scaled or unscaled coefficients?

\(R^2\) and partial \(R^2\) values

rsqvals <- r2glmm::r2beta(best_model, method = "sgv", partial = TRUE)
## set (reverse)factor order (or forcats::fct_inorder())
rsqvals$Effect <- factor(rsqvals$Effect, levels = rev(rsqvals$Effect))
ggplot(rsqvals, aes(x=Rsq, xmin = lower.CL, xmax = upper.CL, y = Effect)) +
  geom_pointrange()

all taxa (from the paper)

prediction plots

  • emmeans, effects and ggeffects package are useful for extracting predictions
  • need to be mindful of whether random effects components are included in the predictions (re.form)
NPPvec <- with(dd, seq(min(NPP_log_sc), max(NPP_log_sc), length.out = 51))
pred <- emmeans::emmeans(best_model, ~NPP_log_sc, at = list(NPP_log_sc = NPPvec))
## NOTE: Results may be misleading due to involvement in interactions
gg0 <- ggplot(as.data.frame(pred),
              aes(NPP_log_sc, emmean)) + geom_line() +
  geom_ribbon(aes(ymin=lower.CL, ymax = upper.CL), alpha = 0.2, colour = NA)
gg1 <- gg0 + geom_point(data = dd, aes(y = mbirds_log, colour = biome))
print(gg1)

Can also back-transform, unscale, etc. (see e.g. Stack Overflow Q1, Stack Overflow Q2)

partial residuals

Compute partial residuals and display:

dd$remef <- remef::remef(best_model, fix = "NPP_log_sc")
grid.arrange(gg1 + ggtitle("full"),
             gg0 + geom_point(data = dd, aes(y = remef, colour = biome)) +
             scale_y_continuous(limits = range(dd$mbirds_log)) +
             ggtitle("partial"),
             nrow=1)

random effects

  • Can extract, plot, etc. examine random effects
  • Built-in dotplot, qqmath methods
  • Or use as.data.frame(ranef()) or broom.mixed::tidy(model, effects = "ran_vals")
  • For this problem, REs were interesting but too noisy to draw conclusions from
re_plots <- lattice::dotplot(ranef(best_model))
print(re_plots$biome_FR)

print(grid.arrange(grobs = re_plots[2:3], nrow = 1))

extras

further reading

  • Gelman and Hill (2006); McElreath (2015); Bolker (2015) (see worked examples); Wood (2017)
  • with caveats: Zuur et al. (2009) (and others); Bolker et al. (2009)

GLMMs

  • the entire layer of GLMMs can be added
  • standard families (Poisson/Gamma/binomial etc.) via glmer
  • zero-inflation, Tweedie, neg binom … glmmTMB, brms
  • more complex machinery (Gauss-Hermite quadrature, GLMMadaptive: or go Bayesian)
  • observation-level random effects

Bayesian approaches

  • slower but you get more
  • regularization is easy
  • informative priors
  • better handling of uncertainty, especially for complex models

regularization

  • if you want to fit a very complex model but prevent singular fits etc., you need to regularize the model somehow (i.e. force it to be sensible)
  • this is easiest to do in a Bayesian or semi-Bayesian framework
  • blme, MCMCglmm, rstanarm, brms

model simplification/covariance structures

beyond intercept-only and diagonal structures

  • Barr et al. (2013) also discusses slope-only models
  • compound symmetry (all-equal correlations)
    • (1|g/f) instead of (f|g) for factor terms
    • cs() in glmmTMB, other possibilities in nlme::lme, MCMCglmm, brms, …
  • factor-analytic models (Maeve McGillycuddy, UNSW)

autocorrelation

  • mgcv: Markov random field, etc.
  • brms: can use all mgcv smooths
  • INLA: probably best for spatial/temporal, but a whole other world
  • glmmTMB: simple space/time stuff with complex GLMMs

REML vs ML

  • not as important as people think?
  • analogous to dividing by \(n-1\) instead of \(n\) when computing variance
  • REML better for RE covariance estimates
  • ML required for model comparison

convergence example

system.time(af <- allFit(model_list[[13]],
                         parallel = "multicore",
                         ncpus = 8))
saveRDS(af, "allfit.rds")
if (file.exists("allfit.rds")) {
  af <- readRDS("allfit.rds")
  summary(af)
}

mixed-model ecosystem in R

Google sheets list

references

references

Barr, Dale J., Roger Levy, Christoph Scheepers, and Harry J. Tily. 2013. “Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal.” Journal of Memory and Language 68 (3): 255–78. https://doi.org/10.1016/j.jml.2012.11.001.

Bates, Douglas, Reinhold Kliegl, Shravan Vasishth, and Harald Baayen. 2015. “Parsimonious Mixed Models.” arXiv:1506.04967 [Stat], June. http://arxiv.org/abs/1506.04967.

Bolker, Benjamin M. 2015. “Linear and Generalized Linear Mixed Models.” In Ecological Statistics: Contemporary Theory and Application, edited by Gordon A. Fox, Simoneta Negrete-Yankelevich, and Vinicio J. Sosa. Oxford University Press.

Bolker, Benjamin M., Mollie E. Brooks, Connie J. Clark, Shane W. Geange, John R. Poulsen, M. Henry H. Stevens, and Jada-Simone S. White. 2009. “Generalized Linear Mixed Models: A Practical Guide for Ecology and Evolution.” Trends in Ecology & Evolution 24: 127–35. https://doi.org/10.1016/j.tree.2008.10.008.

Crawley, Michael J. 2002. Statistical Computing: An Introduction to Data Analysis Using S-PLUS. John Wiley & Sons.

Gelman, Andrew. 2005. “Analysis of Variance: Why It Is More Important Than Ever.” Annals of Statistics 33 (1): 1–53. https://doi.org/doi:10.1214/009053604000001048.

Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge, England: Cambridge University Press. http://www.stat.columbia.edu/~gelman/arm/.

Matuschek, Hannes, Reinhold Kliegl, Shravan Vasishth, Harald Baayen, and Douglas Bates. 2017. “Balancing Type I Error and Power in Linear Mixed Models.” Journal of Memory and Language 94: 305–15. https://doi.org/10.1016/j.jml.2017.01.001.

McElreath, Richard. 2015. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Boca Raton: Chapman; Hall/CRC.

Olson, D. M., E. Dinerstein, E. D. Wikramanayake, N. D. Burgess, G. V. Powell, E. C. Underwood, J. A. D’Amico, et al. 2001. “Terrestrial Ecoregions of the World: A New Map of Life on Earth: A New Global Map of Terrestrial Ecoregions Provides an Innovative Tool for Conserving Biodiversity.” BioScience 51 (11): 933–38.

Schielzeth, Holger. 2010. “Simple Means to Improve the Interpretability of Regression Coefficients.” Methods in Ecology and Evolution 1: 103–13. https://doi.org/10.1111/j.2041-210X.2010.00012.x.

Schielzeth, Holger, and Wolfgang Forstmeier. 2009. “Conclusions Beyond Support: Overconfident Estimates in Mixed Models.” Behavioral Ecology 20 (2): 416–20. https://doi.org/10.1093/beheco/arn145.

Uriarte, María, and Charles B. Yackulic. 2009. “Preaching to the Unconverted.” Ecological Applications 19 (3): 592–96. http://www.jstor.org/stable/27646001.

Wood, Simon. 2017. Generalized Additive Models: An Introduction with R. 2d ed. CRC Texts in Statistical Science. Chapman & Hall.

Zuur, Alain F., Elena N. Ieno, Neil J. Walker, Anatoly A. Saveliev, and Graham M. Smith. 2009. Mixed Effects Models and Extensions in Ecology with R. Springer.


  1. “relatively large units of land containing a distinct assemblage of natural communities and species, with boundaries that approximate the original extent of natural communities prior to major land-use change”↩︎