This is the code that goes with the dispRity workshop. You can use it to follow along with the workshop instructions, modify it to use your own data, or adjust the options to your needs. Most of the information in this workshop comes from the dispRity manual. You can also always find more documentation about the functions used here through R's inbuilt help by typing ?function.name.

R level

In this workshop I will assume you are already familiar with basic R notions: loading packages, calling functions, handling matrices and using the inbuilt help.

Let's get into it. First we'll want to install and load the package:
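
If you don't have the package installed yet, you can get the current CRAN release with:

install.packages("dispRity")

You can then load it: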

library(dispRity)
##        --- dispRity package ---
## This is the CRAN release version (1.5.0) of the package.
## For news, vignettes and future releases,
## visit https://github.com/TGuillerme/dispRity

Data

To follow this tutorial you can use the inbuilt dispRity data:

## Loading the inbuilt data
data(BeckLee_mat50)
data(BeckLee_tree)

## The matrix
my_matrix <- BeckLee_mat50

## The tree
my_tree <- BeckLee_tree

But you can always use your own data. You'll need a trait space matrix (with elements as row names) and a phylogenetic tree (a "phylo" object) whose tip labels match the matrix row names.

To make it easier to follow the tutorial, you can name your matrix my_matrix and your tree my_tree. If you're missing either the tree or the matrix, you can find below two functions to simulate one from the other:

## Function to simulate a PCO-looking matrix from a tree
i.need.a.matrix <- function(tree) {
    matrix <- space.maker(elements = Ntip(tree), dimensions = Ntip(tree), distribution = rnorm,
                          scree = rev(cumsum(rep(1/Ntip(tree), Ntip(tree)))))
    rownames(matrix) <- tree$tip.label
    return(matrix)
}

## Function to simulate a tree
i.need.a.tree <- function(matrix) {
    tree <- rtree(nrow(matrix))
    tree$root.time <- max(tree.age(tree)$age)
    tree$tip.label <- rownames(matrix)
    tree$node.label <- paste0("n", 1:(nrow(matrix)-1))
    return(tree)
}
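
## For example (not run here), if you only had a tree you could simulate a
## matching matrix with the helper above:
# my_matrix <- i.need.a.matrix(my_tree)
## or, if you only had a matrix, simulate a matching tree:
# my_tree <- i.need.a.tree(my_matrix)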
## You can verify if both the matrix and the tree labels match using
clean.data(my_matrix, my_tree)

Disparity per group analysis

This is the code for a simple disparity-per-group analysis. The hypothesis we will test is: "Is the crown mammals' morphospace bigger than the stem mammals' one?"

## Creating a list of groups
my_groups <- crown.stem(my_tree, inc.nodes = FALSE)

## Creating a dispRity object that contains group information
trait_space <- custom.subsets(my_matrix, group = my_groups)

## Bootstrapping
trait_space_bs <- boot.matrix(trait_space)
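
Note that boot.matrix() uses 100 bootstrap pseudo-replicates by default; you can change this and add rarefaction through its arguments (see ?boot.matrix). For example (left commented out here so the results below stay comparable):

## Bootstrapping 500 times and rarefying each subset
# trait_space_bs <- boot.matrix(trait_space, bootstraps = 500, rarefaction = TRUE)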

Which disparity metric do we need?

This is a tricky one and depends on your question, but thankfully we can ask moms to help us! Check out the documentation here.

For that we can simply upload our trait space into moms and test different metrics against our different hypotheses.

## Saving our space as a matrix (with column names and row names)
write.csv(my_matrix, file = "moms_space.csv")

Here, for our hypothesis, we are interested in the volume or size of the trait space occupied (i.e. "Is the crown mammals' morphospace bigger than the stem mammals' one?") and we can see that the sum of variances metric captures changes in volume relatively well.
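
If you want to see what the metric actually does, here is a minimal sketch: the sum of variances is simply the variance of each dimension (column) of the trait space, summed. You can compute it by hand on the whole matrix:

## Sum of the per-dimension variances, computed manually
sum(apply(my_matrix, 2, var))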

## Measuring disparity
disparity_groups <- dispRity(trait_space_bs, metric = c(sum, variances))

## Summarizing the results
summary(disparity_groups)
##   subsets  n   obs bs.median  2.5%   25%   75% 97.5%
## 1   crown 30 2.526     2.451 2.375 2.426 2.472 2.497
## 2    stem 20 2.244     2.135 1.962 2.089 2.169 2.216
## Plotting the results
plot(disparity_groups)

## Testing the difference between both groups
test.dispRity(disparity_groups, test = t.test)
## [[1]]
##              statistic: t
## crown : stem     47.32392
## 
## [[2]]
##              parameter: df
## crown : stem      159.1896
## 
## [[3]]
##                   p.value
## crown : stem 1.113884e-95
## 
## [[4]]
##                   stderr
## crown : stem 0.006782907

Bonus tip

If you're using a good text editor to write your research (e.g. LaTeX, markdown, etc… not Word), you can use the knitr::kable function to print the summary tables directly in a format ready for your research project. For example, for this markdown vignette:

knitr::kable(summary(disparity_groups))
|subsets |  n|   obs| bs.median|  2.5%|   25%|   75%| 97.5%|
|:-------|--:|-----:|---------:|-----:|-----:|-----:|-----:|
|crown   | 30| 2.526|     2.451| 2.375| 2.426| 2.472| 2.497|
|stem    | 20| 2.244|     2.135| 1.962| 2.089| 2.169| 2.216|

Disparity through time analysis

Making up some node data

This section will generate some data for the nodes in your tree. DO NOT TRY THIS AT HOME. Or, less patronisingly: we will use some within-PCA ancestral state estimations; there are cases where this method is totally valid, but that is not always so. You can find our opinion on this in this paper, Section 3-d. I would personally be really cautious about using this approach if you don't know what you are doing (but you should do what you think is best for your question and data - don't listen to me).

## Wrapping function for running a quick and dirty ancestral character estimation
quick.n.dirty.ace <- function(matrix, tree) {
    apply(matrix, 2,
          function(trait, tree) return(ape::ace(trait, phy = tree)$ace),
          tree = tree)
}

## Running the ACE
node_traits <- quick.n.dirty.ace(my_matrix, my_tree)

## Combining both matrices
my_matrix_nodes <- rbind(my_matrix, node_traits)

## Adding node labels to the tree
my_tree$node.label <- rownames(node_traits)

## Adding a root time to the tree (if missing)
if(is.null(my_tree$root.time)) {
    my_tree$root.time <- max(tree.age(my_tree)$age)
}

Disparity through time analysis

## Creating a time-sliced trait space
trait_space <- chrono.subsets(my_matrix_nodes, tree = my_tree,
                              method = "continuous",
                              model  = "proximity",
                              time = 9)

## Bootstrapping the data
trait_space_bs <- boot.matrix(trait_space)
## Warning in boot.matrix(trait_space): The following subsets have less than 3 elements: 133.51.
## This might effect the bootstrap/rarefaction output.
## Measuring disparity
disparity_time <- dispRity(trait_space_bs, metric = c(sum, variances))

## Summarizing the results
summary(disparity_time)
##   subsets  n   obs bs.median  2.5%   25%   75% 97.5%
## 1  133.51  2 0.008     0.000 0.000 0.000 0.008 0.008
## 2  116.82  6 0.348     0.288 0.129 0.233 0.345 0.418
## 3  100.13 10 1.023     0.945 0.598 0.792 1.057 1.327
## 4   83.44 15 1.092     1.037 0.610 0.896 1.159 1.356
## 5   66.76 20 1.101     1.046 0.709 0.915 1.161 1.365
## 6   50.07 15 1.270     1.200 0.709 1.037 1.325 1.563
## 7   33.38 10 1.490     1.357 0.904 1.222 1.515 1.778
## 8   16.69 10 2.491     2.268 1.939 2.178 2.318 2.418
## 9       0 10 2.491     2.284 1.975 2.224 2.344 2.412
## Plotting the results
plot(disparity_time)

## Testing the difference between both groups
test.dispRity(disparity_time, test = wilcox.test,
              comparisons = "sequential",
              correction = "bonferroni")
## [[1]]
##                 statistic: W
## 133.51 : 116.82          0.0
## 116.82 : 100.13          0.0
## 100.13 : 83.44        3810.0
## 83.44 : 66.76         4799.0
## 66.76 : 50.07         3016.0
## 50.07 : 33.38         2909.0
## 33.38 : 16.69           12.0
## 16.69 : 0             4298.5
## 
## [[2]]
##                      p.value
## 133.51 : 116.82 1.795477e-34
## 116.82 : 100.13 2.046946e-33
## 100.13 : 83.44  2.924806e-02
## 83.44 : 66.76   1.000000e+00
## 66.76 : 50.07   1.005462e-05
## 50.07 : 33.38   2.606010e-06
## 33.38 : 16.69   2.938255e-33
## 16.69 : 0       6.939708e-01

Making some fancy plots

The plot.dispRity function has many, many different options. Most of them are your normal plot options (main, etc…) so check out the manual to make some fancier plots. Alternatively, you can just extract the disparity data and plot it yourself (see the note on extract.dispRity below).

## Some fancier plot
plot(disparity_time,
     main = "Rainbows!",
     cent.tend = sd,
     quantiles = seq(from = 1, to = 99, by = 1),
     col = c("black", rainbow(100)),
     las = 2,
     ylab = "My non descriptive disparity metric")
legend("topleft", lty = 1, col = "black",
       legend = "The standard deviation")

If you are a ggplotter, you can extract the data from the dispRity object using the extract.dispRity function. If you like ggplot and the dispRity package and feel like collaborating with me, please drop me an email so that we can figure out a way to make a ggplot module for the package (and make you an author!).
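
For example, a minimal sketch (the exact arguments may differ between package versions, so check ?extract.dispRity) to pull the observed disparity values into a plain data frame:

## Extracting the observed disparity value of each subset
extracted_values <- extract.dispRity(disparity_time)

## Reshaping into a data frame, e.g. for plotting with ggplot2
disparity_df <- data.frame(subset = names(extracted_values),
                           disparity = unlist(extracted_values))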

Some more advanced stuff

dtt analysis

## Some disparity through time analysis
dispRity_dtt <- dtt.dispRity(data = my_matrix, metric = c(sum, variances),
                             tree = my_tree, nsim = 100)

## Print the data
dispRity_dtt
## Disparity-through-time test (modified from geiger:dtt)
## Call: dtt.dispRity(data = my_matrix, metric = c(sum, variances), tree = my_tree, nsim = 100, model = "BM", alternative = "two-sided") 
## 
## Observation: 0.348593205349649
## 
## Model: BM
## Based on 100 replicates
## Simulated p-value: 0.96
## Alternative hypothesis: two-sided
## 
##     Mean.dtt Mean.sim_MDI  var.sim_MDI 
## 0.8007267649 0.3470381488 0.0005418808 
## 
## Use plot.dispRity() to visualise.
## Plot the data
plot(dispRity_dtt)

More info here.

PERMANOVAs

## Is there an effect of the factor "group" in the data matrix?
adonis.dispRity(disparity_groups)
## 
## Call:
## vegan::adonis(formula = dist(matrix) ~ group, data = disparity_groups,      method = "euclidean") 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)    
## group      1     7.601  7.6006  3.1484 0.06155  0.001 ***
## Residuals 48   115.876  2.4141         0.93845           
## Total     49   123.477                 1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Is there an effect of the factor "time" in the data matrix?
adonis.dispRity(disparity_time)
## 
## Call:
## vegan::adonis(formula = dist(matrix) ~ time, data = disparity_time,      method = "euclidean") 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)    
## time       7    16.010  2.2871  1.6865 0.16215  0.001 ***
## Residuals 61    82.727  1.3562         0.83785           
## Total     68    98.737                 1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

More info here.

Null testing

## Is the data normally distributed in each group?
results <- null.test(disparity_groups, replicates = 100, null.distrib = rnorm)
plot(results)

## Is the data log-normally distributed through time?
results <- null.test(disparity_time, replicates = 100, null.distrib = rlnorm)
plot(results)

More info here.

Model fitting

Note that this is an unpublished method. Please contact me (guillert@tcd.ie) if you are interested in using it in your study.

## Testing the fit of different modes of disparity changes through time
model.test.wrapper(disparity_time, model = c("BM", "OU", "EB", "Trend"))
## Evidence of equal variance (Bartlett's test of equal variances p = 0).
## Variance is not pooled.
## Running BM model...Done. Log-likelihood = -2.126
## Running OU model...Done. Log-likelihood = 0.846
## Running EB model...Done. Log-likelihood = -4.044
## Running Trend model...Done. Log-likelihood = 0.658

##        aicc delta_aicc weight_aicc log.lik param ancestral state sigma squared
## Trend  9.48      0.000       0.580    0.66     3           0.008         0.005
## BM    10.25      0.767       0.395   -2.13     2           0.008         0.012
## OU    16.31      6.823       0.019    0.85     4           0.007         0.006
## EB    18.89      9.403       0.005   -4.04     3           2.488         0.011
##       alpha optima.1 eb trend median p value lower p value upper p value
## Trend    NA       NA NA 0.019    0.864135864   0.863136863    0.86513487
## BM       NA       NA NA    NA    0.129870130   0.126873127    0.13286713
## OU    0.006    4.327 NA    NA    0.871628372   0.870129870    0.87312687
## EB       NA       NA  0    NA    0.005994006   0.000999001    0.01098901

More info here.

Some useful stuff

dispRity utility functions

The package also provides a range of functions for modifying dispRity objects or extracting some useful information from them (e.g. a specific bootstrapped matrix, etc…). More info here.
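
For example (a hedged sketch - the exact utility function names may vary between versions, so check the package manual):

## Extracting one specific time subset from the disparity object
get.subsets(disparity_time, subsets = "66.76")

## Checking how many elements are in each subset
size.subsets(disparity_time)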

Other utility functions

The package also provides some other functions that I find useful for disparity analyses but that are not strictly linked to them. For example, a function for removing zero branch lengths from trees (remove.zero.brlen) or a function to calculate pairwise character differences (char.diff). More info here.
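
For example, a minimal usage sketch (not run here; remove.zero.brlen modifies branch lengths, so always check the resulting tree):

## Removing zero-length branches from the tree
my_tree_no_zero <- remove.zero.brlen(my_tree)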

Using the Claddis package with dispRity

Some of you might be familiar with Graeme Lloyd's rich Claddis package and might want to use it in your analysis. Great news: there is a simple function in dispRity, Claddis.ordination, that can convert Claddis objects into dispRity objects. The way both Graeme and I see the synergy between the two packages is that you can do a lot of the pre-disparity analyses in Claddis (e.g. ordinating data, measuring distance matrices, looking at rates, etc…) and then export the trait space generated in Claddis into dispRity for the analyses involving disparity metrics. Of course, you do you: you can mix and match your analyses and your packages the way you want.

For those who don't use/know Claddis, here is a quick pipeline for analysing your data starting from a cladistic matrix:

library(Claddis) # based on v0.6.3 or higher
## Loading required package: phytools
## Loading required package: maps
## Loading required package: strap
## Loading required package: geoscale
## previous versions of Claddis were quite different
## and won't work here

You can either input your own nexus matrix:

## Reading your cladistic (NEXUS) matrix
my_nexus <- read_nexus_matrix(file_name)

Or a matrix already present in R with species as rows and characters as columns:

## Create a random 10-by-50 matrix containing missing data (NA),
## polymorphisms ("0&1") and gaps/inapplicable characters ("")
character_taxon_matrix <- matrix(sample(c("0", "1", "0&1", NA, ""),
                                        500, replace = TRUE),
                                 nrow = 10, ncol = 50)
rownames(character_taxon_matrix) <- LETTERS[1:10]

## Reformat for use elsewhere in Claddis:
my_nexus <- build_cladistic_matrix(character_taxon_matrix)

You can then measure your distance matrix any way you want (e.g. using Graeme's MORD distance) and then ordinate the distance matrix using a PCoA (classical multidimensional scaling, via cmdscale):

## Calculate morphological distances
mord_distances <- calculate_morphological_distances(my_nexus)

## Ordinating the distances
ordinated_distances <- cmdscale(mord_distances$distance_matrix)

There is much more to the distance and ordination methods, and I suggest you check out Graeme's paper directly for more info. Regardless, you can then feed the ordinated matrix directly into dispRity:

## Calculating the sum of variances:
dispRity(ordinated_distances, metric = c(sum, variances))
##  ---- dispRity object ---- 
## 10 elements in one matrix with 2 dimensions.
## Disparity was calculated as: c(sum, variances).

Alternatively, you can run the whole pipeline automatically using the Claddis.ordination function:

## Running the ordination automatically to create a dispRity object
dispRity(Claddis.ordination(my_nexus), metric = c(sum, variances))
##  ---- dispRity object ---- 
## 10 elements in one matrix with 9 dimensions.
## Disparity was calculated as: c(sum, variances).

What’s next?

The dispRity package is constantly updated with new functionalities, bug fixes and improvements. You can follow the development of the package by following me on github (TGuillerme - I also develop other ecology and evolution packages) or twitter ([@TGuillerme](https://twitter.com/TGuillerme) - I also tweet about other work-related stuff). You can check what will be happening in the next CRAN version of the package by looking at the NEWS.md on the master branch. And of course, you are more than welcome to contact me if you have suggestions, ideas or wishes for improvements!