This repository contains material for the Physalia workshop on Generalized Linear Latent Variable Models, 10-13 June 2024. Feel free to share, alter, or re-use this material with appropriate referencing of this repository.
Workshop webpage: https://www.physalia-courses.org/courses-workshops/gllvm/
Since the 1950s, ecologists have used ordination methods to analyse data on ecological communities. In recent years, research by Warton et al. (2012), amongst others, has shown that classical ordination methods (PCA, PCoA, RDA, CA, CCA, NMDS, etc.), which rely on distance measures, have various unfavourable properties that lead to a poor representation of community composition.
Hui et al. (2015) suggested using the Generalized Linear Latent Variable Model (GLLVM) framework instead, and with it modernizing ecological multivariate analysis. It is not quite clear to me (at present) who first proposed GLLVMs as a class of models, but Skrondal and Rabe-Hesketh (2004) and Bartholomew et al. (2011) are go-to resources. It is clear, however, that the first latent variable method to be developed was factor analysis (Spearman, 1904), which is a GLLVM for normally distributed responses. Factor analysis is not very popular in community ecology, mostly because it was noted early on that its assumption of normally distributed responses does not hold for most ecological applications.
GLLVMs have many properties in common with Generalised Linear Models (GLMs, Nelder and Wedderburn 1972), Generalised Linear Mixed Models (GLMMs), and other ordination methods. Estimation tends to be challenging due to the omnipresence of random effects, but there are many favourable statistical properties, and tools for inference, that are worth the hassle. This workshop teaches GLLVMs by first providing a quick recap of GLMs, GLMMs, and classical ordination methods, since those methods are more familiar to most ecologists (i.e., basic statistical concepts such as sampling theory are assumed to be somewhat familiar to participants). The material of my Physalia workshop on Generalised Linear Models can be found here. Gavin Simpson's Physalia workshop on classical multivariate analysis (GitHub here) can serve as an introduction to some of the material in this course.
I assume that all workshop participants are sufficiently familiar with the R statistical programming language, so this course does not recap the use of R and RStudio.
Please make sure to update your R installation prior to the workshop. Most of the code used in the workshop should also work on older versions of R, but some of the R packages used might not be available or fully functional.
You can find an R installation based on your operating system here
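As a quick check before the workshop, the sketch below prints the running R version and reports whether a package is installed. The package list is an assumption (only `gllvm` is named in this README); extend it as needed. Nothing is installed automatically.

```r
# Print the version of R that is currently running
print(R.version.string)

# Packages to check; "gllvm" is an assumption based on this README,
# the workshop may use additional packages
pkgs <- c("gllvm")

# requireNamespace() returns FALSE (instead of an error) when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "),
          ". These can be installed with install.packages().")
} else {
  message("All listed packages are installed.")
}
```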
Sessions run from 14:00 to 20:00 (Tuesday to Friday). They consist of a mix of lectures, in-class discussion, and practical exercises / case studies over Slack and Zoom.
- Introduction and overview
- Aspects of community data
- Multispecies Generalised Linear Models
- Multispecies Generalised Linear Mixed Models
- Other R packages for fitting GLLVMs and JSDMs
- Beyond vanilla GLLVMs: hierarchical ordination and machine learning
- Discussion and reanalysis of a paper
- Possibility for own data analysis, or addressing suggested topics by participants
| Day | Time | Subject |
|---|---|---|
| Tuesday | 14:00 - 14:45 | Introduction |
| | 14:45 - 15:05 | Brainstorming: challenging properties of community data |
| | 15:05 - 15:30 | Key concepts in modeling community data |
| | 15:30 - 15:45 | Break |
| | 15:45 - 16:45 | Vector Generalised Linear Models (VGLM) |
| | 16:45 - 17:45 | Practical 1: Fitting vector GLMs |
| | 17:45 - 18:30 | Break |
| | 18:30 - 19:15 | Vector Generalised Linear Mixed Models (GLMM) |
| | 19:15 - 20:00 | Practical 2: Predicting diversity with multispecies GLMMs |
| Wednesday | 14:00 - 14:45 | Model validation and comparison |
| | 14:45 - 15:45 | Practical 3: Validation and comparison |
| | 15:45 - 16:00 | Break |
| | 16:00 - 16:45 | Hierarchically modeling environmental responses |
| | 16:45 - 17:45 | Practical 4: Traits and phylogeny |
| | 17:45 - 18:30 | Break |
| | 18:30 - 19:15 | Incorporating species' correlation |
| | 19:15 - 20:00 | Practical 5: Joint Species Distribution Models |
| Thursday | 14:00 - 14:45 | Model-based ordination |
| | 14:45 - 15:45 | Practical 6: Comparing ordinations |
| | 15:45 - 16:00 | Break |
| | 16:00 - 16:45 | Ordination with predictors |
| | 16:45 - 17:45 | Practical 7: Random canonical coefficients |
| | 17:45 - 18:30 | Break |
| | 18:30 - 19:15 | Ordination with unimodal responses |
| | 19:15 - 20:00 | Practical 8: Quadratic GLLVM |
| Friday | 14:00 - 14:45 | Other R packages for fitting GLLVMs and JSDMs |
| | 14:45 - 15:45 | Practical 9: Fit a model with various R packages |
| | 15:45 - 16:00 | Break |
| | 16:00 - 16:45 | Beyond vanilla GLLVMs |
| | 16:45 - 17:45 | Practical 10: Article reanalysis |
| | 17:45 - 18:30 | Break |
| | 18:30 - 20:00 | Wrapping up: questions, requests, own analysis |
| gllvm argument | Function | Accepted structures | Data |
|---|---|---|---|
| `formula` | Fixed and random species-specific effects | lme4-type formula (e.g., `~ x1 + (0+x2\|1)`) | `X`: environmental variables |
| `lv.formula` | Specifies fixed or random effects in the ordination | lme4-type formula (e.g., `~ x1 + x2` or `~ (0+x1 + x2\|1)`) | `X`: covariates for the latent variables |
| `row.eff` | Includes fixed and random species-common effects | glmmTMB-type formula, alternatively `"fixed"` or `"random"` | `studyDesign`: any categorical or continuous covariates |
| `lvCor` | For group-level unconstrained ordination, or to introduce a correlation structure among unconstrained latent variables | lme4-type formula | `studyDesign` |
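
To illustrate how some of these arguments are passed to `gllvm()`, here is a minimal sketch using the `spider` dataset from the mvabund package. The choice of dataset, covariates (`soil.dry`, `bare.sand`), Poisson family, and number of latent variables are assumptions made for illustration only, not the workshop's own examples. The `lvCor` argument and random-effect formulas are not shown.

```r
library(gllvm)

# Hunting spider data: 28 sites, 12 species, 6 environmental variables
data("spider", package = "mvabund")
y <- as.matrix(spider$abund)
X <- as.data.frame(scale(spider$x))

# `formula`: species-specific fixed effects of an environmental variable,
# with two unconstrained latent variables
fit_formula <- gllvm(y, X, formula = ~ soil.dry,
                     family = "poisson", num.lv = 2, seed = 1)

# `lv.formula`: covariates enter through the ordination axes
# (constrained ordination with two axes via num.RR)
fit_lv <- gllvm(y, X, lv.formula = ~ soil.dry + bare.sand,
                family = "poisson", num.RR = 2, seed = 1)

# `row.eff`: a random site effect shared by all species
fit_row <- gllvm(y, family = "poisson", num.lv = 2,
                 row.eff = "random", seed = 1)

# Inspect one of the fitted models
summary(fit_formula)
```

Each call returns a fitted model object that can be examined further, for example with `ordiplot()` or `coefplot()` from the same package.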
