non-Metric Multidimensional Scaling (nMDS)

Author

Affiliation

Published

January 1, 2021

Material required for this chapter

Type	Name	Link
Theory	Numerical Ecology in R	See pages 145-151
Slides	nMDS lecture slides	💾 `BCB743_11_nMDS.pdf`
Data	The Doubs River data	💾 `Doubs.RData`

Tasks to complete in this Chapter

Task F 1–5

Non-Metric Multidimensional Scaling (nMDS) is a rank-based indirect gradient analysis that uses a distance or dissimilarity matrix as its input (either pre-calculated using vegdist() or constructed internal to the metaMDS() function via the dist argument). Should one supply a ‘raw’ species × site table, the default dissimilarity matrix is Bray-Curtis dissimilarity, but any other dissimilarity index in vegdist() can be specified. Unlike other ordination methods such as Principal Component Analysis (PCA) and Correspondence Analysis (CA), which aim to maximise variance or correspondence between sites, nMDS focuses on representing the pairwise dissimilarities between sites in an ordination space. It does not use the raw distances or dissimilarities directly; instead, these values are replaced with their ranks, which is why the method is termed “non-metric.”

nMDS is the non-metric equivalent of Principal Coordinates Analysis (PCoA); in fact, PCoA is sometimes referred to as metric multidimensional scaling. PCoA and nMDS can both produce ordinations of objects from any distance or dissimilarity matrix. However, nMDS does not preserve the exact distances among objects in an ordination plot. Instead, it tries to represent the ordering (rank) relationships among objects as accurately as possible on a specified number of axes. This nonlinear mapping of dissimilarities onto a low-dimensional ordination space means that the Euclidean distances of points in the ordination space are rank-order similar to the original community dissimilarities.

The ordination space in nMDS is metric, but the regression used to fit the dissimilarities to this space is non-metric. This makes nMDS more robust than the eigen-value methods, especially when the data are not well-represented by a specific distribution as may sometimes be the case for ecological data. As with PCoA, it can handle quantitative, semi-quantitative, qualitative, or mixed variables, so we can flexibly apply it to many ecological problems.

A new concept, not seen in the eigen-approaches, is the idea of ‘stress.’ Stress quantifies the discrepancy between the observed dissimilarities and the distances in the ordination space. Stress is visually presented as the scatter of observed dissimilarities against the expected monotone regression. Lower stress values indicate a better fit of the data to the ordination space. Because rank orders of dissimilarities cannot be exactly preserved by rank-orders of ordination distances in low-dimensional space, some stress cannot be avoided.

nMDS does have some limitations. The rank-based approach means that information about the magnitude of differences between site pairs is lost and this can be a disadvantage when the actual distances are important for interpretation. Also, nMDS can be computationally intensive with large datasets or when trying to minimise stress through numerous iterations.

After performing nMDS, environmental interpretation can be facilitated using vegan’s envfit() and ordisurf() functions, as we have already seen in PCA, CA, and PCoA. As before, they allow for the fitting of environmental variables onto the ordination. This aids in visualising and understanding how the environmental variables influence the species ordination. Again, I must emphasis that this is not the same as doing a formal constrained ordination, which will be discussed in the next section (RDA and CCA). Additionally, if we require a statistical framework to assess the influence of categorical factors on the observed dissimilarities, we can use PERMANOVA (Permutational Multivariate Analysis of Variance) to test for differences that might be attributed to group effects. These ideas will be demonstrated below.

Set-Up the Analysis Environment

library(tidyverse)
library(vegan)
library(viridis)

# setting up a 'root' file path so I don't have to keep doing it later...
root <- "../data/"

The Doubs River Data

We continue to use the species data:

load(paste0(root, "NEwR-2ed_code_data/NEwR2-Data/Doubs.RData"))
spe <- dplyr::slice(spe, -8)

Do the nMDS

1spe_nmds <- metaMDS(spe, distance = "bray", trace = 0)
spe_nmds

1: I use trace = 0 to suppress the output of the iterations.


Call:
metaMDS(comm = spe, distance = "bray", trace = 0) 

global Multidimensional Scaling using monoMDS

Data:     spe 
Distance: bray 

Dimensions: 2 
Stress:     0.07383663 
Stress type 1, weak ties
Best solution was not repeated after 20 tries
The best solution was from try 10 (random start)
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'spe'

As always, reading the help file (accessible as ?vegan::metaMDS) is invaluable, as it is for all other ordination techniques.

There’s a summary method available, but it is not particularly useful and I don’t display the output here:

summary(spe_nmds)

Although summary(spe_nmds) does not return anything interesting, the species and site scores are nevertheless available directly through the scores() command, and they can be plotted as layer in ggplot2 if need be:

scores(spe_nmds)

$sites
         NMDS1       NMDS2
1  -1.79030700  0.81627940
2  -1.14283792 -0.16029572
3  -1.00160702 -0.14778221
4  -0.62112404 -0.08255030
5   0.07295609  0.45828960
6  -0.42762076 -0.15598943
7  -0.87429643 -0.23120352
8  -0.01506354 -0.86128619
9  -0.52497940 -0.40525880
10 -1.00839414 -0.37979671
11 -0.97805856 -0.08559725
12 -1.15922594  0.10409874
13 -0.80520832  0.12745554
14 -0.49408674  0.18096462
15 -0.18710477  0.28001633
16  0.08426766  0.12223470
17  0.29752301  0.11348497
18  0.44599916  0.14733905
19  0.77078678  0.28186685
20  0.86581233  0.37461844
21  0.95565447  0.44335648
22  0.75234556 -1.44653979
23  1.13039673 -0.63131939
24  0.85989408 -0.87168749
25  0.93317091  0.12297337
26  0.97418656  0.36676972
27  1.02422882  0.37679560
28  0.79548581  0.55876891
29  1.06720661  0.58399447

$species
          NMDS1       NMDS2
Cogo -0.9130076 -0.07653571
Satr -1.1128954 -0.22563057
Phph -0.7889489 -0.32476577
Babl -0.5342073 -0.29329558
Thth -0.9390579 -0.06898398
Teso -0.5243956  0.16916607
Chna  0.9023733  0.36442990
Pato  0.5180442  0.38552811
Lele  0.3307598  0.28576947
Sqce  0.3584800 -0.15225009
Baba  0.7042294  0.47527741
Albi  0.7284594  0.46988216
Gogo  0.6856238  0.30371673
Eslu  0.6097670  0.41713674
Pefl  0.6174938  0.50243339
Rham  0.9667597  0.58308133
Legi  0.9563545  0.51021506
Scer  0.9660788  0.55152417
Cyca  0.9585354  0.62960316
Titi  0.7245647  0.41753246
Abbr  1.0823537  0.68035414
Icme  1.1273075  0.78456940
Gyce  1.0742532  0.40811571
Ruru  0.7597479  0.15062704
Blbj  1.0951530  0.58135119
Alal  0.9938664  0.02649176
Anan  1.0089923  0.61329546

See Numerical Ecology in R (pp. 145 to 149) for information about the interpretation of a nMDSand the ordination diagrams shown below.

Ordination Diagrams

We create the ordination diagrammes as before, but new concepts introduced here are stress, Shepard plots, and goodness of fit (Figure 1). The stress indicates the scatter of observed dissimilarities against an expected monotone regression, while a Shepard diagram plots ordination distances against original dissimilarities, and adds a monotone or linear fit line to highlight this relationship. The stressplot() function also produces two fit statistics. The goodness-of-fit of the ordination is measured as the $R^{2}$ of either a linear or a non-linear regression of the nMDS distances on the original ones.

opar <- par(no.readonly = TRUE)
par(mfrow = c(2, 2))
stressplot(spe_nmds, main = "Shepard plot")
ordiplot(spe_nmds, type = "t", cex = 1.2,
         main = paste0("nMDS stress = ", round(spe_nmds$stress, 2)))
gof <- goodness(spe_nmds)
plot(spe_nmds, type = "t", cex = 1.2, main = "Goodness of fit")
points(spe_nmds, display = "sites", cex = gof * 200)
# ...bigger bubbles indicate a worse fit
par(opar)

Figure 1: nMDS ordination plots of the Doubs River species data showing a Shepard plot (top, left), the ordination diagram (top, right), and goodness of fit (bottom, right).

A good rule of thumb: stress <0.05 provides an excellent representation in reduced dimensions, <0.1 is great, <0.2 is so-so, and stress <0.3 provides a poor representation.

We can also build ordination plots from scratch to suit specific needs:

pl <- ordiplot(spe_nmds, type = "none", main = "nMDS fish abundances ")
points(pl, "sites", pch = 21, cex = 1.75, col = "grey80", bg = "grey80")
points(pl, "species", pch = 21, col = "turquoise", arrows = TRUE)
text(pl, "species", col = "blue4", cex = 0.9)
text(pl, "sites", col = "red4", cex = 0.9)

Figure 2: nMDS ordination plot of the Doubs River species data assembled from scratch.

Or we can fit response surfaces using ordisurf() and project environmental drivers (Figure 3):

palette(viridis(8))
opar <- par(no.readonly = TRUE)
par(mar = c(4, 4, 0.9, 0.5) + .1, mfrow = c(2, 2))

invisible(ordisurf(spe_nmds ~ Satr, data = spe, bubble = 3,
                   family = quasipoisson, knots = 2, col = 6,
                   display = "sites", main = "Salmo trutta fario"))
abline(h = 0, v = 0, lty = 3)

invisible(ordisurf(spe_nmds ~ Scer, data = spe, bubble = 3,
                   family = quasipoisson, knots = 2, col = 6,
                   display = "sites", main = "Scardinius erythrophthalmus"))
abline(h = 0, v = 0, lty = 3)

invisible(ordisurf(spe_nmds ~ Teso, data = spe, bubble = 3,
                   family = quasipoisson, knots = 2, col = 6,
                   display = "sites", main = "Telestes souffia"))
abline(h = 0, v = 0, lty = 3)

invisible(ordisurf(spe_nmds ~ Cogo, data = spe, bubble = 3,
                   family = quasipoisson, knots = 2, col = 6,
                   display = "sites", main = "Cottus gobio"))
abline(h = 0, v = 0, lty = 3)

env <- env[-8, ] # because we removed the eighth site in the spp data

# A posteriori projection of environmental variables in a CA
# The last plot produced (CA scaling 2) must be active
spe_nmds_env <- envfit(spe_nmds, env, scaling = 2) # Scaling 2 is default
plot(spe_nmds_env)

# Plot significant variables with a different colour
plot(spe_nmds_env, p.max = 0.05, col = "red")
par(opar)

Figure 3: nMDS ordination plots with species response surfaces of the Doubs River species data emphasising four species of fish: A) Satr, B) Scer, C) Teso, and D) Cogo. D) additionally has the environmental vectors projected on the plot, with the significant vectors shown in red.

References

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:

@online{smit,_a._j.2021,
  author = {Smit, A. J.,},
  title = {Non-Metric {Multidimensional} {Scaling} {(nMDS)}},
  date = {2021-01-01},
  url = {http://tangledbank.netlify.app/BCB743/nMDS.html},
  langid = {en}
}

For attribution, please cite this work as:

Smit, A. J. (2021) non-Metric Multidimensional Scaling (nMDS). http://tangledbank.netlify.app/BCB743/nMDS.html.

--- date: "2021-01-01" title: "non-Metric Multidimensional Scaling (nMDS)" ---  ::: callout-tip ## **Material required for this chapter** | Type | Name | Link | |----------------|------------------------------------------|-----------------------------------------------------------------------------------| | **Theory** | Numerical Ecology in R | See pages 145-151 | | **Slides** | nMDS lecture slides | [💾 `BCB743_11_nMDS.pdf`](../slides/BCB743_11_nMDS.pdf) | | **Data** | The Doubs River data | [💾 `Doubs.RData`](../data/NEwR-2ed_code_data/NeWR2-Data/Doubs.RData) | ::: ::: {.callout-important appearance="simple"} ## Tasks to complete in this Chapter * [Task F 1--5](assessments/Task_F.qmd) ::: Non-Metric Multidimensional Scaling (nMDS) is a rank-based indirect gradient analysis that uses a distance or dissimilarity matrix as its input (either pre-calculated using `vegdist()` or constructed internal to the `metaMDS()` function via the `dist` argument). Should one supply a 'raw' species × site table, the default dissimilarity matrix is Bray-Curtis dissimilarity, but any other dissimilarity index in `vegdist()` can be specified. Unlike other ordination methods such as Principal Component Analysis (PCA) and Correspondence Analysis (CA), which aim to maximise variance or correspondence between sites, nMDS focuses on representing the pairwise dissimilarities between sites in an ordination space. It does not use the raw distances or dissimilarities directly; instead, these values are replaced with their ranks, which is why the method is termed "non-metric." nMDS is the non-metric equivalent of Principal Coordinates Analysis (PCoA); in fact, PCoA is sometimes referred to as metric multidimensional scaling. PCoA and nMDS can both produce ordinations of objects from any distance or dissimilarity matrix. However, nMDS does not preserve the exact distances among objects in an ordination plot. Instead, it tries to represent the ordering (rank) relationships among objects as accurately as possible on a specified number of axes. This nonlinear mapping of dissimilarities onto a low-dimensional ordination space means that the Euclidean distances of points in the ordination space are rank-order similar to the original community dissimilarities. The ordination space in nMDS is metric, but the regression used to fit the dissimilarities to this space is non-metric. This makes nMDS more robust than the eigen-value methods, especially when the data are not well-represented by a specific distribution as may sometimes be the case for ecological data. As with PCoA, it can handle quantitative, semi-quantitative, qualitative, or mixed variables, so we can flexibly apply it to many ecological problems. A new concept, not seen in the eigen-approaches, is the idea of 'stress.' Stress quantifies the discrepancy between the observed dissimilarities and the distances in the ordination space. Stress is visually presented as the scatter of observed dissimilarities against the expected monotone regression. Lower stress values indicate a better fit of the data to the ordination space. Because rank orders of dissimilarities cannot be exactly preserved by rank-orders of ordination distances in low-dimensional space, *some* stress cannot be avoided. nMDS does have some limitations. The rank-based approach means that information about the magnitude of differences between site pairs is lost and this can be a disadvantage when the actual distances are important for interpretation. Also, nMDS can be computationally intensive with large datasets or when trying to minimise stress through numerous iterations. After performing nMDS, environmental interpretation can be facilitated using **vegan**'s `envfit()` and `ordisurf()` functions, as we have already seen in PCA, CA, and PCoA. As before, they allow for the fitting of environmental variables onto the ordination. This aids in visualising and understanding how the environmental variables influence the species ordination. Again, I must emphasis that this is not the same as doing a formal constrained ordination, which will be discussed in the next section (RDA and CCA). Additionally, if we require a statistical framework to assess the influence of categorical factors on the observed dissimilarities, we can use PERMANOVA (Permutational Multivariate Analysis of Variance) to test for differences that might be attributed to group effects. These ideas will be demonstrated below. ## Set-Up the Analysis Environment ```{r message=FALSE, warning=FALSE} library(tidyverse) library(vegan) library(viridis) # setting up a 'root' file path so I don't have to keep doing it later... root <- "../data/" ``` ## The Doubs River Data We continue to use the species data: ```{r} load(paste0(root, "NEwR-2ed_code_data/NEwR2-Data/Doubs.RData")) spe <- dplyr::slice(spe, -8) ``` ## Do the nMDS ```{r} spe_nmds <- metaMDS(spe, distance = "bray", trace = 0) # <1> spe_nmds ``` 1. I use `trace = 0` to suppress the output of the iterations. As always, reading the help file (accessible as `?vegan::metaMDS`) is invaluable, as it is for all other ordination techniques. There's a summary method available, but it is not particularly useful and I don't display the output here: ```{r} #| eval: false summary(spe_nmds) ``` Although `summary(spe_nmds)` does not return anything interesting, the species and site scores are nevertheless available directly through the `scores()` command, and they can be plotted as layer in **ggplot2** if need be: ```{r} scores(spe_nmds) ``` See *Numerical Ecology in R* (pp. 145 to 149) for information about the interpretation of a nMDSand the ordination diagrams shown below. ## Ordination Diagrams We create the ordination diagrammes as before, but new concepts introduced here are **stress**, **Shepard plots**, and **goodness of fit** (@fig-nmds-plots). The stress indicates the scatter of observed dissimilarities against an expected monotone regression, while a Shepard diagram plots ordination distances against original dissimilarities, and adds a monotone or linear fit line to highlight this relationship. The `stressplot()` function also produces two fit statistics. The goodness-of-fit of the ordination is measured as the $R^{2}$ of either a linear or a non-linear regression of the nMDS distances on the original ones. ```{r} #| fig-width: 10 #| fig-height: 10.75 #| fig.align: center #| fig.cap: "nMDS ordination plots of the Doubs River species data showing a Shepard plot (top, left), the ordination diagram (top, right), and goodness of fit (bottom, right)." #| label: fig-nmds-plots opar <- par(no.readonly = TRUE) par(mfrow = c(2, 2)) stressplot(spe_nmds, main = "Shepard plot") ordiplot(spe_nmds, type = "t", cex = 1.2, main = paste0("nMDS stress = ", round(spe_nmds$stress, 2))) gof <- goodness(spe_nmds) plot(spe_nmds, type = "t", cex = 1.2, main = "Goodness of fit") points(spe_nmds, display = "sites", cex = gof * 200) # ...bigger bubbles indicate a worse fit par(opar) ``` A good rule of thumb: stress <0.05 provides an excellent representation in reduced dimensions, <0.1 is great, <0.2 is so-so, and stress <0.3 provides a poor representation. We can also build ordination plots from scratch to suit specific needs: ```{r} #| fig-width: 5.625 #| fig-height: 5.625 #| fig.align: center #| fig.cap: "nMDS ordination plot of the Doubs River species data assembled from scratch." #| label: fig-pca-ordiplot pl <- ordiplot(spe_nmds, type = "none", main = "nMDS fish abundances ") points(pl, "sites", pch = 21, cex = 1.75, col = "grey80", bg = "grey80") points(pl, "species", pch = 21, col = "turquoise", arrows = TRUE) text(pl, "species", col = "blue4", cex = 0.9) text(pl, "sites", col = "red4", cex = 0.9) ``` Or we can fit response surfaces using `ordisurf()` and project environmental drivers (@fig-nmds-ordisurf): ```{r} #| fig-width: 10 #| fig-height: 10 #| fig.align: center #| fig.cap: "nMDS ordination plots with species response surfaces of the Doubs River species data emphasising four species of fish: A) Satr, B) Scer, C) Teso, and D) Cogo. D) additionally has the environmental vectors projected on the plot, with the significant vectors shown in red." #| label: fig-nmds-ordisurf palette(viridis(8)) opar <- par(no.readonly = TRUE) par(mar = c(4, 4, 0.9, 0.5) + .1, mfrow = c(2, 2)) invisible(ordisurf(spe_nmds ~ Satr, data = spe, bubble = 3, family = quasipoisson, knots = 2, col = 6, display = "sites", main = "Salmo trutta fario")) abline(h = 0, v = 0, lty = 3) invisible(ordisurf(spe_nmds ~ Scer, data = spe, bubble = 3, family = quasipoisson, knots = 2, col = 6, display = "sites", main = "Scardinius erythrophthalmus")) abline(h = 0, v = 0, lty = 3) invisible(ordisurf(spe_nmds ~ Teso, data = spe, bubble = 3, family = quasipoisson, knots = 2, col = 6, display = "sites", main = "Telestes souffia")) abline(h = 0, v = 0, lty = 3) invisible(ordisurf(spe_nmds ~ Cogo, data = spe, bubble = 3, family = quasipoisson, knots = 2, col = 6, display = "sites", main = "Cottus gobio")) abline(h = 0, v = 0, lty = 3) env <- env[-8, ] # because we removed the eighth site in the spp data # A posteriori projection of environmental variables in a CA # The last plot produced (CA scaling 2) must be active spe_nmds_env <- envfit(spe_nmds, env, scaling = 2) # Scaling 2 is default plot(spe_nmds_env) # Plot significant variables with a different colour plot(spe_nmds_env, p.max = 0.05, col = "red") par(opar) ``` ## References ::: {#refs} :::