non-Metric Multidimensional Scaling (nMDS)

Author

Affiliation

Published

January 1, 2021

Material required for this chapter

Type	Name	Link
Theory	Numerical Ecology in R	See pages 145-151
Slides	nMDS lecture slides	💾 `BCB743_11_nMDS.pdf`
Data	The Doubs River data	💾 `Doubs.RData`

Tasks to complete in this Chapter

Task F 1–5

Non-Metric Multidimensional Scaling (nMDS) is a rank-based indirect gradient analysis that uses a distance or dissimilarity matrix as its input (either pre-calculated using vegdist() or constructed internal to the metaMDS() function via the dist argument). Should one supply a ‘raw’ species × site table, the default dissimilarity matrix is Bray-Curtis dissimilarity, but any other dissimilarity index in vegdist() can be specified. Unlike other ordination methods such as Principal Component Analysis (PCA) and Correspondence Analysis (CA), which aim to maximise variance or correspondence between sites, nMDS focuses on representing the pairwise dissimilarities between sites in an ordination space. It does not use the raw distances or dissimilarities directly; instead, these values are replaced with their ranks, which is why the method is termed “non-metric.”

nMDS is the non-metric equivalent of Principal Coordinates Analysis (PCoA); in fact, PCoA is sometimes referred to as metric multidimensional scaling. PCoA and nMDS can both produce ordinations of objects from any distance or dissimilarity matrix. However, nMDS does not preserve the exact distances among objects in an ordination plot. Instead, it tries to represent the ordering (rank) relationships among objects as accurately as possible on a specified number of axes. This nonlinear mapping of dissimilarities onto a low-dimensional ordination space means that the Euclidean distances of points in the ordination space are rank-order similar to the original community dissimilarities.

The ordination space in nMDS is metric, but the regression used to fit the dissimilarities to this space is non-metric. This makes nMDS more robust than the eigen-value methods, especially when the data are not well-represented by a specific distribution as may sometimes be the case for ecological data. As with PCoA, it can handle quantitative, semi-quantitative, qualitative, or mixed variables, so we can flexibly apply it to many ecological problems.

A new concept, not seen in the eigen-approaches, is the idea of ‘stress.’ Stress quantifies the discrepancy between the observed dissimilarities and the distances in the ordination space. Stress is visually presented as the scatter of observed dissimilarities against the expected monotone regression. Lower stress values indicate a better fit of the data to the ordination space. Because rank orders of dissimilarities cannot be exactly preserved by rank-orders of ordination distances in low-dimensional space, some stress cannot be avoided.

nMDS does have some limitations. The rank-based approach means that information about the magnitude of differences between site pairs is lost and this can be a disadvantage when the actual distances are important for interpretation. Also, nMDS can be computationally intensive with large datasets or when trying to minimise stress through numerous iterations.

After performing nMDS, environmental interpretation can be facilitated using vegan’s envfit() and ordisurf() functions, as we have already seen in PCA, CA, and PCoA. As before, they allow for the fitting of environmental variables onto the ordination. This aids in visualising and understanding how the environmental variables influence the species ordination. Again, I must emphasis that this is not the same as doing a formal constrained ordination, which will be discussed in the next section (RDA and CCA). Additionally, if we require a statistical framework to assess the influence of categorical factors on the observed dissimilarities, we can use PERMANOVA (Permutational Multivariate Analysis of Variance) to test for differences that might be attributed to group effects. These ideas will be demonstrated below.

Set-Up the Analysis Environment

library(tidyverse)
library(vegan)
library(viridis)

# Files will be referenced using here::here() for absolute paths

The Doubs River Data

We continue to use the species data:

load(here::here("data", "NEwR-2ed_code_data", "NEwR2-Data", "Doubs.RData"))
spe <- dplyr::slice(spe, -8)

Do the nMDS

1spe_nmds <- metaMDS(spe, distance = "bray", trace = 0)
spe_nmds

1: I use trace = 0 to suppress the output of the iterations.


Call:
metaMDS(comm = spe, distance = "bray", trace = 0) 

global Multidimensional Scaling using monoMDS

Data:     spe 
Distance: bray 

Dimensions: 2 
Stress:     0.07376221 
Stress type 1, weak ties
Best solution was repeated 2 times in 20 tries
The best solution was from try 11 (random start)
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on 'spe'

As always, reading the help file (accessible as ?vegan::metaMDS) is invaluable, as it is for all other ordination techniques.

There’s a summary method available, but it is not particularly useful and I don’t display the output here:

summary(spe_nmds)

Although summary(spe_nmds) does not return anything interesting, the species and site scores are nevertheless available directly through the scores() command, and they can be plotted as layer in ggplot2 if need be:

scores(spe_nmds)

$sites
         NMDS1       NMDS2
1  -1.78749017  0.81742694
2  -1.14365897 -0.15623577
3  -1.00358116 -0.07477727
4  -0.62137240 -0.07857296
5   0.07185211  0.45924489
6  -0.42772369 -0.15281550
7  -0.87060554 -0.23019721
8  -0.01413906 -0.86260556
9  -0.52651717 -0.40390700
10 -1.00851298 -0.37662162
11 -0.97239466 -0.16375387
12 -1.16272023  0.09690746
13 -0.80295464  0.12358526
14 -0.49318857  0.17930066
15 -0.18906339  0.27563879
16  0.08270340  0.12241188
17  0.29909458  0.11255720
18  0.44582156  0.14953201
19  0.76715097  0.28460546
20  0.86210607  0.37774450
21  0.95504161  0.44491303
22  0.75402254 -1.44440335
23  1.13004847 -0.63118404
24  0.85949610 -0.87220639
25  0.93320742  0.12360293
26  0.97500428  0.36365957
27  1.02353178  0.37384704
28  0.79606470  0.55879950
29  1.06877701  0.58350341

$species
          NMDS1       NMDS2
Cogo -0.9122203 -0.10506160
Satr -1.1123745 -0.22563306
Phph -0.7890658 -0.32113890
Babl -0.5341407 -0.28971105
Thth -0.9380692 -0.09761992
Teso -0.5249314  0.16330198
Chna  0.9014755  0.36633730
Pato  0.5170700  0.38511415
Lele  0.3301036  0.28554031
Sqce  0.3584920 -0.15439076
Baba  0.7036266  0.47491059
Albi  0.7279500  0.47025960
Gogo  0.6850514  0.30436765
Eslu  0.6093182  0.42159315
Pefl  0.6169944  0.50353418
Rham  0.9664251  0.58288174
Legi  0.9559752  0.50938127
Scer  0.9655902  0.55203449
Cyca  0.9583386  0.62827976
Titi  0.7239586  0.41807849
Abbr  1.0821714  0.67985975
Icme  1.1278169  0.78239244
Gyce  1.0740996  0.40762430
Ruru  0.7592002  0.15178841
Blbj  1.0948356  0.58060244
Alal  0.9933672  0.02796742
Anan  1.0088438  0.61233301

See Numerical Ecology in R (pp. 145 to 149) for information about the interpretation of a nMDSand the ordination diagrams shown below.

Ordination Diagrams

We create the ordination diagrammes as before, but new concepts introduced here are stress, Shepard plots, and goodness of fit (Figure 1). The stress indicates the scatter of observed dissimilarities against an expected monotone regression, while a Shepard diagram plots ordination distances against original dissimilarities, and adds a monotone or linear fit line to highlight this relationship. The stressplot() function also produces two fit statistics. The goodness-of-fit of the ordination is measured as the $R^{2}$ of either a linear or a non-linear regression of the nMDS distances on the original ones.

opar <- par(no.readonly = TRUE)
par(mfrow = c(2, 2))
stressplot(spe_nmds, main = "Shepard plot")
ordiplot(spe_nmds, type = "t", cex = 1.2,
         main = paste0("nMDS stress = ", round(spe_nmds$stress, 2)))
gof <- goodness(spe_nmds)
plot(spe_nmds, type = "t", cex = 1.2, main = "Goodness of fit")
points(spe_nmds, display = "sites", cex = gof * 200)
# ...bigger bubbles indicate a worse fit
par(opar)

Figure 1: nMDS ordination plots of the Doubs River species data showing a Shepard plot (top, left), the ordination diagram (top, right), and goodness of fit (bottom, right).

A good rule of thumb: stress <0.05 provides an excellent representation in reduced dimensions, <0.1 is great, <0.2 is so-so, and stress <0.3 provides a poor representation.

We can also build ordination plots from scratch to suit specific needs:

pl <- ordiplot(spe_nmds, type = "none", main = "nMDS fish abundances ")
points(pl, "sites", pch = 21, cex = 1.75, col = "grey80", bg = "grey80")
points(pl, "species", pch = 21, col = "turquoise", arrows = TRUE)
text(pl, "species", col = "blue4", cex = 0.9)
text(pl, "sites", col = "red4", cex = 0.9)

Figure 2: nMDS ordination plot of the Doubs River species data assembled from scratch.

Or we can fit response surfaces using ordisurf() and project environmental drivers (Figure 3):

palette(viridis(8))
opar <- par(no.readonly = TRUE)
par(mar = c(4, 4, 0.9, 0.5) + .1, mfrow = c(2, 2))

invisible(ordisurf(spe_nmds ~ Satr, data = spe, bubble = 3,
                   family = quasipoisson, knots = 2, col = 6,
                   display = "sites", main = "Salmo trutta fario"))
abline(h = 0, v = 0, lty = 3)

invisible(ordisurf(spe_nmds ~ Scer, data = spe, bubble = 3,
                   family = quasipoisson, knots = 2, col = 6,
                   display = "sites", main = "Scardinius erythrophthalmus"))
abline(h = 0, v = 0, lty = 3)

invisible(ordisurf(spe_nmds ~ Teso, data = spe, bubble = 3,
                   family = quasipoisson, knots = 2, col = 6,
                   display = "sites", main = "Telestes souffia"))
abline(h = 0, v = 0, lty = 3)

invisible(ordisurf(spe_nmds ~ Cogo, data = spe, bubble = 3,
                   family = quasipoisson, knots = 2, col = 6,
                   display = "sites", main = "Cottus gobio"))
abline(h = 0, v = 0, lty = 3)

env <- env[-8, ] # because we removed the eighth site in the spp data

# A posteriori projection of environmental variables in a CA
# The last plot produced (CA scaling 2) must be active
spe_nmds_env <- envfit(spe_nmds, env, scaling = 2) # Scaling 2 is default
plot(spe_nmds_env)

# Plot significant variables with a different colour
plot(spe_nmds_env, p.max = 0.05, col = "red")
par(opar)

Figure 3: nMDS ordination plots with species response surfaces of the Doubs River species data emphasising four species of fish: A) Satr, B) Scer, C) Teso, and D) Cogo. D) additionally has the environmental vectors projected on the plot, with the significant vectors shown in red.

References

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:

@online{smit,_a._j.2021,
  author = {Smit, A. J.,},
  title = {Non-Metric {Multidimensional} {Scaling} {(nMDS)}},
  date = {2021-01-01},
  url = {http://tangledbank.netlify.app/BCB743/nMDS.html},
  langid = {en}
}

For attribution, please cite this work as:

Smit, A. J. (2021) non-Metric Multidimensional Scaling (nMDS). http://tangledbank.netlify.app/BCB743/nMDS.html.

--- date: "2021-01-01" title: "non-Metric Multidimensional Scaling (nMDS)" ---  ::: callout-tip ## **Material required for this chapter** | Type | Name | Link | |----------------|------------------------------------------|-----------------------------------------------------------------------------------| | **Theory** | Numerical Ecology in R | See pages 145-151 | | **Slides** | nMDS lecture slides | [💾 `BCB743_11_nMDS.pdf`](../slides/BCB743_11_nMDS.pdf) | | **Data** | The Doubs River data | [💾 `Doubs.RData`](../data/NEwR-2ed_code_data/NeWR2-Data/Doubs.RData) | ::: ::: {.callout-important appearance="simple"} ## Tasks to complete in this Chapter * [Task F 1--5](tasks/Task_F.qmd) ::: Non-Metric Multidimensional Scaling (nMDS) is a rank-based indirect gradient analysis that uses a distance or dissimilarity matrix as its input (either pre-calculated using `vegdist()` or constructed internal to the `metaMDS()` function via the `dist` argument). Should one supply a 'raw' species × site table, the default dissimilarity matrix is Bray-Curtis dissimilarity, but any other dissimilarity index in `vegdist()` can be specified. Unlike other ordination methods such as Principal Component Analysis (PCA) and Correspondence Analysis (CA), which aim to maximise variance or correspondence between sites, nMDS focuses on representing the pairwise dissimilarities between sites in an ordination space. It does not use the raw distances or dissimilarities directly; instead, these values are replaced with their ranks, which is why the method is termed "non-metric." nMDS is the non-metric equivalent of Principal Coordinates Analysis (PCoA); in fact, PCoA is sometimes referred to as metric multidimensional scaling. PCoA and nMDS can both produce ordinations of objects from any distance or dissimilarity matrix. However, nMDS does not preserve the exact distances among objects in an ordination plot. Instead, it tries to represent the ordering (rank) relationships among objects as accurately as possible on a specified number of axes. This nonlinear mapping of dissimilarities onto a low-dimensional ordination space means that the Euclidean distances of points in the ordination space are rank-order similar to the original community dissimilarities. The ordination space in nMDS is metric, but the regression used to fit the dissimilarities to this space is non-metric. This makes nMDS more robust than the eigen-value methods, especially when the data are not well-represented by a specific distribution as may sometimes be the case for ecological data. As with PCoA, it can handle quantitative, semi-quantitative, qualitative, or mixed variables, so we can flexibly apply it to many ecological problems. A new concept, not seen in the eigen-approaches, is the idea of 'stress.' Stress quantifies the discrepancy between the observed dissimilarities and the distances in the ordination space. Stress is visually presented as the scatter of observed dissimilarities against the expected monotone regression. Lower stress values indicate a better fit of the data to the ordination space. Because rank orders of dissimilarities cannot be exactly preserved by rank-orders of ordination distances in low-dimensional space, *some* stress cannot be avoided. nMDS does have some limitations. The rank-based approach means that information about the magnitude of differences between site pairs is lost and this can be a disadvantage when the actual distances are important for interpretation. Also, nMDS can be computationally intensive with large datasets or when trying to minimise stress through numerous iterations. After performing nMDS, environmental interpretation can be facilitated using **vegan**'s `envfit()` and `ordisurf()` functions, as we have already seen in PCA, CA, and PCoA. As before, they allow for the fitting of environmental variables onto the ordination. This aids in visualising and understanding how the environmental variables influence the species ordination. Again, I must emphasis that this is not the same as doing a formal constrained ordination, which will be discussed in the next section (RDA and CCA). Additionally, if we require a statistical framework to assess the influence of categorical factors on the observed dissimilarities, we can use PERMANOVA (Permutational Multivariate Analysis of Variance) to test for differences that might be attributed to group effects. These ideas will be demonstrated below. ## Set-Up the Analysis Environment ```{r message=FALSE, warning=FALSE} library(tidyverse) library(vegan) library(viridis) # Files will be referenced using here::here() for absolute paths ``` ## The Doubs River Data We continue to use the species data: ```{r} load(here::here("data", "NEwR-2ed_code_data", "NEwR2-Data", "Doubs.RData")) spe <- dplyr::slice(spe, -8) ``` ## Do the nMDS ```{r} spe_nmds <- metaMDS(spe, distance = "bray", trace = 0) # <1> spe_nmds ``` 1. I use `trace = 0` to suppress the output of the iterations. As always, reading the help file (accessible as `?vegan::metaMDS`) is invaluable, as it is for all other ordination techniques. There's a summary method available, but it is not particularly useful and I don't display the output here: ```{r} #| eval: false summary(spe_nmds) ``` Although `summary(spe_nmds)` does not return anything interesting, the species and site scores are nevertheless available directly through the `scores()` command, and they can be plotted as layer in **ggplot2** if need be: ```{r} scores(spe_nmds) ``` See *Numerical Ecology in R* (pp. 145 to 149) for information about the interpretation of a nMDSand the ordination diagrams shown below. ## Ordination Diagrams We create the ordination diagrammes as before, but new concepts introduced here are **stress**, **Shepard plots**, and **goodness of fit** (@fig-nmds-plots). The stress indicates the scatter of observed dissimilarities against an expected monotone regression, while a Shepard diagram plots ordination distances against original dissimilarities, and adds a monotone or linear fit line to highlight this relationship. The `stressplot()` function also produces two fit statistics. The goodness-of-fit of the ordination is measured as the $R^{2}$ of either a linear or a non-linear regression of the nMDS distances on the original ones. ```{r} #| fig-width: 10 #| fig-height: 10.75 #| fig.align: center #| fig.cap: "nMDS ordination plots of the Doubs River species data showing a Shepard plot (top, left), the ordination diagram (top, right), and goodness of fit (bottom, right)." #| label: fig-nmds-plots opar <- par(no.readonly = TRUE) par(mfrow = c(2, 2)) stressplot(spe_nmds, main = "Shepard plot") ordiplot(spe_nmds, type = "t", cex = 1.2, main = paste0("nMDS stress = ", round(spe_nmds$stress, 2))) gof <- goodness(spe_nmds) plot(spe_nmds, type = "t", cex = 1.2, main = "Goodness of fit") points(spe_nmds, display = "sites", cex = gof * 200) # ...bigger bubbles indicate a worse fit par(opar) ``` A good rule of thumb: stress <0.05 provides an excellent representation in reduced dimensions, <0.1 is great, <0.2 is so-so, and stress <0.3 provides a poor representation. We can also build ordination plots from scratch to suit specific needs: ```{r} #| fig-width: 5.625 #| fig-height: 5.625 #| fig.align: center #| fig.cap: "nMDS ordination plot of the Doubs River species data assembled from scratch." #| label: fig-pca-ordiplot pl <- ordiplot(spe_nmds, type = "none", main = "nMDS fish abundances ") points(pl, "sites", pch = 21, cex = 1.75, col = "grey80", bg = "grey80") points(pl, "species", pch = 21, col = "turquoise", arrows = TRUE) text(pl, "species", col = "blue4", cex = 0.9) text(pl, "sites", col = "red4", cex = 0.9) ``` Or we can fit response surfaces using `ordisurf()` and project environmental drivers (@fig-nmds-ordisurf): ```{r} #| fig-width: 10 #| fig-height: 10 #| fig.align: center #| fig.cap: "nMDS ordination plots with species response surfaces of the Doubs River species data emphasising four species of fish: A) Satr, B) Scer, C) Teso, and D) Cogo. D) additionally has the environmental vectors projected on the plot, with the significant vectors shown in red." #| label: fig-nmds-ordisurf palette(viridis(8)) opar <- par(no.readonly = TRUE) par(mar = c(4, 4, 0.9, 0.5) + .1, mfrow = c(2, 2)) invisible(ordisurf(spe_nmds ~ Satr, data = spe, bubble = 3, family = quasipoisson, knots = 2, col = 6, display = "sites", main = "Salmo trutta fario")) abline(h = 0, v = 0, lty = 3) invisible(ordisurf(spe_nmds ~ Scer, data = spe, bubble = 3, family = quasipoisson, knots = 2, col = 6, display = "sites", main = "Scardinius erythrophthalmus")) abline(h = 0, v = 0, lty = 3) invisible(ordisurf(spe_nmds ~ Teso, data = spe, bubble = 3, family = quasipoisson, knots = 2, col = 6, display = "sites", main = "Telestes souffia")) abline(h = 0, v = 0, lty = 3) invisible(ordisurf(spe_nmds ~ Cogo, data = spe, bubble = 3, family = quasipoisson, knots = 2, col = 6, display = "sites", main = "Cottus gobio")) abline(h = 0, v = 0, lty = 3) env <- env[-8, ] # because we removed the eighth site in the spp data # A posteriori projection of environmental variables in a CA # The last plot produced (CA scaling 2) must be active spe_nmds_env <- envfit(spe_nmds, env, scaling = 2) # Scaling 2 is default plot(spe_nmds_env) # Plot significant variables with a different colour plot(spe_nmds_env, p.max = 0.05, col = "red") par(opar) ``` ## References ::: {#refs} :::