---
date: last-modified
title: "11a: non-Metric Multidimensional Scaling (nMDS)"
format:
html:
code-link: false
---
```{r code-brewing-opts, echo=FALSE}
knitr::opts_chunk$set(
comment = "R>",
warning = FALSE,
message = FALSE,
fig.width = 4.5,
fig.height = 2.625,
out.width = "75%",
fig.asp = NULL, # control via width/height
dpi = 300
)
ggplot2::theme_set(
ggplot2::theme_minimal(base_size = 8)
)
ggplot2::theme_set(
ggplot2::theme_bw(base_size = 8)
)
```
```{r code-setup-hidden}
#| echo: false
library(tidyverse)
library(ggrepel)
```
<!-- # Topic 11: non-Metric multidimensional scaling (nMDS) -->
::: callout-tip
## **Material Required for This Chapter**
| Type | Name | Link |
| :--- | :--- | :--- |
| **Theory** | Numerical Ecology with R | See pages 145-151 |
| **Slides** | nMDS lecture slides | [💾 `BCB743_11_nMDS.pdf`](../slides/BCB743_11_nMDS.pdf) |
| **Data** | The Doubs River data | [💾 `Doubs.RData`](../data/BCB743/NEwR-2ed_code_data/NeWR2-Data/Doubs.RData) |
:::
::: {.callout-important appearance="simple"}
## Tasks to Complete in This Chapter
* [Task G 1--6](tasks/Task_G.qmd)
:::
Non-Metric Multidimensional Scaling (nMDS) is a rank-based indirect gradient analysis that uses a distance or dissimilarity matrix as its input (either pre-calculated using `vegdist()` or constructed internal to the `metaMDS()` function via the `dist` argument). Should one supply a 'raw' species × site table, the default dissimilarity matrix is Bray-Curtis dissimilarity, but any other dissimilarity index in `vegdist()` can be specified. Unlike other ordination methods such as Principal Component Analysis (PCA) and Correspondence Analysis (CA), which aim to maximise variance or correspondence between sites, nMDS focuses on representing the pairwise dissimilarities between sites in an ordination space. It does not use the raw distances or dissimilarities directly; instead, these values are replaced with their ranks, which is why the method is termed "non-metric."
nMDS is the non-metric equivalent of Principal Coordinates Analysis (PCoA); in fact, PCoA is sometimes referred to as metric multidimensional scaling. PCoA and nMDS can both produce ordinations of objects from any distance or dissimilarity matrix. However, nMDS does not preserve the exact distances among objects in an ordination plot. Instead, it tries to represent the ordering (rank) relationships among objects as accurately as possible on a specified number of axes. This non-linear mapping of dissimilarities onto a low-dimensional ordination space means that the Euclidean distances of points in the ordination space are rank-order similar to the original community dissimilarities.
The two methods sit one step apart, and the table fixes the distinction:
| PCoA | nMDS |
| :--- | :--- |
| preserves distances | preserves ranks |
| metric | non-metric |
| eigenvalue decomposition | iterative optimisation |
| keeps the distance magnitudes | keeps only the ordering |
| diagnostic: eigenvalues | diagnostic: stress |
"Preserving the ranks" is worth seeing directly. Suppose four pairs of sites have Bray-Curtis dissimilarities of 0.10, 0.30, 0.55, and 0.80. nMDS does not try to reproduce those values. It tries only to keep the pairs in the same order of similarity, so that the most similar pair stays closest together in the plot and the least similar pair stays furthest apart (@fig-nmds-rank). The spacing is free to change but the order is not.
```{r fig-nmds-rank}
#| fig-cap: "Rank preservation. Four site pairs are shown by their Bray-Curtis dissimilarity (left) and by the distance between the same pairs in the nMDS plot (right). The lines do not cross, so the order of the pairs is preserved, from most similar (A-B) to least similar (A-E). Their spacing changes, because nMDS keeps only the ordering and is free to stretch or shrink the gaps."
#| fig-width: 4.8
#| fig-height: 3.4
#| code-fold: true
rank_demo <- tibble(
pair = c("A-B", "A-C", "A-D", "A-E"),
`Bray-Curtis\ndissimilarity` = c(0.10, 0.30, 0.55, 0.80),
`distance in the\nnMDS plot` = c(0.30, 0.95, 1.25, 2.35)
) |>
pivot_longer(-pair, names_to = "stage", values_to = "val") |>
mutate(
stage = factor(
stage,
levels = c("Bray-Curtis\ndissimilarity", "distance in the\nnMDS plot")
)
)
ggplot(rank_demo, aes(stage, val, group = pair, colour = pair)) +
geom_line(linewidth = 0.7) +
geom_point(size = 2.6) +
geom_text(
data = filter(rank_demo, stage == "Bray-Curtis\ndissimilarity"),
aes(label = pair),
hjust = 1.3,
size = 3
) +
scale_x_discrete(expand = expansion(mult = c(0.3, 0.15))) +
labs(x = NULL, y = "value") +
theme(legend.position = "none")
```
The ordination space in nMDS is metric, but the regression used to fit the dissimilarities to this space is non-metric. Because it uses only the rank order of the dissimilarities, nMDS makes fewer assumptions about the structure of the distances than the eigen-methods do, which helps when ecological data depart from the distributional or geometric forms those methods expect. This is a difference in assumptions, not a verdict that nMDS is the better method: PCA remains the right tool for linear, Euclidean structure, and CA for $\chi^2$ structure. Like PCoA, nMDS can in principle take a dissimilarity built from any kind of variable, but in community ecology it is most often applied to species composition through measures such as Bray-Curtis, Jaccard, or Sørensen, and that is the use I develop here.
A new concept, not seen in the eigen-approaches, is the idea of 'stress.' Stress quantifies the discrepancy between the observed dissimilarities and the distances in the ordination space. Stress is visually presented as the scatter of observed dissimilarities against the expected monotone regression. Lower stress values indicate a better fit of the data to the ordination space. Because rank orders of dissimilarities cannot be exactly preserved by rank-orders of ordination distances in low-dimensional space, *some* stress cannot be avoided.
Stress is the nMDS counterpart of the goodness measures met in the earlier chapters, and it is read in the opposite direction, namely lower is better rather than higher. Each method reports how well its low-dimensional picture represents the data, under a different name:
| Method | Goodness measure |
| :--- | :--- |
| PCA | variance explained |
| CA | inertia explained |
| PCoA | eigenvalues |
| nMDS | stress (lower is better) |
nMDS does have some limitations. The rank-based approach means that information about the magnitude of differences between site pairs is lost and this can be a disadvantage when the actual distances are important for interpretation. Also, nMDS can be computationally intensive with large datasets or when trying to minimise stress through numerous iterations.
After performing nMDS, I can interpret environmental structure with **vegan**'s `envfit()` and `ordisurf()` functions, as I have already seen in PCA, CA, and PCoA. As before, these functions fit environmental variables onto the ordination. They help show how environmental variables are associated with the species ordination, but they do not turn nMDS into a formal constrained ordination. If I require a statistical framework to assess categorical effects on observed dissimilarities, I can use PERMANOVA (Permutational Multivariate Analysis of Variance). These ideas are shown below.
## Set-Up the Analysis Environment
```{r code-library-tidyverse, message=FALSE, warning=FALSE}
library(tidyverse)
library(vegan)
library(viridis)
library(ggrepel) # for tidy biplot labels
# Files will be referenced using here::here() for absolute paths
```
## The Doubs River Data
I continue to use the species data:
```{r code-load-here-here-data}
load(here::here(
"data",
"BCB743",
"NEwR-2ed_code_data",
"NEwR2-Data",
"Doubs.RData"
))
spe <- dplyr::slice(spe, -8)
```
## Do the nMDS
```{r code-spe-nmds-metamds-spe-distance}
spe_nmds <- metaMDS(spe, distance = "bray", trace = 0) # <1>
spe_nmds
```
1. I use `trace = 0` to suppress the output of the iterations.
By default, `metaMDS()` also has `autotransform = TRUE`, so it may transform raw community data before calculating dissimilarities. That is convenient for first-pass ordinations, but when you have already transformed the data or need exact agreement with another analysis, supply a precomputed distance matrix or set `autotransform = FALSE`.
::: {.callout-note appearance="simple"}
## Why "Best Solution from Try ..."?
Unlike the eigen-methods, `metaMDS()` does not solve for its axes in a single step. It searches, starting from many random configurations and keeping the one with the lowest stress, because the optimisation can settle into a local minimum that is not the best available. I suppressed the running log with `trace = 0`; without it, `metaMDS()` prints lines such as `Best solution was from try 7`. The same reason explains why running it again can return a plot that is rotated or mirrored: the orientation of the axes is arbitrary, only the configuration of points relative to one another carries meaning.
:::
As always, reading the help file (accessible as `?vegan::metaMDS`) is invaluable, as it is for all other ordination techniques.
There is a summary method available, but it is not particularly useful and I do not display the output here:
```{r code-summary-spe-nmds}
#| eval: false
summary(spe_nmds)
```
Although `summary(spe_nmds)` does not return anything interesting, the species and site scores are nevertheless available directly through the `scores()` command, and they can be plotted as a layer in **ggplot2** if needed:
```{r code-scores-spe-nmds}
scores(spe_nmds)
```
See *Numerical Ecology with R* (pp. 145 to 149) for information about the interpretation of a nMDS and the ordination diagrams shown below.
## Ordination Diagrams
I create the ordination diagrams as before, but new concepts introduced here are **stress**, **Shepard plots**, and **goodness of fit** (@fig-nmds-plots). The stress indicates the scatter of observed dissimilarities against an expected monotone regression, while a Shepard diagram plots ordination distances against original dissimilarities and adds a monotone or linear fit line to highlight this relationship. The `stressplot()` function also produces two fit statistics. The goodness-of-fit of the ordination is measured as the $R^{2}$ of either a linear or a non-linear regression of the nMDS distances on the original ones.
```{r fig-nmds-plots}
#| fig-width: 10
#| fig-height: 10.75
#| fig.align: center
#| fig.cap: "nMDS ordination plots of the Doubs River species data showing a Shepard plot (top, left), the ordination diagram (top, right), and goodness of fit (bottom, right)."
opar <- par(no.readonly = TRUE)
par(mfrow = c(2, 2))
stressplot(spe_nmds, main = "Shepard plot")
ordiplot(
spe_nmds,
type = "t",
cex = 1.2,
main = paste0("nMDS stress = ", round(spe_nmds$stress, 2))
)
gof <- goodness(spe_nmds)
plot(spe_nmds, type = "t", cex = 1.2, main = "Goodness of fit")
points(spe_nmds, display = "sites", cex = gof * 200)
# ...bigger bubbles indicate a worse fit
par(opar)
```
In the goodness-of-fit panel each site is drawn as a bubble whose size reflects how poorly its dissimilarities to the other sites are reproduced in the ordination. Large bubbles flag sites whose placement is a compromise, so their position should be read with more caution than that of the small-bubble sites.
A good rule of thumb (after Clarke 1993): stress <0.05 is an excellent representation in reduced dimensions, <0.1 is good, <0.2 is usable but only the broad pattern should be trusted, 0.2–0.3 is poor, and stress >0.3 is close to arbitrary.
::: {.callout-note appearance="simple"}
## What Stress Means
Interpret stress as a mismatch between the fitted ordination distances and the monotone relationship to the original dissimilarities, not as a percentage of variance explained. Lower is better, and the thresholds above are rough guides rather than pass marks. A stress of 0.12 is not a failure; it says the community is a little too complex to flatten perfectly into two dimensions. Treat the number as a measure of how far to trust the fine detail of the plot, not as a score to be minimised for its own sake.
:::
### How to Read This nMDS
The diagnostics tell me the ordination is trustworthy; the next question is what it means ecologically. @fig-nmds-interp redraws the ordination with the sites coloured from source to mouth and the distinctive species labelled. The axis is oriented so that the source lies on the left.
```{r fig-nmds-interp}
#| fig-cap: "An annotated nMDS (Bray-Curtis) of the Doubs fish data. Points are sites, coloured from the source (dark) to the mouth (yellow); green crosses are species, labelled where they sit away from the crowded centre. The first axis separates the upstream and downstream faunas."
#| fig-width: 6.5
#| fig-height: 5
#| code-fold: true
nmds_sites <- as.data.frame(scores(spe_nmds, display = "sites"))
nmds_sites$site <- as.integer(rownames(nmds_sites))
nmds_spp <- as.data.frame(scores(spe_nmds, display = "species"))
nmds_spp$lab <- rownames(nmds_spp)
# the orientation of nMDS axes is arbitrary; flip so the source (site 1) is on the left
flip <- if (nmds_sites$NMDS1[nmds_sites$site == 1] > 0) -1 else 1
nmds_sites$NMDS1 <- nmds_sites$NMDS1 * flip
nmds_spp$NMDS1 <- nmds_spp$NMDS1 * flip
nmds_spp$d <- sqrt(nmds_spp$NMDS1^2 + nmds_spp$NMDS2^2)
nmds_lab <- nmds_spp[nmds_spp$d > quantile(nmds_spp$d, 0.45), ]
ggplot(nmds_sites, aes(NMDS1, NMDS2)) +
geom_hline(yintercept = 0, colour = "grey85") +
geom_vline(xintercept = 0, colour = "grey85") +
geom_point(aes(colour = site), size = 2) +
scale_colour_viridis_c(name = "Site\n(1 = source)") +
geom_point(
data = nmds_spp,
aes(NMDS1, NMDS2),
colour = "seagreen4",
shape = 3,
size = 0.8
) +
geom_text_repel(
data = nmds_lab,
aes(NMDS1, NMDS2, label = lab),
colour = "seagreen4",
size = 2.5,
max.overlaps = Inf,
segment.colour = "grey80"
) +
annotate(
"segment",
x = min(nmds_sites$NMDS1) * 0.95,
y = min(nmds_sites$NMDS2) * 1.08,
xend = max(nmds_sites$NMDS1) * 0.95,
yend = min(nmds_sites$NMDS2) * 1.08,
arrow = arrow(length = unit(2.5, "mm")),
colour = "firebrick"
) +
annotate(
"text",
x = 0,
y = min(nmds_sites$NMDS2) * 1.2,
label = "upstream to downstream (NMDS1)",
colour = "firebrick",
size = 2.7
) +
labs(x = "NMDS1", y = "NMDS2") +
coord_equal()
```
The reading is the gradient I have met in every ordination of these data:
- **The sites fall along the first axis in river order.** `metaMDS()` rotates its solution so that the widest spread lies on NMDS1, and here that spread is the source-to-mouth sequence, recovered from species composition alone.
- **The two faunas separate along NMDS1.** The cool upper river holds trout-zone species, brown trout (`Satr`), grayling (`Thth`), bullhead (`Cogo`), and minnow (`Phph`); the lower river holds lowland species such as bream (`Abbr`), silver bream (`Blbj`), ruffe (`Gyce`), and eel (`Anan`).
- **NMDS1 is therefore the upstream-to-downstream river gradient**, the same gradient recovered by the [PCA](PCA.qmd), [CA](CA.qmd), and [PCoA](PCoA.qmd). The environmental vectors fitted below confirm it: dissolved oxygen and elevation point towards the trout end, and distance from source, nitrate, and organic load towards the lowland end.
- **The low stress (about 0.07) means the two-dimensional picture is faithful**, so the gradient can be read from it with confidence.
This ordination is descriptive. When nMDS is used beside PERMANOVA, also check group dispersions with `betadisper()`/PERMDISP, as shown in the [diatom case study](nMDS_diatoms.qmd), because apparent group separation can reflect differences in spread rather than differences in centroid location.
I can also build ordination plots from scratch to suit specific needs:
```{r code-pca-ordiplot}
#| fig-width: 5.625
#| fig-height: 5.625
#| fig.align: center
#| fig.cap: "nMDS ordination plot of the Doubs River species data assembled from scratch."
pl <- ordiplot(spe_nmds, type = "none", main = "nMDS fish abundances ")
points(pl, "sites", pch = 21, cex = 1.75, col = "grey80", bg = "grey80")
points(pl, "species", pch = 21, col = "turquoise", arrows = TRUE)
text(pl, "species", col = "blue4", cex = 0.9)
text(pl, "sites", col = "red4", cex = 0.9)
```
Or I can fit response surfaces using `ordisurf()` and project environmental drivers (@fig-nmds-ordisurf):
```{r fig-nmds-ordisurf}
#| fig-width: 10
#| fig-height: 10
#| fig.align: center
#| fig.cap: "nMDS ordination plots with species response surfaces of the Doubs River species data emphasising four species of fish: A) Satr, B) Scer, C) Teso, and D) Cogo. D) additionally has the environmental vectors projected on the plot, with the significant vectors shown in red."
palette(viridis(8))
opar <- par(no.readonly = TRUE)
par(mar = c(4, 4, 0.9, 0.5) + .1, mfrow = c(2, 2))
invisible(ordisurf(
spe_nmds ~ Satr,
data = spe,
bubble = 3,
family = quasipoisson,
knots = 2,
col = 6,
display = "sites",
main = "Salmo trutta fario"
))
abline(h = 0, v = 0, lty = 3)
invisible(ordisurf(
spe_nmds ~ Scer,
data = spe,
bubble = 3,
family = quasipoisson,
knots = 2,
col = 6,
display = "sites",
main = "Scardinius erythrophthalmus"
))
abline(h = 0, v = 0, lty = 3)
invisible(ordisurf(
spe_nmds ~ Teso,
data = spe,
bubble = 3,
family = quasipoisson,
knots = 2,
col = 6,
display = "sites",
main = "Telestes souffia"
))
abline(h = 0, v = 0, lty = 3)
invisible(ordisurf(
spe_nmds ~ Cogo,
data = spe,
bubble = 3,
family = quasipoisson,
knots = 2,
col = 6,
display = "sites",
main = "Cottus gobio"
))
abline(h = 0, v = 0, lty = 3)
env <- env[-8, ] # because we removed the eighth site in the spp data
# A posteriori projection of environmental variables onto the nMDS
# The last nMDS plot produced must be active
spe_nmds_env <- envfit(spe_nmds, env) # fit environmental vectors onto the ordination
plot(spe_nmds_env)
# Plot significant variables with a different colour
plot(spe_nmds_env, p.max = 0.05, col = "red")
par(opar)
```
## Four Methods, One Goal
PCA, CA, PCoA, and nMDS all seek a low-dimensional picture of multivariate ecological data, and they differ in what they hold fixed. PCA and CA begin with the data matrix and preserve a particular geometry, Euclidean and $\chi^2$ respectively. PCoA begins with a distance matrix and preserves the distances themselves. nMDS also begins with a distance matrix, but preserves only the rank ordering of the distances, trading exact magnitudes for freedom from distributional assumptions. That trade is why nMDS has become one of the most widely used unconstrained ordination methods in community ecology, particularly with Bray-Curtis dissimilarities of species composition.
All four are *unconstrained*: they recover whatever gradients the species or environmental data happen to contain, without testing them against an external hypothesis. The next step asks a sharper question, namely not what gradients exist, but how much of the community pattern a measured set of environmental variables can account for. That is the task of constrained ordination, [Redundancy Analysis (RDA) and Canonical Correspondence Analysis (CCA)](constrained_ordination.qmd), which follow next.
## References
::: {#refs}
:::