10: Principal Coordinate Analysis (PCoA)

Published

2026/06/15

TipMaterial Required for This Chapter
Type Name Link
Theory Numerical Ecology with R See pages 140-145
Slides CA lecture slides 💾 BCB743_10_PCoA.pdf
Data The Doubs River data 💾 Doubs.RData
The seaweed environmental data 💾 SeaweedEnv.RData
The seaweed bioregion classification 💾 bioregions.csv.
ImportantTasks to Complete in This Chapter

Suppose two river sites share not a single species. Ecologically they could hardly be more different, and a dissimilarity measure such as Bray-Curtis records that difference exactly. PCA cannot ordinate a Bray-Curtis matrix: it is built on Euclidean distance, just as CA is built on \(\chi^2\) distance, so each fixes the geometry in advance. Principal Coordinates Analysis (PCoA) removes that constraint. It takes a dissimilarity matrix, computed with whatever measure is ecologically appropriate, and finds the low-dimensional arrangement of sites that best preserves those dissimilarities. So, I shift from ordinating variables to ordinating distances, and that is the principle behind this chapter.

PCoA is also known as Classical Multidimensional Scaling (MDS). It is an ordination technique used to analyse and represent multivariate data based on a (dis)similarity matrix. I use PCoA when it is necessary to specify dissimilarity measures other than Euclidean distance or \(\chi^2\) distance, as in PCA, and CA, respectively.

Unlike PCA and CA, which operate directly on the raw data matrix, PCoA takes a (dis)similarity matrix as input. This matrix can be calculated using various dissimilarity measures available in vegan’s vegdist() function, which may be necessary when my dataset includes quantitative, semi-quantitative, qualitative, and mixed variables. If the dissimilarities are Euclidean distances among centred observations, PCoA results are equivalent to those obtained from PCA, up to ordinary differences in scaling, signs, and rotation.

That flexibility is what places PCoA in the wider ordination landscape. The choice of method follows from the kind of difference between sites that is ecologically relevant:

Situation Suitable method
Continuous environmental variables PCA
Species abundances under \(\chi^2\) geometry CA
Any ecological distance, such as Bray-Curtis PCoA
Rank-order dissimilarities nMDS

The workflow is always the same, so I build a dissimilarity matrix from the data, then ordinate the matrix (Figure 1). The first step is where the ecology enters, since that is where you choose how two sites are to be compared.

flowchart TD
  A["Species table<br/>(sites × species)"] --> B["Choose a dissimilarity<br/>(e.g. Bray-Curtis)"]
  B --> C["Distance matrix<br/>(site × site)"]
  C --> D["PCoA"]
  D --> E["Ordination plot"]
Figure 1: The PCoA workflow. The ecological decision is the choice of dissimilarity at the second step; everything after it is mechanical.

PCoA performs a dimensionality reduction on the (dis)similarity matrix, scaling the dissimilarities, and returning a set of points in a lower-dimensional space (typically 2D, or 3D). When plotted, the Euclidean distances between these points approximate the original dissimilarities, effectively representing the dissimilarities between objects as Euclidean distances in a lower-dimensional space. This representation can be helpful in visualising, and interpreting complex relationships in a more interpretable form.

Conceptually, PCoA is similar to PCA and CA in that it aims to represent the relationships between objects in a lower-dimensional space. However, it differs in its approach to preserving distances. While PCA preserves Euclidean distances between objects and CA preserves \(\chi^2\) distances, PCoA starts from whichever (dis)similarity measure is provided as input. If that dissimilarity is Euclidean, the full set of real axes can reproduce it exactly; if it is not Euclidean, PCoA gives the best real-coordinate representation and may report negative eigenvalues for the part that cannot be embedded.

The cleanest way to hold the relationship in mind is that PCoA generalises the geometry of PCA. PCA is the special case of PCoA that you obtain when the chosen dissimilarity is Euclidean:

PCA PCoA
starts with variables starts with distances
Euclidean distance only any dissimilarity measure
needs the raw variables a distance matrix is enough
a special case of distance ordination the general distance ordination

In PCoA, the eigenvalues represent the extent to which each principal coordinate axis captures the variability in the original dissimilarity matrix. The proportion of a given positive eigenvalue to the sum of positive eigenvalues indicates the relative importance of each real axis, but this proportion must be read carefully when the dissimilarity is non-Euclidean and negative eigenvalues are present. Higher positive eigenvalues represent axes that capture more of the embeddable structure in the data, helping identify the strongest gradients in the dataset.

When capscale() is given a raw community table and a distance method, it can add species scores as weighted averages of the site scores (wascores). If I instead supply only a precomputed dissimilarity matrix, that species information is no longer available from the input and the species scores are absent.

Set-up the Analysis Environment

library(tidyverse)
library(vegan)
library(ggrepel) # for tidy biplot labels

# Files will be referenced using here::here() for absolute paths

The Doubs River Data

I continue to use the species data:

load(here::here(
  "data",
  "BCB743",
  "NEwR-2ed_code_data",
  "NEwR2-Data",
  "Doubs.RData"
))
# remove the 8th row because it sums to zero
spe <- dplyr::slice(spe, -8)

Calculate a Suitable Dissimilarity Matrix

You may or may not want to calculate a dissimilarity index upfront (see below). Here I calculate the Bray-Curtis dissimilarity which is appropriate for abundance data:

spe_bray <- vegdist(spe)

Do the PCoA

The book Numerical Ecology with R uses a built-in function cmdscale() or the function pcoa() in ape for its PCoA calculation. The vegan function capscale() can also be used for PCoA, and this is the approach I take here. The ‘CAP’ in capscale() stands for ‘Canonical Analysis of Principal Coordinates’. capscale() works differently from rda() or cca() in that I specify the input through a formula interface (see ?capscale). To run a PCoA without constraints I put 1 on the right-hand side of the formula (the community data go on the left). The same interface also accepts predictors on the right, which is how capscale() performs constrained analysis (db-RDA).

The cleanest way to run it is to hand capscale() the raw species table and let it compute the dissimilarities. Supplying the raw table lets the function add species scores (as weighted averages of the site scores, or wascores), so the ordination places species as well as sites; a precomputed matrix carries no species information, and the scores are then missing. I built spe_bray above to make the dissimilarity-first logic clear, but capscale() can construct it internally, which is what I do here:

spe_pcoa <- capscale(spe ~ 1, distance = "bray")
spe_pcoa

Call: capscale(formula = spe ~ 1, distance = "bray")

              Inertia Rank
Total          6.7621
RealTotal      7.0583
Unconstrained  7.0583   17
Imaginary     -0.2963

Inertia is squared Bray distance

-- NOTE:
Species scores projected from 'spe'

Eigenvalues for unconstrained axes:
 MDS1  MDS2  MDS3  MDS4  MDS5  MDS6  MDS7  MDS8
3.695 1.098 0.710 0.415 0.305 0.192 0.157 0.132
(Showing 8 of 17 unconstrained eigenvalues)
NoteWhy Are There Negative (“Imaginary”) Eigenvalues?

The printout reports a small Imaginary component (here about \(-0.30\), against a real total of about \(7.06\)). This is expected, and is not a sign of error. Bray-Curtis, and most ecological dissimilarities, are not Euclidean, so they cannot be embedded perfectly in real coordinate space; PCoA represents the part that will not embed as negative eigenvalues. When the negative component is small relative to the leading positive axes, the ordination on the first two or three axes is usually still useful, but the proportions of variation need to be read with that limitation in mind.

If the negative component is large enough to affect interpretation, there are standard corrections. A square-root transformation of the dissimilarities can often make semi-metric distances closer to Euclidean. Cailliez and Lingoes corrections add constants so the corrected matrix has no negative eigenvalues; these are available through ape::pcoa(correction = "cailliez") or ape::pcoa(correction = "lingoes"), and base R’s cmdscale(add = TRUE) applies a Cailliez-style additive correction. The correction should be reported because it changes the geometry being analysed. One practical consequence of leaving negative eigenvalues in place is that the broken-stick test (bstick()) cannot be computed, which is why the scree plot below is drawn by hand.

The summary() now produces output that should look familiar from PCA and CA, with the Species scores present because the raw table was supplied:

summary(spe_pcoa)

Call:
capscale(formula = spe ~ 1, distance = "bray")

Partitioning of squared Bray distance:
              Inertia Proportion
Total           7.058          1
Unconstrained   7.058          1

Eigenvalues, and their contribution to the squared Bray distance

Importance of components:
                        MDS1   MDS2   MDS3    MDS4    MDS5    MDS6    MDS7
Eigenvalue            3.6953 1.0985 0.7105 0.41497 0.30456 0.19179 0.15697
Proportion Explained  0.5235 0.1556 0.1007 0.05879 0.04315 0.02717 0.02224
Cumulative Proportion 0.5235 0.6792 0.7798 0.83862 0.88177 0.90894 0.93118
                         MDS8    MDS9   MDS10    MDS11    MDS12   MDS13
Eigenvalue            0.13191 0.12943 0.08668 0.046158 0.038645 0.02746
Proportion Explained  0.01869 0.01834 0.01228 0.006539 0.005475 0.00389
Cumulative Proportion 0.94987 0.96820 0.98048 0.987023 0.992498 0.99639
                         MDS14    MDS15     MDS16     MDS17
Eigenvalue            0.013065 0.007088 0.0040395 0.0013006
Proportion Explained  0.001851 0.001004 0.0005723 0.0001843
Cumulative Proportion 0.998239 0.999243 0.9998157 1.0000000

How to Read This PCoA

The detailed plotting comes later; the result itself comes first. Figure 2 redraws the ordination with the sites coloured from source to mouth and the most distinctive species labelled.

Code
pos_eig <- spe_pcoa$CA$eig[spe_pcoa$CA$eig > 0]
pcoa_pct <- round(100 * pos_eig / sum(pos_eig), 1)

pcoa_sites <- as.data.frame(scores(spe_pcoa, display = "sites", choices = 1:2))
pcoa_sites$site <- as.integer(rownames(pcoa_sites))
pcoa_spp <- as.data.frame(scores(spe_pcoa, display = "species", choices = 1:2))
pcoa_spp$lab <- rownames(pcoa_spp)
pcoa_spp$d <- sqrt(pcoa_spp$MDS1^2 + pcoa_spp$MDS2^2)
pcoa_lab <- pcoa_spp[pcoa_spp$d > quantile(pcoa_spp$d, 0.45), ]

ggplot(pcoa_sites, aes(MDS1, MDS2)) +
  geom_hline(yintercept = 0, colour = "grey85") +
  geom_vline(xintercept = 0, colour = "grey85") +
  geom_point(aes(colour = site), size = 2) +
  scale_colour_viridis_c(name = "Site\n(1 = source)") +
  geom_point(
    data = pcoa_spp,
    aes(MDS1, MDS2),
    colour = "seagreen4",
    shape = 3,
    size = 0.8
  ) +
  geom_text_repel(
    data = pcoa_lab,
    aes(MDS1, MDS2, label = lab),
    colour = "seagreen4",
    size = 2.5,
    max.overlaps = Inf,
    segment.colour = "grey80"
  ) +
  annotate(
    "segment",
    x = -1.15,
    y = -1.5,
    xend = 1.0,
    yend = -1.5,
    arrow = arrow(length = unit(2.5, "mm")),
    colour = "firebrick"
  ) +
  annotate(
    "text",
    x = -0.1,
    y = -1.7,
    label = "upstream to downstream (MDS1)",
    colour = "firebrick",
    size = 2.7
  ) +
  labs(
    x = paste0("MDS1 (", pcoa_pct[1], "%)"),
    y = paste0("MDS2 (", pcoa_pct[2], "%)")
  ) +
  coord_equal()
Figure 2: An annotated PCoA (Bray-Curtis) of the Doubs fish data. Points are sites, coloured from the source (dark) to the mouth (yellow); green crosses are species, labelled where they sit away from the crowded centre. MDS1 orders the sites along the river, as the arrow shows.

The ecological interpretation is the gradient I have met throughout the module:

  • MDS1 has about 52% of the positive inertia, so it is the dominant pattern.
  • The sites fall along MDS1 in river order, from the source at one end to the mouth at the other. The Bray-Curtis ordination recovers the river sequence from species composition alone.
  • The two faunas separate along MDS1. The upstream end holds cool-water trout-zone species, brown trout (Satr) and minnow (Phph) among them; the downstream end holds lowland species such as bleak (Alal), roach (Ruru), and ruffe (Gyce).
  • MDS1 is therefore the upstream-to-downstream river gradient, the same gradient recovered by the PCA of the environmental data and the CA of the fish. Fitting the environmental vectors (below) confirms it: dissolved oxygen points towards the trout end, and distance from source, nitrate, and organic load towards the lowland end.

The ecological story is the point of the analysis. Two other ways to call the function, and how to recover species scores when a precomputed matrix leaves them out, are given in a note further down.

Eigenvalues and How Many Axes to Retain

I can unpack what is inside the results, and there I can see that I can access the eigenvalues as I did for PCA, and CA:

str(spe_pcoa) # not shown due to length of output

The percentage inertia explained by the first three axes is therefore:

round(sum(spe_pcoa$CA$eig[1:3]) / sum(spe_pcoa$CA$eig) * 100, 2)
[1] 77.98
# The `bstick()` function is not compatible with PCoA
# when negative eigenvalues are present
# Plot the scree plot without the broken stick model

# Extract eigenvalues
eigenvalues <- spe_pcoa$CA$eig

# Calculate the proportion of positive ordination variance explained
pos_eig <- eigenvalues[eigenvalues > 0]
variance_explained <- pos_eig / sum(pos_eig)

# Create a scree plot
plot(
  variance_explained,
  type = "b",
  main = "Scree Plot",
  xlab = "Principal Components",
  ylab = "Prop. Var. Explained"
)
Figure 3: Scree plot of the Doubs River fish species PCoA.

The scree plot (Figure 3) shows the proportion of variation explained by the PC axes. In this case, I will still only retain the first two axes. I selected these because after the 2nd PC, the proportion of variance explained by each additional PC is less than 10%, and the plot starts levelling off.

See Numerical Ecology with R (pp. 140 to 145) for information about the interpretation of a PCoA and the ordination diagrams shown below.

Ordination Diagrams

I create the ordination diagrammes as before:

opar <- par(no.readonly = TRUE)
par(mfrow = c(1, 2))
plot(spe_pcoa, scaling = 1, main = "PCoA fish abundances - biplot scaling 1")
plot(spe_pcoa, scaling = 2, main = "PCoA fish abundances - biplot scaling 2")
par(opar)
Figure 4: PCoA ordination plot of the Doubs River species data showing site scaling (left) and species scaling (right).

Scaling 1 and scaling 2 is the same as in PCA and CA.

The plots above work okay, but I can improve them. Note that you can also apply these improvements to PCA, and CA ordinations. Let me build plots from scratch:

opar <- par(no.readonly = TRUE)
par(mfrow = c(1, 2))
pl1 <- ordiplot(
  spe_pcoa,
  type = "none",
  scaling = 1,
  main = "PCoA fish abundances - biplot scaling 1"
)
points(pl1, "sites", pch = 21, cex = 1.75, col = "grey80", bg = "grey80")
points(pl1, "species", pch = 21, col = "turquoise", arrows = TRUE)
text(pl1, "species", col = "blue4", cex = 0.9)
text(pl1, "sites", col = "red4", cex = 0.9)

pl2 <- ordiplot(
  spe_pcoa,
  type = "none",
  scaling = 2,
  main = "PCoA fish abundances - biplot scaling 2"
)
points(pl2, "sites", pch = 21, cex = 1.75, col = "grey80", bg = "grey80")
points(pl2, "species", pch = 21, col = "turquoise", arrows = TRUE)
text(pl2, "species", col = "blue4", cex = 0.9)
text(pl2, "sites", col = "red4", cex = 0.9)

PCoA ordination plot made with ordiplot() of the Doubs River species data showing site scaling (left) and species scaling (right).

PCoA ordination plot made with ordiplot() of the Doubs River species data showing site scaling (left) and species scaling (right).
par(opar)

I can also fit response surfaces using ordisurf():

require('viridis')
palette(viridis(8))

opar <- par(no.readonly = TRUE)
par(mar = c(4, 4, 0.9, 0.5) + .1, mfrow = c(2, 2))
with(
  spe,
  tmp <- ordisurf(
    spe_pcoa ~ Satr,
    bubble = 3,
    family = quasipoisson,
    knots = 2,
    col = 6,
    display = "sites",
    main = "Salmo trutta fario"
  )
)
abline(h = 0, v = 0, lty = 3)
with(
  spe,
  tmp <- ordisurf(
    spe_pcoa ~ Scer,
    bubble = 3,
    family = quasipoisson,
    knots = 2,
    col = 6,
    display = "sites",
    main = "Scardinius erythrophthalmus"
  )
)
abline(h = 0, v = 0, lty = 3)
with(
  spe,
  tmp <- ordisurf(
    spe_pcoa ~ Teso,
    bubble = 3,
    family = quasipoisson,
    knots = 2,
    col = 6,
    display = "sites",
    main = "Telestes souffia"
  )
)
abline(h = 0, v = 0, lty = 3)
with(
  spe,
  tmp <- ordisurf(
    spe_pcoa ~ Cogo,
    bubble = 3,
    family = quasipoisson,
    knots = 2,
    col = 6,
    display = "sites",
    main = "Cottus gobio"
  )
)
abline(h = 0, v = 0, lty = 3)

env <- dplyr::slice(env, -8)

(spe_pcoa_env <- envfit(spe_pcoa, env, scaling = 2))

***VECTORS

        MDS1     MDS2     r2 Pr(>r)
dfs  0.99710 -0.07609 0.7210  0.001 ***
ele -0.99807  0.06208 0.5659  0.001 ***
slo -0.92225 -0.38660 0.1078  0.129
dis  0.99746  0.07129 0.5324  0.001 ***
pH  -0.42673  0.90438 0.0480  0.508
har  0.98804 -0.15417 0.2769  0.027 *
pho  0.45343 -0.89129 0.6912  0.002 **
nit  0.86338 -0.50456 0.6117  0.001 ***
amm  0.42719 -0.90416 0.7076  0.002 **
oxy -0.76847  0.63989 0.7639  0.001 ***
bod  0.43152 -0.90210 0.8561  0.001 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Permutation: free
Number of permutations: 999
plot(spe_pcoa_env, col = "grey40")
plot(spe_pcoa_env, p.max = 0.05, col = "red")
par(opar)
Figure 5: PCoA ordination plots with species response surfaces of the Doubs River species data emphasising four species of fish: A) Satr, B) Scer, C) Teso, and D) Cogo. D) additionally has the environmental vectors projected on the plot, with the significant vectors shown in red.

Above I gave capscale() the raw species table, which is the approach I recommend because it keeps the species scores. Two alternatives are worth knowing.

Supply a precomputed dissimilarity matrix. If I hand capscale() the Bray-Curtis matrix I built earlier, the ordination of the sites is identical, but the Species scores are missing, since the function sees only the site-by-site distances and never the species themselves:

# spe_pcoa <- cmdscale(spe_bray, k = nrow(spe) - 1, eig = TRUE)
spe_pcoa <- capscale(spe_bray ~ 1)
spe_pcoa

Call: capscale(formula = spe_bray ~ 1)

              Inertia Rank
Total          6.7621
RealTotal      7.0583
Unconstrained  7.0583   17
Imaginary     -0.2963

Inertia is squared Bray distance

Eigenvalues for unconstrained axes:
 MDS1  MDS2  MDS3  MDS4  MDS5  MDS6  MDS7  MDS8
3.695 1.098 0.710 0.415 0.305 0.192 0.157 0.132
(Showing 8 of 17 unconstrained eigenvalues)
summary(spe_pcoa)

Call:
capscale(formula = spe_bray ~ 1)

Partitioning of squared Bray distance:
              Inertia Proportion
Total           7.058          1
Unconstrained   7.058          1

Eigenvalues, and their contribution to the squared Bray distance

Importance of components:
                        MDS1   MDS2   MDS3    MDS4    MDS5    MDS6    MDS7
Eigenvalue            3.6953 1.0985 0.7105 0.41497 0.30456 0.19179 0.15697
Proportion Explained  0.5235 0.1556 0.1007 0.05879 0.04315 0.02717 0.02224
Cumulative Proportion 0.5235 0.6792 0.7798 0.83862 0.88177 0.90894 0.93118
                         MDS8    MDS9   MDS10    MDS11    MDS12   MDS13
Eigenvalue            0.13191 0.12943 0.08668 0.046158 0.038645 0.02746
Proportion Explained  0.01869 0.01834 0.01228 0.006539 0.005475 0.00389
Cumulative Proportion 0.94987 0.96820 0.98048 0.987023 0.992498 0.99639
                         MDS14    MDS15     MDS16     MDS17
Eigenvalue            0.013065 0.007088 0.0040395 0.0013006
Proportion Explained  0.001851 0.001004 0.0005723 0.0001843
Cumulative Proportion 0.998239 0.999243 0.9998157 1.0000000

Add the species back with sppscores(). When a precomputed matrix is unavoidable, e.g. a Gower matrix from another package (see Handling Mixed Variable Types, below), the species scores can be restored afterwards:

spe_pcoa <- capscale(spe_bray ~ 1)
sppscores(spe_pcoa) <- spe

summary(spe_pcoa)

Call:
capscale(formula = spe_bray ~ 1)

Partitioning of squared Bray distance:
              Inertia Proportion
Total           7.058          1
Unconstrained   7.058          1

Eigenvalues, and their contribution to the squared Bray distance

Importance of components:
                        MDS1   MDS2   MDS3    MDS4    MDS5    MDS6    MDS7
Eigenvalue            3.6953 1.0985 0.7105 0.41497 0.30456 0.19179 0.15697
Proportion Explained  0.5235 0.1556 0.1007 0.05879 0.04315 0.02717 0.02224
Cumulative Proportion 0.5235 0.6792 0.7798 0.83862 0.88177 0.90894 0.93118
                         MDS8    MDS9   MDS10    MDS11    MDS12   MDS13
Eigenvalue            0.13191 0.12943 0.08668 0.046158 0.038645 0.02746
Proportion Explained  0.01869 0.01834 0.01228 0.006539 0.005475 0.00389
Cumulative Proportion 0.94987 0.96820 0.98048 0.987023 0.992498 0.99639
                         MDS14    MDS15     MDS16     MDS17
Eigenvalue            0.013065 0.007088 0.0040395 0.0013006
Proportion Explained  0.001851 0.001004 0.0005723 0.0001843
Cumulative Proportion 0.998239 0.999243 0.9998157 1.0000000

Handling Mixed Variable Types

The simplest way to handle mixed variable types is to plot the factor variable of interest as a differently shaped or coloured symbol on the ordination diagram. I will show this with the seaweed dataset.

First, I construct an environmental dataset that contains some mixed variables by column binding a dataset of seawater temperatures, and a bioregional classification of the 58 coastal sections:

bioreg <- read.csv(
  here::here("data", "BCB743", "seaweed", "bioregions.csv"),
  header = TRUE
)
load(here::here("data", "BCB743", "seaweed", "SeaweedEnv.RData"))
E <- cbind(bioreg, env) %>%
  mutate(
    spal.prov = factor(spal.prov),
    spal.ecoreg = factor(spal.ecoreg),
    lombard = factor(lombard),
    bolton = factor(bolton)
  )
head(E)
  spal.prov spal.ecoreg lombard bolton  febMean   febMax   febMed   febX95
1       BMP          NE   NamBR    BMP 13.00117 18.72044 12.66004 16.80969
2       BMP          NE   NamBR    BMP 13.37950 18.61897 13.18389 17.07242
3       BMP          NE   NamBR    BMP 13.36163 17.86458 13.23187 16.61114
4       BMP          NE   NamBR    BMP 13.28966 17.12073 13.10284 16.12137
5       BMP          NE   NamBR    BMP 12.81128 16.37829 12.40032 15.53240
6       BMP          NE   NamBR    BMP 12.40247 15.96730 11.75096 15.21999
  febRange  augMean   augMin   augMed    augX5 augRange  annMean    annSD
1 6.070326 11.75228 9.812431 11.82838 10.12598 2.502093 12.33503 1.255298
2 5.889300 11.57731 9.739288 11.61312 10.08165 2.973370 12.38795 1.401646
3 5.431383 11.29382 9.619388 11.26842 10.01617 3.084130 12.24332 1.474712
4 5.049024 11.13296 9.567049 11.02333 10.03277 2.995822 12.15410 1.505176
5 4.977916 11.23448 9.624302 10.99935 10.17375 2.940255 11.94613 1.449530
6 5.142721 11.50199 9.757004 11.15880 10.38581 2.925087 11.83773 1.385862
   annRange    febSD     augSD    annChl    augChl   febChl
1 1.2488912 1.625917 0.7665420  2.623040 11.070480 8.884580
2 1.8021850 1.753863 0.8969112  4.903870  8.760170 8.401560
3 2.0678127 1.703917 0.9408326  3.723187  8.356506 6.718254
4 2.1567012 1.593944 0.9393490  4.165980  4.164904 3.727157
5 1.5767921 1.517366 0.9542671  8.020257  8.765154 8.786165
6 0.9004776 1.501801 0.9768441 12.882601  7.591975 9.160030
str(E)
'data.frame':   58 obs. of  22 variables:
 $ spal.prov  : Factor w/ 2 levels "AMP","BMP": 2 2 2 2 2 2 2 2 2 2 ...
 $ spal.ecoreg: Factor w/ 2 levels "ABE","NE": 2 2 2 2 2 2 2 2 2 2 ...
 $ lombard    : Factor w/ 4 levels "ABR","NamBR",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ bolton     : Factor w/ 4 levels "AMP","B-ATZ",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ febMean    : num  13 13.4 13.4 13.3 12.8 ...
 $ febMax     : num  18.7 18.6 17.9 17.1 16.4 ...
 $ febMed     : num  12.7 13.2 13.2 13.1 12.4 ...
 $ febX95     : num  16.8 17.1 16.6 16.1 15.5 ...
 $ febRange   : num  6.07 5.89 5.43 5.05 4.98 ...
 $ augMean    : num  11.8 11.6 11.3 11.1 11.2 ...
 $ augMin     : num  9.81 9.74 9.62 9.57 9.62 ...
 $ augMed     : num  11.8 11.6 11.3 11 11 ...
 $ augX5      : num  10.1 10.1 10 10 10.2 ...
 $ augRange   : num  2.5 2.97 3.08 3 2.94 ...
 $ annMean    : num  12.3 12.4 12.2 12.2 11.9 ...
 $ annSD      : num  1.26 1.4 1.47 1.51 1.45 ...
 $ annRange   : num  1.25 1.8 2.07 2.16 1.58 ...
 $ febSD      : num  1.63 1.75 1.7 1.59 1.52 ...
 $ augSD      : num  0.767 0.897 0.941 0.939 0.954 ...
 $ annChl     : num  2.62 4.9 3.72 4.17 8.02 ...
 $ augChl     : num  11.07 8.76 8.36 4.16 8.77 ...
 $ febChl     : num  8.88 8.4 6.72 3.73 8.79 ...

In Smit et al. (2017) I used forward selection and the assessment of VIF to find only the non-collinear variables, which included augMean, febRange, febSD and augSD as the most parsimonious descriptors. I will do a PCoA of the seaweed environmental data (only the subset indicated above) and colour the sites by the bioregion:

E_pcoa <- capscale(env[, c("augMean", "febRange", "febSD", "augSD")] ~ 1)

col <- c("firebrick1", "seagreen4", "blue2", "goldenrod2")
pch <- c(17, 19)

opar <- par(no.readonly = TRUE)
ordiplot(
  E_pcoa,
  type = "n",
  scaling = 1,
  xlim = c(-1.8, 2),
  ylim = c(-2.8, 1.2),
  main = "PCoA of seaweed env. data"
)
points(E_pcoa, "sites", pch = 21, cex = 1.75, col = col[E$bolton], bg = "white")
text(E_pcoa, "sites", col = col[E$bolton], cex = 0.5)
Figure 6: PCoA ordination plot of the seaweed environmental data with sites coloured by bioregion.

The arrangement of sites in the ordination diagram (Figure 6) is still only affected by augMean, febRange, febSD, and augSD and colour is used only to identify the sites by bioregion. Colour does not affect the outcome of the ordination, and yet I can already see that sites show a clear bioregional grouping. Please see the analysis of the Mayombo diatom dataset for additional approaches to deal with categorical variables that are presumed to be influential. To formally account for bioregion in the ordination, I must take a different approach.

The second option is to use a suitable dissimilarity metric. Numerical, nominal, ordinal, and binary variables can all be accommodated with the Gower distance. I do not use vegan for this, but rather the daisy() function in cluster.

I calculate the Gower distances and proceed with the PCoA as before:

library(cluster)

# cannot use mixed var  types
# E_gower <- vegdist(E, method = "gower")

# can handle mixed var types... use instead of vegdist() gower dissimilarity
E_gower <- daisy(E, metric = "gower")

summary(E_gower)
1653 dissimilarities, summarized :
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.006058 0.181880 0.344160 0.321890 0.443730 0.724140
Metric :  mixed ;  Types = N, N, N, N, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I
Number of objects : 58
E_mat <- as.matrix(E_gower)
E_mat[1:5, 1:5]
           1          2          3          4          5
1 0.00000000 0.03893923 0.05586573 0.09502716 0.06723042
2 0.03893923 0.00000000 0.02753347 0.06586077 0.04354363
3 0.05586573 0.02753347 0.00000000 0.04284923 0.04719106
4 0.09502716 0.06586077 0.04284923 0.00000000 0.06826655
5 0.06723042 0.04354363 0.04719106 0.06826655 0.00000000
E_pcoa <- capscale(E_mat ~ 1)

# sadly this means that the names in the Species scores are now missing
summary(E_pcoa)

Call:
capscale(formula = E_mat ~ 1)

Partitioning of squared Unknown distance:
              Inertia Proportion
Total            3.93          1
Unconstrained    3.93          1

Eigenvalues, and their contribution to the squared Unknown distance

Importance of components:
                        MDS1   MDS2    MDS3    MDS4    MDS5    MDS6    MDS7
Eigenvalue            2.2311 1.0530 0.16179 0.11635 0.08678 0.05990 0.05202
Proportion Explained  0.5678 0.2680 0.04117 0.02961 0.02208 0.01524 0.01324
Cumulative Proportion 0.5678 0.8358 0.87693 0.90654 0.92863 0.94387 0.95711
                         MDS8     MDS9    MDS10    MDS11    MDS12    MDS13
Eigenvalue            0.04274 0.023523 0.019153 0.016634 0.013035 0.010275
Proportion Explained  0.01088 0.005986 0.004874 0.004233 0.003317 0.002615
Cumulative Proportion 0.96799 0.973971 0.978845 0.983079 0.986396 0.989011
                         MDS14    MDS15    MDS16    MDS17    MDS18     MDS19
Eigenvalue            0.008646 0.006938 0.005866 0.005202 0.004133 0.0033437
Proportion Explained  0.002200 0.001766 0.001493 0.001324 0.001052 0.0008509
Cumulative Proportion 0.991211 0.992976 0.994469 0.995793 0.996845 0.9976959
                          MDS20     MDS21     MDS22    MDS23     MDS24
Eigenvalue            0.0029563 0.0020816 0.0012713 0.001041 0.0007704
Proportion Explained  0.0007523 0.0005297 0.0003235 0.000265 0.0001961
Cumulative Proportion 0.9984482 0.9989779 0.9993015 0.999567 0.9997626
                          MDS25     MDS26     MDS27
Eigenvalue            3.718e-04 2.940e-04 2.671e-04
Proportion Explained  9.463e-05 7.482e-05 6.798e-05
Cumulative Proportion 9.999e-01 9.999e-01 1.000e+00

I can extract the various kinds of scores for manual plotting.

Three Methods, Three Geometries

PCA, CA, and PCoA all reduce dimensionality, and they differ in the geometry they preserve. PCA preserves Euclidean distances, CA preserves \(\chi^2\) distances, and PCoA preserves whichever dissimilarity the analyst supplies. Because a Euclidean dissimilarity reduces PCoA to PCA, PCoA is the general method and the other two are particular choices of distance. This freedom to choose the distance on ecological grounds is what makes PCoA one of the most widely used ordination methods in community ecology, and it is the foundation for the distance-based methods that follow. The next chapter takes up non-metric Multidimensional Scaling (nMDS), which starts from the same kind of dissimilarity matrix but abandons the eigenvalue machinery in favour of preserving the rank order of the distances.

References

Reuse

Citation

BibTeX citation:
@online{smit2026,
  author = {Smit, A. J.},
  title = {10: {Principal} {Coordinate} {Analysis} {(PCoA)}},
  date = {2026-06-15},
  url = {https://tangledbank.netlify.app/BCB743/PCoA.html},
  langid = {en}
}
For attribution, please cite this work as:
Smit AJ (2026) 10: Principal Coordinate Analysis (PCoA). https://tangledbank.netlify.app/BCB743/PCoA.html.