# install.packages("remotes")
remotes::install_github("ajsmit/spesim")
library(spesim)
# Load a complete example configuration, then adjust a few fields
P <- load_config(system.file(
  "examples/spesim_init_basic.txt",
  package = "spesim"
))
P$N_SPECIES <- 10
P$N_INDIVIDUALS <- 2000
P$SAMPLING_SCHEME <- "random"
P$N_QUADRATS <- 20
# Run the full workflow; keep it in memory for a quick look
res <- spesim_run(P, write_outputs = FALSE, seed = P$SEED)
# A publication-ready map of domain, individuals, and quadrats
plot_spatial_sampling(res$domain, res$species_dist, res$quadrats, res$P)
# The objects you would normally analyse
str(res$abund_matrix) # site x species abundance matrix
head(res$site_coords) # quadrat centroids
head(res$env_gradients) # gridded environmental fields
spesim
| Package | spesim, Spatial Sampling Simulation for Heterogeneous Ecological Communities |
|---|---|
| Author | AJ Smit |
| Source | github.com/ajsmit/spesim |
| Documentation | ajsmit.github.io/spesim |
| Install | remotes::install_github("ajsmit/spesim") |
The code on this page is illustrative and is not run when the site is built, because spesim is installed from GitHub rather than CRAN. Copy the snippets into your own session after installing the package.
What spesim is
spesim simulates ecological communities in space and then samples them. You set the number of species, their relative abundances, the position of individuals in a landscape, the species responses to environmental gradients, and the quadrat design used to sample them. The package then returns the kinds of objects used in community ecology, namely a site-by-species abundance matrix, a table of per-quadrat environmental conditions, and a set of maps and diagnostic plots.
In field studies the generating process is hidden. You measure a community, fit an ordination or a model, and infer the gradients or processes that may have produced the pattern. A simulator reverses that relationship. Because you set the gradient, the species optima, the degree of clustering, and the sampling design, you can ask a question that observational data rarely answer directly: did the method recover the structure I imposed?
For BCB743, the package has three main uses, in order of emphasis: teaching, methods testing, and exploratory research.
Why a simulator belongs in BCB743
BCB743 is mainly concerned with inference from community data, including correlation and association, distance and dissimilarity, ordination, clustering, and regression-type models. Each method builds on assumptions about how the data were generated, and each can mislead when those assumptions fail. Familiar examples are the horseshoe in principal component analysis (PCA), the arch in correspondence analysis (CA), the gradient-length rule used to choose between linear and unimodal ordination, and the effect of joint absences on Euclidean distances. Each shows how a method behaves when it meets a particular ecological structure (Whittaker 1967; Legendre and Legendre 2012).
spesim lets you specify that structure directly. Set a single long environmental gradient with unimodal species responses, sample it, and run PCA. The horseshoe is then produced from data whose generating conditions you specified. Shorten the gradient and the curvature weakens; lengthen it and the horseshoe becomes more pronounced. The method’s behaviour becomes easier to interpret because the generating process is known.
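The idea can be tried without the package at all. The sketch below is a base-R stand-in, not spesim code: it assumes a one-dimensional gradient, evenly spaced species optima, a common tolerance, and Poisson counts, all of which are illustrative choices. It builds a site-by-species matrix from Gaussian responses and runs `prcomp()` on it.

```r
set.seed(1)

n_sites   <- 50
n_species <- 20
gradient  <- seq(0, 1, length.out = n_sites)     # known site positions
optima    <- seq(0, 1, length.out = n_species)   # assumed species optima
tolerance <- 0.08                                # narrow niches mimic a long gradient

# Expected abundance of each species at each site (Gaussian response),
# then Poisson counts around that expectation
mu    <- 50 * exp(-outer(gradient, optima, "-")^2 / (2 * tolerance^2))
abund <- matrix(rpois(length(mu), mu), nrow = n_sites)

# PCA on the raw abundances; sites are joined in gradient order
pca    <- prcomp(abund, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]
plot(scores, type = "b", xlab = "PC1", ylab = "PC2",
     main = "Sites joined in gradient order")
```

Increasing `tolerance` (for example to 0.5) corresponds to a shorter gradient relative to niche breadth, so rerunning the block with a larger value is a quick way to see how the curve responds.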
The same approach also supports the practical question that the Integrative Assignment and later research projects often raise, namely which sampling design, and how much sampling effort, is enough? By placing quadrats at random, along transects, on a systematic grid, or by Voronoi tessellation over the same simulated community, you can compare what each design recovers before committing time and money to a field season.
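A design comparison of this kind can be sketched in base R. The block below does not use spesim's own samplers; the unit-square domain, the community of 2000 individuals, the quadrat size, and the helper `count_quadrats()` are all hypothetical and exist only for illustration. It places 16 quadrats at random and on a systematic grid over the same simulated individuals and counts how many species each design detects.

```r
set.seed(42)

# Hypothetical community: 10 species sorted along the x-axis of a unit square
n_ind   <- 2000
n_sp    <- 10
x       <- runif(n_ind)
y       <- runif(n_ind)
optima  <- seq(0.05, 0.95, length.out = n_sp)
weights <- exp(-outer(x, optima, "-")^2 / (2 * 0.1^2))
species <- apply(weights, 1, function(p) sample.int(n_sp, 1, prob = p))

# Count individuals per species inside square quadrats centred on (cx, cy)
count_quadrats <- function(cx, cy, half = 0.05) {
  counts <- mapply(function(a, b) {
    inside <- abs(x - a) <= half & abs(y - b) <= half
    tabulate(species[inside], nbins = n_sp)
  }, cx, cy)
  t(counts)                                  # rows = quadrats, columns = species
}

n_q       <- 16
random_xy <- cbind(runif(n_q, 0.05, 0.95), runif(n_q, 0.05, 0.95))
g         <- seq(0.125, 0.875, length.out = 4)
grid_xy   <- as.matrix(expand.grid(g, g))    # 4 x 4 systematic grid

m_random <- count_quadrats(random_xy[, 1], random_xy[, 2])
m_grid   <- count_quadrats(grid_xy[, 1], grid_xy[, 2])

# Species detected under each design
c(random = sum(colSums(m_random) > 0), systematic = sum(colSums(m_grid) > 0))
```

Repeating the comparison over many seeds, or over a range of `n_q`, turns the single number into a distribution, which is usually the more honest basis for choosing a design.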
How it works
A single high-level function, spesim_run(), runs the full workflow, and lower-level functions are available when you want to assemble the pieces yourself. A run goes through five stages.
- A domain defines the landscape as an sf polygon, either the built-in synthetic shape or a real coastline or river network.
- Environmental gradients define gridded synthetic fields, by default temperature, elevation, and rainfall, each normalised to the interval \([0, 1]\) and reported back in familiar units.
- A community of individuals is placed as points with species labels. Their relative abundances are drawn from a species-abundance distribution (SAD) of your choosing, and their positions reflect both their response to the gradients and any spatial structure you impose.
- A sampling design places quadrats by one of several schemes.
- The derived data are returned, including the site-by-species matrix, the per-quadrat environment, and diagnostic summaries.
Three of the configurable layers are directly linked to the BCB743 material.
The species-abundance distribution controls how common species are relative to one another. spesim offers Fisher’s log-series, the default, together with geometric, broken-stick, Zipf, Zipf-Mandelbrot, lognormal, and Poisson-mixture options. It also includes a neutral-theory zero-sum multinomial (ZSM) sampler. These distributions are central to rank-abundance curves and comparisons of evenness, dominance, commonness, and rarity (Fisher et al. 1943; Preston 1948; Whittaker 1972; Hubbell 2011).
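To make the contrast between SAD models concrete, the following base-R sketch computes expected relative abundances for two of the simpler models, the geometric series and MacArthur's broken stick, using their textbook formulas. It is not spesim's implementation; the richness `S` and the geometric parameter `k` are arbitrary illustrative values.

```r
S <- 10                      # number of species (illustrative)
ranks <- seq_len(S)

# Geometric series: each species takes a fraction k of what remains
k <- 0.4
geometric <- k * (1 - k)^(ranks - 1)
geometric <- geometric / sum(geometric)      # renormalise over S species

# Broken stick: expected share of the i-th most abundant species
broken_stick <- sapply(ranks, function(i) sum(1 / (i:S)) / S)

# Rank-abundance (Whittaker) plot on a log scale
matplot(ranks, cbind(geometric, broken_stick), type = "b", log = "y",
        pch = c(16, 1), lty = 1, col = c("black", "grey40"),
        xlab = "Species rank", ylab = "Relative abundance")
legend("topright", legend = c("geometric", "broken stick"),
       pch = c(16, 1), col = c("black", "grey40"), bty = "n")
```

The geometric series gives a steep, dominance-heavy curve and the broken stick a much more even one, which is the kind of difference rank-abundance plots are meant to expose.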
Environmental filtering assigns each species an optimum, namely the position on a gradient where performance is highest, and a tolerance, namely the breadth of its response. In spesim this is implemented as a Gaussian response on the normalised gradient. This is the unimodal species-response model that underpins CA and the gradient-length reasoning of detrended correspondence analysis (DCA). Here you can set the response directly and then test whether the ordination recovers it.
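A standard way to write such a response is given below. The symbols are introduced here for illustration and are not taken from the package: \(x\) is the position on the normalised gradient, \(u_j\) the optimum of species \(j\), \(t_j\) its tolerance, and \(h_j\) its maximum performance. The exact parameterisation inside spesim may differ.

\[
\mu_j(x) = h_j \exp\!\left( -\frac{(x - u_j)^2}{2\, t_j^2} \right)
\]

Performance peaks at \(x = u_j\) and falls to about 61% of the maximum at one tolerance away from the optimum, since \(\exp(-1/2) \approx 0.61\).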
Spatial structure is added through standard point-process options, including complete spatial randomness as a baseline, clustering through a Thomas process, and inhibition or repulsion through Strauss and Geyer processes. Optional directed neighbour effects allow one species to suppress or facilitate another within a fixed interaction radius. These settings generate spatial autocorrelation and biotic interaction as deliberate, interpretable signals.
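The difference between a clustered pattern and complete spatial randomness can be seen with a few lines of base R. This is a simplified sketch of a Thomas process on the unit square, not the package's routine: the function `simulate_thomas()`, its parameter names, and the default values are hypothetical, and edge effects from parents outside the window are ignored.

```r
set.seed(123)

# Simplified Thomas process on the unit square:
#   1. a Poisson number of parents, placed uniformly
#   2. a Poisson number of offspring per parent
#   3. offspring displaced from the parent by isotropic Gaussian noise
simulate_thomas <- function(kappa = 10, mu = 20, sigma = 0.02) {
  n_parents <- max(1, rpois(1, kappa))       # guard so the sketch never returns nothing
  px <- runif(n_parents)
  py <- runif(n_parents)
  n_off <- rpois(n_parents, mu)
  x <- rep(px, n_off) + rnorm(sum(n_off), sd = sigma)
  y <- rep(py, n_off) + rnorm(sum(n_off), sd = sigma)
  keep <- x >= 0 & x <= 1 & y >= 0 & y <= 1  # clip to the window
  cbind(x = x[keep], y = y[keep])
}

# Mean nearest-neighbour distance as a crude clustering diagnostic
mean_nn <- function(xy) {
  d <- as.matrix(dist(xy))
  diag(d) <- Inf
  mean(apply(d, 1, min))
}

clustered <- simulate_thomas()
csr <- cbind(x = runif(nrow(clustered)), y = runif(nrow(clustered)))  # same n, no structure

c(clustered = mean_nn(clustered), csr = mean_nn(csr))
```

Clustered points sit much closer to their nearest neighbour than points placed at random with the same intensity, and widening `sigma` moves the pattern back towards randomness.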
Relevance to BCB743
The table maps spesim’s capabilities to the places in the module where they are most useful. The pairing is deliberate because the package was written alongside this material and includes vignettes that reproduce two major BCB743 datasets, namely the Doubs river network and the South African seaweed coastline.
| spesim capability | Where it helps in BCB743 |
|---|---|
| Rank–abundance / SAD models (log-series, lognormal, geometric, broken-stick, ZSM) | Review: biodiversity concepts; rank–abundance and evenness |
| Known environmental gradient + Gaussian species responses | 7 Intro to ordination; 9a CA; 9b DCA, test gradient recovery and the gradient-length rule |
| Long vs short gradients (horseshoe / arch on demand) | 8a PCA; 9a CA, produce and interpret the horseshoe and arch from a known generating process |
| Distance–decay of community similarity | 6 Distance & dissimilarity metrics; 16 Deep dive into gradients |
| β-diversity partitioning over a controlled gradient | 13a db-RDA; 13a revised: β-diversity partition |
| Constrained ordination on simulated data (db-RDA vignette) | 13a db-RDA, test whether constraints recover the imposed filtering |
| Clustering vs continuous gradients | 14 Cluster analysis, test when clustering imposes groups on a continuous gradient |
| Point processes, spatial autocorrelation, neighbour effects | 16 Deep dive into gradients; spatial-structure diagnostics |
| Species–area and rarefaction curves | Review: quantifying biodiversity; sampling-effort reasoning |
| Quadrat schemes (random, systematic, tiled, transect, Voronoi) | Integrative Assignment, sampling-design choice and effort |
| Real-world constrained landscapes (Doubs network, seaweed coastline) | 13b Seaweeds example; the Datasets the module already uses |
The recurring theme across these rows is validation against a known generating process. When you run PCA, CA, principal coordinates analysis (PCoA), or non-metric multidimensional scaling (nMDS) on a spesim community, you can compare the ordination with the gradient you specified. When you run a db-RDA, you can check whether the constrained axes recover the filtering you imposed. When you cluster a community that varies continuously, you can see how readily a clustering algorithm imposes discrete groups on a smooth gradient. This habit, testing a method on data whose structure you control before trusting it on data whose structure you do not, is one of the most useful practices in the module.
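Two of these checks can be sketched in base R on a stand-in community. The block does not call spesim; the gradient, optima, tolerance, and the choice of Hellinger-transformed Euclidean distances with average-linkage clustering are illustrative assumptions. It compares the first PCoA axis with the known gradient, then shows that cutting a dendrogram returns the requested number of groups even though the community varies continuously.

```r
set.seed(7)

# Stand-in community on a smooth gradient (no true groups)
n_sites  <- 40
gradient <- sort(runif(n_sites))
optima   <- seq(0, 1, length.out = 12)
mu       <- 30 * exp(-outer(gradient, optima, "-")^2 / (2 * 0.15^2))
abund    <- matrix(rpois(length(mu), mu), nrow = n_sites)

# Hellinger transformation, then Euclidean distance between sites
hell <- sqrt(abund / rowSums(abund))
d    <- dist(hell)

# Check 1: does the first PCoA axis track the gradient that was imposed?
pcoa <- cmdscale(d, k = 2)
abs(cor(pcoa[, 1], gradient, method = "spearman"))

# Check 2: clustering still hands back k groups on a continuous gradient
groups <- cutree(hclust(d, method = "average"), k = 3)
table(groups)
tapply(gradient, groups, range)   # each "group" is just a stretch of the gradient
```

The second check is the cautionary one: the three groups exist because `k = 3` was requested, not because the generating process contained three communities.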
What spesim is not
The software has limits, and there are things it was never meant to do. spesim is a single-time-point synthetic generator, not a mechanistic ecosystem model. It does not simulate demography, temporal dynamics, succession, observation error, or detectability, and it does not perform likelihood-based inference of point-process parameters. The neutral and hybrid model options include an individual recruitment step with a dispersal kernel, which brings them closer to process-based models, but they still generate plausible snapshots rather than calibrated dynamic simulations. Treat the output as a controlled teaching and testing instrument, namely a community whose generating process you specified, not as a prediction for a particular real system.
Getting started
Install from GitHub, then run a minimal in-memory simulation that writes nothing to disk, as in the code listing at the top of this page.
For larger or reproducible setups, declare the settings in an init file and point spesim_run() at it; the file records all parameters in one place. The package can also write the abundance matrix, the environment table, maps, an advanced diagnostic panel (rank-abundance, occupancy-abundance, species-area, distance-decay, and rarefaction summaries), and a plain-language report explaining what each run did.
Further reading
The package ships an extensive set of vignettes; those most useful for BCB743 are:
- What is spesim? What is it not? (start here to set expectations)
- Start here and the basic → advanced workflow.
- Simulating environmental gradients.
- Quadrat placement schemes.
- Point processes: options and intuition.
- Constrained ordination (db-RDA) on simulated data.
- Real-world constrained landscapes: Doubs and seaweed.
References
Citation
@online{smit2026,
author = {Smit, A. J.},
title = {Spesim},
date = {2026-06-15},
url = {https://tangledbank.netlify.app/BCB743/spesim.html},
langid = {en}
}
