21. Generalised Additive Models
Flexible Smooths for Complex Relationships
- what a GAM is and why smooth terms are useful in ecology;
- how GAMs differ from polynomial and mechanistic nonlinear models;
- how to fit and inspect a Gaussian GAM with mgcv::gam();
- how to interpret smooth terms, effective degrees of freedom, and model output;
- how to report a GAM in journal style without overclaiming mechanism.
1 Introduction
Generalised additive models (GAMs) are often the most practical choice when biological relationships are clearly nonlinear, but the exact shape is unknown. Instead of forcing one global equation, GAMs estimate smooth functions directly from the data while retaining a clear regression structure.
This makes GAMs especially useful in ecology, where responses to environmental gradients are often curved, seasonal, and multi-scale. They are more flexible than low-order polynomials, but less explicitly mechanistic than a dedicated nonlinear process model.
2 Key Concepts
- A GAM replaces straight-line terms with smooth functions such as s(x).
- Smoothness is penalised to avoid overfitting; wiggliness is not free.
- Effective degrees of freedom (edf) quantify the complexity of each smooth.
- Inference shifts from individual coefficients to smooth-term significance and shape.
- Interpretation remains ecological: the smooth tells you how the response changes across the predictor gradient.
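The link between penalisation and edf can be seen in a small simulation. The sketch below is illustrative only: the data are simulated, and the names x, y_linear, y_curved, m_linear and m_curved are hypothetical. It fits the same smooth specification to a truly linear and a strongly curved relationship and compares the resulting edf.

```r
library(mgcv)

set.seed(123)
n <- 200
x <- runif(n)

# Two simulated responses: one truly linear in x, one strongly curved
y_linear <- 1 + 2 * x + rnorm(n, sd = 0.3)
y_curved <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Same smooth specification for both; REML chooses the penalty strength
m_linear <- gam(y_linear ~ s(x), method = "REML")
m_curved <- gam(y_curved ~ s(x), method = "REML")

# Effective degrees of freedom of s(x) in each fit
edf_linear <- summary(m_linear)$s.table[, "edf"]
edf_curved <- summary(m_curved)$s.table[, "edf"]
c(linear = edf_linear, curved = edf_curved)
```

The penalty should shrink the smooth for the linear response towards a straight line (edf near 1), while the curved response retains a noticeably higher edf.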
3 When This Method Is Appropriate
Use a GAM when:
- the response-predictor relationship is nonlinear and not well captured by linear or quadratic forms;
- you do not have a single mechanistic process equation to impose;
- you need flexible trend estimation (for example seasonal or long-term environmental patterns);
- the sample size is sufficient to estimate smooths responsibly.
4 Nature of the Data and Assumptions
For Gaussian GAMs, assumptions are conceptually familiar:
- independent observations;
- approximately normal residuals;
- reasonably constant residual variance;
- appropriate smooth complexity (not too rigid, not too wiggly).
5 The Core Equations
For a Gaussian response, a GAM can be written as:
\[Y_i = \alpha + f_1(X_{i1}) + f_2(X_{i2}) + \cdots + f_p(X_{ip}) + \epsilon_i \tag{1}\]
In Equation 1, each \(f_j\) is a smooth function estimated from the data rather than a straight-line coefficient multiplying the predictor directly. This is the key structural change relative to an ordinary linear model.
For introductory purposes, the important idea is that a GAM still has an additive regression structure. What changes is that the effect of a predictor is allowed to bend smoothly instead of being forced into a straight line or a fixed low-order polynomial.
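Equation 1 leaves open how each smooth is represented and how its wiggliness is controlled. One standard formulation, given here as background rather than taken from the chapter, writes each smooth as a weighted sum of basis functions and estimates the weights by penalised least squares. The symbols \(b_{jk}\), \(\beta_{jk}\), \(K_j\), and \(\lambda_j\) are introduced for illustration, and the second-derivative penalty shown is one common choice among several.

\[f_j(x) = \sum_{k=1}^{K_j} \beta_{jk}\, b_{jk}(x) \tag{2}\]

\[\min_{\alpha,\, \beta} \; \sum_{i=1}^{n} \Big( Y_i - \alpha - \sum_{j=1}^{p} f_j(X_{ij}) \Big)^2 + \sum_{j=1}^{p} \lambda_j \int f_j''(x)^2 \, dx \tag{3}\]

In Equation 3 the first term rewards fidelity to the data and the second charges for curvature. A large \(\lambda_j\) pulls \(f_j\) towards a straight line, while a \(\lambda_j\) near zero lets it follow the data closely; smoothness selection (for example by REML) amounts to estimating each \(\lambda_j\).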
6 R Functions
The standard implementation is mgcv::gam().
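Setting method = "REML" is a widely recommended choice for smoothness selection. It is not the built-in default of gam() (which is "GCV.Cp"), so it has to be set explicitly.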
7 Example 1: Sea-Temperature Structure Through Time
7.1 Example Dataset
We use monthly sea surface temperature records from data/BCB744/SACTN_day_1.csv for Port Nolloth on the South African west coast. This is a realistic ecological time series where both long-term structure and within-year seasonality may matter.
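library(tidyverse)  # read_csv(), filter(), mutate(), select()
library(gt)         # gt() for table display

# Read the temperature records
temp_raw <- read_csv(file.path("..", "..", "data", "BCB744", "SACTN_day_1.csv"), show_col_types = FALSE)

# Keep Port Nolloth, drop missing temperatures, and derive time variables
temp_df <- temp_raw |>
  filter(site == "Port Nolloth", !is.na(temp)) |>
  mutate(
    date = as.Date(date),
    year = as.numeric(format(date, "%Y")),
    month = as.numeric(format(date, "%m")),
    t_index = as.numeric(date - min(date)) / 365.25  # years since the first record
  )

# Show the first ten rows
gt(head(temp_df, 10) |> select(site, date, temp, year, month))

| site | date | temp | year | month |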
|---|---|---|---|---|
| Port Nolloth | 1973-07-01 | 11.722 | 1973 | 7 |
| Port Nolloth | 1973-08-01 | 11.534 | 1973 | 8 |
| Port Nolloth | 1973-09-01 | 10.879 | 1973 | 9 |
| Port Nolloth | 1973-10-01 | 11.786 | 1973 | 10 |
| Port Nolloth | 1973-11-01 | 12.308 | 1973 | 11 |
| Port Nolloth | 1973-12-01 | 12.340 | 1973 | 12 |
| Port Nolloth | 1974-01-01 | 11.538 | 1974 | 1 |
| Port Nolloth | 1974-02-01 | 12.105 | 1974 | 2 |
| Port Nolloth | 1974-03-01 | 11.971 | 1974 | 3 |
| Port Nolloth | 1974-04-01 | 12.462 | 1974 | 4 |
7.2 Do an Exploratory Data Analysis (EDA)
R> # A tibble: 1 × 5
R> n start end mean_temp sd_temp
R> <int> <date> <date> <dbl> <dbl>
R> 1 510 1973-07-01 2016-08-01 12.5 0.991
The data show clear nonlinearity through time and a strong seasonal cycle. A straight-line model in time is therefore likely to be inadequate.
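The summary table above has the shape of a dplyr summarise() call. The sketch below shows one way such a table could be produced; it is not the chapter's own code. To stay runnable without the data file, it builds a simulated stand-in for temp_df (not the SACTN records), and the name eda_summary is hypothetical.

```r
library(dplyr)

# Simulated stand-in for temp_df (NOT the SACTN data): one value per month
set.seed(1)
temp_df <- tibble(
  date = seq(as.Date("1973-07-01"), by = "month", length.out = 510),
  temp = 12.5 + cos(2 * pi * as.numeric(format(date, "%m")) / 12) +
    rnorm(510, sd = 0.6)
)

# Sample size, date range, and basic moments of temperature
eda_summary <- temp_df |>
  summarise(
    n = n(),
    start = min(date),
    end = max(date),
    mean_temp = mean(temp),
    sd_temp = sd(temp)
  )
eda_summary
```

With the real temp_df in place of the simulated one, the same summarise() call should return the five columns shown in the output above.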
7.3 State the Model Question and Hypotheses
Can sea temperature at Port Nolloth be explained by a flexible long-term trend plus a seasonal smooth cycle?
For smooth terms in GAMs, hypotheses are usually phrased as:
\[H_{0}: f_j(X) = 0\] \[H_{a}: f_j(X) \ne 0\]
for each smooth term \(f_j\). In practice, we inspect smooth-term significance and shape together.
7.4 Fit the Model
We fit a baseline linear model and then a GAM with:
- a smooth long-term trend in continuous time (s(t_index)), and
- a cyclic smooth for month (s(month, bs = "cc")) so December and January join naturally.
R>
R> Call:
R> lm(formula = temp ~ t_index + factor(month), data = temp_df)
R>
R> Residuals:
R> Min 1Q Median 3Q Max
R> -1.7770 -0.5217 -0.0540 0.4275 3.3436
R>
R> Coefficients:
R> Estimate Std. Error t value Pr(>|t|)
R> (Intercept) 12.370393 0.131398 94.144 < 2e-16 ***
R> t_index 0.028353 0.002738 10.354 < 2e-16 ***
R> factor(month)2 0.342733 0.166124 2.063 0.039621 *
R> factor(month)3 0.029053 0.167118 0.174 0.862058
R> factor(month)4 -0.170806 0.167121 -1.022 0.307254
R> factor(month)5 -0.415614 0.166126 -2.502 0.012677 *
R> factor(month)6 -0.722765 0.166128 -4.351 1.65e-05 ***
R> factor(month)7 -1.117334 0.166129 -6.726 4.82e-11 ***
R> factor(month)8 -1.206711 0.165178 -7.306 1.11e-12 ***
R> factor(month)9 -1.259495 0.167114 -7.537 2.30e-13 ***
R> factor(month)10 -1.020191 0.168138 -6.068 2.58e-09 ***
R> factor(month)11 -0.617730 0.167111 -3.697 0.000243 ***
R> factor(month)12 -0.133916 0.167111 -0.801 0.423306
R> ---
R> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R>
R> Residual standard error: 0.7703 on 497 degrees of freedom
R> Multiple R-squared: 0.4098, Adjusted R-squared: 0.3955
R> F-statistic: 28.76 on 12 and 497 DF, p-value: < 2.2e-16
R>
R> Family: gaussian
R> Link function: identity
R>
R> Formula:
R> temp ~ s(t_index, k = 20) + s(month, bs = "cc", k = 12)
R>
R> Parametric coefficients:
R> Estimate Std. Error t value Pr(>|t|)
R> (Intercept) 12.45935 0.02862 435.3 <2e-16 ***
R> ---
R> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R>
R> Approximate significance of smooth terms:
R> edf Ref.df F p-value
R> s(t_index) 15.88 17.85 20.14 <2e-16 ***
R> s(month) 5.61 10.00 32.98 <2e-16 ***
R> ---
R> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R>
R> R-sq.(adj) = 0.574 Deviance explained = 59.2%
R> -REML = 542.09 Scale est. = 0.41773 n = 510
R> df AIC
R> mod_lm 14.00000 1195.935
R> mod_gam 24.37806 1027.884
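The formulas and object names in the output above (mod_lm, mod_gam) suggest calls of roughly the following shape. This is a sketch rather than the chapter's own code: it uses a simulated stand-in for temp_df (not the SACTN data) so that it runs on its own, and aic_tab is a hypothetical name.

```r
library(mgcv)

# Simulated stand-in for temp_df (NOT the SACTN data)
set.seed(1)
date <- seq(as.Date("1973-07-01"), by = "month", length.out = 510)
temp_df <- data.frame(
  t_index = as.numeric(date - min(date)) / 365.25,
  month = as.numeric(format(date, "%m"))
)
temp_df$temp <- 12.5 + 0.5 * sin(2 * pi * temp_df$t_index / 15) +
  0.6 * cos(2 * pi * (temp_df$month - 2) / 12) + rnorm(510, sd = 0.6)

# Baseline: linear trend plus month as a factor
mod_lm <- lm(temp ~ t_index + factor(month), data = temp_df)

# GAM: smooth long-term trend plus cyclic seasonal smooth, REML smoothness selection
mod_gam <- gam(
  temp ~ s(t_index, k = 20) + s(month, bs = "cc", k = 12),
  data = temp_df, method = "REML"
)

summary(mod_lm)
summary(mod_gam)

# Compare the two fits by AIC
aic_tab <- AIC(mod_lm, mod_gam)
aic_tab
```

With the default knot placement, the cyclic basis joins at the smallest and largest observed month values (1 and 12), which makes the fitted January and December effects equal. If that is not intended, one commonly suggested refinement is to pass knots = list(month = c(0.5, 12.5)) to gam(); that variant is not what produced the output above.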
7.5 Test Assumptions / Check Diagnostics
R>
R> Method: REML Optimizer: outer newton
R> full convergence after 7 iterations.
R> Gradient range [-3.460087e-09,3.406164e-12]
R> (score 542.0866 & scale 0.4177274).
R> Hessian positive definite, eigenvalue range [2.683803,254.2529].
R> Model rank = 30 / 30
R>
R> Basis dimension (k) checking results. Low p-value (k-index<1) may
R> indicate that k is too low, especially if edf is close to k'.
R>
R> k' edf k-index p-value
R> s(t_index) 19.00 15.88 0.63 <2e-16 ***
R> s(month) 10.00 5.61 1.06 0.89
R> ---
R> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R> para s(t_index) s(month)
R> worst 3.927862e-24 0.0057595704 0.005759570
R> observed 3.927862e-24 0.0016454850 0.001803124
R> estimate 3.927862e-24 0.0007068493 0.001149347
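gam.check() helps assess residual behaviour and whether the chosen basis dimensions (k) are adequate. Here the k-index for s(t_index) is low (0.63, p < 2e-16) and its edf (15.88) is approaching k' (19), so either this basis is too small or the residuals retain temporal autocorrelation that the check is picking up; refitting with a larger k is one way to tell the two apart. The check raises no concern for s(month) (k-index 1.06, p = 0.89). The final table, with rows worst, observed, and estimate, appears to be concurvity() output; values this close to 0 suggest that the two smooths are not confounded with each other.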
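The diagnostics in this section could be requested along the following lines. This is a sketch under assumptions: the model is refitted to a simulated stand-in for temp_df (not the SACTN data), the names k_tab, conc, and res_acf are hypothetical, and the residual autocorrelation check is an extra step that the chapter's output does not show.

```r
library(mgcv)

# Simulated stand-in for temp_df (NOT the SACTN data)
set.seed(1)
date <- seq(as.Date("1973-07-01"), by = "month", length.out = 510)
temp_df <- data.frame(
  t_index = as.numeric(date - min(date)) / 365.25,
  month = as.numeric(format(date, "%m"))
)
temp_df$temp <- 12.5 + 0.5 * sin(2 * pi * temp_df$t_index / 15) +
  0.6 * cos(2 * pi * (temp_df$month - 2) / 12) + rnorm(510, sd = 0.6)

mod_gam <- gam(
  temp ~ s(t_index, k = 20) + s(month, bs = "cc", k = 12),
  data = temp_df, method = "REML"
)

# Convergence report, basis-dimension table, and four residual plots
gam.check(mod_gam)

# Only the basis-dimension table (k', edf, k-index, p-value)
k_tab <- k.check(mod_gam)
k_tab

# Concurvity: values near 1 would mean one term is well approximated by the others
conc <- concurvity(mod_gam, full = TRUE)
conc

# Residual autocorrelation, relevant for time series (assumed extra check)
res_acf <- acf(residuals(mod_gam), plot = FALSE)
res_acf$acf[2]  # lag-1 autocorrelation
```

Because the simulated noise is independent, these checks need not resemble the results for the real series; the point is only the sequence of calls.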
7.6 Interpret the Results
The smooth for t_index captures gradual long-term structure that a single linear slope cannot represent. The cyclic month smooth captures recurring within-year seasonality without forcing identical month effects in each year.
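The effective degrees of freedom (edf) indicate complexity: an edf close to 1 implies a near-linear effect, and higher values indicate more curvature. Here s(t_index) has an edf of 15.88 and s(month) an edf of 5.61, so both effects are far from linear.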
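Smooth shapes are easiest to interpret when plotted or extracted term by term. The sketch below shows one way to do both with mgcv; it refits the model to a simulated stand-in for temp_df (not the SACTN data), and the names term_fit and seasonal are hypothetical.

```r
library(mgcv)

# Simulated stand-in for temp_df (NOT the SACTN data)
set.seed(1)
date <- seq(as.Date("1973-07-01"), by = "month", length.out = 510)
temp_df <- data.frame(
  t_index = as.numeric(date - min(date)) / 365.25,
  month = as.numeric(format(date, "%m"))
)
temp_df$temp <- 12.5 + 0.5 * sin(2 * pi * temp_df$t_index / 15) +
  0.6 * cos(2 * pi * (temp_df$month - 2) / 12) + rnorm(510, sd = 0.6)

mod_gam <- gam(
  temp ~ s(t_index, k = 20) + s(month, bs = "cc", k = 12),
  data = temp_df, method = "REML"
)

# Draw both smooths on one page with shaded confidence bands
plot(mod_gam, pages = 1, shade = TRUE)

# Contribution of each smooth to the fitted value, with standard errors
term_fit <- predict(mod_gam, type = "terms", se.fit = TRUE)

# Seasonal effect per calendar month, in degrees relative to the overall mean
seasonal <- tapply(term_fit$fit[, "s(month)"], temp_df$month, mean)
round(seasonal, 2)

# edf of each smooth, as reported by summary()
summary(mod_gam)$s.table[, "edf"]
```

Each smooth is centred, so the plotted and extracted values are deviations from the intercept rather than absolute temperatures.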
7.7 Reporting
Methods
Monthly sea surface temperature observations for Port Nolloth were analysed using a Gaussian GAM fitted with mgcv::gam() in R. Temperature was modelled as a function of a smooth long-term time index (s(t_index)) and a cyclic seasonal smooth for month (s(month, bs = "cc")), with smoothness selected by REML. A linear model with month as a factor and a linear time term was used as a baseline comparator.
Results
The GAM fitted the data better than the baseline linear model (AIC = 1027.9 versus 1195.9; adjusted R² = 0.574 versus 0.396; deviance explained by the GAM = 59.2%), indicating that nonlinear structure was important. The long-term smooth was significant (edf = 15.88, F = 20.14, p < 0.001) and captured gradual multi-year variability, while the cyclic month smooth (edf = 5.61, F = 32.98, p < 0.001) described strong seasonal temperature cycling. Together, these terms reproduced the major temporal structure in observed temperatures without requiring a fixed parametric curve.
Discussion
For this ecological time series, a GAM was appropriate because both long-term and seasonal effects were clearly nonlinear. The model is best interpreted as a flexible description of temporal structure, not as a mechanistic oceanographic process model. Where mechanism is the primary goal, process-based nonlinear models should be considered alongside GAMs.
8 What to Do When Assumptions Fail / Alternatives
- If residuals show strong autocorrelation, move to GAMM frameworks (e.g., correlation structures or random effects).
- If the response distribution is clearly non-Gaussian (for example counts or proportions), use an appropriate family (e.g., Poisson, negative binomial, binomial).
- If smooths are implausibly wiggly, reduce basis size (k) and inspect diagnostics carefully.
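Switching family changes little in the gam() call itself. As a minimal sketch of the non-Gaussian case, the example below fits a Poisson GAM to simulated counts; the variables depth and count and the model name mod_pois are hypothetical and unrelated to the temperature example.

```r
library(mgcv)

# Simulated count data along a hypothetical depth gradient
set.seed(42)
n <- 300
depth <- runif(n, 0, 30)
count <- rpois(n, lambda = exp(1 + sin(depth / 5)))

# Poisson GAM with a log link; the smooth acts on the log of the expected count
mod_pois <- gam(
  count ~ s(depth),
  family = poisson(link = "log"),
  method = "REML"
)

summary(mod_pois)

# Fitted values are on the response (count) scale and are always positive
range(fitted(mod_pois))
```

If the counts turn out to be overdispersed relative to the Poisson, the same call with family = nb() fits a negative binomial GAM instead.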
9 Summary
- GAMs are additive smooth regression models that handle complex nonlinear ecological relationships.
- They are often superior to low-order polynomials when shape is unknown.
- Interpretation focuses on smooth shapes and ecological plausibility rather than mechanistic parameters.
- Diagnostics and smoothness control are essential for responsible use.
Citation
@online{smit2026,
author = {Smit, A. J.},
title = {21. {Generalised} {Additive} {Models}},
date = {2026-03-22},
url = {https://tangledbank.netlify.app/BCB744/basic_stats/21-generalised-additive-models.html},
langid = {en}
}
