20. Generalised Additive Models

Flexible Smooths for Complex Relationships

Author

A. J. Smit

Published

2026/03/19

1 Introduction

Generalised additive models, or GAMs, use a sum of smooth functions to capture complex nonlinear relationships without requiring a single fixed parametric curve. They are especially useful when the relationship is obviously curved, but the biology does not suggest a simple mechanistic response such as Michaelis-Menten uptake or logistic growth.

Unlike polynomial regressions or specific nonlinear models, GAMs do not usually provide parameters that correspond directly to the mechanics of the system. Their strength lies in flexibility and robustness rather than in mechanistic interpretability.

2 Key Concepts

- A GAM models the response as an additive combination of smooth functions.
- Smooths are typically represented with splines.
- GAMs are data-driven rather than mechanistically specified.
- They are useful when curvature is clear but its exact form is unknown.
- Interpretation shifts from coefficients to the shapes of fitted smooth terms.

3 When This Method Is Appropriate

You should consider a GAM when:

- the relationship is clearly nonlinear;
- a straight line and a low-order polynomial are both inadequate;
- the underlying functional form is not known in advance;
- you want a flexible fit but do not want to impose a specific mechanistic model.

This chapter therefore sits between the chapter on generalised linear models (19-generalised-linear-models.qmd) and the chapter on nonlinear regression (21-nonlinear-regression.qmd). It handles flexible nonlinear mean structures when neither the ordinary linear model nor a dedicated nonlinear function is quite right.

4 Nature of the Data and Assumptions

In a GAM, the smooth functions are usually represented with regression splines. A general form is:

\[Y_i = \alpha + f_1(X_{i1}) + f_2(X_{i2}) + \ldots + f_p(X_{ip}) + \epsilon_i\]

where $\alpha$ is the intercept, each $f_j$ is a smooth function estimated from the data, and $\epsilon_i$ is the residual error.
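
To make the phrase "smooth function estimated from the data" concrete, one common formulation for a Gaussian GAM is sketched here. It is an assumption about the setup rather than something specified in this chapter: the basis functions $b_{jk}$, coefficients $\beta_{jk}$, basis size $K_j$, and smoothing parameters $\lambda_j$ are notation introduced for illustration. Each smooth is written as a weighted sum of known basis functions:

\[f_j(x) = \sum_{k=1}^{K_j} \beta_{jk} \, b_{jk}(x)\]

The coefficients are then chosen to minimise a penalised least-squares criterion, in which a typical penalty measures the wiggliness of each smooth through its second derivative:

\[\sum_{i=1}^{n} \left( Y_i - \alpha - \sum_{j=1}^{p} f_j(X_{ij}) \right)^2 + \sum_{j=1}^{p} \lambda_j \int \left[ f_j''(x) \right]^2 dx\]

A large $\lambda_j$ pushes $f_j$ towards a straight line, while $\lambda_j$ near zero leaves the smooth free to follow the data closely.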

In introductory Gaussian GAMs, the usual concerns about independence, residual spread, and approximate normality still apply. The difference is that the mean structure is now smooth and data-driven rather than fixed as a straight line or polynomial.
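
The residual checks mentioned above could look something like the following sketch. The data are simulated purely for illustration (the names `dat`, `x`, `y`, and `m` are hypothetical), and the two plots address residual spread and approximate normality only.

```r
library(mgcv)

# Hypothetical simulated data: curved signal with Gaussian noise
set.seed(3)
dat <- data.frame(x = runif(200, 0, 10))
dat$y <- cos(dat$x) + rnorm(200, sd = 0.5)

# Gaussian GAM with a single smooth term
m <- gam(y ~ s(x), data = dat, method = "REML")

# For a Gaussian GAM with identity link, the default (deviance)
# residuals are simply observed minus fitted values
res <- residuals(m)
fit <- fitted(m)

op <- par(mfrow = c(1, 2))
# Residual spread: look for funnels or leftover curvature
plot(fit, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# Approximate normality: points should follow the reference line
qqnorm(res)
qqline(res)
par(op)
```

Independence cannot be judged from these plots alone; it depends on how the data were collected.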

5 R Functions

The main R function is mgcv::gam():

mgcv::gam(y ~ s(x), data = df)
mgcv::gam(y ~ s(x1) + s(x2), data = df)
mgcv::gam(y ~ s(x) + factor_var, data = df)

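The s() term tells gam() to fit a smooth function of the predictor rather than a single straight-line coefficient. Terms written outside s(), such as factor_var in the third call, enter the model as ordinary parametric terms.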
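
The calls above use placeholder names, so they will not run as written. A minimal runnable sketch, assuming simulated data and REML smoothing-parameter estimation (both are choices made here for illustration, not prescribed by this chapter), might be:

```r
library(mgcv)

# Hypothetical simulated data: a curved signal plus Gaussian noise
set.seed(1)
df <- data.frame(x = seq(0, 10, length.out = 200))
df$y <- sin(df$x) + rnorm(nrow(df), sd = 0.3)

# s(x) requests a penalised smooth of x; method = "REML" is one
# common way of estimating how smooth the term should be
m <- gam(y ~ s(x), data = df, method = "REML")

# The summary reports effective degrees of freedom (edf) for s(x);
# an edf near 1 means the smooth is close to a straight line
summary(m)

# Plot the estimated smooth with a shaded confidence band
plot(m, shade = TRUE)
```

The object returned by gam() works with the usual generics such as summary(), plot(), and predict().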

6 Worked Example Placeholder

A GAM sums smooth functions, each of which may depend on a different predictor. This additive structure lets the model capture complex nonlinear responses without requiring a single global parametric curve.

At present, this chapter is a placeholder within the Tangled Bank sequence. The fuller worked example still needs to be developed, but the teaching logic is already set:

1. fit a straight-line model and inspect residual curvature;
2. compare a polynomial alternative if appropriate;
3. fit a GAM with one or more smooth terms;
4. inspect the fitted smooths and overall diagnostics;
5. interpret the result as a flexible description of the response shape.
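
Until the full worked example is written, the five steps can be sketched on simulated data. Everything below is hypothetical: the data frame `dat`, the model names, the quadratic polynomial, and the use of AIC as a rough comparison are assumptions made for illustration, not the example this chapter will eventually use.

```r
library(mgcv)

# Hypothetical data: a wavy trend that no low-order polynomial matches
set.seed(42)
n <- 150
dat <- data.frame(x = runif(n, 0, 10))
dat$y <- 2 + sin(dat$x) + 0.1 * dat$x + rnorm(n, sd = 0.3)

# 1. Straight-line model; residuals plotted against x reveal curvature
m_lin <- lm(y ~ x, data = dat)
plot(dat$x, resid(m_lin), xlab = "x", ylab = "Residuals (linear fit)")
abline(h = 0, lty = 2)

# 2. Polynomial alternative (quadratic chosen arbitrarily here)
m_poly <- lm(y ~ poly(x, 2), data = dat)

# 3. GAM with one smooth term
m_gam <- gam(y ~ s(x), data = dat, method = "REML")

# 4. Inspect the fitted smooth and the standard diagnostics
plot(m_gam, shade = TRUE, residuals = TRUE)
gam.check(m_gam)

# Rough comparison of the three fits (lower AIC is better)
AIC(m_lin, m_poly, m_gam)

# 5. Describe the response shape via predictions on a regular grid
new_x <- data.frame(x = seq(0, 10, length.out = 100))
pred <- predict(m_gam, newdata = new_x, se.fit = TRUE)
head(data.frame(x = new_x$x, fit = pred$fit, se = pred$se.fit))
```

The predicted values and standard errors in step 5 are what a reader would interpret: where the response rises, falls, or flattens, and how uncertain that shape is.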

7 Practical Caution

The main strength of GAMs is flexibility, but that is also their main risk. If the smoothness is not controlled well, for example when a smooth is given a large basis and too little penalty, the model can begin to follow noise rather than structure. GAMs are therefore best used when there is a clear need for flexibility and when the fitted smooths are interpreted with biological caution.
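
One way to see this risk is to compare a penalised smooth with an unpenalised one built on the same basis. The sketch below uses simulated data and an arbitrary basis dimension of k = 30; these values and the object names are illustrative assumptions.

```r
library(mgcv)

# Hypothetical data: one smooth cycle plus fairly strong noise
set.seed(7)
dat <- data.frame(x = seq(0, 1, length.out = 100))
dat$y <- sin(2 * pi * dat$x) + rnorm(100, sd = 0.4)

# Penalised smooth: REML chooses how much of the k = 30 basis to use
m_pen <- gam(y ~ s(x, k = 30), data = dat, method = "REML")

# Unpenalised smooth: fx = TRUE fixes the term at k - 1 degrees of
# freedom, so nothing stops it from chasing noise
m_unpen <- gam(y ~ s(x, k = 30, fx = TRUE), data = dat)

# Total effective degrees of freedom used by each model
sum(m_pen$edf)
sum(m_unpen$edf)

# Compare the two fitted curves side by side
op <- par(mfrow = c(1, 2))
plot(m_pen, shade = TRUE, main = "Penalised")
plot(m_unpen, shade = TRUE, main = "Unpenalised")
par(op)

# Check whether the basis dimension looks adequate for the penalised fit
k.check(m_pen)
```

The penalised model should use far fewer effective degrees of freedom than the unpenalised one, which is the practical meaning of controlling smoothness.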

8 Summary

- GAMs are flexible smooth regression models.
- They are useful when the response is curved but no simple mechanistic form is known.
- They are less interpretable mechanistically than dedicated nonlinear models.
- They belong in the toolkit precisely because many biological relationships are complex but not theoretically fixed.

The next chapter turns to the dedicated nonlinear models that are most useful when the biological process itself suggests the shape of the curve.

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:

@online{smit,_a._j.2026,
  author = {Smit, A. J.},
  title = {20. {Generalised} {Additive} {Models}},
  date = {2026-03-19},
  url = {http://tangledbank.netlify.app/BCB744/basic_stats/20-generalised-additive-models.html},
  langid = {en}
}

For attribution, please cite this work as:

Smit, A. J. (2026) 20. Generalised Additive Models. http://tangledbank.netlify.app/BCB744/basic_stats/20-generalised-additive-models.html.
