19. Generalised Linear Models

Extending Regression Beyond Normal Responses

Author

A. J. Smit

Published

2026/03/19

1 Introduction

The linear model assumes that the response, conditional on the predictors, is normally distributed with constant variance. Many biological response variables do not have that structure. Counts, proportions, presences and absences, and success-failure outcomes require a broader framework. Generalised linear models (GLMs) provide that framework while retaining the central logic of regression.

A GLM combines three parts:

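1. a response distribution (the family), such as binomial or Poisson;
2. a linear predictor, η = β0 + β1x1 + β2x2 + ..., built from the predictors exactly as in linear regression;
3. a link function g that connects the expected response μ to the linear predictor, so that g(μ) = η.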
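In R, the response distribution and its link are bundled into a family object, and the model is g(μ) = η. The sketch below uses only base R to show the default links for the binomial and Poisson families, how `linkfun()` and `linkinv()` move between the two scales, and the variance function V(μ) each family assumes. The input values (0.8, 20, and so on) are arbitrary illustrations.

```r
# Base R family objects bundle a response distribution with a link function.
fam_bin <- binomial()   # default link: logit
fam_poi <- poisson()    # default link: log

fam_bin$link            # "logit"
fam_poi$link            # "log"

# linkfun() maps an expected response (mu) to the linear-predictor scale (eta);
# linkinv() maps eta back to the response scale.
eta <- fam_bin$linkfun(0.8)   # log(0.8 / 0.2) = log(4), about 1.386
fam_bin$linkinv(eta)          # 0.8 again

fam_poi$linkfun(20)           # log(20), about 2.996
fam_poi$linkinv(log(20))      # 20 again

# variance() gives V(mu): how the variance is assumed to change with the mean.
poisson()$variance(c(1, 5, 20))        # 1 5 20: variance equals the mean
binomial()$variance(c(0.1, 0.5, 0.9))  # 0.09 0.25 0.09: mu * (1 - mu)
gaussian()$variance(c(1, 5, 20))       # 1 1 1: constant, as in the linear model
```

The variance functions make the contrast with the linear model concrete: only the Gaussian family assumes that the variance does not depend on the mean.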

The important point is that GLMs do not replace the regression logic you have already learned. They extend it to different kinds of response variables.

2 Key Concepts

GLMs extend regression to non-normal response variables.
The family should match the data-generating structure, such as a binomial family for binary outcomes or a Poisson family for counts.
The link function connects the linear predictor to the response scale.
Interpretation still depends on the biological question, not only on the model family.
Overdispersion is a warning sign that the simplest GLM may be inadequate.

3 When This Method Is Appropriate

You should consider a GLM when:

the response is binary, such as alive/dead or present/absent;
the response is a proportion based on counts of successes and failures;
the response is a count and cannot sensibly be modelled with a normal distribution;
the variance changes with the mean in a way that a standard linear model does not handle well.

This chapter therefore picks up directly from the earlier proportion-testing work and places it in the broader modelling framework where binary and count responses can be analysed with predictors.
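To see the connection with the earlier proportion tests, the sketch below fits an intercept-only logistic regression to hypothetical occupancy data (30 of 50 quadrats occupied; the numbers are invented). Back-transforming the single coefficient with `plogis()`, the inverse logit, returns the sample proportion, which is the same estimate `prop.test()` reports.

```r
# Hypothetical occupancy data: species present in 30 of 50 quadrats.
present <- c(rep(1, 30), rep(0, 20))

# Intercept-only logistic regression: one parameter, the log-odds of presence.
m0 <- glm(present ~ 1, family = binomial)

coef(m0)            # log(30 / 20), about 0.405
plogis(coef(m0))    # inverse logit: 0.6, the sample proportion

# The one-sample proportion test targets the same quantity.
prop.test(30, 50)$estimate   # 0.6
```

Adding predictors to the right-hand side of the formula is what takes the analysis beyond a single proportion.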

4 Common GLM Families

4.1 Logistic regression

Use a binomial GLM when the response is binary (0/1) or a proportion of successes out of a known number of trials.

Examples:

infection status as a function of temperature and host size;
settlement success as a function of habitat and season;
survival as a function of treatment.
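A minimal sketch of the first example, using simulated data: the temperature range, sample size, and "true" coefficients (-6 and 0.3) are assumptions chosen only so that infection probability rises across the range. The response is coded 0/1, and `family = binomial` uses the logit link by default.

```r
set.seed(42)

# Hypothetical data: infection status (1 = infected) of 200 hosts.
n           <- 200
temperature <- runif(n, min = 12, max = 28)        # degrees C
p_true      <- plogis(-6 + 0.3 * temperature)      # assumed true probability
infected    <- rbinom(n, size = 1, prob = p_true)
hosts       <- data.frame(temperature, infected)

# Binomial family with its default logit link.
m_inf <- glm(infected ~ temperature, family = binomial, data = hosts)
summary(m_inf)   # coefficients are on the log-odds scale
```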

4.2 Poisson regression

Use a Poisson GLM when the response is a count.

Examples:

number of individuals per quadrat;
number of flowers per plant;
number of parasite eggs per host.
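A comparable sketch for counts, again on simulated data: the depth gradient, number of quadrats, and true coefficients are invented for illustration. The Poisson family uses the log link by default, so exponentiated coefficients act multiplicatively on the expected count.

```r
set.seed(7)

# Hypothetical data: individuals counted in 60 quadrats along a depth gradient.
n           <- 60
depth       <- runif(n, min = 1, max = 15)          # m
mu_true     <- exp(2.5 - 0.12 * depth)              # assumed true mean count
individuals <- rpois(n, lambda = mu_true)
quadrats    <- data.frame(depth, individuals)

# Poisson family with its default log link.
m_count <- glm(individuals ~ depth, family = poisson, data = quadrats)
coef(m_count)        # intercept and slope on the log scale
exp(coef(m_count))   # expected count at depth 0, and multiplicative change per metre
```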

4.3 Overdispersed count models

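If the variance of the counts is much larger than the mean (overdispersion), a simple Poisson model, which assumes that the variance equals the mean, may be too restrictive. In such cases, a negative binomial model or another overdispersion-aware model, such as a quasi-Poisson GLM, is often more appropriate.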
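One common check, sketched here on simulated data, is the dispersion statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom. The counts are deliberately generated from a negative binomial distribution (an assumption of this example, with size = 1.5), so the Poisson fit should look overdispersed. `glm.nb()` comes from the MASS package, which is distributed with R.

```r
library(MASS)   # recommended package distributed with R; provides glm.nb()

set.seed(3)

# Hypothetical overdispersed counts: negative binomial with size (theta) = 1.5,
# so Var(Y) = mu + mu^2 / 1.5, well above the Poisson variance of mu.
n         <- 100
depth     <- runif(n, min = 1, max = 15)
mu_true   <- exp(2.5 - 0.12 * depth)
counts    <- rnbinom(n, mu = mu_true, size = 1.5)
counts_df <- data.frame(depth, counts)

# Fit the (inadequate) Poisson model first.
m_pois <- glm(counts ~ depth, family = poisson, data = counts_df)

# Dispersion statistic: sum of squared Pearson residuals / residual df.
# Near 1 is consistent with Poisson variation; well above 1 suggests overdispersion.
dispersion <- sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)
dispersion

# Negative binomial GLM: estimates theta so that the variance can exceed the mean.
m_nb <- glm.nb(counts ~ depth, data = counts_df)
m_nb$theta
summary(m_nb)
```

A quasi-Poisson fit (`family = quasipoisson` in `glm()`) is a lighter alternative: it keeps the Poisson mean structure but estimates a dispersion parameter that inflates the standard errors.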

5 R Functions

The main function is glm():

glm(response ~ predictors, family = ..., data = df)

A logistic regression might look like:

glm(cbind(successes, failures) ~ temperature + treatment,
    family = binomial,
    data = df)

or, for a binary response:

glm(presence ~ salinity + habitat,
    family = binomial,
    data = df)

A count model might look like:

glm(count ~ reef + depth,
    family = poisson,
    data = df)
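The `cbind(successes, failures)` form can be tried on simulated data. In this hypothetical settlement experiment each plate is offered 25 larvae; the treatment labels, temperatures, and true effects are invented, and `df` is built only so that a template like the ones above has something to run against.

```r
set.seed(11)

# Hypothetical settlement trials: 40 plates, 25 larvae offered per plate.
n_plates    <- 40
trials      <- 25
treatment   <- factor(rep(c("control", "biofilm"), each = n_plates / 2),
                      levels = c("control", "biofilm"))   # control is the reference level
temperature <- runif(n_plates, min = 15, max = 25)
p_true      <- plogis(-3 + 0.15 * temperature + 0.8 * (treatment == "biofilm"))
successes   <- rbinom(n_plates, size = trials, prob = p_true)
failures    <- trials - successes
df <- data.frame(successes, failures, temperature, treatment)

# Each row contributes successes + failures binomial trials.
m_prop <- glm(cbind(successes, failures) ~ temperature + treatment,
              family = binomial,
              data = df)
coef(m_prop)   # log-odds scale; "treatmentbiofilm" is the shift relative to control
```

An equivalent specification uses the observed proportion as the response and supplies the number of trials per row through the `weights` argument of `glm()`.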

6 A Practical Workflow

The teaching workflow remains familiar:

1. identify the biological question;
2. identify the response structure;
3. choose a family that matches the response;
4. fit the model;
5. inspect model adequacy, especially for overdispersion and fit;
6. interpret coefficients on the correct scale.
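The workflow can be sketched end to end on simulated counts. Everything below is hypothetical (a flowers-per-plant response with invented habitat and light effects); the point is the sequence: choose the family from the response, fit, check adequacy with the residual deviance and a likelihood-ratio test, then interpret on the response scale.

```r
set.seed(5)

# Hypothetical data: flowers per plant, with a light index (0-1) and two habitats.
n       <- 80
habitat <- factor(rep(c("open", "shaded"), each = n / 2))
light   <- runif(n, min = 0, max = 1)
flowers <- rpois(n, lambda = exp(1 + 1.2 * light - 0.5 * (habitat == "shaded")))
plants  <- data.frame(flowers, habitat, light)

# The response is a count, so start with a Poisson family (log link).
m_full <- glm(flowers ~ light + habitat, family = poisson, data = plants)

# Adequacy: for a Poisson fit, residual deviance well above its df hints at overdispersion.
c(deviance = deviance(m_full), df = df.residual(m_full))

# Likelihood-ratio test: does dropping habitat significantly worsen the fit?
m_reduced <- glm(flowers ~ light, family = poisson, data = plants)
lrt <- anova(m_reduced, m_full, test = "Chisq")
lrt

# Interpretation on the response scale: multiplicative effects on the expected count.
exp(coef(m_full))
```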

The new difficulty is that the coefficients are estimated on the link scale (log-odds for the logit link, log expected counts for the log link), so interpretation requires more care than in an ordinary linear model.
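The two scales can be compared directly with `predict()`, whose `type` argument returns either link-scale (`"link"`) or response-scale (`"response"`) values. This sketch uses hypothetical parasite-egg counts; host mass and the true coefficients are invented.

```r
set.seed(8)

# Hypothetical data: parasite eggs per host in relation to host body mass (g).
n      <- 50
mass   <- runif(n, min = 5, max = 30)
eggs   <- rpois(n, lambda = exp(0.5 + 0.06 * mass))
hosts  <- data.frame(mass, eggs)
m_eggs <- glm(eggs ~ mass, family = poisson, data = hosts)

new_hosts <- data.frame(mass = c(10, 20))
eta <- predict(m_eggs, newdata = new_hosts, type = "link")      # log expected count
mu  <- predict(m_eggs, newdata = new_hosts, type = "response")  # expected count

# The response scale is the inverse link (here exp) of the link scale.
all.equal(mu, exp(eta))   # TRUE

# A 10 g increase adds 10 * slope on the log scale, which multiplies the expected count.
exp(10 * coef(m_eggs)[["mass"]])
mu[[2]] / mu[[1]]         # the same ratio
```

On the link scale the slope is additive and constant; on the response scale the same slope becomes a constant multiplicative change, so the absolute change in expected count depends on where you start.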

7 Interpretation and Reporting

In GLMs, interpretation often needs to distinguish between:

the coefficient on the link scale;
the effect on the response scale;
the biological meaning of that effect.

For example, in logistic regression a positive coefficient means the log-odds of success increase with the predictor; exponentiating the coefficient gives an odds ratio, the factor by which the odds change per unit increase in the predictor. In practice, it is often clearer to translate that into statements about how the probability of success changes rather than reporting only the raw coefficient.
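The three levels of interpretation can be sketched on hypothetical infection data (simulated here with invented coefficients so the block runs on its own): the coefficient as a change in log-odds, its exponent as an odds ratio, and predicted probabilities at chosen temperatures.

```r
set.seed(42)

# Hypothetical infection data (simulated; coefficients -6 and 0.3 are assumptions).
n           <- 200
temperature <- runif(n, min = 12, max = 28)
infected    <- rbinom(n, size = 1, prob = plogis(-6 + 0.3 * temperature))
hosts       <- data.frame(temperature, infected)
m_inf <- glm(infected ~ temperature, family = binomial, data = hosts)

b <- coef(m_inf)[["temperature"]]
b         # change in log-odds per 1 degree C (link scale)
exp(b)    # odds ratio: the odds are multiplied by this per 1 degree C

# Probabilities: the same coefficient implies different absolute changes,
# because the logistic curve is steepest near p = 0.5.
p_at <- predict(m_inf, newdata = data.frame(temperature = c(14, 15, 20, 21)),
                type = "response")
p_at[[2]] - p_at[[1]]   # change from 14 to 15 degrees C
p_at[[4]] - p_at[[3]]   # change from 20 to 21 degrees C, nearer the steep middle
```

Confidence intervals can be put on the odds-ratio scale in the same way, by exponentiating `confint(m_inf)`.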

8 Common Mistakes

Common mistakes include:

choosing a family because it is familiar rather than because it matches the response;
fitting Poisson models to strongly overdispersed counts;
interpreting link-scale coefficients as though they were ordinary linear slopes;
forgetting that diagnostics still matter in GLMs.

9 Summary

GLMs extend regression to binary, proportional, and count responses.
The model family should match the response structure.
Logistic and Poisson models are the most common introductory GLMs.
Overdispersion is an important practical warning sign.
The regression logic is unchanged, but interpretation becomes more careful because of the link function.

This chapter broadens the modelling family. The next chapter turns from non-normal responses to relationships that are not well described by straight lines.

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:

@online{smit2026,
  author = {Smit, A. J.},
  title = {19. {Generalised} {Linear} {Models}},
  date = {2026-03-19},
  url = {http://tangledbank.netlify.app/BCB744/basic_stats/19-generalised-linear-models.html},
  langid = {en}
}

For attribution, please cite this work as:

Smit, A. J. (2026) 19. Generalised Linear Models. http://tangledbank.netlify.app/BCB744/basic_stats/19-generalised-linear-models.html.
