19. Generalised Linear Models
Extending Regression Beyond Normal Responses
1 Introduction
The linear model assumes normally distributed errors with constant variance. Many biological data sets do not have that structure. Counts, proportions, presence-absence records, and success-failure outcomes require a broader framework. Generalised linear models (GLMs) provide that framework while retaining the central logic of regression.
A GLM combines three parts:
- a response distribution;
- a linear predictor;
- a link function that connects the linear predictor to the expected value of the response.
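The three parts can be seen separately in a small sketch. The predictor values and the coefficients `beta0` and `beta1` below are invented for illustration, and a Poisson family with a log link is assumed only as one convenient example:

```r
# Hypothetical predictor values and coefficients, chosen only for illustration
x <- c(0, 1, 2, 3)
beta0 <- 0.5
beta1 <- 0.4

# Linear predictor (eta), which lives on the link scale
eta <- beta0 + beta1 * x

# Link function: a family object stores the link and its inverse
fam <- poisson(link = "log")
mu <- fam$linkinv(eta)   # expected counts; equals exp(eta) for the log link

# Response distribution: counts drawn around those expected values
set.seed(1)
y <- rpois(length(mu), lambda = mu)

# Applying the link to the mean recovers the linear predictor
all.equal(fam$linkfun(mu), eta)
```

The same structure holds for other families; only the distribution and the link change.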
The important point is that GLMs do not replace the regression logic you have already learned. They extend it to different kinds of response variables.
2 Key Concepts
- GLMs extend regression to non-normal response variables.
- The family should match the data-generating structure, such as a binomial family for binary outcomes or a Poisson family for counts.
- The link function connects the linear predictor to the response scale.
- Interpretation still depends on the biological question, not only on the model family.
- Overdispersion (more variance than the chosen family predicts) is a warning sign that the simplest GLM may be inadequate.
3 When This Method Is Appropriate
You should consider a GLM when:
- the response is binary, such as alive/dead or present/absent;
- the response is a proportion based on counts of successes and failures;
- the response is a count and cannot sensibly be modelled with a normal distribution;
- the variance changes with the mean in a way that a standard linear model does not handle well.
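The last point can be illustrated with simulated counts. This is a minimal sketch using made-up expected values (`lambda`); it only shows that for Poisson-distributed counts the variance rises with the mean, which a constant-variance linear model does not allow for:

```r
set.seed(42)

# Hypothetical expected counts for three groups
lambda <- c(2, 10, 50)

# Simulate many Poisson counts for each expected value
sims <- lapply(lambda, function(l) rpois(10000, lambda = l))

# Compare the sample mean and sample variance in each group
mv <- data.frame(
  expected = lambda,
  mean     = sapply(sims, mean),
  variance = sapply(sims, var)
)
mv
```

In each row the variance should be close to the mean, so groups with larger counts are also more variable.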
This chapter therefore picks up directly from the earlier proportion-testing work and places it in the broader modelling framework where binary and count responses can be analysed with predictors.
4 Common GLM Families
4.1 Logistic regression
Use a binomial GLM when the response is binary or a proportion.
Examples:
- infection status as a function of temperature and host size;
- settlement success as a function of habitat and season;
- survival as a function of treatment.
4.2 Poisson regression
Use a Poisson GLM when the response is a count.
Examples:
- number of individuals per quadrat;
- number of flowers per plant;
- number of parasite eggs per host.
4.3 Overdispersed count models
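A Poisson model assumes that the variance equals the mean. If the count variance is much larger than the mean, the data are overdispersed and a simple Poisson model may be too restrictive. In such cases, a negative binomial model or another overdispersion-aware model, such as a quasi-Poisson model, is often more appropriate.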
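One common way to check for overdispersion is to divide the sum of squared Pearson residuals by the residual degrees of freedom. The sketch below uses simulated data with hypothetical names (`eggs`, `temp`) and assumes the MASS package is available for `glm.nb()`; it is an illustration, not a prescribed procedure:

```r
library(MASS)  # provides glm.nb(); usually installed alongside R

set.seed(123)
n <- 200
temp <- runif(n, 10, 25)                  # hypothetical predictor
mu <- exp(0.5 + 0.08 * temp)              # assumed true mean on the count scale
eggs <- rnbinom(n, mu = mu, size = 1.5)   # counts with extra variance
dat <- data.frame(eggs, temp)

# Simple Poisson fit
pois_fit <- glm(eggs ~ temp, family = poisson, data = dat)

# Pearson dispersion statistic: values well above 1 suggest overdispersion
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)
dispersion

# Negative binomial alternative, compared with the Poisson fit by AIC
nb_fit <- glm.nb(eggs ~ temp, data = dat)
AIC(pois_fit, nb_fit)
```

Because the counts here were simulated from a negative binomial distribution, the dispersion statistic should be clearly above 1 and the negative binomial fit should have the lower AIC.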
5 R Functions
The main function is glm():
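As a minimal sketch of the call pattern, the example below uses a simulated data frame `dat` with hypothetical columns `x` and `y`. A Gaussian family with an identity link is assumed here only because it should reproduce an ordinary linear model, which makes the connection to `lm()` visible:

```r
set.seed(1)

# Hypothetical data: a straight-line relationship with normal errors
dat <- data.frame(x = runif(50, 0, 10))
dat$y <- 2 + 0.5 * dat$x + rnorm(50)

# General pattern: glm(formula, family = <family>(link = "<link>"), data = <data frame>)
glm_fit <- glm(y ~ x, family = gaussian(link = "identity"), data = dat)
lm_fit  <- lm(y ~ x, data = dat)

# The two sets of coefficients should agree
all.equal(coef(glm_fit), coef(lm_fit))
```

Changing the `family` argument is what moves the model away from the normal case.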
A logistic regression might look like:
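One possible form for proportion data is sketched here with simulated values. The data frame `dat` and the columns `temp`, `settled`, and `failed` are illustrative assumptions. The response is supplied as a two-column matrix of successes and failures:

```r
set.seed(10)
n_plots <- 40

# Hypothetical data: 20 larvae per plot, settlement depends on temperature
dat <- data.frame(
  temp   = runif(n_plots, 12, 24),
  trials = rep(20, n_plots)
)
p_true <- plogis(-6 + 0.35 * dat$temp)     # assumed true settlement probability
dat$settled <- rbinom(n_plots, size = dat$trials, prob = p_true)
dat$failed  <- dat$trials - dat$settled

# Binomial GLM with successes and failures bound together as the response
prop_fit <- glm(cbind(settled, failed) ~ temp,
                family = binomial(link = "logit"), data = dat)
coef(prop_fit)
```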
or, for a binary response:
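A comparable sketch for a 0/1 response, again with simulated data and hypothetical names (`host_size`, `infected`):

```r
set.seed(11)
n <- 150

# Hypothetical data: infection status (0 or 1) depends on host size
dat <- data.frame(host_size = rnorm(n, mean = 50, sd = 10))
dat$infected <- rbinom(n, size = 1, prob = plogis(-5 + 0.1 * dat$host_size))

# Binomial GLM; the logit link is the default for this family
bin_fit <- glm(infected ~ host_size, family = binomial, data = dat)
summary(bin_fit)$coefficients
```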
A count model might look like:
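A minimal sketch of a Poisson count model follows, using simulated data. The names `light` and `flowers` are hypothetical and the true coefficients are chosen only for illustration:

```r
set.seed(12)
n <- 100

# Hypothetical data: flowers per plant increase with light availability
dat <- data.frame(light = runif(n, 0, 10))
dat$flowers <- rpois(n, lambda = exp(0.3 + 0.15 * dat$light))

# Poisson GLM with the default log link written out explicitly
count_fit <- glm(flowers ~ light, family = poisson(link = "log"), data = dat)
coef(count_fit)

# Multiplicative change in the expected count per unit increase in light
exp(coef(count_fit)["light"])
```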
6 A Practical Workflow
The teaching workflow remains familiar:
- identify the biological question;
- identify the response structure;
- choose a family that matches the response;
- fit the model;
- check model adequacy, especially overdispersion and lack of fit;
- interpret coefficients on the correct scale.
The new difficulty is that the coefficients are estimated on the link scale, which usually differs from the response scale, so interpretation requires more care than in an ordinary linear model.
7 Interpretation and Reporting
In GLMs, interpretation often needs to distinguish between:
- the coefficient on the link scale;
- the effect on the response scale;
- the biological meaning of that effect.
For example, in logistic regression a positive coefficient means the log-odds of success increase with the predictor. In practice, it is often clearer to translate that into statements about the probability of success increasing or decreasing rather than reporting only the raw coefficient.
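The distinction between the two scales can be sketched as follows. The data are simulated and the names `temp` and `infected` are hypothetical; the point is only to show one way of moving from a link-scale coefficient to odds ratios and predicted probabilities:

```r
set.seed(21)
n <- 200

# Hypothetical data: infection probability rises with temperature
dat <- data.frame(temp = runif(n, 10, 30))
dat$infected <- rbinom(n, size = 1, prob = plogis(-4 + 0.2 * dat$temp))
fit <- glm(infected ~ temp, family = binomial, data = dat)

# Link scale: change in log-odds per degree
b <- coef(fit)["temp"]

# Exponentiated coefficient: multiplicative change in the odds per degree
odds_ratio <- exp(b)

# Response scale: predicted probabilities at two chosen temperatures
newdat <- data.frame(temp = c(15, 25))
p_link <- predict(fit, newdata = newdat, type = "link")
p_resp <- predict(fit, newdata = newdat, type = "response")
cbind(newdat, log_odds = p_link, probability = p_resp)
```

Reporting the predicted probabilities at biologically meaningful predictor values is often easier for readers to follow than the raw coefficient alone.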
8 Common Mistakes
Common mistakes include:
- choosing a family because it is familiar rather than because it matches the response;
- fitting Poisson models to strongly overdispersed counts;
- interpreting link-scale coefficients as though they were ordinary linear slopes;
- forgetting that diagnostics still matter in GLMs.
9 Summary
- GLMs extend regression to binary, proportion, and count responses.
- The model family should match the response structure.
- Logistic and Poisson models are the most common introductory GLMs.
- Overdispersion is an important practical warning sign.
- The regression logic is unchanged, but interpretation requires more care because of the link function.
This chapter broadens the modelling family. The next chapter turns from non-normal responses to relationships that are not well described by straight lines.
Citation
@online{smit,_a._j.2026,
author = {Smit, A. J.},
title = {19. {Generalised} {Linear} {Models}},
date = {2026-03-19},
url = {http://tangledbank.netlify.app/BCB744/basic_stats/19-generalised-linear-models.html},
langid = {en}
}
