14. Collinearity, Confounding, and Measurement Error
Three Threats to Interpretable Regression Models
- What collinearity is and why it matters
- How confounding differs from collinearity
- Why measurement error weakens inference
- The role of proxy variables in biology
- Practical responses when these problems arise
1 Introduction
Regression models are often presented as if the main problem is choosing the right formula and fitting it correctly. In practice, some of the hardest problems arise before or beneath the fitting step. Predictors may overlap, causal roles may be confused, or key variables may be measured imperfectly.
This chapter brings together three related but distinct problems:
- collinearity: predictors share too much information,
- confounding: a variable distorts the interpretation of another effect, and
- measurement error: the variables we fit are noisy or are only proxies for what we really want.
2 Key Concepts
These ideas should be kept separate even though they often co-occur in practice.
- Collinearity is overlap among predictors that destabilises coefficient interpretation.
- Confounding is a causal attribution problem, not merely a correlation problem.
- Measurement error weakens inference by adding noise to observed variables.
- Proxy variables can be useful, but they complicate interpretation.
- Interpretability often depends more on design and variable choice than on fitting alone.
3 Collinearity
3.1 What it is
Collinearity occurs when two or more predictor variables are strongly correlated, or more generally when one predictor is close to a linear combination of the others. In ecology this is common because many environmental variables co-vary through shared processes.
Examples include:
- temperature and dissolved oxygen,
- altitude and temperature,
- nitrate and phosphate,
- rainfall and river flow.
3.2 Why it is a problem
Collinearity is mainly a problem for interpretation and less often one for raw prediction. A model may still fit the response well, but the individual coefficients can become:
- unstable,
- imprecise,
- biologically misleading, or
- assigned the wrong sign.
The model struggles to partition shared explanatory information among overlapping predictors.
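The loss of precision can be seen in a small simulation. This is a minimal sketch on made-up data: the variable names, the correlation of about -0.95, and the coefficients are all assumptions chosen for illustration. The same true model is fitted twice, once with an unrelated second predictor and once with a strongly collinear one, and the standard error of the temperature coefficient is compared.

```r
# Simulated data: all names and values here are illustrative assumptions.
set.seed(1)
n <- 200
temp <- rnorm(n)                                   # standardised "temperature"
oxy_indep  <- rnorm(n)                             # second predictor, unrelated to temp
oxy_collin <- -0.95 * temp + sqrt(1 - 0.95^2) * rnorm(n)  # strongly correlated with temp

# The same true coefficients are used in both scenarios
y_indep  <- 1 + 0.5 * temp + 0.5 * oxy_indep  + rnorm(n)
y_collin <- 1 + 0.5 * temp + 0.5 * oxy_collin + rnorm(n)

fit_indep  <- lm(y_indep  ~ temp + oxy_indep)
fit_collin <- lm(y_collin ~ temp + oxy_collin)

# Standard error of the temperature coefficient in each fit
se_indep  <- summary(fit_indep)$coefficients["temp", "Std. Error"]
se_collin <- summary(fit_collin)$coefficients["temp", "Std. Error"]

c(independent = se_indep, collinear = se_collin)
```

Under these assumptions the collinear fit should report a clearly larger standard error for temperature, even though the true effect is identical in both scenarios.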
3.3 Diagnosing collinearity
The most common diagnostic is the variance inflation factor (VIF).
- VIF = 1: no inflation
- VIF between about 1 and 5: often manageable
- VIF above about 5 or 10: often problematic
In R:
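A minimal sketch using base R only, on simulated data. For predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the other predictors. The data, the variable names, and the helper vif_one() are assumptions made for illustration and are not taken from the original chapter.

```r
# Simulated predictors; names and values are illustrative assumptions.
set.seed(42)
n <- 150
altitude    <- runif(n, 0, 2000)                           # metres
temperature <- 25 - 0.006 * altitude + rnorm(n, sd = 1.5)  # falls with altitude
rainfall    <- rnorm(n, mean = 800, sd = 150)              # unrelated to the others
dat <- data.frame(altitude, temperature, rainfall)

# Hypothetical helper: VIF for one predictor.
# Regress it on the remaining predictors, then apply 1 / (1 - R^2).
vif_one <- function(var, data) {
  others <- setdiff(names(data), var)
  r2 <- summary(lm(reformulate(others, response = var), data = data))$r.squared
  1 / (1 - r2)
}

vifs <- sapply(names(dat), vif_one, data = dat)
round(vifs, 2)
```

If the car package is installed, car::vif() applied to a fitted lm object is the more usual route and should give the same values for a model with only continuous main effects.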
The threshold is not sacred. The important question is whether the collinearity prevents defensible interpretation.
4 Confounding
Confounding is a causal problem, not just a statistical one.
A confounder is a variable that influences both:
- the predictor of interest, and
- the response.
This can create a spurious association or distort a real one.
For example, if you relate species abundance to temperature without accounting for nutrient supply, and nutrient supply is associated with both temperature and abundance, then the temperature coefficient may partly reflect nutrient effects.
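The temperature and nutrient example can be sketched as a simulation. The causal structure, the effect sizes, and the variable names below are assumptions chosen for illustration: nutrients are made to influence both temperature and abundance, and the true temperature effect is set to 0.3.

```r
# Simulated example; the causal structure and effect sizes are assumptions.
set.seed(7)
n <- 500
nutrients   <- rnorm(n)                                  # the confounder
temperature <- 0.7 * nutrients + rnorm(n)                # co-varies with nutrients
abundance   <- 0.3 * temperature + 1.0 * nutrients + rnorm(n)  # true temperature effect = 0.3

naive    <- lm(abundance ~ temperature)               # confounder omitted
adjusted <- lm(abundance ~ temperature + nutrients)   # confounder included

b_naive <- coef(naive)["temperature"]
b_adj   <- coef(adjusted)["temperature"]

c(naive = unname(b_naive), adjusted = unname(b_adj))
```

With the confounder omitted, the temperature coefficient absorbs part of the nutrient effect and overstates the true value. Including nutrients should bring the estimate back close to 0.3. In real data the confounder is not always measured, which is why this is a design problem as much as a modelling one.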
4.1 Confounding is not the same as collinearity
These ideas are related but not identical:
- Collinearity is about overlap among predictors in the data.
- Confounding is about mistaken causal attribution.
Two variables can be highly collinear without one being a confounder. A confounder may also matter even when the correlation structure does not look especially dramatic.
5 Measurement Error
Regression models often assume that predictor variables are measured without serious error. That assumption is rarely exactly true.
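When a predictor is measured with random error, its estimated effect in a simple regression is biased toward zero. This is known as attenuation bias (or regression dilution). With several correlated predictors, the direction of the bias is harder to predict. In practical terms, measurement error can make a real effect appear weaker and less stable than it truly is.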
This matters because:
- biological field measurements are often noisy,
- instruments are imperfect,
- environmental conditions fluctuate, and
- some variables are difficult to measure directly.
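Attenuation can be illustrated with simulated data. This sketch assumes classical measurement error (random, independent of the true value) in a simple regression, where the expected slope shrinks by the factor var(true) / (var(true) + var(error)). The variable names and all numerical values are assumptions for illustration.

```r
# Simulated data; names and values are illustrative assumptions.
set.seed(123)
n <- 2000
true_temp <- rnorm(n, mean = 15, sd = 2)       # true predictor, variance 4
growth    <- 2 + 0.8 * true_temp + rnorm(n)    # true slope is 0.8
obs_temp  <- true_temp + rnorm(n, sd = 2)      # observed with error, error variance 4

slope_true <- coef(lm(growth ~ true_temp))["true_temp"]
slope_obs  <- coef(lm(growth ~ obs_temp))["obs_temp"]

# Expected shrinkage under classical measurement error
reliability  <- 4 / (4 + 4)         # var(true) / (var(true) + var(error))
expected_obs <- 0.8 * reliability   # slope expected from the noisy predictor

c(true = unname(slope_true), observed = unname(slope_obs), expected = expected_obs)
```

Here the error variance equals the true variance, so the slope estimated from the noisy predictor should be roughly half the true slope.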
6 Proxy Variables
Many biological models use proxies because the true mechanistic variables are unavailable or expensive to obtain.
Examples:
- altitude as a proxy for temperature or radiation,
- distance from shore as a proxy for exposure,
- chlorophyll as a proxy for productivity,
- body size as a proxy for age or condition.
Proxy variables are often useful, but they create interpretive limits. A proxy coefficient should not be over-read as if it identified a single clean mechanism.
If your question is about physiology, use physiological drivers where possible. If your question is about broad spatial pattern, a proxy may be acceptable. The variable should match the scale and aim of the question.
7 A Practical Workflow
When building a regression model:
- Specify the biological hypothesis first.
- Identify which predictors are mechanistic and which are proxies.
- Ask which variables may confound the relationship of interest.
- Inspect predictor correlations and calculate VIF where needed.
- Remove or combine redundant predictors when interpretation is the goal.
- Be cautious about causal language when important variables are missing or measured poorly.
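The correlation-inspection step of this workflow can be sketched as follows. The simulated environmental variables and the cutoff of 0.7 are assumptions for illustration. The cutoff is a screening convenience, not a rule.

```r
# Simulated environmental data; names, values and cutoff are assumptions.
set.seed(99)
n <- 120
temperature <- rnorm(n, mean = 18, sd = 3)
oxygen      <- 12 - 0.25 * temperature + rnorm(n, sd = 0.3)  # tracks temperature
nitrate     <- rlnorm(n)
phosphate   <- 0.1 * nitrate + rnorm(n, sd = 0.05)           # tracks nitrate
flow        <- rnorm(n)                                      # unrelated
env <- data.frame(temperature, oxygen, nitrate, phosphate, flow)

cmat   <- cor(env)
cutoff <- 0.7

# Keep each pair once (upper triangle) and flag strong correlations
idx <- which(abs(cmat) > cutoff & upper.tri(cmat), arr.ind = TRUE)
flagged <- data.frame(
  var1 = rownames(cmat)[idx[, "row"]],
  var2 = colnames(cmat)[idx[, "col"]],
  r    = cmat[idx]
)
flagged
```

Flagged pairs are candidates for the decisions in the next step: keep one on theoretical grounds, combine them, or accept the overlap if prediction is the only aim. Pairwise correlations can miss collinearity that involves three or more predictors, which is one reason to calculate VIF as well.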
8 Common Responses
Depending on the problem, possible responses include:
- removing a redundant predictor,
- choosing one variable over another based on theory,
- combining predictors into an index or ordination axis,
- centring or rescaling variables to improve interpretation,
- collecting better measurements,
- being explicit that a variable is only a proxy.
If prediction is the main goal, some collinearity may be tolerable. If interpretation is the goal, these issues become much more serious.
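One of the responses listed above, combining predictors into an ordination axis, can be sketched with prcomp() from base R. The simulated nitrate and phosphate values, the response, and the name nutrient_axis are assumptions for illustration.

```r
# Simulated data; names and values are illustrative assumptions.
set.seed(2024)
n <- 200
nitrate   <- rlnorm(n, meanlog = 1, sdlog = 0.5)
phosphate <- 0.08 * nitrate + rnorm(n, sd = 0.03)   # strongly tied to nitrate
abundance <- 5 + 1.2 * scale(nitrate)[, 1] + rnorm(n)

# Principal components of the two standardised nutrient variables
pca <- prcomp(cbind(nitrate, phosphate), center = TRUE, scale. = TRUE)
nutrient_axis <- pca$x[, 1]                          # scores on the first axis
var_explained <- pca$sdev[1]^2 / sum(pca$sdev^2)     # share of variance on PC1

# Use the single combined axis in place of the two collinear predictors
fit_axis <- lm(abundance ~ nutrient_axis)
summary(fit_axis)$coefficients
```

The sign of a principal component is arbitrary, so the direction of the nutrient_axis coefficient has to be read against the loadings in pca$rotation. The coefficient also describes a general nutrient gradient and no longer separates nitrate from phosphate, which is the interpretive cost of this approach.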
9 Summary
- Collinearity makes coefficients unstable because predictors share information.
- Confounding is a causal problem in which another variable distorts interpretation.
- Measurement error weakens and destabilises estimated effects.
- Proxy variables can be useful, but they limit what coefficients mean.
- These problems are best handled through good biological reasoning before model fitting, not only by software after the fact.
Regression is powerful, but only if we remain honest about what the variables actually represent.
Citation
@online{smit,_a._j.2026,
author = {Smit, A. J.},
title = {14. {Collinearity,} {Confounding,} and {Measurement} {Error}},
date = {2026-03-19},
url = {http://tangledbank.netlify.app/BCB744/basic_stats/14-collinearity-confounding-measurement-error.html},
langid = {en}
}
