14. Interaction Effects

When the Effect of One Predictor Depends on Another

Author

A. J. Smit

Published

2026/03/19

In This Chapter

what an interaction term means;
why main effects change meaning when an interaction is present;
how to specify interaction models in lm();
how to compare nested models with and without an interaction;
how to interpret conditional effects graphically and verbally.

Tasks to Complete in This Chapter

None

1 Introduction

Many ecological and biological processes are interactive. The effect of one variable often depends on the level of another. Nutrient supply may matter more at high temperature than at low temperature. The influence of distance may differ among bioregions. A treatment may affect males and females differently.

These situations are handled in regression by including an interaction term. Interaction models are often where linear modelling becomes much more interesting biologically, but they are also where interpretation becomes more demanding. Once an interaction is present, the main effects can no longer be read in the same simple way as before.

2 Key Concepts

The essential ideas are these.

An interaction means the effect of one predictor depends on another predictor.
Main effects become conditional once an interaction term is present.
Plots are indispensable because coefficients alone rarely communicate interactions clearly.
Nested model comparisons are useful for asking whether the interaction improves the model beyond the main effects alone.
Interactions should be biologically motivated, not generated mechanically because software allows it.

3 When Is an Interaction Appropriate?

You should consider an interaction when the biology suggests that the effect of one predictor is contingent on another. Typical examples include:

temperature × nutrient supply;
treatment × sex;
distance × habitat type;
rainfall × soil type.

If there is no sensible biological reason to expect the effect of one predictor to change across the levels of another, an interaction term is hard to justify.

4 The Interaction Model

For two predictors, a linear model with interaction is:

\[ Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i}X_{2i} + \epsilon_i \]

The coefficient $\beta_3$ is the interaction coefficient. It describes how the effect of one predictor changes as the other predictor changes.
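One way to see what $\beta_3$ does is to collect the terms that involve $X_{1i}$. This is only a rearrangement of the model above, not an additional assumption:

\[ Y_i = \alpha + (\beta_1 + \beta_3 X_{2i}) X_{1i} + \beta_2 X_{2i} + \epsilon_i \]

The slope of $X_1$ is therefore $\beta_1 + \beta_3 X_{2i}$. It equals $\beta_1$ only when $X_{2i} = 0$, and it changes by $\beta_3$ for each unit increase in $X_2$. By symmetry, the slope of $X_2$ is $\beta_2 + \beta_3 X_{1i}$.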

In R, an interaction is specified with *:

lm(response ~ x1 * x2, data = df)

This expands to:

lm(response ~ x1 + x2 + x1:x2, data = df)

The : term is the interaction itself, while * includes both main effects and the interaction.
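The equivalence of the two formulas can be checked directly. The sketch below uses simulated data; the data frame, variable names, and true coefficients are illustrative assumptions and have nothing to do with the seaweed data.

```r
# Simulated data; names and true coefficients are illustrative only.
set.seed(1)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$response <- 1 + 2 * df$x1 - 1 * df$x2 + 0.5 * df$x1 * df$x2 +
  rnorm(100, sd = 0.3)

fit_star  <- lm(response ~ x1 * x2, data = df)          # shorthand
fit_colon <- lm(response ~ x1 + x2 + x1:x2, data = df)  # written out in full

# Both formulas produce the same terms and the same estimates.
same_terms <- identical(names(coef(fit_star)), names(coef(fit_colon)))
same_coefs <- isTRUE(all.equal(coef(fit_star), coef(fit_colon)))
print(c(same_terms = same_terms, same_coefs = same_coefs))

# Using x1:x2 on its own drops both main effects from the model.
fit_only_int <- lm(response ~ x1:x2, data = df)
print(names(coef(fit_only_int)))
```

Writing `x1:x2` alone is legal but fits a model without main effects, which is rarely what is intended; `*` is the safer default.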

5 Why Main Effects Become Harder to Read

When an interaction is present, the main-effect coefficients are conditional.

For example, in a model with dist * bio:

the coefficient of dist is the slope of distance in the reference bioregion only;
the coefficients of the non-reference bioregions are differences in intercept (that is, in expected dissimilarity at dist = 0) relative to that reference;
the interaction coefficients describe how the slope of distance in each of the other bioregions differs from the slope in the reference bioregion.

This is why interaction models should almost never be interpreted from the coefficient table alone. The fitted relationships must be plotted.
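The way group-specific slopes are assembled from the coefficients can be illustrated on simulated data. This is a minimal sketch: the three groups `a`, `b`, `c` and their true slopes are hypothetical, chosen so that the recovered values can be checked.

```r
# Simulated data: three hypothetical groups with known slopes.
set.seed(42)
n <- 60
sim <- data.frame(
  x = runif(3 * n, 0, 10),
  g = factor(rep(c("a", "b", "c"), each = n))
)
true_slope <- c(a = 1.0, b = 2.0, c = 0.5)
sim$y <- 2 + unname(true_slope[as.character(sim$g)]) * sim$x +
  rnorm(3 * n, sd = 0.5)

fit <- lm(y ~ x * g, data = sim)
b <- coef(fit)

# The slope in the reference group "a" is the x coefficient itself;
# slopes in the other groups add the matching interaction coefficient.
slopes <- c(a = unname(b["x"]),
            b = unname(b["x"] + b["x:gb"]),
            c = unname(b["x"] + b["x:gc"]))
print(round(slopes, 2))

# Check: separate regressions within each group give the same slopes.
by_group <- sapply(split(sim, sim$g),
                   function(d) coef(lm(y ~ x, data = d))[["x"]])
print(all.equal(slopes, by_group))
```

The point estimates from the full interaction model match those from fitting each group separately; what the single model adds is a common residual variance and a formal test of whether the slopes differ.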

Centring Predictors

If a continuous predictor has no meaningful zero, the main effects in an interaction model can be awkward to interpret. In such cases, centring the predictor often helps because it shifts the zero point to a more meaningful reference value, often the sample mean.
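A short sketch of centring with two continuous predictors follows. The variables `temp`, `nutr`, and `growth` are hypothetical and the data are simulated; the aim is only to show which parts of the model change.

```r
# Simulated data; "temp" and "nutr" are hypothetical predictors whose
# observed ranges do not include zero.
set.seed(7)
d <- data.frame(temp = runif(200, 12, 24), nutr = runif(200, 5, 15))
d$growth <- 1 + 0.2 * d$temp + 0.1 * d$nutr + 0.05 * d$temp * d$nutr +
  rnorm(200, sd = 0.5)

# Centre each predictor on its sample mean.
d$temp_c <- d$temp - mean(d$temp)
d$nutr_c <- d$nutr - mean(d$nutr)

fit_raw <- lm(growth ~ temp * nutr, data = d)
fit_ctr <- lm(growth ~ temp_c * nutr_c, data = d)

# The interaction coefficient and the fitted values are unchanged by
# centring. The intercept and main effects change because they now refer
# to the mean of the other predictor rather than to zero.
print(round(coef(fit_raw), 3))
print(round(coef(fit_ctr), 3))
print(all.equal(unname(fitted(fit_raw)), unname(fitted(fit_ctr))))
```

Centring is therefore a reparameterisation, not a different model: it changes what the main-effect coefficients describe without changing the fit.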

6 Example Dataset

We continue with the seaweed dataset, now using the full data. The question is whether the relationship between Sørensen dissimilarity (Y) and distance along the coast (dist) differs among bioregions (bio).

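library(tidyverse)  # provides ggplot2, used for the figure below
sw <- read.csv("../../data/BCB743/seaweed/spp_df2.csv")
sw$bio <- factor(sw$bio)  # bioregion as a factor; its first level is the reference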
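Because the first factor level becomes the reference category under R's default treatment coding, it is worth checking the level order before fitting. The sketch below uses a hypothetical factor with placeholder level names, not the seaweed bioregions, to show how the reference can be inspected and changed.

```r
# Hypothetical factor; the level names are placeholders.
region <- factor(c("north", "south", "east", "south", "north"))
print(levels(region))   # levels are sorted, so the first one is the reference

# Choose a different reference level without changing the data.
region2 <- relevel(region, ref = "south")
print(levels(region2))
```

Changing the reference level changes which group the main-effect slope and the comparisons refer to, but not the fitted values of the model.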

7 Do an Exploratory Data Analysis (EDA)

We first inspect the response against distance within each bioregion.

ggplot(sw, aes(x = dist, y = Y, colour = bio)) +
  geom_point(alpha = 0.6, size = 1.6) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 0.9) +
  labs(x = "Distance along the coast (km)",
       y = "Sørensen dissimilarity",
       colour = "Bioregion") +
  theme_grey()

Figure 1: Sørensen dissimilarity as a function of coastal distance in four bioregions.

The fitted lines are not parallel. That already suggests an interaction: the relationship between distance and dissimilarity appears to differ among bioregions.

8 State the Hypotheses

There are three increasingly rich questions here:

Is distance related to dissimilarity?
Do the average levels of dissimilarity differ among bioregions?
Does the effect of distance differ among bioregions?

The third question is the interaction question. In that case the null hypothesis is:

\[H_{0}: \beta_{\text{interaction}} = 0\]

for all interaction terms. The alternative is that at least one interaction coefficient differs from zero, implying that the slope of distance is not the same in all bioregions.
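Assuming R's default treatment coding, a factor with $k$ levels contributes $k - 1$ interaction coefficients, one per non-reference level. Written out, with the numeric subscripts serving only as labels for the non-reference bioregions, the hypotheses are:

\[ H_{0}: \beta_{\text{dist}:2} = \beta_{\text{dist}:3} = \dots = \beta_{\text{dist}:k} = 0 \]

\[ H_{A}: \beta_{\text{dist}:j} \neq 0 \text{ for at least one } j \in \{2, \dots, k\} \]

With four bioregions this is a joint test of three coefficients, which is why it is assessed with a single $F$-test on a nested model comparison rather than by reading individual $t$-tests one at a time.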

9 Fit Nested Models

We fit three nested models:

a distance-only model;
a model with the main effects of distance and bioregion;
a model that adds the interaction between distance and bioregion.

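mod_dist <- lm(Y ~ dist, data = sw)        # distance only
mod_main <- lm(Y ~ dist + bio, data = sw)  # additive: common slope, separate intercepts
mod_int  <- lm(Y ~ dist * bio, data = sw)  # interaction: separate slopes and intercepts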

anova(mod_dist, mod_main, mod_int)

R> Analysis of Variance Table
R> 
R> Model 1: Y ~ dist
R> Model 2: Y ~ dist + bio
R> Model 3: Y ~ dist * bio
R>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
R> 1    968 7.7388                                  
R> 2    965 4.1156  3    3.6232 516.21 < 2.2e-16 ***
R> 3    962 2.2507  3    1.8648 265.69 < 2.2e-16 ***
R> ---
R> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(mod_int)

R> 
R> Call:
R> lm(formula = Y ~ dist * bio, data = sw)
R> 
R> Residuals:
R>       Min        1Q    Median        3Q       Max 
R> -0.112117 -0.030176 -0.004195  0.023698  0.233520 
R> 
R> Coefficients:
R>                 Estimate Std. Error t value Pr(>|t|)    
R> (Intercept)    5.341e-03  4.177e-03   1.279   0.2013    
R> dist           3.530e-04  1.140e-05  30.958  < 2e-16 ***
R> bioB-ATZ      -6.140e-03  1.659e-02  -0.370   0.7114    
R> bioBMP         3.820e-02  6.659e-03   5.737 1.29e-08 ***
R> bioECTZ        1.629e-02  6.447e-03   2.527   0.0117 *  
R> dist:bioB-ATZ  7.976e-04  1.875e-04   4.255 2.30e-05 ***
R> dist:bioBMP   -1.285e-04  2.065e-05  -6.222 7.31e-10 ***
R> dist:bioECTZ   4.213e-04  1.801e-05  23.392  < 2e-16 ***
R> ---
R> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R> 
R> Residual standard error: 0.04837 on 962 degrees of freedom
R> Multiple R-squared:  0.8607, Adjusted R-squared:  0.8597 
R> F-statistic: 849.2 on 7 and 962 DF,  p-value: < 2.2e-16

The model comparison is the most useful result at this stage. Adding bioregion to the distance-only model reduces the residual sum of squares from 7.74 to 4.12, and adding the interaction reduces it further to 2.25. Both steps are strongly supported ($p < 0.001$), so the interaction is not a cosmetic extra term. It changes the model materially.
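The $F$ statistics in the `anova()` table can be reproduced by hand from the residual sums of squares and degrees of freedom. The sketch below copies those values from the printed table, so small differences from the reported $F$ values are due to rounding.

```r
# Values copied from the anova() table above.
rss    <- c(dist = 7.7388, main = 4.1156, int = 2.2507)  # residual sums of squares
res_df <- c(dist = 968,    main = 965,    int = 962)     # residual degrees of freedom

# With several models, anova() uses the residual mean square of the
# largest model as the denominator for every comparison.
ms_resid <- rss[["int"]] / res_df[["int"]]

f_main <- ((rss[["dist"]] - rss[["main"]]) /
           (res_df[["dist"]] - res_df[["main"]])) / ms_resid
f_int  <- ((rss[["main"]] - rss[["int"]]) /
           (res_df[["main"]] - res_df[["int"]])) / ms_resid
print(round(c(bioregion = f_main, interaction = f_int), 1))

# Upper-tail p-value for the interaction comparison.
p_int <- pf(f_int, df1 = 3, df2 = 962, lower.tail = FALSE)
print(p_int)
```

Each $F$ is the drop in residual sum of squares per added parameter, divided by the residual variance of the interaction model.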

10 Interpret the Interaction

The interaction model shows that the slope of distance differs among bioregions. In other words, distance does not have one universal effect on Sørensen dissimilarity across the coastline. The steepness of the distance effect depends on which bioregion is being considered; its direction does not, because the estimated slope is positive in all four bioregions.

This is best seen in the plotted lines rather than the raw coefficient table. Some bioregions show a steeper increase of dissimilarity with distance, while others show a shallower increase. The coefficient table provides the detailed parameterisation, but the plot provides the biologically useful interpretation.

The overall interaction model also fits the data well, with an adjusted $R^2$ of about 0.86, and the model comparison provides very strong evidence that the interaction contributes importantly to explaining variation in the response ($p < 0.001$).
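The bioregion-specific slopes implied by the coefficient table can be worked out with simple addition. The sketch below copies the estimates from `summary(mod_int)`; the reference bioregion is labelled only as `reference` because its name does not appear in the output, and the per-100-km scaling assumes `dist` is in kilometres, as the axis label of Figure 1 indicates.

```r
# Estimates copied from summary(mod_int) above.
b_dist <- 3.530e-04                 # slope in the reference bioregion
b_int  <- c("B-ATZ" = 7.976e-04,    # differences in slope relative to the reference
            "BMP"   = -1.285e-04,
            "ECTZ"  = 4.213e-04)

# Slope of dissimilarity on distance within each bioregion.
slopes <- c(reference = b_dist, b_dist + b_int)
print(signif(slopes, 4))

# The same slopes per 100 km, which are easier to read.
print(signif(100 * slopes, 3))
```

A negative interaction coefficient, as for BMP, does not mean a negative slope; it means a slope that is shallower than in the reference bioregion.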

11 Reporting

When reporting an interaction model, do not simply list coefficients. State what the interaction means biologically.

For example:

Sørensen dissimilarity increased with coastal distance, but the slope of that relationship differed among bioregions. A model containing the interaction between distance and bioregion fit the data substantially better than models with distance alone or with only additive main effects ($p < 0.001$ for the interaction comparison; adjusted $R^2 = 0.86$). The effect of distance on dissimilarity was therefore conditional on bioregion rather than uniform across the coastline.

12 Common Mistakes

Common mistakes with interaction models include:

adding interactions without biological justification;
interpreting the main effects as if no interaction were present;
failing to plot the fitted interaction;
fitting many interaction terms in small datasets;
treating a statistically detectable interaction as important without considering its effect size and biological meaning.
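To avoid the third mistake, fitted lines can be drawn from the model itself rather than from separate smoothers. The following is a minimal sketch on simulated stand-in data; the groups `a` and `b`, the object names, and the use of base graphics are illustrative choices, not the chapter's own workflow.

```r
# Simulated stand-in data; group names and slopes are hypothetical.
set.seed(3)
sim <- data.frame(x = runif(120, 0, 10),
                  g = factor(rep(c("a", "b"), each = 60)))
sim$y <- 1 + ifelse(sim$g == "a", 0.5, 1.5) * sim$x + rnorm(120, sd = 0.6)
m <- lm(y ~ x * g, data = sim)

# Prediction grid: a sequence of x values crossed with every group.
grid <- expand.grid(x = seq(0, 10, length.out = 50), g = levels(sim$g))
pred <- predict(m, newdata = grid, interval = "confidence")
grid <- cbind(grid, pred)   # adds the columns fit, lwr, and upr

# Plot the data and overlay the fitted line for each group.
plot(y ~ x, data = sim, col = as.integer(sim$g), pch = 16)
for (lev in levels(sim$g)) {
  sub <- grid[grid$g == lev, ]
  lines(sub$x, sub$fit, lwd = 2, col = match(lev, levels(sim$g)))
}
```

Predicting over a grid keeps the plot tied to the fitted model, so the lines shown are exactly the conditional effects that the coefficients encode.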

13 Summary

An interaction means that the effect of one predictor depends on another.
Once an interaction is present, the main effects must be interpreted conditionally.
Nested model comparisons are useful for asking whether the interaction improves fit.
Plots are essential because interaction models are hard to understand from coefficients alone.
Interactions should be motivated by biological reasoning and then interpreted in those same terms.

The next chapter remains in the multiple-regression setting but turns to three major threats to interpretation: collinearity, confounding, and measurement error.

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:

@online{smit,_a._j.2026,
  author = {Smit, A. J.},
  title = {14. {Interaction} {Effects}},
  date = {2026-03-19},
  url = {http://tangledbank.netlify.app/BCB744/basic_stats/14-interaction-effects.html},
  langid = {en}
}

For attribution, please cite this work as:

Smit, A. J. (2026) 14. Interaction Effects. http://tangledbank.netlify.app/BCB744/basic_stats/14-interaction-effects.html.
