15. Interaction Effects

When the Effect of One Predictor Depends on Another

Published

2026/03/22

Note: In This Chapter
  • what an interaction term means;
  • why main effects change meaning when an interaction is present;
  • how to specify interaction models in lm();
  • how to compare nested models with and without an interaction;
  • how to interpret conditional effects graphically and verbally.
Important: Tasks to Complete in This Chapter
  • None

1 Introduction

Many ecological and biological processes are interactive. The effect of one variable often depends on the level of another. Nutrient supply may have a stronger effect at high temperature than at low temperature. The influence of distance may differ among bioregions. A treatment may affect males and females differently.

These situations are handled in regression by including an interaction term. Interaction models are often where linear modelling becomes much more interesting biologically, but they are also where interpretation becomes more demanding. Once an interaction is present, the main effects can no longer be read in the same simple way as before.

In this chapter, I build directly on the additive factor-plus-covariate model introduced in Chapter 14. In older terminology, the additive model y ~ group + x is often called ANCOVA. Here I begin where that simple ANCOVA framing stops: at the point where the slope of x may differ among groups and the model must therefore include group * x.

2 Key Concepts

The essential ideas are these.

  • An interaction means the effect of one predictor depends on another predictor.
  • Main effects become conditional once an interaction term is present.
  • Plots are indispensable because coefficients alone rarely communicate interactions clearly.
  • Nested model comparisons are useful for asking whether the interaction improves the model beyond the main effects alone.
  • Interactions should be biologically motivated, not generated mechanically because software allows it.

3 When Is an Interaction Appropriate?

You should consider an interaction when the biology suggests that the effect of one predictor is contingent on another. Typical examples include:

  • temperature × nutrient supply;
  • treatment × sex;
  • distance × habitat type;
  • rainfall × soil type.

If there is no sensible biological reason to expect one effect to change across the levels of another, then an interaction term is often hard to justify.

4 The Core Equation

For two predictors, a linear model with interaction is:

\[Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i}X_{2i} + \epsilon_i \tag{1}\]

In Equation 1, \(\beta_3\) is the interaction coefficient. It describes how the effect of one predictor changes as the other predictor changes; in other words, an interaction is a claim that the slope of one predictor depends on the value of another.
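
One way to see why \(\beta_3\) makes the slope of one predictor depend on the other is to collect the terms of Equation 1 that involve \(X_{1i}\). The rearrangement below is a short sketch that follows directly from Equation 1, with no further assumptions:

\[Y_i = \alpha + \beta_2 X_{2i} + (\beta_1 + \beta_3 X_{2i}) X_{1i} + \epsilon_i\]

The slope of \(X_1\) is therefore \(\beta_1 + \beta_3 X_{2i}\). It equals \(\beta_1\) only when \(X_{2i} = 0\), and it changes by \(\beta_3\) for each unit increase in \(X_2\). By symmetry, the slope of \(X_2\) is \(\beta_2 + \beta_3 X_{1i}\).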

In R, an interaction is specified with *:

lm(response ~ x1 * x2, data = df)

This expands to:

lm(response ~ x1 + x2 + x1:x2, data = df)

The : term is the interaction itself, while * includes both main effects and the interaction.
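
As a quick check of that expansion, the sketch below fits both formulas to a small simulated dataset and compares the coefficients. The data frame `df`, its columns `response`, `x1`, and `x2`, and the simulated values are hypothetical placeholders, not data from this chapter.

```r
# Simulated data for illustration only (hypothetical values)
set.seed(1)
df <- data.frame(x1 = runif(50), x2 = runif(50))
df$response <- 1 + 2 * df$x1 - df$x2 + 3 * df$x1 * df$x2 +
  rnorm(50, sd = 0.5)

# Shorthand formula and its explicit expansion
mod_star  <- lm(response ~ x1 * x2, data = df)
mod_colon <- lm(response ~ x1 + x2 + x1:x2, data = df)

# Both fits contain the same terms and the same estimates
same_fit <- isTRUE(all.equal(coef(mod_star), coef(mod_colon)))
same_fit
names(coef(mod_star))
```

A formula such as `response ~ x1:x2` on its own would fit the product term without either main effect, which is rarely what is wanted.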

5 Why Main Effects Become Harder to Read

When an interaction is present, the main-effect coefficients are conditional.

For example, in a model with dist * bio:

  • the coefficient of dist is the slope of distance only for the reference bioregion;
  • the coefficients of the non-reference bioregions are differences in intercept relative to that reference (that is, differences in the expected response at dist = 0);
  • the interaction coefficients describe how the slope of distance changes in the other bioregions relative to the reference bioregion.

This is why interaction models should almost never be interpreted from the coefficient table alone. The fitted relationships must be plotted.
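
To make the conditional reading concrete, the sketch below simulates a factor × continuous design and rebuilds each group's slope from the coefficient vector. The dataset `sim`, the group labels `A`, `B`, and `C`, and the true slopes are invented for illustration; they are not the seaweed data.

```r
# Simulated factor x continuous data (hypothetical values)
set.seed(2)
n_per_group <- 40
sim <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = n_per_group)),
  x = runif(3 * n_per_group, 0, 10)
)
true_slope <- c(A = 1.0, B = 1.5, C = 0.5)
sim$y <- 2 + unname(true_slope[as.character(sim$group)]) * sim$x +
  rnorm(nrow(sim), sd = 0.5)

mod <- lm(y ~ x * group, data = sim)
b <- coef(mod)

# The 'x' coefficient is the slope for the reference level (A) only;
# each interaction coefficient is a difference in slope relative to A.
group_slopes <- c(
  A = unname(b["x"]),
  B = unname(b["x"] + b["x:groupB"]),
  C = unname(b["x"] + b["x:groupC"])
)
round(group_slopes, 2)
```

Reading the `x` coefficient as "the effect of x" would describe group A alone; the slopes for B and C only appear once the interaction coefficients are added to it.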

Note: Centring Predictors

If a continuous predictor has no meaningful zero, the main effects in an interaction model can be awkward to interpret. In such cases, centring the predictor often helps because it shifts the zero point to a more meaningful reference value, often the sample mean.

6 Example 1: Two Continuous Predictors with and without Centring

Interactions do not only occur between a factor and a continuous predictor. They also arise when two continuous predictors work together, and this case gives one of the clearest demonstrations of why interaction models can become awkward to interpret if the predictors are left on their raw scales.

Suppose we want to model algal growth as a function of temperature and nutrient concentration, and we suspect that the effect of temperature depends on nutrient supply. That is a biologically plausible interaction: warming may matter more when nutrients are abundant than when they are scarce.

6.1 Example Dataset

To keep the logic transparent, we first create a small synthetic dataset in which both predictors are continuous and their interaction is genuinely present.

library(tidyverse)  # provides tibble(), mutate(), summarise(), and ggplot()

set.seed(14)

int_demo <- tibble(
  temp = runif(120, 8, 24),
  nutrient = runif(120, 0.5, 5)
) |>
  mutate(
    growth = 4 + 0.25 * temp + 1.1 * nutrient + 0.18 * temp * nutrient +
      rnorm(n(), sd = 1.8),
    temp_c = temp - mean(temp),
    nutrient_c = nutrient - mean(nutrient)
  )

6.2 Do an Exploratory Data Analysis (EDA)

Before fitting the interaction model, inspect the scale of the predictors and the overall response pattern.

int_demo |>
  summarise(
    mean_temp = mean(temp),
    sd_temp = sd(temp),
    mean_nutrient = mean(nutrient),
    sd_nutrient = sd(nutrient),
    mean_growth = mean(growth),
    sd_growth = sd(growth)
  )
# A tibble: 1 × 6
  mean_temp sd_temp mean_nutrient sd_nutrient mean_growth sd_growth
      <dbl>   <dbl>         <dbl>       <dbl>       <dbl>     <dbl>
1      16.1    4.43          2.75        1.31        18.8      6.41
int_demo |>
  mutate(
    nutrient_band = cut(
      nutrient,
      breaks = quantile(nutrient, probs = c(0, 1/3, 2/3, 1)),
      include.lowest = TRUE,
      labels = c("Low nutrient", "Medium nutrient", "High nutrient")
    )
  ) |>
  ggplot(aes(x = temp, y = growth, colour = nutrient_band)) +
  geom_point(alpha = 0.65, size = 1.5) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 0.8) +
  labs(
    x = "Temperature",
    y = "Growth",
    colour = "Nutrient level"
  ) +
  theme_grey()
Figure 1: Algal growth as a function of temperature at low, medium, and high nutrient levels. The fitted lines suggest that the temperature-growth slope steepens as nutrient concentration increases.

The fitted lines are not parallel. As nutrient concentration increases, the fitted temperature-growth slope becomes steeper. That is exactly the visual signature of an interaction.

6.3 State the Model Question

The biological question is not merely whether temperature matters or whether nutrient concentration matters. It is whether the effect of temperature changes across the nutrient gradient.

For the interaction term, the hypothesis pair is:

\[H_{0}: \beta_{\text{temp:nutrient}} = 0\] \[H_{a}: \beta_{\text{temp:nutrient}} \ne 0\]

If the interaction coefficient is zero, the slope of temperature does not depend on nutrient concentration. If it differs from zero, the temperature effect is conditional on nutrient supply.

6.4 Fit the Uncentred and Centred Models

We fit the same interaction twice: once on the raw predictor scales, and once after centring both predictors on their means.

mod_raw <- lm(growth ~ temp * nutrient, data = int_demo)
mod_cent <- lm(growth ~ temp_c * nutrient_c, data = int_demo)

tibble(
  term = names(coef(mod_raw)),
  uncentred = round(coef(mod_raw), 3),
  centred = round(coef(mod_cent), 3)
)
# A tibble: 4 × 3
  term          uncentred centred
  <chr>             <dbl>   <dbl>
1 (Intercept)       3.89   18.9  
2 temp              0.23    0.8  
3 nutrient          0.76    4.09 
4 temp:nutrient     0.207   0.207

6.5 Interpret the Results

The important result is that the interaction term is the same model statement in both fits, and its estimate is unchanged by centring apart from trivial rounding differences. What changes are the main effects.

In the uncentred model:

  • the temperature coefficient is the slope when nutrient concentration is zero;
  • the nutrient coefficient is the slope when temperature is zero.

Those values are awkward here because neither zero nutrient nor zero temperature is a meaningful reference point for these data.

In the centred model:

  • the temperature coefficient is the slope at the mean nutrient concentration;
  • the nutrient coefficient is the slope at the mean temperature.

That makes the coefficients much easier to interpret biologically. The interaction is still there, and the fitted surface is still the same, but the reference point has moved to a scientifically sensible part of the predictor space.
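
The link between the two columns of the coefficient table can be sketched algebraically. Take \(X_1\) as temperature and \(X_2\) as nutrient concentration, write the centred predictors as \(X_{1i} - \bar{X}_1\) and \(X_{2i} - \bar{X}_2\), and use primes for the centred-model coefficients (notation introduced here for illustration only). Expanding the centred model and matching terms with Equation 1 gives:

\[\beta_3' = \beta_3\]
\[\beta_1' = \beta_1 + \beta_3 \bar{X}_2\]
\[\beta_2' = \beta_2 + \beta_3 \bar{X}_1\]
\[\alpha' = \alpha + \beta_1 \bar{X}_1 + \beta_2 \bar{X}_2 + \beta_3 \bar{X}_1 \bar{X}_2\]

With the rounded values reported above (mean temperature 16.1, mean nutrient 2.75), \(0.23 + 0.207 \times 2.75 \approx 0.80\) and \(0.76 + 0.207 \times 16.1 \approx 4.09\), which match the centred column of the table.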

6.6 Why is This Example Important?

This is why centring helps in continuous × continuous interactions. It does not remove the interaction or change the fitted values. It changes the reference point so that the main effects describe something scientifically interpretable rather than a mathematically arbitrary zero point.

The lesson extends beyond this synthetic example. The same logic carries into factor × continuous interactions, ANCOVA-style models, and more complex multiple-regression settings. Whenever an interaction is present, you should ask whether the zero point of each continuous predictor gives the main effects a useful meaning. If it does not, centring is often the cleanest fix.
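
A minimal sketch of that check follows, using freshly simulated data in base R so that it runs on its own. The object names (`d`, `fit_raw`, `fit_cent`) and the simulated values are assumptions for illustration; the same comparison could be made on `mod_raw` and `mod_cent` from Section 6.4.

```r
# Simulated data loosely modelled on the algal growth example (hypothetical)
set.seed(3)
d <- data.frame(temp = runif(100, 8, 24), nutrient = runif(100, 0.5, 5))
d$growth <- 4 + 0.25 * d$temp + 1.1 * d$nutrient +
  0.18 * d$temp * d$nutrient + rnorm(100, sd = 1.8)

# Centre each predictor on its sample mean
d$temp_c <- d$temp - mean(d$temp)
d$nutrient_c <- d$nutrient - mean(d$nutrient)

fit_raw  <- lm(growth ~ temp * nutrient, data = d)
fit_cent <- lm(growth ~ temp_c * nutrient_c, data = d)

# Centring leaves the fitted surface unchanged
same_fitted <- isTRUE(all.equal(fitted(fit_raw), fitted(fit_cent)))

# The interaction coefficient is also unchanged
same_interaction <- isTRUE(all.equal(
  unname(coef(fit_raw)["temp:nutrient"]),
  unname(coef(fit_cent)["temp_c:nutrient_c"])
))

# The centred temperature coefficient is the raw-model slope
# evaluated at the mean nutrient concentration
slope_at_mean <- unname(
  coef(fit_raw)["temp"] + coef(fit_raw)["temp:nutrient"] * mean(d$nutrient)
)
c(same_fitted = same_fitted, same_interaction = same_interaction)
```

Only the intercept and the two main-effect coefficients differ between the fits, because only their reference point has moved.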

7 Example 2: Coastal Distance and Bioregional Context

7.1 Example Dataset

We now return to the same seaweed dataset used in Chapter 14, where distance and bioregion were first combined in an additive model. There the question was whether distance and bioregion both helped explain Sørensen dissimilarity. Here the question becomes more specific: does the relationship between Sørensen dissimilarity (Y) and distance along the coast (dist) differ among bioregions (bio)?

sw <- read.csv("../../data/BCB743/seaweed/spp_df2.csv")
sw$bio <- factor(sw$bio)

7.2 Do an Exploratory Data Analysis (EDA)

We first inspect the response against distance within each bioregion.

ggplot(sw, aes(x = dist, y = Y, colour = bio)) +
  geom_point(alpha = 0.6, size = 1.6) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 0.9) +
  labs(x = "Distance along the coast (km)",
       y = "Sørensen dissimilarity",
       colour = "Bioregion") +
  theme_grey()
Figure 2: Sørensen dissimilarity as a function of coastal distance in four bioregions.

The fitted lines are not parallel. That already suggests an interaction: the relationship between distance and dissimilarity appears to differ among bioregions. In other words, the additive assumption from Chapter 14 may be too simple.

7.3 State the Hypotheses

There are three increasingly specific questions here:

  1. Is distance related to dissimilarity?
  2. Do the average levels of dissimilarity differ among bioregions?
  3. Does the effect of distance differ among bioregions?

The third question is the interaction question. Because bioregion has several levels, the hypotheses refer to the whole set of interaction coefficients:

\[H_{0}: \beta_{\text{interaction}} = 0 \text{ for every interaction term}\] \[H_{a}: \beta_{\text{interaction}} \ne 0 \text{ for at least one interaction term}\]

Here \(\beta_{\text{interaction}}\) refers to the set of coefficients that describe how the slope of distance changes among bioregions. If any of these differ from zero, the slope of distance is not the same in all bioregions.

7.4 Fit Nested Models

We fit three nested models:

  • a distance-only model;
  • a model with the main effects of distance and bioregion;
  • a model that adds the interaction between distance and bioregion.
mod_dist <- lm(Y ~ dist, data = sw)
mod_main <- lm(Y ~ dist + bio, data = sw)
mod_int <- lm(Y ~ dist * bio, data = sw)

anova(mod_dist, mod_main, mod_int)
Analysis of Variance Table

Model 1: Y ~ dist
Model 2: Y ~ dist + bio
Model 3: Y ~ dist * bio
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1    968 7.7388                                  
2    965 4.1156  3    3.6232 516.21 < 2.2e-16 ***
3    962 2.2507  3    1.8648 265.69 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(mod_int)

Call:
lm(formula = Y ~ dist * bio, data = sw)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.112117 -0.030176 -0.004195  0.023698  0.233520 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    5.341e-03  4.177e-03   1.279   0.2013    
dist           3.530e-04  1.140e-05  30.958  < 2e-16 ***
bioB-ATZ      -6.140e-03  1.659e-02  -0.370   0.7114    
bioBMP         3.820e-02  6.659e-03   5.737 1.29e-08 ***
bioECTZ        1.629e-02  6.447e-03   2.527   0.0117 *  
dist:bioB-ATZ  7.976e-04  1.875e-04   4.255 2.30e-05 ***
dist:bioBMP   -1.285e-04  2.065e-05  -6.222 7.31e-10 ***
dist:bioECTZ   4.213e-04  1.801e-05  23.392  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.04837 on 962 degrees of freedom
Multiple R-squared:  0.8607,    Adjusted R-squared:  0.8597 
F-statistic: 849.2 on 7 and 962 DF,  p-value: < 2.2e-16

The model comparison is the most useful result at this stage. Adding bioregion to the distance-only model reduces the residual sum of squares from 7.74 to 4.12, and adding the interaction reduces it further to 2.25; both steps are strongly supported (\(p < 0.001\)). The interaction therefore changes the model materially.
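
The F statistic for the interaction row can be reproduced by hand from the residual sums of squares and degrees of freedom printed in the analysis-of-variance table above. The sketch below uses the rounded values from that output, so it agrees with the table only to about the first decimal place; the variable names are illustrative.

```r
# Values copied from the anova() output (rounded as printed)
rss_main <- 4.1156  # residual sum of squares, Y ~ dist + bio
df_main  <- 965     # residual degrees of freedom, additive model
rss_int  <- 2.2507  # residual sum of squares, Y ~ dist * bio
df_int   <- 962     # residual degrees of freedom, interaction model

# Extra parameters used by the interaction (three slope differences)
df_diff <- df_main - df_int

# F = (drop in RSS per extra parameter) / (residual mean square of larger model)
f_stat <- ((rss_main - rss_int) / df_diff) / (rss_int / df_int)

# Upper-tail probability under the F distribution
p_value <- pf(f_stat, df_diff, df_int, lower.tail = FALSE)

round(f_stat, 1)
p_value
```

The numerator is the improvement in fit bought by each additional interaction coefficient, and the denominator is the unexplained variance that remains in the larger model.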

7.5 Interpret the Interaction

The interaction model shows that the slope of distance differs among bioregions. In other words, distance does not have one universal effect on Sørensen dissimilarity across the coastline. Dissimilarity increases with distance in every bioregion (each fitted slope is positive), but the steepness of that increase depends on which bioregion is being considered.

This is best seen in the plotted lines rather than the raw coefficient table. Some bioregions show a steeper increase of dissimilarity with distance, while others show a shallower increase. The coefficient table provides the detailed parameterisation, but the plot provides the biologically useful interpretation.
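
The bioregion-specific slopes can also be recovered numerically by adding each interaction coefficient to the reference slope. The sketch below copies the rounded estimates from the `summary(mod_int)` output above; the label `reference` stands in for the baseline bioregion, whose name is not shown in the coefficient table, and the object names are illustrative.

```r
# Slope of dist in the reference bioregion (the 'dist' coefficient)
slope_ref <- 3.530e-04

# Differences in slope relative to the reference (the dist:bio coefficients)
slope_diff <- c(
  reference = 0,
  `B-ATZ`   = 7.976e-04,
  BMP       = -1.285e-04,
  ECTZ      = 4.213e-04
)

# Slope of dissimilarity on distance within each bioregion
bio_slopes <- slope_ref + slope_diff
signif(bio_slopes, 3)

# The same slopes expressed as change in dissimilarity per 100 km
signif(100 * bio_slopes, 3)
```

All four slopes are positive, so the bioregions differ in how steeply dissimilarity rises with distance rather than in the direction of the relationship. Standard errors for these combined slopes are not available from simple addition; they require the coefficient covariance matrix.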

The overall interaction model also fits the data well, with an adjusted \(R^2\) of about 0.86, and the model comparison provides very strong evidence that the interaction contributes importantly to explaining variation in the response (\(p < 0.001\)).

7.6 Reporting

Note: Write-Up

Methods

A linear model containing the interaction between geographic distance and bioregion was fitted to test whether the relationship between distance and Sørensen dissimilarity differed among regions. Nested model comparisons were used to evaluate whether the interaction improved fit beyond the additive model.

Results

Sørensen dissimilarity increased with coastal distance, but the slope of that relationship differed among bioregions. A model containing the interaction between distance and bioregion fit the data substantially better than models with distance alone or with only additive main effects (\(p < 0.001\) for the interaction comparison; adjusted \(R^2 = 0.86\)). The effect of distance on dissimilarity was therefore conditional on bioregion rather than uniform across the coastline.

Discussion

The biologically important point is that distance does not have one universal effect. In the Discussion, the emphasis should therefore fall on regional differences in turnover rate, not only on the existence of an interaction term in the model table.

8 Common Mistakes

Common mistakes with interaction models include:

  • adding interactions without biological justification;
  • interpreting the main effects as if no interaction were present;
  • failing to plot the fitted interaction;
  • fitting many interaction terms in small datasets;
  • treating a statistically detectable interaction as important without considering its effect size and biological meaning.

9 Summary

  • An interaction means that the effect of one predictor depends on another.
  • Once an interaction is present, the main effects must be interpreted conditionally.
  • Nested model comparisons are useful for asking whether the interaction improves fit.
  • Plots are essential because interaction models are hard to understand from coefficients alone.
  • Interactions should be motivated by biological reasoning and then interpreted in those same terms.

In the next chapter, I remain in the multiple-regression setting but turn to three major threats to interpretation: collinearity, confounding, and measurement error.

Citation

BibTeX citation:
@online{smit2026,
  author = {Smit, A. J.},
  title = {15. {Interaction} {Effects}},
  date = {2026-03-22},
  url = {https://tangledbank.netlify.app/BCB744/basic_stats/15-interaction-effects.html},
  langid = {en}
}
For attribution, please cite this work as:
Smit AJ (2026) 15. Interaction Effects. https://tangledbank.netlify.app/BCB744/basic_stats/15-interaction-effects.html.