23. Quantile Regression
Modelling the Whole Response Distribution
- why modelling only the mean can hide important ecological structure;
- what quantile regression estimates and how it differs from ordinary least squares;
- when quantile regression is appropriate, and what its assumptions are;
- how to fit, compare, and visualise quantile regressions in R with quantreg::rq();
- how to interpret lower, median, and upper quantile slopes biologically;
- how to report a quantile regression analysis in journal style.
1 Introduction
Most regression chapters so far have focused on the conditional mean of the response: ordinary least squares regression describes the average value of \(Y\) expected at each value of \(X\).
Focussing on the mean may not always be enough. In ecology, the most interesting pattern may lie at the edges of the data cloud rather than through its centre. A predictor may place a strong upper limit on performance without explaining why some individuals or populations fall well below that limit. In such cases, a mean regression line can flatten across a broad wedge of points and miss the biologically important pattern.
Instead of modelling only the mean, quantile regression models one or more conditional quantiles of the response distribution. This allows us to ask whether a predictor has the same effect on the lower, middle, and upper parts of the response.
This may be useful when:
- variance changes across the predictor gradient;
- upper or lower boundaries are biologically important;
- outliers make mean-based summaries unstable;
- the ecological question is about constraints, limits, or heterogeneous responses rather than only about the average trend.
2 Key Concepts
- Quantile regression models conditional quantiles, not the conditional mean.
- Quantiles divide a distribution into cumulative proportions such as the 10th, 50th, or 90th percentile.
- The median is the 0.5 quantile, so median regression is a special case of quantile regression.
- Tau (\(\tau\)) denotes the target quantile and ranges from 0 to 1.
- Inference can differ across quantiles, meaning that the same predictor may be weak at the lower boundary but strong at the upper boundary.
- Quantile regression is robust to heteroscedasticity and does not require normally distributed residuals in the way ordinary least squares does.
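The idea of a quantile as a cumulative proportion can be seen directly with base R's quantile() function; the sample values below are purely illustrative:

```r
# Quantiles as cumulative proportions, computed on a small illustrative sample
x <- c(3, 7, 8, 5, 12, 14, 21, 13, 18)

# 10th, 50th, and 90th percentiles of the sample
quantile(x, probs = c(0.1, 0.5, 0.9))

# the 0.5 quantile is the sample median
quantile(x, probs = 0.5) == median(x)   # TRUE
```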
3 When This Method Is Appropriate
Quantile regression is useful when:
- the response spread changes along the predictor gradient;
- the data form a wedge, funnel, or boundary pattern;
- upper or lower limits are biologically meaningful;
- you want a robust model of the median rather than the mean;
- the same predictor may affect slow-growing, average, and high-performing organisms differently.
Quantile regression is not a substitute for every nonlinear or hierarchical model. If the relationship is curved, use polynomial or additive extensions where needed. If the data are clustered by site, year, or individual, the dependence structure is still important and should not be ignored just because the model is quantile-based.
4 Nature of the Data and Assumptions
Quantile regression is more flexible than ordinary least squares, but it is not assumption-free.
The main requirements are:
- independent observations;
- a response variable that is continuous or at least ordered on a meaningful scale;
- a predictor-response relationship that is appropriately specified for the quantile model being fitted;
- enough data across the predictor range to estimate the chosen quantiles sensibly.
Unlike ordinary least squares, quantile regression does not assume normally distributed residuals or constant variance. In fact, it is most useful exactly when spread changes with the predictor.
The main consideration is model form. If the relationship is curved, a straight-line quantile model may be inadequate. It is also important to check that fitted quantile lines do not behave implausibly or cross within the range of interest.
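A quick way to screen for crossing is to evaluate two fitted lines over the observed predictor range and compare their ordering. The sketch below uses illustrative coefficients, not values from any fitted model:

```r
# Sketch: flag crossing of two fitted quantile lines inside the observed
# predictor range. The coefficients below are illustrative only.
check_crossing <- function(b_lo, b_hi, x_range, n = 200) {
  x <- seq(x_range[1], x_range[2], length.out = n)
  q_lo <- b_lo[1] + b_lo[2] * x   # lower-quantile line
  q_hi <- b_hi[1] + b_hi[2] * x   # upper-quantile line
  any(q_lo > q_hi)                # TRUE if the lines cross in range
}

check_crossing(c(98, 0.12), c(131, 0.35), x_range = c(34, 224))  # FALSE
check_crossing(c(0, 1.0), c(10, 0.5), x_range = c(0, 100))       # TRUE
```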
5 The Core Equations
For a simple linear quantile regression, the conditional quantile of the response can be written as:
\[Q_{Y}(\tau \mid X) = \beta_{0}(\tau) + \beta_{1}(\tau)X \tag{1}\]
In Equation 1, \(Q_{Y}(\tau \mid X)\) is the \(\tau\)-th conditional quantile of the response, \(\beta_{0}(\tau)\) is the intercept for that quantile, and \(\beta_{1}(\tau)\) is the slope for that quantile. The key idea is that the coefficients are allowed to change with \(\tau\).
Ordinary least squares chooses coefficients by minimising squared residuals. Quantile regression instead minimises an asymmetrically weighted sum of absolute residuals:
\[\min_{\beta} \sum_{i=1}^{n} \rho_{\tau}\left(y_{i} - x_{i}^{T}\beta\right) \tag{2}\]
where the check-loss function is:
\[\rho_{\tau}(u) = u\left[\tau - I(u < 0)\right] \tag{3}\]
You do not need to memorise Equation 2 and Equation 3, but they explain why quantile regression behaves differently from mean regression. Residuals above and below the fitted line are weighted differently depending on the target quantile. For the median, positive and negative residuals are weighted equally; for the 90th quantile, underprediction is penalised more strongly than overprediction.
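The check-loss function in Equation 3 is simple enough to write directly in base R, which makes the asymmetric weighting concrete:

```r
# The check loss of Equation 3: rho_tau(u) = u * (tau - I(u < 0))
rho_tau <- function(u, tau) u * (tau - (u < 0))

# For the median, positive and negative residuals are weighted equally:
rho_tau(c(2, -2), tau = 0.5)   # 1.0 1.0

# For the 90th quantile, underprediction costs nine times more:
rho_tau(c(2, -2), tau = 0.9)   # 1.8 0.2
```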
6 R Functions
The main function is quantreg::rq().
The syntax is similar to lm(), which makes quantile regression easy to adopt once you are comfortable with ordinary regression.
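As a minimal sketch of the shared syntax, the engel data set that ships with quantreg can be fitted at the median exactly as one would call lm():

```r
library(quantreg)   # provides rq() and the engel example data

data(engel)         # household income and food expenditure

# Same formula interface as lm(); only the tau argument is new
fit_med <- rq(foodexp ~ income, tau = 0.5, data = engel)
coef(fit_med)       # intercept and slope of the median line
```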
7 Example 1: Blade-Length Envelopes in Laminaria pallida
7.1 Example Dataset
We use the laminaria.csv dataset from the course data folder. These data come from a morphometric sampling campaign on the kelp Laminaria pallida at 13 sites along the Cape Peninsula, spanning the west coast and False Bay regions. The sampled plants were large adult kelps, and several blade and stipe measurements were recorded for each individual.
Here we focus on the relationship between stipe_length and blade_length.
The biological question is simple: does increasing stipe length shift the whole distribution of blade length in the same way, or is the effect stronger for the upper envelope of blade length than for the lower envelope?
| region | site | blade_length | stipe_length |
|---|---|---|---|
| WC | Kommetjie | 160 | 120 |
| WC | Kommetjie | 120 | 149 |
| WC | Kommetjie | 110 | 97 |
| WC | Kommetjie | 159 | 167 |
| WC | Kommetjie | 149 | 146 |
| WC | Kommetjie | 107 | 161 |
| WC | Kommetjie | 104 | 110 |
| WC | Kommetjie | 111 | 136 |
| WC | Kommetjie | 178 | 176 |
| FB | Bordjiestif North | 145 | 82 |
7.2 Do an Exploratory Data Analysis (EDA)
R> # A tibble: 1 × 7
R> n n_sites min_stipe max_stipe mean_stipe mean_blade sd_blade
R> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
R> 1 140 13 34 224 111. 139. 23.9
The scatterplot does not suggest a single, uniform linear effect. Blade length generally increases with stipe length, but the spread also changes across the predictor range. In particular, the upper edge of the point cloud steepens more strongly than the lower edge. That is exactly the kind of pattern for which quantile regression is useful.
7.3 State the Model Question and Hypotheses
We want to know whether stipe length affects the lower, median, and upper parts of the blade-length distribution differently.
For any fitted quantile, a useful hypothesis pair is:
\[H_{0}: \beta_{1}(\tau) = 0\] \[H_{a}: \beta_{1}(\tau) \ne 0\]
This asks whether stipe length has any association with blade length at the chosen quantile.
Because the chapter is about comparing quantiles, we also care about whether the slopes differ among quantiles:
\[H_{0}: \beta_{1}(0.1) = \beta_{1}(0.5) = \beta_{1}(0.9)\] \[H_{a}: \text{not all quantile slopes are equal}\]
7.4 Fit the Models
We fit one ordinary least squares regression for comparison, and then three quantile regressions for the 10th, 50th, and 90th quantiles.
fit_ols <- lm(blade_length ~ stipe_length, data = laminaria)
fit_rq10 <- rq(blade_length ~ stipe_length, tau = 0.1, data = laminaria)
fit_rq50 <- rq(blade_length ~ stipe_length, tau = 0.5, data = laminaria)
fit_rq90 <- rq(blade_length ~ stipe_length, tau = 0.9, data = laminaria)
fit_rq_all <- rq(blade_length ~ stipe_length,
tau = c(0.1, 0.5, 0.9),
data = laminaria)

The coefficient summaries are:
tidy_coefs <- function(fit, label, se = NULL) {
  cf <- if (is.null(se)) {
    summary(fit)$coefficients
  } else {
    summary(fit, se = se)$coefficients
  }
  # lm() names the first column "Estimate" and rq() names it "Value";
  # renaming before binding keeps all estimates in a single column
  as.data.frame(cf) |>
    setNames(c("estimate", "std_error", "t_value", "p_value")) |>
    rownames_to_column("term") |>
    mutate(model = label)
}

coef_tbl <- bind_rows(
  tidy_coefs(fit_ols, "OLS mean"),
  tidy_coefs(fit_rq10, "tau = 0.1", se = "nid"),
  tidy_coefs(fit_rq50, "tau = 0.5", se = "nid"),
  tidy_coefs(fit_rq90, "tau = 0.9", se = "nid")
) |>
  mutate(across(c(estimate, std_error, t_value), \(x) round(x, 3)))

gt(coef_tbl)

| term | estimate | std_error | t_value | p_value | model |
|---|---|---|---|---|---|
| (Intercept) | 114.967 | 5.699 | 20.172 | 5.492740e-43 | OLS mean |
| stipe_length | 0.217 | 0.048 | 4.482 | 1.537502e-05 | OLS mean |
| (Intercept) | 98.360 | 5.948 | 16.537 | 0.000000e+00 | tau = 0.1 |
| stipe_length | 0.120 | 0.059 | 2.023 | 4.504584e-02 | tau = 0.1 |
| (Intercept) | 109.930 | 6.458 | 17.023 | 0.000000e+00 | tau = 0.5 |
| stipe_length | 0.255 | 0.073 | 3.488 | 6.523367e-04 | tau = 0.5 |
| (Intercept) | 130.654 | 10.231 | 12.770 | 0.000000e+00 | tau = 0.9 |
| stipe_length | 0.346 | 0.055 | 6.242 | 4.979948e-09 | tau = 0.9 |
To compare the quantile slopes directly, we test whether the fitted slopes are equal across \(\tau = 0.1\), \(0.5\), and \(0.9\):
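The deviance table below is produced by passing the separately fitted models to anova(); as a self-contained illustration of the same mechanics, the sketch uses quantreg's built-in engel data (the chapter's own call would use fit_rq10, fit_rq50, and fit_rq90 instead):

```r
library(quantreg)

data(engel)   # built-in example data, used here only for a runnable sketch

# Fit the same formula at three quantiles ...
f10 <- rq(foodexp ~ income, tau = 0.1, data = engel)
f50 <- rq(foodexp ~ income, tau = 0.5, data = engel)
f90 <- rq(foodexp ~ income, tau = 0.9, data = engel)

# ... then anova() on the fits performs the joint test of equality of slopes
anova(f10, f50, f90)
```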
R> Quantile Regression Analysis of Deviance Table
R>
R> Model: blade_length ~ stipe_length
R> Joint Test of Equality of Slopes: tau in { 0.1 0.5 0.9 }
R>
R> Df Resid Df F value Pr(>F)
R> 1 2 418 4.3359 0.01368 *
R> ---
R> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
7.5 Test Assumptions / Check Diagnostics
Quantile regression does not require homoscedasticity or normal residuals, so diagnostics focus more on whether the fitted lines are sensible and whether the model form is adequate.
The first check is visual: do the fitted quantile lines follow the structure of the data, and do they avoid pathological crossing?
line_df <- tibble(
model = c("Mean (OLS)", "tau = 0.1", "tau = 0.5", "tau = 0.9"),
intercept = c(coef(fit_ols)[1], coef(fit_rq10)[1], coef(fit_rq50)[1], coef(fit_rq90)[1]),
slope = c(coef(fit_ols)[2], coef(fit_rq10)[2], coef(fit_rq50)[2], coef(fit_rq90)[2])
)
ggplot(laminaria, aes(x = stipe_length, y = blade_length)) +
geom_point(alpha = 0.45) +
geom_abline(
data = line_df,
aes(intercept = intercept, slope = slope, colour = model, linetype = model),
linewidth = 0.8
) +
scale_colour_manual(
values = c(
"Mean (OLS)" = "black",
"tau = 0.1" = "dodgerblue4",
"tau = 0.5" = "firebrick3",
"tau = 0.9" = "darkgreen"
),
name = "Model"
) +
scale_linetype_manual(
values = c(
"Mean (OLS)" = "dashed",
"tau = 0.1" = "solid",
"tau = 0.5" = "solid",
"tau = 0.9" = "solid"
),
name = "Model"
) +
labs(
x = "Stipe length (cm)",
y = "Blade length (cm)"
)

The second useful check is whether the fitted slope itself changes smoothly across a wider range of quantiles.
tau_grid <- seq(0.1, 0.9, by = 0.1)
slope_grid <- map_dfr(
tau_grid,
\(tau_now) {
fit_now <- rq(blade_length ~ stipe_length, tau = tau_now, data = laminaria)
sum_now <- summary(fit_now, se = "nid")$coefficients
tibble(
tau = tau_now,
slope = sum_now["stipe_length", 1],
se = sum_now["stipe_length", 2]
)
}
) |>
mutate(
lo = slope - 1.96 * se,
hi = slope + 1.96 * se
)
ggplot(slope_grid, aes(x = tau, y = slope)) +
geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.2) +
geom_line(linewidth = 0.8) +
geom_point(size = 1.5) +
labs(
x = expression(tau),
y = "Estimated slope"
)

In this example the fitted lines do not cross in the observed data range, and the slope increases steadily towards the upper quantiles. That is consistent with a biologically meaningful quantile effect rather than a numerical artefact.
7.6 Interpret the Results
The ordinary least squares fit gives the average change in blade length with stipe length. In these data that mean slope is positive, but it understates what is happening at the upper edge of the distribution.
The quantile regressions show that the relationship is not constant across the response distribution. This is clearest in Figure 2 and is reinforced by the slope-by-quantile summary in Figure 3:
- at \(\tau = 0.1\), the fitted slope is about 0.12;
- at \(\tau = 0.5\), the fitted slope is about 0.255;
- at \(\tau = 0.9\), the fitted slope is about 0.346.
This means that longer stipes are associated with modest increases in blade length near the lower boundary of the data, but much larger increases near the upper boundary. In other words, stipe length is a much stronger predictor of the longest blades than of the shortest blades.
The joint quantile comparison supports this reading. The quantile ANOVA indicates that the slopes are not all the same across \(\tau = 0.1\), \(0.5\), and \(0.9\) (F = 4.34, p = 0.014), giving formal statistical support to what the figure already suggests visually.
Biologically, the result reveals that increasing stipe length appears to raise the upper potential for blade elongation more strongly than it shifts the lower end of the distribution. That is the kind of edge-case ecological pattern that mean ordinary least squares regression can hide; it is visible directly in Figure 2.
7.7 Reporting
Methods
Blade length was analysed as a function of stipe length in adult Laminaria pallida sampled across 13 sites on the Cape Peninsula. To assess whether stipe length affected only the mean blade length or different parts of the blade-length distribution, we fitted an ordinary least squares regression and quantile regressions for the 10th, 50th, and 90th quantiles using rq() from the quantreg package in R. We then tested whether the fitted slopes differed among quantiles.
Results
Blade length increased with stipe length in the mean model, but the fitted quantile slopes showed that this relationship was not constant across the response distribution (see Figure 2 and Figure 3). The estimated slope was weakest at the 10th quantile (0.12), intermediate at the median (0.255), and strongest at the 90th quantile (0.346). A joint test across the fitted quantiles indicated that the slopes were not all equal (F = 4.34, p = 0.014). Thus, increasing stipe length was associated with a much stronger increase in upper-bound blade length than in lower-bound blade length.
Discussion
The quantile regression reveals a biologically important pattern that would be muted in a mean-only analysis. Stipe length appears to influence the upper potential for blade development more strongly than the lower end of the blade-length distribution. This suggests that while longer stipes may permit greater blade elongation, shorter blades can still occur across much of the stipe-length range, presumably because other unmeasured factors also constrain morphology.
8 What to Do When Assumptions Fail / Alternatives
- If the fitted straight quantile lines are clearly inadequate, extend the model with polynomial terms or quantile smoothers rather than forcing a linear form.
- If quantile lines cross implausibly, reduce model complexity and inspect whether sparse data at extreme predictor values are driving the problem.
- If observations are not independent because of repeated measures, sites, or years, then the dependence structure must be addressed explicitly; ordinary rq() is not enough.
- If the scientific question is still about the conditional mean, ordinary least squares remains simpler and easier to interpret.
9 Summary
- Quantile regression models the response distribution more completely than mean regression.
- It is especially useful for wedge-shaped ecological data and for questions about limits or constraints.
- It does not require constant variance or normally distributed residuals, but it still requires sensible model specification and independent observations.
- In the Laminaria example, stipe length had a much stronger effect on the upper quantiles of blade length than on the lower quantiles.
- The main biological gain is not mathematical novelty, but a clearer view of how predictors shape the whole distribution of ecological responses.
Citation
@online{smit2026,
author = {Smit, A. J.},
title = {23. {Quantile} {Regression}},
date = {2026-03-22},
url = {https://tangledbank.netlify.app/BCB744/basic_stats/23-quantile-regression.html},
langid = {en}
}
