24. Regularisation
Ridge, Lasso, Elastic Net, and Cross-Validation
1 Introduction
Regularisation techniques are useful when ordinary multiple regression starts to struggle under the weight of many predictors, overlapping predictors, or a modelling goal that leans more toward prediction than explanation. In that setting, ordinary least squares can produce unstable coefficients, poor generalisation, and models that appear stronger in the sample than they really are.
Regularisation addresses this by shrinking coefficients towards zero. In some cases, this simply stabilises them. In others, it also removes weak predictors from the model entirely. This makes regularisation relevant when we want to reduce overfitting, manage multicollinearity, or build models that predict more reliably on new data.
This chapter follows directly from the previous one. There we distinguished explanation from prediction. Here we develop one of the main statistical responses to that distinction.
2 Key Concepts
- Regularisation shrinks coefficients to stabilise the model.
- Ridge regression shrinks all coefficients continuously towards zero.
- Lasso regression can shrink some coefficients exactly to zero.
- Elastic net combines ridge and lasso behaviour.
- Cross-validation is used to tune the amount of shrinkage.
- Regularisation changes how coefficients should be interpreted: the payoff is model stability, better prediction, and objective variable reduction, not unbiased effect estimates.
3 When This Method Is Appropriate
You should consider regularisation when:
- the predictor set is large relative to the amount of data;
- several predictors are strongly correlated;
- ordinary multiple regression produces unstable coefficients;
- prediction on new data matters more than exact coefficient interpretation;
- you want a data-driven complement to the more theory-driven model selection discussed in Chapter 13 and the collinearity material in Chapter 15.
Regularisation is not a magic correction for weak scientific questions or poor study design. It is still your responsibility to define sensible predictors and to understand the biology of the system. These methods work best when they extend ecological reasoning rather than replace it.
4 Why Regularisation Matters
Regularisation addresses several common modelling problems.
Variable selection becomes difficult when many candidate predictors are available and only some are genuinely useful. Traditional selection procedures often rely on stepwise inclusion or exclusion, or on statistics such as VIF. Regularisation offers an alternative data-driven route.
Overfitting occurs when the model begins to fit noise together with the underlying biological signal. Such a model often performs well on the observed data but poorly on new observations.
Multicollinearity inflates standard errors and destabilises coefficients when predictors overlap. Regularisation reduces this instability by shrinking coefficients, which usually improves model behaviour even if it introduces some bias.
The point is not to recover perfectly unbiased coefficients. The point is to get a model that is more stable, more generalisable, and more useful for the modelling goal at hand.
5 Ridge, Lasso, and Elastic Net
5.1 Ridge Regression
Ridge regression adds a penalty proportional to the squared size of the coefficients. Large coefficients are penalised more heavily, so the fitted model shrinks them towards zero without setting them exactly to zero.
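In symbols, ridge estimates the coefficients by minimising the usual residual sum of squares plus a squared-magnitude penalty:

\[
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \; \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
\]

Here \(\lambda \ge 0\) controls the strength of the penalty, and the intercept \(\beta_0\) is conventionally left unpenalised.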
The practical effect is that all predictors remain in the model, but the most unstable coefficients are tamed. Ridge is therefore especially useful when multicollinearity is the main problem and you do not necessarily want automatic variable removal.
5.2 Lasso Regression
Lasso regression uses a penalty based on the absolute values of the coefficients. This has an important consequence: some coefficients can be shrunk all the way to zero.
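The lasso objective is identical except that the squared penalty is replaced by an absolute-value penalty:

\[
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \; \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
\]

It is the sharp corner of the absolute-value function at zero that allows the minimiser to land on exact zeros.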
That means lasso does two jobs at once. It shrinks the model, and it can also perform automatic variable selection. When you have many candidate predictors and suspect that some contribute little to predictive performance, lasso can be attractive.
5.3 Elastic Net
Elastic net combines the ridge and lasso penalties. It is useful when predictors are strongly correlated and you want a compromise between coefficient shrinkage and variable selection.
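In the parameterisation used by the glmnet package, the elastic net penalty mixes the two via a second tuning parameter \(\alpha\):

\[
\lambda \left[ \frac{1 - \alpha}{2} \sum_{j=1}^{p} \beta_j^2 + \alpha \sum_{j=1}^{p} |\beta_j| \right]
\]

Setting \(\alpha = 0\) recovers ridge and \(\alpha = 1\) recovers lasso; intermediate values blend the two behaviours.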
In practice, elastic net is often a strong default when you are unsure whether ridge or lasso is the more appropriate starting point.
6 Cross-Validation
The amount of shrinkage is controlled by a tuning parameter, usually written as \(\lambda\). When \(\lambda = 0\), the fitted model behaves like ordinary least squares. As \(\lambda\) increases, the penalty grows stronger and coefficients are shrunk more aggressively.
The problem is that we do not know the best value of \(\lambda\) in advance. This is where cross-validation becomes central.
In k-fold cross-validation, the data are split into k folds. The model is fitted k times, each time on k - 1 folds and evaluated on the single held-out fold. This gives an estimate of how well the model performs on data not used to fit it. We then choose the tuning parameter that gives the best average predictive performance across folds.
Cross-validation therefore helps us avoid selecting a model that is optimised only for the present sample.
7 R Functions
In R, the usual introductory function for regularised regression is cv.glmnet() from the glmnet package.
The important practical detail is that glmnet expects a predictor matrix rather than the formula interface used by lm().
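A minimal sketch of the workflow follows; the data frame name `seaweed` and the object names are illustrative, not taken from the chapter's own code:

```r
library(glmnet)

# glmnet needs a numeric matrix of predictors, not a formula.
# model.matrix() builds one from a data frame; [, -1] drops the intercept column.
X <- model.matrix(~ annMean + augMean + augSD + febSD + febRange,
                  data = seaweed)[, -1]
y <- seaweed$Y

# Ten-fold cross-validation over a grid of lambda values;
# alpha = 0 requests the ridge penalty.
cv_fit <- cv.glmnet(X, y, alpha = 0, nfolds = 10)
cv_fit$lambda.min   # penalty minimising cross-validated error
```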
8 Example 1: Ridge Regression with the Seaweed Data
The seaweed dataset should now be familiar. We will use the climatic predictors annMean, augMean, augSD, febSD, and febRange to predict Y.
8.1 Prepare the Data
The response is centred, and the predictors are supplied as a matrix:
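A reconstruction of that preparation step, assuming the data are held in a data frame called `seaweed` (the object name is not recoverable from the source):

```r
# Predictor matrix: the five climatic variables used throughout this chapter.
X <- as.matrix(seaweed[, c("annMean", "augMean", "augSD", "febSD", "febRange")])

# Centred response: subtracting the mean puts the intercept near zero,
# consistent with the small intercepts in the coefficient tables that follow.
y <- seaweed$Y - mean(seaweed$Y)
```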
8.2 Fit the Cross-Validated Ridge Model
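The cross-validated ridge fit is a single call; `alpha = 0` selects the ridge penalty (the object name `ridge_cv` and the seed are illustrative):

```r
library(glmnet)

set.seed(13)  # fold assignment is random; a seed makes the fit repeatable
ridge_cv <- cv.glmnet(X, y, alpha = 0, standardize = TRUE, nfolds = 10)
```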
8.3 Inspect the Cross-Validation Curve
The two vertical lines in Figure 1 identify the \(\lambda\) that minimises cross-validated error (lambda.min) and the larger, more conservative value within one standard error of that minimum (lambda.1se). The latter is often chosen when a simpler, more stable model is preferred.
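Assuming the fitted object is called `ridge_cv`, a curve of this kind is produced by glmnet's default plot method:

```r
plot(ridge_cv)  # cross-validated MSE vs log(lambda);
                # dotted vertical lines mark lambda.min and lambda.1se
```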
8.4 Extract the Fitted Model
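The coefficient table below can be reproduced with `coef()`. Whether `lambda.min` or `lambda.1se` was used for the printed values is not recoverable from the source, so `lambda.min` is shown here as an assumption:

```r
coef(ridge_cv, s = "lambda.min")
```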
```
6 x 1 sparse Matrix of class "dgCMatrix"
                     s0
(Intercept) -0.12916991
augMean      0.25319971
febRange     0.03884481
febSD       -0.02964465
augSD        0.02599145
annMean      0.02672789
```
8.5 Interpret the Ridge Results
Ridge keeps all predictors in the model, but shrinks them towards zero. This means the coefficients remain interpretable in the broad sense, but their absolute values are biased by design. Ridge is therefore most useful when the main aim is stable prediction or improved behaviour under collinearity, rather than precise coefficient interpretation.
In this seaweed example, the regularised model still retains all climatic predictors, but it reduces their instability and produces a model that is better suited to predictive use than an ordinary unpenalised fit under the same overlap among predictors.
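One way to obtain the approximate \(R^2\) quoted in the reporting statement that follows is the squared correlation between observed and fitted values (object names illustrative):

```r
ridge_pred <- predict(ridge_cv, newx = X, s = "lambda.min")
ridge_rsq  <- cor(y, ridge_pred)^2   # in-sample R^2, roughly 0.67 for these data
```

Note that this is an in-sample \(R^2\); the cross-validated error from `ridge_cv` remains the more honest guide to out-of-sample performance.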
8.6 Reporting
A journal-style Results statement could read:
Ridge regression was fitted to the seaweed climate data using 10-fold cross-validation to select the optimal penalty parameter. The final model retained all five climatic predictors and explained a substantial proportion of the variation in the response (R^2 approximately 0.67). The purpose of the analysis was not strict coefficient interpretation, but improved coefficient stability and predictive behaviour under overlapping climatic predictors.
9 Example 2: Lasso Regression
Lasso uses the same data structure and the same cross-validation logic. The key difference is alpha = 1.
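A sketch of the lasso fit, mirroring the ridge code above (object names illustrative):

```r
set.seed(13)
lasso_cv <- cv.glmnet(X, y, alpha = 1, standardize = TRUE, nfolds = 10)
coef(lasso_cv, s = "lambda.min")
```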
```
6 x 1 sparse Matrix of class "dgCMatrix"
                     s0
(Intercept) -0.12886019
augMean      0.26097296
febRange     0.03431981
febSD       -0.02497532
augSD        0.02441380
annMean      0.02021480
```
The important feature of lasso is that some coefficients can be set exactly to zero. This makes lasso useful when you want the model itself to carry out a degree of variable selection.
In the seaweed example, the chosen penalty reduces the effective model complexity by shrinking weaker coefficients more strongly. The resulting model is easier to simplify than the ridge model because some predictors can be removed altogether.
9.1 Reporting
A journal-style Results statement could read:
Lasso regression was fitted to the seaweed climate data with the penalty chosen by 10-fold cross-validation. The regularised fit reduced model complexity by shrinking weak coefficients strongly and, where appropriate, setting some coefficients to zero. The final model explained substantial variation in the response (R^2 approximately 0.67) while providing a more compact predictor set than ridge regression.
10 Example 3: Elastic Net Regression
Elastic net introduces a second tuning parameter, alpha, which controls the balance between ridge and lasso behaviour.
```r
library(glmnet)

# Candidate grids for alpha (ridge-lasso mixing) and lambda (penalty strength).
# The exact grids are not shown in the recovered source; these values are
# plausible reconstructions.
alphas_to_try <- seq(0, 1, by = 0.1)
lambdas_to_try <- 10^seq(-3, 3, length.out = 100)

# Cross-validate lambda separately at each candidate alpha
cv_results <- lapply(alphas_to_try, function(a) {
  cv.glmnet(
    X, y,
    alpha = a,
    lambda = lambdas_to_try,
    standardize = TRUE,
    nfolds = 10
  )
})

# Pick the (alpha, lambda) pair with the lowest cross-validated error
best_result <- which.min(sapply(cv_results, function(x) min(x$cvm)))
best_alpha  <- alphas_to_try[best_result]
best_lambda <- cv_results[[best_result]]$lambda.min

# Refit at the chosen alpha and lambda
elastic_model <- glmnet(
  X, y,
  alpha = best_alpha,
  lambda = best_lambda,
  standardize = TRUE
)

elastic_pred <- predict(elastic_model, X)
elastic_rsq  <- cor(y, elastic_pred)^2

coef(elastic_model)
```

```
6 x 1 sparse Matrix of class "dgCMatrix"
                     s0
(Intercept) -0.12910260
augMean      0.25467960
febRange     0.03797258
febSD       -0.02874393
augSD        0.02568390
annMean      0.02547694
```
Elastic net is often useful when predictors occur in correlated groups. Lasso may select one predictor and drop the others. Ridge keeps them all. Elastic net often provides a more balanced compromise.
10.1 Reporting
A journal-style Results statement could read:
Elastic net regression was used to model the seaweed response with both the mixing parameter and penalty term selected by cross-validation. The optimal model combined coefficient shrinkage with variable reduction (alpha = 0.2), explained substantial variation in the response (R^2 approximately 0.67), and provided a stable compromise between ridge and lasso behaviour.
11 Theory-Driven and Data-Driven Variable Selection
The choice between theory-driven and data-driven variable selection should not be treated as a fight with a single winner. In practice, the strongest ecological modelling often combines both.
Theory-driven selection is central to the scientific method. It uses prior ecological reasoning to define a defensible set of candidate predictors. This keeps the model close to mechanism and strengthens interpretation.
Data-driven methods, including regularisation, can then help assess which predictors contribute most strongly to predictive performance, where redundancy lies, and how strongly coefficients need to be stabilised. They are especially useful in high-dimensional settings or when the predictors are strongly overlapping.
The danger is to let automated variable selection replace ecological thinking. Regularisation can help refine the model, but it cannot tell you what the scientific question ought to be.
12 Summary
- Regularisation is useful when predictors are many, overlapping, or likely to produce unstable ordinary regression coefficients.
- Ridge shrinks all coefficients, lasso can remove some, and elastic net blends both behaviours.
- Cross-validation is central to selecting the amount of shrinkage.
- These methods are usually most useful when prediction, stability, and model simplification matter more than exact coefficient interpretation.
- Regularisation should complement, not replace, ecological reasoning and theory-driven model building.
The final chapter now turns from model choice to the workflow required to make the whole analysis transparent and reproducible.
Citation
@online{smit2026,
  author = {Smit, A. J.},
  title = {24. {Regularisation}},
  date = {2026-03-19},
  url = {http://tangledbank.netlify.app/BCB744/basic_stats/24-regularisation.html},
  langid = {en}
}
