24. Regularisation
Ridge, Lasso, Elastic Net, and Cross-Validation
1 Introduction
Regularisation techniques are useful when ordinary multiple regression starts to struggle under the weight of many predictors, overlapping predictors, or a modelling goal that leans more toward prediction than explanation. In that setting, ordinary least squares can produce unstable coefficients, poor generalisation, and models that appear stronger in the sample than they really are.
Regularisation addresses this by shrinking coefficients towards zero. In some cases, this simply stabilises them. In others, it also removes weak predictors from the model entirely. This makes regularisation relevant when we want to reduce overfitting, manage multicollinearity, or build models that predict more reliably on new data.
This chapter follows directly from the previous one. There we distinguished explanation from prediction. Here we develop one of the main statistical responses to that distinction.
2 Key Concepts
- Regularisation shrinks coefficients to stabilise the model.
- Ridge regression shrinks all coefficients continuously towards zero.
- Lasso regression can shrink some coefficients exactly to zero.
- Elastic net combines ridge and lasso behaviour.
- Cross-validation is used to tune the amount of shrinkage.
- Regularisation changes how coefficients should be interpreted: the payoff is model stability, better prediction, and objective variable reduction, not unbiased effect estimates.
3 When This Method Is Appropriate
You should consider regularisation when:
- the predictor set is large relative to the amount of data;
- several predictors are strongly correlated;
- ordinary multiple regression produces unstable coefficients;
- prediction on new data matters more than exact coefficient interpretation;
- you want a data-driven complement to the more theory-driven model selection discussed in Chapter 13 and the collinearity material in Chapter 15.
Regularisation is not a magic correction for weak scientific questions or poor study design. It is still your responsibility to define sensible predictors and to understand the biology of the system. These methods work best when they extend ecological reasoning rather than replace it.
4 Why Regularisation Matters
Regularisation addresses several common modelling problems.
Variable selection becomes difficult when many candidate predictors are available and only some are genuinely useful. Traditional selection procedures often rely on stepwise inclusion or exclusion, or on statistics such as VIF. Regularisation offers an alternative data-driven route.
Overfitting occurs when the model begins to fit noise together with the underlying biological signal. Such a model often performs well on the observed data but poorly on new observations.
Multicollinearity inflates standard errors and destabilises coefficients when predictors overlap. Regularisation reduces this instability by shrinking coefficients, which usually improves model behaviour even if it introduces some bias.
The point is not to recover perfectly unbiased coefficients. The point is to get a model that is more stable, more generalisable, and more useful for the modelling goal at hand.
5 Ridge, Lasso, and Elastic Net
5.1 Ridge Regression
Ridge regression adds a penalty proportional to the squared size of the coefficients. Large coefficients are penalised more heavily, so the fitted model shrinks them towards zero without setting them exactly to zero.
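In symbols, ridge estimates the coefficients by minimising the usual residual sum of squares plus a squared-magnitude penalty:

\[
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \; \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
\]

Here \(\lambda \ge 0\) controls the strength of the penalty, and the intercept \(\beta_0\) is conventionally left unpenalised.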
The practical effect is that all predictors remain in the model, but the most unstable coefficients are tamed. Ridge is therefore especially useful when multicollinearity is the main problem and you do not necessarily want automatic variable removal.
5.2 Lasso Regression
Lasso regression uses a penalty based on the absolute values of the coefficients. This has an important consequence: some coefficients can be shrunk all the way to zero.
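The lasso objective is identical except that the squared penalty is replaced by an absolute-value penalty:

\[
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \; \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
\]

It is the sharp corner of the absolute-value function at zero that allows the minimiser to land on exact zeros.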
That means lasso does two jobs at once. It shrinks the model, and it can also perform automatic variable selection. When you have many candidate predictors and suspect that some contribute little to predictive performance, lasso can be attractive.
5.3 Elastic Net
Elastic net combines the ridge and lasso penalties. It is useful when predictors are strongly correlated and you want a compromise between coefficient shrinkage and variable selection.
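In the parameterisation used by the glmnet package, the elastic net penalty mixes the two via a second tuning parameter \(\alpha\):

\[
\lambda \left[ \frac{1 - \alpha}{2} \sum_{j=1}^{p} \beta_j^2 + \alpha \sum_{j=1}^{p} |\beta_j| \right]
\]

Setting \(\alpha = 0\) recovers ridge and \(\alpha = 1\) recovers lasso; intermediate values blend the two behaviours.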
In practice, elastic net is often a strong default when you are unsure whether ridge or lasso is the more appropriate starting point.
6 Cross-Validation
The amount of shrinkage is controlled by a tuning parameter, usually written as \(\lambda\). When \(\lambda = 0\), the fitted model behaves like ordinary least squares. As \(\lambda\) increases, the penalty grows stronger and coefficients are shrunk more aggressively.
The problem is that we do not know the best value of \(\lambda\) in advance. This is where cross-validation becomes central.
In k-fold cross-validation, the data are split into k folds. The model is fitted k times, each time on k - 1 folds and evaluated on the single held-out fold. This gives an estimate of how well the model performs on data not used to fit it. We then choose the tuning parameter that gives the best average predictive performance across folds.
Cross-validation therefore helps us avoid selecting a model that is optimised only for the present sample.
7 R Functions
In R, the usual introductory function for regularised regression is cv.glmnet() from the glmnet package.
The important practical detail is that glmnet expects a predictor matrix rather than the formula interface used by lm().
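A minimal sketch of the workflow follows; the data frame name `seaweed` and the object names are illustrative, not taken from the chapter's own code:

```r
library(glmnet)

# glmnet needs a numeric matrix of predictors, not a formula.
# model.matrix() builds one from a data frame; [, -1] drops the intercept column.
X <- model.matrix(~ annMean + augMean + augSD + febSD + febRange,
                  data = seaweed)[, -1]
y <- seaweed$Y

# Ten-fold cross-validation over a grid of lambda values;
# alpha = 0 requests the ridge penalty.
cv_fit <- cv.glmnet(X, y, alpha = 0, nfolds = 10)
cv_fit$lambda.min   # penalty minimising cross-validated error
```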
8 Example 1: Ridge Regression with the Seaweed Data
The seaweed dataset should now be familiar. We will use the climatic predictors annMean, augMean, augSD, febSD, and febRange to predict Y.
8.1 Prepare the Data
The response is centred, and the predictors are supplied as a matrix:
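A reconstruction of that preparation step, assuming the data are held in a data frame called `seaweed` (the object name is not recoverable from the source):

```r
# Predictor matrix: the five climatic variables used throughout this chapter.
X <- as.matrix(seaweed[, c("annMean", "augMean", "augSD", "febSD", "febRange")])

# Centred response: subtracting the mean puts the intercept near zero,
# consistent with the small intercepts in the coefficient tables that follow.
y <- seaweed$Y - mean(seaweed$Y)
```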
8.2 Fit the Cross-Validated Ridge Model
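The cross-validated ridge fit is a single call; `alpha = 0` selects the ridge penalty (the object name `ridge_cv` and the seed are illustrative):

```r
library(glmnet)

set.seed(13)  # fold assignment is random; a seed makes the fit repeatable
ridge_cv <- cv.glmnet(X, y, alpha = 0, standardize = TRUE, nfolds = 10)
```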
8.3 Inspect the Cross-Validation Curve
The two vertical lines in Figure 1 identify the \(\lambda\) that minimises cross-validated error (lambda.min) and the larger, more conservative value within one standard error of that minimum (lambda.1se). The latter is often chosen when a simpler, more stable model is preferred.
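Assuming the fitted object is called `ridge_cv`, a curve of this kind is produced by glmnet's default plot method:

```r
plot(ridge_cv)  # cross-validated MSE vs log(lambda);
                # dotted vertical lines mark lambda.min and lambda.1se
```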
8.4 Extract the Fitted Model
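The coefficient table below can be reproduced with `coef()`. Whether `lambda.min` or `lambda.1se` was used for the printed values is not recoverable from the source, so `lambda.min` is shown here as an assumption:

```r
coef(ridge_cv, s = "lambda.min")
```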
```
6 x 1 sparse Matrix of class "dgCMatrix"
                     s0
(Intercept) -0.12916991
augMean      0.25319971
febRange     0.03884481
febSD       -0.02964465
augSD        0.02599145
annMean      0.02672789
```
8.5 Interpret the Ridge Results
Ridge keeps all predictors in the model, but shrinks them towards zero. This means the coefficients remain interpretable in the broad sense, but their absolute values are biased by design. Ridge is therefore most useful when the main aim is stable prediction or improved behaviour under collinearity, rather than precise coefficient interpretation.
In this seaweed example, the regularised model still retains all climatic predictors, but it reduces their instability and produces a model that is better suited to predictive use than an ordinary unpenalised fit under the same overlap among predictors.
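One way to obtain the approximate \(R^2\) quoted in the reporting statement that follows is the squared correlation between observed and fitted values (object names illustrative):

```r
ridge_pred <- predict(ridge_cv, newx = X, s = "lambda.min")
ridge_rsq  <- cor(y, ridge_pred)^2   # in-sample R^2, roughly 0.67 for these data
```

Note that this is an in-sample \(R^2\); the cross-validated error from `ridge_cv` remains the more honest guide to out-of-sample performance.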
8.6 Reporting
A journal-style Results statement could read:
Ridge regression was fitted to the seaweed climate data using 10-fold cross-validation to select the optimal penalty parameter. The final model retained all five climatic predictors and explained a substantial proportion of the variation in the response (R^2 approximately 0.67). The purpose of the analysis was not strict coefficient interpretation, but improved coefficient stability and predictive behaviour under overlapping climatic predictors.
9 Example 2: Lasso Regression
Lasso uses the same data structure and the same cross-validation logic. The key difference is alpha = 1.
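A sketch of the lasso fit, mirroring the ridge code above (object names illustrative):

```r
set.seed(13)
lasso_cv <- cv.glmnet(X, y, alpha = 1, standardize = TRUE, nfolds = 10)
coef(lasso_cv, s = "lambda.min")
```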
```
6 x 1 sparse Matrix of class "dgCMatrix"
                     s0
(Intercept) -0.12886019
augMean      0.26097296
febRange     0.03431981
febSD       -0.02497532
augSD        0.02441380
annMean      0.02021480
```
The important feature of lasso is that some coefficients can be set exactly to zero. This makes lasso useful when you want the model itself to carry out a degree of variable selection.
In the seaweed example, the chosen penalty reduces the effective model complexity by shrinking weaker coefficients more strongly. The resulting model is easier to simplify than the ridge model because some predictors can be removed altogether.
9.1 Reporting
A journal-style Results statement could read:
Lasso regression was fitted to the seaweed climate data with the penalty chosen by 10-fold cross-validation. The regularised fit reduced model complexity by shrinking weak coefficients strongly and, where appropriate, setting some coefficients to zero. The final model explained substantial variation in the response (R^2 approximately 0.67) while providing a more compact predictor set than ridge regression.
10 Example 3: Elastic Net Regression
Elastic net introduces a second tuning parameter, alpha, which controls the balance between ridge and lasso behaviour.
```r
library(glmnet)

# Candidate grids for alpha (ridge-lasso mixing) and lambda (penalty strength).
# The exact grids are not shown in the recovered source; these values are
# plausible reconstructions.
alphas_to_try <- seq(0, 1, by = 0.1)
lambdas_to_try <- 10^seq(-3, 3, length.out = 100)

# Cross-validate lambda separately at each candidate alpha
cv_results <- lapply(alphas_to_try, function(a) {
  cv.glmnet(
    X, y,
    alpha = a,
    lambda = lambdas_to_try,
    standardize = TRUE,
    nfolds = 10
  )
})

# Pick the (alpha, lambda) pair with the lowest cross-validated error
best_result <- which.min(sapply(cv_results, function(x) min(x$cvm)))
best_alpha  <- alphas_to_try[best_result]
best_lambda <- cv_results[[best_result]]$lambda.min

# Refit at the chosen alpha and lambda
elastic_model <- glmnet(
  X, y,
  alpha = best_alpha,
  lambda = best_lambda,
  standardize = TRUE
)

elastic_pred <- predict(elastic_model, X)
elastic_rsq  <- cor(y, elastic_pred)^2

coef(elastic_model)
```

```
6 x 1 sparse Matrix of class "dgCMatrix"
                     s0
(Intercept) -0.12910260
augMean      0.25467960
febRange     0.03797258
febSD       -0.02874393
augSD        0.02568390
annMean      0.02547694
```
Elastic net is often useful when predictors occur in correlated groups. Lasso may select one predictor and drop the others. Ridge keeps them all. Elastic net often provides a more balanced compromise.
10.1 Reporting
A journal-style Results statement could read:
Elastic net regression was used to model the seaweed response with both the mixing parameter and penalty term selected by cross-validation. The optimal model combined coefficient shrinkage with variable reduction (alpha = 0.2), explained substantial variation in the response (R^2 approximately 0.67), and provided a stable compromise between ridge and lasso behaviour.
11 Theory-Driven and Data-Driven Variable Selection
The choice between theory-driven and data-driven variable selection should not be treated as a fight with a single winner. In practice, the strongest ecological modelling often combines both.
Theory-driven selection is central to the scientific method. It uses prior ecological reasoning to define a defensible set of candidate predictors. This keeps the model close to mechanism and strengthens interpretation.
Data-driven methods, including regularisation, can then help assess which predictors contribute most strongly to predictive performance, where redundancy lies, and how strongly coefficients need to be stabilised. They are especially useful in high-dimensional settings or when the predictors are strongly overlapping.
The danger is to let automated variable selection replace ecological thinking. Regularisation can help refine the model, but it cannot tell you what the scientific question ought to be.
12 Summary
- Regularisation is useful when predictors are many, overlapping, or likely to produce unstable ordinary regression coefficients.
- Ridge shrinks all coefficients, lasso can remove some, and elastic net blends both behaviours.
- Cross-validation is central to selecting the amount of shrinkage.
- These methods are usually most useful when prediction, stability, and model simplification matter more than exact coefficient interpretation.
- Regularisation should complement, not replace, ecological reasoning and theory-driven model building.
The final chapter now turns from model choice to the workflow required to make the whole analysis transparent and reproducible.
Citation
@online{smit2026,
  author = {Smit, A. J.},
  title = {24. {Regularisation}},
  date = {2026-03-19},
  url = {http://tangledbank.netlify.app/BCB744/basic_stats/24-regularisation.html},
  langid = {en}
}
