1. The Statistical Landscape
“Most people use statistics like a drunk man uses a lamppost; more for support than illumination.”
— Andrew Lang
- Why biologists need statistics
- How statistical reasoning fits into the scientific process
- Falsifiability, hypotheses, and inference
- The difference between description, explanation, and prediction
- The broad structure of the methods covered in this module
1 Introduction
Statistics provides a framework for learning from data. In biology and ecology we rarely observe systems under perfectly controlled, repeatable conditions. Instead, we measure processes that vary across space, time, and individuals. Statistical methods allow us to describe that variation, quantify uncertainty, and decide how strongly the data support a biological claim.
This module approaches statistics as a tool for scientific reasoning, rather than as a disconnected list of tests. The central task is to align three things:
- the biological question,
- the data we collect, and
- the model we use to represent the process.
A statistical analysis is only as strong as the fit among these three elements.
2 Key Concepts
The following ideas anchor the rest of the biostatistics sequence.
- Statistical reasoning links biological questions, data, and models.
- Falsifiability helps define hypotheses that data can genuinely challenge.
- Inference is stronger when design, analysis, and interpretation are aligned.
- Explanation and prediction are related but distinct modelling goals.
- Model-based thinking unifies many named statistical procedures.
3 Statistics in the Scientific Process
A scientific workflow in biology usually unfolds in the following broad sequence:
- Observe a pattern or phenomenon in the natural world.
- Formulate a clear biological question.
- Translate that question into a testable hypothesis.
- Design a study that can produce the right data.
- Anticipate alternative explanations and possible confounders.
- Choose an analysis that matches the design and data structure.
- Collect the data.
- Explore, analyse, and interpret the results.
- Communicate the findings clearly in words, figures, and tables.
Statistics is therefore not something applied only at the end of a project. It begins when the study is designed. If the design is weak, no later analysis can fully rescue the inference.
4 Hypotheses, Falsifiability, and Inference
Much of modern biostatistics is tied to the logic of hypothesis testing. This logic sits comfortably within the Popperian view that scientific claims should be falsifiable. A useful scientific hypothesis must be framed in a way that allows data to contradict it.
In practice, we often state:
- a null hypothesis: a sceptical claim of no effect, no difference, or no relationship; and
- an alternative hypothesis: the competing claim we wish to evaluate.
This does not mean science is only about rejecting null hypotheses. Good science also depends on careful observation, exploratory work, and model building. But falsifiability remains a useful discipline because it forces us to state clearly what evidence would count against a claim.
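The logic of stating a null hypothesis and asking whether data contradict it can be made concrete with a small permutation test. The sketch below is illustrative only: the shoot lengths are invented, the function name `permutation_test` is hypothetical, and Python is used purely for convenience rather than as this module's software. Under a null hypothesis of no treatment effect, the group labels are interchangeable, so we shuffle them many times and ask how often a difference at least as large as the observed one arises by chance.

```python
import random


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=5000, seed=1):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (mean of a minus mean of b) and an
    approximate p-value under the null hypothesis of no difference.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # small tolerance guards against floating-point rounding
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # add one to numerator and denominator so p is never exactly zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical shoot lengths (cm); invented for illustration only
control = [4.1, 3.8, 4.5, 4.0, 3.9, 4.3, 4.2, 3.7]
treated = [4.9, 5.2, 4.6, 5.0, 4.8, 5.3, 4.7, 5.1]

observed_diff, p_value = permutation_test(treated, control)
print(f"Observed difference in means: {observed_diff:.4f} cm")
print(f"Approximate two-sided p-value: {p_value:.4f}")
```

A small p-value says that a difference this large would be unusual if the null hypothesis were true. It does not prove the alternative hypothesis, and it says nothing about whether the difference matters biologically.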
This module emphasises frequentist biostatistics because it provides a practical foundation for many biological applications. That does not make it the only valid way to reason from data. Bayesian methods, multivariate methods, phylogenetic approaches, simulation models, and qualitative approaches all have their place. The key question is always the same: does the method match the scientific problem?
5 What Statistics Does in Biology
In ecological and biological systems, variation is not merely noise to be removed. It is often a property of the system itself. Statistics helps us to:
- describe patterns in data,
- quantify uncertainty around those patterns,
- compare groups or treatments,
- evaluate associations among variables, and
- build models that connect biological mechanisms to observations.
This leads to two broad modelling goals:
- Explanation: attributing a pattern to a biological mechanism.
- Prediction: forecasting outcomes under new conditions.
These goals are related but not identical. A model that predicts well is not always the best model for interpretation, and an interpretable model is not always the most accurate predictor.
6 From Tests to Models
Introductory statistics is often taught as a sequence of named procedures: t-tests, ANOVA, correlations, regressions, and so on. These names are useful, but they can encourage a checklist mentality.
This module instead adopts a model-based view.
A statistical model is a simplified description of how a response variable depends on one or more predictors. Seen this way, many classical tests are special cases of a broader modelling framework: a two-sample t-test, for example, is a linear model with a single two-level predictor. Thinking in this way has two advantages:
- it unifies seemingly separate methods, and
- it makes assumptions easier to see and evaluate.
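One way to see how a named test sits inside a broader model is to compute the same quantity twice. The sketch below uses invented data and hypothetical function names. It calculates the classical pooled-variance two-sample t statistic, and then the t statistic for the slope of a simple linear regression of the response on a 0/1 group indicator. Under the equal-variance assumption the two are algebraically identical.

```python
import math


def pooled_t_statistic(group_0, group_1):
    """Classical two-sample t statistic with pooled variance."""
    n0, n1 = len(group_0), len(group_1)
    mean0, mean1 = sum(group_0) / n0, sum(group_1) / n1
    ss0 = sum((v - mean0) ** 2 for v in group_0)
    ss1 = sum((v - mean1) ** 2 for v in group_1)
    pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
    return (mean1 - mean0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))


def slope_t_statistic(x, y):
    """Least-squares slope of y on x and its t statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # residual sum of squares around the fitted line
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / se_slope


# Hypothetical measurements for two groups (invented for illustration)
group_control = [4.1, 3.8, 4.5, 4.0, 3.9, 4.3, 4.2, 3.7]
group_treated = [4.9, 5.2, 4.6, 5.0, 4.8, 5.3, 4.7, 5.1]

# Encode group membership as a 0/1 predictor
x = [0] * len(group_control) + [1] * len(group_treated)
y = group_control + group_treated

t_classic = pooled_t_statistic(group_control, group_treated)
slope, t_model = slope_t_statistic(x, y)

print(f"t from the two-sample test:   {t_classic:.6f}")
print(f"t from the regression slope:  {t_model:.6f}")
print(f"slope (difference in means):  {slope:.4f}")
```

The slope is simply the difference between the two group means, which is why the "test" and the "model" cannot disagree. The shared assumptions (independent observations, equal variances, roughly normal residuals) are also easier to see once the test is written as a model.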
7 The Statistical Toolbox
For the purposes of this module, the methods fall into four broad groups.
7.1 Foundations
We begin with summaries, figures, distributions, and sampling variation. These chapters establish how to describe data before making inferential claims.
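Sampling variation can be illustrated by simulation. The following sketch assumes a hypothetical, normally distributed trait with a mean of 50 and a standard deviation of 10 (both values invented), and repeatedly draws samples of a given size. The spread of the resulting sample means should sit close to the theoretical standard error, sigma / sqrt(n), and should shrink as sample size grows. The function name `simulate_sample_means` is an illustrative choice, not part of any library.

```python
import math
import random
import statistics


def simulate_sample_means(n, n_samples=2000, mu=50.0, sigma=10.0, seed=42):
    """Draw n_samples samples of size n and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(n_samples)
    ]


spread_by_n = {}
for n in (5, 20, 80):
    means = simulate_sample_means(n)
    # standard deviation of the sample means = empirical standard error
    spread_by_n[n] = statistics.stdev(means)
    theoretical = 10.0 / math.sqrt(n)
    print(
        f"n = {n:2d}: simulated SE = {spread_by_n[n]:.2f}, "
        f"theoretical SE = {theoretical:.2f}"
    )
```

Quadrupling the sample size roughly halves the standard error, which is one reason precise estimates can be expensive to obtain.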
7.2 Hypothesis Tests
These methods evaluate claims about means, medians, proportions, or associations. They include t-tests, ANOVA, correlation, and their non-parametric counterparts.
7.3 Regression and Model Building
Regression extends hypothesis testing by modelling the relationship between a response and one or more predictors. This is where much of modern biological data analysis lives.
7.4 Dependence, Extensions, and Workflow
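Real data often violate the assumptions of simple models. We therefore need to recognise pseudoreplication (treating non-independent measurements as if they were independent replicates) and other forms of dependence, and to work with mixed models, non-normal responses, model evaluation, and reproducible reporting.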
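Pseudoreplication is easiest to appreciate with numbers. The sketch below assumes a hypothetical design in which a treatment is applied to whole tanks and three fish are measured in each tank, so the tank, not the fish, is the experimental unit. All values and names are invented for illustration. It compares the standard error of the mean obtained by treating every fish as an independent replicate with the one obtained from tank means.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical fish lengths (mm), three fish measured in each of three tanks
tanks = {
    "tank_A": [10.0, 11.0, 12.0],
    "tank_B": [14.0, 15.0, 16.0],
    "tank_C": [18.0, 19.0, 20.0],
}

# Naive analysis: every fish counted as an independent replicate (n = 9)
all_fish = [length for fish in tanks.values() for length in fish]
naive_se = standard_error(all_fish)

# Unit-level analysis: one mean per tank, the true experimental unit (n = 3)
tank_means = [statistics.fmean(fish) for fish in tanks.values()]
unit_se = standard_error(tank_means)

print(f"Naive SE (n = {len(all_fish)} fish):   {naive_se:.3f}")
print(f"Unit-level SE (n = {len(tank_means)} tanks): {unit_se:.3f}")
```

In this invented example fish within a tank resemble one another far more than fish from different tanks, so the naive analysis reports roughly half the uncertainty that the design supports. Averaging to the experimental unit is the simplest remedy; mixed models keep the individual measurements while still accounting for the grouping.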
8 Core Principles for This Module
Several principles recur throughout the course:
8.1 Define the question before the method
The biological question determines the design and analysis. Methods should not be selected by habit or by what is fashionable.
8.2 Understand the data-generating process
Every dataset reflects a process: how observations were taken, what the experimental units were, and what structure exists in the data.
8.3 Match the model to the process
Predictors should represent meaningful biological drivers or carefully justified proxies.
8.4 Check assumptions explicitly
Assumptions are part of the analysis, not an afterthought. If they fail, that failure tells us something important about the adequacy of the model.
8.5 Separate explanation from prediction
Interpretation and forecasting are not the same goal. The choice of method depends on which goal matters most.
9 How This Book Is Organised
The chapter sequence mirrors how analyses are often conducted in practice:
- data summaries and visualisation,
- distributions and sampling uncertainty,
- inference and assumptions,
- standard hypothesis tests,
- regression and model building,
- common failure modes such as pseudoreplication and collinearity,
- more advanced model structures, and
- reproducible analytical workflow.
This order is not the only way to organise statistics, but it is a practical route for biologists learning to reason from data.
10 What You Should Be Able to Do
By the end of this module, you should be able to:
- translate a biological question into an appropriate statistical analysis,
- identify the response, predictor, and experimental unit in a study,
- recognise the difference between description, inference, and modelling,
- anticipate common design and analysis failures, and
- interpret results in biological rather than merely numerical terms.
11 Final Remark
Statistics is not a final ritual applied after data collection. It is part of the scientific method from the moment a question is formulated. Strong inference depends on good questions, good design, and a model that matches the biology.
Citation
@online{smit2026,
author = {Smit, A. J.},
title = {1. {The} {Statistical} {Landscape}},
date = {2026-03-19},
url = {http://tangledbank.netlify.app/BCB744/basic_stats/01-statistical-landscape.html},
langid = {en}
}
