---
title: "BCB744 Biostatistics — Theory Test (Version 1)"
subtitle: "Total: 100 marks | Time: 90 minutes"
date: "2026"
format:
  typst: default
  html:
    number-sections: false
    toc: true
    toc-depth: 2
    toc-title: "Contents"
engine: knitr
params:
  hide_answers: true
---
::: {.callout-important appearance="simple"}
**Instructions**
- This paper has **three parts**: Part A (General Theory, 50 marks), Part B (Experiment Design and Hypothesis Formulation, 25 marks), and Part C (Statistical Output Interpretation, 25 marks).
- Answer **all** questions.
- Write clearly and in complete sentences where prose is required.
- Mark allocations are shown next to each question in **(/ marks)** notation.
- Statistical notation: use *H*~0~ for the null hypothesis and *H*~A~ for the alternative hypothesis.
:::
---
# Part A: General Theory (50 marks)
## Question 1 — The Scientific Method (/6)
a. Explain the difference between a null hypothesis and an alternative hypothesis. **(/ 2)**
b. Why is it important to formulate hypotheses *before* collecting data? What statistical problem arises when hypotheses are adjusted after seeing the data? **(/ 2)**
c. What is a confounding variable? Provide one example from biology and explain how you would control for it in an experiment. **(/ 2)**
`r if (params$hide_answers) "::: {.content-hidden}"`
::: {.callout-tip appearance="simple"}
**Model Answer — Question 1**
*a.*
- ✓ The null hypothesis (*H*~0~) is the default position of no effect, no difference, or no relationship between variables — it is what we assume to be true until evidence suggests otherwise.
- ✓ The alternative hypothesis (*H*~A~) states that there *is* an effect, difference, or relationship — it is what the researcher typically hopes to support with data.
*b.*
- ✓ Formulating hypotheses before data collection ensures that the test is a genuine test of a prediction rather than a post-hoc rationalisation, preserving the logical structure of hypothesis testing.
- ✓ Adjusting hypotheses after seeing the data is called *HARKing* (Hypothesising After Results are Known) and is closely related to *p-hacking*. Both inflate the Type I error rate because multiple implicit comparisons have been made without correction.
*c.*
- ✓ A confounding variable is one that is associated with both the predictor and the response variable, creating a spurious apparent relationship. Example: studying the effect of intertidal height on limpet size, with wave exposure confounding both (exposed shores have lower intertidal zones and smaller limpets). Control: hold wave exposure constant by sampling from shores of the same exposure class, or include it as a covariate in the model.
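
The inflation of the Type I error rate mentioned in part b can be made concrete with a short calculation. This is only a sketch: it assumes *k* independent tests, each run at α = 0.05, and the values of `k` are arbitrary illustrations rather than anything taken from the question.

```r
alpha <- 0.05
k <- c(1, 5, 10, 20)         # hypothetical numbers of unplanned comparisons

# Probability of at least one false positive across k independent tests
fwer <- 1 - (1 - alpha)^k
names(fwer) <- k

round(fwer, 3)
```

Under these assumptions the chance of at least one false positive rises steadily with the number of comparisons, which is why hypotheses fixed in advance protect the nominal error rate.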
:::
`r if (params$hide_answers) ":::"`
---
## Question 2 — Descriptive Statistics and Visualisation (/5)
a. When is the median a more appropriate measure of central tendency than the mean? Give a biological example where you would choose the median. **(/ 2)**
b. A researcher has continuous measurements of body mass (in grams) from three species of lizard. Suggest the most informative plot type for comparing the distributions across species, and explain in one sentence why it is better than a bar chart with error bars. **(/ 3)**
`r if (params$hide_answers) "::: {.content-hidden}"`
::: {.callout-tip appearance="simple"}
**Model Answer — Question 2**
*a.*
- ✓ The median is preferred when the data are skewed or contain outliers, because unlike the mean it is not pulled towards extreme values.
- ✓ Example: parasite load per host individual — most hosts carry few parasites but a small number carry very heavy infestations (a right-skewed distribution), so the median better represents the typical host.
*b.*
- ✓ A **violin plot** (or overlaid boxplot) is most informative because it shows the full distribution shape — modality, skew, and spread — not just a point estimate and a single error metric.
- ✓✓ A bar chart with error bars reduces all distributional information to a mean and one measure of spread, hiding the shape of the data (e.g., bimodality, heavy tails), which can be biologically meaningful.
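
The point in part a can be illustrated with a tiny, invented data set. The counts below are hypothetical parasite loads chosen only to mimic a right-skewed distribution; they are not real data.

```r
# Hypothetical parasite counts per host: most hosts carry few parasites,
# one host carries a very heavy infestation
parasite_load <- c(0, 0, 1, 1, 1, 2, 2, 3, 4, 57)

mean_load   <- mean(parasite_load)    # pulled upward by the single extreme host
median_load <- median(parasite_load)  # stays close to the typical host

c(mean = mean_load, median = median_load)
```

Here the mean is several times larger than the median even though nine of the ten hosts carry four or fewer parasites.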
:::
`r if (params$hide_answers) ":::"`
---
## Question 3 — Probability Distributions and the Central Limit Theorem (/8)
a. List **three** characteristics of the normal distribution. **(/ 3)**
b. A researcher is counting the number of bird nesting attempts per territory per season. Under what conditions would you expect these counts to follow a Poisson distribution? What key assumption must hold? **(/ 3)**
c. State the Central Limit Theorem and explain why it is important for applying parametric hypothesis tests to biological data that are not perfectly normally distributed. **(/ 2)**
`r if (params$hide_answers) "::: {.content-hidden}"`
::: {.callout-tip appearance="simple"}
**Model Answer — Question 3**
*a.* Any three of the following:
- ✓ It is symmetric around its mean.
- ✓ Its mean, median, and mode are all equal.
- ✓ It is characterised entirely by two parameters: the mean (μ) and standard deviation (σ).
- ✓ Approximately 68% of values fall within ±1 SD, 95% within ±2 SD, and 99.7% within ±3 SD of the mean.
- ✓ It is bell-shaped and extends to ±∞ (though with negligible probability in the tails).
*b.*
- ✓ A Poisson distribution applies to counts of events occurring in a fixed interval (here, a territory per season), when events occur independently of one another.
- ✓ The key assumption is that the mean count equals the variance (equidispersion); if variance greatly exceeds the mean (overdispersion), a negative binomial distribution may be more appropriate.
- ✓ The counts must be non-negative integers, and the probability of an event must be constant across territories and seasons.
*c.*
- ✓ The Central Limit Theorem states that the distribution of sample means approaches a normal distribution as sample size increases, regardless of the shape of the underlying population distribution (provided the observations are independent and the population variance is finite).
- ✓ This is important because it means that, with sufficiently large samples, parametric tests based on normal theory (t-tests, ANOVA, regression) remain valid even when the raw data deviate moderately from normality.
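
A small simulation can make the Central Limit Theorem in part c tangible. The sketch below assumes an exponential population (strongly right-skewed), and the seed, sample sizes, and number of replicates are arbitrary choices; `sample_means()` and `skewness()` are helper names introduced here for illustration.

```r
set.seed(744)  # arbitrary seed so the sketch is reproducible

# Draw `reps` samples of size n from an exponential population (mean 1)
# and return the mean of each sample
sample_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

means_n5  <- sample_means(5)
means_n50 <- sample_means(50)

# Crude moment-based skewness: about 0 for a symmetric distribution
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

c(n5 = skewness(means_n5), n50 = skewness(means_n50))
```

The skewness of the sample means should shrink towards zero as *n* grows, even though the raw data remain just as skewed.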
:::
`r if (params$hide_answers) ":::"`
---
## Question 4 — Statistical Inference and Error (/7)
a. Define a *p*-value in plain language (without using the word "probability" in a circular way). **(/ 2)**
b. Distinguish between a Type I error and a Type II error. Which one does the significance level *α* directly control? **(/ 3)**
c. A researcher increases the sample size of their experiment from *n* = 20 to *n* = 80. What effect does this have on statistical power, and why? **(/ 2)**
`r if (params$hide_answers) "::: {.content-hidden}"`
::: {.callout-tip appearance="simple"}
**Model Answer — Question 4**
*a.*
- ✓ A *p*-value is the probability of obtaining a test statistic at least as extreme as the one observed, *assuming the null hypothesis is true*. It quantifies how surprising the data are under *H*~0~.
- ✓ A small *p*-value means the observed result would be rare if *H*~0~ were true — it does **not** tell us the probability that *H*~0~ is true.
*b.*
- ✓ A **Type I error** (false positive) is rejecting a true *H*~0~: concluding there is an effect when there is none.
- ✓ A **Type II error** (false negative) is failing to reject a false *H*~0~: missing a real effect.
- ✓ The significance level *α* directly controls the Type I error rate: by setting *α* = 0.05 we accept a 5% chance of a false positive.
*c.*
- ✓ Increasing sample size increases statistical power (the ability to detect a real effect when one exists), because larger samples produce more precise estimates with smaller standard errors.
- ✓ With smaller standard errors, the test statistic becomes larger for the same true effect size, making it more likely to exceed the critical threshold and lead to rejection of a false *H*~0~.
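
The reasoning in part c can be checked numerically with `power.t.test()` from base R's stats package. This is a sketch under stated assumptions: a two-sample *t*-test, *n* interpreted as the number of replicates per group, a true difference of 0.5 standard deviations, and α = 0.05. None of these specifics come from the question.

```r
# Assumed scenario: two-sample t-test, true difference = 0.5 SD, alpha = 0.05
power_n20 <- power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05)$power
power_n80 <- power.t.test(n = 80, delta = 0.5, sd = 1, sig.level = 0.05)$power

# Standard error of the difference between two group means: sd * sqrt(2 / n)
se_n20 <- 1 * sqrt(2 / 20)
se_n80 <- 1 * sqrt(2 / 80)

c(power_n20 = power_n20, power_n80 = power_n80, se_ratio = se_n20 / se_n80)
```

Quadrupling the sample size halves the standard error, and the power for the same true effect rises accordingly.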
:::
`r if (params$hide_answers) ":::"`
---
## Question 5 — Assumptions and Transformations (/6)
a. List the **three** main assumptions that must hold before applying a parametric test (such as a *t*-test or ANOVA). For each, name one diagnostic method. **(/ 6)**
`r if (params$hide_answers) "::: {.content-hidden}"`
::: {.callout-tip appearance="simple"}
**Model Answer — Question 5**
*a.* Two marks per assumption–diagnostic pair (1 mark for assumption, 1 mark for diagnostic):
- ✓✓ **Normality** of residuals (or data within groups): diagnosed with a Q-Q (quantile-quantile) plot or a Shapiro-Wilk test.
- ✓✓ **Homogeneity of variance** (homoscedasticity): diagnosed with Levene's test or a residuals-vs-fitted plot (look for a fan-shaped pattern).
- ✓✓ **Independence** of observations: assessed by knowledge of the study design (e.g., absence of repeated measures, spatial/temporal autocorrelation, or nested structure). There is no single statistical test; it requires scientific judgement and careful design documentation.
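
The first two diagnostics might be run along the following lines. The data are simulated purely for illustration (three hypothetical groups with equal variance). Bartlett's test is used for variance homogeneity because it ships with base R; Levene's test, as named in the answer, needs an add-on package such as car (`car::leveneTest()`).

```r
set.seed(744)  # arbitrary seed

# Simulated stand-in data: three hypothetical groups of 20 observations
dat <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 20)),
  y     = rnorm(60, mean = rep(c(10, 12, 15), each = 20), sd = 2)
)
fit <- aov(y ~ group, data = dat)

# Normality of residuals: Shapiro-Wilk test
normality <- shapiro.test(residuals(fit))

# Homogeneity of variance: Bartlett's test (base R substitute for Levene's test)
homogeneity <- bartlett.test(y ~ group, data = dat)

c(shapiro_p = normality$p.value, bartlett_p = homogeneity$p.value)

# Graphical checks, drawn only in an interactive session
if (interactive()) {
  qqnorm(residuals(fit)); qqline(residuals(fit))  # points should follow the line
  plot(fitted(fit), residuals(fit))               # look for a fan shape
}
```

Independence has no counterpart in this sketch because it is judged from the study design rather than from a test.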
:::
`r if (params$hide_answers) ":::"`
---
## Question 6 — Correlation and Association (/8)
a. Explain the conceptual difference between Pearson's *r* and Spearman's *ρ* (rho). Under what circumstances would you choose Spearman over Pearson? **(/ 4)**
b. A marine biologist finds a strong positive correlation (*r* = 0.91) between mean sea surface temperature (SST) and the frequency of coral bleaching events over a 20-year dataset. A colleague claims this proves that warming SST *causes* bleaching. Provide **two** alternative explanations for this correlation that do not require direct causation, and briefly explain why the colleague's claim is premature. **(/ 4)**
`r if (params$hide_answers) "::: {.content-hidden}"`
::: {.callout-tip appearance="simple"}
**Model Answer — Question 6**
*a.*
- ✓ Pearson's *r* measures the *linear* association between two continuous, normally distributed variables; it is sensitive to outliers.
- ✓ Spearman's *ρ* measures the monotonic association between the *ranks* of two variables; it makes no assumptions about the distribution of the data.
- ✓✓ Choose Spearman when: the data are not normally distributed; the relationship is monotonic but non-linear; outliers are present; or one or both variables are ordinal.
*b.*
- ✓ **Confounding variable**: Both SST and bleaching frequency may be driven by a third variable (e.g., El Niño–Southern Oscillation events), which independently raises SST and creates thermal stress on reefs.
- ✓ **Spurious correlation / coincident trends**: Both variables may be trending upward over time for unrelated reasons (SST due to climate forcing; bleaching frequency due to increasing observer effort and reporting), producing a correlation that does not reflect a mechanistic link.
- ✓✓ Correlation quantifies the strength and direction of association but cannot establish the *direction of causation* (SST could affect bleaching, bleaching could alter local thermal dynamics, or a third factor could drive both). Establishing causation requires experimental manipulation or a strong mechanistic causal framework.
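
The distinction drawn in part a between linear and monotonic association can be seen with a constructed example. The variables below are artificial: `y` is an exact but strongly curved function of `x`.

```r
x <- 1:20
y <- exp(x / 3)   # perfectly monotonic, but far from a straight line

pearson_r    <- cor(x, y, method = "pearson")   # measures linear association
spearman_rho <- cor(x, y, method = "spearman")  # measures association of ranks

c(pearson = pearson_r, spearman = spearman_rho)
```

Spearman's ρ equals 1 because the ranks agree perfectly, whereas Pearson's *r* falls below 1 because the relationship is not linear.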
:::
`r if (params$hide_answers) ":::"`
---
## Question 7 — Simple Linear Regression (/10)
a. A regression of seagrass shoot density (shoots m⁻²) on water clarity (Secchi depth, m) yields: $\hat{y} = 14.3 + 8.7x$. Interpret both the intercept and the slope in biological terms. **(/ 2)**
b. The model returns *R*² = 0.71. What does this value mean? **(/ 2)**
c. Describe **three** diagnostic plots or tests you would use to verify that the assumptions of the linear regression model are met. For each, state what assumption it checks and what a violation would look like. **(/ 3)**
d. Explain the difference between a **confidence interval** and a **prediction interval** for a regression model. Which is wider, and why? **(/ 3)**
`r if (params$hide_answers) "::: {.content-hidden}"`
::: {.callout-tip appearance="simple"}
**Model Answer — Question 7**
*a.*
- ✓ **Intercept (14.3)**: the estimated shoot density when Secchi depth = 0 m. Biologically, this is an extrapolation beyond measurable water clarity and may not be interpretable in isolation.
- ✓ **Slope (8.7)**: for each additional 1 m increase in Secchi depth (i.e., clearer water), shoot density is predicted to increase by 8.7 shoots m⁻², indicating a positive relationship between light penetration and seagrass density.
*b.*
- ✓✓ *R*² = 0.71 means that 71% of the total variation in shoot density is explained by variation in water clarity (Secchi depth). The remaining 29% is attributable to other factors not captured by the model.
*c.* One mark per diagnostic (assumption + violation pattern):
- ✓ **Residuals-vs-fitted plot** — checks *linearity* and *homoscedasticity*. A violation appears as a curved trend (non-linearity) or a fan shape expanding left to right (heteroscedasticity).
- ✓ **Normal Q-Q plot of residuals** — checks *normality of residuals*. A violation appears as systematic deviation from the 45° reference line, particularly S-curves (heavy or light tails) or a curved bow.
- ✓ **Scale-location plot (or spread-location plot)** — checks *homoscedasticity* specifically. A violation appears as a slope in the lowess line through the square-rooted absolute residuals.
*d.*
- ✓ A **confidence interval** around a fitted line represents the uncertainty in the *mean* response at a given *x* value (i.e., where the true regression line lies).
- ✓ A **prediction interval** represents the uncertainty for a *single new observation* at a given *x* value, incorporating both the uncertainty in the mean (the CI) plus the natural variability of individual observations around that mean.
- ✓ The prediction interval is always wider, because predicting an individual observation requires accounting for both sampling uncertainty in the line's position *and* the inherent scatter of individuals around the line (residual variance).
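
The comparison in part d can be reproduced with `predict()` on a fitted `lm` object. The data here are simulated around the fitted line from the question; the Secchi-depth range, the residual standard deviation of 6, and the new site at 4 m are assumptions made only for this sketch.

```r
set.seed(744)  # arbitrary seed

# Simulated data scattered around y-hat = 14.3 + 8.7x (residual SD assumed)
secchi <- runif(40, min = 1, max = 8)
shoots <- 14.3 + 8.7 * secchi + rnorm(40, sd = 6)
fit <- lm(shoots ~ secchi)

new_site <- data.frame(secchi = 4)  # hypothetical new site

conf_int <- predict(fit, newdata = new_site, interval = "confidence", level = 0.95)
pred_int <- predict(fit, newdata = new_site, interval = "prediction", level = 0.95)

width_conf <- conf_int[1, "upr"] - conf_int[1, "lwr"]
width_pred <- pred_int[1, "upr"] - pred_int[1, "lwr"]

c(confidence = width_conf, prediction = width_pred)
```

Both intervals are centred on the same fitted value, but the prediction interval is wider because it adds the residual scatter of individual observations.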
:::
`r if (params$hide_answers) ":::"`
---
# Part B: Experiment Design and Hypothesis Formulation (25 marks)
## Question 8 — Seabird Egg Mass Across Island Populations (/12)
A researcher is studying egg mass (g) of a colonial seabird across four island populations in the sub-Antarctic. The first six rows of the dataset are shown below:
```
  island egg_mass_g
1      A       82.4
2      A       79.1
3      A       84.3
4      B       91.7
5      B       88.2
6      B       94.5
```
The dataset contains 72 records: 18 eggs measured from each of the four islands (A, B, C, D). The researcher wishes to determine **whether mean egg mass differs significantly among the four island populations**.
a. State the formal null and alternative hypotheses for this analysis (use appropriate notation). **(/ 3)**
b. Identify the most appropriate statistical test for this research question. **(/ 2)**
c. Provide **three** specific reasons why you selected this test, with reference to the nature of the predictor and response variables, the number of groups, and the assumptions required. **(/ 6)**
d. If the main test returns a significant result, what additional procedure would you perform, and why? **(/ 1)**
`r if (params$hide_answers) "::: {.content-hidden}"`
::: {.callout-tip appearance="simple"}
**Model Answer — Question 8**
*a.*
- ✓ *H*~0~: The mean egg mass is equal across all four island populations: μ~A~ = μ~B~ = μ~C~ = μ~D~.
- ✓ *H*~A~: At least one island population has a mean egg mass that differs from the others (not all μ are equal).
- ✓ The hypotheses must be stated before analysis; the alternative is non-directional because we have no a priori reason to predict which island will differ.
*b.*
- ✓✓ **One-way analysis of variance (ANOVA)**. If assumptions of normality or variance homogeneity are violated, the non-parametric **Kruskal-Wallis rank-sum test** is the appropriate alternative.
*c.* Two marks per reason (up to 6 marks, one mark for partial answers):
- ✓✓ The **response variable** (egg mass, in grams) is continuous and ratio-scaled, meeting the requirement for parametric testing.
- ✓✓ The **predictor variable** (island) is categorical with **four** levels. Comparing more than two group means requires ANOVA rather than a *t*-test; running multiple pairwise *t*-tests would inflate the family-wise Type I error rate (multiple comparisons problem).
- ✓✓ ANOVA assumes **normality within each group** (assessable via Q-Q plots or Shapiro-Wilk per group) and **homogeneity of variance** across groups (assessable via Levene's test). With 18 replicates per group the design is balanced, which makes ANOVA robust to modest departures from these assumptions.
*d.*
- ✓ A **post-hoc test** (e.g., Tukey's Honestly Significant Difference, HSD) is needed to identify *which specific pairs* of islands differ significantly, since the omnibus ANOVA *F*-test only tells us that *some* difference exists, not where it lies.
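
The workflow described in parts b to d might look like this in R. The full dataset is not given, so the sketch simulates a stand-in with the same layout (18 eggs on each of four islands); the group means and standard deviation are invented.

```r
set.seed(744)  # arbitrary seed

# Simulated stand-in for the 72-row dataset (means and SD are invented)
eggs <- data.frame(
  island     = factor(rep(c("A", "B", "C", "D"), each = 18)),
  egg_mass_g = rnorm(72, mean = rep(c(82, 91, 85, 88), each = 18), sd = 3)
)

# Omnibus test: do the four island means differ?
fit <- aov(egg_mass_g ~ island, data = eggs)
summary(fit)

# Post-hoc test: which pairs of islands differ?
tukey <- TukeyHSD(fit, "island")
tukey$island

# Rank-based alternative if the parametric assumptions fail
kw <- kruskal.test(egg_mass_g ~ island, data = eggs)
```

With four islands, Tukey's HSD reports six pairwise comparisons, each with a confidence interval and a *p*-value adjusted for the whole family of comparisons.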
:::
`r if (params$hide_answers) ":::"`
---
## Question 9 — Intertidal Algal Cover and Wave Exposure (/13)
An ecologist samples 50 intertidal rocky-shore plots across a gradient of wave exposure. For each plot, wave exposure is measured on a continuous index (0 = fully sheltered; 100 = fully exposed) and percentage cover of the dominant alga *Ectocarpus* sp. is estimated. The first six rows of the dataset are:
```
  plot_id exposure_index cover_pct
1       1            4.2      71.3
2       2            9.8      68.9
3       3           18.5      59.4
4       4           31.2      47.1
5       5           47.6      38.2
6       6           58.1      29.5
```
The researcher's aim is: *"To determine whether there is a significant relationship between wave exposure and the cover of* Ectocarpus *sp. in the intertidal zone."*
a. State formal null and alternative hypotheses appropriate for this research aim. **(/ 3)**
b. Identify the most appropriate statistical test and state **two** reasons for your choice, with reference to the nature of both variables. **(/ 4)**
c. What **two** assumption checks would you perform to validate the regression model (before interpreting its output), and what would you do if each assumption were violated? **(/ 4)**
d. Based on the data preview, describe what you expect the scatter plot to look like (direction, strength, linearity), and why this is consistent with the biological interpretation. **(/ 2)**
`r if (params$hide_answers) "::: {.content-hidden}"`
::: {.callout-tip appearance="simple"}
**Model Answer — Question 9**
*a.*
- ✓ *H*~0~: There is no linear relationship between wave exposure index and *Ectocarpus* cover; the slope of the regression line (β~1~) = 0.
- ✓ *H*~A~: There is a significant linear relationship between wave exposure index and *Ectocarpus* cover; β~1~ ≠ 0.
- ✓ Note: A directional (*H*~A~: β~1~ < 0) alternative is justifiable if the researcher has a prior biological reason to predict negative association, but this must be declared before analysis.
*b.*
- ✓✓ **Simple linear regression** (or Pearson correlation if only association strength is of interest, not prediction).
- ✓ The **response variable** (cover_pct) and the **predictor variable** (exposure_index) are both continuous — this precludes the use of *t*-tests or ANOVA (which require a categorical predictor) and points towards regression.
- ✓ Simple linear regression additionally provides a quantitative estimate of the *rate of change* in cover per unit of exposure, which directly addresses the aim.
*c.* Two marks per assumption:
- ✓✓ **Linearity**: Plot cover_pct against exposure_index in a scatter plot. If the relationship is non-linear (e.g., curved), consider a polynomial term or a data transformation (e.g., arcsine-square-root for proportions). If severely non-linear, Spearman's ρ is a non-parametric alternative for association.
- ✓✓ **Normality of residuals / homoscedasticity**: After fitting the model, inspect the Q-Q plot and residuals-vs-fitted plot. A log or arcsine transformation of cover_pct may stabilise variance; if neither resolves violations, a generalised linear model (e.g., beta regression for proportions) should be considered.
*d.*
- ✓ The data preview shows a consistent decrease in cover as exposure increases (from ~71% at exposure 4 to ~30% at exposure 58), suggesting a **negative, moderately strong, approximately linear** relationship in the scatter plot.
- ✓ This is biologically consistent with the hypothesis that *Ectocarpus* sp. is a sheltered-shore specialist that is physiologically stressed (e.g., by desiccation and physical disturbance) under high wave exposure, reducing its competitive dominance.
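
The expectation in part d can be checked against the six preview rows shown in the question. This is only a sketch: six rows out of 50 cannot stand in for the full analysis, and the object names are introduced here for illustration.

```r
# The six preview rows from the question
preview <- data.frame(
  exposure_index = c(4.2, 9.8, 18.5, 31.2, 47.6, 58.1),
  cover_pct      = c(71.3, 68.9, 59.4, 47.1, 38.2, 29.5)
)

fit   <- lm(cover_pct ~ exposure_index, data = preview)
slope <- coef(fit)[["exposure_index"]]   # change in cover per unit of exposure
r     <- cor(preview$exposure_index, preview$cover_pct)

c(slope = slope, r = r)
```

A negative slope and a negative correlation in the preview agree with the direction described in the model answer; the strength of the relationship in the full dataset may differ.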
:::
`r if (params$hide_answers) ":::"`
---
# Part C: Statistical Output Interpretation (25 marks)
## Question 10 — Wilcoxon Rank-Sum Test Output (/12)
A researcher compares the percentage cover of lichen on north-facing versus south-facing granite outcrops at 20 sites per aspect. The data failed the Shapiro-Wilk normality test. The following output was produced:
```
Wilcoxon rank-sum exact test
data: cover_pct by aspect
W = 142, p-value = 0.0234
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
1.450 12.351
sample estimates:
difference in location
6.912
```
a. State the null hypothesis that this test evaluated. **(/ 2)**
b. What does the test statistic *W* = 142 represent conceptually? **(/ 2)**
c. Interpret the *p*-value = 0.0234 at α = 0.05. What conclusion do you draw? **(/ 3)**
d. What does the 95% confidence interval (1.450, 12.351) tell you, in plain language? **(/ 3)**
e. Why was the Wilcoxon rank-sum test used instead of an independent-samples *t*-test? **(/ 2)**
`r if (params$hide_answers) "::: {.content-hidden}"`
::: {.callout-tip appearance="simple"}
**Model Answer — Question 10**
*a.*
- ✓✓ *H*~0~: There is no difference in the location (median) of lichen cover between north-facing and south-facing outcrops; the distributions of the two groups are identical (location shift = 0).
*b.*
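- ✓✓ *W* is the Wilcoxon rank-sum statistic. All observations from both groups are ranked together and the ranks belonging to one group are summed; R's `wilcox.test` reports this sum for the first group minus its smallest possible value, *n*~1~(*n*~1~ + 1)/2, which equals the number of between-group pairs in which the first group's value is the larger. It quantifies how much the two rank distributions overlap; values far from the midpoint *n*~1~*n*~2~/2 indicate systematic separation between groups.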
*c.*
- ✓ *p* = 0.0234 < *α* = 0.05, so we reject *H*~0~.
- ✓ We conclude that there is a statistically significant difference in lichen cover between north- and south-facing outcrops.
- ✓ The probability of observing a *W* statistic this extreme (or more extreme) if the two populations had identical distributions is only 2.34%, which is below our threshold for acceptable Type I error.
*d.*
- ✓ The 95% CI (1.450, 12.351) is an interval for the estimated *difference in location* between the two groups, i.e. the typical (median) pairwise difference in cover between a north-facing and a south-facing site.
- ✓ We can be 95% confident that the true location shift in lichen cover between aspects lies between approximately 1.5 and 12.4 percentage points.
- ✓ Because the interval does not include zero, it is consistent with the significant *p*-value and gives a sense of biological effect size (the difference is at least ~1.5 percentage points, and may be as large as ~12 points).
*e.*
- ✓ The Wilcoxon rank-sum test was chosen because the data failed the Shapiro-Wilk test for normality (a prerequisite for the *t*-test).
- ✓ Being a rank-based non-parametric test, it makes no assumption about the distributional form of the data and is therefore valid when normality cannot be assumed.
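
To see where the quantities in this kind of output come from, the sketch below runs `wilcox.test()` on simulated data and recomputes two of them by hand. The cover values are invented (skewed draws from beta distributions, scaled to percentages) and are not the data behind the output in the question.

```r
set.seed(744)  # arbitrary seed

# Simulated, skewed percentage cover at 20 sites per aspect (values invented)
north <- 100 * rbeta(20, shape1 = 2.0, shape2 = 5)
south <- 100 * rbeta(20, shape1 = 1.5, shape2 = 6)

res <- wilcox.test(north, south, conf.int = TRUE)

# W as reported by R: rank sum of the first group minus n1 * (n1 + 1) / 2
ranks  <- rank(c(north, south))
w_hand <- sum(ranks[1:20]) - 20 * 21 / 2

# "Difference in location": the median of all 20 x 20 pairwise differences
hl_hand <- median(outer(north, south, "-"))

c(W = unname(res$statistic), W_by_hand = w_hand,
  estimate = unname(res$estimate), estimate_by_hand = hl_hand)
```

The hand calculations agree with the function's output, which shows that the reported location estimate is built from pairwise differences rather than from the two sample medians.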
:::
`r if (params$hide_answers) ":::"`
---
## Question 11 — Two-Way ANOVA Output (/13)
An experiment tests the effects of temperature (three levels: 15°C, 20°C, 25°C) and nutrient addition (two levels: ambient, enriched) on the growth rate (cm day⁻¹) of a marine macroalga. Ten replicates per treatment combination were used. The ANOVA table is:
```
                     Df Sum Sq Mean Sq F value  Pr(>F)
temperature           2  845.3   422.7  18.432 < 0.001 ***
nutrient              1  312.1   312.1  13.610  0.0004 ***
temperature:nutrient  2   89.4    44.7   1.950  0.1503
Residuals            54 1238.7    22.9
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
a. Identify the two predictor variables in this experiment. **(/ 2)**
b. Which main effects are statistically significant? Is the interaction term significant? **(/ 3)**
c. Interpret the non-significant interaction term biologically. What does this mean for how temperature and nutrient effects operate on algal growth? **(/ 3)**
d. What is the residual mean square (22.9) measuring in this experiment? **(/ 2)**
e. The *F*-value for temperature is 18.432. Show how it was calculated from the ANOVA table, and explain what it measures. **(/ 3)**
`r if (params$hide_answers) "::: {.content-hidden}"`
::: {.callout-tip appearance="simple"}
**Model Answer — Question 11**
*a.*
- ✓ Temperature (categorical, 3 levels: 15°C, 20°C, 25°C).
- ✓ Nutrient addition (categorical, 2 levels: ambient, enriched).
*b.*
- ✓ Temperature is a significant main effect (*F*(2, 54) = 18.432, *p* < 0.001).
- ✓ Nutrient addition is a significant main effect (*F*(1, 54) = 13.610, *p* = 0.0004).
- ✓ The temperature × nutrient interaction term is **not** statistically significant (*F*(2, 54) = 1.950, *p* = 0.1503), so there is no evidence that the effect of temperature depends on nutrient level (or vice versa).
*c.*
- ✓ A non-significant interaction means there is no evidence that the effect of temperature on algal growth rate depends on nutrient level, or that the effect of nutrient addition depends on temperature.
- ✓ In biological terms: the data are consistent with temperature changing growth to a similar degree whether nutrients are ambient or enriched, and with nutrient enrichment changing growth by a similar amount at 15°C, 20°C, and 25°C.
- ✓ The main effects can therefore be interpreted independently and additively — the factors act on growth in parallel rather than modulating each other.
*d.*
- ✓ The residual mean square (22.9) estimates the **within-group variance** — the average squared deviation of individual replicates from their treatment group mean.
- ✓ It represents unexplained variation (error), arising from natural biological variability among individual algal specimens within a treatment combination.
*e.*
- ✓ *F* = Mean Square~temperature~ / Mean Square~residuals~ = 422.7 / 22.9 ≈ 18.4, which matches the reported **18.432** up to rounding of the tabulated mean squares.
- ✓ The *F*-ratio measures how much more variation is explained by temperature than would be expected from random within-group variation alone.
- ✓ A large *F* (much greater than 1) indicates that the variation among temperature group means far exceeds what would be expected by chance, supporting rejection of the null hypothesis that all temperature group means are equal.
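
The arithmetic behind the table can be retraced from its sums of squares and degrees of freedom. The sketch below uses only the values printed in the ANOVA table; small differences from the printed *F* values arise because the table itself is rounded. The object names are introduced here for illustration.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table
ss  <- c(temperature = 845.3, nutrient = 312.1, interaction = 89.4, residuals = 1238.7)
dof <- c(temperature = 2,     nutrient = 1,     interaction = 2,    residuals = 54)

ms <- ss / dof   # mean square = sum of squares / degrees of freedom

effects <- c("temperature", "nutrient", "interaction")
f_ratio <- ms[effects] / ms[["residuals"]]

# Upper-tail probability of each F ratio under the null hypothesis
p_value <- pf(f_ratio, df1 = dof[effects], df2 = dof[["residuals"]],
              lower.tail = FALSE)

round(cbind(mean_sq = ms[effects], F = f_ratio, p = p_value), 4)
```

Each *F* ratio divides an effect's mean square by the same residual mean square, so all three tests share one estimate of within-group variance.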
:::
`r if (params$hide_answers) ":::"`
---
*End of Version 1*