BCB744 Biostatistics — Theory Test (Version 3)

Total: 100 marks | Time: 90 minutes

Author: A. J. Smit

Affiliation: University of the Western Cape

Published: January 1, 2026

Instructions

This paper has three parts: Part A (General Theory, 50 marks), Part B (Experiment Design and Hypothesis Formulation, 25 marks), and Part C (Statistical Output Interpretation, 25 marks).
Answer all questions.
Write clearly and in complete sentences where prose is required.
Mark allocations are shown next to each question in (/ marks) notation.
Statistical notation: use H₀ for the null hypothesis and H_A for the alternative hypothesis.

1 Part A: General Theory (50 marks)

1.1 Question 1 — Observational vs Experimental Studies (/5)

a. What is the fundamental distinction between an observational study and a controlled experiment? (/ 2)
b. Why is it generally not possible to draw causal conclusions from an observational study, even when a strong statistical association exists? (/ 2)
c. Give one example from biological sciences where an observational study is the only ethical or practical option. (/ 1)

Model Answer — Question 1

✓ In a controlled experiment, the researcher deliberately manipulates one or more factors (independent variables), randomly assigns subjects to treatment conditions, and controls all other variables — thereby allowing cause-and-effect inferences.
✓ In an observational study, the researcher measures variables as they naturally occur, without manipulation or random assignment — associations can be detected but the researcher cannot isolate the causal factor.

✓ Without random assignment to treatment groups, subjects that differ in the predictor variable (e.g., smokers vs. non-smokers) may also differ in many other ways (confounding variables). Any observed association could be driven by these unmeasured differences rather than the predictor itself.
✓ Additionally, the direction of causality is ambiguous: the putative “cause” and “effect” may share a common upstream driver, or the arrow of causation may run in the opposite direction from what is assumed.

✓ Any one of: studying the health effects of smoking in humans (unethical to randomly assign people to smoke); tracking the long-term effects of a natural disaster on wildlife populations; observing migration routes or breeding behaviour of endangered species that cannot be disturbed.

1.2 Question 2 — Statistical Power and Effect Size (/7)

a. Define statistical power. What does a power of 0.80 mean in practice? (/ 2)
b. List three factors that increase the statistical power of a hypothesis test. For each, briefly explain the mechanism. (/ 3)
c. A researcher reports a statistically significant result with p = 0.032, but the estimated effect size is d = 0.12 (a small effect). What does this tell us, and why is a statistically significant result not always biologically meaningful? (/ 2)

Model Answer — Question 2

✓ Statistical power is the probability of correctly rejecting a false null hypothesis — the probability of detecting a real effect when one truly exists (1 − β, where β is the Type II error rate).
✓ A power of 0.80 means there is an 80% chance that the test will return a significant result if the true effect size is at least as large as the one specified in the power calculation. Equivalently, there is a 20% chance of a Type II error (missing a real effect).

b. One mark per factor + mechanism (must include mechanism for full credit):

✓ Larger sample size: reduces the standard error of the estimator, producing a larger test statistic for the same effect size, making it easier to exceed the critical threshold.
✓ Larger true effect size: a bigger difference between groups or a steeper slope makes the signal more detectable against background noise; for a fixed design, detecting a 10-unit difference is easier than detecting a 1-unit difference.
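✓ Higher significance level α (e.g., 0.10 rather than 0.05): a larger α widens the rejection region, so a real effect is more easily declared significant, at the cost of more Type I errors. OR lower residual variance: reducing measurement error or controlling extraneous variation (e.g., using a paired design) decreases the within-group variance, improving the signal-to-noise ratio.
✓ One-tailed test (if justified): concentrating the whole rejection region in one tail lowers the critical value in the predicted direction at the same α, which raises power for effects in that direction (and leaves no power to detect effects in the opposite direction).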

(Any three correct, with mechanisms, for 3 marks.)
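The mechanisms listed above can be checked numerically. The following is a minimal sketch, assuming a two-sided, two-sample z-test with equal group sizes and a known standard deviation (a simplification of the t-test; the function name `approx_power` and the example values are illustrative, not part of the test paper).

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (equal n, known SD)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value for a two-sided test
    shift = d * sqrt(n_per_group / 2)   # expected test statistic when the effect is d
    # Probability of landing in either rejection tail when the effect is real
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


print(round(approx_power(0.5, 64), 3))               # baseline design
print(round(approx_power(0.5, 128), 3))              # larger sample size
print(round(approx_power(0.8, 64), 3))               # larger true effect
print(round(approx_power(0.5, 64, alpha=0.10), 3))   # higher alpha
```

Each change relative to the baseline raises the computed power, which matches the qualitative argument in the model answer.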

✓ With a very large sample size, even trivially small effects become statistically significant because power becomes extremely high. Here, d = 0.12 is a negligible effect by conventional benchmarks (Cohen’s small effect ≈ 0.2).
✓ Statistical significance tells us only that data this extreme would be unlikely if the true effect were exactly zero; it does not tell us whether the effect is large enough to matter biologically or practically. Biological meaningfulness depends on the magnitude (effect size) and whether the difference would have any real-world consequences for ecology, physiology, or conservation.
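To make the point about sample size concrete, the sketch below asks how many observations per group would be needed for an observed d = 0.12 to yield p = 0.032. It assumes a two-sample comparison with equal group sizes and uses the normal approximation z = d·√(n/2); the question does not state the design, so this is an illustration only.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_for_p(d, p_two_sided):
    """Per-group n at which an observed effect d gives the stated two-sided p (z-approximation)."""
    z_obs = NormalDist().inv_cdf(1 - p_two_sided / 2)  # z-score matching the p-value
    return ceil(2 * (z_obs / d) ** 2)                   # solve z = d * sqrt(n / 2) for n


n_needed = n_per_group_for_p(0.12, 0.032)
print(n_needed)
```

Under these assumptions the answer is on the order of several hundred individuals per group, which is why such a small effect can still reach significance.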

1.3 Question 3 — Assumptions and Transformations (/8)

a. Name three properties of the normal distribution that are directly relevant to the validity of parametric hypothesis tests. (/ 3)
b. A researcher measures tree-ring width (mm) for 100 trees. The data are strongly right-skewed with several very large values. Why might a log-transformation be appropriate, and what property does it tend to stabilise? (/ 2)
c. A parasitologist counts the number of helminth parasites per fish host and wants to test whether counts differ between two host species. The count data are right-skewed with variance exceeding the mean. They apply a square-root transformation before running a t-test. What property of the count distribution does the square-root transformation specifically address, and is this transformation sufficient given the degree of overdispersion described? (/ 3)

Model Answer — Question 3

a. Any three of the following (1 mark each):

✓ It is symmetric around the mean — residuals of equal magnitude above and below the mean are equally probable, which is required for unbiased estimation.
✓ It is fully described by only two parameters (μ and σ) — tests based on normal theory rely on this parsimony to derive exact null distributions.
✓ The mean, median, and mode coincide — ensuring the mean is a stable and meaningful measure of central tendency on which parametric tests focus.
✓ The distribution has defined, finite variance — required for the central limit theorem and for the calculation of standard errors and t-statistics.

✓ A log-transformation compresses large values and expands small values, reducing right skew and making the transformed distribution closer to symmetric/normal.
✓ It stabilises multiplicative variance (variance that scales with the mean): if the coefficient of variation (SD/mean) is roughly constant across the range of the data, log-transformation converts multiplicative error structure to additive, which is what normal-theory models assume.

✓ The square-root transformation is classically applied to Poisson-distributed counts to stabilise variance: for a Poisson distribution, variance = mean, so variance increases with the mean. The square-root transformation makes variance approximately constant (homoscedastic) across the range of means.
✓ However, the parasitologist’s counts show overdispersion (variance > mean), which is more consistent with a negative binomial than a Poisson distribution. The square-root transformation stabilises Poisson (equidispersed) variance but is less effective for negative binomial overdispersion.
✓ The transformation may be insufficient; a log(x + 1) transformation (which stabilises negative binomial variance more effectively) or a non-parametric Wilcoxon rank-sum test would be more appropriate alternatives.
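The variance-stabilising argument can be sketched with the delta method, under which Var(√X) ≈ Var(X)/(4μ) and Var(log X) ≈ Var(X)/μ². The example assumes the common negative binomial parameterisation Var(X) = μ + μ²/k with a hypothetical dispersion k = 2, and it ignores the "+ 1" in log(x + 1), so the numbers are approximations for illustration.

```python
def var_after_sqrt(mean, variance):
    """Delta-method variance of sqrt(X): Var(X) / (4 * mean)."""
    return variance / (4 * mean)


def var_after_log(mean, variance):
    """Delta-method variance of log(X): Var(X) / mean**2."""
    return variance / mean ** 2


k = 2.0  # hypothetical negative binomial dispersion parameter (assumption)
for mean in (5, 20, 80):
    poisson_var = mean                  # Poisson: variance equals the mean
    negbin_var = mean + mean ** 2 / k   # negative binomial: extra quadratic term
    print(mean,
          round(var_after_sqrt(mean, poisson_var), 3),   # constant at 0.25
          round(var_after_sqrt(mean, negbin_var), 3),    # keeps growing with the mean
          round(var_after_log(mean, negbin_var), 3))     # levels off near 1 / k
```

On the square-root scale the Poisson variance is constant, whereas the overdispersed variance still rises with the mean; on the log scale the overdispersed variance flattens out.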

1.4 Question 4 — ANOVA and Post-hoc Tests (/6)

a. Explain why it is statistically incorrect to perform all pairwise comparisons between three or more groups using individual t-tests, rather than ANOVA. (/ 3)
b. What is the Tukey Honestly Significant Difference (HSD) test? When is it the appropriate post-hoc procedure following a significant one-way ANOVA result? (/ 3)

Model Answer — Question 4

✓ Each individual t-test is conducted at α = 0.05, meaning there is a 5% chance of a Type I error per test. With k groups, there are k(k−1)/2 pairwise comparisons: for k = 3, that is 3 comparisons; for k = 5, that is 10.
✓ The family-wise error rate (FWER) — the probability of making at least one false rejection across all tests — inflates substantially: with 3 independent tests, FWER ≈ 1 − (0.95)³ ≈ 0.14, not 0.05. With 10 tests, FWER ≈ 0.40.
✓ ANOVA conducts a single omnibus F-test that controls the error rate at α = 0.05 for the global null hypothesis (all means equal), avoiding this inflation.
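The arithmetic behind the inflation can be reproduced in a few lines. This sketch uses the same simplifying assumption as the model answer, namely that the tests are independent (pairwise tests that share groups are not strictly independent, so the values are approximate).

```python
from math import comb


def n_pairwise(k):
    """Number of pairwise comparisons among k groups: k(k - 1) / 2."""
    return comb(k, 2)


def fwer(m, alpha=0.05):
    """Family-wise error rate for m independent tests, each run at level alpha."""
    return 1 - (1 - alpha) ** m


for k in (3, 4, 5):
    m = n_pairwise(k)
    print(k, m, round(fwer(m), 3))
```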

✓ The Tukey HSD test is a post-hoc multiple comparison procedure that makes all pairwise comparisons among group means while controlling the family-wise error rate at α across all comparisons. It uses the studentised range distribution to compute critical differences.
✓ It is appropriate when: (a) the omnibus ANOVA F-test is significant (indicating some difference exists), (b) all groups have approximately equal sample sizes (balanced or near-balanced design), and (c) the researcher wants to identify which specific pairs of groups differ significantly, with simultaneous Type I error control across all pairwise tests.

1.5 Question 5 — Correlation vs Causation (/7)

a. Explain the conceptual difference between correlation analysis and simple linear regression, even though both describe relationships between two continuous variables. (/ 3)
b. A researcher reports a correlation of r = 0.85 (p < 0.001) between ocean surface temperature (°C) and the frequency of harmful algal bloom (HAB) events per year, based on a 25-year observational time series. A journalist headlines this as “Warming Oceans Proven to Trigger Algal Blooms.” Identify two alternative explanations for this correlation and explain why the journalist’s conclusion is premature. (/ 4)

Model Answer — Question 5

✓ Correlation quantifies the strength and direction of the linear association between two variables, treating both symmetrically — there is no distinction between predictor and response. Pearson’s r ranges from −1 to +1 and is scale-free.
✓ Simple linear regression explicitly models one variable (the response, y) as a function of the other (the predictor, x), estimating the slope (rate of change of y per unit x) and intercept. It is used for prediction and for quantifying the magnitude of the relationship.
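✓ Regression assigns asymmetric roles (y is modelled as a function of x) and provides a fitted equation; correlation neither implies nor requires directionality and only describes the co-variation. The asymmetry is a modelling choice: on its own, a regression fit no more establishes that x causes y than a correlation does.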
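The symmetry of correlation and the asymmetry of regression can be seen on a toy dataset. The five (x, y) pairs below are made up for illustration; the sketch computes Pearson's r in both directions and the least-squares slope for each choice of response.

```python
def centred_sums(x, y):
    """Return (Sxx, Syy, Sxy), the sums of squares and cross-products about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


x = [1, 2, 3, 4, 5]   # hypothetical predictor values
y = [2, 4, 5, 4, 5]   # hypothetical response values

sxx, syy, sxy = centred_sums(x, y)
r_xy = sxy / (sxx * syy) ** 0.5   # correlation of x with y
r_yx = sxy / (syy * sxx) ** 0.5   # correlation of y with x: identical
slope_y_on_x = sxy / sxx          # regression of y on x
slope_x_on_y = sxy / syy          # regression of x on y: a different slope

print(round(r_xy, 4), round(r_yx, 4))
print(slope_y_on_x, slope_x_on_y)
```

Swapping the variables leaves r unchanged but changes the slope, and the product of the two slopes equals r².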

✓ Shared temporal trend (spurious correlation): both SST and HAB events may be independently increasing over time due to long-term climate change and increases in coastal nutrient loading (eutrophication), respectively. The time series correlation captures their shared temporal trajectory, not a direct mechanistic link.
✓ Confounding by coastal nutrient enrichment: warmer years may coincide with drier years that reduce river flushing of coastal waters, concentrating nutrients and promoting blooms — it is nutrient concentration that directly causes blooms, not temperature per se.
✓ Why the conclusion is premature: correlation establishes association, not causation. The direction of influence cannot be confirmed from observational data alone; no manipulation of SST was performed, and multiple confounders and alternative mechanisms have not been ruled out. Establishing causation requires controlled experiments, mechanistic pathway confirmation, or at minimum a structural causal model with confounders accounted for.

(Accept any two distinct alternatives, 1 mark each; 2 marks for the methodological explanation.)
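The shared-trend explanation can be demonstrated with synthetic data. In this sketch two series are built from the same upward trend plus fluctuations that are uncorrelated with each other by construction; all values are invented for illustration and are not SST or HAB data. The raw correlation is high, but it almost vanishes once the linear trend is removed from each series.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def detrend(t, y):
    """Residuals from a least-squares straight line of y on t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    slope = (sum((a - mt) * (b - my) for a, b in zip(t, y))
             / sum((a - mt) ** 2 for a in t))
    return [b - (my + slope * (a - mt)) for a, b in zip(t, y)]


years = list(range(24))                                # synthetic time index
wiggle_a = [1 if i % 2 == 0 else -1 for i in years]    # +1, -1, +1, -1, ...
wiggle_b = [1 if i % 4 < 2 else -1 for i in years]     # +1, +1, -1, -1, ...
series_a = [0.4 * t + w for t, w in zip(years, wiggle_a)]
series_b = [0.4 * t + w for t, w in zip(years, wiggle_b)]

raw_r = pearson_r(series_a, series_b)
detrended_r = pearson_r(detrend(years, series_a), detrend(years, series_b))
print(round(raw_r, 3), round(detrended_r, 3))
```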

1.6 Question 6 — Residual Diagnostics (/7)

Residual diagnostic plots are central tools for evaluating whether a fitted model meets its underlying assumptions. Describe what each of the following patterns in diagnostic plots suggests, what assumption is violated, and what corrective action you might take.

a. A fan-shaped (heteroscedastic) pattern in the residuals-vs-fitted plot, where residuals widen as fitted values increase. (/ 2)
b. A systematic S-shaped curve in the normal Q-Q plot of residuals. (/ 2)
c. A U-shaped (concave) curve in the residuals-vs-fitted plot. (/ 2)
d. One or two points with very large residuals far from the main point cloud. (/ 1)

Model Answer — Question 6

✓ A fan shape indicates heteroscedasticity — the residual variance is not constant but increases with the fitted values. This violates the homoscedasticity assumption.
✓ Corrective actions: apply a variance-stabilising transformation to the response (e.g., log or square-root); fit a weighted least squares model; or use a generalised linear model with an appropriate error family (e.g., Poisson or Gamma with a log link).

✓ An S-shaped curve in the Q-Q plot indicates heavy tails (leptokurtosis) if the S bends upward on the right and downward on the left — residuals are more extreme than expected under normality. The normality assumption is violated.
✓ Corrective actions: consider a transformation (e.g., log or Box-Cox); investigate whether the extreme residuals correspond to outliers that should be examined; use a robust regression method or a distribution with heavier tails (e.g., t-distribution errors).

✓ A U-shaped (or arch-shaped) curve in the residuals-vs-fitted plot indicates non-linearity — the fitted linear model systematically under- or over-predicts in different regions of the predictor space.
✓ Corrective actions: add a polynomial (quadratic) term to the model; apply a transformation to the predictor variable; consider a non-linear or generalised additive model (GAM).

✓ Large isolated residuals indicate potential outliers or influential observations — individual data points that deviate markedly from the model’s predictions. They may represent data entry errors, genuinely unusual biological events, or evidence that the model is misspecified for a subset of the data. These should be investigated (not automatically removed) by examining the raw data and leverage/influence statistics (Cook’s distance).
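The U-shaped pattern from part (c) is easy to reproduce. This minimal sketch fits a straight line by least squares to made-up data that are exactly quadratic and prints the residuals, which are positive at both ends and negative in the middle.

```python
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]   # the true relationship is quadratic, not linear

n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
intercept = my - slope * mx

# Residuals from the (misspecified) straight-line fit
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
print(residuals)
```

Adding a quadratic term to the model would remove this systematic curvature, in line with the corrective actions listed above.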

1.7 Question 7 — Multiple Regression and Interaction Effects (/10)

a. What is multicollinearity in a multiple regression context, and how does it affect the interpretation of regression coefficients? (/ 3)
b. What is the Variance Inflation Factor (VIF), and what threshold is commonly used to flag problematic multicollinearity? (/ 2)
c. A researcher fits the following model to data on algal growth rate (cm day⁻¹): growth ~ temperature + nutrient + temperature:nutrient. The interaction term temperature:nutrient is significant. Explain what a significant interaction means for the interpretation of the main effect of temperature. How would you interpret the interaction coefficient in practical, biological terms? (/ 5)

Model Answer — Question 7

✓ Multicollinearity occurs when two or more predictors in a multiple regression model are highly correlated with each other — they share much of the same variance in the response.
✓ Consequence: the individual regression coefficients become unstable (high standard errors), because the model cannot reliably partition the variation in the response between the collinear predictors. Small changes in the dataset can produce large swings in coefficient estimates.
✓ The combined predictive power of the model may remain unaffected, but the individual coefficients can no longer be interpreted as the effect of one variable holding the other constant — because in reality the predictors cannot be independently varied.

✓ The VIF for predictor j is 1 / (1 − R²_j), where R²_j is the proportion of variance in predictor j explained by all other predictors. It quantifies how much the variance of the coefficient estimate is inflated by collinearity.
✓ A common threshold: VIF > 5 (some authorities use 10) is flagged as problematic; VIF = 1 indicates no collinearity.
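The VIF formula maps directly onto the thresholds quoted above. A minimal sketch, with the function name `vif` chosen here for illustration:

```python
def vif(r_squared_j):
    """Variance inflation factor for predictor j, given R² of j regressed on the other predictors."""
    if not 0 <= r_squared_j < 1:
        raise ValueError("R² must lie in [0, 1)")
    return 1 / (1 - r_squared_j)


for r2 in (0.0, 0.5, 0.8, 0.9):
    print(r2, round(vif(r2), 2))
```

So a VIF of 5 corresponds to 80% of a predictor's variance being explained by the other predictors, and a VIF of 10 corresponds to 90%.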

✓ When the interaction term is significant, the main effect of temperature is conditional — it is not a single universal effect but depends on the level of the nutrient variable. The “main effect” label in the coefficient table describes the effect of temperature only when nutrients are at the reference level (e.g., ambient), not across all nutrient conditions.
✓ The interaction coefficient represents the additional change in the slope of temperature when nutrients change from the reference level (e.g., ambient) to the comparison level (e.g., enriched). In other words, the temperature effect on growth rate differs between ambient and enriched nutrient conditions.
✓ Biological example: if the interaction coefficient is positive (+0.12), the growth rate increases by an additional 0.12 cm day⁻¹ per °C of warming under enriched nutrients compared to ambient nutrients. This means warming has a stronger stimulatory effect on growth when nutrients are not limiting — temperature and nutrient availability operate synergistically, not independently.
✓ To fully describe the relationship, you must report separate slope estimates for each nutrient level (the conditional effects), rather than a single temperature effect, because the single “main effect” is misleading when the interaction is significant.
✓ This has important implications for biological interpretation: if nutrient enrichment amplifies the warming response, then eutrophication and ocean warming may act synergistically to increase macroalgal proliferation — a non-additive interaction that cannot be predicted from single-factor studies.

2 Part B: Experiment Design and Hypothesis Formulation (25 marks)

2.1 Question 8 — Factorial Design: Lizard Sprint Speed (/13)

A herpetologist measures the maximum sprint speed (m s⁻¹) of common lizards (Zootoca vivipara) reared under two temperatures (20°C and 30°C) and two diet types (insect-based and plant-based). Six individuals are assigned to each of the four treatment combinations. The first six rows of the dataset are:

  lizard_id  temperature  diet_type  sprint_speed_m_s
1         1        20°C     insects              1.23
2         2        20°C     insects              1.18
3         3        20°C  vegetation              0.89
4         4        20°C  vegetation              0.92
5         5        30°C     insects              1.67
6         6        30°C     insects              1.71

The researcher asks: “Does sprint speed vary with temperature, diet type, or the interaction between them?”

a. State formal null and alternative hypotheses for each of the following effects: (i) the main effect of temperature, (ii) the main effect of diet type, and (iii) the temperature × diet interaction. (/ 6)
b. Which statistical test is most appropriate? Give three reasons, including reference to the number of predictors and their nature. (/ 4)
c. The temperature × diet interaction is significant. What does this mean biologically? How does it affect how you would report and interpret the main effects? (/ 3)

Model Answer — Question 8

a. Two marks per effect pair (H₀ + H_A):

(i) Temperature:

✓ H₀: Mean sprint speed does not differ between lizards reared at 20°C and 30°C (μ₂₀ = μ₃₀).
✓ H_A: Mean sprint speed differs between the two temperature treatments (μ₂₀ ≠ μ₃₀).

(ii) Diet type:

✓ H₀: Mean sprint speed does not differ between lizards fed insects and those fed vegetation (μ_insects = μ_vegetation).
✓ H_A: Mean sprint speed differs between the two diet types (μ_insects ≠ μ_vegetation).

(iii) Temperature × diet interaction:

✓ H₀: The effect of temperature on sprint speed is the same regardless of diet type (no interaction; the effects are additive).
✓ H_A: The effect of temperature on sprint speed depends on diet type (the two factors interact; their combined effect is not simply additive).

✓ Two-way (factorial) ANOVA — this is the correct test because there are two categorical predictors (temperature with 2 levels; diet with 2 levels) and a single continuous response variable (sprint speed).
✓ Reason 1: There are two categorical predictors (factors), not one, each with distinct levels. A two-way ANOVA simultaneously tests the main effect of each factor and their interaction, which a one-way ANOVA or a set of t-tests cannot do.
✓ Reason 2: The response (sprint speed, m s⁻¹) is continuous and ratio-scaled, appropriate for ANOVA which compares group means.
✓ Reason 3: The design is balanced (equal replication, 6 per cell), which maximises the power and interpretive clarity of a factorial ANOVA; each cell’s mean is estimated with equal precision.

✓ A significant interaction means that the effect of temperature on sprint speed depends on diet type (or equivalently, the diet effect depends on temperature). The two factors do not act independently.
✓ For example, warming may strongly enhance sprint speed in insect-fed lizards (because sufficient protein supports muscle development) but have little effect in vegetation-fed lizards (because plant-based nutrition cannot support the thermal enhancement of locomotor performance).
✓ Because the interaction is significant, the main effects cannot be interpreted in isolation — reporting a single main effect of temperature (e.g., “warmer lizards are faster”) is misleading if this is only true for one diet type. You must present and interpret the conditional effects (simple main effects) separately for each diet type, ideally via an interaction plot.
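In a 2 × 2 design the interaction can be expressed as a difference of differences between cell means. The cell means below are hypothetical (the excerpt of the dataset does not show the 30°C vegetation group), so the sketch only illustrates the calculation.

```python
# Hypothetical cell means for sprint speed (m/s); not taken from the dataset
cell_means = {
    ("20C", "insects"): 1.20,
    ("20C", "vegetation"): 0.90,
    ("30C", "insects"): 1.70,
    ("30C", "vegetation"): 1.00,
}

# Simple (conditional) effects of temperature within each diet type
temp_effect_insects = cell_means[("30C", "insects")] - cell_means[("20C", "insects")]
temp_effect_vegetation = cell_means[("30C", "vegetation")] - cell_means[("20C", "vegetation")]

# Interaction contrast: zero if the two factors act additively
interaction_contrast = temp_effect_insects - temp_effect_vegetation

print(round(temp_effect_insects, 2),
      round(temp_effect_vegetation, 2),
      round(interaction_contrast, 2))
```

With these invented means, warming raises sprint speed by 0.50 m s⁻¹ on the insect diet but only 0.10 m s⁻¹ on the vegetation diet, so a single averaged temperature effect would describe neither group well.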

2.2 Question 9 — Bacterial Colony Counts Across Antibiotic Concentrations (/12)

A microbiologist grows Staphylococcus aureus at four concentrations of a novel antibiotic (0, 10, 50, and 100 μg mL⁻¹) with three replicate cultures per concentration. Colony counts (CFU mL⁻¹) are recorded after 24 hours. The first eight rows of the dataset are:

  replicate  conc_ug_mL  colony_CFU_mL
1         1           0           4500
2         2           0           5120
3         3           0           4800
4         1          10           1230
5         2          10            980
6         3          10           1105
7         1          50            213
8         2          50            178

The research question is: “Does antibiotic concentration significantly affect bacterial colony count?”

a. Formulate formal null and alternative hypotheses. (/ 3)
b. Identify the appropriate statistical test, explaining with specific reference to the nature of the response variable and the experimental design. (/ 4)
c. Why would standard one-way ANOVA be problematic to apply directly to these colony count data? (/ 2)
d. What transformation might make the data more amenable to a parametric test, and what property would it stabilise? (/ 3)

Model Answer — Question 9

✓ H₀: The mean (or median) bacterial colony count does not differ among antibiotic concentration groups; all four concentrations produce equal mean colony counts (μ₀ = μ₁₀ = μ₅₀ = μ₁₀₀).
✓ H_A: At least one antibiotic concentration produces a mean colony count that differs from the others.
✓ Given the expectation that higher concentrations will reduce counts (antibiotic effect), a directional prediction (decreasing counts with increasing concentration) is scientifically reasonable, but the omnibus test remains non-directional.

✓ Kruskal-Wallis rank-sum test (non-parametric one-way analysis) — or one-way ANOVA on log-transformed counts if the transformation achieves normality and homoscedasticity.
✓ There are four independent groups (four concentration levels), each with independent replicate cultures (not the same culture measured at each concentration), making this a one-factor, four-level design.
✓ The response variable (CFU mL⁻¹) consists of positive counts that are likely right-skewed with variance proportional to the mean (a characteristic of microbial count data) — typical parametric ANOVA assumptions (normality, homoscedasticity) are likely violated.
✓ With only 3 replicates per group (12 observations total), the dataset is too small to reliably test distributional assumptions; non-parametric methods are particularly appropriate for small, non-normal count datasets.

✓ ANOVA assumes normality of residuals within each group: microbial colony counts are non-negative integers that often follow a log-normal or negative binomial distribution rather than a normal distribution.
✓ ANOVA assumes homoscedasticity: the enormous range in counts (from ~5000 at 0 μg mL⁻¹ to ~200 at 50 μg mL⁻¹) suggests strongly unequal variances across groups — a clear violation of this assumption. Applying ANOVA directly would produce unreliable F-statistics and p-values.

✓ A log-transformation (log₁₀ or natural log of CFU mL⁻¹) is the standard transformation for microbial count data.
✓ It stabilises the multiplicative variance structure: because counts span more than an order of magnitude across concentrations and variance scales with the mean (the coefficient of variation is roughly constant), the log transform converts this to approximately additive, constant variance.
✓ The log-transformed counts are also much more likely to be normally distributed within groups (log-normal counts → normal on the log scale), enabling valid application of one-way ANOVA. One-way ANOVA on log-transformed CFU data followed by Tukey HSD post-hoc tests would then be appropriate.
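The stabilising effect can be checked on the eight rows shown in the question. This sketch compares group standard deviations on the raw and log₁₀ scales; only two of the three replicates at 50 μg mL⁻¹ appear in the excerpt and the 100 μg mL⁻¹ group is not shown, so the result is indicative rather than a full analysis.

```python
from math import log10
from statistics import stdev

# Colony counts (CFU per mL) from the first eight rows of the dataset
counts = {
    0: [4500, 5120, 4800],
    10: [1230, 980, 1105],
    50: [213, 178],   # third replicate not shown in the excerpt
}

raw_sd = {conc: stdev(vals) for conc, vals in counts.items()}
log_sd = {conc: stdev([log10(v) for v in vals]) for conc, vals in counts.items()}

# Ratio of largest to smallest group SD: values near 1 indicate similar spread
raw_ratio = max(raw_sd.values()) / min(raw_sd.values())
log_ratio = max(log_sd.values()) / min(log_sd.values())
print(round(raw_ratio, 1), round(log_ratio, 1))
```

On the raw scale the group standard deviations differ by more than tenfold, while on the log scale they differ by roughly a factor of two.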

3 Part C: Statistical Output Interpretation (25 marks)

3.1 Question 10 — One-Way ANOVA with Tukey HSD Post-hoc (/12)

A researcher compares the maximum sprint speed (m s⁻¹) of lizards from four habitat types: Desert, Grassland, Forest, and Savanna. Ten lizards per habitat are measured. The ANOVA and Tukey HSD results are:

Analysis of Variance Table

Response: sprint_speed
          Df  Sum Sq  Mean Sq  F value   Pr(>F)
habitat    3   4.821   1.607   12.44    <0.001 ***
Residuals 36   4.651   0.129

Tukey multiple comparisons of means
    95% family-wise confidence level

$habitat
                     diff      lwr      upr    p adj
Forest-Desert       0.312    0.089    0.535   0.0024
Grassland-Desert    0.187   -0.036    0.410   0.1210
Forest-Grassland    0.125   -0.098    0.348   0.4120
Savanna-Desert      0.528    0.305    0.751   0.0001
Savanna-Forest      0.216   -0.007    0.439   0.0612
Savanna-Grassland   0.341    0.118    0.564   0.0008

a. State the null hypothesis being tested by the ANOVA F-test. (/ 2)
b. Interpret the F-value (12.44) and the associated p-value. What conclusion do you draw from the ANOVA alone? (/ 3)
c. Based on the Tukey HSD output, identify all significantly and non-significantly different pairs of habitat types. (/ 4)
d. The Tukey test uses a “95% family-wise confidence level.” What does this mean, and why is it preferable to performing all pairwise comparisons each at α = 0.05? (/ 3)

Model Answer — Question 10

✓ H₀: The mean sprint speed is equal across all four habitat types (μ_Desert = μ_Grassland = μ_Forest = μ_Savanna).
✓ H_A (implicit): At least one habitat type has a mean sprint speed that differs from the others.

✓ F(3, 36) = 12.44 means that the between-group variance is 12.44 times larger than the within-group (residual) variance — the habitat groups differ far more than would be expected from random sampling of a common population.
✓ p < 0.001 ≪ α = 0.05: we reject H₀. There is very strong statistical evidence that mean sprint speed differs among at least some habitat types.
✓ However, the ANOVA alone does not identify which habitats differ — only that differences exist. Post-hoc testing is required to pinpoint the specific pairwise differences.
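The F-value can be reconstructed from the sums of squares and degrees of freedom in the ANOVA table, which makes the "ratio of mean squares" interpretation explicit. A minimal sketch using the values from the output:

```python
# Values copied from the ANOVA table in the question
ss_habitat, df_habitat = 4.821, 3
ss_residual, df_residual = 4.651, 36

ms_habitat = ss_habitat / df_habitat      # between-group mean square
ms_residual = ss_residual / df_residual   # within-group (residual) mean square
f_value = ms_habitat / ms_residual

print(round(ms_habitat, 3), round(ms_residual, 3), round(f_value, 2))
```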

Significantly different pairs (adjusted p < 0.05):

✓ Forest vs. Desert (p = 0.0024): Forest lizards sprint faster than Desert lizards (difference 0.312 m s⁻¹).
✓ Savanna vs. Desert (p = 0.0001): Savanna lizards sprint faster than Desert lizards; this is the largest difference (0.528 m s⁻¹).
✓ Savanna vs. Grassland (p = 0.0008): Savanna lizards sprint faster than Grassland lizards (difference 0.341 m s⁻¹).

Non-significantly different pairs (adjusted p > 0.05):

✓ Grassland vs. Desert (p = 0.1210): no significant difference.
✓ Forest vs. Grassland (p = 0.4120): no significant difference.
✓ Savanna vs. Forest (p = 0.0612): borderline, but not significant at α = 0.05.

(Award 1 mark per correctly classified pair, up to 4 marks total; accept minor omissions.)
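The classification can be cross-checked in two equivalent ways: by the adjusted p-value, or by whether the 95% family-wise confidence interval excludes zero. The sketch below applies both rules to the values in the Tukey table.

```python
# (diff, lwr, upr, p_adj) copied from the Tukey HSD output
tukey = {
    "Forest-Desert": (0.312, 0.089, 0.535, 0.0024),
    "Grassland-Desert": (0.187, -0.036, 0.410, 0.1210),
    "Forest-Grassland": (0.125, -0.098, 0.348, 0.4120),
    "Savanna-Desert": (0.528, 0.305, 0.751, 0.0001),
    "Savanna-Forest": (0.216, -0.007, 0.439, 0.0612),
    "Savanna-Grassland": (0.341, 0.118, 0.564, 0.0008),
}

# Rule 1: adjusted p-value below 0.05
significant = [pair for pair, (diff, lwr, upr, p_adj) in tukey.items() if p_adj < 0.05]

# Rule 2: confidence interval lies entirely on one side of zero
ci_excludes_zero = [pair for pair, (diff, lwr, upr, p_adj) in tukey.items()
                    if lwr > 0 or upr < 0]

print(significant)
print(significant == ci_excludes_zero)
```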

✓ “95% family-wise confidence level” means that there is a 95% probability that all confidence intervals in the table simultaneously contain the true pairwise differences — the error is controlled across the entire family of 6 comparisons, not separately per interval.
✓ If all 6 comparisons were each run at α = 0.05, the family-wise Type I error rate would be approximately 1 − (0.95)⁶ ≈ 0.26 — a 26% chance of at least one false positive among the six tests.
✓ Tukey HSD adjusts the critical difference threshold so that the combined probability of any false positive across all comparisons remains at 5%, providing rigorous control while remaining more powerful than simpler corrections (e.g., Bonferroni) when all pairwise comparisons are of interest.

3.2 Question 11 — Multiple Regression with an Interaction Term (/13)

An environmental physiologist models the growth rate (cm day⁻¹) of a marine macroalga as a function of seawater temperature (continuous, °C) and nutrient level (categorical: Low vs. High). An interaction term is included. The lm() output is:

Call:
lm(formula = growth_rate ~ temperature + nutrient + temperature:nutrient,
   data = algae)

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
(Intercept)               -1.2450     0.3870   -3.22   0.0020 **
temperature                0.1870     0.0310    6.03  < 0.001 ***
nutrientHigh               2.3410     0.4120    5.68  < 0.001 ***
temperature:nutrientHigh   0.1240     0.0520    2.38   0.0204 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.834 on 56 degrees of freedom
Multiple R-squared:  0.7891,  Adjusted R-squared:  0.7769
F-statistic: 69.81 on 3 and 56 DF,  p-value: < 2.2e-16

a. How many predictor variables are in this model (count distinct predictors, not rows in the table)? (/ 1)
b. Interpret the coefficient for temperature (0.1870) in the context of this model. (/ 2)
c. The interaction term (temperature:nutrientHigh) has a coefficient of 0.1240 and is significant. Explain what this means for the biological relationship being studied. (/ 4)
d. Write out the full regression equation separately for (i) Low-nutrient algae and (ii) High-nutrient algae. (/ 3)
e. What does the adjusted R² (0.7769) indicate, and why is it reported in preference to R² (0.7891)? (/ 3)

Model Answer — Question 11

✓ Two distinct predictor variables: temperature (continuous) and nutrient level (categorical, two levels). The interaction term is derived from these two and is not an additional independent predictor.

✓ The coefficient 0.1870 for temperature represents the effect of temperature for Low-nutrient algae (the reference level of the nutrient factor): for each 1°C increase in temperature, growth rate increases by approximately 0.187 cm day⁻¹, when nutrients are at the Low level.
✓ This is the conditional slope — because an interaction term is present, this coefficient does not apply uniformly to all algae; it specifically describes the temperature effect under Low-nutrient conditions.

✓ The significant interaction coefficient (0.1240) indicates that the effect of temperature on growth rate is stronger under High-nutrient conditions than under Low-nutrient conditions.
✓ Specifically: in High-nutrient conditions, the growth rate increases by an additional 0.124 cm day⁻¹ per °C compared to the Low-nutrient slope (0.1870), giving a combined temperature slope of 0.1870 + 0.1240 = 0.311 cm day⁻¹ per °C under High nutrients.
✓ Biologically: nutrients appear to be co-limiting with temperature. When nutrients are abundant, warming has a larger stimulatory effect on growth, possibly because the biochemical machinery for photosynthesis and protein synthesis can operate at higher rates when both thermal energy and building materials are available.
✓ This synergistic interaction suggests that in oligotrophic (nutrient-poor) systems, ocean warming would have a smaller effect on macroalgal growth than in eutrophic (nutrient-rich) coastal environments, an important distinction for the management of coastal blooms under climate change.
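
One way to see why the temperature slope differs between nutrient levels is to write the fitted model with a dummy variable and take the slope with respect to temperature. The symbols $T$ (temperature), $N$ (0 for Low, 1 for High nutrients), and $\beta_0, \dots, \beta_3$ are notation introduced here for illustration; they do not appear in the original output.

$$
\widehat{\text{growth}} = \beta_0 + \beta_1 T + \beta_2 N + \beta_3 (T \times N),
\qquad
\frac{\partial\, \widehat{\text{growth}}}{\partial T} = \beta_1 + \beta_3 N
$$

With $\beta_1 = 0.1870$ and $\beta_3 = 0.1240$, the slope is $0.1870$ when $N = 0$ (Low) and $0.1870 + 0.1240 = 0.3110$ when $N = 1$ (High).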

✓ (i) Low-nutrient algae (nutrientHigh = 0): $\widehat{\text{growth}} = -1.245 + 0.187 \times \text{temperature}$
✓✓ (ii) High-nutrient algae (nutrientHigh = 1; add both the nutrientHigh coefficient and the interaction term): $\widehat{\text{growth}} = (-1.245 + 2.341) + (0.187 + 0.124) \times \text{temperature} = 1.096 + 0.311 \times \text{temperature}$
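
The two equations can be checked numerically with a short R sketch. It uses only the four coefficients reported in the `lm()` output; the helper `predict_growth()` and the example temperature of 15°C are assumptions made for illustration and are not taken from the dataset.

```r
# Coefficients copied from the lm() output in Question 11
b <- c(intercept = -1.2450, temperature = 0.1870,
       nutrientHigh = 2.3410, interaction = 0.1240)

# Hypothetical helper: predicted growth rate (cm per day) at temperature
# `temp` (degrees C); `high` is 0 for Low nutrients and 1 for High nutrients
predict_growth <- function(temp, high) {
  unname(b["intercept"] + b["nutrientHigh"] * high +
           (b["temperature"] + b["interaction"] * high) * temp)
}

temp_example <- 15  # illustrative value only, not taken from the data

growth_low  <- predict_growth(temp_example, high = 0)  # -1.245 + 0.187 * 15
growth_high <- predict_growth(temp_example, high = 1)  #  1.096 + 0.311 * 15

print(c(low = growth_low, high = growth_high))
```

At this illustrative temperature, the gap between the two predictions combines the intercept shift (2.341) and the extra slope (0.124 per °C), which is why the two lines diverge as temperature rises.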

✓ Adjusted R² = 0.7769 means that approximately 77.7% of the variation in algal growth rate is explained by the model (temperature, nutrient level, and their interaction), after accounting for model complexity.
✓ The adjusted R² is slightly lower than R² (0.7891) because it penalises each additional predictor term relative to sample size. Unlike R², it does not automatically increase when irrelevant predictors are added; it decreases if a predictor adds less explanatory power than expected by chance.
✓ It is preferred over R² when comparing models with different numbers of predictors, because it provides a fairer comparison of model fit that accounts for model complexity.
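
The penalty can be illustrated with the standard formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the sample size and p the number of predictor terms. The sketch below is a minimal example under stated assumptions: the function name `adj_r2()` and all input values are hypothetical and are not taken from the Question 11 output.

```r
# Adjusted R-squared from R-squared, sample size n, and number of
# predictor terms p (excluding the intercept)
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Hypothetical values: a fourth term raises R-squared only marginally,
# from 0.800 to 0.801, with n = 30 observations
adj_three_terms <- adj_r2(r2 = 0.800, n = 30, p = 3)
adj_four_terms  <- adj_r2(r2 = 0.801, n = 30, p = 4)

# The penalty for the extra term outweighs the tiny gain in R-squared
print(adj_four_terms < adj_three_terms)
```

In this hypothetical case R² rises but adjusted R² falls, which is the behaviour that makes it the fairer criterion for comparing models of different size.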

End of Version 3

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:

@online{smit2026,
  author = {Smit, A. J.},
  title = {BCB744 {Biostatistics} — {Theory} {Test} {(Version} 3)},
  date = {2026-01-01},
  url = {https://tangledbank.netlify.app/BCB744/assessments/BCB744_Biostats_Theory_Test_V3.html},
  langid = {en}
}

For attribution, please cite this work as:

Smit AJ (2026) BCB744 Biostatistics — Theory Test (Version 3). https://tangledbank.netlify.app/BCB744/assessments/BCB744_Biostats_Theory_Test_V3.html.

--- title: "BCB744 Biostatistics — Theory Test (Version 3)" subtitle: "Total: 100 marks | Time: 90 minutes" date: "2026" format: html: number-sections: true toc: true toc-depth: 2 toc-title: "Contents" engine: knitr params: hide_answers: false --- ::: {.callout-important appearance="simple"} **Instructions** - This paper has **three parts**: Part A (General Theory, 50 marks), Part B (Experiment Design and Hypothesis Formulation, 25 marks), and Part C (Statistical Output Interpretation, 25 marks). - Answer **all** questions. - Write clearly and in complete sentences where prose is required. - Mark allocations are shown next to each question in **(/ marks)** notation. - Statistical notation: use *H*~0~ for the null hypothesis and *H*~A~ for the alternative hypothesis. ::: --- # Part A: General Theory (50 marks) ## Question 1 — Observational vs Experimental Studies (/5) a. What is the fundamental distinction between an **observational** study and a **controlled experiment**? **(/ 2)** b. Why is it generally not possible to draw causal conclusions from an observational study, even when a strong statistical association exists? **(/ 2)** c. Give **one** example from biological sciences where an observational study is the only ethical or practical option. **(/ 1)** `r if (params$hide_answers) "::: {.content-hidden}"` ::: {.callout-tip appearance="simple"} **Model Answer — Question 1** *a.* - ✓ In a **controlled experiment**, the researcher deliberately manipulates one or more factors (independent variables), randomly assigns subjects to treatment conditions, and controls all other variables — thereby allowing cause-and-effect inferences. - ✓ In an **observational study**, the researcher measures variables as they naturally occur, without manipulation or random assignment — associations can be detected but the researcher cannot isolate the causal factor. 
*b.* - ✓ Without random assignment to treatment groups, subjects that differ in the predictor variable (e.g., smokers vs. non-smokers) may also differ in many other ways (confounding variables). Any observed association could be driven by these unmeasured differences rather than the predictor itself. - ✓ Additionally, the direction of causality is ambiguous: the putative "cause" and "effect" may share a common upstream driver, or the arrow of causation may run in the opposite direction from what is assumed. *c.* - ✓ Any one of: studying the health effects of smoking in humans (unethical to randomly assign people to smoke); tracking the long-term effects of a natural disaster on wildlife populations; observing migration routes or breeding behaviour of endangered species that cannot be disturbed. ::: `r if (params$hide_answers) ":::"` --- ## Question 2 — Statistical Power and Effect Size (/7) a. Define **statistical power**. What does a power of 0.80 mean in practice? **(/ 2)** b. List **three** factors that increase the statistical power of a hypothesis test. For each, briefly explain the mechanism. **(/ 3)** c. A researcher reports a statistically significant result with *p* = 0.032, but the estimated effect size is *d* = 0.12 (a small effect). What does this tell us, and why is a statistically significant result not always biologically meaningful? **(/ 2)** `r if (params$hide_answers) "::: {.content-hidden}"` ::: {.callout-tip appearance="simple"} **Model Answer — Question 2** *a.* - ✓ Statistical power is the probability of correctly rejecting a false null hypothesis — the probability of detecting a real effect when one truly exists (1 − β, where β is the Type II error rate). - ✓ A power of 0.80 means there is an 80% chance that the test will return a significant result if the true effect size is at least as large as the one specified in the power calculation. Equivalently, there is a 20% chance of a Type II error (missing a real effect). 
*b.* One mark per factor + mechanism (must include mechanism for full credit): - ✓ **Larger sample size**: reduces the standard error of the estimator, producing a larger test statistic for the same effect size, making it easier to exceed the critical threshold. - ✓ **Larger true effect size**: a bigger difference between groups or a steeper slope makes the signal more detectable against background noise; for a fixed design, detecting a 10-unit difference is easier than detecting a 1-unit difference. - ✓ **Lower significance level *α* (counterintuitively, increasing *α*) OR lower residual variance**: reducing measurement error or controlling extraneous variation (e.g., using a paired design) decreases the within-group variance, improving the signal-to-noise ratio. - ✓ **One-tailed test (if justified)**: concentrating all the rejection region in one tail doubles the effective sensitivity in that direction at the same *α*. *(Any three correct, with mechanisms, for 3 marks.)* *c.* - ✓ With a very large sample size, even trivially small effects become statistically significant because power becomes extremely high. Here, *d* = 0.12 is a negligible effect by conventional benchmarks (Cohen's small effect ≈ 0.2). - ✓ Statistical significance tells us that the effect is unlikely to be exactly zero; it does **not** tell us whether the effect is large enough to matter biologically or practically. Biological meaningfulness depends on the **magnitude** (effect size) and whether the difference would have any real-world consequences for ecology, physiology, or conservation. ::: `r if (params$hide_answers) ":::"` --- ## Question 3 — Assumptions and Transformations (/8) a. Name **three** properties of the normal distribution that are directly relevant to the validity of parametric hypothesis tests. **(/ 3)** b. A researcher measures tree-ring width (mm) for 100 trees. The data are strongly right-skewed with several very large values. 
Why might a **log-transformation** be appropriate, and what property does it tend to stabilise? **(/ 2)** c. A parasitologist counts the number of helminth parasites per fish host and wants to test whether counts differ between two host species. The count data are right-skewed with variance exceeding the mean. They apply a square-root transformation before running a *t*-test. What property of the count distribution does the square-root transformation specifically address, and is this transformation sufficient given the degree of overdispersion described? **(/ 3)** `r if (params$hide_answers) "::: {.content-hidden}"` ::: {.callout-tip appearance="simple"} **Model Answer — Question 3** *a.* Any three of the following (1 mark each): - ✓ It is **symmetric** around the mean — residuals of equal magnitude above and below the mean are equally probable, which is required for unbiased estimation. - ✓ It is **fully described by only two parameters** (μ and σ) — tests based on normal theory rely on this parsimony to derive exact null distributions. - ✓ The mean, median, and mode **coincide** — ensuring the mean is a stable and meaningful measure of central tendency on which parametric tests focus. - ✓ The distribution has **defined, finite variance** — required for the central limit theorem and for the calculation of standard errors and *t*-statistics. *b.* - ✓ A log-transformation compresses large values and expands small values, reducing right skew and making the transformed distribution closer to symmetric/normal. - ✓ It stabilises **multiplicative variance** (variance that scales with the mean): if the coefficient of variation (SD/mean) is roughly constant across the range of the data, log-transformation converts multiplicative error structure to additive, which is what normal-theory models assume. 
*c.* - ✓ The square-root transformation is classically applied to Poisson-distributed counts to stabilise variance: for a Poisson distribution, variance = mean, so variance increases with the mean. The square-root transformation makes variance approximately constant (homoscedastic) across the range of means. - ✓ However, the parasitologist's counts show **overdispersion** (variance > mean), which indicates the data follow a negative binomial rather than Poisson distribution. The square-root transformation stabilises Poisson (equidispersed) variance but is less effective for negative binomial overdispersion. - ✓ The transformation may be insufficient; a log(x + 1) transformation (which stabilises negative binomial variance more effectively) or a **non-parametric Wilcoxon rank-sum test** would be more appropriate alternatives. ::: `r if (params$hide_answers) ":::"` --- ## Question 4 — ANOVA and Post-hoc Tests (/6) a. Explain why it is statistically incorrect to perform all pairwise comparisons between three or more groups using individual *t*-tests, rather than ANOVA. **(/ 3)** b. What is the **Tukey Honestly Significant Difference (HSD)** test? When is it the appropriate post-hoc procedure following a significant one-way ANOVA result? **(/ 3)** `r if (params$hide_answers) "::: {.content-hidden}"` ::: {.callout-tip appearance="simple"} **Model Answer — Question 4** *a.* - ✓ Each individual *t*-test is conducted at *α* = 0.05, meaning there is a 5% chance of a Type I error per test. With *k* groups, there are *k*(k−1)/2 pairwise comparisons: for *k* = 3, that is 3 comparisons; for *k* = 5, that is 10. - ✓ The **family-wise error rate** (FWER) — the probability of making *at least one* false rejection across all tests — inflates substantially: with 3 independent tests, FWER ≈ 1 − (0.95)³ ≈ 0.14, not 0.05. With 10 tests, FWER ≈ 0.40. 
- ✓ ANOVA conducts a single omnibus *F*-test that controls the error rate at *α* = 0.05 for the global null hypothesis (all means equal), avoiding this inflation. *b.* - ✓ The Tukey HSD test is a **post-hoc multiple comparison procedure** that makes all pairwise comparisons among group means while controlling the family-wise error rate at *α* across all comparisons. It uses the studentised range distribution to compute critical differences. - ✓ It is appropriate when: (a) the omnibus ANOVA *F*-test is significant (indicating *some* difference exists), (b) all groups have approximately equal sample sizes (balanced or near-balanced design), and (c) the researcher wants to identify *which specific pairs* of groups differ significantly, with simultaneous Type I error control across all pairwise tests. ::: `r if (params$hide_answers) ":::"` --- ## Question 5 — Correlation vs Causation (/7) a. Explain the conceptual difference between **correlation analysis** and **simple linear regression**, even though both describe relationships between two continuous variables. **(/ 3)** b. A researcher reports a correlation of *r* = 0.85 (*p* < 0.001) between ocean surface temperature (°C) and the frequency of harmful algal bloom (HAB) events per year, based on a 25-year observational time series. A journalist headlines this as "Warming Oceans Proven to Trigger Algal Blooms." Identify **two** alternative explanations for this correlation and explain why the journalist's conclusion is premature. **(/ 4)** `r if (params$hide_answers) "::: {.content-hidden}"` ::: {.callout-tip appearance="simple"} **Model Answer — Question 5** *a.* - ✓ **Correlation** quantifies the *strength and direction* of the linear association between two variables, treating both symmetrically — there is no distinction between predictor and response. Pearson's *r* ranges from −1 to +1 and is scale-free. 
- ✓ **Simple linear regression** explicitly models one variable (the *response*, *y*) as a function of the other (the *predictor*, *x*), estimating the slope (rate of change of *y* per unit *x*) and intercept. It is used for prediction and for quantifying the *magnitude* of the relationship. - ✓ Regression imposes an asymmetric causal framework (x influences y) and provides a fitted equation; correlation does not imply or require directionality and only describes the co-variation. *b.* - ✓ **Shared temporal trend (spurious correlation)**: both SST and HAB events may be independently increasing over time due to long-term climate change and increases in coastal nutrient loading (eutrophication), respectively. The time series correlation captures their shared temporal trajectory, not a direct mechanistic link. - ✓ **Confounding by coastal nutrient enrichment**: warmer years may coincide with drier years that reduce river flushing of coastal waters, concentrating nutrients and promoting blooms — it is nutrient concentration that directly causes blooms, not temperature per se. - ✓ **Why the conclusion is premature**: correlation establishes association, not causation. The direction of influence cannot be confirmed from observational data alone; no manipulation of SST was performed, and multiple confounders and alternative mechanisms have not been ruled out. Establishing causation requires controlled experiments, mechanistic pathway confirmation, or at minimum a structural causal model with confounders accounted for. *(Accept any two distinct alternatives, 2 marks each; 2 marks for the methodological explanation.)* ::: `r if (params$hide_answers) ":::"` --- ## Question 6 — Residual Diagnostics (/7) Residual diagnostic plots are central tools for evaluating whether a fitted model meets its underlying assumptions. Describe what each of the following patterns in diagnostic plots suggests, what assumption is violated, and what corrective action you might take. a. 
A **fan-shaped** (heteroscedastic) pattern in the residuals-vs-fitted plot, where residuals widen as fitted values increase. **(/ 2)** b. A systematic **S-shaped curve** in the normal Q-Q plot of residuals. **(/ 2)** c. A **U-shaped (concave) curve** in the residuals-vs-fitted plot. **(/ 2)** d. One or two points with very large residuals far from the main point cloud. **(/ 1)** `r if (params$hide_answers) "::: {.content-hidden}"` ::: {.callout-tip appearance="simple"} **Model Answer — Question 6** *a.* - ✓ A fan shape indicates **heteroscedasticity** — the residual variance is not constant but increases with the fitted values. This violates the homoscedasticity assumption. - ✓ Corrective actions: apply a variance-stabilising transformation to the response (e.g., log or square-root); fit a weighted least squares model; or use a generalised linear model with an appropriate error family (e.g., Poisson or Gamma with a log link). *b.* - ✓ An S-shaped curve in the Q-Q plot indicates **heavy tails** (leptokurtosis) if the S bends upward on the right and downward on the left — residuals are more extreme than expected under normality. The **normality** assumption is violated. - ✓ Corrective actions: consider a transformation (e.g., log or Box-Cox); investigate whether the extreme residuals correspond to outliers that should be examined; use a robust regression method or a distribution with heavier tails (e.g., t-distribution errors). *c.* - ✓ A U-shaped (or arch-shaped) curve in the residuals-vs-fitted plot indicates **non-linearity** — the fitted linear model systematically under- or over-predicts in different regions of the predictor space. - ✓ Corrective actions: add a polynomial (quadratic) term to the model; apply a transformation to the predictor variable; consider a non-linear or generalised additive model (GAM). 
*d.* - ✓ Large isolated residuals indicate potential **outliers** or **influential observations** — individual data points that deviate markedly from the model's predictions. They may represent data entry errors, genuinely unusual biological events, or evidence that the model is misspecified for a subset of the data. These should be investigated (not automatically removed) by examining the raw data and leverage/influence statistics (Cook's distance). ::: `r if (params$hide_answers) ":::"` --- ## Question 7 — Multiple Regression and Interaction Effects (/10) a. What is **multicollinearity** in a multiple regression context, and how does it affect the interpretation of regression coefficients? **(/ 3)** b. What is the **Variance Inflation Factor (VIF)**, and what threshold is commonly used to flag problematic multicollinearity? **(/ 2)** c. A researcher fits the following model to data on algal growth rate (cm day⁻¹): `growth ~ temperature + nutrient + temperature:nutrient`. The interaction term `temperature:nutrient` is significant. Explain what a significant interaction means for the *interpretation* of the main effect of temperature. How would you interpret the interaction coefficient in practical, biological terms? **(/ 5)** `r if (params$hide_answers) "::: {.content-hidden}"` ::: {.callout-tip appearance="simple"} **Model Answer — Question 7** *a.* - ✓ **Multicollinearity** occurs when two or more predictors in a multiple regression model are highly correlated with each other — they share much of the same variance in the response. - ✓ Consequence: the individual regression coefficients become **unstable** (high standard errors), because the model cannot reliably partition the variation in the response between the collinear predictors. Small changes in the dataset can produce large swings in coefficient estimates. 
- ✓ The *combined* predictive power of the model may remain unaffected, but the *individual* coefficients can no longer be interpreted as the effect of one variable holding the other constant — because in reality the predictors cannot be independently varied. *b.* - ✓ The **VIF** for predictor *j* is 1 / (1 − *R*²~j~), where *R*²~j~ is the proportion of variance in predictor *j* explained by all other predictors. It quantifies how much the variance of the coefficient estimate is inflated by collinearity. - ✓ A common threshold: VIF > **5** (some authorities use 10) is flagged as problematic; VIF = 1 indicates no collinearity. *c.* - ✓ When the interaction term is significant, the main effect of temperature is **conditional** — it is not a single universal effect but depends on the level of the nutrient variable. The "main effect" label in the coefficient table describes the effect of temperature only when nutrients are at the reference level (e.g., ambient), not across all nutrient conditions. - ✓ The interaction coefficient represents the *additional change in the slope of temperature* when nutrients change from the reference level (e.g., ambient) to the comparison level (e.g., enriched). In other words, the temperature effect on growth rate differs between ambient and enriched nutrient conditions. - ✓ Biological example: if the interaction coefficient is positive (+0.12), the growth rate increases by an additional 0.12 cm day⁻¹ per °C of warming under enriched nutrients compared to ambient nutrients. This means warming has a **stronger stimulatory effect on growth when nutrients are not limiting** — temperature and nutrient availability operate synergistically, not independently. - ✓ To fully describe the relationship, you must report **separate slope estimates** for each nutrient level (the conditional effects), rather than a single temperature effect, because the single "main effect" is misleading when the interaction is significant. 
- ✓ This has important implications for biological interpretation: if nutrient enrichment amplifies the warming response, then eutrophication and ocean warming may act synergistically to increase macroalgal proliferation — a non-additive interaction that cannot be predicted from single-factor studies. ::: `r if (params$hide_answers) ":::"` --- # Part B: Experiment Design and Hypothesis Formulation (25 marks) ## Question 8 — Factorial Design: Lizard Sprint Speed (/13) A herpetologist measures the maximum sprint speed (m s⁻¹) of common lizards (*Zootoca vivipara*) reared under two temperatures (20°C and 30°C) and two diet types (insect-based and plant-based). Six individuals are assigned to each of the four treatment combinations. The first six rows of the dataset are: ``` lizard_id temperature diet_type sprint_speed_m_s 1 1 20°C insects 1.23 2 2 20°C insects 1.18 3 3 20°C vegetation 0.89 4 4 20°C vegetation 0.92 5 5 30°C insects 1.67 6 6 30°C insects 1.71 ``` The researcher asks: *"Does sprint speed vary with temperature, diet type, or the interaction between them?"* a. State formal null and alternative hypotheses for each of the following effects: (i) the main effect of temperature, (ii) the main effect of diet type, and (iii) the temperature × diet interaction. **(/ 6)** b. What statistical test is most appropriate, and give **three** reasons, including reference to the number of predictors and their nature. **(/ 4)** c. The temperature × diet interaction is significant. What does this mean biologically? How does it affect how you would report and interpret the main effects? **(/ 3)** `r if (params$hide_answers) "::: {.content-hidden}"` ::: {.callout-tip appearance="simple"} **Model Answer — Question 8** *a.* Two marks per effect pair (*H*~0~ + *H*~A~): *(i) Temperature:* - ✓ *H*~0~: Mean sprint speed does not differ between lizards reared at 20°C and 30°C (μ~20~ = μ~30~). 
- ✓ *H*~A~: Mean sprint speed differs between the two temperature treatments (μ~20~ ≠ μ~30~). *(ii) Diet type:* - ✓ *H*~0~: Mean sprint speed does not differ between lizards fed insects and those fed vegetation (μ~insects~ = μ~vegetation~). - ✓ *H*~A~: Mean sprint speed differs between the two diet types (μ~insects~ ≠ μ~vegetation~). *(iii) Temperature × diet interaction:* - ✓ *H*~0~: The effect of temperature on sprint speed is the same regardless of diet type (no interaction; the effects are additive). - ✓ *H*~A~: The effect of temperature on sprint speed depends on diet type (the two factors interact; their combined effect is not simply additive). *b.* - ✓ **Two-way (factorial) ANOVA** — this is the correct test because there are two categorical predictors (temperature with 2 levels; diet with 2 levels) and a single continuous response variable (sprint speed). - ✓ Reason 1: There are **two factorial predictors** (not one), each with distinct levels. A two-way ANOVA simultaneously tests main effects of each factor and their interaction — a design that one-way ANOVA or *t*-tests cannot accommodate. - ✓ Reason 2: The response (sprint speed, m s⁻¹) is **continuous** and ratio-scaled, appropriate for ANOVA which compares group means. - ✓ Reason 3: The design is **balanced** (equal replication, 6 per cell), which maximises the power and interpretive clarity of a factorial ANOVA; each cell's mean is estimated with equal precision. *c.* - ✓ A significant interaction means that the effect of temperature on sprint speed **depends on diet type** (or equivalently, the diet effect depends on temperature). The two factors do not act independently. - ✓ For example, warming may strongly enhance sprint speed in insect-fed lizards (because sufficient protein supports muscle development) but have little effect in vegetation-fed lizards (because plant-based nutrition cannot support the thermal enhancement of locomotor performance). 
- ✓ Because the interaction is significant, the **main effects cannot be interpreted in isolation** — reporting a single main effect of temperature (e.g., "warmer lizards are faster") is misleading if this is only true for one diet type. You must present and interpret the **conditional effects** (simple main effects) separately for each diet type, ideally via an interaction plot. ::: `r if (params$hide_answers) ":::"` --- ## Question 9 — Bacterial Colony Counts Across Antibiotic Concentrations (/12) A microbiologist grows *Staphylococcus aureus* at four concentrations of a novel antibiotic (0, 10, 50, and 100 μg mL⁻¹) with three replicate cultures per concentration. Colony counts (CFU mL⁻¹) are recorded after 24 hours. The first eight rows of the dataset are: ``` replicate conc_ug_mL colony_CFU_mL 1 1 0 4500 2 2 0 5120 3 3 0 4800 4 1 10 1230 5 2 10 980 6 3 10 1105 7 1 50 213 8 2 50 178 ``` The research question is: *"Does antibiotic concentration significantly affect bacterial colony count?"* a. Formulate formal null and alternative hypotheses. **(/ 3)** b. Identify the appropriate statistical test, explaining with **specific reference** to the nature of the response variable and the experimental design. **(/ 4)** c. Why would standard one-way ANOVA be problematic to apply directly to these colony count data? **(/ 2)** d. What transformation might make the data more amenable to a parametric test, and what property would it stabilise? **(/ 3)** `r if (params$hide_answers) "::: {.content-hidden}"` ::: {.callout-tip appearance="simple"} **Model Answer — Question 9** *a.* - ✓ *H*~0~: The mean (or median) bacterial colony count does not differ among antibiotic concentration groups; all four concentrations produce equal mean colony counts (μ~0~ = μ~10~ = μ~50~ = μ~100~). - ✓ *H*~A~: At least one antibiotic concentration produces a mean colony count that differs from the others. 
- ✓ Given the expectation that higher concentrations will reduce counts (antibiotic effect), a directional prediction (decreasing counts with increasing concentration) is scientifically reasonable, but the omnibus test remains non-directional. *b.* - ✓ **Kruskal-Wallis rank-sum test** (non-parametric one-way analysis) — or one-way ANOVA on log-transformed counts if the transformation achieves normality and homoscedasticity. - ✓ There are **four independent groups** (four concentration levels), each with independent replicate cultures (not the same culture measured at each concentration), making this a one-factor, four-level design. - ✓ The response variable (CFU mL⁻¹) consists of **positive counts** that are likely right-skewed with variance proportional to the mean (a characteristic of microbial count data) — typical parametric ANOVA assumptions (normality, homoscedasticity) are likely violated. - ✓ With only 3 replicates per group (12 observations total), the dataset is too small to reliably test distributional assumptions; non-parametric methods are particularly appropriate for small, non-normal count datasets. *c.* - ✓ ANOVA assumes **normality of residuals within each group**: microbial colony counts are non-negative integers that often follow a log-normal or negative binomial distribution rather than a normal distribution. - ✓ ANOVA assumes **homoscedasticity**: the enormous range in counts (from ~5000 at 0 μg mL⁻¹ to ~200 at 50 μg mL⁻¹) suggests strongly unequal variances across groups — a clear violation of this assumption. Applying ANOVA directly would produce unreliable *F*-statistics and *p*-values. *d.* - ✓ A **log-transformation** (log~10~ or natural log of CFU mL⁻¹) is the standard transformation for microbial count data. 
- ✓ It stabilises the **multiplicative variance** structure: because counts span several orders of magnitude and variance scales with the mean (the coefficient of variation is roughly constant), the log transform converts this to approximately additive, constant variance. - ✓ The log-transformed counts are also much more likely to be normally distributed within groups (log-normal counts → normal on the log scale), enabling valid application of one-way ANOVA. One-way ANOVA on log-transformed CFU data followed by Tukey HSD post-hoc tests would then be appropriate. ::: `r if (params$hide_answers) ":::"` --- # Part C: Statistical Output Interpretation (25 marks) ## Question 10 — One-Way ANOVA with Tukey HSD Post-hoc (/12) A researcher compares the maximum sprint speed (m s⁻¹) of lizards from four habitat types: Desert, Grassland, Forest, and Savanna. Ten lizards per habitat are measured. The ANOVA and Tukey HSD results are: ``` Analysis of Variance Table Response: sprint_speed Df Sum Sq Mean Sq F value Pr(>F) habitat 3 4.821 1.607 12.44 <0.001 *** Residuals 36 4.651 0.129 Tukey multiple comparisons of means 95% family-wise confidence level $habitat diff lwr upr p adj Forest-Desert 0.312 0.089 0.535 0.0024 Grassland-Desert 0.187 -0.036 0.410 0.1210 Forest-Grassland 0.125 -0.098 0.348 0.4120 Savanna-Desert 0.528 0.305 0.751 0.0001 Savanna-Forest 0.216 -0.007 0.439 0.0612 Savanna-Grassland 0.341 0.118 0.564 0.0008 ``` a. State the null hypothesis being tested by the ANOVA *F*-test. **(/ 2)** b. Interpret the *F*-value (12.44) and the associated *p*-value. What conclusion do you draw from the ANOVA alone? **(/ 3)** c. Based on the Tukey HSD output, identify all **significantly** and **non-significantly** different pairs of habitat types. **(/ 4)** d. The Tukey test uses a "95% family-wise confidence level." What does this mean, and why is it preferable to performing all pairwise comparisons each at *α* = 0.05? 
**(/ 3)** `r if (params$hide_answers) "::: {.content-hidden}"` ::: {.callout-tip appearance="simple"} **Model Answer — Question 10** *a.* - ✓ *H*~0~: The mean sprint speed is equal across all four habitat types (μ~Desert~ = μ~Grassland~ = μ~Forest~ = μ~Savanna~). - ✓ *H*~A~ (implicit): At least one habitat type has a mean sprint speed that differs from the others. *b.* - ✓ *F*(3, 36) = 12.44 means that the between-group variance is 12.44 times larger than the within-group (residual) variance — the habitat groups differ far more than would be expected from random sampling of a common population. - ✓ *p* < 0.001 ≪ *α* = 0.05: we reject *H*~0~. There is very strong statistical evidence that mean sprint speed differs among at least some habitat types. - ✓ However, the ANOVA alone does not identify *which* habitats differ — only that differences exist. Post-hoc testing is required to pinpoint the specific pairwise differences. *c.* **Significantly different pairs** (adjusted *p* < 0.05): - ✓ Forest vs. Desert (*p* = 0.0024) — Forest lizards sprint faster than Desert lizards. - ✓ Savanna vs. Desert (*p* = 0.0001) — Savanna lizards sprint fastest vs. Desert. - ✓ Savanna vs. Grassland (*p* = 0.0008) — Savanna differs from Grassland. **Non-significantly different pairs** (adjusted *p* > 0.05): - ✓ Grassland vs. Desert (*p* = 0.1210) — no significant difference. - Forest vs. Grassland (*p* = 0.4120) — no significant difference. - Savanna vs. Forest (*p* = 0.0612) — borderline, not significant at *α* = 0.05. *(Award 1 mark per correctly classified pair, up to 4 marks total; accept minor omissions.)* *d.* - ✓ "95% family-wise confidence level" means that there is a 95% probability that **all** confidence intervals in the table simultaneously contain the true pairwise differences — the error is controlled across the entire family of 6 comparisons, not separately per interval. 
- ✓ If all 6 comparisons were each run at *α* = 0.05, the family-wise Type I error rate would be approximately 1 − (0.95)⁶ ≈ 0.26 (treating the tests as independent), i.e. a 26% chance of at least one false positive among the six tests.
- ✓ Tukey HSD adjusts the critical difference threshold so that the *combined* probability of any false positive across all comparisons remains at 5%, providing rigorous control while remaining more powerful than simpler corrections (e.g., Bonferroni) when all pairwise comparisons are of interest.

:::

`r if (params$hide_answers) ":::"`

---

## Question 11 — Multiple Regression with an Interaction Term (/13)

An environmental physiologist models the growth rate (cm day⁻¹) of a marine macroalga as a function of seawater temperature (continuous, °C) and nutrient level (categorical: Low vs. High). An interaction term is included. The `lm()` output is:

```
Call:
lm(formula = growth_rate ~ temperature + nutrient + temperature:nutrient,
    data = algae)

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)               -1.2450     0.3870   -3.22   0.0020 **
temperature                0.1870     0.0310    6.03  < 0.001 ***
nutrientHigh               2.3410     0.4120    5.68  < 0.001 ***
temperature:nutrientHigh   0.1240     0.0520    2.38   0.0204 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.834 on 56 degrees of freedom
Multiple R-squared:  0.7891,    Adjusted R-squared:  0.7769
F-statistic: 69.81 on 3 and 56 DF,  p-value: < 2.2e-16
```

a. How many predictor variables are in this model (count distinct predictors, not rows in the table)? **(/ 1)**
b. Interpret the coefficient for `temperature` (0.1870) in the context of this model. **(/ 2)**
c. The interaction term (`temperature:nutrientHigh`) has a coefficient of 0.1240 and is significant. Explain what this means for the biological relationship being studied. **(/ 4)**
d. Write out the full regression equation separately for (i) **Low-nutrient** algae and (ii) **High-nutrient** algae. **(/ 3)**
e.
   What does the adjusted *R*² (0.7769) indicate, and why is it reported in preference to *R*² (0.7891)? **(/ 3)**

`r if (params$hide_answers) "::: {.content-hidden}"`

::: {.callout-tip appearance="simple"}
**Model Answer — Question 11**

*a.*

- ✓ **Two** distinct predictor variables: temperature (continuous) and nutrient level (categorical, two levels). The interaction term is derived from these two and is not an additional independent predictor.

*b.*

- ✓ The coefficient 0.1870 for temperature represents the effect of temperature **for Low-nutrient algae** (the reference level of the nutrient factor): for each 1 °C increase in temperature, growth rate increases by approximately 0.187 cm day⁻¹ when nutrients are at the Low level.
- ✓ This is the **conditional** slope: because an interaction term is present, this coefficient does not apply uniformly to all algae; it specifically describes the temperature effect under Low-nutrient conditions.

*c.*

- ✓ The significant interaction coefficient (0.1240) indicates that the **effect of temperature on growth rate is stronger under High-nutrient conditions** than under Low-nutrient conditions.
- ✓ Specifically: in High-nutrient conditions, the growth rate increases by an **additional** 0.124 cm day⁻¹ per °C compared to the Low-nutrient slope (0.1870), giving a combined temperature slope of 0.1870 + 0.1240 = 0.311 cm day⁻¹ per °C under High nutrients.
- ✓ Biologically: nutrients appear to be **co-limiting** with temperature. When nutrients are abundant, warming has a larger stimulatory effect on growth, possibly because the biochemical machinery for photosynthesis and protein synthesis can operate at higher rates when both thermal energy and building materials are available.
- ✓ This synergistic interaction implies that, within the temperature range sampled, ocean warming is expected to have a smaller effect on macroalgal growth in oligotrophic (nutrient-poor) systems than in eutrophic (nutrient-rich) coastal environments. This is an important distinction for the management of coastal blooms under climate change.

*d.*

- ✓ **(i) Low-nutrient algae** (nutrientHigh = 0): $\widehat{\text{growth}} = -1.245 + 0.187 \times \text{temperature}$
- ✓✓ **(ii) High-nutrient algae** (nutrientHigh = 1; add both the nutrientHigh coefficient and the interaction term): $\widehat{\text{growth}} = (-1.245 + 2.341) + (0.187 + 0.124) \times \text{temperature} = 1.096 + 0.311 \times \text{temperature}$

*e.*

- ✓ Adjusted *R*² = 0.7769 means that approximately **77.7% of the variation in algal growth rate** is explained by the model (temperature, nutrient level, and their interaction), after accounting for model complexity.
- ✓ The adjusted *R*² is **slightly lower** than *R*² (0.7891) because it penalises each additional predictor term relative to sample size. Unlike *R*², adjusted *R*² does not automatically increase when irrelevant predictors are added; it decreases if a predictor adds less explanatory power than expected by chance.
- ✓ It is preferred over *R*² when comparing models with different numbers of predictors, because it provides a fairer comparison of model fit that accounts for model complexity.

:::

`r if (params$hide_answers) ":::"`

---

*End of Version 3*