BCB744 Biostatistics — Theory Test (Version 2)

Total: 100 marks | Time: 90 minutes

Author: A. J. Smit
Affiliation: University of the Western Cape
Published: January 1, 2026

Instructions

This paper has three parts: Part A (General Theory, 50 marks), Part B (Experiment Design and Hypothesis Formulation, 25 marks), and Part C (Statistical Output Interpretation, 25 marks).
Answer all questions.
Write clearly and in complete sentences where prose is required.
Mark allocations are shown next to each question in (/ marks) notation.
Statistical notation: use H₀ for the null hypothesis and H_A for the alternative hypothesis.

1 Part A: General Theory (50 marks)

1.1 Question 1 — Variables and Study Design (/7)

Distinguish between a continuous and a categorical (nominal) predictor variable. Give one biological example of each. (/ 3)
What is the difference between a fixed factor and a random factor in an experimental design? Give one example of each from a field ecology context. (/ 2)
Define pseudoreplication. Why is it statistically problematic? (/ 2)

Model Answer — Question 1

✓ A continuous predictor takes any real value on a scale (e.g., water temperature in °C, salinity in ppt, or body mass in grams).
✓ A categorical predictor groups observations into discrete, named categories with no inherent numeric spacing (e.g., species identity, treatment group, sex, or habitat type).
✓ Biological examples: continuous — seawater pH as a predictor of coral calcification rate; categorical — diet type (herbivore, omnivore, carnivore) as a predictor of gut passage time.

✓ A fixed factor is one whose specific levels are deliberately chosen and are the levels of direct interest (conclusions apply only to those levels). Example: three specific herbicide concentrations selected by the researcher.
✓ A random factor is one whose levels are a random sample from a broader population of possible levels, and the goal is to generalise to that population. Example: 10 randomly chosen study plots within a forest biome, representing all possible plots.

✓ Pseudoreplication occurs when multiple measurements from the same experimental unit (or non-independent observations) are treated as independent replicates, artificially inflating the apparent sample size.
✓ This is statistically problematic because it underestimates the true variance within treatments, inflates the F- or t-statistic, and dramatically inflates the Type I error rate — leading to false conclusions of significance.

1.2 Question 2 — Data Visualisation (/5)

When is a boxplot more informative than a bar chart showing mean ± standard error? (/ 2)
What feature of a dataset does a violin plot reveal that a boxplot does not? (/ 1)
Identify the five summary statistics encoded in a standard boxplot (Tukey box-and-whisker). (/ 2)

Model Answer — Question 2

✓ A boxplot is more informative when the data are skewed or contain outliers — it directly shows the median, spread, skewness (via asymmetric box), and outlier positions that a mean ± SE bar chart obscures.
✓ It is particularly useful when sample sizes are modest and the mean is not a robust summary (e.g., bimodal or heavy-tailed distributions), or when comparing distributions that differ in shape rather than just location.

✓ A violin plot reveals the full probability density (distributional shape) of the data, including multimodality, skew, and the concentration of values at different levels. A boxplot cannot show this because it reduces the data to five summary values.

✓✓ The five summary statistics: lower quartile (Q1, 25th percentile), median (Q2, 50th percentile), upper quartile (Q3, 75th percentile), and the two whisker endpoints that extend to the most extreme values within 1.5 × IQR of Q1 and Q3. Points beyond the whiskers are plotted individually as outliers.
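
As a minimal sketch of where these five values come from, base R's `boxplot.stats()` returns them for any numeric vector. The `mass` values below are hypothetical and were invented for illustration; they are not part of the test data.

```r
# Hypothetical body masses (g); the value 30 is a deliberate outlier.
mass <- c(10, 12, 13, 14, 15, 16, 17, 18, 30)

bs <- boxplot.stats(mass, coef = 1.5)

# bs$stats holds: lower whisker, lower hinge (~Q1), median,
# upper hinge (~Q3), upper whisker.
five <- setNames(bs$stats,
                 c("lower_whisker", "Q1", "median", "Q3", "upper_whisker"))
print(five)

# Values more than 1.5 x IQR beyond the hinges are returned separately.
print(bs$out)
```

Note that `boxplot.stats()` uses Tukey's hinges, which can differ slightly from the quartiles returned by `quantile()` for some sample sizes.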

1.3 Question 3 — Sampling Distributions and Uncertainty (/8)

Define standard error of the mean. How does it differ from standard deviation, and what does each quantity describe? (/ 3)
Explain what a 95% confidence interval means. Correct a common misinterpretation: “there is a 95% chance that the true mean lies within this interval.” (/ 3)
How does increasing sample size affect the width of a confidence interval, and why? (/ 2)

Model Answer — Question 3

✓ The standard deviation (SD) describes the spread of individual observations around the sample mean — it is a property of the raw data distribution.
✓ The standard error of the mean (SE) = SD / √n, and describes the precision of the sample mean as an estimate of the population mean — it is a property of the sampling distribution of the mean.
✓ As sample size increases, SE decreases (more data → more precise estimate of the mean) while SD remains approximately constant (it reflects inherent biological variability, not sampling effort).

✓ A 95% CI is constructed so that if we repeated the sampling procedure many times, 95% of the intervals produced would contain the true population parameter. It is a statement about the procedure, not about any single interval.
✓ The common misinterpretation is wrong because after computing one interval, the true mean is either inside it or it is not — there is no 95% probability associated with this particular interval. The 95% refers to the long-run frequency property of the method.
✓ A correct interpretation: “We used a procedure that, in repeated sampling, produces intervals capturing the true mean 95% of the time; we have no basis for saying the probability is other than 0 or 1 for this specific interval.”

✓ Increasing sample size narrows the confidence interval, because SE = SD / √n decreases as n grows, and the CI half-width is approximately ±(t-critical × SE).
✓ Intuitively: more data provides a more precise estimate of the population mean, so the uncertainty band around it contracts.
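
The relationship between SD, SE, and the confidence interval can be sketched in a few lines of base R. The sample below is simulated (hypothetical water temperatures), so the numbers themselves carry no meaning; the point is that the hand-built interval matches the one returned by `t.test()`.

```r
# Hypothetical sample: 25 simulated water temperatures (deg C).
set.seed(1)
temp <- rnorm(25, mean = 18, sd = 2)

n      <- length(temp)
sd_x   <- sd(temp)               # spread of individual observations
se_x   <- sd_x / sqrt(n)         # precision of the sample mean
t_crit <- qt(0.975, df = n - 1)  # two-sided 95% critical value

# CI half-width = t-critical x SE
ci_manual <- mean(temp) + c(-1, 1) * t_crit * se_x
ci_ttest  <- t.test(temp)$conf.int

print(c(SD = sd_x, SE = se_x))
print(ci_manual)
print(ci_ttest)
```

Re-running the sketch with a larger `n` should leave `sd_x` roughly unchanged while `se_x` and the interval width shrink.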

1.4 Question 4 — One-Tailed vs Two-Tailed Tests (/6)

Explain the difference between a one-tailed and a two-tailed hypothesis test. Under what circumstances is a one-tailed test scientifically justified? (/ 3)
What is the primary statistical risk associated with choosing a one-tailed test after looking at the direction of the observed effect? (/ 3)

Model Answer — Question 4

✓ A two-tailed test places the rejection region in both tails of the null distribution, testing whether the effect could be larger or smaller than the null value (H_A: μ₁ ≠ μ₂).
✓ A one-tailed test places all rejection area in one tail, testing only one direction of departure (H_A: μ₁ > μ₂, or H_A: μ₁ < μ₂).
✓ A one-tailed test is justified only when there is a strong, a priori mechanistic reason to predict the direction of the effect before data collection (e.g., a drug is known to be an inhibitor, so only a decrease in enzyme activity is biologically plausible).

✓ If the direction is chosen after observing the data, the effective significance level is inflated: the researcher is implicitly running a two-tailed comparison but claiming the critical threshold of a one-tailed test (halving the reported p-value).
✓ This constitutes a form of p-hacking and doubles the actual Type I error rate compared to the stated α — an observed one-tailed p = 0.04 is really equivalent to a two-tailed p = 0.08 if the direction was chosen post-hoc.
✓ It also means the researcher cannot detect an effect in the opposite direction, which may be scientifically important (e.g., a drug producing unexpected stimulation rather than inhibition).
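
The halving described above can be checked directly. This is a sketch on simulated data (the `control` and `treated` vectors are hypothetical enzyme activities), using the `alternative` argument of `t.test()`.

```r
# Hypothetical enzyme activities for two independent groups.
set.seed(42)
control <- rnorm(15, mean = 10, sd = 2)
treated <- rnorm(15, mean = 9,  sd = 2)

p_two     <- t.test(treated, control, alternative = "two.sided")$p.value
p_less    <- t.test(treated, control, alternative = "less")$p.value
p_greater <- t.test(treated, control, alternative = "greater")$p.value

# The one-tailed p-value in the direction of the observed effect
# is exactly half the two-tailed p-value.
p_favourable <- min(p_less, p_greater)

print(c(two_tailed = p_two, one_tailed = p_favourable))
```

Picking whichever of `p_less` or `p_greater` is smaller after seeing the data is the post-hoc choice the model answer warns against.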

1.5 Question 5 — t-tests (/7)

Explain the conceptual difference between an independent-samples t-test and a paired t-test. Give a biological scenario where the paired design is more appropriate. (/ 3)
Under what conditions should you use Welch’s t-test rather than Student’s t-test? (/ 2)
What non-parametric test is the appropriate alternative when the assumptions of the independent-samples t-test cannot be met? (/ 2)

Model Answer — Question 5

✓ An independent-samples t-test compares means from two separate, unrelated groups — the observations in one group have no pairing or correspondence with observations in the other.
✓ A paired t-test compares means where each observation in one condition is logically matched to an observation in the other — differences are computed within pairs before testing, removing between-individual variation.
✓ Biological scenario: measuring cortisol in the same individual fish before and after a stressor. Because both measurements come from the same fish, pairing removes individual-level baseline differences and increases power to detect the stress response.

✓ Welch’s t-test should be used when the two groups have unequal variances (heteroscedasticity), as it adjusts the degrees of freedom to account for this.
✓ It is generally recommended as the default over Student’s t-test even when variances appear similar, because it is valid under both equal and unequal variances with minimal power cost when variances happen to be equal.

✓✓ The Wilcoxon rank-sum test (also called the Mann-Whitney U test) is the appropriate non-parametric alternative when normality or other assumptions of the independent-samples t-test are violated. It compares the distributions of two independent groups using ranked data.
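
A short sketch of the corresponding base R calls follows. All data are simulated and the object names are hypothetical. It illustrates two points from the answer: a paired t-test is a one-sample test on the within-pair differences, and `t.test()` applies Welch's correction unless `var.equal = TRUE` is set.

```r
set.seed(7)

# Paired design: the same 12 hypothetical fish measured twice.
before <- rnorm(12, mean = 12, sd = 3)
after  <- before + rnorm(12, mean = 4, sd = 1)

paired_fit <- t.test(after, before, paired = TRUE)
diff_fit   <- t.test(after - before, mu = 0)  # same test on the differences

# Independent groups with unequal variances.
group_a <- rnorm(12, mean = 12, sd = 1)
group_b <- rnorm(12, mean = 14, sd = 4)

welch   <- t.test(group_a, group_b)                    # Welch (default)
student <- t.test(group_a, group_b, var.equal = TRUE)  # Student
ranksum <- wilcox.test(group_a, group_b)               # non-parametric

print(c(welch_df = unname(welch$parameter),
        student_df = unname(student$parameter)))
print(ranksum$p.value)
```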

1.6 Question 6 — Selecting the Correct Statistical Test (/7)

You are comparing body condition index (a continuous measure) across three groups of an endangered antelope species (wet-season migrants, dry-season migrants, and year-round residents). A Shapiro-Wilk test on each group shows the data are not normally distributed. Which test would you use? Give three reasons for your choice. (/ 4)
You have data on the feeding rates of 15 individual fish, measured once under ambient light and once under reduced light. What information would you need to confirm before deciding between an independent-samples t-test and a paired t-test? (/ 3)

Model Answer — Question 6

✓ Kruskal-Wallis rank-sum test (the non-parametric analogue of one-way ANOVA).
✓ Reason 1: There are three independent groups, requiring a test that can simultaneously compare more than two groups (ruling out pairwise t-tests or Wilcoxon rank-sum, which are two-sample methods).
✓ Reason 2: The data are not normally distributed within groups (Shapiro-Wilk significant), violating a key assumption of one-way ANOVA. Kruskal-Wallis operates on ranks and does not assume normality.
✓ Reason 3: The response (body condition index) is continuous and the groups are independent (different individual animals), meeting the design requirements for Kruskal-Wallis.

✓ You need to confirm whether the same 15 individual fish were measured under both light conditions (repeated measures / within-subject design) or whether the 30 measurements come from different fish assigned to each condition (between-subject design).
✓ If the same fish were measured twice (paired), a paired t-test (or Wilcoxon signed-rank if non-normal) is appropriate and more powerful.
✓ If different fish were used for each condition, an independent-samples t-test (or Wilcoxon rank-sum) is correct. Applying a paired test to unpaired data, or vice versa, is a design error that can produce invalid results.
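
The workflow in part (a) might look as follows in base R. The `bci` data frame is simulated from skewed distributions purely for illustration; the group labels and values are assumptions, not data from the antelope study.

```r
set.seed(3)

# Hypothetical right-skewed body condition index for three groups.
bci <- data.frame(
  group = factor(rep(c("wet_migrant", "dry_migrant", "resident"), each = 10)),
  index = c(rexp(10, rate = 1), rexp(10, rate = 0.7), rexp(10, rate = 0.4))
)

# Shapiro-Wilk p-value for each group.
sw_p <- tapply(bci$index, bci$group, function(x) shapiro.test(x)$p.value)
print(sw_p)

# Kruskal-Wallis rank-sum test across the three independent groups.
kw <- kruskal.test(index ~ group, data = bci)
print(kw)
```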

1.7 Question 7 — Polynomial Regression (/10)

Explain why a polynomial regression model (e.g., $\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2$) is described as “linear in its parameters” even though it models a curved relationship. (/ 3)
A researcher fits both a linear and a quadratic regression model to the same dataset. How would they decide which model is more appropriate? Describe two methods. (/ 4)
What is overfitting, and what is one consequence of overfitting a polynomial model to biological data? (/ 3)

Model Answer — Question 7

✓ A model is linear in its parameters if each coefficient (β) enters the equation as a simple multiplier of a predictor term — the parameters are never raised to a power, multiplied together, or appear inside a nonlinear function.
✓ In $\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2$: β₀, β₁, and β₂ each enter linearly (as multipliers), even though x² introduces curvature. The variable x² is simply treated as a new predictor column in the design matrix — the model is still solved by ordinary least squares.
✓ Nonlinear models (e.g., $\hat{y} = \beta_0 e^{\beta_1 x}$) are fundamentally different because the parameters appear inside a nonlinear function, requiring iterative optimisation rather than closed-form OLS.

✓✓ Residual diagnostic plots: after fitting the linear model, plot residuals vs. fitted values. A systematic curved (U-shaped) pattern in the residuals suggests that a linear model is insufficient and a quadratic term may be warranted. If the quadratic model’s residuals show no systematic pattern, it is preferred.
✓✓ Model comparison via AIC / adjusted R²: the Akaike Information Criterion (AIC) penalises model complexity; the model with the lower AIC is preferred. Adjusted R² increases only when the added polynomial term improves fit more than expected by chance. An F-test comparing the two nested models can also be used: a significant F indicates the quadratic term adds explanatory value.

✓ Overfitting occurs when a model is too complex relative to the data — it fits the noise of the sample rather than the underlying signal (true biological relationship).
✓ A consequence is poor predictive performance: the overfitted polynomial may pass through every data point in the sample, but when used to predict new observations it produces large errors because the fitted curve conforms to idiosyncratic noise rather than the true relationship.
✓ Additionally, overfitted models can produce biologically nonsensical predictions, such as extreme oscillations outside the observed range of the predictor (extrapolation artefacts).
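
The points above can be sketched with `lm()`. The data are simulated from a known quadratic curve (an assumption made for illustration), so the quadratic model is expected to be preferred. The design matrix shows that `I(x^2)` is simply one more predictor column.

```r
set.seed(11)

# Hypothetical curved response with noise.
x <- seq(0, 10, length.out = 40)
y <- 2 + 1.5 * x - 0.12 * x^2 + rnorm(40, sd = 0.5)

fit_lin  <- lm(y ~ x)
fit_quad <- lm(y ~ x + I(x^2))  # still linear in beta0, beta1, beta2

# Design matrix: intercept, x, and x^2 as ordinary columns.
print(head(model.matrix(fit_quad), 3))

# Model comparison: AIC, adjusted R-squared, nested F-test.
print(AIC(fit_lin, fit_quad))
print(c(linear    = summary(fit_lin)$adj.r.squared,
        quadratic = summary(fit_quad)$adj.r.squared))
print(anova(fit_lin, fit_quad))
```

Plotting `resid(fit_lin)` against `fitted(fit_lin)` would show the curved residual pattern mentioned in the answer.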

2 Part B: Experiment Design and Hypothesis Formulation (25 marks)

2.1 Question 8 — Cortisol Response to Acute Stress in Zebrafish (/12)

A researcher measures plasma cortisol (ng mL⁻¹) in 20 individual zebrafish (Danio rerio) before and after a 5-minute confinement stressor. The same 20 fish are measured at both time points. The first six rows of the dataset are:

  fish_id  timepoint  cortisol_ng_mL
1       1     before           12.4
2       2     before           15.1
3       3     before           11.8
4       1      after           28.7
5       2      after           31.2
6       3      after           25.9

The research question is: “Does acute confinement stress significantly increase plasma cortisol in zebrafish?”

Formulate appropriate null and alternative hypotheses. Note whether the alternative is directional (one-tailed) or non-directional (two-tailed) and justify this choice. (/ 3)
Identify the most appropriate statistical test and give three reasons for your choice, with reference to the data structure. (/ 5)
What aspect of normality would you check for this test specifically, and how would you check it? (/ 2)
If the result is statistically significant, what additional information would you report to describe the biological magnitude of the effect? (/ 2)

Model Answer — Question 8

✓ H₀: The mean cortisol level does not change following confinement stress; the mean difference (after − before) = 0.
✓ H_A: The mean cortisol level is higher after confinement stress than before; the mean difference (after − before) > 0. This is a one-tailed alternative.
✓ A one-tailed test is justified here because the HPA/HPI axis stress physiology of teleost fish is well-established: confinement is known to activate cortisol release. A directional prediction is supported by strong a priori mechanistic knowledge, not by inspecting the data.

✓ Paired t-test (or Wilcoxon signed-rank test if the paired differences are non-normal).
✓ Reason 1: The same individuals are measured at both time points — each before-measurement is intrinsically linked to the after-measurement from the same fish. This is a within-subject (repeated-measures) design, requiring a paired analysis.
✓ Reason 2: Using an independent-samples t-test would ignore this pairing, increasing residual variance (between-fish baseline differences) and reducing power.
✓ Reason 3: The response variable (cortisol, ng mL⁻¹) is continuous and ratio-scaled, meeting the measurement-scale requirement for a parametric test.

✓ For a paired t-test, the assumption of normality applies to the paired differences (after − before), not to the raw cortisol values. Calculate each difference (e.g., 28.7 − 12.4 = 16.3), then apply a Shapiro-Wilk test or examine a Q-Q plot of the 20 difference values.
✓ If the differences are non-normal (especially with only 20 pairs), the Wilcoxon signed-rank test is the appropriate non-parametric alternative.

✓ Report the mean (or median) difference in cortisol (e.g., mean increase of X ng mL⁻¹) alongside its 95% confidence interval, to convey the biological magnitude of the stress response.
✓ Cohen’s d (effect size) could also be reported, comparing the mean difference to the SD of the differences, to indicate whether the magnitude is small, medium, or large in practical terms.
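
The mechanics can be sketched using only the three fish shown in the data preview. Three pairs are far too few for a real analysis, and the full 20-fish dataset is not reproduced here, so the output illustrates the steps rather than the study's result.

```r
# Values copied from the data preview (fish 1 to 3).
before <- c(12.4, 15.1, 11.8)
after  <- c(28.7, 31.2, 25.9)

# Normality is assessed on these paired differences, not the raw values.
d <- after - before
print(d)

mean_d   <- mean(d)         # mean increase in cortisol (ng/mL)
cohens_d <- mean_d / sd(d)  # mean difference scaled by the SD of differences

# One-tailed paired t-test (H_A: after > before).
fit <- t.test(after, before, paired = TRUE, alternative = "greater")
print(fit)

# With all 20 differences one would also run, for example:
# shapiro.test(d); qqnorm(d); qqline(d)
```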

2.2 Question 9 — Plant Height Along a Salinity Gradient (/13)

An agronomist measures the height (cm) of 50 individual salt-marsh plants (Spartina alterniflora) growing across a salinity gradient. Each plant’s location is characterised by soil salinity (ppt). The first six rows of the dataset are:

  plant_id  salinity_ppt  height_cm  root_depth_cm
1        1           2.1      62.3           18.4
2        2           5.8      57.9           16.7
3        3          12.4      49.3           14.2
4        4          21.6      41.1           12.8
5        5          34.8      31.7           10.1
6        6          47.2      22.4            8.6

The research aim is: “To test whether soil salinity negatively predicts plant height in S. alterniflora.”

Formulate formal null and alternative hypotheses. Is a one-tailed or two-tailed alternative more appropriate? Justify your answer. (/ 3)
Identify the appropriate statistical test and give three reasons for this choice, referring to the nature of both variables and the aim. (/ 4)
What would a significant negative slope tell you biologically? What would a non-significant result imply? (/ 3)
The researcher also measured root depth (cm). Could root depth be included in the same statistical framework? If so, what type of model would this be, and what additional concern would arise? (/ 3)

Model Answer — Question 9

✓ H₀: There is no linear relationship between soil salinity and plant height; the regression slope (β₁) = 0.
✓ H_A: Soil salinity negatively predicts plant height; β₁ < 0. This is a one-tailed alternative.
✓ A one-tailed test is justified because osmotic stress physiology provides a strong mechanistic a priori prediction: increasing soil salinity increases the energy cost of osmoregulation and reduces water availability, both of which suppress shoot growth. The prediction of a negative slope is theoretically grounded before data collection.

✓ Simple linear regression.
✓ Reason 1: Both the predictor (salinity_ppt) and response (height_cm) are continuous ratio-scale variables. This rules out t-tests and ANOVA (categorical predictor) and points to regression.
✓ Reason 2: The aim is explicitly to predict height from salinity and quantify the rate of change (slope), which is the purpose of regression (not merely correlation, which would only assess association strength).
✓ Reason 3: The data preview shows a consistent decrease in height with salinity across 6 rows spanning 2–47 ppt, supporting the linearity assumption that is a prerequisite for simple linear regression.

✓ A significant negative slope would indicate that increasing soil salinity is associated with a statistically reliable decrease in S. alterniflora height — consistent with osmotic stress limiting shoot elongation. The slope value itself (cm decrease per ppt increase) quantifies the effect size.
✓ A non-significant result would imply insufficient evidence to conclude that salinity influences height over the observed range — this could mean the relationship is truly absent, or that the effect exists but the study lacks the power to detect it (Type II error), or that the relationship is non-linear (and thus not captured by a linear model).

✓ Yes — root depth could be included as a second continuous predictor in a multiple linear regression model: height_cm ~ salinity_ppt + root_depth_cm.
✓ The additional concern is multicollinearity: salinity and root depth both appear to decrease together in the data preview. If they are strongly correlated with each other, their effects cannot be estimated independently — coefficient estimates become unstable and standard errors inflate. This should be diagnosed using a Variance Inflation Factor (VIF).
✓ Including a collinear predictor may not improve explanatory power and can make biological interpretation of individual coefficients misleading.
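
The simple model, the multiple model, and the collinearity check can be sketched on the six previewed rows. This subset stands in for the 50-plant dataset, which is not available here, so the estimates are illustrative only. With two predictors, the VIF reduces to 1 / (1 − R²) from regressing one predictor on the other, which avoids relying on an add-on package.

```r
# First six rows copied from the data preview.
spartina6 <- data.frame(
  salinity_ppt  = c(2.1, 5.8, 12.4, 21.6, 34.8, 47.2),
  height_cm     = c(62.3, 57.9, 49.3, 41.1, 31.7, 22.4),
  root_depth_cm = c(18.4, 16.7, 14.2, 12.8, 10.1, 8.6)
)

# Simple linear regression: slope of height on salinity.
fit_simple <- lm(height_cm ~ salinity_ppt, data = spartina6)
slope <- unname(coef(fit_simple)["salinity_ppt"])
print(slope)

# Multiple linear regression with root depth as a second predictor.
fit_multi <- lm(height_cm ~ salinity_ppt + root_depth_cm, data = spartina6)
print(coef(fit_multi))

# Collinearity between the two predictors.
r_pred  <- cor(spartina6$salinity_ppt, spartina6$root_depth_cm)
r2_pred <- summary(lm(salinity_ppt ~ root_depth_cm, data = spartina6))$r.squared
vif     <- 1 / (1 - r2_pred)
print(c(correlation = r_pred, VIF = vif))
```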

3 Part C: Statistical Output Interpretation (25 marks)

3.1 Question 10 — Simple Linear Regression Summary (/12)

A researcher models the relationship between soil salinity (ppt) and plant height (cm) in S. alterniflora. The lm() output is:

Call:
lm(formula = height_cm ~ salinity_ppt, data = spartina)

Residuals:
    Min      1Q  Median      3Q     Max
 -6.312  -1.874   0.102   1.993   5.876

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   64.8312     1.4231   45.55  < 2e-16 ***
salinity_ppt  -0.8847     0.0512  -17.28  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.641 on 48 degrees of freedom
Multiple R-squared:  0.8614,  Adjusted R-squared:  0.8585
F-statistic: 298.6 on 1 and 48 DF,  p-value: < 2.2e-16

Write the equation of the fitted regression line. (/ 2)
Interpret the slope coefficient (−0.8847) in biological terms. (/ 2)
What does R² = 0.8614 mean? Why is the adjusted R² (0.8585) slightly lower? (/ 3)
The residuals range from −6.312 to 5.876. What do these values represent, and are there any obvious concerns from this summary? (/ 3)
What do the *** significance codes next to both coefficients indicate, and what is the p-value threshold they correspond to? (/ 2)

Model Answer — Question 10

✓✓ $\widehat{\mathrm{height}} = 64.831 - 0.885 \times \mathrm{salinity\_ppt}$

(Accept coefficient values rounded to 2–3 decimal places.)

✓ For every 1 ppt increase in soil salinity, plant height is predicted to decrease by approximately 0.88 cm.
✓ This negative slope is consistent with the hypothesis that saline stress limits shoot elongation — plants in saltier soils are reliably shorter than those in fresher conditions.

✓ R² = 0.8614 means that 86.14% of the total variation in plant height is explained by variation in soil salinity. The model accounts for the vast majority of height differences among plants.
✓ The adjusted R² (0.8585) is slightly lower because it penalises the model for the number of predictors (p) relative to sample size (n): adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). With only one predictor and 50 observations the penalty is small. Adjusted R² can never exceed R², which guards against inflated estimates from adding irrelevant predictors.

✓ The residuals are the differences between each observed height and the height predicted by the model ($e_i = y_i - \hat{y}_i$). They represent variation in height that is not explained by salinity.
✓ The maximum residual (5.876 cm) and minimum (−6.312 cm) lie roughly three times as far from zero as the quartiles (1Q = −1.874, 3Q = 1.993) and a little over two residual standard errors (2.641) from zero, suggesting that one or two individual plants deviate more strongly from the fitted line. This is not alarming but warrants checking the residuals-vs-fitted plot for any influential outliers or systematic patterns.
✓ There is no strong asymmetry between the Min (−6.312) and Max (5.876) residuals, suggesting broadly symmetric residuals around zero — consistent with the normality assumption.

✓ *** indicates a p-value < 0.001: if the null hypothesis (coefficient = 0) were true, the probability of obtaining an estimate at least this far from zero would be less than 0.1%.
✓ Both the intercept and the slope are highly significant — we have very strong evidence that the intercept differs from zero and that the slope of salinity on height is non-zero, i.e., that the relationship exists.
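
Several quantities in the summary can be recomputed from one another as a consistency check. The sketch below uses only numbers copied from the printed output; because those are rounded, the results agree with the printed values only to within rounding. The 10 ppt salinity used for the prediction is an arbitrary illustrative value.

```r
# Values copied from the lm() summary in the question.
b0 <- 64.8312; se_b0 <- 1.4231
b1 <- -0.8847; se_b1 <- 0.0512
r2 <- 0.8614
n  <- 50   # observations
p  <- 1    # predictors

t_b0   <- b0 / se_b0                               # t value = estimate / SE
t_b1   <- b1 / se_b1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)     # adjusted R-squared
f_stat <- t_b1^2                                   # F = t^2 with one predictor
df_res <- n - p - 1                                # residual degrees of freedom

# Predicted height (cm) at a hypothetical salinity of 10 ppt.
pred_10 <- b0 + b1 * 10

print(c(t_intercept = t_b0, t_slope = t_b1, adj_r2 = adj_r2,
        F = f_stat, df = df_res, pred_10 = pred_10))
```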

3.2 Question 11 — Kruskal-Wallis Test and Post-hoc Comparisons (/13)

A conservation ecologist compares the diversity index (Shannon H’) of bird communities across four habitat types: Forest, Grassland, Wetland, and Urban. Thirty plots per habitat were surveyed. The diversity data were non-normally distributed. The following output was produced:

    Kruskal-Wallis rank sum test

data:  diversity_index by habitat_type
Kruskal-Wallis chi-squared = 17.843, df = 3, p-value = 0.000477

Pairwise comparisons using Dunn test (Bonferroni adjustment):
                 Forest Grassland Wetland
Grassland        0.0012  -         -
Wetland          0.3412  0.0087    -
Urban            0.0001  0.2341    0.0002

State the null hypothesis evaluated by the Kruskal-Wallis test. (/ 2)
What does df = 3 tell you about the experimental design? (/ 2)
Interpret the p-value = 0.000477 at α = 0.05. (/ 2)
Based on the Dunn post-hoc test, identify all pairs of habitat types that differ significantly, and all pairs that do not. (/ 4)
Why was the Kruskal-Wallis test used rather than one-way ANOVA, and what is one advantage and one disadvantage of the non-parametric approach? (/ 3)

Model Answer — Question 11

✓ H₀: The distribution of bird diversity indices is identical across all four habitat types — there is no systematic difference in diversity among Forest, Grassland, Wetland, and Urban habitats.
✓ If the group distributions share a similar shape and spread, this can be read as: the median diversity index does not differ among habitat types (Kruskal-Wallis tests for differences in rank distributions, which is often interpreted as a difference in medians).

✓ df = 3 indicates that there are 4 habitat groups (k − 1 = 4 − 1 = 3 degrees of freedom).
✓ This is consistent with the four-level categorical factor (Forest, Grassland, Wetland, Urban) described in the experimental setup.

✓ p = 0.000477 ≪ α = 0.05, so we reject H₀. There is strong statistical evidence that bird diversity is not uniformly distributed across the four habitat types.
✓ The probability of observing a chi-squared statistic this extreme (17.843) by chance, if all habitats had the same distribution, is only 0.05% — a highly significant result.

Significant pairs (Bonferroni-adjusted p < 0.05):

✓ Forest vs. Grassland (p = 0.0012) — significantly different.
✓ Grassland vs. Wetland (p = 0.0087) — significantly different.
✓ Forest vs. Urban (p = 0.0001) — significantly different.
✓ Wetland vs. Urban (p = 0.0002) — significantly different.

Non-significant pairs (p > 0.05):

Forest vs. Wetland (p = 0.3412) — not significantly different.
Grassland vs. Urban (p = 0.2341) — not significantly different.

(1 mark per pair correctly classified, max 4 marks.)

✓ Kruskal-Wallis was used because the data failed the normality assumption — ANOVA requires approximately normal distributions within groups.
✓ Advantage: robust to non-normality; does not assume a normal distribution; applicable to ordinal data or data with outliers.
✓ Disadvantage: it has lower statistical power than ANOVA when the data are normally distributed (ranks discard some information about the magnitude of differences); post-hoc testing options are more limited; it does not directly estimate group means or effect sizes.
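
The bookkeeping behind parts (b) to (d) can be sketched in base R. The adjusted p-values are copied from the Dunn output in the question; the data frame layout and object names are hypothetical. The raw diversity data are not available, so the tests themselves are not re-run.

```r
# Bonferroni-adjusted Dunn p-values copied from the output above.
dunn <- data.frame(
  pair  = c("Forest-Grassland", "Forest-Wetland", "Forest-Urban",
            "Grassland-Wetland", "Grassland-Urban", "Wetland-Urban"),
  p_adj = c(0.0012, 0.3412, 0.0001, 0.0087, 0.2341, 0.0002)
)

alpha <- 0.05
dunn$significant <- dunn$p_adj < alpha
print(dunn)

k       <- 4             # habitat types
df_kw   <- k - 1         # Kruskal-Wallis degrees of freedom
n_pairs <- choose(k, 2)  # number of pairwise comparisons

# Upper-tail chi-squared probability for the reported test statistic.
p_kw <- pchisq(17.843, df = df_kw, lower.tail = FALSE)
print(c(df = df_kw, pairs = n_pairs, p_value = p_kw))
```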

End of Version 2

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:

@online{smit2026,
  author = {Smit, A. J.},
  title = {BCB744 {Biostatistics} — {Theory} {Test} {(Version} 2)},
  date = {2026-01-01},
  url = {https://tangledbank.netlify.app/BCB744/assessments/BCB744_Biostats_Theory_Test_V2.html},
  langid = {en}
}

For attribution, please cite this work as:

Smit AJ (2026) BCB744 Biostatistics — Theory Test (Version 2). https://tangledbank.netlify.app/BCB744/assessments/BCB744_Biostats_Theory_Test_V2.html.

--- title: "BCB744 Biostatistics — Theory Test (Version 2)" subtitle: "Total: 100 marks | Time: 90 minutes" date: "2026" format: html: number-sections: true toc: true toc-depth: 2 toc-title: "Contents" embed-resources: true engine: knitr params: hide_answers: false --- ::: {.callout-important appearance="simple"} **Instructions** - This paper has **three parts**: Part A (General Theory, 50 marks), Part B (Experiment Design and Hypothesis Formulation, 25 marks), and Part C (Statistical Output Interpretation, 25 marks). - Answer **all** questions. - Write clearly and in complete sentences where prose is required. - Mark allocations are shown next to each question in **(/ marks)** notation. - Statistical notation: use *H*~0~ for the null hypothesis and *H*~A~ for the alternative hypothesis. ::: --- # Part A: General Theory (50 marks) ## Question 1 — Variables and Study Design (/7) a. Distinguish between a **continuous** and a **categorical** (nominal) predictor variable. Give one biological example of each. **(/ 3)** b. What is the difference between a **fixed factor** and a **random factor** in an experimental design? Give one example of each from a field ecology context. **(/ 2)** c. Define **pseudoreplication**. Why is it statistically problematic? **(/ 2)** `r if (params$hide_answers) "::: {.content-hidden}"` ::: {.callout-tip appearance="simple"} **Model Answer — Question 1** *a.* - ✓ A **continuous predictor** takes any real value on a scale (e.g., water temperature in °C, salinity in ppt, or body mass in grams). - ✓ A **categorical predictor** groups observations into discrete, named categories with no inherent numeric spacing (e.g., species identity, treatment group, sex, or habitat type). - ✓ Biological examples: continuous — seawater pH as a predictor of coral calcification rate; categorical — diet type (herbivore, omnivore, carnivore) as a predictor of gut passage time. 
*b.* - ✓ A **fixed factor** is one whose specific levels are deliberately chosen and are the levels of direct interest (conclusions apply only to those levels). Example: three specific herbicide concentrations selected by the researcher. - ✓ A **random factor** is one whose levels are a random sample from a broader population of possible levels, and the goal is to generalise to that population. Example: 10 randomly chosen study plots within a forest biome, representing all possible plots. *c.* - ✓ Pseudoreplication occurs when multiple measurements from the same experimental unit (or non-independent observations) are treated as independent replicates, artificially inflating the apparent sample size. - ✓ This is statistically problematic because it underestimates the true variance within treatments, inflates the *F*- or *t*-statistic, and dramatically inflates the Type I error rate — leading to false conclusions of significance. ::: `r if (params$hide_answers) ":::"` --- ## Question 2 — Data Visualisation (/5) a. When is a **boxplot** more informative than a bar chart showing mean ± standard error? **(/ 2)** b. What feature of a dataset does a **violin plot** reveal that a boxplot does not? **(/ 1)** c. Identify the five summary statistics encoded in a standard boxplot (Tukey box-and-whisker). **(/ 2)** `r if (params$hide_answers) "::: {.content-hidden}"` ::: {.callout-tip appearance="simple"} **Model Answer — Question 2** *a.* - ✓ A boxplot is more informative when the data are skewed or contain outliers — it directly shows the median, spread, skewness (via asymmetric box), and outlier positions that a mean ± SE bar chart obscures. - ✓ It is particularly useful when sample sizes are modest and the mean is not a robust summary (e.g., bimodal or heavy-tailed distributions), or when comparing distributions that differ in shape rather than just location. 
*b.* - ✓ A violin plot reveals the **full probability density** (distributional shape) of the data — including multimodality, skew, and the concentration of values at different levels — something a boxplot cannot show since it only summarises five quantiles. *c.* - ✓✓ The five summary statistics: **lower quartile** (Q1, 25th percentile), **median** (Q2, 50th percentile), **upper quartile** (Q3, 75th percentile), and the two **whisker endpoints** that extend to the most extreme values within 1.5 × IQR of Q1 and Q3. Points beyond the whiskers are plotted individually as **outliers**. ::: `r if (params$hide_answers) ":::"` --- ## Question 3 — Sampling Distributions and Uncertainty (/8) a. Define **standard error of the mean**. How does it differ from standard deviation, and what does each quantity describe? **(/ 3)** b. Explain what a **95% confidence interval** means. Correct a common misinterpretation: "there is a 95% chance that the true mean lies within this interval." **(/ 3)** c. How does increasing sample size affect the width of a confidence interval, and why? **(/ 2)** `r if (params$hide_answers) "::: {.content-hidden}"` ::: {.callout-tip appearance="simple"} **Model Answer — Question 3** *a.* - ✓ The **standard deviation (SD)** describes the spread of *individual observations* around the sample mean — it is a property of the raw data distribution. - ✓ The **standard error of the mean (SE)** = SD / √*n*, and describes the precision of the *sample mean as an estimate* of the population mean — it is a property of the sampling distribution of the mean. - ✓ As sample size increases, SE decreases (more data → more precise estimate of the mean) while SD remains approximately constant (it reflects inherent biological variability, not sampling effort). *b.* - ✓ A 95% CI is constructed so that **if we repeated the sampling procedure many times**, 95% of the intervals produced would contain the true population parameter. 
It is a statement about the *procedure*, not about any single interval.
- ✓ The common misinterpretation is wrong because, after computing one interval, the true mean is either inside it or it is not; in the frequentist framework there is no 95% probability attached to *this particular* interval. The 95% refers to the long-run frequency property of the method.
- ✓ A correct interpretation: "We used a procedure that, in repeated sampling, produces intervals capturing the true mean 95% of the time; this specific interval either contains the true mean or it does not."

*c.*

- ✓ Increasing sample size **narrows** the confidence interval, because SE = SD / √*n* decreases as *n* grows, and the CI half-width is *t*-critical × SE (the *t*-critical value also shrinks slightly as the degrees of freedom increase).
- ✓ Intuitively: more data provides a more precise estimate of the population mean, so the uncertainty band around it contracts.

:::

`r if (params$hide_answers) ":::"`

---

## Question 4 — One-Tailed vs Two-Tailed Tests (/6)

a. Explain the difference between a one-tailed and a two-tailed hypothesis test. Under what circumstances is a one-tailed test scientifically justified? **(/ 3)**
b. What is the primary statistical risk associated with choosing a one-tailed test *after* looking at the direction of the observed effect? **(/ 3)**

`r if (params$hide_answers) "::: {.content-hidden}"`

::: {.callout-tip appearance="simple"}
**Model Answer — Question 4**

*a.*

- ✓ A **two-tailed test** places the rejection region in both tails of the null distribution, testing whether the effect could be *larger or smaller* than the null value (*H*~A~: μ~1~ ≠ μ~2~).
- ✓ A **one-tailed test** places all rejection area in one tail, testing only one direction of departure (*H*~A~: μ~1~ > μ~2~, or *H*~A~: μ~1~ < μ~2~).
- ✓ A one-tailed test is justified only when there is a strong, *a priori* mechanistic reason to predict the direction of the effect *before* data collection (e.g., a drug is known to be an inhibitor, so only a decrease in enzyme activity is biologically plausible).

*b.*

- ✓ If the direction is chosen *after* observing the data, the effective significance level is inflated: the researcher is implicitly running a two-tailed comparison but claiming the critical threshold of a one-tailed test (halving the reported *p*-value).
- ✓ This constitutes a form of *p-hacking* and doubles the actual Type I error rate relative to the stated α: a one-tailed *p* = 0.04 obtained this way corresponds to a two-tailed *p* = 0.08.
- ✓ A one-tailed test also cannot detect an effect in the opposite direction, which may be scientifically important (e.g., a drug producing unexpected stimulation rather than inhibition).

:::

`r if (params$hide_answers) ":::"`

---

## Question 5 — *t*-tests (/7)

a. Explain the conceptual difference between an **independent-samples** *t*-test and a **paired** *t*-test. Give a biological scenario where the paired design is more appropriate. **(/ 3)**
b. Under what conditions should you use **Welch's *t*-test** rather than Student's *t*-test? **(/ 2)**
c. What non-parametric test is the appropriate alternative when the assumptions of the independent-samples *t*-test cannot be met? **(/ 2)**

`r if (params$hide_answers) "::: {.content-hidden}"`

::: {.callout-tip appearance="simple"}
**Model Answer — Question 5**

*a.*

- ✓ An **independent-samples *t*-test** compares means from two separate, unrelated groups: the observations in one group have no pairing or correspondence with observations in the other.
- ✓ A **paired *t*-test** compares means where each observation in one condition is logically matched to an observation in the other. Differences are computed within pairs before testing, which removes between-individual variation.
- ✓ Biological scenario: measuring cortisol in the same individual fish *before* and *after* a stressor. Because both measurements come from the same fish, pairing removes individual-level baseline differences and increases power to detect the stress response.

*b.*

- ✓ Welch's *t*-test should be used when the two groups have **unequal variances** (heteroscedasticity), as it uses a separate variance estimate for each group and adjusts the degrees of freedom accordingly.
- ✓ It is generally recommended as the default over Student's *t*-test even when variances appear similar, because it is valid under both equal and unequal variances, with minimal loss of power when the variances happen to be equal.

*c.*

- ✓✓ The **Wilcoxon rank-sum test** (also called the Mann-Whitney U test) is the appropriate non-parametric alternative when normality or other assumptions of the independent-samples *t*-test are violated. It compares the distributions of two independent groups using ranked data.

:::

`r if (params$hide_answers) ":::"`

---

## Question 6 — Selecting the Correct Statistical Test (/7)

a. You are comparing body condition index (a continuous measure) across **three groups** of an endangered antelope species (wet-season migrants, dry-season migrants, and year-round residents). A Shapiro-Wilk test on each group shows the data are not normally distributed. What test would you use, and provide **three** reasons for your choice. **(/ 4)**
b. You have data on the feeding rates of 15 individual fish, measured once under ambient light and once under reduced light. What information would you need to confirm before deciding between an independent-samples *t*-test and a paired *t*-test?
**(/ 3)**

`r if (params$hide_answers) "::: {.content-hidden}"`

::: {.callout-tip appearance="simple"}
**Model Answer — Question 6**

*a.*

- ✓ **Kruskal-Wallis rank-sum test** (the non-parametric analogue of one-way ANOVA).
- ✓ Reason 1: There are **three independent groups**, requiring a test that can simultaneously compare more than two groups (ruling out pairwise *t*-tests or Wilcoxon rank-sum, which are two-sample methods).
- ✓ Reason 2: The data are **not normally distributed** within groups (Shapiro-Wilk significant), violating a key assumption of one-way ANOVA. Kruskal-Wallis operates on ranks and does not assume normality.
- ✓ Reason 3: The response (body condition index) is continuous and the groups are independent (different individual animals), meeting the design requirements for Kruskal-Wallis.

*b.*

- ✓ You need to confirm whether the same 15 individual fish were measured under **both** light conditions (repeated measures / within-subject design) or whether the 30 measurements come from **different** fish assigned to each condition (between-subject design).
- ✓ If the same fish were measured twice (paired), a paired *t*-test (or Wilcoxon signed-rank if the differences are non-normal) is appropriate and more powerful.
- ✓ If different fish were used for each condition, an independent-samples *t*-test (or Wilcoxon rank-sum) is correct. Applying a paired test to unpaired data, or vice versa, is a design error that can produce invalid results.

:::

`r if (params$hide_answers) ":::"`

---

## Question 7 — Polynomial Regression (/10)

a. Explain why a polynomial regression model (e.g., $\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2$) is described as "linear in its parameters" even though it models a curved relationship. **(/ 3)**
b. A researcher fits both a linear and a quadratic regression model to the same dataset. How would they decide which model is more appropriate? Describe **two** methods. **(/ 4)**
c. 
What is **overfitting**, and what is one consequence of overfitting a polynomial model to biological data? **(/ 3)**

`r if (params$hide_answers) "::: {.content-hidden}"`

::: {.callout-tip appearance="simple"}
**Model Answer — Question 7**

*a.*

- ✓ A model is **linear in its parameters** if each coefficient (β) enters the equation as a simple multiplier of a predictor term; the parameters are never raised to a power, multiplied together, or placed inside a nonlinear function.
- ✓ In $\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2$: β~0~, β~1~, and β~2~ each enter linearly (as multipliers), even though *x*² introduces curvature. The variable *x*² is simply treated as a new predictor column in the design matrix, so the model is still solved by ordinary least squares.
- ✓ Nonlinear models (e.g., $\hat{y} = \beta_0 e^{\beta_1 x}$) are fundamentally different because the parameters appear inside a nonlinear function, requiring iterative optimisation rather than closed-form OLS.

*b.*

- ✓✓ **Residual diagnostic plots**: after fitting the linear model, plot residuals vs. fitted values. A systematic curved (U-shaped) pattern in the residuals suggests that a linear model is insufficient and a quadratic term may be warranted. If the quadratic model's residuals show no systematic pattern, it is preferred.
- ✓✓ **Model comparison via AIC / adjusted *R*²**: the Akaike Information Criterion (AIC) penalises model complexity; the model with the lower AIC is preferred. Adjusted *R*² increases only when the added polynomial term improves fit more than expected by chance. An *F*-test comparing the two nested models can also be used: a significant *F* indicates the quadratic term adds explanatory value.

*c.*

- ✓ **Overfitting** occurs when a model is too complex relative to the data: it fits the noise of the sample rather than the underlying signal (the true biological relationship).
- ✓ A consequence is **poor predictive performance**: the overfitted polynomial may pass through every data point in the sample, but when used to predict new observations it produces large errors because the fitted curve conforms to idiosyncratic noise rather than the true relationship.
- ✓ Additionally, overfitted models can produce biologically nonsensical predictions, such as extreme oscillations outside the observed range of the predictor (extrapolation artefacts).

:::

`r if (params$hide_answers) ":::"`

---

# Part B: Experiment Design and Hypothesis Formulation (25 marks)

## Question 8 — Cortisol Response to Acute Stress in Zebrafish (/12)

A researcher measures plasma cortisol (ng mL⁻¹) in 20 individual zebrafish (*Danio rerio*) before and after a 5-minute confinement stressor. The same 20 fish are measured at both time points. The first six rows of the dataset are:

```
  fish_id timepoint cortisol_ng_mL
1       1    before           12.4
2       2    before           15.1
3       3    before           11.8
4       1     after           28.7
5       2     after           31.2
6       3     after           25.9
```

The research question is: *"Does acute confinement stress significantly increase plasma cortisol in zebrafish?"*

a. Formulate appropriate null and alternative hypotheses. Note whether the alternative is directional (one-tailed) or non-directional (two-tailed) and justify this choice. **(/ 3)**
b. Identify the most appropriate statistical test and give **three** reasons for your choice, with reference to the data structure. **(/ 5)**
c. What aspect of normality would you check for this test specifically, and how would you check it? **(/ 2)**
d. If the result is statistically significant, what additional information would you report to describe the biological magnitude of the effect? **(/ 2)**

`r if (params$hide_answers) "::: {.content-hidden}"`

::: {.callout-tip appearance="simple"}
**Model Answer — Question 8**

*a.*

- ✓ *H*~0~: The mean cortisol level does not change following confinement stress; the mean difference (after − before) = 0.
- ✓ *H*~A~: The mean cortisol level is **higher** after confinement stress than before; mean difference (after − before) > 0. This is a **one-tailed** alternative.
- ✓ A one-tailed test is justified here because the HPA/HPI axis stress physiology of teleost fish is well-established: confinement is known to activate cortisol release. A directional prediction is supported by strong *a priori* mechanistic knowledge, not by inspecting the data.

*b.*

- ✓ **Paired *t*-test** (or Wilcoxon signed-rank test if the paired differences are non-normal).
- ✓ Reason 1: The **same individuals** are measured at both time points, so each before-measurement is intrinsically linked to the after-measurement from the same fish. This is a within-subject (repeated-measures) design, requiring a paired analysis.
- ✓ Reason 2: Using an independent-samples *t*-test would ignore this pairing, leaving between-fish baseline differences in the residual variance and reducing power.
- ✓ Reason 3: The response variable (cortisol, ng mL⁻¹) is continuous and ratio-scaled, meeting the measurement-scale requirement for a parametric test.

*c.*

- ✓ For a paired *t*-test, the assumption of normality applies to the **paired differences** (after − before), not to the raw cortisol values. Calculate each difference (e.g., 28.7 − 12.4 = 16.3), then apply a Shapiro-Wilk test or examine a Q-Q plot of the 20 difference values.
- ✓ If the differences are non-normal (especially with only 20 pairs), the **Wilcoxon signed-rank test** is the appropriate non-parametric alternative.

*d.*

- ✓ Report the **mean (or median) difference** in cortisol (e.g., mean increase of X ng mL⁻¹) alongside its 95% confidence interval, to convey the biological magnitude of the stress response.
- ✓ Cohen's *d* (effect size) could also be reported, comparing the mean difference to the SD of the differences, to indicate whether the magnitude is small, medium, or large in practical terms.
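
The analysis described in *b.* to *d.* could be sketched in R roughly as follows. This is a minimal illustration that uses only the three fish visible in the data preview, not the full 20-fish dataset, so the numbers it produces are not the study's results; the object names (`before`, `after`, `diffs`, `cohens_d`) are chosen here for illustration.

```r
# Values copied from the six-row data preview (three fish only, not the full dataset)
before <- c(12.4, 15.1, 11.8)
after  <- c(28.7, 31.2, 25.9)

# Paired differences: the quantity whose normality matters for the paired t-test
diffs <- after - before

# One-tailed paired t-test matching H_A: mean difference (after - before) > 0
res <- t.test(after, before, paired = TRUE, alternative = "greater")

# Effect size for paired data: mean difference divided by the SD of the differences
mean_diff <- mean(diffs)
cohens_d  <- mean_diff / sd(diffs)

res$estimate   # mean of the paired differences
res$conf.int   # one-sided 95% confidence bound for the mean difference
cohens_d
```

Because the test is one-tailed, `res$conf.int` is a one-sided bound; rerunning `t.test()` with `alternative = "two.sided"` gives the conventional two-sided 95% interval for reporting the size of the increase.
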

:::

`r if (params$hide_answers) ":::"`

---

## Question 9 — Plant Height Along a Salinity Gradient (/13)

An agronomist measures the height (cm) of 50 individual salt-marsh plants (*Spartina alterniflora*) growing across a salinity gradient. Each plant's location is characterised by soil salinity (ppt). The first six rows of the dataset are:

```
  plant_id salinity_ppt height_cm root_depth_cm
1        1          2.1      62.3          18.4
2        2          5.8      57.9          16.7
3        3         12.4      49.3          14.2
4        4         21.6      41.1          12.8
5        5         34.8      31.7          10.1
6        6         47.2      22.4           8.6
```

The research aim is: *"To test whether soil salinity negatively predicts plant height in* S. alterniflora."

a. Formulate formal null and alternative hypotheses. Is a one-tailed or two-tailed alternative more appropriate? Justify your answer. **(/ 3)**
b. Identify the appropriate statistical test and give **three** reasons for this choice, referring to the nature of both variables and the aim. **(/ 4)**
c. What would a **significant negative slope** tell you biologically? What would a **non-significant result** imply? **(/ 3)**
d. The researcher also measured root depth (cm). Could root depth be included in the same statistical framework? If so, what type of model would this be, and what additional concern would arise? **(/ 3)**

`r if (params$hide_answers) "::: {.content-hidden}"`

::: {.callout-tip appearance="simple"}
**Model Answer — Question 9**

*a.*

- ✓ *H*~0~: There is no linear relationship between soil salinity and plant height; the regression slope (β~1~) = 0.
- ✓ *H*~A~: Soil salinity **negatively** predicts plant height; β~1~ < 0. This is a **one-tailed** alternative.
- ✓ A one-tailed test is justified because osmotic stress physiology provides a strong mechanistic *a priori* prediction: increasing soil salinity increases the energy cost of osmoregulation and reduces water availability, both of which suppress shoot growth. The prediction of a *negative* slope is theoretically grounded before data collection.

*b.*

- ✓ **Simple linear regression**.
- ✓ Reason 1: Both the predictor (salinity_ppt) and response (height_cm) are **continuous** ratio-scale variables. This rules out *t*-tests and ANOVA (categorical predictor) and points to regression.
- ✓ Reason 2: The aim is explicitly to **predict** height from salinity and quantify the rate of change (slope), which is the purpose of regression (not merely correlation, which would only assess association strength).
- ✓ Reason 3: The data preview shows a consistent decrease in height with salinity across 6 rows spanning 2–47 ppt, which is compatible with the linearity assumption of simple linear regression (to be confirmed with a scatterplot and residual plot of all 50 plants).

*c.*

- ✓ A **significant negative slope** would indicate that increasing soil salinity is associated with a statistically reliable decrease in *S. alterniflora* height, consistent with osmotic stress limiting shoot elongation. The slope value itself (cm decrease per ppt increase) quantifies the effect size.
- ✓ A **non-significant result** would imply insufficient evidence to conclude that salinity influences height over the observed range. This could mean the relationship is truly absent, that the effect exists but the study lacks the power to detect it (Type II error), or that the relationship is non-linear (and thus not captured by a linear model).

*d.*

- ✓ Yes: root depth could be included as a **second continuous predictor** in a **multiple linear regression** model: `height_cm ~ salinity_ppt + root_depth_cm`.
- ✓ The additional concern is **multicollinearity**: in the data preview, root depth decreases steadily as salinity increases, so the two predictors appear strongly (negatively) correlated. If they are, their effects cannot be estimated independently: coefficient estimates become unstable and standard errors inflate. This should be diagnosed using a **Variance Inflation Factor (VIF)**.
- ✓ Including a collinear predictor may not improve explanatory power and can make biological interpretation of individual coefficients misleading.
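
The multicollinearity concern in *d.* can be illustrated with a short base-R sketch. It uses only the six rows shown in the data preview (not the full 50-plant dataset), so the values are indicative at best. With exactly two predictors, the VIF of each equals 1 / (1 − *r*²), where *r* is the correlation between them; that identity is used here so that no extra package is needed. With more predictors, a function such as `vif()` from the car package is commonly used instead. The object names (`preview`, `r_pred`, `vif_two`) are illustrative.

```r
# Values copied from the six-row data preview (not the full 50-plant dataset)
preview <- data.frame(
  salinity_ppt  = c(2.1, 5.8, 12.4, 21.6, 34.8, 47.2),
  height_cm     = c(62.3, 57.9, 49.3, 41.1, 31.7, 22.4),
  root_depth_cm = c(18.4, 16.7, 14.2, 12.8, 10.1, 8.6)
)

# Multiple linear regression with two continuous predictors
fit <- lm(height_cm ~ salinity_ppt + root_depth_cm, data = preview)

# Correlation between the two predictors
r_pred <- cor(preview$salinity_ppt, preview$root_depth_cm)

# With exactly two predictors, each has VIF = 1 / (1 - r^2)
vif_two <- 1 / (1 - r_pred^2)

r_pred
vif_two
```

A strongly negative `r_pred` and a large `vif_two` in these six rows would point to the same problem the model answer describes: salinity and root depth carry largely overlapping information.
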

:::

`r if (params$hide_answers) ":::"`

---

# Part C: Statistical Output Interpretation (25 marks)

## Question 10 — Simple Linear Regression Summary (/12)

A researcher models the relationship between soil salinity (ppt) and plant height (cm) in *S. alterniflora*. The `lm()` output is:

```
Call:
lm(formula = height_cm ~ salinity_ppt, data = spartina)

Residuals:
   Min     1Q Median     3Q    Max
-6.312 -1.874  0.102  1.993  5.876

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   64.8312     1.4231   45.55  < 2e-16 ***
salinity_ppt  -0.8847     0.0512  -17.28  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.641 on 48 degrees of freedom
Multiple R-squared:  0.8614, Adjusted R-squared:  0.8585
F-statistic: 298.6 on 1 and 48 DF,  p-value: < 2.2e-16
```

a. Write the equation of the fitted regression line. **(/ 2)**
b. Interpret the slope coefficient (−0.8847) in biological terms. **(/ 2)**
c. What does *R*² = 0.8614 mean? Why is the adjusted *R*² (0.8585) slightly lower? **(/ 3)**
d. The residuals range from −6.312 to 5.876. What do these values represent, and are there any obvious concerns from this summary? **(/ 3)**
e. What do the `***` significance codes next to both coefficients indicate, and what is the *p*-value threshold they correspond to? **(/ 2)**

`r if (params$hide_answers) "::: {.content-hidden}"`

::: {.callout-tip appearance="simple"}
**Model Answer — Question 10**

*a.*

- ✓✓ $\hat{height} = 64.831 - 0.885 \times salinity\_ppt$ (Accept coefficient values rounded to 2–3 decimal places.)

*b.*

- ✓ For every 1 ppt increase in soil salinity, plant height is predicted to **decrease** by approximately 0.88 cm.
- ✓ This negative slope is consistent with the hypothesis that saline stress limits shoot elongation: plants in saltier soils are reliably shorter than those in fresher conditions.

*c.*

- ✓ *R*² = 0.8614 means that **86.14% of the total variation in plant height** is explained by variation in soil salinity.
The model accounts for the vast majority of height differences among plants.
- ✓ The adjusted *R*² (0.8585) is slightly lower because it penalises the model for the number of predictors relative to sample size. With only one predictor and 50 observations the penalty is small, but adjusted *R*² never exceeds *R*², which guards against inflated estimates of fit when irrelevant predictors are added.

*d.*

- ✓ The residuals are the differences between each observed height and the height predicted by the model ($e_i = y_i - \hat{y}_i$). They represent variation in height that is *not* explained by salinity.
- ✓ The most extreme residuals (−6.312 cm and 5.876 cm) lie a little over two residual standard errors (2.641 cm) from zero and well outside the middle 50% of residuals (1Q = −1.874 to 3Q = 1.993), suggesting that one or two individual plants deviate more strongly from the fitted line. This is not alarming but warrants checking the residuals-vs-fitted plot for influential outliers or systematic patterns.
- ✓ There is no strong asymmetry between the Min (−6.312) and Max (5.876) residuals, and the median (0.102) is close to zero, suggesting broadly symmetric residuals. This is consistent with the normality assumption, although a Q-Q plot is needed to assess it properly.

*e.*

- ✓ `***` indicates a *p*-value < 0.001: if the true coefficient were zero, the probability of obtaining a *t*-statistic at least this extreme would be less than 0.1%.
- ✓ Both the intercept and the slope are highly significant: we have very strong evidence that the intercept differs from zero and that the slope of height on salinity is non-zero, i.e., that the relationship exists.

:::

`r if (params$hide_answers) ":::"`

---

## Question 11 — Kruskal-Wallis Test and Post-hoc Comparisons (/13)

A conservation ecologist compares the diversity index (Shannon H') of bird communities across four habitat types: Forest, Grassland, Wetland, and Urban. Thirty plots per habitat were surveyed. The diversity data were non-normally distributed.
The following output was produced:

```
Kruskal-Wallis rank sum test

data:  diversity_index by habitat_type
Kruskal-Wallis chi-squared = 17.843, df = 3, p-value = 0.000477

Pairwise comparisons using Dunn test (Bonferroni adjustment):

          Forest Grassland Wetland
Grassland 0.0012 -         -
Wetland   0.3412 0.0087    -
Urban     0.0001 0.2341    0.0002
```

a. State the null hypothesis evaluated by the Kruskal-Wallis test. **(/ 2)**
b. What does `df = 3` tell you about the experimental design? **(/ 2)**
c. Interpret the *p*-value = 0.000477 at *α* = 0.05. **(/ 2)**
d. Based on the Dunn post-hoc test, identify **all pairs** of habitat types that differ significantly, and **all pairs** that do not. **(/ 4)**
e. Why was the Kruskal-Wallis test used rather than one-way ANOVA, and what is one advantage and one disadvantage of the non-parametric approach? **(/ 3)**

`r if (params$hide_answers) "::: {.content-hidden}"`

::: {.callout-tip appearance="simple"}
**Model Answer — Question 11**

*a.*

- ✓ *H*~0~: The distribution of bird diversity indices is identical across all four habitat types; there is no systematic difference in diversity among Forest, Grassland, Wetland, and Urban habitats.
- ✓ This is often stated as: the median diversity index does not differ among habitat types (Kruskal-Wallis tests for differences in rank distributions, which can be interpreted as differences in medians when the group distributions have similar shapes).

*b.*

- ✓ `df = 3` indicates that there are **4 habitat groups** (k − 1 = 4 − 1 = 3 degrees of freedom).
- ✓ This is consistent with the four-level categorical factor (Forest, Grassland, Wetland, Urban) described in the experimental setup.

*c.*

- ✓ *p* = 0.000477 ≪ *α* = 0.05, so we **reject *H*~0~**. There is strong statistical evidence that bird diversity differs among the four habitat types (at least one habitat differs from the others).
- ✓ If all habitats had the same distribution, the probability of observing a chi-squared statistic at least as extreme as 17.843 would be only about 0.05%: a highly significant result.
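
The reasoning in *b.* and *c.* can be checked with a few lines of base R. This sketch only re-derives the degrees of freedom and the upper-tail chi-squared probability from the numbers printed in the output; it does not rerun the test, since the raw diversity data are not shown. The result should land close to the reported 0.000477, and small differences can arise from rounding of the printed statistic. Variable names are illustrative.

```r
# Numbers taken from the reported Kruskal-Wallis output
k      <- 4        # habitat types: Forest, Grassland, Wetland, Urban
chi_sq <- 17.843   # reported test statistic
df_kw  <- k - 1    # degrees of freedom = number of groups - 1

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(chi_sq, df = df_kw, lower.tail = FALSE)

# Decision at alpha = 0.05
alpha  <- 0.05
reject <- p_value < alpha

p_value
reject
```
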

*d.*

Significant pairs (Bonferroni-adjusted *p* < 0.05):

- ✓ Forest vs. Grassland (*p* = 0.0012): significantly different.
- ✓ Grassland vs. Wetland (*p* = 0.0087): significantly different.
- ✓ Forest vs. Urban (*p* = 0.0001): significantly different.
- ✓ Wetland vs. Urban (*p* = 0.0002): significantly different.

Non-significant pairs (*p* > 0.05):

- Forest vs. Wetland (*p* = 0.3412): not significantly different.
- Grassland vs. Urban (*p* = 0.2341): not significantly different.

(1 mark per pair correctly classified, max 4 marks.)

*e.*

- ✓ Kruskal-Wallis was used because the data failed the normality assumption; ANOVA requires approximately normal distributions within groups.
- ✓ **Advantage**: robust to non-normality and outliers; does not require normally distributed data; applicable to ordinal data.
- ✓ **Disadvantage**: it has lower statistical power than ANOVA when the data *are* normally distributed (ranks discard some information about the magnitude of differences); post-hoc testing options are more limited; it does not directly estimate group means or effect sizes.

:::

`r if (params$hide_answers) ":::"`

---

*End of Version 2*