BCB744: Biostatistics R Test

Published

April 25, 2025

About the Test

The Biostatistics Test starts at 8:30 on 25 April 2025, and you have until 11:30 to complete it. This is the Theory Test, which must be written on campus. The theory component contributes 30% of the final assessment mark.

Assessment Policy

The marks indicated for each question reflect its relative weight (and hence the depth expected in your response), not a rigid checklist of individual points. Your answers should demonstrate a comprehensive understanding of the relevant concepts and techniques. Higher marks will be awarded to answers that are not only conceptually and theoretically correct but also offer insightful discussion and communicate insights or findings clearly. We are assessing your ability to think systematically through complex questions, make appropriate theoretical and methodological choices, and present your findings in a coherent narrative that reveals deep understanding.

Please refer to the Assessment Policy for more information on the test format and rules.

Theory Test

This is a closed-book assessment.

Below is a set of questions to answer. You must answer all questions within the allocated three hours. Please write your answers in a neatly formatted Word document and submit it via the iKamva platform.

Clearly indicate the question number and provide detailed explanations for your answers. Use Word’s headings and subheadings facility to structure your document logically.

Naming convention: Biostatistics_Theory_Test_YourSurname.docx

Question 1

Imagine you are presented with the following five research scenarios. In each case, your task is to decide which statistical method would be most appropriate and to justify your reasoning.

For each of the five scenarios below:

  1. Identify the appropriate statistical method.
  2. Explain why this method is more suitable than plausible alternatives.
  3. Clearly identify the dependent and independent variables (where applicable), and describe their type (categorical, continuous, etc.).
  4. Describe what the method would allow you to infer, and what its limitations might be in the given context.

Scenarios:

  1. A researcher wants to compare average leaf nitrogen content between two plant species growing in the same habitat.
  2. An ecologist is interested in whether water temperature predicts fish body size across multiple river sites.
  3. A conservation biologist is comparing average bird abundance across five habitat types, while also accounting for altitude, which is known to influence bird detection rates.
  4. A physiologist wants to explore whether heart rate and body temperature are linearly associated in a sample of animals under heat stress conditions.
  5. A botanist tests whether fertiliser type (3 levels: organic, inorganic, control) affects plant height, but only has access to a small sample from each group.

[20 marks]

Answer

Scenario 1:

  1. Independent (two-sample) t-test (or Mann-Whitney U test if data are not normally distributed).
  2. The t-test is appropriate for comparing means between two independent groups (species). The Mann-Whitney U test is a non-parametric alternative that does not assume normality.
  3. Dependent variable: leaf nitrogen content (continuous); independent variable: plant species (categorical).
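  4. The t-test allows inference about the difference in mean leaf nitrogen between species, but it assumes approximately normal data within each group and, in its classic (Student's) form, equal variances; Welch's t-test relaxes the equal-variance assumption. The Mann-Whitney U test does not assume normality, but it compares ranks rather than means, so it does not estimate a difference in means.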
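A minimal R sketch of the Scenario 1 workflow, using simulated data: the species labels (sp_A, sp_B), the sample size, and the nitrogen values are hypothetical assumptions, not measurements. It checks normality within each species with shapiro.test(), runs t.test() (which by default applies Welch's correction for unequal variances; var.equal = TRUE gives the classic Student's test), and runs wilcox.test() as the rank-based Mann-Whitney alternative.

```r
# Hypothetical, simulated leaf nitrogen data (% dry mass) for two species;
# means, SDs and sample sizes are illustrative assumptions only
set.seed(13)
leaf_n <- data.frame(
  species  = factor(rep(c("sp_A", "sp_B"), each = 12)),
  nitrogen = c(rnorm(12, mean = 2.1, sd = 0.3),
               rnorm(12, mean = 2.5, sd = 0.3))
)

# Shapiro-Wilk normality test within each species
shapiro_p <- tapply(leaf_n$nitrogen, leaf_n$species,
                    function(x) shapiro.test(x)$p.value)
print(shapiro_p)

# Welch two-sample t-test (R's default does not assume equal variances)
tt <- t.test(nitrogen ~ species, data = leaf_n)
print(tt)

# Rank-based alternative (Wilcoxon rank-sum = Mann-Whitney U)
mw <- wilcox.test(nitrogen ~ species, data = leaf_n)
print(mw)
```

If the normality checks look reasonable, the t-test output (mean difference and its confidence interval) is the more informative of the two.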

Scenario 2:

  1. Linear regression analysis.
  2. Linear regression is suitable for assessing the relationship between a continuous dependent variable (fish body size) and a continuous independent variable (water temperature).
  3. Dependent variable: fish body size (continuous); independent variable: water temperature (continuous).
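  4. Linear regression allows inference about the direction and magnitude of the relationship (the slope) and the proportion of variation in body size explained by temperature (R²), but it assumes linearity, independence of observations, normally distributed residuals, and homoscedasticity. It may not capture non-linear relationships or interactions, and if several fish are measured at each site, observations within a site are not independent, which a simple regression does not account for.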
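A minimal sketch of Scenario 2 with simulated data. It assumes, hypothetically, one value per site (e.g., site-mean body length and water temperature), so that each row is an independent observation; the number of sites, the temperature range, and the effect size are illustrative only. lm() estimates the slope, summary() reports its t-test and R², and a Shapiro-Wilk test on the residuals checks the normality assumption.

```r
# Hypothetical site-level data: 15 river sites, mean water temperature (deg C)
# and mean fish body length (cm); all values are simulated
set.seed(42)
fish <- data.frame(temp = runif(15, min = 12, max = 26))
fish$length <- 30 - 0.6 * fish$temp + rnorm(15, sd = 1.5)

# Fit body size as a function of water temperature
mod <- lm(length ~ temp, data = fish)
print(summary(mod))   # slope estimate, its t-test, and R-squared

# Normality of residuals (one of the model assumptions)
print(shapiro.test(residuals(mod)))
```

In practice, plot(mod) would also be used to inspect the residual plots for non-linearity and heteroscedasticity before interpreting the slope.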

Scenario 3:

  1. Analysis of covariance (ANCOVA).
  2. ANCOVA is appropriate for comparing means across multiple groups (habitat types) while controlling for a covariate (altitude).
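  3. Dependent variable: bird abundance (a count, often treated as continuous); independent variable: habitat type (categorical, five levels); covariate: altitude (continuous).
  4. ANCOVA allows inference about differences in mean bird abundance among habitats after adjusting for altitude, but it assumes homogeneity of regression slopes (the altitude effect is the same in every habitat), a linear effect of the covariate, homogeneity of variances, and normality of residuals.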
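A minimal sketch of the Scenario 3 ANCOVA with simulated data; the habitat names, altitudes, and effect sizes are hypothetical. It first fits the habitat × altitude interaction to check homogeneity of regression slopes, then fits the ANCOVA with altitude entered first, so that anova()'s sequential sums of squares test habitat after adjusting for altitude.

```r
# Hypothetical data: 8 survey points in each of 5 habitat types
set.seed(7)
habitats <- c("forest", "grassland", "wetland", "shrubland", "urban")
birds <- data.frame(
  habitat  = factor(rep(habitats, each = 8)),
  altitude = runif(40, min = 200, max = 1800)   # metres
)

# Illustrative habitat means plus a negative altitude effect and noise
hab_effect <- c(forest = 30, grassland = 22, wetland = 26,
                shrubland = 24, urban = 15)
birds$abundance <- hab_effect[as.character(birds$habitat)] -
  0.004 * birds$altitude + rnorm(40, sd = 3)

# 1. Homogeneity of slopes: a non-significant interaction supports ANCOVA
slopes_mod <- lm(abundance ~ habitat * altitude, data = birds)
print(anova(slopes_mod))

# 2. ANCOVA: covariate first, so habitat is tested after adjusting for altitude
ancova_mod <- lm(abundance ~ altitude + habitat, data = birds)
print(anova(ancova_mod))
```

Because abundance is a count, real data might also warrant a count-appropriate model; the sketch simulates it as continuous for simplicity.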

Scenario 4:

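  1. Pearson's product-moment correlation (or Spearman's rank correlation if the data are not approximately bivariate normal, or the relationship is monotonic but not linear).
  2. The scenario asks whether two continuous variables are linearly associated, not whether one predicts or causes the other. Correlation quantifies the strength and direction of such an association without designating a response and a predictor; regression would be preferred only if the aim were to predict heart rate from body temperature.
  3. No dependent/independent distinction is required: heart rate (continuous) and body temperature (continuous) are treated symmetrically.
  4. Pearson's r indicates the strength and direction of the linear association and allows a test of whether it differs from zero, but it does not imply causation, captures only linear relationships, is sensitive to outliers, and its significance test assumes approximate bivariate normality. Spearman's rho relaxes the normality and linearity requirements but measures monotonic association between ranks.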
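For Scenario 4, it may help to see how correlation and simple regression relate. The sketch below uses simulated heart-rate and body-temperature values (hypothetical, not real physiological data) and computes Pearson's r with cor.test(), Spearman's rho as the rank-based alternative, and a simple regression with lm(). For two variables, the p-value for Pearson's r equals the p-value for the regression slope, and r² equals the regression R²; what differs is the framing, since correlation treats the variables symmetrically whereas regression designates a response.

```r
# Hypothetical, simulated data for 20 heat-stressed animals
set.seed(3)
n <- 20
body_temp  <- rnorm(n, mean = 39.5, sd = 0.8)                      # deg C
heart_rate <- 60 + 12 * (body_temp - 39.5) + rnorm(n, sd = 8)      # beats per minute

# Pearson correlation: strength and direction of the linear association
pear <- cor.test(body_temp, heart_rate, method = "pearson")
print(pear)

# Spearman rank correlation: monotonic association, no normality assumption
spear <- cor.test(body_temp, heart_rate, method = "spearman")
print(spear)

# Simple regression of heart rate on body temperature, for comparison
fit <- lm(heart_rate ~ body_temp)
print(summary(fit))
```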

Scenario 5:

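  1. One-way ANOVA (or the Kruskal-Wallis test if the data are not normally distributed; with small samples this is often the safer choice).
  2. One-way ANOVA is appropriate for comparing means across three or more independent groups (fertiliser types). The Kruskal-Wallis test is a non-parametric alternative that does not assume normality, an assumption that is hard to verify with small samples.
  3. Dependent variable: plant height (continuous); independent variable: fertiliser type (categorical, three levels).
  4. One-way ANOVA allows inference about differences in mean plant height among fertiliser types, but it assumes normality of residuals and homogeneity of variances, both of which are difficult to check with small samples. The Kruskal-Wallis test does not assume normality, but it compares ranks rather than means. In either case, a significant result shows only that at least one group differs; post hoc comparisons (e.g., Tukey's HSD after ANOVA, or Dunn's test after Kruskal-Wallis) are needed to identify which groups differ, and small samples give both tests low power to detect real differences.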
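A minimal sketch of Scenario 5 with simulated data, assuming (hypothetically) five plants per fertiliser group; group means and spread are illustrative. It runs the parametric route (aov(), residual and variance checks, then TukeyHSD()) and the non-parametric kruskal.test(). With n = 5 per group, shapiro.test() and bartlett.test() have little power, so a non-significant result is weak evidence that the assumptions hold.

```r
# Hypothetical plant heights (cm), five plants per fertiliser type
set.seed(11)
plants <- data.frame(
  fertiliser = factor(rep(c("organic", "inorganic", "control"), each = 5),
                      levels = c("control", "organic", "inorganic")),
  height = c(rnorm(5, mean = 24, sd = 3),   # organic
             rnorm(5, mean = 28, sd = 3),   # inorganic
             rnorm(5, mean = 20, sd = 3))   # control
)

# Parametric route: one-way ANOVA
aov_mod <- aov(height ~ fertiliser, data = plants)
print(summary(aov_mod))

# Assumption checks (low power with such small groups)
print(shapiro.test(residuals(aov_mod)))                 # normality of residuals
print(bartlett.test(height ~ fertiliser, data = plants)) # equal variances

# Post hoc pairwise comparisons after ANOVA
print(TukeyHSD(aov_mod))

# Non-parametric route: Kruskal-Wallis rank-sum test
kw <- kruskal.test(height ~ fertiliser, data = plants)
print(kw)
```

Dunn's post hoc test after Kruskal-Wallis is not part of base R and would require an add-on package.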

Question 2

Science does not rely on certainty but on scepticism and structured doubt. Its strength lies not in a claim to final truth but in its capacity to generate reliable, revisable knowledge through empirical observation, theoretical coherence, and methodological transparency.

In contrast, faith-based systems appeal to revelation, authority, or moral intuition – forms of conviction that do not invite or value independent verification. Yet both systems organise trust. What, then, distinguishes scientific knowledge from belief? What makes the scientific method a unique epistemological endeavour?

Question: What is the basis of knowledge in the scientific method, and how does this differ from the basis of knowledge in faith-based systems such as religion or mysticism? In your answer, consider the roles of observation, verification, theoretical coherence, and error correction in scientific reasoning, and contrast these with how knowledge is “made real” in non-empirical approaches.

[15 marks]

Answer

Question 3

Throughout history, the development of statistical reasoning has been shaped not just by mathematical discoveries, but by synergies across intellectual traditions, technological innovation, and societal imperatives. From ancient record-keeping and proto-quantification, through the epistemic insights of the Renaissance and Enlightenment, to the formalisation of probabilistic thinking, statistics has evolved alongside shifting ideas about what it means to know, to measure, and to infer.

Question: How have historical interactions between these forces – ideas, instruments, and institutions – shaped the philosophy underpinning statistical practice as we know it today? In your response, identify and critically examine what you consider, with justification, to be five major conceptual or methodological turning points. These may include developments in logical reasoning, technological breakthroughs that extended observational capacity, institutional needs for demographic governance, or shifts in philosophical approaches to uncertainty and knowledge.

Your analysis should not simply recount historical facts, but provide a reasoned argument about how each moment contributed to the emergence of statistics as a knowledge framework – that is, not just a set of techniques, but a way of thinking about the world.

[20 marks]

Answer

Question 4

Statistical reasoning begins with our wish to learn about something large and often inaccessible by examining something smaller and manageable. The credibility of this approach – from observed data to broader inference – depends on how we conceptualise and structure the relationship between what we observe and what we want to know.

This question asks that you examine the important terms and principles that make this act of inference possible.

Question: What do statisticians mean by “population” and “sample”? Define each term clearly, and explain the distinction between them. How are they related in practice, and how does the method of sampling affect the validity of estimates for population parameters such as the mean and dispersion? Support your discussion with examples where appropriate.

[10 marks]

Answer

Question 5

Words shape our thoughts, and nowhere is this more consequential than in science, where terminological precision goes hand-in-hand with conceptual clarity. Statistical terms like “random” or “stochastic” carry specific meanings in the context of probabilistic logic and mathematical formalism. Yet in everyday language, such terms are often misused. They are flattened into colloquialisms that only hint at their true meaning. This insidious slippage is more than semantic; it has consequences for how we value knowledge.

Why does it matter if “random” is used imprecisely? How do scientific concepts become confused, or even trivialised, when technical language is absorbed into everyday language without regard for its analytic structure?

Question: Discuss the scientific meaning of “random” and contrast it with its colloquial usage. Why is this distinction important for statistical reasoning, and how can imprecise language lead to conceptual misunderstandings? In your answer, consider how terms like “haphazard” and “unpredictable” differ from “random,” and evaluate the knowledge implications of using such words loosely in scientific or public discourse.

[10 marks]

Answer

Question 6

Your task is to design a hypothetical study that could lead to a statistical analysis using one of the following methods:

  • One-way ANOVA
  • Simple linear regression
  • Pearson or Spearman correlation

Your study may involve field sampling, a laboratory experiment, or observational data – what matters is that your design aligns meaningfully with the statistical method you choose.

In your answer, do the following:

  1. Describe your hypothetical experiment or sampling campaign.
    • Outline what you are investigating, how data will be collected, and what your variables are. Be clear about their measurement scale (categorical, continuous) and expected behaviour.
    • Present this as a formally written Methods section suitable for a peer-reviewed publication.
  2. Justify the statistical method you have chosen.
    • Explain why your design is appropriate for ANOVA, regression, or correlation.
  3. Formally state the null and alternative hypotheses as they would be tested in the chosen analysis.
  4. Show a portion of the pseudo-data as it would appear when using the head() or tail() functions in R.
    • This should be a small, representative sample of the data you would collect.
  5. Describe the sequence of analytical steps you would take – from raw data to final conclusion.
    • Include any relevant assumptions, diagnostic checks, or transformations that may be required before interpreting the results.
  6. Write a hypothetical Results section that summarises the findings of your analysis.
    • This should include a brief interpretation of the statistical output, including relevant pseudo-tables or pseudo-figures.

Your answer should reflect an understanding of the logic and structure of statistical inference, from design to decision. You are welcome to use R and RStudio to generate any data, tables, and graphs, should you wish.
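To illustrate the kind of output item 4 refers to, here is a minimal sketch of how pseudo-data might be simulated in R and previewed with head() and tail(). The column names (plot_id, treatment, response), the group means, and the sample size are hypothetical placeholders, not a prescribed design.

```r
# Hypothetical pseudo-data: 30 plots, a three-level treatment, one response
set.seed(2025)
pseudo <- data.frame(
  plot_id   = sprintf("P%02d", 1:30),                     # placeholder plot labels
  treatment = factor(rep(c("low", "medium", "high"), each = 10),
                     levels = c("low", "medium", "high")),
  response  = round(c(rnorm(10, mean = 12, sd = 2),
                      rnorm(10, mean = 15, sd = 2),
                      rnorm(10, mean = 18, sd = 2)), 1)
)

print(head(pseudo))        # first six rows
print(tail(pseudo, n = 3)) # last three rows
str(pseudo)                # variable types: character, factor, numeric
```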

[25 marks]

Answer

TOTAL MARKS: 100

– THE END –


Citation

BibTeX citation:
@online{smit2025,
  author = {Smit, A. J.},
  title = {BCB744: {Biostatistics} {R} {Test}},
  date = {2025-04-25},
  url = {http://tangledbank.netlify.app/assessments/BCB744_Biostats_Theory_Test_2025.html},
  langid = {en}
}
For attribution, please cite this work as:
Smit, A. J. (2025) BCB744: Biostatistics R Test. http://tangledbank.netlify.app/assessments/BCB744_Biostats_Theory_Test_2025.html.