Biostatistics R Exam (Example)

A. J. Smit

2025-05-30

About the Exam

The Biostatistics Exam starts at 08:30 on 30 May 2025, and you have until 08:30 on 31 May 2025 to complete it. You may write the exam from anywhere in the world. It contributes 70% of the final assessment marks for the Biostatistics component of the module.

Assessment Criteria

Your responses will be evaluated based on the following criteria:

  1. Technical Accuracy (50%)
  2. Depth of Analysis (20%)
  3. Clarity and Communication (20%)
  4. Critical Thinking Shown in Final Conclusion/Synopsis (10%)

General Notes to Assessor (applies to all tasks):

The marks indicated for each task reflect the relative weight and expected depth of your response. Focus on demonstrating both technical proficiency and conceptual understanding in your answers.

Instructions

This is an open-book assessment.

You must address all tasks within the allocated 24 hours. Please prepare your answers as a neatly formatted .html document (rendered from a Quarto document in RStudio) and submit it to the iKamva platform.

Clearly structure the document according to the task numbers, i.e., use appropriately hierarchical headings, subheadings, and sub-subheadings to structure your document logically.

Naming convention: Biostatistics_Prac_Exam_YourSurname.html

Background

These data represent the aerial cover of kelp canopy in South Africa, as measured by Landsat satellites, for the period 1984 to 2024 at a quarterly interval. The intention is to understand the spatio-temporal patterns in kelp canopy cover and to explore how these patterns may be related to coastal sections and biogeographical provinces.

You are provided with three datasets at the Google Drive link emailed to you:

  1. A table of 58 coastal sections (58_sections.csv) that partitions the South African coastline into approximately 50 km intervals. Each section is defined by a single coordinate point (latitude, longitude) representing the boundary of the section.
  2. A table of the biogeographical provinces (bioregions.csv) that the 58 coastal sections fall within. There is one row for each of the 58 sections. For this exercise, the biogeographical classification by Professor John Bolton is of interest.
  3. A NetCDF file (kelpCanopyFromLandsat_SouthAfrica_v04.nc) of kelp sampling locations and aerial cover data – these are stored as several variables at grid points across time.

Task 1: Initial Processing

You are provided with a NetCDF file that contains satellite-derived measurements of kelp canopy area across the South African coastline from 1984 to 2024, sampled quarterly. Each observation corresponds to a grid cell at a specific time point.

  1. Read the kelp canopy area, time, location (latitude/longitude), and satellite pass data from the NetCDF file. Once unpacked, it contains over 5 million rows. Your processing workflow will include:
  2. Restructure the data into a data.table or data.frame:

If you are unable to read the NetCDF file, you may request access to a processed version of this file (in long CSV format) from me, but you’ll be penalised by 10% if you do so.
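
As a minimal sketch of the restructuring step in Task 1, assume the canopy area and pass counts come back from the NetCDF reader as matrices with one row per grid cell and one column per time step. That layout, and every object name below, is an assumption for illustration and is not taken from the file itself. Under that assumption, a long table can be built in base R:

```r
# Toy stand-ins for arrays read from the NetCDF file (hypothetical names and layout).
lon     <- c(18.30, 18.35, 18.40)   # one value per grid cell
lat     <- c(-34.10, -34.15, -34.20)
year    <- c(1984, 1984)            # one value per time step
quarter <- c(1, 2)

# Rows = grid cells, columns = time steps.
area   <- matrix(c(10, NA, 30, 15, 25, 35), nrow = 3, ncol = 2)
passes <- matrix(c(2, 0, 1, 3, 1, 2), nrow = 3, ncol = 2)

# as.vector() unrolls a matrix column by column, so the cell index
# must vary fastest and the time index slowest.
n_cell <- length(lon)
n_time <- length(year)
kelp_long <- data.frame(
  lon     = rep(lon, times = n_time),
  lat     = rep(lat, times = n_time),
  year    = rep(year, each = n_cell),
  quarter = rep(quarter, each = n_cell),
  area    = as.vector(area),
  passes  = as.vector(passes)
)

# Drop cells with no valid observation in that quarter.
kelp_long <- kelp_long[!is.na(kelp_long$area) & kelp_long$passes > 0, ]
```

If the real arrays are stored with time as the first dimension, the `times` and `each` arguments would need to be swapped, so it is worth checking the dimension order before unrolling.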

Task 2: Exploratory Data Analysis

2.1 Weighted Mean Time Series

  1. For each year and quarter combination:
  2. Compute the weighted mean area at each unique (longitude, latitude) pixel across time. Then:
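
One way to sketch the year and quarter aggregation, assuming a long table with columns `year`, `quarter`, `area`, and `passes` (the column names and the toy values are assumptions), is to split by group and apply `weighted.mean()` with the pass counts as weights:

```r
# Toy long-format data; column names are assumptions for illustration.
kelp_long <- data.frame(
  year    = c(1984, 1984, 1984, 1984),
  quarter = c(1, 1, 2, 2),
  area    = c(10, 30, 20, 40),
  passes  = c(1, 3, 2, 2)
)

# Split by year-quarter combination and apply weighted.mean() to each group.
groups <- split(kelp_long, list(kelp_long$year, kelp_long$quarter), drop = TRUE)
kelp_yq <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    year    = g$year[1],
    quarter = g$quarter[1],
    w_area  = weighted.mean(g$area, w = g$passes)
  )
}))
rownames(kelp_yq) <- NULL
```

The same pattern applies to the per-pixel version by splitting on longitude and latitude instead. On the full dataset a grouped operation in data.table or dplyr would likely be much faster than `split()`.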

2.2 Summary Statistics

  1. Using the weighted data for each year and quarter combination (prepared in 2.1.1), compute and report summary statistics for the levels of temporal aggregation:
  2. Create visualisations (e.g. boxplots, violin plots, histograms) to support your interpretations.
  3. Based on these, discuss any discernible temporal trends (e.g. decadal increases/decreases) and seasonal patterns (quarterly effects).
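
A small helper along the following lines could produce the grouped summaries. It is only a sketch: the table `kelp_yq`, its column `w_area`, and the helper name `summarise_by` are hypothetical, and the choice of statistics is illustrative.

```r
# Toy year-quarter table of weighted means (values are made up).
kelp_yq <- data.frame(
  year    = rep(c(1984, 1985), each = 4),
  quarter = rep(1:4, times = 2),
  w_area  = c(20, 30, 25, 15, 24, 34, 27, 19)
)

# Hypothetical helper: basic summary statistics of `values` within each group.
# tapply() returns results in sorted group order, matching sort(unique(group)).
summarise_by <- function(values, group) {
  data.frame(
    group  = sort(unique(group)),
    n      = as.vector(tapply(values, group, length)),
    mean   = as.vector(tapply(values, group, mean)),
    median = as.vector(tapply(values, group, median)),
    sd     = as.vector(tapply(values, group, sd))
  )
}

by_quarter <- summarise_by(kelp_yq$w_area, kelp_yq$quarter)
by_year    <- summarise_by(kelp_yq$w_area, kelp_yq$year)
```

The same helper could be reused for other groupings, for example a decade column derived from `year`.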

2.3 Observation Density Map

Create a map plotting each observed pixel location (defined by longitude × latitude):

Task 3: Inferential Statistics (Part 1)

You are now asked to formally test whether the weighted mean kelp canopy area has changed over time, and whether it shows evidence of seasonal variation.

You should:

  1. Formulate and clearly state the null and alternative hypotheses for each of the following:
  2. Choose and implement a statistical model appropriate to this task.

You may also consider:

The model you choose should reflect your understanding of the data structure and the nature of the questions being asked.

  3. Justify your modelling approach, including:
  4. Present and interpret your results as you would in a scientific paper.

Task 4: Assigning Kelp Observations to Coastal Sections

Using the data prepared above, your task now is to spatially classify each kelp canopy observation by assigning it to two types of geographic units.

4.1 Assignment to Coastal Sections

You are provided with a table of 58 coastal sections, each defined by a single geographic coordinate (Latitude and Longitude). These points mark successive ~50 km intervals along the South African coastline, numbered from west (1) to east (58).

Assign each kelp canopy observation to the nearest coastal section based on geographic proximity:
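
A minimal sketch of nearest-section assignment follows, using the haversine great-circle distance as one reasonable definition of "geographic proximity". The coordinates, column names, and the helper `haversine_km` are assumptions for illustration, and other distance measures would also be defensible.

```r
# Great-circle distance (km) between points given in decimal degrees.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Toy inputs; coordinates and column names are made up for illustration.
sections <- data.frame(
  section_id = 1:3,
  lon = c(18.0, 18.5, 19.0),
  lat = c(-33.0, -34.0, -34.5)
)
kelp <- data.frame(
  lon = c(18.1, 18.9),
  lat = c(-33.2, -34.4)
)

# For each observation, pick the section with the smallest distance.
kelp$section_id <- vapply(seq_len(nrow(kelp)), function(i) {
  d <- haversine_km(kelp$lon[i], kelp$lat[i], sections$lon, sections$lat)
  sections$section_id[which.min(d)]
}, integer(1))
```

With several million rows, looping over every observation is slow. Because pixel locations repeat through time, it is usually cheaper to assign sections to the unique (longitude, latitude) pairs once and then join the result back onto the full table.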

4.2 Assignment to Biogeographical Provinces

You are also provided with a table that maps each coastal section (1–58) to a biogeographical province, based on a classification by Professor John Bolton.
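
Once each observation carries a section number, the province can be attached with a lookup join. The sketch below uses `merge()` on toy tables, where the column names `section_id` and `bioregion_id` follow the wording of Task 5 and the province labels are placeholders:

```r
# Toy tables; province labels "A" and "B" are placeholders.
kelp <- data.frame(section_id = c(1L, 3L, 3L), w_area = c(12, 40, 35))
bioregions <- data.frame(
  section_id   = 1:3,
  bioregion_id = c("A", "A", "B")
)

# Left join: keeps every kelp row, including any without a match.
kelp_bio <- merge(kelp, bioregions, by = "section_id", all.x = TRUE)
```

Checking that no `bioregion_id` is `NA` after the join is a quick way to confirm that every section found a province.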

Task 5: Inferential Statistics (Part 2)

You are now asked to evaluate a series of research questions concerning the spatial and temporal structure of kelp canopy area. These questions are to be answered using the kelp dataset that has already been processed to include both section_id and bioregion_id. Use the weighted kelp canopy area (area, weighted by passes) as your response variable throughout – you should have already prepared this dataset in Task 2.

You may use ANOVAs and/or linear models. In each case you must clearly state your hypotheses, justify your choice of model, and interpret your findings both statistically and ecologically.

5.1 Spatial Differences Between Coastal Sections

Question: Is there a statistically significant difference in mean kelp canopy area between coastal sections?

5.2 Spatial Differences Between Biogeographical Provinces

Question: Is there a statistically significant difference in mean kelp canopy area between biogeographical provinces?

5.3 Interaction Between Section and Province

Question: Is there an interaction between coastal section and biogeographical province in explaining variation in kelp canopy area?

5.4 Linear Trend Over Time by Province

Question: Is there a linear trend in kelp canopy area over time, and does the direction or strength of this trend differ between biogeographical provinces?
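
One possible framing of this question, shown on simulated data rather than the kelp data, is a linear model with a year by province interaction. The data, the province labels, and the variable names below are all invented for illustration. The interaction term is the part that addresses whether slopes differ between provinces.

```r
set.seed(1)

# Simulated data: two placeholder provinces with different true slopes.
dat <- expand.grid(year = 1984:2024, bioregion_id = c("A", "B"))
true_slope <- ifelse(dat$bioregion_id == "A", 0.5, -0.2)
dat$w_area <- 100 + true_slope * (dat$year - 1984) + rnorm(nrow(dat), sd = 2)

# Shift year so that the intercept refers to 1984 rather than year 0.
dat$year0 <- dat$year - 1984

# year0              : slope in the reference province ("A")
# year0:bioregion_id : difference in slope between provinces
fit <- lm(w_area ~ year0 * bioregion_id, data = dat)
anova(fit)
coef(fit)
```

Repeated measurements through time are often autocorrelated, so the residuals of a model like this would need checking before the p-values are trusted.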

5.5 Seasonal Variation Across Provinces

Question: Does the seasonal pattern in kelp canopy area differ between provinces?

General Instructions for Task 5 (above)

For each sub-question above, consider:

You are not required to use the same modelling approach for all five sub-questions, though consistency across related questions is encouraged.

Task 6: Write-up

Write a short report (maximum 2 pages of text) that synthesises your findings across Tasks 2 through 5. This report should be written in the style of the Discussion section of a scientific paper, intended for an ecological audience.

Your goal is to interpret the major patterns and relationships you have identified, and to comment meaningfully on their ecological significance. Your write-up should include:

Format and tone: