BCB744: End-of-Intro-R Assessment

Author: Smit, A. J.

Affiliation: University of the Western Cape

Honesty Pledge

This assignment requires that you work as an individual and not share your code, results, or discussion with your peers. Penalties and disciplinary action will apply if you are found cheating.

Acknowledgement of the Pledge

Copy the statement below into your document and replace the underscores with your name to acknowledge your adherence to UWC’s Honesty Pledge.

I, ____________, hereby state that I have not communicated with or gained information in any way from my peers and that all work is my own.

Format and mode of submission

This Assignment requires submission as both a Quarto (.qmd) file and the knitted .html product. You are welcome to copy any text from here to use as headers or other pieces of informative explanation to use in your Assignment.

Style and organisation

As part of the assessment, we will look for a variety of features, including, but not limited to, the following:

  • Content:
    • Questions answered in order
    • A written explanation of approach included for each question
    • Appropriate formatting of text, for example, fonts not larger than necessary, headings used properly, etc. Be sensible and tasteful.
  • Code formatting:
    • Use Tidyverse code
    • No more than ~80 characters of code per line (pay particular attention to the comments)
    • Application of R code conventions, e.g. spaces around <-, after #, after ,, etc.
    • New line for each dplyr function (lines end in %>%) or ggplot layer (lines end in +)
    • Proper indentation of pipes and ggplot() layers
    • All chunks labelled without spaces
    • No unwanted / commented out code left behind in the document
  • Figures:
    • Sensible use of themes / colours
    • Publication quality
    • Informative and complete titles, axes labels, legends, etc.
    • No redundant features or aesthetics
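
As an illustration of the code formatting points above, here is a minimal sketch that uses the built-in iris data rather than any of the assessment datasets. The chunk label `iris-means`, the object name `iris_means`, and the choice of geom and theme are assumptions made for the example only. In a Quarto chunk the `#| label:` line sets the chunk label; in a plain R script it is just a comment.

```r
#| label: iris-means
library(dplyr)
library(ggplot2)

# Mean sepal length per species; each dplyr verb starts on a new line
iris_means <- iris %>%
  group_by(Species) %>%
  summarise(mean_sepal_length = mean(Sepal.Length),
            .groups = "drop")

# Each ggplot layer on its own line, indented beneath ggplot()
ggplot(iris_means, aes(x = Species, y = mean_sepal_length)) +
  geom_col(fill = "grey60") +
  labs(x = "Species", y = "Mean sepal length (cm)") +
  theme_minimal()
```

Note the spaces around `<-` and `=`, the space after each `#` and `,`, and that no line exceeds about 80 characters.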

Questions

Question 1

The shells.csv data

  • Produce a tidy dataset from the data contained in shells.csv.
  • For each species, relate two measurement variables within the dataset to one another and represent the relationship with a straight line.
  • For each species, concisely produce histograms for each of the measurement variables.
  • Use the colorspace package and assign interesting colours to your graphs (all graphs above).
  • Use the ggthemr package and assign interesting themes to your graphs (all graphs above).

Question 2

Head Dimensions in Brothers

The boot::frets data: The data consist of measurements of the length and breadth of the heads of pairs of adult brothers in 25 randomly sampled families. All measurements are expressed in millimetres.

Please consult the dataset’s help file (i.e., load the boot package and type ?frets on the command line).

  • Create a tidy dataset from the frets data.
  • Demonstrate the most concise way of displaying both brothers’ data on one set of axes.
  • Apply your own unique theme modification to the graph in order to produce a publication-worthy figure.

Question 3

Results from an Experiment on Plant Growth

The datasets::PlantGrowth data: Results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions.

  • Concisely present the results of the plant growth experiment as graphs:
    • a scatterplot with individual weight datapoints as a function of group
    • a box and whisker plot showing each group (on one set of axes)
    • a bar plot with associated SD for each group (on one set of axes)

Question 4

Student’s Sleep Data

The datasets::sleep data: Data which show the effect of two soporific drugs (increase in hours of sleep compared to control) on 10 patients.

  • Graphically display these data in two different ways.

Question 5

English Narrative for Some Code

  • Provide an English description of what the following lines of code do.

Listing 1

the_data <- some_data %>%
  mutate(yr = year(date),
         mo = month(date)) %>% 
  group_by(country, yr) %>% 
  summarise(med_chl = mean(chl, na.rm = TRUE)) %>% 
  ungroup()
ggplot(the_data, aes(x = yr, y = med_chl)) +
  geom_line(aes(group = country), colour = "blue3") +
  facet_wrap(~country, nrow = 3) +
  labs(x = "Year", y = "Chlorophyll-a (mg/m3)",
       title = "Chlorophyll-a concentration")

Listing 2

library(ggforce)
ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
    geom_point() +
    facet_zoom(x = Species == "versicolor")

Listing 3

set.seed(13)
my_data = data.frame(
        gender = factor(rep(c("F", "M"), each=200)),
        length = c(rnorm(200, 55), rnorm(200, 58)))
head(my_data)

ggplot(my_data, aes(x = gender, y = length)) +
  geom_boxplot(aes(fill = gender))

ggplot(my_data, aes(x = gender, y = length)) +
  geom_violin()

ggplot(my_data, aes(x = gender, y = length)) +
  geom_dotplot(stackdir = "center", binaxis = "y", dotsize = 0.5)

Question 6

Create panels of plots

  • For this exercise, you’ll be expected to accomplish Parts 1, 2 and 3 before producing the final output in Part 4.
  • Considerations:
    • take care to use the most appropriate geom considering the nature of the data
    • creatively modify the graph’s appearance (but remain sensible and be cognisant of which aesthetics are suitable for publications!)

Part 1

The datasets::AirPassengers data

  • Create a plot of the monthly totals of international airline passengers, 1949 to 1960.
  • Construct a figure showing the annual number of airline passengers (±SE) from 1949-1960.
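
For reference, the standard error (SE) of a mean is the sample standard deviation divided by the square root of the number of observations. The sketch below assumes that this is the statistic intended by "±SE". The helper name `standard_error` and the example values are hypothetical and unrelated to the AirPassengers data.

```r
# Standard error of the mean: sample SD divided by sqrt(n),
# computed on the non-missing values only
standard_error <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Hypothetical values, used only to show the call
example_values <- c(2, 4, 4, 4, 5, 5, 7, 9)
standard_error(example_values)
```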

Part 2

The datasets::Loblolly and the datasets::Orange data

These are some data collected from two kinds of trees at different ages.

  • Devise a figure with a two-panel 2 x 1 (rows x columns) layout showing:
    • the relationship between age and height independently for each seed source for the Loblolly data
    • the relationship between age and circumference for each tree

Part 3

Your ‘own’ data

  • Find your own dataset (one that has not been used in this Assessment or earlier in the BCB744 module) and create a pair of faceted figures of your choice.
  • Provide an explanation of what you aim to show, and what the figure ultimately tells you.

Part 4

The last steps

  • Assemble all graphs (Parts 1-3) into a 2 x 2 layout using a suitable function provided by an appropriate R package. Note that only three of the four facets will be occupied by the figures you created in Parts 1-3.

Question 7

The datasets::UKDriverDeaths and datasets::Seatbelts datasets

These datasets are meant to be used together: UKDriverDeaths contains the same data as the variable drivers in Seatbelts, but it also provides information about the temporal structure of the Seatbelts dataset. You will have to devise a way to use this temporal information in your analysis.

  • Produce a dataframe that combines the temporal information provided in UKDriverDeaths with the other information in Seatbelts.
  • Produce a faceted graph (using facet_wrap(), placing drivers, front, rear, and VanKilled in facets) showing a timeline of monthly means of deaths (means taken across years) whilst distinguishing between the two levels of law.
  • What do you conclude from your analysis?
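
As background on what "temporal structure" means for a time series object, the sketch below shows how base R stores and exposes it. It deliberately uses a different built-in monthly series, `datasets::nottem`, and the data frame name `ts_df` is an assumption for the example. It is one possible way of recovering year and month from a `ts` object, not a prescribed approach.

```r
# `nottem` is a built-in monthly time series, used here for illustration
x <- datasets::nottem

start(x)       # first year and period of the series
frequency(x)   # number of observations per year

# cycle() gives the position within the year (1 to 12 for monthly data);
# time() gives decimal years, so subtracting the within-year fraction
# and rounding recovers the calendar year without floating-point issues
month <- as.integer(cycle(x))
year <- as.integer(round(as.numeric(time(x)) - (month - 1) / frequency(x)))

ts_df <- data.frame(year = year,
                    month = month,
                    value = as.numeric(x))
head(ts_df)
```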

Submission instructions

Submit your .qmd and .html files wherein you provide answers to these Questions by no later than 6 March 2024 at 16:00.

Label the files as follows:

  • BCB744_<first_name>_<last_name>_Intro_R_Assessment.qmd, and

  • BCB744_<first_name>_<last_name>_Intro_R_Assessment.html

(the < and > must be omitted as they are used in the example as field indicators only).
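
As a quick check of the naming pattern, the file names can be built in R. The names "Jane" and "Doe" below are placeholders, and the object names are assumptions for the example.

```r
# Placeholder names; substitute your own first and last name
first_name <- "Jane"
last_name <- "Doe"

# Fill the two fields of the required pattern, without < or >
file_stub <- sprintf("BCB744_%s_%s_Intro_R_Assessment",
                     first_name, last_name)
file_names <- paste0(file_stub, c(".qmd", ".html"))
file_names
```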

Failing to follow these instructions carefully, precisely, and thoroughly will cause you to lose marks, which could cause a significant drop in your score as formatting counts for 15% of the final mark (out of 100%).

Submit your Tasks on the Google Form when ready.

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:
@online{smit_aj,
  author = {Smit, A. J.},
  title = {BCB744: {End-of-Intro-R} {Assessment}},
  date = {},
  url = {http://tangledbank.netlify.app/assessments/BCB744_Mid_Assessment_2023.html},
  langid = {en}
}
For attribution, please cite this work as:
Smit, A. J. BCB744: End-of-Intro-R Assessment. http://tangledbank.netlify.app/assessments/BCB744_Mid_Assessment_2023.html.
Source Code
---
title: "BCB744: End-of-Intro-R Assessment"
subtitle: "Final Intro R Assessment"
date: "Due: 4 March 2023 at 16:00"
format:
  html:
    code-fold: false
    toc-title: "On this page"
    standalone: true
    toc-location: right
    page-layout: full
    embed-resources: true
number-sections: false
---



## Honesty Pledge

**This assignment requires that you work as an individual and not share
your code, results, or discussion with your peers. Penalties and
disciplinary action will apply if you are found cheating.**

::: callout-note
## Acknowledgement of the Pledge

Copy the statement below into your document and replace the
underscores with your name to acknowledge your adherence to UWC's
Honesty Pledge.

**I, \_\_\_\_\_\_\_\_\_\_\_\_, hereby state that I have not communicated
with or gained information in any way from my peers and that all work is
my own.**
:::

## Format and mode of submission

This Assignment requires submission as both a Quarto (.qmd) file and the
knitted .html product. You are welcome to copy any text from here to use
as headers or other pieces of informative explanation to use in your
Assignment.

## Style and organisation

As part of the assessment, we will look for a variety of features,
including, but not limited to, the following:

-   Content:
    -   Questions answered in order
    -   A written explanation of approach included for each question
    -   Appropriate formatting of text, for example, fonts not larger
        than necessary, headings used properly, etc. Be sensible and
        tasteful.
-   Code formatting:
    -   Use **Tidyverse** code
    -   No more than \~80 characters of code per line (pay particular
        attention to the comments)
    -   Application of [R code
        conventions](http://adv-r.had.co.nz/Style.html), e.g. spaces
        around `<-`, after `#`, after `,`, etc.
    -   New line for each `dplyr` function (lines end in `%>%`) or
        `ggplot` layer (lines end in `+`)
    -   Proper indentation of pipes and `ggplot()` layers
    -   All chunks labelled without spaces
    -   No unwanted / commented out code left behind in the document
-   Figures:
    -   Sensible use of themes / colours
    -   Publication quality
    -   Informative and complete titles, axes labels, legends, etc.
    -   No redundant features or aesthetics

```{r load-packages, message=FALSE}
#| echo: false
#| eval: false
library(tidyverse)
library(ggpubr)
```

## Questions

### Question 1

**The [`shells.csv`](../data/shells.csv) data**

* Produce a *tidy* dataset from the data contained in [`shells.csv`](../data/shells.csv).
* For each species, relate two measurement variables within the dataset to one another and represent the relationship with a straight line.
* For each species, concisely produce histograms for each of the measurement variables.
* Use the **colorspace** package and assign interesting colours to your graphs (all graphs above).
* Use the **ggthemr** package and assign interesting themes to your graphs (all graphs above).

### Question 2

**Head Dimensions in Brothers**

The `boot::frets` data: The data consist of measurements of the length and breadth of the heads of pairs of adult brothers in 25 randomly sampled families. All measurements are expressed in millimetres.

Please consult the dataset's help file (i.e., load the **boot** package and type `?frets` on the command line).

* Create a *tidy* dataset from the `frets` data.
* Demonstrate the most concise way of displaying both brothers' data on one set of axes.
* Apply your own unique theme modification to the graph in order to produce a publication-worthy figure.

### Question 3

**Results from an Experiment on Plant Growth**

The `datasets::PlantGrowth` data: Results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions.

* Concisely present the results of the plant growth experiment as graphs:
    - a scatterplot with individual `weight` datapoints as a function of `group`
    - a box and whisker plot showing each `group` (on one set of axes)
    - a bar plot with associated SD for each `group` (on one set of axes)
    
### Question 4

**Student's Sleep Data**

The `datasets::sleep` data: Data which show the effect of two soporific drugs (increase in hours of sleep compared to control) on 10 patients.

* Graphically display these data in two different ways.

### Question 5

**English Narrative for Some Code**

* Provide an English description of what the following lines of code do.

#### Listing 1

```{r, eval=FALSE}
the_data <- some_data %>%
  mutate(yr = year(date),
         mo = month(date)) %>% 
  group_by(country, yr) %>% 
  summarise(med_chl = mean(chl, na.rm = TRUE)) %>% 
  ungroup()
```

```{r, eval=FALSE}
ggplot(the_data, aes(x = yr, y = med_chl)) +
  geom_line(aes(group = country), colour = "blue3") +
  facet_wrap(~country, nrow = 3) +
  labs(x = "Year", y = "Chlorophyll-a (mg/m3)",
       title = "Chlorophyll-a concentration")
```

#### Listing 2

```{r, eval=FALSE}
library(ggforce)
ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
    geom_point() +
    facet_zoom(x = Species == "versicolor")
```

#### Listing 3

```{r, eval=FALSE}
set.seed(13)
my_data = data.frame(
        gender = factor(rep(c("F", "M"), each=200)),
        length = c(rnorm(200, 55), rnorm(200, 58)))
head(my_data)

ggplot(my_data, aes(x = gender, y = length)) +
  geom_boxplot(aes(fill = gender))

ggplot(my_data, aes(x = gender, y = length)) +
  geom_violin()

ggplot(my_data, aes(x = gender, y = length)) +
  geom_dotplot(stackdir = "center", binaxis = "y", dotsize = 0.5)
```

### Question 6

**Create panels of plots**

* For this exercise, you'll be expected to accomplish Parts 1, 2 and 3 before producing the final output in Part 4.
* Considerations:
  - take care to use the most appropriate geom considering the nature of the data
  - creatively modify the graph's appearance (but remain sensible and be cognisant of which aesthetics are suitable for publications!)

#### Part 1

**The `datasets::AirPassengers` data**

* Create a plot of the monthly totals of international airline passengers, 1949 to 1960.
* Construct a figure showing the **annual** number of airline passengers (±SE) from 1949-1960.

<!-- ```{r} -->
<!-- library(tidyverse) -->
<!-- air <- datasets::AirPassengers -->

<!-- # just plot the time series object directly -->
<!-- plot(air) -->

<!-- # or make a dataframe and plot with ggplot -->
<!-- air_df <- data.frame(date = seq.Date(as.Date("1949-01-01"), as.Date("1960-12-01"), by = "month"), -->
<!--                      number = as.numeric(datasets::AirPassengers)) -->
<!-- ggplot(air_df, aes(x = date, y = number)) + -->
<!--   geom_line() + -->
<!--   xlab("Date") + ylab("Number of passengers") + -->
<!--   ggtitle("Monthly mean number of passengers") -->
<!-- ``` -->


<!-- ```{r} -->
<!-- air_df |>  -->
<!--   mutate(year = year(date)) |>  -->
<!--   group_by(year) |>  -->
<!--   summarise(mean_number = round(mean(number), 2), -->
<!--             SE = round(sd(number)/sqrt(n()), 2), -->
<!--             .groups = "drop") |>  -->
<!--   ggplot(aes(x = year, y = mean_number)) + -->
<!--   geom_line() + -->
<!--   geom_errorbar(aes(ymin = mean_number - SE, ymax = mean_number + SE), -->
<!--                 width = 0.1) + -->
<!--   xlab("Date") + ylab("Number of passengers") + -->
<!--   ggtitle("Annual mean number of passengers") -->
<!-- ``` -->

#### Part 2

**The `datasets::Loblolly` and the `datasets::Orange` data**

These are some data collected from two kinds of trees at different ages.

* Devise a figure with a two-panel 2 x 1 (rows x columns) layout showing:
    * the relationship between age and height independently for each seed source for the Loblolly data
    * the relationship between age and circumference for each tree

<!-- ```{r} -->
<!-- pines <- datasets::Loblolly -->

<!-- # age vs height -->
<!-- plt1 <- ggplot(pines, aes(x = age, y = height, colour = Seed)) + -->
<!--   geom_line(aes(colour = Seed)) + -->
<!--   ylab("Height (ft)") + xlab("Age (yr)") -->

<!-- orange <- datasets::Orange -->

<!-- plt2 <- ggplot(orange, aes(x = age, y = circumference, colour = Tree)) + -->
<!--   geom_line(aes(colour = Tree)) + -->
<!--   ylab("Circumference (mm)") + xlab("Age (days)") -->

<!-- ggpubr::ggarrange(plt1, plt2, nrow = 2, labels = "AUTO") -->
<!-- ``` -->

#### Part 3

**Your 'own' data**

* Find your own dataset (one that has not been used in this Assessment or earlier in the BCB744 module) and create a pair of faceted figures of your choice.
* Provide an explanation of what you aim to show, and what the figure ultimately tells you.

<!-- ```{r} -->
<!-- ## Make own assessment about validity of this graph -->
<!-- ``` -->

#### Part 4

**The last steps**

* Assemble all graphs (Parts 1-3) into a 2 x 2 layout using a suitable function provided by an appropriate R package. Note that only three of the four facets will be occupied by the figures you created in Parts 1-3.

<!-- ```{r} -->
<!-- # Again, this is self evident. -->
<!-- ``` -->

### Question 7

**The `datasets::UKDriverDeaths` and `datasets::Seatbelts` datasets**

These datasets are meant to be used together: `UKDriverDeaths` contains the same data as the variable `drivers` in `Seatbelts`, but it also provides information about the temporal structure of the `Seatbelts` dataset. You will have to devise a way to use this temporal information in your analysis.

* Produce a dataframe that combines the temporal information provided in `UKDriverDeaths` with the other information in `Seatbelts`.
* Produce a faceted graph (using `facet_wrap()`, placing `drivers`, `front`, `rear`, and `VanKilled` in facets) showing a timeline of monthly means of deaths (means taken across years) whilst distinguishing between the two levels of `law`.
* What do you conclude from your analysis?

<!-- ```{r} -->
<!-- deaths <- datasets::UKDriverDeaths -->
<!-- seatbelts <- datasets::Seatbelts -->

<!-- # checking the data (not necessary) -->
<!-- plot(deaths) -->
<!-- plot(seatbelts) -->

<!-- # make the combined dataframe -->
<!-- df <- data.frame(date = seq.Date(as.Date("1969-01-01"), as.Date("1984-12-01"), by = "month"), -->
<!--                  as.matrix(seatbelts)) -->

<!-- # make long dataframe -->
<!-- df |>  -->
<!--   select(-DriversKilled, -kms, -PetrolPrice) |>  -->
<!--   pivot_longer(cols = c(drivers, front, rear, VanKilled), -->
<!--                names_to = "type", -->
<!--                values_to = "killed") |>  -->
<!--   mutate(year = year(date)) |>  -->
<!--   group_by(year, type, law) |>  -->
<!--   summarise(mean_killed = round(mean(killed), 2), .groups = "drop") |>  -->
<!--   ggplot(aes(x = year, y = mean_killed)) + -->
<!--     geom_line(colour = "grey70") + -->
<!--     geom_point(aes(colour = as.factor(law)), size = 0.3) + -->
<!--     facet_wrap(~type, ncol = 2, scales = "free") + -->
<!--     guides(colour = guide_legend("Law")) + -->
<!--     theme_minimal() + -->
<!--     labs(x = "Year", y = "Mean deaths (count)", -->
<!--          title = "UK Car Driver and Passenger Deaths", -->
<!--          subtitle = "Jan 1969 to Dec 1984") -->
<!-- ``` -->

<!-- The data indicate that the mandatory seatbelt law, introduced on January 31, 1983, appears to have reduced the number of deaths resulting from car crashes for drivers and front passengers of passenger cars, and for the drivers of vans. However, there was no apparent reduction in deaths for rear passengers, likely because the law requiring them to wear seatbelts did not come into effect until 1991. -->

<!-- It should be noted, however, that the interpretation of these results is not entirely clear. From 1969 to December 1982, there was a consistent reduction in deaths, which may have been due to other advancements in car safety features and road safety legislation. Additionally, only two data points are available following the introduction of mandatory seatbelt laws in 1983, and more data are needed to determine whether this reduction was statistically significant. Nevertheless, it is worth noting that the reduction did prove to be statistically significant. -->


## Submission instructions

::: callout-important
## Submission instructions

Submit your .qmd and .html files wherein you provide answers to these
Questions by no later than 6 March 2024 at 16:00.

Label the files as follows:

-   `BCB744_<first_name>_<last_name>_Intro_R_Assessment.qmd`, and

-   `BCB744_<first_name>_<last_name>_Intro_R_Assessment.html`

(the `<` and `>` must be omitted as they are used in the example as
field indicators only).

Failing to follow these instructions carefully, precisely, and
thoroughly will cause you to lose marks, which could cause a significant
drop in your score as formatting counts for 15% of the final mark (out
of 100%).

Submit your Tasks on the Google Form when ready.
:::

© 2025, AJ Smit
