BCB744: End-of-Intro-R Assessment

Author

Affiliation

Published

Invalid Date

Honesty Pledge

This assignment requires that you work as an individual and not share your code, results, or discussion with your peers. Penalties and disciplinary action will apply if you are found cheating.

Acknowledgement of the Pledge

Copy the statement, below, into your document and replace the underscores with your name acknowledging adherence to the UWC’s Honesty Pledge.

I, ____________, hereby state that I have not communicated with or gained information in any way from my peers and that all work is my own.

Format and mode of submission

This Assignment requires submission as both a Quarto (.qmd) file and the knitted .html product. You are welcome to copy any text from here to use as headers or other pieces of informative explanation to use in your Assignment.

Style and organisation

As part of the assessment, we will look for a variety of features, including, but not limited to the following:

Content:
- Questions answered in order
- A written explanation of approach included for each question
- Appropriate formatting of text, for example, fonts not larger than necessary, headings used properly, etc. Be sensible and tasteful.
Code formatting:
- Use Tidyverse code
- No more than ~80 characters of code per line (pay particular attention to the comments)
- Application of R code conventions, e.g. spaces around <-, after #, after ,, etc.
- New line for each dplyr function (lines end in %>%) or ggplot layer (lines end in +)
- Proper indentation of pipes and ggplot() layers
- All chunks labelled without spaces
- No unwanted / commented out code left behind in the document
Figures:
- Sensible use of themes / colours
- Publication quality
- Informative and complete titles, axes labels, legends, etc.
- No redundant features or aesthetics

Questions

Question 1

The shells.csv data

Produce a tidy dataset from the data contained in shells.csv.
For each species, relate two measurement variables within the dataset to one-another and represent the relationship with a straight line.
For each species, concisely produce histograms for each of the measurement variables.
Use the colorspace package and assign interesting colours to your graphs (all graphs above).
Use the ggthemr package and assign interesting themes to your graphs (all graphs above).

Question 2

Head Dimensions in Brothers

The boot::frets data: The data consist of measurements of the length and breadth of the heads of pairs of adult brothers in 25 randomly sampled families. All measurements are expressed in millimetres.

Please consult the dataset’s help file (i.e., load the package boot package and type ?frets on the command line).

Create a tidy dataset from the frets data.
Demonstrate the most concise way for displaying both brother’s data on one set of axes.
Apply your own unique theme modification to the graph in order to produce a publication-worthy figure.

Question 3

Results from an Experiment on Plant Growth

The datasets::PlantGrowth data: Results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions.

Concisely present the results of the plant growth experiment as graphs:
- a scatterplot with individual weight datapoints as a function of group
- a box and whisker plot showing each group (on one set of axes)
- a bar plot with associated SD for each group (on one set of axes)

Question 4

Student’s Sleep Data

The datasets::sleep data: Data which show the effect of two soporific drugs (increase in hours of sleep compared to control) on 10 patients.

Graphically display these data in two different ways.

Question 5

English Narrative for Some Code

Provide an English description for what the following lines of code does.

Listing 1

the_data <- some_data %>%
  mutate(yr = year(date),
         mo = month(date)) %>% 
  group_by(country, yr) %>% 
  summarise(med_chl = mean(chl, na.rm = TRUE)) %>% 
  ungroup()

ggplot(the_data, aes(x = yr, y = med_chl)) +
  geom_line(aes(group = country), colour = "blue3") +
  facet_wrap(~country, nrow = 3) +
  labs(x = "Year", y = "Chlorophyll-a (mg/m3)",
       title = "Chlorophyll-a concentration")

Listing 2

library(ggforce)
ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
    geom_point() +
    facet_zoom(x = Species == "versicolor")

Listing 3

set.seed(13)
my_data = data.frame(
        gender = factor(rep(c("F", "M"), each=200)),
        length = c(rnorm(200, 55), rnorm(200, 58)))
head(my_data)

ggplot(my_data, aes(x = gender, y = length)) +
  geom_boxplot(aes(fill = gender))

ggplot(my_data, aes(x = gender, y = length)) +
  geom_violin()

ggplot(my_data, aes(x = gender, y = length)) +
  geom_dotplot(stackdir = "center", binaxis = "y", dotsize = 0.5)

Question 6

Create panels of plots

For this exercise, you’ll be expected to accomplish Parts 1, 2 and 3 before producing the final output in Part 4.
Considerations:
- take care to use the most appropriate geom considering the nature of the data
- creatively modify the graph’s appearance (but remain sensible and be cognisant of which aesthetics are suitable for publications!)

Part 1

The datasets::AirPassengers data

Create a plot of the monthly totals of international airline passengers, 1949 to 1960.
Construct a figure showing the annual number of airline passengers (±SE) from 1949-1960.

Part 2

The datasets::Loblolly and the datasets::Orange data

These are some data collected from two kinds of trees at different ages.

Devise a figure with a two-panel 2 x 1 (rows x columns) layout showing:
- the relationship between age and height independently for each seed source for the Loblolly data
- the relationship between age and circumference for each tree

Part 3

Your ‘own’ data

Find your own dataset (one that has not been used in this Assessment or earlier in the BCB744 module) and create a pair of faceted figures of your choice.
Provide an explanation of what you aim to show, and what the figure ultimately tells you.

Part 4

The last steps

Assemble all graphs (Parts 1-3) into a 2 x 2 layout using a suitable function provided by an appropriate R package. Note that only three of the four facets will be occupied by the figures you created in Parts 1-3.

Question 7

The datasets::UKDriverDeaths and datasets::Seatbelts datasets

These datasets are meant to be used together—UKDriverDeaths has the same data as is provided in the variable drivers in seatbelts, but it also provides information about the temporal structure of the Seatbelts dataset. You will have to devise a way to use this temporal information in your analysis.

Produce a dataframe that combines the temporal information provided in UKDriverDeaths with the other information in Seatbelts.
Produce a faceted graph (using facet_wrap(), placing drivers, front, rear, and VanKilled in facets) showing a timeline of monthly means of deaths (means taken across years) whilst distinguishing between the two levels of law.
What do you conclude from your analysis?

Submission instructions

Submit your .qmd and .html files wherein you provide answers to these Questions by no later than 6 March 2024 at 16:00.

Label the files as follows:

BCB744_<first_name>_<last_name>_Intro_R_Assessment.qmd, and
BCB744_<first_name>_<last_name>_Intro_R_Assessment.html

(the < and > must be omitted as they are used in the example as field indicators only).

Failing to follow these instructions carefully, precisely, and thoroughly will cause you to lose marks, which could cause a significant drop in your score as formatting counts for 15% of the final mark (out of 100%).

Submit your Tasks on the Google Form when ready.

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:

@online{smit,_a._j.,
  author = {Smit, A. J.,},
  title = {BCB744: {End-of-Intro-R} {Assessment}},
  date = {},
  url = {http://tangledbank.netlify.app/assessments/BCB744_Mid_Assessment_2023.html},
  langid = {en}
}

For attribution, please cite this work as:

Smit, A. J. BCB744: End-of-Intro-R Assessment. http://tangledbank.netlify.app/assessments/BCB744_Mid_Assessment_2023.html.

--- title: "BCB744: End-of-Intro-R Assessment" subtile: "Final Intro R Assessment" date: "Due: 4 March 2023 at 16:00" format: html: code-fold: false toc-title: "On this page" standalone: true toc-location: right page-layout: full embed-resources: true number-sections: false --- ## Honesty Pledge **This assignment requires that you work as an individual and not share your code, results, or discussion with your peers. Penalties and disciplinary action will apply if you are found cheating.** ::: callout-note ## Acknowledgement of the Pledge Copy the statement, below, into your document and replace the underscores with your name acknowledging adherence to the UWC's Honesty Pledge. **I, \_\_\_\_\_\_\_\_\_\_\_\_, hereby state that I have not communicated with or gained information in any way from my peers and that all work is my own.** ::: ## Format and mode of submission This Assignment requires submission as both a Quarto (.qmd) file and the knitted .html product. You are welcome to copy any text from here to use as headers or other pieces of informative explanation to use in your Assignment. ## Style and organisation As part of the assessment, we will look for a variety of features, including, but not limited to the following: - Content: - Questions answered in order - A written explanation of approach included for each question - Appropriate formatting of text, for example, fonts not larger than necessary, headings used properly, etc. Be sensible and tasteful. - Code formatting: - Use **Tidyverse** code - No more than \~80 characters of code per line (pay particular attention to the comments) - Application of [R code conventions](http://adv-r.had.co.nz/Style.html), e.g. spaces around `<-`, after `#`, after `,`, etc. - New line for each `dplyr` function (lines end in `%>%`) or `ggplot` layer (lines end in `+`) - Proper indentation of pipes and `ggplot()` layers - All chunks labelled without spaces - No unwanted / commented out code left behind in the document - Figures: - Sensible use of themes / colours - Publication quality - Informative and complete titles, axes labels, legends, etc. - No redundant features or aesthetics ```{r load-packages, message=FALSE} #| echo: false #| eval: false library(tidyverse) library(ggpubr) ``` ## Questions ### Question 1 **The [`shells.csv`](../data/shells.csv) data** * Produce a *tidy* dataset from the data contained in [`shells.csv`](../data/shells.csv). * For each species, relate two measurement variables within the dataset to one-another and represent the relationship with a straight line. * For each species, concisely produce histograms for each of the measurement variables. * Use the **colorspace** package and assign interesting colours to your graphs (all graphs above). * Use the **ggthemr** package and assign interesting themes to your graphs (all graphs above). ### Question 2 **Head Dimensions in Brothers** The `boot::frets` data: The data consist of measurements of the length and breadth of the heads of pairs of adult brothers in 25 randomly sampled families. All measurements are expressed in millimetres. Please consult the dataset's help file (i.e., load the package **boot** package and type `?frets` on the command line). * Create a *tidy* dataset from the `frets` data. * Demonstrate the most concise way for displaying both brother's data on one set of axes. * Apply your own unique theme modification to the graph in order to produce a publication-worthy figure. ### Question 3 **Results from an Experiment on Plant Growth** The `datasets::PlantGrowth` data: Results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions. * Concisely present the results of the plant growth experiment as graphs: - a scatterplot with individual `weight` datapoints as a function of `group` - a box and whisker plot showing each `group` (on one set of axes) - a bar plot with associated SD for each `group` (on one set of axes) ### Question 4 **Student's Sleep Data** The `datasets::sleep` data: Data which show the effect of two soporific drugs (increase in hours of sleep compared to control) on 10 patients. * Graphically display these data in two different ways. ### Question 5 **English Narrative for Some Code** * Provide an English description for what the following lines of code does. #### Listing 1 ```{r, eval=FALSE} the_data <- some_data %>% mutate(yr = year(date), mo = month(date)) %>% group_by(country, yr) %>% summarise(med_chl = mean(chl, na.rm = TRUE)) %>% ungroup() ``` ```{r, eval=FALSE} ggplot(the_data, aes(x = yr, y = med_chl)) + geom_line(aes(group = country), colour = "blue3") + facet_wrap(~country, nrow = 3) + labs(x = "Year", y = "Chlorophyll-a (mg/m3)", title = "Chlorophyll-a concentration") ``` #### Listing 2 ```{r, eval=FALSE} library(ggforce) ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) + geom_point() + facet_zoom(x = Species == "versicolor") ``` #### Listing 3 ```{r, eval=FALSE} set.seed(13) my_data = data.frame( gender = factor(rep(c("F", "M"), each=200)), length = c(rnorm(200, 55), rnorm(200, 58))) head(my_data) ggplot(my_data, aes(x = gender, y = length)) + geom_boxplot(aes(fill = gender)) ggplot(my_data, aes(x = gender, y = length)) + geom_violin() ggplot(my_data, aes(x = gender, y = length)) + geom_dotplot(stackdir = "center", binaxis = "y", dotsize = 0.5) ``` ### Question 6 **Create panels of plots** * For this exercise, you'll be expected to accomplish Parts 1, 2 and 3 before producing the final output in Part 4. * Considerations: - take care to use the most appropriate geom considering the nature of the data - creatively modify the graph's appearance (but remain sensible and be cognisant of which aesthetics are suitable for publications!) #### Part 1 **The `datasets::AirPassengers` data** * Create a plot of the monthly totals of international airline passengers, 1949 to 1960. * Construct a figure showing the **annual** number of airline passengers (±SE) from 1949-1960.                            #### Part 2 **The `datasets::Loblolly` and the `datasets::Orange` data** These are some data collected from two kinds of trees at different ages. * Devise a figure with a two-panel 2 x 1 (rows x columns) layout showing: * the relationship between age and height independently for each seed source for the Loblolly data * the relationship between age and circumference for each tree             #### Part 3 **Your 'own' data** * Find your own dataset (one that has not been used in this Assessment or earlier in the BCB744 module) and create a pair of faceted figures of your choice. * Provide an explanation of what you aim to show, and what the figure ultimately tells you.    #### Part 4 **The last steps** * Assemble all graphs (Parts 1-3) into a 2 x 2 layout using a suitable function provided by an appropriate R package. Note that only three of the four facets will be occupied by the figures you created in Parts 1-3.    ### Question 7 **The `datasets::UKDriverDeaths` and `datasets::Seatbelts` datasets** These datasets are meant to be used together---`UKDriverDeaths` has the same data as is provided in the variable `drivers` in `seatbelts`, but it also provides information about the temporal structure of the `Seatbelts` dataset. You will have to devise a way to use this temporal information in your analysis. * Produce a dataframe that combines the temporal information provided in `UKDriverDeaths` with the other information in `Seatbelts`. * Produce a faceted graph (using `facet_wrap()`, placing `drivers`, `front`, `rear`, and `VanKilled` in facets) showing a timeline of monthly means of deaths (means taken across years) whilst distinguishing between the two levels of `law`. * What do you conclude from your analysis?                               ## Submission instructions ::: callout-important ## Submission instructions Submit your .qmd and .html files wherein you provide answers to these Questions by no later than 6 March 2024 at 16:00. Label the files as follows: - `BCB744_<first_name>_<last_name>_Intro_R_Assessment.qmd`, and - `BCB744_<first_name>_<last_name>_Intro_R_Assessment.html` (the `<` and `>` must be omitted as they are used in the example as field indicators only). Failing to follow these instructions carefully, precisely, and thoroughly will cause you to lose marks, which could cause a significant drop in your score as formatting counts for 15% of the final mark (out of 100%). Submit your Tasks on the Google Form when ready. :::