BCB744: End-of-Intro-R Assessment
Honesty Pledge
This assignment requires that you work as an individual and not share your code, results, or discussion with your peers. Penalties and disciplinary action will apply if you are found cheating.
Copy the statement below into your document and replace the underscores with your name, thereby acknowledging your adherence to UWC’s Honesty Pledge.
I, ____________, hereby state that I have not communicated with or gained information in any way from my peers and that all work is my own.
Format and mode of submission
This Assignment must be submitted as both a Quarto (.qmd) file and the rendered .html file. You are welcome to copy any text from here to use as headers or other explanatory text in your Assignment.
Style and organisation
As part of the assessment, we will look for a variety of features, including, but not limited to, the following:
- Content:
  - Questions answered in order
  - A written explanation of approach included for each question
  - Appropriate formatting of text, for example, fonts not larger than necessary, headings used properly, etc. Be sensible and tasteful.
- Code formatting:
  - Use Tidyverse code
  - No more than ~80 characters of code per line (pay particular attention to the comments)
  - Application of R code conventions, e.g. spaces around `<-`, after `#`, after `,`, etc.
  - New line for each `dplyr` function (lines end in `%>%`) or `ggplot` layer (lines end in `+`)
  - Proper indentation of pipes and `ggplot()` layers
  - All chunks labelled without spaces
  - No unwanted or commented-out code left in the document
- Figures:
  - Sensible use of themes / colours
  - Publication quality
  - Informative and complete titles, axis labels, legends, etc.
  - No redundant features or aesthetics
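The chunk below is a minimal sketch of how the code-formatting conventions listed above might look in practice. It uses the built-in `mtcars` dataset purely for illustration (it is not part of this Assessment), and the chunk label `fig-mtcars-weight-mpg` and object name `mtcars_tidy` are hypothetical; it loads `dplyr` and `ggplot2` individually, although loading `tidyverse` would work equally well.

```r
#| label: fig-mtcars-weight-mpg

library(dplyr)
library(ggplot2)

# Treat cylinder count as a factor so that it maps to discrete colours
mtcars_tidy <- mtcars %>%
  mutate(cyl = factor(cyl)) %>%
  select(wt, mpg, cyl)

# Fuel efficiency against weight, coloured by number of cylinders
ggplot(mtcars_tidy, aes(x = wt, y = mpg, colour = cyl)) +
  geom_point(size = 2) +
  labs(x = "Weight (1000 lbs)",
       y = "Fuel efficiency (miles per US gallon)",
       colour = "Cylinders") +
  theme_bw()
```

Each `dplyr` step sits on its own line ending in `%>%`, each `ggplot` layer ends in `+`, continuation lines are indented consistently, and no line exceeds ~80 characters.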
Questions
Question 1
The `shells.csv` data
- Produce a tidy dataset from the data contained in `shells.csv`.
- For each species, relate two measurement variables within the dataset to one another and represent the relationship with a straight line.
- For each species, concisely produce histograms for each of the measurement variables.
- Use the colorspace package to assign interesting colours to all of the graphs above.
- Use the ggthemr package to assign interesting themes to all of the graphs above.
Question 2
Head Dimensions in Brothers
The `boot::frets` data
The data consist of measurements of the length and breadth of the heads of pairs of adult brothers in 25 randomly sampled families. All measurements are expressed in millimetres.
Please consult the dataset’s help file (i.e., load the boot package and type `?frets` at the command line).
- Create a tidy dataset from the `frets` data.
- Demonstrate the most concise way of displaying both brothers’ data on one set of axes.
- Apply your own unique theme modification to the graph in order to produce a publication-worthy figure.
Question 3
Results from an Experiment on Plant Growth
The `datasets::PlantGrowth` data
Results from an experiment to compare yields (as measured by the dried weight of plants) obtained under a control and two different treatment conditions.
- Concisely present the results of the plant growth experiment as graphs:
  - a scatterplot with individual `weight` data points as a function of `group`
  - a box-and-whisker plot showing each `group` (on one set of axes)
  - a bar plot with the associated SD for each `group` (on one set of axes)
Question 4
Student’s Sleep Data
The `datasets::sleep` data
Data that show the effect of two soporific drugs (increase in hours of sleep compared to control) on 10 patients.
- Graphically display these data in two different ways.
Question 5
English Narrative for Some Code
- Provide an English description of what the following lines of code do.
Listing 1
Listing 2
Listing 3
```r
library(ggplot2)

set.seed(13)
my_data <- data.frame(
  gender = factor(rep(c("F", "M"), each = 200)),
  length = c(rnorm(200, 55), rnorm(200, 58))
)
head(my_data)
```

```r
ggplot(my_data, aes(x = gender, y = length)) +
  geom_boxplot(aes(fill = gender))
```

```r
ggplot(my_data, aes(x = gender, y = length)) +
  geom_violin()
```

```r
ggplot(my_data, aes(x = gender, y = length)) +
  geom_dotplot(stackdir = "center", binaxis = "y", dotsize = 0.5)
```
Question 6
Create panels of plots
- For this exercise, you will need to complete Parts 1, 2, and 3 before producing the final output in Part 4.
- Considerations:
  - take care to use the most appropriate geom given the nature of the data
  - creatively modify the graph’s appearance (but remain sensible and be cognisant of which aesthetics are suitable for publications!)
Part 1
The `datasets::AirPassengers` data
- Create a plot of the monthly totals of international airline passengers, 1949 to 1960.
- Construct a figure showing the annual number of airline passengers (±SE) from 1949-1960.
Part 2
The `datasets::Loblolly` and `datasets::Orange` data
These data were collected from two kinds of trees at different ages.
- Devise a figure with a two-panel 2 x 1 (rows x columns) layout showing:
  - the relationship between age and height, independently for each seed source, in the Loblolly data
  - the relationship between age and circumference for each tree in the Orange data
Part 3
Your ‘own’ data
- Find your own dataset (one that has not been used in this Assessment or earlier in the BCB744 module) and create a pair of faceted figures of your choice.
- Provide an explanation of what you aim to show, and what the figure ultimately tells you.
Part 4
The last steps
- Assemble all graphs (Parts 1 to 3) into a 2 x 2 layout using a suitable function provided by an appropriate R package. Note that only three of the four panels will be occupied by the figures you created in Parts 1 to 3.
Question 7
The `datasets::UKDriverDeaths` and `datasets::Seatbelts` datasets
These datasets are meant to be used together: `UKDriverDeaths` contains the same data as the variable `drivers` in `Seatbelts`, but it also provides information about the temporal structure of the `Seatbelts` dataset. You will have to devise a way to use this temporal information in your analysis.
- Produce a dataframe that combines the temporal information provided in `UKDriverDeaths` with the other information in `Seatbelts`.
- Produce a faceted graph (using `facet_wrap()`, placing `drivers`, `front`, `rear`, and `VanKilled` in facets) showing a timeline of monthly means of deaths (means taken across years) whilst distinguishing between the two levels of `law`.
- What do you conclude from your analysis?
Submission instructions
Submit the .qmd and .html files containing your answers to these Questions no later than 16:00 on 6 March 2024.
Label the files as follows: `BCB744_<first_name>_<last_name>_Intro_R_Assessment.qmd` and `BCB744_<first_name>_<last_name>_Intro_R_Assessment.html` (the `<` and `>` must be omitted, as they are used in the example as field indicators only).
Failing to follow these instructions carefully, precisely, and thoroughly will cost you marks and could cause a significant drop in your score, as formatting counts for 15% of the final mark (out of 100%).
Submit your Tasks on the Google Form when ready.
Citation
@online{smit,
  author = {Smit, A. J.},
  title = {BCB744: {End-of-Intro-R} {Assessment}},
  date = {},
  url = {http://tangledbank.netlify.app/assessments/BCB744_Mid_Assessment_2023.html},
  langid = {en}
}