BCB744 Task B

Assessment Sheet

5. R and RStudio (Continue)

Question 18

Explain in words what the pipe operator %>% does in R. How does it make your code more readable? (/3)

Answer

  • ✓ The pipe operator %>% (or |>) in R is used to chain together multiple functions or operations in a sequence. It takes the output of one function and passes it as the first argument to the next function, allowing you to create a pipeline of operations.
  • ✓ The pipe operator makes your code more readable by breaking down complex operations into a series of simpler steps. It helps in avoiding nested function calls, improves code clarity, and reduces the need for intermediate variables.
  • ✓ In this way you can write code in a more linear and intuitive way, following the flow of data transformations from one step to the next. This makes it easier to understand the logic of the code and the sequence of operations being performed.

Question 19

Using the various tidyverse functions, calculate the mean ± SD for the crop mass within each combination of block and fertiliser of the crops dataset. (/5)

Answer

# Load the tidyverse package
library(tidyverse)

# Calculate the mean ± SD for crop mass within each combination of block and fertiliser
crops %>%
  group_by(block, fertilizer) %>%
  summarise(mean_mass = mean(mass), sd_mass = sd(mass))
# A tibble: 12 × 4
# Groups:   block [4]
   block fertilizer mean_mass sd_mass
   <chr> <chr>          <dbl>   <dbl>
 1 east  A              4826.    16.6
 2 east  B              4817.    19.9
 3 east  C              4834.    13.8
 4 north A              4803.    19.4
 5 north B              4813.    12.1
 6 north C              4823.    14.6
 7 south A              4800.    12.7
 8 south B              4809.    17.6
 9 south C              4819.    13.7
10 west  A              4812.    16.4
11 west  B              4822.    11.4
12 west  C              4832.    20.2

6. Faceting Figures

Question 1

Create a scatterplot of bill_length_mm against bill_depth_mm for Adelie penguins on Biscoe island. (/10)

Answer

library(palmerpenguins) # ✓ 
library(tidyverse) # ✓ 
data(penguins)
head(penguins)
# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 2 more variables: sex <fct>, year <int>
penguins %>% # ✓ 
  filter(island == "Biscoe" & species == "Adelie") %>%  # ✓ 
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +  # ✓ 
  geom_point() +  # ✓ 
  labs(title = "Adelie Penguins on Biscoe Island",  # ✓ 
         x = "Bill Length (mm)", # ✓ 
         y = "Bill Depth (mm)") # ✓ 

Question 2

Create histograms of bill_length_mm for Adelie penguins on all three islands (one figure per island). Save each figure as a separate R object which you can later reuse. Again for Adelie penguins, create a boxplot for bill_length_mm showing all the data on one plot. Save it too as an R object. Combine the four saved figures into one figure using ggarrange(). (/25)

Answer

library(ggpubr) # ✓

# Create histograms
adelie_biscoe <- penguins %>% # ✓ x 5
  filter(island == "Biscoe" & species == "Adelie") %>% 
  ggplot(aes(x = bill_length_mm)) + 
  geom_histogram() + 
  labs(title = "Adelie Penguins on Biscoe Island", 
       x = "Bill Length (mm)", 
       y = "Frequency")

adelie_dream <- penguins %>% # ✓ x 5
  filter(island == "Dream" & species == "Adelie") %>% 
  ggplot(aes(x = bill_length_mm)) + 
  geom_histogram() + 
  labs(title = "Adelie Penguins on Dream Island", 
       x = "Bill Length (mm)", 
       y = "Frequency")

adelie_torgersen <- penguins %>% # ✓ x 5
  filter(island == "Torgersen" & species == "Adelie") %>% 
  ggplot(aes(x = bill_length_mm)) + 
  geom_histogram() + 
  labs(title = "Adelie Penguins on Torgersen Island", 
       x = "Bill Length (mm)", 
       y = "Frequency")

# Create boxplot # ✓ x 5
adelie_boxplot <- penguins %>% 
  filter(species == "Adelie") %>% 
  ggplot(aes(x = island, y = bill_length_mm)) + 
  geom_boxplot() + 
  labs(title = "Adelie Penguins Bill Length Boxplot", 
       x = "Island", 
       y = "Bill Length (mm)")

# Combine figures # ✓ x 1
ggarrange(adelie_biscoe, adelie_dream, adelie_torgersen, adelie_boxplot, 
          ncol = 2, nrow = 2)

Question 3

Create a scatter plot of flipper_length_mm against body_mass_g and use facet_wrap() to create separate panels for each island (combine all species). Also indicate the effect of species. Add a best-fit straight line with 95% confidence intervals through the points, ignoring the effect of species. Take into account which variable best belongs on x and y. Describe your findings. (/10)

Answer

penguins %>% # ✓ x 6
  ggplot(aes(x = body_mass_g, y = flipper_length_mm)) + 
  geom_point(aes(colour = species)) + 
  geom_smooth(method = "lm", se = TRUE) +
  facet_wrap(~island) + 
  labs(title = "Flipper Length vs Body Mass", 
       x = "Body Mass (g)",
       y = "Flipper Length (mm)")

  • The body_mass_g variable is best suited to the x-axis as it is the independent variable. The flipper_length_mm variable is best suited to the y-axis as it is the dependent variable.
  • ✓ For all penguin species, the flipper_length_mm and body_mass_g variables show a positive correlation, with larger penguins having longer flippers and higher body masses.
  • ✓ The Adelie penguins on Biscoe island have the shortest flippers and lowest body masses, while Gentoo penguins have the longest flippers and highest body masses.
  • Chinstrap and Adelie penguins are present on Dream island; these species’ body masses and flipper lengths are difficult to distinguish from one-another.
  • ✓ Only Adelie penguons are present on Torgersen island. The Adelie penguins appear to have the same flipper length vs body mass relationship across all three islands.

Question 4

Create a scatter plot of bill_length_mm and body_mass_g and use facet_grid() to create separate panels for each species and island. (/6)

Answer

grid_plt <-  penguins %>% # ✓ x 6
  ggplot(aes(x = body_mass_g, y = bill_length_mm)) + 
  geom_point() + 
  facet_grid(species ~ island) + 
  labs(title = "Bill Length vs Body Mass", 
       x = "Body Mass (g)",
       y = "Bill Length (mm)")
grid_plt

Question 5

Using the figure created in point 4, also show the effect of sex and add a best-fit straight line. Explain the findings. (/9)

Answer

grid_plt +  # ✓ x 6
  geom_point(aes(col = sex)) +
  geom_smooth(method = "lm", se = TRUE, colour = "black")

  • ✓ The bill_length_mm and body_mass_g variables show a positive correlation, with larger penguins having longer bills and higher body masses.
  • ✓ The sex variable appear to have an effect on the relationship between bill_length_mm and body_mass_g, with male penguins tending to be heavier with longer bill lengths.
  • ✓ There also appears to be differences in the relationship between bill_length_mm and body_mass_g between the different species and islands.

Question 6

What are the benefits of using faceting in data visualisation? (/3)

Answer

  • ✓ Faceting allows for the visualisation of multiple relationships in a single plot, making it easier to compare relationships between different groups.
  • ✓ Faceting can help to identify patterns and trends in the data that may not be immediately obvious when looking at the data as a whole.
  • ✓ Faceting can help to identify differences in relationships between different groups, such as species or islands, allowing for more detailed analysis of the data.

Reuse

Citation

BibTeX citation:
@online{smit,_a._j.,
  author = {Smit, A. J.,},
  title = {BCB744 {Task} {B}},
  url = {http://tangledbank.netlify.app/assessments/BCB744_Task_B.html},
  langid = {en}
}
For attribution, please cite this work as:
Smit, A. J. BCB744 Task B. http://tangledbank.netlify.app/assessments/BCB744_Task_B.html.