Explain in words what the pipe operator %>% does in R. How does it make your code more readable? (/3)
Answer
✓ The pipe operator %>% (or |>) in R is used to chain together multiple functions or operations in a sequence. It takes the output of one function and passes it as the first argument to the next function, allowing you to create a pipeline of operations.
✓ The pipe operator makes your code more readable by breaking down complex operations into a series of simpler steps. It helps in avoiding nested function calls, improves code clarity, and reduces the need for intermediate variables.
✓ In this way you can write code in a more linear and intuitive way, following the flow of data transformations from one step to the next. This makes it easier to understand the logic of the code and the sequence of operations being performed.
Question 19
Using the various tidyverse functions, calculate the mean ± SD for the crop mass within each combination of block and fertiliser of the crops dataset. (/5)
Answer
# Load the tidyverse packagelibrary(tidyverse)# Calculate the mean ± SD for crop mass within each combination of block and fertilisercrops %>%group_by(block, fertilizer) %>%summarise(mean_mass =mean(mass), sd_mass =sd(mass))
# A tibble: 12 × 4
# Groups: block [4]
block fertilizer mean_mass sd_mass
<chr> <chr> <dbl> <dbl>
1 east A 4826. 16.6
2 east B 4817. 19.9
3 east C 4834. 13.8
4 north A 4803. 19.4
5 north B 4813. 12.1
6 north C 4823. 14.6
7 south A 4800. 12.7
8 south B 4809. 17.6
9 south C 4819. 13.7
10 west A 4812. 16.4
11 west B 4822. 11.4
12 west C 4832. 20.2
6. Faceting Figures
Question 1
Create a scatterplot of bill_length_mm against bill_depth_mm for Adelie penguins on Biscoe island. (/10)
# A tibble: 6 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
# ℹ 2 more variables: sex <fct>, year <int>
penguins %>%# ✓ filter(island =="Biscoe"& species =="Adelie") %>%# ✓ ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +# ✓ geom_point() +# ✓ labs(title ="Adelie Penguins on Biscoe Island", # ✓ x ="Bill Length (mm)", # ✓ y ="Bill Depth (mm)") # ✓
Question 2
Create histograms of bill_length_mm for Adelie penguins on all three islands (one figure per island). Save each figure as a separate R object which you can later reuse. Again for Adelie penguins, create a boxplot for bill_length_mm showing all the data on one plot. Save it too as an R object. Combine the four saved figures into one figure using ggarrange(). (/25)
Answer
library(ggpubr) # ✓# Create histogramsadelie_biscoe <- penguins %>%# ✓ x 5filter(island =="Biscoe"& species =="Adelie") %>%ggplot(aes(x = bill_length_mm)) +geom_histogram() +labs(title ="Adelie Penguins on Biscoe Island", x ="Bill Length (mm)", y ="Frequency")adelie_dream <- penguins %>%# ✓ x 5filter(island =="Dream"& species =="Adelie") %>%ggplot(aes(x = bill_length_mm)) +geom_histogram() +labs(title ="Adelie Penguins on Dream Island", x ="Bill Length (mm)", y ="Frequency")adelie_torgersen <- penguins %>%# ✓ x 5filter(island =="Torgersen"& species =="Adelie") %>%ggplot(aes(x = bill_length_mm)) +geom_histogram() +labs(title ="Adelie Penguins on Torgersen Island", x ="Bill Length (mm)", y ="Frequency")# Create boxplot # ✓ x 5adelie_boxplot <- penguins %>%filter(species =="Adelie") %>%ggplot(aes(x = island, y = bill_length_mm)) +geom_boxplot() +labs(title ="Adelie Penguins Bill Length Boxplot", x ="Island", y ="Bill Length (mm)")# Combine figures # ✓ x 1ggarrange(adelie_biscoe, adelie_dream, adelie_torgersen, adelie_boxplot, ncol =2, nrow =2)
Question 3
Create a scatter plot of flipper_length_mm against body_mass_g and use facet_wrap() to create separate panels for each island (combine all species). Also indicate the effect of species. Add a best-fit straight line with 95% confidence intervals through the points, ignoring the effect of species. Take into account which variable best belongs on x and y. Describe your findings. (/10)
Answer
penguins %>%# ✓ x 6ggplot(aes(x = body_mass_g, y = flipper_length_mm)) +geom_point(aes(colour = species)) +geom_smooth(method ="lm", se =TRUE) +facet_wrap(~island) +labs(title ="Flipper Length vs Body Mass", x ="Body Mass (g)",y ="Flipper Length (mm)")
The body_mass_g variable is best suited to the x-axis as it is the independent variable. The flipper_length_mm variable is best suited to the y-axis as it is the dependent variable.
✓ For all penguin species, the flipper_length_mm and body_mass_g variables show a positive correlation, with larger penguins having longer flippers and higher body masses.
✓ The Adelie penguins on Biscoe island have the shortest flippers and lowest body masses, while Gentoo penguins have the longest flippers and highest body masses.
✓ Chinstrap and Adelie penguins are present on Dream island; these species’ body masses and flipper lengths are difficult to distinguish from one-another.
✓ Only Adelie penguons are present on Torgersen island. The Adelie penguins appear to have the same flipper length vs body mass relationship across all three islands.
Question 4
Create a scatter plot of bill_length_mm and body_mass_g and use facet_grid() to create separate panels for each species and island. (/6)
Answer
grid_plt <- penguins %>%# ✓ x 6ggplot(aes(x = body_mass_g, y = bill_length_mm)) +geom_point() +facet_grid(species ~ island) +labs(title ="Bill Length vs Body Mass", x ="Body Mass (g)",y ="Bill Length (mm)")grid_plt
Question 5
Using the figure created in point 4, also show the effect of sex and add a best-fit straight line. Explain the findings. (/9)
Answer
grid_plt +# ✓ x 6geom_point(aes(col = sex)) +geom_smooth(method ="lm", se =TRUE, colour ="black")
✓ The bill_length_mm and body_mass_g variables show a positive correlation, with larger penguins having longer bills and higher body masses.
✓ The sex variable appear to have an effect on the relationship between bill_length_mm and body_mass_g, with male penguins tending to be heavier with longer bill lengths.
✓ There also appears to be differences in the relationship between bill_length_mm and body_mass_g between the different species and islands.
Question 6
What are the benefits of using faceting in data visualisation? (/3)
Answer
✓ Faceting allows for the visualisation of multiple relationships in a single plot, making it easier to compare relationships between different groups.
✓ Faceting can help to identify patterns and trends in the data that may not be immediately obvious when looking at the data as a whole.
✓ Faceting can help to identify differences in relationships between different groups, such as species or islands, allowing for more detailed analysis of the data.
---title: "BCB744 Task B"format: html: fig-format: svg fig_retina: 2 fig-dpi: 400params: hide_answers: false---# [[Assessment Sheet](BCB744_Task_B_Surname.xlsx)]{.my-highlight} {#sec-assessment}# 5. R and RStudio (Continue)```{r, echo=FALSE, output=FALSE}# Load the tidyverse packagelibrary(tidyverse)# Read the Excel file into Rcrops <- readxl::read_excel("../data/crops.xlsx")```## Question 18Explain in words what the pipe operator `%>%` does in R. How does it make your code more readable? **(/3)**`r if (params$hide_answers) "::: {.content-hidden}"`**Answer**- ✓ The pipe operator `%>%` (or `|>`) in R is used to chain together multiple functions or operations in a sequence. It takes the output of one function and passes it as the first argument to the next function, allowing you to create a pipeline of operations.- ✓ The pipe operator makes your code more readable by breaking down complex operations into a series of simpler steps. It helps in avoiding nested function calls, improves code clarity, and reduces the need for intermediate variables.- ✓ In this way you can write code in a more linear and intuitive way, following the flow of data transformations from one step to the next. This makes it easier to understand the logic of the code and the sequence of operations being performed.`r if (params$hide_answers) ":::"`## Question 19Using the various **tidyverse** functions, calculate the mean ± SD for the crop mass within each combination of `block` and `fertiliser` of the `crops` dataset. **(/5)**`r if (params$hide_answers) "::: {.content-hidden}"`**Answer**```{r}# Load the tidyverse packagelibrary(tidyverse)# Calculate the mean ± SD for crop mass within each combination of block and fertilisercrops %>%group_by(block, fertilizer) %>%summarise(mean_mass =mean(mass), sd_mass =sd(mass))````r if (params$hide_answers) ":::"`# 6. Faceting Figures## Question 1Create a scatterplot of `bill_length_mm` against `bill_depth_mm` for `Adelie` penguins on `Biscoe` island. **(/10)**`r if (params$hide_answers) "::: {.content-hidden}"`**Answer**```{r, fig.width=5,fig.asp=0.65,out.width="60%",fig.align='center'}library(palmerpenguins) # ✓ library(tidyverse) # ✓ data(penguins)head(penguins)penguins %>%# ✓ filter(island =="Biscoe"& species =="Adelie") %>%# ✓ ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +# ✓ geom_point() +# ✓ labs(title ="Adelie Penguins on Biscoe Island", # ✓ x ="Bill Length (mm)", # ✓ y ="Bill Depth (mm)") # ✓ ````r if (params$hide_answers) ":::"`## Question 2Create histograms of `bill_length_mm` for `Adelie` penguins on all three islands (one figure per island). Save each figure as a separate R object which you can later reuse. Again for `Adelie` penguins, create a boxplot for `bill_length_mm` showing all the data on one plot. Save it too as an R object. Combine the four saved figures into one figure using `ggarrange()`. **(/25)**`r if (params$hide_answers) "::: {.content-hidden}"`**Answer**```{r, fig.width=8,fig.asp=0.65,out.width="60%",fig.align='center'}library(ggpubr) # ✓# Create histogramsadelie_biscoe <- penguins %>%# ✓ x 5filter(island =="Biscoe"& species =="Adelie") %>%ggplot(aes(x = bill_length_mm)) +geom_histogram() +labs(title ="Adelie Penguins on Biscoe Island", x ="Bill Length (mm)", y ="Frequency")adelie_dream <- penguins %>%# ✓ x 5filter(island =="Dream"& species =="Adelie") %>%ggplot(aes(x = bill_length_mm)) +geom_histogram() +labs(title ="Adelie Penguins on Dream Island", x ="Bill Length (mm)", y ="Frequency")adelie_torgersen <- penguins %>%# ✓ x 5filter(island =="Torgersen"& species =="Adelie") %>%ggplot(aes(x = bill_length_mm)) +geom_histogram() +labs(title ="Adelie Penguins on Torgersen Island", x ="Bill Length (mm)", y ="Frequency")# Create boxplot # ✓ x 5adelie_boxplot <- penguins %>%filter(species =="Adelie") %>%ggplot(aes(x = island, y = bill_length_mm)) +geom_boxplot() +labs(title ="Adelie Penguins Bill Length Boxplot", x ="Island", y ="Bill Length (mm)")# Combine figures # ✓ x 1ggarrange(adelie_biscoe, adelie_dream, adelie_torgersen, adelie_boxplot, ncol =2, nrow =2)````r if (params$hide_answers) ":::"`## Question 3Create a scatter plot of `flipper_length_mm` against `body_mass_g` and use `facet_wrap()` to create separate panels for each island (combine all species). Also indicate the effect of species. Add a best-fit straight line with 95% confidence intervals through the points, ignoring the effect of species. Take into account which variable best belongs on `x` and `y`. Describe your findings. **(/10)**`r if (params$hide_answers) "::: {.content-hidden}"`**Answer**```{r, warning=FALSE,message=FALSE,fig.width=6,fig.asp=0.65,out.width="60%",fig.align='center'}penguins %>%# ✓ x 6ggplot(aes(x = body_mass_g, y = flipper_length_mm)) +geom_point(aes(colour = species)) +geom_smooth(method ="lm", se =TRUE) +facet_wrap(~island) +labs(title ="Flipper Length vs Body Mass", x ="Body Mass (g)",y ="Flipper Length (mm)")```- The `body_mass_g` variable is best suited to the `x`-axis as it is the independent variable. The `flipper_length_mm` variable is best suited to the `y`-axis as it is the dependent variable.- ✓ For all penguin species, the `flipper_length_mm` and `body_mass_g` variables show a positive correlation, with larger penguins having longer flippers and higher body masses.- ✓ The `Adelie` penguins on `Biscoe` island have the shortest flippers and lowest body masses, while `Gentoo` penguins have the longest flippers and highest body masses.- ✓ `Chinstrap` and `Adelie` penguins are present on `Dream` island; these species' body masses and flipper lengths are difficult to distinguish from one-another.- ✓ Only `Adelie` penguons are present on `Torgersen` island. The `Adelie` penguins appear to have the same flipper length vs body mass relationship across all three islands.`r if (params$hide_answers) ":::"`## Question 4Create a scatter plot of `bill_length_mm` and `body_mass_g` and use `facet_grid()` to create separate panels for each species and island. **(/6)**`r if (params$hide_answers) "::: {.content-hidden}"`**Answer**```{r, warning=FALSE,message=FALSE,fig.width=6,fig.asp=0.65,out.width="60%",fig.align='center'}grid_plt <- penguins %>%# ✓ x 6ggplot(aes(x = body_mass_g, y = bill_length_mm)) +geom_point() +facet_grid(species ~ island) +labs(title ="Bill Length vs Body Mass", x ="Body Mass (g)",y ="Bill Length (mm)")grid_plt````r if (params$hide_answers) ":::"`## Question 5Using the figure created in point 4, also show the effect of `sex` and add a best-fit straight line. Explain the findings. **(/9)**`r if (params$hide_answers) "::: {.content-hidden}"`**Answer**```{r, warning=FALSE,message=FALSE,fig.width=6,fig.asp=0.65,out.width="60%",fig.align='center'}grid_plt +# ✓ x 6geom_point(aes(col = sex)) +geom_smooth(method ="lm", se =TRUE, colour ="black")```- ✓ The `bill_length_mm` and `body_mass_g` variables show a positive correlation, with larger penguins having longer bills and higher body masses.- ✓ The `sex` variable appear to have an effect on the relationship between `bill_length_mm` and `body_mass_g`, with male penguins tending to be heavier with longer bill lengths.- ✓ There also appears to be differences in the relationship between `bill_length_mm` and `body_mass_g` between the different species and islands.`r if (params$hide_answers) ":::"`## Question 6What are the benefits of using faceting in data visualisation? **(/3)**`r if (params$hide_answers) "::: {.content-hidden}"`**Answer**- ✓ Faceting allows for the visualisation of multiple relationships in a single plot, making it easier to compare relationships between different groups.- ✓ Faceting can help to identify patterns and trends in the data that may not be immediately obvious when looking at the data as a whole.- ✓ Faceting can help to identify differences in relationships between different groups, such as species or islands, allowing for more detailed analysis of the data.`r if (params$hide_answers) ":::"`