library(tidyverse)
library(ggpubr)
# Filter the data for Day 21
chicks <- as_tibble(ChickWeight)
chicks_day21 <- chicks %>%
filter(Time == 21)
# Create the plot
plt1 <- ggplot(chicks_day21, aes(x = Diet, y = weight, fill = Diet)) +
geom_boxplot(alpha = 1.0, outlier.shape = NA) +
stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "black") +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = 0.2, color = "black") +
theme_minimal(base_size = 14) +
labs(
x = "Diet Group",
y = "Mass (g)"
) +
theme(legend.position = "none")
# Or one with the raw data points included
plt2 <- ggplot(chicks_day21, aes(x = Diet, y = weight, fill = Diet)) +
geom_boxplot(alpha = 1.0, outlier.shape = NA) +
geom_jitter(aes(fill = Diet), shape = 21, colour = "black", alpha = 0.9, width = 0.1) +
theme_minimal(base_size = 14) +
labs(
x = "Diet Group",
y = "Mass (g)"
) +
theme(legend.position = "none")
# Arrange the plots side by side
ggarrange(plt1, plt2, ncol = 2, labels = c("A", "B")) +
theme(plot.title = element_text(hjust = 0.5))
BCB744 Task G
Assessment Sheet
8. Analysis of Variance (ANOVA)
Refer to the lecture material to see the questions in context (see Task G.x: Do it now!).
Questions 1
Why should we not just apply t-tests once per each of the pairs of comparisons we want to make? (/3)
Answer
Applying t-tests repeatedly across multiple pairwise comparisons increases the probability of committing at least one Type I error – that is, falsely rejecting a true null hypothesis. This inflation of the collective error rate arises because each test is conducted at a fixed significance level (e.g., α = 0.05), but the cumulative chance of error grows with the number of tests. So, the overall inference becomes unreliable. Instead, methods such as ANOVA or adjusted p-values (e.g., Bonferroni correction) should be used to control for this multiplicity.
- ✓ Identifying the problem of multiple comparisons.
- ✓ Explaining the consequence: increased risk of Type I error.
- ✓ Suggesting a solution: using ANOVA or adjusted p-values.
Question 2
- What does the outcome say about the chicken masses? Which ones are different from each other? (/2)
- Devise a graphical display of this outcome. (/4)
Answer
-
Interpretation of ANOVA Output
- ✓ (x 2) The one-way ANOVA tests the null hypothesis that all four diet groups have the same mean chicken mass at Day 21. The output shows an F-statistic of 4.655 and a p-value of < 0.05. Since this p-value is less than the conventional 0.05 threshold, we reject the null hypothesis and conclude that there is evidence of a statistically significant difference in mean chicken mass between at least two of the diet groups.
- Note that before accepting the ANOVA results, we should check the assumptions of normality and homogeneity of variance. The ANOVA assumes that the data are normally distributed and that the variances across groups are equal. If these assumptions are violated, the results may not be valid.
- Assuming the assumptions are met, note that the ANOVA alone does not identify which specific diets differ. To determine that, a post hoc comparison – such as Tukey’s HSD test – is needed. Without such pairwise analysis, we cannot yet say which diets are different from one another. A graph may help visualise these differences.
-
Graphical Display
- ✓ (x 4) A suitable figure would be a boxplot of chicken weight at Day 21, stratified by Diet. This graph shows the distribution, central tendency, and spread for each diet group and makes it easy to infer where potential differences may lie. I have created two options: A) showing also the mean ± CI, and B) showing the raw data points.
- Below, we see that Diet 3’s median is visibly higher and its interquartile range (or CI) does not overlap with Diet 1. This suggests a substantive difference. However, the plot should be interpreted alongside formal statistical comparisons (the ANOVA) to avoid misreading potential noise as a signal.
- You could also have done a boxplot showing the mean ± SD, but this is less informative than the boxplot with the mean ± CI.
Question 3
Look at the help file for the TukeyHSD()
function to better understand what the output means.
- How does one interpret the results? What does this tell us about the effect that that different diets has on the chicken weights at Day 21? (/3)
- Figure out a way to plot the Tukey HSD outcomes in ggplot. (/10)
- Why does the ANOVA return a significant result, but the Tukey test shows that not all of the groups are significantly different from one another? (/3)
Answer
-
Interpretation of Tukey HSD
- ✓ (x 3) The Tukey HSD test, below, compares all possible pairs of means to determine which specific groups are different. The output shows the differences in means between each pair of diets, along with the associated confidence intervals and p-values. A significant difference is indicated by a p-value less than 0.05 and a confidence interval that does not include zero.
- ✓ (x 3) The Tukey HSD test results indicate that Diet 1 and Diet 3 are significantly different from each other, as the confidence interval does not include zero and the p-value is less than 0.05. This suggests that Diet 3 leads to a higher mean chicken mass compared to Diet 1. The other diet comparisons do not show significant differences.
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = weight ~ Diet, data = filter(chicks, Time == 21))
$Diet
diff lwr upr p adj
2-1 36.95000 -32.11064 106.01064 0.4868095
3-1 92.55000 23.48936 161.61064 0.0046959
4-1 60.80556 -10.57710 132.18821 0.1192661
3-2 55.60000 -21.01591 132.21591 0.2263918
4-2 23.85556 -54.85981 102.57092 0.8486781
4-3 -31.74444 -110.45981 46.97092 0.7036249
-
Tukey HSD outcomes presented with
ggplot()
.- ✓ (x 10) One way to do it is like this:
# Create a data frame from the Tukey HSD results
tukey_results <- as.data.frame(TukeyHSD(chicks.aov1)$Diet)
tukey_results <- tukey_results %>%
rownames_to_column(var = "Comparison")
# Create the plot
ggplot(tukey_results, aes(x = Comparison, y = diff)) +
geom_errorbar(aes(ymin = lwr, ymax = upr), width = 0.2) +
geom_point(aes(fill = `p adj` < 0.05), size = 3,
colour = "black", shape = 21) +
scale_fill_manual(values = c("deepskyblue2", "deeppink2"),
labels = c("Not Significant", "Significant")) +
labs(
x = "Diet Comparison",
y = "Difference in Means",
title = "Tukey HSD Results"
) +
theme_minimal(base_size = 14) +
theme(legend.position = "none")
-
Why the ANOVA is significant but Tukey is not
- ✓ (x 3) The ANOVA tests whether there is at least one significant difference among the groups, while the Tukey HSD test specifically identifies which groups are different. The ANOVA can be significant even if not all groups are different, as it only requires one group to differ from the others. In this case, the ANOVA was significant, but the Tukey test showed that not all groups were significantly different from each other, indicating that while there is a difference in means, it may not be substantial enough to be statistically significant for all pairs.
Question 4
- How is time having an effect? (/3)
- What hypotheses can we construct around time? (/2)
Answer
-
Time’s effect
- ✓ The effect of time is should be taken as an important influence (independent variable) in the experimental design, because …
- ✓ … we expect the mean chicken mass changes over time as the chickens eat and grow.
- ✓ Since time is experimentally important, it should be considered in the hypothesis that informs the design of the ANOVA model.
-
Hypotheses around time
- ✓ The null hypothesis is that there is no difference in mean chicken mass across different time points.
- ✓ The alternative hypothesis is that there is a difference in mean chicken mass across different time points.
Question 5
- What do you conclude from the above series of ANOVAs? (/3)
- What problem is associated with running multiple tests in the way that we have done here? (/2)
Answer
-
Conclusions from the series of ANOVAs
- ✓ (x 3) The sequential ANOVAs across times 0, 2, 10, and 21 reveal how the effect of diet on chicken mass develops over time. At time 0, there is no evidence of a difference between diet groups (p > 0.05), which is expected, as all chicks begin from a similar baseline before dietary interventions have had time to act. By time 2, a statistically significant effect appears (p < 0.05), and this effect becomes more pronounced at time 10 (p < 0.001). By time 21, the effect remains significant (p < 0.005), though the F-statistic is slightly lower than at time 10. This temporal pattern suggests that dietary effects on body mass begin to diverge early and become more detectable over time. The implication is that there is a growing differentiation in growth trajectories due to diet.
-
Problems with running multiple tests
- ✓ (x 2) Running multiple ANOVAs increases the risk of Type I error, as each test has a chance of falsely rejecting the null hypothesis. This inflation of the error rate can lead to erroneous conclusions about the significance of results. To mitigate this, we should consider using a single ANOVA model that includes time as a factor, rather than running separate tests for each time point. Or, we may use a regression model that includes time as a continuous variable.
Question 6
- Write out the hypotheses for this ANOVA. (/2)
- What do you conclude from the above ANOVA? (/3)
Answer
-
Hypotheses for the ANOVA
- ✓ Null Hypothesis (H₀): The mean chicken mass is the same at all time points; that is, time has no effect on mass. Formally we would write this as: μ₀ = μ₂ = μ₁₀ = μ₂₁.
- ✓ Alternative Hypothesis (H₁): At least one time point has a mean chick mass that differs from the others; time has an effect on chick mass.
-
Conclusions from the ANOVA
- ✓ (x 3) The ANOVA output reports an F-statistic of 234.8 with a p-value far < 0.001. This provides overwhelming evidence against the null hypothesis. So, we must conclude that time has a significant and very strong effect on chicken mass. This is consistent with biological expectations: as chicks grow over time, their mass increase.
- Note, however, that this model ignores diet and treats all chickens as a single population observed at different times. The significant result reflects that growth occurs over time, but it does not tell us whether or how different diets contribute to that growth – only that time, on average, is strongly associated with (causing, in this instance) increasing body mass.
Question 7
- What question are we asking with the above line of code? (/3)
- What is the answer? (/2)
- Why did we wrap
Time
inas.factor()
? (/2)
Answer
-
Question being asked
- ✓ We are testing whether both diet and time, considered as independent explanatory variables, have main effects on chicken mass. Specifically, the model asks:
- ✓ Does mean chicken mass differ between diet groups (irrespective of time)?
- ✓ Does mean mass change between the two time points (Day 0 and Day 21), regardless of diet?
- This model does not test for interaction – i.e., it does not ask whether the effect of time depends on diet or vice versa. It is strictly additive.
- ✓ We are testing whether both diet and time, considered as independent explanatory variables, have main effects on chicken mass. Specifically, the model asks:
-
Answer to the question
- Yes, both diet and time have statistically significant main effects on chicken mass. Specifically:
- ✓ Diet: F = 5.987, p < 0.001 → Evidence that chickens on different diets attain a different average mass.
- ✓ Time: F = 333.120, p < 0.001 → Very strong evidence that chickens weigh more at Day 21 than at Day 0.
- This indicates that both what chickens are fed and how long they’ve been growing are independently associated with their change in mass.
- Yes, both diet and time have statistically significant main effects on chicken mass. Specifically:
-
Why
as.factor()
?- ✓ We wrap
Time
inas.factor()
to treat it as a categorical variable. - ✓ This is important because we want to compare the means of chicken mass at two discrete time points (0 and 21 days), rather than treating time as a continuous variable.
- By converting it to a factor, we ensure that the ANOVA model treats each time point as a separate group for comparison.
- ✓ We wrap
Question 8
How do these results differ from the previous set? (/3)
Answer
- ✓ The results in this model differ from the previous set by the inclusion of an interaction term –
Diet:as.factor(Time)
– which tests whether the effect of time on chicken mass depends on the diet group. The previous model (withDiet + Time
) only assumed that the effects of diet and time were additive and independent. - ✓ Here, the significant interaction term (F = 4.348, p < 0.005) indicates that the change in mass between Day 4 and Day 21 is not uniform across all diets. So, some diets may lead to more rapid mass gain over time than others. This result qualifies and complicates the earlier interpretations: diet and time each have significant main effects, but those effects are not simply additive – they vary depending on how the two variables combine.
- ✓ Modelling this interaction allows for a more realistic biological scenario… growth trajectories may diverge not just due to time or diet alone, but due to their joint influence.
Question 9
Yikes! That’s a massive amount of results. What does all of this mean, and why is it so verbose? (/5)
Answer
- ✓ (3) This Tukey HSD shows pairwise comparisons for all combinations of diet and time, together with their interactions. It adjusts for multiple testing to control the combined error rate that would arise from multiple comparisons. The verbosity arises because it calculates differences, confidence intervals, and adjusted p-values for every possible pair, across the main effects and their interaction.
- ✓ (2) We conclude that only a subset of these comparisons are statistically significant (e.g., some differences between Diet 3 and 2 at Time 20), while most are not.
- Although informative, such detailed and highly specific output can overwhelm interpretation without visual aids or being very clear about our hypotheses. We would seldom really use all of this information in practice. Instead, we would focus on the most relevant comparisons that address our specific research questions.
Reuse
Citation
@online{smit,_a._j.,
author = {Smit, A. J.,},
title = {BCB744 {Task} {G}},
url = {http://tangledbank.netlify.app/assessments/BCB744_Task_G.html},
langid = {en}
}