---
date: "2021-01-01"
title: "13. Confidence Intervals"
subtitle: ""
---
```{r code-brewing-opts, echo=FALSE}
knitr::opts_chunk$set(
comment = "R>",
warning = FALSE,
message = FALSE,
fig.width = 4.5,
fig.height = 2.625,
out.width = "75%",
fig.asp = NULL, # control via width/height
dpi = 300
)
ggplot2::theme_set(
  ggplot2::theme_bw(base_size = 8)
)
```
<!-- # Confidence intervals -->
<iframe width="750" height="422" src="https://ajsmit.github.io/BCB744/Confidence_intervals_slides.html" frameborder="0" allowfullscreen></iframe>
# Introduction
A confidence interval (CI) gives a range of plausible values for the true (population) mean from which a sample was drawn. If we were to sample the same population repeatedly and calculate a 95% CI each time, about 95% of those intervals would contain the true mean.
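A small simulation can make the repeated-sampling idea concrete. The sketch below is only illustrative: the population (normal, mean 10, SD 2), the sample size of 20, and the number of repeats are arbitrary assumptions, and the names `true_mean`, `n_repeats`, and `covered` are hypothetical. It draws many samples, calculates a 95% CI for each with `t.test()`, and reports the proportion of intervals that contain the true mean.

```{r code-ci-coverage-sim}
set.seed(13) # arbitrary seed, for reproducibility only

true_mean <- 10   # assumed population mean
n_repeats <- 1000 # assumed number of repeated samples

# For each repeat: draw a sample, calculate its 95% CI,
# and record whether the CI contains the true mean
covered <- replicate(n_repeats, {
  x <- rnorm(n = 20, mean = true_mean, sd = 2)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  ci[1] <= true_mean & true_mean <= ci[2]
})

# Proportion of intervals that contain the true mean
mean(covered)
```

The proportion should be close to 0.95, although it will vary slightly with the seed and the number of repeats.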
# Calculating Confidence Intervals
```{r code-input}
Input <- ("
Student Grade Teacher Score Rating
a Gr_1 Vladimir 80 7
b Gr_1 Vladimir 90 10
c Gr_1 Vladimir 100 9
d Gr_1 Vladimir 70 5
e Gr_1 Vladimir 60 4
f Gr_1 Vladimir 80 8
g Gr_10 Vladimir 70 6
h Gr_10 Vladimir 50 5
i Gr_10 Vladimir 90 10
j Gr_10 Vladimir 70 8
k Gr_1 Sadam 80 7
l Gr_1 Sadam 90 8
m Gr_1 Sadam 90 8
n Gr_1 Sadam 80 9
o Gr_10 Sadam 60 5
p Gr_10 Sadam 80 9
q Gr_10 Sadam 70 6
r Gr_1 Donald 100 10
s Gr_1 Donald 90 10
t Gr_1 Donald 80 8
u Gr_1 Donald 80 7
v Gr_1 Donald 60 7
w Gr_10 Donald 60 8
x Gr_10 Donald 80 10
y Gr_10 Donald 70 7
z Gr_10 Donald 70 7
")
data <- read.table(textConnection(Input), header = TRUE)
summary(data)
```
The package **rcompanion** has a convenient function for estimating the confidence intervals for our sample data. The function is called `groupwiseMean()` and it has a few options (methods) for estimating the confidence intervals, *e.g.* the 'traditional' way using the *t*-distribution, and a bootstrapping procedure.
Let us produce the confidence intervals using the traditional method for the group means:
```{r code-library-rcompanion}
library(rcompanion)
# Ungrouped data are indicated with a 1 on the right side of the formula,
# or the group = NULL argument; so, this produces the overall mean
groupwiseMean(Score ~ 1, data = data, conf = 0.95, digits = 3)
# One-way data
groupwiseMean(Score ~ Grade, data = data, conf = 0.95, digits = 3)
# Two-way data
groupwiseMean(Score ~ Teacher + Grade, data = data, conf = 0.95, digits = 3)
```
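To see what the 'traditional' method does, the overall interval can also be calculated by hand as $\bar{x} \pm t_{0.975,\,n-1} \times s / \sqrt{n}$. The following is a minimal sketch in base R; the vector `score` is simply the `Score` column of the data above typed out again so that the chunk stands alone, and the other object names are hypothetical.

```{r code-ci-by-hand}
# The 26 scores from the Score column of the data above
score <- c(80, 90, 100, 70, 60, 80, 70, 50, 90, 70, 80, 90, 90,
           80, 60, 80, 70, 100, 90, 80, 80, 60, 60, 80, 70, 70)

n <- length(score)
score_mean <- mean(score)
score_se <- sd(score) / sqrt(n)     # standard error of the mean
t_crit <- qt(0.975, df = n - 1)     # two-sided 95% critical value

lower <- score_mean - t_crit * score_se
upper <- score_mean + t_crit * score_se
c(mean = score_mean, lower = lower, upper = upper)

# The same interval is reported by a one-sample t-test
t.test(score, conf.level = 0.95)$conf.int
```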
Now let us do it through bootstrapping:
```{r code-groupwisemean-score-grade}
groupwiseMean(Score ~ Grade,
data = data,
conf = 0.95,
digits = 3,
R = 10000,
boot = TRUE,
traditional = FALSE,
normal = FALSE,
basic = FALSE,
percentile = FALSE,
bca = TRUE)
groupwiseMean(Score ~ Teacher + Grade,
data = data,
conf = 0.95,
digits = 3,
R = 10000,
boot = TRUE,
traditional = FALSE,
normal = FALSE,
basic = FALSE,
percentile = FALSE,
bca = TRUE)
```
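The idea behind bootstrapping can be sketched in a few lines of base R. Note that this uses the simpler percentile method, not the BCa method requested above, so the limits will not match the `groupwiseMean()` output exactly. The vector `gr10_score` holds the `Score` values of the eleven `Gr_10` students, typed out again so that the chunk stands alone; the object names are hypothetical.

```{r code-bootstrap-by-hand}
set.seed(13) # arbitrary seed, for reproducibility only

# Scores of the Gr_10 students from the data above
gr10_score <- c(70, 50, 90, 70, 60, 80, 70, 60, 80, 70, 70)

# Resample the scores with replacement many times,
# calculating the mean of each resample
boot_means <- replicate(10000, mean(sample(gr10_score, replace = TRUE)))

# The 2.5th and 97.5th percentiles of the bootstrapped means
# give a 95% percentile bootstrap CI
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))
boot_ci
```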
These upper and lower limits may then be used easily within a figure.
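```{r fig-library-tidyverse, fig.cap="A very basic figure showing the 95% confidence interval (CI) of the mean for a sample drawn from a normal distribution.", message=FALSE, warning=FALSE}
# Load libraries
library(tidyverse)
# Create dummy data
r_dat <- data.frame(value = rnorm(n = 20, mean = 10, sd = 2),
                    sample = rep("A", 20))
# Calculate the mean and the half-width of the 95% CI (t-distribution)
r_ci <- r_dat %>%
  group_by(sample) %>%
  summarise(mean = mean(value),
            ci = qt(0.975, df = n() - 1) * sd(value) / sqrt(n()))
# Create basic plot: error bar for the CI, points for the raw data
ggplot(data = r_dat, aes(x = sample, y = value)) +
  geom_errorbar(data = r_ci, inherit.aes = FALSE, width = 0.2,
                aes(x = sample, ymin = mean - ci, ymax = mean + ci)) +
  geom_jitter(colour = "firebrick1", width = 0.1)
```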
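The same approach extends to grouped data such as the student scores. The sketch below is one possible way of doing this with **dplyr** and **ggplot2**, assuming the *t*-based interval; the data frame `scores` is the `Grade` and `Score` columns of the data above typed out again (reordered by grade) so that the chunk stands alone, and the column names `half_width`, `lower`, and `upper` are hypothetical.

```{r fig-ci-by-grade, fig.cap="Mean score per grade with 95% confidence intervals."}
library(dplyr)
library(ggplot2)

# Grade and Score from the data above (15 Gr_1 and 11 Gr_10 students)
scores <- data.frame(
  Grade = rep(c("Gr_1", "Gr_10"), times = c(15, 11)),
  Score = c(80, 90, 100, 70, 60, 80, 80, 90, 90, 80, 100, 90, 80, 80, 60,
            70, 50, 90, 70, 60, 80, 70, 60, 80, 70, 70)
)

# Mean and t-based 95% CI limits for each grade
grade_ci <- scores %>%
  group_by(Grade) %>%
  summarise(n = n(),
            Mean = mean(Score),
            half_width = qt(0.975, df = n - 1) * sd(Score) / sqrt(n)) %>%
  mutate(lower = Mean - half_width,
         upper = Mean + half_width)

# Plot the group means with their CIs as error bars
ggplot(data = grade_ci, aes(x = Grade, y = Mean)) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
  geom_point()
```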
# CI of Compared Means
We may also use CIs to investigate the difference in means between two or more sets of sample data. We have already seen this in the ANOVA chapter, but we shall look at it again here with our now expanded understanding of the concept.
```{r fig-iris-aov-aov-sepal-length, fig.cap="Results of a post-hoc Tukey test showing the confidence interval for the effect size between each group."}
# First calculate ANOVA of sepal length of different iris species
iris_aov <- aov(Sepal.Length ~ Species, data = iris)
# Then run a Tukey test
iris_Tukey <- TukeyHSD(iris_aov)
# Lastly use base R to quickly plot the results
plot(iris_Tukey)
```
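The intervals drawn in the plot can also be inspected as a table. As a brief sketch, the object returned by `TukeyHSD()` is a list with one matrix per factor, containing the difference in means (`diff`), the lower and upper limits (`lwr`, `upr`), and the adjusted *p*-value (`p adj`). The names `tukey_tab` and `excludes_zero` are hypothetical. An interval that does not include zero suggests that the two group means differ at the chosen family-wise confidence level.

```{r code-tukey-table}
# Repeat the ANOVA and Tukey test so that this chunk stands alone
iris_aov <- aov(Sepal.Length ~ Species, data = iris)
iris_Tukey <- TukeyHSD(iris_aov, conf.level = 0.95)

# Extract the table of pairwise differences and their CIs
tukey_tab <- as.data.frame(iris_Tukey$Species)

# Flag the comparisons whose CI does not include zero
tukey_tab$excludes_zero <- tukey_tab$lwr > 0 | tukey_tab$upr < 0
tukey_tab
```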