13. Confidence Intervals

Author

Affiliation

Published

January 1, 2021

Introduction

A confidence interval (CI) tells us within what range we may be certain to find the true mean from which any sample has been taken. If we were to repeatedly sample the same population over and over and calculated a mean every time, the 95% CI indicates the range that 95% of those means would fall into.

Calculating confidence intervals

Input <- ("
Student  Grade   Teacher   Score  Rating
a        Gr_1    Vladimir  80     7
b        Gr_1    Vladimir  90    10
c        Gr_1    Vladimir 100     9
d        Gr_1    Vladimir  70     5
e        Gr_1    Vladimir  60     4
f        Gr_1    Vladimir  80     8
g        Gr_10   Vladimir  70     6
h        Gr_10   Vladimir  50     5
i        Gr_10   Vladimir  90    10
j        Gr_10   Vladimir  70     8
k        Gr_1    Sadam     80     7
l        Gr_1    Sadam     90     8
m        Gr_1    Sadam     90     8
n        Gr_1    Sadam     80     9
o        Gr_10   Sadam     60     5
p        Gr_10   Sadam     80     9
q        Gr_10   Sadam     70     6
r        Gr_1    Donald   100    10
s        Gr_1    Donald    90    10
t        Gr_1    Donald    80     8
u        Gr_1    Donald    80     7
v        Gr_1    Donald    60     7
w        Gr_10   Donald    60     8
x        Gr_10   Donald    80    10
y        Gr_10   Donald    70     7
z        Gr_10   Donald    70     7
")

data <- read.table(textConnection(Input), header = TRUE)
summary(data)

   Student             Grade             Teacher              Score       
 Length:26          Length:26          Length:26          Min.   : 50.00  
 Class :character   Class :character   Class :character   1st Qu.: 70.00  
 Mode  :character   Mode  :character   Mode  :character   Median : 80.00  
                                                          Mean   : 76.92  
                                                          3rd Qu.: 87.50  
                                                          Max.   :100.00  
     Rating      
 Min.   : 4.000  
 1st Qu.: 7.000  
 Median : 8.000  
 Mean   : 7.615  
 3rd Qu.: 9.000  
 Max.   :10.000

The package rcompanion has a convenient function for estimating the confidence intervals for our sample data. The function is called groupwiseMean() and it has a few options (methods) for estimating the confidence intervals, e.g. the ‘traditional’ way using the t-distribution, and a bootstrapping procedure.

Let us produce the confidence intervals using the traditional method for the group means:

library(rcompanion)
# Ungrouped data are indicated with a 1 on the right side of the formula,
# or the group = NULL argument; so, this produces the overall mean
groupwiseMean(Score ~ 1, data = data, conf = 0.95, digits = 3)

   .id  n Mean Conf.level Trad.lower Trad.upper
1 <NA> 26 76.9       0.95       71.7       82.1

# One-way data
groupwiseMean(Score ~ Grade, data = data, conf = 0.95, digits = 3)

  Grade  n Mean Conf.level Trad.lower Trad.upper
1  Gr_1 15   82       0.95       75.3       88.7
2 Gr_10 11   70       0.95       62.6       77.4

# Two-way data
groupwiseMean(Score ~ Teacher + Grade, data = data, conf = 0.95, digits = 3)

   Teacher Grade n Mean Conf.level Trad.lower Trad.upper
1   Donald  Gr_1 5   82       0.95       63.6      100.0
2   Donald Gr_10 4   70       0.95       57.0       83.0
3    Sadam  Gr_1 4   85       0.95       75.8       94.2
4    Sadam Gr_10 3   70       0.95       45.2       94.8
5 Vladimir  Gr_1 6   80       0.95       65.2       94.8
6 Vladimir Gr_10 4   70       0.95       44.0       96.0

Now let us do it through bootstrapping:

groupwiseMean(Score ~ Grade,
              data = data,
              conf = 0.95,
              digits = 3,
              R = 10000,
              boot = TRUE,
              traditional = FALSE,
              normal = FALSE,
              basic = FALSE,
              percentile = FALSE,
              bca = TRUE)

  Grade  n Mean Boot.mean Conf.level Bca.lower Bca.upper
1  Gr_1 15   82        82       0.95      74.7      86.7
2 Gr_10 11   70        70       0.95      62.7      75.5

groupwiseMean(Score ~ Teacher + Grade,
              data = data,
              conf = 0.95,
              digits = 3,
              R = 10000,
              boot = TRUE,
              traditional = FALSE,
              normal = FALSE,
              basic = FALSE,
              percentile = FALSE,
              bca = TRUE)

   Teacher Grade n Mean Boot.mean Conf.level Bca.lower Bca.upper
1   Donald  Gr_1 5   82        82       0.95      68.0      90.0
2   Donald Gr_10 4   70        70       0.95      62.5      75.0
3    Sadam  Gr_1 4   85        85       0.95      80.0      87.5
4    Sadam Gr_10 3   70        70       0.95      60.0      76.7
5 Vladimir  Gr_1 6   80        80       0.95      68.3      88.3
6 Vladimir Gr_10 4   70        70       0.95      55.0      80.0

These upper and lower limits may then be used easily within a figure.

# Load libraries
library(tidyverse)

# Create dummy data
r_dat <- data.frame(value = rnorm(n = 20, mean = 10, sd = 2),
                    sample = rep("A", 20))

# Create basic plot
ggplot(data = r_dat, aes(x = sample, y = value)) +
  geom_errorbar(aes(ymin = mean(value) - sd(value), ymax = mean(value) + sd(value))) +
  geom_jitter(colour = "firebrick1")

A very basic figure showing confidence intervals (CI) for a random normal distribution.

CI of compared means

AS stated above, we may also use CI to investigate the difference in means between two or more sample sets of data. We have already seen this in the ANOVA Chapter, but we shall look at it again here with our now expanded understanding of the concept.

# First calculate ANOVA of seapl length of different iris species
iris_aov <- aov(Sepal.Length ~ Species, data = iris)

# Then run a Tukey test
iris_Tukey <- TukeyHSD(iris_aov)

# Lastly use base R to quickly plot the results
plot(iris_Tukey)

Results of a post-hoc Tukey test showing the confidence interval for the effect size between each group.

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:

@online{smit,_a._j.2021,
  author = {Smit, A. J.,},
  title = {13. {Confidence} {Intervals}},
  date = {2021-01-01},
  url = {http://tangledbank.netlify.app/BCB744/basic_stats/13-confidence.html},
  langid = {en}
}

For attribution, please cite this work as:

Smit, A. J. (2021) 13. Confidence Intervals. http://tangledbank.netlify.app/BCB744/basic_stats/13-confidence.html.

--- date: "2021-01-01" title: "13. Confidence Intervals" subtitle: "" ---  <iframe width="750" height="422" src="https://ajsmit.github.io/BCB744/Confidence_intervals_slides.html" frameborder="0" allowfullscreen></iframe> # Introduction A confidence interval (CI) tells us within what range we may be certain to find the true mean from which any sample has been taken. If we were to repeatedly sample the same population over and over and calculated a mean every time, the 95% CI indicates the range that 95% of those means would fall into. # Calculating confidence intervals ```{r} Input <- (" Student Grade Teacher Score Rating a Gr_1 Vladimir 80 7 b Gr_1 Vladimir 90 10 c Gr_1 Vladimir 100 9 d Gr_1 Vladimir 70 5 e Gr_1 Vladimir 60 4 f Gr_1 Vladimir 80 8 g Gr_10 Vladimir 70 6 h Gr_10 Vladimir 50 5 i Gr_10 Vladimir 90 10 j Gr_10 Vladimir 70 8 k Gr_1 Sadam 80 7 l Gr_1 Sadam 90 8 m Gr_1 Sadam 90 8 n Gr_1 Sadam 80 9 o Gr_10 Sadam 60 5 p Gr_10 Sadam 80 9 q Gr_10 Sadam 70 6 r Gr_1 Donald 100 10 s Gr_1 Donald 90 10 t Gr_1 Donald 80 8 u Gr_1 Donald 80 7 v Gr_1 Donald 60 7 w Gr_10 Donald 60 8 x Gr_10 Donald 80 10 y Gr_10 Donald 70 7 z Gr_10 Donald 70 7 ") data <- read.table(textConnection(Input), header = TRUE) summary(data) ``` The package **rcompanion** has a convenient function for estimating the confidence intervals for our sample data. The function is called `groupwiseMean()` and it has a few options (methods) for estimating the confidence intervals, e.g. the 'traditional' way using the *t*-distribution, and a bootstrapping procedure. Let us produce the confidence intervals using the traditional method for the group means: ```{r} library(rcompanion) # Ungrouped data are indicated with a 1 on the right side of the formula, # or the group = NULL argument; so, this produces the overall mean groupwiseMean(Score ~ 1, data = data, conf = 0.95, digits = 3) # One-way data groupwiseMean(Score ~ Grade, data = data, conf = 0.95, digits = 3) # Two-way data groupwiseMean(Score ~ Teacher + Grade, data = data, conf = 0.95, digits = 3) ``` Now let us do it through bootstrapping: ```{r} groupwiseMean(Score ~ Grade, data = data, conf = 0.95, digits = 3, R = 10000, boot = TRUE, traditional = FALSE, normal = FALSE, basic = FALSE, percentile = FALSE, bca = TRUE) groupwiseMean(Score ~ Teacher + Grade, data = data, conf = 0.95, digits = 3, R = 10000, boot = TRUE, traditional = FALSE, normal = FALSE, basic = FALSE, percentile = FALSE, bca = TRUE) ``` These upper and lower limits may then be used easily within a figure. ```{r, fig.cap="A very basic figure showing confidence intervals (CI) for a random normal distribution.", message=FALSE, warning=FALSE} # Load libraries library(tidyverse) # Create dummy data r_dat <- data.frame(value = rnorm(n = 20, mean = 10, sd = 2), sample = rep("A", 20)) # Create basic plot ggplot(data = r_dat, aes(x = sample, y = value)) + geom_errorbar(aes(ymin = mean(value) - sd(value), ymax = mean(value) + sd(value))) + geom_jitter(colour = "firebrick1") ``` # CI of compared means AS stated above, we may also use CI to investigate the difference in means between two or more sample sets of data. We have already seen this in the ANOVA Chapter, but we shall look at it again here with our now expanded understanding of the concept. ```{r, fig.cap="Results of a post-hoc Tukey test showing the confidence interval for the effect size between each group."} # First calculate ANOVA of seapl length of different iris species iris_aov <- aov(Sepal.Length ~ Species, data = iris) # Then run a Tukey test iris_Tukey <- TukeyHSD(iris_aov) # Lastly use base R to quickly plot the results plot(iris_Tukey) ```