5. Statistical Inference
An introduction to inferential statistics
- The concept of inferential statistics
- Hypothesis testing
- Probabilities
- Assumptions and parametric statistics
- Normality and the Shapiro-Wilk test
- Homoscedasticity
Introduction
We have seen in Chapter 2 and Chapter 3 how to summarise, describe, and visualise our data—these processes form part of descriptive statistics. The next step is the process of conducting inferential statistics.
Inferential statistics is a branch of statistics that focuses on drawing conclusions and making generalisations about a larger population based on the analysis of a smaller, representative sample. This is particularly valuable in research situations where it is impractical or impossible to collect data from every member of a population, as is the case in virtually all of biology and ecology. By employing probabilistic reasoning, inferential statistics enable us to estimate population parameters, make predictions, and test hypotheses with a certain level of confidence.
One of the key aspects of inferential statistics is the concept of sampling variability. Since samples are only a subset of the population, they imperfectly represent whole populations, leading to variations in the estimates of population parameters (repeatedly drawing samples at random from a population will result in slightly different values for key statistical parameters, such as the sample mean and variance). Inferential statistics accounts for this variability by providing measures of uncertainty, such as confidence intervals and margins of error, which convey the range within which the true population parameter is likely to fall.
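Sampling variability can be made concrete with a short simulation. The sketch below (in Python, with hypothetical simulated data; the values and sample sizes are ours, not the course's) repeatedly draws samples from one population and shows that each sample yields a slightly different estimate of the mean:

```python
import numpy as np

# Hypothetical "population": 10,000 normally distributed measurements
# with mean 50 and standard deviation 10.
rng = np.random.default_rng(13)
population = rng.normal(loc=50, scale=10, size=10_000)

# Each random sample of n = 30 yields a slightly different sample mean;
# this spread is the sampling variability that inferential statistics
# must account for.
sample_means = [rng.choice(population, size=30).mean() for _ in range(1000)]

# An empirical 95% interval for the sample means, from the 2.5th and
# 97.5th percentiles of the simulated estimates.
lower, upper = np.percentile(sample_means, [2.5, 97.5])
print(f"Population mean: {population.mean():.2f}")
print(f"95% of sample means fall between {lower:.2f} and {upper:.2f}")
```

The interval printed at the end is the simulation analogue of the confidence intervals mentioned above: it conveys the range within which estimates of the population mean tend to fall.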
Parametric statistics form the foundation of inferential statistics, and they are used to make inferences about population parameters based on sample data. These statistics assume that the data are generated from a specific probability distribution—the normal distribution. An alternative to parametric tests is non-parametric statistics, and we shall hear more about them in Chapter 6.
The most common parametric statistics used in inferential statistics include:
- t-tests (Chapter 7): used to determine if there is a significant difference between the means of two groups of continuous dependent (response) variables.
- ANOVA (Chapter 8): used to determine if there is a significant difference between the means of three or more groups of continuous variables.
- Regression analysis (Chapter 9): used to model the relationship between one or more continuous predictor variables and a continuous response variable.
- Pearson correlation (Chapter 10): used to measure the linear association or relationship between two continuous variables.
- Chi-squared tests: used to determine if there is a significant association between two categorical variables.
These tests typically involve calculating a test statistic, comparing this value with a critical value, and then establishing a p-value to determine whether the results are statistically significant or likely due to chance. These methods are included within a subset of inferential statistics called probabilistic statistics.
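The sequence of calculating a test statistic and obtaining a p-value can be sketched with a two-sample t-test in Python (a hypothetical example with simulated data; the course's own worked examples may use other software):

```python
import numpy as np
from scipy import stats

# Hypothetical data: two groups of continuous measurements whose
# true means differ (5.0 vs 5.8).
rng = np.random.default_rng(7)
group_a = rng.normal(loc=5.0, scale=1.0, size=40)
group_b = rng.normal(loc=5.8, scale=1.0, size=40)

# The test computes a t-statistic from the sample data and compares it
# to the t-distribution to yield a p-value under the null hypothesis
# of equal group means.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

The p-value is then compared with a pre-chosen significance threshold (such as 0.05, as discussed below) to decide whether the null hypothesis should be rejected.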
Probabilistic and Bayesian statistics are two related but distinct branches of statistics that offer tools for modelling, analysing, and drawing inferences from complex data sets. At their core, both approaches rely on the use of probability theory to quantify uncertainty and variability in data, but they differ in their assumptions about the nature of this uncertainty and how it should be modelled.
Probabilistic statistics is a classical approach that assumes that all sources of variability in a data set can be described by a fixed set of probability distributions, such as the normal distribution or the Poisson distribution. These distributions are characterised by a set of parameters, such as the mean and standard deviation, that can be estimated from the data. Probabilistic statistics is widely used in fields such as biology, physics, and economics, where the data are often assumed to be generated by a deterministic process with some random noise present. In contrast, Bayesian statistics takes a more flexible approach to modelling uncertainty, allowing for uncertainty in both the parameters of the model and the underlying distribution itself. Bayesian methods are useful when dealing with complex and high-dimensional data sets, with lots of unknowns and assumptions, and have become increasingly popular in fields such as ecology and machine learning in recent years.
Hypothesis testing
Hypothesis testing is a fundamental aspect of the scientific method and is used to evaluate the validity of scientific hypotheses. A hypothesis is a proposed explanation for a phenomenon or observation that can be tested through experimentation or observation. To test a hypothesis, we design experiments or collect data, which we analyse using inferential statistical methods to determine whether the data support or refute the hypothesis.
Two competing hypotheses about the data are set up at the onset of hypothesis testing: a null hypothesis (H0) and an alternative hypothesis (Ha). The null hypothesis typically represents the status quo or a default assumption (a statement of no difference), while the alternative hypothesis represents a new or alternative explanation for the data.
The goal is to make objective and evidence-based conclusions about the validity of the hypothesis, and to determine whether it can be accepted or rejected based on the available evidence. Hypothesis testing is a critical tool for advancing scientific knowledge and understanding, as it allows us to identify the most promising hypotheses and develop more accurate models of the natural world. Effectively, scientific progress can only be made if the null hypothesis is rejected and the alternative hypothesis accepted.
Hypotheses and theories are both important components of the scientific process, but they serve different functions and represent distinct levels of understanding.
A hypothesis is a tentative explanation or proposition for a specific phenomenon, often based on observations and grounded in existing knowledge. It is a testable statement that can be either supported or refuted through further observation, experimentation, and hypothesis testing through the application of inferential statistics. Hypotheses are typically formulated at the beginning of a research study. They guide the design of experiments and the collection of data. Hypotheses help us make predictions and answer specific questions about the phenomena under investigation. If a hypothesis is repeatedly tested and confirmed through various experiments, it may gain credibility and contribute to the development of a theory.
A theory is a well-substantiated explanation for a broad range of observed phenomena that has been consistently supported by a large body of evidence. Theories are more comprehensive and mature than hypotheses, as they integrate and generalise multiple related hypotheses and empirical findings to explain complex phenomena. They are built upon a solid foundation of tested hypotheses and provide a coherent framework that enables us to make accurate predictions, generate new hypotheses, and further advance our understanding of the natural world.
At the heart of many basic scientific inquiries, and hence hypotheses, is the simple question “Is A different from B?” The scientific notation for this question is:
- H0: Group A is not different from Group B
- Ha: Group A is different from Group B
More formally, one would say:

1. $H_{0}: \mu_{A} = \mu_{B}$ vs. the alternative hypothesis that $H_{a}: \mu_{A} \neq \mu_{B}$
2. $H_{0}: \mu_{A} \geq \mu_{B}$ vs. the alternative hypothesis that $H_{a}: \mu_{A} < \mu_{B}$
3. $H_{0}: \mu_{A} \leq \mu_{B}$ vs. the alternative hypothesis that $H_{a}: \mu_{A} > \mu_{B}$
Hypothesis 1 is a two-sided t-test and hypotheses 2 and 3 are one-sided tests. This will make sense once you have studied the material in Chapter 7 about t-tests.
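The distinction between two-sided and one-sided hypotheses can be previewed in Python via the `alternative` argument of `scipy.stats.ttest_ind` (hypothetical simulated data; the group values are ours):

```python
import numpy as np
from scipy import stats

# Hypothetical data for groups A and B, where B tends to be larger.
rng = np.random.default_rng(42)
a = rng.normal(10.0, 2.0, size=30)
b = rng.normal(12.0, 2.0, size=30)

# Hypothesis 1 (two-sided): H0: mu_A = mu_B  vs  Ha: mu_A != mu_B
t2, p_two_sided = stats.ttest_ind(a, b, alternative="two-sided")

# Hypothesis 2 (one-sided): H0: mu_A >= mu_B  vs  Ha: mu_A < mu_B
t1, p_less = stats.ttest_ind(a, b, alternative="less")

# When the data favour the tested direction, the one-sided p-value is
# half the two-sided p-value (the t-distribution is symmetric).
print(f"two-sided p = {p_two_sided:.4f}, one-sided p = {p_less:.4f}")
```

Note that a one-sided test should be chosen before looking at the data, on the basis of the research question, not after seeing which direction the difference happens to fall in.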
Probabilities
The p-value, judged against the significance level, $\alpha$, determines the outcome of a hypothesis test.
In inferential statistics, when conducting hypothesis testing, we don’t “accept” or “prove” the null hypothesis. Instead, we either “reject” or “fail to reject” the null hypothesis based on the evidence provided by our sample data. So, it doesn’t mean the null hypothesis is true, just that there isn’t enough evidence in your sample to reject it.
The choice of p-value at which we reject $H_{0}$ is called the significance level, $\alpha$, and by convention it is most often set at 0.05. Statistical tests indicate a statistically significant outcome (the rejection of $H_{0}$) when the p-value falls at or below $\alpha$.

We generally refer to $\alpha$ as the probability of committing a Type I error.

A Type I error is the false rejection of the null hypothesis when it is in fact true; conversely, a Type II error is the failure to reject the null hypothesis when it is in fact false.
The choice of p-value threshold depends on several factors, including the nature of the data, the research question, and the desired level of statistical significance. In medical sciences, where the consequences of false positive or false negative results can have significant implications for patient health, a more stringent threshold is often used. A p-value of 0.001 is commonly used in medical research to minimise the risk of Type I errors (rejecting the null hypothesis when it is actually true) and to ensure a high level of statistical confidence in the results.
In biological sciences, the consequences of false positive or false negative results may be less severe, and a p-value of 0.05 is often considered an appropriate threshold for statistical significance. However, it is important to note that the choice of p-value threshold is ultimately subjective and should be based on a careful consideration of the research question, the nature of the data, and the potential consequences of false positive or false negative results.
To conclude: when the p-value is at or below $\alpha$ we reject the null hypothesis in favour of the alternative hypothesis, and when it is above $\alpha$ we fail to reject the null hypothesis.
Assumptions
Irrespective of the kind of statistical test we wish to perform, we have to make a couple of important assumptions that are not guaranteed to be true. In fact, these assumptions are often violated because real data, especially biological data, are messy.
The issue of assumptions is an important one, and one that we need to understand well. This will be the purpose of Chapter 6, where we will learn how to test the assumptions (such as normality and homoscedasticity) and discover what to do when they are violated.
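As a brief preview of Chapter 6, the two assumptions named in this chapter's outline, normality (via the Shapiro-Wilk test) and homoscedasticity, might be checked as in the sketch below. The data are simulated and Levene's test is used here as one common homoscedasticity check; the course itself may use other tools:

```python
import numpy as np
from scipy import stats

# Hypothetical data: two normally distributed groups with equal variance.
rng = np.random.default_rng(1)
group_a = rng.normal(20.0, 3.0, size=50)
group_b = rng.normal(22.0, 3.0, size=50)

# Shapiro-Wilk test of normality: H0 is that the sample comes from a
# normal distribution, so a LARGE p-value is consistent with normality.
w_a, p_norm_a = stats.shapiro(group_a)
w_b, p_norm_b = stats.shapiro(group_b)

# Levene's test of homoscedasticity: H0 is that the groups share the
# same variance; again, a large p-value means the assumption holds.
w_lev, p_var = stats.levene(group_a, group_b)

print(f"Normality p-values: {p_norm_a:.3f}, {p_norm_b:.3f}")
print(f"Equal-variance p-value: {p_var:.3f}")
```

Note the inverted logic relative to the main hypothesis test: for assumption checks, failing to reject $H_{0}$ is the desirable outcome.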
Conclusion
We use inferential statistics to draw conclusions about a population based on a sample of data. By using probability theory and statistical inference, we can make inferences about the characteristics of a larger population with a certain level of confidence. We must always keep the assumptions behind inferential statistics in mind so that we can apply the right statistical test and answer our research question within the limits of what our data can tell us.
In practice, the process works like this:

1. Setting the significance level ($\alpha$):
    - Before conducting the test, you decide on a significance level, $\alpha$, which is the probability of rejecting the null hypothesis when it's actually true (Type I error). Common choices for $\alpha$ are 0.05, 0.01, and 0.10, though the choice is context-dependent.
2. Conducting the test:
    - You then compute the test statistic (like a t-statistic, F-statistic, etc.) based on your sample data.
    - This test statistic is then compared to a distribution (like the t-distribution for the t-test) to find the p-value.
3. Interpreting the p-value:
    - The p-value is the probability of observing a test statistic as extreme as, or more extreme than, the statistic computed from the sample, assuming that the null hypothesis is true.
    - If the p-value is less than $\alpha$, the evidence suggests that the null hypothesis can be rejected in favour of the alternative hypothesis.
    - If the p-value is greater than $\alpha$, you fail to reject the null hypothesis. This doesn't mean the null hypothesis is true, just that there isn't enough evidence in your sample to reject it.
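The three steps above can be strung together in a short sketch (hypothetical simulated data; the helper name `hypothesis_test` is ours, not a library function):

```python
import numpy as np
from scipy import stats

def hypothesis_test(sample_a, sample_b, alpha=0.05):
    """Two-sample t-test reported as a decision at significance level alpha."""
    # Step 2: compute the test statistic and its p-value.
    t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
    # Step 3: compare the p-value with the alpha chosen in step 1.
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return t_stat, p_value, decision

# Simulated samples drawn from the SAME population, so H0 is true here
# and we would usually expect to fail to reject it.
rng = np.random.default_rng(2021)
a = rng.normal(50, 5, size=25)
b = rng.normal(50, 5, size=25)

t_stat, p_value, decision = hypothesis_test(a, b, alpha=0.05)
print(f"t = {t_stat:.3f}, p = {p_value:.3f} -> {decision}")
```

Because $\alpha = 0.05$, even when $H_{0}$ is true the test will wrongly reject it about 5% of the time, which is precisely the Type I error rate set in step 1.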
Citation
@online{smit2021,
  author = {Smit, A. J.},
  title = {5. {Statistical} {Inference}},
  date = {2021-01-01},
  url = {http://tangledbank.netlify.app/BCB744/basic_stats/05-inference.html},
  langid = {en}
}