11. Parametric Tests

A recap of inferential statistics

Author

AJ Smit

In this Chapter

New users sometimes find it challenging to select the right statistical test for their data. Here, I provide guides that might help you make the right choice.

Cheatsheet

Find here a Cheatsheet on statistical methods. The cheatsheet summarises much of what I outline below in one convenient PDF.

The importance of selecting the correct test

Selecting the appropriate inferential statistical method is important for correctly and accurately analysing the outcome of our sampling campaign or experimental treatment. The decision typically hinges on the type and distribution of our data, our research question or hypothesis, and the assumptions each test requires.

The main decision-making process starts with the following considerations:

Research Question/Hypothesis: Start by clearly defining what we’re trying to investigate or determine. Are we comparing group means? Investigating relationships between variables? Or assessing associations between categorical variables?
Type and Distribution of Data: Identify the types of variables we have (e.g., continuous, ordinal, nominal) and check the distribution of our data (e.g., normal vs. non-normal).

Hypotheses are the foundation of the scientific process. These are the propositions or expectations that we set out to test. A hypothesis gives direction to our research and determines what evidence we need to collect to support or reject it.

The next step is to anticipate the nature of the data that our research will generate. This involves understanding not just the type of data (e.g., continuous, categorical), but also its potential distribution and variability. Such foresight stems from a clear understanding of the research design, the instruments we use, and the population we study. This might seem daunting to a novice, but experienced scientists should be able to do this with ease.

Once we have a firm grip on our hypotheses and a clear anticipation of the nature of our forthcoming data, we are in a position to choose the most suitable statistical inference test. Different tests are designed to handle different types of data and answer different research questions. For instance, a t-test might be appropriate for comparing the means of two groups, while a linear model might offer insight into cause-effect relationships.
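
To make the two-group comparison concrete, here is a minimal sketch of the Welch form of the t statistic, written with Python's standard library only. The function name `welch_t` and the two samples are made up for illustration and are not part of this module's material. The sketch returns only the statistic and its approximate degrees of freedom; turning these into a p-value requires the t distribution, which a statistics package would normally supply.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group.
    se2_a, se2_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical measurements for two independent groups.
control = [4.1, 3.9, 4.3, 4.0, 4.2]
treated = [4.8, 5.1, 4.9, 5.2, 5.0]

t_stat, df = welch_t(control, treated)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A large absolute value of t relative to its degrees of freedom indicates that the difference between the group means is large compared with the variability within the groups.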

Well-defined scientific enquiry should offer clarity. With this clarity, we can predict the statistical tests to use, even before the actual data are available. This is not just an academic exercise; it reflects thorough planning and a deep understanding of the research process. Knowing which tests to employ ahead of time also helps one to design the research methodology and ensure the data collected will indeed serve the purpose of the study.

A robust scientific approach requires us to anticipate the nature of our data and understand our hypotheses thoroughly. This ensures that, even before our data are available, we’re prepared with the appropriate statistical tools to analyse them and draw meaningful conclusions.

A detailed breakdown of inferential statistical tests

Here is a moderately detailed breakdown of the tests you’ll encounter in this module. Also included are tests that I have not (yet) covered, including Generalised Linear Models (GLMs), Generalised Additive Models (GAMs), and non-linear regressions.

t-tests:
- Used to compare means between two groups.
- Assumes independent samples, normally distributed data, and homogeneity of variance (Welch’s t-test relaxes the equal-variance assumption).
- If the data are paired (e.g., before and after scores from the same group), then a paired t-test is used.
- If assumptions are not valid, use the Wilcoxon signed-rank test (in lieu of a paired-sample t-test) or the Mann-Whitney U test, also known as the Wilcoxon rank-sum test (in lieu of a Student’s or Welch’s t-test).
ANOVA (Analysis of Variance):
- Used to compare means of three or more independent groups.
- Assumes independence, normal distribution, and homogeneity of variance across groups.
- If assumptions are violated, consider a non-parametric equivalent (e.g., Kruskal-Wallis).
ANCOVA (Analysis of Covariance):
- Extends ANOVA by including one or more continuous covariates that might account for variability in the dependent variable.
- Used to compare means of independent groups while statistically controlling for the effects of other continuous variables (covariates).
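- Assumes the same conditions as ANOVA, plus a linear relationship between the covariate and the dependent variable and homogeneity of regression slopes.
- If assumptions are violated, note that Kruskal-Wallis is not a like-for-like substitute here, since it compares groups without adjusting for covariates.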
Chi-square Analysis:
- Used for testing relationships between categorical variables.
- Assumes that observations are independent and that there are adequate expected frequencies in each cell of a contingency table.
Linear Regression:
- Examines the linear relationship between a continuous dependent variable and one or more independent variables.
- A direction of dependence is typically assumed (the independent variable influences the outcome or measurement), although the regression by itself does not establish causality.
- Assumes linearity, independence of observations, homoscedasticity, and normally distributed residuals.
Generalised Linear Model (GLM):
- An extension of linear regression that allows for response variables with error distribution models other than a normal distribution (e.g., Poisson, binomial).
- Useful when dealing with non-normally distributed dependent variables.
Non-linear Regression:
- Used to model non-linear relationships in which a cause-effect response follows a well-defined mechanistic model; the parameter estimates often correspond to components of that model.
- Assumes independence of observations, homoscedasticity, and normally distributed residuals.
Generalised Additive Models (GAM):
- Used to model non-linear relationships. It’s an extension of the GLM in which the response can depend on smooth functions of the predictors rather than on strictly linear terms.
- Allows for flexible curves to be fit to data.
Correlations:
- Used to examine the strength and direction of the association between two variables. The appropriate coefficient depends on the type of data and the form of the relationship.
  - Pearson’s: Assumes a linear relationship and that both variables are normally distributed.
  - Spearman’s: Used when the relationship is monotonic but not necessarily linear, or when one/both of the variables are ordinal.
  - Kendall’s: Similar to Spearman’s but based on the concordant and discordant pairs. Useful for smaller sample sizes or when there are many tied ranks.
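
The difference between Pearson’s and Spearman’s coefficients can be illustrated with a small sketch. It uses only Python’s standard library, the helper names (`pearson_r`, `ranks`, `spearman_rho`) and the data are invented for illustration, and the ranking helper assumes there are no tied values. The relationship below is perfectly monotonic but not linear.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def ranks(values):
    """Rank values from 1 to n (assumption: no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: y increases with x, but along a cubic curve.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because the ranks of `x` and `y` agree exactly, Spearman’s rho reaches 1, while Pearson’s r stays below 1 because the points do not fall on a straight line.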

Remember to always visualise your data and examine them thoroughly before selecting a test. If unsure, consider consulting with a statistician who can guide the decision-making process.
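
As one example of examining data numerically before committing to a test, the “adequate expected frequencies” condition listed for the chi-square analysis can be checked directly from a contingency table. The sketch below is a plain-Python illustration with hypothetical counts. The threshold of 5 is a commonly quoted rule of thumb that I am assuming here, not a value stated in this chapter, and the statistic is computed without a continuity correction.

```python
def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson's chi-square statistic (no continuity correction)."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 contingency table of observed counts.
observed = [[30, 10],
            [20, 40]]

MIN_EXPECTED = 5  # assumed rule-of-thumb threshold, not from this chapter

expected = expected_counts(observed)
adequate = all(e >= MIN_EXPECTED for row in expected for e in row)
chi2 = chi_square_statistic(observed)

print("Expected counts:", expected)
print("All expected counts adequate:", adequate)
print(f"Chi-square statistic: {chi2:.2f}")
```

If some expected counts fall below the chosen threshold, that is a signal to reconsider the test or to combine sparse categories before proceeding.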

A tabulated view

A tabulated summary of these tests is included below. Refer to 12. Non-parametric statistical tests at a glance for information about non-parametric tests to use when assumptions fail.

Statistic	Application	Data Requirements	Assumptions
t-tests	Compare means between two groups.	Continuous dependent, categorical independent (2 groups).	Independent samples, normal distribution, homogeneity of variance.
ANOVA	Compare means of three or more independent groups.	Continuous dependent, categorical independent (3+ groups).	Independence, normal distribution, homogeneity of variance across groups.
ANCOVA	Compare means while controlling for other continuous variables.	Continuous dependent, categorical and continuous independents.	Same as ANOVA plus linearity and homogeneity of regression slopes.
Chi-square Analysis	Test relationships between categorical variables.	Categorical variables.	Independent observations, adequate expected frequencies in each cell.
Linear Regression	Examine linear relationship between continuous variables.	Continuous dependent and independent(s).	Linearity, independence, homoscedasticity, normally distributed residuals.
Non-linear Regression	Model relationships that follow a specific non-linear equation.	Continuous dependent and independent(s).	Specific to the equation/form used, residuals should be random and normally distributed around zero.
Generalised Linear Model (GLM)	Model relationships for non-normally distributed dependent variables.	Depends on the distribution of the dependent variable (e.g., binary, count, continuous).	Depends on the family (e.g., binomial: binary dependent; Poisson: count dependent).
Generalised Additive Models (GAM)	Model non-linear relationships flexibly.	Continuous dependent, continuous/categorical independents.	Depending on response distribution but more flexible regarding the form of the predictors.
Pearson’s Correlation	Measure linear association between two continuous variables.	Two continuous variables.	Both variables should be normally distributed, linear relationship.
Spearman’s Correlation	Measure monotonic relationship between two ordinal/continuous variables.	Two ordinal/continuous variables.	Monotonic relationship. Doesn’t assume normality.
Kendall’s Tau	Measure association between two ordinal/continuous variables.	Two ordinal/continuous variables.	No specific distributional assumptions. Measures strength of association based on concordant/discordant pairs.
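
Part of the table above can be read as a simple decision rule. The following sketch encodes a few of its rows as a Python function. The function `suggest_test` and its arguments are hypothetical, it covers only some rows of the table, and it returns the parametric starting point on the assumption that the listed assumptions hold. It is meant as a reading aid, not as a substitute for checking assumptions.

```python
def suggest_test(response, predictors, n_groups=None, paired=False):
    """Suggest a starting test from variable types (illustrative only).

    response:   "continuous", "categorical", "binary" or "count"
    predictors: types of the independent variables, drawn from
                {"categorical", "continuous"}
    n_groups:   number of levels of the categorical predictor, if any
    paired:     True if the two groups are paired measurements
    """
    predictors = set(predictors)

    if response == "categorical" and predictors == {"categorical"}:
        return "Chi-square analysis"
    if response in ("binary", "count"):
        return "Generalised Linear Model (GLM)"
    if response != "continuous":
        raise ValueError(f"Unsupported response type: {response!r}")

    if predictors == {"categorical"}:
        if n_groups == 2:
            return "paired t-test" if paired else "t-test"
        if n_groups is not None and n_groups >= 3:
            return "ANOVA"
        raise ValueError("n_groups must be 2 or more for a categorical predictor")
    if predictors == {"categorical", "continuous"}:
        return "ANCOVA"
    if predictors == {"continuous"}:
        return "Linear regression"
    raise ValueError(f"Unsupported predictor types: {sorted(predictors)}")


print(suggest_test("continuous", {"categorical"}, n_groups=2))
print(suggest_test("continuous", {"categorical"}, n_groups=4))
print(suggest_test("count", {"continuous"}))
```

Writing the rule down this way before data collection is one way to confirm that the planned design will actually produce data of the type the intended test requires.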

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:

@online{smit,_a._j.,
  author = {Smit, A. J.},
  title = {11. {Parametric} {Tests}},
  date = {},
  url = {http://tangledbank.netlify.app/BCB744/basic_stats/11-decision_guide.html},
  langid = {en}
}

For attribution, please cite this work as:

Smit, A. J. 11. Parametric Tests. http://tangledbank.netlify.app/BCB744/basic_stats/11-decision_guide.html.
