15: Pseudoreplication

Author

A. J. Smit

Published

2026/03/18

1 Introduction

In the design of ecological experiments, the integrity of our conclusions hinges on the statistical validity of our tests. A foundational concept in this regard is replication—the repetition of an experiment on independent units. However, a common and critical error known as pseudoreplication can undermine our statistical inferences and lead to spurious conclusions.

Here I address the concept of pseudoreplication. I reference the seminal paper by Hurlbert (1984), which brought this issue to the forefront of ecological science. A copy of the paper is available for download here:

Hurlbert, S. H. (1984). Pseudoreplication and the design of ecological field experiments. Ecological Monographs, 54(2), 187–211.

2 What is Pseudoreplication?

Hurlbert defines pseudoreplication as…

“…the use of inferential statistics to test for treatment effects with data from experiments where either treatments are not replicated (though samples may be) or replicates are not statistically independent” (Hurlbert, 1984).

So, pseudoreplication occurs when we treat multiple samples from the same experimental unit as if they are independent replicates. For example, taking ten water samples from a single pond that has been treated with a fertilizer does not give us ten replicates of the fertilization effect. It gives us ten subsamples from one replicate — there is only one pond, so we have one replicate. A statistical test that treats these subsamples as independent replicates commits the error of pseudoreplication.
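
The distinction between subsamples and replicates can be made concrete with a small bookkeeping sketch. The record layout `(treatment, unit_id, value)`, the function name `count_replicates`, and the measurement values are all hypothetical and serve only as illustration; the point is that replicates are counted by distinct experimental units, not by rows of data.

```python
from collections import defaultdict


def count_replicates(records):
    """Count distinct experimental units per treatment.

    `records` is an iterable of (treatment, unit_id, value) tuples.
    This layout is an assumption made for the example.
    """
    units = defaultdict(set)
    for treatment, unit_id, _value in records:
        units[treatment].add(unit_id)
    return {treatment: len(ids) for treatment, ids in units.items()}


# Ten water samples from one fertilised pond (made-up values).
samples = [("fertilised", "pond_1", 4.0 + 0.1 * i) for i in range(10)]

replicates = count_replicates(samples)
print(len(samples), "subsamples;", replicates["fertilised"], "replicate(s)")
```

Running a check like this before any analysis shows at a glance whether a treatment has fewer than two experimental units, in which case no test on the subsamples can estimate the experimental error.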

3 Why is it a Problem?

The main issue with pseudoreplication is that it produces an invalid estimate of the true experimental error. The variation among subsamples within a single experimental unit is almost always smaller than the variation among the experimental units themselves. When we use this smaller, inappropriate variance as the error term in a statistical test (such as a t-test or ANOVA), and count subsamples as if they were replicates, we inflate both the test statistic and the apparent degrees of freedom. The test then looks far more powerful than the design allows, and the probability of a Type I error (falsely concluding that a treatment effect exists when it does not) rises well above the nominal level.
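
A small simulation can illustrate how strongly the Type I error rate is inflated. The sketch below is not taken from Hurlbert; it assumes two ponds that receive no treatment effect at all, with pond-to-pond and fish-to-fish standard deviations both set arbitrarily to 1, and it runs a pooled-variance t-test on 50 fish per pond as if the fish were replicates. All names and parameter values are illustrative assumptions.

```python
import random
import statistics


def pooled_t(a, b):
    """Two-sample t statistic using the pooled variance."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def false_positive_rate(n_sims=2000, n_fish=50, pond_sd=1.0, fish_sd=1.0, seed=1):
    """Share of null simulations in which fish-level testing rejects H0."""
    rng = random.Random(seed)
    # Two-sided 5% critical value for 98 degrees of freedom,
    # so it is only valid for the default n_fish = 50 per pond.
    t_crit = 1.984
    hits = 0
    for _ in range(n_sims):
        ponds = []
        for _pond in range(2):
            # Each pond has its own mean; neither pond is "treated".
            pond_mean = rng.gauss(10.0, pond_sd)
            ponds.append([rng.gauss(pond_mean, fish_sd) for _ in range(n_fish)])
        if abs(pooled_t(ponds[0], ponds[1])) > t_crit:
            hits += 1
    return hits / n_sims


rate = false_positive_rate()
print(f"False-positive rate with fish as replicates: {rate:.2f}")
```

Under these assumptions, the rejection rate is many times the nominal 0.05, because the ordinary difference between any two ponds is tested against the much smaller fish-to-fish variation.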

Hurlbert argues that this is not a minor statistical slip; it is a fundamental flaw in experimental design that invalidates the conclusions drawn. He found that a large percentage of the ecological studies published at the time were pseudoreplicated, which placed the findings of a large proportion of experimental studies in doubt. I see this problem continuing today, even among seasoned academics.

4 Simple Examples

Hurlbert provides several clear examples to illustrate the concept. For each example below, I identify (i) the experimental unit, (ii) what is incorrectly treated as a replicate, and (iii) how this mismatch distorts the estimate of experimental error and the resulting inference.

4.1 Example 1: Comparing Two Ponds

Incorrect Design (Simple Pseudoreplication): Imagine you want to test if a new fish food increases fish growth. You have two ponds. You add the new food to Pond A and use Pond B as a control. After three months, you sample 50 fish from each pond and run a t-test on their weights.
The Flaw: The analysis treats the 50 fish as independent replicates, but the experimental unit is the pond. All fish within a pond share the same environment and treatment, so they do not provide independent information about the treatment effect. The test therefore uses within-pond variation as if it were between-replicate variation, which underestimates the error term and inflates the test statistic. The result cannot distinguish a treatment effect from inherent differences between the two ponds.
Correct Design: To properly test the hypothesis, you would need multiple ponds for each treatment (e.g., five ponds with the new food and five control ponds). You would then sample fish from each of the ten ponds and use the mean weight per pond as your data points for the t-test (n=5 for each group).
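
The corrected analysis can be sketched as follows. The code assumes five ponds per group, 20 fish sampled per pond, and no true treatment effect; the standard deviations and the helper names are illustrative assumptions, not part of the original example. Each pond is reduced to its mean weight before testing, so the t-test has 5 + 5 - 2 = 8 degrees of freedom.

```python
import random
import statistics


def pooled_t(a, b):
    """Two-sample t statistic using the pooled variance."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def simulate_pond_means(n_ponds, n_fish, pond_sd, fish_sd, rng):
    """Return one mean fish weight per pond (no treatment effect simulated)."""
    means = []
    for _ in range(n_ponds):
        pond_mean = rng.gauss(10.0, pond_sd)
        fish = [rng.gauss(pond_mean, fish_sd) for _ in range(n_fish)]
        means.append(statistics.mean(fish))  # the pond, not the fish, is the datum
    return means


rng = random.Random(1)
t_crit = 2.306  # two-sided 5% critical value for 8 degrees of freedom
n_sims = 2000
hits = 0
for _ in range(n_sims):
    treated = simulate_pond_means(5, 20, 1.0, 1.0, rng)
    control = simulate_pond_means(5, 20, 1.0, 1.0, rng)
    if abs(pooled_t(treated, control)) > t_crit:
        hits += 1

rate = hits / n_sims
print(f"False-positive rate with pond means as replicates: {rate:.3f}")
```

With ponds as the unit of analysis, the rejection rate under the null should sit close to the nominal 5%. A mixed model with pond as a grouping factor is an alternative to averaging, but the pond-mean approach keeps the logic transparent.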

4.2 Example 2: Interspersion

Incorrect Design (Segregated Treatments): You want to test the effect of an insecticide on an insect population in a field. You divide the field in half, spray one half (Treatment), and leave the other half unsprayed (Control). You then take 20 sweep net samples from each half.
The Flaw: The analysis treats the 20 samples within each half of the field as independent replicates, but the experimental unit is the half-field to which a treatment was applied, so each treatment has only one replicate. Because the treatments are spatially segregated, any underlying environmental gradient is confounded with the treatment. The test therefore attributes spatial variation to the treatment effect and uses within-half sampling variation as its error term, which biases the estimate of experimental error and leads to unreliable inference.
Correct Design: You should divide the field into a grid of smaller plots and randomly assign the insecticide or control treatment to each plot in an interspersed manner. Each plot is now an independent experimental unit.
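
A randomised, interspersed layout of this kind can be generated in a few lines. The grid dimensions, treatment labels, and the function name `randomise_plots` are assumptions chosen for illustration.

```python
import random


def randomise_plots(n_rows, n_cols, treatments=("insecticide", "control"), seed=None):
    """Randomly assign treatments to a grid of plots in equal numbers."""
    n_plots = n_rows * n_cols
    if n_plots % len(treatments) != 0:
        raise ValueError("number of plots must be a multiple of the number of treatments")
    # Build a balanced list of labels, then shuffle it across the grid.
    labels = list(treatments) * (n_plots // len(treatments))
    random.Random(seed).shuffle(labels)
    return [labels[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


layout = randomise_plots(4, 5, seed=42)
for row in layout:
    # Print the first letter of each label: I = insecticide, C = control.
    print(" ".join(label[0].upper() for label in row))
```

With only a few plots, a random draw can still happen to place most treated plots on one side of the field, so it is worth inspecting the printed layout before going into the field.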

4.3 Example 3: Before–After Sampling of One Lake (Temporal Pseudoreplication)

Incorrect Design (Temporal Pseudoreplication): You want to test whether a nutrient addition increases algal biomass in a lake. You apply the treatment to a single lake and measure chlorophyll-a concentration monthly over one year. You then compare the 12 monthly measurements after treatment to 12 monthly measurements from the previous year (control period) using a t-test.
The Flaw: The analysis treats each monthly observation as an independent replicate of the treatment effect. However, all measurements come from the same lake, so the experimental unit is the lake, not the month. The observations are temporally autocorrelated: conditions in one month influence the next. As a result, the test uses within-lake temporal variation as if it were between-replicate variation, which underestimates the true error term and inflates the probability of detecting a treatment effect.
Correct Design: To test the hypothesis, you need multiple independent lakes assigned to treatment and control conditions. Each lake is an experimental unit. You may still sample each lake repeatedly over time, but the analysis must account for this structure—for example, by using the mean response per lake or by fitting a model that includes lake as a grouping factor and time as a repeated measure.
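
The effect of temporal autocorrelation can be illustrated with a simulated series. The sketch below assumes a first-order autoregressive, AR(1), process with coefficient 0.7, which is an arbitrary choice and not a property of any real lake. It estimates the lag-1 autocorrelation and then applies the common large-sample approximation n_eff ≈ n(1 − r)/(1 + r) for the effective number of independent observations behind a mean.

```python
import random


def lag1_autocorrelation(series):
    """Sample autocorrelation between consecutive observations."""
    mean = sum(series) / len(series)
    deviations = [x - mean for x in series]
    numerator = sum(a * b for a, b in zip(deviations[:-1], deviations[1:]))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator


def simulate_ar1(n, phi, noise_sd=1.0, seed=1):
    """Simulate an AR(1) series: each value carries over a share phi of the last."""
    rng = random.Random(seed)
    value = 0.0
    series = []
    for _ in range(n):
        value = phi * value + rng.gauss(0.0, noise_sd)
        series.append(value)
    return series


# A long series is used so that the autocorrelation estimate is stable.
series = simulate_ar1(n=5000, phi=0.7)
r1 = lag1_autocorrelation(series)

# Approximate effective sample size for 12 monthly values with this r1.
n_months = 12
n_eff = n_months * (1 - r1) / (1 + r1)
print(f"lag-1 autocorrelation: {r1:.2f}")
print(f"12 monthly values carry roughly {n_eff:.1f} independent observations")
```

Even after such an adjustment, the measurements still describe a single lake. The calculation shows how little independent information the monthly values hold, but it does not create replication of the treatment.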

Example 3 examines a before–after comparison within a single unit, where repeated measurements through time are treated as independent replicates. Example 4 examines a comparison between categories (seasons) where the categories themselves are not replicated. In both cases, the analysis substitutes within-unit temporal variation for true replication, but the source of the error differs: in Example 3, the issue is repeated measures on one unit; in Example 4, the issue is the absence of replication of the seasonal contrast.

4.4 Example 4: Comparing Seasons (Season as Treatment)

Incorrect Design (Seasonal Pseudoreplication): You want to test whether algal biomass differs between summer and winter. You sample chlorophyll-a weekly from a single coastal bay over one year and compare all summer observations (e.g., December–February) to all winter observations (e.g., June–August) using a t-test.
The Flaw: The analysis treats each weekly observation as an independent replicate of the seasonal effect, but the experimental unit is the bay-year, not the individual weeks. All observations come from the same system and are structured in time, so they are correlated. In addition, “season” is not replicated: there is only one summer and one winter in that year. The test therefore uses within-season temporal variation as if it were between-replicate variation. This underestimates the error term and inflates the apparent strength of the seasonal effect.
Correct Design: To test for seasonal differences, you need replication of the seasonal contrast across independent units. This could involve sampling multiple coastal bays within each season, or repeating the seasonal comparison across multiple years and treating year as a replicate. The analysis must then reflect the structure of the data, for example by including bay or year as a grouping factor and modelling repeated observations within each unit.
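
One way to treat year as the replicate is to reduce each year to a single summer-minus-winter difference and test whether the mean difference departs from zero. The sketch below uses made-up chlorophyll-a values and hypothetical function names, and it assumes that years are approximately independent of one another, which would need checking in a real data set.

```python
import statistics


def seasonal_contrast_by_year(observations):
    """Return the summer-minus-winter mean difference for each year.

    `observations` maps year -> {"summer": [...], "winter": [...]}.
    This layout is an assumption made for the example.
    """
    return {
        year: statistics.mean(seasons["summer"]) - statistics.mean(seasons["winter"])
        for year, seasons in observations.items()
    }


def one_sample_t(values):
    """t statistic and degrees of freedom for H0: mean difference = 0."""
    n = len(values)
    se = statistics.stdev(values) / n ** 0.5
    return statistics.mean(values) / se, n - 1


# Made-up weekly chlorophyll-a values, shortened to two per season.
observations = {
    2021: {"summer": [3.0, 5.0], "winter": [2.0, 4.0]},
    2022: {"summer": [4.0, 6.0], "winter": [2.0, 4.0]},
    2023: {"summer": [5.0, 7.0], "winter": [2.0, 4.0]},
}

differences = seasonal_contrast_by_year(observations)
t_stat, df = one_sample_t(list(differences.values()))
print(differences)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

The degrees of freedom now follow the number of years, not the number of weekly samples, which is what the design can actually support.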

5 Types of Pseudoreplication

Hurlbert categorises pseudoreplication into several types, but two are most common:

1. Simple Pseudoreplication: This occurs when there is only one experimental unit (for example, one pond or one field plot) per treatment, but multiple subsamples are taken from each. The pond example above is a classic case.
2. Temporal Pseudoreplication: This occurs when samples taken sequentially over time from the same experimental unit are treated as independent replicates. An example is measuring the photosynthetic rate of a single plant every hour after a treatment and treating each measurement as a separate replicate in a statistical test. The measurements are not independent because they come from the same plant.

6 Conclusion

Understanding pseudoreplication is essential for any biologist conducting field or lab experiments. It forces us to think critically about what constitutes a true, independent replicate. The solution is always to ensure that your experimental units are independent and that you have multiple, independent units for each treatment level being tested. As Hurlbert famously stated, “The question of what is the experimental unit… is the most important one that an experimenter has to answer.”

References

Hurlbert SH (1984) Pseudoreplication and the design of ecological field experiments. Ecological Monographs 54:187–211.

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:

@online{smit,_a._j.2026,
  author = {Smit, A. J.},
  title = {15: {Pseudoreplication}},
  date = {2026-03-18},
  url = {http://tangledbank.netlify.app/BCB744/basic_stats/15-pseudoreplication.html},
  langid = {en}
}

For attribution, please cite this work as:

Smit, A. J. (2026) 15: Pseudoreplication. http://tangledbank.netlify.app/BCB744/basic_stats/15-pseudoreplication.html.
