Analysis Guides
Oct 13, 2021

Batch effect correction

Note: 10x Genomics does not provide support for community-developed tools and makes no guarantees regarding their function or performance. Please contact tool developers with any questions. If you have feedback about Analysis Guides, please email [email protected].

Batch effects come from technical variation across samples. This can often be prevented with good experimental design. When it cannot, there are computational approaches that can help.

Background

Problem: Variation in single-cell and spatial RNA sequencing data is known to be influenced by technical factors. In some cases, these technical factors may confound our ability to measure true biological variation between samples, making it more challenging to address the research question at hand.

Cause: These confounding factors include experimental biases and batch effects. Unavoidable systematic technical biases can include unequal amplification during PCR, cell lysis, reverse transcriptase enzyme efficiency, and stochastic molecular sampling during sequencing. By contrast, batch effects are technical, non-biological sources of variation that affect whole groups of samples rather than individual cells or molecules. A “batch” refers to a group of samples that is processed differently from the other samples in the experiment.

Solution: Technical factors that potentially lead to batch effects may be avoided with mitigation strategies in the lab and during sequencing. Examples of lab strategies include sampling cells on the same day; using the same handling personnel, reagent lots, protocols, and equipment; and reducing PCR amplification bias. Sequencing strategies can include multiplexing libraries across flow cells. For example, if samples came from two patients, pooling the libraries together and sequencing the pool on every flow cell can distribute flow cell-specific variation evenly across samples, rather than tying it to one patient.
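The two-patient example above can be illustrated with a small simulation. This is a minimal sketch, not a description of any real sequencing run: the additive flow cell offsets, the signal values, and the names `flow_cell_effect`, `true_signal`, `observed`, and `difference` are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean

# Hypothetical additive offsets per flow cell (arbitrary units, made up).
flow_cell_effect = {"FC1": 0.5, "FC2": -0.5}

# Assume there is NO true biological difference between the two patients.
true_signal = {"patient_A": 10.0, "patient_B": 10.0}


def observed(design):
    """Average observed signal per sample, given the flow cells it was run on."""
    return {
        sample: mean(true_signal[sample] + flow_cell_effect[fc] for fc in cells)
        for sample, cells in design.items()
    }


def difference(design):
    """Apparent patient_A minus patient_B difference under a given design."""
    obs = observed(design)
    return obs["patient_A"] - obs["patient_B"]


# Each patient on its own flow cell: flow cell and patient are confounded.
confounded = {"patient_A": ["FC1"], "patient_B": ["FC2"]}

# Libraries pooled and sequenced on both flow cells.
pooled = {"patient_A": ["FC1", "FC2"], "patient_B": ["FC1", "FC2"]}

print("confounded design:", difference(confounded))
print("pooled design:    ", difference(pooled))
```

Under these assumptions, the confounded design reports a patient difference that is entirely due to the flow cells, while the pooled design reports none, because each sample receives the same mix of flow cell offsets.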

Computational batch correction aims to remove technical variation from the data so that it does not confound downstream analysis. Several batch correction methods exist, along with tools that implement them.
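To make the idea of removing batch-level technical variation concrete, here is a deliberately naive sketch that centers one gene's expression values within each batch. It is not the algorithm used by any of the tools listed below, which take more sophisticated approaches; the function name `center_by_batch` and the toy values are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Subtract each batch's mean from its values, then add back the overall mean.

    This removes a purely additive per-batch shift. It assumes the batches
    have the same underlying biology, which is often not true in practice.
    """
    overall = mean(values)
    grouped = defaultdict(list)
    for value, batch in zip(values, batches):
        grouped[batch].append(value)
    batch_mean = {batch: mean(vals) for batch, vals in grouped.items()}
    return [value - batch_mean[batch] + overall
            for value, batch in zip(values, batches)]


# Toy expression values for one gene; batch "b2" is shifted upward by 3 units.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
batches = ["b1", "b1", "b1", "b2", "b2", "b2"]

corrected = center_by_batch(expression, batches)
print(corrected)
```

After centering, both batches share the same mean, so the additive shift between them is gone. The same operation would also erase a real biological difference if that difference happened to line up with the batches.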

The list below is not comprehensive. New and exciting tools, algorithms, and other resources continue to be released. We compiled this list based on a combination of factors including citations, quality of documentation, functionality/ease of use, and active support.

Tools and Algorithms

Harmony:

Mutual Nearest Neighbors (MNN):

LIGER:

Related review and benchmarking articles


Required skills and resources:

  • Ability to program in a scripting language (most commonly R or Python)
  • Comfortable in the Linux environment
  • Comfortable running command line bioinformatic tools
  • Understanding of the experimental design and how it influences analysis

Things to watch out for:

  • “Correcting” away the biological signal
  • Batch correction should not be used to try to save failed experiments
  • Different tools may perform better on different data sets, so try a variety of methods
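One way to gauge the risk of correcting away biological signal is to check, before running any tool, whether batch and biological condition can be separated at all. The sketch below cross-tabulates the two labels per sample; the helper names `batch_condition_table` and `is_fully_confounded` and the sample labels are hypothetical, and the check is a simple heuristic rather than a formal test.

```python
from collections import Counter, defaultdict


def batch_condition_table(batches, conditions):
    """Count samples for every (batch, condition) combination."""
    table = defaultdict(Counter)
    for batch, condition in zip(batches, conditions):
        table[batch][condition] += 1
    return {batch: dict(counts) for batch, counts in table.items()}


def is_fully_confounded(batches, conditions):
    """True when there are several conditions but each batch holds only one.

    In that situation a batch difference and a condition difference cannot
    be told apart from the data alone.
    """
    table = batch_condition_table(batches, conditions)
    several_conditions = len(set(conditions)) > 1
    return several_conditions and all(len(c) == 1 for c in table.values())


# Confounded: all controls in batch 1, all treated samples in batch 2.
bad_batches = ["b1", "b1", "b2", "b2"]
bad_conditions = ["control", "control", "treated", "treated"]

# Balanced: every batch contains both conditions.
good_batches = ["b1", "b1", "b2", "b2"]
good_conditions = ["control", "treated", "control", "treated"]

print(batch_condition_table(bad_batches, bad_conditions),
      is_fully_confounded(bad_batches, bad_conditions))
print(batch_condition_table(good_batches, good_conditions),
      is_fully_confounded(good_batches, good_conditions))
```

When the check flags a design as fully confounded, any method that removes the batch difference is likely to remove the condition difference with it, which is the situation the first two warnings describe.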