Many experiments involve generating data from multiple samples. Depending on the experimental design, these could be multiple Capture Areas from consecutive sections of the same tissue block, or samples from different biological conditions of the same tissue. The
spaceranger aggr pipeline can be used to aggregate samples from these scenarios into a single feature-barcode matrix.
The first step is to run
spaceranger count on each individual capture area from the Visium slide.
For example, suppose you ran three count pipelines as follows:
Now you can aggregate these three runs to get a single feature-barcode matrix and analysis. To do so, create an aggregation CSV.
Create a CSV file with a header line containing the following columns:
|Required. Unique identifier for this input capture area. This will be used for labeling purposes only; it doesn't need to match any previous ID you have assigned to the capture area.|
|Required. Path to the |
|Required. Path to the |
|Optional. Path to the |
The CSV can be made either in a text editor or created in Excel and exported to CSV. Continuing the example from the previous section, the saved CSV would look like this:
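As a hedged illustration, the sketch below writes an aggregation CSV from the shell and checks that every row has as many fields as the header. The column names (`library_id`, `molecule_h5`, `cloupe_file`, `spatial_folder`), the sample IDs, and every path are assumptions made for illustration; substitute the column names from the table above and the real output paths from your own `spaceranger count` runs. `check_field_counts` is a hypothetical helper, not part of Space Ranger.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch location for the example; in practice, write the CSV next to your runs.
workdir="$(mktemp -d)"
csv="$workdir/AGG123_libraries.csv"

# Assumed column names and placeholder paths (not taken from a real run).
cat > "$csv" <<'EOF'
library_id,molecule_h5,cloupe_file,spatial_folder
sample_A,/opt/runs/sample_A/outs/molecule_info.h5,/opt/runs/sample_A/outs/cloupe.cloupe,/opt/runs/sample_A/outs/spatial
sample_B,/opt/runs/sample_B/outs/molecule_info.h5,/opt/runs/sample_B/outs/cloupe.cloupe,/opt/runs/sample_B/outs/spatial
sample_C,/opt/runs/sample_C/outs/molecule_info.h5,/opt/runs/sample_C/outs/cloupe.cloupe,/opt/runs/sample_C/outs/spatial
EOF

# Report any row whose field count differs from the header.
# Exits non-zero if at least one such row is found.
check_field_counts() {
  awk -F',' '
    NR == 1 { expected = NF }
    NF != expected { printf "line %d: %d fields, expected %d\n", NR, NF, expected; bad = 1 }
    END { exit bad }
  ' "$1"
}

check_field_counts "$csv" && echo "CSV looks consistent: $csv"
```

A simple comma split is enough here only because none of the placeholder values contain commas; a CSV exported from a spreadsheet with quoted fields would need a real CSV parser.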
In addition to the CSV columns expected by
spaceranger aggr, you may supply additional columns containing library metadata (e.g., lab or sample origin). These custom library annotations do not affect the analysis pipeline but can be visualized downstream in Loupe Browser (see below). Note that, unlike other CSV inputs to Space Ranger, these custom columns may contain characters outside the ASCII range (e.g., non-Latin characters).
It is possible to assign categories and values to individual samples prior to the
spaceranger aggr run by adding columns to the aggregation CSV. These category assignments propagate into Loupe Browser, where you can view them and determine which genes drive differences between samples. For example, the following spreadsheet was used to aggregate the bundled Loupe tutorial dataset:
Any columns in addition to
spatial_folder will be converted into categories, and the spots in each sample will be assigned to one of the values in that category.
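A minimal sketch of how one might preview which columns of an aggregation CSV would become categories. The reserved column names in the pattern are assumptions (only `spatial_folder` is confirmed by the text above), `category_columns` is a hypothetical helper, and the CSV contents are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the header fields that are not reserved aggr columns.
# The reserved names are assumed; adjust the pattern to match your Space Ranger version.
category_columns() {
  head -n 1 "$1" \
    | tr -d '\r' \
    | tr ',' '\n' \
    | { grep -vxE 'library_id|molecule_h5|cloupe_file|spatial_folder' || true; }
}

# Example input with two custom annotation columns (all values are placeholders).
example_csv="$(mktemp)"
cat > "$example_csv" <<'EOF'
library_id,molecule_h5,cloupe_file,spatial_folder,condition,lab
s1,/path/a/molecule_info.h5,/path/a/cloupe.cloupe,/path/a/spatial,control,lab_1
s2,/path/b/molecule_info.h5,/path/b/cloupe.cloupe,/path/b/spatial,treated,lab_2
EOF

# Lists the custom columns, one per line.
category_columns "$example_csv"
```

The `tr -d '\r'` step strips carriage returns, which can appear in the header when the CSV is exported from a spreadsheet on Windows.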
spaceranger aggr does not perform batch correction for removal of technical artifacts due to differences in assays. For this reason, 10x Genomics does not recommend combining Visium data from fundamentally different treatments such as different staining protocols (e.g. immunofluorescence vs H&E stained tissue sections), different tissue storage conditions (e.g. Fresh Frozen vs FFPE tissue sections), different library preparation protocols (e.g. short cDNA vs long cDNA), or other variations in the assay preparation.
Run spaceranger aggr --help for a full list of arguments, or refer to the Command Line Arguments Reference page for details. After specifying these input arguments, run spaceranger aggr:
cd /opt/runs
spaceranger aggr --id=AGG123 \
    --csv=AGG123_libraries.csv \
    --normalize=mapped
The pipeline will begin to run, creating a new folder named with the aggregation ID you specified (e.g.
/opt/runs/AGG123) for its output. If this folder already exists,
spaceranger will assume it is an existing pipestance and attempt to resume running it.
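Because an existing folder is treated as a pipestance to resume, it can help to check for one before launching a run that is meant to be fresh. The following is a minimal sketch; `run_dir_status` is a hypothetical helper, and the `RUNS_DIR` variable and paths are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: report whether <base_dir>/<run_id> already exists.
run_dir_status() {
  local base_dir="$1" run_id="$2"
  if [ -e "$base_dir/$run_id" ]; then
    echo "exists"   # spaceranger would treat this as a pipestance to resume
  else
    echo "new"      # a fresh output folder would be created
  fi
}

base_dir="${RUNS_DIR:-/opt/runs}"   # placeholder location
run_id="AGG123"

case "$(run_dir_status "$base_dir" "$run_id")" in
  exists) echo "Note: $base_dir/$run_id exists; the run would resume there." ;;
  new)    echo "OK: $base_dir/$run_id does not exist yet." ;;
esac
```

If a fresh run is intended, choose a different aggregation ID or move the old folder aside rather than deleting it blindly.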
The spaceranger aggr pipeline generates output files that contain all of the data from the individual input jobs, aggregated into single output files, for convenient multi-sample analysis.
When combining multiple capture areas, the barcode sequences for each channel are distinguished by a capture area suffix appended to the barcode sequence.
By default, the reads from each capture area are subsampled such that all capture areas have the same effective sequencing depth, measured in terms of reads that are confidently mapped to the transcriptome or assigned to the feature IDs per spot. However, it is possible to change the depth normalization mode.
New slide versions and chemistries were introduced in Space Ranger 2.0. Aggregating samples must obey the chemistry compatibility matrix. Data from Visium for FFPE sections is supported by
spaceranger aggr, provided that the same probe set reference CSV file (#) is used for all samples.
| |Visium Spatial Gene Expression|Visium Spatial Targeted|Visium Spatial Gene Expression for FFPE|CytAssist Spatial Gene Expression|
|---|---|---|---|---|
|Visium Spatial Gene Expression|Supported|Supported|Unsupported|Unsupported|
|Visium Spatial Targeted|-|Supported|Unsupported|Unsupported|
|Visium Spatial Gene Expression for FFPE|-|-|Supported#|Unsupported|
|CytAssist Spatial Gene Expression|-|-|-|Supported#|
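The matrix above can be encoded as a small lookup, which may be handy when scripting a pre-flight check over many samples. This is a sketch only: the short keys (`gex`, `targeted`, `ffpe`, `cytassist`) and the `aggr_compatible` helper are hypothetical names, and the lookup assumes the matrix is symmetric (the `-` cells mirror the upper triangle).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: look up whether two product types can be aggregated.
# Keys: gex = Visium Spatial Gene Expression, targeted = Visium Spatial Targeted,
#       ffpe = Visium Spatial Gene Expression for FFPE,
#       cytassist = CytAssist Spatial Gene Expression.
aggr_compatible() {
  local pair
  # Sort the two keys so the lookup does not depend on argument order.
  pair="$(printf '%s\n' "$1" "$2" | sort | paste -sd+ -)"
  case "$pair" in
    gex+gex|gex+targeted|targeted+targeted)
      echo "supported" ;;
    ffpe+ffpe|cytassist+cytassist)
      # "Supported#" in the matrix: requires the same probe set reference CSV.
      echo "supported-same-probe-set" ;;
    *)
      echo "unsupported" ;;
  esac
}

aggr_compatible gex targeted
aggr_compatible ffpe cytassist
```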
Samples run with an antibody feature_reference.csv in each individual capture area can only be aggregated with other samples that were run with
spaceranger count using the same feature reference. Re-run
spaceranger count with a common feature reference (see
--no-libraries) to allow aggregation.
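One way to check this before aggregating is to compare the feature reference files used by each run. The sketch below treats byte-for-byte equality as a conservative stand-in for "the same feature reference"; whether Space Ranger applies a looser comparison is not stated here, so this is an assumption. `same_reference` is a hypothetical helper and the file contents are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if every file is byte-identical to the first.
same_reference() {
  local first="$1"
  shift
  local other
  for other in "$@"; do
    cmp -s "$first" "$other" || return 1
  done
  return 0
}

# Placeholder files standing in for the feature references of three runs.
refs_dir="$(mktemp -d)"
printf 'id,name\nfeature_1,Feature One\n' > "$refs_dir/run1.csv"
cp "$refs_dir/run1.csv" "$refs_dir/run2.csv"
printf 'id,name\nfeature_2,Feature Two\n' > "$refs_dir/run3.csv"

if same_reference "$refs_dir/run1.csv" "$refs_dir/run2.csv" "$refs_dir/run3.csv"; then
  echo "All feature references match."
else
  echo "Feature references differ; re-run spaceranger count with a common reference."
fi
```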
The spaceranger aggr command can aggregate results that include Targeted Spatial Gene Expression analyses, provided that the same target panel CSV file is used for all targeted libraries. Targeted libraries can also be aggregated with whole transcriptome Spatial Gene Expression libraries. Secondary analysis for all libraries is done with the non-targeted genes excluded from the feature-barcode matrices. Aggregated feature-barcode matrices follow the same convention as Targeted Spatial Gene Expression analysis: the filtered feature-barcode matrices do not include non-targeted genes, whereas the raw feature-barcode matrices include all genes.
When combining data from multiple capture areas, the
spaceranger aggr pipeline automatically equalizes the read depth between groups before merging, which is the recommended approach for avoiding the batch effect introduced by sequencing depth. It is possible to turn off normalization or change the way normalization is done. The
none option may be appropriate if you want to maximize sensitivity and plan to deal with depth normalization in a downstream step.
There are two normalization modes:
|mapped|Default. Subsample reads from higher-depth capture areas until they all have, on average, an equal number of reads per tissue-covered spot that are confidently mapped to the transcriptome. If Targeted Spatial Gene Expression libraries are included, normalization is performed on the basis of mean reads per spot mapped confidently to the targeted transcriptome. The subsampling rates for Targeted Spatial Gene Expression libraries are multiplied by 2, provided all libraries can achieve that depth. This multiple is consistent with the sequencing depth recommendations and also avoids removing large fractions of reads from targeted libraries when they are combined with whole transcriptome libraries.|
|none|Optional. Do not normalize at all.|
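To make the default depth equalization concrete, the sketch below computes the fraction of reads each library would keep if every library were subsampled down to the lowest mean number of confidently mapped reads per spot. Taking the minimum as the target is an assumption that follows from "subsample reads from higher-depth capture areas"; the exact procedure inside Space Ranger, including the 2x adjustment for targeted libraries, is not modeled. `subsample_rates` and the depth values are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper. Input on stdin: "<library_id> <mapped_reads_per_spot>" per line.
# Output: "<library_id> <fraction_of_reads_kept>", where the lowest-depth library keeps 1.000.
subsample_rates() {
  awk '
    { id[NR] = $1; depth[NR] = $2 }
    NR == 1 || $2 < min { min = $2 }
    END { for (i = 1; i <= NR; i++) printf "%s %.3f\n", id[i], min / depth[i] }
  '
}

# Placeholder depths for three capture areas.
printf '%s\n' "sample_A 30000" "sample_B 20000" "sample_C 40000" | subsample_rates
```

With these placeholder values, sample_B sets the target depth, so sample_A keeps about two thirds of its reads and sample_C keeps half.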
Each capture area is a physically distinct partition on a Visium slide. However, each of these capture areas is printed with the same set of barcode-tagged mRNA capture sequences, known as the barcode whitelist. To keep the barcodes unique when aggregating multiple libraries, we append a small integer identifying the capture area to the barcode nucleotide sequence, and use that nucleotide sequence plus ID as the unique identifier in the feature-barcode matrix. For example,
AAACAACGAATAGTTC-1 and AAACAACGAATAGTTC-2 are distinct spot barcodes from different capture areas, despite having the same barcode nucleotide sequence.
This number, called the capture area suffix, indicates which capture area the barcode sequence came from. The numbering reflects the order in which the capture areas were listed in the aggregation CSV.
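A short sketch of how the suffix can be split off and mapped back to a row of the aggregation CSV. It assumes that suffixes start at 1 for the first data row and that the library identifier is the first CSV column; both are assumptions to verify against your own output. The helper names and CSV contents are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Nucleotide sequence: everything before the last "-".
barcode_sequence() { printf '%s\n' "${1%-*}"; }

# Capture area suffix: everything after the last "-".
barcode_suffix() { printf '%s\n' "${1##*-}"; }

# Look up the library for a barcode, assuming suffix N refers to the Nth
# data row of the aggregation CSV (row N + 1 counting the header).
library_for_barcode() {
  local csv="$1" suffix
  suffix="$(barcode_suffix "$2")"
  awk -F',' -v row="$((suffix + 1))" 'NR == row { print $1 }' "$csv"
}

# Placeholder aggregation CSV with two capture areas.
aggr_csv="$(mktemp)"
printf '%s\n' "library_id,molecule_h5" "sample_A,/path/a.h5" "sample_B,/path/b.h5" > "$aggr_csv"

barcode_sequence "AAACAACGAATAGTTC-2"
barcode_suffix "AAACAACGAATAGTTC-2"
library_for_barcode "$aggr_csv" "AAACAACGAATAGTTC-2"
```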