Space Ranger Count Outputs

The spaceranger count pipeline will produce an outs/ directory. The contents of this directory will vary somewhat depending on the parameters of the run. This page outlines the output files produced by spaceranger count for whole transcriptome Gene Expression (GEX) and Protein Expression (PEX) libraries generated with the CytAssist or Direct Placement assays. It covers Fresh Frozen (FF), Fixed Frozen (FxF), and formalin-fixed paraffin-embedded (FFPE) tissues.

For GEX libraries, the following files can be found within the outs/ subfolder:

File Name	Description
`web_summary.html`	Run summary metrics and plots in HTML format
`cloupe.cloupe`	Loupe Browser visualization and analysis file
`spatial/`	Folder containing outputs that capture the spatiality of the data.
`spatial/aligned_fiducials.jpg`	Aligned fiducials QC image
`spatial/aligned_tissue_image.jpg`	Aligned CytAssist and Microscope QC image. Present only for CytAssist workflow
`spatial/barcode_fluorescence_intensity.csv`	CSV file containing the mean and standard deviation of fluorescence intensity for each spot and each channel. Present for the fluorescence image input specified by `--darkimage`
`spatial/cytassist_image.tiff`	Input CytAssist image in original resolution that can be used to re-run the pipeline. Present only for CytAssist workflow
`spatial/detected_tissue_image.jpg`	Detected tissue QC image.
`spatial/scalefactors_json.json`	Scale conversion factors for spot diameter and coordinates at various image resolutions
`spatial/spatial_enrichment.csv`	Feature spatial autocorrelation analysis using Moran's I in CSV format
`spatial/tissue_hires_image.png`	Downsampled full resolution image. The image dimensions depend on the input image and slide version
`spatial/tissue_lowres_image.png`	Full resolution image downsampled to 600 pixels on the longest dimension
`spatial/tissue_positions.csv`	CSV containing, for each spot: the spot barcode; whether the spot was called under (1) or out of (0) tissue; the array position; and the image pixel positions x and y in the full resolution image
`analysis/`	Folder containing secondary analysis data including graph-based clustering and K-means clustering (K = 2-10); differential gene expression between clusters; PCA, t-SNE, and UMAP dimensionality reduction.
`metrics_summary.csv`	Run summary metrics in CSV format
`probe_set.csv`	Copy of the input probe set reference CSV file. Present for Visium FFPE and CytAssist workflow
`possorted_genome_bam.bam`	Indexed BAM file containing position-sorted reads aligned to the genome and transcriptome, annotated with barcode information
`possorted_genome_bam.bam.bai`	Index for `possorted_genome_bam.bam`. In cases where the reference transcriptome is generated from a genome with very long chromosomes (>512 Mbp), Space Ranger v2.0+ generates a `possorted_genome_bam.bam.csi` index file instead.
`filtered_feature_bc_matrix/`	Contains only tissue-associated barcodes in MEX format. Each element of the matrix is the number of UMIs associated with a feature (row) and a barcode (column). This directory can be loaded into third-party packages to work with the feature-barcode matrix directly (e.g. to filter outlier spots, run dimensionality reduction, or normalize gene expression).
`filtered_feature_bc_matrix.h5`	Same information as `filtered_feature_bc_matrix/` but in HDF5 format.
`raw_feature_bc_matrix/`	Contains all detected barcodes in MEX format. Each element of the matrix is the number of UMIs associated with a feature (row) and a barcode (column).
`raw_feature_bc_matrix.h5`	Same information as `raw_feature_bc_matrix/` but in HDF5 format.
`raw_probe_bc_matrix.h5`	Contains UMI counts of each probe for all detected barcodes in HDF5 format. Only produced when running pipelines for probe-based assays.
`molecule_info.h5`	Contains per-molecule information for all molecules that contain a valid barcode, valid UMI, and were assigned with high confidence to a gene or protein barcode. This file is required for additional analysis `spaceranger` pipelines including `aggr`, `targeted-compare` and `targeted-depth`.
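
As an illustration of how the spatial outputs fit together, the sketch below reads a `tissue_positions.csv`-style table, keeps only spots called under tissue, and rescales their full resolution pixel positions to the `tissue_hires_image.png` coordinate system. The column names (`barcode`, `in_tissue`, `pxl_row_in_fullres`, `pxl_col_in_fullres`) and the `tissue_hires_scalef` key are assumptions about the file layout and may differ between Space Ranger versions; the function name and the sample data are hypothetical.

```python
import csv
import io
import json


def load_in_tissue_spots(positions_file, scalefactors):
    """Return {barcode: (hires_row, hires_col)} for spots under tissue.

    positions_file: open text file in tissue_positions.csv layout
                    (column names below are assumed, not guaranteed).
    scalefactors:   dict parsed from scalefactors_json.json.
    """
    scale = scalefactors["tissue_hires_scalef"]  # assumed key name
    spots = {}
    for row in csv.DictReader(positions_file):
        if row["in_tissue"] != "1":  # 1 = under tissue, 0 = out of tissue
            continue
        spots[row["barcode"]] = (
            float(row["pxl_row_in_fullres"]) * scale,
            float(row["pxl_col_in_fullres"]) * scale,
        )
    return spots


# Synthetic stand-ins for the two files; real runs would open the files in outs/spatial/.
example_positions = io.StringIO(
    "barcode,in_tissue,array_row,array_col,pxl_row_in_fullres,pxl_col_in_fullres\n"
    "AAAC-1,1,0,0,1000,2000\n"
    "AAAG-1,0,0,2,1000,2100\n"
    "AACT-1,1,1,1,1100,2050\n"
)
example_scalefactors = json.loads('{"tissue_hires_scalef": 0.25}')
example_spots = load_in_tissue_spots(example_positions, example_scalefactors)
```

With real data, the resulting coordinates can be drawn directly over `tissue_hires_image.png`; the number of entries should match the `Number of Spots Under Tissue` metric.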

The spaceranger count pipeline outputs metrics_summary.csv which contains a number of key metrics about the barcoding and sequencing process.

Metric	Description
`Number of Spots Under Tissue`	The number of barcodes associated with a spot under tissue.
`Number of Reads`	Total number of read pairs that were assigned to this library in demultiplexing.
`Mean Reads per Spot`	The number of reads, both under and outside of tissue, divided by the number of barcodes associated with a spot under tissue.
`Mean Reads Under Tissue per Spot`	The number of reads under tissue divided by the number of barcodes associated with a spot under tissue.
`Fraction of Spots Under Tissue`	The fraction of the spots under the tissue.
`Median Genes per Spot`	The median number of genes detected per tissue-associated barcode (spot under tissue). Detection is defined as the presence of at least one UMI count.
`Median UMI Counts per Spot`	The median number of UMI counts per tissue-covered spot.
`Valid Barcodes`	Fraction of reads with barcodes that match the inclusion list after barcode correction.
`Valid UMIs`	Fraction of reads with valid UMIs; i.e. UMI sequences that do not contain Ns and that are not homopolymers.
`Sequencing Saturation`	The fraction of reads originating from an already-observed UMI. This is a function of library complexity and sequencing depth. More specifically, this is the fraction of confidently mapped, valid spot-barcode, valid UMI reads that had a non-unique (spot-barcode, UMI, gene).
`Q30 Bases in Barcode`	Fraction of spot barcode bases with Q-score greater than or equal to 30, excluding very low quality/no-call (Q less than or equal to 2) bases from the denominator.
`Q30 Bases in RNA Read`	Fraction of RNA read bases with Q-score greater than or equal to 30, excluding very low quality/no-call (Q less than or equal to 2) bases from the denominator. This is Read 2 for the Visium v1 chemistry.
`Q30 Bases in UMI`	Fraction of UMI bases with Q-score greater than or equal to 30, excluding very low quality/no-call (Q less than or equal to 2) bases from the denominator.
`Reads Mapped to Genome`	Fraction of reads that mapped to the genome.
`Reads Mapped Confidently to Genome`	Fraction of reads that mapped uniquely to the genome. If a read mapped to exonic loci from a single gene and also to non-exonic loci, it is considered uniquely mapped to one of the exonic loci.
`Reads Mapped Confidently to Intergenic Regions`	Fraction of reads that mapped uniquely to an intergenic region of the genome.
`Reads Mapped Confidently to Intronic Regions`	Fraction of reads that mapped uniquely to an intronic region of the genome.
`Reads Mapped Confidently to Exonic Regions`	Fraction of reads that mapped uniquely to an exonic region of the genome.
`Reads Mapped Confidently to Transcriptome`	Fraction of reads that mapped to a unique gene in the transcriptome. The read must be consistent with annotated splice junctions. These reads are considered for UMI counting.
`Reads Mapped Antisense to Gene`	Fraction of reads confidently mapped to the transcriptome, but on the opposite strand of their annotated gene. A read is counted as antisense if it has any alignments that are consistent with an exon of a transcript but antisense to it, and has no sense alignments.
`Fraction Reads in Spots Under Tissue`	The fraction of valid-barcode, confidently-mapped-to-transcriptome reads with spot-associated barcodes.
`Total Genes Detected`	The number of genes with at least one UMI count in any tissue-covered spot.
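
To make two of the definitions above concrete, here is a minimal sketch of `Mean Reads per Spot` and `Sequencing Saturation`. It reads the saturation definition as one minus the ratio of unique (spot-barcode, UMI, gene) combinations to total qualifying reads, which is one interpretation of the wording in the table; the function names and sample reads are hypothetical and this is not the pipeline's own implementation.

```python
def mean_reads_per_spot(total_reads, spots_under_tissue):
    """All reads (under and outside tissue) divided by spots under tissue."""
    return total_reads / spots_under_tissue


def sequencing_saturation(reads):
    """Fraction of reads whose (spot barcode, UMI, gene) was already observed.

    reads: iterable of (spot_barcode, umi, gene) tuples, one per confidently
    mapped read with a valid spot barcode and a valid UMI.
    """
    reads = list(reads)
    if not reads:
        return 0.0
    # Each distinct tuple is one unique molecule; every extra copy is a duplicate read.
    return 1.0 - len(set(reads)) / len(reads)


example_reads = [
    ("SPOT_A", "UMI1", "GeneX"),
    ("SPOT_A", "UMI1", "GeneX"),  # duplicate of the first read
    ("SPOT_A", "UMI2", "GeneX"),
    ("SPOT_B", "UMI1", "GeneY"),
    ("SPOT_B", "UMI1", "GeneY"),  # duplicate of the fourth read
]
example_saturation = sequencing_saturation(example_reads)  # 3 unique of 5 reads
example_mean_reads = mean_reads_per_spot(total_reads=50_000_000, spots_under_tissue=2_500)
```

A saturation close to 1 indicates that additional sequencing would mostly re-observe molecules that have already been counted.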

For FFPE samples, the metrics_summary.csv file produced by spaceranger count differs from the one described above in a few key metrics.

Metric	Description
`Reads Mapped to the Probe Set`	Fraction of reads that map with at least one read half to the probe reference.
`Reads Mapped Confidently to the Probe Set`	Fraction of reads that map uniquely with both read halves to the probe reference.
`Reads Mapped Confidently to the Filtered Probe Set`	Fraction of reads that map uniquely with both read halves to the filtered probe reference. The probe reference is filtered to remove genes/features where one or more of the probes targeting the feature might hybridize and ligate at a non-targeted locus. This metric will be "None" when probe filtering is disabled.
`Genes Detected`	The number of unique genes from the filtered probe set with at least one UMI count in any tissue-covered spot.
`Number of Genes`	The number of genes as defined by the probe set.
`Number of Genes greater than or equal to 10 UMIs`	Number of genes with at least 10 filtered UMIs from tissue-associated barcodes. These genes are used to calculate per-gene enrichments.
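
The two gene-count metrics above can be sketched as a simple tally over per-gene UMI counts restricted to tissue-associated barcodes. The nested-dictionary input format, the function name, and the sample counts are hypothetical and chosen only for illustration; the pipeline itself works from the feature-barcode matrix.

```python
def probe_gene_metrics(umi_counts, tissue_barcodes, min_umis=10):
    """Return (genes_detected, genes_with_at_least_min_umis).

    umi_counts:      {gene: {barcode: umi_count}} for the filtered probe set.
    tissue_barcodes: set of barcodes called under tissue.
    """
    detected = 0
    at_least_min = 0
    for per_barcode in umi_counts.values():
        # Only UMIs from tissue-associated barcodes contribute.
        total = sum(n for bc, n in per_barcode.items() if bc in tissue_barcodes)
        if total >= 1:
            detected += 1
        if total >= min_umis:
            at_least_min += 1
    return detected, at_least_min


example_counts = {
    "GeneA": {"S1": 7, "S2": 4},   # 11 UMIs under tissue
    "GeneB": {"S1": 2, "S3": 50},  # S3 is outside tissue, so 2 UMIs count
    "GeneC": {"S3": 5},            # no UMIs under tissue
}
example_detected, example_ge10 = probe_gene_metrics(example_counts, {"S1", "S2"})
```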

Space Ranger outputs specific files, along with modifications to the GEX outputs, for PEX analysis.

Feature-Barcode Matrices (MEX & HDF5 formats): All Protein Capture counts become new features in addition to the standard per-gene features, and are output alongside gene counts in the feature-barcode matrices. For every row in the feature_reference.csv file, there will be a corresponding row in the feature-barcode matrix and the rowname will correspond to the value in the id column in the Feature Reference file for that feature.
Molecule Info (HDF5 format): The per-molecule information file (molecule_info.h5) captures the protein-associated feature information in the feature reference HDF5 group.
BAM: New tags are added to the BAM file to capture the quality, sequence, and feature ID for the Protein Capture reads.
Run Metrics: In addition to the Gene Expression metrics, the metrics_summary.csv file includes protein-specific metrics, which are tabulated in the Protein Expression Metrics section below.
Web Summary: The web_summary.html file captures protein-specific metrics. Secondary analysis results, including histograms of UMI counts and a treemap plot of the protein distribution, are displayed.
Secondary Analysis: Protein-specific secondary analyses are included. The k-means and graph-based clustering for protein counts are run on PCA-reduced space, similar to Gene Expression. Both t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) projections are generated using raw protein counts. This contrasts with the t-SNE and UMAP projections for gene counts, which are run on PCA-reduced space. The spatial enrichment file spatial_enrichment.csv also includes the value for Antibodies and an additional column for the secondary name if included.
Isotype Correlation: The isotype_correlations.csv file is a protein-specific output. It includes the correlation values of the antibodies in the panel against the four included isotype controls. This file is helpful in determining the background signal.
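
Because gene and protein features share one feature-barcode matrix, downstream code usually needs to separate the rows by feature type. The sketch below groups matrix row indices using a three-column, tab-separated feature listing (id, name, feature type), which is an assumption about the layout of the MEX feature file; the function name and sample identifiers are hypothetical.

```python
from collections import defaultdict


def rows_by_feature_type(feature_lines):
    """Group (row_index, feature_id) pairs by feature type.

    feature_lines: iterable of tab-separated lines, assumed to be
    id, name, feature type, in matrix row order.
    """
    lines = [line.rstrip("\n") for line in feature_lines if line.strip()]
    groups = defaultdict(list)
    for row_index, line in enumerate(lines):
        feature_id, _name, feature_type = line.split("\t")[:3]
        groups[feature_type].append((row_index, feature_id))
    return dict(groups)


# Synthetic feature listing: two genes followed by two antibody features.
example_features = [
    "ENSG00000000001\tGENE1\tGene Expression\n",
    "ENSG00000000002\tGENE2\tGene Expression\n",
    "AB_1\tAB_1\tAntibody Capture\n",
    "AB_2\tAB_2\tAntibody Capture\n",
]
example_groups = rows_by_feature_type(example_features)
```

The row indices returned for each group can then be used to slice the count matrix into separate gene and protein matrices.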

Space Ranger computes sequencing quality and application results metrics for each supported library type, currently Gene Expression and Antibody Capture. Metrics for a library type are computed and displayed only when that library type was used. This section describes the Antibody Capture metrics used for QC of the library preparation and sequencing of Protein Expression libraries; they appear in the metrics_summary.csv file and on the web_summary.html page.

Metric	Description
`Antibody Number of Reads`	Total number of reads.
`Mean Antibody Reads per Spot`	The number of reads, both under and outside of tissue, divided by the number of barcodes associated with a spot under tissue.
`Valid Antibody Barcodes`	Fraction of reads with a spot-barcode found in or corrected to one that is found in the inclusion list.
`Valid Antibody UMIs`	Fraction of reads with valid UMIs; i.e. UMI sequences that do not contain Ns and that are not homopolymers.
`Antibody Sequencing Saturation`	Fraction of antibody library reads originating from an already-observed UMI. This is a function of library complexity and sequencing depth. More specifically, this is a ratio where the denominator is the number of reads with a recognized antibody barcode, valid spot-barcode, and valid UMI, and the numerator is the subset of those reads that had a non-unique combination of (spot-barcode, UMI, antibody barcode).
`Q30 Bases in Antibody Barcode`	Fraction of spot barcode bases with Q-score greater than or equal to 30, excluding very low quality/no-call (Q less than or equal to 2) bases from the denominator.
`Q30 Bases in Antibody Read`	Fraction of bases from the read containing the antibody barcode with Q-score greater than or equal to 30, excluding very low quality/no-call (Q less than or equal to 2) bases from the denominator. This is Read 2 for the Visium v2 chemistry.
`Q30 Bases in Antibody UMI`	Fraction of UMI bases with Q-score greater than or equal to 30, excluding very low quality/no-call (Q less than or equal to 2) bases from the denominator.
`Fraction Antibody Reads`	Fraction of reads that contain a recognized antibody barcode.
`Fraction Antibody Reads Usable`	Fraction of reads that contain a recognized antibody barcode, a valid UMI, and a spot-associated barcode.
`Antibody Reads Usable per Spot`	Number of antibody reads usable divided by the number of spot-associated barcodes.
`Fraction Unrecognized Antibody`	Among all reads, the fraction with an unrecognizable antibody barcode.
`Antibody Reads in Spots Under Tissue`	Among Antibody library reads with a recognized antibody barcode, a valid UMI, and a valid barcode, the fraction associated with a spot under tissue.
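
The relationship between `Fraction Antibody Reads Usable` and `Antibody Reads Usable per Spot` can be sketched as follows. The per-read dictionary format with `recognized_antibody`, `valid_umi`, and `barcode` keys is a hypothetical representation chosen for illustration, as are the function name and sample reads; the pipeline derives these values internally.

```python
def antibody_usable_metrics(reads, spot_barcodes):
    """Return (fraction_usable, usable_reads_per_spot).

    reads:         list of dicts with keys 'recognized_antibody' (bool),
                   'valid_umi' (bool), and 'barcode' (str).
    spot_barcodes: set of spot-associated barcodes.
    """
    usable = sum(
        1
        for read in reads
        if read["recognized_antibody"]
        and read["valid_umi"]
        and read["barcode"] in spot_barcodes
    )
    fraction = usable / len(reads) if reads else 0.0
    per_spot = usable / len(spot_barcodes) if spot_barcodes else 0.0
    return fraction, per_spot


example_ab_reads = [
    {"recognized_antibody": True, "valid_umi": True, "barcode": "S1"},   # usable
    {"recognized_antibody": True, "valid_umi": False, "barcode": "S1"},  # invalid UMI
    {"recognized_antibody": False, "valid_umi": True, "barcode": "S2"},  # unrecognized antibody
    {"recognized_antibody": True, "valid_umi": True, "barcode": "S2"},   # usable
]
example_fraction, example_per_spot = antibody_usable_metrics(example_ab_reads, {"S1", "S2"})
```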
