Space Ranger Count Outputs

The spaceranger count pipeline will produce an outs/ directory. The contents of this directory will vary somewhat depending on the parameters of the run. This page outlines the output files produced by spaceranger count for whole transcriptome Gene Expression (GEX) and Protein Expression (PEX) libraries generated with the CytAssist or Direct Placement assays. It covers Fresh Frozen (FF), Fixed Frozen (FxF), and formalin-fixed paraffin-embedded (FFPE) tissues.

For GEX libraries, the following files can be found within the outs/ subfolder:

| File Name | Description |
| --- | --- |
| web_summary.html | Run summary metrics and plots in HTML format |
| cloupe.cloupe | Loupe Browser visualization and analysis file |
| spatial/ | Folder containing outputs that capture the spatial information of the data |
| spatial/aligned_fiducials.jpg | Aligned fiducials QC image |
| spatial/aligned_tissue_image.jpg | Aligned CytAssist and microscope QC image. Present only for the CytAssist workflow |
| spatial/barcode_fluorescence_intensity.csv | CSV file containing the mean and standard deviation of fluorescence intensity for each spot and each channel. Present for the fluorescence image input specified by --darkimage |
| spatial/cytassist_image.tiff | Input CytAssist image in original resolution that can be used to re-run the pipeline. Present only for the CytAssist workflow |
| spatial/detected_tissue_image.jpg | Detected tissue QC image |
| spatial/scalefactors_json.json | Scale conversion factors for spot diameter and coordinates at various image resolutions |
| spatial/spatial_enrichment.csv | Feature spatial autocorrelation analysis using Moran's I in CSV format |
| spatial/tissue_hires_image.png | Downsampled full resolution image. The image dimensions depend on the input image and slide version |
| spatial/tissue_lowres_image.png | Full resolution image downsampled to 600 pixels on the longest dimension |
| spatial/tissue_positions.csv | CSV containing, for each spot: the barcode; whether the spot was called under (1) or outside (0) the tissue; the array position; and the image pixel x and y positions in the full resolution image |
| analysis/ | Folder containing secondary analysis data including graph-based clustering and K-means clustering (K = 2-10); differential gene expression between clusters; PCA, t-SNE, and UMAP dimensionality reduction |
| metrics_summary.csv | Run summary metrics in CSV format |
| probe_set.csv | Copy of the input probe set reference CSV file. Present for the Visium FFPE and CytAssist workflows |
| possorted_genome_bam.bam | Indexed BAM file containing position-sorted reads aligned to the genome and transcriptome, annotated with barcode information |
| possorted_genome_bam.bam.bai | Index for possorted_genome_bam.bam. In cases where the reference transcriptome is generated from a genome with very long chromosomes (>512 Mbp), Space Ranger v2.0+ generates a possorted_genome_bam.bam.csi index file instead |
| filtered_feature_bc_matrix/ | Contains only tissue-associated barcodes in MEX format. Each element of the matrix is the number of UMIs associated with a feature (row) and a barcode (column). This folder can be input into third-party packages and allows users to wrangle the barcode-feature matrix (e.g. to filter outlier spots, run dimensionality reduction, normalize gene expression) |
| filtered_feature_bc_matrix.h5 | Same information as filtered_feature_bc_matrix/ but in HDF5 format |
| raw_feature_bc_matrix/ | Contains all detected barcodes in MEX format. Each element of the matrix is the number of UMIs associated with a feature (row) and a barcode (column) |
| raw_feature_bc_matrix.h5 | Same information as raw_feature_bc_matrix/ but in HDF5 format |
| raw_probe_bc_matrix.h5 | Contains UMI counts of each probe for all detected barcodes in HDF5 format. Only produced when running pipelines for probe-based assays |
| molecule_info.h5 | Contains per-molecule information for all molecules that contain a valid barcode and a valid UMI and were assigned with high confidence to a gene or protein barcode. This file is required for additional spaceranger analysis pipelines, including aggr, targeted-compare, and targeted-depth |
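As an illustration of how the spatial files fit together, the following minimal sketch filters tissue_positions.csv to spots under tissue and rescales their full resolution pixel coordinates with a factor from scalefactors_json.json. The column names (barcode, in_tissue, pxl_row_in_fullres, pxl_col_in_fullres) and the JSON key tissue_lowres_scalef are assumptions about the file layout and should be checked against the actual files; the function name and the toy data are hypothetical.

```python
import csv
import io
import json


def load_in_tissue_spots(positions_file, scalefactors_file,
                         scale_key="tissue_lowres_scalef"):
    """Return {barcode: (x, y)} for in-tissue spots, scaled to a downsampled image.

    Assumes a header row with the columns used below and a JSON object
    holding the scale factor under `scale_key`.
    """
    scale = json.load(scalefactors_file)[scale_key]
    spots = {}
    for row in csv.DictReader(positions_file):
        if row["in_tissue"] != "1":
            continue  # keep only spots called under tissue
        # Full resolution pixel coordinates, rescaled to the chosen image.
        x = float(row["pxl_col_in_fullres"]) * scale
        y = float(row["pxl_row_in_fullres"]) * scale
        spots[row["barcode"]] = (x, y)
    return spots


# Toy stand-ins for outs/spatial/tissue_positions.csv and scalefactors_json.json.
positions = io.StringIO(
    "barcode,in_tissue,array_row,array_col,pxl_row_in_fullres,pxl_col_in_fullres\n"
    "AAAC-1,1,0,0,1000,2000\n"
    "AAAG-1,0,0,2,1000,2200\n"
)
scalefactors = io.StringIO('{"tissue_lowres_scalef": 0.25}')

spots = load_in_tissue_spots(positions, scalefactors)
print(spots)
```

With real data, the two arguments would be open file handles for the files in outs/spatial/, and the resulting coordinates could be overlaid on tissue_lowres_image.png.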

The spaceranger count pipeline outputs metrics_summary.csv, which contains key metrics about the barcoding and sequencing process.

| Metric | Description |
| --- | --- |
| Number of Spots Under Tissue | The number of barcodes associated with a spot under tissue. |
| Number of Reads | Total number of read pairs that were assigned to this library in demultiplexing. |
| Mean Reads per Spot | The number of reads, both under and outside of tissue, divided by the number of barcodes associated with a spot under tissue. |
| Mean Reads Under Tissue per Spot | The number of reads under tissue divided by the number of barcodes associated with a spot under tissue. |
| Fraction of Spots Under Tissue | The fraction of spots that are under tissue. |
| Median Genes per Spot | The median number of genes detected per tissue-associated barcode. Detection is defined as the presence of at least one UMI count. |
| Median UMI Counts per Spot | The median number of UMI counts per tissue-covered spot. |
| Valid Barcodes | Fraction of reads with barcodes that match the whitelist after barcode correction. |
| Valid UMIs | Fraction of reads with valid UMIs; i.e. UMI sequences that do not contain Ns and that are not homopolymers. |
| Sequencing Saturation | The fraction of reads originating from an already-observed UMI. This is a function of library complexity and sequencing depth. More specifically, this is the fraction of confidently mapped, valid spot-barcode, valid UMI reads that had a non-unique (spot-barcode, UMI, gene) combination. |
| Q30 Bases in Barcode | Fraction of spot barcode bases with Q-score greater than or equal to 30, excluding very low quality/no-call (Q less than or equal to 2) bases from the denominator. |
| Q30 Bases in RNA Read | Fraction of RNA read bases with Q-score greater than or equal to 30, excluding very low quality/no-call (Q less than or equal to 2) bases from the denominator. This is Read 2 for the Visium v1 chemistry. |
| Q30 Bases in UMI | Fraction of UMI bases with Q-score greater than or equal to 30, excluding very low quality/no-call (Q less than or equal to 2) bases from the denominator. |
| Reads Mapped to Genome | Fraction of reads that mapped to the genome. |
| Reads Mapped Confidently to Genome | Fraction of reads that mapped uniquely to the genome. If a read mapped to exonic loci from a single gene and also to non-exonic loci, it is considered uniquely mapped to one of the exonic loci. |
| Reads Mapped Confidently to Intergenic Regions | Fraction of reads that mapped uniquely to an intergenic region of the genome. |
| Reads Mapped Confidently to Intronic Regions | Fraction of reads that mapped uniquely to an intronic region of the genome. |
| Reads Mapped Confidently to Exonic Regions | Fraction of reads that mapped uniquely to an exonic region of the genome. |
| Reads Mapped Confidently to Transcriptome | Fraction of reads that mapped to a unique gene in the transcriptome. The read must be consistent with annotated splice junctions. These reads are considered for UMI counting. |
| Reads Mapped Antisense to Gene | Fraction of reads confidently mapped to the transcriptome, but on the opposite strand of their annotated gene. A read is counted as antisense if it has any alignments that are consistent with an exon of a transcript but antisense to it, and has no sense alignments. |
| Fraction Reads in Spots Under Tissue | The fraction of valid-barcode, confidently-mapped-to-transcriptome reads with spot-associated barcodes. |
| Total Genes Detected | The number of genes with at least one UMI count in any tissue-covered spot. |
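To make the Sequencing Saturation and Mean Reads per Spot definitions concrete, here is a small worked sketch on toy data. It is one reading of the definitions above, not the pipeline's implementation: saturation is computed as the share of reads that repeat an already-seen (spot-barcode, UMI, gene) combination. The function names and the toy reads are hypothetical.

```python
from collections import Counter


def sequencing_saturation(reads):
    """Fraction of reads that repeat an already-observed (barcode, UMI, gene).

    `reads` is an iterable of (spot_barcode, umi, gene) tuples, assumed to be
    already restricted to confidently mapped reads with valid barcode and UMI.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # The first read of each distinct combination is new; the rest are duplicates.
    duplicates = total - len(counts)
    return duplicates / total


def mean_reads_per_spot(number_of_reads, spots_under_tissue):
    """All reads (under and outside tissue) divided by spots under tissue."""
    return number_of_reads / spots_under_tissue


toy_reads = [
    ("S1", "U1", "G1"),
    ("S1", "U1", "G1"),  # duplicate of the first read
    ("S1", "U2", "G1"),
    ("S2", "U1", "G2"),
    ("S2", "U1", "G2"),  # duplicate of the fourth read
]
saturation = sequencing_saturation(toy_reads)
mean_reads = mean_reads_per_spot(number_of_reads=len(toy_reads), spots_under_tissue=2)
print(saturation, mean_reads)
```

In this toy case, two of the five reads are duplicates, so saturation is 2/5, and five reads over two spots under tissue give 2.5 reads per spot.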

For FFPE samples, spaceranger count outputs a metrics_summary.csv that differs from the standard spaceranger count output in a few key metrics.

| Metric | Description |
| --- | --- |
| Reads Mapped to the Probe Set | Fraction of reads that map with at least one read half to the probe reference. |
| Reads Mapped Confidently to the Probe Set | Fraction of reads that map uniquely with both read halves to the probe reference. |
| Reads Mapped Confidently to the Filtered Probe Set | Fraction of reads that map uniquely with both read halves to the filtered probe reference. The probe reference is filtered to remove genes/features where one or more of the probes targeting the feature might hybridize and ligate at a non-targeted locus. This metric will be "None" when probe filtering is disabled. |
| Genes Detected | The number of unique genes from the filtered probe set with at least one UMI count in any tissue-covered spot. |
| Number of Genes | The number of genes as defined by the probe set. |
| Number of Genes greater than or equal to 10 UMIs | Number of genes with at least 10 filtered UMIs from tissue-associated barcodes. These genes are used to calculate per-gene enrichments. |
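The metrics in these tables can be read programmatically. The sketch below assumes that metrics_summary.csv consists of one header row of metric names followed by one row of values; that layout is an assumption and should be confirmed against a real file. Values that are not numeric, such as "None", are kept as strings. The function name and the toy CSV content are hypothetical.

```python
import csv
import io


def read_metrics(handle):
    """Read a one-row metrics CSV into a {metric name: value} dict."""
    row = next(csv.DictReader(handle))
    metrics = {}
    for name, raw in row.items():
        try:
            metrics[name] = float(raw)
        except (TypeError, ValueError):
            metrics[name] = raw  # keep non-numeric values such as "None"
    return metrics


# Toy stand-in for outs/metrics_summary.csv (values are made up).
toy_csv = io.StringIO(
    "Number of Spots Under Tissue,Number of Reads,"
    "Reads Mapped Confidently to the Filtered Probe Set\n"
    "2000,100000000,None\n"
)
metrics = read_metrics(toy_csv)

# Recompute Mean Reads per Spot from its definition.
mean_reads_per_spot = metrics["Number of Reads"] / metrics["Number of Spots Under Tissue"]
print(metrics)
print(mean_reads_per_spot)
```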

Space Ranger outputs specific files, along with modifications to the GEX outputs, for PEX analysis.

  • Feature-Barcode Matrices (MEX & HDF5 formats): All Protein Capture counts become new features in addition to the standard per-gene features, and are output alongside gene counts in the feature-barcode matrices. For every row in the feature_reference.csv file, there will be a corresponding row in the feature-barcode matrix and the rowname will correspond to the value in the id column in the Feature Reference file for that feature.

  • Molecule Info (HDF5 format): The per-molecule information file (molecule_info.h5) captures the protein-associated feature information in the feature reference HDF5 group.

  • BAM: New tags are added to the BAM file to capture the quality, sequence, and feature ID of the Protein Capture reads.

  • Run Metrics: The metrics_summary.csv file includes protein-specific metrics in addition to the Gene Expression metrics. These are tabulated in the Protein Expression Metrics section below.

  • Web Summary: The web_summary.html file captures protein-specific metrics. It displays the secondary analysis results, including histograms of UMI counts and a treemap plot of the protein distribution.

  • Secondary Analysis: Protein-specific secondary analyses are included. The k-means and graph-based clustering for protein counts are run on PCA-reduced space, similar to Gene Expression. Both t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) projections are generated using raw protein counts. This contrasts with the t-SNE and UMAP projections for gene counts, which are run on PCA-reduced space. The spatial enrichment file spatial_enrichment.csv also includes the values for antibodies and an additional column for the secondary name, if included.

  • Isotype Correlation: The isotype_correlations.csv file is a protein-specific output. It includes the correlation values of the antibodies in the panel against the four included isotype controls. This file is helpful in determining the background signal.
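Because protein features share the feature-barcode matrix with genes, downstream code usually needs to separate the two sets of rows. The following minimal sketch groups matrix row indices by feature type. It assumes a tab-separated feature table with the columns id, name, and feature type, and the feature type labels "Gene Expression" and "Antibody Capture"; the function name and the toy feature IDs are hypothetical.

```python
import csv
import io
from collections import defaultdict


def rows_by_feature_type(features_handle):
    """Group (row index, feature id) pairs by the feature type column.

    Row order in the feature table is assumed to match row order in the matrix.
    """
    groups = defaultdict(list)
    reader = csv.reader(features_handle, delimiter="\t")
    for row_index, (feature_id, _name, feature_type, *_rest) in enumerate(reader):
        groups[feature_type].append((row_index, feature_id))
    return dict(groups)


# Toy stand-in for the feature table inside a MEX folder (IDs are illustrative).
toy_features = io.StringIO(
    "ENSG00000141510\tTP53\tGene Expression\n"
    "ENSG00000146648\tEGFR\tGene Expression\n"
    "CD8A\tCD8A\tAntibody Capture\n"
)
groups = rows_by_feature_type(toy_features)
print(groups)
```

For a gzip-compressed feature table, the handle could be opened with gzip.open(path, "rt") from the Python standard library before being passed in.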

Space Ranger computes sequencing quality and application results metrics for each supported library type, which currently are Gene Expression and Antibody Capture. These metrics are computed and displayed only when the corresponding library type was used. The table below describes the Antibody Capture library metrics for QC of the library preparation and sequencing of Protein Expression libraries. These metrics appear in the metrics_summary.csv file and on the web_summary.html page.

| Metric | Description |
| --- | --- |
| Antibody Number of Reads | Total number of reads. |
| Mean Antibody Reads per Spot | The number of reads, both under and outside of tissue, divided by the number of barcodes associated with a spot under tissue. |
| Valid Antibody Barcodes | Fraction of reads with a spot-barcode found in or corrected to one that is found in the whitelist. |
| Valid Antibody UMIs | Fraction of reads with valid UMIs; i.e. UMI sequences that do not contain Ns and that are not homopolymers. |
| Antibody Sequencing Saturation | Fraction of antibody library reads originating from an already-observed UMI. This is a function of library complexity and sequencing depth. More specifically, this is a ratio where the denominator is the number of reads with a recognized antibody barcode, valid spot-barcode, and valid UMI, and the numerator is the subset of those reads that had a non-unique combination of (spot-barcode, UMI, antibody barcode). |
| Q30 Bases in Antibody Barcode | Fraction of spot barcode bases with Q-score greater than or equal to 30, excluding very low quality/no-call (Q less than or equal to 2) bases from the denominator. |
| Q30 Bases in Antibody Read | Fraction of bases from the read containing the antibody barcode with Q-score greater than or equal to 30, excluding very low quality/no-call (Q less than or equal to 2) bases from the denominator. This is Read 2 for the Visium v2 chemistry. |
| Q30 Bases in Antibody UMI | Fraction of UMI bases with Q-score greater than or equal to 30, excluding very low quality/no-call (Q less than or equal to 2) bases from the denominator. |
| Fraction Antibody Reads | Fraction of reads that contain a recognized antibody barcode. |
| Fraction Antibody Reads Usable | Fraction of reads that contain a recognized antibody barcode, a valid UMI, and a spot-associated barcode. |
| Antibody Reads Usable per Spot | Number of usable antibody reads divided by the number of spot-associated barcodes. |
| Fraction Unrecognized Antibody | Among all reads, the fraction with an unrecognizable antibody barcode. |
| Antibody Reads in Spots Under Tissue | Among Antibody library reads with a recognized antibody barcode, a valid UMI, and a valid barcode, the fraction associated with a spot under tissue. |
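The relationship between several of these antibody metrics can be sketched on toy data. This is an illustration of the definitions above under stated assumptions, not the pipeline's code: "spot-associated barcode" is interpreted here as a barcode in the set of spots under tissue, and the Read record, its fields, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    barcode: str               # corrected spot barcode
    umi_valid: bool            # UMI has no Ns and is not a homopolymer
    antibody_recognized: bool  # antibody barcode matched the feature reference


def antibody_read_metrics(reads, tissue_barcodes):
    """Compute a few antibody metrics from per-read flags (toy interpretation)."""
    total = len(reads)
    if total == 0 or not tissue_barcodes:
        raise ValueError("need at least one read and one spot under tissue")
    recognized = [r for r in reads if r.antibody_recognized]
    # Usable: recognized antibody barcode, valid UMI, and spot-associated barcode.
    usable = [r for r in recognized
              if r.umi_valid and r.barcode in tissue_barcodes]
    return {
        "Fraction Antibody Reads": len(recognized) / total,
        "Fraction Antibody Reads Usable": len(usable) / total,
        "Antibody Reads Usable per Spot": len(usable) / len(tissue_barcodes),
        "Fraction Unrecognized Antibody": (total - len(recognized)) / total,
    }


toy_reads = [
    Read("S1", True, True),   # usable
    Read("S1", True, True),   # usable
    Read("S9", True, True),   # recognized, but barcode is not under tissue
    Read("S2", True, False),  # unrecognized antibody barcode
]
antibody_metrics = antibody_read_metrics(toy_reads, tissue_barcodes={"S1", "S2"})
print(antibody_metrics)
```

Here three of four reads carry a recognized antibody barcode, two of four are usable, and the two usable reads spread over two spots under tissue give one usable read per spot.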