Evaluating Sequencing Depth for Targeted Gene Expression Libraries with spaceranger targeted-depth

Important
After June 30, 2023, new Space Ranger releases will no longer support Targeted Gene Expression analysis.

Space Ranger 1.3 includes targeted-depth, a lightweight program for summarizing a whole transcriptome analysis (WTA) dataset in the context of a hypothetical Targeted Gene Expression experiment. Given an existing WTA dataset and a target panel CSV file, targeted-depth computes the fraction of reads mapped to targeted genes from the panel. This metric can help in two ways when performing a Targeted Gene Expression experiment based on the analyzed whole transcriptome library (or one from a similar sample):

  • The targeted fraction computed from the WTA dataset estimates the fraction of library molecules likely to be recovered from the target enrichment step, which can be used to help select appropriate PCR cycling conditions.
  • The targeted fraction is helpful in choosing the sequencing depth of the targeted experiment. The targeted-depth tool provides depth recommendations designed to take advantage of the efficiency enabled by targeting, while sequencing enough to match the sensitivity of the WTA dataset.

For a list of accepted arguments, see the [Command Line Argument Reference], or run spaceranger targeted-depth --help.

The targeted-depth tool requires two inputs:

  • A molecule info H5 file produced by spaceranger count from a WTA sample. For more information on using spaceranger count, see [Single-Library Analysis with Space Ranger].
  • A target panel CSV file. This file can be taken from the target_panels directory included with Space Ranger or downloaded from the Panel Selection page. For more details on the use and structure of these files, see [Targeted GEX (count)].

The spaceranger targeted-depth command can be invoked as follows:

spaceranger targeted-depth --molecule-h5 sample345/outs/molecule_info.h5 \
    --target-panel /opt/spaceranger-2.0.0/target_panels/pan_cancer_v1.0_GRCh38-2020-A.target_panel.csv

The top section of output from spaceranger targeted-depth contains metrics relating to the WTA input sample:

Whole Transcriptome Analysis (WTA) Input Sample Metrics:
---------------------------------------------------------
Number of Tissue-Covered Spots                      7,742
Number of Reads                               502,358,896
Mean Reads per Spot                                64,887
Sequencing Saturation                               85.2%
Fraction of Reads from Targeted Genes               4.79%
Number of Reads from Targeted Genes            24,045,376
Mean Reads per Spot from Targeted Genes             3,105

The first four metrics are Gene Expression metrics that do not involve the target gene panel, but are shown to give context to the other results.

The next three metrics quantify the sample's target gene content:

  • Fraction of Reads from Targeted Genes: The fraction of all reads from the Gene Expression library that come from a tissue-covered spot, map confidently to a targeted gene in the transcriptome, and are not removed during UMI counting due to annotation disagreements.
  • Number of Reads from Targeted Genes: Number of reads counted towards targeted gene UMIs, i.e., the numerator of the fraction in the metric above.
  • Mean Reads per Spot from Targeted Genes: Equal to Number of Reads from Targeted Genes divided by Number of Tissue-Covered Spots. This number is the effective sequencing depth of the targeted portion of the whole transcriptome library.
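
As a rough check, the relationship between these three metrics can be reproduced from the counts in the sample output above. The sketch below is illustrative only: the function name `targeted_metrics` is hypothetical, and truncating (rather than rounding) the per-spot value is an inference from the sample report, not documented behavior.

```python
def targeted_metrics(total_reads, targeted_reads, spots):
    """Recompute the targeted-gene metrics from raw counts (hypothetical helper)."""
    fraction = targeted_reads / total_reads      # Fraction of Reads from Targeted Genes
    mean_targeted_rps = targeted_reads / spots   # Mean Reads per Spot from Targeted Genes
    return fraction, mean_targeted_rps


# Counts copied from the sample report above.
fraction, mean_targeted_rps = targeted_metrics(
    total_reads=502_358_896, targeted_reads=24_045_376, spots=7_742
)
print(f"Fraction of reads from targeted genes: {fraction:.2%}")
# Truncation is assumed here because it reproduces the reported 3,105.
print(f"Mean reads per spot from targeted genes: {int(mean_targeted_rps):,}")
```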

The final section of the output indicates recommended sequencing depths for a Targeted Gene Expression library enriched from the analyzed whole transcriptome library (or one from a similar sample):

NOTE: The recommended sequencing depth for Targeted Gene Expression Libraries with
spaceranger is 5,000-10,000 reads per tissue-covered spot.

Targeted GEX Recommended Sequencing Depths:

WTA Depth    Mean Reads per Spot    Total Reads
Original                   6,211     48,090,752
30k rps                    4,786     37,057,032

The recommended Targeted Gene Expression sequencing depth is calculated as
2.0 * WTA Depth * Fraction of Reads from Targeted Genes.
The 2.0 depth adjustment factor can help compensate for reads outside of
tissue-covered spots, non-uniform read coverage, and reads that cannot be
mapped confidently to targeted genes.
These are approximate estimates, and final results may vary.

rps = Reads per Tissue-Covered Spot.

The recommended depths in the two columns above are computed as follows, based on the targeted fraction and depth (Mean Reads per Spot) of the WTA sample:

Recommended Mean Reads per Tissue-Covered Spot = 2.0 * [WTA Depth] * [WTA Fraction of Reads from Targeted Genes]
Recommended Total Reads = [Recommended Mean Reads per Tissue-Covered Spot] * [WTA Number of Tissue-Covered Spots]

The Recommended Mean Reads per Tissue-Covered Spot is also equal to twice the Mean Reads per Spot from Targeted Genes in the WTA sample. The depth adjustment factor of 2 is used to provide a conservative recommendation. For example, this WTA sample had 64,887 mean reads per tissue-covered spot (rps) and 4.79% of reads from targeted genes, translating to a recommendation of 6,211 rps for the Targeted Gene Expression library, or about 48 million reads if the number of detected tissue-covered spots matches the WTA sample.
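
The arithmetic behind this example can be sketched as follows. This is a minimal illustration, not the tool's implementation, and `recommend_depth` is a hypothetical name. Using the unrounded WTA values reproduces the reported 6,211 rps, whereas plugging in the rounded figures printed in the report (64,887 and 4.79%) gives roughly 6,216, so small differences of this kind are expected when working from the printed numbers.

```python
DEPTH_ADJUSTMENT = 2.0  # depth adjustment factor stated in the tool output


def recommend_depth(wta_rps, targeted_fraction, spots):
    """Return (recommended mean reads per spot, recommended total reads)."""
    rps = DEPTH_ADJUSTMENT * wta_rps * targeted_fraction
    return rps, rps * spots


# Unrounded inputs derived from the counts in the sample report.
spots = 7_742
wta_rps = 502_358_896 / spots
targeted_fraction = 24_045_376 / 502_358_896

rps, total_reads = recommend_depth(wta_rps, targeted_fraction, spots)
# Truncation is assumed because it matches the figures in the sample report.
print(f"{int(rps):,} rps, {round(total_reads):,} total reads")

# Same formula with the rounded values printed in the report.
rounded_rps, _ = recommend_depth(64_887, 0.0479, spots)
print(f"{int(rounded_rps):,} rps from rounded inputs")
```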

Recommendations are also given to approximately match the sensitivity of a whole transcriptome library sequenced to a depth of 30k rps. If the WTA sample was sequenced very deeply, so that the computed targeted depth is correspondingly high, the recommended maximum targeted depth is 15,000 rps.
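
If that ceiling is applied to a computed recommendation, it amounts to a simple clamp. The sketch below assumes the cap is applied after the 2.0 adjustment. Whether targeted-depth enforces the cap internally or leaves it as guidance is not stated here, and `capped_recommendation` is a hypothetical name.

```python
MAX_TARGETED_RPS = 15_000  # recommended maximum stated above


def capped_recommendation(wta_rps, targeted_fraction, factor=2.0):
    """Apply the depth formula, then clamp to the recommended maximum (assumed order)."""
    return min(factor * wta_rps * targeted_fraction, MAX_TARGETED_RPS)


# A hypothetical, very deeply sequenced WTA sample: 250,000 rps with 5% targeted reads.
# The uncapped value would be 25,000 rps, so the cap applies.
print(capped_recommendation(250_000, 0.05))

# Rounded figures from the sample on this page stay below the cap and are unchanged.
print(capped_recommendation(64_887, 0.0479))
```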

To save the output to a file instead of printing it to the console, append > followed by a filename:

spaceranger targeted-depth --molecule-h5 sample345/outs/molecule_info.h5 \
    --target-panel /opt/spaceranger-2.0.0/target_panels/pan_cancer_v1.0_GRCh38-2020-A.target_panel.csv \
    > sample345_pan_cancer_depth.txt
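
Once the report is saved, the metric lines can be read back programmatically. The following is a minimal sketch, assuming each metric line consists of a name, at least two spaces, and a single numeric or percentage value. That layout is inferred from the sample output on this page rather than from a documented format, and `parse_metrics` is a hypothetical helper.

```python
import re

# Assumed layout: words separated by single spaces, then 2+ spaces, then one value.
METRIC_LINE = re.compile(
    r"^(?P<name>\S+(?: \S+)*)\s{2,}(?P<value>\d[\d,]*(?:\.\d+)?%?)\s*$"
)


def parse_metrics(report_text):
    """Return {metric name: number}; percentages are converted to fractions."""
    metrics = {}
    for line in report_text.splitlines():
        match = METRIC_LINE.match(line)
        if not match:
            continue  # headers, separators, and multi-column rows are skipped
        raw = match.group("value").replace(",", "")
        if raw.endswith("%"):
            metrics[match.group("name")] = float(raw[:-1]) / 100
        elif "." in raw:
            metrics[match.group("name")] = float(raw)
        else:
            metrics[match.group("name")] = int(raw)
    return metrics


# Excerpt in the assumed layout; in practice, read the saved file instead.
sample = """\
Whole Transcriptome Analysis (WTA) Input Sample Metrics:
---------------------------------------------------------
Number of Tissue-Covered Spots                      7,742
Fraction of Reads from Targeted Genes               4.79%
"""
metrics = parse_metrics(sample)
print(metrics)
```

To use it on the file written above, pass the file contents, for example `parse_metrics(open("sample345_pan_cancer_depth.txt").read())`.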

Incompatible reference and target gene panel: The molecule info H5 file must be created using a reference genome compatible with the target gene panel. The current required reference version is GRCh38 2020-A, which can be obtained from the [Downloads] page. An incompatible reference genome generates an error like this:

error: The gene ENSG00000286522 from the target panel csv is not present in the reference transcriptome used by the molecule info h5 file.
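
A mismatch of this kind can be spotted before running the tool by comparing gene IDs directly. The sketch below is an illustration under stated assumptions: the panel CSV is assumed to have metadata lines starting with `#` followed by a header row containing a `gene_id` column, and the reference annotation is assumed to be a GTF whose attribute column carries unversioned `gene_id "..."` entries. All function names and the sample rows are hypothetical.

```python
import csv
import re

GTF_GENE_ID = re.compile(r'gene_id "([^"]+)"')


def panel_gene_ids(panel_lines):
    """Collect gene IDs from target panel CSV lines (assumed layout)."""
    # Assumption: '#'-prefixed metadata lines precede a header with a gene_id column.
    rows = csv.DictReader(line for line in panel_lines if not line.startswith("#"))
    return {row["gene_id"] for row in rows}


def reference_gene_ids(gtf_lines):
    """Collect gene IDs from the attribute column of GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF comment lines
        match = GTF_GENE_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def missing_from_reference(panel_lines, gtf_lines):
    """Return panel gene IDs that do not appear in the reference annotation."""
    return sorted(panel_gene_ids(panel_lines) - reference_gene_ids(gtf_lines))


# Made-up rows for illustration only; real files would be opened and iterated.
panel = [
    "#panel_name=Example Panel\n",
    "gene_id,bait_seq,bait_id\n",
    "ENSG00000286522,ACGT,bait_1\n",
    "ENSG00000141510,TTGA,bait_2\n",
]
gtf = ['chr17\tsrc\tgene\t1\t100\t.\t-\t.\tgene_id "ENSG00000141510";\n']
missing = missing_from_reference(panel, gtf)
print(missing)
```

An empty result suggests the panel and reference agree on gene IDs; any listed ID would be expected to trigger the error shown above.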

Space Ranger does not support fully-custom panels, but does support add-ons to pre-defined panels.