Release Notes for Cell Ranger

New feature: Fixed RNA Profiling with multiplex Antibody Capture

  • Cell Ranger v7.2 is required for analysis of Fixed RNA Profiling data with multiplexed Gene Expression and Antibody Capture libraries. Instructions for running the cellranger multi subcommand are described in the running multi pipeline page. Output files are described in the Understanding Outputs section. The Fixed RNA Profiling algorithms section includes descriptions of the new methods that were developed for processing multiplexed Gene Expression and Antibody Capture data.

  • New probe-level count matrix output files for FRP: raw_probe_bc_matrix.h5 and sample_raw_probe_bc_matrix.h5.

  • The frp_gem_barcode_overlap.csv probe overlap file now contains content for Antibody Multiplexing Barcodes.

5’ Immune Profiling

  • New feature: Cell Ranger v7.2 supports the aggregation of BEAM (Antigen Capture) libraries with cellranger aggr to combine and normalize the calculation of antigen specificity scores across multiple (or large) samples split across wells.

  • The outputs of cellranger aggr for 5' Immune Profiling libraries now include the airr_rearrangement.tsv (not produced in previous versions of this pipeline).

  • A bug that caused all alignments in the consensus.bam and concat_ref.bam files to have their POS field set to the default value of 1 has been fixed.

  • The all_contig_annotations.json output file has an additional field called junction_support that is a map of {reads: x, umis: y} supporting the junction region of a contig. This information is generated by the cellranger vdj assembler for productive contigs in reference-assisted assembly (or valid contigs in de novo assembly) and used for confidence determination and cell filtering.
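As a minimal sketch of reading the junction_support field described above: it assumes all_contig_annotations.json is a JSON array of contig records, that each record carries a contig_name key, and that the field is null or absent when no support was reported. The sample records and their values are made up for illustration.

```python
import json

# Hypothetical excerpt; real files contain many more fields per contig.
SAMPLE = """
[
  {"contig_name": "AAACCTGAGAAACCAT-1_contig_1",
   "junction_support": {"reads": 120, "umis": 8}},
  {"contig_name": "AAACCTGAGAAACCAT-1_contig_2",
   "junction_support": null}
]
"""

def junction_support_table(contigs):
    """Return {contig_name: (reads, umis)} for contigs that report junction support."""
    table = {}
    for contig in contigs:
        support = contig.get("junction_support")
        if support:  # assumption: null or missing means nothing was reported
            table[contig["contig_name"]] = (support["reads"], support["umis"])
    return table

support = junction_support_table(json.loads(SAMPLE))
print(support)
```

For a real run, json.load on the file from the outs/ directory would replace the inline sample.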

General updates and bug fixes

  • Targeted Gene Expression analysis deprecated: count and multi pipelines in v7.2 and later do not support the analysis of Targeted Gene Expression libraries.

  • NovaSeq X demultiplexing support with bcl2fastq v2.20 dependency.

  • A new command line argument --output-dir is available for count, multi, mkref, mkvdjref, and vdj pipelines to specify a custom output directory.

  • Batch effect score calculations modified to normalize and scale with the number of cells in the dataset.

  • Cell Ranger now performs a preflight check to verify that the prebuilt reference file is complete. The use of an incomplete reference (due to incomplete download or corrupt file) produces an error message.

  • BAM tags update: MM changed to mm to enable compatibility with IGV.

  • BAM files now have @PG header noting which pipeline version was used to generate the file.

  • Improvements to the cellranger aggr web summary:

    • Layout improved.

    • Metrics table per library.

    • Provides aggregation summary for BEAM (Antigen Capture) libraries.

  • Improvements to the cellranger multi web summary:

    • In the Feature Barcode Expression Metrics table of the cells tab, the Median UMI counts per cell metric has been renamed in the Antibody tab:

      • Antibody Capture: Median antibody UMI counts per cell.

      • CRISPR, Antigen Capture, and Custom library: remains unchanged.

    • Antibody histogram has moved from the library tab to the cells tab.

    • A per-sample antibody barcode rank plot has been added for 3’ Cell Multiplexing web summaries.

    • FRP new metrics to capture the fraction of reads mapping to two different halves of probes: Reads half-mapped to probe set, Reads split-mapped to probe set.

  • Improvement to the Batch Effect Score (BES) calculation: normalization and scaling have changed from using √N nearest neighbors to 0.01*N nearest neighbors (with N being the number of cells).
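To give a feel for how the neighborhood size in the Batch Effect Score item above changes with dataset size, the sketch below compares the two rules. The rounding and the floor of one neighbor are assumptions made for illustration; the notes do not state how Cell Ranger rounds.

```python
import math

def bes_neighbors_sqrt(n_cells):
    """Earlier rule: sqrt(N) nearest neighbors (rounding is an assumption)."""
    return max(1, round(math.sqrt(n_cells)))

def bes_neighbors_fraction(n_cells, fraction=0.01):
    """Rule described for v7.2: 0.01 * N nearest neighbors (rounding is an assumption)."""
    return max(1, round(fraction * n_cells))

for n in (1_000, 10_000, 100_000):
    print(n, bes_neighbors_sqrt(n), bes_neighbors_fraction(n))
```

The two rules coincide at N = 10,000 cells (100 neighbors each). Below that the 0.01*N rule uses fewer neighbors, and above it more.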

Changes that apply to Fixed RNA Profiling analysis

  • v1.0.1 probe set reference CSVs for human and mouse have a new region column, which indicates whether a probe spans a splice junction by at least 10 bp (spliced) or not (unspliced).

  • When the v1.0.1 probe set reference CSV is used in Cell Ranger v7.1, the web summary and metrics_summary.csv files will include genomic DNA metrics. The region column information is used to calculate these metrics.

  • The molecule_info.h5 files include the probe to which each molecule is mapped.

  • In the probe set reference CSV, the included column is set to FALSE for all deprecated probes.
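The region and included columns described above can be inspected with a few lines of Python. This is a sketch under stated assumptions: the probe set CSV is taken to start with metadata lines prefixed by "#", and the column names other than region and included (gene_id, probe_seq, probe_id) are assumed. The rows are placeholders, not real probes.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature probe set CSV; IDs and sequences are placeholders.
SAMPLE = """\
#example_metadata=placeholder
gene_id,probe_seq,probe_id,included,region
GENE_A,ACGT,GENE_A|probe1,TRUE,spliced
GENE_A,TTGA,GENE_A|probe2,TRUE,unspliced
GENE_B,GGCA,GENE_B|probe1,FALSE,unspliced
"""

def summarize_probes(handle):
    """Count included probes per region and tally probes with included=FALSE."""
    rows = csv.DictReader(line for line in handle if not line.startswith("#"))
    regions = Counter()
    excluded = 0
    for row in rows:
        if row["included"].upper() == "FALSE":
            excluded += 1  # e.g. deprecated probes
            continue
        regions[row["region"]] += 1
    return regions, excluded

regions, excluded = summarize_probes(io.StringIO(SAMPLE))
print(dict(regions), excluded)
```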

General improvements

  • Calling cell barcode improvement: The auto-estimated expect-cells upper range has been restricted to the lower range of the EmptyDrops method. Previously, the upper range was 262,000 cells. The new upper range is 45,000 for single cell gene expression analyses. The exception is super-loaded multiplex Fixed RNA Profiling analyses, where the range is calculated as: max(45,000, number of probe barcodes × 22,500).

  • Improvement to the Batch Effect Score (BES) calculation to use √n nearest neighbors, where n is the total number of cells, instead of 100. Cells are no longer subsampled to 10%.

  • A new compression format, .tar.xz, is available on the Cell Ranger downloads page. The smaller file size enables faster download.

  • Cell Ranger 7.1 introduces a new subcommand, cellranger multi-template, which provides descriptions for all multi config CSV parameters and produces a config CSV template. Run cellranger multi-template -h for help.

  • The ARC-v1 chemistry may be used to analyze only the Gene Expression library portion of a Multiome ATAC + Gene Expression experiment.

  • The aggregate_barcodes.csv output file for Antibody Capture analyses is no longer stored in an antibody_analysis/ sub-directory. In the cellranger multi pipeline, it is found in outs/per_sample_outs/<sample_name>/count/aggregate_barcodes.csv. In the cellranger count pipeline, it is found in outs/aggregate_barcodes.csv. For both pipelines, the CSV file is only generated if antibody aggregates are detected.

  • The web summaries for Antibody Capture libraries include a Distribution of Antibody Counts plot to show the relative composition of antibody counts for antibodies with at least one UMI.
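The upper range for the auto-estimated expect-cells value, described in the cell calling item above, reduces to a single max() expression. The function and parameter names below are illustrative only and are not part of Cell Ranger.

```python
def expect_cells_upper_bound(n_probe_barcodes=1, base_cap=45_000, per_barcode_cap=22_500):
    """Upper range for auto-estimated expect-cells, per the formula in the notes.

    For single cell gene expression analyses the bound is base_cap; for
    multiplex Fixed RNA Profiling it grows with the number of probe barcodes.
    """
    return max(base_cap, n_probe_barcodes * per_barcode_cap)

for n in (1, 2, 4, 16):
    print(n, expect_cells_upper_bound(n))
```

With one or two probe barcodes the bound stays at 45,000. It only exceeds that value from three probe barcodes upward (3 × 22,500 = 67,500).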

Bug fixes

  • Fixed a bug in the OrdMag algorithm, which could result in all barcodes being called as cells when there are very few cells.

  • Improvement to 3' Cell Multiplexing tag assignment for samples with a large number of zero CMO UMI counts.

  • Fixed a 3' Cell Multiplexing t-SNE plot bug where the plots were generated assuming a full set of CMOs in cmo-set instead of those used in the experiment.

  • Fixed conditions resulting in negative gap errors.

  • Fixed a bug that caused pipestance failure with an IOError message: "directory %s exists but it can not be written".

The updates are explained further in these Knowledge Base articles:

New feature: Barcode Enabled Antigen Mapping (BEAM) or Antigen Capture

  • Cell Ranger 7.1 is required for the analysis of BEAM libraries. Instructions for running cellranger multi are described in the Antigen Capture page. The new feature_type, antigen capture, the Feature Reference CSV that specifies the list of antigens (and MHC alleles) included in the experiment, and all the antigen-specific customizable parameters in a multi config CSV are described in detail. Example multi config CSVs for TCR and BCR Antigen Capture libraries are also provided.

  • The algorithms section includes a page called Antigen Algorithms with a description of the new methods developed for processing Antigen Capture (BEAM) data.

  • If an Antigen Capture library is included, some new/updated output files are generated (described in the Understanding Outputs section):

    • antigen_specificity_scores.csv (new file)
    • per_barcode.csv (new file)
    • aggregate_barcodes.csv (updated location and format)

Changes that apply to 5' Immune Profiling analysis

  • V(D)J cell calling improvement: If a Gene Expression library is present, the V(D)J cell calling algorithm does not filter out two or more clonotypes that have identical chains. This helps improve V(D)J cell calling, especially for transgenic strains. This change does not apply to V(D)J datasets in the absence of a Gene Expression library.

  • The Human V(D)J reference has been updated to exclude the following genes:

    • IGHV4-30-2
    • IGKV1D-33
    • IGKV1D-37
    • IGKV1D-39
    • IGKV2D-28

These genes have counterparts with identical V, D, J, and C gene sequences, but differ in the length of their 5' UTRs. Removing duplicates improves clonotype assignment.

Updates are explained further in this Knowledge Base article: What are the major updates in Cell Ranger v7.1 that impacts V(D)J data?

Bug fixes

  • Fixed a bug where an upgrade to Illumina NovaSeq control software v1.8 (reagent name change in recipe XML file) resulted in a silent cellranger mkfastq error and a significant number of reads going into Undetermined/ because the orientation of i5 (Index2) could not be autodetected.

  • Improved Deplex Error message for cellranger multi when no valid cell multiplexing tags are detected in the Multiplexing Capture library. Common failure modes are provided to help with troubleshooting.

  • Updated Fixed RNA Profiling web summary metric names and definitions for consistency.

New feature: Fixed RNA Profiling

Major updates

  • To maximize sensitivity for whole transcriptome 3’/5’ Single Cell Gene Expression and 3’ Cell Multiplexing experiments, introns will be included in the analysis by default for cellranger count and multi. There will be an informational alert in the count and multi web summaries to indicate that intronic reads were included in your analysis. While not recommended, users can exclude introns by setting include-introns=false in Cell Ranger. This change does not apply to the 3’/5’ Targeted Gene Expression or Fixed RNA Profiling assays, as both target exonic sequences.

  • CRISPR Guide Capture libraries can be aggregated with cellranger aggr. This addition allows users to combine large CRISPR assays across multiple GEM wells. There are no changes to aggr inputs – the presence of CRISPR libraries in the molecule_info.h5 input files enables CRISPR aggregation. Normalization is enabled by default for both Gene Expression and CRISPR libraries; changes to the normalization parameters affect both libraries. Protospacer calling is performed again on the combined data included in the cellranger aggr run. CRISPR aggregation generates the crispr_analysis/ folder in the outs/ directory. The structure of the crispr_analysis folder is similar to the CRISPR outputs from count.

General improvements

  • Users no longer need to specify expect-cells for cellranger count and multi pipelines due to improvements in the gene expression cell calling algorithm. The expected number of cells can either be auto-estimated (recommended) or users can still provide a reasonable estimate to expect-cells.

  • The new check-library-compatibility option allows users to disable the default check for 10x Barcode overlap when multiple libraries are specified for cellranger count and multi (3' Gene Expression, 5' Immune Profiling).

  • For 3’ Cell Multiplexing analysis in cellranger multi, users can override Cell Ranger’s cell calling and tag calling algorithms with the custom cell assignment input file specified by the barcode-sample-assignment option in the multi config CSV file.

  • Modifications to the 3’ Cell Multiplexing CMO tag calling algorithm enable users to recover viable singlet data from “blank” assignments.

  • The following per-sample output files from cellranger multi have been renamed:


    Cell Ranger 6.1.2 outputs       Cell Ranger 7.0 outputs
    cloupe.cloupe                   sample_cloupe.cloupe
    sample_barcodes.csv             sample_filtered_barcodes.csv
    sample_feature_bc_matrix        sample_filtered_feature_bc_matrix
    sample_feature_bc_matrix.h5     sample_filtered_feature_bc_matrix.h5

  • Secondary analysis outputs will be named to reflect which library they are specific to (gene_expression_*, antibody_capture_*, crispr_guide_capture_*, multiplexing_capture_*). The secondary analysis clustering, PCA/t-SNE/UMAP, and differential gene expression outputs are supported for Gene Expression and Antibody Capture libraries, while PCA/t-SNE/UMAP outputs are supported for CRISPR Guide Capture and Cell Multiplexing libraries. For example:

    └── analysis
        └── pca
            ├── antibody_capture_10_components/
            └── gene_expression_10_components/

  • The cellranger count web summary “Analysis” tab has been renamed to “Gene Expression”. There is an “Antibody” tab for Antibody Capture analysis, which includes a t-SNE projection plot by clustering and a histogram of antibody counts.

  • The cellranger multi web summary (3' Gene Expression, 5' Immune Profiling) “Sample” view has been renamed to “Cells”. The “Antibody” tab includes a t-SNE projection plot by clustering. The mapping metrics, sequencing saturation plot, and median genes per cell plot are displayed on the “Library” view (previously appeared on “Sample” and “Library” view).

  • Cell Ranger can now ingest FASTQs with a quality score up to the full supported range (93 instead of 41).
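The quality score range mentioned above follows from how FASTQ encodes qualities as printable ASCII. Assuming the common Phred+33 encoding (an assumption; the notes do not name the encoding), the highest printable character "~" (ASCII 126) corresponds to a score of 93. The helper name below is illustrative.

```python
PHRED_OFFSET = 33       # assumption: Phred+33 encoded FASTQ
MAX_QUALITY = 93        # ord("~") - 33, the top of the printable ASCII range

def decode_qualities(quality_string):
    """Convert a FASTQ quality string to a list of integer Phred scores."""
    scores = [ord(ch) - PHRED_OFFSET for ch in quality_string]
    if any(q < 0 or q > MAX_QUALITY for q in scores):
        raise ValueError("quality character outside the Phred+33 range")
    return scores

# "!" is the lowest score, "J" is 41 (the previous ceiling), "~" is 93.
print(decode_qualities("!IJ~"))
```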

Bug fixes

  • Improved error messages and better handling of poorly formatted inputs in cellranger mkref. Users can now generate references for analyses with large genomes containing chromosomes longer than 512 Mbp; in these cases, the cellranger count and multi pipelines output a .csi BAM index file instead of .bai.

  • Fixed a bug that resulted in a segmentation fault error when mapping to references that contain small contigs, for example, the rabbit genome.

  • Removed the Inconsistent Throughput Detected alert in web summary when it should not appear.

  • Fixed a bug where vdj pipeline failed for specific CentOS/RHEL 7 kernels.

  • Bundles the latest version of bamtofastq (v1.4.1) in Cell Ranger 7.0 tarball.

  • Fixed a bug where bamtofastq failed if the R1 read length was >26bp.
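The 512 Mbp figure in the mkref item above matches the coordinate limit of the BAI index format (2^29 bases). The sketch below shows how one might predict which index suffix to expect from a reference's contig lengths. The exact comparison Cell Ranger performs is an assumption, and the "big_chr" contig is hypothetical.

```python
BAI_MAX_CONTIG_LENGTH = 2 ** 29  # 536,870,912 bp, i.e. 512 Mbp

def expected_index_suffix(contig_lengths):
    """Return '.csi' if any contig exceeds the BAI limit, else '.bai'."""
    longest = max(contig_lengths.values())
    return ".csi" if longest > BAI_MAX_CONTIG_LENGTH else ".bai"

human_like = {"chr1": 248_956_422}                     # GRCh38 chr1 length
large_genome = {"big_chr": 600_000_000, "chr2": 1_000}  # hypothetical contigs

print(expected_index_suffix(human_like))
print(expected_index_suffix(large_genome))
```

Downstream tools that hard-code the .bai suffix would need to look for the .csi file when working with such large genomes.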

Changes that apply to 5' Immune Profiling analysis

  • Support for gamma-delta libraries: The cellranger multi pipeline can process T cell receptor (TCR) libraries enriched for gamma (TRG) and delta (TRD) chains. 10x Genomics does not officially support TRG/D analysis with a reagent kit. Please note that only CDR3 annotation is available for TRG/D, and the quality of annotations cannot be guaranteed. Users must specify VDJ-T-GD as the feature_type in the cellranger multi config CSV as TRG/D chains cannot be autodetected. A web_summary alert is displayed to indicate the use of an unsupported workflow. No TRG/D analysis is available via the cellranger vdj pipeline.

  • V(D)J Reference updated: The recommended V(D)J reference packages for human and mouse have been updated from version 5.0 to 7.0. The changes to the V(D)J reference sequences are listed below:

HUMAN:

  • Added human IGHV3-9

  • For two genes that are identical except for extra bases on the 3' end, only the longer version was retained. List of affected genes:



MOUSE:

  • Added missing mouse TRGV and TRGC genes

    TRGC1    ENSMUST00000103558
    TRGC2    ENSMUST00000103561
    TRGC3    ENSMUST00000198163
    TRGC4    ENSMUST00000179181
    TRGV1    ENSMUST00000103564
    TRGV3    ENSMUST00000198663
    TRGV4    ENSMUST00000103554
    TRGV5    ENSMUST00000199017
    TRGV6    ENSMUST00000198330
    TRGV7    ENSMUST00000103553

  • For two genes that are identical except for extra bases on the 3' end, only the longer version was retained. List of affected genes:

    IGHD2-5             ENSMUST00000178549
    IGHD5-2             ENSMUST00000179166
    TRAV11D             ENSMUST00000103648
    TRAV12D-1           ENSMUST00000181360
    TRAV12D-2           ENSMUST00000197007
    TRAV13D-2           ENSMUST00000197954
    TRAV14D-1           ENSMUST00000181038
    TRAV14D-2           ENSMUST00000196802
    TRAV15D-2-DV6D-2    ENSMUST00000199800
    TRAV3D-3            ENSMUST00000196023
    TRAV4D-3            ENSMUST00000103592
    TRAV4D-4            ENSMUST00000103600
    TRAV5D-4            ENSMUST00000179701
    TRAV6-6             ENSMUST00000103584
    TRAV7-2             ENSMUST00000103636
    TRAV7D-5            ENSMUST00000197128
    TRAV9D-2            ENSMUST00000199746
    TRAV9D-3            ENSMUST00000178252

  • V(D)J web summary: In the web_summary.html file produced by cellranger vdj, the Analysis tab has been renamed to VDJ.

  • The default for the fiveprime_multiplexing parameter in the cellranger-7.0.0/lib/bin/parameters.toml file has changed to True.

Bug fixes

  • Improved handling of memory requests for large genomes.

  • Fixes an issue in the molecule_info.h5 file, where if a species was not present in a barnyard run, it was omitted from the genome information.

  • Reduces the size of the bundled reference included in cellranger testrun.

  • Adds additional guidance in the web summary when there is a low fraction of targeted genes enriched.

  • Fixes a metric issue where aggregate antibodies could be double-counted.

  • Unsets additional sysconfig environment variables prior to pipeline execution, which may otherwise interfere with the pipeline conda environment.

Bug Fixes

  1. Fix an issue where cellranger vdj could fail if executed by a user without a home directory.

New Feature: High Throughput (HT) for Chromium X

  • Cell Ranger 6.1 introduces support for the 3' and 5' High Throughput (HT) kits with 16 channels per chip, allowing users to process 2,000-20,000 cells per channel (3' and 5') or 2,000-60,000 cells per channel with CellPlex (3' only). HT kits are only compatible with Chromium X, which is backwards compatible with all 10x Genomics dual indexed assays. For more information see What is HT?

  • Cell Ranger 6.1 includes a new throughput detection algorithm to detect HT samples in 3' CellPlex data as described in the CellPlex algorithms page. In the event that chemistry detection fails, it can be overridden with the option (e.g., --chemistry=SC3Pv3HT) in cellranger count, or in the cellranger multi CSV file to detect HT samples when 3' CellPlex libraries are run.

  • Minor changes to web_summary.html include a new alert when the user specifies --chemistry=SC3Pv3HT, but Cell Ranger detects otherwise. HT will be appended to the detected chemistry if the pipestance was a multiplexing run.

  • Note that with HT chemistry and 3' CellPlex it is now possible to run 60,000 cells per GEM well, and aggregate million-cell datasets. Larger datasets will require additional memory beyond our stated minimum requirement of 64 GB. See the 3' system requirements page for more details and time trial data.

General Improvements

  • Numerous performance optimizations have been made, especially for pipeline stages that iterated over molecule_info.h5 files, such as cellranger aggr, but also memory allocation improvements. We have seen up to 2-3x speed improvements for cellranger aggr.

  • Changed certain parameters in the cell calling algorithm for improved results. Changed the Empty Drops stage for multispecies experiments to call cells using only the UMI counts for each species separately.

  • Raw feature barcode matrices are no longer output by cellranger aggr. It is no longer possible to specify --force-cells of an aggr output in cellranger reanalyze with more cells than were originally called.

  • The secondary analysis implementation is now shared with Loupe Browser's Filtering and Reclustering Wizard. These changes improve the performance of most stages, in either time (t-SNE) or memory (PCA). There will be changes in outputs compared to previous versions, reflecting either slight variations in outputs (PCA, t-SNE), or as if a different randomized seed had been chosen (graph clustering, UMAP).

  • Starting from Cell Ranger 6.1, antibody histograms of UMI counts are shown on Library tab of the web_summary.html, and protein aggregate barcodes are provided as aggregate_barcodes.csv. These are meant to help feature barcoding users diagnose issues of aggregating antibodies on cell surface proteins.

  • Fixed a bug that caused Cell Ranger v4 and higher to ignore user-supplied parameters to --nthreads, defaulting to 1. Parallelization has been re-enabled in Cell Ranger 6.1.

Deprecating OS

  • The recommended operating systems for Cell Ranger v6.1 are CentOS 7 or Ubuntu 14 Linux variants or newer. CentOS 6 and Ubuntu 12 are still supported but have been deprecated (unsupported for future Cell Ranger releases). Support may be dropped in future versions. See the OS support page for more details.

Bug fixes

  • Fixes additional issues with file copying on BeeGFS filesystems.

  • cellranger multi: Adds an optional min-assignment-confidence in the config CSV to allow adjustment of the Cell Multiplexing minimum assignment confidence threshold (default: 0.9). Decreasing the threshold will likely increase the number of singlets assigned to samples, but at the cost of potentially increasing the rate of mis-assignment.

  • Adds a warning to the cellranger multi web_summary.html if contaminant tags are detected in Cell Multiplexing experiments.
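To illustrate the trade-off behind min-assignment-confidence described above, here is a deliberately simplified sketch. It assumes each barcode has a set of assignment probabilities and keeps the top call only when it meets the threshold. This is not the Cell Multiplexing algorithm itself; the function name, state labels, and probabilities are hypothetical.

```python
def assign_tag(probabilities, min_assignment_confidence=0.9):
    """Return the most probable label, or 'Unassigned' if it is below the threshold."""
    best_label, best_prob = max(probabilities.items(), key=lambda item: item[1])
    return best_label if best_prob >= min_assignment_confidence else "Unassigned"

# Hypothetical probabilities for one barcode.
calls = {"CMO301": 0.86, "CMO302": 0.10, "Multiplet": 0.04}

default_call = assign_tag(calls)        # default threshold of 0.9
relaxed_call = assign_tag(calls, 0.8)   # lowered threshold
print(default_call, relaxed_call)
```

Lowering the threshold recovers this barcode as a singlet, but a 0.86 call is also more likely to be wrong than a 0.9 call, which is the mis-assignment cost the note warns about.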

General improvements

  • The fetch-imgt script, to build an IMGT-compatible custom reference for Single Cell Immune Profiling data analysis, has been updated to be compatible with Python 3.

  • Cell Multiplexing analysis has been updated to be more memory-efficient.

Bug fixes

  • The [sample] section of the configuration CSV file is now required for Cell Multiplexing analysis.

  • Fixes an issue where cellranger multi would only accept a single VDJ library.

  • Fixes an issue where cellranger vdj preflight would fail if custom primers were passed in.

  • The sample_id information in Cell Ranger 6 aggr runs is now correctly propagated to Loupe Browser.

  • Fixes an issue where .vloupe files fail to generate on some filesystems and operating systems.

  • Fixes an issue in cluster mode where the pipeline could fail to correctly identify which jobs were still queued.

  • Fixes an issue where including an aggr csv in reanalyze would cause the pipeline to exit.

  • The "Number of reads for Custom Feature by Physical library ID" in the multi web summary and metrics summary is now rendered properly.

  • Fixes an issue with file copying on BeeGFS filesystems.

New Feature: Cell Multiplexing

  • Cell Ranger 6.0 now supports analysis of Cell Multiplexing data for the 3' Gene Expression, Targeted Gene Expression, and Feature Barcode solutions. Instructions for running the cellranger multi subcommand are described in the running multi page. A new Getting Started Tutorial is also available. The Cell Multiplexing algorithms include a new method to call singlets, multiplets, and empty drops. The output file structure has also changed to accommodate multiple samples multiplexed in a single GEM well.

  • The aggr subcommand now supports analysis of cellranger multi outputs for the 3' Gene Expression, Targeted Gene Expression, and Feature Barcode solutions. Further details are described in the running aggr page.

New Feature: LT (Low Throughput) support

  • Cell Ranger 6.0 supports the analysis of data from 3' Gene Expression and Feature Barcode (Cell Surface Protein) LT (Low Throughput) kits.

Changes that apply to Gene Expression and Feature Barcode analysis

  • The column names for the Aggregation CSV file required by the aggr sub-command have changed: library_id has been changed to sample_id and library_outs has been changed to sample_outs. Further details are described in the running aggr page.

  • The molecule_info.h5 and unfiltered feature-barcode matrix files (raw_feature_bc_matrix in H5 and MEX formats) will only contain barcodes with at least one read, rather than all barcodes in the whitelist.

  • The change to the unfiltered feature-barcode matrix described in the previous item results in a subtle change to the distribution of UMI counts amongst background (non-cell) barcodes, which in turn causes minor changes to the results of the cell calling algorithm. This change arises in the second step of cell calling, which identifies non-ambient cell barcodes as described in the algorithms page.

  • Cell Ranger 6.0 is the first Cell Ranger release to use Python 3.
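An existing Aggregation CSV can be updated for the column rename described above by rewriting only its header row. The sketch below follows the old and new column names given in these notes; the sample ID and path are placeholders, and the helper name is illustrative.

```python
import csv
import io

# Old -> new header names, as listed in the notes.
RENAMED_COLUMNS = {"library_id": "sample_id", "library_outs": "sample_outs"}

def upgrade_aggr_csv(old_text):
    """Return the CSV text with pre-6.0 header names replaced; other columns are kept."""
    rows = list(csv.reader(io.StringIO(old_text)))
    rows[0] = [RENAMED_COLUMNS.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

OLD = "library_id,library_outs\nsampleA,/path/to/sampleA/outs\n"  # placeholder row
print(upgrade_aggr_csv(OLD))
```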

Bug fixes and deprecations

  • A bug has been fixed in the graph-based clustering output: previously, in a sample with K clusters, the first K cell-associated barcodes (ordered as in the filtered feature-barcode matrix) may have been assigned incorrect cluster labels. This change does not affect the number of clusters output.

  • A bug has been fixed for multi-genome experiments, wherein the species annotation may have been incorrect for cell-associated barcodes identified by the second step of the cell-calling algorithm, as described in the algorithms page. Changes in metrics are expected to be minor, unless the proportion of such cells is large.

  • The --qc option has been deprecated from cellranger mkfastq.

Changes that apply to 5' Immune Profiling analysis

In Cell Ranger 6.0, the following changes apply to joint analysis of Immune Profiling, Gene Expression, and Feature Barcode data with the multi sub-command:

  • The structure of the outs/ folder has been updated, as described in running cellranger multi.

  • When running the cellranger aggr subcommand on samples that have Immune Profiling, Gene Expression, and/or Feature Barcode data analyzed with multi, the sample_outs field now contains the path to the outputs for that sample (e.g. outs/per_sample_outs/sample_x). Further details are described in running aggr.

Cell Ranger 6.0 also introduces some improvements and bug fixes related to the clonotype inference algorithms:

  • There are subtle changes to clonotyping heuristics that have little effect on overall behavior, but recover a small number of joins that were previously missed and might be critical for a particular experiment. These changes are described in terms of technical parameters to the algorithm, specifically raising the default for MAX_DIFFS from 50 to 55 and raising the default for MAX_CDR3_DIFFS from 10 to 15. There were also compensatory changes to prevent the rate of false positive joins from increasing: the default for MAX_DEGRADATION was lowered from 3 to 2, and the default for MAX_SCORE was lowered from 1,000,000 to 500,000. For more details, visit enclone help.

  • Single-chain clonotypes are now more likely to be merged with two-chain and three-chain clonotypes. This causes significantly more clonotypes to have single-chain exact subclonotypes.

  • Fixed a bug that caused failures on some very short (defective) V gene reference sequences.

  • The algorithm for deciding to use a donor reference allele now checks all donor reference alleles for all V genes having the same name as the one originally assigned to a contig. For more details, visit enclone help.

  • A doublet test has been added. This removes some exact subclonotypes that appear to represent doublets. Details are documented on the enclone pages. The typical effect is to remove some three-chain and four-chain clonotypes, with the fraction removed depending on the empirical doublet rate. In some cases, large, complex clonotypes are accurately split into multiple smaller clonotypes by this change.

  • There is no longer a restriction on the length of CDR3 sequences (previously maximum 27).

  • The Immune Profiling output file all_contig_annotations.csv contains new fields fwr1, ..., fwr4 and cdr1, cdr2, providing the amino acid sequences of framework and complementarity-determining regions (in addition to cdr3, which was already present). The definitions used to define these regions are provided in the enclone features page. The corresponding nucleotide sequences are also provided (e.g. fwr1_nt). These fields are also provided in the file consensus_annotations.csv, as are nucleotide start and end positions (e.g. fwr1_start).

  • The Immune Profiling output file all_contig_annotations.csv contains a new field, exact_subclonotype_id, providing the exact subclonotype ID to which the cell barcode was assigned. Details about exact subclonotypes can be found on the clonotype grouping page.

Bug fixes

  • Fixes an issue in aggr where files would fail to be copied on NFSv4 File Systems.

  • Fixes an issue in multi where r1-length and r2-length settings were ignored for vdj.

Changes that apply to Gene Expression and Feature Barcode analysis

  • Cell Ranger v5.0 introduces a --no-bam option that disables the generation of aligned BAMs for gene expression and feature barcode datasets. If you have no need for these files, then disabling their generation can significantly speed up the pipeline.

  • Cell Ranger v5.0 introduces an upgraded protein aggregation detection and filtering algorithm. By directly using the protein counts, more aggregate GEMs are detected and filtered out before proceeding with cell calling.

  • Cell Ranger v5.0 introduces an --include-introns option for counting intronic reads using 3’ Gene Expression and 5’ Gene Expression products. The usage of pre-mRNA references for counting intronic reads is now deprecated.

    • The --include-introns option, introduced in Cell Ranger 5.0, works by aligning reads to a normal reference transcriptome with STAR. After alignment, the reads mapping to introns are annotated and counted similarly to reads that are aligned to exons. By contrast, the pre-mRNA reference strategy used with Cell Ranger 4.0 and earlier involved alignment to a modified reference transcriptome that categorizes intronic regions as exonic. There are slight differences in read alignments produced by the STAR aligner when a pre-mRNA reference is used compared to a normal reference using --include-introns. These differences result in small overall differences in counted UMIs for intron-mode compared to pre-mRNA-reference.

  • Ported a fix from upstream IRLBA that fixes incorrect behavior in rare circumstances.

  • On some Linux distributions, NFS implementations would surface an improper error during file copy. We have implemented a workaround for our affected native code.

Changes that apply to Gene Expression, Feature Barcode, and V(D)J analysis

  • Cell Ranger 5.0 introduces the multi pipeline that can simultaneously process any combination of 5' Gene Expression, Feature Barcode (cell surface protein or antigen) and V(D)J libraries from a single GEM well. The multi pipeline uses the cell calls provided by the gene expression data to improve the cell calls inferred by the V(D)J library.

  • A new metric, “Number of Short Reads Skipped”, is added to the web summary, indicating the total number of read pairs that were ignored by the pipeline because they do not satisfy the minimum length requirements.

Changes that apply to V(D)J analysis

  • Cell Ranger v5.0 introduces a new clonotype grouping algorithm that computationally approximates groups of cells which are descendants of a single, fully rearranged common ancestor and infers the germline sequence of the V genes from each individual in the dataset. In previous versions (4.0 and earlier), the algorithm grouped cells based only on the set of productive CDR3 nucleotide sequences. As a consequence, whenever a true clonotype had a CDR3 mutation, the true exact subclonotypes were presented by the algorithm as multiple separate clonotypes. The previous approach to clonotyping in Cell Ranger 4.0 and earlier led to inaccuracies in the B cell clonotypes due to the grouping by unique CDR3 sequence. Additionally, single-chain clonotypes were reported as separate clonotypes, which could lead to both over- and under-estimation of the size of a given clonotype. The new clonotyping algorithm is improved in specificity, sensitivity, and overall accuracy because it accounts for mutations found in the V(D)J transcript and in the V(D)J junction. It also merges single chain clonotypes into the correct fully-paired clonotypes for both T cells and B cells. Additional cell filters are also imposed during clonotyping to improve data quality.

  • Changes to V(D)J outputs:

    • The following output files are removed in 5.0: consensus.fastq and consensus_annotations.json

    • The following output files are added in 5.0:

      • Contig info binary file, which would be used as an input to aggregate V(D)J samples
      • Donor reference fasta

    • Two new columns that display the iNKT/MAIT evidence are added to the clonotypes.csv file.

    • The files filtered_contig_annotations.csv, filtered_contig.fasta, filtered_contig.fastq now only contain data from the contigs in cell barcodes that are productive.

    • A number of new fields are added to consensus_annotations.csv: v_start, v_end, v_end_ref, j_start, j_start_ref, j_end, cdr3_start, cdr3_end

  • The recommended V(D)J reference packages for human and mouse have been updated from v4.0 to v5.0. The changes to the V(D)J reference sequences are listed below:

    HUMAN:

    • Replace IGKV2D-40, whose leader sequence appears to be truncated.
    • Delete IGKV2-18, which is probably a pseudogene.
    • Delete IGLV5-48, which is truncated on the right.
    • Delete TRBV21-1, which has multiple frameshifts.
    • Add IGHV4-30-4, which was missing.
    • Add IGKV1-NL1, which was missing.
    • Add IGHV4-38-2, which was missing.

    MOUSE:

    • Delete TRAV23, which is frame-shifted.
    • Delete the first base of the constant region gene IGHG2B.
    • Make a six-base insertion in IGKV12-89, based on empirical data.
    • Correct IGHV8-9, whose amino acid sequence showed the canonical C at the end of FWR3 as S. This is consistent with 10x data.
    • Add an allele of IGKV2-109, which was missing.
    • Add IGKV4-56, which was missing.
    • Add IGHV1-2, which was missing.
  • cellranger aggr now aggregates V(D)J data, allowing users to recompute V(D)J clonotype groupings across the combined data.

  • Soft deprecation of --force-cells in cellranger vdj:

    • Since Cell Ranger 3.1, due to filters in the VDJ assembler, --force-cells in VDJ pipelines did not behave as users would expect it to behave. Users can only apply --force-cells to the number of barcodes passing the combined filters in the assembler.

    • This makes it effectively impossible for users to increase the number of recovered cells. Rather, it is only possible to reduce the number of recovered cells using --force-cells in this context, unlike the behavior in the cellranger count pipeline.

    • Because this specific flag is likely to be misunderstood by users, and is also not highly requested, we are starting to deprecate it. In Cell Ranger 5.0, --force-cells will be available only as an undocumented silent option. This will also allow users who are using this routinely in their workflows to anticipate eventual deprecation.

Changes that apply to Gene Expression and Feature Barcode analysis

  • Targeted Gene Expression analysis is available in Cell Ranger 4.0 and is invoked by specifying the --target-panel option when running the cellranger count command.

  • Cell Ranger 4.0 introduces the new targeted-compare pipeline for direct comparative analysis of matched parent Whole Transcriptome Amplification (WTA) and Targeted Gene Expression datasets.

  • Cell Ranger 4.0 includes the new targeted-depth subcommand to estimate sequencing depths appropriate for Targeted Gene Expression experiments based on input WTA results and an associated target panel file.

  • Recommended reference packages for human and mouse have been updated from version 3.0.0 to 2020-A:

    • Transcriptome annotations updated from Ensembl 93 to GENCODE v32 (human) and vM23 (mouse), which are equivalent to Ensembl 98.

    • GRCh38 and mm10 sequences are not changed; chromosome names now follow the GENCODE/UCSC convention (e.g., chr1 and chrM) rather than the Ensembl convention (1 and MT).

    • Additional filtering removes genes with unreliable annotations that often overlap more legitimate genes (see build scripts for details), resulting in improved overall sensitivity. 2020-A reference packages are backwards compatible with Cell Ranger v3.1.0 and prior.
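
For downstream tools that still hold Ensembl-style chromosome names, the naming change can be sketched as a small mapping. This is an illustrative helper, not part of Cell Ranger: the function name is hypothetical, and it only covers primary chromosomes and the mitochondrial genome (scaffold and patch names are not handled).

```python
def ensembl_to_ucsc(name):
    """Map an Ensembl-style primary chromosome name to the GENCODE/UCSC style."""
    if name == "MT":
        return "chrM"
    return "chr" + name

converted = [ensembl_to_ucsc(n) for n in ["1", "X", "MT"]]
print(converted)
```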

Mapping rates and gene/UMI sensitivity are increased due to more comprehensive annotations and improved manual curation of genes.


  • When analyzing 3’ Gene Expression data, Cell Ranger 4.0 trims the template switch oligo (TSO) sequence from the 5’ end of Read-2 and the poly-A sequence from the 3’ end before aligning reads to the reference transcriptome. This behavior is different from Cell Ranger 3.1, which does not perform any trimming.

A full length cDNA molecule is normally flanked by the 30-bp TSO sequence, AAGCAGTGGTATCAACGCAGAGTACATGGG, at the 5' end and the poly-A sequence at the 3' end. Some fraction of sequencing reads are expected to contain either or both of these sequences, depending on the fragment size distribution of the library. Reads derived from short RNA molecules are more likely to contain either or both TSO and poly-A sequence than longer RNA molecules.

Trimming results in better alignment, with the fraction of reads mapped to a gene increasing by up to 1.5%, because the presence of non-template sequence in the form of either TSO or poly-A confounds read mapping. Trimming improves the sensitivity of the assay as well as the computational efficiency of the pipeline. Tags ts:i and pa:i in the output BAM files indicate the number of TSO nucleotides trimmed from the 5' end of Read-2 and the number of poly-A nucleotides trimmed from the 3' end. The trimmed bases are present in the sequence of the BAM record and are soft clipped in the CIGAR string.
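
The relationship between the trimmed bases, the ts:i and pa:i tags, and the soft-clipped CIGAR string can be sketched in Python. This is a simplified illustration under stated assumptions, not the pipeline's trimming algorithm: it only recognizes an exact, full-length TSO at the start of the read, uses a hypothetical minimum poly-A run length (`min_poly_a`), and assumes an ungapped alignment of the remaining bases.

```python
TSO = "AAGCAGTGGTATCAACGCAGAGTACATGGG"  # 30-bp TSO sequence quoted in the text

def trim_read2(seq, min_poly_a=10):
    """Return (ts, pa, aligned_part) for a Read-2 sequence.

    ts: TSO bases removed from the 5' end (exact, full-length match only).
    pa: trailing A bases removed from the 3' end, if the run is at least
        min_poly_a long (a hypothetical threshold).
    """
    ts = len(TSO) if seq.startswith(TSO) else 0
    body = seq[ts:]
    run = len(body) - len(body.rstrip("A"))
    pa = run if run >= min_poly_a else 0
    aligned = body[: len(body) - pa]
    return ts, pa, aligned

def soft_clip_cigar(ts, pa, aligned_len):
    """CIGAR for an ungapped alignment with the trimmed bases soft clipped."""
    parts = []
    if ts:
        parts.append(f"{ts}S")
    parts.append(f"{aligned_len}M")
    if pa:
        parts.append(f"{pa}S")
    return "".join(parts)

insert_seq = "ACGTTGCC" * 5            # 40 bases of hypothetical cDNA
read = TSO + insert_seq + "A" * 12     # TSO + cDNA + poly-A tail
ts, pa, aligned = trim_read2(read)
print(ts, pa, soft_clip_cigar(ts, pa, len(aligned)))
```

For this made-up read, the values correspond to ts:i:30 and pa:i:12, with the full 82-base sequence still present in the record and the trimmed ends soft clipped.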

Below, we illustrate how the fraction of reads mapped confidently to the transcriptome varies for both trimmed and untrimmed alignment as a function of read length for a variety of sample types.


  • Cell Ranger 4.0 adds support for an “un-tethered” Feature Barcode pattern, (BC) without an anchor, specified in the Feature Reference CSV. This option allows the user to specify the sequence of the Feature Barcode without specifying a particular location on the read where the sequence is expected to be found.

  • cellranger reanalyze now outputs the count matrix used in the analysis, so as to reflect any subsetting of barcodes used.

  • Bug fixes for GTF files output by mkref. These changes do not affect the pipeline results.

    • GTF attributes with duplicate keys (e.g., tag "value1"; tag "value2";) are handled correctly. Previously, only the last such attribute was kept.
    • GTF attributes with unquoted integer values (e.g., exon_number 1;) are kept. Previously, they were removed.
    • GTF lines end with semicolons.
    • Unix line endings are used rather than DOS line endings, consistent with other Cell Ranger outputs.
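
The two attribute cases above (duplicate keys and unquoted integer values) can be illustrated with a small parser. This is a sketch and not the code used by mkref; the regular expression, the function name, and the example attribute string are assumptions made for illustration.

```python
import re

# key, then either a quoted value or a bare token, then a terminating semicolon
ATTR_RE = re.compile(r'\s*(\S+)\s+("([^"]*)"|[^;\s]+)\s*;')

def parse_gtf_attributes(field):
    """Parse a GTF attribute column, keeping every value for repeated keys."""
    attrs = {}
    for match in ATTR_RE.finditer(field):
        key = match.group(1)
        # group(3) is the text inside quotes; fall back to the bare token
        value = match.group(3) if match.group(3) is not None else match.group(2)
        attrs.setdefault(key, []).append(value)
    return attrs

field = 'gene_id "GENE1"; exon_number 1; tag "value1"; tag "value2";'
attrs = parse_gtf_attributes(field)
print(attrs)
```

Storing values in a list per key is one simple way to avoid the "last value wins" behavior that a plain dictionary assignment would produce.
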
  • Bug fixes for the BAM file

    • The duplicate flag (0x400) is set correctly in the secondary alignments (flag 0x100) of PCR duplicate reads and low-support UMI reads (xf:i:2)
    • Low-support UMI reads (xf:i:2) have the corrected barcode in UB:Z. Previously, it contained the raw barcode.
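
The flag values mentioned above are bits of the SAM FLAG field, so they can be tested with a bitwise AND. The helper below is a minimal sketch with a hypothetical function name; the bit meanings (0x100 secondary alignment, 0x400 duplicate) follow the SAM specification.

```python
FLAG_SECONDARY = 0x100  # secondary (not primary) alignment
FLAG_DUPLICATE = 0x400  # PCR or optical duplicate

def describe_flag(flag):
    """Report whether the secondary and duplicate bits are set in a SAM flag."""
    return {
        "secondary": bool(flag & FLAG_SECONDARY),
        "duplicate": bool(flag & FLAG_DUPLICATE),
    }

info = describe_flag(0x100 | 0x400)  # decimal 1280
print(info)
```
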
  • BAM file changes

    • Cell Ranger v4.0 will not output the li:i tag. The RG:Z tag contains this information.
    • Cell Ranger v4.0 will not output the BC:Z and QT:Z tags.
  • Cell Ranger v4.0 now relies on Orbit to perform transcriptome alignment, which leverages a modified STAR v2.7.2a. These modifications provide compatibility with “versionGenome 20201” references, such as those generated by STAR v2.5.1b. In Cell Ranger 4.0 we still provide and use STAR v2.5.1b for other purposes such as cellranger mkref. In our testing we did not note any differences in transcriptome alignments between the STAR shipped in Cell Ranger 3.1 (STAR v2.5.1b), STAR v2.7.2a, or Orbit.

  • mkfastq now accepts file names without lane number, e.g., sample1_S1_R1_001.fastq.gz.
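
As an illustration of the two file name layouts, the following sketch parses FASTQ names with an optional lane component. The regular expression is an assumption based on the common bcl2fastq naming scheme (sample, S-number, optional L-number, read, 001) and is not taken from the Cell Ranger source.

```python
import re

FASTQ_RE = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)"
    r"(?:_L(?P<lane>\d{3}))?"          # lane component is optional
    r"_(?P<read>[RI][12])_001\.fastq\.gz$"
)

def parse_fastq_name(name):
    """Return the name components as a dict, or None if the name does not match."""
    match = FASTQ_RE.match(name)
    return match.groupdict() if match else None

without_lane = parse_fastq_name("sample1_S1_R1_001.fastq.gz")
with_lane = parse_fastq_name("sample1_S1_L001_R1_001.fastq.gz")
print(without_lane)
print(with_lane)
```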

  • Cell Ranger's aggr pipeline no longer supports the aggregation of v1 mol_info.h5 files.

Changes that apply to Gene Expression, Feature Barcode, and V(D)J analysis

  • mkfastq supports dual-indexed libraries for gene expression, both WTA and Targeted, V(D)J, and Feature Barcode datasets.

  • mkfastq supports a new sequencing configuration for NovaSeq where the I2 index may need to be reverse-complemented before demultiplexing dual-indexed libraries.
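
Reverse-complementing an index sequence is a simple string operation, sketched below. The index sequence shown is hypothetical and is not a 10x sample index; whether a given run needs this step depends on the sequencing configuration.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(index):
    """Complement each base, then reverse the sequence."""
    return index.translate(_COMPLEMENT)[::-1]

rc = reverse_complement("AACGTTCG")
print(rc)
```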

  • count and vdj run approximately two to four times faster than in Cell Ranger v3.1, depending on the sequencing data, and reduce disk I/O by half.

  • A new command-line interface with improved error-handling has been engineered into Cell Ranger v4.0.

  • The Martian pipeline framework has been upgraded to v4.0. mrp and mrjob will shut down if they detect that their log files were deleted or renamed. See the Martian release notes for more details.

  • The following features present in Cell Ranger v3.1 are no longer present in Cell Ranger 4.0:

    • mkfastq no longer supports data from the Single Cell 3′ v1 chemistry.
    • The cellranger demux subcommand has been removed.
    • The command-line interface does not accept FASTQs created by the deprecated cellranger demux pipeline. If you need to process FASTQs in this layout, contact 10x Genomics support for assistance.
    • cellranger count and cellranger vdj are no longer able to process data from multiple GEM wells through manual editing of MRO files.
    • The Single Cell 3′ v1 and Single Cell 5′-R1 assay configurations will no longer be autodetected in Cell Ranger 4.0. Users who want to analyze data from those chemistries must explicitly specify the chemistry (SC3Pv1 or SC5P-R1 respectively) using the --chemistry argument.
    • The --id argument used by the pipelines has a 64 character limit in Cell Ranger 4.0.

Changes that apply to V(D)J analysis

  • Recommended VDJ reference packages for human and mouse have been updated from version 3.1.0 to 4.0.0. The changes to the VDJ reference sequences are listed below:

    • Remove the first base of the C region in certain cases. In these cases we observe that in most transcripts, the J region and C region overlap by exactly one base.
    • Add an allele of the gene IGHJ6 to the human VDJ reference.
  • Bug fix in contig annotation: If a reference D region matches a contig perfectly, annotate the contig with that D region.

  • The command line argument --chain is added back in 4.0 for rare cases when the automatic chain detection fails.

  • A new output airr_rearrangement.tsv is added, which contains annotated contigs of VDJ rearrangements in the AIRR TSV format.

  • The VDJ reference is copied to the outputs folder starting with Cell Ranger v4.0.

  • Feature Barcoding Only Analysis - It is now possible to run cellranger count using Cell Surface Protein (antibody captured) libraries without a GEX library. The previous version of Cell Ranger required a Gene Expression library along with a library generated by Feature Barcoding technology. However, the new version of Cell Ranger provides customers with flexibility to sequence either one of the libraries, or both. In particular, cell calling now works with antibody counts only, and all secondary analyses (PCA, t-SNE, UMAP, clusterings) work with antibody-only count matrix as well. More details are available on the Feature Barcoding Only Analysis page.

  • UMAP based lower dimensionality projections of datasets analyzed by cellranger count are now produced in addition to the previously produced t-SNE projections. The projections are made available both as CSV files and as data that can be directly viewed in Loupe Browser. The parameters for the projection can also be modified and experimented with using cellranger reanalyze. This alternate visualization method has become increasingly popular for visualizing single cell data since the earliest report that used it. For more details, see the description in the algorithms overview section.

  • New Web Summary Look - The Cell Ranger web_summary.html file has been updated to match the styles and formats of other 10x products. Compared to the old version users will notice new fonts and some aesthetic changes in the new version.

  • Bug Fix: If equal numbers of reads with a given Barcode / UMI combination map to two genes, the assignment of the Barcode / UMI is now considered ambiguous and is not reported in molecule_info.h5 or the count matrix. Previously, such molecules were reported twice, once for each gene.
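
The tie rule can be sketched as follows. This is an illustration rather than the pipeline's code: the function name and gene labels are hypothetical, and assigning a non-tied Barcode / UMI to the gene with the most reads is an assumption that the text above only implies.

```python
from collections import Counter

def assign_umi_to_gene(read_genes):
    """Return the gene supported by the most reads, or None if the top two genes tie."""
    counts = Counter(read_genes).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # ambiguous: equal read support, so the molecule is not reported
    return counts[0][0]

clear = assign_umi_to_gene(["GENE_A", "GENE_A", "GENE_B"])
tied = assign_umi_to_gene(["GENE_A", "GENE_B"])
print(clear, tied)
```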

  • Other minor bug fixes

Release Notes for Martian 3.2.3: Job Scheduling

  • Fix a crash in cases where the mrp binary becomes unavailable on disk during a pipestance run.

  • In addition to logging the type of filesystem for the pipestance directory, mrp will also log the type of filesystem for the martian bin directory (which is often different from the pipestance directory), and also the mount options for both directories.

  • Regardless of --jobinterval setting, mrp will now never attempt to submit more than one job at a time to the queue in cluster mode.

  • mrp will now shut down if the pipestance log file has been deleted, even if a new one has been created in its place. This prevents problems in the case where the pipestance directory (including the log and lock files) has been deleted.

  • Memory cgroups limits are now detected, reported, and used as default limits where applicable. This should be especially helpful for users submitting mrp to a cluster such as SLURM which uses memory cgroups to prevent jobs from using too much memory, by preventing mrp from trying to use more than the job's allowance.

  • Other small bug fixes and performance improvements.

V(D)J Release Notes

Major algorithm changes and effects on performance

  • The assembly, annotation and cell calling algorithms have all been replaced, as have the reference sequences. However, apart from the exceptions noted under Interface Changes, the interface is unchanged.

  • Many changes were made to the assembly algorithm that allow it to achieve the same sensitivity using less data. After these changes, the recommended sequencing configuration was changed to 26 x 91 (from 150 x 150), while leaving the number of read pairs per cell fixed at 5000. This enables V(D)J, Gene Expression and Feature Barcoding libraries to be sequenced in a single run, thereby simplifying the workflow.

  • The effect of the new changes varies considerably from sample to sample and we have added a discussion on Experimental Design that explains some of this. In some instances the number of productive pairs increases markedly if the same dataset is rerun with the new code.

  • The old read configuration 150 x 150 is still supported and may be preferable for some users, because of pricing or availability, particularly for users who are running only V(D)J data. For 150 x 150, the recommended depth is proportionally lower, 2000 read pairs per cell.
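
One way to read "proportionally lower" is in terms of total sequenced bases per cell, which come out roughly equal for the two configurations. The short calculation below makes this explicit; treating the recommendation as a bases-per-cell budget is an interpretation, not a statement from the release notes.

```python
def bases_per_cell(read1_len, read2_len, read_pairs_per_cell):
    """Total sequenced bases per cell for a paired-end configuration."""
    return (read1_len + read2_len) * read_pairs_per_cell

short_config = bases_per_cell(26, 91, 5000)    # 26 x 91 at 5000 read pairs per cell
long_config = bases_per_cell(150, 150, 2000)   # 150 x 150 at 2000 read pairs per cell
print(short_config, long_config)
```

The two configurations yield 585,000 and 600,000 bases per cell, respectively.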

  • Many corrections were made to the Prebuilt reference sequences.

  • Contig annotation has been improved in several ways. This includes more accurate detection of CDR3 regions, a more stringent full-length requirement, and a requirement that V segments begin with a start codon (coupled to reference sequence corrections). This could affect annotation for species other than human or mouse that have incomplete reference sequences.

  • Some large clones that were previously missed are now reported. They were missed for a variety of reasons, including failure to align J segments having high somatic hypermutation.

  • A productive pair is no longer declared in cases where three or more contigs share the same chain type (e.g. TRB, TRB, TRB). In such cases the GEM may contain two or more cells. In addition, certain clonal expansions of plasma cells are now contracted because the expansion represents mRNA leakage during processing, rather than a true biological expansion. Finally, requirements for small clones sharing a chain with a large clone have been tightened to reduce the likelihood of false clones arising from ambient mRNA or doublets. All of these changes correctly reduce the number of reported productive pairs (usually by a small fraction).
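
The chain-count rule described above can be expressed as a simple check on the chain types of the contigs in one barcode. The function name is hypothetical, and the sketch covers only this one rule, not the other filters mentioned.

```python
from collections import Counter

def passes_chain_multiplicity_check(chains, max_per_chain=2):
    """False when any chain type occurs three or more times among a barcode's contigs."""
    return all(count <= max_per_chain for count in Counter(chains).values())

ok = passes_chain_multiplicity_check(["TRA", "TRB"])
rejected = passes_chain_multiplicity_check(["TRB", "TRB", "TRB"])
print(ok, rejected)
```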

  • Because of these changes, we recommend that customers rerun existing datasets using Cell Ranger 3.1 if possible.

  • Because cell calling is changed, the denominator used for computing the Cells With Productive V-J Spanning Pair metric may have changed. For this reason, differences in performance between Cell Ranger 3.0 and 3.1 are better assessed using the Number of Cells With Productive V-J Spanning Pair metric.

  • Cell Ranger 3.1 is significantly faster. There are five fewer stages in the pipeline.

Interface Changes:

  • Cell Count Confidence is no longer reported because we found that in some cases incorrect counts were reported with high confidence. Cell counting from V(D)J data alone is limited in accuracy because targeted cells having sufficiently low expression cannot be detected.

  • Contigs Unannotated is no longer reported because all contigs are now annotated. The justification for this is that since enrichment uses primers binding to constant regions, bona fide contigs would be expected to have at least a C annotation.

  • For species other than human or mouse, for which custom primers are needed, the sequences of the inner enrichment primers must now be supplied as a command-line argument.

Job Scheduling Changes

  • Add support for SGE and LSF clusters that track virtual memory use.

Enable Analysis of CITE-seq Experiments

  • Cell Ranger can now process data from experiments where the antibodies were conjugated to oligonucleotides that were captured by oligo-dT primers. Previously, only experiments which used the Chromium Single Cell 3' Feature Barcode Library Kit, which utilizes a different capture sequence for Gene Expression and Feature Barcoding data, could be analyzed.

  • Please note that while Cell Ranger is now compatible with CITE-seq data, CITE-seq is not a supported application. To ensure full support for your 10x data analysis please visit the Feature Barcode Analysis page to see the supported Feature Barcoding technology.

Bug fixes

  • Fix an issue where STAR would crash on CPUs without AVX support.

  • Fix a determinism issue when aggregating 3' v2 and v3 data.

  • Increase the memory reservation for the SORT_BY_POS stage.

General

  • Cell Ranger has been overhauled to support user-defined Feature Barcoding reagents, and to quantify these features alongside standard gene-expression reads. See Feature Barcoding for details. Users who have already run their data through earlier versions do not need to rerun it with this new version.

Cell Calling Changes

  • Cell Ranger 3.0 implements a version of the EmptyDrops cell calling algorithm that will call more low RNA content cells, especially when they are mixed with a population of high RNA content cells. See Cell Calling Algorithms for details.

  • The cell calling 'knee-plot' in the web summary now indicates what fraction of barcodes in each segment of the curve were called as cells, since the new cell calling algorithm no longer makes a hard threshold on UMI counts.

Output File Format Changes

  • The file formats of the gene-barcode matrix (now called the feature-barcode matrix) have changed to accommodate Feature Barcoding results.

  • The mtx and barcodes.tsv files are now gzipped to save disk space. The genes.tsv file has been renamed features.tsv.gz, and contains extra columns indicating the feature_type of each gene / feature.

  • See Feature-Barcode Matrices for details.
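
Because the files are now gzipped, scripts that previously opened genes.tsv directly need to decompress on read. The sketch below reads a features.tsv.gz file with Python's standard library. It writes a tiny stand-in file first so that it runs without pipeline output; the rows, and the assumption that feature_type is the third column, are illustrative.

```python
import csv
import gzip
import os
import tempfile

def read_features(path):
    """Read a gzipped, tab-separated features file into a list of rows."""
    with gzip.open(path, "rt", newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")]

# Write a small stand-in file (hypothetical IDs and names).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "features.tsv.gz")
with gzip.open(path, "wt", newline="") as handle:
    handle.write("GENE1\tGeneOne\tGene Expression\n")
    handle.write("AB1\tAntibodyOne\tAntibody Capture\n")

rows = read_features(path)
feature_types = sorted({row[2] for row in rows})
print(feature_types)
```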

  • As part of this change, cellranger-rkit is deprecated. We recommend Seurat for analysis in R.

  • The Molecule info file format has been substantially changed to enable output from the new Feature Barcoding technology and remove rarely used mapping metrics.

Cell Ranger 2.2.0 will require CentOS/RedHat 6 or Ubuntu 12 or later. See the 10x OS Support page for further information.

  • Fix Martian UI display in Firefox

  • Fix non-integral resource requests (memory/threads)

  • Fix SUBSAMPLE_READS producing wrong metric names. Newer versions of Martian no longer cast zero-fractional floats to ints, which this code was relying on to produce metric names with integral subsampling rates in them.

  • Fix failure to detect whitelist with demux when a single Sample Index is bad

  • Fix always-on multi-chromosome transcript warning in mkref

  • Fix stall in ALIGN_READS on filesystems that don't support named pipes

  • Fix python error when autodetect of chemistry fails with multiple FASTQ paths

  • Fix handling of sample names with multiple underscores in mkfastq pipeline

  • Fix suppression of process limit errors in the mkfastq QC stage

Changes to mkfastq

  • Barcode-aware QC stage is now opt-in via the --qc flag.

  • Limit total CPU usage across stages to 12 cores unless --localcores is specified. This should improve reliability on machines with high numbers of cores.

Cell Ranger 2.1.1 Gene Expression

Note: This is expected to be the last version of Cell Ranger to support CentOS/RedHat 5 and Ubuntu 10. If you are using one of those operating systems, Cell Ranger will now warn you. Future versions of Cell Ranger will require CentOS/RedHat 6 or Ubuntu 12 or later. See the 10x OS Support page for further information.

Bug Fixes

  • Fix library ID labels being out of order in the matrix HDF5 file produced by cellranger aggr when 10 or more libraries are aggregated. This manifests as Loupe Cell Browser showing the library ID labels out of order after running cellranger reanalyze.

  • Fix an out-of-memory error occurring when generating the kmer index on a reference with very long transcripts, e.g. on a pre-mRNA reference used when analyzing nuclei samples.

  • Fix crash when analyzing FASTQs produced by SRA's fastq-dump.

  • Fix the Differential Expression table in the web summary disappearing when gene IDs are equal to gene names in the reference GTF.

  • Fix a few web summary metrics becoming negative when more than 2.1 billion reads are analyzed at once.

  • Fix incorrect parsing of the --localcores argument, causing --localmem to be ignored when specified immediately after --localcores.

  • Fix crash in mkfastq on NovaSeq when RunParameters.xml is named runParameters.xml.

  • Fix hang when running sitecheck on some systems.

  • Fix error reporting in python stage code imports.

  • Fix estimation of stage virtual memory usage.

Improvements

  • Truncate large metadata files when generating a tarball for upload to 10x, rather than omitting them.

  • Remove the requirement that the reference FASTA file modification time precede the STAR index file modification times.

  • The default --localmem in cluster mode will no longer ever be more than the free memory available when cellranger starts.

New Features

  • Add support for and autodetection of Single Cell 5' gene expression libraries, with support for both paired-end alignment (150x150) and R2-only alignment (26x98).

  • Add --r1-length and --r2-length options to cellranger count which enable hard trimming of input FASTQs.

  • Add --exclude-genes option to cellranger reanalyze which, analogously to --genes, allows for the exclusion of some genes from the secondary analysis (PCA, clustering, etc.).

  • Add --chemistry to cellranger count to override the automatic chemistry detection.

Performance Improvements

  • Reduce the run time by 30%.

  • Reduce the disk storage high-water-mark by 60%.

Algorithm Improvements

  • Change the Antisense Reads Metric to only count a read as antisense if it has no sense alignments, effectively prioritizing sense alignments over antisense for this computation.
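
The revised rule can be written as a small predicate over the strands of a read's alignments. The labels "sense" and "antisense" and the function name are hypothetical; the sketch only captures the prioritization described above.

```python
def is_antisense_read(alignment_strands):
    """True only if the read has antisense alignments and no sense alignments."""
    strands = set(alignment_strands)
    return "antisense" in strands and "sense" not in strands

only_antisense = is_antisense_read(["antisense", "antisense"])
mixed = is_antisense_read(["sense", "antisense"])
print(only_antisense, mixed)
```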

Output File Changes

  • Stop generating the TR and TQ BAM tags because these tags were retaining trimmed sequences that Cell Ranger would ignore anyway after converting the BAM back to FASTQ.

  • Add more mapping metrics (Reads Mapped to Genome, Reads Mapped Confidently to Genome), and reorder the mapping metrics to be consistent with their order of computation.

Bug Fixes

  • Fix mkfastq allowing max bcl2fastq threads to exceed --localcores.

  • Fix mkfastq crashing when reading NovaSeq quality data from RTA 3.3 and later.

  • Fix excessive memory requests in SC_RNA_ANALYZER.

  • Fix nondetection of louvain binary failure in RUN_GRAPH_CLUSTERING.

  • Fix crash in RUN_GRAPH_CLUSTERING when /dev/stdin doesn't exist.

  • Fix the barcode rank plot concatenating instead of unioning barcodes in multi-genome datasets.

System Requirements Changes

  • Cell Ranger no longer supports Ubuntu 8 or CentOS 5.2 Linux distributions. Ubuntu 10.04 LTS or CentOS 5.5 or greater are now required.

Job Scheduling

  • The pipeline management system, mrp, is now open source on GitHub.

  • The monitoring port for the user interface is now always on by default, with an OS-selected port if none is specified.

    • This behavior can be disabled with --disable-ui.
    • Access to the user interface port, if no port was specified explicitly, now requires a randomly-generated authentication token. This token is visible in the pipeline standard output and in the _uiport file.
  • A new tool, mrstat, is now available.

    • Given the path to the directory with a running pipeline, mrstat will return basic information about the progress of the pipeline.
    • With the --stop flag, it will cause the pipeline to fail and exit.
  • Two new variables are available for use in cluster-mode templates:

    • __MRO_JOB_WORKDIR__ can be used to specify the absolute path to the directory where the job should execute. This should alleviate issues on clusters such as PBS which sometimes do not set the working directory correctly.
    • __MRO_ACCOUNT__ passes the MRO_ACCOUNT environment variable from mrp's environment. This is intended for cluster managers which support charging resources to specific accounts.
  • The pipeline standard output and log will now periodically provide progress updates for in-progress stages.

  • mrp will now provide more clear and useful error reporting when the pipeline directory runs out of disk space.

  • Several enhancements to the reliability of pipeline restart.

  • Fixes for several cases where a pipeline could "hang" indefinitely without making further progress.

  • Pipelines should now do a better job of staying within their CPU usage allocation.

Bug fixes

  • Properly ignore SIGHUP when a pipeline is run using nohup.

Pipeline Argument Changes

  • Add --override option to all pipelines, allowing for stage-level overrides for cores and memory.

  • Reanalyze no longer requires --agg to persist library ID; it is only required for persisting user-defined fields.

Bug fixes

  • Fix CHUNK_READS using more cores and using them less efficiently than intended.

  • Fix aggr using incorrect downsampling rates when more than 10 libraries are aggregated.

  • Fix mkfastq proceeding even after bcl2fastq is killed.

  • Fix lack of robustness to rare events where NFS latency induces double file deletion or double directory creation events.

  • Fix ALIGN_READS proceeding after the STAR subprocess fails, causing crashes in ATTACH_BCS_AND_UMIS.

  • Improve error messages when STAR or samtools fail in ALIGN_READS.

  • Fix spaces in transcript IDs causing ATTACH_BCS_AND_UMIS to crash. mkref no longer allows spaces in transcript IDs.

  • Fix crash when reads are adapter-trimmed by bcl2fastq and some reads end up empty.

  • Fix out-of-memory condition in ATTACH_BCS_AND_UMIS for some libraries with >800M reads.

  • Fix question marks replacing axis titles of barcode rank plot in web summary.

  • Fix excessive memory consumption and runtime of mkfastq on large sample sheets.

Job Scheduling

  • Fix several cases where, after mrp (which is invoked by cellranger) gets killed, it was not able to restart correctly.

  • On SGE clusters, cellranger/mrp now periodically runs qstat to verify that the jobs it queued have not been killed or canceled.

  • If the run fails, instead of just displaying a message pointing the user to the relevant _errors file, the contents of that file are printed.

  • On automatic retry of failed stages, the reason for the original failure is logged.

  • mrp is now more resilient against certain kinds of filesystem errors.

  • In the event of certain types of filesystem problems (such as permissions errors or disk quota), mrp/cellranger should now sometimes be able to provide more useful and immediate error messages.

  • Additional information about the environment cellranger runs in is now logged and included in mri.tgz.

  • mrp now correctly handles the signals sent by SGE and LSF when a soft time limit is reached (e.g. for SGE, -l s_rt 23:00:00).

  • mrp now supports an --overrides method for dynamically changing the CPU and memory allotted to each stage.

Pipeline argument changes

  • Add --barcodes and --genes options to reanalyze, which allow selection of a specific subset of barcodes and/or genes to use in the secondary analysis.

  • Add --force-cells option to count and reanalyze to explicitly set the cell count. If specified, Cell Ranger will take the top N barcodes (by UMI count) as cells instead of doing dynamic cell count estimation.
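
Selecting the top N barcodes by UMI count can be sketched in a few lines. The barcodes and counts are made up, and how ties at the cutoff are handled is not specified above; this sketch keeps whichever tied barcode appears first in the input.

```python
def force_cells(umi_counts, n):
    """Return the n barcodes with the highest UMI counts."""
    ranked = sorted(umi_counts.items(), key=lambda item: item[1], reverse=True)
    return [barcode for barcode, _ in ranked[:n]]

umi_counts = {"AAAC": 5200, "AAAG": 12, "AACT": 3100, "AAGG": 870}
cells = force_cells(umi_counts, 2)
print(cells)
```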

  • Rename the estimated cells option from --cells to --expect-cells for clarity.

  • Add --nosecondary flag to count, which skips the secondary analysis.

  • Disallow slashes in the --genome argument in mkref.

  • Add --id option to mkfastq which allows you to name the output directory.

New subcommands

  • Add cellranger mat2csv command, which converts a Cell Ranger sparse gene-barcode matrix to a dense CSV format. Note that the resulting file will be very large, even for a few hundred cells.

Web summary changes

  • Add "Reads Mapped Antisense to Gene" metric, which quantifies reads that are mapped to the non-coding strand of a gene. High values can indicate the use of an unsupported chemistry type, e.g. passing a Single Cell V(D)J library to cellranger count.

  • Add "Fraction GEMs with >1 Cell (Lower / Upper Bound)" metrics, which define a confidence interval for the multiplet rate estimate in multi-genome samples.

  • Add more details to various metric descriptions.

Algorithm improvements

  • Add the requirement that reads overlap annotated exons by at least 50% in order to be considered exonic. As a result, "Reads Mapped Confidently to Exonic Regions" may differ slightly from previous versions.
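
A minimal sketch of a 50% overlap test follows. It assumes half-open coordinates, non-overlapping exon intervals, and that the fraction is measured over the aligned read span; none of these details are stated above, so treat them as assumptions.

```python
def overlap_fraction(read_start, read_end, exons):
    """Fraction of the read span [read_start, read_end) covered by exon intervals."""
    covered = sum(
        max(0, min(read_end, exon_end) - max(read_start, exon_start))
        for exon_start, exon_end in exons
    )
    return covered / (read_end - read_start)

def is_exonic(read_start, read_end, exons, min_fraction=0.5):
    """True if at least min_fraction of the read span overlaps annotated exons."""
    return overlap_fraction(read_start, read_end, exons) >= min_fraction

half = is_exonic(100, 200, [(150, 300)])     # 50 of 100 bases overlap
too_low = is_exonic(100, 200, [(160, 300)])  # 40 of 100 bases overlap
print(half, too_low)
```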

  • Reduce EXTRACT_READS per-read runtime by 50% by avoiding OrderedDict and caching metric calculations.

  • Reduce SUBSAMPLE_READS runtime by reducing the number of fixed target values for subsampling (to just 25k and 50k reads per cell).

File format improvements

  • Due to a format change (removal of the IntervalTree object), references produced with cellranger mkref using Cell Ranger v2.0 are not compatible with pipelines from Cell Ranger v1.x.

  • Modify the TX, GX, and GN tags to have more granular transcript/gene annotations. Each BAM record is only annotated with transcripts/genes specific to that alignment, instead of combining annotations from all alignments of the corresponding read.

  • Add RE tag, which indicates whether an alignment is exonic, intronic or intergenic.

Bug fixes

  • Fix rare bug in interval arithmetic, leading to exonic reads being falsely annotated as intronic or intergenic. As a result of this bugfix, "Reads Mapped Confidently to Exonic Regions" may differ slightly from previous versions.

  • Fix excessive EXTRACT_READS runtime (10+ hours) on very large FASTQs such as those produced by mkfastq.

  • Fix a crash in RUN_GRAPH_CLUSTERING on filesystems that do not support named pipes.

  • Fix SUBSAMPLE_READS using more VMEM than expected, causing it to be killed by SGE when exceeding the h_vmem limit on certain clusters.

  • Fix mkfastq not merging output files properly due to sample numbering issues.

  • Fix mkfastq crash due to the -d (--demultiplexing-threads) argument being deprecated in bcl2fastq 2.19.

  • Fix the components.csv file produced by PCA, which did not contain the correct matrix.

  • Fix a crash in RUN_PCA when the number of nonzero genes is smaller than the number of principal components.

  • Fix a crash in mkref with very large genomes; use the limitGenomeGenerateRAM option in STAR to overcome its default reference size limit.

  • Fix certain special characters (like dashes) in reference names breaking the subsampled genes detected plot.

  • Fix mkloupe displaying an unhelpful error message when run on mixed-species runs and those from Cell Ranger v1.1 or earlier.

  • Fix the open-file-handle-limit check using the submit host rather than the execution machine.

  • Fix cellranger aggr allowing duplicate library_ids.

  • Fix CLOUPE_PREPROCESS taking the full matrix even after reanalyze subselects barcodes.

  • Fix a crash in mkfastq on RunInfo.xml files produced by the NovaSeq.

  • Fix a crash in mkfastq when bcl2fastq 2.19 is used in cluster mode or with the --demultiplexing-threads argument.

  • Fix mkfastq sometimes not properly merging samples in bcl2fastq 2.18 and 2.19 due to a change in the order in which lanes are processed by bcl2fastq.
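
The duplicate library_ids fix for cellranger aggr amounts to an input validation step. A minimal sketch of such a check is shown below; it is not the pipeline's code. The column names (library_id, molecule_h5), the file paths, and the function name are assumptions for illustration.

```python
import csv
import io
from collections import Counter


def find_duplicate_library_ids(csv_text, column="library_id"):
    """Return the sorted library IDs that appear more than once."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[column] for row in reader)
    return sorted(lib_id for lib_id, n in counts.items() if n > 1)


# Hypothetical aggregation CSV in which LibA is listed twice.
aggr_csv = (
    "library_id,molecule_h5\n"
    "LibA,/path/to/a/molecule_info.h5\n"
    "LibB,/path/to/b/molecule_info.h5\n"
    "LibA,/path/to/c/molecule_info.h5\n"
)
duplicates = find_duplicate_library_ids(aggr_csv)
```

A caller would typically stop with an error message naming the offending IDs when the returned list is non-empty.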

Martian Runtime Changes

  • Add caching for deserialized JSON metadata. This improves performance for stages with many chunks.

Miscellaneous

  • Update samtools from 0.1.19 to 1.4.

  • Rename RUN_PREPROCESS to PREPROCESS_MATRIX in the SC_RNA_ANALYZER pipeline.

  • Add alerts.json as an output of the SUMMARIZE_REPORTS stage. This file is a machine-readable list of any abnormal metric values that raised alarms in the web summary.

  • For multi-genome samples, display the full reference name rather than a comma-delimited list of genomes in the web summary ("hg19, mm10" becomes "hg19_and_mm10").

  • Fix an issue preventing mkfastq from demultiplexing data from recent sequencer software versions.

Analysis Improvements

  • Confidently align more reads to the transcriptome, greatly improving alignment rates with shorter reads. For Human PBMCs aligned to GRCh38, Reads Mapped Confidently to Transcriptome increases from 55% to 62% with 98bp reads and from 34% to 54% with 32bp reads.

  • Add a graph-based clustering algorithm: Louvain Modularity Optimization, which, unlike K-Means, does not require pre-specifying K.

Visualization

  • Automatically produce Loupe Cell Browser (.cloupe) files in the count, aggregate, and reanalyze pipelines.

  • Output a web summary HTML file in the reanalyze pipeline.

  • Be explicit about pre- and post-depth-normalization metric values in the aggr web summary.

  • When the web summary subselects 10e3 cells for display, show the original cluster sizes and not the subselected sizes.

  • Make the web summary HTML slightly smaller by rounding t-SNE coordinates.

  • Update plotly to enable scrollable legends.

File format improvements

  • Add Read Group (RG) headers and tags to the output BAM file for better data provenance.

Bug fixes

  • Preserve trimmed bases via the TR/TQ BAM tags for much longer read lengths without crashing.

  • Fix crash when copying files on certain types of network shares that do not support file permissions.

  • Omit no-call bases from Q30 metrics to be consistent with Illumina's Q30 calculation.

  • Allow generation of 3-d (alongside 2-d) t-SNE projections without crashing.

  • Do a better job of hiding dynamic elements while the web summary HTML is loading.
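
The Q30 change above (omitting no-call bases) can be illustrated with a short sketch. It assumes Phred+33 quality encoding and that no-calls appear as the base N; the function name is hypothetical and this is not the pipeline's implementation.

```python
def q30_fraction(seq, qual, phred_offset=33):
    """Fraction of called bases with quality >= 30, ignoring no-call (N) bases."""
    called_quals = [
        ord(q) - phred_offset
        for base, q in zip(seq, qual)
        if base != "N"  # no-calls are left out of numerator and denominator
    ]
    if not called_quals:
        return 0.0
    return sum(1 for q in called_quals if q >= 30) / len(called_quals)


# 'I' = Q40, '?' = Q30, '#' = Q2, '5' = Q20 under Phred+33.
# The N base (Q2) is excluded, so 2 of the 3 called bases reach Q30.
fraction = q30_fraction("ACNT", "I?#5")
```

Counting the N base as well would give 2 of 4 instead, which is the kind of discrepancy the change removes.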

General

  • Make the --params argument to reanalyze optional to enable re-runs with the default parameters.

  • Check for mismatches between the library IDs given in the aggr CSV and those in the matrix file.

  • Limit max_clusters for K-Means to 50 to ensure sane memory consumption.

  • Fix incorrect results being produced when aggr processes a count output that contains multiple libraries (gem groups).

  • Exclude untested genes from p-value adjustment.

  • Don't crash when extra commas are present in an IEM samplesheet for mkfastq.

  • Don't crash when no project folders are present for mkfastq.

  • Correctly handle the second index when mkfastq receives a dual-indexed IEM samplesheet.

  • Allow matrices to have more than 2^31-1 nonzero entries in the matrix HDF5 format.

  • Don't display alerts until the web summary page fully loads.
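
Excluding untested genes from p-value adjustment matters because the number of tests enters the correction. The sketch below uses the Benjamini-Hochberg procedure as an assumed adjustment method (these notes do not name the method used); untested genes are represented by None, and the function name is hypothetical.

```python
def adjust_p_values(p_values):
    """Benjamini-Hochberg adjustment over tested genes only.

    Entries that are None mark untested genes: they do not count toward the
    number of tests and stay None in the result.
    """
    tested = sorted((p, i) for i, p in enumerate(p_values) if p is not None)
    n_tests = len(tested)
    adjusted = [None] * len(p_values)

    # Walk from the largest p-value down, enforcing monotonicity.
    running_min = 1.0
    for rank in range(n_tests, 0, -1):
        p, original_index = tested[rank - 1]
        running_min = min(running_min, p * n_tests / rank)
        adjusted[original_index] = running_min
    return adjusted


# Five genes, two of them untested: the correction uses 3 tests, not 5.
raw = [0.01, None, 0.04, 0.03, None]
adjusted = adjust_p_values(raw)
```

Here the smallest p-value is scaled by 3 rather than 5, so untested genes no longer inflate the adjusted values of the tested ones.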

General

  • Rename main pipeline to cellranger count, which produces a gene-barcode matrix for one library sequenced one or more times.

  • Add support for and autodetection of Chromium Single Cell 3' v2 chemistry; still compatible with v1 chemistry.

  • Fix incorrect default cell count being used when "expected recovered cells" not specified.

New aggr aggregation pipeline

  • New pipeline cellranger aggr, which aggregates data from multiple libraries into one dataset.

  • Supports combining libraries totalling up to 1,000,000 cells and secondary analysis of the combined data.

  • Automatically performs sequencing depth-normalization for all combined libraries.

New reanalyze custom reanalysis pipeline

  • Reruns secondary analysis (dimensionality reduction, clustering, and differential expression) with fully customizable parameters.

New mkfastq demultiplexing pipeline

  • Easier to integrate with existing bcl2fastq-based workflows.

  • Now the preferred demultiplexing method; demux still available but deprecated.

  • mkfastq is a thin wrapper around bcl2fastq with the same basic interface.

  • Accepts Illumina Experiment Manager-compatible sample sheets with support for 10x sample index sets.

  • Produces FASTQ files and folders in the same structure as bcl2fastq.

  • Generates InterOp output for SAV.

  • Also generates 10x-specific run QC metrics in JSON format.

Scalability enhancements

  • Support combined secondary analysis (dimensionality reduction, clustering, differential expression, and visualization) of up to 1,000,000 cells in under 12 hours with 64 GB of RAM.

  • Change PCA implementation to the Netflix-scale memory-efficient method IRLBA.

  • Decrease runtime of t-SNE implementation.

Analysis Improvements

  • Change differential expression algorithm to the negative-binomial based method sSeq.

  • Report log2 fold-change and p-value for all genes in all clusters.

Sample and genome support

  • Add pre-built GRCh38 reference package.

Web summary enhancements

  • Add plots that show Sequencing Saturation and Median Genes Detected as a function of downsampled reads per cell.

  • Add Total Genes Detected.

  • Rename "cDNA PCR Duplication" to "Sequencing Saturation."

  • Add chemistry field.

  • Order clusters by size.

  • Add help bubbles to charts.
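
For readers unfamiliar with the Sequencing Saturation metric and its downsampling plot, here is a rough sketch. It assumes saturation is defined as one minus the ratio of unique (barcode, UMI, gene) combinations to total reads, which is an assumption rather than something stated in these notes; the function names and toy reads are hypothetical.

```python
import random


def sequencing_saturation(reads):
    """1 - unique (barcode, UMI, gene) combinations / total reads."""
    reads = list(reads)
    if not reads:
        return 0.0
    return 1.0 - len(set(reads)) / len(reads)


def saturation_curve(reads, fractions, seed=0):
    """Saturation at several downsampling fractions (random subsamples)."""
    rng = random.Random(seed)
    reads = list(reads)
    curve = []
    for fraction in fractions:
        n_sampled = round(len(reads) * fraction)
        subsample = rng.sample(reads, n_sampled)
        curve.append((fraction, sequencing_saturation(subsample)))
    return curve


# Four reads, one of which duplicates an already observed molecule.
reads = [
    ("AAAC", "UMI1", "GeneA"),
    ("AAAC", "UMI1", "GeneA"),
    ("AAAC", "UMI2", "GeneA"),
    ("AAAG", "UMI1", "GeneB"),
]
full_saturation = sequencing_saturation(reads)
curve = saturation_curve(reads, [0.5, 1.0])
```

Under this definition, a value near zero means most reads still reveal new molecules, while a value near one means additional sequencing mostly re-observes known ones.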

File format improvements

  • Generate BAM index files with the same basename as the main file.

  • Change cell-barcode and UMI quality tags to CY and UY for better compatibility with the SAM specification.

  • Add TR, TQ tags to BAM to enable lossless BAM to FASTQ conversion.

  • Output HDF5-based sparse matrices in addition to the Matrix Exchange format files for better scalability to high cell counts.

  • Report proportion of variance explained for each principal component.

Martian runtime

  • Pipestance output files (outs) are no longer symlinks.

  • Partial stage restart.

  • Add output filename override, which supports two output files having the same basename.

  • Add --onfinish handler support.

  • Add support for units of KB and B for memory reservation in cluster job templates.

  • Pipestances now generate a UUID in _uuid.

  • Add auto-retry mechanism when pipeline stages fail due to causes that appear to be transient.

  • --maxjobs now defaults to 64 in local jobmode.

  • --jobinterval now defaults to 100ms in local jobmode.

  • Fix for rare race condition in some Python components

  • Enabled STAR multithreading

  • Added more detailed reference metadata

  • Fixed chromosome name mismatches in 10x reference data

  • Fixed t-SNE algorithm not converging for samples with high cell counts

  • Fixed cell-barcode correction not correcting as many sequences as it should

  • Fixed out-of-memory crash in COUNT_GENES for high-depth samples

  • Fixed occasional loss of the last few reads per chunk in ATTACH_BCS_AND_UMI

  • Added "Reads Mapped Confidently to Exonic Regions" metric to the summary.

  • Changed alert for "Reads Mapped Confidently to Transcriptome" to reflect shorter read lengths and non-human references.

  • Fixed problem where differential expression table sorts incorrectly on click.

  • Fixed problem where very high depth samples would cause an out-of-memory error.

  • Fixed problem where mkgtf would produce incorrectly formatted GTF files.

  • Fixed problem where debug tar.gz file would be very large if the pipestance halted mid-stage.

  • Fixed problem with copying files on certain CIFS volumes.

  • Initial release.