Cell Ranger ATAC Release Notes

  • The cellranger-atac mkfastq pipeline is deprecated and will be removed in the next Cell Ranger ATAC release.
  • The 20,000-cell limit has been lifted. Previously, every cellranger-atac analysis had a hard-coded limit of 20,000 cells: even if your data contained more cells, the pipeline capped its output at this number. Removing this restriction allows more accurate analysis of larger datasets.
  • With the removal of the overall cell limit, the --force-cells parameter in the cellranger-atac count and cellranger-atac reanalyze pipelines is no longer restricted to 20,000. You can now specify any positive integer value for this parameter to force the calling of a specific number of cells.
  • The --nosecondary parameter, which skips secondary analysis, is now available for cellranger-atac count and cellranger-atac aggr.
  • Strand information is now included in the fragments.tsv file, determined from R1/R2 adapter sequences. This enhancement improves the performance of downstream filters, specifically doublet and inclusion list contamination detection.
  • The secondary analysis output folders written by the cellranger-atac count and aggr pipelines now include a "peaks" prefix in the folder name. For example, /outs/analysis/lsa/1_components is now /outs/analysis/lsa/peaks_1_components.
  • Cell Ranger ATAC can now process FASTQs with quality scores across the full supported range (up to 93, previously 41).
  • Minor optimizations have been implemented in the cellranger-atac count and aggr pipelines, resulting in small increases to processing speed.
  • The 2024-A references for human (GRCh38) and mouse (GRCm39) are now available for download. The reference release notes and build steps are on the Cell Ranger ARC page. The 2020-A references remain available for download.
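Phred+33 encoding stores each quality score Q as the ASCII character with code Q + 33, so the printable range '!' to '~' covers Q0 to Q93, and 'J' (Q41) was the previous ceiling. The following Python sketch, assuming standard Phred+33 FASTQ encoding, decodes a quality string and checks it against a configurable maximum. It is illustrative only and is not Cell Ranger ATAC code.

```python
def decode_phred33(quality: str, max_q: int = 93) -> list[int]:
    """Convert a Phred+33 quality string to integer scores, checking the range."""
    scores = []
    for ch in quality:
        q = ord(ch) - 33  # Phred+33: ASCII '!' (code 33) encodes Q0
        if not 0 <= q <= max_q:
            raise ValueError(f"quality {q} for {ch!r} is outside 0..{max_q}")
        scores.append(q)
    return scores


# 'J' (Q41) was the old ceiling; '~' (Q93) is the top of the Phred+33 range.
print(decode_phred33("!5J~"))  # [0, 20, 41, 93]
```

Calling decode_phred33("~", max_q=41) raises ValueError, which mirrors how a Q41 ceiling would reject the higher scores that are now accepted.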
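Forcing a cell count amounts to ranking barcodes and keeping the top N. The sketch below assumes barcodes are ranked by the number of fragments overlapping peaks; the function name, the dict input, and breaking ties by barcode name are hypothetical choices for illustration, not the pipeline's implementation. With the old cap removed, N may exceed 20,000.

```python
def force_cells(peak_fragment_counts: dict[str, int], n: int) -> list[str]:
    """Hypothetical sketch: call the n barcodes with the most fragments in peaks."""
    if n <= 0:
        raise ValueError("the forced cell count must be a positive integer")
    # Rank by descending fragment count; break ties by barcode name for determinism.
    ranked = sorted(peak_fragment_counts, key=lambda bc: (-peak_fragment_counts[bc], bc))
    # No 20,000 ceiling is applied: any n up to the number of barcodes is honored.
    return ranked[:n]


counts = {"AAACGAAA-1": 500, "AAACGAAC-1": 1200, "AAACGAAG-1": 50, "AAACGAAT-1": 800}
print(force_cells(counts, 2))  # ['AAACGAAC-1', 'AAACGAAT-1']
```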
  • New features:

    • Added a batch correction algorithm to cellranger-atac aggr, intended for use when aggregating datasets that use different chemistries. See the aggr and algorithms pages for more details on how and when to use batch effect correction.

    • Added UMAP as an alternate projection in Loupe and Cell Ranger ATAC output files.

    • Added support for custom peak sets that include peaks on non-primary contigs.

    • Minor cell calling improvements:

      • Added less stringent filtering of low-targeting barcodes.
      • Improved calling on multi-species samples.
  • Reference changes:

    • Added support for overlapping gene annotations in cellranger-atac mkref.
    • Cell Ranger ATAC 2.1 can only be run on references generated by cellranger-atac mkref version 2.1 or 2.0. Cell Ranger ATAC 2.0 can only be run on references that are constructed using cellranger-atac mkref version 2.0.
  • Web summary improvements:

    • Improved identification of the non-nucleosomal fragment fraction. The "fragments in nucleosome free regions" metric reported in web_summary.html is now defined as the fraction of high-quality fragments shorter than 124 bp (previously 147 bp).
    • Raised the warning threshold for the number of called cells from 10,000 to 15,000.
    • Added a web summary warning when cell calls are forced to unsupported levels.
    • Added a notification when custom peaks are used or cell calls are forced.
    • Added links to support pages.
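The redefined nucleosome-free metric is a threshold on fragment length. A minimal sketch, assuming records in the fragments.tsv layout (chrom, start, end, barcode, count) with end minus start as the fragment length, that the input has already been restricted to high-quality fragments, and that each fragment counts once regardless of its read count:

```python
def nfr_fraction(fragments, threshold=124):
    """Fraction of fragments shorter than `threshold` bp (0.0 if there are none)."""
    total = 0
    short = 0
    for _chrom, start, end, _barcode, _count in fragments:
        total += 1
        if end - start < threshold:  # half-open coordinates: length = end - start
            short += 1
    return short / total if total else 0.0


frags = [
    ("chr1", 1000, 1100, "AAACGAAA-1", 1),  # 100 bp
    ("chr1", 2000, 2130, "AAACGAAA-1", 2),  # 130 bp
    ("chr1", 3000, 3200, "AAACGAAC-1", 1),  # 200 bp
    ("chr2", 500, 550, "AAACGAAC-1", 1),    # 50 bp
]
print(nfr_fraction(frags))                 # 0.5  (new 124 bp threshold)
print(nfr_fraction(frags, threshold=147))  # 0.75 (previous 147 bp threshold)
```

The 130 bp fragment is why the two thresholds disagree: it counted as nucleosome-free under the old definition but not the new one.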
  • Bug fixes:

    • Added the missing format version (VN:) to the BAM file header.
    • Removed dependence on external Unix commands in BAM processing.
    • The metric "Fraction of fragments overlapping any targeted region" was dropped from web_summary.html in v2.0 because some custom references lack DNase hypersensitive sites or promoter/enhancer data, but it was still present in the aggr web_summary.html. This empty metric has been removed in v2.1.
    • In v2.0, the on_target_fragments and blacklist_region_fragments columns in the singlecell.csv file from the aggr pipeline were swapped. This was fixed in v2.1.
  • New wavelet-based peak caller:

    • Eliminates very large peaks (much larger than 5 kb); peak sizes are now tightly distributed around ~1 kb.
    • Improved detection of cluster-specific peaks.
    • Improved reproducibility between technical replicates.
    • Consistent performance across a range of cell loads.
    • Fixes crashes in the signal-background fitting procedure.
  • Change in duplicate marking algorithm:

    • Two read pairs are duplicates if they share the same start, end and cell barcode. Previously, only the start and end were used.
    • Boosts median fragments per cell by as much as 25% at high cell loads.
    • We no longer distinguish between PCR and sequencer duplicates.
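The change amounts to adding the cell barcode to the key that identifies duplicates. The sketch below counts unique fragments under both rules. It assumes read pairs are already reduced to (chrom, start, end, barcode) tuples, with the contig included in the key because start and end are positions on a specific contig. It illustrates why fragments per cell increase: a fragment from a second barcode at the same coordinates is no longer discarded.

```python
def count_unique_fragments(read_pairs, use_barcode=True):
    """Count unique fragments; pairs with identical keys are duplicates."""
    keys = set()
    for chrom, start, end, barcode in read_pairs:
        # New rule keys on position plus barcode; old rule used position only.
        key = (chrom, start, end, barcode) if use_barcode else (chrom, start, end)
        keys.add(key)
    return len(keys)


pairs = [
    ("chr1", 100, 300, "AAACGAAA-1"),
    ("chr1", 100, 300, "AAACGAAA-1"),  # duplicate under both rules
    ("chr1", 100, 300, "AAACGAAC-1"),  # same position, different cell
    ("chr1", 500, 700, "AAACGAAA-1"),
]
print(count_unique_fragments(pairs))                     # 3
print(count_unique_fragments(pairs, use_barcode=False))  # 2
```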
  • Improved computational performance:

    • Up to 4x faster, with half the disk requirements.
    • Complete rewrite of read processing and differential accessibility analysis in Rust.
    • Minimized disk I/O.
  • Change to pre-built references:

    • The following additional annotation tracks have been removed:

      • blacklist.bed
      • dnase.bed
      • ctcf.bed
      • enhancer.bed
      • promoter.bed
    • As a consequence, the singlecell.csv columns for these tracks are all zero. Because the on_target_fragments column is the sum of the TSS, DNase, enhancer, and promoter columns, it now reflects only TSS fragments and will be lower than expected.
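The arithmetic behind the lower on_target_fragments value is shown below. The column names follow those used in earlier singlecell.csv files but should be treated as assumptions here, and the row values are made up for illustration.

```python
def on_target_fragments(row: dict[str, int]) -> int:
    """Sum the per-track fragment columns that make up on_target_fragments."""
    # Column names are assumed; only the four tracks come from the release notes.
    tracks = (
        "TSS_fragments",
        "DNase_sensitive_region_fragments",
        "enhancer_region_fragments",
        "promoter_region_fragments",
    )
    return sum(row[t] for t in tracks)


# With the DNase, enhancer and promoter tracks removed, those columns are zero,
# so the total collapses to the TSS count alone.
row = {
    "TSS_fragments": 1500,
    "DNase_sensitive_region_fragments": 0,
    "enhancer_region_fragments": 0,
    "promoter_region_fragments": 0,
}
print(on_target_fragments(row))  # 1500
```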

  • Breaking changes to reference package structure:

    • Cell Ranger ATAC 2.0 pipelines cannot be run with a version 1.2 reference. Cell Ranger ARC 1.0+ or Cell Ranger ATAC 2.0 references must be used.
    • Restrictions on the number of contigs or primary contigs in the reference have been eliminated.
    • Changed the format of the config file used to construct a reference with mkref.
    • Primary contigs are now defined to be the set of gene-containing contigs and cannot be specified by the user.
    • Disabled support for URLs in the config file.
    • Disabled support for GFF annotations; annotations must be in GTF format.
    • Eliminated discrepancies between reference checks in mkref and preflight checks in count. Previously, it was possible to pass checks in mkref and fail checks in count.
  • Added header lines beginning with # to the fragments.tsv.gz and peaks.bed files. These lines contain version, reference, and sample information.
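Downstream parsers need to separate these header lines from the records. A minimal Python sketch, assuming tab-separated records whose first five columns are chrom, start, end, barcode, and count; the function name is hypothetical, and any additional columns are ignored.

```python
import gzip


def read_fragments(path):
    """Return (header_lines, records) from a gzipped fragments file."""
    headers, records = [], []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                headers.append(line.rstrip("\n"))  # version/reference/sample metadata
                continue
            if not line.strip():
                continue  # tolerate blank lines
            fields = line.rstrip("\n").split("\t")
            chrom, start, end, barcode, count = fields[:5]
            records.append((chrom, int(start), int(end), barcode, int(count)))
    return headers, records
```

For large files, turning the loop into a generator would avoid holding every record in memory; a list keeps the sketch short.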

  • Replaced the --downsample option with --subsample-rate.

  • Change to ATAC peak annotation:

    • Annotate peaks using all genes provided in the reference GTF. Previously only certain gene types were used for annotation.
    • The peak column is now split into three columns: chrom, start, and end.
    • When a peak has multiple gene annotations, the peak appears in multiple rows, one per annotation. Previously, each row represented one peak, and multiple annotations were separated by semicolons (;) within the same row.
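Converting an older annotation file to the new layout means splitting the peak identifier into coordinates and giving each annotation its own row. The sketch below assumes, hypothetically, that the old peak column was formatted as chrom_start_end and that the gene, distance, and peak_type columns held parallel ;-separated lists. Splitting from the right keeps contig names that contain underscores intact.

```python
def explode_annotation(row: dict[str, str]) -> list[dict]:
    """Turn one old-style annotation row into one new-style row per annotation."""
    # rsplit from the right so contigs such as chrUn_KI270302v1 stay whole.
    chrom, start, end = row["peak"].rsplit("_", 2)
    genes = row["gene"].split(";")
    distances = row["distance"].split(";")
    peak_types = row["peak_type"].split(";")
    return [
        {
            "chrom": chrom,
            "start": int(start),
            "end": int(end),
            "gene": gene,
            "distance": int(dist) if dist else None,
            "peak_type": ptype,
        }
        for gene, dist, ptype in zip(genes, distances, peak_types)
    ]


old = {"peak": "chr1_1000_1500", "gene": "GeneA;GeneB",
       "distance": "0;-2300", "peak_type": "promoter;distal"}
for new_row in explode_annotation(old):
    print(new_row)
```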
  • Changed the CSV definition file format for cellranger-atac aggr: the peaks column has been eliminated. Custom peaks can be specified with the --peaks argument on the command line.

  • Eliminated the signal normalization mode from cellranger-atac aggr.

  • Loupe browser files now contain pre-computed K-means clustering for K=2-5 (previously K=2-10).

  • Loupe browser files generated by the pipeline can only be opened by Loupe browser version 5.0 or later.

  • The web_summary.html file output from cellranger-atac count has been updated to be functionally consistent with that from cellranger-arc count.

  • The PLSA algorithm is restricted to a single thread, which is likely to affect computational performance.

  • Changed metric names in the summary.csv file generated by cellranger-atac count.

  • Eliminated secondary alignments from the position-sorted BAM.

Cell Ranger ATAC v1.2 now filters gel bead multiplets and barcode multiplets, leading to more accurate cell calling. For customers concerned about either of these issues, we recommend running Cell Ranger ATAC v1.2.
  • Allow robust handling of GFF3 input in mkref.
  • Fix a bug in the pre-generated human references where the human pseudoautosomal region (PAR) genes were filtered out.
  • Fix a bug where the wrong line number was reported for malformed BED files containing comment header lines.
  • Fix a bug where the adjusted fragment bounds could exceed the length of the contig to which the fragment is mapped.
  • Fix a bug in peak calling where the initialization of the mixture model fitting involved integer division instead of floating point division.
  • Fix an issue in the peak calling algorithm where the mixture components were not always ordered consistently with respect to each other, occasionally leading to overly stringent peak calls.
  • Fix a bug in the interpolation formula used to evaluate the sensitivity of the assay at various downsampled depths.
  • Allow better handling of whitespace in file paths in mkref.
  • Add new metrics to metrics.json: median unique fragments per cell overlapping peaks, percentage of genome in peaks, barcode and gel bead multiplet rate.
  • Fix a bug in the web summary file where the alert color for cell calls was not consistent with the thresholds.
  • Add a guidance message at the top of the web summary HTML file.
  • Mask out barcodes associated with gel bead doublets from the set of barcodes on which cell calling is performed.
  • Mask out barcodes associated with barcode multiplets from the set of barcodes on which cell calling is performed.
  • Include new mkref tool to allow building of single-species custom references from FASTA and gene annotations.
  • Remove metrics related to targeting based on custom files that may not be available for custom genomes.
  • Include new reanalyze pipeline to allow rerunning data from a finished pipeline but with custom selection of peaks and barcodes, along with tweaking analysis parameters.
  • Include new aggr pipeline to allow aggregating data from multiple pipelines and analyzing it as one dataset.
  • Include GC and depth normalized differential enrichment analysis for accessibility of transcription factor binding motifs.
  • Include depth normalized differential enrichment analysis for accessibility in peaks.
  • Improve peak annotation to include associations of genes with distal peaks.
  • Improve performance of motif scanning by setting moderate background nucleotide frequencies for peaks in extreme GC bins.
  • Analysis output directory now has enrichment in place of diff_tf.
  • Fix an issue in peak calling where the signal and noise components of the mixture model could be swapped, producing a nonsensical threshold. The fix changes the mixture model components.
  • Fix an issue in peak calling where the odds ratio could determine the wrong threshold, leading to the entire genome being called as peaks.
  • Change the default clustering for LSA- and PLSA-based dimensionality reductions from k-medoids to spherical k-means.
  • Add a step that filters out low-targeting barcodes before cell calling, to improve cell calling.
  • Update references to include transcripts.bed file derived from gene annotations, used in annotating peaks.
  • Fix an issue in which some peak annotations were reversed.
  • Update references to fix an off-by-one issue in some tss.bed entries.
  • Fix an issue where PLSA would crash on CPUs without AVX support.
  • Fix a division-by-zero issue in calculating the background nucleotide percentage during motif scanning.
  • Initial release.
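Spherical k-means, the clustering named above for LSA and PLSA reductions, differs from ordinary k-means in that vectors are scaled to unit length and assigned to the centroid with the highest cosine similarity, and each centroid is re-normalized after averaging. The pure-Python sketch below is a textbook version for illustration only, not Cell Ranger ATAC's implementation; the random initialization, the seed, and the stop-when-assignments-repeat rule are assumptions.

```python
import math
import random


def _normalize(v):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(points, k, max_iter=100, seed=0):
    """Cluster vectors by cosine similarity; returns (labels, unit centroids)."""
    data = [_normalize(p) for p in points]
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(data, k)]  # k distinct starting points
    labels = None
    for _ in range(max_iter):
        # Assign each point to the centroid with the largest cosine similarity.
        new_labels = [max(range(k), key=lambda j: _dot(x, centroids[j])) for x in data]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = _normalize(mean)
    return labels, centroids


points = [[1.0, 0.0], [10.0, 0.5], [0.0, 1.0], [0.2, 5.0]]
labels, _ = spherical_kmeans(points, k=2)
print(labels)
```

The first two points lie along the x-axis and the last two along the y-axis, so they form two clusters even though their magnitudes differ widely; ignoring magnitude is the reason to cluster on cosine similarity.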
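The "percentage of genome in peaks" metric added to metrics.json can be read as covered bases divided by genome size. A minimal sketch, assuming BED half-open peak coordinates, merging overlapping peaks so no base is counted twice, and using the sum of all contig lengths as the denominator; the pipeline's exact denominator is not stated in these notes, and the function name is hypothetical.

```python
def percent_genome_in_peaks(peaks, contig_lengths):
    """Percentage of genome bases covered by at least one peak.

    peaks: iterable of (chrom, start, end) in BED half-open coordinates.
    contig_lengths: dict of contig name -> length, used as the denominator.
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    covered = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlapping or touching: extend the merged run
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return 100.0 * covered / sum(contig_lengths.values())


peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 1000, 1100), ("chr2", 0, 50)]
lengths = {"chr1": 10_000, "chr2": 5_000}
print(percent_genome_in_peaks(peaks, lengths))  # 350 covered bases out of 15,000
```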