- The cellranger-atac mkfastq pipeline is deprecated and will be removed in the next Cell Ranger ATAC release.
- The 20,000 total cell limit has been lifted. Previously, any cellranger-atac analysis was hard-capped at 20,000 cells: even if your data contained more cells, the pipeline limited the output to this number. This restriction has now been removed from the algorithm, which allows for a more accurate analysis of larger datasets.
- With the removal of the overall cell limit, the --force-cells parameter in the cellranger-atac count and cellranger-atac reanalyze pipelines is no longer restricted to 20,000. You can now specify any positive integer value for this parameter to force the calling of a specific number of cells.
- The parameter --nosecondary has been enabled for cellranger-atac count and cellranger-atac aggr. Learn more about the parameter.
- Strand information is now included in the fragments.tsv file, determined from R1/R2 adapter sequences. This enhancement improves the performance of downstream filters, specifically doublet and inclusion list contamination detection.
- The secondary analysis output folder produced by the cellranger-atac count and aggr pipelines now includes the "peaks" prefix in the folder name. For example, /outs/analysis/lsa/1_components becomes /outs/analysis/lsa/peaks_1_components.
- Cell Ranger ATAC can now process FASTQs with quality scores across the full supported range (up to 93 instead of 41).
- Minor optimizations have been implemented in the cellranger-atac count and aggr pipelines, resulting in small increases to processing speed.
- The 2024-A reference transcriptomes for human (GRCh38) and mouse (GRCm39) are now available for download. For more information, please visit the Cell Ranger ARC page to access the reference release notes and build steps. Additionally, the 2020-A references remain available for download.
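
To make the wider quality-score range noted above concrete, here is a minimal sketch that decodes a FASTQ quality string. It assumes the standard Phred+33 (Sanger) ASCII encoding, under which a score of 41 is the character `J` and a score of 93 is `~`. The function name `decode_phred33` is illustrative and is not part of Cell Ranger ATAC.

```python
from __future__ import annotations


def decode_phred33(quality: str) -> list[int]:
    """Convert a FASTQ quality string to integer Phred scores (Phred+33 assumed)."""
    scores = [ord(ch) - 33 for ch in quality]
    # Printable ASCII runs from '!' (33) to '~' (126), so valid scores are 0-93.
    if any(s < 0 or s > 93 for s in scores):
        raise ValueError("quality character outside the Phred+33 range 0-93")
    return scores


print(decode_phred33("IJ~"))  # [40, 41, 93]
```
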
- New features:
  - Added a batch correction algorithm to cellranger-atac aggr, intended for use when aggregating datasets that use different chemistries. See the aggr and algorithms pages for more details on how and when to use batch effect correction.
  - Added UMAP as an alternate projection in Loupe and Cell Ranger ATAC output files.
  - Added support for custom peaks with peaks on non-primary contigs.
  - Minor cell calling improvements:
    - Added less stringent filtering of low-targeting barcodes.
    - Improved calling on multi-species samples.
- Reference changes:
  - Added support for overlapping gene annotations in cellranger-atac mkref.
  - Cell Ranger ATAC 2.1 can only be run on references generated by cellranger-atac mkref version 2.1 or 2.0. Cell Ranger ATAC 2.0 can only be run on references constructed using cellranger-atac mkref version 2.0.
- Web summary improvements:
  - Enabled improved identification of the non-nucleosomal fragment fraction. The metric "fragments in nucleosome free regions" reported in web_summary.html has been redefined as the fraction of high-quality fragments smaller than 124 base pairs (previously, the threshold was 147 base pairs).
  - Increased the warning threshold for the number of called cells from 10k to 15k.
  - Added a warning to the web summary when forcing cell calls to unsupported levels.
  - Added a notification when the user is using custom peaks or forcing cell calls.
  - Added links to support pages.
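
The redefined nucleosome-free metric described above amounts to a length cutoff on fragments. The sketch below computes the fraction of fragments shorter than a cutoff from (start, end) pairs. It assumes fragment length is end minus start and does not reproduce the pipeline's "high quality" filtering, whose exact criteria are not stated here. The function name is illustrative.

```python
def nucleosome_free_fraction(fragments, max_length=124):
    """Fraction of fragments whose length (end - start) is below max_length."""
    lengths = [end - start for start, end in fragments]
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n < max_length) / len(lengths)


# Fragment lengths here are 80, 130, 123 and 146 bp.
fragments = [(100, 180), (500, 630), (1000, 1123), (2000, 2146)]
print(nucleosome_free_fraction(fragments))       # 0.5 with the 124 bp cutoff
print(nucleosome_free_fraction(fragments, 147))  # 1.0 with the former 147 bp cutoff
```
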
- Bug fixes:
  - Added the missing format version (VN:) to the header of the BAM file.
  - Removed dependence on external Unix commands in BAM processing.
  - The metric "Fraction of fragments overlapping any targeted region" reported in web_summary.html was dropped in version 2.0 because some custom references lack DNase HS sites or promoter/enhancer data, but the metric was still present in the aggr web_summary.html. This empty metric has been removed in v2.1.
  - In v2.0, the on_target_fragments metric (from the aggr pipeline) was switched with blacklist_region_fragments in the singlecell.csv file. This column switch was fixed in v2.1.
- New wavelet-based peak caller:
  - Eliminates peaks much larger than 5 kb. Peaks have a tighter size distribution around ~1 kb.
  - Improved detection of cluster-specific peaks.
  - Improved reproducibility between technical replicates.
  - Consistent performance across a range of cell loads.
  - Fixes crashes in the signal-background fitting procedure.
- Change in duplicate marking algorithm:
  - Two read pairs are duplicates if they share the same start, end, and cell barcode. Previously, only the start and end were used.
  - Boosts median fragments per cell by as much as 25% at high cell loads.
  - We no longer distinguish between PCR and sequencer duplicates.
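
The change in the duplicate key described above can be illustrated with a small sketch. It assumes each read pair is summarized as a (chrom, start, end, barcode) tuple, which is a simplification and not the pipeline's internal representation. `count_unique_fragments` is a hypothetical helper.

```python
def count_unique_fragments(read_pairs, use_barcode=True):
    """Count read pairs left after collapsing duplicates.

    With use_barcode=True the key is (chrom, start, end, barcode);
    with use_barcode=False it is (chrom, start, end) only.
    """
    keys = set()
    for chrom, start, end, barcode in read_pairs:
        key = (chrom, start, end, barcode) if use_barcode else (chrom, start, end)
        keys.add(key)
    return len(keys)


read_pairs = [
    ("chr1", 100, 250, "AAAC"),
    ("chr1", 100, 250, "AAAC"),  # same position, same cell: duplicate under both rules
    ("chr1", 100, 250, "TTTG"),  # same position, different cell
    ("chr2", 500, 620, "AAAC"),
]
print(count_unique_fragments(read_pairs, use_barcode=False))  # 2
print(count_unique_fragments(read_pairs, use_barcode=True))   # 3
```

Fragments from different cells that happen to share coordinates are no longer collapsed, which is consistent with the reported gain in fragments per cell at high cell loads.
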
- Improved computational performance:
  - Up to 4x faster, with 0.5x the disk requirements.
  - Complete rewrite of read processing and differential accessibility analysis in Rust.
  - Minimized disk I/O.
- Change to pre-built references:
  - The following additional annotation tracks have been removed: blacklist.bed, dnase.bed, ctcf.bed, enhancer.bed, promoter.bed.
  - As a consequence, the columns in the singlecell.csv output for the corresponding tracks are all zero. Additionally, the on_target_fragments column, which is the sum of the TSS, DNase, enhancer, and promoter columns, now represents only the TSS fragments, so its value is lower than might otherwise be expected.
- Breaking changes to reference package structure:
  - Cell Ranger ATAC 2.0 pipelines cannot be run with a version 1.2 reference. Cell Ranger ARC 1.0+ or Cell Ranger ATAC 2.0 references must be used.
  - Restrictions on the number of contigs or primary contigs in the reference have been eliminated.
  - Changed the config file format used to construct a reference with mkref.
  - Primary contigs are now defined to be the set of gene-containing contigs and cannot be specified by the user.
  - Disabled support for URLs in the config file.
  - Disabled support for GFF annotations; annotations must be in GTF format.
  - Eliminated discrepancies between reference checks in mkref and preflight checks in count. Previously, it was possible to pass checks in mkref and fail checks in count.
- Added header lines beginning with # to the fragments.tsv.gz and peaks.bed files that contain version, reference, and sample information.
- Eliminated the --downsample option and replaced it with --subsample-rate.
- Change to ATAC peak annotation:
  - Annotate peaks using all genes provided in the reference GTF. Previously only certain gene types were used for annotation.
  - The peak column is now split into three columns: chrom, start, end.
  - When a peak has multiple gene annotations, the same peak appears in multiple rows, one per annotation. Previously, each row represented one peak, and multiple annotations were expressed using ; separators in the same row.
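
The new row layout for peak annotations can be sketched as a conversion from the older one-row-per-peak form. The sketch assumes the old peak identifier looked like chrom_start_end and that multiple gene annotations were joined by ; in a single field. Both are assumptions made for illustration, so check them against your own files. `expand_peak_row` is a hypothetical helper.

```python
def expand_peak_row(peak, genes):
    """Split an old-style row into one (chrom, start, end, gene) row per annotation."""
    # rsplit from the right so contig names containing underscores stay intact.
    chrom, start, end = peak.rsplit("_", 2)
    return [(chrom, int(start), int(end), gene) for gene in genes.split(";")]


for row in expand_peak_row("chr1_629770_630550", "GENE_A;GENE_B"):
    print(row)
```
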
- Change to CSV definition file format for cellranger-atac aggr: eliminated the peaks column. Custom peaks may be specified using a --peaks argument at the command line.
- Eliminated normalization mode signal from cellranger-atac aggr.
- Loupe Browser files now contain pre-computed K-means clustering for K=2-5 (previously K=2-10).
- Loupe Browser files generated by the pipeline can only be opened by Loupe Browser version 5.0 or later.
- The web_summary.html file output from cellranger-atac count has been updated to be functionally consistent with that from cellranger-arc count.
- The PLSA algorithm is restricted to use one thread, and computational performance is likely to be affected.
- Changed metric names in the summary.csv generated by cellranger-atac count.
- Eliminated secondary alignments from the position-sorted BAM.
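
Because fragments.tsv.gz and peaks.bed now begin with header lines starting with # (as noted above), downstream parsers written for older outputs may need to skip them. Here is a minimal sketch using only the Python standard library. The function name `read_records` is a placeholder, and the sketch assumes tab-separated records.

```python
import gzip


def read_records(path):
    """Yield tab-split records from a plain or gzipped text file, skipping # header lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            # Header lines start with '#'; blank lines are ignored as well.
            if line.startswith("#") or not line.strip():
                continue
            yield line.rstrip("\n").split("\t")
```

A caller would iterate over `read_records("fragments.tsv.gz")` and unpack the leading chrom, start, and end fields from each record.
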
- Allow robust handling of GFF3 input in mkref.
- Fix a bug in the pre-generated human references where the Human Pseudoautosomal Region (PAR) genes are filtered out.
- Fix a bug where an erroneous line number was reported for a malformed BED file containing comment header lines.
- Fix a bug where the adjusted fragment bounds exceed the size of contig to which the fragment is mapped.
- Fix a bug in peak calling where the initialization of the mixture model fitting involved integer division instead of floating point division.
- Fix an issue in the peak calling algorithm where the mixture components were not always ordered with respect to each other in the same consistent way, leading to occasional stringent peak calls.
- Fix a bug in the interpolation formula used to evaluate the sensitivity of the assay at various downsampled depths.
- Allow better handling of whitespace in file paths in mkref.
- Add new metrics to metrics.json: median unique fragments per cell overlapping peaks, percentage of genome in peaks, barcode and gel bead multiplet rate.
- Fix a bug in the web summary file where the alert color for cell calls is not consistent with the thresholds.
- Add a feature to the web summary where a guidance message is printed at the top of the HTML file.
- Mask out barcodes associated with gel bead doublets from the set of barcodes on which cell calling is performed.
- Mask out barcodes associated with barcode multiplets from the set of barcodes on which cell calling is performed.
- Include new mkref tool to allow building of single-species custom references from FASTA and gene annotations.
- Remove metrics related to targeting based on custom files that may not be available for custom genomes.
- Include new reanalyze pipeline to allow rerunning data from a finished pipeline but with custom selection of peaks and barcodes, along with tweaking analysis parameters.
- Include new aggr pipeline to allow aggregating data from multiple pipelines and analyzing it as one dataset.
- Include GC and depth normalized differential enrichment analysis for accessibility of transcription factor binding motifs.
- Include depth normalized differential enrichment analysis for accessibility in peaks.
- Improve peak annotation to include associations of genes with distal peaks.
- Improve performance of motif scanning by setting moderate background nucleotide frequencies for peaks in extreme GC bins.
- Analysis output directory now has enrichment in place of diff_tf.
- Fix an issue in peak calling where the signal and noise components of a mixture model get swapped and produce a nonsensical threshold. Fixed by changing the mixture model components.
- Fix an issue in peak calling where the odds ratio determines the wrong threshold, which would lead to calling the entire genome in peaks.
- Replace the default clustering for LSA- and PLSA-based dimensionality reductions with spherical k-means in place of k-medoids.
- Add an additional step to filter out low-targeting barcodes prior to cell calling, for better cell calling.
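
For readers unfamiliar with the spherical k-means clustering mentioned above, the sketch below shows the general technique. Vectors are scaled to unit length, each is assigned to the centroid with the highest cosine similarity, and each centroid is the re-normalized mean of its members. This is a generic pure-Python illustration with caller-supplied starting centroids, not Cell Ranger ATAC's implementation.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def spherical_kmeans(points, centroids, iterations=10):
    """Cluster points by cosine similarity; returns (labels, centroids)."""
    points = [normalize(p) for p in points]
    centroids = [normalize(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: on unit vectors the dot product equals cosine similarity.
        for i, p in enumerate(points):
            sims = [sum(a * b for a, b in zip(p, c)) for c in centroids]
            labels[i] = sims.index(max(sims))
        # Update step: mean of the members, projected back onto the unit sphere.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[k] = normalize(mean)
    return labels, centroids


points = [[1.0, 0.1], [10.0, 0.5], [0.1, 1.0], [0.3, 6.0]]
labels, _ = spherical_kmeans(points, centroids=[[1.0, 0.0], [0.0, 1.0]])
print(labels)  # [0, 0, 1, 1]
```

Only direction matters here: the second point is ten times longer than the first but lands in the same cluster.
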
- Update references to include transcripts.bed file derived from gene annotations, used in annotating peaks.
- Fix an issue in which some peak annotations were reversed.
- Update references to fix an off-by-one issue in some tss.bed entries.
- Fix an issue where PLSA would crash on CPUs without AVX support.
- Fix a division-by-zero issue in calculating the background nucleotide % during motif scanning.
- Initial release.