Cell Ranger ATAC's pipelines analyze sequencing data produced from Chromium Epi ATAC libraries. This involves the following steps:
- Run cellranger-atac mkfastq on the Illumina® BCL output folder to generate FASTQ files.
- Run cellranger-atac count on each library that was demultiplexed by cellranger-atac mkfastq.
For the following example, assume that the Illumina® BCL output is in a folder named /sequencing/140101_D00123_0111_AHAWT7ADXX.
First, follow the instructions on running cellranger-atac mkfastq to generate FASTQ files. For example, if the flow cell serial number was HAWT7ADXX, then cellranger-atac mkfastq will output FASTQ files in HAWT7ADXX/outs/fastq_path.
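The path layout described above can be sketched as a small shell helper. The function name fastq_path_for and its base-directory argument are hypothetical and only illustrate how the location is composed; they are not part of Cell Ranger ATAC.

```shell
# Hypothetical helper: print where mkfastq output is expected for a flow cell,
# following the <serial>/outs/fastq_path layout described above.
fastq_path_for() {
  base_dir=$1   # directory in which cellranger-atac mkfastq was run
  serial=$2     # flow cell serial number, e.g. HAWT7ADXX
  printf '%s/%s/outs/fastq_path\n' "$base_dir" "$serial"
}

fastq_path_for /home/jdoe/runs HAWT7ADXX
```

The printed path is the value later passed to --fastqs.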
To generate single cell accessibility counts for a single library, run cellranger-atac count with the following arguments. For a complete list of command-line arguments, run cellranger-atac count --help.
These are the required command line arguments (also available through cellranger-atac count --help):
Argument | Description |
---|---|
--id=ID | Required. A unique run ID string (e.g., sample345 ). The name is arbitrary and will be used to name the directory containing all pipeline-generated files and outputs. Only letters, numbers, underscores, and hyphens are allowed (maximum of 64 characters). |
--reference=PATH | Path to folder containing a Cell Ranger ATAC or Cell Ranger ARC reference. |
--fastqs | Either: Path of the fastq_path folder generated by cellranger-atac mkfastq e.g. /home/jdoe/runs/HAWT7ADXX/outs/fastq_path . This contains a directory hierarchy that cellranger-atac count will automatically traverse. - OR - Any folder containing FASTQ files, for example if the FASTQ files were generated by a service provider and delivered outside the context of the mkfastq output directory structure. Can take multiple comma-separated paths, which is helpful if the same library was sequenced on multiple flow cells. Doing this will treat all reads from the library, across flow cells, as one sample. We do not support combining analyses from multiple libraries for this version. |
See list of optional parameters on the command line arguments page.
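Two of the rules in the table lend themselves to a quick check before launching a run: the character and length limits on --id, and the comma-separated form of --fastqs. The following is a minimal sketch under those stated rules only; the function names valid_run_id and join_fastq_dirs are hypothetical, and the pipeline's own preflight checks remain authoritative.

```shell
# Hypothetical helper: succeed only if the run ID uses letters, numbers,
# underscores, and hyphens, and is at most 64 characters long.
valid_run_id() {
  id=$1
  [ -n "$id" ] || return 1
  [ "${#id}" -le 64 ] || return 1
  case $id in
    *[!A-Za-z0-9_-]*) return 1 ;;   # contains a disallowed character
  esac
  return 0
}

# Hypothetical helper: join several FASTQ folders into one comma-separated
# value suitable for --fastqs (same library sequenced on multiple flow cells).
join_fastq_dirs() {
  joined=""
  for d in "$@"; do
    if [ -z "$joined" ]; then
      joined=$d
    else
      joined="$joined,$d"
    fi
  done
  printf '%s\n' "$joined"
}

if valid_run_id "sample345"; then
  echo "run ID looks valid"
fi
```

The second flow cell path in any call to join_fastq_dirs would be supplied by you; this page only shows the HAWT7ADXX example.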
After determining these input arguments, run cellranger-atac count:
$ cd /home/jdoe/runs
$ cellranger-atac count --id=sample345 \
--reference=/opt/refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
--fastqs=/home/jdoe/runs/HAWT7ADXX/outs/fastq_path \
--sample=mysample \
--localcores=8 \
--localmem=64
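When the same command is launched for several libraries, it can help to assemble it from variables and inspect it first. The sketch below only prints the command and does not execute it; the function name print_count_cmd is hypothetical, and the flags are the ones used in the example above.

```shell
# Hypothetical helper: print (do not run) a cellranger-atac count command
# built from the six values used in the example above.
print_count_cmd() {
  id=$1
  reference=$2
  fastqs=$3
  sample=$4
  cores=$5
  mem_gb=$6
  printf 'cellranger-atac count --id=%s --reference=%s --fastqs=%s --sample=%s --localcores=%s --localmem=%s\n' \
    "$id" "$reference" "$fastqs" "$sample" "$cores" "$mem_gb"
}

print_count_cmd sample345 \
  /opt/refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
  /home/jdoe/runs/HAWT7ADXX/outs/fastq_path \
  mysample 8 64
```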
Following a set of preflight checks to validate input arguments, cellranger-atac count pipeline stages will begin to run:
Martian Runtime - 4.0.7
Running preflight checks (please wait)...
By default, cellranger-atac will use all of the cores available on your system to execute pipeline stages. You can specify a different number of cores to use with the --localcores option; for example, --localcores=16 will limit cellranger-atac to using up to sixteen cores at once. Similarly, --localmem will restrict the amount of memory (in GB) used by cellranger-atac.
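One way to choose a --localcores value is to cap the number you want at what the machine actually reports. This is a minimal sketch, assuming a Linux system where nproc (GNU coreutils) or getconf is available; the function name cap_cores is hypothetical.

```shell
# Hypothetical helper: never request more cores than the machine reports.
cap_cores() {
  requested=$1
  available=$2
  if [ "$requested" -gt "$available" ]; then
    echo "$available"
  else
    echo "$requested"
  fi
}

# nproc is part of GNU coreutils; fall back to getconf, then to 1.
available=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
cores=$(cap_cores 16 "$available")
echo "--localcores=$cores"
```

The same pattern could be applied to --localmem, with the memory figure in GB supplied by you.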
The pipeline will create a new folder named after the run ID you specified with --id (e.g. /home/jdoe/runs/sample345) for its output. If this folder already exists, cellranger-atac will assume it is an existing pipestance and attempt to resume running it.
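Because an existing folder changes the behavior from "start a new run" to "resume", it can be worth checking for the folder before launching. The following sketch only tests whether the directory exists; the function name run_dir_state is hypothetical, and it does not inspect the folder's contents.

```shell
# Hypothetical helper: report whether the output folder for a run ID
# already exists, in which case the pipeline would try to resume it.
run_dir_state() {
  if [ -d "$1" ]; then
    echo existing
  else
    echo new
  fi
}

state=$(run_dir_state /home/jdoe/runs/sample345)
echo "output folder is $state"
```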
A successful cellranger-atac count run should conclude with a message similar to this:
Outputs:
- Per-barcode fragment counts & metrics: /home/jdoe/runs/sample345/outs/singlecell.csv
- Position sorted BAM file: /home/jdoe/runs/sample345/outs/possorted_bam.bam
- Position sorted BAM index: /home/jdoe/runs/sample345/outs/possorted_bam.bam.bai
- Summary of all data metrics: /home/jdoe/runs/sample345/outs/summary.json
- HTML file summarizing data & analysis: /home/jdoe/runs/sample345/outs/web_summary.html
- Bed file of all called peak locations: /home/jdoe/runs/sample345/outs/peaks.bed
- Raw peak barcode matrix in hdf5 format: /home/jdoe/runs/sample345/outs/raw_peak_bc_matrix.h5
- Raw peak barcode matrix in mex format: /home/jdoe/runs/sample345/outs/raw_peak_bc_matrix
- Directory of analysis files: /home/jdoe/runs/sample345/outs/analysis
- Filtered peak barcode matrix in hdf5 format: /home/jdoe/runs/sample345/outs/filtered_peak_bc_matrix.h5
- Filtered peak barcode matrix in mex format: /home/jdoe/runs/sample345/outs/filtered_peak_bc_matrix
- Barcoded and aligned fragment file: /home/jdoe/runs/sample345/outs/fragments.tsv.gz
- Fragment file index: /home/jdoe/runs/sample345/outs/fragments.tsv.gz.tbi
- Filtered tf barcode matrix in hdf5 format: /home/jdoe/runs/sample345/outs/filtered_tf_bc_matrix.h5
- Filtered tf barcode matrix in mex format: /home/jdoe/runs/sample345/outs/filtered_tf_bc_matrix
- Loupe Browser input file: /home/jdoe/runs/sample345/outs/cloupe.cloupe
- csv summarizing important metrics and values: /home/jdoe/runs/sample345/outs/summary.csv
- Annotation of peaks with genes: /home/jdoe/runs/sample345/outs/peak_annotation.tsv
- Peak-motif associations: /home/jdoe/runs/sample345/outs/peak_motif_mapping.bed
Pipestance completed successfully!
The output of the pipeline will be contained in a folder named after the run ID you specified with --id (e.g. sample345). The subfolder named outs/ will contain the main pipeline output files:
File Name | Description |
---|---|
singlecell.csv | Per-barcode fragment counts & metrics |
possorted_bam.bam | Position sorted BAM file |
possorted_bam.bam.bai | Position sorted BAM index |
summary.json | Summary of all data metrics |
web_summary.html | HTML file summarizing data & analysis |
peaks.bed | Bed file of all called peak locations |
raw_peak_bc_matrix.h5 | Raw peak barcode matrix in hdf5 format |
raw_peak_bc_matrix | Raw peak barcode matrix in mex format |
analysis | Directory of analysis files |
filtered_peak_bc_matrix.h5 | Filtered peak barcode matrix in hdf5 format |
filtered_peak_bc_matrix | Filtered peak barcode matrix in mex format |
fragments.tsv.gz | Barcoded and aligned fragment file |
fragments.tsv.gz.tbi | Fragment file index |
filtered_tf_bc_matrix.h5 | Filtered tf barcode matrix in hdf5 format |
filtered_tf_bc_matrix | Filtered tf barcode matrix in mex format |
cloupe.cloupe | Loupe Browser input file |
summary.csv | CSV summarizing important metrics and values |
peak_annotation.tsv | Peak-gene associations based on genome proximity |
peak_motif_mapping.bed | Peak motif associations. Note that one peak could be associated with multiple transcription factor motifs. |
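After a run, the table above can double as a checklist. The sketch below lists any of those names that are absent from an outs/ folder; the function name missing_outputs is hypothetical, and the file list is copied from the table rather than queried from the pipeline.

```shell
# Hypothetical helper: print each expected output (from the table above)
# that is not present in the given outs/ directory.
missing_outputs() {
  outs_dir=$1
  for name in \
    singlecell.csv possorted_bam.bam possorted_bam.bam.bai summary.json \
    web_summary.html peaks.bed raw_peak_bc_matrix.h5 raw_peak_bc_matrix \
    analysis filtered_peak_bc_matrix.h5 filtered_peak_bc_matrix \
    fragments.tsv.gz fragments.tsv.gz.tbi filtered_tf_bc_matrix.h5 \
    filtered_tf_bc_matrix cloupe.cloupe summary.csv peak_annotation.tsv \
    peak_motif_mapping.bed
  do
    [ -e "$outs_dir/$name" ] || printf '%s\n' "$name"
  done
}

# Prints nothing when every expected file and directory is present.
missing_outputs /home/jdoe/runs/sample345/outs
```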
Once cellranger-atac count has successfully completed, you can browse the resulting summary HTML file in any supported web browser, open the .cloupe file in Loupe Browser, or refer to the Understanding Output section to explore the data by hand.