Running Cell Ranger count

This tutorial is written with Cell Ranger v6.1.2. Commands are compatible with later versions of Cell Ranger, unless noted otherwise.

The cellranger count pipeline aligns sequencing reads in FASTQ files to a reference transcriptome and generates a .cloupe file for visualization and analysis in Loupe Browser, along with a number of other outputs that are compatible with publicly available tools for further analysis.

We will call our working directory the yard. Start by making a directory to run the analysis in.


mkdir -p ~/yard/run_cellranger_count
cd ~/yard/run_cellranger_count

Next, download FASTQ files from one of the publicly available datasets on the 10x Genomics support site. This example uses the 1,000 PBMC dataset from human peripheral blood mononuclear cells (PBMCs), consisting of lymphocytes (T cells, B cells, and NK cells) and monocytes.


wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_fastqs.tar

This dataset is 5.17G in size and takes a few minutes to download.

Since this is a tar file and not a tar.gz file, you don't need the -z argument used in previous tutorials to extract it.


tar -xvf pbmc_1k_v3_fastqs.tar

The output is similar to the following:


pbmc_1k_v3_fastqs/
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_I1_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz
pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_I1_001.fastq.gz

Now you have a directory containing two sets of FASTQ files, named according to the bcl2fastq2 naming convention: Sample_S1_L00X_R1_001.fastq.gz. The file names indicate that all of the files come from a single sample called pbmc_1k_v3 and that the library was run on two lanes: lane 1 (L001) and lane 2 (L002).
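The naming convention above can be checked mechanically. The following is a minimal sketch, assuming a POSIX shell and a sed that supports -E; parse_fastq_name is a hypothetical helper written for this tutorial, not part of Cell Ranger or bcl2fastq.

```shell
# parse_fastq_name FILE (hypothetical helper, not a Cell Ranger command)
# Prints "<sample> <lane> <read>" for a name that follows the pattern
# <sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq.gz, and fails otherwise.
parse_fastq_name() {
    pf_base=$(basename "$1")
    pf_parsed=$(printf '%s\n' "$pf_base" |
        sed -n -E 's/^(.+)_S[0-9]+_L([0-9]{3})_([RI][0-9])_001\.fastq\.gz$/\1 \2 \3/p')
    if [ -z "$pf_parsed" ]; then
        echo "unrecognized FASTQ name: $pf_base" >&2
        return 1
    fi
    printf '%s\n' "$pf_parsed"
}

# Summarize every FASTQ file in the extracted directory, if it is present.
for f in pbmc_1k_v3_fastqs/*.fastq.gz; do
    [ -e "$f" ] || continue
    parse_fastq_name "$f"
done | sort -u
```

Each output line lists a sample name, a lane number, and a read type, which makes it easy to confirm that only one sample is present and that both lanes have a complete set of reads.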

Next, you need a reference transcriptome. The download page for the FASTQ files shows that these are human cells. There are several prebuilt human reference transcriptome packages on the 10x Genomics support site. Download the latest package and decompress it.


wget https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2020-A.tar.gz
tar -zxvf refdata-gex-GRCh38-2020-A.tar.gz

The reference package is 10.6G in size and takes about five minutes to download.

Saving data and reference files

Once you have downloaded and extracted the reference transcriptome files, you can keep them for future runs. However, if you need to delete them between runs to save space on your server, the pre-compiled reference files are publicly available and can be re-downloaded when needed.

Your FASTQ files, however, are raw data that cannot be replaced. We strongly recommend backing them up and archiving them in case something happens to the disk they are stored on.
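One way to confirm that a backup copy matches the original FASTQ files is to record checksums before copying and verify them afterwards. This is a minimal sketch, assuming GNU coreutils sha256sum is installed; write_manifest, verify_manifest, and the manifest file name are hypothetical names chosen for illustration.

```shell
# write_manifest DIR MANIFEST (hypothetical helper)
# Records a SHA-256 checksum for every .fastq.gz file under DIR.
write_manifest() {
    ( cd "$1" && find . -type f -name '*.fastq.gz' -exec sha256sum {} + ) > "$2"
}

# verify_manifest DIR MANIFEST (hypothetical helper)
# Succeeds only if every file listed in MANIFEST exists under DIR
# with the recorded checksum.
verify_manifest() {
    ( cd "$1" && sha256sum -c --quiet - ) < "$2"
}

# Record checksums for the tutorial data, if it has been extracted here.
if [ -d pbmc_1k_v3_fastqs ]; then
    write_manifest pbmc_1k_v3_fastqs pbmc_1k_v3_fastqs.sha256
fi
```

After copying the directory to backup storage, running verify_manifest against the copy with the same manifest reports any file that is missing or differs from the original.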

Once you have FASTQ files and a reference transcriptome, you are ready to run cellranger count.

Print the usage statement to see what is needed to build the command.


cellranger count --help

The output is similar to the following:


cellranger-count
Count gene expression (targeted or whole-transcriptome) and/or feature barcode reads from a single sample and GEM well

USAGE:
    cellranger count [FLAGS] [OPTIONS] --id <ID> --transcriptome <PATH>

FLAGS:
          --no-bam                  Do not generate a bam file
          --nosecondary             Disable secondary analysis, e.g. clustering. Optional
          --include-introns         Include intronic reads in count
          --no-libraries            Proceed with processing using a --feature-ref but no Feature Barcode libraries
                                    specified with the 'libraries' flag
          --no-target-umi-filter    Turn off the target UMI filtering subpipeline. Only applies when --target-panel is
                                    used
          --dry                     Do not execute the pipeline. Generate a pipeline invocation (.mro) file and stop
          --disable-ui              Do not serve the web UI
          --noexit                  Keep web UI running after pipestance completes or fails
          --nopreflight             Skip preflight checks
      -h, --help                    Prints help information
  ...

To run cellranger count, you need to specify an --id. This can be any string of fewer than 64 characters made up of alphanumeric characters, underscores, or dashes, with no spaces. Cell Ranger creates an output directory that is named using this id. This directory is called a "pipeline instance", or pipestance for short.
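The rule for a valid --id can be expressed as a small check. This is a sketch of the rule as stated above, not Cell Ranger's own validation code; valid_run_id is a hypothetical helper name.

```shell
# valid_run_id ID (hypothetical helper, not a Cell Ranger command)
# Succeeds only for a non-empty ID that contains nothing but letters,
# digits, underscores, or dashes and is shorter than 64 characters.
valid_run_id() {
    case "$1" in
        '' | *[!A-Za-z0-9_-]*) return 1 ;;
    esac
    [ "${#1}" -lt 64 ]
}

run_id="run_count_1kpbmcs"
if valid_run_id "$run_id"; then
    echo "ok: $run_id"
else
    echo "invalid --id: $run_id" >&2
fi
```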

The --fastqs argument should be the path to the directory containing the FASTQ files. If you demultiplexed your data using cellranger mkfastq, you can use the path to the fastq_path directory in the outs directory of that pipeline. If there is more than one sample in the FASTQ directory, use the --sample argument to specify which samples to use. The --sample argument matches the sample id at the beginning of the FASTQ file names. It is unnecessary for this tutorial run because all of the FASTQ files are from the same sample, but it is included as an example. The last argument needed is --transcriptome, the path to the reference package. Be sure to edit the file paths in the command below.
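Because a run can take a long time, it may be worth confirming that both paths exist before launching it. The following is a minimal sketch; check_inputs is a hypothetical helper, and the paths assume the working directory created at the start of this tutorial, so adjust them to match your system.

```shell
# check_inputs FASTQ_DIR TRANSCRIPTOME_DIR (hypothetical helper)
# Succeeds only if both directories exist and FASTQ_DIR holds at least
# one .fastq.gz file.
check_inputs() {
    ci_fastqs=$1
    ci_ref=$2
    if [ ! -d "$ci_fastqs" ]; then
        echo "FASTQ directory not found: $ci_fastqs" >&2
        return 1
    fi
    if [ ! -d "$ci_ref" ]; then
        echo "reference directory not found: $ci_ref" >&2
        return 1
    fi
    # Reuse the positional parameters to hold the glob result.
    set -- "$ci_fastqs"/*.fastq.gz
    if [ ! -e "$1" ]; then
        echo "no .fastq.gz files in: $ci_fastqs" >&2
        return 1
    fi
}

# Assumed location: the working directory created earlier in this tutorial.
run_dir="$HOME/yard/run_cellranger_count"
if check_inputs "$run_dir/pbmc_1k_v3_fastqs" "$run_dir/refdata-gex-GRCh38-2020-A"; then
    echo "inputs found; ready to run cellranger count"
fi
```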


cellranger count --id=run_count_1kpbmcs \
   --fastqs=/mnt/home/user.name/yard/run_cellranger_count/pbmc_1k_v3_fastqs \
   --sample=pbmc_1k_v3 \
   --transcriptome=/mnt/home/user.name/yard/run_cellranger_count/refdata-gex-GRCh38-2020-A

Since this is a full-sized dataset, it can take several hours to complete.

The output is similar to the following:


/mnt/yard/user.name/yard/apps/cellranger-7.2.0/bin
cellranger count (7.2.0)
Copyright (c) 2021 10x Genomics, Inc.  All rights reserved.
-------------------------------------------------------------------------------

Martian Runtime - v4.0.6
...
2021-10-15 17:12:42 [perform] Serializing pipestance performance data.
Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!

When the output of the cellranger count command ends with “Pipestance completed successfully!”, the job is done.
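For a long or unattended run, the same check can be scripted against a saved copy of the terminal output. This sketch assumes you captured that output yourself, for example by appending `2>&1 | tee run_count_1kpbmcs.log` to the cellranger count command; the log file name and the pipestance_succeeded helper are hypothetical.

```shell
# pipestance_succeeded LOGFILE (hypothetical helper)
# Succeeds if the captured output contains the success message.
pipestance_succeeded() {
    grep -qF 'Pipestance completed successfully!' "$1"
}

# Assumed log file name; it exists only if you captured the output yourself.
log="run_count_1kpbmcs.log"
if [ -f "$log" ]; then
    if pipestance_succeeded "$log"; then
        echo "run finished successfully"
    else
        echo "success message not found in $log" >&2
    fi
fi
```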

The cellranger count pipeline outputs are in the pipestance directory in the outs folder. List the contents of this directory with ls -1.


ls -1 run_count_1kpbmcs/outs

The output is similar to the following:


├── analysis
├── cloupe.cloupe
├── filtered_feature_bc_matrix
├── filtered_feature_bc_matrix.h5
├── metrics_summary.csv
├── molecule_info.h5
├── possorted_genome_bam.bam
├── possorted_genome_bam.bam.bai
├── raw_feature_bc_matrix
├── raw_feature_bc_matrix.h5
└── web_summary.html

Open web_summary.html to see the results of the experiment. You can also load the cloupe.cloupe file into Loupe Browser and start an analysis. The outs/ directory also contains a number of outputs that can be used as input for software tools developed outside of 10x Genomics, such as the Seurat R package.
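Before moving on to downstream tools, you can confirm that the outs directory contains every item in the listing above. This is a minimal sketch with a hypothetical helper name, check_outs. The expected list is taken from this tutorial's run, so runs that use flags such as --no-bam or --nosecondary would be expected to lack some of these items.

```shell
# check_outs OUTS_DIR (hypothetical helper)
# Reports every expected output that is missing and fails if any is absent.
check_outs() {
    outs=$1
    missing=0
    for name in analysis cloupe.cloupe \
        filtered_feature_bc_matrix filtered_feature_bc_matrix.h5 \
        metrics_summary.csv molecule_info.h5 \
        possorted_genome_bam.bam possorted_genome_bam.bam.bai \
        raw_feature_bc_matrix raw_feature_bc_matrix.h5 web_summary.html; do
        if [ ! -e "$outs/$name" ]; then
            echo "missing: $outs/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Check the tutorial run, if it has been completed in this directory.
if [ -d run_count_1kpbmcs/outs ]; then
    check_outs run_count_1kpbmcs/outs && echo "all expected outputs present"
fi
```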
