Running Cell Ranger aggr

This tutorial is written with Cell Ranger v6.1.2. Commands are compatible with later versions of Cell Ranger, unless noted otherwise.

The cellranger aggr pipeline is optional. It aggregates, or combines, the outputs of two or more cellranger count runs. In experiments involving multiple samples and multiple 10x Chromium GEM wells, each library must be processed in a separate run of cellranger count. To compare samples with each other in a differential expression analysis, use cellranger aggr to combine the output files from each cellranger count run into a single feature-barcode matrix and a .cloupe file for visualization in Loupe Browser.

Use the following publicly available molecule_info.h5 files:

  • 1,000 PBMC experiment
  • 10,000 PBMC data set

Start by making a directory to run the aggr pipeline in:

mkdir run_cellranger_aggr
cd run_cellranger_aggr

Next, download the data files.

wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_molecule_info.h5
wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_10k_v3/pbmc_10k_v3_molecule_info.h5

These files are small (less than 1 GB each) and usually take less than one minute to download.
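An interrupted download, or an error page saved under the .h5 name, only surfaces later as a pipeline failure, so a quick sanity check can help. The sketch below is not part of the original tutorial. It assumes the HDF5 signature sits at the very start of the file (the usual case) and that your head supports the -c option. The function name looks_like_hdf5 is hypothetical.

```shell
# Hypothetical helper: succeeds if the file is non-empty and bytes 2-4
# spell "HDF", which is part of the standard 8-byte HDF5 signature.
looks_like_hdf5() {
  [ -s "$1" ] && [ "$(head -c 4 "$1" | tail -c 3)" = "HDF" ]
}

# Report on the two tutorial files without aborting if one is absent.
for f in pbmc_1k_v3_molecule_info.h5 pbmc_10k_v3_molecule_info.h5; do
  if looks_like_hdf5 "$f"; then
    echo "OK: $f"
  else
    echo "CHECK: $f is missing, empty, or not an HDF5 file"
  fi
done
```

A file that fails this check is worth downloading again before continuing.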

The next step is to build the CSV file. CSV stands for comma-separated values. For specific instructions on creating this CSV, see the cellranger aggr page.

The CSV file is a two-column file. The first column is for the sample id. This id name can be anything you want. Choose descriptive ids since they are used later in the analysis. The second column contains the paths to the molecule_info.h5 output files from the cellranger count pipelines.

For Cell Ranger v6.0+ and Loupe Browser v5.1.0+, the libraries CSV header should be 'sample_id,molecule_h5'. For prior software versions, it should be 'library_id,molecule_h5'.

From the same directory where the HDF5 files were downloaded, use the pwd command to print out the path:

pwd

The output is similar to the following:

/path/to/run_cellranger_aggr

Copy the path to make the CSV file. Use the text editor of your choice to make this file. This example uses nano.

nano pbmc_aggr.csv

This opens the nano text editor. Paste the following text into the editor, then edit the /path/to/ part for each molecule_info.h5 file so that it matches the absolute path of the file on your system.

sample_id,molecule_h5
1k_pbmcs,/path/to/run_cellranger_aggr/pbmc_1k_v3_molecule_info.h5
10k_pbmcs,/path/to/run_cellranger_aggr/pbmc_10k_v3_molecule_info.h5

Exit the nano text editor by pressing Ctrl+X, and then press Y for "Yes" to save the file.

Save modified buffer (ANSWERING "No" WILL DESTROY CHANGES) ?
 Y Yes
 N No           ^C Cancel

Nano then asks you:

File Name to Write: pbmc_aggr.csv

Press the Enter key to confirm the filename and save the file. You are now back at the command prompt.

We have now saved our Linux-formatted CSV file and exited out of the nano text editor.
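If you prefer not to edit the file interactively, the same CSV can be written from the command line. This is a minimal sketch rather than part of the original tutorial. It assumes you are in the directory containing the two downloaded molecule_info.h5 files, and it overwrites any existing pbmc_aggr.csv. The variable name workdir is arbitrary.

```shell
# Use the current directory as the absolute path prefix.
workdir="$(pwd)"

# Write the header plus one row per sample (overwrites pbmc_aggr.csv).
cat > pbmc_aggr.csv <<EOF
sample_id,molecule_h5
1k_pbmcs,${workdir}/pbmc_1k_v3_molecule_info.h5
10k_pbmcs,${workdir}/pbmc_10k_v3_molecule_info.h5
EOF

# Print the result for a visual check.
cat pbmc_aggr.csv
```

Because the paths come from pwd, there is no /path/to/ placeholder to edit by hand.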

Run the --help command to print the usage statement and view the input requirements.

cellranger aggr --help

This command prints the following:

cellranger-aggr
Aggregate data from multiple Cell Ranger runs

USAGE:
    cellranger aggr [FLAGS] [OPTIONS] --id <ID> --csv <CSV>

FLAGS:
        --nosecondary    Disable secondary analysis, e.g. clustering
        --dry            Do not execute the pipeline. Generate a pipeline invocation (.mro) file and stop
        --disable-ui     Do not serve the web UI
        --noexit         Keep web UI running after pipestance completes or fails
        --nopreflight    Skip preflight checks
    -h, --help           Prints help information

OPTIONS:
        --id <ID>    A unique run id and output folder name [a-zA-Z0-9_-]+
...

This pipeline has two inputs:

  • --id is used to name the output directory that the pipeline runs in.
  • --csv takes a CSV file that points to the outputs from the cellranger count pipeline.
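Before launching the pipeline, it can be worth confirming that the CSV has the expected header and that every listed path exists. The function below is a hedged sketch, not a Cell Ranger feature. The name check_aggr_csv is hypothetical, and the function assumes the two-column layout and the sample_id,molecule_h5 header described above for Cell Ranger v6.0+. It strips carriage returns in case the file was saved with Windows line endings.

```shell
# Hypothetical helper: validate a two-column aggr CSV before running the pipeline.
check_aggr_csv() {
  csv="$1"

  # The first line must be the v6.0+ header.
  header="$(head -n 1 "$csv" | tr -d '\r')"
  if [ "$header" != "sample_id,molecule_h5" ]; then
    echo "Unexpected header: $header" >&2
    return 1
  fi

  # Every remaining row must point at an existing file.
  tail -n +2 "$csv" | tr -d '\r' | {
    missing=0
    while IFS=, read -r sample_id molecule_h5 || [ -n "$sample_id" ]; do
      [ -n "$sample_id" ] || continue   # skip blank lines
      if [ ! -f "$molecule_h5" ]; then
        echo "Missing file for $sample_id: $molecule_h5" >&2
        missing=1
      fi
    done
    [ "$missing" -eq 0 ]
  }
}
```

For example, `check_aggr_csv pbmc_aggr.csv && echo "CSV looks OK"` prints the confirmation only when the header matches and both files are found.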

Next, build the command line and run it.

cellranger aggr --id=1k_10k_pbmc_aggr --csv=pbmc_aggr.csv

The output is similar to the following:

2021-10-28 19:59:07 [perform] Serializing pipestance performance data.
Waiting 6 seconds for UI to do final refresh.
Pipestance completed successfully!

2021-10-28 19:59:13 Shutting down.

Just like the other pipelines, when you see “Pipestance completed successfully!” the job is done, and the pipeline outputs are in the pipestance directory in the outs/ folder. List the contents of this directory:

ls -1 1k_10k_pbmc_aggr/outs/

The outs/ directory is organized as follows (shown as a tree with subdirectories expanded; ls -1 itself lists only the top-level entries):

├── aggregation.csv
├── count
│   ├── analysis
│   │   ├── clustering
│   │   ├── diffexp
│   │   ├── pca
│   │   ├── tsne
│   │   └── umap
│   ├── cloupe.cloupe
│   ├── filtered_feature_bc_matrix
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── filtered_feature_bc_matrix.h5
│   └── summary.json
└── web_summary.html

The outputs are similar to those from the cellranger count pipeline, with the exception of the BAM files and molecule_info.h5 files. More information about outputs is available in the Understanding Outputs section.
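To confirm that a finished run produced its key files, a short check along these lines may help. It is only a sketch. The list of paths is taken from the v6.1.2 listing in this tutorial and may differ in other versions, and the function name check_aggr_outs is hypothetical.

```shell
# Hypothetical helper: report which expected aggr outputs exist under outs/.
check_aggr_outs() {
  outs="$1"
  status=0
  # Paths taken from the v6.1.2 output listing; adjust for other versions.
  for rel in \
    aggregation.csv \
    count/cloupe.cloupe \
    count/filtered_feature_bc_matrix.h5 \
    count/filtered_feature_bc_matrix/matrix.mtx.gz \
    count/summary.json \
    web_summary.html
  do
    if [ -e "$outs/$rel" ]; then
      echo "found   $rel"
    else
      echo "missing $rel" >&2
      status=1
    fi
  done
  return "$status"
}
```

For the run in this tutorial, the call would be `check_aggr_outs 1k_10k_pbmc_aggr/outs`. A non-zero exit status means at least one listed file is absent.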