Tag Assignment of 10x Genomics CellPlex Data Using Seurat’s HTODemux Function

May 18, 2022

Note: 10x Genomics does not provide support for community-developed tools and makes no guarantees regarding their function or performance. Please contact tool developers with any questions. If you have feedback about Analysis Guides, please email analysis-guides@10xgenomics.com.

Users may occasionally want to use tools other than Cell Ranger to assign cells to samples in their CellPlex (CMO) data. The following tools can be used as alternatives for processing CellPlex data.

HTODemux: Seurat function for tag assignment, originally implemented to demultiplex antibody-based tags (Stoeckius et al., Genome Biol 19, 224 (2018)).
MULTIseqDemux: Seurat implementation of the tag assignment algorithm from Multi-seq (McGinnis et al., Nat Methods 16, 619–626 (2019)).
hashedDrops: Function for tag assignment available in the package DropletUtils.

This tutorial provides guidance for using Seurat's HTODemux function on data generated using CellPlex technology. It also shows how to write the HTODemux tag assignments to a CSV file that can be input into Cell Ranger to generate output files, such as .cloupe, with custom tag assignments.

The first step of the process is to run Cell Ranger to generate the matrix of UMI counts for Gene Expression and Multiplexing Capture libraries. This page describes the cellranger multi command needed to analyze CellPlex data. Once cellranger multi has completed, you can use the matrix generated by the multi pipeline as input to HTODemux for custom tag assignment.

For custom tag assignment purposes, you will use the raw feature-barcode matrix generated by the multi pipeline. This tutorial needs the MEX format matrix. It is located in outs/multi/count/raw_feature_bc_matrix/ in the Cell Ranger output directory.

If you do not have your own data, you can work through this tutorial with one of our public datasets. For example, a 10k 1:1 mixture of Raji and Jurkat cells is available here. Download the raw feature-barcode matrix for this data here.


# Decompress the raw matrix file
# The command line below works on UNIX based operating systems
tar -xzvf SC3_v3_NextGem_DI_CellPlex_Jurkat_Raji_10K_Multiplex_count_raw_feature_bc_matrix.tar.gz

Load the Seurat library in R, then load the 10x Genomics feature-barcode matrix using the Read10X function. This tutorial assumes you have already installed R and Seurat. If you need help installing R or Seurat, please refer to Installation Instructions for Seurat.


library(Seurat)
data_dir <- "outs/multi/count/raw_feature_bc_matrix/"
data <- Read10X(data.dir = data_dir)

You will see the message "10X data contains more than one type and is being returned as a list containing matrices of each type." This is expected.
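Because the data are returned as a named list, each library type is accessed by name, and names that contain spaces need backticks when used with `$`. The following is a minimal sketch using small hand-made matrices as stand-ins; `toy_data` and its values are invented for illustration and are not real Read10X output.

```r
# Toy stand-in for the list returned by Read10X(); all values are invented.
toy_data <- list(
  `Gene Expression` = matrix(
    1:4, nrow = 2,
    dimnames = list(c("GeneA", "GeneB"), c("BC1", "BC2"))
  ),
  `Multiplexing Capture` = matrix(
    5:8, nrow = 2,
    dimnames = list(c("CMO301", "CMO302"), c("BC1", "BC2"))
  )
)

# List the library types present in the object
names(toy_data)

# Names containing spaces need backticks when used with `$` ...
cmo_counts <- toy_data$`Multiplexing Capture`
# ... or can be accessed with [[ ]] and a quoted string
gex_counts <- toy_data[["Gene Expression"]]

# Features (rows) by barcodes (columns)
dim(cmo_counts)
```

Running `names()` on your own `data` object in the same way is a quick check that both library types were loaded.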

Create a Seurat object, and add the data for the "Multiplexing Capture" library type as a CMO assay.


seurat_object <- CreateSeuratObject(counts = data$`Gene Expression`)
seurat_object[["CMO"]] <- CreateAssayObject(counts = data$`Multiplexing Capture`)

The raw feature-barcode matrix contains counts for all observed barcodes, so it may include barcodes that contain only ambient mRNA. You will likely want to pass only cell-containing barcodes to HTODemux. To do this, subset the matrix to the barcodes that Cell Ranger called as cell-associated. The list of cell-associated barcodes can be found in the following file output by Cell Ranger: outs/multi/multiplexing_analysis/assignment_confidence_table.csv. For the data used in this tutorial, you can download the file here. This file has a column called "Barcodes" that holds the cell-associated barcode IDs. You can import the cell barcodes listed in this file into R using the following commands:


library(data.table)
cells <- fread("assignment_confidence_table.csv",select = c("Barcodes"))

Check that the command worked by listing the first few records in cells. The output should look like this:


cells[1:5]
             Barcodes
1: AAACCCAAGAGTGTGC-1
2: AAACCCAAGATGCTTC-1
3: AAACCCAAGGTACAGC-1
4: AAACCCAAGTAGCTCT-1
5: AAACCCACACGGATCC-1

Subset the Seurat object to keep only the cell-associated barcodes.


seurat_object_use <- subset(seurat_object, cells = cells$Barcodes)
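Before subsetting, it can be worth confirming that the imported barcodes are actually present in the matrix, for example to catch a mismatched barcode suffix. The following is a minimal base R sketch; `matrix_barcodes` and `cell_barcodes` are hypothetical stand-ins for `colnames(seurat_object)` and `cells$Barcodes`.

```r
# Hypothetical stand-ins for colnames(seurat_object) and cells$Barcodes
matrix_barcodes <- c("AAACCCAAGAGTGTGC-1", "AAACCCAAGATGCTTC-1", "AAACCCAAGGTACAGC-1")
cell_barcodes   <- c("AAACCCAAGAGTGTGC-1", "AAACCCAAGGTACAGC-1")

# Barcodes in the cell list that do not appear in the matrix
missing_from_matrix <- setdiff(cell_barcodes, matrix_barcodes)

# Number of barcodes shared by both
n_shared <- length(intersect(cell_barcodes, matrix_barcodes))

length(missing_from_matrix)
n_shared
```

If `missing_from_matrix` is not empty, check that the matrix and the barcode list come from the same Cell Ranger run.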

Clean up CMO data using the following criteria:

Retain the data only for the CMOs that were used in the experiment. For example, in this dataset, CMO301 and CMO302 were used for multiplexing.
Remove any cells that have 0 CMO counts.


DefaultAssay(seurat_object_use) <- "CMO"
seurat_object_use <- subset(seurat_object_use, features = c("CMO301", "CMO302"))
seurat_object_use <- subset(x = seurat_object_use, subset = nCount_CMO > 0)
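The intent of the two clean-up criteria can be illustrated on a small plain matrix. This is only a sketch with invented counts: `cmo_counts` and the unused tag CMO303 are hypothetical, and the code does not reproduce how Seurat computes `nCount_CMO` internally.

```r
# Invented CMO count matrix: tags in rows, barcodes in columns
cmo_counts <- matrix(
  c(120,  3, 0,
      4, 98, 0,
      1,  0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CMO301", "CMO302", "CMO303"), c("BC1", "BC2", "BC3"))
)

# Criterion 1: keep only the CMOs used in the experiment
used_cmos <- c("CMO301", "CMO302")
cmo_used <- cmo_counts[used_cmos, , drop = FALSE]

# Criterion 2: drop barcodes whose total CMO count is 0
keep_cells <- colSums(cmo_used) > 0
cmo_clean <- cmo_used[, keep_cells, drop = FALSE]

colnames(cmo_clean)
```

In this toy example, BC3 has no counts for either tag and is dropped, leaving BC1 and BC2.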

Normalize the CMO data and run the HTODemux function. For more details about this function, please see this vignette by Seurat.


seurat_object_norm <- NormalizeData(seurat_object_use, assay = "CMO", normalization.method = "CLR")
seurat_object_demux <- HTODemux(seurat_object_norm, assay = "CMO", positive.quantile = 0.99)
## Output:
## Cutoff for CMO301 : 4599 reads
## Cutoff for CMO302 : 3371 reads

Print a summary of tag assignments.


table(seurat_object_demux$CMO_classification.global)

## Output:

 Doublet Negative  Singlet
     587     3310     9718
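It is often easier to judge the result as percentages rather than raw counts. The following minimal sketch reuses the counts from the summary above; the vector `counts` is typed in by hand here, and with a real object you could apply the same arithmetic to the output of `table()`.

```r
# Counts copied from the HTODemux summary above
counts <- c(Doublet = 587, Negative = 3310, Singlet = 9718)

# Total number of barcodes classified
total <- sum(counts)

# Percentage of barcodes in each category, rounded to one decimal place
percentages <- round(100 * counts / total, 1)
percentages
```

For this dataset, roughly 71% of cell-associated barcodes are classified as singlets.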

To re-run Cell Ranger with the tag assignments generated above, rename the categories output by HTODemux to identifiers that Cell Ranger accepts, then write the data to a CSV file.


levels(seurat_object_demux$hash.ID)[levels(seurat_object_demux$hash.ID)=="Negative"] <- "Unassigned"
levels(seurat_object_demux$hash.ID)[levels(seurat_object_demux$hash.ID)=="Doublet"] <- "Multiplet"
write.table(c("Barcode,Assignment"),"./cells_assigned.csv",quote = FALSE,sep=",",col.names = FALSE,row.names=FALSE)
write.table(seurat_object_demux$hash.ID,"./cells_assigned.csv",quote = FALSE,sep=",",col.names = FALSE,append=TRUE)
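The renaming and export steps can be tried on a small example before running them on real data. The following sketch uses a hypothetical named factor, `assignments`, as a stand-in for `seurat_object_demux$hash.ID`, and writes to a temporary file rather than `cells_assigned.csv`. It builds the two-column table explicitly and uses `write.csv`, which is an alternative to the two `write.table` calls above, not a replacement required by Cell Ranger.

```r
# Hypothetical stand-in for seurat_object_demux$hash.ID (a named factor)
assignments <- factor(c("CMO301", "Negative", "Doublet", "CMO302"))
names(assignments) <- c("AAACCCAAGAGTGTGC-1", "AAACCCAAGATGCTTC-1",
                        "AAACCCAAGGTACAGC-1", "AAACCCAAGTAGCTCT-1")

# Rename the HTODemux categories as in the tutorial
levels(assignments)[levels(assignments) == "Negative"] <- "Unassigned"
levels(assignments)[levels(assignments) == "Doublet"]  <- "Multiplet"

# Build an explicit two-column table with the header used in the tutorial
out <- data.frame(
  Barcode = names(assignments),
  Assignment = as.character(assignments),
  stringsAsFactors = FALSE
)

# Write to a temporary file, then read it back to confirm the layout
csv_path <- tempfile(fileext = ".csv")
write.csv(out, csv_path, quote = FALSE, row.names = FALSE)

check <- read.csv(csv_path, stringsAsFactors = FALSE)
table(check$Assignment)
```

Reading the file back in this way is a simple check that the header and categories are what you expect before passing the file to Cell Ranger.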

At this point, you are ready to run Cell Ranger (version 7.0 or later) with the tag assignments that HTODemux produced in cells_assigned.csv. To do this, specify cells_assigned.csv in the multi config file. Here is an example of the config file.


[gene-expression]
ref,/path/to/refdata_cellranger/GRCh38-2020-A
expect-cells,10000
barcode-sample-assignment,/path/to/cells_assigned.csv

[libraries]
fastq_id,fastqs,feature_types
GEX_fastqs_id,/path/to/GEX_fastqs/,Gene Expression
MUX_fastqs_id,/path/to/MUX_fastqs/,Multiplexing Capture

[samples]
sample_id,cmo_ids,description
Jurkat,CMO301,Jurkat
Raji,CMO302,Raji

Run the Cell Ranger command:


cellranger multi --id=htodemux --csv=./config.csv

The following articles by the Seurat team may be helpful:

https://satijalab.org/seurat/reference/htodemux: HTODemux function reference
https://satijalab.org/seurat/reference/read10x: Read 10x data
https://satijalab.org/seurat/articles/essential_commands.html: Essential Seurat commands

R session information:


> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.14.2  SeuratObject_4.0.4 Seurat_4.1.0

loaded via a namespace (and not attached):
  [1] nlme_3.1-157          matrixStats_0.62.0    spatstat.sparse_2.1-1 RcppAnnoy_0.0.19
  [5] RColorBrewer_1.1-3    httr_1.4.2            sctransform_0.3.3     tools_4.2.0
  [9] utf8_1.2.2            R6_2.5.1              irlba_2.3.5           rpart_4.1.16
 [13] KernSmooth_2.23-20    uwot_0.1.11           mgcv_1.8-40           lazyeval_0.2.2
 [17] colorspace_2.0-3      tidyselect_1.1.2      gridExtra_2.3         compiler_4.2.0
 [21] cli_3.2.0             plotly_4.10.0         scales_1.2.0          lmtest_0.9-40
 [25] spatstat.data_2.2-0   ggridges_0.5.3        pbapply_1.5-0         goftest_1.2-3
 [29] stringr_1.4.0         digest_0.6.29         spatstat.utils_2.3-0  pkgconfig_2.0.3
 [33] htmltools_0.5.2       parallelly_1.31.1     fastmap_1.1.0         htmlwidgets_1.5.4
 [37] rlang_1.0.2           shiny_1.7.1           generics_0.1.2        zoo_1.8-10
 [41] jsonlite_1.8.0        ica_1.0-2             spatstat.random_2.2-0 dplyr_1.0.8
 [45] magrittr_2.0.3        patchwork_1.1.1       Matrix_1.4-1          Rcpp_1.0.8.3
 [49] munsell_0.5.0         fansi_1.0.3           abind_1.4-5           reticulate_1.24
 [53] lifecycle_1.0.1       stringi_1.7.6         MASS_7.3-56           Rtsne_0.16
 [57] plyr_1.8.7            grid_4.2.0            parallel_4.2.0        listenv_0.8.0
 [61] promises_1.2.0.1      ggrepel_0.9.1         crayon_1.5.1          miniUI_0.1.1.1
 [65] deldir_1.0-6          lattice_0.20-45       cowplot_1.1.1         splines_4.2.0
 [69] tensor_1.5            pillar_1.7.0          igraph_1.3.1          spatstat.geom_2.4-0
 [73] future.apply_1.9.0    reshape2_1.4.4        codetools_0.2-18      leiden_0.3.10
 [77] glue_1.6.2            png_0.1-7             vctrs_0.4.1           httpuv_1.6.5
 [81] gtable_0.3.0          RANN_2.6.1            purrr_0.3.4           spatstat.core_2.4-2
 [85] polyclip_1.10-0       tidyr_1.2.0           scattermore_0.8       future_1.25.0
 [89] ggplot2_3.3.6         mime_0.12             xtable_1.8-4          later_1.3.0
 [93] survival_3.3-1        viridisLite_0.4.0     tibble_3.1.6          cluster_2.1.3
 [97] globals_0.14.0        fitdistrplus_1.1-8    ellipsis_0.3.2        ROCR_1.0-11
