Occasionally users may want to use tools other than Cell Ranger for assigning cells to samples with their CellPlex (CMO) data. Here are some tools that may be used for alternative processing of CellPlex data.
- HTODemux: Seurat function for tag assignment, originally implemented to demultiplex antibody-based tags (Stoeckius et al., Genome Biol 19, 224 (2018)).
- MULTIseqDemux: Seurat implementation of the tag assignment algorithm from MULTI-seq (McGinnis et al., Nat Methods 16, 619–626 (2019)).
- hashedDrops: Function for tag assignment available in the package DropletUtils.
This tutorial provides guidance for using Seurat's HTODemux function on data generated using CellPlex technology. It also shows how to generate a CSV file with tag assignments from HTODemux that can be input into Cell Ranger to generate output files, such as .cloupe files, with custom tag assignments.
The first step of the process is to run Cell Ranger to generate the matrix of UMI counts for the Gene Expression and Multiplexing Capture libraries. This page describes the cellranger multi command needed to analyze CellPlex data. Once cellranger multi has completed, you can use the matrix generated by the multi pipeline as input to HTODemux for custom tag assignment.
For custom tag assignment, use the raw feature-barcode matrix generated by the multi pipeline. This tutorial needs the matrix in MEX format, which is located in outs/multi/count/raw_feature_bc_matrix/ in the Cell Ranger output directory.
If you do not have your own data, you can work through this tutorial with one of our public datasets. For example, a 10k 1:1 mixture of Raji and Jurkat cells is available here. Download the raw feature-barcode matrix for this dataset here.
# Decompress the raw matrix file
# The command line below works on UNIX-based operating systems
tar -xzvf SC3_v3_NextGem_DI_CellPlex_Jurkat_Raji_10K_Multiplex_count_raw_feature_bc_matrix.tar.gz
Load the Seurat library in R, then load the 10x Genomics feature-barcode matrix using the Read10X function. This tutorial assumes you have already installed R and Seurat. If you need help installing either, please refer to the Installation Instructions for Seurat.
library(Seurat)
data_dir <- "outs/multi/count/raw_feature_bc_matrix/"
data <- Read10X(data.dir = data_dir)
You will see the alert "10X data contains more than one type and is being returned as a list containing matrices of each type.", which is expected.
Create a Seurat object from the Gene Expression counts, then add the data for the "Multiplexing Capture" library type as a CMO assay.
seurat_object <- CreateSeuratObject(counts = data$`Gene Expression`)
seurat_object[['CMO']] <- CreateAssayObject(counts = data$`Multiplexing Capture`)
The raw feature-barcode matrix contains counts for all observed barcodes, so it may include barcodes that captured only ambient mRNA. You will likely want to process only cell-associated barcodes in HTODemux. To do this, subset the matrix to the barcodes that Cell Ranger called as cell-associated. The list of cell-associated barcodes can be found in the assignment_confidence_table.csv file output by Cell Ranger.
For the data used in this tutorial, you can download the file here. This file has a column called "Barcodes", which holds the cell-associated barcode IDs. You can import the cell barcodes listed in this file into R using the following commands:
library(data.table)
cells <- fread("assignment_confidence_table.csv", select = c("Barcodes"))
Check that the above command worked by listing some of the records in cells. The output should look like this:
cells[1:5]
             Barcodes
1: AAACCCAAGAGTGTGC-1
2: AAACCCAAGATGCTTC-1
3: AAACCCAAGGTACAGC-1
4: AAACCCAAGTAGCTCT-1
5: AAACCCACACGGATCC-1
Extract only cell barcodes from the matrix.
seurat_object_use <- subset(seurat_object, cells = cells$Barcodes)
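Before subsetting, it can help to confirm that the barcodes read from the CSV actually occur in the matrix, because a formatting mismatch (for example a missing -1 suffix) would give an unexpected subset. The following is a minimal base R sketch with made-up barcodes; matrix_barcodes and called_barcodes are hypothetical stand-ins for colnames(seurat_object) and cells$Barcodes.

```r
# Hypothetical stand-ins: in the tutorial these would be
# colnames(seurat_object) and cells$Barcodes.
matrix_barcodes <- c("AAACCCAAGAGTGTGC-1", "AAACCCAAGATGCTTC-1",
                     "AAACCCAAGGTACAGC-1", "AAACCCAAGTAGCTCT-1")
called_barcodes <- c("AAACCCAAGAGTGTGC-1", "AAACCCAAGGTACAGC-1",
                     "TTTGTTGTCTGTCGTC-1")

# Barcodes present in both sets can be passed to subset().
shared_barcodes <- intersect(called_barcodes, matrix_barcodes)

# Barcodes called as cells but absent from the matrix point to a mismatch.
missing_barcodes <- setdiff(called_barcodes, matrix_barcodes)

cat("Shared:", length(shared_barcodes), "\n")
cat("Missing from matrix:", length(missing_barcodes), "\n")
```

If missing_barcodes is not empty, compare the barcode formats in the two sources before continuing.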
Clean up CMO data using the following criteria:
- Retain the data for only those CMOs that were used in this dataset. For example, in this dataset CMO301 and CMO302 were used for multiplexing.
- Remove any cells that have 0 CMO counts.
DefaultAssay(seurat_object_use) <- "CMO"
seurat_object_use <- subset(seurat_object_use, features = c("CMO301", "CMO302"))
seurat_object_use <- subset(x = seurat_object_use, subset = nCount_CMO > 0)
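To make the two clean-up criteria concrete, here is a small base R sketch on a toy count matrix. All counts, the barcode names bc1 to bc4, and the unused CMO303 row are made up for illustration. The sketch recomputes per-barcode totals over the retained CMOs only, which is a choice of this sketch and may differ from how nCount_CMO is computed in Seurat.

```r
# Toy CMO count matrix (made-up values): rows are CMOs, columns are barcodes.
cmo_counts <- matrix(
  c(120, 0,  3, 0,   # CMO301
      2, 0, 95, 0,   # CMO302
      0, 0,  1, 0),  # CMO303 (not used in this hypothetical experiment)
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CMO301", "CMO302", "CMO303"),
                  c("bc1", "bc2", "bc3", "bc4"))
)

used_cmos <- c("CMO301", "CMO302")

# Criterion 1: keep only the CMOs that were used.
cmo_used <- cmo_counts[used_cmos, , drop = FALSE]

# Criterion 2: drop barcodes whose total count over the used CMOs is zero.
cmo_clean <- cmo_used[, colSums(cmo_used) > 0, drop = FALSE]

print(cmo_clean)
```

Here bc2 and bc4 carry no CMO counts and are removed, while bc1 and bc3 are kept.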
Normalize the CMO data and run the HTODemux function. For more details about this function, please see this vignette by Seurat.
seurat_object_norm <- NormalizeData(seurat_object_use, assay = "CMO", normalization.method = "CLR")
seurat_object_demux <- HTODemux(seurat_object_norm, assay = "CMO", positive.quantile = 0.99)
## Output:
Cutoff for CMO301 : 4599 reads
Cutoff for CMO302 : 3371 reads
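For intuition about what a centered log-ratio (CLR) transform does to tag counts, here is a minimal base R sketch. It assumes a log1p-based CLR variant applied to one vector of counts; whether this matches Seurat's internal implementation and its choice of margin exactly is an assumption, and the function name clr and the counts are made up for illustration.

```r
# Log1p-based CLR variant (assumption: not taken from the tutorial itself).
clr <- function(x) {
  # Scale factor: exp of the summed log1p of the positive entries,
  # divided by the full vector length (a geometric-mean-like quantity).
  scale_factor <- exp(sum(log1p(x[x > 0]), na.rm = TRUE) / length(x))
  log1p(x / scale_factor)
}

# Made-up counts for one CMO across five barcodes.
cmo301_counts <- c(0, 1, 3, 250, 4000)
cmo301_clr <- clr(cmo301_counts)

print(round(cmo301_clr, 3))
```

The transform compresses the large dynamic range of raw tag counts while preserving their ordering, which makes a positive and a negative population easier to separate with a single cutoff.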
Print a summary of tag assignments.
table(seurat_object_demux$CMO_classification.global)
## Output:
 Doublet Negative  Singlet
     587     3310     9718
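The summary is often easier to judge as percentages. The base R sketch below copies the three counts from the output above into a named vector and converts them; the variable names are hypothetical, and with a real object you could apply the same arithmetic to the result of table().

```r
# Counts copied from the table() output above.
assignment_counts <- c(Doublet = 587, Negative = 3310, Singlet = 9718)

# Total number of barcodes that went through HTODemux.
total_cells <- sum(assignment_counts)

# Share of each category, in percent.
assignment_pct <- 100 * assignment_counts / total_cells

print(total_cells)
print(round(assignment_pct, 1))
```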
If you would like to re-run Cell Ranger with the tag assignments generated above, you need to rename the categories output by HTODemux to identifiers that Cell Ranger accepts, and then write the data to a CSV file.
levels(seurat_object_demux$hash.ID)[levels(seurat_object_demux$hash.ID)=="Negative"] <- "Unassigned"
levels(seurat_object_demux$hash.ID)[levels(seurat_object_demux$hash.ID)=="Doublet"] <- "Multiplet"
write.table(c("Barcode,Assignment"),"./cells_assigned.csv",quote = FALSE,sep=",",col.names = FALSE,row.names=FALSE)
write.table(seurat_object_demux$hash.ID,"./cells_assigned.csv",quote = FALSE,sep=",",col.names = FALSE,append=TRUE)
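The renaming and export steps can be tried without a Seurat object. The following base R sketch uses a made-up factor in place of seurat_object_demux$hash.ID and writes to a temporary file; the barcodes, the variable names, and the use of write.csv on a data frame instead of two write.table calls are choices of this sketch, not part of the original workflow.

```r
# Made-up barcodes and classifications standing in for
# names(seurat_object_demux$hash.ID) and seurat_object_demux$hash.ID.
barcodes <- c("AAACCCAAGAGTGTGC-1", "AAACCCAAGATGCTTC-1",
              "AAACCCAAGGTACAGC-1", "AAACCCAAGTAGCTCT-1")
hash_id <- factor(c("CMO301", "Negative", "Doublet", "CMO302"),
                  levels = c("Doublet", "CMO301", "CMO302", "Negative"))

# Rename the HTODemux categories, as in the tutorial code.
levels(hash_id)[levels(hash_id) == "Negative"] <- "Unassigned"
levels(hash_id)[levels(hash_id) == "Doublet"] <- "Multiplet"

# Build a two-column table with the header used in the tutorial.
assignments <- data.frame(Barcode = barcodes,
                          Assignment = as.character(hash_id),
                          stringsAsFactors = FALSE)

# Write to a temporary file so the sketch does not touch real outputs.
out_file <- tempfile(fileext = ".csv")
write.csv(assignments, out_file, quote = FALSE, row.names = FALSE)

# Show the file contents line by line.
csv_lines <- readLines(out_file)
print(csv_lines)
```

Writing the header and the rows in a single call avoids having to keep two write.table calls in sync.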
At this point, you are ready to run Cell Ranger (version 7.0 and later) with the tag assignments in the cells_assigned.csv file made with HTODemux. To do this, specify cells_assigned.csv in the multi config file. Here is an example of the config file.
[gene-expression]
ref,/path/to/refdata_cellranger/GRCh38-2020-A
expect-cells,10000
barcode-sample-assignment,/path/to/cells_assigned.csv

[libraries]
fastq_id,fastqs,feature_types
GEX_fastqs_id,/path/to/GEX_fastqs/,Gene Expression
MUX_fastqs_id,/path/to/MUX_fastqs/,Multiplexing Capture

[samples]
sample_id,cmo_ids,description
Jurkat,CMO301,Jurkat
Raji,CMO302,Raji
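Before launching the pipeline, you may want to check that every label in the assignment file is one you expect. The base R sketch below uses a made-up table in place of the real cells_assigned.csv. Treating the two CMO IDs from the [samples] section plus "Unassigned" and "Multiplet" as the complete set of expected labels is an assumption of this sketch, not a statement of everything Cell Ranger accepts.

```r
# Made-up assignment table; in practice this would come from
# read.csv("cells_assigned.csv").
assignments <- data.frame(
  Barcode = c("AAACCCAAGAGTGTGC-1", "AAACCCAAGATGCTTC-1", "AAACCCAAGGTACAGC-1"),
  Assignment = c("CMO301", "Multiplet", "Negative"),
  stringsAsFactors = FALSE
)

# Assumed label set: CMO IDs from the [samples] section plus the two
# renamed HTODemux categories.
expected_labels <- c("CMO301", "CMO302", "Unassigned", "Multiplet")

# Rows whose label was not renamed or does not match a configured CMO.
unexpected <- assignments[!assignments$Assignment %in% expected_labels, ]

# Barcodes that appear more than once in the file.
duplicated_barcodes <- assignments$Barcode[duplicated(assignments$Barcode)]

print(unexpected)
print(duplicated_barcodes)
```

In this toy table the leftover "Negative" label is flagged, which is the kind of problem that arises when the renaming step is skipped.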
Run the Cell Ranger command:
cellranger multi --id=htodemux --csv=./config.csv
The following articles by the Seurat team may be helpful:
- https://satijalab.org/seurat/reference/htodemux: HTODemux reference
- https://satijalab.org/seurat/reference/read10x: Read 10x data
- https://satijalab.org/seurat/articles/essential_commands.html: Essential Seurat commands
R session information:
> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
stats graphics grDevices utils datasets methods base

other attached packages:
data.table_1.14.2 SeuratObject_4.0.4 Seurat_4.1.0

loaded via a namespace (and not attached):
nlme_3.1-157 matrixStats_0.62.0 spatstat.sparse_2.1-1 RcppAnnoy_0.0.19
RColorBrewer_1.1-3 httr_1.4.2 sctransform_0.3.3 tools_4.2.0
utf8_1.2.2 R6_2.5.1 irlba_2.3.5 rpart_4.1.16
KernSmooth_2.23-20 uwot_0.1.11 mgcv_1.8-40 lazyeval_0.2.2
colorspace_2.0-3 tidyselect_1.1.2 gridExtra_2.3 compiler_4.2.0
cli_3.2.0 plotly_4.10.0 scales_1.2.0 lmtest_0.9-40
spatstat.data_2.2-0 ggridges_0.5.3 pbapply_1.5-0 goftest_1.2-3
stringr_1.4.0 digest_0.6.29 spatstat.utils_2.3-0 pkgconfig_2.0.3
htmltools_0.5.2 parallelly_1.31.1 fastmap_1.1.0 htmlwidgets_1.5.4
rlang_1.0.2 shiny_1.7.1 generics_0.1.2 zoo_1.8-10
jsonlite_1.8.0 ica_1.0-2 spatstat.random_2.2-0 dplyr_1.0.8
magrittr_2.0.3 patchwork_1.1.1 Matrix_1.4-1 Rcpp_1.0.8.3
munsell_0.5.0 fansi_1.0.3 abind_1.4-5 reticulate_1.24
lifecycle_1.0.1 stringi_1.7.6 MASS_7.3-56 Rtsne_0.16
plyr_1.8.7 grid_4.2.0 parallel_4.2.0 listenv_0.8.0
promises_1.2.0.1 ggrepel_0.9.1 crayon_1.5.1 miniUI_0.1.1.1
deldir_1.0-6 lattice_0.20-45 cowplot_1.1.1 splines_4.2.0
tensor_1.5 pillar_1.7.0 igraph_1.3.1 spatstat.geom_2.4-0
future.apply_1.9.0 reshape2_1.4.4 codetools_0.2-18 leiden_0.3.10
glue_1.6.2 png_0.1-7 vctrs_0.4.1 httpuv_1.6.5
gtable_0.3.0 RANN_2.6.1 purrr_0.3.4 spatstat.core_2.4-2
polyclip_1.10-0 tidyr_1.2.0 scattermore_0.8 future_1.25.0
ggplot2_3.3.6 mime_0.12 xtable_1.8-4 later_1.3.0
survival_3.3-1 viridisLite_0.4.0 tibble_3.1.6 cluster_2.1.3
globals_0.14.0 fitdistrplus_1.1-8 ellipsis_0.3.2 ROCR_1.0-11