Oct 3, 2016

Single cell RNA-Seq data analysis using Cell Ranger and Seurat

Fred 10x

The human brain is a complex organ. Roughly 100 billion neurons communicate across nearly 125 trillion synapses to integrate the sights, sounds and feelings we associate with the world around us. Yes, even while you are watching football and eating wings on a Sunday afternoon, your brain is hard at work. Understanding how such a complicated system allows you to taste those wings and enjoy football is no small task, but it has become more tractable with high-throughput single cell technologies like the Chromium Single Cell 3’ Solution. This technology measures gene expression in tens of thousands of individual cells, enabling the discovery of novel cell types and genetic markers that provide valuable insights into biological function. With the power to interrogate complex systems comes the need to analyze complex datasets. Fortunately, 10x Genomics and the single cell research community are working hard to provide computational solutions that are approachable for seasoned bioinformaticians and newbies alike.

Several analysis challenges are common to most single cell experiments. First, the complexity of the data needs to be reduced by focusing on the genes that account for most of the differences between individual cells, while reducing the impact of less informative genes. This step, known as dimensionality reduction, is commonly performed with Principal Component Analysis (PCA). PCA can also be used to project the data into 2 or 3 dimensions, although the t-SNE algorithm is generally preferred for visualization. With the data in this more manageable form, the next step is clustering: grouping cells with similar gene expression profiles to identify putative cell types or cell states. K-means clustering is often used at this step, although other methods, such as hierarchical or density-based clustering, can be effective as well. Lastly, differential gene expression analysis is performed to determine which genes define a given cluster (putative cell type). These genes can then be used to assign the cells to a cell type (if the cell type is known), or as input to Gene Set Enrichment Analysis (GSEA) to elucidate the biological pathways enriched in these cells and infer their function.
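To make these steps concrete, here is a minimal sketch in plain Python on a made-up six-cell, six-gene count matrix. It is a toy illustration, not the Cell Ranger or Seurat implementation. All function names and the data are hypothetical, and several simplifications are assumed: the most variable genes stand in for PCA, K-means uses a simple deterministic "farthest-first" initialization, t-SNE is skipped entirely, and marker genes are ranked by a plain difference of mean log-expression rather than a statistical test.

```python
import math
from statistics import mean, pvariance

def log_normalize(counts, scale=10_000):
    """Scale each cell to the same total count, then take log(1 + x)."""
    normalized = []
    for cell in counts:
        total = sum(cell) or 1  # guard against empty cells
        normalized.append([math.log1p(c * scale / total) for c in cell])
    return normalized

def top_variable_genes(matrix, n_genes):
    """Indices of the n_genes genes with the highest variance across cells.
    Assumption: a crude stand-in for PCA-based dimensionality reduction."""
    n = len(matrix[0])
    variances = [pvariance([cell[g] for cell in matrix]) for g in range(n)]
    return sorted(range(n), key=lambda g: variances[g], reverse=True)[:n_genes]

def kmeans(points, k, n_iter=100):
    """Plain K-means with a deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [mean(col) for col in zip(*members)]
    return labels

def marker_genes(matrix, labels, cluster, n_top=2):
    """Rank genes by mean log-expression inside the cluster minus outside it."""
    inside = [cell for cell, lab in zip(matrix, labels) if lab == cluster]
    outside = [cell for cell, lab in zip(matrix, labels) if lab != cluster]
    n = len(matrix[0])
    diff = [mean(c[g] for c in inside) - mean(c[g] for c in outside) for g in range(n)]
    return sorted(range(n), key=lambda g: diff[g], reverse=True)[:n_top]

# Toy counts (rows = cells, columns = genes); the values are invented.
# Cells 0-2 express genes 0 and 1 highly, cells 3-5 express genes 2 and 3,
# and genes 4 and 5 are uniformly expressed "housekeeping" genes.
counts = [
    [20, 15, 1, 2, 5, 5],
    [18, 17, 2, 1, 5, 5],
    [22, 14, 1, 1, 5, 5],
    [1, 2, 19, 16, 5, 5],
    [2, 1, 21, 15, 5, 5],
    [1, 1, 18, 18, 5, 5],
]

expr = log_normalize(counts)
variable = top_variable_genes(expr, n_genes=4)           # 1. reduce complexity
reduced = [[cell[g] for g in variable] for cell in expr]
labels = kmeans(reduced, k=2)                            # 2. cluster cells
for cluster in sorted(set(labels)):                      # 3. find marker genes
    print("cluster", cluster, "marker genes:", marker_genes(expr, labels, cluster))
```

On real data, with thousands of cells and genes, each of these steps would be handled by a dedicated tool such as the ones described in the rest of this post.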

All of these analysis tools are available as part of the Cell Ranger pipeline, a freeware solution that transforms raw Illumina sequencing data from Chromium Single Cell libraries into easy-to-use file formats that are ready for downstream analysis. Using the provided Cell Ranger R-kit, these files can be read directly into the R programming environment, where users can perform PCA, K-means clustering and differential gene expression analysis, and visualize their results using the popular t-SNE algorithm. The R-kit also enables users to normalize and combine datasets from different Chromium runs, making it a breeze to compare the results of multiple experiments. Because the work happens in the R programming environment, the analysis can easily be carried over into other single cell analysis packages, including Seurat, a cutting-edge R package developed at the New York Genome Center. Seurat provides a robust computational framework to identify significant sources of variation in the data, perform clustering using hierarchical and density-based approaches, and identify significantly enriched genes using a variety of methods optimized for single cell datasets.

On **Tuesday, October 4th, Dr. Rahul Satija will be presenting a Nature Webcast** demonstrating how Seurat can be applied to 10x Genomics Single Cell 3’ data to reveal structure in heterogeneous samples and identify novel cell types, using a 68,000 PBMC dataset as an example. Register for the webinar

Learn more about Cell Ranger Software and download single cell datasets

Get started with data analysis:

  • Seurat tutorial