
Filtering and Reanalysis Workflow in Single Cell Data

  • Identify cells of interest, perform graph-based clustering, and generate t-SNE and/or UMAP projections.
  • Screen out low-quality cells, possible multiplets, and cells with high expression of mitochondrial genes before generating new t-SNE and UMAP projections.

By default, the .cloupe file from a gene expression dataset includes all barcodes called as cells by Cell Ranger's cell calling algorithm. The default clusters and projections in a .cloupe file are derived from this set of cells. However, it may be more useful to only analyze a subset of these cells. For example, it may be desirable to more precisely screen out possible cell multiplets, dead cells, or cells with low diversity. Alternatively, it may be preferable to focus on a particular type of cell, or even remove a particular cell type from an analysis.

For these reasons, Loupe Browser v5.0 and later provides an interactive reanalysis workflow. In a few short steps, it is possible to identify cells of interest, and then compute a Louvain clustering and t-SNE projection over these cells. Loupe Browser v5.1 and later additionally supports the generation of a UMAP projection.

Important
Loupe Browser v6.2 and later supports reanalysis for GEX (Gene Expression) data in GEX + Antibody Capture, GEX + CRISPR Guide Capture, and GEX + ATAC datasets.

Click the Reanalyze button in the Mode Selector panel to enter the reanalyze workflow. A new window is launched:

To exit the introduction pop-up, click the blue 'Got it!' button at the bottom. You may check the box to prevent this pop-up from appearing the next time.

There are three columns for all steps in the reanalyze workflow:

  • The leftmost column shows the current progress through the workflow steps. It is possible to advance or go back to any step in the workflow at any time.
  • The middle column contains the tooling for the active step.
  • The rightmost column shows statistics about which barcodes have been removed.

At the bottom of the Reanalyze window, there are buttons to advance to the next step or skip to the final step. Each step in the workflow is described in the sections below.

The first step, Review Barcodes, allows an initial filtering by either whole clusters or a barcode list. It is connected to the main window; changing the category in the main window will change the active category in the reanalysis workflow. By selecting or de-selecting clusters in the main window, it is possible to either include or exclude entire clusters of barcodes from downstream analysis.

For this tutorial, select the disease state group with the Normal Adjacent Tissue cluster de-selected, as shown below:

The reanalyze workflow will respond in kind, selecting the two disease state clusters:

It is also possible to filter by custom groups, such as those created with the lasso tools, quantitative filters, boolean filters, or CSV import. It is recommended that these groups be created before initiating the reanalysis workflow.

Finally, for finer-grained control or to filter by lists defined by external algorithms, it is possible to either explicitly add or remove a set of barcodes by clicking the Upload CSV link below the plot.

The next step is to threshold by UMI count. This step shows a violin plot of UMI counts of the currently selected barcodes. Moving the sliders at the top and bottom of the distribution will remove barcodes from outside the range. It is also possible to enter numerical values explicitly or see the distribution on a log plot. For this tutorial, an upper UMI count limit of 70,000 UMIs per barcode on the linear scale will be used, as shown below:
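Outside of Loupe Browser, the idea behind this step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Loupe Browser's implementation: the barcodes, genes, and counts are hypothetical toy data, the function names are invented for this example, and treating the bounds as inclusive is an assumption.

```python
# Hypothetical toy data: barcode -> {gene: UMI count}. Not the tutorial dataset.
counts = {
    "AAACCTGAGAAACCAT-1": {"GAPDH": 40000, "ACTB": 35000, "MT-CO1": 500},
    "AAACCTGAGAAACCGC-1": {"GAPDH": 1200, "ACTB": 900},
    "AAACCTGAGAAACCTA-1": {"GAPDH": 3, "MT-ATP6": 2},
}


def umi_counts(counts):
    """Total UMI count per barcode (sum over all genes)."""
    return {barcode: sum(genes.values()) for barcode, genes in counts.items()}


def filter_by_umi(counts, lower=0, upper=70_000):
    """Keep barcodes whose total UMI count lies in [lower, upper].

    Inclusive bounds are an assumption made for this sketch.
    """
    totals = umi_counts(counts)
    return [bc for bc, total in totals.items() if lower <= total <= upper]


kept = filter_by_umi(counts, upper=70_000)
print(kept)  # the first barcode (75,500 UMIs) exceeds the upper limit
```

Here the first barcode is dropped because its total of 75,500 UMIs exceeds the 70,000 limit used in this tutorial.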

The next step is to threshold by the number of distinct features detected. For gene expression datasets (including Feature Barcode libraries), this will be the number of distinct genes (features) found in each barcode. Depending on the experiment, barcodes with anomalously low or high numbers of distinct features may be undesirable. For this tutorial, a lower feature count bound of 100 features per barcode on the linear scale (6.6439 on the log2 scale) and an upper bound of 9,000 (13.1357 on the log2 scale) will be used, as shown below:
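The log-scale equivalents quoted for this step are consistent with a base-2 logarithm of the linear bounds. The short Python sketch below reproduces that arithmetic and shows one way to count distinct features per barcode. The function names and the toy barcode are hypothetical, and the inclusive bounds are an assumption rather than documented Loupe Browser behavior.

```python
import math


def log2_equivalent(linear_value):
    """Convert a linear-scale bound to its base-2 log equivalent."""
    return math.log2(linear_value)


def n_features(gene_counts):
    """Number of distinct genes with at least one UMI in a barcode."""
    return sum(1 for count in gene_counts.values() if count > 0)


def within_feature_bounds(gene_counts, lower=100, upper=9000):
    """True if the distinct-feature count lies in [lower, upper] (assumed inclusive)."""
    return lower <= n_features(gene_counts) <= upper


print(round(log2_equivalent(100), 4))   # 6.6439
print(round(log2_equivalent(9000), 4))  # 13.1357

# Hypothetical barcode with only two detected genes: below the lower bound of 100.
toy_barcode = {"GAPDH": 12, "ACTB": 0, "CD3E": 4}
print(within_feature_bounds(toy_barcode))  # False
```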

The next step is to filter cells by mitochondrial fraction -- the percentage of UMIs per barcode associated with mitochondrial genes. This step requires either the selection of a predefined reference (human or mouse), or uploading the set of mitochondrial genes for a custom reference.

To select from the list of pre-recognized references, click the Select a reference genome drop-down menu. The options will show the percentage of mitochondrial genes in the reference that are present in the dataset. The lung carcinoma tutorial dataset is a human dataset, with most mitochondrial genes present. Note that the human reference list of mitochondrial genes is prefaced with "MT-" (e.g., "MT-ATP6", "MT-CO1", etc.), which may not match all gene names used in custom references.

To specify your own list of mitochondrial genes, create a text-based file with a ".csv" file extension that has no header and lists one gene per row. You can parse the custom reference GTF file to find the exact names used for the mitochondrial genes (described in older versions of this tutorial).
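As a rough sketch of that parsing step, the Python below collects gene names from the mitochondrial contig of a GTF file and formats them as a header-less, one-gene-per-row CSV. Several things here are assumptions to check against your own reference: the contig names ("MT", "chrM", "M"), the presence of "gene" feature rows, and the use of a gene_name attribute. The example GTF lines are hypothetical.

```python
import csv
import io
import re

# GTF attributes look like: gene_id "X"; gene_name "Y"; ...
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def mito_gene_names(gtf_lines, mito_contigs=("MT", "chrM", "M")):
    """Return unique gene names found on the mitochondrial contig.

    The contig names are assumptions; check which name your reference uses.
    """
    names, seen = [], set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[0] not in mito_contigs or fields[2] != "gene":
            continue
        match = GENE_NAME.search(fields[8])
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            names.append(match.group(1))
    return names


def to_csv_text(names):
    """Format names as CSV text with no header and one gene per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for name in names:
        writer.writerow([name])
    return buffer.getvalue()


# Hypothetical GTF records, for illustration only.
example_gtf = [
    "#!genome-build hypothetical\n",
    'MT\tensembl\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_name "ND1";\n',
    'MT\tensembl\texon\t1\t100\t.\t+\t.\tgene_id "G1"; gene_name "ND1";\n',
    'MT\tensembl\tgene\t200\t300\t.\t+\t.\tgene_id "G2"; gene_name "COX1";\n',
    '1\tensembl\tgene\t1\t100\t.\t+\t.\tgene_id "G3"; gene_name "GAPDH";\n',
]

genes = mito_gene_names(example_gtf)
print(to_csv_text(genes), end="")
```

For a real file, the lines could come from `open(path)` instead of the in-memory list, and the CSV text could be written to a file with a ".csv" extension. If your GTF has no gene_name attribute, a different attribute such as gene_id may be the name that appears in the dataset.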

Your CSV should look similar to:

ATP6
ATP8
COX1
COX2
COX3
CYTB
ND1
ND2
ND3
ND4
ND4L
ND5
ND6

Now, the CSV file can be used in Reanalyze by clicking the Upload CSV button.

After selecting a reference or uploading a gene list, another violin plot and slider will be visible. In this tutorial, we set a mitochondrial fraction upper bound of 15%. This threshold will vary depending on your experiment.
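To make the definition concrete, the sketch below computes the mitochondrial fraction as the percentage of a barcode's UMIs that come from mitochondrial genes, then applies the 15% upper bound used in this tutorial. It is an illustration, not Loupe Browser's code: the toy counts and function names are hypothetical, the gene set is built from the "MT-" prefix described for the human reference, and the inclusive bound is an assumption.

```python
# Hypothetical toy data: barcode -> {gene: UMI count}.
counts = {
    "cell_a": {"GAPDH": 90, "MT-CO1": 10},
    "cell_b": {"GAPDH": 60, "MT-ATP6": 30, "MT-CO1": 10},
}

# Mitochondrial gene set, selected here by the human "MT-" prefix.
# For a custom reference, load the names from your own gene list instead.
mito_genes = {
    gene for genes in counts.values() for gene in genes if gene.startswith("MT-")
}


def mito_percent(gene_counts, mito_genes):
    """Percentage of a barcode's UMIs assigned to mitochondrial genes."""
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0  # avoid division by zero for empty barcodes
    mito = sum(c for gene, c in gene_counts.items() if gene in mito_genes)
    return 100.0 * mito / total


def filter_by_mito(counts, mito_genes, upper_percent=15.0):
    """Keep barcodes at or below the mitochondrial percentage bound."""
    return [
        barcode
        for barcode, genes in counts.items()
        if mito_percent(genes, mito_genes) <= upper_percent
    ]


print(mito_percent(counts["cell_a"], mito_genes))  # 10.0
print(mito_percent(counts["cell_b"], mito_genes))  # 40.0
print(filter_by_mito(counts, mito_genes))          # ['cell_a']
```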

After filtering, the next step is to determine the type of plot to generate. It is possible to generate a t-SNE and/or UMAP projection. Selecting both projections will double the processing time.

Under the Adjust reanalyze parameters (for advanced users) drop-down menu, it is possible to enter custom parameters for the dimensionality reduction used for clustering, or the parameters for generating the t-SNE and UMAP plots respectively. For each parameter, there are detailed instructions if you select Learn more. Defaults are recommended, and no action is necessary if the default values are acceptable. In this tutorial, a t-SNE projection with default reanalyze parameters was selected.

Finally, the last step is to name the reanalyzed dataset. The name will be used in the main window as both the projection and clustering category, so it should be recognizable. In this tutorial, we use the name "reanalyzed disease states".

Click the Reanalyze button to kick off the algorithms. In the background, Loupe will run virtually the same principal components, Louvain clustering, and t-SNE algorithms as the Cell Ranger pipeline.

Run time will depend on your local machine speed, but is most dependent on the number of barcodes going into the reanalysis, and whether you are running a t-SNE projection, a UMAP projection, or both. If only generating a single projection, expect most datasets under 10,000 cells to reprocess in less than two minutes. Larger datasets above 30,000 cells may take over 10 minutes, and there is a hard cap at 100,000 cells. Datasets near that 100,000-cell limit may take nearly an hour to process. Generating both a t-SNE and a UMAP projection will double the processing time. To reduce run time, consider only generating a UMAP projection, which will complete in roughly half the time compared to a t-SNE projection for datasets of 20,000 cells and above.

Once the reanalysis is complete, you should see the following:

In Loupe Browser v6.0 and later, you can export a CSV file with the projection coordinates for the t-SNE and/or UMAP projection(s) that were generated from this window by clicking Export Projection(s).

When reanalysis completes, click on the Done button, which will close the workflow window, and bring up the new projection and category in the main window. You can now find it under a separate Analysis category in the View Selector menu. You can also export the projection CSV file by clicking the three vertical dots on the View Selector for each plot type:

All operations in Loupe done while the reanalysis-derived projection is selected will be limited to the barcodes in that projection. It is possible to calculate significant features limited to the reanalyzed barcodes, visualize gene expression projections in that cell subset, as well as see clonotype lists limited to the active barcode set. In addition, selecting a category derived from a reanalysis will automatically load the projection associated with that reanalysis.

However, it is still possible to change projections while a reanalysis-derived category is active to see how the recomputed clusters map onto the larger data:

Saving the .cloupe file at this time will save the reanalyzed projections and categories, but not any computed differential expression data. Finally, it is possible to either tweak the reanalysis or recall its parameters by clicking on the Edit Reanalyze button, located below any reanalysis category.

Which 10x Genomics products can I reanalyze?

Reanalyze is available for Single Cell Gene Expression, Single Cell Multiome ATAC + Gene Expression and Visium Spatial Gene Expression datasets. If you are analyzing a Single Cell Gene Expression dataset with Feature Barcode data, reanalysis is possible, but the algorithm will only consider genes in the reanalysis. New projections will not be created based on Feature Barcode analytes.

How many cells can I reanalyze with Loupe Browser? Are there any limits?

You can reanalyze a minimum of ten cells and a maximum of 100,000 cells. If your dataset is larger than 100,000 cells, you can make use of the cellranger reanalyze pipeline.

Does reanalyze recompute the PCA?

Yes, reanalyze recomputes the PCA. You can also specify the exact number of principal components by entering a specific number into the field "Number of Principal Components" in the "Reanalyze" step under the Adjust reanalyze parameters (for advanced users) drop-down.

What type of projection does reanalyze generate (e.g. t-SNE, UMAP)?

In Loupe Browser v5.1 and later, reanalyze provides the option to generate a t-SNE projection, a UMAP projection, or both.

Why is reanalysis taking so long?

Do not be concerned if reanalysis is taking a while. Speed is dependent on your processing power, the size of your dataset, and whether you select one or both t-SNE and UMAP projections. A 30,000-cell dataset with a single projection may take around ten minutes or more. If reanalysis is taking much longer than expected, try restarting Loupe Browser.

How do I specify mitochondrial genes for the Mitochondrial UMI filter step?

Please see this example for parsing a Rhesus macaque custom reference GTF file.

How can I provide feedback or feature requests related to reanalyze?

Email our support experts at [email protected].