
Visium HD Multi-sample Analysis in Python: a Tutorial in Google Colab

Aug 27, 2025

Note: 10x Genomics does not provide support for community-developed tools and makes no guarantees regarding their function or performance. Please contact tool developers with any questions. If you have feedback about Analysis Guides, please email analysis-guides@10xgenomics.com.



Understanding how gene expression differs between samples or conditions can provide novel biological insights. This Analysis Guide details a comprehensive workflow that uses the scverse ecosystem of Python packages to identify differentially expressed genes across multiple Visium HD Spatial Gene Expression datasets. The guide outlines a general approach that includes importing data and creating SpatialData objects, merging multiple datasets, and pseudobulking. It also covers optional topics such as batch correction with Harmony and data sketching for efficient processing.
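Pseudobulking collapses many bins into one expression profile per sample and group (for example, a cluster or annotated cell type). A common motivation is that differential expression can then be tested with samples, rather than thousands of individual bins, as the units of replication. The following is a minimal NumPy sketch of that aggregation, assuming a dense bins × genes matrix of raw counts. The function name `pseudobulk` and the toy data are hypothetical; the notebook itself works on AnnData/SpatialData objects and may aggregate differently.

```python
import numpy as np

def pseudobulk(counts, sample_ids, group_ids):
    """Sum raw counts over all bins that share a (sample, group) pair.

    counts:     (n_bins, n_genes) array of raw UMI counts
    sample_ids: length-n_bins labels, e.g. one label per Visium HD dataset
    group_ids:  length-n_bins labels, e.g. a cluster or cell-type annotation
    Returns (keys, matrix) where matrix[i] is the summed profile for keys[i].
    """
    counts = np.asarray(counts)
    sample_ids = np.asarray(sample_ids)
    group_ids = np.asarray(group_ids)
    # One output row per distinct (sample, group) combination, in sorted order.
    keys = sorted(set(zip(sample_ids.tolist(), group_ids.tolist())))
    matrix = np.zeros((len(keys), counts.shape[1]), dtype=counts.dtype)
    for i, (sample, group) in enumerate(keys):
        mask = (sample_ids == sample) & (group_ids == group)
        matrix[i] = counts[mask].sum(axis=0)
    return keys, matrix

# Toy example: 4 bins x 3 genes from two hypothetical conditions.
counts = np.array([[1, 0, 2],
                   [3, 1, 0],
                   [0, 2, 2],
                   [1, 1, 1]])
samples = ["normal", "normal", "cancer", "cancer"]
groups = ["epithelium", "epithelium", "epithelium", "stroma"]
keys, pb = pseudobulk(counts, samples, groups)
print(keys)
print(pb)
```

Summing raw counts (not normalized values) keeps the pseudobulk profiles compatible with count-based differential expression methods.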

This tutorial aims to familiarize you with Python libraries for Visium HD data analysis and inspire your own data analysis journey. As spatial transcriptomic data analysis evolves, we encourage researchers to learn about and explore new community-published tools and algorithms.

To begin, open the Google Colab notebook:

Open Google Colab notebook

The key steps described in this tutorial include:

  • Python Environment Setup and Library Installation: Setting up a Python virtual environment, installing necessary libraries, and defining helper functions.
  • Data Download Links: Instructions for downloading the public data used in this Analysis Guide.
  • Conversion of Space Ranger Output to Zarr Format and SpatialData Object Creation: Converting Space Ranger output to Zarr format and loading the datasets into a single merged SpatialData object.
  • Quality Control and Filtering: Initial assessment and filtering of genes and bins based on gene expression levels.
  • Data Normalization and Dimensionality Reduction: Normalizing gene expression counts and reducing dimensionality with principal component analysis (PCA).
  • Clustering and UMAP Visualization: Applying Leiden clustering and visualizing the UMAP projection of the data.
  • Batch Correction (Optional): Using Harmony to remove batch effects from the datasets.
  • Spatial Visualization of Clusters: Plotting the identified clusters on the Visium HD images to visualize their spatial distribution.
  • Marker Gene Identification and Cluster Annotation: Determining genes characteristic of each cluster and assigning biological labels to clusters.
  • Differential Gene Expression Analysis: Identifying genes that are differentially expressed between different sample conditions (e.g., cancer vs. normal) within specific cell types or spatial domains.
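In the scverse ecosystem, the QC, normalization, and PCA steps listed above are typically handled by Scanpy functions such as `sc.pp.filter_cells`, `sc.pp.filter_genes`, `sc.pp.normalize_total`, `sc.pp.log1p`, and `sc.pp.pca`. To show roughly what those steps compute, here is a simplified NumPy sketch on a dense toy matrix. The thresholds, the `target_sum=1e4` default, and all function names are illustrative assumptions, not the notebook's settings.

```python
import numpy as np

def qc_filter(counts, min_counts_per_bin=1, min_bins_per_gene=1):
    """Drop bins with too few total counts, then genes detected in too few bins."""
    counts = np.asarray(counts, dtype=float)
    keep_bins = counts.sum(axis=1) >= min_counts_per_bin
    counts = counts[keep_bins]
    # Gene detection is counted on the bins that survived the bin filter.
    keep_genes = (counts > 0).sum(axis=0) >= min_bins_per_gene
    return counts[:, keep_genes], keep_bins, keep_genes

def normalize_log1p(counts, target_sum=1e4):
    """Scale each bin to target_sum total counts, then apply log(1 + x).

    Assumes every bin has a nonzero total (guaranteed if min_counts_per_bin >= 1).
    """
    totals = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / totals * target_sum)

def pca(x, n_comps=2):
    """Project mean-centered data onto its top principal components via SVD."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_comps] * s[:n_comps]

rng = np.random.default_rng(0)
raw = rng.poisson(2.0, size=(50, 20))
raw[0] = 0       # an empty bin that QC should remove
raw[:, 5] = 0    # a gene detected nowhere that QC should remove
filtered, keep_bins, keep_genes = qc_filter(raw, min_counts_per_bin=5, min_bins_per_gene=3)
expr = normalize_log1p(filtered)
pcs = pca(expr, n_comps=5)
print(filtered.shape, pcs.shape)
```

Scanpy operates on sparse AnnData matrices and offers more options (for example, highly variable gene selection before PCA), so this sketch is meant only to make the arithmetic of each step concrete.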

The Appendix covers:

  • Sketch Downsampling: Reducing the dataset size for faster processing while maintaining biological variability.
  • Projection of Sketched Results onto the Full Dataset: Projecting the clusters identified in the sketched data back onto the full dataset.
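The two Appendix steps can be illustrated with a small NumPy sketch that makes simplifying assumptions which may differ from the notebook. First, the sketch here is a uniform random subsample; dedicated sketching methods often weight the sampling (for example, by leverage scores) so that rare populations are better preserved. Second, projection uses a brute-force k-nearest-neighbor majority vote in a shared embedding such as PCA space. The function names `sketch_indices` and `project_labels` are hypothetical, and the all-pairs distance matrix is only practical for toy data; millions of Visium HD bins would require chunking or an approximate nearest-neighbor index.

```python
import numpy as np

def sketch_indices(n_bins, n_sketch, seed=0):
    """Pick a uniform random subset of bin indices without replacement."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_bins, size=n_sketch, replace=False))

def project_labels(full_emb, sketch_emb, sketch_labels, k=5):
    """Give each full-dataset bin the majority label of its k nearest sketched bins.

    Ties are broken in favor of the label that sorts first.
    """
    sketch_labels = np.asarray(sketch_labels)
    # Squared Euclidean distances between every full bin and every sketched bin.
    d2 = ((full_emb[:, None, :] - sketch_emb[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    projected = []
    for neighbor_labels in sketch_labels[nearest]:
        values, votes = np.unique(neighbor_labels, return_counts=True)
        projected.append(values[np.argmax(votes)])
    return np.array(projected)

rng = np.random.default_rng(1)
# Two well-separated toy populations in a 2-D embedding.
emb = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(5, 0.3, (100, 2))])
truth = np.array(["A"] * 100 + ["B"] * 100)
idx = sketch_indices(len(emb), 40)
# Here the sketched labels come from `truth`, standing in for clusters
# computed on the sketched subset.
labels = project_labels(emb, emb[idx], truth[idx], k=5)
print((labels == truth).mean())
```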