Feature Slice H5

The feature slice HDF5 dataset contains Visium HD or Visium HD 3' Gene Expression data, structured for easy and efficient retrieval of 2 µm resolution "image" slices for single or multiple genes. The data can be binned to arbitrary scales or plotted against the microscope tissue image.

An HDF5 file is composed of groups, which can contain other groups or datasets. Datasets are arrays stored in a compressed binary form. Each group or dataset can also have associated attributes.

A feature_slice.h5 file is made up of eight groups:


$ h5ls feature_slice.h5
feature_slices           Group
features                 Group
images                   Group
masks                    Group
reads                    Group
secondary_analysis       Group
segmentations            Group
umis                     Group

Each group is documented below.

The feature_slices group contains one subgroup per feature (gene), named {id}, where {id} is the index of the feature in the /features group. Only features with at least one UMI in total are stored here, so a missing feature means that no UMIs were observed for it in this sample.


$ h5ls feature_slice.h5/feature_slices
0                        Group
10                       Group
100                      Group
1000                     Group
10007                    Group
.
.
999                      Group
9990                     Group
9991                     Group

Each gene-specific group contains the row, col, and data datasets that together form the sparse slice of gene expression data for that gene: entry i records the value data[i] at position (row[i], col[i]) on the 2 µm grid.


$ h5ls feature_slice.h5/feature_slices/0
col                      Dataset {48/Inf}
data                     Dataset {48/Inf}
row                      Dataset {48/Inf}

The features group contains datasets related to the feature (gene) names and IDs in the reference transcriptome. It is identical to the features group stored in the HDF5 raw/filtered feature-barcode matrix output from the pipeline.


$ h5ls feature_slice.h5/features
_all_tag_keys            Dataset {1}
feature_type             Dataset {32285}
genome                   Dataset {32285}
id                       Dataset {32285}
name                     Dataset {32285}
target_sets              Group

$ h5ls feature_slice.h5/features/target_sets
Visium\ Mouse\ Transcriptome\ Probe\ Set\ v2.0 Dataset
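To connect a gene name to its slice, the index of the name in the features group can be turned into a path under feature_slices. The sketch below is illustrative only: feature_slice_path is a hypothetical helper, the gene names are stand-in values, and it assumes string datasets may come back as bytes (as h5py commonly returns them). If a name occurs more than once, the first match is used.

```python
def feature_slice_path(names, gene_name, present_ids=None):
    """Return the path of a gene's slice group, or None if it is unavailable.

    names       -- sequence of feature names (str or bytes), in /features order
    gene_name   -- name to look up
    present_ids -- optional collection of group names found under feature_slices
    """
    # Decode defensively: string datasets are often returned as bytes.
    decoded = [n.decode("utf-8") if isinstance(n, bytes) else str(n) for n in names]
    try:
        index = decoded.index(gene_name)
    except ValueError:
        return None  # the gene is not in the reference
    if present_ids is not None and str(index) not in present_ids:
        return None  # no UMIs were observed, so no slice was stored
    return f"feature_slices/{index}"


# Stand-in values; with h5py these would typically come from
#   names = h5_file["features/name"][:]
#   present = set(h5_file["feature_slices"].keys())
names = [b"Xkr4", b"Gm1992", b"Sox17"]
present = {"0", "2"}
path = feature_slice_path(names, "Sox17", present)
```

Returning None for a gene without a stored slice lets the caller treat it as an all-zero image rather than an error.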

The images group contains the grayscale microscope and CytAssist images projected onto the grid of 2 µm squares.


$ h5ls feature_slice.h5/images/microscope
col                      Dataset {11785262/Inf}
data                     Dataset {11785262/Inf}
row                      Dataset {11785262/Inf}

The masks group contains binary image masks marking the bins under tissue. It holds four groups:


$ h5ls feature_slice.h5/masks
filtered                 Group
square_008um             Group
square_020um             Group
square_050um             Group

The filtered group corresponds to the raw resolution bins (square_002um).

Each of these groups stores its mask as a row, col, and data matrix:


$ h5ls feature_slice.h5/masks/square_008um
col                      Dataset {391917/Inf}
data                     Dataset {391917/Inf}
row                      Dataset {391917/Inf}
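One possible use of a mask is to zero out bins that are not under tissue. The following is a minimal sketch, not the pipeline's method: coo_to_mask and apply_mask are hypothetical helpers, and the sketch assumes that the mask's row and col values index the same grid as the matrix being masked, which should be checked against the actual file.

```python
import numpy as np


def coo_to_mask(row, col, data, shape):
    """Build a dense boolean mask from row/col/data triplets."""
    mask = np.zeros(shape, dtype=bool)
    keep = np.asarray(data) != 0  # treat any nonzero entry as "under tissue"
    mask[np.asarray(row)[keep], np.asarray(col)[keep]] = True
    return mask


def apply_mask(binned, mask):
    """Keep values where the mask is True and set all other bins to zero."""
    return np.where(mask, binned, 0)


# Toy 2 x 2 example.
# Assumption: mask coordinates and the binned matrix share one grid.
mask = coo_to_mask(row=[0, 1], col=[1, 0], data=[1, 1], shape=(2, 2))
masked = apply_mask(np.array([[5, 6], [7, 8]]), mask)
```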

The reads group provides details on barcode correction, read processing, and mapping to the reference. It contains the following groups:

barcode_corrected_sequenced counts the sequenced reads whose raw barcode was corrected.
sequenced counts the reads assigned to each barcode, with or without correction.
half_mapped counts read pairs where one read maps to the reference but its mate does not map or maps unexpectedly.
split_mapped counts reads where different parts of a single read align to distinct, non-contiguous regions of the reference genome or transcriptome.
unmapped_reads counts reads that could not be aligned to the reference genome or transcriptome at all.


$ h5ls feature_slice.h5/reads
barcode_corrected_sequenced Group
half_mapped              Group
sequenced                Group
split_mapped             Group
unmapped_reads           Group

The secondary_analysis group contains secondary analysis results: clustering, pca, and umap. For more information about secondary analysis, see this page.


$ h5ls feature_slice.h5/secondary_analysis
clustering               Group
pca                      Group
umap                     Group

The segmentations group contains cell and nucleus segmentation results.


$ h5ls feature_slice.h5/segmentations
cell_segmentation_mask   Group
nucleus_segmentation_mask Group

The umis group contains the total UMI spatial matrix, organized like the per-gene groups (feature_slice.h5/feature_slices/{id}).


$ h5ls feature_slice.h5/umis/total
col                      Dataset {5451603/Inf}
data                     Dataset {5451603/Inf}
row                      Dataset {5451603/Inf}

The following example Python code shows how to bin a feature slice.


import json
from dataclasses import dataclass

import h5py as h5
import numpy as np

ROW_DATASET_NAME = "row"
COL_DATASET_NAME = "col"
DATA_DATASET_NAME = "data"

METADATA_JSON_ATTR_NAME = "metadata_json"
UMIS_GROUP_NAME = "umis"
TOTAL_UMIS_GROUP_NAME = "total"


@dataclass
class CooMatrix:
    """Sparse matrix in coordinate (COO) form: one (row, col, data) triplet per entry."""

    row: np.ndarray
    col: np.ndarray
    data: np.ndarray

    @classmethod
    def from_hdf5(cls, group):
        """Read the row, col, and data datasets of an HDF5 group into memory."""
        return cls(
            row=group[ROW_DATASET_NAME][:],
            col=group[COL_DATASET_NAME][:],
            data=group[DATA_DATASET_NAME][:],
        )

    def to_ndarray(self, nrows, ncols, binning_scale=1):
        """Convert the COO matrix representation to a dense ndarray at the specified binning scale."""
        ncols_binned = int(np.ceil(ncols / binning_scale))
        nrows_binned = int(np.ceil(nrows / binning_scale))

        result = np.zeros((nrows_binned, ncols_binned), dtype="int32")
        # Integer division maps each 2 µm coordinate to its bin. np.add.at
        # accumulates correctly when several entries fall into the same bin.
        binned_rows = np.asarray(self.row, dtype=np.intp) // binning_scale
        binned_cols = np.asarray(self.col, dtype=np.intp) // binning_scale
        np.add.at(result, (binned_rows, binned_cols), np.asarray(self.data, dtype=result.dtype))
        return result


# Load total UMIs at 8 µm bin size (4 x 4 squares of 2 µm each)
with h5.File("feature_slice.h5", "r") as h5_file:
    metadata = json.loads(h5_file.attrs[METADATA_JSON_ATTR_NAME])
    umis_8um = CooMatrix.from_hdf5(h5_file[UMIS_GROUP_NAME][TOTAL_UMIS_GROUP_NAME]).to_ndarray(
        nrows=metadata["nrows"], ncols=metadata["ncols"], binning_scale=4
    )

Each feature_slice.h5 file also contains metadata in the following format. It can be used, for example, to translate between the original microscope image and the barcoded array space, or to obtain the full dimensions of the barcoded array:


$ h5dump -a /metadata_json feature_slice.h5
HDF5 "feature_slice.h5" {
ATTRIBUTE "metadata_json" {
   DATATYPE  H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_UTF8;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SCALAR
   DATA { (content reformatted below for readability)
   (0): {
		"sample_id": "10001_10002",
		"sample_desc": "SJ0118_SJ118-7-D_100_100_10_Brain_5_mouse_Brain_D",
		"slide_name": "visium_hd_rc1",
		"nrows": 3350,
		"ncols": 3350,
		"spot_pitch": 2.0,
		"hd_layout_json": {
			"slide_uid": "UNKNOWN",
			"file_format": "n/a",
			"aligner_version": "n/a",
			"input_hash": "n/a",
			"slide_design": "n/a",
			"transform": [
				1.0,
				0.0,
				0.0,
				0.0,
				1.0,
				0.0,
				0.0,
				0.0,
				1.0
			]
		},
		"transform_matrices": {
			"spot_colrow_to_microscope_colrow": [
				[
					0.041985535298565406,
					-7.298152402766004,
					25585.255831045426
				],
				[
					7.298152402766004,
					0.041985535298565406,
					2956.752264439149
				],
				[
					0.0,
					0.0,
					1.0
				]
			],
			"microscope_colrow_to_spot_colrow": [
				[
					0.000788241806459152,
					0.13701644608939972,
					-425.29105551521997
				],
				[
					-0.13701644608939972,
					0.000788241806459152,
					3503.2701905117615
				],
				[
					0.0,
					0.0,
					1.0
				]
			],
			"spot_colrow_to_cytassist_colrow": null,
			"cytassist_colrow_to_spot_colrow": null
		}
	}
}

}
}
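The transform matrices in this metadata are 3 x 3 homogeneous matrices, so a coordinate can be mapped by multiplying the matrix with the vector (col, row, 1). The sketch below uses the two microscope matrices from the example output above. transform_colrow is a hypothetical helper, and the (col, row) ordering of the vector is an assumption inferred from the "colrow" naming.

```python
import numpy as np

# Values copied from the example metadata_json shown above.
SPOT_TO_MICROSCOPE = np.array([
    [0.041985535298565406, -7.298152402766004, 25585.255831045426],
    [7.298152402766004, 0.041985535298565406, 2956.752264439149],
    [0.0, 0.0, 1.0],
])
MICROSCOPE_TO_SPOT = np.array([
    [0.000788241806459152, 0.13701644608939972, -425.29105551521997],
    [-0.13701644608939972, 0.000788241806459152, 3503.2701905117615],
    [0.0, 0.0, 1.0],
])


def transform_colrow(matrix, col, row):
    """Apply a 3 x 3 homogeneous transform to one (col, row) coordinate.

    Assumption: the input vector is ordered (col, row, 1).
    """
    x, y, w = np.asarray(matrix, dtype=float) @ np.array([col, row, 1.0])
    return x / w, y / w


# Map a spot coordinate into microscope pixel space and back again.
mic_col, mic_row = transform_colrow(SPOT_TO_MICROSCOPE, 100, 200)
spot_col, spot_row = transform_colrow(MICROSCOPE_TO_SPOT, mic_col, mic_row)
```

Because the two matrices are inverses of each other, the round trip returns the starting coordinate up to floating-point error.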
