Xenium Ranger Algorithms Overview

The relabel pipeline works by building a 1:1 map between the previously used gene panel and the new gene panel. This map is used to correct the gene labels given to each of the discovered transcripts. After correcting the labels, all results are recomputed using the same methods employed by the Xenium Onboard Analysis pipeline.
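The relabeling idea can be sketched roughly as follows. This is a minimal illustration, not the Xenium Ranger implementation: the helper names `build_relabel_map` and `relabel`, the example gene names, and the assumption that both panels share a common identifier to pair entries on are all hypothetical.

```python
def build_relabel_map(old_panel, new_panel):
    """Build an old-label -> new-label map from two panels.

    Each panel is a dict of {shared_identifier: gene_label}. Pairing on a
    shared identifier is an assumption made for this sketch.
    """
    mapping = {}
    for identifier, old_gene in old_panel.items():
        if identifier in new_panel:
            mapping[old_gene] = new_panel[identifier]
    # A 1:1 map must not send two old labels to the same new label.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("panel map is not one-to-one")
    return mapping


def relabel(transcript_genes, mapping):
    """Replace each transcript's gene label, leaving unmapped labels as-is."""
    return [mapping.get(gene, gene) for gene in transcript_genes]


old_panel = {1: "GENE_A", 2: "GENE_B"}
new_panel = {1: "GENE_A", 2: "GENE_C"}
mapping = build_relabel_map(old_panel, new_panel)
corrected = relabel(["GENE_B", "GENE_A", "GENE_B"], mapping)
```

After the labels are corrected, downstream results (such as the cell-feature matrix) would be recomputed from the relabeled transcripts.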

The resegment pipeline uses the same nucleus segmentation model as the corresponding Xenium Onboard Analysis (XOA) version's segmentation algorithm. For example, Xenium Ranger v3.0 uses the nucleus segmentation model included in the XOA v3.0 release. XOA segmentation algorithm changes by software version are summarized in this table.

Starting in XOA and Xenium Ranger v4.0, nucleus segmentation uses the 2D global focus map images for DAPI and, when the Cell Segmentation Staining workflow is used, for the 18S rRNA marker. These are supplemented by the 3D DAPI Z-stack results. For resegmentation in pre-v2.0 XOA datasets, nucleus segmentation is performed with the autofocus and 3D Z-stack DAPI images. Nuclei are filtered by their 95th percentile pixel intensity (the intensity threshold can be adjusted with the --dapi-filter parameter).
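A percentile-based nucleus filter might look like the sketch below. The function name `filter_nuclei`, the labeled-mask input, and the rule that a nucleus is kept when its 95th percentile DAPI intensity is at or above the threshold are assumptions for illustration; the source only states that nuclei are filtered by 95th percentile pixel intensity.

```python
import numpy as np


def filter_nuclei(labels, dapi, threshold):
    """Keep nuclei whose 95th percentile DAPI intensity reaches `threshold`.

    labels: integer mask, 0 = background, k > 0 = nucleus k.
    dapi:   intensity image with the same shape as `labels`.
    The keep/drop direction is an assumption made for this sketch.
    """
    kept = np.zeros_like(labels)
    for nucleus_id in np.unique(labels):
        if nucleus_id == 0:
            continue
        pixels = dapi[labels == nucleus_id]
        if np.percentile(pixels, 95) >= threshold:
            kept[labels == nucleus_id] = nucleus_id
    return kept


labels = np.array([[1, 1, 0, 2, 2]])
dapi = np.array([[200, 220, 0, 10, 20]])
kept = filter_nuclei(labels, dapi, threshold=100)
```

Here the dim nucleus (label 2) is dropped and the bright nucleus (label 1) is retained.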

See Xenium Onboard Analysis algorithm overview for details about the nuclear expansion-only and multimodal cell segmentation algorithms.

Resegment with multimodal cell segmentation

In the Xenium Onboard Analysis pipeline v2.0 and later, the multimodal cell segmentation algorithm results are prioritized in this order for each cell:

Segment cells based on their cell boundary stain: The inferred segmentation from this method should be closest to the true cell membrane boundary. It uses cell-surface marker antibodies to target epithelial markers (ATP1A1, E-Cadherin) and immune markers (pan-lymphocyte: CD45). This method can split nuclei, define cells missing a nucleus, and identify multinucleate cells. Nuclei that overlap with anucleate cells are assigned to the cell.
Segment cells based on expansion from the nucleus to the cell interior stain edge: This method includes both a deep learning model and a nuclear expansion method using the interior stain to infer cell boundaries. It uses the interior stain (18S rRNA marker) and the DAPI stain for nuclei.
Nuclear expansion: For cells that do not have boundary or interior stains, segment cells with a nuclear (DAPI) expansion distance of 5 µm or until another cell boundary is encountered (described in more detail on the XOA segmentation algorithms page).
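The priority order above can be expressed as a simple fallback chain. In this sketch the method keys and the `pick_segmentation` helper are hypothetical names; a value of `None` stands for "this method produced no result for the cell".

```python
# Methods in priority order, highest first (names are illustrative only).
PRIORITY = ("boundary", "interior", "nuclear_expansion")


def pick_segmentation(results):
    """Return (method, result) for the highest-priority available result."""
    for method in PRIORITY:
        if results.get(method) is not None:
            return method, results[method]
    return None, None


# A cell with no boundary-stain result falls back to the interior stain.
method, polygon = pick_segmentation(
    {"boundary": None, "interior": "interior-polygon", "nuclear_expansion": "expanded-polygon"}
)
```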

In Xenium Ranger:

If --boundary-stain is enabled (default), the algorithm will do cell segmentation using the selected boundary stain and DAPI nuclear expansion for any cells that do not have a boundary stain. If disabled, Xenium Ranger will not use the boundary stain segmentation method.
If --interior-stain is enabled (default), the algorithm will do interior segmentation and expansion with the selected stain and DAPI nuclear expansion for any cells that do not have an interior stain. If disabled, Xenium Ranger will not use the interior stain segmentation method.

Next, if boundary cell segmentation results are available, Xenium Ranger can assign nuclei that overlap significantly with a boundary-segmented cell to that cell. A boundary-segmented cell can have multiple overlapping nuclei. For each nucleus, if 50% or more of that nucleus overlaps with the cell boundary, the overlapping portion of the nucleus is assigned to that cell. For each nucleus, if 50% or more of that nucleus is outside the boundary-segmented cell, the algorithm designates it as a new nucleus outside of that cell and continues to the next prioritized segmentation method (interior segmentation or free expansion). This guarantees that a nucleus will never partially overlap a boundary-segmented cell in the final result. If the nucleus and cell overlap is < 50%, it will be removed from the outputs.
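The 50% overlap rule can be sketched with boolean masks as below. Two points are assumptions made for this sketch, since the text does not settle them: a nucleus with exactly 50% overlap is treated as assigned, and for a nucleus with less than 50% overlap the overlapping pixels are trimmed so that the remaining nucleus lies fully outside the cell. The function name `assign_nucleus` is hypothetical.

```python
import numpy as np


def assign_nucleus(nucleus_mask, cell_mask):
    """Apply the 50% overlap rule to one nucleus and one boundary-segmented cell.

    Returns (status, mask):
      "assigned":   mask is the part of the nucleus inside the cell.
      "unassigned": mask is the part of the nucleus outside the cell, which
                    moves on to the next segmentation method.
    """
    nucleus_area = int(nucleus_mask.sum())
    if nucleus_area == 0:
        raise ValueError("empty nucleus mask")
    overlap = nucleus_mask & cell_mask
    fraction_inside = overlap.sum() / nucleus_area
    if fraction_inside >= 0.5:  # tie handling is an assumption
        return "assigned", overlap
    return "unassigned", nucleus_mask & ~cell_mask


cell = np.zeros((1, 10), dtype=bool)
cell[0, 0:5] = True

mostly_inside = np.zeros((1, 10), dtype=bool)
mostly_inside[0, 2:6] = True      # 3 of 4 pixels inside the cell
mostly_outside = np.zeros((1, 10), dtype=bool)
mostly_outside[0, 4:10] = True    # 1 of 6 pixels inside the cell

status_in, mask_in = assign_nucleus(mostly_inside, cell)
status_out, mask_out = assign_nucleus(mostly_outside, cell)
```

In both outcomes the returned nucleus mask is either entirely inside or entirely outside the cell, which matches the stated guarantee that no nucleus partially overlaps a boundary-segmented cell.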

For the remaining nuclei, if interior segmentation is available, the algorithm then finds the nuclei that have significant overlap with the interior stain and expands those nuclei with the interior stain mask. Finally, the remaining nuclei that do not overlap significantly with either boundary-segmented or interior-segmented cells are expanded isotropically by the --expansion-distance parameter (5 µm default in v2.0 and later).
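Isotropic nuclear expansion can be approximated by assigning each background pixel to its nearest nucleus when that nucleus lies within the expansion distance. The sketch below is a brute-force NumPy illustration suited only to small arrays. The `expand_nuclei` name and the 0.2125 µm pixel size are assumptions, and the sketch does not model expansion being blocked by existing boundary- or interior-segmented cells.

```python
import numpy as np

PIXEL_SIZE_UM = 0.2125  # assumed image pixel size in microns


def expand_nuclei(labels, distance_um=5.0, pixel_size_um=PIXEL_SIZE_UM):
    """Grow each labeled nucleus outward by up to `distance_um`.

    Background pixels take the label of the nearest nucleus pixel if it is
    within the expansion radius, so neighboring expansions stop where they meet.
    """
    radius_px = distance_um / pixel_size_um
    seeds = np.argwhere(labels > 0)      # row-major order
    seed_labels = labels[labels > 0]     # same row-major order as `seeds`
    expanded = labels.copy()
    if len(seeds) == 0:
        return expanded
    for y, x in np.argwhere(labels == 0):
        d2 = (seeds[:, 0] - y) ** 2 + (seeds[:, 1] - x) ** 2
        nearest = int(np.argmin(d2))
        if d2[nearest] <= radius_px ** 2:
            expanded[y, x] = seed_labels[nearest]
    return expanded


nuclei = np.zeros((1, 9), dtype=np.int32)
nuclei[0, 1] = 1
nuclei[0, 7] = 2
# Use a 1 µm pixel and a 2 µm expansion so the result is easy to read.
cells = expand_nuclei(nuclei, distance_um=2.0, pixel_size_um=1.0)
```

With the assumed 0.2125 µm pixel size, the default 5 µm distance corresponds to roughly 23.5 pixels.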

Starting in Xenium Ranger v4.0, cells can be resegmented using the multimodal cell segmentation algorithm described above, with an additional step in the boundary stain method that better detects and segments larger cell types. This feature is enabled by adding --segment-large-cells to the resegment command.

Training the large cell segmentation model

The large cell segmentation algorithm is trained separately from the standard multimodal cell segmentation model to optimize for large cell types. The training data includes a variety of human and mouse tissue types. Internal research suggests this method could improve segmentation for cell types such as human skeletal muscle cells, dorsal root ganglion (DRG) neurons, and adipocytes, as well as mouse DRG neurons and cardiomyocytes.

Downsampling image data

The Xenium segmentation algorithm infers cell boundaries by looking at cells one 1024 x 1024 pixel patch at a time. Large cells may not be captured by a single patch, which can result in incomplete and fragmented segmentation. To address this, the large cell segmentation algorithm first downsamples the 2D DAPI and boundary stain images (e.g., ch0001_atp1a1_e-cadherin_cd45.ome.tif) by a factor of 4 in each dimension (0.25x). The algorithm uses linear interpolation to downsample image pixels, which creates a more continuous transition in pixel values. This process enables the algorithm to more consistently detect the entire cell shape within a single patch instead of fragmented views across patches.
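Downsampling by 4x with linear interpolation can be sketched in NumPy as below. The `downsample_linear` helper and its pixel-center alignment are assumptions for illustration; the source does not state which resampling routine Xenium Ranger uses.

```python
import numpy as np


def downsample_linear(image, factor=4):
    """Downsample a 2D image by `factor` using bilinear interpolation.

    Each output pixel is sampled at the center of the block of input pixels
    it represents (pixel-center alignment, assumed for this sketch).
    """
    h, w = image.shape
    out_h, out_w = h // factor, w // factor
    # Input-space coordinates of each output pixel center.
    ys = (np.arange(out_h) + 0.5) * factor - 0.5
    xs = (np.arange(out_w) + 0.5) * factor - 0.5
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = ys - y0
    wx = xs - x0
    img = image.astype(np.float64)
    # Interpolate along x on the two bracketing rows, then along y.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy)[:, None] + bottom * wy[:, None]


# A horizontal intensity ramp: pixel value equals its column index.
ramp = np.tile(np.arange(8, dtype=float), (8, 1))
small = downsample_linear(ramp, factor=4)
```

After 4x downsampling, a 1024 x 1024 patch covers a 4096 x 4096 pixel region of the original image, so a cell that previously spanned several patches is more likely to fit within one.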

Segmenting cell boundaries

At the cell boundary stain method step of multimodal cell segmentation, Xenium Ranger will segment cells in parallel with the standard cell boundary method (1x model; "Segment cells based on their cell boundary stain" described above) and the large cell method (0.25x model).

Consolidating segmentation results

For cells segmented by both cell boundary methods, Xenium Ranger determines which method's results to save by comparing an internal segmentation quality score for each result. Cells that are segmented by the large cell segmentation method are labeled "Segmented by boundary stain at 0.25x" and those segmented by the standard cell boundary method are labeled "Segmented by boundary stain". For cells that cannot be segmented by either boundary method, the algorithm proceeds to the interior stain and nuclear expansion methods.
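The consolidation step can be pictured as a per-cell comparison of two candidate results. In the sketch below, the `consolidate` helper, the `score` field, the assumption that a higher score is better, and the choice to prefer the standard method on a tie are all hypothetical; the quality score itself is internal to Xenium Ranger and is not described in the source.

```python
LABEL_STANDARD = "Segmented by boundary stain"
LABEL_LARGE = "Segmented by boundary stain at 0.25x"


def consolidate(standard, large):
    """Choose between the 1x and 0.25x boundary results for one cell.

    Each argument is a dict with a numeric "score" key, or None if that
    method did not segment the cell. Returns None when neither method
    succeeded, meaning the cell falls through to later methods.
    """
    if standard is None and large is None:
        return None
    if large is None or (standard is not None and standard["score"] >= large["score"]):
        return {**standard, "method": LABEL_STANDARD}
    return {**large, "method": LABEL_LARGE}


chosen = consolidate({"score": 0.4}, {"score": 0.9})
only_standard = consolidate({"score": 0.2}, None)
neither = consolidate(None, None)
```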

Xenium Ranger can import a variety of community-developed and XOA segmentation formats. XOA segmentation algorithm changes by software version are summarized in this table.

The methods used to incorporate new segmentations fall under three scenarios (read more below):

1. Import nucleus and cell labeled segmentation masks (TIFF or NPY), where each pixel is an integer corresponding to the cell ID
2. Import nucleus and cell segmentation polygons (GeoJSON)
3. Import transcript-based segmentations

For every scenario, a unique random ID is assigned to each cell in the same string format used by the XOA pipeline.
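As an illustration of generating unique random cell IDs, the sketch below encodes a random 32-bit integer as eight letters in the range a to p (hexadecimal digits shifted so that 0 maps to "a" and f maps to "p"), followed by a "-1" dataset suffix. This encoding is an assumption about the XOA string format made for this sketch, and the helper names `encode_cell_id` and `random_cell_ids` are hypothetical.

```python
import random


def encode_cell_id(value, dataset_suffix=1):
    """Encode a 32-bit integer as 8 letters (a-p) plus a dataset suffix."""
    hex_digits = f"{value:08x}"
    shifted = "".join(chr(ord("a") + int(digit, 16)) for digit in hex_digits)
    return f"{shifted}-{dataset_suffix}"


def random_cell_ids(n, seed=None):
    """Draw n distinct random 32-bit values and encode each as a cell ID."""
    rng = random.Random(seed)
    values = rng.sample(range(2**32), n)  # sampling without replacement
    return [encode_cell_id(value) for value in values]


ids = random_cell_ids(100, seed=0)
```

Sampling without replacement guarantees that the IDs are unique within one run.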

In scenario 1, if the user only imports a nuclear segmentation mask, then a new cell segmentation is generated by nuclear expansion. If both nuclei and cells are imported, then Xenium Ranger will inspect the masks for consistency.

If nuclei and cells are imported, imported nuclei are treated as nuclei and imported cells are treated in the same way as boundary-segmented cells (described above for resegment pipeline). An imported cell can have multiple overlapping nuclei. For each nucleus, if 50% or more of that nucleus overlaps with the cell boundary, the overlapping portion of the nucleus is assigned to that cell. For each nucleus, if 50% or more of that nucleus is outside the imported cell, the algorithm designates it as a new nucleus outside of that cell and continues to the next prioritized segmentation method (nuclear expansion). This guarantees that a nucleus will never partially overlap an imported cell in the final result. If the nucleus and cell overlap is < 50%, it will be removed from the outputs.
If only cells are imported, Xenium Ranger will produce two polygon sets and masks in the cells.zarr.zip file, where the polygon set and mask that are usually reserved for nuclei will be empty.
If only nuclei are imported, Xenium Ranger will isotropically expand using the --expansion-distance parameter (5 µm default in v2.0 and later).

In scenario 2, Xenium Ranger first takes the input GeoJSON polygons and converts them into labeled masks. Given the flexibility of the GeoJSON format, it is possible the input polygons do not fit neatly into a mask. For example, two polygons could overlap one another. In the process of converting polygons into masks, Xenium Ranger detects polygons that overlap one another and marks the overlapping pixels as ambiguous. The ambiguous pixels are then resolved by assigning the pixel to the object with the most neighboring pixels. Metrics are generated to explain how many ambiguous pixels were found. For polygons with holes ("non-simple polygons"), the holes are removed. For cells defined as multipolygons, the cell is removed entirely. Metrics are generated to report these removals. After masks have been generated, the remaining methods follow scenario 1.
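The ambiguous-pixel step can be sketched as follows, starting from polygons that have already been rasterized to boolean masks. Each pixel covered by more than one object is assigned to the candidate object with the most unambiguous pixels in its 8-pixel neighborhood. The `resolve_overlaps` name, the 3 x 3 neighborhood, and the tie-break toward the lowest label are assumptions for this sketch.

```python
import numpy as np


def resolve_overlaps(object_masks):
    """Merge per-object boolean masks into one labeled mask.

    Returns (labels, n_ambiguous), where labels uses 1..n for the objects
    in input order and n_ambiguous counts pixels claimed by several objects.
    """
    stack = np.stack(object_masks).astype(bool)   # shape (n, H, W)
    coverage = stack.sum(axis=0)
    labels = np.zeros(coverage.shape, dtype=np.int32)
    for index, mask in enumerate(stack, start=1):
        labels[mask & (coverage == 1)] = index    # unambiguous pixels only
    ambiguous = coverage > 1
    resolved = labels.copy()
    padded = np.pad(labels, 1)                    # zero border for edge pixels
    for y, x in zip(*np.nonzero(ambiguous)):
        window = padded[y:y + 3, x:x + 3]         # 3x3 neighborhood of (y, x)
        candidates = np.nonzero(stack[:, y, x])[0] + 1
        counts = [int((window == c).sum()) for c in candidates]
        resolved[y, x] = candidates[int(np.argmax(counts))]
    return resolved, int(ambiguous.sum())


# Object 1 fills columns 0-3 of all rows; object 2 is a strip on row 1, columns 3-5.
a = np.zeros((3, 6), dtype=bool)
a[:, 0:4] = True
b = np.zeros((3, 6), dtype=bool)
b[1, 3:6] = True
mask, n_ambiguous = resolve_overlaps([a, b])
```

The single contested pixel at row 1, column 3 has five neighbors belonging to object 1 and one belonging to object 2, so it is assigned to object 1.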

In scenario 3, when importing a transcript-based segmentation, Xenium Ranger records the cell assignments for each of the transcripts. Subsequently, all results are recomputed using the imported transcript assignments. When constructing the cell-feature matrix, Xenium Ranger uses the transcript quality score from the transcripts output file and only includes transcripts with Q-score ≥ 20. Any cell that has only low-quality transcripts will therefore appear with zero transcripts in the cell-feature matrix file.
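The effect of the Q-score filter on the cell-feature matrix can be illustrated with a small counting sketch. The record fields `cell_id`, `feature_name`, and `qv`, and the `cell_feature_counts` helper, are assumptions for illustration rather than a description of the actual file parsing.

```python
from collections import Counter

MIN_QV = 20  # Q-score threshold stated in the text


def cell_feature_counts(transcripts, min_qv=MIN_QV):
    """Count transcripts per cell and feature, ignoring low-quality ones.

    Every cell with at least one assigned transcript gets an entry, even if
    all of its transcripts fall below the threshold.
    """
    counts = {}
    for transcript in transcripts:
        cell_counts = counts.setdefault(transcript["cell_id"], Counter())
        if transcript["qv"] >= min_qv:
            cell_counts[transcript["feature_name"]] += 1
    return counts


transcripts = [
    {"cell_id": "cell-a", "feature_name": "GENE_A", "qv": 40},
    {"cell_id": "cell-a", "feature_name": "GENE_A", "qv": 12},
    {"cell_id": "cell-a", "feature_name": "GENE_B", "qv": 20},
    {"cell_id": "cell-b", "feature_name": "GENE_A", "qv": 10},
]
matrix = cell_feature_counts(transcripts)
```

Here "cell-b" remains in the matrix but has zero counts, because its only transcript is below the threshold.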

As mentioned above, there can potentially be issues in converting imported polygons into a mask. For the case of importing transcript assignments, Xenium Ranger will not try to convert the visualization polygons into a mask. Instead, it will generate an empty mask and leave the polygons untouched.

To combine spatial data in the --viz-polygons GeoJSON with transcript data in the --transcript-assignment segmentation CSV, Xenium Ranger matches the CSV file's integer suffix (e.g., "3" in cell = "CRc17aaabcd-3" for a Baysor ID) to the GeoJSON file's integer value (e.g., "cell":3 for a Baysor ID). Every cell in the visualized polygons must have a transcript assigned to it or Xenium Ranger will error. See this Knowledge Base article for information about cleaning transcript-based segmentation inputs.
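The ID matching described above can be sketched as parsing the trailing integer from each CSV cell ID and checking it against the integer cell values from the GeoJSON. The helper names `csv_cell_index` and `match_polygons` are hypothetical, and the inputs are plain Python lists rather than parsed files.

```python
import re


def csv_cell_index(cell_id):
    """Extract the trailing integer from an ID such as "CRc17aaabcd-3"."""
    match = re.search(r"-(\d+)$", cell_id)
    if match is None:
        raise ValueError(f"no integer suffix in cell ID: {cell_id!r}")
    return int(match.group(1))


def match_polygons(geojson_cells, assigned_cell_ids):
    """Check that every polygon has at least one assigned transcript.

    geojson_cells:     integer cell values taken from the GeoJSON polygons.
    assigned_cell_ids: non-empty cell ID strings from the transcript CSV.
    Returns the sorted polygon cell values, or raises if any polygon has no
    transcript, mirroring the error condition described in the text.
    """
    assigned = {csv_cell_index(cell_id) for cell_id in assigned_cell_ids}
    missing = sorted(cell for cell in geojson_cells if cell not in assigned)
    if missing:
        raise ValueError(f"polygons without transcripts: {missing}")
    return sorted(set(geojson_cells))


matched = match_polygons([3, 7], ["CRc17aaabcd-3", "CRc17aaabcd-7", "CRc17aaabcd-3"])
```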
