Archiving Xenium Data

The Xenium platform aims to support the FAIR principles of data findability, accessibility, interoperability, and reusability, so that newly generated Xenium data is easy to share for collaborative analysis and findings from published Xenium data are easy to reproduce.

What Xenium output should I keep for archival storage for reanalysis and grant funding requirements?

We recommend archiving Xenium raw data outputs, which consist of:

Decoded transcripts with assigned Phred-scaled Q-Scores
High-resolution morphology images

Decoded transcripts are provided in .zarr and .parquet formats. Morphology images are provided in OME-TIFF (.ome.tif) format. These data should be archived to fulfill grant funding requirements and to allow reanalysis, and they may be submitted to repositories such as GEO. All other Xenium outputs are derived from these raw data in Xenium Onboard Analysis, can be rederived after a Xenium instrument run, and are not strictly necessary for long-term archival and reproducibility.
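
As an illustration of preparing the raw outputs for archival, the sketch below builds a size and SHA-256 manifest that can be stored next to the archived files and rechecked after transfer. It is a minimal example, not a 10x Genomics tool. The filename patterns in RAW_PATTERNS are assumptions and should be adjusted to match the files in your own output bundle.

```python
import hashlib
from pathlib import Path

# Assumed filename patterns for the raw outputs (decoded transcripts and
# morphology images). These are illustrative; check them against your bundle.
RAW_PATTERNS = ("transcripts.parquet", "transcripts.zarr*", "morphology*.ome.tif")


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images are never read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(output_dir, patterns=RAW_PATTERNS):
    """Return {relative_path: (size_bytes, sha256)} for files matching the patterns."""
    root = Path(output_dir)
    manifest = {}
    for pattern in patterns:
        # rglob searches subdirectories too; directories themselves are skipped.
        for path in sorted(root.rglob(pattern)):
            if path.is_file():
                key = path.relative_to(root).as_posix()
                manifest[key] = (path.stat().st_size, sha256_of(path))
    return manifest
```

Running build_manifest again on the restored copy and comparing the two dictionaries is a simple way to confirm that an archive is intact.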

Additional detail on Xenium raw data output:

A Xenium Q-Score indicates the probability that the detected object exists and was correctly identified by the decoding algorithm. All decoded transcript Q-Scores are output in the transcripts files. The cells and cell-feature matrix output files in the Xenium output bundle are filtered to Q-Score ≥ 20. For more details, see our Overview of Xenium Algorithms support page.
Xenium morphology images (including DAPI, cell segmentation, and protein) will always be provided at the same resolution that our onboard segmentation algorithm uses as input. This ensures that you can benefit from improvements to our segmentation model as we add to its training over time, or run your own segmentation methods if you choose. Our off-instrument reanalysis package, Xenium Ranger, enables you to easily rerun segmentation or import your own segmentation results to generate derived outputs (e.g., cell-feature matrix) and view them in Xenium Explorer for RNA-only and RNA + protein datasets.
We will stand by these FAIR principles with future capabilities. High-resolution morphology images will continue to be included in the Xenium output bundle for our onboard multimodal segmentation method.
Other outputs from Xenium Onboard Analysis (XOA) are derived data from these raw outputs, and the community can recapitulate them from Xenium raw data.
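
To make the Phred scale concrete, the sketch below uses the conventional Phred definition, Q = -10 · log10(P_error), under which Q20 corresponds to an estimated error probability of 1%. It also shows a Q-Score ≥ 20 filter of the kind applied to the cells and cell-feature matrix outputs. The record layout and the field name "qv" are assumptions for illustration; how Xenium calibrates its Q-Scores is described on the Overview of Xenium Algorithms page, not here.

```python
import math


def q_to_error_probability(q):
    """Conventional Phred scaling: estimated error probability for a Q-Score."""
    return 10 ** (-q / 10)


def error_probability_to_q(p_error):
    """Inverse mapping: Q-Score for a given error probability."""
    return -10 * math.log10(p_error)


def filter_by_q(transcripts, min_q=20):
    """Keep transcript records whose Q-Score is at least min_q.

    Each record is a dict with a "qv" key (a hypothetical field name).
    """
    return [t for t in transcripts if t["qv"] >= min_q]
```

Because all decoded transcripts are kept in the transcripts files together with their Q-Scores, a different threshold can be applied during reanalysis without returning to the instrument.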

Xenium raw data is a reduced form of the low-level internal sensor data, as described in Overview of Xenium Algorithms. It preserves the details needed to assess decoded transcript quality while abstracting away low-level details of the instrumentation and assay, which require calibration and specialized methods that will change over time as the platform improves and gains new capabilities.

On-instrument processing of Xenium internal sensor data — i.e., the 3D per-pixel values that Xenium Analyzer’s internal image sensor captures across multiple FOVs, multiple fluorescence channels, and multiple cycles of chemistry and imaging processing — is closely tied to Xenium optics. Consequently, Xenium internal sensor data cannot be reanalyzed after processing with Xenium Onboard Analysis.

Internal sensor data is not practical to store or reanalyze (tens of terabytes of data per sample). In the spirit of scientific reproducibility, it is more useful to store the Xenium decoded transcripts with assigned Phred-scaled Q-Scores and the morphology images for reanalysis (see the typical output directory sizes below).

To add further transparency and to supplement existing methods for QC of Xenium data, downsampled RNA diagnostic images are available in the Xenium auxiliary output directory in Xenium Onboard Analysis v1.6 and later. In XOA v1.7 and later, these images are also available in the Analysis Summary. These images are not needed for raw data archival, but they can help build confidence in the robustness of Xenium's decoding algorithm.

Each tissue region selected on the Xenium Analyzer produces a separate output directory with images, decoded transcripts, cell-feature count matrices, and more.

The file formats were deliberately chosen to balance compatibility, performance, and file size. There is no simple formula for calculating the output directory size from the Xenium Analyzer region area alone. Output size also depends on sample-specific factors such as tissue shape, number of cells, number of decoded transcripts, and percentage of high-quality transcripts.
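
Since the size cannot be predicted from region area alone, measuring the output directories from a pilot run is often the most reliable input for a storage budget. The sketch below sums file sizes under a directory using only the Python standard library. It assumes decimal gigabytes (1 GB = 10^9 bytes), which may differ from the convention your storage system reports.

```python
import os


def directory_size_bytes(path):
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


def directory_size_gb(path):
    """Directory size in decimal gigabytes (assumption: 1 GB = 1e9 bytes)."""
    return directory_size_bytes(path) / 1e9
```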

To help budget for data storage requirements, here are some examples based on estimates and 10x Genomics public datasets.

The tables below show estimated output directory sizes (GB) as a function of tissue area (cm²) and transcript density (transcripts per µm²), assuming the sample has similar properties to a model mouse brain coronal section with the following metrics:

0.72 cm² tissue area
11 Z-slices
162k cells
62.4M transcripts
0.25 cells per 100 µm²
107 transcripts > Q20 per 100 µm²
80% of transcripts > Q20

Estimates are based on data generated with the cell segmentation staining workflow and multimodal cell segmentation.

Xenium v1 estimated output directory size (GB) with XOA v4.0:

Sample source	Tissue area (cm²)	Estimated directory size (total transcripts) at 0.5 transcripts/µm²	Estimated directory size (total transcripts) at 1 transcript/µm²
Core needle biopsy	0.01	0.3 GB (500k)	0.3 GB (1M)
Coronal mouse brain hemisphere	0.5	14 GB (25M)	15 GB (50M)
Full coronal mouse brain	1	29 GB (50M)	31 GB (100M)
Tissue section covering entire sample area	2.35	67 GB (117M)	72 GB (235M)
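
For a tissue area that falls between two rows of the table, one rough approach is to interpolate linearly between the neighboring rows. The sketch below does this for the 1 transcript/µm² column. It is only a planning aid under the same model assumptions as the table, not a formula provided by 10x Genomics, and it deliberately refuses to extrapolate beyond the tabulated range.

```python
from bisect import bisect_right

# Xenium v1 estimates at 1 transcript/µm², copied from the table above.
V1_AREAS_CM2 = [0.01, 0.5, 1.0, 2.35]
V1_SIZES_GB = [0.3, 15.0, 31.0, 72.0]


def interpolate_size_gb(area_cm2, areas=V1_AREAS_CM2, sizes=V1_SIZES_GB):
    """Linearly interpolate an estimated directory size between table rows."""
    if not areas[0] <= area_cm2 <= areas[-1]:
        raise ValueError("area is outside the tabulated range")
    # Index of the row just above area_cm2, clamped so the last row is usable.
    upper = min(bisect_right(areas, area_cm2), len(areas) - 1)
    x0, x1 = areas[upper - 1], areas[upper]
    y0, y1 = sizes[upper - 1], sizes[upper]
    return y0 + (y1 - y0) * (area_cm2 - x0) / (x1 - x0)
```

For example, interpolate_size_gb(0.75) falls halfway between the 0.5 cm² and 1 cm² rows and gives 23 GB.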

Xenium Prime 5K estimated output directory size (GB) with XOA v4.0:

Sample source	Tissue area (cm²)	Estimated directory size (total transcripts) at 1.5 transcripts/µm²	Estimated directory size (total transcripts) at 3 transcripts/µm²	Estimated directory size (total transcripts) at 12 transcripts/µm²
Core needle biopsy	0.01	0.3 GB (1.5M)	0.4 GB (3M)	0.7 GB (12M)
Coronal mouse brain hemisphere	0.5	16 GB (75M)	19 GB (150M)	35 GB (600M)
Full coronal mouse brain	1	33 GB (150M)	39 GB (300M)	70 GB (1.2B)
Tissue section covering entire sample area	2.35	77 GB (352M)	91 GB (705M)	165 GB (2.8B)
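
The total transcript counts in parentheses follow directly from tissue area and transcript density: 1 cm² is 10^8 µm², so total transcripts = area (cm²) × 10^8 × density (transcripts/µm²). A minimal sketch of that arithmetic, with a function name chosen here for illustration:

```python
UM2_PER_CM2 = 1e8  # 1 cm = 10,000 µm, so 1 cm² = 1e8 µm²


def total_transcripts(area_cm2, density_per_um2):
    """Expected number of decoded transcripts for a given area and density."""
    return area_cm2 * UM2_PER_CM2 * density_per_um2
```

For example, a 0.5 cm² section at 1.5 transcripts/µm² gives 75 million transcripts, which matches the coronal mouse brain hemisphere row.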

Estimated output directory size (GB) for Xenium In Situ Gene and Protein Expression with Cell Segmentation Staining datasets run with XOA v4.0. These calculations are for a sample with an estimated transcript density of 1 transcript/µm²:

Sample source	Tissue area (cm²)	Estimated directory size for 12-protein marker dataset	Estimated directory size for 27-protein marker dataset
Core needle biopsy	0.01	0.5 GB	0.6 GB
Coronal mouse brain hemisphere	0.5	23 GB	31 GB
Full coronal mouse brain	1	47 GB	64 GB
Tissue section covering entire sample area	2.35	110 GB	149 GB

The 10x Genomics public datasets page provides additional examples of several sample configurations. For example:

Dataset	Chemistry	Tissue area (cm²)	Total transcripts (millions)	Output directory size (GB)
Mouse brain tiny subset	Xenium v1	~0.17	9	3.5
Mouse brain full coronal section	Xenium v1	0.66	34	13.0
FFPE human breast, Tissue 1	Xenium v1	0.90	68	24.4
FFPE human breast using the entire sample area, Replicate 1	Xenium v1	2.28	106	51.9
FFPE human ovarian cancer	Xenium Prime	0.86	120	26.7
FF human ovary	Xenium Prime	1.98	2,164	144
FFPE human renal cell carcinoma	Xenium v1 + protein	0.66	63	36.4
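
The public datasets also illustrate why neither area nor transcript count alone predicts directory size. The sketch below computes GB per cm² and GB per million transcripts for each row of the table. The values are copied from the table (the tiny subset area is approximate), and the function name is chosen here for illustration.

```python
# (dataset, tissue area in cm², total transcripts in millions, directory size in GB)
PUBLIC_DATASETS = [
    ("Mouse brain tiny subset", 0.17, 9, 3.5),
    ("Mouse brain full coronal section", 0.66, 34, 13.0),
    ("FFPE human breast, Tissue 1", 0.90, 68, 24.4),
    ("FFPE human breast, entire sample area, Replicate 1", 2.28, 106, 51.9),
    ("FFPE human ovarian cancer", 0.86, 120, 26.7),
    ("FF human ovary", 1.98, 2164, 144.0),
    ("FFPE human renal cell carcinoma", 0.66, 63, 36.4),
]


def storage_ratios(datasets=PUBLIC_DATASETS):
    """Return {dataset: (GB per cm², GB per million transcripts)}."""
    return {
        name: (size_gb / area_cm2, size_gb / transcripts_m)
        for name, area_cm2, transcripts_m, size_gb in datasets
    }


if __name__ == "__main__":
    for name, (gb_per_cm2, gb_per_m) in storage_ratios().items():
        print(f"{name}: {gb_per_cm2:.1f} GB/cm², {gb_per_m:.3f} GB per million transcripts")
```

Across these datasets, the per-area ratio ranges from roughly 20 to 73 GB per cm², so a single per-area multiplier is only a coarse starting point.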
