The Xenium platform aims to support the FAIR principles of data findability, accessibility, interoperability, and reusability, so that newly generated Xenium data is easy to share for collaborative analysis and findings from published Xenium data are easy to reproduce.
Which Xenium outputs should I archive for reanalysis and to meet grant funding requirements?
We recommend archiving Xenium raw data outputs, which consist of:
- Decoded transcripts with assigned Phred-scaled Q-Scores
- High-resolution morphology images
Decoded transcripts are provided in .zarr and .parquet formats, and morphology images are provided in .ome.tif format. Archive these data to fulfill grant funding requirements and to enable reanalysis; they may also be submitted to repositories such as GEO. All other Xenium outputs are derived from these raw data by Xenium Onboard Analysis (XOA), can be rederived after a Xenium instrument run, and are not strictly necessary for long-term archival and reproducibility.
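As a rough illustration of the archival selection described above, the sketch below walks an output directory and lists files that look like raw data based on the stated formats. The function name `select_raw_outputs` is hypothetical. The rule that transcript files have names beginning with `transcripts`, and that the .zarr output is a single file rather than an unpacked directory, are assumptions made for this example; check them against your own output bundle before relying on it.

```python
from pathlib import Path


def select_raw_outputs(bundle_dir):
    """Return sorted relative paths of files that look like Xenium raw data.

    Hypothetical helper: the naming rules below are assumptions, not a
    documented specification of the output bundle.
    """
    bundle = Path(bundle_dir)
    selected = []
    for path in bundle.rglob("*"):
        if not path.is_file():
            continue
        name = path.name.lower()
        # Assumption: decoded transcript files are named "transcripts*"
        # and use the .parquet or .zarr formats named in the text.
        is_transcripts = name.startswith("transcripts") and (
            name.endswith(".parquet") or ".zarr" in name
        )
        # Morphology images use the .ome.tif extension.
        is_morphology = name.endswith(".ome.tif")
        if is_transcripts or is_morphology:
            selected.append(path.relative_to(bundle).as_posix())
    return sorted(selected)
```

Calling `select_raw_outputs("path/to/output_directory")` returns a list of relative paths that can be passed to whatever copy or archive tool you already use. Derived files such as a cell-level .parquet table are skipped under these assumptions.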
Additional detail on Xenium raw data output:
- A Xenium Q-Score indicates the probability that the detected object exists and was correctly identified by the decoding algorithm. All decoded transcript Q-Scores are output in the transcripts files. The cells and cell-feature matrix output files in the Xenium output bundle are filtered to Q-Score ≥ 20. For more details, see our Overview of Xenium Algorithms support page.
- Xenium morphology images (including DAPI, cell segmentation, and protein) will always be provided at the same resolution that our onboard segmentation algorithm uses as input. This ensures that you can benefit from improvements to our segmentation model as we add to its training over time, or run your own segmentation methods if you choose. Our off-instrument reanalysis package, Xenium Ranger, enables you to easily rerun segmentation or import your own segmentation results to generate derived outputs (e.g., cell-feature matrix) and view them in Xenium Explorer for RNA-only and RNA + protein datasets.
- We will stand by these FAIR principles with future capabilities. High-resolution morphology images will continue to be included in the Xenium output bundle for our onboard multimodal segmentation method.
- Other outputs from Xenium Onboard Analysis (XOA) are derived data from these raw outputs, and the community can recapitulate them from Xenium raw data.
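To make the Q-Score threshold above concrete, here is a minimal sketch of the standard Phred relationship, Q = -10 log10(P_error), and of filtering transcripts at Q-Score ≥ 20. It assumes the Xenium Q-Score follows the usual Phred convention, as "Phred-scaled" suggests. The record layout and the `qv` and `feature_name` keys are illustrative assumptions, not a documented schema.

```python
def error_probability(q_score):
    """Convert a Phred-scaled score to an error probability.

    Phred convention: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10).
    """
    return 10 ** (-q_score / 10)


def filter_by_q_score(transcripts, min_q=20, key="qv"):
    """Keep transcript records whose Q-Score is at least min_q.

    `key` is an assumed column name used only for this example.
    """
    return [t for t in transcripts if t[key] >= min_q]


# Illustrative records (made-up values, not real Xenium data).
example = [
    {"feature_name": "GeneA", "qv": 40.0},
    {"feature_name": "GeneB", "qv": 12.5},
    {"feature_name": "GeneC", "qv": 20.0},
]

kept = filter_by_q_score(example)
print([t["feature_name"] for t in kept])
print(error_probability(20))
```

Under the Phred convention, Q20 corresponds to an estimated 1% error probability. Because all decoded transcript Q-Scores are retained in the transcripts files, a different threshold can be applied during reanalysis by changing `min_q`.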
Xenium raw data is a reduction of the low-level internal sensor data, as described in the Overview of Xenium Algorithms. It preserves the details needed to assess decoded transcript quality while abstracting away low-level details of the instrumentation and assay. Those details require calibration and specialized methods that will change over time as the platform improves and gains new capabilities.
On-instrument processing of Xenium internal sensor data (i.e., the 3D per-pixel values that Xenium Analyzer’s internal image sensor captures across multiple fields of view (FOVs), multiple fluorescence channels, and multiple cycles of chemistry and imaging processing) is closely tied to Xenium optics. Consequently, Xenium internal sensor data cannot be reanalyzed after processing with Xenium Onboard Analysis.
Internal sensor data is not practical to store or reanalyze, as it amounts to tens of terabytes per sample. In the spirit of scientific reproducibility, it is more useful to store the Xenium decoded transcripts with assigned Phred-scaled Q-Scores and the morphology images for reanalysis (see the typical output directory sizes below).
To add further transparency and to supplement existing methods to QC Xenium data, downsampled RNA diagnostic images are available in the Xenium auxiliary output directory in Xenium Onboard Analysis v1.6 and later. In XOA v1.7 and later, these images are also available in the Analysis Summary. These images are not needed for raw data archival, but should be useful in gaining confidence in the robustness of Xenium's decoding algorithm.
Each tissue region selected on the Xenium Analyzer produces a separate output directory with images, decoded transcripts, cell-feature count matrices, and more.
The file formats were deliberately chosen to balance compatibility, performance, and file size. There is no simple formula for calculating the output directory size from the Xenium Analyzer region area alone. Output size also depends on sample-specific factors such as tissue shape, number of cells, number of decoded transcripts, and the percentage of high-quality transcripts.
To help budget for data storage requirements, here are some examples based on estimations and 10x Genomics public datasets.
The tables below show estimated output directory sizes (GB) as a function of tissue area (cm²) and transcript density (transcripts per µm²), assuming the sample has similar properties to a model mouse brain coronal section with the following metrics:
- 0.72 cm² tissue area
- 11 Z-slices
- 162k cells
- 62.4M transcripts
- 0.25 cells per 100 µm²
- 107 transcripts > Q20 per 100 µm²
- 80% of transcripts > Q20
Estimates are based on data generated with the cell segmentation staining workflow and multimodal cell segmentation.
Xenium v1 estimated output directory size (GB) with XOA v4.0:
| Sample source | Tissue area (cm²) | Estimated directory size (total transcripts) at 0.5 transcripts/µm² | Estimated directory size (total transcripts) at 1 transcript/µm² |
|---|---|---|---|
| Core needle biopsy | 0.01 | 0.3 GB (500k) | 0.3 GB (1M) | 
| Coronal mouse brain hemisphere | 0.5 | 14 GB (25M) | 15 GB (50M) | 
| Full coronal mouse brain | 1 | 29 GB (50M) | 31 GB (100M) | 
| Tissue section covering entire sample area | 2.35 | 67 GB (117M) | 72 GB (235M) | 
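
The total transcript counts shown in parentheses in these tables are consistent with tissue area multiplied by transcript density, after converting cm² to µm². A short sketch of that arithmetic follows. The function name is hypothetical, and the calculation covers only transcript counts; it does not estimate directory size.

```python
UM2_PER_CM2 = 1e8  # 1 cm = 1e4 µm, so 1 cm² = 1e8 µm²


def total_transcripts(tissue_area_cm2, density_per_um2):
    """Estimate total transcripts from tissue area (cm²) and density (per µm²)."""
    return tissue_area_cm2 * UM2_PER_CM2 * density_per_um2


# Tissue areas used in the tables, at 0.5 and 1 transcripts/µm².
for area in (0.01, 0.5, 1, 2.35):
    low = total_transcripts(area, 0.5)
    high = total_transcripts(area, 1)
    print(f"{area} cm²: {low / 1e6:.1f}M and {high / 1e6:.1f}M transcripts")
```

For example, 0.5 cm² at 0.5 transcripts/µm² gives 25 million transcripts, matching the "Coronal mouse brain hemisphere" row.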
Xenium Prime 5K estimated output directory size (GB) with XOA v4.0:
| Sample source | Tissue area (cm²) | Estimated directory size (total transcripts) at 1.5 transcripts/µm² | Estimated directory size (total transcripts) at 3 transcripts/µm² | Estimated directory size (total transcripts) at 12 transcripts/µm² |
|---|---|---|---|---|
| Core needle biopsy | 0.01 | 0.3 GB (1.5M) | 0.4 GB (3M) | 0.7 GB (12M) | 
| Coronal mouse brain hemisphere | 0.5 | 16 GB (75M) | 19 GB (150M) | 35 GB (600M) | 
| Full coronal mouse brain | 1 | 33 GB (150M) | 39 GB (300M) | 70 GB (1.2B) | 
| Tissue section covering entire sample area | 2.35 | 77 GB (352M) | 91 GB (705M) | 165 GB (2.8B) | 
Estimated output directory size (GB) for Xenium In Situ Gene and Protein Expression with Cell Segmentation Staining datasets run with XOA v4.0. These calculations are for a sample with an estimated transcript density of 1 transcript/µm²:
| Sample source | Tissue area (cm²) | Estimated directory size for 12-protein marker dataset | Estimated directory size for 27-protein marker dataset |
|---|---|---|---|
| Core needle biopsy | 0.01 | 0.5 GB | 0.6 GB | 
| Coronal mouse brain hemisphere | 0.5 | 23 GB | 31 GB | 
| Full coronal mouse brain | 1 | 47 GB | 64 GB | 
| Tissue section covering entire sample area | 2.35 | 110 GB | 149 GB | 
The 10x Genomics public datasets page provides additional examples of several sample configurations. For example:
| Dataset | Chemistry | Tissue area (cm²) | Total transcripts (millions) | Output directory size (GB) |
|---|---|---|---|---|
| Mouse brain tiny subset | Xenium v1 | ~0.17 | 9 | 3.5 | 
| Mouse brain full coronal section | Xenium v1 | 0.66 | 34 | 13.0 | 
| FFPE human breast, Tissue 1 | Xenium v1 | 0.90 | 68 | 24.4 | 
| FFPE human breast using the entire sample area, Replicate 1 | Xenium v1 | 2.28 | 106 | 51.9 | 
| FFPE human ovarian cancer | Xenium Prime | 0.86 | 120 | 26.7 | 
| FF human ovary | Xenium Prime | 1.98 | 2,164 | 144 | 
| FFPE human renal cell carcinoma | Xenium v1 + protein | 0.66 | 63 | 36.4 |
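
When budgeting storage, it can help to look at how directory size relates to tissue area and transcript count in these public examples. The sketch below computes two simple ratios from the table values. The variable and function names are hypothetical, the "~0.17" area is treated as 0.17, and the ratios are descriptive only, not a sizing formula.

```python
# (dataset, tissue area in cm², total transcripts in millions, directory size in GB)
PUBLIC_DATASETS = [
    ("Mouse brain tiny subset", 0.17, 9, 3.5),
    ("Mouse brain full coronal section", 0.66, 34, 13.0),
    ("FFPE human breast, Tissue 1", 0.90, 68, 24.4),
    ("FFPE human breast, entire sample area, Replicate 1", 2.28, 106, 51.9),
    ("FFPE human ovarian cancer", 0.86, 120, 26.7),
    ("FF human ovary", 1.98, 2164, 144),
    ("FFPE human renal cell carcinoma", 0.66, 63, 36.4),
]


def storage_ratios(area_cm2, transcripts_millions, size_gb):
    """Return (GB per cm², GB per million transcripts) for one dataset."""
    return size_gb / area_cm2, size_gb / transcripts_millions


for name, area, transcripts, size in PUBLIC_DATASETS:
    gb_per_cm2, gb_per_million = storage_ratios(area, transcripts, size)
    print(f"{name}: {gb_per_cm2:.1f} GB/cm², "
          f"{gb_per_million:.2f} GB per million transcripts")
```

For these rows, size per unit area ranges from roughly 20 GB/cm² to over 70 GB/cm², which illustrates why region area alone does not determine output directory size.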