Create Single Cell References and Cell Type Annotations for Custom Panel Design

How is reference data used?

The Xenium Panel Designer (XPD) uses single cell reference data to model the expression profiles of the cell types present in your samples. XPD then uses this model to evaluate the risk of optical crowding, check for highly expressed genes, and optimally assign codewords to the genes you selected. It is therefore important that custom panels are designed and used with tissues that are representative of the reference data included in the panel design.

If a single panel will be used to study diverse samples, include more than one reference in the panel design to represent that diversity. The detection budget may exceed recommendations if a panel designed for one tissue is used on another tissue with relatively higher expression of certain genes or cell types (e.g., a panel designed for healthy tissue used on tumor tissue). Including a tumor reference in the panel design can mitigate this effect.

Input options

There are three options to specify up to five reference datasets in the Xenium Panel Designer:

Select one or more publicly available reference datasets from our list of pre-built references, choosing datasets whose tissue type and condition are similar to those of the samples you plan to run.
Upload your own annotated single cell gene expression data in MEX, HDF5, or CLOUPE format. See Creating Single Cell References for Xenium Custom Panel Design from Seurat or AnnData for additional guidance on reference file format conversion.
Choose a combination of public reference datasets and your own single cell reference datasets.

Reference data FAQ

To help you choose and compile reference data, review the following frequently asked questions about formatting and composition:

1. I want to use a public annotated single cell reference, but it has been preprocessed. What should I do with the cells that were filtered out during preprocessing and therefore do not have an annotation?

It is best to include all available cells in your analysis. For cells lacking annotations, you can either re-annotate them yourself or label them as "Unknown" to ensure all detected biology is represented.
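If you still have the unfiltered count matrix for the public dataset, the "Unknown" labeling step can be sketched as below. This is a minimal Python illustration, assuming the matrix barcodes and the published annotations are already loaded as a list and a dict; `fill_unknown` is a hypothetical helper, not part of XPD or Cell Ranger.

```python
def fill_unknown(matrix_barcodes, published_labels, fill="Unknown"):
    """Return a barcode -> label dict covering every barcode in the matrix.

    Barcodes missing from the published annotation (e.g. cells filtered out
    during the original preprocessing) receive the `fill` label instead of
    being dropped.
    """
    return {bc: published_labels.get(bc, fill) for bc in matrix_barcodes}


# Toy example (hypothetical barcodes): the third cell was filtered out of
# the published annotation.
barcodes = ["AAAC-1", "AAAG-1", "AAAT-1"]
published = {"AAAC-1": "fibroblast", "AAAG-1": "T cell"}
labels = fill_unknown(barcodes, published)
print(labels["AAAT-1"])  # Unknown
```

Every matrix barcode ends up with exactly one label, which keeps the annotation file aligned with the matrix.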

2. I want to use different references from the same tissue type (e.g., my own data and a public dataset). Should I aggregate the data?

It is generally recommended that you treat each dataset as an individual reference, as differences in sequencing depth, technical parameters, and batch effects may complicate aggregation and affect cell annotation accuracy.

3. I am interested in using tissue with multinucleated cells (e.g., muscle). Should I use single nuclei or single cell references?

For tissues with multinucleated cells, it is valuable to use both single nuclei and single cell references if available. Single nuclei data miss transcripts localized in the cytoplasm, which single cell data capture. Using both provides a more comprehensive representation of gene expression and improves the accuracy of downstream analyses, including the assessment of optical crowding.

4. There is a cell population in my single cell reference that I am struggling to annotate. Should I exclude it?

All cell populations present in the reference should be retained to accurately represent tissue composition. For populations that cannot be confidently annotated, you can either label them as "Unknown" or use cluster identifiers (e.g., Seurat cluster number). These clusters can still inform cellular composition and transcriptomic predictions, and retaining them maintains the integrity of the reference.
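A minimal sketch of the cluster-identifier fallback, assuming you have a per-barcode cluster assignment (for example, exported from Seurat) and a dict that names only the clusters you could annotate confidently. The helper name `label_clusters` and the `cluster_` prefix are hypothetical choices.

```python
def label_clusters(cell_clusters, cluster_names):
    """Map each barcode's cluster ID to a cell type name.

    Clusters without a confident name keep a cluster identifier instead of
    being removed, so the population stays in the reference.
    """
    return {
        bc: cluster_names.get(cluster, f"cluster_{cluster}")
        for bc, cluster in cell_clusters.items()
    }


clusters = {"AAAC-1": 0, "AAAG-1": 1, "AAAT-1": 7}
names = {0: "fibroblast", 1: "T cell"}  # cluster 7 could not be annotated
labels = label_clusters(clusters, names)
print(labels)  # {'AAAC-1': 'fibroblast', 'AAAG-1': 'T cell', 'AAAT-1': 'cluster_7'}
```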

5. I used hashed samples for my single cell RNA-seq experiment. Is this data still compatible with the Xenium Panel Designer? Should I filter the data for doublets and low quality cells?

Doublets can be removed, as they are technical artifacts rather than cells that occur in a normal biological sample. Low quality cells can either be filtered out of the dataset, if they represent a small proportion, or be labeled as "Low Quality Cells" so that their potential impact can be assessed. If there are significant concerns about tissue or data quality, it is preferable not to use the reference at all, as lower quality data could compromise panel predictions.
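The two QC options can be sketched as follows. This minimal Python example assumes your hashing demultiplexing or QC pipeline has already produced a per-barcode call; the call values ("doublet", "low_quality") and the helper name `apply_qc` are hypothetical.

```python
def apply_qc(labels, qc_calls):
    """Drop doublets and relabel low quality cells.

    labels:   barcode -> cell type annotation
    qc_calls: barcode -> "singlet", "doublet", or "low_quality"
              (hypothetical values; barcodes without a call count as singlets)
    """
    kept = {}
    for bc, cell_type in labels.items():
        call = qc_calls.get(bc, "singlet")
        if call == "doublet":
            continue  # technical artifact, not a real cell
        kept[bc] = "Low Quality Cells" if call == "low_quality" else cell_type
    return kept


labels = {"A-1": "T cell", "B-1": "B cell", "C-1": "fibroblast"}
qc = {"B-1": "doublet", "C-1": "low_quality"}
clean = apply_qc(labels, qc)
print(clean)  # {'A-1': 'T cell', 'C-1': 'Low Quality Cells'}
```

If you drop doublets from the annotation file, remove the same barcodes from the matrix as well, because the matrix and annotation files must contain exactly the same barcodes.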

6. I am interested in using my panel for both diseased and healthy tissues. How should I prepare my single cell references?

Prepare separate references for each condition of interest (e.g., disease and healthy). Each condition can have a distinct transcriptome and cellular composition, so distinguishing them enables the panel designer to optimize probe allocation for each context. This is important for accurate prediction of gene expression, especially for targets specific to one condition.

7. Is there a recommended sequencing depth for the single cell references?

Higher sequencing depth improves transcriptome coverage and enables more accurate representation of cell types and gene expression, which assists both cell annotation and optical crowding prediction. However, be cautious with excessively deep (saturated) sequencing, as this can bias gene expression estimates (e.g., through duplicated reads), potentially leading to overestimation of highly expressed genes and their removal from the panel.

How are cell type annotations used?

The single cell reference must be accompanied by cell type annotations for its barcodes. During the design process, expression levels are aggregated within each cell type. This information is used to assign codewords that minimize optical crowding and to ensure that cell type clusters match broad, expected categories. See the general guidelines for panel gene selection in the Xenium Add-On Panel Design Technical Note.
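Aggregating expression within each cell type amounts to building a pseudobulk profile per annotation. The exact statistic XPD uses is not stated here, so this Python sketch assumes a per-gene mean of raw counts within each cell type; `mean_by_cell_type` is a hypothetical helper, and a dense NumPy matrix is used for simplicity.

```python
import numpy as np


def mean_by_cell_type(counts, cell_types):
    """Average raw counts per gene within each cell type.

    counts:     genes x cells count matrix (same orientation as a MEX matrix)
    cell_types: one annotation string per cell (column)
    Returns (sorted list of cell types, genes x cell-types float matrix).
    """
    cell_types = np.asarray(cell_types)
    types = sorted(set(cell_types.tolist()))
    profile = np.column_stack(
        [counts[:, cell_types == t].mean(axis=1) for t in types]
    )
    return types, profile


counts = np.array([[4, 6, 0],
                   [0, 2, 10]])  # 2 genes x 3 cells
types, profile = mean_by_cell_type(counts, ["T cell", "T cell", "fibroblast"])
print(types)    # ['T cell', 'fibroblast']
print(profile)  # gene 1: 5.0 in T cells, 0.0 in fibroblasts; gene 2: 1.0, 10.0
```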

Single cell data can come from Chromium Single Cell Gene Expression or Single Cell Gene Expression Flex assays. If the single cell data comes from Flex, note that Flex does not include certain genes that are highly and ubiquitously expressed, such as mitochondrial genes, ribosomal genes, and HLA class I genes.

We strongly discourage the inclusion of those genes on Xenium custom panels as well. They take up a large portion of the available optical budget and increase the risk of optical crowding. However, including a small number of genes not present in Flex data generally poses a low risk to assay performance. During the custom panel design process, those genes will be assigned an averaged expression level for the purposes of the utilization analysis.

Reference data formats

The design tool needs a gene list and a measure of expected gene expression in the sample, stratified by cell type.

If providing your own reference data, the Xenium Panel Designer will accept one of the formats described below.

The total size of your uploaded reference files must be less than 2 GB per design. The panel designer app computes a pseudobulk matrix from your reference, so as long as all cell types of interest are well represented, the results from matrices with 50,000, 200,000, or 1 million cells will be very similar. If your references collectively exceed 2 GB, we recommend subsampling the abundant cell types in the file (avoid subsampling rare cell types).
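A minimal sketch of subsampling abundant cell types while keeping rare ones intact, assuming one annotation per matrix column. The cap `max_per_type` is a hypothetical parameter you would tune until the files fit under the size limit. Apply the returned indices to both the matrix columns and the annotation rows so the two files keep identical barcodes.

```python
import numpy as np


def subsample_abundant(cell_types, max_per_type, seed=0):
    """Return sorted column indices to keep.

    Cell types with more than `max_per_type` cells are randomly downsampled
    to that many cells; smaller (rarer) types are kept in full.
    """
    rng = np.random.default_rng(seed)
    cell_types = np.asarray(cell_types)
    keep = []
    for t in np.unique(cell_types):
        idx = np.flatnonzero(cell_types == t)
        if len(idx) > max_per_type:
            idx = rng.choice(idx, size=max_per_type, replace=False)
        keep.append(idx)
    return np.sort(np.concatenate(keep))


types = ["hepatocyte"] * 1000 + ["cholangiocyte"] * 40
keep = subsample_abundant(types, max_per_type=200)
print(len(keep))  # 240: 200 hepatocytes plus all 40 cholangiocytes
```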

MEX or HDF5 format

Provide one or more unnormalized, whole transcriptome, filtered feature-barcode matrices, with a cell type annotation file for each matrix. Bundle each matrix and its annotation file as a .zip, .tar, or .tar.gz file (one matrix and one annotation file per bundle).
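One way to build a .tar.gz bundle is Python's standard `tarfile` module, sketched below. This assumes the bundle should hold the matrix folder and the annotation file at its top level; the internal layout is not specified here, and the helper and file names are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle_reference(mex_dir, annotation_file, out_path):
    """Bundle one MEX folder and its annotation file into a single .tar.gz."""
    mex_dir, annotation_file = Path(mex_dir), Path(annotation_file)
    with tarfile.open(out_path, "w:gz") as tar:
        # Adding a directory is recursive, so the three MEX files are included.
        tar.add(mex_dir, arcname=mex_dir.name)
        tar.add(annotation_file, arcname=annotation_file.name)
    return Path(out_path)


# Demo on a throwaway layout; the files are empty placeholders, not real data.
with tempfile.TemporaryDirectory() as tmp:
    mex = Path(tmp, "filtered_feature_bc_matrix")
    mex.mkdir()
    for name in ("matrix.mtx.gz", "barcodes.tsv.gz", "features.tsv.gz"):
        (mex / name).write_bytes(b"")
    ann = Path(tmp, "annotations.csv")
    ann.write_text("barcode,annotation\n")
    bundle = bundle_reference(mex, ann, Path(tmp, "sample1_reference.tar.gz"))
    with tarfile.open(bundle) as tar:
        members = sorted(tar.getnames())

print(members)
```

Create one bundle per sample so that each archive contains exactly one matrix and one annotation file.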

See Creating Single Cell References for Xenium Custom Panel Design from Seurat or AnnData for additional guidance on reference file format conversion.

The feature-barcode matrix can be in either Cell Ranger Matrix Exchange (MEX) or HDF5 format. The MEX format is a folder containing three files (matrix.mtx.gz, barcodes.tsv.gz, features.tsv.gz). The HDF5 matrix is a single file.
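Before bundling, a quick consistency check of a MEX folder can catch missing files or mismatched dimensions. This stdlib-only Python sketch assumes the Cell Ranger layout (features as rows, barcodes as columns) and reads only the Matrix Market size line; `check_mex` is a hypothetical helper, not an XPD validator.

```python
import gzip
import tempfile
from pathlib import Path

MEX_FILES = ("matrix.mtx.gz", "barcodes.tsv.gz", "features.tsv.gz")


def check_mex(mex_dir):
    """Return a list of problems found in a Cell Ranger-style MEX folder."""
    mex_dir = Path(mex_dir)
    missing = [name for name in MEX_FILES if not (mex_dir / name).exists()]
    if missing:
        return [f"missing file: {name}" for name in missing]

    def count_lines(name):
        with gzip.open(mex_dir / name, "rt") as fh:
            return sum(1 for _ in fh)

    # Skip comment lines (starting with "%"); the first remaining line of a
    # Matrix Market coordinate file is "rows cols nonzeros".
    with gzip.open(mex_dir / "matrix.mtx.gz", "rt") as fh:
        header = next(line for line in fh if not line.startswith("%"))
    n_rows, n_cols, _ = (int(x) for x in header.split())

    # Cell Ranger MEX matrices are features (rows) x barcodes (columns).
    problems = []
    if n_rows != count_lines("features.tsv.gz"):
        problems.append("matrix rows != number of features")
    if n_cols != count_lines("barcodes.tsv.gz"):
        problems.append("matrix columns != number of barcodes")
    return problems


# Toy folder with one barcode missing from barcodes.tsv.gz.
with tempfile.TemporaryDirectory() as tmp:
    def write(name, text):
        with gzip.open(Path(tmp, name), "wt") as fh:
            fh.write(text)

    write("matrix.mtx.gz",
          "%%MatrixMarket matrix coordinate integer general\n2 3 2\n1 1 4\n2 3 10\n")
    write("features.tsv.gz",
          "G1\tGENE1\tGene Expression\nG2\tGENE2\tGene Expression\n")
    write("barcodes.tsv.gz", "AAAC-1\nAAAG-1\n")
    problems = check_mex(tmp)

print(problems)  # ['matrix columns != number of barcodes']
```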

The cell type annotations file can be in CSV or TSV format. It is a two-column file and headers are required. The first column must be "barcode". For example:


barcode,annotation
ATGCATTGCGTAAGTG-1,fibroblast
TTGCAAAGCCGAAGTG-1,fibroblast
CATCATTGCGTAATTG-1,T cell
...

Important

It is critical that barcode suffixes and prefixes in the annotations file exactly match those for barcodes in the matrix file.
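A simple set comparison exposes prefix or suffix mismatches before upload. In this minimal Python sketch, the `_s1` suffix stands in for a hypothetical sample tag added during aggregation, and `compare_barcodes` is a hypothetical helper.

```python
def compare_barcodes(matrix_barcodes, annotation_barcodes):
    """Report barcodes present in one file but not the other."""
    m, a = set(matrix_barcodes), set(annotation_barcodes)
    return {"only_in_matrix": sorted(m - a), "only_in_annotation": sorted(a - m)}


matrix_bcs = ["ATGCATTGCGTAAGTG-1", "TTGCAAAGCCGAAGTG-1"]
# The annotation picked up an extra sample suffix that the matrix lacks.
annot_bcs = ["ATGCATTGCGTAAGTG-1_s1", "TTGCAAAGCCGAAGTG-1"]
report = compare_barcodes(matrix_bcs, annot_bcs)
print(report)
# {'only_in_matrix': ['ATGCATTGCGTAAGTG-1'], 'only_in_annotation': ['ATGCATTGCGTAAGTG-1_s1']}
```

Both lists should be empty before you upload the bundle.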

If looking for rare cell types, providing matrix files for multiple samples may yield better results. We recommend providing a matrix file per sample; it does not need to be aggregated. If multiple matrices are provided, the cell type information across all of the matrices will be evaluated.

Important

It is very important that this matrix is not normalized or gene-filtered. Normalizing/filtering limits our ability to assess the impacts of optical crowding. If the matrix contains a subset of the total gene count data, the representation per gene will be skewed.
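A rough way to spot normalized data is to check that every value is a non-negative integer. This Python sketch is a heuristic only, assuming a dense NumPy array (for a SciPy sparse matrix you would check its `.data` attribute instead). It cannot detect gene filtering, which you would need to check separately, for example by confirming the feature count reflects a whole transcriptome.

```python
import numpy as np


def looks_like_raw_counts(counts):
    """Heuristic: True if all values are non-negative integers.

    Log-normalized or library-size-scaled matrices usually fail this check.
    """
    counts = np.asarray(counts)
    return bool(np.all(counts >= 0) and np.all(np.mod(counts, 1) == 0))


raw = np.array([[0, 3], [12, 1]])
# Typical normalization: scale each cell (column) to 10,000 counts, then log1p.
lognorm = np.log1p(raw / raw.sum(axis=0) * 1e4)
print(looks_like_raw_counts(raw), looks_like_raw_counts(lognorm))  # True False
```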

CLOUPE format

Provide a single uncompressed CLOUPE file generated by Cell Ranger. The Xenium Panel Designer uses the graph-based clustering results in the file as cell annotations, so no additional annotation file is needed for this input format.

Common reference issues

An error message is shown for the following input file issues:

Gene IDs and/or gene symbols do not match between the matrix, gene list, and the 2020-A reference.
Gene names contain spaces or typos.
Files have missing column headers or headers with unexpected or misspelled names.
Matrix and annotation CSV files do not have exactly the same barcodes. This is often seen when barcodes in the annotation file have an extra sample suffix after aggregation, but the matrix itself does not.
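Gene-name problems can be caught locally before upload. This minimal Python sketch checks a gene list against the matrix features only (not against the 2020-A reference); the helper `check_gene_list` is hypothetical, and the feature lists would normally come from features.tsv.gz.

```python
def check_gene_list(gene_list, feature_ids, feature_symbols):
    """Flag panel genes with whitespace or without a matching matrix feature."""
    known = set(feature_ids) | set(feature_symbols)
    problems = []
    for gene in gene_list:
        if any(ch.isspace() for ch in gene):
            problems.append(f"whitespace in gene name: {gene!r}")
        elif gene not in known:
            problems.append(f"not found in matrix features: {gene}")
    return problems


feature_ids = ["ENSG00000010610", "ENSG00000105369"]
feature_symbols = ["CD4", "CD79A"]
problems = check_gene_list(["CD4", "CD79A ", "CD8A"], feature_ids, feature_symbols)
print(problems)
# ["whitespace in gene name: 'CD79A '", 'not found in matrix features: CD8A']
```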

In some cases, the initial upload of the bundled reference files may pass format checks, but later error when XPD inspects the individual files during the panel generation step. Refer to this article for troubleshooting guidance on common MEX reference and annotation file upload errors. After fixing format errors, return to the Provide Reference Datasets screen, delete the old file, and reupload the edited file.

Common input file issues that do not halt panel design but give poor results:

The design tool will not error with normalized counts data, but results will be skewed. The design tool should be used with integer counts data.
The design tool will not error with matrix files that filter many genes, but results will be skewed and consequently generate a suboptimal design.
Matrix files that are missing genes in the gene list.
Reference expression data that poorly match the tissue type or condition of your samples.
Annotation CSV files where the first two columns are not "barcode,annotation". If hierarchical annotations are present in additional columns, they are ignored.
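Because a wrong annotation header may not halt the design, it is worth checking locally. This Python sketch uses the standard `csv` module and assumes the file text is already read into a string; `check_annotation_header` is a hypothetical helper, and the `delimiter` argument covers TSV files.

```python
import csv
import io


def check_annotation_header(text, delimiter=","):
    """Return (ok, extra_columns) for an annotation file's header row.

    ok is True only when the first two columns are exactly "barcode" and
    "annotation". Further columns (e.g. hierarchical labels) are returned
    separately, since they are ignored.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, [])
    return header[:2] == ["barcode", "annotation"], header[2:]


ok, extra = check_annotation_header("barcode,annotation,subtype\nAAAC-1,T cell,CD8 T\n")
print(ok, extra)  # True ['subtype']
bad, _ = check_annotation_header("Barcode\tcell_type\nAAAC-1\tT cell\n", delimiter="\t")
print(bad)  # False
```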

Pre-built references

You can use publicly available data if you do not have an annotated single cell RNA-seq dataset, or combine public reference datasets with your own single cell reference datasets.

The Xenium Panel Designer provides a variety of curated reference datasets from sources such as CELLxGENE and GEO for both human and mouse tissues and a variety of conditions:

Table last updated October 2025

Species	Tissue	References for these conditions
Human	Artery	Atherosclerosis
Human	Blood	SARS-CoV-2 infection
Human	Bone Marrow	Non-diseased
Human	Brain	Alzheimer's; Glioblastoma; Non-diseased
Human	Breast	Cancer (ER+, HER2+, or triple-negative); Non-diseased
Human	Colon	Colorectal cancer; Non-diseased
Human	Gonad	Non-diseased
Human	Heart	Non-diseased
Human	Kidney	Chronic kidney disease; Acute kidney injury; Clear cell carcinoma; Non-diseased
Human	Liver	Hepatocellular carcinoma; Primary biliary cholangitis; Primary sclerosing cholangitis; Non-diseased
Human	Lung	Non-small cell lung cancer (NSCLC); SARS-CoV-2 infection; Small cell lung carcinoma; Non-diseased
Human	Lymph Node	Non-diseased
Human	Medulla	Non-diseased
Human	Multiple	Non-diseased
Human	Ovary	Ovarian cancer
Human	Pancreas	Ductal adenocarcinoma; Non-diseased
Human	Prostate	Non-diseased
Human	Retina	Non-diseased
Human	Skin	Non-diseased
Human	Spleen	Non-diseased
Human	Stomach	Gastric cancer
Human	Thymus	Non-diseased
Mouse	Bone Marrow	Non-diseased
Mouse	Brain	Non-diseased
Mouse	Heart	Non-diseased
Mouse	Hypothalamus	Non-diseased
Mouse	Kidney	Non-diseased
Mouse	Lung	Non-diseased
Mouse	Multiple	Non-diseased
Mouse	Pancreas	Non-diseased
Mouse	Prostate	Non-diseased
Mouse	Retina	Non-diseased
Mouse	Thymus	Non-diseased
