Space Ranger Feature-Barcode Matrices

Space Ranger outputs unfiltered (raw_feature_bc_matrix) and filtered feature-barcode (filtered_feature_bc_matrix) matrices in two file formats: the Market Exchange Format (MEX, described on this page) and Hierarchical Data Format (HDF5).

Each element of the matrix is the number of UMIs associated with a feature (row) and a barcode (column).

Type	Description
Unfiltered feature-barcode matrix	Contains every barcode from the fixed list of known-good barcode sequences that has at least one read. This includes background and tissue-associated barcodes.
Filtered feature-barcode matrix	Contains only tissue-associated barcodes. For Visium probe-based assays, genes not in the filtered probe set are removed from the filtered matrix by default.
Raw probe-barcode matrix	Contains columns that indicate the probes in the filtered probe reference, the probes that passed gDNA filtering, and the probe barcodes that are in spots. It is similar to the feature-barcode matrix, but is organized at the probe level rather than the gene level.

When Space Ranger is run on Visium HD data, feature-barcode matrices are provided at three levels by default: 2 µm square size (native resolution, with a square corresponding to a single barcode), 8 µm bin size (16 barcodes per bin), and 16 µm bin size (64 barcodes per bin). In the context of binned outputs, any reference to "barcode" should be read as "bin".
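The bin counts above follow from the geometry. A square bin whose edge is k times the 2 µm square edge contains k² barcodes, so 8 µm gives 4² = 16 and 16 µm gives 8² = 64. Below is a minimal sketch of that arithmetic, assuming square bins aligned to the 2 µm grid. The function name is illustrative.

```python
# Visium HD: the native resolution is a 2 µm square, one barcode per square.
NATIVE_SQUARE_UM = 2


def barcodes_per_bin(bin_size_um: int) -> int:
    """Count the 2 µm barcoded squares that tile a square bin of the given edge length."""
    if bin_size_um <= 0 or bin_size_um % NATIVE_SQUARE_UM != 0:
        raise ValueError("bin size must be a positive multiple of 2 µm")
    squares_per_edge = bin_size_um // NATIVE_SQUARE_UM
    return squares_per_edge ** 2


for size_um in (2, 8, 16):
    print(f"{size_um} µm -> {barcodes_per_bin(size_um)} barcode(s) per bin")
```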

Each matrix is stored in the Market Exchange Format (MEX) for sparse matrices, together with gzipped TSV files that list the features and barcodes corresponding to the row and column indices, respectively. For example, the filtered matrix output may look like:


$ cd /home/jdoe/runs/sample345/outs
$ tree filtered_feature_bc_matrix

filtered_feature_bc_matrix
├── barcodes.tsv.gz
├── features.tsv.gz
└── matrix.mtx.gz

0 directories, 3 files

Features correspond to row indices. For each matrix, the components of features.tsv.gz are:

Column 1: feature ID
Column 2: feature name
Column 3: feature type, e.g., Gene Expression or Antibody Capture
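The sketch below shows one way to read this file. The functions are hypothetical helpers and are not part of Space Ranger. They parse features.tsv.gz into tuples in matrix row order and count the rows for each feature type. The sketch assumes tab-separated rows with at least the three columns listed above and ignores any additional columns.

```python
import csv
import gzip
from collections import Counter


def read_features(path):
    """Read features.tsv.gz into (feature_id, feature_name, feature_type) tuples.

    The list order matches the matrix rows: element i describes row i.
    Any columns after the third are ignored.
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [tuple(row[:3]) for row in reader]


def count_feature_types(features):
    """Count how many rows belong to each feature type (column 3)."""
    return Counter(feature_type for _, _, feature_type in features)


# Example, using the path from the listing above:
# features = read_features("filtered_feature_bc_matrix/features.tsv.gz")
# print(count_feature_types(features))
```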

Below is a minimal example features.tsv.gz file showing data collected for three genes.


$ gzip -cd filtered_feature_bc_matrix/features.tsv.gz

ENSG00000187634	SAMD11	Gene Expression
ENSG00000188976	NOC2L	Gene Expression
ENSG00000187961	KLHL17	Gene Expression

For Gene Expression (GEX) data, the ID corresponds to gene_id in the annotation field of the reference GTF, and the name corresponds to gene_name. If the reference GTF has no gene_name field, the gene name is the same as the gene ID. Similarly, for Protein Expression (PEX) data, the feature ID and name are taken from the first two columns of the Feature Reference CSV file.
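In a GTF, a gene_name value is not guaranteed to be unique across gene_id values. A lookup from feature name to matrix row can therefore return more than one row, while the ID is the safer key. The minimal sketch below assumes that each entry of `features` is (feature_id, feature_name, ...) in matrix row order. The helper name is hypothetical.

```python
from collections import defaultdict


def rows_by_name(features):
    """Map each feature name to the list of matrix rows that carry it.

    `features` holds one (feature_id, feature_name, ...) entry per matrix row.
    A name may appear under more than one ID, so every match is kept.
    """
    index = defaultdict(list)
    for row, (_feature_id, name, *_rest) in enumerate(features):
        index[name].append(row)
    return dict(index)


features = [
    ("ENSG00000187634", "SAMD11", "Gene Expression"),
    ("ENSG00000188976", "NOC2L", "Gene Expression"),
]
print(rows_by_name(features))  # {'SAMD11': [0], 'NOC2L': [1]}
```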

For multi-species experiments, gene IDs and names are prefixed with the genome name to avoid name collisions between genes of different species, e.g., GAPDH becomes hg19_GAPDH and Gm15816 becomes mm10_Gm15816.
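To analyze one species at a time, you can select rows by that prefix. This sketch assumes the prefix is the genome name followed by a single underscore, as in the examples above. `rows_for_genome` and the two-row feature list are illustrative, and the IDs are placeholders.

```python
def rows_for_genome(features, genome):
    """Select the rows of one genome in a multi-species matrix.

    `features` holds one (feature_id, feature_name, ...) entry per matrix row.
    Returns (row_index, unprefixed_name) pairs; rows from other genomes are skipped.
    """
    prefix = f"{genome}_"  # assumption: genome name followed by one underscore
    return [
        (row, name[len(prefix):])
        for row, (_feature_id, name, *_rest) in enumerate(features)
        if name.startswith(prefix)
    ]


# Hypothetical feature list using the gene names from the example above.
features = [
    ("hg19_id1", "hg19_GAPDH", "Gene Expression"),
    ("mm10_id2", "mm10_Gm15816", "Gene Expression"),
]
print(rows_for_genome(features, "mm10"))  # [(1, 'Gm15816')]
```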

Barcode sequences correspond to column indices.


$ gzip -cd filtered_feature_bc_matrix/barcodes.tsv.gz | head -10

AACACTTGGCAAGGAA-1
AACAGGATTCATAGTT-1
AACAGGTTATTGCACC-1
AACAGGTTCACCGAAG-1
AACAGTCAGGCTCCGC-1
AACAGTCCACGCGGTG-1
AACATAGTCTATCTAC-1
AACATCTTAAGGCTCA-1
AACCAATCTGGTTGGC-1
AACCACTGCCATAGCC-1

Each barcode sequence includes a suffix with a dash separator followed by a number:

AACACTTGGCAAGGAA-1

More details on the barcode sequence format are available in the barcoded BAM section.
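The following minimal sketch separates the sequence from the numeric suffix. `split_barcode` is a hypothetical helper that splits on the last dash only. It does not interpret the suffix; see the barcoded BAM section for what the suffix means.

```python
def split_barcode(barcode):
    """Split a barcode such as 'AACACTTGGCAAGGAA-1' into (sequence, suffix)."""
    # Split on the last dash only, so everything before it is kept intact.
    sequence, dash, suffix = barcode.rpartition("-")
    if not dash or not sequence or not suffix.isdigit():
        raise ValueError(f"unexpected barcode format: {barcode!r}")
    return sequence, int(suffix)


print(split_barcode("AACACTTGGCAAGGAA-1"))  # ('AACACTTGGCAAGGAA', 1)
```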

Both R and Python can read the MEX format, and keeping the data as a sparse matrix makes manipulation more efficient.
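The functions below sketch sparse manipulation in Python. They compute UMI totals per barcode and per feature, and the number of detected features per barcode, without converting the matrix to dense form. They assume features as rows and barcodes as columns, for example a matrix read from matrix.mtx.gz with scipy.io.mmread. The function names and the 3×3 demo matrix are illustrative.

```python
import numpy as np
import scipy.sparse as sp


def umi_totals(matrix):
    """Return (UMIs per barcode, UMIs per feature) without densifying the matrix."""
    csc = sp.csc_matrix(matrix)
    per_barcode = np.asarray(csc.sum(axis=0)).ravel()  # column sums
    per_feature = np.asarray(csc.sum(axis=1)).ravel()  # row sums
    return per_barcode, per_feature


def features_detected_per_barcode(matrix):
    """Count features with at least one UMI in each barcode (column)."""
    csc = sp.csc_matrix(matrix)
    return np.asarray((csc > 0).sum(axis=0)).ravel()


demo = sp.csc_matrix(np.array([[5, 0, 2], [0, 1, 0], [3, 0, 0]]))
per_barcode, per_feature = umi_totals(demo)
print(per_barcode.tolist(), per_feature.tolist())  # [8, 1, 2] [7, 1, 3]
print(features_detected_per_barcode(demo).tolist())  # [2, 1, 1]
```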

For suggestions on downstream analysis with 3rd party R and Python tools, see the 10x Genomics Analysis Guides resource.

The R package Matrix supports loading MEX format data (for example, with its readMM function) and can be used to load the sparse feature-barcode matrix.

Space Ranger represents the feature-barcode matrix using sparse formats (only the nonzero entries are stored) in order to minimize file size. All of our programs, and many other programs for gene expression analysis, support sparse formats.

However, certain programs (e.g. Excel) only support dense formats (where every row-column entry is explicitly stored, even if it's a zero). Here are a few methods for converting feature-barcode matrices to CSV:

Load matrices into Python

The csv, os, gzip, and scipy.io modules can be used to load a feature-barcode matrix into Python.
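The following minimal sketch uses those modules and assumes the directory layout shown earlier (matrix.mtx.gz, features.tsv.gz, barcodes.tsv.gz). The helper names and the CSV header layout are hypothetical.

```python
import csv
import gzip
import os

import scipy.io


def read_tsv_gz(path):
    """Read a gzipped, tab-separated file into a list of rows."""
    with gzip.open(path, mode="rt", newline="") as handle:
        return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def load_mex(matrix_dir):
    """Load a MEX directory such as filtered_feature_bc_matrix.

    Returns (matrix, features, barcodes): a scipy.sparse CSC matrix with features
    as rows and barcodes as columns, the rows of features.tsv.gz, and the barcodes.
    """
    # mmread decompresses files whose names end in .gz.
    matrix = scipy.io.mmread(os.path.join(matrix_dir, "matrix.mtx.gz")).tocsc()
    features = read_tsv_gz(os.path.join(matrix_dir, "features.tsv.gz"))
    barcodes = [row[0] for row in read_tsv_gz(os.path.join(matrix_dir, "barcodes.tsv.gz"))]
    if matrix.shape != (len(features), len(barcodes)):
        raise ValueError("matrix shape does not match the features/barcodes files")
    return matrix, features, barcodes


def write_dense_csv(matrix, features, barcodes, out_path):
    """Write one CSV row per feature and one column per barcode, zeros included."""
    csr = matrix.tocsr()
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["feature_id", "feature_name", *barcodes])
        for i, feature in enumerate(features):
            # Densify one row at a time instead of the whole matrix.
            counts = csr[i].toarray().ravel()
            writer.writerow([feature[0], feature[1], *counts.tolist()])
```

For example, you could call `load_mex("sample123/outs/filtered_feature_bc_matrix")` and pass the result to `write_dense_csv(..., "sample123.csv")`. Writing row by row avoids holding a dense copy of the whole matrix in memory, but the output file is as large as any dense CSV.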

mat2csv

You can convert a feature-barcode matrix to dense CSV format using the spaceranger mat2csv command.

WARNING: Dense files can be very large and may cause Excel to crash, or even cause mat2csv to fail if your computer does not have enough memory. For example, a feature-barcode matrix from a human reference (~33k genes) with ~3k barcodes uses at least 200MB of disk space. Our 1.3 million mouse neuron dataset, if converted to this format, would use more than 60GB of disk space. Thus, while you can use mat2csv for small datasets, we strongly recommend using R or Python (as described above) to examine these matrix files.
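The 200MB figure is consistent with a simple lower bound. A dense 33k × 3k table has about 99 million cells, and each cell needs at least one digit plus one delimiter, which gives roughly 200 MB before any labels. The sketch below assumes 2 bytes per cell. It ignores the header row, feature labels, and multi-digit counts, so real files are larger. You can substitute your own matrix dimensions before running mat2csv.

```python
def min_dense_csv_bytes(n_features, n_barcodes, bytes_per_cell=2):
    """Lower-bound the size of a dense CSV.

    Every cell holds at least one digit plus one delimiter (comma or newline),
    so the table needs at least n_features * n_barcodes * 2 bytes.
    """
    return n_features * n_barcodes * bytes_per_cell


print(f"{min_dense_csv_bytes(33_000, 3_000) / 1e6:.0f} MB")  # 198 MB
```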

This command takes two arguments: an input matrix generated by Space Ranger (either an HDF5 file or a MEX directory) and an output path for the dense CSV. For example, to convert a matrix from a pipestance named sample123 in the current directory, either of the following commands would work:


# Convert from MEX
spaceranger mat2csv sample123/outs/filtered_feature_bc_matrix sample123.csv

# Or, convert from HDF5
spaceranger mat2csv sample123/outs/filtered_feature_bc_matrix.h5 sample123.csv

You can then load sample123.csv into Excel.

Shell commands

Please see this Q&A article for shell commands to convert MEX files to CSV. This method creates a single file that is sparse (zeroes are ignored).
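As a rough Python alternative that also produces a single sparse file (zeros omitted), the sketch below streams matrix.mtx.gz line by line and writes one CSV row per nonzero entry. It assumes the Matrix Market coordinate layout: header and comment lines start with %, then a size line, then one "row column value" triplet per line with 1-based indices. `mex_to_sparse_csv` and its output columns are hypothetical choices and do not follow a Space Ranger format.

```python
import csv
import gzip
import os


def mex_to_sparse_csv(matrix_dir, out_path):
    """Stream a MEX directory into a CSV with one row per nonzero matrix entry.

    Output columns (an illustrative choice): feature_id, feature_name, barcode, count.
    Zero entries are never written and the matrix is never fully loaded into memory.
    """
    def read_tsv_gz(name):
        with gzip.open(os.path.join(matrix_dir, name), "rt", newline="") as handle:
            return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))

    features = read_tsv_gz("features.tsv.gz")
    barcodes = [row[0] for row in read_tsv_gz("barcodes.tsv.gz")]

    with gzip.open(os.path.join(matrix_dir, "matrix.mtx.gz"), "rt") as mtx, \
            open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["feature_id", "feature_name", "barcode", "count"])
        # Skip the %%MatrixMarket header, % comment lines, and blank lines.
        entries = (line for line in mtx if line.strip() and not line.startswith("%"))
        next(entries)  # size line: n_rows n_cols n_nonzero
        for line in entries:
            row, col, count = line.split()
            feature = features[int(row) - 1]  # Matrix Market indices are 1-based
            writer.writerow([feature[0], feature[1], barcodes[int(col) - 1], count])
```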
