Cell Ranger offers downloadable prebuilt human and mouse reference packages, designed for use with its pipelines. These 10x Genomics reference packages are based on the T cell receptor (TRA, TRB) and B cell immunoglobulin (IGH, IGL, IGK) gene annotations in Ensembl version 94 for the human and mouse references. They also include multiple corrections to various V, D, J, and C genes based on empirical observations and to correct clear errors such as frameshifts, leader peptide truncations, and nucleotides that are never observed in rearrangements. These changes are documented in the release notes of each version of Cell Ranger. See Prebuilt References for details on how these references were created.
If you would like to use your own genome FASTA or gene GTF annotations, Cell Ranger supports the use of customer-generated Ensembl-based references. Cell Ranger also includes support for generating a V(D)J reference from the IMGT database.
There are two ways to generate a V(D)J reference:
- Making a Genome-based Reference Package (e.g., using Ensembl)
- Making a V(D)J Segment-based Reference Package (e.g., using IMGT)
The cellranger mkvdjref tool can be used to generate a custom reference
package from a genome sequence FASTA file and a gene annotation GTF.
$ cellranger mkvdjref --genome=my_vdj_ref \
                     --fasta=GRCh38_ensembl.fasta \
                     --genes=GRCh38_ensembl.gtf
A Cell Ranger V(D)J reference consists of germline gene segment sequences. It assumes that these sequences are contained within a genome reference FASTA, and that an Ensembl-formatted gene annotation GTF points to the relevant gene segments.
The cellranger mkvdjref tool expects a FASTA file (supplied by the --fasta argument) containing genomic reference sequences whose names are consistent with the names used in the GTF file.
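Before running mkvdjref, it can help to confirm that the sequence names in the two files agree. The following is a minimal sketch of such a check, not part of Cell Ranger: the function name missing_contigs is hypothetical, and it assumes that a FASTA sequence name is the first whitespace-delimited token after ">" on each header line and that the GTF is tab-delimited.

```shell
# Print every GTF contig name (column 1) that has no matching FASTA header.
# Hypothetical helper; not shipped with Cell Ranger.
missing_contigs() {
    gtf=$1
    fasta=$2
    awk -F '\t' '
        FILENAME == ARGV[1] {                 # first file: the FASTA
            if (substr($0, 1, 1) == ">") {
                # Assumption: the name is the first token after ">".
                split(substr($0, 2), parts, /[ \t]/)
                seen[parts[1]] = 1
            }
            next
        }
        /^#/ || NF < 9 { next }               # skip GTF comments and short lines
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$fasta" "$gtf"
}
```

For example, `missing_contigs GRCh38_ensembl.gtf GRCh38_ensembl.fasta` prints nothing when every contig named in the GTF is present in the FASTA.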
The cellranger mkvdjref tool expects a GTF file (supplied by
the --genes argument) in an Ensembl-like format that contains
information about V(D)J gene segments.
GTF columns
| Column | Name | Description | 
|---|---|---|
| 1 | Chromosome | Must refer to a chromosome/contig in the genome fasta. | 
| 2 | Source | Unused. | 
| 3 | Feature | Cell Ranger V(D)J uses only rows where this column is equal to one of CDS or five_prime_utr. | 
| 4 | Start | Start position on the reference (1-based inclusive). | 
| 5 | End | End position on the reference (1-based inclusive). | 
| 6 | Score | Unused. | 
| 7 | Strand | Strandedness of this feature on the reference: + or -. | 
| 8 | Frame | Unused. | 
| 9 | Attributes | A semicolon-delimited list of key-value pairs of the form key "value". The attribute keys used by Cell Ranger V(D)J are detailed below. | 
GTF attributes
| GTF Attribute | Description | 
|---|---|
| transcript_id | Becomes the record_id in the Cell Ranger V(D)J reference entry format. | 
| transcript_biotype | The value is used to infer the V(D)J segment type. Either transcript_biotype or gene_biotype must be a value in the "Accepted biotypes" list below. If transcript_biotype is not on the accepted list, then gene_biotype is used. | 
| gene_biotype | See transcript_biotype. | 
| gene_name | Must be specified. Becomes the gene_name in the Cell Ranger V(D)J reference entry format. | 
Accepted biotypes
- TR_C_gene
- TR_D_gene
- TR_J_gene
- TR_V_gene
- IG_C_gene
- IG_D_gene
- IG_J_gene
- IG_V_gene
Example minimal GTF row used by cellranger mkvdjref
14      havana  CDS     21621904        21621946        .       +       0       transcript_id "ENST00000542354"; gene_name "TRAV1-1"; transcript_biotype "TR_V_gene";
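The column, attribute, and biotype rules above can be sketched as a small filter that lists the GTF rows expected to contribute to a reference. This illustrates the documented rules only and is not Cell Ranger's actual implementation: the function name usable_vdj_rows is hypothetical, and exact, case-sensitive matching of feature and biotype values is an assumption.

```shell
# Print "gene_name<TAB>biotype<TAB>feature" for each GTF row that satisfies
# the documented rules. Hypothetical helper; not shipped with Cell Ranger.
usable_vdj_rows() {
    awk -F '\t' '
        # Return the value of attribute `key` from column 9, or "" if absent.
        function attr(key,    re, s) {
            re = "(^|[; ])" key " \"[^\"]*\""
            if (match($9, re) == 0) return ""
            s = substr($9, RSTART, RLENGTH)
            sub(/^[^"]*"/, "", s)   # drop everything up to the opening quote
            sub(/"$/, "", s)        # drop the closing quote
            return s
        }
        BEGIN {
            n = split("TR_C_gene TR_D_gene TR_J_gene TR_V_gene " \
                      "IG_C_gene IG_D_gene IG_J_gene IG_V_gene", list, " ")
            for (i = 1; i <= n; i++) accepted[list[i]] = 1
        }
        /^#/ || NF < 9 { next }                           # comments, short lines
        $3 != "CDS" && $3 != "five_prime_utr" { next }    # column 3 rule
        {
            biotype = attr("transcript_biotype")
            # Fall back to gene_biotype when transcript_biotype is not accepted.
            if (!(biotype in accepted)) biotype = attr("gene_biotype")
            name = attr("gene_name")
            if ((biotype in accepted) && name != "")
                print name "\t" biotype "\t" $3
        }
    ' "$1"
}
```

Running `usable_vdj_rows GRCh38_ensembl.gtf` on a file containing the example row prints a line with TRAV1-1, TR_V_gene, and CDS separated by tabs.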
A successful cellranger mkvdjref run creates a directory whose name is
specified by the --genome argument.
$ tree my_vdj_ref
my_vdj_ref
       ├── fasta
       │   └── regions.fa
       └── reference.json
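A quick way to sanity-check a finished run is to confirm that both files exist and to count the segment records. This is a minimal sketch under that assumption; the function name summarize_vdj_ref is hypothetical and the check does not inspect the contents of reference.json.

```shell
# Confirm that a directory contains fasta/regions.fa and reference.json,
# then print the number of segment records in regions.fa.
# Hypothetical helper; not shipped with Cell Ranger.
summarize_vdj_ref() {
    ref=$1
    for f in fasta/regions.fa reference.json; do
        if [ ! -f "$ref/$f" ]; then
            echo "missing: $ref/$f" >&2
            return 1
        fi
    done
    # Each FASTA record starts with exactly one ">" header line.
    grep -c '^>' "$ref/fasta/regions.fa"
}
```

For example, `summarize_vdj_ref my_vdj_ref` prints the record count, or reports the first missing file on standard error and returns a non-zero status.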
The regions.fa file is the V(D)J segment FASTA file. The header line of each
record contains V(D)J-specific metadata. The feature_id and display_name are
feature-specific identifier fields (see table). All other fields describe
the V(D)J feature.
>feature_id|display_name record_id|gene_name|region_type|chain_type|chain|isotype|allele_name
SEQUENCE
| Field | Description | 
|---|---|
| feature_id | Unique integer ID for this feature. | 
| display_name | Name used when displaying the segment in, e.g., Loupe V(D)J Browser. | 
| record_id | Describes the accession ID of the source molecule. Unused. | 
| gene_name | The name of the V(D)J gene, e.g. TRBV2-1. | 
| region_type | Supported values are 5'UTR, L-REGION+V-REGION, D-REGION, J-REGION, and C-REGION. Variations are not accepted. | 
| chain_type | Specifies whether this is a T or B cell receptor chain. Supported values are TR and IG. | 
| chain | The V(D)J chain that the gene specified under gene_name belongs to (e.g., TRA for TRAV1, or IGH for IGHA). | 
| isotype | Specifies the class of heavy chain constant region; set to None if not applicable. | 
| allele_name | The identifier for the allele, e.g. 01 for TRBV2-1*01, or None if no allele is to be specified. | 
Example V(D)J segment FASTA files
>1|TRAV1*01 AF259072|TRAV1|L-REGION+V-REGION|TR|TRA|None|01
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
>979|IGHA*01 J00475|IGHA|C-REGION|IG|IGH|A|01
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
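When writing a segment FASTA by hand, the header layout is easy to get wrong. The sketch below checks each header against the field table above. It is an illustration, not Cell Ranger's own validation: the function name check_segment_headers is hypothetical, and it assumes that display_name contains no spaces and that feature_id values must not repeat.

```shell
# Report header lines that do not follow the documented segment format.
# Prints "LINE: problem" for each issue and returns non-zero if any is found.
# Hypothetical helper; not shipped with Cell Ranger.
check_segment_headers() {
    awk '
        BEGIN {
            # \047 is an apostrophe, written as an octal escape because the
            # awk program itself sits inside single quotes.
            n = split("5\047UTR L-REGION+V-REGION D-REGION J-REGION C-REGION", r, " ")
            for (i = 1; i <= n; i++) region[r[i]] = 1
        }
        function fail(msg) { print FNR ": " msg; bad = 1 }
        substr($0, 1, 1) != ">" { next }          # only inspect header lines
        {
            line = substr($0, 2)
            sp = index(line, " ")
            if (sp == 0) { fail("no space between display_name and record_id"); next }
            nl = split(substr(line, 1, sp - 1), left, "[|]")
            nr = split(substr(line, sp + 1), right, "[|]")
            if (nl != 2 || nr != 7) { fail("expected 2 + 7 pipe-delimited fields"); next }
            if (left[1] !~ /^[0-9]+$/) fail("feature_id is not an integer: " left[1])
            else if (seen[left[1]]++) fail("duplicate feature_id: " left[1])
            if (!(right[3] in region)) fail("unsupported region_type: " right[3])
            if (right[4] != "TR" && right[4] != "IG") fail("unsupported chain_type: " right[4])
        }
        END { exit bad }
    ' "$1"
}
```

Running `check_segment_headers custom_segments.fasta` on a file containing only the two example records prints nothing and returns zero.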
You can directly generate a FASTA file in the segment format and create a V(D)J reference by passing this file to cellranger mkvdjref with the --seqs argument, for example:
$ cellranger mkvdjref --genome=my_vdj_ref \
                     --seqs=custom_segments.fasta
Cell Ranger comes with the fetch-imgt script that downloads the relevant
sequences from IMGT and generates a V(D)J segment FASTA file. The V(D)J
segment FASTA file can subsequently be used to construct a V(D)J reference package.
The instructions for creating IMGT references have been tested only for human and mouse species. If you are working with other species, it is advisable to validate your IMGT references. To get started, see the Knowledge Base article titled Building a custom reference for V(D)J using IMGT tool.
This example generates a mouse V(D)J reference based on IMGT.
# source the environment of Cell Ranger 7.1.0 for your shell (bash/csh)
# (for bash shell)
source path/to/cellranger-7.1.0/sourceme.bash
# OR (for C shell)
source path/to/cellranger-7.1.0/sourceme.csh
echo "version checks"
python --version
# Using a script that comes with Cell Ranger, get data from IMGT and create a FASTA suitable for use by mkvdjref
# The option --species is the name of the species for which the data is to be downloaded.
# The option --genome provides the prefix used to name the 2 output files. Only the file with the suffix -mkvdjref-input.fasta is used as input to the mkvdjref utility.
path/to/cellranger-7.1.0/lib/bin/fetch-imgt --genome vdj_IMGT_mouse --species "Mus musculus"
# Build the Cell Ranger reference. You could also add Cell Ranger to your PATH to avoid specifying the full path to cellranger.
# The option --genome is a single identifier with no special symbols aside from hyphen or underscore. The reference will be placed in a directory created with that name.
# The option --seqs is the mkvdjref-input.fasta file generated by the fetch-imgt command.
path/to/cellranger-7.1.0/cellranger mkvdjref --genome=vdj_IMGT_mouse --seqs=vdj_IMGT_mouse-mkvdjref-input.fasta
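Because the --genome value becomes a directory name and is restricted as described in the comments above, a script that builds references can check the name up front. This is a minimal sketch: the function name valid_genome_name is hypothetical, and reading "no special symbols" as "ASCII letters and digits only, plus hyphen and underscore" is an assumption.

```shell
# Return 0 if the name contains only letters, digits, hyphens, and
# underscores; return 1 otherwise (including for an empty name).
# Hypothetical helper; not shipped with Cell Ranger.
valid_genome_name() {
    case $1 in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

For example, `valid_genome_name vdj_IMGT_mouse && echo ok` prints ok, while a name containing a space or a period is rejected.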