Cell Ranger vdj Annotations

The structure of a typical V(D)J transcript:


UTR: Untranslated region; FWR: Framework region; CDR: Complementarity determining region

The cellranger vdj pipeline provides amino acid and nucleotide sequences for framework and complementarity determining regions (CDRs). The V(D)J annotations on the assembled contigs and on the clonotype consensus sequences are produced in multiple formats.

Learn more about productive contigs on the Annotation Algorithm page.

  • CSV: High-level annotations with one contig, consensus, or clonotype per row.
  • JSON: Detailed annotations, including alignment coordinates and amino acid translations.
  • BED: Germline V(D)J segments as features for use with tools like IGV.
  • TSV: Used for the AIRR rearrangement format of V(D)J contigs and consensus sequences.
  • clonotypes.csv: High-level descriptions of each clonotype.
  • consensus_annotations.csv: High-level and detailed annotations of each clonotype consensus sequence.
  • filtered_contig_annotations.csv: High-level annotations of each high-confidence contig from cell-associated barcodes. This is a subset of all_contig_annotations.csv.
  • all_contig_annotations.csv: High-level and detailed annotations of all contigs (from cell and background barcodes) in CSV format.
  • all_contig_annotations.bed: High-level and detailed annotations of all contigs (from cell and background barcodes) in BED format.
  • all_contig_annotations.json: High-level and detailed annotations of all contigs (from cell and background barcodes) in JSON format.
  • airr_rearrangement.tsv: Annotated contigs and consensus sequences of V(D)J rearrangements in the AIRR format.

The clonotypes.csv file provides high-level descriptions of each clonotype.

Column | Description
clonotype_id | The ID of the clonotype to which this consensus sequence was assigned.
frequency | The observed number of cell barcodes with this clonotype.
proportion | The observed fraction of cell barcodes with this clonotype.
cdr3s_aa | A semicolon-delimited list of chain:sequence pairs, where chain is TRA, TRB, TRG, TRD, IGK, IGL, or IGH and sequence is the CDR3 amino acid sequence for that chain.
cdr3s_nt | A semicolon-delimited list of chain:sequence pairs, where chain is TRA, TRB, TRG, TRD, IGK, IGL, or IGH and sequence is the CDR3 nucleotide sequence for that chain.
inkt_evidence | For T cells, this column indicates whether the clonotype is a group of iNKT cells. The evidence is a semicolon-delimited list of chain:matches pairs, where chain is TRA or TRB and matches is one of genes, junction, or genes+junction. See iNKT/MAIT for more information.
mait_evidence | For T cells, this column indicates whether the clonotype is a group of MAIT cells. The evidence is a semicolon-delimited list of chain:matches pairs, where chain is TRA or TRB and matches is one of genes, junction, or genes+junction. See iNKT/MAIT for more information.
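
The semicolon-delimited chain:value fields described above (cdr3s_aa, cdr3s_nt, inkt_evidence, mait_evidence) can be split with a few lines of Python. This is a minimal sketch, not part of Cell Ranger: the helper name parse_chain_pairs and the example value are made up for illustration.

```python
def parse_chain_pairs(field):
    """Split a 'chain:value;chain:value' string into (chain, value) tuples."""
    pairs = []
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty fields and trailing semicolons
        chain, _, value = item.partition(":")
        pairs.append((chain, value))
    return pairs


# Hypothetical cdr3s_aa value, invented for this example.
example = "TRA:CAVSDGGSQGNLIF;TRB:CASSLGQGAYEQYF"
pairs = parse_chain_pairs(example)
print(pairs)
```

The function returns a list of tuples rather than a dict so that a clonotype listing the same chain name more than once does not lose an entry.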

The consensus_annotations.csv file provides high-level and detailed annotations of each clonotype consensus sequence.

Column | Description
clonotype_id | The ID of the clonotype to which this consensus sequence was assigned.
consensus_id | The ID of this consensus sequence.
v_start | 0-based index of the V region start position on the consensus sequence.
v_end | 0-based index of the V region end position on the consensus sequence.
v_end_ref | 0-based index of the V gene end position on the reference.
j_start | 0-based index of the J region start position on the consensus sequence.
j_start_ref | 0-based index of the J gene start position on the reference.
j_end | 0-based index of the J region end position on the consensus sequence.
cdr3_start | 0-based index of the CDR3 region start position on the consensus sequence.
cdr3_end | 0-based index of the CDR3 region end position on the consensus sequence.

There are two contig annotation CSV files:

  • all_contig_annotations.csv contains annotations of all contigs (from cell and background barcodes).
  • filtered_contig_annotations.csv contains annotations of each high-confidence contig from cell-associated barcodes.

Both files have these columns:

Column | Description
barcode | Cell barcode for this contig.
is_cell | True or False value indicating whether the barcode was called as a cell.
contig_id | Unique identifier for this contig.
high_confidence | True or False value indicating whether the contig was called as high-confidence (unlikely to be a chimeric sequence or other artifact).
length | The contig sequence length in nucleotides.
chain | The chain associated with this contig: TRA, TRB, IGK, IGL, or IGH.
v_gene | The highest-scoring V segment, e.g., TRAV1-1.
d_gene | The highest-scoring D segment, e.g., TRBD1.
j_gene | The highest-scoring J segment, e.g., TRAJ1-1.
full_length | True or False value indicating if the contig was declared as full-length.
productive | True or False value indicating if the contig was declared as productive.
fwr1 | The predicted FWR1 amino acid sequence.
fwr1_nt | The predicted FWR1 nucleotide sequence.
cdr1 | The predicted CDR1 amino acid sequence.
cdr1_nt | The predicted CDR1 nucleotide sequence.
fwr2 | The predicted FWR2 amino acid sequence.
fwr2_nt | The predicted FWR2 nucleotide sequence.
cdr2 | The predicted CDR2 amino acid sequence.
cdr2_nt | The predicted CDR2 nucleotide sequence.
fwr3 | The predicted FWR3 amino acid sequence.
fwr3_nt | The predicted FWR3 nucleotide sequence.
cdr3 | The predicted CDR3 amino acid sequence.
cdr3_nt | The predicted CDR3 nucleotide sequence.
fwr4 | The predicted FWR4 amino acid sequence.
fwr4_nt | The predicted FWR4 nucleotide sequence.
reads | The number of reads aligned to this contig.
umis | The number of distinct UMIs aligned to this contig.
raw_clonotype_id | The ID of the clonotype to which this cell barcode was assigned.
raw_consensus_id | The ID of the consensus sequence to which this contig was assigned.
exact_subclonotype_id | The ID of the exact subclonotype to which this cell barcode was assigned.
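
As an illustration of how the boolean columns above can be used, the following sketch keeps only contigs that are cell-associated, high-confidence, and productive. It is not Cell Ranger code: the helper names and the in-memory sample rows (a subset of the real columns, with shortened barcodes) are invented, and the comparison is case-insensitive on the assumption that the capitalization of True/False may differ between files.

```python
import csv
import io


def is_true(value):
    """Interpret a True/False text field, ignoring case and whitespace."""
    return value.strip().lower() == "true"


def select_contigs(handle):
    """Return rows flagged as cell-associated, high-confidence, and productive."""
    reader = csv.DictReader(handle)
    return [
        row
        for row in reader
        if is_true(row["is_cell"])
        and is_true(row["high_confidence"])
        and is_true(row["productive"])
    ]


# Hypothetical rows using a subset of the documented columns.
sample = io.StringIO(
    "barcode,is_cell,contig_id,high_confidence,chain,productive,umis\n"
    "AAAC-1,True,AAAC-1_contig_1,True,TRA,True,5\n"
    "AAAC-1,True,AAAC-1_contig_2,True,TRB,False,2\n"
    "AAAG-1,False,AAAG-1_contig_1,True,TRB,True,1\n"
)
kept = select_contigs(sample)
print([row["contig_id"] for row in kept])
```

To run it on a real file, pass an open file handle for all_contig_annotations.csv in place of the sample.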

Details on how the Cell Ranger algorithm delimits CDRs (Complementarity Determining Regions) and FWRs (Framework Regions) are provided on the enclone features page.

The all_contig_annotations.bed file provides high-level and detailed annotations of all contigs (from cell and background barcodes) in BED format. The columns are not named but correspond to:

  • Contig name
  • Nucleotide position at which the contig annotation starts
  • Nucleotide position at which the contig annotation ends
  • Annotation

The all_contig_annotations.bed provides information about the structure of each assembled contig and allows further investigation into why some contigs were filtered out. An example all_contig_annotations.bed is shown here:

AAACCTGAGACAGGCT-1_contig_1 0 36 IGKV3-11_5'UTR
AAACCTGAGACAGGCT-1_contig_1 36 381 IGKV3-11_L-REGION+V-REGION
AAACCTGAGACAGGCT-1_contig_1 376 415 IGKJ2_J-REGION
AAACCTGAGACAGGCT-1_contig_1 415 551 IGKC_C-REGION
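
The four unnamed columns can be read into a per-contig list of features with a short script. This is a sketch only: parse_bed is a hypothetical helper, the input is the example record shown above, and the feature length is computed as end minus start on the assumption that the file follows the standard BED convention of 0-based, half-open intervals.

```python
from collections import defaultdict


def parse_bed(lines):
    """Group (start, end, annotation) tuples by contig name."""
    features = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        contig, start, end, annotation = fields[:4]
        features[contig].append((int(start), int(end), annotation))
    return dict(features)


bed_lines = [
    "AAACCTGAGACAGGCT-1_contig_1 0 36 IGKV3-11_5'UTR",
    "AAACCTGAGACAGGCT-1_contig_1 36 381 IGKV3-11_L-REGION+V-REGION",
    "AAACCTGAGACAGGCT-1_contig_1 376 415 IGKJ2_J-REGION",
    "AAACCTGAGACAGGCT-1_contig_1 415 551 IGKC_C-REGION",
]
features = parse_bed(bed_lines)
for start, end, annotation in features["AAACCTGAGACAGGCT-1_contig_1"]:
    # Length assumes half-open intervals (standard BED); see the note above.
    print(annotation, end - start)
```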

The all_contig_annotations.json file provides high-level and detailed annotations of all contigs (from cell and background barcodes) in JSON format. This file can be used to learn more about each assembled contig, and investigate why some contigs were filtered out. The all_contig_annotations.json file is the input required to run enclone.

Field | Description
barcode | Barcode sequence
contig_name | Name of the contig
sequence | Nucleotide sequence of the contig
quals | Contig quality score
fraction_of_reads_for_this_barcode_provided_as_input_to_assembly | Fraction of reads for this barcode that were provided as input to the assembly algorithm
read_count | Number of reads assigned to this contig
umi_count | Number of UMIs assigned to this contig
start_codon_pos | Starting nucleotide base position of the start codon on the contig
stop_codon_pos | Last nucleotide base position of the stop codon on the contig
aa_sequence | Amino acid sequence of the contig
frame | Unused field. Ignored by the algorithm.
cdr3 | Amino acid sequence of the contig's CDR3
cdr3_seq | Nucleotide sequence of the contig's CDR3
cdr3_start | Starting base of the contig's CDR3
cdr3_stop | Last base of the contig's CDR3
fwr1-fwr4 | Optional; start and stop positions of the contig's FWR1-FWR4 regions
cdr1-cdr2 | Optional; start and stop positions of the contig's CDR1-CDR2 regions
annotations | The annotations for the contig from the reference file
clonotype | Null; filled in after clonotyping
high_confidence | TRUE or FALSE statement of whether the contig has high confidence
validated_umis | A list of UMIs that have been validated
non_validated_umis | A list of UMIs that have not been validated
invalidated_umis | A list of invalidated UMIs
is_cell | TRUE or FALSE statement about whether the barcode was declared a cell
productive | TRUE or FALSE statement about whether the contig was productive based on five criteria. NULL = not full length.
filtered | Always TRUE
is_gex_cell | TRUE or FALSE statement about whether the barcode was declared a cell by Gene Expression data. Null = data not available
is_asm_cell | TRUE or FALSE statement about whether the barcode was declared a cell by the VDJ assembler. Null = data not available
full_length | TRUE or FALSE statement about whether the contig is full length.
junction_support | A map of {Reads: x, UMIs: y} supporting the junction region of a contig. This information is generated by the cellranger vdj assembler for productive contigs in reference-assisted assembly (or valid contigs in de novo assembly) and used for confidence determination and cell filtering.
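
Because productive can be TRUE, FALSE, or NULL, code that reads this file needs to treat it as a three-state value. The sketch below shows one way to do that in Python. It assumes the file is a JSON array with one object per contig, which the text above does not state; the records and the productive_status helper are invented for illustration.

```python
import json

# Hypothetical records with a few of the documented fields.
sample_json = """
[
  {"contig_name": "AAAC-1_contig_1", "is_cell": true,  "productive": true,  "umi_count": 5},
  {"contig_name": "AAAC-1_contig_2", "is_cell": true,  "productive": null,  "umi_count": 1},
  {"contig_name": "AAAG-1_contig_1", "is_cell": false, "productive": false, "umi_count": 1}
]
"""


def productive_status(record):
    """Map the three-state productive field to a readable label."""
    value = record.get("productive")
    if value is None:
        return "not full length"  # JSON null is loaded as Python None
    return "productive" if value else "non-productive"


records = json.loads(sample_json)
for record in records:
    print(record["contig_name"], productive_status(record))
```

Testing with `is None` before the truth check matters here: a plain `if record["productive"]` would lump NULL and FALSE together.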

The airr_rearrangement.tsv file provides the annotated contigs and consensus sequences of V(D)J rearrangements in the AIRR format.

Column | Description
cell_id | Cell barcode defining the cell for the query sequence.
clone_id | Clonotype ID/clonotype assignment.
sequence_id | The name of the contig associated with the rearrangement.
sequence | The nucleotide sequence of the rearrangement.
sequence_aa | The amino acid sequence of the rearrangement.
productive | Whether or not the rearrangement is productive.
rev_comp | Set to false by default (10x Genomics V(D)J sequences are not reverse complemented).
v_call | The name of the aligned V gene for the rearrangement.
v_cigar | The CIGAR string of the V gene alignment.
d_call | The name of the aligned D gene for the rearrangement.
d_cigar | The CIGAR string of the D gene alignment.
j_call | The name of the aligned J gene for the rearrangement.
j_cigar | The CIGAR string of the J gene alignment.
c_call | The name of the aligned C gene for the rearrangement.
c_cigar | The CIGAR string of the C gene alignment.
sequence_alignment | The aligned sequence of the VDJ rearrangement.
germline_alignment | The assembled, aligned, full-length inferred germline sequence of the aligned sequence.
junction | The nucleotide sequence of the rearrangement's junction (CDR3).
junction_aa | The amino acid sequence of the rearrangement's junction (CDR3).
junction_length | The length of the rearrangement's junction nucleotide sequence.
junction_aa_length | The length of the rearrangement's junction amino acid sequence.
v_sequence_start | 1-based index on the contig of the V region start position.
v_sequence_end | 1-based index on the contig of the V region end position.
d_sequence_start | 1-based index on the contig of the D region start position.
d_sequence_end | 1-based index on the contig of the D region end position.
j_sequence_start | 1-based index on the contig of the J region start position.
j_sequence_end | 1-based index on the contig of the J region end position.
c_sequence_start | 1-based index on the contig of the C region start position.
c_sequence_end | 1-based index on the contig of the C region end position.
consensus_count | The number of reads associated with this rearrangement.
duplicate_count | The number of unique molecular identifiers associated with this rearrangement.
is_cell | Is this rearrangement cell-associated?

The AIRR rearrangement file includes all mandatory AIRR fields and several optional variables to enhance reproducibility and guide analyses.
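
The *_sequence_start and *_sequence_end columns are 1-based, so they need an offset before they can be used as Python slice bounds. The sketch below assumes the end coordinate is inclusive, following the 1-based closed intervals of the AIRR Rearrangement schema; this page only states that the indices are 1-based. The region helper and the sample row are hypothetical.

```python
import csv
import io


def region(sequence, start, end):
    """Extract a region given 1-based, inclusive start and end coordinates."""
    if start in ("", None) or end in ("", None):
        return ""  # e.g. no D region was annotated
    return sequence[int(start) - 1:int(end)]


# Hypothetical AIRR-style row with a subset of the documented columns.
sample = io.StringIO(
    "sequence_id\tsequence\tv_sequence_start\tv_sequence_end"
    "\tj_sequence_start\tj_sequence_end\n"
    "contig_1\tAACCGGTTAACC\t3\t6\t9\t12\n"
)
for row in csv.DictReader(sample, delimiter="\t"):
    v_seq = region(row["sequence"], row["v_sequence_start"], row["v_sequence_end"])
    j_seq = region(row["sequence"], row["j_sequence_start"], row["j_sequence_end"])
    print(row["sequence_id"], v_seq, j_seq)
```

Note that the consensus_annotations.csv coordinates described earlier are 0-based, so the same offset does not apply to that file.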