Cell Ranger vdj FASTA/FASTQ Outputs

The cellranger vdj pipeline outputs several indexed FASTA and FASTQ files. Refer to the V(D)J outputs Overview page for a list of all output files generated.

  • FASTA files serve as inputs to downstream tools such as the Integrative Genomics Viewer (IGV) or V(D)J annotation tools like IgBLAST.
  • FASTQ files are used to inspect assembly base quality scores.

| File | Records | Description |
| --- | --- | --- |
| all_contig.fasta | Assembled contigs | FASTA-format sequence for each assembled contig in the V(D)J library. |
| all_contig.fasta.fai | Index | Companion file to all_contig.fasta that serves as an external index. |
| filtered_contig.fasta | Assembled contigs | Contig sequences from barcodes that passed the algorithm's filtering steps (described on the Assembly Algorithm page). These contigs are annotated in filtered_contig_annotations.csv. Because enclone performs additional filtering during the clonotype grouping step, some contigs present in filtered_contig.fasta may be absent from the final filtered_contig_annotations.csv. |
| consensus.fasta | Clonotype consensus sequences | The consensus sequence of each assembled contig. It is identical to the sequence of the top (most frequent) exact subclonotype. The consensus sequence should be full-length (starting in the 5' UTR and ending at the C gene primer binding site). Poor data quality may result in a partial sequence. |
| consensus.fasta.fai | Index | Companion file to consensus.fasta that serves as an external index. |
| concat_ref.fasta | Concatenated reference segments | Concatenated V(D)J reference segments for the segments detected on each consensus sequence. These serve as an approximate reference for each consensus sequence. |
| concat_ref.fasta.fai | Index | Companion file to concat_ref.fasta that serves as an external index. |
| donor_regions.fa | Inferred germline genes | Records that each correspond to a unique, donor-specific V gene that differs from the V gene found in the V(D)J reference. |
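
As an illustration of what the .fai companion files make possible, the sketch below fetches a single record from an indexed FASTA without reading the whole file. It assumes the index follows the standard five-column, tab-separated faidx layout (name, length, byte offset, bases per line, bytes per line); the function names are hypothetical and are not part of Cell Ranger.

```python
from pathlib import Path


def read_fai(fai_path):
    """Parse a .fai index into {name: (length, offset, line_bases, line_width)}.

    Assumes the standard faidx column order; extra columns are ignored.
    """
    index = {}
    for line in Path(fai_path).read_text().splitlines():
        if not line.strip():
            continue
        name, length, offset, line_bases, line_width = line.split("\t")[:5]
        index[name] = (int(length), int(offset), int(line_bases), int(line_width))
    return index


def fetch_sequence(fasta_path, index, name):
    """Return the full sequence of one record by seeking to its byte offset."""
    length, offset, line_bases, line_width = index[name]
    # Every complete line before the last one contributes its line terminator.
    full_lines = (length - 1) // line_bases if length else 0
    n_bytes = length + full_lines * (line_width - line_bases)
    with open(fasta_path, "rb") as handle:
        handle.seek(offset)
        raw = handle.read(n_bytes)
    # Strip line terminators so only the bases remain.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")
```

A call such as fetch_sequence("all_contig.fasta", read_fai("all_contig.fasta.fai"), contig_name) would then return one contig, where contig_name is any record name listed in the index.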

Cell Ranger v5.0+ infers the germline V genes used to rearrange T cell and B cell receptors. See Clonotype Grouping for more information. All cells with a given V gene (including cells in unrelated clonotypes) are inspected for shared mutations relative to the V(D)J reference; mutations shared across all cells are likely to be germline differences present in the donor's V gene rather than somatic mutations. In Cell Ranger v5.0+, these inferred V gene germline sequences are exported as a pipeline output (donor_regions.fa).
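
The idea of looking for differences shared by every cell can be illustrated with a toy sketch. This is not Cell Ranger's or enclone's actual algorithm: it assumes the V gene sequences are already aligned to the reference, have equal length, and contain no gaps, and the function name is hypothetical.

```python
def shared_differences(reference, cell_sequences):
    """Return {position: base} where every cell carries the same non-reference base.

    Toy illustration only: sequences must be pre-aligned to the reference
    and have the same length as the reference.
    """
    if not cell_sequences:
        return {}
    if any(len(seq) != len(reference) for seq in cell_sequences):
        raise ValueError("all sequences must have the same length as the reference")

    shared = {}
    for pos, ref_base in enumerate(reference):
        bases = {seq[pos] for seq in cell_sequences}
        # A difference counts only if all cells agree on one base that is not the reference base.
        if len(bases) == 1:
            (base,) = bases
            if base != ref_base:
                shared[pos] = base
    return shared
```

In this sketch, a difference seen in only some cells is ignored, which mirrors the reasoning that a change found in only part of the population is more likely to have arisen after rearrangement.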

Each donor_regions.fa file contains a list of unique records. Each record corresponds to a unique, donor-specific V gene that differs from the V gene found in the V(D)J reference. The nucleotide sequence exported in the record spans the translated RNA sequence through the beginning of CDR3 (i.e., leader peptide to CDR3) and does not include the 5’ UTR.

Consider the following example header:

>454:d1:1:TRAV1-2 (reference record id : donor name : allele number : gene name)

There are four elements in the donor_regions.fa header:

  1. The first element of the header (454) corresponds to the entry of the closest V gene in the regions.fa V(D)J reference.
  2. The second element (d1) is the donor name provided to or inferred by the pipeline.
  3. The third element (1) refers to the allele of the closest V gene in the V(D)J reference.
  4. The fourth element of the header (TRAV1-2) is the name of the closest V gene in the V(D)J reference.
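
The four header elements described above can be split programmatically. The following is a minimal sketch that assumes the fields are colon-separated exactly as in the example header; the class and function names are hypothetical.

```python
from typing import NamedTuple


class DonorRegionHeader(NamedTuple):
    reference_record_id: int  # entry of the closest V gene in the reference
    donor_name: str           # donor name provided to or inferred by the pipeline
    allele_number: int        # allele number from the header
    gene_name: str            # name of the closest V gene in the reference


def parse_donor_region_header(header_line):
    """Split a donor_regions.fa header such as '>454:d1:1:TRAV1-2' into its fields."""
    text = header_line.strip()
    if text.startswith(">"):
        text = text[1:]
    # Keep only the first whitespace-delimited token in case a description follows.
    token = text.split(maxsplit=1)[0] if text else ""
    fields = token.split(":", 3)
    if len(fields) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got: {header_line!r}")
    record_id, donor, allele, gene = fields
    return DonorRegionHeader(int(record_id), donor, int(allele), gene)
```

For the example header, parse_donor_region_header(">454:d1:1:TRAV1-2") yields a record whose gene_name is TRAV1-2 and whose donor_name is d1.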

D, J, and C germline genes are not inferred in Cell Ranger v5.0+.

The cellranger vdj pipeline produces quality scores for assembled bases. In the filtered_contig.fastq and all_contig.fastq files, the quality score corresponds to the probability of the base not being a sequencing, PCR, or reverse-transcription (RT) error. It is computed using the per-read sequencing Q-scores and an assumed RT error rate.

The cellranger vdj pipeline's quality score differs from the quality scores in a typical FASTQ file. In a typical FASTQ file, the quality score is the Phred-encoded probability that the base call is a sequencing error; it does not account for PCR or RT errors.
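
For reference, the standard Phred relationship is Q = -10 * log10(p), where p is the probability that a base call is an error. The sketch below converts between the two and decodes a FASTQ quality string. It assumes the contig FASTQ files use the common Phred+33 ASCII encoding, which is not stated on this page, and the function names are hypothetical. For the cellranger vdj contig files, the resulting probability would be read as covering sequencing, PCR, and RT errors together.

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred score to the corresponding error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred score."""
    return -10 * math.log10(p)


def decode_quality_string(quality, offset=33):
    """Decode a FASTQ quality string into integer scores.

    offset=33 is an assumption (Phred+33 encoding), not something
    confirmed for these files by the documentation.
    """
    return [ord(ch) - offset for ch in quality]
```

Under these assumptions, a score of 30 corresponds to an error probability of about 0.001, that is, roughly one erroneous base in a thousand.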