
Cell Ranger multi Filtered Outputs

The per_sample_outs/ directory is produced after a successful run of the multi pipeline and contains filtered data, i.e., data from cell-associated barcodes in a given sample. These are the main outputs of interest.

The contents of the following folders within the per_sample_outs/ directory are described here:

  • count
  • vdj_t or vdj_t_gd
  • vdj_b
  • antigen_analysis

The count/ folder contains the results of 5' Single Cell Gene Expression analysis:

├── count
    ├── analysis
    │   ├── clustering
    │   ├── diffexp
    │   ├── pca
    │   ├── tsne
    │   └── umap
    ├── aggregate_barcodes.csv
    ├── feature_reference.csv
    ├── sample_cloupe.cloupe
    ├── sample_filtered_barcodes.csv
    ├── sample_filtered_feature_bc_matrix
    │   ├── barcodes.tsv.gz
    │   ├── features.tsv.gz
    │   └── matrix.mtx.gz
    ├── sample_filtered_feature_bc_matrix.h5
    ├── sample_molecule_info.h5
    ├── sample_alignments.bam
    └── sample_alignments.bam.bai

analysis: Folder containing the results of graph-based clustering and K-means clustering (K = 2 to 10); differential gene expression analysis between clusters; and PCA, t-SNE, and UMAP dimensionality reduction.
aggregate_barcodes.csv: Output of both the antibody and antigen aggregate barcode algorithms. If both Antibody and Antigen Capture libraries are included and a barcode has been determined to be both an antigen and an antibody aggregate, this file contains two lines for that barcode: the first gives the antibody UMI count and the second gives the antigen UMI count associated with that aggregate barcode. The library_type column distinguishes antibody from antigen aggregate barcodes.
feature_reference.csv: A copy of the input feature_reference.csv.
sample_cloupe.cloupe: A Loupe Browser readable file.
sample_filtered_barcodes.csv: File containing a list of barcodes associated with aligned reads. Each barcode sequence ends in a suffix consisting of a dash followed by a number. The number denotes a GEM well and is used to virtualize barcodes, which gives a higher effective barcode diversity when combining samples generated from separate GEM chip channel runs. The number should be "1" for all barcodes when analyzing a sample from a single GEM well. Preserving GEM well information in the suffix is especially useful when running cellranger aggr on multiple libraries generated from different GEM chip channels.
sample_filtered_feature_bc_matrix: Folder containing the feature-barcode matrix for detected cell-associated barcodes only. Each element of the matrix is the number of UMIs associated with a feature (row) and a barcode (column). The matrix can be loaded into third-party packages, which lets users work with it directly (e.g., to filter outlier cells, run dimensionality reduction, or normalize gene expression). It is similar to the filtered_feature_bc_matrix output.
sample_filtered_feature_bc_matrix.h5: Same information as sample_filtered_feature_bc_matrix, in H5 format.
sample_molecule_info.h5: Contains per-molecule information for all molecules that contain a valid barcode and a valid UMI and were assigned with high confidence to a gene or Feature Barcode. This file is a required input for cellranger aggr.
sample_alignments.bam: Indexed BAM file containing position-sorted reads aligned to the genome and transcriptome, as well as unaligned reads.
sample_alignments.bam.bai: Companion file to sample_alignments.bam that serves as an external index. If the reference transcriptome was generated from a genome with very long chromosomes (>512 Mbp), Cell Ranger v7.0+ generates a sample_alignments.bam.csi index file instead.

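
The GEM well suffix described for sample_filtered_barcodes.csv can be separated from the barcode sequence with a few lines of Python. The following is only a minimal sketch: the helper names split_barcode and count_barcodes_per_gem_well are hypothetical, and the input list is illustrative rather than taken from a real run.

```python
from collections import Counter


def split_barcode(barcode: str) -> tuple[str, int]:
    """Split 'SEQUENCE-N' into the nucleotide sequence and the GEM well number."""
    sequence, sep, well = barcode.rpartition("-")
    if not sep or not well.isdigit():
        raise ValueError(f"barcode has no numeric GEM well suffix: {barcode!r}")
    return sequence, int(well)


def count_barcodes_per_gem_well(barcodes: list[str]) -> Counter:
    """Count how many barcodes carry each GEM well suffix."""
    return Counter(split_barcode(bc)[1] for bc in barcodes)


# Illustrative input (assumption): two barcodes from GEM well 1 and one from
# GEM well 2, as might be seen after combining libraries from two GEM wells.
example = ["AAACCTGAGGTAGCCA-1", "AAACCTGAGTGCTGCC-1", "AAACCTGAGGTAGCCA-2"]
per_well = count_barcodes_per_gem_well(example)
print(dict(per_well))
```

Note that the same nucleotide sequence can appear with different suffixes; the suffix is what keeps such barcodes distinct across GEM wells.
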
TCR with gamma-delta chains: The cellranger multi pipeline allows users to analyze TCR libraries enriched for gamma (TRG) and delta (TRD) chains. However, gamma-delta analysis is not a supported workflow, and algorithm performance cannot be guaranteed. TRG/D outputs are located in the outs/multi/vdj_t_gd folder. Output files in the vdj_t_gd folder are similar to those in vdj_t and vdj_b.

The vdj_t/ and vdj_b/ folders contain the results of V(D)J immune profiling analysis for T cells and B cells, respectively. The output file names and file structure in these folders are identical, so they are described only once:

├── vdj_b/t
    ├── airr_rearrangement.tsv
    ├── cell_barcodes.json
    ├── clonotypes.csv
    ├── concat_ref.bam
    ├── concat_ref.bam.bai
    ├── concat_ref.fasta
    ├── concat_ref.fasta.fai
    ├── consensus_annotations.csv
    ├── consensus.bam
    ├── consensus.bam.bai
    ├── consensus.fasta
    ├── consensus.fasta.fai
    ├── donor_regions.fa
    ├── filtered_contig_annotations.csv
    ├── filtered_contig.fasta
    ├── filtered_contig.fastq
    ├── vdj_contig_info.pb
    └── vloupe.vloupe

airr_rearrangement.tsv: Annotated contigs and consensus sequences of V(D)J rearrangements in the AIRR format.
cell_barcodes.json: List of barcodes identified as T/B cells.
clonotypes.csv: High-level descriptions of each clonotype.
concat_ref.bam: For each clonotype consensus, the reference sequence is the concatenation of its annotated germline segments. This file shows how both the per-cell contigs and the clonotype consensus contig relate to the germline reference. concat_ref.bam is expected to reveal polymorphisms, somatic mutations, and recombination-induced differences such as non-templated nucleotide additions.
concat_ref.bam.bai: Companion file to concat_ref.bam that serves as an external index.
concat_ref.fasta: Concatenated V(D)J reference segments for the segments detected on each consensus sequence. These serve as an approximate reference for each consensus sequence.
concat_ref.fasta.fai: Companion file to concat_ref.fasta that serves as an external index.
consensus_annotations.csv: High-level and detailed annotations of each clonotype consensus sequence.
consensus.bam: Each reference sequence is a clonotype consensus sequence, and each record is an alignment of a single cell's contig against this consensus. For a given clonotype consensus sequence, this file shows how the constituent per-cell assemblies support the consensus.
consensus.bam.bai: Companion file to consensus.bam that serves as an external index.
consensus.fasta: The clonotype consensus sequence for each assembled contig. It is identical to the sequence of the top (most frequent) exact subclonotype. The consensus sequence should be full-length (starting in the 5' UTR and ending at the C gene primer binding site). Poor data quality may result in a partial sequence.
consensus.fasta.fai: Companion file to consensus.fasta that serves as an external index.
filtered_contig_annotations.csv: High-level annotations of each high-confidence, cellular contig. This is a subset of all_contig_annotations.csv.
filtered_contig.fasta: High-confidence contig sequences in cell barcodes, in FASTA format.
filtered_contig.fastq: High-confidence contig sequences in cell barcodes, in FASTQ format.
vdj_contig_info.pb: Stores the contig annotations, V(D)J reference, and additional metadata in a protobuf binary file format. This file is required to run the cellranger aggr pipeline.
vloupe.vloupe: Loupe V(D)J Browser readable file.

The antigen_analysis/ folder contains the results of Antigen Capture analysis. It is only present if an Antigen Capture library is included in the analysis. The two files in this folder are antigen_specificity_scores.csv (present only if the [antigen-specificity] section was provided in the multi config CSV) and per_barcode.csv.

The primary outputs of the antigen specificity algorithm are in antigen_specificity_scores.csv. The barcode column lists all cell-associated barcodes, the antigen and antigen_umi columns show on-target antigen IDs and per-barcode on-target antigen UMI counts, and the control and control_umi columns show the negative control antigen IDs and negative control antigen UMI counts. The antigen specificity score is calculated per barcode (as described on the Antigen Algorithm page) and reported in the antigen_specificity_score column. For a TCR Antigen Capture (BEAM-T) library, the MHC allele ID is shown in the mhc_allele column. If a given barcode is associated with a clonotype, the clonotype and exact subclonotype IDs are reported in the raw_clonotype_id and exact_subclonotype_id columns, respectively.

barcode,antigen,antigen_umi,control,control_umi,antigen_specificity_score,mhc_allele,raw_clonotype_id,exact_subclonotype_id
AAACGGGAGCCCGAAA-1,BEAM01,0,BEAM12,2,0.0,HLA-A*02:01,clonotype1,1
AAACGGGAGCCCGAAA-1,BEAM02,0,BEAM12,2,0.0,HLA-A*02:01,clonotype1,1
AAACGGGAGCCCGAAA-1,BEAM03,0,BEAM12,2,0.0,HLA-A*02:01,clonotype1,1
AAACGGGAGCCCGAAA-1,BEAM04,0,BEAM12,2,0.0,HLA-A*02:01,clonotype1,1
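
Because antigen_specificity_scores.csv has one row per barcode and antigen, it is often easier to work with after pivoting it into one entry per barcode. The sketch below does this with the Python standard library, using the example rows shown for this file. The function names and the cutoff value are hypothetical placeholders chosen for illustration; they are not part of Cell Ranger and the cutoff is not a recommended threshold.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the example for this file; in practice, open the real CSV.
EXAMPLE_CSV = """\
barcode,antigen,antigen_umi,control,control_umi,antigen_specificity_score,mhc_allele,raw_clonotype_id,exact_subclonotype_id
AAACGGGAGCCCGAAA-1,BEAM01,0,BEAM12,2,0.0,HLA-A*02:01,clonotype1,1
AAACGGGAGCCCGAAA-1,BEAM02,0,BEAM12,2,0.0,HLA-A*02:01,clonotype1,1
AAACGGGAGCCCGAAA-1,BEAM03,0,BEAM12,2,0.0,HLA-A*02:01,clonotype1,1
AAACGGGAGCCCGAAA-1,BEAM04,0,BEAM12,2,0.0,HLA-A*02:01,clonotype1,1
"""


def scores_by_barcode(handle) -> dict[str, dict[str, float]]:
    """Pivot the long table into {barcode: {antigen: specificity score}}."""
    table = defaultdict(dict)
    for row in csv.DictReader(handle):
        score = float(row["antigen_specificity_score"])
        table[row["barcode"]][row["antigen"]] = score
    return dict(table)


def antigens_above(table, cutoff: float) -> dict[str, list[str]]:
    """Return {barcode: antigens scoring at least `cutoff`}, omitting barcodes with none."""
    hits = {}
    for barcode, scores in table.items():
        passing = sorted(a for a, s in scores.items() if s >= cutoff)
        if passing:
            hits[barcode] = passing
    return hits


table = scores_by_barcode(io.StringIO(EXAMPLE_CSV))
print(table["AAACGGGAGCCCGAAA-1"])
# The cutoff below is an arbitrary placeholder (assumption), not a recommendation.
print(antigens_above(table, cutoff=50.0))
```

With the four example rows, every score is 0.0, so no antigen passes a positive cutoff and the second call returns an empty dictionary.
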

The per_barcode.csv is a barcode lookup table to find barcodes that are called as Gene Expression or V(D)J cells. The is_gex_cell column identifies barcodes called as cells based on the Gene Expression library, the is_vdj_cell column identifies barcodes called as cells based on the V(D)J library, the raw_clonotype_id column shows the clonotype ID assigned to that barcode (if one exists), and the exact_subclonotype_id column shows the exact subclonotype ID assigned to that barcode (if one exists).

barcode,is_gex_cell,is_vdj_cell,raw_clonotype_id,exact_subclonotype_id
AAACCTGAGGTAGCCA-1,true,true,clonotype1,1
AAACCTGAGTGCTGCC-1,true,true,clonotype1,1
AAACCTGCAATCCGAT-1,true,true,clonotype1,1
AAACCTGGTACCGGCT-1,true,true,clonotype1,1
AAACCTGTCAGTTAGC-1,true,true,clonotype1,1
AAACCTGTCCGAAGAG-1,true,false,,
AAACGGGAGCTCAACT-1,true,true,clonotype1,1
AAACGGGCAGTAACGG-1,true,true,clonotype1,1
AAACGGGTCATAACCG-1,true,true,clonotype1,1
AAAGATGCAAGCGAGT-1,true,true,clonotype1,1
AAAGATGCACGGACAA-1,true,false,,
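
As a rough illustration of using per_barcode.csv as a lookup table, the sketch below tallies how many barcodes were called as cells by each library and how many barcodes belong to each clonotype. It uses a few of the example rows for this file. The function name summarize_per_barcode is hypothetical, and the code assumes the boolean columns are written as the strings "true" and "false", as they appear in the example.

```python
import csv
import io
from collections import Counter

# A few rows copied from the example for this file; in practice, open the real CSV.
EXAMPLE_CSV = """\
barcode,is_gex_cell,is_vdj_cell,raw_clonotype_id,exact_subclonotype_id
AAACCTGAGGTAGCCA-1,true,true,clonotype1,1
AAACCTGAGTGCTGCC-1,true,true,clonotype1,1
AAACCTGTCCGAAGAG-1,true,false,,
AAACGGGAGCTCAACT-1,true,true,clonotype1,1
"""


def summarize_per_barcode(handle) -> tuple[Counter, Counter]:
    """Tally (is_gex_cell, is_vdj_cell) combinations and barcodes per clonotype."""
    calls = Counter()
    clonotypes = Counter()
    for row in csv.DictReader(handle):
        # Assumption: booleans are serialized as "true"/"false" (case-insensitive here).
        is_gex = row["is_gex_cell"].strip().lower() == "true"
        is_vdj = row["is_vdj_cell"].strip().lower() == "true"
        calls[(is_gex, is_vdj)] += 1
        if row["raw_clonotype_id"]:  # an empty field means no clonotype was assigned
            clonotypes[row["raw_clonotype_id"]] += 1
    return calls, clonotypes


calls, clonotypes = summarize_per_barcode(io.StringIO(EXAMPLE_CSV))
print("GEX and V(D)J cells:", calls[(True, True)])
print("GEX-only cells:", calls[(True, False)])
print("Barcodes per clonotype:", dict(clonotypes))
```

For the four rows used here, three barcodes are called as cells by both libraries and assigned to clonotype1, and one barcode is a Gene Expression cell with no clonotype.
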