In addition to the Gene Expression outputs, Cell Ranger produces several files that are specific to the CRISPR Guide Capture analysis. Each file in the crispr_analysis/ sub-folder is briefly described below.
A JSON object of (key, value) pairs, where each key is a protospacer name
and the corresponding value is a list of all cell barcodes in which that protospacer was detected.
{
"GFP": [
"AAACCCAAGCCGTTAT-1",
"AAACCCATCTCAAAGC-1",
"AAACGAACATTGCTTT-1",
"AAACGCTAGCCGATCC-1",
"AAAGAACAGAACAGGA-1",
...
],
"Control_1": [
"AAACCCAAGCCGTTAT-1",
"AAACGAACACCCTGAG-1",
"AAACGAACAGTTGTCA-1",
"AAACGAAGTATTCCTT-1",
"AAACGAAGTCTGTCAA-1",
...
]
}
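The JSON above is keyed by protospacer, so finding the protospacers detected in a given cell means inverting the mapping. The following is a minimal sketch in Python, assuming the file has been read into a string; the helper name `invert_cells_per_protospacer` is hypothetical, and the sample uses only a few barcodes copied from the excerpt above.

```python
import json
from collections import defaultdict

# Small sample in the same shape as the file; barcodes are copied from the
# excerpt above and the lists are shortened for illustration.
sample_text = """
{
  "GFP": ["AAACCCAAGCCGTTAT-1", "AAACCCATCTCAAAGC-1"],
  "Control_1": ["AAACCCAAGCCGTTAT-1", "AAACGAACACCCTGAG-1"]
}
"""


def invert_cells_per_protospacer(mapping):
    """Turn {protospacer: [barcodes]} into {barcode: [protospacers]}."""
    by_barcode = defaultdict(list)
    for protospacer, barcodes in mapping.items():
        for barcode in barcodes:
            by_barcode[barcode].append(protospacer)
    return dict(by_barcode)


protospacers_by_barcode = invert_cells_per_protospacer(json.loads(sample_text))
print(protospacers_by_barcode["AAACCCAAGCCGTTAT-1"])  # ['GFP', 'Control_1']
```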
A CSV file containing four columns: cell barcode, number of features (protospacers) detected in that cell, the names of the detected feature(s), and the number of molecules (UMIs) for each feature in that cell. If a cell_barcode has multiple features, the corresponding entries in feature_call and num_umis are separated by a | character.
head -3 protospacer_calls_per_cell.csv
cell_barcode,num_features,feature_call,num_umis
TGTCCTGCACCAGTTA-1,1,GFP,1085
CTCCGATAGACTCATC-1,1,GFP,1466
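Because feature_call and num_umis can each hold several |-separated entries, it can help to pair them up when reading the file. The sketch below assumes the column names shown in the header above; the function name `parse_calls` is hypothetical, and the second sample row is made up purely to illustrate the multi-feature case.

```python
import csv
import io

# First data row is from the excerpt above; the second row is an invented
# example of a cell with two features, to exercise the "|" separator.
sample_csv = """cell_barcode,num_features,feature_call,num_umis
TGTCCTGCACCAGTTA-1,1,GFP,1085
AAACCCAAGCCGTTAT-1,2,GFP|Control_1,310|95
"""


def parse_calls(handle):
    """Return {cell_barcode: {feature_name: umi_count}}."""
    calls = {}
    for row in csv.DictReader(handle):
        features = row["feature_call"].split("|")
        umis = [int(n) for n in row["num_umis"].split("|")]
        # Entries are positional: the i-th UMI count belongs to the i-th feature.
        calls[row["cell_barcode"]] = dict(zip(features, umis))
    return calls


calls = parse_calls(io.StringIO(sample_csv))
print(calls["AAACCCAAGCCGTTAT-1"])  # {'GFP': 310, 'Control_1': 95}
```

To read the real file instead of the sample, pass an open file handle for protospacer_calls_per_cell.csv in place of the `io.StringIO` object.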
A CSV file containing five columns: a category (a combination of protospacers or a particular property, such as having no guide molecules detected), the number of cell barcodes belonging to that category, the percentage of cell barcodes in that category, the median UMI count aggregated over the protospacer(s) in the category (if any), and the standard deviation of UMI counts aggregated over the protospacer(s) in the category (if any). Two categories deserve particular mention: "No guide molecules", which refers to cells where no guide molecules were detected, and "No confident call", which refers to cells where either no guide molecules were detected or the number of detected guide molecules was not judged by the protospacer-calling algorithm to be significantly different from the background.
head protospacer_calls_summary.csv
protospacer_call,num_cells,pct_cells,median_umis,stddev_umis
No guide molecules,0,0.0,None,None
No confident call,538,11.2,None,None
1 protospacer expressed,4214,87.72,None,None
More than 1 protospacer expressed,52,1.08,None,None
GFP,2059.0,42.86,1345.0,1480.621745859537
GFP - Control_1,52.0,1.08,2015.5,2072.675627903681
Control_1,2155.0,44.86,1001.0,1815.816097039266
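In the summary file, rows without UMI statistics carry the literal text None, and cell counts may be written either as integers or as floats. A possible way to normalize both when loading the file is sketched below; the helper names `to_number` and `read_summary` are hypothetical, and the sample rows are taken from the excerpt above.

```python
import csv
import io

sample_csv = """protospacer_call,num_cells,pct_cells,median_umis,stddev_umis
No confident call,538,11.2,None,None
GFP,2059.0,42.86,1345.0,1480.621745859537
GFP - Control_1,52.0,1.08,2015.5,2072.675627903681
"""


def to_number(text):
    """Convert a CSV field to float, mapping the literal 'None' to None."""
    return None if text == "None" else float(text)


def read_summary(handle):
    """Return {protospacer_call: {column: value}} with numeric types."""
    summary = {}
    for row in csv.DictReader(handle):
        summary[row["protospacer_call"]] = {
            # num_cells appears both as "538" and as "2059.0" in the file.
            "num_cells": int(float(row["num_cells"])),
            "pct_cells": float(row["pct_cells"]),
            "median_umis": to_number(row["median_umis"]),
            "stddev_umis": to_number(row["stddev_umis"]),
        }
    return summary


summary = read_summary(io.StringIO(sample_csv))
print(summary["GFP"]["num_cells"], summary["No confident call"]["median_umis"])
```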
A CSV file containing two columns: the protospacers used in the experiment, and the minimum number of molecules (umi_count) a cell needs to possess in order to be classified as expressing that protospacer.
head -3 protospacer_umi_thresholds.csv
protospacer, umi_threshold
GFP,120
Control_1,79
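As an illustration of how these thresholds can be used downstream, the sketch below loads the file and lists the protospacers whose UMI count in a cell reaches the threshold. It is not the Cell Ranger protospacer-calling algorithm: the function names are hypothetical, the per-cell counts are invented, and treating the threshold as inclusive (`>=`) is an assumption based on the word "minimum" above. `skipinitialspace=True` is used because the header shown above has a space after the comma.

```python
import csv
import io

sample_csv = """protospacer, umi_threshold
GFP,120
Control_1,79
"""


def load_thresholds(handle):
    """Return {protospacer: umi_threshold} as integers."""
    reader = csv.DictReader(handle, skipinitialspace=True)
    return {row["protospacer"]: int(row["umi_threshold"]) for row in reader}


def protospacers_at_or_above_threshold(umi_counts, thresholds):
    """List protospacers whose count meets the threshold (assumed inclusive)."""
    return [
        name
        for name, count in umi_counts.items()
        if name in thresholds and count >= thresholds[name]
    ]


thresholds = load_thresholds(io.StringIO(sample_csv))
# Invented per-cell UMI counts, for illustration only.
cell_counts = {"GFP": 1085, "Control_1": 12}
print(protospacers_at_or_above_threshold(cell_counts, thresholds))  # ['GFP']
```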
The same as the file above, only in JSON format.
The output files described below are only produced if the Feature Reference file includes at least one Non-Targeting guide RNA. These output files rely on differential expression analysis, which requires the presence of control cells (those that receive only Non-Targeting guide RNAs). For more information on Non-Targeting guide RNAs, please see the CRISPR Algorithms page.
This CSV file summarizes the effectiveness of the guide RNAs in perturbing their targets. The cells are grouped "by feature", i.e. by the combination of guides present. The following columns are present:
Column | Brief Description |
---|---|
Perturbation | Combinations of guide RNAs present. |
Target Guide | The guide RNA whose target gene is being considered for the perturbation efficiency calculation. |
Log2 Fold Change | The log2 fold change in expression of the target gene, relative to control cells. |
p Value | p-value for the hypothesis test. |
Log2 Fold Change Lower Bound | Lower confidence bound (5th percentile) for Log2 Fold Change. |
Log2 Fold Change Upper Bound | Upper confidence bound (95th percentile) for Log2 Fold Change. |
Cells with Perturbation | Number of cells with this perturbation i.e. this combination of guide RNAs. |
Mean UMI Count Among Cells with Perturbation | Mean UMI count for the target gene in question, among cells with this combination of guide RNAs. |
Cells with Non-Targeting Guides | Number of control cells, i.e. cells with only non-targeting guide RNAs. |
Mean UMI Count Among Cells with Non-Targeting Guides | Mean UMI count for the target gene in question, among control cells. |
An example is provided below:
head -3 perturbation_efficiencies_by_feature.csv
Perturbation, Target Guide, Log2 Fold Change, p Value, Log2 Fold Change Lower Bound, Log2 Fold Change Upper Bound, Cells with Perturbation, Mean UMI Count Among Cells with Perturbation, Cells with Non-Targeting Guides, Mean UMI Count Among Cells with Non-Targeting Guides
Guide1|Guide2, Guide1, -2.487458049, 0.684604341, -3.8938847, -1.591207405, 37, 0.189189189, 3760, 1.199734043
Guide1|Guide2, Guide2, 1.46887148, 0.273153378, -0.206506878, 2.354657695, 37, 0.054054054, 3760, 0.028723404
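A log2 fold change can be converted to a linear ratio with 2 raised to that value, which is often easier to read as "expression relative to control cells". The sketch below does this for the two rows shown above, keeping only the first four columns for brevity; the function name `read_efficiencies` is hypothetical, and the sample assumes comma-separated fields with a space after each comma.

```python
import csv
import io

# First four columns of the example rows above.
sample_csv = """Perturbation, Target Guide, Log2 Fold Change, p Value
Guide1|Guide2, Guide1, -2.487458049, 0.684604341
Guide1|Guide2, Guide2, 1.46887148, 0.273153378
"""


def read_efficiencies(handle):
    """Parse rows and add the linear fold change, 2 ** log2_fold_change."""
    results = []
    # skipinitialspace drops the space that follows each comma.
    for row in csv.DictReader(handle, skipinitialspace=True):
        log2_fc = float(row["Log2 Fold Change"])
        results.append(
            {
                "guides": row["Perturbation"].split("|"),
                "target": row["Target Guide"],
                "log2_fold_change": log2_fc,
                "fold_change": 2.0 ** log2_fc,
                "p_value": float(row["p Value"]),
            }
        )
    return results


efficiencies = read_efficiencies(io.StringIO(sample_csv))
for entry in efficiencies:
    print(entry["target"], round(entry["fold_change"], 3), entry["p_value"])
```

For the first row, a log2 fold change of about -2.49 corresponds to a ratio of roughly 0.18, that is, the target gene is expressed at about 18% of its level in control cells. The p-value and confidence bounds should be read alongside the ratio.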
This CSV file is very similar to perturbation_efficiencies_by_feature.csv, with the only difference being that cells are grouped "by target", i.e. by the combination of genes that their guide RNAs are targeting.
head -4 perturbation_efficiencies_by_target.csv
Perturbation, Target Gene, Log2 Fold Change, p Value, Log2 Fold Change Lower Bound, Log2 Fold Change Upper Bound, Cells with Perturbation, Mean UMI Count Among Cells with Perturbation, Cells with Non-Targeting Guides, Mean UMI Count Among Cells with Non-Targeting Guides
Gene1, Gene1, -3.742386941, 5.98E-07, -4.869624021, -2.7417221, 942, 0.001061571, 3760, 0.028723404
Gene2|Gene1, Gene2, -3.006368897, 0.666629078, -3.080612608, -2.920975978, 34, 0, 3760, 0.23643617
Gene2|Gene1, Gene1, 0.023108305, 1, -0.197392057, 0.283497989, 34, 0, 3760, 0.028723404
This CSV file lists the top 10 genes from the reference transcriptome that showed the greatest increase or decrease in expression as a result of each perturbation detected in the experiment. For each perturbation, the results are presented in 4 columns; the results for multiple perturbations are placed side by side.
These columns are: the (human-readable) name of the gene, the Gene ID, the log2 fold change in expression, and the adjusted p-value (corrected for multiple comparisons).
This CSV file lists the effects of each perturbation on each gene in the reference transcriptome. Each row is specific to a gene: the first column specifies the Gene ID and the second column specifies the human-readable gene name. The subsequent columns summarize how the expression of each gene changes under each perturbation (combinations of guide RNAs). The results from each perturbation are summarized by 3 columns: mean UMI counts for that gene under that perturbation, log2 fold change relative to the expression in control cells, and adjusted p-value for this comparison. The results from different perturbations are juxtaposed with each other.
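Since each perturbation occupies a block of 3 adjacent columns after the two gene columns, a wide row can be unpacked into one record per gene and perturbation. The sketch below relies only on that positional layout. The header names and all values in the sample are invented placeholders, because the real column names are not shown here, and the function name `wide_to_long` is hypothetical.

```python
import csv
import io

# Placeholder header names and made-up values; only the layout (2 gene columns
# followed by groups of 3 columns per perturbation) follows the description.
sample_csv = """gene_id,gene_name,pertA_mean,pertA_log2fc,pertA_padj,pertB_mean,pertB_log2fc,pertB_padj
ID_X,GENE_X,0.5,-1.0,0.01,1.2,0.3,0.8
"""


def wide_to_long(handle, group_size=3):
    """Split each row into one record per (gene, perturbation column group)."""
    reader = csv.reader(handle)
    header = next(reader)
    records = []
    for row in reader:
        gene_id, gene_name, values = row[0], row[1], row[2:]
        for start in range(0, len(values), group_size):
            group = values[start:start + group_size]
            if len(group) < group_size:
                break  # ignore an incomplete trailing group
            mean_umi, log2_fc, adjusted_p = (float(v) for v in group)
            records.append(
                {
                    "gene_id": gene_id,
                    "gene_name": gene_name,
                    # Header cells of this group, kept to identify the perturbation.
                    "source_columns": header[2 + start:2 + start + group_size],
                    "mean_umi": mean_umi,
                    "log2_fold_change": log2_fc,
                    "adjusted_p_value": adjusted_p,
                }
            )
    return records


records = wide_to_long(io.StringIO(sample_csv))
print(len(records), records[1]["source_columns"])
```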
This CSV file is very similar to perturbation_effects_by_feature/top_perturbed_genes.csv, with the only difference being that cells are grouped "by target", i.e. by the combination of genes that their guide RNAs are targeting.
This CSV file is very similar to perturbation_effects_by_feature/transcriptome_analysis.csv, with the only difference being that cells are grouped "by target", i.e. by the combination of genes that their guide RNAs are targeting.