Cell Ranger requires FASTQ files as input, which typically come from running cellranger mkfastq or one of Illumina's demultiplexing tools, bcl2fastq or BCL Convert. However, FASTQ files from other sources can also be used, such as a published dataset or the 10x Genomics bamtofastq tool.
For experiments where only gene expression data are present, here are the arguments available for specifying which FASTQ files cellranger count or cellranger vdj should use:
Argument | Brief Description |
---|---|
--fastqs | Required. The folder containing the FASTQ files to be analyzed. Generally, this will be the fastq_path folder generated by cellranger mkfastq . If the files are in multiple folders, for instance, because one library was sequenced across multiple flow cells, supply a comma-separated list of paths. |
--sample | Optional. Sample name to analyze. This will be as specified in the sample sheet supplied to bcl-convert , bcl2fastq or mkfastq . Multiple names may be supplied as a comma-separated list, in which case they will be treated as one sample. |
--libraries | Required for Feature Barcode analysis. Path to a libraries.csv file declaring input libraries. Cannot be combined with --fastqs or --sample . |
--lanes | Optional. Lanes associated with this sample. Defaults to using all lanes. |
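As a rough illustration of how the arguments in the table combine, the sketch below assembles them into a list of command-line tokens. The function name `fastq_args` is hypothetical and not part of Cell Ranger. Joining several lanes with commas is an assumption, since the table only states that multiple folders and multiple sample names are comma-separated.

```python
def fastq_args(fastq_dirs, samples=None, lanes=None):
    """Build FASTQ-selection arguments for cellranger count or vdj.

    fastq_dirs: one or more folders (joined with commas for --fastqs).
    samples:    optional sample names; several names are treated as one sample.
    lanes:      optional lane numbers; comma-joining them is an assumption.
    """
    if not fastq_dirs:
        raise ValueError("--fastqs is required: give at least one folder")
    args = ["--fastqs=" + ",".join(fastq_dirs)]
    if samples:
        args.append("--sample=" + ",".join(samples))
    if lanes:
        args.append("--lanes=" + ",".join(str(lane) for lane in lanes))
    return args


# One library sequenced across two flow cells, restricted to one sample name.
print(" ".join(fastq_args(
    ["MKFASTQ_ID/outs/fastq_path1", "MKFASTQ_ID/outs/fastq_path2"],
    samples=["test_sample1"],
)))
```

These tokens would be appended to the rest of a cellranger count or cellranger vdj command line, which needs further arguments not covered here.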
For Feature Barcode experiments, separate libraries for the Gene Expression reads and the Feature Barcode reads are generated. In this case, you must construct a CSV file indicating the input data folder, sample name, and library type of each input library. Then pass this file to cellranger count using the --libraries flag. See the Libraries CSV page for details on how to construct the libraries.csv file.
Here are the columns available in the [libraries] section of the multi config CSV for specifying which FASTQ files cellranger multi should use:
Column | Brief Description |
---|---|
fastq_id | Required. The Illumina sample name to analyze. This will be as specified in the sample sheet supplied to bcl-convert , bcl2fastq or mkfastq . Multiple names may be supplied as a comma-separated list, in which case they will be treated as one sample. |
fastqs | Required. The folder containing the FASTQ files to be analyzed. Generally, this will be the fastq_path folder generated by cellranger mkfastq . |
feature_types | Required. The underlying feature type of the library must be one of 'Gene Expression', 'VDJ', 'VDJ-T', 'VDJ-B', 'Antibody Capture', 'CRISPR Guide Capture', or 'Antigen Capture'. |
lanes | Optional. Lanes associated with this sample. Defaults to using all lanes. |
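The column rules above can be sketched as a small generator for the [libraries] section. This is only an illustration: `libraries_section` is a hypothetical helper, the feature type list is copied from the table above, and requiring lanes on either every row or none is a simplification made for this sketch, not a documented Cell Ranger rule.

```python
import csv
import io

# Allowed values, copied from the feature_types row of the table above.
FEATURE_TYPES = {
    "Gene Expression", "VDJ", "VDJ-T", "VDJ-B",
    "Antibody Capture", "CRISPR Guide Capture", "Antigen Capture",
}


def libraries_section(rows):
    """Render a [libraries] section from a list of dicts.

    Each dict needs fastq_id, fastqs and feature_types; lanes is optional.
    """
    has_lanes = [bool(row.get("lanes")) for row in rows]
    if any(has_lanes) and not all(has_lanes):
        # Simplification for this sketch only.
        raise ValueError("give lanes for every row or for none")
    columns = ["fastq_id", "fastqs"]
    if any(has_lanes):
        columns.append("lanes")
    columns.append("feature_types")

    out = io.StringIO()
    out.write("[libraries]\n")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        if row["feature_types"] not in FEATURE_TYPES:
            raise ValueError("unknown feature type: " + repr(row["feature_types"]))
        writer.writerow([row[column] for column in columns])
    return out.getvalue()


print(libraries_section([
    {"fastq_id": "test_sample1", "fastqs": "MKFASTQ_ID/outs/fastq_path",
     "feature_types": "Gene Expression"},
    {"fastq_id": "test_sample2", "fastqs": "MKFASTQ_ID/outs/fastq_path",
     "feature_types": "VDJ"},
]))
```

A complete multi config CSV contains other sections as well; this sketch only covers the FASTQ-related columns listed above.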
There are many ways bcl-convert, bcl2fastq and mkfastq can be used, resulting in a wide range of potential file names and locations as the output.
To serve as inputs for cellranger, FASTQ files should conform to the naming conventions of bcl-convert, bcl2fastq and mkfastq:
[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz
-OR-
[Sample Name]_S1_[Read Type]_001.fastq.gz
Where Read Type is one of:
- I1: Sample index read (optional)
- I2: Sample index read (optional)
- R1: Read 1
- R2: Read 2
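To check whether a file name follows this convention, a regular expression along the following lines may help. It is a sketch, not Cell Ranger's own validation logic: `parse_fastq_name` is a hypothetical helper, and the pattern accepts any sample order number after `_S` (the directory listings on this page show S1, S2 and so on) and any three-digit lane.

```python
import re

# [Sample Name]_S<order>[_L<lane>]_<Read Type>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<order>\d+)"
    r"(?:_L(?P<lane>\d{3}))?"
    r"_(?P<read>I1|I2|R1|R2)_001\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Return the parts of a conventionally named FASTQ file, or None."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    parts["order"] = int(parts["order"])
    # The lane segment is absent in the second naming form.
    parts["lane"] = int(parts["lane"]) if parts["lane"] else None
    return parts


print(parse_fastq_name("test_sample1_S1_L001_R1_001.fastq.gz"))
print(parse_fastq_name("read-I1-AAAAAAA_lane-001-chunk-001.fastq.gz"))
```

The first call returns a dictionary of name parts, and the second returns None because that name does not follow the convention.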
The FASTQ files are specified by providing the path to the folder containing them (via the fastqs column) and their Illumina sample name (via the fastq_id column) and optionally restricting the selection further by specifying the lanes of interest.
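The selection described above (folder, then sample name, then optional lanes) can be approximated on a plain list of file names. This sketch is only meant to help predict which files a given combination would pick up; `select_fastqs` is a hypothetical helper, and excluding files without a lane segment when lanes are requested is a choice made here, not documented Cell Ranger behavior.

```python
import re

# Sample name, optional lane, read type; see the naming convention above.
_NAME = re.compile(r"^(.+)_S\d+(?:_L(\d{3}))?_(?:I1|I2|R1|R2)_001\.fastq\.gz$")


def select_fastqs(filenames, fastq_id, lanes=None):
    """Keep files whose sample name equals fastq_id and whose lane is wanted."""
    selected = []
    for name in filenames:
        match = _NAME.match(name)
        if match is None or match.group(1) != fastq_id:
            continue
        lane = int(match.group(2)) if match.group(2) else None
        if lanes is not None and lane not in lanes:
            continue
        selected.append(name)
    return sorted(selected)


folder_listing = [
    "MySample_S1_L001_R1_001.fastq.gz",
    "MySample_S1_L001_R2_001.fastq.gz",
    "MySample_S1_L002_R1_001.fastq.gz",
    "MySample_S1_L002_R2_001.fastq.gz",
    "Undetermined_S0_L001_R1_001.fastq.gz",
]
print(select_fastqs(folder_listing, "MySample", lanes=[1]))
```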
Finding the right FASTQ files to process and the right arguments to process those files as desired can be confusing. To assist users, this page illustrates examples of how to handle common scenarios involving different FASTQ file folder hierarchies.
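My FASTQ files are:
- In an output folder from bcl-convert, bcl2fastq or mkfastq (fastq_path) and:
  - Scenario 1: In a subdirectory next to the Reports and Stats folders, with expected sample name prefixes.
  - Scenario 2: In multiple folders per sample index, like "SI-GA-A1_1" and "SI-GA-A1_2".
  - Scenario 3: In the same directory as the Reports and Stats folders.
- In a different folder:
  - Scenario 4: Consistent with bcl-convert, bcl2fastq or mkfastq, e.g. "mysample_S1_L001_I1_001.fastq.gz" (see above).
  - Scenario 5: Like "read-I1-AAAAAAA_lane-001-chunk-001.fastq.gz".
  - Scenario 6: Unlike any of the above examples.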
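A first pass at telling these cases apart is to look at the file names themselves. The sketch below sorts a name into one of three styles; `classify_fastq_name` is a hypothetical helper, and the pattern for the deprecated demux layout is generalized from the single example name given on this page, so it may not cover every real file.

```python
import re

# Names produced by bcl-convert, bcl2fastq or mkfastq.
ILLUMINA_STYLE = re.compile(
    r"^.+_S\d+(?:_L\d{3})?_(?:I1|I2|R1|R2)_001\.fastq\.gz$"
)
# Loose guess at the deprecated demux layout, based on one example name.
DEMUX_STYLE = re.compile(r"^read-.+_lane-\d+-chunk-\d+\.fastq\.gz$")


def classify_fastq_name(filename):
    """Return 'illumina', 'demux' or 'other' for a FASTQ file name."""
    if ILLUMINA_STYLE.match(filename):
        return "illumina"  # consistent with bcl-convert, bcl2fastq or mkfastq
    if DEMUX_STYLE.match(filename):
        return "demux"     # Scenario 5: deprecated layout
    return "other"         # Scenario 6: files need renaming


for name in (
    "mysample_S1_L001_I1_001.fastq.gz",
    "read-I1-AAAAAAA_lane-001-chunk-001.fastq.gz",
    "sampleA.R1.fq.gz",
):
    print(name, "->", classify_fastq_name(name))
```

For names in the first style, the folder layout (for example whether Reports and Stats folders are present) decides which of the remaining scenarios applies.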
My FASTQs are in an output folder from mkfastq or bcl2fastq, in a subdirectory next to Reports and Stats folders, with expected sample name prefixes.
How did I get here?
By running mkfastq with a simple CSV layout file or Illumina Experiment Manager sample sheet, or by running bcl-convert or bcl2fastq directly (with an IEM sample sheet) on a flow cell. If you ran mkfastq, your files will be in a (MKFASTQ_ID)/outs/fastq_path folder and your file hierarchy probably looks something like this:
MKFASTQ_ID
├── MAKE_FASTQS_CS
└── outs
└── fastq_path
└── HFLC5BBXX
├── test_sample1
│ ├── test_sample1_S1_L001_I1_001.fastq.gz
│ ├── test_sample1_S1_L001_R1_001.fastq.gz
│ ├── test_sample1_S1_L001_R2_001.fastq.gz
│ ├── test_sample1_S1_L002_I1_001.fastq.gz
│ ├── test_sample1_S1_L002_R1_001.fastq.gz
│ ├── test_sample1_S1_L002_R2_001.fastq.gz
│ ├── test_sample1_S1_L003_I1_001.fastq.gz
│ ├── test_sample1_S1_L003_R1_001.fastq.gz
│ └── test_sample1_S1_L003_R2_001.fastq.gz
├── test_sample2
│ ├── test_sample2_S2_L001_I1_001.fastq.gz
│ ├── test_sample2_S2_L001_R1_001.fastq.gz
│ ├── test_sample2_S2_L001_R2_001.fastq.gz
│ ├── test_sample2_S2_L002_I1_001.fastq.gz
│ ├── test_sample2_S2_L002_R1_001.fastq.gz
│ ├── test_sample2_S2_L002_R2_001.fastq.gz
│ ├── test_sample2_S2_L003_I1_001.fastq.gz
│ ├── test_sample2_S2_L003_R1_001.fastq.gz
│ └── test_sample2_S2_L003_R2_001.fastq.gz
├── Reports
├── Stats
├── Undetermined_S0_L001_I1_001.fastq.gz
...
└── Undetermined_S0_L003_R2_001.fastq.gz
If you ran bcl2fastq directly, then the output root folder would be where fastq_path is in the hierarchy above.
"Expected sample name prefixes" means you have one set of FASTQ files per sample, prefixed with the name of the sample as it appears in the simple CSV layout file or IEM sample sheet. Other situations described later on this page deal with the presence of four separate sets of files (four "samples" from bcl2fastq's point of view) per single biological sample/library.
For more information on the naming conventions, please visit Illumina's support site or refer to the bcl2fastq User Guide. The scenario where your files do not conform to the naming convention is described in a different section on this page.
Cell Ranger count/vdj arguments
The table below describes the arguments you would pass into any analysis pipeline to target the right FASTQ files in this scenario. Be sure to substitute the capitalized text as appropriate. Also note that in most cases you will be passing a single sample into any given pipeline. Exceptions to this are described in the documentation for the individual pipelines. The "All samples" entries in this table are provided for technical completeness.
Situation | Argument and Value |
---|---|
All samples (mkfastq) | --fastqs=MKFASTQ_ID/outs/fastq_path |
All samples (mkfastq), multiple flowcells | --fastqs=MKFASTQ_ID/outs/fastq_path1,MKFASTQ_ID/outs/fastq_path2 |
All samples (bcl2fastq direct) | --fastqs=/PATH/TO/bcl2fastq_output |
Process test_sample1 from all lanes (mkfastq) | --fastqs=MKFASTQ_ID/outs/fastq_path --sample=test_sample1 |
Process test_sample1 from lane 1 only (mkfastq) | --fastqs=MKFASTQ_ID/outs/fastq_path --sample=test_sample1 --lanes=1 |
Process test_sample1 and test_sample2 as a single merged sample (mkfastq) | --fastqs=MKFASTQ_ID/outs/fastq_path --sample=test_sample1,test_sample2 |
Cell Ranger multi config CSV arguments
The arguments you would pass into any analysis pipeline to target the right FASTQ files in this scenario are described below. Be sure to substitute the capitalized text as appropriate. Also note that in most cases you will be passing a single sample into any given pipeline. Exceptions to this are described in the documentation for the individual pipelines.
Gene Expression and V(D)J (mkfastq), one flowcell
[libraries]
fastq_id,fastqs,feature_types
test_sample1,MKFASTQ_ID/outs/fastq_path,Gene Expression
test_sample2,MKFASTQ_ID/outs/fastq_path,VDJ
Gene Expression and V(D)J (mkfastq), multiple flowcells
[libraries]
fastq_id,fastqs,feature_types
test_sample1,MKFASTQ_ID/outs/fastq_path1,Gene Expression
test_sample2,MKFASTQ_ID/outs/fastq_path2,VDJ
Gene Expression and V(D)J (bcl2fastq direct)
[libraries]
fastq_id,fastqs,feature_types
test_sample1,/PATH/TO/bcl2fastq_output,Gene Expression
test_sample2,/PATH/TO/bcl2fastq_output,VDJ
My FASTQs are in an output folder from mkfastq or bcl2fastq, but there are multiple folders per sample index, like "SI-GA-A1_1" and "SI-GA-A1_2".
How did I get here?
An input sample sheet was likely used that explicitly separated the four oligos in a 10x sample index set into four separate sample names. You may see a file hierarchy like this:
bcl2fastq_output
├── HFLC5BBXX
├── SI-GA-A1_1
│ ├── SI-GA-A1_1_S1_L001_I1_001.fastq.gz
│ ├── SI-GA-A1_1_S1_L001_R1_001.fastq.gz
│ └── SI-GA-A1_1_S1_L001_R2_001.fastq.gz
├── SI-GA-A1_2
│ ├── SI-GA-A1_2_S2_L001_I1_001.fastq.gz
│ ├── SI-GA-A1_2_S2_L001_R1_001.fastq.gz
│ └── SI-GA-A1_2_S2_L001_R2_001.fastq.gz
├── SI-GA-A1_3
│ ├── SI-GA-A1_3_S3_L001_I1_001.fastq.gz
│ ├── SI-GA-A1_3_S3_L001_R1_001.fastq.gz
│ └── SI-GA-A1_3_S3_L001_R2_001.fastq.gz
├── SI-GA-A1_4
│ ├── SI-GA-A1_4_S4_L001_I1_001.fastq.gz
│ ├── SI-GA-A1_4_S4_L001_R1_001.fastq.gz
│ └── SI-GA-A1_4_S4_L001_R2_001.fastq.gz
├── Reports
├── Stats
├── Undetermined_S0_L001_I1_001.fastq.gz
├── Undetermined_S0_L001_R1_001.fastq.gz
└── Undetermined_S0_L001_R2_001.fastq.gz
You probably want to merge all samples from the SI-GA-A1 index into a single analysis. If you only run one index at a time, you will see a smaller number of reads than expected, which may translate to lower coverage or cell count than you expect for your experiment.
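Collecting the per-oligo sample names into one comma-separated value can be sketched as follows. `merge_index_samples` is a hypothetical helper that assumes the four names differ only by a trailing `_1` to `_4`, as in the listing above. It would also group unrelated names that happen to end that way, so it should only be applied when the sample sheet is known to have split an index set.

```python
import re
from collections import defaultdict


def merge_index_samples(sample_names):
    """Group names like SI-GA-A1_1..SI-GA-A1_4 under their shared prefix.

    Returns a dict mapping each prefix to a comma-separated sample value.
    """
    groups = defaultdict(list)
    for name in sample_names:
        match = re.fullmatch(r"(.+)_([1-4])", name)
        prefix = match.group(1) if match else name
        groups[prefix].append(name)
    return {prefix: ",".join(sorted(names)) for prefix, names in groups.items()}


merged = merge_index_samples(
    ["SI-GA-A1_1", "SI-GA-A1_2", "SI-GA-A1_3", "SI-GA-A1_4"]
)
print("--sample=" + merged["SI-GA-A1"])
```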
Cell Ranger count/vdj arguments
Situation | Argument and Value |
---|---|
All samples | --fastqs=MKFASTQ_ID/outs/fastq_path |
Process all SI-GA-A1 reads in a single analysis | --fastqs=MKFASTQ_ID/outs/fastq_path \ --sample=SI-GA-A1_1,SI-GA-A1_2,SI-GA-A1_3,SI-GA-A1_4 |
Only process first sample index | --fastqs=MKFASTQ_ID/outs/fastq_path \ --sample=SI-GA-A1_1 |
Cell Ranger multi config CSV arguments
Process all SI-GA-A1 reads in a single analysis
[libraries]
fastq_id,fastqs,feature_types
SI-GA-A1_1,MKFASTQ_ID/outs/fastq_path,Gene Expression
SI-GA-A1_2,MKFASTQ_ID/outs/fastq_path,Gene Expression
SI-GA-A1_3,MKFASTQ_ID/outs/fastq_path,Gene Expression
SI-GA-A1_4,MKFASTQ_ID/outs/fastq_path,Gene Expression
Only process the first sample index
[libraries]
fastq_id,fastqs,feature_types
SI-GA-A1_1,MKFASTQ_ID/outs/fastq_path,Gene Expression
My FASTQs are in an output folder from mkfastq or bcl2fastq, in the same directory as the Reports and Stats folders.
How did I get here?
An Illumina Experiment Manager-formatted sample sheet was used with either no entry or a blank entry for the Sample_Project column. Your hierarchy likely looks something like this:
fastq_path
├── Reports
├── Stats
├── test_sample_S1_L001_I1_001.fastq.gz
├── test_sample_S1_L001_R1_001.fastq.gz
├── test_sample_S1_L001_R2_001.fastq.gz
├── test_sample_S1_L002_I1_001.fastq.gz
├── test_sample_S1_L002_R1_001.fastq.gz
├── test_sample_S1_L002_R2_001.fastq.gz
├── test_sample_S1_L003_I1_001.fastq.gz
├── test_sample_S1_L003_R1_001.fastq.gz
├── test_sample_S1_L003_R2_001.fastq.gz
├── Undetermined_S0_L001_I1_001.fastq.gz
...
└── Undetermined_S0_L003_R2_001.fastq.gz
This is fine; you would use the same arguments as if the FASTQs were organized into subfolders within the output folder.
Cell Ranger count/vdj arguments
Situation | Argument and Value |
---|---|
All samples (mkfastq) | --fastqs=MKFASTQ_ID/outs/fastq_path |
All samples (bcl2fastq direct) | --fastqs=/PATH/TO/bcl2fastq_output |
Process test_sample from all lanes (mkfastq) | --fastqs=MKFASTQ_ID/outs/fastq_path \ --sample=test_sample |
Process test_sample from lane 1 only (mkfastq) | --fastqs=MKFASTQ_ID/outs/fastq_path \ --sample=test_sample \ --lanes=1 |
Cell Ranger multi config CSV arguments
Process test_sample from all lanes (mkfastq)
[libraries]
fastq_id,fastqs,feature_types
test_sample,MKFASTQ_ID/outs/fastq_path,Gene Expression
Process test_sample from lane 1 only (mkfastq)
[libraries]
fastq_id,fastqs,lanes,feature_types
test_sample,MKFASTQ_ID/outs/fastq_path,1,Gene Expression
My FASTQs are in a different folder; I don't see Reports or Stats anywhere. The files are named like "MySample_S1_L001_I1_001.fastq.gz".
How did I get here?
FASTQ files have likely been transferred from a bcl-convert, bcl2fastq or mkfastq run into another folder. They still retain the names assigned by the software, which are a combination of sample name, sample order, lane, read type, and chunk. Your file hierarchy may look like this:
PROJECT_FOLDER
├── MySample_S1_L001_I1_001.fastq.gz
├── MySample_S1_L001_R1_001.fastq.gz
├── MySample_S1_L001_R2_001.fastq.gz
├── MySample_S1_L002_I1_001.fastq.gz
├── MySample_S1_L002_R1_001.fastq.gz
└── MySample_S1_L002_R2_001.fastq.gz
Cell Ranger count/vdj arguments
Situation | Argument and Value |
---|---|
All samples | --fastqs=/PATH/TO/PROJECT_FOLDER |
Process MySample from all lanes | --fastqs=/PATH/TO/PROJECT_FOLDER \ --sample=MySample |
Process MySample from lane 1 only | --fastqs=/PATH/TO/PROJECT_FOLDER \ --sample=MySample \ --lanes=1 |
Cell Ranger multi config CSV arguments
Process MySample from all lanes
[libraries]
fastq_id,fastqs,feature_types
MySample,/PATH/TO/PROJECT_FOLDER,Gene Expression
Process MySample from lane 1 only
[libraries]
fastq_id,fastqs,lanes,feature_types
MySample,/PATH/TO/PROJECT_FOLDER,1,Gene Expression
My FASTQs are named like "read-I1-AAAAAAA_lane-001-chunk-001.fastq.gz".
How did I get here?
The 10x demux pipeline was used to demultiplex the flowcell instead of mkfastq. This pipeline has been deprecated and cellranger no longer directly supports FASTQ files in this layout. Please contact 10x Genomics support for assistance.
My FASTQs are not named like any of the above examples.
How did I get here?
You likely received files that were processed through a proprietary LIMS system, which employs its own naming conventions.
10x pipelines need files named in the bcl2fastq convention in order to run properly. You will need to determine which file corresponds to which sample and which read type, likely by consulting your sequencing core or the individual who demultiplexed your flowcell.
It is likely that these files were initially processed with bcl2fastq. Once you track down their origin, you must rename the files according to the format described above.
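Once the sample, lane, and read type of each file are known, the target name can be generated rather than typed by hand. The sketch below builds a name in the convention described earlier on this page. `conventional_name` is a hypothetical helper, and defaulting the sample order to 1 simply mirrors the `_S1_` shown in that convention.

```python
READ_TYPES = {"I1", "I2", "R1", "R2"}


def conventional_name(sample, read_type, lane=None, sample_order=1):
    """Build [Sample Name]_S<order>[_L<lane>]_<Read Type>_001.fastq.gz."""
    if read_type not in READ_TYPES:
        raise ValueError("read type must be one of I1, I2, R1, R2")
    # The lane segment is optional in the naming convention.
    lane_part = "" if lane is None else "_L{:03d}".format(lane)
    return "{}_S{}{}_{}_001.fastq.gz".format(
        sample, sample_order, lane_part, read_type
    )


print(conventional_name("SAMPLENAME", "R1", lane=1))
print(conventional_name("SAMPLENAME", "R2"))
```

Mapping each original file to its sample, lane, and read type still has to come from whoever demultiplexed the flowcell. Copying or symlinking to the new names, rather than renaming in place, keeps the originals recoverable if a mapping turns out to be wrong.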
Cell Ranger count/vdj arguments
After you have renamed those files into that format, you will use the following arguments:
Situation | Argument and Value |
---|---|
All samples | --fastqs=/PATH/TO/PROJECT_FOLDER |
Process SAMPLENAME from all lanes | --fastqs=/PATH/TO/PROJECT_FOLDER \ --sample=SAMPLENAME |
Process SAMPLENAME from lane 1 only | --sample=SAMPLENAME \ --fastqs=/PATH/TO/PROJECT_FOLDER \ --lanes=1 |
Cell Ranger multi config CSV arguments
Process SAMPLENAME from all lanes
[libraries]
fastq_id,fastqs,feature_types
SAMPLENAME,/PATH/TO/PROJECT_FOLDER,Gene Expression
Process SAMPLENAME from lane 1 only
[libraries]
fastq_id,fastqs,lanes,feature_types
SAMPLENAME,/PATH/TO/PROJECT_FOLDER,1,Gene Expression