Specifying Input FASTQ Files for Cell Ranger count, vdj, and multi

The cellranger mkfastq pipeline is deprecated and will be removed in a future release. Please use Illumina's BCL Convert to generate Cell Ranger-compatible FASTQ files. For detailed guidance, refer to the Generating FASTQs page.

Cell Ranger requires FASTQ files as input, which typically come from running demultiplexing software (e.g., Illumina's BCL Convert). However, it is possible to use FASTQ files from other sources, such as a published dataset or the 10x Genomics bamtofastq tool. Check the compatible products page for sequencer platforms that are compatible with single cell gene expression, Flex, and Immune Profiling assays.

To serve as inputs for cellranger, FASTQ files should conform to the following naming conventions:

[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz

[Sample Name]_S1_[Read Type]_001.fastq.gz

Where Read Type is one of:

I1: Sample index read (optional)
I2: Sample index read (optional)
R1: Read 1
R2: Read 2
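
The naming conventions above can be checked before starting a run. The following is a minimal sketch, not part of Cell Ranger: `FASTQ_NAME` and `parse_fastq_name` are hypothetical names, and the pattern assumes that the sample number is `S` followed by digits and the lane is `L` followed by three digits, which is slightly more permissive than the literal `S1` and `L00` shown above.

```python
import re

# Pattern for the two naming conventions described above:
#   [Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz
#   [Sample Name]_S1_[Read Type]_001.fastq.gz
# Assumption: any sample number (S + digits) and any three-digit lane are accepted.
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)"
    r"(?:_L(?P<lane>\d{3}))?"
    r"_(?P<read_type>I1|I2|R1|R2)_001\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Return a dict of name components, or None if the name does not conform."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    # The lane group is absent for the second (lane-less) convention.
    if parts["lane"] is not None:
        parts["lane"] = int(parts["lane"])
    return parts


parsed = parse_fastq_name("Sample-GA-A1_S1_L001_R1_001.fastq.gz")
print(parsed["sample"], parsed["lane"], parsed["read_type"])
print(parse_fastq_name("Sample-GA-A1_R1.fastq.gz"))  # prints None
```

The `sample` value returned here is the string that would be passed as the sample name when selecting these files.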

FASTQ files are specified by providing the path to the folder that contains them (`--fastqs` for the count and vdj pipelines; the `fastqs` column for the multi pipeline) and their sample name (`--sample` for the count and vdj pipelines; the `fastq_id` column for the multi pipeline). The selection can optionally be restricted further by specifying the lanes of interest.

Cell Ranger scans the folder's subdirectories to locate the *.fastq.gz files. Make sure there are no duplicate sequence files in the subdirectories.
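
One way to look for duplicates before a run is to walk the folder yourself. The sketch below is an illustration under stated assumptions, not how Cell Ranger performs its scan: `find_duplicate_fastqs` is a hypothetical helper, and it treats two files as duplicates when they share a file name, without comparing their contents.

```python
import os
from collections import defaultdict


def find_duplicate_fastqs(root):
    """Map each *.fastq.gz file name found more than once under root to its paths."""
    seen = defaultdict(list)
    # os.walk visits root and every subdirectory below it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".fastq.gz"):
                seen[name].append(os.path.join(dirpath, name))
    # Keep only names that occur in more than one location.
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) > 1}
```

Run the check on each FASTQ folder separately. Folders from different flow cells can legitimately contain files with identical names, so scanning a common parent directory would report those as duplicates.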

Here are the command line arguments for specifying which FASTQ files cellranger count or cellranger vdj should use:

| Argument | Brief Description |
| --- | --- |
| `--fastqs` | Required. The folder containing the FASTQ files to be analyzed. If the files are in multiple folders (for instance, because one library was sequenced across multiple flow cells), provide a comma-separated list of paths. |
| `--sample` | Optional. Sample name to analyze. Use the `[Sample Name]` from the naming conventions above. Multiple names may be supplied as a comma-separated list, in which case they are treated as one sample. |
| `--libraries` | Required for Feature Barcode analysis. Path to a `libraries.csv` file specifying the input libraries. If a libraries CSV file is provided, do not use `--fastqs` or `--sample`. |
| `--lanes` | Optional. Lanes associated with this sample. Defaults to using all lanes. |
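
When a run is launched from a script, the FASTQ-selection arguments listed above can be assembled programmatically. This is a minimal sketch: `fastq_selection_args` is a hypothetical helper, it covers only the arguments in this table (a complete command needs further arguments that are outside the scope of this page), and joining multiple lanes with commas is an assumption because the table does not state the separator.

```python
def fastq_selection_args(fastqs, sample=None, lanes=None):
    """Build the FASTQ-selection arguments for cellranger count or vdj.

    fastqs: list of folder paths, joined with commas as the table describes.
    sample: optional list of sample names, joined with commas.
    lanes:  optional list of lane numbers. The comma separator used here is
            an assumption; check the pipeline's help output before relying on it.
    """
    if not fastqs:
        raise ValueError("--fastqs is required")
    args = ["--fastqs=" + ",".join(fastqs)]
    if sample:
        args.append("--sample=" + ",".join(sample))
    if lanes:
        args.append("--lanes=" + ",".join(str(lane) for lane in lanes))
    return args


selection = fastq_selection_args(["/flowcell_1", "/flowcell_2"], sample=["Sample-GA-A1"])
print(" ".join(selection))  # prints --fastqs=/flowcell_1,/flowcell_2 --sample=Sample-GA-A1
```

Returning a list rather than a single string keeps each argument intact if the list is later handed to `subprocess.run`, even when a path contains spaces.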

For Feature Barcode experiments, separate libraries are generated for the Gene Expression reads and the Feature Barcode reads. In this case, you must construct a CSV file that lists the input data folder, sample name, and library type of each input library, and then pass this file to cellranger count with the --libraries flag. See the Libraries CSV page for details on how to construct this file.

Here are the columns available in the [libraries] section of the multi config CSV for specifying which FASTQ files cellranger multi should use:

| Column | Brief Description |
| --- | --- |
| `fastq_id` | Required. The sample name to analyze. Use the `[Sample Name]` from the naming conventions above. Multiple names may be provided as a comma-separated list, in which case they are treated as one sample. |
| `fastqs` | Required. The folder containing the FASTQ files to be analyzed. |
| `feature_types` | Required. The underlying feature type of the library (see the multi config CSV documentation for all type options). |
| `lanes` | Optional. Lanes associated with this sample. Defaults to using all lanes. |

Suppose the FASTQ files to process as a single analysis are spread across multiple directories, for example one folder per flow cell:


├── flowcell_1
│   ├── Sample-GA-A1
│   │   ├── Sample-GA-A1_S1_L001_I1_001.fastq.gz
│   │   ├── Sample-GA-A1_S1_L001_R1_001.fastq.gz
│   │   ├── Sample-GA-A1_S1_L001_R2_001.fastq.gz
│   │   └── Sample-GA-A1_S1_L001_I2_001.fastq.gz
├── flowcell_2
│   ├── Sample-GA-A1
│   │   ├── Sample-GA-A1_S1_L001_I1_001.fastq.gz
│   │   ├── Sample-GA-A1_S1_L001_R1_001.fastq.gz
│   │   ├── Sample-GA-A1_S1_L001_R2_001.fastq.gz
│   │   └── Sample-GA-A1_S1_L001_I2_001.fastq.gz

There are a few ways to specify these FASTQ files depending on your experiment and choice of pipeline:

To process reads from FASTQ files in multiple folders as a single Gene Expression analysis, the count command would look like this:


cellranger count --fastqs=/flowcell_1,/flowcell_2 --sample=Sample-GA-A1

For Feature Barcode experiments, the libraries.csv file should be formatted like this:


fastqs,sample,library_type
flowcell_1,Sample-GA-A1,Gene Expression
flowcell_2,Sample-GA-A1,Gene Expression
fastq_path,CRISPR_sample1,CRISPR Guide Capture

The count pipeline would then look like this:


cellranger count --libraries=libraries.csv
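
If the library list is produced by a script, the CSV can be written with Python's standard `csv` module instead of by hand. This is a sketch only: `libraries_csv` is a hypothetical helper, and the folder and sample names are the placeholder values from the example above, not real paths.

```python
import csv
import io


def libraries_csv(rows):
    """Render (fastqs, sample, library_type) tuples as libraries CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row with the three column names used in the example above.
    writer.writerow(["fastqs", "sample", "library_type"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    ("flowcell_1", "Sample-GA-A1", "Gene Expression"),
    ("flowcell_2", "Sample-GA-A1", "Gene Expression"),
    ("fastq_path", "CRISPR_sample1", "CRISPR Guide Capture"),
]

# Write the file that is then passed to --libraries.
with open("libraries.csv", "w", newline="") as handle:
    handle.write(libraries_csv(rows))
```

Using `csv.writer` rather than string concatenation means that a folder path containing a comma is quoted correctly.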

To process reads from multiple FASTQ files in a single analysis with the cellranger multi pipeline, the [libraries] section of the multi config CSV should be formatted like this:
```
[libraries]
fastq_id,fastqs,feature_types
Sample-GA-A1,flowcell_1,Gene Expression
Sample-GA-A1,flowcell_2,Gene Expression
```
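
Rows that share a `fastq_id` are what tie several folders to one sample. To confirm which folders a config assigns to each sample, the section can be read back and grouped. The sketch below is an illustration, not Cell Ranger's own parser: `folders_by_fastq_id` is a hypothetical helper, and it assumes that section headers are lines of the form `[name]` and that the `[libraries]` section begins with its column header row.

```python
import csv
from collections import defaultdict


def folders_by_fastq_id(config_text):
    """Group the fastqs folders listed in the [libraries] section by fastq_id."""
    in_libraries = False
    section_lines = []
    for line in config_text.splitlines():
        stripped = line.strip()
        # A bracketed line starts a new section; track whether it is [libraries].
        if stripped.startswith("[") and stripped.endswith("]"):
            in_libraries = stripped.lower() == "[libraries]"
            continue
        if in_libraries and stripped:
            section_lines.append(stripped)
    grouped = defaultdict(list)
    # The first collected line is the column header row.
    for row in csv.DictReader(section_lines):
        grouped[row["fastq_id"]].append(row["fastqs"])
    return dict(grouped)


config = """[libraries]
fastq_id,fastqs,feature_types
Sample-GA-A1,flowcell_1,Gene Expression
Sample-GA-A1,flowcell_2,Gene Expression
"""
print(folders_by_fastq_id(config))  # prints {'Sample-GA-A1': ['flowcell_1', 'flowcell_2']}
```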
