Instructions to Download and Process FASTQs of 1.3M Brain Cells

Due to the large size of the data (3.6 TB), the raw data are not available directly from our website. Users can download the raw FASTQ data from Amazon S3 at their own cost, using the Amazon S3 'Requester Pays' bucket option. Depending on the user's AWS region, Amazon may charge up to ~$350 for this data transfer. Interested users should follow the steps below to download and process the FASTQ files.

1. Download the files

Follow Amazon's instructions for downloading from Requester Pays buckets to transfer the FASTQ files from this bucket: s3://10x.largefiles/1M_neurons
Download the MRO files (mros.tar.gz) from s3://10x.largefiles/1M_neurons to the directory where you want to run the pipestances, e.g. /path/to/pipestances
Download the aggregator.csv file from s3://10x.largefiles/1M_neurons to the directory where you want to run the aggregation, e.g. /path/to/aggregator
Download the reanalyze.csv file from the dataset page (/datasets/1-3-million-brain-cells-from-e-18-mice-2-standard-1-3-0) to the directory where you want to run the reanalysis, e.g. /path/to/reanalyze
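The S3 downloads in step 1 can be scripted with the AWS CLI. This is a minimal sketch, not an official 10x procedure: it assumes the AWS CLI is installed and configured with the credentials of the account that will pay for the transfer, and that mros.tar.gz and aggregator.csv sit directly under the 1M_neurons/ prefix. The function names and the staging directory are hypothetical; `--request-payer requester` is the AWS CLI option that acknowledges the Requester Pays charges.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the AWS CLI is installed and configured with credentials
# for the account that will be billed for the transfer (Requester Pays).
set -euo pipefail

BUCKET_PREFIX="s3://10x.largefiles/1M_neurons"

# List the objects under the prefix before committing to the ~3.6 TB download.
list_1m_neurons() {
  aws s3 ls "${BUCKET_PREFIX}/" --request-payer requester
}

# Hypothetical helper: copy the whole prefix (FASTQs, mros.tar.gz and
# aggregator.csv) into a staging directory of your choice.
download_1m_neurons() {
  local staging_dir="$1"
  mkdir -p "$staging_dir"
  aws s3 sync "${BUCKET_PREFIX}/" "${staging_dir}/" --request-payer requester
}

# Hypothetical helper: put the MRO archive and the aggregation CSV where the
# later steps expect them (assumes both sit at the top of the staging dir).
stage_small_files() {
  local staging_dir="$1" pipestance_dir="$2" aggregator_dir="$3"
  mkdir -p "$pipestance_dir" "$aggregator_dir"
  cp "${staging_dir}/mros.tar.gz" "${pipestance_dir}/"
  cp "${staging_dir}/aggregator.csv" "${aggregator_dir}/"
}
```

For example, run `list_1m_neurons` to confirm the object names, then `download_1m_neurons /path/to/download` and `stage_small_files /path/to/download /path/to/pipestances /path/to/aggregator`. reanalyze.csv still comes from the dataset page, not from the bucket.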

2. Extract the fastqs

Extract the FASTQs for each flowcell into its own folder under a common location, e.g. /path/to/fastqs/<flowcell ID>
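How step 2 is scripted depends on how the FASTQs are packaged in the bucket, which these instructions do not state. The sketch below assumes, purely for illustration, that each flowcell arrives as one tar archive named `<flowcell ID>.tar` (or `.tar.gz`), and unpacks each archive into its own folder under the common FASTQ root. `extract_flowcells` is a hypothetical helper; check the real object names first and adapt the pattern if they differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper. Assumes each flowcell's FASTQs arrive as one tar archive
# named <flowcell ID>.tar, optionally compressed (e.g. <flowcell ID>.tar.gz).
extract_flowcells() {
  local archive_dir="$1" fastq_root="$2"
  local archive name
  for archive in "$archive_dir"/*.tar*; do
    [[ -f "$archive" ]] || continue          # no match: the glob stays literal
    name="$(basename "$archive")"
    if [[ "$name" == mros.tar.gz ]]; then     # the MRO archive is not a flowcell
      continue
    fi
    name="${name%%.tar*}"                    # strip .tar / .tar.gz to get the flowcell ID
    mkdir -p "${fastq_root}/${name}"
    # GNU tar and bsdtar detect gzip/bzip2 compression on their own with -xf.
    tar -xf "$archive" -C "${fastq_root}/${name}"
  done
}
```

For example, `extract_flowcells /path/to/download /path/to/fastqs`. If the archives already contain a top-level flowcell folder, this would nest it one level too deep (/path/to/fastqs/<flowcell ID>/<flowcell ID>); in that case extract directly into /path/to/fastqs instead.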

3. Run the count pipeline

Extract the MRO files from mros.tar.gz
Replace /path/to/fastqs in all the MRO files with the actual path to your flowcell folders
Replace /path/to/refdata-cellranger/mm-1.2.0 in all the MRO files with the actual path to your copy of refdata-cellranger/mm-1.2.0
For each MRO file, run: cellranger mrp <mrofile>.mro <mrofile>
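The path substitutions and the per-MRO loop in step 3 can be combined into one helper. This is a sketch under stated assumptions: mros.tar.gz is assumed to unpack into `*.mro` files directly in the pipestance directory, and `replace_literal_in_files` and `run_count_pipelines` are hypothetical names. The replacement uses bash's literal `${var//"old"/"new"}` substitution rather than sed, so paths containing `/`, `.` or `&` need no escaping.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Replace every occurrence of a literal string in each file. The quoted
# pattern and replacement in ${var//"$old"/"$new"} are matched/inserted literally.
replace_literal_in_files() {
  local old="$1" new="$2"; shift 2
  local f content
  for f in "$@"; do
    content="$(<"$f")"
    printf '%s\n' "${content//"$old"/"$new"}" > "$f"
  done
}

# Hypothetical helper: the ( ) body runs in a subshell, so the cd does not leak.
# Assumes mros.tar.gz unpacks to *.mro files in the pipestance directory.
run_count_pipelines() (
  local pipestance_dir="$1" fastq_root="$2" refdata_dir="$3"
  cd "$pipestance_dir"
  tar -xzf mros.tar.gz
  replace_literal_in_files /path/to/fastqs "$fastq_root" ./*.mro
  replace_literal_in_files /path/to/refdata-cellranger/mm-1.2.0 "$refdata_dir" ./*.mro
  local mro
  for mro in ./*.mro; do
    # Pipestance ID = MRO file name without .mro, as in: cellranger mrp <mrofile>.mro <mrofile>
    cellranger mrp "$mro" "$(basename "$mro" .mro)"
  done
)
```

For example, `run_count_pipelines /path/to/pipestances /path/to/fastqs /path/to/refdata-cellranger/mm-1.2.0`. The loop runs the count pipelines one after another; on a cluster you may prefer to launch each `cellranger mrp` call as a separate job.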

4. Run the aggregator pipeline

Replace /path/to/pipestances in aggregator.csv with the actual path to your pipestances
Run: cellranger aggr --id=neuron_aggregation --csv=aggregator.csv --nosecondary
NOTE: this step uses more than 300 GB of RAM; at 10x it was run on a machine with 384 GB of RAM
NOTE: E18_20160930_Neurons_Sample_14 and E18_20160930_Neurons_Sample_26 are omitted from the aggregation
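Step 4 can be sketched as a helper that edits aggregator.csv in place and, before starting the memory-hungry aggregation, checks that every file the CSV now points to under your pipestance directory exists (that is, that the count runs have finished). `run_aggregation` is a hypothetical helper; the check assumes only that the CSV fields are comma-separated, not any particular column names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper for step 4: point aggregator.csv at your pipestances,
# verify the referenced files exist, then run cellranger aggr.
run_aggregation() (
  local aggregator_dir="$1" pipestance_dir="$2"
  local content field missing=0
  cd "$aggregator_dir"
  content="$(<aggregator.csv)"
  printf '%s\n' "${content//"/path/to/pipestances"/"$pipestance_dir"}" > aggregator.csv
  # Split fields on commas and newlines; any field under the pipestance
  # directory should exist once the count pipelines have completed.
  while IFS= read -r field; do
    field="${field%$'\r'}"                   # tolerate CRLF line endings
    if [[ "$field" == "$pipestance_dir"/* && ! -e "$field" ]]; then
      echo "missing: $field" >&2
      missing=1
    fi
  done < <(tr ',' '\n' < aggregator.csv)
  (( missing == 0 )) || return 1
  cellranger aggr --id=neuron_aggregation --csv=aggregator.csv --nosecondary
)
```

Call it with absolute paths, e.g. `run_aggregation /path/to/aggregator /path/to/pipestances`, because the helper changes into the aggregator directory before checking the paths.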

5. Run the reanalyze pipeline

Run: cellranger reanalyze --id=neuron_reanalyze --matrix=/path/to/aggregator/neuron_aggregation/outs/filtered_gene_bc_matrices_h5.h5 --params=reanalyze.csv
NOTE: this step uses more than 300 GB of RAM; at 10x it was run on a machine with 384 GB of RAM
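For step 5, a sketch that fails early if the aggregation output is missing and warns when the machine has well under the memory this step needs. The memory check reads `MemTotal` from `/proc/meminfo`, so it only works on Linux; `total_mem_gib`, `run_reanalysis` and the `MEMINFO` override are hypothetical names introduced here, and 300 GiB is used as a rough stand-in for the 300 GB figure.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Total RAM in GiB from a Linux meminfo file (MemTotal is reported in kB).
# MEMINFO is a hypothetical override, useful for testing the check.
total_mem_gib() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' "${MEMINFO:-/proc/meminfo}"
}

# Hypothetical helper for step 5: stop if the aggregation output is missing,
# warn if RAM is below the threshold, then run cellranger reanalyze.
run_reanalysis() (
  local reanalyze_dir="$1" aggregator_dir="$2" min_gib="${3:-300}"
  local h5="${aggregator_dir}/neuron_aggregation/outs/filtered_gene_bc_matrices_h5.h5"
  [[ -f "$h5" ]] || { echo "not found: $h5 (has step 4 finished?)" >&2; return 1; }
  local mem
  mem="$(total_mem_gib)"
  if (( mem < min_gib )); then
    echo "warning: ${mem} GiB RAM detected; this step uses more than 300 GB" >&2
  fi
  cd "$reanalyze_dir"                        # reanalyze.csv is expected here
  cellranger reanalyze --id=neuron_reanalyze --matrix="$h5" --params=reanalyze.csv
)
```

For example, `run_reanalysis /path/to/reanalyze /path/to/aggregator`, with reanalyze.csv already in /path/to/reanalyze and both arguments given as absolute paths.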