Author: Scott Brouilette, 10x Genomics
Day 3 got started at a ridiculously early hour (7:15am, to be precise), and given the number of parties going on the night before I was concerned about the turnout for our second Exhibitor Education Event: "Intuitive Tools for Sequence Analysis: Crunching Genomic, Single-Cell, and Immune Repertoire Data Using 10x Chromium™ Software". But I had overlooked a key aspect: the expected audience was the hardcore, informatics-savvy contingent of the ASHG attendee pool, and you didn’t let us down!
As the room filled, Alex Wong, VP of Software & Infrastructure at 10x Genomics, highlighted that we consider ourselves a "Solution" company, offering sample-to-answer solutions. Key to this are the software tools we build and support to enable analysis of new data types such as Linked-Reads. The pace of software development is impressive, with 7 different software packages and 25 major releases since launch. Alex also discussed the importance of linking data to developers, aided by the public availability of various datasets (for both DNA and RNA), a common software platform, and open source tools. Our software can run anywhere, from standalone servers, to HPC, to cloud environments, all based on a common framework called Martian. A number of demonstrations of our software followed, including Long Ranger and the Loupe Genome Browser for genome/exome analysis, Supernova for de novo assembly, Cell Ranger and the Loupe Cell Browser for single-cell gene expression analysis, and the Loupe V(D)J Browser for single-cell immune repertoire profiling.
Earlier this month 10x Genomics and PerkinElmer announced a new, automated solution for whole genome and exome sequencing from dried blood spots, and this collaboration featured in the concurrent PerkinElmer Educational Session.
So what exactly are dried blood spot (DBS) samples, and why should you be interested?
It was Robert Guthrie (Scotland, 1963) who introduced the idea that capillary blood from heel or finger pricks could be blotted onto filter paper and used to screen large populations of neonates for metabolic diseases. Since then many countries around the world have used this approach to collect samples and screen for various disorders, frequently using commercial solutions such as PerkinElmer 226 filter paper, which contains four 10-mm-diameter circles. In the United Kingdom the UK National Screening Committee recommends screening for 9 disorders at day 5–8 of life, whereas the United States covers 30 "core" disorders. The number of stored samples around the world continues to grow. In Denmark, for example, DBS samples have been obtained from virtually all newborns since 1982, giving an estimate of 1.8 million samples (as of 2007, so this number will have increased significantly since then).
DBS samples - a veritable treasure trove of biomedical data
Technically the only reason to retain a DBS sample from a newborn screened negative for a given condition is to address false-negative results, but many of the existing screening tests are aimed at various metabolites, the long-term (years to decades) stability of which remains unclear. However, DNA is stable and so the use of residual DBS samples (i.e. those that are no longer required for analyzing newborn diseases), combined with clinical/phenotypic information, has the potential to provide the biomedical research community with access to genomic DNA from large cohorts of well-characterized patients and healthy controls (which are historically harder to recruit). While these archived DBS samples represent an untapped resource of biomedical data, there are some technical considerations for their analysis:
- the relatively small amount of genomic DNA, with older reported yields of 60 ng (Hannelius et al., 2005), ranging to a more workable 180 ng (Rajatileka et al., 2013);
- the extraction efficiency; and
- the overall quality (length) of the DNA fragments.
Which brings us back to the 10x Genomics + PerkinElmer solution: "Linked-Reads: Enabling Robust Genome Analysis from Dried Blood Spots".
In 2015 10x introduced the world to Linked-Reads, an elegant solution that utilizes existing short-read next generation sequencing (NGS) technology to deliver long-range genomic information, enabling de novo assembly, haplotype phasing and structural variant (SV) detection. With a low DNA input requirement (~1ng) and the ability to provide haplotype information and SV detection (both of which are tricky or nearly impossible with standard short-read NGS data alone), combining 10x Linked-Read technology with PerkinElmer automation seemed to be a promising solution to enable the sequencing of a large number of archival DBS samples, while obtaining a maximal amount of genomic information from limited sample quantities.
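To put the input requirement in perspective, here is a small back-of-the-envelope sketch in Python that compares the DBS yields quoted in this post with the ~1 ng Linked-Read input. The function name `fold_excess` is purely illustrative, and the calculation deliberately ignores extraction losses, QC aliquots and fragment-length requirements, so treat the result as a rough upper bound rather than a number of libraries you could actually make.

```python
# Back-of-the-envelope comparison of reported DBS yields with the ~1 ng
# Linked-Read input requirement. All figures are the ones quoted in this post;
# fold_excess is a hypothetical helper name used only for illustration.

DBS_YIELDS_NG = {
    "Hannelius et al., 2005": 60.0,
    "Rajatileka et al., 2013": 180.0,
}
LINKED_READ_INPUT_NG = 1.0  # approximate (~1 ng)


def fold_excess(yield_ng, input_ng=LINKED_READ_INPUT_NG):
    """Return how many times the extracted yield exceeds the library input."""
    if input_ng <= 0:
        raise ValueError("input_ng must be positive")
    return yield_ng / input_ng


for source, yield_ng in DBS_YIELDS_NG.items():
    print(f"{source}: {yield_ng:.0f} ng is ~{fold_excess(yield_ng):.0f}x the ~1 ng input")
```

Even the older, lower yield leaves a comfortable margin over the input amount, which is why the remaining concern is fragment length rather than DNA quantity.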
In this proof-of-principle study PerkinElmer used their chemagic™ 360 automated extraction platform to isolate gDNA from DBS samples, yielding an average DNA length of 10kb, but with a reasonable tail of longer fragments. This material was used to generate a Linked-Read library, which was then compared to a short-read Illumina TruSeq PCR-free library. Compared to the TruSeq library, Linked-Reads yielded an additional 22Mb of data consistent with increased coverage, including "NGS Dead Zones" (Mandelker et al., 2016) and 73 medically relevant genes. A high correlation of SNP calling was seen between the library types, but the Linked-Read library called an additional 20-50k SNPs in regions inaccessible to short reads. Furthermore, it was possible to phase the genome and to detect broader structural variants, such as homozygous deletions and balanced inversions, that were absent in the TruSeq library. Phase block length averaged 275kb with a maximum of 334kb, and 98.6% of SNPs were phased. One key advantage of phasing is the ability to study adult-onset disease in the absence of trios, thus significantly reducing logistics and costs.
One initial concern about using Linked-Reads for DBS samples was that the degraded nature of the isolated gDNA (average size ~10kb) would limit the ability to obtain long-range information for phasing and complex SV detection. However, these data show that complex, high-quality Linked-Read sequencing data can be successfully generated from DBS samples using an automated workflow developed with PerkinElmer.
Want to see what else happened at #ASHG17? Check out @10xgenomics on Twitter and read the live tweets from talks and workshops throughout the week.