Apr 14, 2017

Direct determination of diploid genome sequences

Kariena Dill

In the past decade, the emergence of low-cost, high-throughput DNA sequencing technologies has enabled thousands of human genomes to be sequenced. At present, the most common approach to analyzing the short reads generated by these next-generation sequencing platforms is to align them to a reference genome. This introduces significant reference bias and misses sequences novel to the sample under investigation. Less commonly, genome assemblies are generated de novo, without relying on a reference genome. While de novo assembly avoids reference bias, in nearly all cases the diploid genome is represented by a haploid consensus sequence. These collapsed haplotypes do not capture the full spectrum of alternate alleles present in the diploid genome, and they fail to resolve complex heterozygous variants within a single locus.
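To make the collapse problem concrete, here is a toy Python sketch. The two haplotype strings and the helper names are hypothetical, and the tie-breaking rule (keep the alphabetically smaller base at each heterozygous site) is an arbitrary stand-in for whatever choice a real haploid assembler makes. It shows two things: the alternate alleles at heterozygous sites disappear, and the consensus can be a mosaic that matches neither real haplotype.

```python
from __future__ import annotations

# Hypothetical toy haplotypes for one locus; they differ at positions 2 and 5.
HAP1 = "ACGTTACA"
HAP2 = "ACCTTGCA"


def heterozygous_sites(hap1: str, hap2: str) -> list[int]:
    """Positions where the two haplotypes carry different bases."""
    return [i for i, (a, b) in enumerate(zip(hap1, hap2)) if a != b]


def collapse_to_consensus(hap1: str, hap2: str) -> str:
    """Toy haploid collapse: keep exactly one base per position.

    Assumption: at heterozygous sites we keep the alphabetically smaller
    base, standing in for any arbitrary allele choice.
    """
    return "".join(min(a, b) for a, b in zip(hap1, hap2))


def lost_alleles(hap1: str, hap2: str, consensus: str) -> list[tuple[int, str]]:
    """For each heterozygous site, return the allele the consensus dropped."""
    lost = []
    for i in heterozygous_sites(hap1, hap2):
        dropped = hap1[i] if consensus[i] != hap1[i] else hap2[i]
        lost.append((i, dropped))
    return lost


consensus = collapse_to_consensus(HAP1, HAP2)
print(consensus)                   # ACCTTACA
print(consensus in (HAP1, HAP2))   # False: a mosaic of both haplotypes
print(lost_alleles(HAP1, HAP2, consensus))  # [(2, 'G'), (5, 'G')]
```

A diploid assembly instead keeps both HAP1 and HAP2 as separate phased sequences, so neither allele is lost and no mosaic is created.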

In a new publication in Genome Research, senior author David Jaffe and colleagues at 10x Genomics describe the Supernova™ assembler, which directly determines diploid genome sequences. Supernova uses a single data type, Linked-Reads, which are sequenced on standard Illumina platforms. The authors demonstrate the method on seven human samples from diverse populations. For each individual, a single barcoded sequencing library was created using the Chromium™ instrument and reagents. The resulting assemblies have contigs longer than 100 kb, phase blocks longer than 2.5 Mb, and scaffolds longer than 15 Mb (Table 1 in the article).

Highlights:

  • Each library was generated from only ~1 ng of high molecular weight DNA (>90 kb)
  • Libraries were sequenced on an Illumina HiSeq X to ~56x coverage (~1,200M 150-bp reads)
  • Supernova is a ‘pushbutton’ algorithm: it requires no data processing or user tuning
  • Each assembly took 2 days on a single server
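The coverage and read-count figures in the second bullet are two views of the same quantity: mean depth is the total number of sequenced bases divided by the genome size. The following minimal Python check assumes a haploid human genome size of about 3.2 Gb (a figure not stated in the post) and that ~1,200M counts individual 150-bp reads; the function name is hypothetical.

```python
def mean_coverage(num_reads: float, read_length_bp: int, genome_size_bp: float) -> float:
    """Mean sequencing depth = total sequenced bases / genome size."""
    return num_reads * read_length_bp / genome_size_bp


# Assumption: ~3.2 Gb haploid human genome (not given in the post).
HUMAN_GENOME_BP = 3.2e9

depth = mean_coverage(1.2e9, 150, HUMAN_GENOME_BP)
print(f"~{depth:.0f}x")  # ~56x
```

1.2 billion reads × 150 bp is about 180 Gb of sequence, which over a ~3.2 Gb genome gives roughly 56x, matching the stated coverage.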

The seven samples used in this publication included four individuals for whom parental data were available to test phasing, and two individuals for whom finished sequence was available to test base-level accuracy. The latter included one sample for which 340 Mb of finished sequence was available from the Human Genome Project. The authors used these data for extensive validation studies, generating contiguity and reference-based statistics for every sample, plus phasing accuracy for the samples with parental data and base-level accuracy for the samples with finished sequence. For comparison, the authors applied the same analysis to six published human genomes, including one of only two published diploid human genomes. The contiguity and accuracy of these inexpensive, automated diploid assemblies compare favorably to (primarily) haploid results generated at great cost and effort.

Together, the Chromium platform and Supernova assembler provide a complete and affordable solution for generating high-quality genome assemblies with very low experimental burden. This solution provides a scalable way to determine the actual diploid genome sequence of a sample, opening the door to new approaches in genomic biology and medicine.

Read the full article here.

Additional Resources:

De novo assembly application note

Watch the scientific seminar featuring David Jaffe, Everyday de novo assembly with the Supernova Assembler.

Read more about Supernova and download the software package here.