Apr 20, 2017

Linked-Read sequencing for genome assembly of an endangered species, the Hawaiian Monk Seal

Shauna Clark

Guest Authors: Alan F. Scott and David W. Mohr

Recently, there has been increasing interest in large-scale sequencing projects, including the very ambitious Earth BioGenome Project (EBP), which aims to sequence "all life on earth". While sequencing is affordable and accurate for short sequences, obtaining chromosome-scale assemblies has been difficult and costly. Several new methods have become available, one of which is 10x Genomics Linked-Read sequencing and assembly with Supernova™ software. We opted to test this approach because it allowed us to use the Illumina system already available in our facility and required very small amounts of DNA, a distinct advantage when working with endangered species samples, which are limited and difficult to collect.

We chose to sequence "Benny," a Hawaiian monk seal that lives mainly along the beaches of Oahu. Benny is a 15-year-old male and enjoys celebrity status in Hawaii as a "poster seal" for environmental protection. He is featured in a children’s book about keeping beaches clean, and for good reason: he has repeatedly swallowed fish hooks and has had to be captured to have them surgically removed.

Photo courtesy of Alan F. Scott.

We obtained a blood sample from Benny, which 10x Genomics used to make a library that we, in turn, sequenced. 10x then ran their Supernova Assembler on our sequence data and obtained scaffolds that were over 160 times longer than those obtained by 250 bp Discovar paired-end sequencing, the best method then available. In parallel, we used optical mapping from Bionano Genomics to validate the computationally obtained scaffolds.

The 10x data produced remarkably long scaffolds, with an N50 of nearly 30 Mb and one scaffold longer than 80 Mb. An obvious question was whether these were correct. To determine the quality of the assemblies, we first compared them to the Bionano optical maps and then translated the scaffolds to obtain predicted protein-coding genes. Because synteny is conserved in mammals, we could check the predicted genes against the locations and order we expected, and they were primarily where we expected them to be. The scaffolds also matched the optical maps extremely well.
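For readers unfamiliar with the N50 statistic quoted above, here is a minimal Python sketch of how it is commonly computed: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly size. The function name `n50` and the toy lengths are illustrative assumptions only; they are not Benny's actual scaffold lengths, and this is not the code used by Supernova.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that takes us to half the total.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in Mb), made up for illustration.
toy_lengths = [80, 40, 30, 20, 10, 10, 10]
print(n50(toy_lengths))  # prints 40
```

In this toy case the total is 200 Mb, and the two longest scaffolds (80 + 40 = 120 Mb) already cover more than half of it, so the N50 is 40 Mb. A higher N50 means more of the assembly sits in long scaffolds, which is why it is a common summary of assembly contiguity.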

Since Benny is one of a population of about 1,400 animals, we expected to see reduced heterogeneity. In comparison to the NA12878 dataset posted on the 10x website, Benny’s phase blocks are smaller and his overall heterogeneity is perhaps 5-10% of that of a typical human. We expect that this study will be helpful not only in understanding the genomics of monk seals but also in helping with their management and recovery.

The fact that Linked-Read sequencing worked so well and required such small amounts of DNA suggests that it may be the preferred method for genome assembly for other rare and endangered species. Details of the paper can be found here.

Additional Resources:

Read more about Benny the Monk Seal on the Monk Seal Mania blog.

Learn more about de novo assembly with Linked-Reads and Supernova.

Read the publication preprint on bioRxiv.