Author: Scott Brouilette, 10x Genomics
Despite the weather threatening to put a damper on our Community Mixer on Tuesday night, the turnout was great and everyone had a good time, with many interesting discussions around the venue. Of course, the downside of a good time the night before is a fuzzy head the morning after…!
First thing Wednesday we made some exciting announcements about our new kits for unbiased gene expression plus T- or B-cell repertoire sequencing from the same single cell! We are also reducing pricing for our Chromium™ Genome kits for WGS, WES & assembly applications with Linked-Reads. Find out more information in the press releases here.
Wednesday morning kicked off with a session on Disease Gene Discovery Strategies, starting with Karyn Meltz Steinberg from Washington University discussing genomic diagnosis of birth defects. Karyn started with an outline of their existing WGS and WES approaches, including the informatics tools they use and something that would immediately become a running theme - the use of multiple packages for calling variants, followed by the creation of merged .vcf files for downstream annotation. Their pipeline was validated by comparing with clinical data from GeneDX; 100% of the known SNVs were called, as were 7/8 known SVs (the one missed call was close to a telomeric region). But of course Karyn’s team are interested in the UNknowns, and so she proceeded to highlight more recent work using the 10x Genomics Chromium Genome Solution and Linked-Reads. As expected, 98% of SNPs were successfully phased, with SNV calls highly consistent with GATK, but for INDELs LongRanger "significantly improved the precision of the calls". Ongoing work will focus on fully integrating Linked-Reads into their workflow and developing Linked-Read data analysis.
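To illustrate the multi-caller idea, here is a minimal sketch of merging call sets before annotation. It is not the Washington University pipeline: the caller names, the tuple layout (chromosome, position, reference allele, alternate allele) and the `merge_calls` function are all assumptions made for the example, and a real workflow would operate on .vcf files with an established tool and would normalize variant representation first.

```python
from collections import defaultdict


def merge_calls(calls_by_caller):
    """Union the calls from several callers and record which callers support each.

    calls_by_caller maps a (hypothetical) caller name to a list of
    (chrom, pos, ref, alt) tuples. Returns a list of
    ((chrom, pos, ref, alt), [supporting callers]) sorted by variant key.
    """
    support = defaultdict(set)
    for caller, calls in calls_by_caller.items():
        for variant in calls:
            support[variant].add(caller)
    merged = [(variant, sorted(callers)) for variant, callers in support.items()]
    # Sort by the variant tuple: chromosome name, then position, then alleles.
    return sorted(merged, key=lambda item: item[0])


if __name__ == "__main__":
    example = {
        "caller_a": [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")],
        "caller_b": [("chr1", 100, "A", "G")],
    }
    for variant, callers in merge_calls(example):
        print(variant, callers)
```

Keeping the list of supporting callers alongside each variant lets downstream annotation distinguish calls made by every tool from calls made by only one.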
Animal models play a vital role in the study of human disease, and Cynthia Smith from Jackson Labs was next up to tell us how the Mouse Genome Informatics (MGI) Program aims to facilitate the use of the mouse as a model for heritable human disease. Using Congenital Diaphragmatic Hernia as an example, Cynthia used the Human-Mouse Disease Connection (HMDC) to find human genes and their mouse homologs. Data is presented as a matrix with genes on the left-hand side and cell/tissue types across the top. The intersections are colored to indicate the number of annotations and allow click-through to detailed information.
Jessica Chong from the University of Washington then switched gears a little with her talk "Gene discovery via direct-to-family engagement using MyGene2". Jessica framed her work by stating that the rate of gene discovery for the approximately 3,000 unexplained Mendelian conditions is simply too slow, with the lack of data sharing (we will come back to this during the Gates/Collins discussion) remaining a key bottleneck - pervasive, international, rapid public data sharing is key. With that she introduced MyGene2, an online portal "through which families with rare genetic conditions who are interested in sharing their health and genetic information can connect with other families, clinicians, and researchers." To date MyGene2 has 1225 user profiles and continues to grow, but this type of resource desperately needs support to scale with the volume of data genetic researchers are producing.
Next, our very own Deanna Church was chairing Session 22: Detection & Interpretation of SVs, so I dropped by to hear a couple of the opening talks. Michael Gonzalez from CHOP talked about the use of Linked-Reads and WES using an Agilent SureSelect enhanced bait set to detect SVs in NF1. Neurofibromatosis is an autosomal dominant condition caused by mutation in NF1, and it is assumed that 2/3 of sporadic cases exhibit some type of mosaicism. Previous genotyping and sequencing using the Illumina TruSight Cancer panel found no pathogenic mutations, but with WES using Linked-Reads the researchers were able to phase >98% of genes/SNPs and produce 3.5Mb phase blocks. The LongRanger pipeline then successfully called a mosaic deletion in NF1, and all findings have since been confirmed with orthogonal methods. The group are now working to develop novel algorithms to better resolve & visualize identified breakpoints using Linked-Reads.
The next couple of talks then focused on the Comp Bio side.
Andrew Farrell (University of Utah) outlined the problems with both de novo assembly and reference-based alignment before introducing RUFUS. In this approach two samples are compared, filtering to yield just the reads that differ between the samples using a k-mer-based approach; these reads are then assembled into contigs and used to call variants. In a clinical setting RUFUS called all known variants identified by GATK, plus many more… and they are now applying RUFUS to 1000 Genomes Project data to identify family-private variants. A key aspect of RUFUS is the reduction in analysis time - down to just a few minutes, as only the "different" reads are actually being analyzed.
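The k-mer filtering idea can be sketched in a few lines. This is only an illustration of the general technique, not RUFUS itself: the function names and the default k are assumptions, and a real tool would also need to handle reverse complements, k-mer counts and sequencing errors, none of which appear here.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reads_with_unique_kmers(case_reads, control_reads, k=5):
    """Keep only the case reads that contain at least one k-mer absent from the control."""
    control_kmers = set()
    for read in control_reads:
        control_kmers |= kmers(read, k)
    # A read is retained when the set difference is non-empty,
    # i.e. it carries sequence never seen in the control sample.
    return [read for read in case_reads if kmers(read, k) - control_kmers]


if __name__ == "__main__":
    control = ["ACGTACGTAC"]
    case = ["ACGTACGTAC", "ACGTTCGTAC"]  # second read differs by one base
    print(reads_with_unique_kmers(case, control, k=5))
```

Only the retained reads would go on to assembly and variant calling, which is where the reduction in analysis time comes from.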
Then Andrew Miller (also from Utah) introduced GRAPHITE, a computational framework for SV "adjudication". Many SV callers have been developed, but applying all of them to the same data typically results in divergent calls. GRAPHITE takes a FASTQ reference, multiple sample files and a VCF file, then uses graph-based analysis to remap reads. The GRAPHITE output .vcf can then be read in IGV, with a .bed file highlighting the graph nodes in the visualization.
Lunchtime saw our first Educational Session on "Advancing Genomic and Single-Cell Sequencing Drop-by-Drop with the 10x Chromium™ System". Deanna Church was back to give an overview of the 10x technology, and a reminder to all that we make a huge amount of data available on our website in both pre- and post-analysis formats for Genome and Single Cell. Deanna also took the opportunity to highlight the release of new products that very morning: unbiased SC expression plus T or B cell repertoire sequencing from the same cell.
The first user-speaker of the session was Hakon Hakonarson, Head of Applied Genomics at CHOP, discussing more of the work introduced by Michael Gonzalez in Session 10. Hakon listed a number of case studies that illustrated the power of Linked-Reads in resolving variants that have been refractory to short-read analysis. These included: identification of Compound Hets in the ADAR gene in Aicardi-Goutieres Syndrome; inversions in introns 1 and 22 leading to Haemophilia - intractable using standard WES; fine-mapping of balanced translocations; and resolution of the "Dark Matter" genes within the exome. Hakon closed by introducing a Targeted Panel to study Lynch Syndrome.
Last up was Madhuri Hegde, VP & CSO, Global Labs, from PerkinElmer, to show Linked-Read data from DNA extracted from Dried Blood Spot (DBS) samples. This will be covered in more detail in a later post, but the upshot is that extraction from DBS samples yielded DNA approximately 10kb in length with a tail of longer molecules. Despite this DNA being of lower molecular weight than we recommend, Madhuri’s team saw phase blocks up to 334kb with 98.6% of SNPs phased and improved coverage of the NGS dead zones that include 73 medically-relevant genes. Madhuri’s conclusion was that "complex, high quality Linked Read sequencing data can be generated from DBS samples".
The rest of the afternoon was filled with posters and the Presidential Symposium featuring speakers Bill Gates and Francis Collins discussing global health and genomics. Read more about this interesting session here.