Structural variation (SV), or the rearrangement of chromosomal segments ≥ 50 bp, is a major driver of phenotypic variation and disease. However, this important class of variation in the human genome remains poorly understood, in part because SVs tend to cluster in duplicated and repetitive regions that short-read sequencing technologies cannot access. In an effort to benchmark the mutational spectrum of large and complex SV within a disease cohort, a new study published in Genome Biology examines the diversity of complex SV in 689 participants diagnosed with autism spectrum disorder (ASD).
Using a technique called long-insert whole-genome sequencing (liWGS; 105X physical coverage; 3.5 kb mean fragment size), researchers observed a total of 436,741 SVs, or a mean of 637 large SVs per genome. The observed variants represent a catalog of >11,000 distinct SVs, more than one third of which (38.1%) had not been reported previously. The authors defined 7 major SV classes (Fig. 1c in the paper), including one class termed non-canonical, complex SV (cxSV): rearrangements that combine multiple, compounded mutations and do not fit into any single SV category. Almost all of the cxSVs detected (93.8%) were novel to this study.
In the case of 3 cxSVs that could not be fully resolved by liWGS, the authors used Linked-Read WGS (lrWGS)*. The Linked-Read data enabled researchers to fully resolve all breakpoints of each predicted large cxSV, including one particularly tricky de novo complex translocation. For this cxSV, 550 kb of inverted sequence and 3 breakpoints were predicted by liWGS, but 2 of the 3 breakpoints could not be validated by PCR or Sanger sequencing due to low sequence uniqueness flanking the junctions. Using Linked-Read data, the researchers were able to span the low-complexity repetitive sequence and fully resolve the breakpoints.
The data presented in this paper demonstrate that large-scale and complex SVs are more prevalent and diverse than previously recognized. Linked-Reads and other technologies that provide long-range genomic linkage information will be invaluable for building a more complete picture of SVs in the human genome and their phenotypic consequences.
*Learn more about Linked-Reads and how they enable SV detection in our blog post "Everything you wanted to know about Linked-Reads" and our application note "Chromium™ Structural Variant Analysis with Linked-Reads".
Note: The image used in this post is from the publication "Defining the diverse spectrum of inversions, complex structural variation and chromothripsis in the morbid human genome", Ryan Collins et al; Genome Biology (2017) PMCID:PMC5338099 DOI:10.1186/s13059-017-1158-6