NA12878 Germline

De Novo Assembly dataset analyzed using Supernova 1.0.0

Overview

Supernova produces phased assemblies, which means that in regions with sufficient information and diversity, two sequences will be produced — one for each haplotype.

Input molecule length affects scaffold length: the longer the input molecules, the longer the scaffolds. Phaseblocks are regions where there is sufficient information to separate the sequences of the two parental haplotypes. Phaseblock length depends on input molecule length and also on the diversity within the sample, because heterozygous sites are what allow the two haplotypes to be told apart. For example, samples from individuals with African ancestry tend to have longer phaseblocks than samples from individuals of European ancestry.

The native output of Supernova is a graph. We also provide four different translations of this graph into FASTA format.

Sequencing

  • Input DNA: 1.25 ng
  • Sequencer: Illumina HiSeq X Ten

Definitions

  • N50: The length such that sequences of that length or longer contain 50% of the bases in the assembly. For example, a contig N50 of 100 Kb means that 50% of the assembled bases lie in contigs of 100 Kb or longer.
  • Contig: An ungapped sequence.
  • Scaffold: A gapped sequence containing multiple contigs for which the order and orientation are asserted.
  • Edge: An edge is a completely unambiguous sequence in the assembly. Ambiguities can occur within contigs, typically due to homopolymers. These ambiguities create bubbles in the assembly graph, but do not break contigs.
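
The contig, scaffold, and N50 definitions above can be illustrated with a minimal Python sketch. It is not part of Supernova. The helper names `n50` and `contigs_in` are hypothetical, the toy sequences are invented, and it assumes that gaps inside a scaffold are written as runs of N and that gap bases count toward scaffold length (conventions vary between tools).

```python
import re

def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0

def contigs_in(scaffold):
    """Split a scaffold into ungapped contigs at runs of N (assumed gap symbol)."""
    return [piece for piece in re.split(r"[Nn]+", scaffold) if piece]

# Toy scaffolds, invented for illustration (not taken from this dataset).
scaffolds = [
    "ACGTACGTAC" + "N" * 5 + "ACGTAC",  # contigs of 10 and 6 bases
    "ACGT" + "N" * 3 + "AC",            # contigs of 4 and 2 bases
    "ACGTACGT",                         # a single 8-base contig
]

contig_lengths = [len(c) for s in scaffolds for c in contigs_in(s)]
scaffold_lengths = [len(s) for s in scaffolds]  # gap bases included (assumption)

contig_n50 = n50(contig_lengths)
scaffold_n50 = n50(scaffold_lengths)
print("contig N50:", contig_n50)
print("scaffold N50:", scaffold_n50)
```

Because N50 is weighted by bases and not by sequence count, a few long sequences can dominate it: in the toy data, the single 21-base scaffold alone holds more than half of the scaffold bases.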

This dataset is licensed under the Creative Commons Attribution license.