Hongzhi Cao, Alex R Hastie, Dandan Cao, Ernest T Lam, Yuhui Sun, Haodong Huang, Xiao Liu, Liya Lin, Warren Andrews, Saki Chan, Shujia Huang, Xin Tong, Michael Requa, Thomas Anantharaman, Anders Krogh, Huanming Yang, Han Cao, Xun Xu.
Abstract
BACKGROUND: Structural variants (SVs) are less common than single nucleotide polymorphisms and indels in the population, but collectively account for a significant fraction of genetic polymorphism and diseases. Base pair differences arising from SVs are on a much higher order (>100 fold) than point mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost-effective genome mapping technology to comprehensively discover genome-wide SVs and characterize complex regions of the YH genome using long single molecules (>150 kb) in a global fashion.
Keywords: Epstein-Barr virus (EBV) integration; Genome mapping; Repeat units; Structural variation
Year: 2014 PMID: 25671094 PMCID: PMC4322599 DOI: 10.1186/2047-217X-3-34
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Molecule collection statistics under different length thresholds
| Length cutoff (kb) | No. Molecules | Total length (Gb) | Estimated depth (X)* |
|---|---|---|---|
| 100 | 1,568,969 | 303 | 95 |
| 120 | 1,313,832 | 275 | 86 |
| 150 | 932,855 | 223 | 70 |
| 180 | 659,149 | 179 | 56 |
| 200 | 523,146 | 153 | 48 |
| 300 | 170,432 | 68 | 21 |
| 500 | 22,944 | 14 | 4 |
*Estimated depth based on 3.2 Gb genome size.
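The depth column can be reproduced from the footnote: each value appears to be the total molecule length divided by the 3.2 Gb genome size, rounded to the nearest integer. The following is a minimal sketch of that arithmetic; the function and variable names are illustrative and not taken from the source.

```python
# Genome size stated in the table footnote (Gb).
GENOME_SIZE_GB = 3.2

# (length cutoff in kb, total molecule length in Gb), copied from the table.
ROWS = [(100, 303), (120, 275), (150, 223), (180, 179),
        (200, 153), (300, 68), (500, 14)]


def estimated_depth(total_length_gb, genome_size_gb=GENOME_SIZE_GB):
    """Return fold coverage as total collected length over genome size."""
    return total_length_gb / genome_size_gb


# Round to whole-number depth, as the table reports it.
depths = {cutoff: round(estimated_depth(total)) for cutoff, total in ROWS}

for cutoff, depth in depths.items():
    print(f"{cutoff} kb cutoff: ~{depth}X")
```

Under this assumption the computed values match the table for every cutoff, including the >150 kb threshold (about 70X) used in the abstract.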
Figure 1. Flowchart of consensus genome map assembly and structural variant discovery using genome mapping data.
Figure 2. Size distribution of total detected large insertions (green) and deletions (purple) using genome mapping. The comparative histogram bars in red and blue respectively represent deletions and insertions supported by NGS. NGS: next-generation sequencing.
Figure 3. A plot of repeat units in two human genomes as seen in single molecules. A repeat unit is defined as five or more equidistant labels. Total units in bins are normalized to the average coverage depth in the genome.
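The repeat-unit definition in the Figure 3 caption (five or more equidistant labels) can be illustrated with a small scan over label positions on one molecule. This is only a sketch and not the authors' method: the function name `find_repeat_runs`, the relative tolerance `rel_tol`, and the example positions are all assumptions, since the source does not say how "equidistant" was judged on measured data.

```python
def find_repeat_runs(label_positions, min_labels=5, rel_tol=0.1):
    """Find maximal runs of near-equidistant labels on one molecule.

    label_positions: sorted, strictly increasing label coordinates (bp).
    min_labels: minimum labels in a run (five, per the caption).
    rel_tol: hypothetical tolerance; an interval counts as "equal" if it
             is within rel_tol of the first interval in the run.
    Returns a list of (first_label_index, last_label_index) pairs.
    """
    runs = []
    n = len(label_positions)
    start = 0
    while start < n - 1:
        # The first interval of the candidate run is the reference spacing.
        ref = label_positions[start + 1] - label_positions[start]
        end = start + 1
        # Extend while the next interval stays close to the reference.
        while (end + 1 < n and
               abs((label_positions[end + 1] - label_positions[end]) - ref)
               <= rel_tol * ref):
            end += 1
        if end - start + 1 >= min_labels:
            runs.append((start, end))
        # The label that ended this run may begin the next one.
        start = end
    return runs


# Hypothetical molecule: six labels spaced 2,500 bp apart, flanked by others.
positions = [0, 1000, 4000, 6500, 9000, 11500, 14000, 16500, 30000]
repeat_runs = find_repeat_runs(positions)
print(repeat_runs)
```

Counts of such runs per spacing bin would then be divided by the average coverage depth to obtain the normalized values the caption describes.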
Figure 4. Circos plot of distribution of integration events throughout YH genome. The genome was divided into non-overlapping windows of 200 kb. The number of molecules with evidence of integration in each window is plotted with each concentric grey circle representing a two-fold increment in virus detection.
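The windowing described in the Figure 4 caption can be sketched as follows. The 200 kb window size comes from the caption; everything else is an assumption made for illustration: each molecule is reduced to a single `(chromosome, position)` coordinate, the example hits are invented, and the two-fold circles are modeled as a log2 axis.

```python
import math
from collections import Counter

WINDOW_BP = 200_000  # non-overlapping window size from the caption


def count_per_window(hits, window_bp=WINDOW_BP):
    """Count molecules per (chromosome, window index).

    hits: iterable of (chromosome, position_bp) pairs, one per molecule
          with evidence of integration (hypothetical input format).
    """
    counts = Counter()
    for chrom, pos in hits:
        counts[(chrom, pos // window_bp)] += 1
    return counts


def ring_level(count):
    """Map a count to a log2 radius so each ring is a two-fold step."""
    return math.log2(count)


# Invented example coordinates, not data from the paper.
hits = [("chr1", 150_000), ("chr1", 199_999), ("chr1", 200_000),
        ("chr2", 450_000), ("chr1", 10)]
window_counts = count_per_window(hits)

for (chrom, idx), n in sorted(window_counts.items()):
    start = idx * WINDOW_BP
    print(f"{chrom}:{start}-{start + WINDOW_BP}: {n} molecules, "
          f"ring level {ring_level(n):.2f}")
```

Integer division assigns each coordinate to exactly one window, which keeps the windows non-overlapping as the caption requires.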