Angel C Y Mak, Yvonne Y Y Lai, Ernest T Lam, Tsz-Piu Kwok, Alden K Y Leung, Annie Poon, Yulia Mostovoy, Alex R Hastie, William Stedman, Thomas Anantharaman, Warren Andrews, Xiang Zhou, Andy W C Pang, Heng Dai, Catherine Chu, Chin Lin, Jacob J K Wu, Catherine M L Li, Jing-Woei Li, Aldrin K Y Yim, Saki Chan, Justin Sibert, Željko Džakula, Han Cao, Siu-Ming Yiu, Ting-Fung Chan, Kevin Y Yip, Ming Xiao, Pui-Yan Kwok.
Abstract
Comprehensive whole-genome structural variation detection is challenging with current approaches. Because the DNA source is diploid cells and the genome contains numerous repetitive elements, short-read DNA sequencing cannot detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. Although whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, because molecules from the two parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared them against maps derived from the reference human genome, and identified structural variations >5 kb in size. We find that these individuals have many more structural variants than have been published, including some with the potential to disrupt gene function or regulation.
Keywords: biotechnology; genome mapping; structural variation detection
Year: 2015 PMID: 26510793 PMCID: PMC4701098 DOI: 10.1534/genetics.115.183483
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.562
Figure 1 Overview of genome mapping strategy. (A) High-molecular-weight DNA was extracted from cell cultures of the trio cell lines. (B) The Nt.BspQI nicking endonuclease was used to nick the top strand of the DNA. Starting at each nick, Taq DNA polymerase then displaced the top strand while incorporating fluorescently labeled thymine; the displaced strand was simultaneously removed by the 5′ flap endonuclease activity of Taq DNA polymerase. The remaining nicks were then repaired by Taq ligase, and the DNA backbone was fluorescently stained with YOYO-1. (C) Labeled DNA molecules were loaded onto flowcells, where they uncoiled in the gradient pillar region before entering the nanochannels, where they were imaged. Molecule size and BspQI label locations were determined to generate single-molecule maps. (D) Single-molecule maps were assembled de novo into genome maps. (E) Genome maps were compared with hg38 in silico maps to detect structural variants and identify heterozygous regions.
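Panel (E) compares the assembled maps with hg38 in silico maps. The following is a minimal Python sketch of how such an in silico map could be derived from sequence. It assumes the Nt.BspQI recognition motif GCTCTTC and scans the top strand for both that motif and its reverse complement GAAGAGC, because a site can occur in either orientation on the reference. The function names are hypothetical, and label positions are approximated by the motif start rather than the exact nick offset.

```python
import re

# Assumption: Nt.BspQI recognizes GCTCTTC; on the reference top strand a site may
# appear in either orientation, so the reverse complement GAAGAGC is searched too.
MOTIF = "GCTCTTC"
MOTIF_RC = "GAAGAGC"


def in_silico_labels(seq: str) -> list[int]:
    """Return sorted 0-based start positions of BspQI motifs in either orientation."""
    seq = seq.upper()
    positions = set()
    for motif in (MOTIF, MOTIF_RC):
        # A zero-width lookahead reports overlapping occurrences as well.
        for match in re.finditer(f"(?={motif})", seq):
            positions.add(match.start())
    return sorted(positions)


def label_intervals(positions: list[int]) -> list[int]:
    """Distances (bp) between consecutive labels: the pattern a genome map encodes."""
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    toy = "AA" + "GCTCTTC" + "A" * 10 + "GAAGAGC" + "TT"
    labels = in_silico_labels(toy)
    print(labels, label_intervals(labels))
```

The inter-label distances are what single-molecule and consensus maps are aligned against; a production pipeline would also need to handle N bases and the optical resolution limit for closely spaced labels, which this sketch ignores.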
Statistics of single-molecule maps and de novo consensus maps
| Statistic | NA12878 | NA12891 | NA12892 |
|---|---|---|---|
| Single-molecule maps | | | |
| No. of DNA molecules (thousands) | 994 | 720 | 650 |
| Average size (kb) | 278 | 326 | 328 |
| Maximum size (kb) | 2,258 | 2,912 | 3,255 |
| Total molecule length (Gb) | 276 | 235 | 213 |
| Estimated average depth of coverage | 92× | 78× | 71× |
| Consensus maps | | | |
| Total consensus map size (Gb) | 2.9 | 3.0 | 3.0 |
| No. of consensus maps | 1,049 | 990 | 995 |
| N50 (Mb) | 4.59 | 4.87 | 5.00 |
| Longest consensus map (Mb) | 26.4 | 25.4 | 29.0 |
| % aligned to hg38 | 99% | 99% | 98% |
| hg38 genome coverage | 96% | 96% | 96% |
Statistics were based on DNA molecules that are >180 kb.
Statistics were based on DNA molecules that are >150 kb.
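The single-molecule rows of the table are internally consistent: total length is roughly the molecule count times the average size, and the depth column matches total length divided by a genome size of about 3.0 Gb. A small worked check in Python follows, where the 3.0-Gb genome size is an assumption rather than a value stated in the table.

```python
GENOME_SIZE_GB = 3.0  # assumption: approximate human genome size used for depth


def total_length_gb(n_molecules_k: float, avg_size_kb: float) -> float:
    """Total molecule length in Gb from a count in thousands and a mean size in kb."""
    # (thousands -> molecules) * (kb -> bp) / (bp per Gb)
    return n_molecules_k * 1e3 * avg_size_kb * 1e3 / 1e9


def depth(total_gb: float, genome_gb: float = GENOME_SIZE_GB) -> float:
    """Average depth of coverage: sequenced (here, imaged) length over genome size."""
    return total_gb / genome_gb


# (molecules in thousands, average size in kb) copied from the table
TABLE = {"NA12878": (994, 278), "NA12891": (720, 326), "NA12892": (650, 328)}

if __name__ == "__main__":
    for sample, (n_k, avg_kb) in TABLE.items():
        total = total_length_gb(n_k, avg_kb)
        print(f"{sample}: {total:.0f} Gb, {depth(total):.0f}x")
```

Run as a script, the rounded values match the Total molecule length and depth rows for all three samples.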
Figure 2 Genome coverage of de novo assembled genome maps on hg38. Genome maps of NA12878 (orange), NA12891 (blue), and NA12892 (red) were aligned to hg38 in silico BspQI maps (gray line below the NA12878 genome maps). Telomere and centromere locations (green) are based on annotations from the UCSC Table Browser, and N-base gap regions are shown in gray.
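Coverage of hg38, as plotted in Figure 2 and summarized in the statistics table, amounts to the fraction of reference bases spanned by at least one aligned genome map. Below is a minimal sketch, assuming alignments are given as hypothetical half-open (start, end) reference intervals and using the full reference length as the denominator; whether the published percentage excludes N-base gaps is not stated here.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered_fraction(aligned, ref_length):
    """Fraction of a reference of length ref_length covered by >= 1 aligned interval."""
    covered = sum(end - start for start, end in merge_intervals(aligned))
    return covered / ref_length


if __name__ == "__main__":
    maps = [(200, 300), (0, 100), (50, 150)]  # hypothetical alignment intervals
    print(merge_intervals(maps), covered_fraction(maps, 400))
```

Merging first prevents overlapping maps (such as the three trio samples aligned to the same region) from being counted twice.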
Figure 3 EBV integration sites. Using single-molecule maps that aligned to the EBV genome, we identified potential EBV integration sites by masking the EBV-matching part of each molecule and mapping the remainder of the molecule to hg38. Each predicted site was supported by at least 20 molecules. The very different patterns of EBV integration in the trio are almost certainly a consequence of passaging the immortalized cell lines in culture, not of true infection or inheritance from parent to child.
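The support threshold in Figure 3 can be illustrated with a short sketch. It assumes each molecule's non-EBV remainder has already been mapped to a hypothetical (chromosome, position) junction on hg38, groups nearby junctions using a hypothetical 10-kb window, and keeps only clusters backed by at least 20 molecules; the 20-molecule minimum is the one parameter taken from the caption.

```python
from collections import defaultdict
from statistics import median

MIN_SUPPORT = 20     # from the caption: each site supported by >= 20 molecules
WINDOW_BP = 10_000   # hypothetical clustering window, not from the source


def _clusters(sorted_positions, window):
    """Yield groups of positions in which consecutive gaps are <= window."""
    cluster = []
    for pos in sorted_positions:
        if cluster and pos - cluster[-1] > window:
            yield cluster
            cluster = []
        cluster.append(pos)
    if cluster:
        yield cluster


def call_sites(junctions, min_support=MIN_SUPPORT, window=WINDOW_BP):
    """junctions: iterable of (chrom, pos), one per molecule. Returns (chrom, pos, support)."""
    by_chrom = defaultdict(list)
    for chrom, pos in junctions:
        by_chrom[chrom].append(pos)
    sites = []
    for chrom in sorted(by_chrom):
        for cluster in _clusters(sorted(by_chrom[chrom]), window):
            if len(cluster) >= min_support:
                sites.append((chrom, int(median(cluster)), len(cluster)))
    return sites
```

A cluster of 25 junctions would be reported as a site, while a cluster of 5 would be discarded as insufficiently supported.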
Figure 4 Use of genome maps to size N-base gaps and resolve regions with 6-kb tandem repeats. (A) A 50-kb N-base gap in the chr18:47M region was sized in the trio. NA12878 is heterozygous, with 7 and 9 tandem repeats giving N-base gap sizes of 24 kb (red arrows, inherited from the mother, NA12892) and 36 kb (blue arrows, inherited from the father, NA12891). (B) NA12891 has 9 and 10 tandem repeats, giving N-base gap sizes of 36 kb (blue arrows) and 42 kb (blue dotted arrows). (C) NA12892 has 7 and 14 tandem repeats, giving N-base gap sizes of 24 kb (red arrows) and 66 kb (red dotted arrows). (D) UCSC Genome Browser view of the region marked by the black box in A. The in silico BspQI map, shown in the BspQI track, overlaps the TCEB genes. The variable number of repeats in this region may therefore reflect a copy number difference in the TCEB3 family of genes.
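The sizes in Figure 4 follow a simple linear relation: each additional 6-kb repeat adds 6 kb to the sized gap, and all four reported pairs fit gap = 6 × (n − 3) kb. The offset of three units is inferred only from the caption's numbers and is not explained in the source; the sketch below checks it against those values.

```python
REPEAT_UNIT_KB = 6   # tandem repeat unit size stated in the caption
OFFSET_UNITS = 3     # inferred by fitting the caption's numbers; not stated in the source


def gap_size_kb(n_repeats: int) -> int:
    """Sized N-base gap (kb) for an allele carrying n_repeats copies of the repeat."""
    return REPEAT_UNIT_KB * (n_repeats - OFFSET_UNITS)


def repeats_from_gap(gap_kb: int) -> int:
    """Inverse of gap_size_kb for gap sizes that are whole multiples of the unit."""
    return gap_kb // REPEAT_UNIT_KB + OFFSET_UNITS


# (repeat count, reported gap size in kb) from panels A-C
CAPTION_VALUES = {7: 24, 9: 36, 10: 42, 14: 66}

for n, kb in CAPTION_VALUES.items():
    assert gap_size_kb(n) == kb and repeats_from_gap(kb) == n
```

Under this relation, the two NA12892 alleles (7 and 14 repeats) differ by 42 kb, which is why they are easily resolved as separate haplotypes in panel C.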
Validated insertions and deletions (>5 kb) detected by single-molecule maps and genome maps
| Sample/SV Category | Insertion | Deletion |
|---|---|---|
| By samples | ||
| NA12878 | 769 | 522 |
| NA12891 | 743 | 496 |
| NA12892 | 748 | 456 |
| By novelty based on 1000 Genomes Project pilot and phase 1 insertions and deletions >5 kb | ||
| Known | 39 | 125 |
| Novel | 870 | 536 |
| By Mendelian inheritance | ||
| Mendelian | 879 | 631 |
| Non-Mendelian | 4 | 4 |
| No call | 26 | 26 |
| Total | 909 | 661 |
No call: a Mendelian inheritance call could not be generated because there were insufficient data to call the insertion or deletion in any of the samples.
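The Mendelian categories in the table can be expressed as a genotype check: a child's genotype is consistent if one allele can come from each parent, and no call is possible when a genotype is missing. Below is a minimal sketch using the del/+ genotype notation from the disease-association table. The parsing convention (None for a missing genotype) and the function names are assumptions of this example, not the authors' pipeline.

```python
from itertools import product
from typing import Optional, Tuple


def parse_genotype(gt: Optional[str]) -> Optional[Tuple[str, str]]:
    """Split a genotype such as 'del/+' into its two alleles; None means no data."""
    if gt is None:
        return None
    first, second = gt.split("/")
    return first, second


def mendelian_status(child, father, mother) -> str:
    """Classify a trio genotype as 'Mendelian', 'Non-Mendelian', or 'No call'."""
    genotypes = [parse_genotype(g) for g in (child, father, mother)]
    if any(g is None for g in genotypes):
        return "No call"
    c, f, m = genotypes
    # Consistent if some (paternal, maternal) allele pair reproduces the child's genotype.
    consistent = any(sorted(pair) == sorted(c) for pair in product(f, m))
    return "Mendelian" if consistent else "Non-Mendelian"
```

For example, the GSTM1 deletion (NA12878 del/+, NA12891 del/del, NA12892 del/+) is classified as Mendelian.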
Deletions in the trio samples that are associated with disease susceptibility and drug response
| Chromosome | Start | Stop | NA12878 | NA12891 | NA12892 | Gene(s) | Disease susceptibility |
|---|---|---|---|---|---|---|---|
| 1 | 109687501 | 109714725 | del/+ | del/del | del/+ | Glutathione S-transferase Mu-1 | Reduced ability to metabolize certain chemical carcinogens and toxins, increasing susceptibility to various cancers and to aplastic anemia |
| 1 | 152570855 | 152621659 | del/+ | del/del | +/+ | Late cornified envelope 3B, C | Increased susceptibility to psoriasis |
| 1 | 207535283 | 207565105 | del/+ | del/del | del/+ | Complement component receptor 1 | Malarial resistance; determinant of Knops system blood group |
| 19 | 51628937 | 51649902 | del/+ | del/del | +/+ | Sialic acid binding Ig-like lectin 14 | Increased susceptibility to group B |
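Deletion sizes follow directly from the coordinates in the table above, and each exceeds the 5-kb reporting threshold. The small sketch below recomputes the spans as Stop − Start (whether the coordinates are inclusive is not stated, so sizes may be off by one base) and tallies deletion carriers per sample.

```python
# Coordinates and genotypes copied from the table (order: NA12878, NA12891, NA12892).
DELETIONS = [
    ("1", 109687501, 109714725, ("del/+", "del/del", "del/+")),
    ("1", 152570855, 152621659, ("del/+", "del/del", "+/+")),
    ("1", 207535283, 207565105, ("del/+", "del/del", "del/+")),
    ("19", 51628937, 51649902, ("del/+", "del/del", "+/+")),
]
SAMPLES = ("NA12878", "NA12891", "NA12892")
MIN_SIZE_BP = 5_000  # reporting threshold stated in the abstract (>5 kb)


def span_bp(start: int, stop: int) -> int:
    """Deletion span, treating coordinates as half-open; add 1 if both ends are inclusive."""
    return stop - start


sizes = [span_bp(start, stop) for _, start, stop, _ in DELETIONS]

# A sample carries a deletion if at least one allele is 'del'.
carriers = {
    sample: sum("del" in row[3][i] for row in DELETIONS)
    for i, sample in enumerate(SAMPLES)
}

if __name__ == "__main__":
    print(sizes, carriers)
```

The spans range from about 21 kb (chromosome 19) to about 51 kb (chromosome 1, 152.6 Mb), and NA12891, homozygous for all four deletions, is the only sample carrying two deleted copies at each locus.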
Figure 5 Detection of inversions and complex structural variants. (A) A 50-kb inversion previously reported in NA12878 in the chromosome 23 (104 Mb) region. A 10-kb deletion (from 150 kb in hg38 to 140 kb in the trio) was also detected over the inversion region (blue). This inversion was homozygous in all members of the trio. (B) Detection of complex structural variants. A complex structural variant was detected in the chromosome 7 (144.25 Mb) region, where an inversion was previously reported (Kidd ). Our genome maps show that this region is structurally complex: both duplication and inversion events were observed at this locus.
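An inversion like the one in Figure 5A appears in a map alignment as a run of labels whose reference order is reversed relative to the query. Below is a toy sketch of that idea, assuming a hypothetical list of (query label index, reference label index) alignment pairs sorted by query index. Real callers work from alignment orientation and inter-label distances and would need extra logic for complex loci such as Figure 5B, where duplication accompanies inversion.

```python
def inverted_ref_spans(pairs, min_labels=3):
    """Return (first, last) reference label indices of runs aligned in reverse order.

    pairs: (query_label_index, ref_label_index) tuples sorted by query index.
    A run qualifies if its reference indices strictly decrease over at least
    min_labels consecutive aligned labels (min_labels is a hypothetical cutoff).
    """
    spans = []
    run = []
    for _query_idx, ref_idx in pairs:
        if run and ref_idx < run[-1]:
            run.append(ref_idx)  # still walking backwards along the reference
        else:
            if len(run) >= min_labels:
                spans.append((min(run), max(run)))
            run = [ref_idx]
    if len(run) >= min_labels:
        spans.append((min(run), max(run)))
    return spans


if __name__ == "__main__":
    # Reference labels 4-7 are traversed in reverse by the query map.
    toy_pairs = list(enumerate([1, 2, 3, 7, 6, 5, 4, 8, 9]))
    print(inverted_ref_spans(toy_pairs))
```

Requiring several consecutive reversed labels guards against calling an inversion from a single mis-ordered label, which can arise from sizing error rather than a true rearrangement.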