Audrey S M Teo, Davide Verzotto, Fei Yao, Niranjan Nagarajan, Axel M Hillmer.
Abstract
BACKGROUND: Next-generation sequencing (NGS) technologies have changed our understanding of the variability of the human genome. However, the identification of genome structural variations based on NGS approaches with read lengths of 35-300 bases remains a challenge. Single-molecule optical mapping technologies allow the analysis of DNA molecules of up to 2 Mb and as such are suitable for the identification of large-scale genome structural variations, and for de novo genome assemblies when combined with short-read NGS data. Here we present optical mapping data for two human genomes: the HapMap cell line GM12878 and the colorectal cancer cell line HCT116.
Keywords: Cancer genome; Genome structure; Genomic mapping; Optical mapping; Single-molecule restriction mapping
Year: 2015 PMID: 26719794 PMCID: PMC4696294 DOI: 10.1186/s13742-015-0106-1
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
In silico analysis of restriction enzyme cutting statistics for the human reference genome (hg19)
| Enzyme | Usable DNA fragments, 5–20 kb (%) | Usable DNA fragments, 6–15 kb (%) | Usable DNA fragments, 6–12 kb (%) | Average fragment size (kb) | Maximum fragment size (kb) | # Fragments >100 kb |
|---|---|---|---|---|---|---|
| | 13.3 | 5.48 | 5.43 | 4.47 | 143.96 | 4 |
| | 99.22 | 92.95 | 92.9 | 7.92 | 153.92 | 21 |
| | 0.08 | 0.03 | 0.03 | 3.81 | 164.18 | 2 |
| | 99.86 | 98.97 | 90.75 | 10.23 | 204.75 | 88 |
| | 99.28 | 96.71 | 94.55 | 7.27 | 311.48 | 101 |
| | 2.33 | 0.81 | 0.8 | 3.71 | 109.69 | 1 |
| | 2.21 | 0.79 | 0.79 | 3.67 | 86.14 | 0 |
| | 0.34 | 0.01 | 0.01 | 135.32 | 2276.59 | 8295 |
| NdeI | 5.9 | 1.78 | 1.78 | 3.19 | 105.86 | 1 |
| | 0.03 | 0.02 | 0.02 | 2.66 | 173.76 | 6 |
| | 2.75 | 1.15 | 1.15 | 3.58 | 146.27 | 2 |
| | 17.02 | 6.37 | 2.21 | 23.78 | 430.88 | 3269 |
To select the restriction enzyme that maximizes the fraction of fragments yielding informative maps, the human genome was cut in silico with 13 commonly used restriction enzymes based on their canonical cutting sites. Usable restriction fragment sizes were defined as 5–20 kb, 6–15 kb, and 6–12 kb, since smaller DNA fragments do not allow accurate size estimates and longer fragments can result in maps with too few fragments. KpnI was selected based on its high fraction of usable DNA fragments (highlighted in bold).
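The in silico digestion described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `digest` and `usable_fraction` are hypothetical, the toy sequence and size window are chosen only so the arithmetic is easy to follow, and the usable fraction is computed here by fragment count, which is an assumption about how the table's percentages were derived. The KpnI recognition site GGTACC, cut after the fifth base, is the enzyme's canonical site.

```python
import re


def digest(sequence: str, site: str, cut_offset: int = 0) -> list[int]:
    """Cut `sequence` at every occurrence of `site`; return fragment lengths (bases).

    `cut_offset` is the cut position inside the recognition site
    (5 for KpnI, which cuts GGTAC^C). Hypothetical helper for illustration.
    """
    seq = sequence.upper()
    # A lookahead pattern also finds overlapping occurrences of the site.
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def usable_fraction(lengths: list[int], lo: int, hi: int) -> float:
    """Fraction of fragments (by count, an assumption) with lo <= length <= hi."""
    if not lengths:
        return 0.0
    return sum(lo <= n <= hi for n in lengths) / len(lengths)


# Toy sequence with two KpnI sites; a real run would iterate over hg19 chromosomes.
toy = "A" * 10 + "GGTACC" + "T" * 20 + "GGTACC" + "C" * 5
fragments = digest(toy, "GGTACC", cut_offset=5)
print(fragments)
print(usable_fraction(fragments, lo=10, hi=20))
```

For a genome-scale run, the same two functions would be applied per chromosome with windows of 5,000–20,000, 6,000–15,000, and 6,000–12,000 bases, and the fragment lengths pooled before computing the averages and maxima.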
Fig. 1 Representative optical map of GM12878. DNA molecules were stretched and immobilized onto a glass MapCard surface with the aid of a channel-forming device, cut by KpnI, stained, and visualized by fluorescence imaging. Interrupted linear stretches indicate DNA digested by KpnI. Whirly, non-linear, short, and disjointed DNA molecules are filtered out by the image processing software.
Summary of MapCard statistics of GM12878
| MapCard ID | F (a) | Input maps (b) (theoretical genome coverage) | Average Argus quality score | Average DNA molecule size (kb) | Average # of fragments | Average fragment size (kb) | OPTIMA alignment rate | Yield (genome coverage) (c) | Average digestion rate (c) | Average false/extra cut rate (c) | Ratio small missing fragments (≤2 kb) (c) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 21157LB | (r) | 73365 (7.2×) | 0.50 | 295 | 18 | 16.5 | 0.253 | 2.0× | 0.659 | 0.736 | 0.139 |
| | (s) | 38483 (4.7×) | 0.53 | 368 | 22 | 17.0 | 0.357 | 1.7× | 0.650 | 0.733 | 0.133 |
| 21159LB | (r) | 75761 (7.6×) | 0.47 | 300 | 17 | 17.4 | 0.190 | 1.6× | 0.628 | 0.723 | 0.129 |
| | (s) | 41236 (5.1×) | 0.50 | 370 | 21 | 17.8 | 0.268 | 1.3× | 0.618 | 0.718 | 0.124 |
| 21431LB | (r) | 93896 (8.6×) | 0.52 | 274 | 17 | 15.8 | 0.200 | 1.9× | 0.676 | 0.773 | 0.187 |
| | (s) | 43667 (5.1×) | 0.54 | 348 | 21 | 16.3 | 0.303 | 1.5× | 0.665 | 0.768 | 0.184 |
| 21443LB | (r) | 66857 (6×) | 0.51 | 271 | 17 | 15.8 | 0.192 | 1.3× | 0.674 | 0.771 | 0.175 |
| | (s) | 29991 (3.5×) | 0.53 | 346 | 21 | 16.3 | 0.292 | 1.0× | 0.661 | 0.772 | 0.168 |
| Total | (r) | 309879 (29.4×) | 0.50 | 285 | 17 | 16.4 | 0.209 | 6.8× | 0.660 | 0.751 | 0.158 |
| | (s) | 153377 (18.3×) | 0.52 | 359 | 21 | 16.9 | 0.310 | 5.5× | 0.649 | 0.747 | 0.152 |

(a) r: inclusion of DNA molecules with ≥10 fragments and ≥150 kb in length; s: inclusion of DNA molecules with ≥12 fragments and ≥250 kb in length
(b) Fragmented DNA molecules
(c) Of OPTIMA-aligned data
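The theoretical genome coverage values in parentheses appear consistent with the number of input maps multiplied by the average DNA molecule size and divided by the genome size. The sketch below checks this for a few GM12878 rows. The genome size of 3.0 Gb is an assumption (it is not stated in this section), the function name `theoretical_coverage` is hypothetical, and because the tabulated averages are rounded, agreement is only expected to one decimal place.

```python
# Assumption: haploid human genome size of ~3.0 Gb, expressed in kb.
GENOME_SIZE_KB = 3.0e6


def theoretical_coverage(n_maps: int, avg_size_kb: float,
                         genome_kb: float = GENOME_SIZE_KB) -> float:
    """Total length of all input maps divided by the genome size (fold coverage)."""
    return n_maps * avg_size_kb / genome_kb


# (input maps, average DNA molecule size in kb) copied from the GM12878 table, filter r.
gm12878_r = {
    "21157LB": (73365, 295),
    "21159LB": (75761, 300),
    "21431LB": (93896, 274),
    "21443LB": (66857, 271),
}

for card, (n_maps, avg_kb) in gm12878_r.items():
    print(card, round(theoretical_coverage(n_maps, avg_kb), 1))
```

Under this assumption the computed values round to the 7.2×, 7.6×, 8.6×, and 6× reported in the table.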
Summary of MapCard statistics of HCT116
| MapCard ID | F (a) | Input maps (b) (theoretical genome coverage) | Average Argus quality score | Average DNA molecule size (kb) | Average # of fragments | Average fragment size (kb) | OPTIMA alignment rate | Yield (genome coverage) (c) | Average digestion rate (c) | Average false/extra cut rate (c) | Ratio small missing fragments (≤2 kb) (c) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 17182LA | (r) | 10911 (0.9×) | 0.33 | 257 | 16 | 15.7 | 0.040 | 0.04× | 0.661 | 1.288 | 0.170 |
| | (s) | 3744 (0.4×) | 0.33 | 351 | 20 | 17.7 | 0.040 | 0.02× | 0.628 | 1.226 | 0.190 |
| 17184LA-2 | (r) | 55719 (5.7×) | 0.43 | 305 | 19 | 16.3 | 0.180 | 1.1× | 0.678 | 0.760 | 0.197 |
| | (s) | 28658 (3.7×) | 0.45 | 390 | 23 | 17.2 | 0.250 | 0.9× | 0.669 | 0.737 | 0.199 |
| 17185LA | (r) | 56879 (5.4×) | 0.55 | 285 | 19 | 14.7 | 0.240 | 1.5× | 0.705 | 0.756 | 0.219 |
| | (s) | 28003 (3.4×) | 0.59 | 365 | 24 | 15.1 | 0.352 | 1.2× | 0.696 | 0.739 | 0.217 |
| 17186LA-3 | (r) | 52984 (5.8×) | 0.54 | 328 | 20 | 16.0 | 0.327 | 2.0× | 0.696 | 0.677 | 0.167 |
| | (s) | 31588 (4.3×) | 0.56 | 404 | 25 | 16.4 | 0.423 | 1.7× | 0.688 | 0.671 | 0.163 |
| 17187LA | (r) | 88730 (7.8×) | 0.45 | 264 | 18 | 14.8 | 0.115 | 1.0× | 0.692 | 0.940 | 0.195 |
| | (s) | 36018 (4.2×) | 0.46 | 349 | 22 | 15.8 | 0.171 | 0.7× | 0.678 | 0.919 | 0.188 |
| 14593LB | (r) | 30994 (2.7×) | 0.39 | 261 | 14 | 18.9 | 0.059 | 0.2× | 0.626 | 0.847 | 0.161 |
| | (s) | 10944 (1.2×) | 0.39 | 337 | 17 | 20.2 | 0.086 | 0.1× | 0.597 | 0.869 | 0.151 |
| Total | (r) | 296217 (28.3×) | 0.47 | 287 | 18 | 15.7 | 0.181 | 5.7× | 0.691 | 0.774 | 0.191 |
| | (s) | 138955 (17.2×) | 0.50 | 372 | 23 | 16.5 | 0.271 | 4.6× | 0.682 | 0.749 | 0.188 |

(a) r: inclusion of DNA molecules with ≥10 fragments and ≥150 kb in length; s: inclusion of DNA molecules with ≥12 fragments and ≥250 kb in length
(b) Fragmented DNA molecules
(c) Of OPTIMA-aligned data
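The two inclusion filters defined in footnote (a) can be expressed as a simple predicate over each DNA molecule. The sketch below is illustrative only: the `Molecule` class, the `FILTERS` table, and `passes_filter` are hypothetical names, and a molecule is represented merely as a list of restriction fragment sizes in kb rather than in any actual MapCard or OPTIMA file format.

```python
from dataclasses import dataclass

# Thresholds from footnote (a): (minimum number of fragments, minimum length in kb).
FILTERS = {"r": (10, 150.0), "s": (12, 250.0)}


@dataclass
class Molecule:
    """Hypothetical container: one optical map as ordered fragment sizes (kb)."""
    fragment_sizes_kb: list[float]

    @property
    def n_fragments(self) -> int:
        return len(self.fragment_sizes_kb)

    @property
    def length_kb(self) -> float:
        return sum(self.fragment_sizes_kb)


def passes_filter(mol: Molecule, name: str) -> bool:
    """True if the molecule meets both thresholds of filter `name` ('r' or 's')."""
    min_fragments, min_length_kb = FILTERS[name]
    return mol.n_fragments >= min_fragments and mol.length_kb >= min_length_kb


molecules = [
    Molecule([20.0] * 11),  # 11 fragments, 220 kb: meets r only
    Molecule([25.0] * 12),  # 12 fragments, 300 kb: meets r and s
    Molecule([40.0] * 9),   # 9 fragments, 360 kb: too few fragments for either
]
for name in FILTERS:
    kept = [m for m in molecules if passes_filter(m, name)]
    print(name, len(kept))
```

Because every molecule that meets the s thresholds also meets the r thresholds, the (s) rows in both tables describe a subset of the corresponding (r) rows.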