Heng Wang, Dan Nettleton, Kai Ying.
Abstract
BACKGROUND: A copy number variation (CNV) is a difference between genotypes in the number of copies of a genomic region. Next-generation sequencing (NGS) technologies provide sensitive and accurate tools for detecting genomic variations that include CNVs. However, statistical approaches for CNV identification using NGS are limited. We propose a new methodology for detecting CNVs using NGS data. This method (henceforth denoted by m-HMM) is based on a hidden Markov model with emission probabilities that are governed by mixture distributions. We use the Expectation-Maximization (EM) algorithm to estimate the parameters in the model.
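To make the phrase "hidden Markov model with emission probabilities governed by mixture distributions" concrete, here is a minimal sketch of the scaled forward algorithm, which computes the likelihood that an EM procedure would iteratively increase. It is not the authors' implementation: the Poisson mixture components, the two-state layout, and every numeric value below are assumptions chosen purely for illustration.

```python
import math

def poisson_pmf(k, lam):
    # Poisson probability of count k with mean lam (assumed component family)
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def mixture_emission(k, weights, means):
    # Emission probability of count k under a finite mixture of Poissons
    return sum(w * poisson_pmf(k, m) for w, m in zip(weights, means))

def forward_loglik(counts, init, trans, mixtures):
    """Scaled forward algorithm: log-likelihood of the observed counts.

    init[s]      initial probability of hidden state s
    trans[r][s]  transition probability from state r to state s
    mixtures[s]  (weights, means) of the emission mixture for state s
    """
    n_states = len(init)
    alpha = [init[s] * mixture_emission(counts[0], *mixtures[s])
             for s in range(n_states)]
    loglik = 0.0
    for t in range(len(counts)):
        if t > 0:
            # Propagate one step, then weight by the mixture emission
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states))
                * mixture_emission(counts[t], *mixtures[s])
                for s in range(n_states)
            ]
        scale = sum(alpha)          # rescale to avoid numerical underflow
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

# Hypothetical toy setup: state 0 = "normal", state 1 = "gain"
INIT = [0.9, 0.1]
TRANS = [[0.95, 0.05], [0.10, 0.90]]
MIXTURES = [([0.7, 0.3], [10.0, 12.0]), ([0.6, 0.4], [20.0, 24.0])]
toy_loglik = forward_loglik([9, 11, 10, 21, 19, 22, 10], INIT, TRANS, MIXTURES)
```

Rescaling at each step keeps the computation stable on long window sequences, where unscaled forward probabilities would underflow.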
Year: 2014 PMID: 24731174 PMCID: PMC4021345 DOI: 10.1186/1471-2105-15-109
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. True change point and the estimate of the window-based change point.
Comparison among m-HMM with and without change point adjustment, and original HMM in simulation study 1
| State | Measure | m-HMM | m-HMM without adj. | Original HMM |
|---|---|---|---|---|
| Gain | Sensitivity | 0.917 | 0.915 | 0.863 |
| | EFNR | 0.097 | 0.174 | 0.751 |
| Normal | Specificity | 0.996 | 0.994 | 0.945 |
| | EFPR | 0.002 | 0.002 | 0.003 |
| Loss | Sensitivity | 0.858 | 0.831 | 0.826 |
| | EFNR | 0.309 | 0.434 | 0.846 |
| Absent | Sensitivity | 0.960 | 0.857 | 0.810 |
| | EFNR | 0.011 | 0.005 | 0.000 |
Sensitivities and empirical false negative rates for copy number gain, loss and absent states, as well as specificities and empirical false positive rates for the normal state in the first simulation study.
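As a rough illustration of how per-state sensitivity and normal-state specificity can be computed from simulated data, the sketch below compares true and called states window by window. The per-window evaluation unit, the function names, and the state labels are assumptions; the source does not spell out its exact definitions, and the empirical false negative and false positive rates (EFNR, EFPR) are left out because their definition is not given here.

```python
def sensitivity(true_states, called_states, state):
    # Fraction of windows truly in `state` that were also called `state`
    calls = [c for t, c in zip(true_states, called_states) if t == state]
    if not calls:
        raise ValueError(f"no windows with true state {state!r}")
    return sum(c == state for c in calls) / len(calls)

def specificity(true_states, called_states, normal="normal"):
    # Fraction of truly normal windows that were called normal
    return sensitivity(true_states, called_states, normal)

# Hypothetical toy labels for six windows
true_states = ["normal", "normal", "gain", "gain", "gain", "loss"]
called_states = ["normal", "gain", "gain", "gain", "normal", "loss"]
gain_sens = sensitivity(true_states, called_states, "gain")
normal_spec = specificity(true_states, called_states)
```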
Comparison between m-HMM and SegSeq in simulation study 1
| State | Measure | m-HMM | SegSeq |
|---|---|---|---|
| Gain | Sensitivity | 0.917 | 0.000 |
| | EFNR | 0.097 | 0.000 |
| Normal | Specificity | 0.996 | 0.998 |
| | EFPR | 0.002 | 0.019 |
| Loss or absent | Sensitivity | 0.898 | 0.076 |
| | EFNR | 0.223 | 0.678 |
Sensitivities and empirical false negative rates for copy number gain, loss and absent states, as well as specificities and empirical false positive rates for the normal state in the first simulation study.
m-HMM segmentation with different CNV lengths in simulation study 2
| Loss | 0 | 4 | 9 | 10 | 10 |
| Gain | 3 | 5 | 10 | 10 | 10 |
| Absent | 2 | 9 | 10 | 10 | 10 |
Counts of correctly detected CNV segments and false positively detected normal segments, comparing between m-HMM, SegSeq and CNVnator in simulation study 3
| | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Gain | Correct detection | 10 | 10 | 10 | 7 | 10 | 10 | 6 | 9 | 10 |
| Loss or absent | Correct detection | 9 | 10 | 10 | 7 | 4 | 6 | 8 | 10 | 10 |
| Normal | False positives | 0 | 0 | 3 | 1 | 7 | 6 | 436 | 466 | 399 |
Overlap rate of true CNV segments and detected CNV segments in simulation study 3, comparing between m-HMM with and without change point adjustment
| m-HMM without adj. | 0.724 | 0.912 | 0.902 |
| m-HMM | 0.859 | 0.971 | 0.927 |
The overlap rate is evaluated using the overlap length divided by the true CNV segment length.
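The overlap rate defined above can be sketched as follows. The half-open `(start, end)` coordinate convention and the assumption that detected segments do not overlap one another are illustrative choices, not details taken from the source.

```python
def overlap_rate(true_seg, detected_segs):
    """Overlap length divided by the true CNV segment length.

    true_seg       (start, end) of the true CNV segment
    detected_segs  list of (start, end) detected CNV segments,
                   assumed not to overlap one another
    """
    start, end = true_seg
    covered = 0
    for d_start, d_end in detected_segs:
        # Length of the intersection, or 0 if the segments are disjoint
        covered += max(0, min(end, d_end) - max(start, d_start))
    return covered / (end - start)
```

For example, a true segment `(100, 200)` with one detected segment `(150, 250)` shares 50 positions out of 100, so the rate is 0.5.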
Figure 2. Log ratio between the counts of B73 and Mo17 on chromosome 1. Each point represents a window.
Figure 3. Log ratio between the counts of B73 and Mo17 on chromosome 3. Each point represents a window.
Figure 4. Log ratio between the counts of B73 and Mo17 on chromosome 6. Each point represents a window.
Figure 5. Log ratio between the counts of B73 and Mo17 on chromosome 10. Each point represents a window.
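The m-HMM segmentation result for chromosome 6 from 35.3 Mb to 57.0 Mb
| Start position | End position | Mo17 read count | B73 read count | Mo17/B73* |
|---|---|---|---|---|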
| 35,274,511 | 36,847,009 | 1,511 | 2291 | 1.33 |
| 36,847,028 | 36,986,070 | 81 | 242 | 0.67 |
| 36,988,155 | 39,227,681 | 2,375 | 3850 | 1.21 |
| 39,227,711 | 39,392,612 | 53 | 294 | 0.36 |
| 39,392,816 | 39,998,180 | 605 | 810 | 1.51 |
| 39,998,210 | 40,070,876 | 114 | 95 | 2.43 |
| 40,071,001 | 42,219,831 | 1,978 | 2751 | 1.45 |
| 42,221,884 | 42,272,394 | 3 | 305 | 0.01 |
| 42,273,061 | 42,279,935 | 0 | 37 | 0.00 |
| 42,281,369 | 42,467,432 | 6 | 484 | 0.02 |
| 42,467,722 | 42,510,527 | 0 | 199 | 0.00 |
| 42,520,362 | 42,588,228 | 10 | 134 | 0.15 |
| 42,589,445 | 42,592,668 | 0 | 16 | 0.00 |
| 42,600,850 | 42,949,605 | 41 | 734 | 0.11 |
| 42,957,788 | 43,020,977 | 0 | 213 | 0.00 |
| 43,021,808 | 43,021,808 | 1 | 0 | 50.00 |
| 43,021,875 | 43,034,674 | 0 | 20 | 0.00 |
| 43,050,012 | 43,156,464 | 7 | 214 | 0.06 |
| 43,159,602 | 43,239,996 | 0 | 110 | 0.00 |
| 43,240,069 | 43,240,069 | 4 | 1 | 8.11 |
| 43,240,496 | 43,253,531 | 0 | 25 | 0.00 |
| 43,270,468 | 43,367,135 | 27 | 396 | 0.13 |
| 43,382,243 | 43,436,720 | 0 | 211 | 0.00 |
| 43,437,634 | 43,437,634 | 1 | 0 | 50.00 |
| 43,437,664 | 43,438,342 | 0 | 19 | 0.00 |
| 43,440,089 | 43,466,314 | 8 | 146 | 0.11 |
| 43,466,499 | 43,502,281 | 0 | 183 | 0.00 |
| 43,502,441 | 43,502,441 | 1 | 0 | 50.00 |
| 43,504,419 | 43,516,326 | 0 | 3 | 0.00 |
| 43,532,757 | 43,802,344 | 29 | 815 | 0.077 |
| 43,802,387 | 43,809,486 | 0 | 214 | 0.00 |
| 43,809,952 | 43,809,952 | 4 | 0 | 50.00 |
| 43,809,982 | 43,811,684 | 0 | 47 | 0.00 |
| 43,812,791 | 43,828,685 | 1 | 128 | 0.01 |
| 43,833,620 | 43,862,804 | 0 | 121 | 0.00 |
| 43,863,025 | 43,863,025 | 1 | 0 | 50.00 |
| 43,866,525 | 43,959,900 | 0 | 164 | 0.00 |
| 43,959,976 | 43,960,274 | 2 | 10 | 0.40 |
| 43,960,449 | 43,984,663 | 0 | 95 | 0.00 |
| 43,991,670 | 44,242,537 | 24 | 498 | 0.09 |
| 44,247,904 | 44,297,744 | 0 | 50 | 0.00 |
| 44,297,961 | 44,297,961 | 1 | 0 | 50.00 |
| 44,299,769 | 44,341,066 | 0 | 252 | 0.00 |
| 44,341,857 | 46,215,928 | 371 | 2308 | 0.32 |
| 46,215,958 | 46,834,832 | 572 | 895 | 1.29 |
| 46,835,472 | 47,130,192 | 74 | 545 | 0.27 |
| 47,130,202 | 47,668,306 | 642 | 833 | 1.56 |
| 47,681,491 | 47,739,876 | 18 | 72 | 0.50 |
| 47,740,273 | 47,740,273 | 0 | 2 | 0.00 |
| 47,746,723 | 48,880,371 | 283 | 1356 | 0.42 |
| 48,880,568 | 48,923,056 | 0 | 51 | 0.00 |
| 48,923,160 | 49,722,664 | 213 | 1128 | 0.38 |
| 49,722,844 | 50,132,724 | 231 | 298 | 1.57 |
| 50,132,929 | 50,239,093 | 28 | 119 | 0.47 |
| 50,260,756 | 52,852,085 | 2,487 | 3567 | 1.41 |
| 52,852,089 | 52,958,397 | 155 | 130 | 2.41 |
| 52,958,403 | 53,950,690 | 918 | 1453 | 1.28 |
| 53,951,899 | 56,137,418 | 895 | 3196 | 0.56 |
| 56,142,451 | 56,472,218 | 296 | 570 | 1.05 |
| 56,472,614 | 56,863,324 | 102 | 570 | 0.36 |
| 56,866,117 | 57,022,366 | 123 | 212 | 1.17 |
The first two columns are the start and end positions of each CNV segment. Columns 3 and 4 are the read counts for Mo17 and B73, respectively, between the start and end positions. The last column is the normalized read count ratio between Mo17 and B73 for that CNV segment.
*The last column Mo17/B73 represents the normalized ratio of read counts between Mo17 and B73 for the genomic region between the start position and the end position. The normalization factor c0 = 0.493.
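The tabulated ratios appear consistent with dividing the raw Mo17/B73 count ratio by the normalization factor c0. That reading is an inference from the table values, not a formula stated in the text, so the sketch below should be taken as an assumption. Segments with a B73 count of zero are listed as 50.00 in the table; how that value arises is not described, so the sketch leaves that case undefined.

```python
C0 = 0.493  # normalization factor reported in the table footnote

def normalized_ratio(mo17_count, b73_count, c0=C0):
    # Assumed form: (Mo17 count / B73 count) / c0
    if b73_count == 0:
        return None  # zero-count handling is not described in the source
    return (mo17_count / b73_count) / c0

# Rows taken from the chromosome 6 table
r1 = normalized_ratio(114, 95)   # table lists 2.43
r2 = normalized_ratio(4, 1)      # table lists 8.11
r3 = normalized_ratio(605, 810)  # table lists 1.51
```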
This table lists the detected CNV segments that are longer than 2 Mb and show greater than 2-fold normalized read count differences between B73 and Mo17
| Chromosome | Start position | End position | Mo17/B73 |
|---|---|---|---|
| chr 1 | 7,674,194 | 10,130,535 | 0.47 |
| | 73,051,891 | 76,150,311 | 0.49 |
| | 78,765,566 | 80,973,937 | 0.40 |
| | 106,020,503 | 108,085,310 | 0.46 |
| | 183,346,370 | 185,514,363 | 0.34 |
| | 185,561,332 | 187,744,738 | 0.34 |
| | 206,016,332 | 208,795,988 | 0.49 |
| chr 2 | 14,327,370 | 17,154,616 | 0.49 |
| | 43,421,667 | 45,665,032 | 0.47 |
| | 53,360,846 | 57,062,635 | 0.48 |
| | 101,464,135 | 103,592,856 | 0.37 |
| | 173,361,552 | 175,485,424 | 0.46 |
| | 183,250,632 | 186,406,359 | 0.46 |
| | 209,163,585 | 211,227,635 | 0.48 |
| | 211,597,012 | 214,704,617 | 0.48 |
| chr 3 | 3,653,572 | 5,790,383 | 0.44 |
| | 45,465,224 | 48,749,064 | 0.40 |
| | 82,020,005 | 84,053,318 | 0.48 |
| | 176,248,175 | 179,361,747 | 0.48 |
| chr 4 | 645 | 2,454,064 | 0.45 |
| | 23,025,833 | 25,854,581 | 0.48 |
| | 136,792,400 | 140,387,061 | 0.36 |
| | 140,502,832 | 143,245,529 | 0.49 |
| | 146,672,844 | 148,733,733 | 0.44 |
| | 239,597,061 | 242,255,833 | 0.48 |
| | 243,186,165 | 245,433,834 | 0.49 |
| chr 5 | 753,941 | 4,311,600 | 0.48 |
| | 5,201,605 | 7,723,585 | 0.48 |
| | 11,328,383 | 14,574,771 | 0.49 |
| | 16,611,265 | 19,032,650 | 0.44 |
| | 68,583,278 | 71,949,086 | 0.40 |
| | 156,325,001 | 158,697,338 | 0.47 |
| | 168,069,569 | 170,816,075 | 0.39 |
| | 176,121,726 | 179,598,151 | 0.43 |
| | 191,459,521 | 195,013,031 | 0.48 |
| chr 6 | 25,898,308 | 28,476,332 | 0.33 |
| | 70,440,134 | 72,858,938 | 0.46 |
| | 76,981,074 | 80,570,101 | 0.46 |
| | 85,620,416 | 87,659,132 | 0.48 |
| | 102,350,617 | 104,422,628 | 0.49 |
| | 154,917,250 | 159,407,299 | 0.47 |
| | 162,488,577 | 165,722,159 | 0.48 |
| chr 7 | 18,304,500 | 22,673,932 | 0.49 |
| | 25,498,467 | 30,547,169 | 0.47 |
| | 78,344,496 | 81,198,663 | 0.40 |
| | 110,502,115 | 113,262,773 | 0.49 |
| | 117,913,056 | 119,946,052 | 0.49 |
| | 120,044,241 | 122,233,553 | 0.49 |
| chr 8 | 20,108,771 | 23,447,168 | 0.43 |
| | 160,011,805 | 166,514,647 | 0.45 |
| | 167,430,533 | 169,880,960 | 0.45 |
| chr 9 | 38,523,842 | 40,732,642 | 0.41 |
| | 46,734,889 | 49,382,812 | 0.38 |
| | 61,499,201 | 64,577,206 | 0.39 |
| | 83,491,462 | 88,345,837 | 0.46 |
| | 118,555,633 | 121,557,270 | 0.49 |
| | 143,196,239 | 145,630,853 | 0.45 |
| | 145,813,631 | 150,079,829 | 0.46 |
| chr 10 | 20,262,176 | 24,734,243 | 0.48 |
| | 39,046,734 | 41,852,436 | 0.46 |
| | 72,370,857 | 74,553,412 | 0.42 |
| | 111,247,775 | 116,238,001 | 0.46 |
| | 117,013,634 | 120,679,932 | 0.49 |
| | 121,011,209 | 124,203,304 | 0.49 |
| | 130,590,451 | 139,841,271 | 0.46 |
The last column Mo17/B73 represents the normalized ratio of read counts between Mo17 and B73 for the genomic region between the start position and the end position. The normalization factor c0 = 0.493.
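The selection rule behind this table (segments longer than 2 Mb with a greater than 2-fold normalized difference) can be sketched as a simple filter. Taking the length as `end - start` and reading "2-fold" as a ratio above 2 or below 0.5 are assumptions; the function name and parameters are hypothetical.

```python
def is_large_cnv(start, end, ratio, min_length=2_000_000, fold=2.0):
    # Keep segments that are long enough and differ by more than `fold`
    # in either direction (ratio > fold or ratio < 1 / fold)
    length = end - start
    return length > min_length and (ratio > fold or ratio < 1.0 / fold)

# First chromosome 1 row of the table: about 2.46 Mb long, ratio 0.47
keep = is_large_cnv(7_674_194, 10_130_535, 0.47)
# A chromosome 6 segment of about 2.59 Mb with ratio 1.41 is not selected
skip = is_large_cnv(50_260_756, 52_852_085, 1.41)
```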