David S H Chew, Ming-Ying Leung, Kwok Pui Choi.
Abstract
BACKGROUND: Replication origins are considered important sites for understanding the molecular mechanisms involved in DNA replication. Many computational methods have been developed for predicting their locations in archaeal, bacterial and eukaryotic genomes. However, a prediction method designed for one kind of genome might not work well for another. In this paper, we propose the AT excursion method, a score-based approach that quantifies local AT abundance in genomic sequences and uses the identified high-scoring segments to predict replication origins. This method has the advantages of requiring no preset window size and having rigorous criteria for evaluating the statistical significance of high-scoring segments.
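As a rough illustration of a score-based excursion, the sketch below assigns a positive score to A/T and a negative score to C/G and tracks a running sum that is reset whenever it would drop below zero. The scores (+1, -1), the threshold, and the function names are assumptions made for this example; the abstract does not state the scoring scheme or the significance thresholds, so this should not be read as the authors' implementation.

```python
from typing import List, Tuple


def at_excursion(seq: str, at_score: int = 1, gc_score: int = -1) -> List[int]:
    """Excursion values E_1..E_n using E_i = max(E_{i-1} + X_i, 0).

    Assumption: any base other than A or T receives gc_score.
    """
    values = []
    current = 0
    for base in seq.upper():
        step = at_score if base in "AT" else gc_score
        current = max(current + step, 0)
        values.append(current)
    return values


def high_scoring_segments(
    seq: str, threshold: int, at_score: int = 1, gc_score: int = -1
) -> List[Tuple[int, int, int]]:
    """Return (start, peak_position, peak_score) for each excursion whose
    peak reaches the (hypothetical) threshold. Positions are 1-based."""
    segments = []
    current, peak = 0, 0
    start, peak_pos = None, None
    for i, base in enumerate(seq.upper(), start=1):
        step = at_score if base in "AT" else gc_score
        if current == 0 and step > 0:
            start = i  # a new excursion begins here
        current = max(current + step, 0)
        if current == 0:
            # The excursion has returned to zero; keep it if it peaked high enough.
            if peak >= threshold:
                segments.append((start, peak_pos, peak))
            peak, start, peak_pos = 0, None, None
        elif current > peak:
            peak, peak_pos = current, i
    if peak >= threshold:  # excursion still open at the end of the sequence
        segments.append((start, peak_pos, peak))
    return segments


print(high_scoring_segments("GGATATTAGCGCGC", threshold=4))  # [(3, 8, 6)]
```

No window size appears anywhere in the sketch: the segment boundaries come from where the running score leaves and returns to zero.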
Year: 2007 PMID: 17517140 PMCID: PMC1904460 DOI: 10.1186/1471-2105-8-163
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The list of herpesviruses to be analyzed.
| Virus | Abbreviation | Genome length (bp) | AT content (%) | |
| Alcelaphine herpesvirus 1 | alhv1 | 130608 | 53 | |
| Ateline herpesvirus 3 | athv3 | 108409 | 63 | |
| Bovine herpesvirus 1 | bohv1 | 135301 | 28 | |
| Bovine herpesvirus 4 | bohv4 | 108873 | 59 | |
| Bovine herpesvirus 5 | bohv5 | 138390 | 25 | |
| Callitrichine herpesvirus 3 | calhv3 | 149696 | 51 | |
| Cercopithecine herpesvirus 1 | cehv1 | 156789 | 26 | |
| Cercopithecine herpesvirus 2 | cehv2 | 150715 | 24 | |
| Cercopithecine herpesvirus 8 | cehv8 | 221454 | 51 | |
| Cercopithecine herpesvirus 9 | cehv7 | 124138 | 59 | |
| Cercopithecine herpesvirus 15 | cehv15 | 171096 | 38 | |
| Cercopithecine herpesvirus 16 | cehv16 | 156487 | 24 | |
| Cercopithecine herpesvirus 17 | mmrv | 133719 | 47 | |
| Equid herpesvirus 1 | ehv1 | 150224 | 44 | |
| Equid herpesvirus 2 | ehv2 | 184427 | 43 | |
| Equid herpesvirus 4 | ehv4 | 145597 | 50 | |
| Gallid herpesvirus 1 | gahv1 | 148687 | 52 | |
| Gallid herpesvirus 2 | gahv2 | 174077 | 56 | |
| Gallid herpesvirus 3 | gahv3 | 164270 | 46 | |
| Human herpesvirus 1 | hsv1 | 152261 | 32 | |
| Human herpesvirus 2 | hsv2 | 154746 | 30 | |
| Human herpesvirus 3 | vzv | 124884 | 54 | |
| Human herpesvirus 4 | ebv | 171823 | 41 | |
| Human herpesvirus 5 (AD169) | hcmv | 230287 | 43 | |
| Human herpesvirus 5 (Merlin) | hcmv-m | 235645 | 42 | |
| Human herpesvirus 6 | hhv6 | 159321 | 58 | |
| Human herpesvirus 6B | hhv6b | 162114 | 58 | |
| Human herpesvirus 7 | hhv7 | 153080 | 63 | |
| Human herpesvirus 8 | hhv8 | 137508 | 47 | |
| Ictalurid herpesvirus 1 | ichv1 | 134226 | 43 | |
| Meleagrid herpesvirus 1 | mehv1 | 159160 | 52 | |
| Murid herpesvirus 1 | mcmv | 230278 | 41 | |
| Murid herpesvirus 2 | rcmv | 230138 | 39 | |
| Murid herpesvirus 4 | muhv4 | 119450 | 53 | |
| Macaca fuscata rhadinovirus | mfrv | 131217 | 48 | |
| Ostreid herpesvirus 1 | oshv1 | 207439 | 61 | |
| Ovine herpesvirus 2 | ohv2 | 135135 | 47 | |
| Pongine herpesvirus 4 | ccmv | 241087 | 38 | |
| Psittacid herpesvirus 1 | pshv1 | 163025 | 39 | |
| Saimiriine herpesvirus 2 | sahv2 | 112930 | 65 | |
| Suid herpesvirus 1 | shv1 | 143461 | 26 | |
| Tupaiid herpesvirus 1 | thv | 195859 | 34 |
Herpesviruses: high-scoring segments (HSS) at the 5% level using the conservative bound.
| alhv1 | ebv | ||||||
| ehv1 | |||||||
| 125691 | 125726 | 31 | |||||
| athv3 | 8827 | 8892 | 40 | ||||
| bohv1 | |||||||
| 110314 | 110352 | 23 | |||||
| 29 | 45 | 16 | 128924 | 128992 | 23 | ||
| 58542 | 58569 | 15 | ehv2 | ||||
| bohv4 | 60687 | 60826 | 35 | ||||
| bohv5 | |||||||
| 59921 | 59938 | 14 | |||||
| 17408 | 17433 | 13 | |||||
| 41883 | 41899 | 13 | |||||
| calhv3 | |||||||
| ccmv | |||||||
| 147310 | 147384 | 20 | |||||
| cehv1 | 786 | 831 | 24 | ||||
| ehv4 | |||||||
| 24415 | 24441 | 14 | |||||
| cehv15 | |||||||
| 114927 | 114988 | 19 | 10630 | 10697 | 31 | ||
| cehv16 | 58833 | 58906 | 31 | ||||
| 82616 | 82701 | 31 | |||||
| 127230 | 127351 | 31 | |||||
| 112929 | 112967 | 29 | |||||
| 145082 | 145120 | 29 | |||||
| gahv1 | |||||||
| gahv2 | 106724 | 106811 | 35 | ||||
| gahv3 | 11168 | 11198 | 27 | ||||
| 122384 | 122414 | 27 | |||||
| 134414 | 134461 | 26 | |||||
| 162999 | 163046 | 26 | |||||
| 30975 | 30991 | 14 | 58953 | 58999 | 25 | ||
| cehv2 | hcmv | ||||||
| 51884 | 51910 | 14 | |||||
| 93873 | 93887 | 14 | |||||
| 112292 | 112320 | 14 | |||||
| cehv7 | 86167 | 86296 | 37 | 108222 | 108303 | 24 | |
| cehv8 | 159296 | 159380 | 24 | ||||
| 71011 | 71055 | 23 | |||||
| 29233 | 29278 | 29 | 226192 | 226230 | 23 | ||
| 163766 | 163806 | 28 | |||||
| 177904 | 178092 | 28 | |||||
| 89538 | 89589 | 27 | |||||
| hcmv-m | mfrv | ||||||
| hhv6 | |||||||
| mmrv | |||||||
| hhv6b | |||||||
| 3911 | 3988 | 37 | |||||
| 157232 | 157309 | 37 | |||||
| hhv7 | |||||||
| 112930 | 113033 | 28 | |||||
| hhv8 | muhv4 | 6000 | 6037 | 29 | |||
| ohv2 | |||||||
| 23547 | 23598 | 27 | |||||
| 30712 | 30775 | 27 | |||||
| 119416 | 119467 | 27 | |||||
| 106412 | 106452 | 25 | |||||
| hsv1 | 76335 | 76370 | 26 | ||||
| 79158 | 79265 | 26 | |||||
| oshv1 | |||||||
| 96047 | 96069 | 16 | |||||
| 136146 | 136162 | 16 | |||||
| hsv2 | |||||||
| 108068 | 108173 | 45 | |||||
| 171433 | 171549 | 44 | |||||
| 67872 | 67975 | 43 | |||||
| 114689 | 114763 | 42 | |||||
| pshv1 | |||||||
| 99337 | 99370 | 15 | |||||
| ichv1 | |||||||
| 134013 | 134049 | 21 | |||||
| 78233 | 78256 | 20 | |||||
| rcmv | |||||||
| 20109 | 20187 | 24 | |||||
| 10016 | 10063 | 23 | |||||
| 125686 | 125733 | 23 | |||||
| mcmv | |||||||
| 24072 | 24108 | 21 | |||||
| sahv2 | 28533 | 28613 | 45 | ||||
| shv1 | |||||||
| 219239 | 219282 | 22 | |||||
| mehv1 | NIL | ||||||
| 8432 | 8455 | 14 | |||||
| thv | |||||||
| 28257 | 28286 | 17 | |||||
| vzv | |||||||
| 110195 | 110227 | 32 | |||||
| 119669 | 119701 | 32 | |||||
Entries in italics are also significant at the 1% level.
Figure 1. The excursion plot of vzv. The horizontal line corresponds to the 5% significance level. The two triangles denote the locations of the known replication origins of vzv.
Prediction results at the 5% level using the conservative bound.
| Virus | Origin center | HSS start | HSS end | HSS score | Predicted |
| bohv1 | 111190 | 109702 | 109730 | 25 | Yes |
| bohv1 | 127028 | 128487 | 128515 | 25 | Yes |
| bohv4 | 97996.5 | 60687 | 60826 | 35 | No |
| bohv5 | 113312 | 113549 | 113583 | 28 | Yes |
| bohv5 | 129701 | 129429 | 129463 | 28 | Yes |
| cehv1 | 61690.5 | 61680 | 61700 | 20 | Yes |
| cehv1 | 61893.5 | 61680 | 61700 | 20 | Yes |
| cehv1 | 132795.5 | 132785 | 132805 | 20 | Yes |
| cehv1 | 132998.5 | 132785 | 132805 | 20 | Yes |
| cehv1 | 149425.5 | 149415 | 149435 | 20 | Yes |
| cehv1 | 149628.5 | 149415 | 149435 | 20 | Yes |
| cehv16 | 62981 | 62970 | 62991 | 21 | Yes |
| cehv16 | 133479 | 133468 | 133489 | 21 | Yes |
| cehv16 | 149824 | 149813 | 149834 | 21 | Yes |
| cehv2 | 61493.5 | 61483 | 61503 | 20 | Yes |
| cehv2 | 129537.5 | 129527 | 129547 | 20 | Yes |
| cehv2 | 144471.5 | 144461 | 144481 | 20 | Yes |
| cehv7 | 109636.5 | 86167 | 86296 | 37 | No |
| cehv7 | 118622.5 | 86167 | 86296 | 37 | No |
| ebv | 8313.5 | 11854 | 11950 | 45 | No |
| ebv | 40797 | 43158 | 43235 | 23 | Yes |
| ebv | 143825.5 | 77111 | 77150 | 24 | No |
| ehv1 | 126262.5 | 128924 | 128992 | 23 | Yes |
| ehv4 | 73909.5 | 73340 | 73509 | 37 | Yes |
| ehv4 | 119471.5 | 112929 | 112967 | 29 | No |
| ehv4 | 138577.5 | 132383 | 132462 | 49 | No |
| gahv1 | 24871.5 | 24852 | 24890 | 30 | Yes |
| hcmv | 93923.5 | 96685 | 96824 | 34 | Yes |
| hhv6 | 67805 | 130410 | 130501 | 59 | No |
| hhv6b | 69160.5 | 132997 | 133163 | 62 | No |
| hhv7 | 66991.5 | 128589 | 128984 | 70 | No |
| hsv1 | 62475 | 62465 | 62485 | 20 | Yes |
| hsv1 | 131999 | 131990 | 132008 | 18 | Yes |
| hsv1 | 146235 | 144115 | 144142 | 18 | Yes |
| hsv2 | 62930 | 62919 | 62939 | 17 | Yes |
| hsv2 | 132760 | 132691 | 132711 | 17 | Yes |
| hsv2 | 148981 | 146600 | 146631 | 19 | Yes |
| rcmv | 77318 | 24072 | 24108 | 21 | No |
| shv1 | 63878 | 63862 | 63892 | 24 | Yes |
| shv1 | 114701 | 114686 | 114715 | 20 | Yes |
| shv1 | 129901 | 129607 | 129636 | 20 | Yes |
| vzv | 110218.5 | 110195 | 110227 | 32 | Yes |
| vzv | 119678.5 | 119669 | 119701 | 32 | Yes |
For each replication origin, we list the high-scoring segment (at the 5% level) closest to it. When the peak of a high-scoring segment is less than 2 map units away from the center of a replication origin, we say that our method has correctly predicted that replication origin.
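The 2-map-unit criterion can be sketched as follows. Two things here are assumptions rather than statements from this record: that one map unit equals 1% of the genome length, and that the midpoint of a segment can stand in for its peak (the table lists segment start and end, not the peak position). The function names are hypothetical.

```python
def predictive_distance_mu(origin_center: float, hss_peak: float,
                           genome_length: int) -> float:
    """Distance between an origin center and an HSS peak, in map units.

    Assumption: 1 map unit = 1% of the genome length.
    """
    map_unit = genome_length / 100.0
    return abs(origin_center - hss_peak) / map_unit


def is_predicted(origin_center: float, hss_peak: float,
                 genome_length: int, max_mu: float = 2.0) -> bool:
    """True when the HSS peak lies less than max_mu map units from the origin."""
    return predictive_distance_mu(origin_center, hss_peak, genome_length) < max_mu


# vzv row: origin center 110218.5, HSS 110195..110227, genome length 124884.
# Assumption: the segment midpoint is used in place of the unlisted peak.
vzv_midpoint = (110195 + 110227) / 2
print(is_predicted(110218.5, vzv_midpoint, 124884))    # True

# bohv4 row: origin center 97996.5, HSS 60687..60826, genome length 108873.
bohv4_midpoint = (60687 + 60826) / 2
print(is_predicted(97996.5, bohv4_midpoint, 108873))   # False
```

Under these assumptions the two calls agree with the Yes and No entries listed for vzv and bohv4.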
Prediction Performance Summary.
| Sensitivity | PPV | APD (map units) | %L | %R | %C | |
| 74% | 22% | 0.34 ± 0.57 | 16% | 31% | 53% | |
| 86% | 17% | 0.35 ± 0.53 | 24% | 30% | 46% | |
| 67% | 25% | 0.31 ± 0.52 | 14% | 34% | 52% | |
| 74% | 18% | 0.34 ± 0.57 | 16% | 31% | 53% |
(C) indicates that the "Conservative" bound is used, while (G) indicates that the "Generous" bound is used. Sensitivity is the percentage of replication origins predicted by our method, and PPV (positive predictive value) is the proportion of HSS that correctly predict replication origins. APD (average predictive distance), given in map units (± one standard deviation), is the average distance between the center of each replication origin and an HSS that predicts it. %L, %R and %C give the percentage of cases in which the center of the replication origin falls within the left, right or center of the HSS.
Figure 2. Predictions of AT excursion and BWS. In this figure, the set A consists of the replication origins predicted by the AT excursion method and B consists of those predicted by the BWS1 method. A ∩ Bᶜ = {cehv7₁, cehv7₂, ehv4₁, hsv2₁, hsv2₂, hsv2₃}, Aᶜ ∩ B = {cehv16₂, cehv16₃, ebv₁, ebv₃, hhv6, hhv6b, rcmv}, (A ∪ B)ᶜ = {bohv4, ehv4₂, ehv4₃, hhv7}. The remaining 26 replication origins are predicted by both methods. For viruses with several known replication origins, such as hsv2, which has three (see Table 3), we denote the replication origins as hsv2₁, hsv2₂, hsv2₃, etc.
Figure 3. Predictive distances for PLS, BWS. These boxplots show the predictive distances for PLS, BWS1, AT-swp and AT excursion.
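The summary measures defined above can be computed along the following lines. The helper names are hypothetical, and the use of the sample standard deviation for APD is an assumption, since the record does not say which estimator was used. The only data used are the Yes/No calls of the prediction-results table at the 5% level with the conservative bound (32 "Yes" and 11 "No" among 43 origins).

```python
import statistics
from typing import Sequence, Tuple


def sensitivity(predicted_origins: int, total_origins: int) -> float:
    """Fraction of known replication origins that are predicted."""
    return predicted_origins / total_origins


def ppv(correct_hss: int, total_hss: int) -> float:
    """Fraction of high-scoring segments that correctly predict an origin."""
    return correct_hss / total_hss


def apd(distances_mu: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation of predictive distances (map units).

    Assumption: the sample standard deviation is used.
    """
    return statistics.mean(distances_mu), statistics.stdev(distances_mu)


# Yes/No calls tallied from the prediction-results table (5% level, conservative bound).
calls = ["Yes"] * 32 + ["No"] * 11
sens = sensitivity(calls.count("Yes"), len(calls))
print(f"{100 * sens:.0f}%")  # 74%
```

The 74% obtained from the tally matches the sensitivity reported in the first row of the summary table.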