Cheng-Hong Yang1, Yu-Da Lin1, Yi-Cheng Chiang1, Li-Yeh Chuang2.
Abstract
BACKGROUND: CpG islands have been demonstrated to influence local chromatin structures and simplify the regulation of gene activity. However, the accurate and rapid determination of CpG islands for whole DNA sequences remains experimentally and computationally challenging.
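For reference, a common rule-based way to determine CpG islands is to slide a window along the sequence and apply the classic Gardiner-Garden and Frommer thresholds: GC content above 50%, an observed/expected CpG ratio above 0.6, and a merged island length of at least 200 bp. The sketch below assumes a 100-bp window and uses hypothetical function names. It is a simple window-based baseline in the spirit of tools such as CpGPlot, not the ClusterPSO method.

```python
def gc_oe(window):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    n = len(window)
    c, g = window.count("C"), window.count("G")
    cpg = window.count("CG")  # "CG" cannot overlap itself, so str.count finds every CpG
    gc = (c + g) / n if n else 0.0
    # Observed/expected CpG = N_CpG * N / (N_C * N_G)
    oe = cpg * n / (c * g) if c and g else 0.0
    return gc, oe


def scan_islands(seq, window=100, min_gc=0.5, min_oe=0.6, min_len=200):
    """Return half-open (start, end) runs covered by qualifying windows (assumed thresholds)."""
    seq = seq.upper()
    covered = [False] * len(seq)
    # Mark every position that lies inside at least one passing window.
    for start in range(len(seq) - window + 1):
        gc, oe = gc_oe(seq[start:start + window])
        if gc > min_gc and oe > min_oe:
            for i in range(start, start + window):
                covered[i] = True
    # Merge consecutive covered positions and keep runs of at least min_len bp.
    islands, i = [], 0
    while i < len(seq):
        if not covered[i]:
            i += 1
            continue
        j = i
        while j < len(seq) and covered[j]:
            j += 1
        if j - i >= min_len:
            islands.append((i, j))
        i = j
    return islands


if __name__ == "__main__":
    seq = "AT" * 300 + "CG" * 150 + "AT" * 300  # 300-bp CpG-rich block at 600..899
    print(scan_islands(seq))  # prints [(551, 949)]
```

The reported island extends about 50 bp beyond each side of the CpG-rich block because any window that is more than half inside the block already passes the GC threshold, which illustrates why purely window-based boundaries are imprecise.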
Year: 2016 PMID: 26727213 PMCID: PMC4705099 DOI: 10.1371/journal.pone.0144748
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. ClusterPSO flowchart.
Comparison of different CpG island detection methods.
| Contig | Metric | CpGPlot | CpGcluster | CpGProD | CpGIS | PSO | PSORL | CPSO | CPSORL | ClusterPSO |
|---|---|---|---|---|---|---|---|---|---|---|
| NT_113952.1 | SN | 56.43 | 50.46 | 58.07 | 83.98 | 69.22 | 75.58 | 77.43 | 84.88 | |
| | SP | 99.95 | 99.50 | 99.05 | 99.61 | 99.02 | 99.58 | 99.05 | 99.47 | |
| | ACC | 98.09 | 97.78 | 97.69 | 98.39 | 98.28 | 97.99 | 98.61 | 98.43 | |
| | PC | 56.42 | 49.92 | 52.36 | 69.59 | 63.77 | 62.27 | 70.91 | 70.34 | |
| | CC | 74.38 | 69.41 | 68.83 | 81.25 | 77.66 | 75.71 | 82.49 | 81.8 | |
| NT_113955.2 | SN | 47.19 | 67.15 | 68.51 | 85.12 | 54.47 | 59.63 | 77.8 | 87.38 | |
| | SP | 99.72 | 99.63 | 99.30 | 99.96 | 99.88 | 99.5 | 99.61 | 99.51 | |
| | ACC | 98.08 | 98.54 | 98.50 | 98.79 | 98.31 | 98.42 | 98.71 | 99.16 | |
| | PC | 47.14 | 62.47 | 62.35 | 71.78 | 53.87 | 57.74 | 68.67 | 79.08 | |
| | CC | 67.94 | 77.03 | 76.65 | 82.96 | 72.41 | 74.51 | 80.85 | 87.89 | |
| NT_113958.2 | SN | 51.29 | 27.16 | 46.41 | 82.13 | 79.27 | 81.65 | 81.08 | 84.11 | 88.56 |
| | SP | 99.94 | 98.93 | 98.26 | 98.13 | 97.90 | 98.17 | 98.34 | 99.10 | |
| | ACC | 96.90 | 95.32 | 95.60 | 97.24 | 96.93 | 96.87 | 97.08 | 97.43 | |
| | PC | 51.24 | 26.92 | 40.10 | 65.36 | 62.10 | 62.33 | 63.8 | 67.51 | |
| | CC | 70.38 | 49.96 | 56.80 | 77.63 | 75.03 | 75.28 | 76.41 | 79.31 | |
| NT_113953.1 | SN | 22.80 | 57.32 | 29.79 | 74.05 | 60.20 | 64.80 | 70.53 | 75.65 | |
| | SP | 99.74 | 99.56 | 98.83 | 99.27 | 99.23 | 99.22 | 99.13 | 99.47 | |
| | ACC | 97.76 | 98.51 | 97.53 | 98.11 | 98.13 | 98.23 | 98.38 | 98.45 | |
| | PC | 22.80 | 52.74 | 25.96 | 53.23 | 48.39 | 51.59 | 55.91 | 58.57 | |
| | CC | 47.21 | 69.89 | 43.61 | 68.64 | 64.50 | 67.25 | 70.9 | 73.1 | |
| NT_113954.1 | SN | 31.24 | 29.86 | 52.01 | 76.31 | 56.92 | 63.58 | 70.54 | 77.68 | |
| | SP | 99.46 | 98.72 | 97.62 | 98.40 | 98.13 | 98.34 | 98.23 | 98.23 | |
| | ACC | 97.47 | 96.90 | 97.00 | 96.83 | 96.87 | 96.86 | 97.32 | | |
| | PC | 31.24 | 26.19 | 38.94 | 47.05 | 40.12 | 42.74 | 49.22 | 53.15 | |
| | CC | 55.17 | 43.81 | 54.68 | 63.29 | 55.65 | 58.36 | 64.72 | 68.53 | |
| NT_028395.3 | SN | 27.11 | 44.89 | 54.18 | 76.68 | 68.97 | 72.79 | 72.52 | 77.02 | |
| | SP | 99.47 | 99.45 | 98.93 | 99.27 | 98.99 | 99.18 | 98.9 | 99.24 | |
| | ACC | 97.98 | 97.53 | 98.19 | 98.14 | 98.19 | 98.06 | 98.24 | 98.12 | |
| | PC | 27.10 | 39.26 | 45.36 | 59.36 | 57.49 | 57.17 | 59.36 | 59.25 | |
| | CC | 51.51 | 57.21 | 62.26 | 73.57 | 72.21 | 71.75 | 73.61 | 73.48 | |
Bold type indicates the best value among all methods.
SN = Sensitivity, SP = Specificity, ACC = Accuracy, PC = Performance coefficient, CC = Correlation coefficient
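These five scores are nucleotide-level statistics derived from the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). The sketch below assumes the definitions commonly used in CpG island benchmarks: SN = TP/(TP+FN), SP = TN/(TN+FP), ACC = (TP+TN)/(TP+FP+TN+FN), PC = TP/(TP+FN+FP), and CC as the Matthews correlation coefficient. The paper's exact formulas are not reproduced in this record, and the function names are hypothetical.

```python
import math


def confusion_counts(truth, pred):
    """Count TP, FP, TN, FN over per-nucleotide labels (True = inside a CpG island)."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, pred):
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    # Guard against empty classes instead of raising ZeroDivisionError.
    return num / den if den else 0.0


def cgi_metrics(tp, fp, tn, fn):
    """Return SN, SP, ACC, PC and CC as percentages (assumed definitions)."""
    mcc_den = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    scores = {
        "SN": _ratio(tp, tp + fn),
        "SP": _ratio(tn, tn + fp),
        "ACC": _ratio(tp + tn, tp + fp + tn + fn),
        "PC": _ratio(tp, tp + fn + fp),
        "CC": _ratio(tp * tn - fn * fp, mcc_den),
    }
    return {name: 100.0 * value for name, value in scores.items()}


if __name__ == "__main__":
    truth = [True] * 100 + [False] * 900  # 100-bp true island in a 1,000-bp sequence
    pred = [True] * 50 + [False] * 950    # prediction recovers half of the island
    print(cgi_metrics(*confusion_counts(truth, pred)))
```

In this example SN = 50, SP = 100, ACC = 95, PC = 50 and CC is about 68.8. Because the island occupies only 10% of the sequence, ACC and SP stay high even though half the island is missed, which is why SN, PC and CC are the more discriminating columns in the table above.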
Fig 2. Position of the true CpG island and positions of the CpG islands detected by the PSO-based methods and ClusterPSO.
The search regions for the PSO-based methods are also shown to illustrate the difficulty of finding the optimal CpG island. True positive, false positive, false negative and true negative outcomes are shown for comparison among the six methods.
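The PSO-based methods search a region for an optimal CpG island. A minimal global-best particle swarm over a (start, length) encoding is sketched below to show the general mechanics. The encoding, the fitness function (reward the window length when the window meets the Gardiner-Garden and Frommer thresholds, otherwise score zero), and the inertia and acceleration constants are all assumptions for illustration; this is not the published fitness or update scheme of PSO, CPSO, CPSORL or ClusterPSO.

```python
import random


def gc_oe(window):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    n = len(window)
    c, g = window.count("C"), window.count("G")
    gc = (c + g) / n if n else 0.0
    oe = window.count("CG") * n / (c * g) if c and g else 0.0
    return gc, oe


def fitness(seq, start, length, min_len=200, min_gc=0.5, min_oe=0.6):
    """Hypothetical fitness: island length if the window passes all thresholds, else 0."""
    window = seq[start:start + length]
    if len(window) < min_len:
        return 0.0
    gc, oe = gc_oe(window)
    return float(len(window)) if gc >= min_gc and oe >= min_oe else 0.0


def pso_search(seq, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over continuous (start, length) particles."""
    rng = random.Random(seed)
    n = len(seq)

    def decode(p):
        # Round and clamp a particle so it maps to a valid window inside seq.
        start = min(max(int(round(p[0])), 0), n - 1)
        length = min(max(int(round(p[1])), 1), n - start)
        return start, length

    pos = [[rng.uniform(0, n), rng.uniform(1, n)] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(seq, *decode(p)) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[best][:], pbest_f[best]

    for _ in range(n_iter):
        for i in range(n_particles):
            # Velocity update: inertia + pull toward personal best + pull toward global best.
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(seq, *decode(pos[i]))
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
            if f > gbest_f:
                gbest, gbest_f = pos[i][:], f
    return decode(gbest), gbest_f


if __name__ == "__main__":
    seq = "AT" * 300 + "CG" * 150 + "AT" * 300
    print(pso_search(seq))
```

One consequence of this simple fitness is that it is flat (zero) wherever no window passes the thresholds, so the swarm relies on its initial spread to land near an island at all.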
Comparison of the number of CpG islands identified in the human genome with different methods (NCBI.36).
| 347,334 | 639,161 | 1,072,192 | 1,280,505 | 1,440,953 | 1,564,596 | 1,527,114 | 1,607,472 | 1,728,357 |
| 973 | 2,703 | 1,091 | 3,704 | 2,648 | 2,648 | 2,813 | 2,813 | 3,864 |
| 0.73 | 1.36 | 2.28 | 2.73 | 3.07 | 3.3 | 3.36 | 3.4 | 3.68 |
| 357 | 237 | 983 | 346 | 542 | 591 | 561 | 571 | 447 |
| 101 | 8 | 500 | 200 | 202 | 202 | 202 | 202 | 201 |
| 3,047 | 3,028 | 6,732 | 1,948 | 4,009 | 4,020 | 4,032 | 4,035 | 5,785 |
| 62.17±0.07 | 65.49±0.07 | 54.49±0.06 | 57.98±0.04 | 54.63±0.05 | 53.73±0.05 | 54.12±0.05 | 53.72±0.05 | 53.81±0.07 |
| 0.84±0.1 | 0.87±0.3 | 0.63±0.1 | 0.68±0.1 | 0.71±0.14 | 0.64±0.08 | 0.68±0.11 | 0.65±0.08 | 0.68±0.05 |

| 679,803 | 522,748 | 2,067,653 | 2,842,255 | 2,772,787 | 2,802,675 | 2,873,255 | 2,907,983 | 3,090,231 |
| 1,642 | 2,186 | 1,903 | 6,875 | 4,571 | 4,571 | 4,882 | 4,882 | 6,624 |
| 1.36 | 1.05 | 4.16 | 5.71 | 5.34 | 5.64 | 5.60 | 5.85 | 6.22 |
| 414 | 239 | 1,087 | 413 | 581 | 613 | 570 | 596 | 467 |
| 200 | 8 | 500 | 200 | 201 | 198 | 201 | 202 | 201 |
| 7,902 | 7,774 | 8,363 | 3,339 | 4,064 | 4,076 | 4,064 | 4,076 | 5,785 |
| 63.70±0.08 | 70.23±0.08 | 55.84±0.07 | 55.12±0.06 | 54.91±0.05 | 54.50±0.07 | 55.16±0.05 | 54.46±0.07 | 54.91±0.07 |
| 0.84±0.1 | 0.95±0.3 | 0.62±0.1 | 0.68±0.1 | 0.66±0.08 | 0.63±0.05 | 0.66±0.10 | 0.63±0.05 | 0.66±0.05 |
*CpG island values were obtained from NCBI.
Comparison of four methods for CpG island detection in the entire human genome.
| Methods | CpGcluster | CpGIS | CPSORL | ClusterPSO |
|---|---|---|---|---|
| 2.86 × 10⁹ | | | | |
| | 198,702 | 37,729 | 208,536 | 254,783 |
| | 1.90 | 1.44 | 4.1 | 4.27 |
| | 273±246 | 1,090±717 | 572±469 | 494±572 |
| | 63.78±7.50 | 60.64±5.06 | 53.90±5.25 | 53.76±4.80 |
| | 0.855±0.265 | 0.717±0.082 | 0.649±0.087 | 0.678±0.102 |
| | 21,741 | 15,106 | 25,477 | 23,757 |
| | 29,156 | 13,196 | 54,356 | 29,880 |
Fig 3. Comparison of CPU time and search efficiency among six methods for the six contig sequences.
The CPU times of PSO, CPSO, PSORL, CPSORL and ClusterPSO are shown to compare their relative search efficiency on the six contig sequences. The horizontal axis represents execution time, and the vertical axis represents the log10 of the sequence position currently being scanned. Arrows a and c mark where the CpGcluster step processes long sequences, so scanning may initially proceed more slowly than in the other methods. Arrows a, b and c mark regions in which the CpGcluster step detects very few CpG island candidates.