Liqun Xi, Yvonne Fondufe-Mittendorf, Lei Xia, Jared Flatow, Jonathan Widom, Ji-Ping Wang.
Abstract
BACKGROUND: The nucleosome is the fundamental packing unit of DNA in eukaryotic cells. Its detailed positioning on the genome is closely related to chromosome functions. Increasing evidence has shown that the genomic DNA sequence itself is highly predictive of nucleosome positioning genome-wide. A fast software tool for predicting nucleosome positioning can therefore help in understanding how a genome's nucleosome organization may facilitate genome function.
Year: 2010 PMID: 20576140 PMCID: PMC2900280 DOI: 10.1186/1471-2105-11-346
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Updating linker length improves prediction - Normal linker length model
| 1st order | | | | 4th order | | | |
|---|---|---|---|---|---|---|---|
| update | total | sensitivity (%) | FDR (%) | update | total | sensitivity (%) | FDR (%) |
| 0 | 10215 (14) | 72 (0.5) | 30 (0.5) | 0 | 10253 (15) | 75 (0.4) | 27 (0.4) |
| 1 | 10210 (12) | 76 (0.5) | 25 (0.5) | 1 | 10231 (18) | 80 (0.5) | 22 (0.5) |
| 4 | 10120 (19) | 83 (0.7) | 18 (0.8) | 4 | 10131 (16) | 85 (0.6) | 16 (0.7) |
Total predictions, sensitivity, and false discovery rate (FDR) are averages (standard deviations in parentheses) over 10 repeated simulations. For each simulation, a genomic sequence consisting of 10000 nucleosomes and 10001 linkers was simulated using the 1st and 4th order yeast models. The linker length distribution was initialized as uniform on 1, ..., 200 and was iteratively updated in the dHMM. Results are shown after 0, 1, and 4 updates.
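The three reported quantities can be cross-checked against each other. The sketch below assumes, which the caption does not state, that each correct prediction matches exactly one simulated nucleosome, so that true positives equal sensitivity times the 10000 simulated nucleosomes. The function name `implied_fdr` is hypothetical and is used only for illustration.

```python
def implied_fdr(total_predictions: float, sensitivity_pct: float,
                n_true: int = 10000) -> float:
    """FDR (%) implied by the total predictions and the sensitivity.

    Assumption (not stated in the caption): one-to-one matching between
    correct predictions and simulated nucleosomes.
    """
    true_positives = sensitivity_pct / 100.0 * n_true
    false_positives = total_predictions - true_positives
    return 100.0 * false_positives / total_predictions


# 1st order model, update 0: total 10215, sensitivity 72%.
print(round(implied_fdr(10215, 72)))  # prints 30, matching the tabulated FDR
```

Because the tabulated sensitivities are rounded to whole percentages, the implied value only needs to agree with the reported FDR after rounding.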
Updating linker length improves prediction - Gamma linker length model
| 1st order | | | | 4th order | | | |
|---|---|---|---|---|---|---|---|
| update | total | sensitivity (%) | FDR (%) | update | total | sensitivity (%) | FDR (%) |
| 0 | 8670 (22) | 59 (0.7) | 32 (0.7) | 0 | 8896 (27) | 64 (0.5) | 28 (0.5) |
| 1 | 9347 (25) | 65 (0.8) | 30 (0.7) | 1 | 9550 (42) | 70 (0.6) | 26 (0.6) |
| 4 | 9833 (39) | 67 (0.7) | 31 (0.7) | 4 | 9880 (35) | 72 (0.5) | 27 (0.6) |
Total predictions, sensitivity, and false discovery rate (FDR) are averages (standard deviations in parentheses) over 10 repeated simulations. For each simulation, a genomic sequence consisting of 10000 nucleosomes and 10001 linkers was simulated using the 1st and 4th order yeast models. The linker length distribution was initialized as uniform on 1, ..., 200 and was iteratively updated in the dHMM. Results are shown after 0, 1, and 4 updates.
Re-scaling models improves prediction - Normal linker length model
| model | update | re-scaled total | re-scaled sensitivity (%) | re-scaled FDR (%) | update | total | sensitivity (%) | FDR (%) |
|---|---|---|---|---|---|---|---|---|
| 1st | 0 | 10266 (12) | 71 (0.4) | 31 (0.4) | 0 | 13272 (24) | 59 (0.5) | 55 (0.4) |
| | 1 | 10279 (15) | 76 (0.4) | 27 (0.4) | 1 | 14803 (25) | 53 (0.4) | 64 (0.3) |
| | 2 | 10240 (19) | 79 (0.3) | 23 (0.3) | 2 | 15383 (23) | 51 (0.4) | 67 (0.3) |
| 4th | 0 | 10280 (16) | 74 (0.3) | 28 (0.3) | 0 | 12785 (28) | 63 (0.4) | 51 (0.4) |
| | 1 | 10267 (20) | 79 (0.4) | 24 (0.5) | 1 | 14065 (25) | 58 (0.3) | 59 (0.3) |
| | 2 | 10220 (24) | 81 (0.4) | 20 (0.5) | 2 | 14591 (24) | 55 (0.4) | 62 (0.3) |
Total predictions, sensitivity, and false discovery rate (FDR) are averages (standard deviations in parentheses) over 10 repeated simulations. For each simulation, a maize-like genomic sequence consisting of 10000 nucleosomes and 10001 linkers was simulated using the re-scaled 1st and 4th order yeast models. Each sequence was scanned using the true models (re-scaled, 1st or 4th order) and the yeast models, with an initial uniform linker length distribution on 1, ..., 200. The results after 0, 1, and 2 updates of the linker length distribution are compared.
Re-scaling models improves prediction - Gamma linker length model
| model | update | re-scaled total | re-scaled sensitivity (%) | re-scaled FDR (%) | update | total | sensitivity (%) | FDR (%) |
|---|---|---|---|---|---|---|---|---|
| 1st | 0 | 8746 (28) | 60 (0.7) | 31 (0.6) | 0 | 10640 (19) | 70 (0.3) | 35 (0.3) |
| | 1 | 9471 (42) | 67 (0.7) | 30 (0.6) | 1 | 11513 (14) | 60 (0.6) | 48 (0.6) |
| | 2 | 9787 (38) | 68 (0.5) | 30 (0.4) | 2 | 11812 (18) | 55 (0.4) | 53 (0.3) |
| 4th | 0 | 8886 (18) | 63 (0.3) | 29 (0.4) | 0 | 10461 (25) | 73 (0.7) | 30 (0.7) |
| | 1 | 9533 (26) | 70 (0.8) | 27 (0.8) | 1 | 11190 (32) | 66 (0.7) | 41 (0.7) |
| | 2 | 9775 (33) | 72 (0.8) | 27 (0.9) | 2 | 11443 (26) | 63 (0.5) | 45 (0.5) |
Total predictions, sensitivity, and false discovery rate (FDR) are averages (standard deviations in parentheses) over 10 repeated simulations. For each simulation, a maize-like genomic sequence consisting of 10000 nucleosomes and 10001 linkers was simulated using the re-scaled 1st and 4th order yeast models. Each sequence was scanned using the true models (re-scaled, 1st or 4th order) and the yeast models, with an initial uniform linker length distribution on 1, ..., 200. The results after 0, 1, and 2 updates of the linker length distribution are compared.
Figure 1. A plot of the experimentally defined reads-occupancy score (red curve) for a region of yeast chromosome 4, showing the selected well-defined nucleosomes (grey shaded bars).
Figure 2. A snapshot of predicted nucleosome occupancy from NuPoP (shaded grey) compared with the experimentally obtained reads-occupancy (red).
Figure 3. Comparing the sensitivity of NuPoP predictions with existing methods. Sensitivity is assessed on 20,471 well-defined nucleosomes from 454 nucleosome reads. A prediction is called correct if a nucleosome is predicted within +/- k bp (X-axis) of a well-defined nucleosome (center to center), for k = 5 to 73. (a) Sensitivity plot of NuPoP (black) compared with the N-score method (blue), the MM/TE method (red), and the duration Hidden Markov Model using a uniform distribution on 1, ..., 500 (pink). The random expectation curve is not calculated because the total number of predictions varies between methods. (b) Sensitivity plot of NuPoP (black) compared with the MM/TE method (red) and random expectation (green), with the total number of predictions controlled to equal that of the MM/TE method. (c) Sensitivity plot of NuPoP (black) compared with the N-score method (red) and random expectation (green), with the total number of predictions controlled to equal that of the N-score method.
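The correctness criterion in the Figure 3 caption (a predicted center within +/- k bp of a well-defined nucleosome center) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `sensitivity_at_k` and the toy positions are hypothetical, and the sketch lets one prediction count toward more than one nearby true nucleosome, which the caption does not specify.

```python
from bisect import bisect_left


def sensitivity_at_k(true_centers, predicted_centers, k):
    """Fraction of true centers with a predicted center within +/- k bp."""
    preds = sorted(predicted_centers)
    hits = 0
    for center in true_centers:
        # Index of the first prediction at or beyond the window's left edge.
        i = bisect_left(preds, center - k)
        if i < len(preds) and preds[i] <= center + k:
            hits += 1
    return hits / len(true_centers)


# Toy positions (hypothetical, for illustration only).
true_centers = [100, 300, 500, 700]
predicted_centers = [104, 330, 498, 900]
for k in (5, 30, 73):
    print(k, sensitivity_at_k(true_centers, predicted_centers, k))
```

Sorting the predictions once and using binary search keeps each lookup logarithmic, which matters when sweeping k over a range such as 5 to 73 for tens of thousands of nucleosomes.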