Paola M V Rancoita, Marcus Hutter, Francesco Bertoni, Ivo Kwee.
Abstract
BACKGROUND: Some diseases, such as tumors, can be related to chromosomal aberrations that change the DNA copy number. The copy number of an aberrant genome can be represented as a piecewise constant function, since it can exhibit regions of deletion or gain. By contrast, in a healthy cell the copy number is two, because we inherit one copy of each chromosome from each of our parents. Bayesian Piecewise Constant Regression (BPCR) is a Bayesian regression method for data that are noisy observations of a piecewise constant function. The method estimates the unknown number of segments, the endpoints of the segments and the segment levels of the underlying piecewise constant function. The Bayesian Regression Curve (BRC) fits the same data with a smoothing curve. However, in the original formulation, some estimators failed to properly determine the corresponding parameters. For example, the boundary estimator did not take into account the dependency among the boundaries and could therefore place more than one breakpoint at the same position, losing segments.
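The data model described in the abstract (noisy observations of a piecewise constant function) can be illustrated with a minimal sketch. It is not the BPCR estimator: the Gaussian noise, the function names `simulate_profile` and `segment_means`, and the example boundaries and levels are all assumptions made for illustration. The boundaries are treated as known, whereas BPCR has to estimate them.

```python
import numpy as np

def simulate_profile(boundaries, levels, sigma, seed=0):
    """Return (truth, noisy) for a piecewise constant function.

    boundaries: increasing indices [0, t1, ..., n] delimiting the segments
    levels:     one value per segment
    sigma:      standard deviation of the noise (Gaussian noise is an assumption)
    """
    if len(boundaries) != len(levels) + 1:
        raise ValueError("need exactly one level per segment")
    rng = np.random.default_rng(seed)
    lengths = np.diff(boundaries)            # number of probes in each segment
    truth = np.repeat(np.asarray(levels, dtype=float), lengths)
    noisy = truth + rng.normal(0.0, sigma, size=truth.size)
    return truth, noisy

def segment_means(data, boundaries):
    """Estimate each segment level by its sample mean, given known boundaries."""
    return np.array([data[a:b].mean()
                     for a, b in zip(boundaries[:-1], boundaries[1:])])

# Hypothetical profile: normal (2 copies), a gain (3 copies), a deletion (1 copy).
boundaries = [0, 40, 55, 100]
levels = [2.0, 3.0, 1.0]
truth, noisy = simulate_profile(boundaries, levels, sigma=0.3)
est = segment_means(noisy, boundaries)
print(est)
```

With the boundaries given, the level estimates are simply the segment averages; the hard part addressed by BPCR is that the number of segments and their endpoints are unknown.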
Year: 2009 PMID: 19133123 PMCID: PMC2674052 DOI: 10.1186/1471-2105-10-10
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Example of simulated data. The simulated data in the figure represent an easy, medium and difficult case, respectively.
Comparison among the hyper-parameter estimators
| 0.0904 | 0.0091 | 0.0059 | 0.0094 | 0.021 | 0.067 | 0.0169 | 0.0729 | |
| 0.0042 | 0.0014 | 0.0041 | 0.0036 | 0.0037 | 0.0114 | 0.0123 | 0.0272 | |
| 0.0633 | 0.0871 | 0.2508 | 0.2404 | 0.2426 | 0.4271 | 0.9921 | 1.3254 | |
| 0.068 | 0.0009 | 0.0008 | 0.0014 | 0.0047 | 0.0593 | 0.0024 | 0.0623 |
The table shows the estimated mean square error of the estimators , , and applied to datasets without replicates, for different values of σ² and ρ². The table shows that increased with ρ² and increased with σ². The estimator was generally more accurate than with respect to the MSE.
Comparison among the several boundary estimators
| 11.5967 ± 0.4562 | 18.0933 ± 0.6388 | 18.2 ± 0.6177 | 17.6833 ± 0.5781 | |
| 8.8667 ± 0.3329 | 17.6333 ± 0.6224 | 17.4833 ± 0.5898 | 16.6167 ± 0.5367 | |
| 8.6833 ± 0.3437 | 17.3833 ± 0.6219 | 17.2467 ± 0.5983 | 15.8967 ± 0.5278 | |
| 8.7267 ± 0.3449 | 17.3933 ± 0.6226 | 17.2567 ± 0.5978 | 15.9133 ± 0.5303 | |
| 11.4767 ± 0.4532 | 15.2867 ± 0.5325 | 15.6867 ± 0.536 | 16.1933 ± 0.5301 | |
| 8.7933 ± 0.3294 | 13.73 ± 0.4731 | 13.81 ± 0.4691 | 14.1667 ± 0.451 | |
| 8.47 ± 0.3372 | 13.0567 ± 0.4635 | 13.18 ± 0.4725 | 13.2367 ± 0.4344 | |
| 8.4567 ± 0.3361 | 13.0467 ± 0.4676 | 13.08 ± 0.4695 | 13.2233 ± 0.4361 | |
The table shows the estimated mean binary errors ± standard deviation of the estimators of 0 applied to datasets with replicates for different values of σ² and ρ². We always used , , and both ρ² estimators as estimators of the other parameters involved. The estimators and always had the lowest errors. When σ² > ρ² the error was lower by using , otherwise the two ρ² estimators gave similar errors.
Comparison among the boundary estimators on Simulated Chromosomes
| SSQ | MAD | accuracy | accuracy inside aberrations | accuracy outside aberrations | |
| 14.23 | 0.00877 | 0.889 | 0.961 | 0.883 | |
| 2.22 | 0.00840 | 0.904 | 0.992 | 0.892 | |
| 1.70 | 0.00733 | 0.936 | 0.992 | 0.932 | |
| 9.74 | 0.00952 | 0.881 | 0.960 | 0.877 | |
| 2.67 | 0.00970 | 0.882 | 0.993 | 0.867 | |
| 1.85 | 0.00781 | 0.929 | 0.993 | 0.920 | |
This table compares the error measures for profile estimation, obtained on the Simulated Chromosomes dataset, for all estimators of 0 (apart ). We always used , , and both ρ² estimators as estimators of the other parameters involved. The estimator always had the lowest SSQ and MAD errors and the highest accuracy both inside and outside the regions of aberration. Using the performance was slightly better, because σ² < ρ².
Figure 2. Example of estimated piecewise constant profiles. The plots show the differences in the level estimation among the piecewise constant methods on samples with SNR = 3 and SNR = 1: some are unable to identify the small aberrations in the presence of high noise. In each graph, the grey segments represent the true profile.
Comparison among the piecewise constant methods on Simulated Chromosomes
| Method | SSQ | MAD | accuracy | accuracy inside aberrations | accuracy outside aberrations |
| mBPCR | 1.70 | 0.00733 | 0.936 | 0.992 | 0.932 |
| mBPCR | 1.85 | 0.00781 | 0.929 | 0.993 | 0.920 |
| CBS | 1.56 | 0.00705 | 0.953 | 0.985 | 0.950 |
| CGHseg | 5.42 | 0.00795 | 0.925 | 0.885 | 0.956 |
| HMM | 4.47 | 0.00350 | 0.993 | 0.968 | 0.997 |
| GLAD | 4.15 | 0.00846 | 0.939 | 0.930 | 0.952 |
| BioHMM | 5.69 | 0.003647 | 0.990 | 0.949 | 0.999 |
| Rendersome | 19.13 | 0 | 0.920 | 0.289 | 1 |
The table compares the level estimates obtained with several piecewise constant methods on the Simulated Chromosomes dataset. CBS and mBPCR had the lowest SSQ error in the profile estimation and the highest accuracy inside the aberrant regions. HMM, BioHMM and Rendersome had the highest accuracy outside the aberrations, but a high SSQ error. Overall, therefore, the former group estimated a better profile than the latter. Because of its definition, the MAD error is less informative: it is insensitive to a small number of wrongly estimated probes, even though these probes could correspond to breakpoints or small aberrations.
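The error measures discussed for this table can be sketched as follows. The definitions are assumptions, since the text here does not state them: SSQ is taken as the sum of squared differences between the estimated and the true profile, MAD as the median absolute difference, and accuracy as the fraction of probes assigned to the correct state (loss, normal or gain). The function `profile_errors`, the normal level of 2 and the tolerance `tol` are hypothetical.

```python
import numpy as np

def profile_errors(truth, estimate, normal_level=2.0, tol=0.5):
    """Compare an estimated profile with the true one (assumed definitions)."""
    truth = np.asarray(truth, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    residual = estimate - truth

    ssq = float(np.sum(residual ** 2))          # sum of squared differences
    mad = float(np.median(np.abs(residual)))    # median absolute difference

    def state(x):
        # -1 = loss, 0 = normal, +1 = gain, relative to normal_level
        dev = x - normal_level
        return np.sign(dev) * (np.abs(dev) > tol)

    correct = state(truth) == state(estimate)
    inside = state(truth) != 0                  # probes in truly aberrant regions

    def frac(mask):
        return float(correct[mask].mean()) if mask.any() else float("nan")

    return {
        "ssq": ssq,
        "mad": mad,
        "accuracy": float(correct.mean()),
        "accuracy_inside": frac(inside),
        "accuracy_outside": frac(~inside),
    }

# A flat estimate that misses a small two-probe gain entirely.
truth = [2.0] * 8 + [3.0] * 2
estimate = [2.0] * 10
errors = profile_errors(truth, estimate)
print(errors)
```

In this toy case the MAD is 0 and the accuracy outside the aberration is 1, yet the accuracy inside the aberration is 0 and the SSQ is 2. Under these assumed definitions, a method can therefore score well on MAD while missing a small aberration completely.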
Figure 3. Example of estimated regression curves. The plots show the differences in the level estimation among the smoothing methods on samples with SNR = 3 and SNR = 1: some oscillate more in the regions outside the aberrations. In cases of high noise, the more oscillating the profiles are, the harder it is to identify which regions correspond to the aberrations. In each graph, the grey segments represent the true profile.
Copy number estimation results obtained on 10K Array data of sample JEKO-1
| 3/2 | 2.97 | 2.99 | 2.97 | 2.90 | 2.92 | 2.92 | 3.14 | 2.92 | |
| ampl | 12.11 | 9.35 | 10.27 | 10.27 | 13.95 | 9.82 | 8.26 | ||
| 2 | |||||||||
| 4/5 | 4.08 | 4.29 | 4.08 | 4.08 | 3.84 | 3.79 | 3.50 | ||
| 4 | 4.08 | 4.29 | 4.08 | 4.08 | 3.84 | 3.79 | |||
| 4 | 3.72 | 3.59 | 3.57 | 3.72 | 3.62 | 3.58 | |||
| 4 | 3.82 | 3.62 | |||||||
| 2/3 | 2.81 | 3.00 | 2.83 | 2.50 | 2.93 | 3.14 | 2.93 | ||
| 4 | 3.63 | 3.62 | 3.64 | ||||||
| 4 | 3.63 | 3.62 | 3.64 | ||||||
On this noisy data, BioHMM and Rendersome often estimated the gene copy number wrongly, while this occurred only sometimes to CBS, HMM and GLAD. The method mBPCR with correctly estimated the gene copy numbers, apart from CCND1 whose copy number was estimated by all methods differently from the FISH technique.
Figure 4. Comparison among the estimated profiles of chromosome 11 of JEKO-1. The figure shows the comparison among the piecewise constant estimated profiles of chromosome 11 of JEKO-1 using both 10K Array and 250K Array data. Only mBPCR with was able to detect the high amplification after position 110 Mb on the 10K Array data. On the other hand, all methods (apart from BioHMM) recognized it on the 250K Array data.