Xiaorong Yang, Xiaobo Zhou, Wan-Ting Huang, Lingyun Wu, Federico A Monzon, Chung-Che Chang, Stephen T C Wong.
Abstract
Copy number aberrations (CNAs) in myelodysplastic syndromes (MDS), studied using single nucleotide polymorphism (SNP) arrays, have received increasing attention in recent years. In the current study, a new Constraint Moving Average (CMA) algorithm is first adopted to determine the regions of CNA. In addition to large regions of CNA, the proposed CMA algorithm can also detect small regions. Real-time polymerase chain reaction (qPCR) results confirm that the CMA algorithm detects both large and subtle regions. Based on the CMA results, two independent applications are studied. The first is power analysis for sample size estimation. An accurate estimate of the sample size needed for the desired purpose of an experiment is important for effort efficiency and cost-effectiveness. The power analysis is performed to determine the minimum sample size required to ensure that at least a proportion λ (0 < λ ≤ 1) of the detected regions is statistically different from normal references. As expected, power increases with increasing sample size at a fixed significance level. The second application is distinguishing high-grade MDS patients from low-grade ones. We propose a General Variant Level (GVL) score that integrates each patient's general information at the genotype level, and use it as a unified measurement for classification. Traditional MDS classifications usually rely on cell morphology and the International Prognostic Scoring System (IPSS), which classify at the phenotype level. The proposed GVL score integrates information on CNA regions, the number of abnormal chromosomes, and the total number of altered SNPs at the genotype level. Statistical tests indicate that high- and low-grade MDS patients can be well separated by the GVL score, which appears to correlate better with clinical outcome than the traditional classification approaches using morphology and IPSS score at the phenotype level.
Year: 2009 PMID: 19352488 PMCID: PMC2662412 DOI: 10.1371/journal.pone.0005054
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Results of the genotype conflict analysis for 35 arrays.
| Number of SNPs | 171057 | 38094 | 26262 | 10919 | 7069 | 3770 |
| Probability | | 0.362959 | 0.142516 | 0.036324 | 0.006756 | 0.000977 |
| Number of SNPs | 2384 | 1242 | 734 | 386 | 216 | 181 |
| Probability | 0.000114 | 1.12E-05 | 9.22E-07 | 6.56E-08 | 4.07E-09 | 2.33E-10 |
For example, there are 3770 SNPs with genotype conflicts that each appear in 5 arrays; if such conflicts were due to random error alone (i.e., could not be regarded as wrong genotyping calls), the probability would be 0.000977.
Figure 1. Illustration of the CMA algorithm.
MDS-2 Lymphoid is the reference, and MDS-2 Erythroid is the test sample. In the test sample, the average log2 intensities of every five consecutive SNPs in the circled region are higher than 0.35, with small SDs (≤ 0.15). Selected overlapping regions are merged into a larger region.
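The selection rule in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `cma_regions`, the use of the sample standard deviation, and the symmetric treatment of losses (window mean below −0.35) are assumptions made for the example.

```python
from statistics import mean, stdev

def cma_regions(log2_ratios, window=5, mean_cutoff=0.35, sd_cutoff=0.15):
    """Return merged (start, end, sign) index ranges; end is exclusive.

    sign is +1 for a candidate gain and -1 for a candidate loss.
    Assumption: losses mirror gains, i.e. a window qualifies when
    |mean| > mean_cutoff and its sample SD <= sd_cutoff.
    """
    flagged = []
    for i in range(len(log2_ratios) - window + 1):
        w = log2_ratios[i:i + window]
        m, s = mean(w), stdev(w)
        if abs(m) > mean_cutoff and s <= sd_cutoff:
            flagged.append((i, i + window, 1 if m > 0 else -1))

    # Merge windows that overlap and point in the same direction.
    merged = []
    for start, end, sign in flagged:
        if merged and merged[-1][2] == sign and start < merged[-1][1]:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), sign)
        else:
            merged.append((start, end, sign))
    return merged

# Hypothetical log2 ratios: a flat stretch, a gained stretch, a flat stretch.
ratios = [0.0, 0.05, -0.05, 0.4, 0.45, 0.5, 0.42, 0.38, 0.47, 0.41,
          0.03, -0.01, 0.02]
regions = cma_regions(ratios)
```

Here three consecutive windows pass both constraints and are merged into a single region covering SNP indices 3 to 9. Windows that straddle the boundary are rejected either because their mean falls below the cutoff or because their SD is too large, which is how the SD constraint keeps noisy stretches out.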
CMA algorithm output for a region located in chromosome 7q34 and the corresponding real-time PCR results.
| Sample | Fraction | Mean | SD | PCR |
| MDS-3 | Lymphoid (control) | 0.0951 | 0.1220 | 1 |
| | Blast (test sample) | 0.1869 | 0.1724 | 0.91 |
| | Erythroid (test sample) | 0.3563 | 0.1229 | 0.65 |
Three fractions from the same patient are displayed. The Lymphoid fraction is the normal reference; Blast and Erythroid serve as test samples. Their log2 ratios behave differently. As the normal fraction, Lymphoid has a log2 ratio close to 0. There is a significant loss in Erythroid, but for Blast the log2 ratio is not low enough. The real-time PCR value of Lymphoid is normalized to 1. Compared with the reference, Erythroid is concluded to carry a copy number aberration, whereas no such abnormality is observed in Blast.
Figure 2. CNA regions selected by the CMA algorithm.
Compared with the references, the circled regions are CNA regions. CNA regions may occur at different locations in different samples. Some appear repeatedly (green circles), and others occur rarely (red circles).
Figure 3. Power analysis algorithm for the estimation of the minimum sample size.
Figure 4. Power curves and sample size estimation under different views of effect sizes.
The significance level is set to 0.05.
Sample size estimation to detect at least 0.4, 0.5, 0.6, 0.7 and 0.8 of the truly altered regions for a desired power of 0.8 and 0.9, at different significance levels.
| | | P = 0.8 | P = 0.9 | P = 0.8 | P = 0.9 | P = 0.8 | P = 0.9 |
| 0.4 | 0.344 | 54 | 74 | 69 | 91 | 102 | 129 |
| 0.5 | 0.283 | 79 | 109 | 101 | 134 | 150 | 190 |
| 0.6 | 0.230 | 118 | 163 | 150 | 200 | 224 | 284 |
| 0.7 | 0.171 | 215 | 296 | 272 | 364 | 406 | 516 |
| 0.8 | 0.108 | 528 | 731 | 671 | 897 | 998 | 1271 |
(P: Power).
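The qualitative pattern in this table (a smaller effect size or a higher desired power requires more samples) can be illustrated with a textbook normal-approximation formula. The sketch below is not the power analysis algorithm of Figure 3, which is not reproduced here. It assumes a one-sided, one-sample z-test, and the names `min_sample_size` and `achieved_power` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution

def min_sample_size(effect_size, alpha=0.05, power=0.8):
    """Smallest n with power >= `power` under a one-sided z approximation."""
    z_alpha = _Z.inv_cdf(1 - alpha)
    z_power = _Z.inv_cdf(power)
    return ceil(((z_alpha + z_power) / effect_size) ** 2)

def achieved_power(effect_size, n, alpha=0.05):
    """Approximate power of the same one-sided test with n samples."""
    return _Z.cdf(effect_size * sqrt(n) - _Z.inv_cdf(1 - alpha))

# Sample size grows as the effect size shrinks or the target power rises.
n_large_effect = min_sample_size(0.344, alpha=0.05, power=0.8)
n_small_effect = min_sample_size(0.108, alpha=0.05, power=0.8)
n_higher_power = min_sample_size(0.344, alpha=0.05, power=0.9)
```

Because the approximation ignores the extra uncertainty of estimating the variance, it should be read as a rough guide to the trend rather than as a way to regenerate the tabulated values.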
The average GVL of MDS patients and the discrimination between high-grade and low-grade MDS by both cell morphology and IPSS score.
| Sample | Fraction | | | GVL | Average | High/Low by morphology | IPSS |
| MDS-1 | Blast | 8 | 365 | 0.3871 | 0.3835 | H | Int-1 |
| | Erythroid | 6 | 53 | 0.3799 | | | |
| MDS-2 | Myeloid | 12 | 74 | 0.3581 | 0.4011 | H | Int-2 |
| | Erythroid | 15 | 174 | 0.4441 | | | |
| MDS-6 | Blast | 4 | 16 | 0.3336 | 0.2725 | H | Int-2 |
| | Erythroid | 3 | 15 | 0.3119 | | | |
| | Myeloid | 1 | 6 | 0.1719 | | | |
| MDS-8 | Blast | 5 | 1610 | 0.3635 | 0.3561 | H | Int-1 |
| | Erythroid | 3 | 875 | 0.3286 | | | |
| MDS-10 | Myeloid | 6 | 4586 | 0.3742 | 0.3742 | H | H |
| MDS-3 | Blast | 0 | 0 | 0 | 0.1123 | L | Int-1 |
| | Myeloid | 0 | 0 | 0 | | | |
| | Erythroid | 4 | 30 | 0.3369 | | | |
| MDS-4 | Myeloid | 0 | 0 | 0 | 0.0895 | L | L |
| | Erythroid | 1 | 5 | 0.1790 | | | |
| MDS-5 | Erythroid | 1 | 5 | 0.1808 | 0.1808 | L | L |
| MDS-9 | Erythroid | 1 | 5 | 0.1939 | 0.1939 | L | L |
| MDS-11 | Myeloid | 0 | 0 | 0 | 0 | L | Int-1 |
| MDS-12 | Myeloid | 0 | 0 | 0 | 0 | L | Int-1 |
| MDS-7 | Blast | 15 | 198 | 0.4264 | 0.3842 | | Int-1 |
| | Myeloid | 3 | 38 | 0.3421 | | | |
A GVL of zero implies that there is no selected abnormal region in the corresponding arrays.
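The Average column of this table can be read as the mean of a patient's fraction-level GVL scores, with fractions that have no selected region contributing a GVL of zero. The sketch below checks that reading on three patients. The values are copied from the table, while the dictionary layout and the helper name `patient_average` are assumptions for illustration. The formula for the fraction-level GVL itself is not given in this record and is not reconstructed here.

```python
from statistics import mean

# Fraction-level GVL scores copied from the table for three patients.
fraction_gvl = {
    "MDS-1": {"Blast": 0.3871, "Erythroid": 0.3799},
    "MDS-3": {"Blast": 0.0, "Myeloid": 0.0, "Erythroid": 0.3369},
    "MDS-6": {"Blast": 0.3336, "Erythroid": 0.3119, "Myeloid": 0.1719},
}

def patient_average(scores):
    """Mean GVL over all fractions of one patient, rounded to 4 decimals."""
    return round(mean(scores.values()), 4)

averages = {patient: patient_average(scores)
            for patient, scores in fraction_gvl.items()}
```

For these three patients the computed means (0.3835, 0.1123 and 0.2725) agree with the Average column.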
Two-group t-test results for the discrimination of MDS grade using the General Variant Level score, with grade defined by cell morphology and by IPSS.
| | Morphology | | | IPSS | | |
| | t-value | df | p-value | t-value | df | p-value |
| Without MDS-7 | 9.3989 | 9 | 0.0001 | 4.2432 | 9 | 0.0022 |
| With MDS-7 | 5.6028 | 10 | 0.0002 | 3.6182 | 10 | 0.0047 |
The cutoff values for copy number one and three in CNAG are −0.35 and 0.35, and the window size of the moving average is 5. The t-value is the value of the test statistic in the t-test, and df is the degrees of freedom of the test.
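The reported degrees of freedom (9 for 11 patients, 10 for 12) are consistent with a pooled-variance two-sample t-test, where df = n1 + n2 − 2. The record does not state which variant was used, so the pooled form below is an assumption, and the scores are hypothetical values chosen only to show the calculation. The function name `pooled_t` is likewise illustrative.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Return (t, df) for a two-sample t-test with pooled variance."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Weighted average of the two sample variances.
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / standard_error, df

# Hypothetical per-patient scores, not taken from the study.
high_grade = [0.41, 0.35, 0.38, 0.30]
low_grade = [0.10, 0.05, 0.15, 0.00, 0.12]
t_value, df = pooled_t(high_grade, low_grade)
```

Converting the t-value to a p-value requires the Student t distribution, which the Python standard library does not provide, so that step is left out of the sketch.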
Comparison of abnormal regions in chromosome 7 selected by the CMA algorithm, CNAG and CBS. Y and N denote whether or not the region is selected.
| Arrays | Regions | CMA | CNAG | CBS |
| MDS-8 B | Chr 7 monosomy | Y | Y | Y |
| MDS-8 E | Chr 7 monosomy | Y | Y | Y |
| MDS-1 B | 7q34–7q36.1 | Y | Y | Y |
| MDS-3 E | 7q34 | Y | N | N |
| MDS-1 B | 7p21.3 | Y | N | N |
| MDS-2 E | 7p14.2 | Y | N | N |
| MDS-1 E | 7q14.1 (mean = 0.56; SD = 0.47) | N | Y | N |
| MDS-1 E | 7q34 (mean = 0.25; SD = 0.20) | N | Y | N |
| MDS-2 E | 7p31.1 (mean = 0.24; SD = 0.34) | N | Y | N |
| MDS-2 M | 7p31.3 (single SNP) | N | Y | N |
| MDS-2 M | 7q34 (mean = 0.29; SD = 0.28) | N | Y | N |
B: Blast; E: Erythroid; M: Myeloid.
Copy number aberrations comparison of CMA algorithm and CNAG (MDS-7 is excluded).
| CMA | H | | L | | t-value | df | p-value |
| | mean | SD | mean | SD | | | |
| Morphology | 5.93 | 2.83 | 0.64 | 0.56 | 7.07 | 9 | 0.0001 |
| IPSS | 6.22 | 3.67 | 1.85 | 2.44 | 4.72 | | 0.0011 |
Two-group t-tests are performed under the null hypothesis that the means of the two groups are not significantly different.