Juan R González, Isaac Subirana, Geòrgia Escaramís, Solymar Peraza, Alejandro Cáceres, Xavier Estivill, Lluís Armengol.
Abstract
BACKGROUND: Copy number variations (CNVs) may play an important role in disease risk by altering the dosage of genes and other regulatory elements, which may have functional and, ultimately, phenotypic consequences. Determining whether a CNV is associated with a given disease may therefore be relevant to understanding the genesis and progression of human diseases. Current technology provides CNV probe signals from which copy number status is inferred. Incorporating the uncertainty of CNV calling into the statistical analysis is therefore highly important. In this paper, we present a framework for assessing association between CNVs and disease in case-control studies in which this uncertainty is taken into account. We also indicate how to use the model to analyze continuous traits and adjust for confounding covariates.
Year: 2009 PMID: 19500389 PMCID: PMC2707368 DOI: 10.1186/1471-2105-10-172
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. CNV quantitative measurements. Examples of CNV data showing different clustering quality and copy number status.
Contingency table of disease status and copy number category
| | Copy number status | | | |
| Disease | 1 | 2 | ⋯ | Total |
| Cases | r1 | r2 | ⋯ | R |
| Controls | s1 | s2 | ⋯ | S |
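A table of this shape is commonly analyzed with a Pearson chi-square test of independence. The sketch below is one standard way to do so and is not necessarily the test used by the authors. It takes its counts from the true copy number status of Gene 1 as reported in the association table of this record. The function names are hypothetical, and the p-value helper uses a closed form that holds only for even degrees of freedom.

```python
import math

def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a rows x columns count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of disease and copy number.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi_square_sf_even_df(stat, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = stat / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

# Rows: cases, controls. Columns: copy number 0, 1, 2 (Gene 1, true status).
gene1 = [[216, 126, 18],
         [210, 75, 6]]
stat, df = pearson_chi_square(gene1)
p_value = chi_square_sf_even_df(stat, df)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```

For these counts the p-value rounds to 0.0027, which agrees with the first value listed under the Gene 1 rows of the association table. A test of this kind treats the inferred copy number as known, so it corresponds to the naïve analysis rather than to a model that accounts for calling uncertainty.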
Figure 2. Empirical power for simulation studies. Empirical power for the three approaches analyzed, varying the quality of clustering for the underlying copy number status. The left panel is for fixed variance and varying means, while the right panel is for fixed mean and varying variances.
Simulation study
| | | | | Odds ratio | | | | Mean square error (×10³) | | |
| I | π | OR | Variances | SIM | NAIVE | THRES | LC | NAIVE | THRES | LC |
| 50 | 0.8 | 1.3 | (0.15,0.15) | 1.23 | 1.17 | 1.15 | 1.20 | 57 | 87 | 42 |
| 50 | 0.8 | 1.3 | (0.2,0.2) | 1.24 | 1.14 | 1.09 | 1.21 | 107 | 131 | 114 |
| 50 | 0.8 | 1.3 | (0.15,0.2) | 1.28 | 1.18 | 1.15 | 1.24 | 134 | 148 | 112 |
| 50 | 0.8 | 2 | (0.15,0.15) | 1.60 | 1.40 | 1.28 | 1.48 | 54 | 85 | 44 |
| 50 | 0.8 | 2 | (0.2,0.2) | 1.82 | 1.36 | 1.29 | 1.52 | 152 | 158 | 126 |
| 50 | 0.8 | 2 | (0.15,0.2) | 1.89 | 1.42 | 1.33 | 1.57 | 180 | 253 | 162 |
| 50 | 0.5 | 1.3 | (0.15,0.15) | 1.26 | 1.24 | 1.21 | 1.26 | 39 | 51 | 32 |
| 50 | 0.5 | 1.3 | (0.2,0.2) | 1.32 | 1.28 | 1.25 | 1.35 | 82 | 79 | 97 |
| 50 | 0.5 | 1.3 | (0.15,0.2) | 1.26 | 1.23 | 1.20 | 1.26 | 66 | 72 | 60 |
| 50 | 0.5 | 2 | (0.15,0.15) | 2.04 | 1.94 | 1.83 | 2.05 | 40 | 67 | 34 |
| 50 | 0.5 | 2 | (0.2,0.2) | 2.04 | 1.76 | 1.68 | 2.05 | 107 | 128 | 92 |
| 50 | 0.5 | 2 | (0.15,0.2) | 2.06 | 1.78 | 1.72 | 1.99 | 87 | 107 | 71 |
| 300 | 0.8 | 1.3 | (0.15,0.15) | 1.30 | 1.25 | 1.18 | 1.30 | 13 | 32 | 10 |
| 300 | 0.8 | 1.3 | (0.2,0.2) | 1.32 | 1.25 | 1.15 | 1.34 | 27 | 50 | 29 |
| 300 | 0.8 | 1.3 | (0.15,0.2) | 1.30 | 1.22 | 1.16 | 1.29 | 24 | 42 | 21 |
| 300 | 0.8 | 2 | (0.15,0.15) | 2.01 | 1.87 | 1.49 | 2.01 | 21 | 120 | 13 |
| 300 | 0.8 | 2 | (0.2,0.2) | 2.03 | 1.70 | 1.36 | 1.99 | 69 | 203 | 43 |
| 300 | 0.8 | 2 | (0.15,0.2) | 2.03 | 1.62 | 1.38 | 1.86 | 78 | 189 | 38 |
| 300 | 0.5 | 1.3 | (0.15,0.15) | 1.31 | 1.27 | 1.26 | 1.30 | 7 | 9 | 5 |
| 300 | 0.5 | 1.3 | (0.2,0.2) | 1.30 | 1.23 | 1.22 | 1.30 | 15 | 17 | 12 |
| 300 | 0.5 | 1.3 | (0.15,0.2) | 1.30 | 1.24 | 1.23 | 1.29 | 12 | 14 | 9 |
| 300 | 0.5 | 2 | (0.15,0.15) | 2.00 | 1.87 | 1.77 | 2.00 | 11 | 23 | 5 |
| 300 | 0.5 | 2 | (0.2,0.2) | 2.00 | 1.72 | 1.66 | 2.02 | 36 | 51 | 15 |
| 300 | 0.5 | 2 | (0.15,0.2) | 2.00 | 1.76 | 1.71 | 1.97 | 26 | 37 | 10 |
Odds ratios (OR) and mean square errors obtained in 1,000 simulations using the three approaches, NAIVE, THRES and LC (see text for a description of each). Results are given for different scenarios, varying the number of individuals (I), the proportion of individuals with each copy number status (π), the odds ratio (OR), and the variances of the CNV quantitative measurements.
Figure 3. Association between Gene 1 and disease. Graphical representation of peak intensities (CNV quantitative measurement) of individuals for Gene 1 analyzed in the example. The various colors indicate copy number status inferred using our proposed finite mixture model.
Figure 4. Association between Gene 2 and disease. Graphical representation of peak intensities (CNV quantitative measurement) of individuals for Gene 2 analyzed in the example. The various colors indicate copy number status inferred using our proposed finite mixture model.
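To illustrate how a finite mixture model can express uncertainty in copy number status, the following minimal sketch computes posterior class probabilities for a single intensity under a Gaussian mixture. The mixture proportions, means and standard deviations are hypothetical values chosen for illustration, not estimates from the paper, and the function names are assumptions.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior_copy_number(x, proportions, means, sds):
    """Posterior probability of each copy number class given intensity x."""
    weighted = [p * normal_pdf(x, m, s)
                for p, m, s in zip(proportions, means, sds)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical three-class mixture (classes indexed 0, 1, 2).
proportions = [0.6, 0.3, 0.1]
means = [0.0, 0.5, 1.0]
sds = [0.15, 0.15, 0.15]

posteriors = {}
for intensity in (0.05, 0.27, 0.95):
    probs = posterior_copy_number(intensity, proportions, means, sds)
    posteriors[intensity] = probs
    # The most probable class is what a hard call would assign.
    best = max(range(len(probs)), key=probs.__getitem__)
    print(intensity, [round(p, 3) for p in probs], "most probable class:", best)
```

Intensities close to a component mean yield a posterior concentrated on one class, whereas intensities between two means spread the probability across classes. Assigning each individual to the most probable class discards that spread, which is the information an uncertainty-aware analysis retains.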
Contingency table of estimated and true copy number status for the two genes examined in the real data example.
| | True copy number status | | |
| Estimated copy number status | 0 | 1 | 2 |
| Gene 1 | | | |
| 0 | 426 | 0 | 0 |
| 1 | 0 | 201 | 0 |
| 2 | 0 | 0 | 24 |
| Gene 2 | | | |
| 0 | 85 | 0 | 0 |
| 1 | 5 | 287 | 0 |
| 2 | 0 | 73 | 204 |
Association analysis of disease status and copy number category using the true copy number status and the estimated status obtained with the proposed finite mixture model. Co: controls; Ca: cases.
| | True CN | | | Estimated CN | | | |
| Copy number | Co | Ca | OR (95% CI) | Co | Ca | OR, naïve (95% CI) | OR, LC (95% CI) |
| Gene 1 | | | | | | | |
| 0 | 210 | 216 | 1 | 210 | 216 | 1 | 1 |
| 1 | 75 | 126 | 1.63 (1.16,2.30) | 75 | 126 | 1.63 (1.16,2.30) | 1.63 (1.16,2.30) |
| 2 | 6 | 18 | 2.92 (1.14,7.49) | 6 | 18 | 2.92 (1.14,7.49) | 2.92 (1.14,7.50) |
| | | | 0.0027 | | | 0.0027 | 0.0023 |
| | | | 5.0 × 10⁻⁴ | | | 5.0 × 10⁻⁴ | 5.0 × 10⁻⁴ |
| Gene 2 | | | | | | | |
| 0 | 24 | 66 | 1 | 22 | 63 | 1 | 1 |
| 1 | 159 | 201 | 0.46 (0.27,0.77) | 129 | 178 | 0.44 (0.26,0.75) | 0.47 (0.27,0.82) |
| 2 | 108 | 93 | 0.31 (0.18,0.54) | 140 | 119 | 0.33 (0.19,0.57) | 0.31 (0.18,0.54) |
| | | | 7.2 × 10⁻⁵ | | | 2.3 × 10⁻⁴ | 8.4 × 10⁻⁵ |
| | | | 2.1 × 10⁻⁵ | | | 1.0 × 10⁻⁴ | 2.1 × 10⁻⁵ |
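The odds ratios for the true copy number status of Gene 1 can be checked directly from the counts in the table above. The sketch below computes each odds ratio against the copy number 0 reference category with a Wald 95% confidence interval on the log scale. This is a standard calculation offered as an assumption about how such values arise, not a statement of the authors' fitting procedure, and the function name is hypothetical.

```python
import math

def odds_ratio_wald(cases_exposed, controls_exposed, cases_ref, controls_ref, z=1.96):
    """Odds ratio versus the reference category with a Wald confidence interval."""
    odds_ratio = (cases_exposed * controls_ref) / (controls_exposed * cases_ref)
    # Standard error of the log odds ratio from the four cell counts.
    se_log_or = math.sqrt(1 / cases_exposed + 1 / controls_exposed
                          + 1 / cases_ref + 1 / controls_ref)
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))

# Gene 1, true copy number: (controls, cases) per category.
counts = {0: (210, 216), 1: (75, 126), 2: (6, 18)}
ref_controls, ref_cases = counts[0]

results = {}
for cn in (1, 2):
    controls, cases = counts[cn]
    results[cn] = odds_ratio_wald(cases, controls, ref_cases, ref_controls)
    estimate, lower, upper = results[cn]
    print(f"CN {cn}: OR = {estimate:.2f} ({lower:.2f}, {upper:.2f})")
```

Run on these counts, the sketch gives 1.63 (1.16, 2.30) for copy number 1 and 2.92 (1.14, 7.49) for copy number 2, matching the rounded values in the True CN columns.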
Steps used to assess association between CNVs and traits when aCGH is used.
Number of CNV blocks (out of 459) associated with estrogen receptor positivity from 50 aCGH breast cancer cell lines.
| | Significance level | | | | |
| | 10⁻⁶ | 10⁻⁵ | 10⁻⁴ | 10⁻³ | 10⁻² |
| Latent class model | 1 | 4 | 27 | 64 | 117 |
| Chi-square test | 0 | 2 | 10 | 41 | 93 |
Results are given for different significance levels, comparing our proposed model with the naïve approach, which does not account for uncertainty.