Oscar M Rueda, Ramón Díaz-Uriarte.
Abstract
Genomic DNA copy-number alterations (CNAs) are associated with complex diseases, including cancer: CNAs are indeed related to tumoral grade, metastasis, and patient survival. CNAs discovered from array-based comparative genomic hybridization (aCGH) data have been instrumental in identifying disease-related genes and potential therapeutic targets. To be immediately useful in both clinical and basic research scenarios, aCGH data analysis requires accurate methods that do not impose unrealistic biological assumptions and that provide direct answers to the key question, "What is the probability that this gene/region has CNAs?" Current approaches fail, however, to meet these requirements. Here, we introduce reversible jump aCGH (RJaCGH), a new method for identifying CNAs from aCGH; we use a nonhomogeneous hidden Markov model fitted via reversible jump Markov chain Monte Carlo; and we incorporate model uncertainty through Bayesian model averaging. RJaCGH provides an estimate of the probability that a gene/region has CNAs while incorporating interprobe distance and the capability to analyze data on a chromosome or genome-wide basis. RJaCGH outperforms alternative methods, and the performance difference is even larger with noisy data and highly variable interprobe distance, both commonly found features in aCGH data. Furthermore, our probabilistic method allows us to identify minimal common regions of CNAs among samples and can be extended to incorporate expression data. In summary, we provide a rigorous statistical framework for locating genes and chromosomal regions with CNAs with potential applications to cancer and other complex human diseases.
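As a rough sketch of what Bayesian model averaging amounts to in this setting, assume the competing models differ in their number of hidden states k. The notation below is illustrative and not taken from the abstract: y stands for the observed aCGH log-ratios and "altered" for a probe being assigned to a gain or loss state. Under those assumptions, the probability reported for probe i is a posterior-weighted sum over models:

```latex
P(\text{probe } i \text{ altered} \mid y)
  = \sum_{k} P(k \mid y)\, P(\text{probe } i \text{ altered} \mid k, y)
```

In a reversible jump sampler, the weight P(k | y) would typically be approximated by the fraction of iterations the chain spends in the model with k states.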
Year: 2007 PMID: 17590078 PMCID: PMC1894821 DOI: 10.1371/journal.pcbi.0030122
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Effects of Variability in Interprobe Distance (Percentage of Probes Missing) on Correct Classification
Shown are the mean and 95% confidence interval around the mean of the correct classification rate. Each mean and confidence interval is computed from 500 datasets [31]; see text and Protocol S1 for generation of interprobe distance variability. Alternative methods compared are ACE, developed by [27]; BioHMM, a nonhomogeneous HMM by [18]; DNAcopy with mergeLevels, with the original method developed by [24] (and use of mergeLevels following [31]); HMM, a homogeneous HMM developed by [19]; CGHseg, a random Gaussian process with abrupt changes in the mean by [28]; and GLAD, which uses a nonparametric likelihood method with adaptive weights for breakpoint detection, by [23].
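The summary statistic plotted in this figure can be illustrated with a minimal sketch. It assumes a normal approximation for the confidence interval, which the caption does not specify; the function name `mean_ci95` and the input values are hypothetical and are not data from the paper.

```python
import math
import statistics


def mean_ci95(values):
    """Return (mean, lower, upper) for a 95% CI around the mean.

    Assumption: normal approximation (mean +/- 1.96 * standard error).
    """
    n = len(values)
    m = statistics.fmean(values)
    # Standard error of the mean from the sample standard deviation.
    se = statistics.stdev(values) / math.sqrt(n)
    half_width = 1.96 * se
    return m, m - half_width, m + half_width


# Hypothetical per-dataset correct classification rates (illustration only).
rates = [0.90, 0.92, 0.88, 0.91, 0.89]
mean, lower, upper = mean_ci95(rates)
print(f"mean={mean:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

With 500 datasets per condition, as in the figure, the normal approximation is usually reasonable; a t-based or bootstrap interval would be an alternative for small samples.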
Figure 2. Joint Effects of Variability in Interprobe Distance and Noise on Correct Classification
The data are the same as those shown in Figure 1. The noise (standard deviation) of each sample is split into ten non-overlapping ranges, and each panel shows the mean correct classification success versus the proportion of missing probes (i.e., increasing levels of variance in interprobe distance). Each mean is based on approximately 50 samples. See Figure 1 for method references.
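The grouping described in this caption can be sketched as follows. The sketch assumes equal-width noise ranges, which the caption does not state (quantile-based ranges would be another option); the function name `mean_by_noise_bin` and the sample tuples are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def mean_by_noise_bin(samples, n_bins=10):
    """Average the correct classification rate per (noise bin, missing fraction).

    samples: iterable of (noise_sd, missing_fraction, correct_rate) tuples.
    Assumption: bins are equal-width ranges over the observed noise values.
    """
    samples = list(samples)
    sds = [s[0] for s in samples]
    lo, hi = min(sds), max(sds)
    width = (hi - lo) / n_bins or 1.0  # guard against identical noise values
    groups = defaultdict(list)
    for sd, missing, rate in samples:
        # Clamp so the maximum noise value falls in the last bin.
        b = min(int((sd - lo) / width), n_bins - 1)
        groups[(b, missing)].append(rate)
    return {key: fmean(vals) for key, vals in groups.items()}


# Hypothetical samples: (noise SD, proportion of probes missing, correct rate).
demo = [(0.1, 0.0, 0.95), (0.1, 0.0, 0.93), (0.9, 0.5, 0.70), (0.9, 0.5, 0.74)]
panel_means = mean_by_noise_bin(demo)
```

Each key in `panel_means` pairs a noise bin (one panel) with a missing-probe proportion (one point on that panel's x-axis).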