Tianwei Yu, Hui Ye, Wei Sun, Ker-Chau Li, Zugen Chen, Sharoni Jacobs, Dione K Bailey, David T Wong, Xiaofeng Zhou.
Abstract
BACKGROUND: DNA copy number aberration (CNA) is one of the key characteristics of cancer cells. Recent studies demonstrated the feasibility of utilizing high-density single nucleotide polymorphism (SNP) genotyping arrays to detect CNA. Compared with the two-color array-based comparative genomic hybridization (array-CGH), the SNP arrays offer much higher probe density and lower signal-to-noise ratio at the single SNP level. To accurately identify small segments of CNA from SNP array data, segmentation methods that are sensitive to CNA while resistant to noise are required.
Year: 2007 PMID: 17477871 PMCID: PMC1868765 DOI: 10.1186/1471-2105-8-145
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. An illustration of the workflow of the forward-backward fragment assembling (FASeg) method.
Table 1. Parameters tested for the seven R packages
| Group | Package | Tuning parameter | Values tested | Other parameters |
| Tuned packages | FASeg | Sig | 0.25, 0.1, 0.075, 0.05, 0.025, 0.01, 0.005, 0.001, 0.0001, 0.00001 | Default |
| Tuned packages | aCGH | Vr | 10, 7, 5, 2, 1, 0.5, 0.1, 0.05, 0.01, 0.001 | Default |
| Tuned packages | DNAcopy | alpha | 0.25, 0.1, 0.075, 0.05, 0.025, 0.01, 0.005, 0.001, 0.0005, 0.0001 | * nperm = 1000 |
| Tuned packages | GLAD | qlambda | 0.75, 0.9, 0.925, 0.95, 0.975, 0.99, 0.9925, 0.995, 0.9975, 0.999 | ** lambdabreak = 0.01 |
| Packages examined at default setting | Picard | | | Maxk = max(true segment size) + 5, maxSeg = #(true segments) + 1 |
| Packages examined at default setting | RJaCGH | | | *** burnin = 50, *** TOT = 500, jump.parameters = NULL, k.max = #(true states) + 1 |
| Packages examined at default setting | BioHMM | | | Default |
* The change in the number of permutations is to reduce computing time. Experiments showed that reducing the number from 10000 to 1000 has minimal effect on the outcome.
** These parameters were tuned according to the GLAD manual to increase sensitivity. With the default values, the method detected only a limited number of edges in noisy data.
*** The purpose of reducing the number of iterations was to save computing time.
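The tuning grids listed in the parameter table can be written down as a small lookup structure, which makes it easy to confirm that each tuned package was run at ten settings. The following is only an illustrative sketch in Python: the names `TUNING_GRIDS` and `settings_for` are hypothetical, and the packages themselves are R packages that this snippet does not call.

```python
# Tuning grids transcribed from the parameter table (values as listed there).
# TUNING_GRIDS and settings_for are illustrative names, not part of any package.
TUNING_GRIDS = {
    "FASeg": ("Sig", [0.25, 0.1, 0.075, 0.05, 0.025,
                      0.01, 0.005, 0.001, 0.0001, 0.00001]),
    "aCGH": ("Vr", [10, 7, 5, 2, 1, 0.5, 0.1, 0.05, 0.01, 0.001]),
    "DNAcopy": ("alpha", [0.25, 0.1, 0.075, 0.05, 0.025,
                          0.01, 0.005, 0.001, 0.0005, 0.0001]),
    "GLAD": ("qlambda", [0.75, 0.9, 0.925, 0.95, 0.975,
                         0.99, 0.9925, 0.995, 0.9975, 0.999]),
}


def settings_for(package):
    """Return one {parameter: value} dict per tested setting of a tuned package."""
    name, values = TUNING_GRIDS[package]
    return [{name: value} for value in values]


# Each tuned package should contribute exactly ten settings.
for package, (name, values) in TUNING_GRIDS.items():
    print(package, name, len(values))
```

Keeping the grid in one place like this means a comparison loop can iterate over `settings_for(package)` without hard-coding parameter names.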
Figure 2. The effect of smoothing span on the sensitivity to detect CNA segments. Every sub-plot is based on 100 simulated chromosomes, each harboring 6 normal segments and 5 CNA segments. Ten alpha levels were examined at each smoothing span.
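To make the simulation layout concrete, here is a minimal Python sketch of one simulated chromosome with 6 normal and 5 CNA segments, followed by smoothing at a chosen span. Everything beyond the segment counts is an assumption made for illustration: the segment lengths, the size of the copy number shift, the Gaussian noise level, and the use of a plain centered moving average (the caption does not say which smoother was used). The function names are hypothetical.

```python
import random


def simulate_chromosome(n_cna=5, normal_len=200, cna_len=30,
                        shift=0.5, noise_sd=0.3, seed=0):
    """Simulate alternating normal/CNA segments: n_cna + 1 normal, n_cna CNA.

    Lengths, shift and noise_sd are assumed values, not taken from the paper.
    Returns (truth, observed) as two lists of per-probe values.
    """
    rng = random.Random(seed)
    truth = []
    for i in range(2 * n_cna + 1):
        if i % 2 == 0:
            # Normal segment: no deviation from the baseline.
            truth.extend([0.0] * normal_len)
        else:
            # CNA segment: a gain or a loss of fixed magnitude.
            sign = rng.choice([-1.0, 1.0])
            truth.extend([sign * shift] * cna_len)
    observed = [t + rng.gauss(0.0, noise_sd) for t in truth]
    return truth, observed


def moving_average(values, span):
    """Centered moving average; the window is truncated at both ends."""
    half = span // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


truth, observed = simulate_chromosome()
smoothed = moving_average(observed, span=11)
```

A wider span suppresses more noise but blurs short segments, which is the trade-off the sub-plots explore across smoothing spans.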
Figure 3. The comparison of the performance of seven methods available as R packages. Every sub-plot is based on 100 simulated chromosomes, each harboring 6 normal segments and 5 CNA segments. FASeg, aCGH, DNAcopy and GLAD were each run at 10 parameter settings; Picard, RJaCGH and BioHMM were run at default settings. The parameters used are detailed in Table 1.
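The caption does not spell out how performance was scored, so the following Python sketch shows just one plausible way to compute segment-level sensitivity on simulated data where the true CNA segments are known. The scoring rule (a true segment counts as detected when at least half of its probes are called aberrant), the `min_overlap` threshold, and the function name are all assumptions for illustration, not the authors' criterion.

```python
def segment_sensitivity(true_segments, called, min_overlap=0.5):
    """Fraction of true CNA segments that a method detects.

    true_segments: list of (start, end) half-open probe index ranges.
    called: list of booleans, one per probe, True where a method
            reports an aberration.
    min_overlap: assumed threshold; a segment counts as detected when
                 at least this fraction of its probes is called.
    """
    if not true_segments:
        raise ValueError("at least one true segment is required")
    detected = 0
    for start, end in true_segments:
        hits = sum(called[start:end])
        if hits / (end - start) >= min_overlap:
            detected += 1
    return detected / len(true_segments)


# Two true segments; the first is fully called, the second only partly.
calls = [False] * 16
calls[2:6] = [True] * 4
calls[10] = True
print(segment_sensitivity([(2, 6), (10, 14)], calls))
```

Averaging such a score over the 100 simulated chromosomes at each parameter setting would give one point per setting for a tuned package.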
Figure 4. Sample output of the R-package FASeg. The results were obtained using the smoothing span of 50 SNPs and the alpha level of 10^-6. (a) Raw copy number (upper panel) and fitted values (lower panel) of chromosome 9 for data from the Mapping 50 K Xba array, generated from an oral squamous cell carcinoma case (CZ T26). (b) Comparison of the copy numbers for chromosome 9 between four samples. Two primary skin fibroblast cell lines: GM03226 (with a known trisomic segment in chromosome 9 [9pter > q11]; red) and GM00870 (with a known single copy deletion segment in chromosome 9 [9pter > p21]; blue). Two previously uncharacterized oral squamous cell carcinoma cases: CZ T26 (green) and CZ T322 (aqua). (c) Color display of the fitted values of the whole genome for all four samples. From top to bottom: GM03226, GM00870, CZ T26 and CZ T322. The gridlines separate chromosomes lined up in numerical order, with the X chromosome being the last. Black: normal; red: higher; green: lower. (d) A section of the condensed table output containing copy number and Cytoband information for samples GM03226, GM00870, CZ T26, and CZ T322.
Table 2. Comparison of computing time*
| Package | CPU time (seconds) |
| FASeg | 181 |
| aCGH | 107 |
| DNAcopy | 18 |
| GLAD | 98 |
| Picard | 101 |
| RJaCGH | 13778 |
| BioHMM | 1619 |
* The comparison was made in R 2.4.1 on a desktop computer running the Windows XP® operating system. CPU: AMD Athlon 64 3800+ @ 2.4 GHz; RAM: 1.2 GB. CPU times are reported for the tumor sample CZ T26. For FASeg, aCGH, DNAcopy and GLAD, the ten parameter values listed in Table 1 were tested and the average CPU time is reported. For Picard, ten maxSeg values between 2 and 20 were tested and the average CPU time is reported. For RJaCGH and BioHMM, the parameters listed in Table 1 were used.
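The reported CPU times span roughly three orders of magnitude, which is easier to see as ratios to the fastest package. The short Python sketch below only re-expresses the table values; the variable names are illustrative. The ratios should be read loosely, since the figures for the tuned packages are averages over ten settings and RJaCGH was run with a reduced number of iterations.

```python
# CPU times in seconds, transcribed from the computing-time table.
CPU_SECONDS = {
    "FASeg": 181,
    "aCGH": 107,
    "DNAcopy": 18,
    "GLAD": 98,
    "Picard": 101,
    "RJaCGH": 13778,
    "BioHMM": 1619,
}

# Express every time relative to the fastest package.
fastest = min(CPU_SECONDS, key=CPU_SECONDS.get)
relative = {name: secs / CPU_SECONDS[fastest]
            for name, secs in CPU_SECONDS.items()}

for name, ratio in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name:8s} {CPU_SECONDS[name]:6d} s  {ratio:7.1f}x")
```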
Figure 5. Demonstration of the performance of FASeg at different p-value cutoffs. Fitted values at each p-value cutoff were displayed on the left. The gridlines separate chromosomes lined up in numerical order, with the X chromosome being the last. Black: normal; red: higher; green: lower. (a) GM03226 cell line data; (b) CZ T26 cancer tissue data.