Mark A van de Wiel, Wessel N van Wieringen.
Abstract
An algorithm is introduced that reduces multi-sample array CGH data from thousands of clones to tens or hundreds of clone regions. Little information is lost in this reduction, which is possible because neighboring clones are highly dependent. The algorithm is explained using a small example. Its potential benefits for downstream analysis are illustrated by re-analysis of previously published colorectal cancer data. Using multiple testing corrections suitable for these data, we provide statistical evidence for genomic differences between MSI+ and CIN+ tumors on several clone regions. The algorithm, named CGHregions, is available as an easy-to-use script in R.
Keywords: Array CGH; Dimension reduction; FDR; Statistical testing; Tumor profiles
Year: 2007 PMID: 19455235 PMCID: PMC2675846
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
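Signatures (calls for samples S1–S10) and distance d to the previous clone for 18 clones.

| Chr | Clone | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | d |
|---|---|---|---|---|---|---|---|---|---|---|---|---|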
| 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 10 |
| 1 | 2 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 1 | 3 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 2 | 4 | −1 | 0 | 0 | −1 | −1 | −1 | −1 | 0 | 1 | 0 | 10 |
| 2 | 5 | −1 | 0 | 0 | −1 | −1 | −1 | −1 | 0 | 1 | 0 | 0 |
| 2 | 6 | −1 | 0 | 0 | −1 | −1 | −1 | −1 | 0 | 1 | 0 | 0 |
| 2 | 7 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 12 |
| 2 | 8 | −1 | 0 | 0 | −1 | −1 | −1 | −1 | 0 | 1 | 0 | 12 |
| 2 | 9 | −1 | 0 | 0 | −1 | −1 | −1 | −1 | 0 | 1 | 0 | 0 |
| 2 | 10 | −1 | 0 | 0 | 0 | 0 | −1 | −1 | −1 | 1 | 2 | 4 |
| 2 | 11 | −1 | 0 | 0 | 0 | 0 | −1 | −1 | −1 | 1 | 1 | 0 |
| 2 | 12 | −1 | 0 | 0 | 0 | 0 | −1 | −1 | −1 | 1 | 1 | 0 |
| 2 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | −1 | −1 | 1 | 1 | 2 |
| 2 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | −1 | −1 | 0 | 1 | 1 |
| 2 | 15 | 0 | 0 | 0 | 0 | 0 | 0 | −1 | −1 | 0 | 1 | 0 |
| 2 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | −1 | −1 | 0 | 1 | 0 |
| 2 | 17 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 8 |
| 2 | 18 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
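Even a naive reduction shows why the dependence between neighboring clones helps. The sketch below is a hypothetical zero-tolerance baseline, not the CGHregions algorithm: it only groups consecutive clones on the same chromosome whose signatures are exactly identical. The published regions go further, for example absorbing the deviating clones 10 and 13 and gathering clones 4–6 and 8–9 into one region despite clone 7 in between. All names in the code are illustrative.

```python
from itertools import groupby

# Calls for the 10 samples, copied from the signature table above.
A = (1, 0, 0, 1, 1, 0, 1, 0, 1, 1)        # clones 1-3
B = (-1, 0, 0, -1, -1, -1, -1, 0, 1, 0)   # clones 4-6, 8-9
C = (1, 1, 0, 1, 1, 1, 0, 1, 1, 1)        # clone 7
D = (-1, 0, 0, 0, 0, -1, -1, -1, 1, 2)    # clone 10
E = (-1, 0, 0, 0, 0, -1, -1, -1, 1, 1)    # clones 11-12
F = (0, 0, 0, 0, 0, 0, -1, -1, 1, 1)      # clone 13
G = (0, 0, 0, 0, 0, 0, -1, -1, 0, 1)      # clones 14-16
H = (1, 0, 1, 1, 1, 0, 1, 0, 0, 0)        # clones 17-18

# (chromosome, clone, signature) in genomic order.
CLONES = [
    (1, 1, A), (1, 2, A), (1, 3, A),
    (2, 4, B), (2, 5, B), (2, 6, B), (2, 7, C), (2, 8, B), (2, 9, B),
    (2, 10, D), (2, 11, E), (2, 12, E), (2, 13, F),
    (2, 14, G), (2, 15, G), (2, 16, G), (2, 17, H), (2, 18, H),
]


def identical_runs(clones):
    """Group consecutive clones on the same chromosome sharing a signature.

    Hypothetical zero-tolerance baseline, not the CGHregions algorithm.
    """
    return [
        [clone for _, clone, _ in group]
        for _, group in groupby(clones, key=lambda c: (c[0], c[2]))
    ]


runs = identical_runs(CLONES)
print(len(runs), runs)
```

This baseline already turns 18 clones into 9 runs; allowing small discrepancies, as CGHregions does, brings the example down to the 5 regions listed next.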
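Regions created using c = 2 and their medoid signatures.

| Chr | Clones | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 |
|---|---|---|---|---|---|---|---|---|---|---|---|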
| 1 | 1–3 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| 2 | (4,5,6,8,9) | −1 | 0 | 0 | −1 | −1 | −1 | −1 | 0 | 1 | 0 |
| 2 | 10–12 | −1 | 0 | 0 | 0 | 0 | −1 | −1 | −1 | 1 | 1 |
| 2 | 13–16 | 0 | 0 | 0 | 0 | 0 | 0 | −1 | −1 | 0 | 1 |
| 2 | 17–18 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 |
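The medoid of a region is the member signature with the smallest summed distance to all members, so it is always an observed signature rather than an average. A minimal sketch, assuming an L1 distance (sum of absolute call differences) between signatures; the distance CGHregions actually uses is not stated here, and all names are hypothetical. Region memberships follow the region table above, and member signatures follow the clone table.

```python
# Member signatures, copied from the example tables above.
SIG_1_3 = (1, 0, 0, 1, 1, 0, 1, 0, 1, 1)
SIG_4_9 = (-1, 0, 0, -1, -1, -1, -1, 0, 1, 0)
SIG_10 = (-1, 0, 0, 0, 0, -1, -1, -1, 1, 2)
SIG_11_12 = (-1, 0, 0, 0, 0, -1, -1, -1, 1, 1)
SIG_13 = (0, 0, 0, 0, 0, 0, -1, -1, 1, 1)
SIG_14_16 = (0, 0, 0, 0, 0, 0, -1, -1, 0, 1)
SIG_17_18 = (1, 0, 1, 1, 1, 0, 1, 0, 0, 0)

# Region label -> list of member signatures (one entry per clone).
REGIONS = {
    "1-3": [SIG_1_3] * 3,
    "4,5,6,8,9": [SIG_4_9] * 5,
    "10-12": [SIG_10, SIG_11_12, SIG_11_12],
    "13-16": [SIG_13, SIG_14_16, SIG_14_16, SIG_14_16],
    "17-18": [SIG_17_18] * 2,
}


def l1_distance(a, b):
    """Sum of absolute call differences over samples (assumed distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def medoid(signatures):
    """Member with the smallest total distance to all members (first on ties)."""
    return min(signatures, key=lambda s: sum(l1_distance(s, t) for t in signatures))


for name, members in REGIONS.items():
    print(name, medoid(members))
```

For regions 10–12 and 13–16 the medoid is the majority signature, so the deviating clones 10 and 13 do not change the region summary; the results agree with the medoid rows of the table.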
Figure 1. Visualization of 68 regions created by CGHregions for 37 colorectal tumor samples (Douglas et al. 2004). Y-axis: chromosome; x-axis: base-pair position. Each new region is marked by a slight jump relative to the previous region. The number of regions per chromosome ranges from 1 (on several chromosomes) to 9 (chromosome 8), indicating that the resolution of the results adapts to the heterogeneity of the transition locations. Each region is displayed as a bi-colored segment whose lower and upper parts correspond to the proportions $p_l$ and $p_g$ of samples with a loss (red) or gain (green), respectively. The color coding is: ‘1’: $p_l$ ($p_g$) < 10%; ‘2’: 10% ≤ $p_l$ ($p_g$) < 30%; ‘3’: 30% ≤ $p_l$ ($p_g$) < 50%; ‘4’: $p_l$ ($p_g$) ≥ 50%.
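The loss and gain proportions and the four color classes from the caption are simple per-region arithmetic. A minimal sketch, applied to the medoid signatures of the 10-sample example rather than to the 37 tumors of the figure; it assumes that a negative call is a loss and that any positive call (including the single 2 in the example) is a gain. Function names are hypothetical.

```python
# Medoid signatures of the 10-sample example regions (not the 37 tumors).
MEDOIDS = {
    "1-3": (1, 0, 0, 1, 1, 0, 1, 0, 1, 1),
    "4,5,6,8,9": (-1, 0, 0, -1, -1, -1, -1, 0, 1, 0),
    "10-12": (-1, 0, 0, 0, 0, -1, -1, -1, 1, 1),
    "13-16": (0, 0, 0, 0, 0, 0, -1, -1, 0, 1),
    "17-18": (1, 0, 1, 1, 1, 0, 1, 0, 0, 0),
}


def loss_gain_proportions(signature):
    """Return (p_l, p_g): fractions of samples with a negative / positive call."""
    n = len(signature)
    p_l = sum(1 for v in signature if v < 0) / n
    p_g = sum(1 for v in signature if v > 0) / n  # assumption: a 2 also counts as gain
    return p_l, p_g


def color_code(p):
    """Map a proportion to the caption's classes '1'-'4'."""
    if p < 0.10:
        return 1
    if p < 0.30:
        return 2
    if p < 0.50:
        return 3
    return 4


for name, sig in MEDOIDS.items():
    p_l, p_g = loss_gain_proportions(sig)
    print(name, p_l, p_g, color_code(p_l), color_code(p_g))
```

The class boundaries are half-open, so a proportion of exactly 10%, 30% or 50% falls into the higher class, as in the caption.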
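Regions that differ significantly between MSI+ and CIN+ colorectal cancers.

| Chr | Start (bp) | End (bp) | Clones | p-value | Adjusted p-value |
|---|---|---|---|---|---|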
| 8 | 7938099 | 32678693 | 32 | 0.00166 | 0.01372 |
| 20 | 30814489 | 63589868 | 50 | 0.00464 | 0.03652 |
| 18 | 32413398 | 72886818 | 54 | 0.00618 | 0.03652 |
| 18 | 73991368 | 77615559 | 8 | 0.00618 | 0.03652 |
| 8 | 731200 | 6933218 | 14 | 0.00753 | 0.03652 |
| 8 | 34108046 | 35026137 | 2 | 0.01677 | 0.04491 |
| 18 | 225168 | 25700568 | 32 | 0.01961 | 0.04593 |
| 18 | 27315721 | 29970100 | 4 | 0.02259 | 0.04593 |
| 13 | 19104448 | 32907695 | 16 | 0.02264 | 0.04593 |
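The last column appears to hold p-values adjusted for multiple testing. The abstract describes the correction only as suitable for these data, so the sketch below does not reproduce that column. As a swapped-in, widely used alternative, it shows the Benjamini–Hochberg step-up adjustment that controls the FDR; it is not necessarily the correction used for this table, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

The running minimum is what makes the procedure step-up: a region can never receive a larger adjusted p-value than a region with a larger raw p-value.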
Figure 2. Region-wise frequency plots for 37 colorectal tumor samples (Douglas et al. 2004). Regions were created using settings T = 0.01 (a) and T = 0.025 (b). The left axis shows the proportion of samples with a loss; reading it on the reversed scale (1 minus the axis value) gives the proportion with a gain.