Barry S Taylor, Jordi Barretina, Nicholas D Socci, Penelope Decarolis, Marc Ladanyi, Matthew Meyerson, Samuel Singer, Chris Sander.
Abstract
Understanding the molecular basis of cancer requires characterization of its genetic defects. DNA microarray technologies can provide detailed raw data about chromosomal aberrations in tumor samples. Computational analysis is needed (1) to deduce from raw array data actual amplification or deletion events for chromosomal fragments and (2) to distinguish causal chromosomal alterations from functionally neutral ones. We present a comprehensive computational approach, RAE, designed to robustly map chromosomal alterations in tumor samples and assess their functional importance in cancer. To demonstrate the methodology, we experimentally profile copy-number changes in a clinically aggressive subtype of soft-tissue sarcoma, pleomorphic liposarcoma, and computationally derive a portrait of candidate oncogenic alterations and their target genes. Many affected genes are known to be involved in sarcomagenesis; others are novel, including mediators of adipocyte differentiation, and may include valuable therapeutic targets. Taken together, we present a statistically robust methodology applicable to high-resolution genomic data to assess the extent and function of copy-number alterations in cancer.
Year: 2008 PMID: 18784837 PMCID: PMC2527508 DOI: 10.1371/journal.pone.0003179
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Overview of the RAE workflow.
The input is a set of patients: tumor DNA, matched or unmatched non-tumor DNA, and an unrelated cohort of reference normals. Tumor and non-tumor samples are quantified, normalized, and subjected to quality control. In the assessment phase, individual samples are segmented and a multi-component model is parameterized for each, producing detectors for single-copy gain, amplification, hemizygous loss, and homozygous deletion. Across all tumors, a unified breakpoint profile (UBP) is derived from the ensemble of segmentation breakpoints, and each region is scored for gain and loss. A background model of random aberrations is constructed by additional cleavage and permutation of genomic regions, and p-values are assigned and corrected for multiple hypothesis testing. In the output phase, RAE determines genomic boundaries for regions of interest (ROI), controls for germline and population copy-number variation, and reports statistically significant alterations.
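A minimal sketch of the UBP step, for illustration only: it assumes each tumor's segmentation on one chromosome has already been reduced to a list of breakpoint coordinates, and it treats regions as half-open base-pair intervals. The function name `unified_breakpoint_profile` is hypothetical and is not taken from the RAE software.

```python
from typing import Dict, List, Tuple


def unified_breakpoint_profile(
    breakpoints_by_tumor: Dict[str, List[int]],
    chrom_start: int,
    chrom_end: int,
) -> List[Tuple[int, int]]:
    """Hypothetical sketch: pool the unique segmentation breakpoints of all
    tumors and cut [chrom_start, chrom_end) at every one of them."""
    # Union of breakpoints across the ensemble; a breakpoint shared by
    # several tumors collapses to a single cut point.
    cuts = sorted({
        bp
        for bps in breakpoints_by_tumor.values()
        for bp in bps
        if chrom_start < bp < chrom_end
    })
    edges = [chrom_start] + cuts + [chrom_end]
    # Consecutive edges define the UBP regions (half-open intervals).
    return list(zip(edges[:-1], edges[1:]))


# Usage with made-up coordinates for three tumors on one chromosome.
regions = unified_breakpoint_profile(
    {"T1": [100, 300], "T2": [200, 300], "T3": []}, 0, 500
)
print(regions)  # [(0, 100), (100, 200), (200, 300), (300, 500)]
```

Because no tumor has a breakpoint strictly inside a UBP region, each tumor's segmented value is constant across that region, so one gain score and one loss score per region can summarize the whole cohort.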
Figure 2. Multi-component model of copy-number alteration.
(a) In a noisy system, a soft discriminator (red) is contrasted with a hard threshold (black). The two assign continuous and binary values, respectively (parentheses), both to confidently copy-neutral or amplified loci (black) and to challenging cases at the margin of signal (green), illustrating the benefit of soft discrimination. (b) The functional form of the soft discriminator: a sigmoid function with parameters for location (E) and slope (β). (c) Individual-tumor approach to detecting gain and loss: the multi-component model parameterized for two tumors (red and blue), showing that tumor-specific features produce different discriminators for single-copy gain and loss (solid), amplification (dot-dash), and homozygous deletion (dotted). Parameterization selects values of E and β whose unsigned magnitudes move in the direction indicated (legend).
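Panels (a) and (b) can be made concrete with a small sketch. The caption names only the location E and slope β; the standard logistic form used below, and the example parameter values, are our assumptions rather than the paper's fitted model. A negative β flips the curve, which is one way a single functional form can serve both gain and loss detectors.

```python
import math


def soft_call(x: float, E: float, beta: float) -> float:
    """Assumed logistic soft discriminator: 0.5 at x == E, approaching 1 on
    the side that the sign of beta points to, and 0 on the other side."""
    z = beta * (x - E)
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hard_call(x: float, threshold: float) -> float:
    """Binary threshold for comparison: 1 at or above the threshold, else 0."""
    return 1.0 if x >= threshold else 0.0


# Hypothetical single-copy gain detector on log2 ratios: E = 0.2, beta = 20.
for x in (0.0, 0.19, 0.21, 0.6):
    print(x, round(soft_call(x, 0.2, 20.0), 3), hard_call(x, 0.2))
```

Loci at 0.19 and 0.21 receive nearly equal soft scores (about 0.45 and 0.55) but opposite hard calls, which is the marginal case that panel (a) highlights.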
Figure 3. Aggregation and permutation.
(a) The density of human recombination hotspots (top; median distance between hotspots ∼55 kb) shown across the segmentation (red) of probe-level data (dark blue) in a ∼5 Mb region of 13q14.13–q14.3 in four pleomorphic liposarcomas. The unique tumor-associated breakpoints (black arrows) define the UBP (regions r1–r6; bottom), the smallest of which (r3) spans four genes, including the tumor suppressor RB1 (direction of transcription indicated). (b) On chromosome 1p, the density distribution of predicted recombination hotspots (red), estimated at a bandwidth equal to the median distance between all p-arm hotspots (56 kb), and the distribution of their randomization (blue). The sampling procedure respects the shape of the original distribution and therefore the sequence features that underlie it. (c) Size distribution of the regions derived from segmentation and defined by the unified breakpoint profile (UBP; gray), and of the hotspot-cleaved versions of the same regions that are permuted during null-model generation (blue).
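Panel (c) compares UBP regions with the hotspot-cleaved regions used for the null model. The sketch below illustrates those two steps under stated assumptions: UBP regions and hotspot positions are given as base-pair coordinates, and "permutation" is implemented as a random reordering of the contiguous cleaved pieces (each carrying its copy-number value) laid back end to end. That reordering scheme and both function names are our illustrative choices, not the RAE implementation, and the sketch does not reproduce the density-preserving hotspot sampling of panel (b).

```python
import random
from typing import List, Sequence, Tuple

Region = Tuple[int, int]  # half-open [start, end) in base pairs


def cleave_at_hotspots(regions: Sequence[Region], hotspots: Sequence[int]) -> List[Region]:
    """Split each region at every hotspot lying strictly inside it."""
    hs = sorted(hotspots)
    pieces: List[Region] = []
    for start, end in regions:
        edges = [start] + [h for h in hs if start < h < end] + [end]
        pieces.extend(zip(edges[:-1], edges[1:]))
    return pieces


def permute_pieces(
    pieces: Sequence[Region], values: Sequence[float], rng: random.Random
) -> List[Tuple[int, int, float]]:
    """One random profile (hypothetical scheme): shuffle the contiguous pieces
    together with their copy-number values, then lay them end to end from the
    first start coordinate. Piece lengths and the total span are preserved;
    only the positions of the aberrations are randomized."""
    order = list(range(len(pieces)))
    rng.shuffle(order)
    out: List[Tuple[int, int, float]] = []
    pos = pieces[0][0]
    for k in order:
        length = pieces[k][1] - pieces[k][0]
        out.append((pos, pos + length, values[k]))
        pos += length
    return out


pieces = cleave_at_hotspots([(0, 100), (100, 300)], [50, 150, 250, 300])
print(pieces)  # [(0, 50), (50, 100), (100, 150), (150, 250), (250, 300)]
null_profile = permute_pieces(pieces, [0.0, 0.0, 0.8, 0.8, 0.0], random.Random(1))
```

Cleaving first makes the permuted units smaller than the UBP regions, consistent with the size shift shown in panel (c).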
Figure 4. Regions of interest (ROI).
Deletion of RB1 at 13q14.2–q14.3 in pleomorphic liposarcoma illustrates the features of ROI detection in RAE. (a) Heatmap of copy number in a small region of 13q in 24 pleomorphic liposarcomas (tumors are rows, markers are columns; color scale as indicated). (b) Extent of genomic deletion, from segmentation, in a subset of tumors with either hemizygous loss (thin) or homozygous deletion (thick). (c) Inset: the regions of the UBP at this locus (filled circles) and their summary score (D′, left axis). The combination of analytical error (error bars) and two thresholds (FDR and peak detection, green) determines the sensitivity of ROI detection. The detected peak (red plus) is merged with physically adjacent regions that fall inside its error interval (red filled circles and error bars), which define the 5′ and 3′ boundaries of the ROI (gray). Because permutation cannot resolve a p-value smaller than 1/(Np+1), the statistical significance (q-value; dotted line, right axis) of the highest summary scores cannot distinguish among them, which is why ROIs must be resolved in the space of the summary score D′. The regional and peak boundaries of the ROI (bottom; Mb) span 20 and two genes, respectively, the latter including RB1 (direction of transcription indicated). Note that the region detected as the peak contains no genes, underscoring the need to incorporate a measure of uncertainty in its score.
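The resolution floor described for panel (c) can be reproduced with the common add-one empirical p-value, which we assume here (the exact estimator is given in the paper's Methods). With N null scores, every observed score above the largest null score receives the same p-value, 1/(N+1). The function name is hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def permutation_p_values(observed: Sequence[float], null_scores: Sequence[float]) -> List[float]:
    """Add-one empirical p-value: (number of null scores >= s, plus 1) / (N + 1)."""
    null_sorted = sorted(null_scores)
    n = len(null_sorted)
    p = []
    for s in observed:
        n_ge = n - bisect_left(null_sorted, s)  # count of null scores >= s
        p.append((n_ge + 1) / (n + 1))
    return p


null = [1.0, 2.0, 3.0, 4.0]
print(permutation_p_values([2.0, 2.5, 5.0, 50.0], null))  # [0.8, 0.6, 0.2, 0.2]
```

The last two scores differ tenfold yet share the floor p-value of 0.2, that is, 1/(4+1); only the scores themselves still separate them.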
Figure 5. Statistically significant genomic alterations in pleomorphic liposarcoma.
The false discovery rate (q-value, left axis) and score (A′ and D′, right axis) for amplification (positive) and deletion (negative) on the 22 autosomes, in genomic coordinates (chromosomes indicated at bottom and in the plot by alternating colors; centromeres in red). The significance threshold (green) determines which alterations are subject to ROI detection. Maximum observed A′ and D′ scores, beyond the resolution of the permutation p-value, are given in parentheses (right axis).
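The caption reports q-values but does not name the correction procedure. Assuming the widely used Benjamini-Hochberg step-up method, permutation p-values could be converted to q-values as in this sketch (function name hypothetical).

```python
from typing import List, Sequence


def bh_qvalues(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    q = [0.0] * m
    running = 1.0
    # Walk from the largest p down; the running minimum keeps q monotone in p
    # and caps it at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


print(bh_qvalues([0.01, 0.04, 0.03, 0.2]))
```

For p-values 0.01, 0.04, 0.03 and 0.2 this gives q-values of 0.04, about 0.053, about 0.053 and 0.2.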
Alterations in the pleomorphic liposarcoma genome.
| Locus | Region (Peak) | Q-value | Number of genes | Genetic elements of interest | Known CNV: gain | Known CNV: loss | Known CNV: unknown |
|---|---|---|---|---|---|---|---|
| Amplification | | | | | | | |
| 1p31.2–p31.1 | 68313202–77461343 | 1.51E-05 | 27 (1) | | 8.8 (8.7) | 11.1 (6.1) | 6.9 |
| 5p15.33–p15.32 | 165712–5444231 | 1.86E-04 | 28 | | 53.1 (22.2) | 18.9 (5.9) | 27.3 |
| 5p15.2 | 10731281–11126282 | 4.69E-03 | 2 | | - | 38.1 (3.5) | - |
| 5p15.2–p15.1 | 11413893–15762298 | 1.02E-03 | 7 | | 3.7 (4.5) | 7.7 (6.6) | 14.1 |
| 5p13.3–p12 | 32750418–45758751 | 1.51E-04 | 64 | | 8.5 (4.1) | 4.1 (1.4) | 2.3 |
| 7p21.3–p21.1 | 8732462–20510540 | 7.78E-04 | 26 | | 21.1 (1.6) | 29.5 (1.6) | 14.5 |
| 7p15.3–p14.3 | 23715756–32063427 | 2.50E-05 | 51 | | 8.8 (4.3) | 6.7 (1.6) | 5.5 |
| 19p12–q13.11 | 24161928–40254153 | <5.22E-06 | 43 | | 7.3 (1.7) | 3.5 (3.2) | 11 |
| (peak within 19p12–q13.11) | 32690406–34776985 | 1.51E-05 | 1 | | 4.7 (3) | 16.2 (2.2) | 27.9 |
| (peak within 19p12–q13.11) | 38552698–38678914 | <5.22E-06 | 2 | | 87.6 (2.5) | - | 100 |
| 19q13.12 | 40680588–41040430 | 5.49E-03 | 21 | | - | 3.7 (1.7) | - |
| Deletion | | | | | | | |
| 1q41–q42.12 | 216302655–221524000 | 4.51E-03 | 29 | | 0.6 (1) | 0.3 (1) | 13.8 |
| 10q21.3–q22.1 | 68571354–70982955 | 2.65E-05 | 24 | | 3.8 (5.5) | 3.1 (1.6) | 7.2 |
| 10q22.1–q22.2 | 72407412–76912067 | 2.65E-05 | 46 | | 14.8 (3) | 1.4 (1) | 4 |
| 10q24.32–q24.33 | 103687999–105303029 | 8.21E-06 | 33 | | - | 1.2 (1) | - |
| 10q26.11–q26.3 | 119097873–135323432 | 5.35E-05 | 98 | | 17.8 (4.6) | 10 (5.1) | 13.9 |
| 12p13.33 | 456768–1673782 | 1.15E-05 | 9 | | 39.3 (4) | 0.7 (1) | 33.2 |
| 13q14.2 | 47917390–48154504 | <5.74E-06 | 2 | | 35.9 (1) | 35.9 (1) | - |
| 16q22.1 | 67103973–69257715 | 3.69E-03 | 32 | | 14.3 (5.2) | 31.1 (2.8) | 21.3 |
| 17p13.1 | 6791092–7741807 | <5.74E-06 | 57 | | - | - | - |
| 17q11.2 | 24079567–27921023 | 1.18E-03 | 49 | | 5.1 (1) | 4.7 (2.3) | 8.2 |
| 22q13.1–q13.31 | 39296203–46069760 | 2.02E-04 | 84 | | 12.4 (3.4) | 11.5 (3) | 17 |
Genomic boundaries detected as peaks within regions of contiguous alteration are indented.
Q-value: false discovery rate (FDR)-corrected p-value.
Number of genes: RefSeq genes (hg17); in parentheses, the number of human microRNAs.
Spanning known structural variation (CNV): percentage of the altered locus covered by known population CNVs (see Methods); in parentheses, mean sample count.
Unknown: copy-number variant of ambiguous direction.
Boundary spans multiple observed intragenic breakpoints.
TP53 is focally deleted (peak, chr17:7501467–7574417), but this peak has high analytical error because it lies on segments with low marker counts.
Marginal evidence of germline alteration in only two normal samples.
Non-genic germline signal in six normal samples spanning only a fraction of the locus, terminating prior to genic content.
Broad and focal alterations are stratified by event type. A genetic element of interest was selected from the total genic content of an alteration if it had previously observed somatic mutations in cancer (COSMIC), was a known oncogene or tumor suppressor (CGP), was implicated in pathways altered in liposarcoma, or was a novel gene of interest for further study [43], [62], [63].
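The percent genomic coverage reported in the CNV columns can be read as the share of a locus overlapped by known population CNVs. A minimal sketch, assuming half-open base-pair intervals and that overlapping CNVs are merged before measuring so that no base is counted twice (the paper's Methods may define coverage differently); the function name is hypothetical.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in base pairs


def percent_coverage(locus: Interval, cnvs: Iterable[Interval]) -> float:
    """Percent of `locus` overlapped by the union of `cnvs` (overlaps counted once)."""
    start, end = locus
    # Clip each CNV that overlaps the locus to the locus bounds; sort by start.
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in cnvs if s < end and e > start
    )
    covered = 0
    cur_s = cur_e = None
    for s, e in clipped:
        if cur_e is None or s > cur_e:
            # Disjoint from the current run: bank it and start a new run.
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            # Overlapping or touching: extend the current run.
            cur_e = max(cur_e, e)
    if cur_e is not None:
        covered += cur_e - cur_s
    return 100.0 * covered / (end - start)


print(percent_coverage((0, 100), [(10, 30), (20, 40), (90, 150), (200, 300)]))  # 40.0
```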
tumor events with scores greater than the maximum-scoring random aberration of the background model (assigning to the tumor event a p-value of 1/(N+1), where N is the number of random aberrations) (dashed curve in Figure 4). Because this affects the regions of greatest interest, we determined peaks with a simple detector of local maxima in A′ and D′, as these scores are monotonic with the p-value and maximally resolved for every region of the UBP. These peaks, especially in analyses of uncommon tumor types, are sensitive to the level of error in the system (see text). Therefore, we analytically derived a unique, symmetric error value for each region of the UBP, which determined the sensitivity of peak detection in two ways (Methods S1). First, the detector identifies zero, one, or more peaks (red plus in Figure 4) based on a shoulder sensitivity parameter, which we set to twice the median analytical error of all UBP regions in the larger event (all error bars above the peak threshold in Figure 4). Second, we assumed that two or more statistically significant, physically adjacent regions whose summary scores lie within each other's error likely do not define unique, independent events. Therefore, regions adjacent to a peak were merged with it if their summary scores fell within the peak's error bar (regions with red error bars in Figure 4).
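The two uses of analytical error described above can be sketched as follows. Inputs are the summary scores (A′ or D′) and symmetric errors of consecutive UBP regions within one significant event, plus a peak threshold. The exact local-maximum detector is specified in Methods S1; here we assume a prominence-style rule in which a local maximum counts as a peak only if, on each side, the score drops by at least the shoulder (twice the median error of regions above the threshold) before a higher score appears or the event ends. All function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def _side_resolved(scores: Sequence[float], i: int, step: int, shoulder: float) -> bool:
    """Walk away from region i; the side is resolved if the score dips at least
    `shoulder` below scores[i] before a higher score appears (or the event ends)."""
    top = scores[i]
    lowest = top
    j = i + step
    while 0 <= j < len(scores):
        if scores[j] > top:
            return top - lowest >= shoulder
        lowest = min(lowest, scores[j])
        j += step
    return True


def detect_peaks(scores: Sequence[float], errors: Sequence[float],
                 threshold: float, shoulder_factor: float = 2.0) -> List[int]:
    """Indices of peaks among regions at or above `threshold` (assumed rule)."""
    above = [i for i, s in enumerate(scores) if s >= threshold]
    if not above:
        return []
    shoulder = shoulder_factor * median(errors[i] for i in above)
    n = len(scores)
    peaks = []
    for i in above:
        s = scores[i]
        # Local maximum; on a plateau only the leftmost region qualifies.
        is_max = (i == 0 or scores[i - 1] < s) and (i == n - 1 or scores[i + 1] <= s)
        if (is_max and _side_resolved(scores, i, -1, shoulder)
                and _side_resolved(scores, i, 1, shoulder)):
            peaks.append(i)
    return peaks


def merge_within_error(scores: Sequence[float], errors: Sequence[float],
                       peak: int) -> Tuple[int, int]:
    """Extend the peak over contiguous neighbours whose scores lie within its error bar."""
    s, err = scores[peak], errors[peak]
    lo = hi = peak
    while lo > 0 and abs(scores[lo - 1] - s) <= err:
        lo -= 1
    while hi < len(scores) - 1 and abs(scores[hi + 1] - s) <= err:
        hi += 1
    return lo, hi


scores = [0.1, 0.5, 0.9, 0.85, 0.3, 0.2, 0.7, 0.1]
errors = [0.1] * len(scores)
for p in detect_peaks(scores, errors, threshold=0.4):
    print(p, merge_within_error(scores, errors, p))  # "2 (2, 3)" then "6 (6, 6)"
```

With these made-up values, region 3 lies within the error of the peak at region 2 and is merged into the same ROI, while region 6 is reported as a separate peak because the scores between them dip by more than the shoulder.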