Silvio Bicciato, Roberta Spinelli, Mattia Zampieri, Eleonora Mangano, Francesco Ferrari, Luca Beltrame, Ingrid Cifola, Clelia Peano, Aldo Solari, Cristina Battaglia.
Abstract
The integration of high-throughput genomic data represents an opportunity for deciphering the interplay between structural and functional organization of genomes and for discovering novel biomarkers. However, the development of integrative approaches to complement gene expression (GE) data with other types of gene information, such as copy number (CN) and chromosomal localization, still represents a computational challenge in the genomic arena. This work presents a computational procedure that directly integrates CN and GE profiles at genome-wide level. When applied to DNA/RNA paired data, this approach leads to the identification of Significant Overlaps of Differentially Expressed and Genomic Imbalanced Regions (SODEGIR). This goal is accomplished in three steps. The first step extends to CN a method for detecting regional imbalances in GE. The second step integrates CN and GE data and identifies chromosomal regions with concordantly altered genomic and transcriptional status in a tumor sample. The last step extends the single-sample analysis to an entire dataset of tumor specimens. When applied to study chromosomal aberrations in a collection of astrocytoma and renal carcinoma samples, the procedure proved to be effective in identifying discrete chromosomal regions of coordinated CN alterations and changes in transcriptional levels.
Year: 2009 PMID: 19542187 PMCID: PMC2731905 DOI: 10.1093/nar/gkp520
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Workflow of the SODEGIR procedure: (1) statistical estimation of CN and transcriptional scores at common genomic positions; (2) identification of significant overlaps of differentially expressed and genomic imbalanced regions (SODEGIR) on a single-sample basis; and (3) aggregation of SODEGIRs from different samples to obtain global signatures of tumor types.
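Step (2) of the workflow can be illustrated with a minimal sketch. It assumes that each gene position already carries a discrete CN status and a discrete GE status, coded 1 for loss/down-regulation and 3 for gain/up-regulation as in the Figure 2 legend; the code 2 for an unaltered position, the function names and the toy data are assumptions made only for this illustration. The sketch keeps positions where both statuses agree and are altered, then reports contiguous runs. It does not reproduce the statistical estimation of scores or the significance testing of the published procedure.

```python
from itertools import groupby

NEUTRAL = 2  # assumed code for an unaltered position (not stated in the source)

def concordant_status(cn_status, ge_status):
    """Keep a status only where CN and GE agree and are both altered."""
    return [cn if cn == ge and cn != NEUTRAL else NEUTRAL
            for cn, ge in zip(cn_status, ge_status)]

def altered_runs(status):
    """Return (first_index, last_index, status) for each contiguous altered run."""
    runs = []
    index = 0
    for value, group in groupby(status):
        length = len(list(group))
        if value != NEUTRAL:
            runs.append((index, index + length - 1, value))
        index += length
    return runs

# Toy per-gene statuses along one chromosome (hypothetical data)
cn = [2, 1, 1, 1, 2, 3, 3, 2]
ge = [2, 1, 1, 2, 2, 3, 3, 3]

overlap = concordant_status(cn, ge)
regions = altered_runs(overlap)
print(overlap)
print(regions)
```

In this toy example one deleted run (status 1) and one amplified run (status 3) survive, because only those positions show the same alteration in both data types.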
Impact of CN, GE and SODEGIR regions in terms of total megabases (Mb) and percentage of the genome (%) for the various tumor samples
| Sample ID | CN (Mb) | CN (%) | GE (Mb) | GE (%) | SODEGIR (Mb) | SODEGIR (%) |
|---|---|---|---|---|---|---|
| Caki_100K | 469.884 | 16 | 567.536 | 19 | 166.246 | 6 |
| Caki_250K | 488.047 | 16 | 567.536 | 19 | 172.702 | 6 |
| HF0017 | 190.089 | 6 | 67.354 | 2 | 30.451 | 1 |
| HF0108 | 318.065 | 11 | 164.59 | 5 | 89.831 | 3 |
| HF0152 | 185.689 | 6 | 218.398 | 7 | 44.591 | 1 |
| HF0491 | 289.853 | 10 | 184.265 | 6 | 11.172 | 0 |
| HF0608 | 180.63 | 6 | 356.129 | 12 | 42.516 | 1 |
| HF1139 | 302.704 | 10 | 349.615 | 12 | 228.719 | 8 |
| HF1232 | 380.549 | 13 | 301.414 | 10 | 217.409 | 7 |
| HF1269 | 459.684 | 15 | 432.209 | 14 | 140.952 | 5 |
| HF1344 | 382.209 | 13 | 393.79 | 13 | 262.443 | 9 |
| HF1442 | 247.563 | 8 | 176.247 | 6 | 100.145 | 3 |
| HF1469 | 100.576 | 3 | 67.683 | 2 | 1.784 | 0 |
| HF1511 | 255.157 | 9 | 39.215 | 1 | 13.809 | 0 |
| 27CG | 452.867 | 15 | 477.841 | 16 | 213.958 | 7 |
| 28RA | 443.696 | 15 | 506.478 | 17 | 291.786 | 10 |
| 33BV | 509.906 | 17 | 171.066 | 6 | 150.24 | 5 |
| 36MML | 538.907 | 18 | 232.542 | 8 | 141.616 | 5 |
| 37BA | 295.591 | 10 | 375.085 | 13 | 36.954 | 1 |
| 40RR | 208.409 | 7 | 52.964 | 2 | 14.09 | 0 |
| 45DM | 177.191 | 6 | 399.721 | 13 | 91.845 | 3 |
| 46SA | 551.945 | 18 | 582.158 | 19 | 194.366 | 6 |
| 47CA | 380.862 | 13 | 394.464 | 13 | 169.46 | 6 |
| 49CA | 472.772 | 16 | 336.733 | 11 | 242.828 | 8 |
| 50PC | 341.269 | 11 | 410.785 | 14 | 159.763 | 5 |
| 51MI | 416.003 | 14 | 456.929 | 15 | 72.311 | 2 |
Values have been derived from the .SDG_Table files of each sample, as deposited in CWS.
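The relation between the Mb and % columns can be checked with a short sketch. The genome size used as denominator is not stated here; the value of 3000 Mb below is an assumption chosen because it is consistent with the tabulated percentages, and the function name is hypothetical.

```python
GENOME_MB = 3000.0  # assumed genome size in Mb (not given in the source)

def genome_percent(region_mb, genome_mb=GENOME_MB):
    """Share of the genome covered by altered regions, as a rounded percentage."""
    return round(100.0 * region_mb / genome_mb)

# Values for sample Caki_100K, taken from the table above
caki_100k = {"CN": 469.884, "GE": 567.536, "SODEGIR": 166.246}
percentages = {name: genome_percent(mb) for name, mb in caki_100k.items()}
print(percentages)
```

Under this assumption the three values for Caki_100K round to the 16, 19 and 6 reported in the table.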
Figure 2. Visualization of SODEGIR results for the analysis of the Caki-1 single sample using the 100K mapping array. (A) Genome view: regions of CN gain/loss, GE up-/down-regulation, and deleted (CN loss and GE down-regulation) and amplified (CN gain and GE up-regulation) SODEGIRs are shown as boxes on each chromosome. As in the cPlot view of the R geneplotter package, horizontal lines represent chromosomes and grey bars indicate gene positions. Three lines per chromosome and shades of red and green are used to display CN gain/loss, GE up-/down-regulation, and amplified/deleted SODEGIRs. (B) Chromosome view of chromosome 3: CN status (N_AB) and LOH status as estimated by the CNAG HMM on each SNP probe, and CN, GE and SODEGIR statuses as determined by the SODEGIR procedure on gene positions for a given chromosome in a single sample. The grey bars indicate SNP probes (in the N_AB and LOH lanes) or Entrez Gene ID positions (in the CN, GE and SODEGIR lanes). Red and green bars in the N_AB lane indicate N_AB >3 and <1, respectively. Blue bars in the LOH lane highlight SNP probes with an inferred LOH value >20. Green bars in the CN, GE and SODEGIR lanes indicate loss, down-regulation, or deletion (i.e. a status of 1). Red bars in the CN, GE and SODEGIR lanes indicate gain, up-regulation, or amplification (i.e. a status of 3). (C) Genome box plot: distribution of gene GE scores according to gene CN scores on all SODEGIRs of the entire genome. CN levels are categorized into five bins highlighting two ranges of loss (green boxes, gene CN score <−0.1), one range of diploidy (white box, gene CN score between −0.1 and 0.1) and two ranges of gain (red boxes, gene CN score >0.1). (D) Chromosome box plot for chromosome 3: distribution of GE scores according to gene CN scores on all the SODEGIRs of a specific chromosome (e.g. chromosome 3).
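The five-bin grouping behind the box plots in panels (C) and (D) can be sketched as follows. The caption fixes only the ±0.1 boundaries of the diploid range; the outer cut point that splits loss and gain into two ranges each is not given, so the value 0.5 below, the bin labels and the toy scores are assumptions for illustration.

```python
from statistics import median

DIPLOID_CUT = 0.1  # from the caption: diploid if the CN score is within [-0.1, 0.1]
OUTER_CUT = 0.5    # hypothetical boundary between the two loss (or gain) ranges

def cn_bin(cn_score):
    """Assign a gene CN score to one of five bins."""
    if cn_score < -OUTER_CUT:
        return "strong loss"
    if cn_score < -DIPLOID_CUT:
        return "loss"
    if cn_score <= DIPLOID_CUT:
        return "diploid"
    if cn_score <= OUTER_CUT:
        return "gain"
    return "strong gain"

def ge_by_cn_bin(genes):
    """Group GE scores by CN bin; genes is a list of (cn_score, ge_score) pairs."""
    grouped = {}
    for cn_score, ge_score in genes:
        grouped.setdefault(cn_bin(cn_score), []).append(ge_score)
    return grouped

# Toy (CN score, GE score) pairs for genes inside SODEGIRs (hypothetical data)
genes = [(-0.8, -1.2), (-0.3, -0.4), (0.0, 0.1), (0.05, -0.1), (0.3, 0.5), (0.9, 1.4)]
grouped = ge_by_cn_bin(genes)
medians = {label: median(scores) for label, scores in grouped.items()}
print(medians)
```

Each entry of `grouped` corresponds to the data behind one box; a coordinated CN/GE alteration shows up as GE medians that decrease in the loss bins and increase in the gain bins.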
Figure 3. Visualization of SODEGIR results for the analysis of an Astro single sample (e.g. HT1139). (A) Genome view and (B) genome box plot.
Figure 4. Visualization of SODEGIR results for the analysis of an RCC single sample (e.g. 50PC). (A) Genome view and (B) genome box plot.
Figure 5. Chromosome views. (A) and (B) show deleted SODEGIRs on chromosome 10 of an Astro sample (e.g. HT1139) and on chromosome 3 of an RCC sample (i.e. 50PC); (C) and (D) report amplified SODEGIRs for chromosomes 7 and 5 of an Astro (HT1139) and an RCC (50PC) sample, respectively.
Figure 6. Results of the aggregation of SODEGIRs in the analysis of the Astro dataset. (A) q_plot: the statistical significance for the aggregation of amplifications/deletions is displayed as q-value. Chromosome positions are indicated along the y-axis, with the centromere positions identified by yellow dotted lines. Amplifications (red lines) and deletions (green lines) that are shared by a statistically relevant number of samples surpass the significance threshold (blue dotted line, q-value ≤0.05). (B) SDG chromosome view for chromosome 10 in all astrocytoma samples. (C) SDG chromosome view for chromosome 7 in all astrocytoma samples.
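Reading regions off a q_plot amounts to collecting the stretches of consecutive positions whose q-value meets the 0.05 threshold. The sketch below assumes the q-values have already been computed per genomic position; how they are obtained is not described here, and the function name and the toy values are hypothetical.

```python
Q_THRESHOLD = 0.05  # significance threshold shown in the q_plot

def significant_regions(positions_mb, q_values, threshold=Q_THRESHOLD):
    """Return (start_mb, end_mb) for each run of consecutive positions with q <= threshold."""
    regions = []
    start = None
    last = None
    for position, q in zip(positions_mb, q_values):
        if q <= threshold:
            if start is None:
                start = position  # open a new region
            last = position
        elif start is not None:
            regions.append((start, last))  # close the current region
            start = None
    if start is not None:
        regions.append((start, last))
    return regions

# Toy positions (Mb) and q-values along one chromosome (hypothetical data)
positions = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
q_values = [0.30, 0.04, 0.01, 0.20, 0.05, 0.60]
regions = significant_regions(positions, q_values)
print(regions)
```

Running the same function separately on the amplification and deletion q-values would give the two kinds of regions summarized in a signature table.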
Summary of the SODEGIR signatures for the astrocytoma and renal carcinoma datasets
| Chr | Cytoband | Start (Mb) | End (Mb) | Length (Mb) | No. of genes | Relevant cancer genes |
|---|---|---|---|---|---|---|
| 7 | p22.1–q11.22 | 4.7 | 70.9 | 66.2 | 244 | |
| 7 | q21.13–q35 | 89.6 | 144.0 | 54.4 | 344 | |
| 7 | q36.1–q36.1 | 149.0 | 151.0 | 2.0 | 32 | |
| 10 | 10p15.1–10p12.2 | 4.9 | 23.6 | 18.8 | 82 | |
| 10 | 10q21.3–10q26.3 | 70.2 | 132.0 | 61.8 | 387 | |
| 5 | 5q21.1–5q21.2 | 99.9 | 103.0 | 3.1 | 9 | - |
| 5 | 5q21.3–5q35.3 | 108.0 | 179.0 | 71.0 | 419 | |
| 3 | 3p22.3–3p14.1 | 35.7 | 65.3 | 29.6 | 279 | |
Amplified and deleted SODEGIRs are described in terms of cytoband, chromosomal region, and total number of annotated genes. Values have been derived from the .SDGset_Table files of the datasets, as deposited in CWS.
Figure 7. Results of the aggregation of SODEGIRs in the analysis of the RCCp dataset. (A) q_plot: the statistical significance for the aggregation of amplifications/deletions is displayed as q-value. Chromosome positions are indicated along the y-axis, with the centromere positions identified by yellow dotted lines. Amplifications (red lines) and deletions (green lines) that are shared by a statistically relevant number of samples surpass the significance threshold (blue dotted line, q-value ≤0.05). (B) SDG chromosome view for chromosome 5 in all RCC samples. (C) SDG chromosome view for chromosome 3 in all RCC samples.