Pedro A F Galante, Raphael B Parmigiani, Qi Zhao, Otávia L Caballero, Jorge E de Souza, Fábio C P Navarro, Alexandra L Gerber, Marisa F Nicolás, Anna Christina M Salim, Ana Paula M Silva, Lee Edsall, Sylvie Devalle, Luiz G Almeida, Zhen Ye, Samantha Kuan, Daniel G Pinheiro, Israel Tojal, Renato G Pedigoni, Rodrigo G M A de Sousa, Thiago Y K Oliveira, Marcelo G de Paula, Lucila Ohno-Machado, Ewen F Kirkness, Samuel Levy, Wilson A da Silva, Ana Tereza R Vasconcelos, Bing Ren, Marco Antonio Zago, Robert L Strausberg, Andrew J G Simpson, Sandro J de Souza, Anamaria A Camargo.
Abstract
Although patterns of somatic alterations have been reported for tumor genomes, little is known about how they compare with alterations present in non-tumor genomes. A comparison of the two would be crucial to better characterize the genetic alterations driving tumorigenesis. We sequenced the genomes of a lymphoblastoid (HCC1954BL) and a breast tumor (HCC1954) cell line derived from the same patient and compared the somatic alterations present in both. The lymphoblastoid genome presents a comparable number and similar spectrum of nucleotide substitutions to that found in the tumor genome. However, a significant difference in the ratio of non-synonymous to synonymous substitutions was observed between the two genomes (P = 0.031). Protein-protein interaction analysis revealed that mutations in the tumor genome preferentially affect hub-genes (P = 0.0017) and are co-selected to present synergistic functions (P < 0.0001). KEGG analysis showed that in the tumor genome most mutated genes were organized into signaling pathways related to tumorigenesis. No such organization or synergy was observed in the lymphoblastoid genome. Our results indicate that endogenous mutagens and replication errors can generate the overall number of mutations required to drive tumorigenesis and that it is the combination rather than the frequency of mutations that is crucial to complete tumorigenic transformation.
Year: 2011 PMID: 21493686 PMCID: PMC3152357 DOI: 10.1093/nar/gkr221
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Sequencing strategy. Outline of the sequencing strategy and bioinformatics algorithms used for the identification of point mutations and structural chromosomal rearrangements in the HCC1954 and HCC1954BL genomes.
Summary of sequence generation and mapping to the reference human genome sequence for the HCC1954 and HCC1954BL cell lines
| | HCC1954: capture sequencing | HCC1954: paired-end sequencing | HCC1954BL: capture sequencing | HCC1954BL: paired-end sequencing |
|---|---|---|---|---|
| Total number of reads | 5 996 389 | 381 274 888 | 6 265 250 | 347 891 568 |
| Mapped reads | 5 212 428 | 254 326 859 | 5 106 763 | 237 886 727 |
| Percentage of mapped reads | 86.9 | 66.7 | 81.5 | 68.4 |
| Total number of nucleotides | 3 143 589 263 | 19 392 752 128 | 3 252 428 887 | 15 693 171 704 |
| Mapped nucleotides | 2 257 027 363 | 13 432 965 012 | 2 175 120 803 | 11 166 288 816 |
| Percentage of mapped nucleotides | 71.8 | 69.3 | 66.7 | 71.1 |
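The "Percentage of mapped reads" row follows directly from the two read-count rows above it. A minimal Python sketch recomputes it from the table values; the dictionary layout and the name `mapped_pct` are illustrative choices, not part of the original analysis.

```python
# Read counts copied from the table above: (total reads, mapped reads).
reads = {
    ("HCC1954", "capture"): (5_996_389, 5_212_428),
    ("HCC1954", "paired-end"): (381_274_888, 254_326_859),
    ("HCC1954BL", "capture"): (6_265_250, 5_106_763),
    ("HCC1954BL", "paired-end"): (347_891_568, 237_886_727),
}


def mapped_pct(total, mapped):
    """Percentage of reads that mapped, rounded to one decimal place."""
    return round(100 * mapped / total, 1)


percentages = {key: mapped_pct(*value) for key, value in reads.items()}

for (cell_line, library), pct in percentages.items():
    print(f"{cell_line:10s} {library:11s} {pct:5.1f}%")
```

The same helper can be applied to the nucleotide rows by swapping in the corresponding totals.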
Somatic point mutations and structural variations in the HCC1954 and HCC1954BL genomes
| Somatic variations | HCC1954, n (%) | HCC1954BL, n (%) |
|---|---|---|
| Point mutations | 274 (100) | 173 (100) |
| Coding | 64 (23.36) | 30 (17.3) |
| Nonsense | 2 (0.73) | 3 (1.7) |
| Missense | 45 (16.42) | 15 (8.7) |
| Synonymous | 17 (6.20) | 12 (6.9) |
| Non-coding | 14 (5.11) | 15 (8.7) |
| UTR | 13 (4.74) | 13 (7.5) |
| ncRNA | 1 (0.36) | 2 (1.2) |
| miRNA | 0 (0) | 0 (0) |
| Intronic | 179 (65.33) | 114 (65.9) |
| Splice site | 0 (0) | 0 (0) |
| Other intronic | 179 (65.33) | 114 (65.9) |
| Intergenic | 17 (6.20) | 14 (8.1) |
| Structural variations | 94 (100) | 4 (100) |
| Interchromosomal | 49 (52.1) | 0 (0) |
| Intrachromosomal | 45 (47.9) | 4 (100) |
| Deletions | 30 (31.9) | 2 (50.0) |
| Inversions | 11 (11.7) | 2 (50.0) |
| Duplications | 4 (4.3) | 0 (0) |
UTR = untranslated region, ncRNA = non-coding RNA.
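The ratio of non-synonymous to synonymous substitutions can be illustrated with the coding counts in the table above. The sketch below treats nonsense plus missense mutations as non-synonymous and compares the two genomes with a two-sided Fisher's exact test. Both the choice of test and the use of these particular counts are assumptions made for illustration: this record does not state which test or which mutation set produced the reported P = 0.031, so the output should not be expected to match it.

```python
from math import comb

# Coding point-mutation counts copied from the table above.
counts = {
    "HCC1954": {"nonsense": 2, "missense": 45, "synonymous": 17},
    "HCC1954BL": {"nonsense": 3, "missense": 15, "synonymous": 12},
}


def nonsynonymous(c):
    """Non-synonymous count, taken here as nonsense + missense."""
    return c["nonsense"] + c["missense"]


def ns_ratio(c):
    """Ratio of non-synonymous to synonymous substitutions."""
    return nonsynonymous(c) / c["synonymous"]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, col1, col2, n = a + b, a + c, b + d, a + b + c + d

    def prob(x):
        return comb(col1, x) * comb(col2, row1 - x) / comb(n, row1)

    observed = prob(a)
    lowest, highest = max(0, row1 - col2), min(row1, col1)
    # Small relative tolerance so that ties survive floating-point noise.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


tumor, blood = counts["HCC1954"], counts["HCC1954BL"]
print("N/S HCC1954:  ", round(ns_ratio(tumor), 2))   # 47 / 17
print("N/S HCC1954BL:", round(ns_ratio(blood), 2))   # 18 / 12
p_value = fisher_exact_two_sided(
    nonsynonymous(tumor), tumor["synonymous"],
    nonsynonymous(blood), blood["synonymous"],
)
print("Fisher two-sided P (illustrative only):", p_value)
```

Fisher's exact test is used here because the counts are small; a chi-squared test would be another reasonable choice under the same assumptions.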
Figure 2. Circos plot representing somatic point mutations and structural variations in the (A) HCC1954 and (B) HCC1954BL genomes. Chromosome representations are shown around the outer ring and are oriented in a clockwise direction. Other tracks contain (from outside to inside) point mutations as dots (non-synonymous labeled in black and synonymous labeled in red), physical coverage of the genome by paired-end reads in green, interchromosomal rearrangements represented by colored lines linking two chromosomes (the color of each interchromosomal rearrangement is determined by the first chromosome involved, in clockwise order starting with chromosome 1), intrachromosomal deletions as blue lines, inversions as black lines and duplications as gray lines.
Single nucleotide variations identified in the HCC1954 and HCC1954BL genomes
| | HCC1954 | HCC1954BL |
|---|---|---|
| Substitutions | 82 355 (92.68) | 83 474 (93.60) |
| Coding | 11 717 (90.92) | 12 373 (93.84) |
| Intronic | 60 314 (92.53) | 61 428 (93.77) |
| UTR | 3419 (92.57) | 3570 (94.04) |
| ncRNA | 256 (96.87) | 260 (96.92) |
| Intergenic | 6649 (91.84) | 5843 (90.86) |
| Indels | 689 (52.10) | 587 (52.81) |
| Coding | 38 (50.00) | 31 (51.61) |
| Intronic | 595 (52.43) | 506 (54.15) |
| UTR | 30 (46.66) | 26 (42.30) |
| ncRNA | 1 (100.00) | 1 (0.00) |
| Intergenic | 25 (52.00) | 23 (39.13) |
UTR = untranslated region, ncRNA = non-coding RNA.
Figure 3. Spectrum of nucleotide substitutions in the HCC1954 and HCC1954BL genomes. Frequency of point mutations in each of the six possible nucleotide substitution classes (A > C|T > G, A > G|T > C, A > T|T > A, G > A|C > T, G > C|C > G, G > T|C > A) observed in the HCC1954 (blue) and HCC1954BL (orange) genomes.
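The six classes arise because a substitution and its reverse complement describe the same event read from opposite strands, so the twelve possible base changes collapse into six pairs. A minimal Python sketch of that grouping follows; the function names and the example input are hypothetical and are not taken from the paper's pipeline.

```python
from collections import Counter

# The six classes exactly as listed in the figure caption (spaces removed).
CLASSES = ["A>C|T>G", "A>G|T>C", "A>T|T>A", "G>A|C>T", "G>C|C>G", "G>T|C>A"]

# Map each of the twelve single-base changes to its class label.
CLASS_OF = {change: label for label in CLASSES for change in label.split("|")}


def substitution_class(ref, alt):
    """Return the strand-collapsed class for a ref>alt substitution."""
    change = f"{ref.upper()}>{alt.upper()}"
    if change not in CLASS_OF:
        raise ValueError(f"not a single-base substitution: {change}")
    return CLASS_OF[change]


def spectrum(substitutions):
    """Count (ref, alt) pairs per class; classes with no hits are omitted."""
    return Counter(substitution_class(ref, alt) for ref, alt in substitutions)


# Hypothetical input: three substitutions given as (reference, alternate).
example = [("C", "T"), ("G", "A"), ("A", "T")]
print(spectrum(example))
```

Dividing each count by the total number of substitutions gives the per-class frequencies plotted in such a spectrum.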
KEGG pathway analysis for genes with validated non-synonymous mutations present in the HCC1954 and HCC1954BL genomes
| KEGG ID | KEGG annotation | Number of genes in the pathway | Gene name | P-value |
|---|---|---|---|---|
| HCC1954 | | | | |
| hsa05222 | Small cell lung cancer | 3 | ITGA6 TP53 TRAF2 | 0.0003 |
| hsa05410 | Hypertrophic cardiomyopathy | 2 | ITGA6 MYH7 | 0.0167 |
| hsa04210 | Apoptosis | 2 | TP53 TRAF2 | 0.0169 |
| hsa05414 | Dilated cardiomyopathy | 2 | ITGA6 MYH7 | 0.0191 |
| hsa04010 | MAPK signaling pathway | 3 | ARRB1 TP53 TRAF2 | 0.0237 |
| hsa00770 | Pantothenate and CoA biosynthesis | 1 | DPYD | 0.0325 |
| hsa04360 | Axon guidance | 2 | CFL2 SEMA3A | 0.0335 |
| hsa04614 | Renin-angiotensin system | 1 | LNPEP | 0.0372 |
| hsa05200 | Pathways in cancer | 3 | ITGA6 TP53 TRAF2 | 0.0375 |
| HCC1954BL | | | | |
| hsa03440 | Homologous recombination | 1 | EME1 | 0.0234 |
| hsa00310 | Lysine degradation | 1 | SETD2 | 0.0382 |
| hsa04740 | Olfactory transduction | 2 | OR51E2 OR2D2 | 0.0421 |
Figure 4. Protein–protein interaction networks for mutated genes in HCC1954 (A) and HCC1954BL (B). Proteins with validated non-synonymous mutations are represented as red circles and each line represents a confident interaction. Interaction partners of mutated genes are represented in green if they interact with three mutated proteins or in light blue if they interact with two mutated proteins.
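The idea of a partner shared by several mutated proteins can be sketched in Python as follows. The edge list, the protein names (`M1`, `P1`, and so on) and the function `shared_partners` are all hypothetical placeholders; the interaction database and filtering used by the authors are not described in this record.

```python
from collections import defaultdict


def shared_partners(edges, mutated, min_mutated=2):
    """Find non-mutated proteins that interact with several mutated ones.

    edges: iterable of (protein_a, protein_b) undirected interactions.
    mutated: set of proteins carrying non-synonymous mutations.
    Returns {partner: set of mutated proteins it interacts with}.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    result = {}
    for protein, linked in neighbours.items():
        if protein in mutated:
            continue  # only non-mutated interaction partners are reported
        hits = linked & mutated
        if len(hits) >= min_mutated:
            result[protein] = hits
    return result


# Toy network: M* are mutated proteins, P* are candidate partners.
toy_edges = [
    ("M1", "P1"), ("M2", "P1"), ("M3", "P1"),
    ("M1", "P2"), ("M2", "P2"),
    ("M3", "P3"),
]
toy_mutated = {"M1", "M2", "M3"}

partners = shared_partners(toy_edges, toy_mutated)
for partner, hits in sorted(partners.items()):
    print(partner, "interacts with", len(hits), "mutated proteins:", sorted(hits))
```

In this toy network `P1` would correspond to a partner linked to three mutated proteins and `P2` to a partner linked to two, while `P3` is dropped because it touches only one.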
Protein–protein interaction analysis for genes with non-synonymous mutations in other solid tumors
| References | Tumor type | Number of genes with non-synonymous mutations | Number of mutated genes with PPI information (%) | Average number of interactions for mutated genes (P-value) | Number of mutated genes with common partner (%) (P-value) | Number of common partners (P-value) |
|---|---|---|---|---|---|---|
| Pleasance | Lung | 90 | 50 (56) | 11.6 (0.2692) | 33 (66) (0.0001) | 42 (0.0870) |
| Pleasance | Melanoma | 188 | 100 (53) | 8.3 (0.8344) | 69 (69) (0.0001) | 103 (0.3130) |
| Ding | Breast basal | 29 | 17 (59) | 8.1 (0.2210) | 7 (41) (0.0001) | 7 (0.0132) |
| Shah | Breast lobular | 32 | 16 (50) | 32.5 (0.0034) | 7 (44) (0.0001) | 28 (0.0011) |
| Clark | GBM | 110 | 40 (36) | 12.9 (0.7269) | 18 (45) (0.0001) | 13 (0.1896) |
| Galante | Breast HCC1954 | 45 | 25 (56) | 33.2 (0.0017) | 17 (68) (0.0001) | 64 (0.0001) |