Susanna L Cooke, Adam Shlien, John Marshall, Christodoulos P Pipinikas, Inigo Martincorena, Jose M C Tubio, Yilong Li, Andrew Menzies, Laura Mudie, Manasa Ramakrishna, Lucy Yates, Helen Davies, Niccolo Bolli, Graham R Bignell, Patrick S Tarpey, Sam Behjati, Serena Nik-Zainal, Elli Papaemmanuil, Vitor H Teixeira, Keiran Raine, Sarah O'Meara, Maryam S Dodoran, Jon W Teague, Adam P Butler, Christine Iacobuzio-Donahue, Thomas Santarius, Richard G Grundy, David Malkin, Mel Greaves, Nikhil Munshi, Adrienne M Flanagan, David Bowtell, Sancha Martin, Denis Larsimont, Jorge S Reis-Filho, Alex Boussioutas, Jack A Taylor, Neil D Hayes, Sam M Janes, P Andrew Futreal, Michael R Stratton, Ultan McDermott, Peter J Campbell.
Abstract
Cancer evolves by mutation, with somatic reactivation of retrotransposons being one such mutational process. Germline retrotransposition can cause processed pseudogenes, but whether this occurs somatically has not been evaluated. Here we screen sequencing data from 660 cancer samples for somatically acquired pseudogenes. We find 42 events in 17 samples, especially non-small cell lung cancer (5/27) and colorectal cancer (2/11). Genomic features mirror those of germline LINE element retrotranspositions, with frequent target-site duplications (67%), consensus TTTTAA sites at insertion points, inverted rearrangements (21%), 5' truncation (74%) and polyA tails (88%). Transcriptional consequences include expression of pseudogenes from UTRs or introns of target genes. In addition, a somatic pseudogene that integrated into the promoter and first exon of the tumour suppressor gene, MGA, abrogated expression from that allele. Thus, formation of processed pseudogenes represents a new class of mutation occurring during cancer development, with potentially diverse functional consequences depending on genomic context.
Year: 2014 PMID: 24714652 PMCID: PMC3996531 DOI: 10.1038/ncomms4644
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Somatic pseudogenes identified across 660 cancer samples
| Sample | Tissue | Template exons | Full length | Insertion site | Target-site duplication | Inversion |
|---|---|---|---|---|---|---|
| PD7354c | Lung | 19–30 | No | Unmapped | NA | No |
| PD7354h | Lung | 15–24 | No | Intergenic | 12 bp | Yes |
| PD7354h | Lung | 11–14; 44 | No | Intron 15 | 7 bp | Yes |
| PD7354h | Lung | 21–26 | No | Intergenic | 18 bp | No |
| PD7354h | Lung | 11–14 | No | Intergenic | NA | No |
| PD7354h | Lung | 1–9 | No | Unmapped | NA | No |
| PD7354k | Lung | 14–16 | No | Unmapped | NA | No |
| PD7354k | Lung | 1–11 | Yes | Unmapped | NA | No |
| PD7354k | Lung | 13–20 | No | Intergenic | None | No |
| PD7354k | Lung | 6–12 | No | Unmapped | NA | No |
| PD7354k | Lung | 5–14 | No | Unmapped | NA | No |
| PD7354k | Lung | 1–3 | Yes | Upstream | None | No |
| PD7354r | Lung | 5; 8–9 | No | Upstream | 10 bp | No |
| PD7354r | Lung | 6–14 | No | Intergenic | 9 bp | No |
| PD7354r | Lung | 1–6 | Yes | Rearrangement | None | No |
| PD7354r | Lung | 8–12 | No | Intergenic | 5 bp | No |
| PD7355a | Lung | 5–9 | No | Intron 1 | None | No |
| PD7356c | Lung | 6–9 | No | Intergenic | 14 bp | No |
| PD7356c | Lung | 28–41 | No | Intergenic | 10 bp | Yes |
| PD7356c | Lung | 5–8; 12 | No | Intron 3 | 16 bp | Yes |
| PD7356c | Lung | 1–4; 6–8 | Yes | Intron 3 | 14 bp | No |
| PD7356i | Lung | 1–5 | Yes | Intron 11 | 10 bp | No |
| PD4864b | Lung | 1–9 | No | Intergenic | NA | No |
| PD4864b | Lung | 1–8 | No | Unmapped | NA | No |
| PD4861b | Lung | 4–10 | No | Intergenic | None | No |
| PD4861b | Lung | 4–9 | No | Intron 1 | 9 bp | No |
| PD6377a | Gastric | 25–27 | No | Intron 15 | NA | Yes |
| PD6384a | Gastric | 1–11; 17 | No | Intergenic | NA | Yes |
| PD6388a | Gastric | 1–5 | Yes | Intron 1 | 15 bp | No |
| PD7261a | Colorectal | 1–14 | Yes | Intron 1 | NA | No |
| PD9061a | Colorectal | 18–29 | No | Intergenic | NA | No |
| PD6022a | Gastric | 1–6 | Yes | Unmapped | NA | No |
| PD6037a | Cholangiocarcinoma | 7–16 | No | Unmapped | NA | No |
| PD4226a | Breast | 8–10 | No | Unmapped | NA | Yes |
| PD4226a | Breast | 10–13 | No | Unmapped | NA | No |
| PD6368a | Chondrosarcoma | 1–5 | Yes | 3′ UTR | NA | No |
| LB771-HNC | Cell line (H&N) | 1–3; 4–9 | No | 3′ UTR | 17 bp | Yes |
| LB771-HNC | Cell line (H&N) | 7–14 | No | 3′ UTR | None | No |
| NCI-H2009 | Cell line (lung) | 3–8; 8 | No | Intergenic | 17 bp | Yes |
| NCI-H2009 | Cell line (lung) | 12–17 | No | Exon 1 | None | No |
| NCI-H2009 | Cell line (lung) | 1–29 | Yes | Intergenic | None | No |
| NCI-H2087 | Cell line (lung) | 1–4 | Yes | Intergenic | 16 bp | No |
Unmapped insertion sites may reflect insertion into repetitive sequence or failure of exon capture to include UTRs. H&N, head and neck carcinoma; NA, not available.
Figure 1. Somatic pseudogenes.
(a) A somatic FOPNL pseudogene in a non-small cell lung cancer. Sequencing reads from high-coverage whole-genome shotgun sequencing of the tumour reveal a series of split reads (red) crossing the four canonical exon–exon splice junctions in the gene. In addition, read pairs map to adjacent exons with an insert size larger than expected (light brown). At either end of the gene, read pairs linking to chr7 could be identified, revealing that the FOPNL pseudogene is inserted into intron 11 of the SND1 gene in the opposite orientation with an intact polyA tail and a target-site duplication of 10 bp. (b) A somatic ARHGEF9 pseudogene in a non-small cell lung cancer. The insertion was confirmed as somatic by PCR (Supplementary Fig. 2) and capillary sequencing across an exon–exon junction and insertion site.
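The split-read evidence in panel (a) rests on a simple idea: a read from an intronless (processed) copy of a gene contains the end of one exon immediately followed by the start of the next, a sequence that never occurs at the genomic locus. The toy sketch below illustrates only that idea. It assumes the exon sequences are already available as plain strings and uses exact string matching over a fixed `flank` on both strands; the function names, `flank` and the toy sequences are hypothetical, and this is not the authors' detection pipeline, which worked from aligned whole-genome and exome reads.

```python
from typing import Dict, List, Tuple

# Hypothetical helper: complement table for reverse-complementing reads.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_sequences(exons: List[str], flank: int) -> Dict[Tuple[int, int], str]:
    """Sequence expected across each junction of consecutive exons.

    Key (i, i + 1), exons numbered from 1, maps to the last `flank` bases of
    exon i joined directly to the first `flank` bases of exon i + 1.
    """
    junctions = {}
    for i in range(len(exons) - 1):
        junctions[(i + 1, i + 2)] = exons[i][-flank:] + exons[i + 1][:flank]
    return junctions


def count_junction_reads(reads: List[str], exons: List[str],
                         flank: int = 8) -> Dict[Tuple[int, int], int]:
    """Count reads containing an exact exon-exon junction on either strand.

    A read spanning a junction implies the intron was spliced out, as in a
    processed pseudogene. Sequencing errors and mismatches are ignored.
    """
    junctions = junction_sequences(exons, flank)
    counts = {key: 0 for key in junctions}
    for read in reads:
        rc_read = reverse_complement(read)
        for key, seq in junctions.items():
            if seq in read or seq in rc_read:
                counts[key] += 1
    return counts


if __name__ == "__main__":
    exons = ["ATGGCCAAGTTT", "GGATCCAGTACG", "TTGACCGGTAAA"]
    reads = [
        "CAAGTTTGGATCC",  # spans exon 1-2 junction, forward strand
        "GGTCAACGTACT",   # spans exon 2-3 junction, reverse strand
        "CAAGTTTGTAAGT",  # exon 1 running into intron: no junction
    ]
    print(count_junction_reads(reads, exons, flank=4))  # {(1, 2): 1, (2, 3): 1}
```

Reads that run from an exon into intronic sequence (the third toy read) are not counted, which is why junction-spanning reads distinguish a processed copy from the parent gene.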
Figure 2. Properties of somatic pseudogenes.
(a) Histogram showing the fraction of somatic pseudogenes with particular features. (b) Sequences of target-site duplications (between square brackets) and adjacent genomic regions, showing that the polyA tail of the somatic pseudogene inserts at the consensus TTTTAA sequence, between the TTTT and the AA. Target-site deletions were also occasionally seen (deleted sequence between round brackets). (c) An example of an internal inversion in a somatic pseudogene inserted into intergenic sequence. The insertion was confirmed as somatic by PCR. (d) Phylogenetic trees for four patients in whom multiple samples were sequenced, showing at which stage during the evolution of the cancer somatic pseudogenes were acquired.
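Panel (b) describes two properties of an insertion site: whether the cut falls between TTTT and AA (equivalently TT/AAAA on the opposite strand), and whether the reference sequence between the two breakpoints is duplicated or deleted. The sketch below expresses both checks on a toy reference string. It assumes breakpoints are given as 0-based reference coordinates: `left_end` is where sequence to the left of the insertion stops matching the reference, and `right_start` is where sequence to the right resumes. These names and the toy sequences are hypothetical. Which breakpoint corresponds to the endonuclease nick depends on insertion orientation, so the consensus check simply tests a coordinate supplied by the caller.

```python
from typing import Tuple


def matches_endonuclease_consensus(ref: str, cut: int) -> bool:
    """True if `cut` lies between TTTT and AA (TT/AAAA on the other strand)."""
    forward = (4 <= cut <= len(ref) - 2
               and ref[cut - 4:cut] == "TTTT" and ref[cut:cut + 2] == "AA")
    reverse = (2 <= cut <= len(ref) - 4
               and ref[cut - 2:cut] == "TT" and ref[cut:cut + 4] == "AAAA")
    return forward or reverse


def target_site(ref: str, left_end: int, right_start: int) -> Tuple[str, str]:
    """Classify the target site from the two insertion breakpoints.

    If the right flank resumes before the left flank ended, the overlapping
    reference sequence appears on both sides of the insertion (duplication);
    if it resumes after, the intervening sequence is lost (deletion).
    """
    if right_start < left_end:
        return "duplication", ref[right_start:left_end]
    if right_start > left_end:
        return "deletion", ref[left_end:right_start]
    return "none", ""


def tumour_allele(ref: str, left_end: int, right_start: int, insert: str) -> str:
    """Reconstruct the tumour sequence around a (toy) inserted pseudogene."""
    return ref[:left_end] + insert + ref[right_start:]


if __name__ == "__main__":
    ref = "CCGATTTTAAGTCACGGAT"
    print(matches_endonuclease_consensus(ref, 8))  # True: TTTT|AA at 8
    print(target_site(ref, 14, 8))                 # ('duplication', 'AAGTCA')
    print(tumour_allele(ref, 14, 8, "XXX"))        # CCGATTTTAAGTCAXXXAAGTCACGGAT
```

In the reconstructed toy allele, the duplicated `AAGTCA` flanks the insert on both sides, which is the pattern shown between square brackets in panel (b).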
Figure 3. Tissue-specific patterns of somatic pseudogenes.
(a) Expression of template genes for somatic pseudogenes (individual points) compared with all genes for the most frequently affected organ sites. The violin plots for all genes show the median (white point), interquartile range (thick black line) and 1.5 × the interquartile range (thin black line). The coloured shapes are kernel density plots of the distribution of expression levels for all genes. Because expression levels are not normally distributed, we used Wilcoxon rank-sum tests to test whether expression of template genes for somatic pseudogenes differed from that expected. (b) RNA-sequencing data showing expression of the MLL-KRT6A fusion gene. (c) Deletion of the promoter and first exon of MGA during somatic pseudogene insertion, leading to abrogation of expression from that allele.
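Panel (a) uses a Wilcoxon rank-sum test, which compares two groups through the ranks of their pooled values rather than the values themselves, so it does not assume normality. The standard-library sketch below shows what that test computes; it is an illustration, not the authors' analysis code. It assumes the large-sample normal approximation, assigns tied values their average rank, and omits the tie correction to the variance. The function names are hypothetical; in practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` (the equivalent Mann–Whitney U test) would normally be used.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (U, p), where U is the Mann-Whitney statistic for sample x.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # rank sum of x minus its minimum
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


if __name__ == "__main__":
    template_expr = [1.0, 2.0, 3.0]  # toy values, not data from this study
    all_genes_expr = [4.0, 5.0, 6.0]
    print(rank_sum_test(template_expr, all_genes_expr))
```

For two completely separated groups of three, U is 0 and the approximate two-sided p-value is close to 0.05; exact small-sample p-values differ, which is one reason library implementations offer an exact method.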