Li-Fang Kang, Zheng-Lin Zhu, Qian Zhao, Li-Yong Chen, Ze Zhang.
Abstract
BACKGROUND: Retrogenes generally do not contain introns. However, in some instances, retrogenes may recruit internal exonic sequences as introns, a process known as intronization. A retrogene that undergoes intronization is a good model with which to investigate the origin of introns. Nevertheless, only two cases in vertebrates have previously been reported.
Year: 2012 PMID: 22839428 PMCID: PMC3565874 DOI: 10.1186/1471-2148-12-128
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Nine human retrogenes that gained introns investigated in this study
| Retrogene | Parental gene | Movement | Intron (+) | Intron (−) | Evidence |
|---|---|---|---|---|---|
| TMEM14D | TMEM14B | 10 ← 6 | 1 | 4 | A |
| RPS3AP5 | RPS3A | 10 ← 4 | 1 | 5 | B |
| XXyac-R12DG2.2 | RCN1 | 13 ← 11 | 2* | 5 | B |
| HSP90B2P | HSP90B1 | 15 ← 12 | 2 | 16 | B |
| HSP90AA4P | HSP90AA1 | 4 ← 14 | 3 | 9 | A,B |
| HSP90AA5P | HSP90AA1 | 3 ← 14 | 2 | 7 | B |
| CSMD3 | RPL18 | 8 ← 19 | 1 | 5 | B |
| WBP2NL | SLC25A5 | 22 ← X | 1 | 3 | B |
| AC019016.1 | CSNK1A1 | 15 ← 5 | 2* | 8 | B |
In the ‘Movement’ column, ‘10 ← 6’, for example, means that a new gene on chromosome 10 was retroposed from a gene on chromosome 6. ‘Intron (−)’ and ‘Intron (+)’ are the numbers of intron losses and intron gains in retrocopies, respectively. For ‘Evidence’: ‘A’, confirmed by RT-PCR; ‘B’, supported by convincing transcription evidence. ‘*’ means that the newly evolved intronic regions of XXyac-R12DG2.2 and AC019016.1 can each be spliced in two patterns.
Figure 1. Mechanisms of intron gain in retrogenes. In the parental gene, rectangles represent exons, ‘H’-like tags represent introns, the retroposed regions are indicated in purple, and other regions are indicated in blue. In the retrogene, the retroposed region is indicated in purple and the newly evolved intronic regions are indicated in yellow. Semi-rectangle lines with arrows indicate the direction of transcription. (A) The retrogene RPS3AP5 gained an intron by insertion of an external sequence; (B) the retrogene XXyac-R12DG2.2 evolved a new intron after transcription in the opposite orientation compared to the parent; (C) in retrogene HSP90AA4P three new introns were generated by intronization. There is no mutation at the splice sites in the two introns near the 5′ terminus, whereas one transition from ‘A’ to ‘G’ (indicated in red) at the splice sites occurred in the intron near the 3′ terminus.
Peptide support for intronized retrogenes
| HSP90B2P | NLNFVKGVVDSGGLSLNVSCETLQQHK | PRIDE: 8670 | 86 | Self (4e-19, 100 %) |
| | IEKAMVSQCLTESLCALVASQYGWSGNMER | PRIDE: 8670 | 270 | Self (4e-24, 100 %) |
| | AMVSQCLTESLCALVASQYGWSGNMER | PRIDE: 8671; 8668 | 273 | Self (7e-21, 100 %) |
| | MAETIQEVEDEYKAFCK | PRIDE: 8672 | 1 | Self (9e-11, 100 %) |
| | CVFITDDFRDTMPK | PRIDE: 8669 | 72 | Self (7e-08, 100 %) |
| HSP90AA4P | HNNDEQYAWESSLR | PeptideAtlas: PAp00393519 | 93 | Self (1e-07, 100 %) |
| | ADLINNLGTITK | PeptideAtlas: PAp01587648 | 20 | Self (8e-04, 100 %) |
| | DQVANSTIVQR | PeptideAtlas: PAp00565957 | 207 | Self (0.005, 100 %) |
| HSP90AA5P | IKEIVKKHSQFIGYPITLFVEKKR | PeptideAtlas: PAp00040955; PAp00423980 | 33 | Self (2e-17, 100 %) |
| | HGLEVIYMIELIDKYCVQQLK | PeptideAtlas: PAp00040711 | 199 | Self (2e-15, 100 %) |
‘a’, Database name and experiment numbers or identifiers. ‘b’, BLASTP search against the GenBank non-redundant protein database (e-value and maximum identity of the match are shown in parentheses [32,33]).
Figure 2. Comparison of splicing signals of retrogene introns (after retroposition) and their corresponding regions in the parental gene (before retroposition). The y-axis is the sum of the percentile scores of four different signals, comprising the branch site (BS), polypyrimidine tract (PPT), and 5′ and 3′ splice sites (Table 3). The higher the score, the stronger the splicing signal. Scores for BS and PPT were calculated with algorithm ‘K’ [47] and those for the splice sites were calculated with ‘M’ [48] by SROOGLE [41]. Six different retrogene introns are plotted on the x-axis. If a retrogene evolved only one intron (TMEM14D), we used the gene name to represent the intron; otherwise, we labeled each intron with the gene name followed by a hyphen and a serial number. For example, ‘HSP90AA4P-1’ represents the first intron (in the direction from 5′ to 3′) in the retrogene HSP90AA4P.
Percentile scores [41] of splicing signals of retrogene introns (after retroposition) and their corresponding regions in the parental gene (before retroposition)
| TMEM14D | GC-AG | 0.14 | 0.06 | 0.02 | 0.02 | 0.14 | 0.04 | 0.01 | 0.11 |
| (HSP90B2P-1) | GT-AG | 0.39 | 0.39 | 0 | 0.01 | 0.39 | 0.24 | 0 | 0 |
| HSP90B2P-2 | GT-AG | 0.5 | 0.03 | 0 | 0.01 | 0.5 | 0.02 | 0 | 0.02 |
| HSP90AA4P-1 | GT-AG | 0.21 | 0.03 | 0.04 | 0.12 | 0 | 0 | 0.04 | 0 |
| HSP90AA4P-2 | GT-AG | 0.03 | 0.06 | 0.04 | 0.28 | 0.03 | 0.06 | 0.04 | 0.25 |
| (HSP90AA4P-3) | GT-AG | 0.56 | 0.33 | 0 | 0.03 | 0.56 | 0.22 | 0 | 0.02 |
| (HSP90AA5P-1) | TT-AG | 0.25 | 0.27 | 0 | 0.54 | 0 | 0 | 0 | 0 |
| HSP90AA5P-2 | GT-AG | 0.45 | 0.45 | 0.01 | 0.15 | 0.12 | 0.15 | 0.01 | 0.21 |
| AC019016.1-1 | GT-AG | 0.91 | 0.35 | 0.11 | 0.84 | 0.61 | 0.35 | 0.11 | 0.82 |
| (AC019016.1-2) | GT-AG | 0.91 | 0.35 | 0.47 | 0.84 | 0.61 | 0.35 | 0 | 0.82 |
The higher the score, the stronger the splicing signal. The scores for BS and PPT were calculated with algorithm ‘K’ [47], and those for 5′ SS and 3′ SS were calculated with ‘M’ [48] by SROOGLE [41]. Each intron symbol consists of the gene name followed by a hyphen and a serial number. For example, ‘HSP90B2P-1’ indicates the first intron (in the direction from 5′ to 3′) in HSP90B2P. If a retrogene evolved only one intron (TMEM14D), the intron is represented by the gene name. In the column ‘Intron symbol’, parentheses indicate that the splice sites underwent base substitution.
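The summed score plotted in Figure 2 can be reproduced from the table rows with a few lines of Python. This is only an illustrative sketch: the function name `summed_signals` is hypothetical, and it assumes that the eight numeric columns of each row form two groups of four scores (one group per sequence being compared). The order of the signals within a group, and which group belongs to the retrogene and which to the parental gene, is not stated in this extract, so the code makes no claim about either.

```python
# Rows copied from the percentile-score table above: (intron symbol, eight scores).
# ASSUMPTION: the eight values are two groups of four signal scores each.
ROWS = [
    ("TMEM14D", [0.14, 0.06, 0.02, 0.02, 0.14, 0.04, 0.01, 0.11]),
    ("HSP90AA4P-1", [0.21, 0.03, 0.04, 0.12, 0.0, 0.0, 0.04, 0.0]),
    ("AC019016.1-1", [0.91, 0.35, 0.11, 0.84, 0.61, 0.35, 0.11, 0.82]),
]


def summed_signals(scores):
    """Return the sum of the first four and of the last four scores.

    Sums are rounded to two decimals, matching the precision of the table.
    """
    if len(scores) != 8:
        raise ValueError("expected eight percentile scores per intron")
    first_group = round(sum(scores[:4]), 2)
    second_group = round(sum(scores[4:]), 2)
    return first_group, second_group


for symbol, scores in ROWS:
    first_group, second_group = summed_signals(scores)
    print(f"{symbol}: first group = {first_group}, second group = {second_group}")
```

A larger sum corresponds to a stronger overall splicing signal, so comparing the two sums per intron mirrors the paired comparison that the figure describes.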
Substitution rates between the intronic and exonic regions of retrogenes and their corresponding regions of parental genes
| TMEM14D (c) | 0.062 | 0.058 | 1.074 | 0.936 | 105 | 0.006 | 0.014 | 0.440 | 0.570 | 237 |
| RPS3AP5 (a) | NA | NA | NA | NA | NA | 0.017 | 0.014 | 1.210 | 0.172 | 780 |
| XXyac-R12DG2.2 (b) | 0.024 | 0.029 | 0.830 | 0.892 | 129 | 0.008 | 0.012 | 0.643 | 0.631 | 813 |
| HSP90B2P (a, c, *) | 0.823 | 0.597 | 1.379 | 0.526 | 144 | 0.045 | 0.067 | 0.678 | 0.091 | 2163 |
| HSP90AA4P (c, *) | 0.104 | 0.277 | 0.374 | 0.000 | 744 | 0.055 | 0.085 | 0.656 | 0.000 | 1374 |
| HSP90AA5P (c, *) | 0.087 | 0.215 | 0.406 | 0.001 | 672 | 0.088 | 0.221 | 0.400 | 0.082 | 897 |
| CSMD3 (b) | 0.313 | 0.575 | 0.544 | 0.051 | 291 | 0.186 | 0.282 | 0.659 | 0.310 | 225 |
| WBP2NL (b) | 0.033 | 0.088 | 0.373 | 0.192 | 177 | 0.385 | 0.377 | 1.021 | 0.919 | 684 |
| AC019016.1 (c) | 0.083 | 0.082 | 1.010 | 0.978 | 636 | 0.081 | 0.175 | 0.466 | 0.084 | 273 |
Ka represents the non-synonymous substitution rate and Ks the synonymous substitution rate. The P-value was calculated with the likelihood ratio test under the null hypothesis Ka/Ks = 1. NA: not available (the corresponding parental sequence of the new intron in retrogene RPS3AP5 did not exist, because the intron was created by insertion of an external sequence). ‘a’, The retrogene gained introns by insertion of an external sequence. ‘b’, The retrogene gained introns after transcription in the opposite orientation compared to the parent. ‘c’, The retrogene gained introns by intronization. ‘*’, Evidence at the protein level for transcription of the retrogene was obtained.
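To make the test concrete, the sketch below shows how a likelihood ratio test against Ka/Ks = 1 is commonly evaluated. It is not the authors' pipeline: this section does not name the software used, the function names are hypothetical, the log-likelihood values are invented for illustration, and one degree of freedom is assumed (the alternative model frees only Ka/Ks). The ratio is computed from the rounded Ka and Ks values in the table, so it may differ from the reported ratio in the last digit.

```python
import math


def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def lrt_p_value(lnl_alternative, lnl_null):
    """P-value of a likelihood ratio test with one degree of freedom.

    lnl_alternative: log-likelihood with Ka/Ks estimated freely.
    lnl_null: log-likelihood with Ka/Ks fixed at 1.
    ASSUMPTION: the two models differ by exactly one free parameter.
    """
    statistic = max(2.0 * (lnl_alternative - lnl_null), 0.0)
    # For a chi-square distribution with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# First Ka and Ks values in the HSP90B2P row of the table above.
ratio = ka_ks_ratio(0.823, 0.597)
print(f"Ka/Ks = {ratio:.3f}")

# HYPOTHETICAL log-likelihoods, chosen only to illustrate the calculation.
p_value = lrt_p_value(-1000.0, -1001.92)
print(f"P = {p_value:.3f}")
```

A small P-value rejects Ka/Ks = 1, so rows such as HSP90AA4P (P = 0.000 with a ratio well below 1) depart significantly from neutral expectation, whereas rows with P close to 1 do not.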