Ruth B McCole, Noeleen B Loughran, Mandeep Chahal, Luis P Fernandes, Roland G Roberts, Franca Fraternali, Mary J O'Connell, Rebecca J Oakey.
Abstract
Retroposition is a widespread phenomenon resulting in the generation of new genes that are initially related to a parent gene via very high coding sequence similarity. We examine the evolutionary fate of four retrogenes generated by such an event: mouse Inpp5f_v2, Mcts2, Nap1l5, and U2af1-rs1. These genes are all subject to the epigenetic phenomenon of parental imprinting. We first provide new data on the age of these retrogene insertions. Using codon-based models of sequence evolution, we show these retrogenes have diverse evolutionary trajectories, including divergence from the parent coding sequence under positive selection pressure, purifying selection pressure maintaining parent-retrogene similarity, and neutral evolution. Examination of the expression pattern of retrogenes shows an atypical, broad pattern across multiple tissues. Protein 3D structure modeling reveals that a positively selected residue in U2af1-rs1, not shared by its parent, may influence protein conformation. Our case-by-case analysis of the evolution of four imprinted retrogenes reveals that this interesting class of imprinted genes, while similar in regulation and sequence characteristics, follows very varied evolutionary paths.
Year: 2011 PMID: 21166792 PMCID: PMC3107425 DOI: 10.1111/j.1558-5646.2010.01213.x
Source DB: PubMed Journal: Evolution ISSN: 0014-3820 Impact factor: 3.694
Information on the retrogenes studied, their “parent” and “host” genes
| Imprinted retrogene | Accession number | Parent gene | Parent gene accession number | Position of retrogene in mouse genome build mm9 | Host gene | Host intron size (bp) |
|---|---|---|---|---|---|---|
| | DQ648020 | | BC028317 | | | 5,329 |
| | NM_025543 | | NM_026902 | | | 2,480 |
| | NM_021432 | | NM_008671 | | | 21,751 |
| NM_138742 | ||||||
| | NM_011663 | | NM_178754 | | | 25,337 |
Figure 1. Timing of retroposition events within mammalian evolution. Arrows indicate the time of insertion of the retrogene indicated, given the sequence data available to date.
Figure 2. Expression of the retrogene and parent gene transcripts in multiple tissues and developmental stages. Expression was assayed by reverse-transcriptase PCR with Actin as a control for each different tissue sample (bottom row). Tissues where only one member of a gene family is expressed are highlighted with black squares.
Figure 3. Phylogenetic reconstruction for each gene family. Posterior probabilities for each branch are shown. ω values for the best-fit site-specific model (Mcts and Nap1l families) and foreground and background for best-fit lineage-specific model (Inpp5f_v2-Vma21 and U2af1-rs families) are shown.
Summary of codon evolution model that fits each gene family best following likelihood ratio test analysis
| Model | Estimates of parameters | Chi-squared test result | Positively selected sites |
|---|---|---|---|
| Inpp5f_v2-Vma21 family | |||
| Site specific model: Model 3: Discrete(K=2) | p0=0.49, p1=0.51 ω0=0.02, ω1=0.19 | M0 v M3k2 2(2039.75–2024.97)=29.56* Critical value≥5.99 | No positive selection |
| Lineage specific model: Model B | p0=0.23, p1=0.22, p2=0.28, p3=0.27 Parent Lineages (Background): ω0=0.01, ω1=0.19, ω2=0.15, ω3=0.19 | M3k2 v Model B 2(2024.97–2020.32)=9.3* Critical value≥5.99 | No positive selection |
| Retrogene Lineages (Foreground): ω0=0.01, ω1=0.19, ω2=0.80, ω3=0.80 | |||
| Mcts family | |||
| Site specific model: Model 3: Discrete(K=2) | p0=0.89, p1=0.11 ω0=0.01, ω1=0.27 | M0 v M3k2 2(2554.31–2528.13)=52.36* Critical value≥5.99 | No positive selection |
| Nap1l family | |||
| Site specific model: Model 3: Discrete(K=2) | p0=0.50, p1=0.50 ω0=0.06, ω1=0.38 | M0 v M3k2 2(12832.55–12667.28)=330.54* Critical value≥5.99 | No positive selection |
| U2af1-rs family | |||
| Site specific model: Model 3: Discrete(K=3) | p0=0.58, p1=0.31, p2=0.10 ω0=0.03, ω1=0.24, ω2=1.30 | M3k2 v M3k3 1(7302.16–7293.06)=9.1* Critical value≥1.00 | 36 sites, p.p.>0.5 15 sites, p.p.>0.95 4 sites, p.p.>0.99 |
| Lineage specific model: Model B | p0=0.74, p1=0.18, p2=0.06, p3=0.01 Parent gene lineages (Background): ω0=0.06, ω1=0.79, ω2=0.06, ω3=0.78 | M3k2 v Model B 2(7302.16–7284.63)=107.06* Critical value≥5.99 | Foreground: 10 sites, p.p.>0.5 2 sites, p.p.>0.95 3 sites, p.p.>0.99 |
| Retrogene lineages (Foreground): ω0=0.06, ω1=0.79, ω2=3.74, ω3=3.74 |
Model 3 assigns each site in the alignment to one of two (K=2) or three (K=3) categories of ω, with the ω values estimated from the data. The proportion of sites in each category is given as “p” with the subscript matching the corresponding ω value. Model B allows a specific branch of the phylogenetic tree to be marked as foreground and divides the sites into four proportions (p0, p1, p2, and p3), with four values of ω estimated independently for the foreground and the background.
Where models predicted categories of sites with ω>1, indicating positive selection, the estimated numbers of sites with posterior probabilities >0.5, >0.95, and >0.99 of belonging to this category are listed. Codons are assigned to the positively selected category using Naïve Empirical Bayes analysis only if Bayes Empirical Bayes analysis is not available.
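The chi-squared column in the table above can be reproduced with a short calculation. The following is a minimal sketch, not the authors' pipeline. It assumes that the paired numbers in each test are negative log-likelihoods of the simpler and the more complex model, that the comparison has 2 degrees of freedom, and that the significance level is 5% (consistent with the listed critical value of 5.99). The function names are hypothetical.

```python
import math


def lrt_statistic(neg_lnl_null, neg_lnl_alt):
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null).

    Both arguments are assumed to be negative log-likelihoods,
    so the simpler (null) model has the larger value.
    """
    return 2.0 * (neg_lnl_null - neg_lnl_alt)


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-squared variable with 2 d.f.

    For exactly 2 degrees of freedom this has the closed form exp(-x / 2).
    """
    return math.exp(-x / 2.0)


# Inpp5f_v2-Vma21 family, M0 versus M3 (K=2), values taken from the table.
stat = lrt_statistic(2039.75, 2024.97)

# Critical value at the 5% level for 2 d.f. (assumed significance level).
critical = -2.0 * math.log(0.05)

p_value = chi2_sf_df2(stat)
reject_null = stat >= critical

print(f"2*delta(lnL) = {stat:.2f}")
print(f"critical value = {critical:.2f}")
print(f"p = {p_value:.2e}, reject simpler model: {reject_null}")
```

The closed-form tail probability only holds for 2 degrees of freedom; a comparison with a different number of extra parameters would need a general chi-squared routine.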
Positions of positively selected codons in the U2af1-rs1 retrogene lineage
| Position in alignment | Amino acid in retrogene lineage | Amino acid in parent lineage | P value | Position in retrogene protein | Position in parent protein |
|---|---|---|---|---|---|
| 38 | M | L | 0.659 | 33 | 38 |
| 46 | A | L | 0.57 | 41 | 46 |
| 63 | L | E | 0.996 | 55 | 62 |
| 154 | E | G | 0.576 | 142 | 154 |
| 206 | V | I | 0.745 | 192 | 206 |
| 313 | V | M | 0.678 | 300 | 313 |
| 355 | P | D | 0.997 | 342 | 355 |
| 361 | S | F | 0.501 | 348 | 361 |
| 384 | Y (mouse), H (rat) | R | 0.875 | 371 | 384 |
| 385 | H | R | 0.965 | 372 | 385 |
| 388 | S | P | 0.528 | 373 | 388 |
| 480 | E | S | 0.993 | 415 | 475 |
| 485 | G | R | 0.593 | 420 | 480 |
| 491 | H | R | 0.942 | 426 | 486 |
| 493 | T | R | 0.802 | 428 | 488 |
The position differs from alignment to protein as the alignment file contains sequence gaps.
Our confidence in each of these sites being positively selected is calculated using the posterior probability and summarized in the P values shown. P values vary from 0.00 (no evidence for belonging in the positively selected category) to 1.00 (100% confidence of belonging in the positively selected category).
Dark gray area refers to residues deemed to be false positives due to poor alignment of the U2af1-rs sequences.
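As an illustration of how the posterior probabilities in this table relate to the >0.5, >0.95, and >0.99 thresholds, the sketch below tallies the listed sites at each cut-off. It is a hedged example only: the helper name `sites_above` is hypothetical, the values are copied from the rows above, and all rows are included because the gray shading that marks likely false positives is not preserved in this text.

```python
# Posterior probabilities from the table, keyed by alignment position.
posterior = {
    38: 0.659, 46: 0.57, 63: 0.996, 154: 0.576, 206: 0.745,
    313: 0.678, 355: 0.997, 361: 0.501, 384: 0.875, 385: 0.965,
    388: 0.528, 480: 0.993, 485: 0.593, 491: 0.942, 493: 0.802,
}


def sites_above(probs, threshold):
    """Return alignment positions whose posterior probability exceeds threshold."""
    return sorted(pos for pos, p in probs.items() if p > threshold)


thresholds = (0.5, 0.95, 0.99)
counts = {t: len(sites_above(posterior, t)) for t in thresholds}

for t in thresholds:
    print(f"p.p. > {t}: {counts[t]} sites -> {sites_above(posterior, t)}")
```

Any site flagged as an alignment artifact would need to be dropped from `posterior` before tallying.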
Figure 4. Three-dimensional structure of U2af1-rs proteins. (A) Disorder prediction. Positions of all the positively selected residues along the protein are shown as gray triangles. Position of U2af35-homologous domain shown as thick black line. Positively selected residue within this domain shown as white triangle. Thin black line denotes disorder probability 0.5. Values above this predict disorder. (B and C) Closeup view of the neighboring residues to the positively selected residue. (D) U2AF1-RS2 isoleucine. (E) U2AF1-RS1 valine. Residues within a 6 Angstrom cut-off from the isoleucine or valine residue are colored by secondary structure: beta-sheets yellow, alpha helix purple, and coil white.
Figure 5. Expression for mouse retrogenes and their parent genes during spermatogenesis. GC RMA values for two biological replicates were averaged. Data were extracted from GEO dataset GDS2930 from Namekawa et al. 2006. Probe identifiers were 1425018_at for Mcts1, 1451058_at for Mcts2, 1417411_at for Nap1l5, 1418046_at for Nap1l2, 1449354_at for U2af1-rs1, and 1455727_at for U2af1-rs2. There was no specific probe for Inpp5f_v2, so this gene could not be included.