Robert Baertsch, Mark Diekhans, W James Kent, David Haussler, Jürgen Brosius.
Abstract
BACKGROUND: Evolution via point mutations is a relatively slow process and is unlikely to completely explain the differences between primates and other mammals. By contrast, 45% of the human genome is composed of retroposed elements, many of which were inserted in the primate lineage. A subset of retroposed mRNAs (retrocopies) shows strong evidence of expression in primates, often yielding functional retrogenes.
Year: 2008 PMID: 18842134 PMCID: PMC2584115 DOI: 10.1186/1471-2164-9-466
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Description and distribution of expressed retrocopy events
| Type of event | Parent gene contribution | Count | Percentage |
| Type I – Exon acquisition (host gene modified by retrocopy) | New 5' exon (UTR and/or N-terminal protein coding) | 10 | 1% |
| | New 3' exon (UTR and/or C-terminal protein coding) | 18 | 2% |
| | New internal exon | 6 | 1% |
| Type II – duplication (no host gene involved) | Single exon | 624 | 86% |
| | Exons/introns generated, post insertion | 55 | 8% |
| Type III – novel genes (no host gene involved) | Antisense, majority of ORF out-of-frame with respect to parent, and other cases (e.g., from non-genic regions) | 13 | 2% |
| Total | | 726 | |
Type I: retrocopy inserted into or near an existing gene. A portion of the retrocopy contributes, mostly by alternative splicing, a new sequence to a pre-existing mRNA. Type I events can be divided into cases that add new N- or C-terminal encoding exons or internal exons. Type II: duplicated gene inserted at a locus where no prior gene existed. Type II events often acquired 5' or 3' UTR portions from the locus of integration after the insertion. Type III: novel gene sequence, whose encoded protein has little or no amino acid sequence similarity to that of the retrocopy's parent. Frequently, Type III events include SINEs, LINEs, LTRs etc., as well as unannotated sequences as additional contributors to gene candidates.
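The percentage column in the table above can be re-derived from the counts. The following is a minimal Python sketch; the dictionary keys are shorthand labels introduced here for illustration and are not the paper's own identifiers.

```python
# Counts copied from the table above.
# The keys are shorthand labels (an assumption made for this sketch).
counts = {
    "Type I: new 5' exon": 10,
    "Type I: new 3' exon": 18,
    "Type I: new internal exon": 6,
    "Type II: single exon": 624,
    "Type II: exons/introns generated post insertion": 55,
    "Type III: novel genes": 13,
}


def percentages(event_counts):
    """Return each category's share of all events, in percent."""
    total = sum(event_counts.values())
    return {label: 100.0 * n / total for label, n in event_counts.items()}


total = sum(counts.values())
shares = percentages(counts)

for label, share in shares.items():
    print(f"{label}: {counts[label]} ({share:.1f}%)")
print("Total:", total)
```

Rounding each share to the nearest integer reproduces the percentages listed in the table, and the counts sum to the stated total of 726.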
Figure 1. Categories of Type I retrocopy events. A. Examples of Type Ia exon acquisitions contributed by "same orientation" retrocopies (in magenta or dark red) with respect to the host gene (light blue). Not drawn to scale; splice events are marked by angled lines, open reading frames are depicted as vertically striped thick bars, UTRs as medium-size bars, and introns in the host gene as light blue lines (for symbols and colors, see also keys below). Parts of retrocopies are labeled according to the region of the parent gene from which they derive. The retrocopy's start and stop codons are shown by green and red vertical bars, respectively. Retrogene parts apparently not recruited as functional modules are overlaid with gray. B. Examples of Type Ib exon acquisitions contributed by "reverse orientation" retrocopies. For detailed descriptions, see text.
Type I retrocopy-exon acquisition events
| 1A-1 | Retro adds PKC-activated phosphatase-1 inhibitor domain to LIM, Zinc binding, PDZ kinase gene at alternatively spliced C-terminal exon. | 2 spliced mRNAs, 4 spliced ESTs Reviewed RefSeq | 686 aa | y but stop codon after 13 aa | y but stop codon after 19 aa | n | n | |
| 1A-2 | Retro adds centaurin domain to C-terminus of cyclin gene. | 2 spliced mRNAs, 2 spliced ESTs | 663 aa | y but seq. gap | Y | y 3 and 63 bp insertion, but 3 in-frame stops, occurring after 450 aa | n | |
| 1A-3 | 14 aa alternative 3' exon sense orientation out of frame wrt coding region of parent | 6 spliced ESTs | 244 aa | y 244 aa | y 233 aa | y 241 aa | Not assembled | |
| 1A-4 | Retro insertion triggered new alt spliced C-terminal exon for RPS29. Most of the retro became 3' UTR. | 1 spliced mRNA, > 20 spliced ESTs | 67 aa | y 67aa | y 67 aa | y 64aa | y 81 aa | |
| 1A-5 | Retro contributed PHD/zinc finger with bromodomain. | 3 spliced mRNAs, > 10 spliced ESTs | 1205 aa | y 1205 aa | y 1205 aa | y 1006 aa | y 885 aa | |
| 1A-6 | Retro swapped in C-terminal portion of GPCR. New ligand in primates? | 2 spliced mRNAs, 2 spliced ESTs | 462 aa | y but frameshift early in ORF | y but frameshift early in ORF | y 462 aa | y but frameshift early in ORF | |
| 1B-1 | Antisense internal cassette (alt spliced) exon inserted by retro. | 1 spliced mRNA | 1354 aa | y 100% open | y 100% open | y 100% open | n | |
| 1B-2 | Antisense alt. spliced internal cassette exon contributed by retro. Protein evidence in Swiss-Prot and PDB | 1 spliced mRNA PDB 1N83,1SOX | 556 aa | y in frame stop in first 20 aa | y 556 aa | y late translational start (31aa shorter) | y difficult to check ORF. | |
| 1B-3 | Antisense retro triggered shorter alternative transcript in apes. | 2 spliced mRNAs, 3 spliced ESTs | 332 aa | y 332aa | N | n | n | |
| 1B-3 | Antisense, ancient | 8 spliced mRNAs | 807 aa | y | Seq gap | y | ||
| 1B-3 | Antisense, alt. spliced C-terminus | 2 spliced mRNAs | 377 aa | y | y | ||
| 1B-4 | Retro contributed internal 15 aa antisense exon. | 2 spliced mRNAs, 5 spliced ESTs | 238 aa | y 100% open | y 100% open | y no ORF splice site has indel | ||
| 1B-5 | Antisense retro triggered slightly later start via novel exon. Alternative translation initiation. Reviewed RefSeq. | 1 spliced mRNA 2 spliced ESTs | 905 aa | y 905 aa | y 100% open | y 905 aa | n/a | |
| 1B-5 | Primate specific antisense internal cassette exon generated short alt spliced transcript encoding 396 aa. Possible alternative translation initiation. | 6 spliced mRNAs | 396 aa | y 396 aa | y ORF ok but missing splice site | y no start codon | ||
| 1B-6 | Antisense retro triggered different start via novel exon. | 1 spliced mRNA | 3667 aa | seq gap | f/s wrt human. | y f/s wrt human. Same as PPY | ||
Examples of specific categories of retrocopy insertion events shown in Figure 2 with designation of parent gene. Genomic locations of the examples are cross-referenced in [see Additional File 1]. Abbreviations: y, retrogene present in this species; n, retrogene absent; aa, amino acids in the entire potentially encoded protein; f/s, frame shift; wrt, with respect to.
Figure 2. Novel protein-sequence space generated by parts of retrocopies combined with other transposons or unusual events. For each part of the figure, the spliced parent mRNA is shown first (before retroposition) and the resulting gene(s) are shown below. New sequence space arose from a combination of retrogene insertions and recruitment of non-genic regions, including retroposons; the contribution of the retrocopy's original in-frame ORF is very small (see text and legend to Fig. 1, including color key, for further details). Yellow boxes with grey vertical stripes and yellow medium-size bars correspond to retroposed element contributions to ORFs and UTRs, respectively. For detailed descriptions, see text.
Type III novel retrogenes that are out of frame or reverse sense with respect to the parent gene
| 2A | Out of frame | 4 mRNAs and 4 ESTs | 81 aa | 81 aa | 81 aa | 96 aa | n | |
| 2B | Human specific, 2 exons: first out of the blue and second from antisense retro | 3 spliced mRNAs, 10 spliced ESTs | 259 aa | stop after 114 aa | truncated to 152 aa by f/s, frame shift in ATG disrupts ORF | first exon in sequencing gap f/s 2 stops | n | |
| 2C | 3 coding exons: first from MIR, second unknown source, third from antisense retrocopy | 1 spliced mRNA, 5 spliced ESTs | 170 aa | 170 aa | y 134 aa early stop, exon1,2 open | partially deleted in rhesus | stop in first 10% of ORF | |
| 2D | C-terminal exon from retro (partially out of frame), first coding exon from LTR | 2 spliced mRNAs, 2 spliced ESTs | 172 aa | Disrupted | Disrupted | 4 stop codons, 1st at 10 aa. | y 72 aa due to f/s. | |
| 2D | C-terminal coding exon from retro (partially out-of-frame), first coding exon from L2 LINE | 4 spliced mRNAs, 17 spliced ESTs | 163 aa | Disrupted | Disrupted | 4 stop codons, 1st at 10 aa. | y 63 aa due to f/s. | |
| 2D | Out-of-frame sense retrocopy | 4 spliced mRNAs, 17 spliced ESTs | 150 aa | Disrupted | Disrupted | 4 stop codons, 1st at 10 aa. | y 56 aa due to f/s. | |
| 2E | Novel primate-specific gene candidate, pericentromeric | 1 spliced mRNA, 20 spliced ESTs | 89 aa | y 86 aa | Check transMap | y but part of gene inverted | Not assembled | |
| 2F | Primate-specific cancer testis gene candidate. Swiss-Prot Q5JQC4 | 1 spliced mRNA, 10 spliced ESTs | 288 aa | seq gap in chimp | 291 aa about 80% common with human (Ensembl) | y good ORF 15 bp multiple of 3 indel. | 287 aa | |
| 2G | Antisense retrocopy combined with Alu | 1 spliced mRNA, 3 spliced ESTs | 123 aa | y 127 aa | n AluY and retro are missing | n | n | |
| 2G | Antisense retrocopy combined with Alu | 2 spliced mRNAs, > 20 spliced ESTs | 61 aa | y 60 aa | n AluY and retro are missing | n | n | |
| 2H | Novel gene candidate with 4 coding exons: first from LTR, second and fourth from unknown sources, and third from antisense retrocopy | 2 spliced mRNAs, 3 spliced ESTs | 154 aa | chimp not assembled in this region | y retro exon is open (assembly not complete) | n | n | |
| 2I | Human specific | 5 spliced mRNAs, 13 spliced ESTs | 205 aa | y 5 stops | y good ORF but gap (or deletion) in first 50aa | y 107 aa, starts later in first exon | y multi exon gene unknown ORF | |
| 2J | Novel 2 exon gene mostly from LINE and antisense retrocopy | 4 spliced mRNAs, 6 spliced ESTs | 154 aa | y but stop codon after 30 aa | y 137 aa includes LINE | y but short 78 aa | LINE not present | |
| 2J | Novel 2 exon gene mostly from LINE and antisense retrocopy | 1 spliced mRNA, 4 spliced ESTs | 147 aa | y but stop codon after 30 aa | y 127 aa includes LINE | y but short 78 aa | LINE not present | |
| 2K | Antisense novel gene, possible NMD | 1 spliced mRNA, 2 unspliced mRNAs, 4 ESTs | 166 aa | y 169 aa | y 169 aa | early in-frame stop | n | |
Details of candidate genes showing parent gene, expression evidence and phylogeny shown in Figure 2. Abbreviations: y, retrogene present in this species; n, retrogene absent; aa, amino acids in the potentially encoded protein; f/s, frame shift; wrt, with respect to.
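The per-species cells in the tables above follow the legend's shorthand (y, n, aa, f/s) only loosely, so any machine reading of them is approximate. The sketch below is one possible way to pull out a presence flag, a leading ORF length, and a frame-shift mention in Python. The names `OrthologStatus` and `parse_cell` are hypothetical, and the parsing rules are assumptions made here, not part of the paper.

```python
import re
from typing import NamedTuple, Optional


class OrthologStatus(NamedTuple):
    present: Optional[bool]  # True for a leading "y", False for a leading "n", None otherwise
    orf_aa: Optional[int]    # ORF length, only when the cell starts with "<number> aa"
    frameshift: bool         # True when the cell mentions "f/s" or "frameshift"


def parse_cell(cell: str) -> OrthologStatus:
    """Best-effort reading of one free-text species cell (assumed conventions)."""
    lowered = cell.strip().lower()
    first = lowered.split()[0] if lowered else ""
    present = True if first == "y" else False if first == "n" else None

    # Only accept a length that directly follows the optional "y" flag, so that
    # phrases such as "stop codon after 13 aa" are not mistaken for an ORF size.
    match = re.match(r"(?:y\s+)?(\d+)\s*aa\b", lowered)
    orf_aa = int(match.group(1)) if match else None

    frameshift = bool(re.search(r"f/s|frame\s?shift", lowered))
    return OrthologStatus(present, orf_aa, frameshift)


print(parse_cell("y 72 aa due to f/s."))
print(parse_cell("Not assembled"))
```

Cells with no explicit flag, such as "Not assembled" or "Disrupted", yield `present=None`, so they are not silently counted as absences.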
Figure 3. The retroFinder pipeline for annotating retrocopies. Alignments of all human mRNAs that aligned more than once to the genome were scored for a set of features (see Methods). The number of strict ESTs, the number of mRNAs, and the size of the ORF were used to determine evidence of expression. Retrocopies that partially overlapped the protein-coding region of annotated multi-exon RefSeq genes were classified as exon acquisition events. Numbers in parentheses were reported previously [7] (Additional files 4, 5, 6, 7).
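The last two steps described in the Figure 3 caption (expression evidence, then exon-acquisition classification) can be illustrated with a short Python sketch. This is not the retroFinder implementation. The caption gives no cutoffs and does not say how the three measures are combined, so the thresholds, the combining rule, and every name below (`RetrocopyCandidate`, `has_expression_evidence`, `classify`) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoffs; the caption does not state the values actually used.
MIN_STRICT_ESTS = 1
MIN_MRNAS = 1
MIN_ORF_AA = 50


@dataclass
class RetrocopyCandidate:
    name: str
    strict_ests: int                     # supporting ESTs that pass a strict filter
    mrnas: int                           # supporting mRNAs
    orf_aa: int                          # ORF size in amino acids
    overlaps_multiexon_refseq_cds: bool  # partial overlap with coding region of a multi-exon RefSeq gene


def has_expression_evidence(c: RetrocopyCandidate) -> bool:
    # Assumed rule: EST or mRNA support, combined with a minimum ORF size.
    transcript_support = c.strict_ests >= MIN_STRICT_ESTS or c.mrnas >= MIN_MRNAS
    return transcript_support and c.orf_aa >= MIN_ORF_AA


def classify(c: RetrocopyCandidate) -> str:
    if not has_expression_evidence(c):
        return "no expression evidence"
    if c.overlaps_multiexon_refseq_cds:
        return "exon acquisition"
    return "other expressed retrocopy"


example = RetrocopyCandidate("example", strict_ests=6, mrnas=0, orf_aa=244,
                             overlaps_multiexon_refseq_cds=True)
print(classify(example))
```

Keeping the cutoffs as module-level constants makes it easy to substitute the values given in the paper's Methods.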