Ingrid Lafontaine, Bernard Dujon.
Abstract
BACKGROUND: Pseudogenes are ubiquitous genetic elements that derive from functional genes after mutational inactivation. Characterization of pseudogenes is important to understand genome dynamics and evolution, and its significance increases when several genomes of related organisms can be compared. Among yeasts, only the genome of the S. cerevisiae reference strain has been analyzed so far for pseudogenes.
Year: 2010 PMID: 20412590 PMCID: PMC2876123 DOI: 10.1186/1471-2164-11-260
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Number and types of pseudogenes identified in the eight yeast genomes studied.
| Species | Genome sizeᵃ | Total CDS | Pseudogenesᵇ | « Full-size »ᶜ | 3'-truncationᵈ | 5'-truncationᵉ |
|---|---|---|---|---|---|---|
|  | 12.1 | 5769 | 77 (1.3) | 0.57 | 0.21 | 0.22 |
|  | 12.3 | 5204 | 38 (0.7) | 0.47 | 0.32 | 0.21 |
|  | 9.8 | 4998 | 105 (2.1) | 0.42 | 0.28 | 0.30 |
|  | 10.4 | 5104 | 68 (1.2) | 0.54 | 0.16 | 0.29 |
|  | 11.3 | 5308 | 117 (1.3) | 0.63 | 0.19 | 0.18 |
|  | 10.7 | 5084 | 61 (2.2) | 0.66 | 0.18 | 0.16 |
|  | 12.2 | 6273 | 175 (2.8) | 0.37 | 0.28 | 0.35 |
|  | 20.5 | 6434 | 230 (3.6) | 0.36 | 0.30 | 0.35 |
a. In megabases, excluding rDNA.
b. Pseudogenes of protein-coding sequences only. The percentage of pseudogenes relative to CDS is indicated in parentheses.
c. Proportion of « full-size » pseudogenes, i.e. those extending over more than 70% of the length of their best match.
d. Proportion of pseudogenes extensively truncated at their 3' end.
e. Proportion of pseudogenes extensively truncated at their 5' end.
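Footnote b defines the parenthesized value as the number of pseudogenes divided by the number of annotated CDS, times 100. The minimal Python sketch below recomputes it from the counts as they appear in this copy of the table. Row indices stand in for the species labels, which are missing here, and the 0.05 tolerance (what one-decimal rounding allows) is an assumption of the sketch. Recomputation reproduces rows 1 to 3, 7 and 8, but not rows 4 to 6 (1.3, 2.2 and 1.2 versus the printed 1.2, 1.3 and 2.2), so those three values are worth checking against the published table.

```python
# Rows of the pseudogene-count table as extracted above:
# (genome size in Mb, total CDS, pseudogenes, printed percentage).
# Species labels are missing, so rows are referred to by index 1..8.
ROWS = [
    (12.1, 5769, 77, 1.3),
    (12.3, 5204, 38, 0.7),
    (9.8, 4998, 105, 2.1),
    (10.4, 5104, 68, 1.2),
    (11.3, 5308, 117, 1.3),
    (10.7, 5084, 61, 2.2),
    (12.2, 6273, 175, 2.8),
    (20.5, 6434, 230, 3.6),
]


def pseudogene_percentage(pseudogenes: int, total_cds: int) -> float:
    """Percentage of pseudogenes relative to annotated CDS (footnote b)."""
    return 100.0 * pseudogenes / total_cds


def inconsistent_rows(rows, tolerance=0.05):
    """Return 1-based indices of rows whose printed percentage differs from
    the recomputed value by more than one-decimal rounding allows."""
    flagged = []
    for i, (_size, cds, pseudo, printed) in enumerate(rows, start=1):
        if abs(pseudogene_percentage(pseudo, cds) - printed) > tolerance:
            flagged.append(i)
    return flagged


for i, (_size, cds, pseudo, printed) in enumerate(ROWS, start=1):
    print(f"row {i}: recomputed {pseudogene_percentage(pseudo, cds):.1f}%, "
          f"printed {printed}%")
print("rows to re-check:", inconsistent_rows(ROWS))
```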
Figure 1. Boxplot [67] of the sequence divergence between pseudogenes and their closest functional homolog. The p-distance (ordinate) is expressed as the fraction of non-identical nucleotides at third codon positions (see Methods). Left panel: p-distance of pseudogenes whose closest functional homolog (best match) is in the same genome (paralog); middle panel: p-distance of pseudogenes whose best match is in another genome; right panel: p-distance of pairs of functional paralogs in the same species. The number of pairs analyzed is indicated in parentheses (data in Table II).
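The third-position p-distance of Figure 1 can be illustrated with a short Python sketch. It assumes the two coding sequences are already aligned in the same reading frame, and it skips third-position sites where either sequence has a gap or a non-ACGT character; that gap handling is an assumption, since the exact procedure is given in the Methods, which are not reproduced here.

```python
def third_position_p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing nucleotides at third codon positions.

    Assumes seq_a and seq_b are aligned in the same reading frame (equal
    length, codons starting at indices 0, 3, 6, ...). Sites where either
    sequence has a gap or an ambiguous base are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = differences = 0
    # Third codon positions are 0-based indices 2, 5, 8, ...
    for i in range(2, len(seq_a), 3):
        a, b = seq_a[i].upper(), seq_b[i].upper()
        if a not in valid or b not in valid:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable third-position sites")
    return differences / compared


print(third_position_p_distance("ATGAAACCCTTT", "ATGAAGCCCTTC"))
```

Here four third positions are compared and two differ (A/G and T/C), so the call returns 0.5.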
Figure 2. Conserved pseudogenes at syntenic locations. Each column represents a set of orthologous sequences in a region of conserved synteny. Vertical dashed lines separate the different regions. Rectangles represent annotated genes; dashed rectangles represent pseudogenes detected in this analysis. None of these pseudogenes has a paralog in its genome. The topology of the species phylogeny [68] is shown on the left of the figure (branch lengths ignored).
Pseudogenes in S. cerevisiae S288C with non-degraded homologs in other S. cerevisiae strains.
| S288C | YJM789 | RM11_1A | YPS163 | M22 | EC1118 | JAY291 | AWRI1631 |
|---|---|---|---|---|---|---|---|
|  | p | p | p | p | p | intact | intact |
|  | p | p | p | p | p | intact | intact |
|  | p | p | p | p | p | p | intact |
|  | p | p | intact | intact | p | intact | intact |
|  | p | p | p | p | intact | intact | intact |
|  | intact | intact | intact | intact | intact | intact | intact |
|  | p | p | p | p | intact | intact | intact |
|  | p | p | intact | p | intact | intact | intact |
|  | p | p | p | p | intact | intact | p |
|  | intact | p | p | p | intact | p | p |
|  | p | p | p | p | intact | p | p |
|  | p | p | p | p | p | intact | p |
|  | p | p | p | intact | p | p | p |
|  | p | p | p | intact | p | p | p |
|  | p | p | p | intact | p | p | p |
p: homologous pseudogene; intact: homolog with a non-degraded coding sequence.
Distribution of the pseudogenes according to the presence or absence of an S. cerevisiae homolog, and their functional classification
| Species | S.c. homolog with known function | S.c. homolog with unknown function | No S.c. homolog | Transport, cell peripheryᵃ | Enzymesᵇ |
|---|---|---|---|---|---|
|  | 37 | 31 | 9 | 11 | 15 |
|  | 23 | 2 | 13 | 13 | 2 |
|  | 83 | 9 | 13 | 50 | 14 |
|  | 38 | 6 | 17 | 11 | 15 |
|  | 42 | 6 | 20 | 13 | 11 |
|  | 78 | 10 | 29 | 33 | 19 |
|  | 82 | 7 | 86 | 37 | 24 |
|  | 117 | 6 | 107 | 26 | 38 |
S.c.: S. cerevisiae.
a. Number of pseudogenes with a homolog in S. cerevisiae coding for proteins involved in transport and/or acting at the cell periphery.
b. Number of pseudogenes without homolog in S. cerevisiae coding for enzymes.
Subtelomeric localization of pseudogenes and presence/absence of annotated paralogs
| Species | Pseudogenes at ends (%)ᵃ | Active genes at ends (%)ᵇ | No paralogᶜ | Paralog, inᵈ | Paralog, outᵈ |
|---|---|---|---|---|---|
|  | 71.4 | 5.4 | 3 | 67 | 7 |
|  | 60.5 | 3.2 | 8 | 27 | 3 |
|  | 54.3 | 3.2 | 14 | 70 | 21 |
|  | 48.5 | 2 | 18 | 39 | 11 |
|  | 37.6 | 2.6 | 29 | 57 | 31 |
|  | 31.1 | 2.9 | 14 | 33 | 14 |
|  | 40.0 | 2.7 | 8 | 151 | 16 |
|  | 5.2 | 1.6 | 6 | 221 | 3 |
a. Percentage of pseudogenes in subtelomeric regions (less than 30 kb from a chromosome end).
b. Percentage of active genes in subtelomeric regions.
c. Number of pseudogenes without annotated functional paralog in the genome.
d. Number of pseudogenes with an annotated functional paralog, according to whether their closest homolog is in the same genome (in) or in another genome (out).
*. Species for which all chromosomes are fully sequenced, including their telomeric repeats.
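The 30 kb rule of footnote a, and the comparison between the first two numeric columns, can be sketched in Python as follows. The function names, the 1-based inclusive coordinates and the choice to measure from the feature boundary closest to the chromosome end are assumptions. In the first row, for example, 71.4% of pseudogenes but only 5.4% of active genes lie in subtelomeric regions, a ratio of about 13.

```python
SUBTELOMERIC_WINDOW_BP = 30_000  # "less than 30 kb from a chromosome end"


def is_subtelomeric(start: int, end: int, chrom_length: int,
                    window: int = SUBTELOMERIC_WINDOW_BP) -> bool:
    """True if a feature lies less than `window` bp from either chromosome end.

    Coordinates are 1-based and inclusive; distance is measured from the
    feature boundary nearest each end (an assumption of this sketch).
    """
    distance_left = start - 1
    distance_right = chrom_length - end
    return min(distance_left, distance_right) < window


def subtelomeric_enrichment(pct_pseudogenes: float, pct_genes: float) -> float:
    """Ratio of the subtelomeric fraction of pseudogenes to that of active genes."""
    return pct_pseudogenes / pct_genes


print(is_subtelomeric(10_000, 12_000, 1_000_000))   # near the left end
print(is_subtelomeric(500_000, 502_000, 1_000_000))  # mid-chromosome
print(f"{subtelomeric_enrichment(71.4, 5.4):.1f}-fold")
```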
Figure 3. Possible origin of pseudogenes. See text for explanations. Diamonds correspond to distinguishing criteria and rectangles to the deduced origin.
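The tables support a simple first branch for this diagram: in every species, the "function loss" count of the classification table below equals the "no paralog" count of the subtelomeric table above (3, 8, 14, 18, 29, 14, 8, 6), and the two duplication categories together equal the pseudogenes with a paralog (in plus out). The in/out split does not match the duplication-timing split one-to-one, so it does not decide the second branch by itself. The Python sketch below encodes both branches; how the timing of a duplication is established is described in the text, so `duplication_after_speciation` is a hypothetical input supplied by an upstream analysis.

```python
from enum import Enum
from typing import Optional


class Origin(Enum):
    SPECIES_SPECIFIC_DUPLICATION = "species-specific duplication"
    ANCESTRAL_DUPLICATION = "ancestral duplication"
    FUNCTION_LOSS = "function loss"


def classify_origin(has_functional_paralog: bool,
                    duplication_after_speciation: Optional[bool] = None) -> Origin:
    """Assign a pseudogene to one of the three origin categories.

    has_functional_paralog: an annotated functional paralog exists in the genome.
    duplication_after_speciation: hypothetical result of an upstream
        phylogenetic analysis; required whenever a paralog exists.
    """
    if not has_functional_paralog:
        # Mutational inactivation of a single-copy gene.
        return Origin.FUNCTION_LOSS
    if duplication_after_speciation is None:
        raise ValueError("duplication timing is needed when a paralog exists")
    if duplication_after_speciation:
        # Duplicated copy formed after speciation.
        return Origin.SPECIES_SPECIFIC_DUPLICATION
    # Duplicated copy formed before speciation.
    return Origin.ANCESTRAL_DUPLICATION


print(classify_origin(True, duplication_after_speciation=True).value)
```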
Classification of pseudogenes according to their possible origin.
| Species | Species-specific duplicationᵃ | Ancestral duplicationᵇ | Function lossᶜ | Duplicated pseudogenesᵈ | Within duplicated segmentsᵉ | Retro-processedᶠ |
|---|---|---|---|---|---|---|
|  | 73 | 1 | 3 | 1 | 0 | 1, 6 |
|  | 27 | 3 | 8 | 1 | 1 | 3, 0 |
|  | 71 | 20 | 14 | 6 | 41 | 1, 3 |
|  | 39 | 11 | 18 | 0 | 17 | 0, 16 |
|  | 65 | 23 | 29 | 4 | 13 | 1, 12 |
|  | 35 | 12 | 14 | 5 | 7 | 0, 4 |
|  | 153 | 14 | 8 | 2 | 62 | 1, 3 |
|  | 213 | 11 | 6 | 6 | 19 | 2, 8 |
a. Number of pseudogenes originating from mutational inactivation of a duplicated gene copy formed after speciation.
b. Number of pseudogenes originating from mutational inactivation of a duplicated gene copy formed before speciation.
c. Number of pseudogenes originating from mutational inactivation of a single-copy gene.
d. Number of duplicated pseudogenes in the first category (a).
e. Number of pseudogenes in the first category (a) that are part of a duplicated segment involving other adjacent genes.
f. Number of retro-processed pseudogenes in the first category (a), identified either by the presence of a 3' poly(A) tail (first number) or by the proximity of a retrotransposon-related sequence (second number). In each species, the candidates identified by these two criteria are different.
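The first retro-processing criterion of footnote f, a 3' poly(A) tail, can be screened for with a regular expression. In this Python sketch the minimum run of 10 A's and the 100 bp search window are hypothetical thresholds, not values taken from the study, and the caller is assumed to pass the sequence immediately downstream of the pseudogene on its own strand (reverse-complemented for minus-strand loci).

```python
import re

# Hypothetical thresholds: the source does not state how a poly(A) tail
# was defined, so both values are illustrative assumptions.
MIN_POLYA_RUN = 10   # minimum number of consecutive A's
SEARCH_WINDOW = 100  # bp examined downstream of the pseudogene 3' end


def has_polya_tail(downstream: str, min_run: int = MIN_POLYA_RUN,
                   window: int = SEARCH_WINDOW) -> bool:
    """True if the first `window` bases after the 3' end contain a run of
    at least `min_run` A's. `downstream` must already be on the
    pseudogene's strand."""
    region = downstream[:window].upper()
    # Builds the pattern "A{10,}" for the default min_run.
    return re.search(f"A{{{min_run},}}", region) is not None


print(has_polya_tail("CGT" + "A" * 12 + "GC"))
```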
Figure 4. Scenario for the multiplication of pseudogenes in Y. lipolytica. a) The rectangle represents the functional gene; dashed rectangles represent its corresponding pseudogenes. The tree topology was obtained by maximum-likelihood reconstruction [69] from the aligned nucleotide sequences (branch lengths ignored). The emergence of frameshift mutations (!) and in-frame stop codons (*) is indicated above the corresponding branches. b) Alignment, obtained with MUSCLE, of the translation products of YALI0A14927g and its pseudogenes. Frameshift mutations (!) and in-frame stop codons (*) are boxed.
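Counting in-frame stop codons, one of the two kinds of disabling mutation marked in Figure 4, reduces to scanning codons in the reading frame of the functional homolog. The Python sketch below assumes the standard stop codons (TAA, TAG, TGA) and a sequence that starts in that frame; it ignores a stop in the last codon (the normal terminator) and does not detect frameshifts, which require an alignment such as the one in panel b.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard-code stop codons (assumption)


def inframe_stop_positions(cds: str) -> list[int]:
    """0-based codon indices of premature stop codons in frame 1.

    A stop in the final complete codon is treated as the normal terminator
    and is not reported. Frameshifts are not detected by this function.
    """
    seq = cds.upper()
    n_codons = len(seq) // 3
    positions = []
    for k in range(n_codons):
        codon = seq[3 * k:3 * k + 3]
        if codon in STOP_CODONS and k != n_codons - 1:
            positions.append(k)
    return positions


print(inframe_stop_positions("ATGAAATAGCCCTAA"))
```

In this example the TAG at codon 2 is reported, while the terminal TAA is not.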
Number of subtelomeric pseudogenes according to their possible origin.
| Species | Species-specificᵃ | Ancestralᵇ | Function lossᶜ |
|---|---|---|---|
|  | 49 | 1 | 3 |
|  | 18 | 0 | 5 |
|  | 45 | 7 | 5 |
|  | 32 | 8 | 6 |
|  | 20 | 7 | 4 |
|  | 16 | 2 | 1 |
|  | 67 | 4 | 0 |
|  | 11 | 0 | 0 |
See Table 5 for legend.
Figure 5. Pseudogenes in pairs of ohnologs in . Same legend as Figure 2. Pairs of ohnologs, i.e. paralogs originating from the whole-genome duplication [70], are linked by brackets.
Pseudogenes with evidence of transcription in S. cerevisiae
| Name | stops | frameshifts | R.L | Reference |
|---|---|---|---|---|
| SACE0Ap1 | 0 | 1 | 1 | [ |
| SACE0Ap2 | 1 | 0 | 0.9 | [ |
| SACE0Ap7 | 1 | 4 | 0.11 | [ |
| SACE0Ap13 | 0 | 1 | 1 | [ |
| SACE0Bp1 | 5 | 15 | 1 | [ |
| SACE0Cp2 | 2 | 0 | 0.06 | [ |
| SACE0Cp3 | 10 | 7 | 1 | [ |
| SACE0Dp6 | 1 | 0 | 0.82 | [ |
| SACE0Dp7 | 5 | 11 | 0.57 | [ |
| SACE0Hp1 | 1 | 8 | 0.86 | [ |
| SACE0Lp4 | 0 | 0 | 0.55 | [ |
| SACE0Pp4 | 1 | 2 | 0.77 | [ |
The number of disabling mutations (in-frame stop codons and frameshifts) within each pseudogene is given in columns 2 and 3. Column 4 gives the relative length (R.L) of the pseudogene with respect to its closest functional homolog. The last column indicates the reference of the data set in which evidence of transcription was found (see text for details). Pseudogenes identified on the same chromosome are always separated by several genes. No bias of any kind was found among these pseudogenes.
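The table can be summarized programmatically: the sketch below adds stops and frameshifts into a count of disabling mutations and applies the 70% threshold that defines « full-size » pseudogenes in the first table. Treating R.L as equivalent to coverage of the best match is an assumption, since the two measures need not coincide. Under that assumption, 8 of the 12 transcribed pseudogenes are full-size, and SACE0Lp4, with no stop codon or frameshift, is flagged only by its truncation.

```python
# (name, in-frame stops, frameshifts, relative length) from the table above.
PSEUDOGENES = [
    ("SACE0Ap1", 0, 1, 1.0),
    ("SACE0Ap2", 1, 0, 0.9),
    ("SACE0Ap7", 1, 4, 0.11),
    ("SACE0Ap13", 0, 1, 1.0),
    ("SACE0Bp1", 5, 15, 1.0),
    ("SACE0Cp2", 2, 0, 0.06),
    ("SACE0Cp3", 10, 7, 1.0),
    ("SACE0Dp6", 1, 0, 0.82),
    ("SACE0Dp7", 5, 11, 0.57),
    ("SACE0Hp1", 1, 8, 0.86),
    ("SACE0Lp4", 0, 0, 0.55),
    ("SACE0Pp4", 1, 2, 0.77),
]

# "More than 70% of their best-match length"; using R.L here is an assumption.
FULL_SIZE_THRESHOLD = 0.70


def summarize(entries):
    """Return (name, disabling mutations, is_full_size) for each pseudogene."""
    return [(name, stops + frameshifts, rel_len > FULL_SIZE_THRESHOLD)
            for name, stops, frameshifts, rel_len in entries]


for name, disabling, full in summarize(PSEUDOGENES):
    label = "full-size" if full else "truncated"
    print(f"{name}: {disabling} disabling mutation(s), {label}")
```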