Kazuhiko Ohshima, Masahira Hattori, Tetsusi Yada, Takashi Gojobori, Yoshiyuki Sakaki, Norihiro Okada.
Abstract
BACKGROUND: Abundant pseudogenes are a feature of mammalian genomes. Processed pseudogenes (PPs) are reverse transcribed from mRNAs. Recent molecular biological studies show that mammalian long interspersed element 1 (L1)-encoded proteins may have been involved in PP reverse transcription. Here, we present the first comprehensive analysis of human PPs using all known human genes as queries.
Year: 2003 PMID: 14611660 PMCID: PMC329124 DOI: 10.1186/gb-2003-4-11-r74
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Table 1. Processed pseudogene content of the human genome
| Gene class* | Genes that generated PPs | PPs |
| Kinase | 24 | 37 |
| Dehydrogenase | 16 | 80 |
| Transferase | 15 | 25 |
| Peptidase | 10 | 15 |
| Phosphatase | 9 | 13 |
| Synthase | 8 | 20 |
| Synthetase | 5 | 23 |
| Translocase | 4 | 7 |
| Protease | 4 | 4 |
| Reductase | 3 | 3 |
| Phospholipase | 2 | 5 |
| RNA polymerase | 2 | 3 |
| Others | 46 | 63 |
| Total | 148 | 298 |
| Ribosomal proteins | 31 | 416 |
| Actin-related proteins | 9 | 23 |
| Keratin | 5 | 57 |
| Ribosomal proteins (mitochondrial) | 4 | 7 |
| Tubulin | 4 | 6 |
| Histone | 2 | 5 |
| Myosin | 2 | 4 |
| Dynein | 2 | 3 |
| Kinesin | 1 | 1 |
| Total | 60 | 522 |
| Ligand-binding proteins‡ | 30 | 56 |
| Transcription factor‡ | 11 | 23 |
| RNA-binding proteins‡ | 11 | 15 |
| Translation initiation/termination | 9 | 21 |
| Proteasome | 9 | 19 |
| Heat-shock protein | 8 | 29 |
| Solute carrier | 7 | 14 |
| Zinc finger protein‡ | 7 | 11 |
| Ring finger protein‡ | 7 | 10 |
| Nuclear ribonucleoprotein‡ | 6 | 19 |
| Autoantigen | 6 | 12 |
| Receptor | 6 | 8 |
| Splicing factor‡ | 5 | 7 |
| DEAD/H box polypeptide | 4 | 4 |
| Carcinoma-associated antigen | 4 | 4 |
| Channel | 3 | 11 |
| Thioredoxin | 3 | 5 |
| Others | 295 | 464 |
| Total | 431 | 732 |
| Total annotated genes† | 639 | 1,552 |
| Ensembl transcripts with no RefSeq counterpart§ | 660 | 2,112 |
| Grand total | 1,299 | 3,664 |
*The functional annotation of the NCBI Reference Sequence (RefSeq) collection (v2003.01.06) was used for this classification [61]. Each gene was assigned to only one category. †Ensembl gene transcripts (v1.1.0) that correspond to the RefSeq collection (v2003.01.06). ‡These seven gene classes were classified as 'Ligand binding' in Figure 1a,b for simplicity. §Ensembl gene transcripts (v1.1.0) that do not correspond to the RefSeq collection (v2003.01.06).
Figure 1. Difference between the profiles of the PP parental genes and PPs in the human genome. (a) Classifications of the PP parental genes. (b) Classifications of the PPs. Gene classes were based on the functional annotation of the NCBI Reference Sequence collection [61] for the respective genes (see Table 1) and were further integrated into four main classes. Ligand-binding proteins, transcription factors, RNA-binding proteins, zinc finger proteins, ring finger proteins, nuclear ribonucleoproteins and splicing factors were classified as 'Ligand binding'.
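The difference between the two profiles can be illustrated with a little arithmetic on the table above. The following is a minimal sketch, not the authors' method: it takes a few gene classes and the grand total from the table and compares each class's share of parental genes with its share of PPs. The function name `class_profile` and the choice of classes are illustrative assumptions.

```python
# (parental genes, PPs) for a few gene classes, copied from the table above.
CLASS_COUNTS = {
    "Ribosomal proteins": (31, 416),
    "Keratin": (5, 57),
    "Dehydrogenase": (16, 80),
    "Kinase": (24, 37),
}
# Grand total row of the same table: (parental genes, PPs).
GRAND_TOTAL = (1299, 3664)


def class_profile(counts, total):
    """Return each class's share of parental genes, share of PPs, and PPs per gene.

    `class_profile` is a hypothetical helper written for this illustration.
    """
    total_genes, total_pps = total
    profile = {}
    for name, (genes, pps) in counts.items():
        profile[name] = {
            "gene_share": genes / total_genes,
            "pp_share": pps / total_pps,
            "pps_per_gene": pps / genes,
        }
    return profile


profile = class_profile(CLASS_COUNTS, GRAND_TOTAL)
for name, row in profile.items():
    print(
        f"{name}: {row['gene_share']:.1%} of parental genes, "
        f"{row['pp_share']:.1%} of PPs, "
        f"{row['pps_per_gene']:.1f} PPs per gene"
    )
```

A class whose share of PPs exceeds its share of parental genes (ribosomal proteins in this sample) is over-represented in panel (b) relative to panel (a); the opposite holds for kinases.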
Table 2. The most abundant PPs in the human genome
| PP number* | Ensembl ID | RefSeq ID | Gene name | mRNA (bases)† | GC content‡ | Chromosome |
| 52 | ENST00000228652 | NM_000224 | Keratin 18 (KRT18) | 1,311 | 0.59 | 12 |
| 43 | ENST00000229239 | NM_002046 | Glyceraldehyde-3-phosphate dehydrogenase (GAPD) | 975 | 0.55 | 12 |
| 38 | ENST00000241454 | NM_000982 | Ribosomal protein L21 (RPL21) | 623 | 0.41 | 13 |
| 36 | ENST00000264258 | NM_000993 | Ribosomal protein L31 (RPL31) | 412 | 0.46 | 2 |
| 32 | ENST00000226734 | NM_000995 | Ribosomal protein L34 (RPL34) | 382 | 0.44 | 4 |
| 31 | ENST00000256818 | NM_001019 | Ribosomal protein S15a (RPS15A) | 440 | 0.45 | 16 |
| 23 | ENST00000202773 | NM_000970 | Ribosomal protein L6 (RPL6) | 861 | 0.47 | 12 |
| 23 | ENST00000241929 | NM_000969 | Ribosomal protein L5 (RPL5) | 951 | 0.43 | 1 |
| 21 | ENST00000255320 | NM_002128 | High-mobility group box 1 (HMGB1) | 971 | 0.41 | 13 |
| 20 | ENST00000245458 | NM_001032 | Ribosomal protein S29 (RPS29) | 195 | 0.53 | 14 |
| 18 | ENST00000260896 | NM_001026 | Ribosomal protein S24 (RPS24) | 390 | 0.44 | 10 |
| 17 | ENST00000009589 | NM_001023 | Ribosomal protein S20 (RPS20) | 504 | 0.47 | 8 |
| 16 | ENST00000225430 | NM_000981 | Ribosomal protein L19 (RPL19) | 667 | 0.52 | 17 |
| 14 | ENST00000230050 | NM_001016 | Ribosomal protein S12 (RPS12) | 493 | 0.49 | 6 |
| 12 | ENST00000253004 | NM_054012 | Argininosuccinate synthetase (ASS) | 1,245 | 0.56 | 9 |
| 12 | ENST00000216296 | NM_004500 | Heterogeneous nuclear ribonucleoprotein C (C1/C2) (HNRPC) | 1,588 | 0.43 | 14 |
| 12 | ENST00000211372 | NM_022551 | Ribosomal protein S18 (RPS18) | 494 | 0.51 | 6 |
| 11 | ENST00000263097 | NM_004368 | Calponin 2 (CNN2) | 882 | 0.61 | 19 |
| 11 | ENST00000253788 | NM_000988 | Ribosomal protein L27 (RPL27) | 450 | 0.46 | 17 |
| 11 | ENST00000259689 | NM_001010 | Ribosomal protein S6 (RPS6) | 784 | 0.46 | 9 |
| 11 | ENST00000260379 | NM_001003 | Ribosomal protein, large, P1 (RPLP1) | 510 | 0.56 | 15 |
| 11 | ENST00000011649 | NM_007104 | Ribosomal protein L10a (RPL10A) | 682 | 0.51 | 6 |
| 10 | ENST00000255477 | NM_003295 | Tumor protein, translationally-controlled 1 (TPT1) | 829 | 0.45 | 13 |
| 10 | ENST00000227378 | NM_006597 | Heat shock 70 kDa protein 8 (HSPA8) | 1,938 | 0.46 | 11 |
| 10 | ENST00000218437 | NM_001007 | Ribosomal protein S4, X-linked (RPS4X) | 853 | 0.48 | X |
| 9 | ENST00000220072 | NM_001021 | Ribosomal protein S17 (RPS17) | 453 | 0.49 | 15 |
| 9 | ENST00000265385 | NM_000883 | IMP (inosine monophosphate) dehydrogenase 1 (IMPDH1) | 1,425 | 0.59 | 7 |
| 9 | ENST00000265264 | NM_000986 | Ribosomal protein L24 (RPL24) | 447 | 0.48 | 3 |
| 9 | ENST00000216146 | NM_000967 | Ribosomal protein L3 (RPL3) | 1,265 | 0.54 | 22 |
| 8 | ENST00000196551 | NM_001009 | Ribosomal protein S5 (RPS5) | 720 | 0.58 | 19 |
| 8 | ENST00000228140 | NM_001017 | Ribosomal protein S13 (RPS13) | 495 | 0.45 | 11 |
| 7 | ENST00000221267 | NM_003333 | Ubiquitin A-52 residue ribosomal protein fusion product 1 (UBA52) | 384 | 0.53 | 19 |
| 7 | ENST00000233609 | NM_001018 | Ribosomal protein S15 (RPS15) | 469 | 0.62 | 19 |
| 7 | ENST00000257522 | NM_030940 | Hypothetical protein MGC4276 similar to CG8198 (MGC4276) | 255 | 0.38 | 9 |
| 7 | ENST00000265333 | NM_003374 | Voltage-dependent anion channel 1 (VDAC1) | 1,498 | 0.45 | 5 |
| 7 | ENST00000236900 | NM_001028 | Ribosomal protein S25 (RPS25) | 426 | 0.45 | 11 |
| 7 | ENST00000264254 | NM_024065 | Hypothetical protein MGC3062 (MGC3062) | 955 | 0.42 | 2 |
| 7 | ENST00000246201 | NM_003908 | Eukaryotic translation initiation factor 2, subunit 2 beta (EIF2S2) | 1,300 | 0.39 | 20 |
| 6 | ENST00000245206 | NM_002080 | Glutamic-oxaloacetic transaminase 2, mitochondrial (GOT2) | 2,331 | 0.49 | 16 |
| 6 | ENST00000238591 | NM_015962 | CGI-35 protein (CGI-35) | 1,019 | 0.37 | 14 |
| 6 | ENST00000249380 | NM_005000 | NADH dehydrogenase 1 alpha subcomplex, 5, 13 kDa (NDUFA5) | 339 | 0.41 | 7 |
| 6 | ENST00000228825 | NM_005719 | Actin-related protein 2/3 complex, subunit 3, 21 kDa (ARPC3) | 786 | 0.41 | 12 |
| 6 | ENST00000261565 | NM_003187 | TATA box binding protein (TBP)-associated factor, 32 kDa (TAF9) | 833 | 0.34 | 5 |
| 6 | ENST00000227157 | NM_005566 | Lactate dehydrogenase A (LDHA) | 1,589 | 0.43 | 11 |
| 6 | ENST00000264221 | NM_006452 | Phosphoribosylaminoimidazole carboxylase, (PAICS) | 1,385 | 0.41 | 4 |
| 6 | ENST00000037869 | NM_032822 | Hypothetical protein FLJ14668 (FLJ14668) | 414 | 0.56 | 2 |
| 6 | ENST00000235094 | NM_001688 | ATP synthase, mitochondrial F0 complex, subunit b (ATP5F1) | 1,104 | 0.43 | 1 |
| 6 | ENST00000234875 | NM_000983 | Ribosomal protein L22 (RPL22) | 541 | 0.41 | 1 |
| 6 | ENST00000005593 | NM_001152 | Solute carrier family 25, member 5 (SLC25A5) | 894 | 0.52 | X |
| 5 | ENST00000216252 | NM_032758 | PHD finger protein 5A (PHF5A) | 330 | 0.48 | 22 |
*The number of PPs derived from each gene. The top 50 genes are shown. †Length of the Ensembl gene transcripts (v1.1.0). ‡GC content of the Ensembl gene transcripts (v1.1.0). The list of all the genes is available as Additional data file 2.
Figure 2. GC content of the PP parental genes and the number of PP copies of those genes. The total number of PP parental genes having a given GC content is shown as individual bars in increments of 4%. The PP-generation rate (the PP number/gene) is shown as a line that connects averages for respective groups. The vertical error bars indicate standard error of the mean.
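The binning described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: bins are assumed to start at multiples of 4% GC, the standard error uses the sample standard deviation, and the input is only the first ten rows of the table of most abundant PPs (the figure itself covers all parental genes). The name `pp_rate_by_gc_bin` is hypothetical.

```python
import math
import statistics
from collections import defaultdict

# (GC content, PP number) for the first ten rows of the table above.
SAMPLE = [
    (0.59, 52), (0.55, 43), (0.41, 38), (0.46, 36), (0.44, 32),
    (0.45, 31), (0.47, 23), (0.43, 23), (0.41, 21), (0.53, 20),
]


def pp_rate_by_gc_bin(records, bin_width_pct=4):
    """Group genes into GC bins; return {bin_start_pct: (n_genes, mean_pps, sem)}.

    Assumption: bins start at multiples of `bin_width_pct`.
    The SEM is None for bins that hold a single gene.
    """
    bins = defaultdict(list)
    for gc, pp_count in records:
        gc_pct = round(gc * 100)  # work in whole percent to avoid float edge cases
        bins[(gc_pct // bin_width_pct) * bin_width_pct].append(pp_count)

    summary = {}
    for start in sorted(bins):
        values = bins[start]
        n = len(values)
        mean = statistics.mean(values)
        sem = statistics.stdev(values) / math.sqrt(n) if n > 1 else None
        summary[start] = (n, mean, sem)
    return summary


summary = pp_rate_by_gc_bin(SAMPLE)
for start, (n, mean, sem) in summary.items():
    sem_text = f"{sem:.2f}" if sem is not None else "n/a"
    print(f"GC {start}-{start + 4}%: {n} genes, {mean:.1f} PPs/gene, SEM {sem_text}")
```

The gene count per bin corresponds to the bars, the mean to the line, and the SEM to the error bars.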
Table 3. Chromosomal distribution and density of human PPs
| Chromosome | PPs | Genes that generated PPs | Number of genes (Ensembl 4.28.1) | Genes/Mb* | PPs/Mb |
| All | 3,664 | 1,299 | 23,863 | 7.33 | 1.12 |
| 1 | 359 | 117 | 2,482 | 8.90 | 1.28 |
| 2 | 241 | 84 | 1,550 | 6.31 | 0.98 |
| 3 | 225 | 76 | 1,277 | 5.94 | 1.04 |
| 4 | 163 | 57 | 868 | 4.33 | 0.81 |
| 5 | 193 | 72 | 1,093 | 5.61 | 0.99 |
| 6 | 207 | 64 | 1,297 | 7.07 | 1.12 |
| 7 | 176 | 72 | 1,251 | 7.59 | 1.06 |
| 8 | 144 | 47 | 787 | 5.23 | 0.95 |
| 9 | 150 | 49 | 934 | 6.57 | 1.05 |
| 10 | 178 | 45 | 939 | 6.56 | 1.24 |
| 11 | 238 | 79 | 1,506 | 9.98 | 1.57 |
| 12 | 234 | 78 | 1,212 | 8.25 | 1.59 |
| 13 | 94 | 24 | 425 | 3.61 | 0.80 |
| 14 | 146 | 45 | 785 | 7.33 | 1.36 |
| 15 | 114 | 41 | 770 | 7.65 | 1.13 |
| 16 | 95 | 54 | 1,040 | 10.15 | 0.92 |
| 17 | 126 | 63 | 1,272 | 14.44 | 1.43 |
| 18 | 74 | 20 | 370 | 4.43 | 0.88 |
| 19 | 123 | 78 | 1,504 | 20.80 | 1.70 |
| 20 | 59 | 27 | 640 | 10.15 | 0.93 |
| 21 | 34 | 11 | 232 | 5.20 | 0.76 |
| 22 | 62 | 34 | 577 | 12.14 | 1.30 |
| X | 207 | 51 | 922 | 5.84 | 1.31 |
| Y | 22 | 11 | 130 | 2.53 | 0.42 |
*The number of Ensembl genes per megabase.
Figure 3. Chromosomal origins of human PPs. Individual bars indicate the total number of PPs in each chromosome. The different colors represent the chromosomal origins of the PPs.
Figure 4. PP and gene density within each chromosome. For each chromosome, the number of PPs per megabase is plotted against the number of genes per megabase.
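One way to summarize the relationship plotted in this figure is a correlation coefficient over the per-chromosome densities listed in the chromosomal distribution table. The sketch below computes Pearson's r from those 24 rows; the choice of Pearson's r and the helper name `pearson_r` are assumptions made for illustration, and the source does not state which statistic, if any, the authors used.

```python
import math

# chromosome: (genes/Mb, PPs/Mb), copied from the chromosomal distribution table.
DENSITY = {
    "1": (8.90, 1.28), "2": (6.31, 0.98), "3": (5.94, 1.04), "4": (4.33, 0.81),
    "5": (5.61, 0.99), "6": (7.07, 1.12), "7": (7.59, 1.06), "8": (5.23, 0.95),
    "9": (6.57, 1.05), "10": (6.56, 1.24), "11": (9.98, 1.57), "12": (8.25, 1.59),
    "13": (3.61, 0.80), "14": (7.33, 1.36), "15": (7.65, 1.13), "16": (10.15, 0.92),
    "17": (14.44, 1.43), "18": (4.43, 0.88), "19": (20.80, 1.70), "20": (10.15, 0.93),
    "21": (5.20, 0.76), "22": (12.14, 1.30), "X": (5.84, 1.31), "Y": (2.53, 0.42),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


gene_density = [genes for genes, _ in DENSITY.values()]
pp_density = [pps for _, pps in DENSITY.values()]
r = pearson_r(gene_density, pp_density)
print(f"Pearson r between genes/Mb and PPs/Mb over {len(DENSITY)} chromosomes: {r:.2f}")
```

Because the table values are rounded to two decimals, the result is approximate.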
Figure 5. Age distribution of human retroposons represented by the level of nucleotide substitutions. (a) Human PPs. The number of nucleotide substitutions per 100 bases (except CpG sites) was calculated for each PP, and the total number of PPs having a given number of substitutions is shown as individual bars in one-nucleotide increments. For comparison, the line shows a Poisson distribution of the same average values for PPs. (b) Alu repeats, calculated and presented as in (a). The line shows a Poisson distribution of the same average values for Alus. (c) Alu subfamilies, calculated as in (a). The curves connect apices of respective bars calculated as in (a). For simplicity, subfamilies that contain fewer than 5,000 Alus, such as Alu Ya and Yb, are not shown. (d) L1s, calculated and presented as in (a). (e) L1 subfamilies, calculated and presented as in (c). For simplicity, subfamilies that contain fewer than 1,000 L1s, such as L1PA1 (L1Hs) and L1P1, are not shown. L1PA6, L1PA7 and L1PA8 are shown as bold blue lines.
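The comparison in panels (a), (b) and (d) between observed bars and a Poisson distribution with the same mean can be sketched as below. This is a minimal illustration, not the authors' procedure: the per-element substitution values are hypothetical placeholders (the real values are not given here and would already exclude CpG sites), binning by rounding down to whole substitutions is an assumption, and `poisson_expected_counts` is a made-up helper name.

```python
import math
from collections import Counter


def poisson_expected_counts(subs_per_100, max_k=40):
    """Compare observed substitution bins with a Poisson curve of the same mean.

    Returns (mean, observed, expected), where observed[k] is the number of
    elements whose substitutions per 100 bases fall in [k, k + 1) and
    expected[k] is N * exp(-mean) * mean**k / k! for k = 0..max_k.
    """
    n = len(subs_per_100)
    lam = sum(subs_per_100) / n
    # Assumption: one-nucleotide bins are formed by rounding down.
    observed = Counter(int(s) for s in subs_per_100)
    expected = {
        k: n * math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(max_k + 1)
    }
    return lam, observed, expected


# Hypothetical substitutions per 100 bases for ten elements (illustration only).
HYPOTHETICAL_SUBS = [3.2, 5.1, 5.8, 6.0, 7.4, 7.9, 8.3, 9.6, 12.0, 14.7]

lam, observed, expected = poisson_expected_counts(HYPOTHETICAL_SUBS)
print(f"mean substitutions per 100 bases: {lam:.2f}")
for k in range(0, 16):
    print(f"{k:2d}: observed {observed.get(k, 0)}, Poisson expected {expected[k]:.2f}")
```

An observed distribution that is broader than, or shifted away from, the Poisson curve suggests that the elements were not all inserted at a single constant rate, which is the kind of departure the bars and line in the figure make visible.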
Figure 6. Phylogenetic relationships between L1 subfamilies. Amino-acid substitutions within the 'C domain' at particular stages of L1 evolution are denoted in boxes. The phylogenetic tree was constructed using the neighbor-joining method [62] based on the last 900 bp of the consensus sequences of respective subfamilies.
Figure 7. Timing of the retrotranspositional explosion during primate evolution. Phylogenetic relationships among primates and the estimated timeframes are based on data from references [34,36] and [37], and references therein.