Zhengdong D Zhang, Adam Frankish, Toby Hunt, Jennifer Harrow, Mark Gerstein.
Abstract
BACKGROUND: Unitary pseudogenes are a class of unprocessed pseudogenes without functional counterparts in the genome. They constitute only a small fraction of annotated pseudogenes in the human genome. However, because they represent distinct functional losses over time, they shed light on the unique features of humans in primate evolution.
Year: 2010 PMID: 20210993 PMCID: PMC2864566 DOI: 10.1186/gb-2010-11-3-r26
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Method for identifying human unitary pseudogenes by comparison with the mouse genome. (a) The overall methodological flowchart. The number of entries in the input/output data set at certain steps is shown in parentheses. (b) Detailed inspection and synteny check of the potential human unitary pseudogenic loci. Entries in the initial set of pseudogenic loci are removed based on various criteria at different steps. The final results, the unitary pseudogenes and the polymorphic pseudogenes in human, are listed in Tables 1 and 2. See the main text for details. MGI, Mouse Genome Informatics; OR, olfactory receptor; VR, vomeronasal receptor; ZF, zinc finger protein.
Table 1. Human unitary pseudogenes
| Human unitary pseudogene genomic location | Mouse ortholog gene name |
|---|---|
| chr12+:110821507-110823878 | a disintegrin and metallopeptidase domain 1b |
| chr8+:17371392-17373372 | a disintegrin and metallopeptidase domain 26B |
| chr8-:39450156-39489335 | a disintegrin and metallopeptidase domain 3 (cyritestin) |
| chr8+:39299218-39358412 | a disintegrin and metallopeptidase domain 5 |
| chr9-:103136199-103141451 | acyl-coenzyme A amino acid N-acyltransferase 2 |
| chr18+:54814947-54887164 | acyltransferase 3 [RIKEN cDNA 5330437I02 gene] |
| chr1+:92304452-92305907 | acyltransferase like 1B |
| chr11+:71909632-71910345 | ADP-ribosyltransferase 2b |
| chr2+:201166115-201364602 | aldehyde oxidase 3-like 1 |
| chr16+:2351147-2415839 | ATP-binding cassette, sub-family A (ABC1), member 17 |
| chr1-:51789487-51812353 | calreticulin 4 |
| chr16-:30823174-30826438 | cardiotrophin 2 |
| chr4-:123871155-123872802 | centrin 4 |
| chr19-:46006279-46009136 | cytochrome P450, family 2, subfamily t, polypeptide 4 |
| chr2-:178665477-178677441 | cytochrome c, testis |
| chr4-:68540001-68564082 | Desc4 [RIKEN cDNA 9930032O22 gene] |
| chr11-:67136888-67140266 | double C2, gamma |
| chr9+:35423704-35439561 | Feta [RIKEN cDNA 4930417M19 gene] |
| chr10-:114057930-114106344 | guanylate cyclase 2g |
| chr8:27473706-27502505 | gulonolactone (L-) oxidase |
| chr1-:226718541-226718916 | histone cluster 3, H2ba |
| chr7+:123241442-123256569 | hyaluronoglucosaminidase 6 |
| chr9-:114761447-114764366 | major urinary protein 4 |
| chr10+:81670064-81672769 | mannose binding lectin (A) 1 |
| chr6+:118061593-118072916 | nephrocan |
| chr3+:47028800-47029644 | neurotrophin receptor associated death domain |
| chr1+:115181467-115195621 | nuclear receptor subfamily 1, group H, member 5 |
| chrX+:101400687-101403403 | preferentially expressed antigen in melanoma |
| chr1+:200404371-200425048 | protein tyrosine phosphatase, receptor type, V |
| chr5+:140786050-140870922 | protocadherin gamma subfamily B, 8 |
| chr19+:53875091-53876096 | secretory blood group 1 |
| chr20-:1696610-1708642 | Sirpb3 [RIKEN cDNA F830045P16 gene] |
| chr2+:20449670-20459798 | solute carrier family 7 (cationic amino acid transporter, y+ system), member 15 |
| chr4-:70692183-70714196 | sulfotransferase family 1D, member 1 |
| chr7+:142844251-142845153 | taste receptor, type 2, member 134 |
| chr17+:59285910-59292052 | testicular cell adhesion molecule 1 |
| chrX+:83901067-83903982 | testis expressed gene 16 |
| chr14-:63882652-63893934 | testis expressed gene 21 |
| chr8-:145268106-145414584 | testis-specific serine kinase 5 |
| chr17-:73756179-73757460 | threonine aldolase 1 |
| chr1+:33704438-33707143 | toll-like receptor 12 |
| chr6-:132971083-132972109 | trace amine-associated receptor 3 |
| chr6-:132957230-132958269 | trace amine-associated receptor 4 |
| chr11+:3587708-3615320 | transient receptor potential cation channel, subfamily C, member 2 |
| chr4-:68314827-68322204 | transmembrane protease, serine 11c |
| chr16-:2829662-2831734 | transmembrane protease, serine 8 (intestinal) |
| chr1-:84603696-84623086 | urate oxidase |
See Table S2 in Additional file 1 for the list of 29 human unitary pseudogenes identified using unannotated mouse gene transcripts.
Figure 2. The origin of human unitary pseudogenes in the paralogous gene sets. Human unitary pseudogenes annotated from orthologous mouse genes are assigned to human paralogous gene sets, whose names are shown in the middle. The number of human unitary pseudogenes in each paralogous gene set and the number of members in each set are plotted as green and blue bars, respectively. Five unitary pseudogenes with uninformative annotation are denoted with question marks. Unitary pseudogenes without close paralogs are enclosed by dashed lines. Unitary pseudogenes from tandem gene families are indicated by gray bars. Inset: box plots of the number of human unitary pseudogenes in each paralogous gene set and of the number of members in each set.
Figure 3. The human-specific pseudogene of the major urinary protein (MUP). A G-to-A substitution (shown in reverse highlight) at the donor site of the second intron (splice sites underlined) disrupts the open reading frame (ORF) of the coding sequence. Sequence conservation is clearly discernible in the multiple alignment of polypeptide sequences translated from the partial exonic sequences upstream and downstream of the MUP splice junction in 24 species.
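The splicing defect in Figure 3 rests on the canonical GT...AG rule: most introns begin with the donor dinucleotide GT and end with the acceptor dinucleotide AG. A minimal sketch, assuming the G-to-A change hits the G of the donor GT (the caption does not give the exact offset) and using a made-up toy intron rather than the MUP sequence, shows how such a substitution removes the donor signal:

```python
# Toy sketch: does an intron keep the canonical GT...AG splice signals?
# The sequence below is a made-up example, not the MUP locus.

def has_canonical_splice_sites(intron: str) -> bool:
    """True if the intron starts with the donor dinucleotide GT and ends with the acceptor AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def substitute(seq: str, pos: int, new_base: str) -> str:
    """Return a copy of seq with the base at 0-based position pos replaced by new_base."""
    return seq[:pos] + new_base + seq[pos + 1:]


toy_intron = "GTAAGTCTTTTCCCTTAG"
mutant = substitute(toy_intron, 0, "A")  # G-to-A at the first base of the donor site
print(has_canonical_splice_sites(toy_intron))  # True
print(has_canonical_splice_sites(mutant))      # False
```

A real splice-site check is more involved: a minority of introns use non-canonical signals such as GC-AG, and splice-site strength also depends on sequence beyond the two dinucleotides.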
Figure 4. Enrichment of Gene Ontology (GO) terms and Pfam domains in the human unitary pseudogenes. Enriched GO terms and their positions in the hierarchy of (a) biological process and (b) molecular function terms. Yellow nodes correspond to significant GO terms. (c) P-values for significant GO terms and Pfam domains.
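The caption does not state which statistical test produced the P-values in panel (c). A common choice for GO and Pfam enrichment is a one-sided hypergeometric test; the sketch below assumes that test, and the function name and toy counts are illustrative only:

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k).

    N: genes in the background set
    K: background genes annotated with the term (GO term or Pfam domain)
    n: genes in the test set (e.g. unitary pseudogenes with mouse orthologs)
    k: test-set genes annotated with the term
    """
    denom = comb(N, n)
    # Sum the probabilities of drawing k, k+1, ..., min(n, K) annotated genes.
    numer = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return numer / denom


# Toy numbers, not the paper's data: 3 of 4 test genes carry a term held by 5 of 20 genes.
print(enrichment_p_value(k=3, n=4, K=5, N=20))  # 155/4845, about 0.032
```

When many terms are tested, such P-values are normally adjusted for multiple testing (for example Bonferroni or a false discovery rate procedure); the caption does not say which adjustment, if any, was applied.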
Figure 5. Dating the pseudogenization events. (a) Timing of the disruptive mutations that gave rise to human unitary pseudogenes, inferred from shared mutations. Only pseudogenes with annotations from orthologous mouse genes are shown; those without close paralogs are underlined. (b) Timing of several pseudogenization events that occurred in the human lineage after the human-chimp divergence. See Table S3 in Additional file 1 for the estimates and their standard errors. LCA, last common ancestor.
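Dating by shared mutations follows a parsimony argument: if a disruptive mutation is found in human and in some other primates, it most likely arose once, on the branch leading to the smallest clade that contains all carriers. A minimal sketch, assuming a fixed tree of human, chimpanzee, orangutan, and macaque, a single origin, and no reversals (the `CLADES` table and branch labels are hypothetical), is:

```python
# Hypothetical species tree ((((human, chimp), orangutan), macaque)), written as nested
# clades from the human tip outward, each paired with the branch leading to it.
CLADES = [
    (frozenset({"human"}), "human lineage, after the human-chimp split"),
    (frozenset({"human", "chimp"}), "human-chimp ancestral lineage, after the orangutan split"),
    (frozenset({"human", "chimp", "orangutan"}), "great-ape ancestral lineage, after the macaque split"),
    (frozenset({"human", "chimp", "orangutan", "macaque"}), "lineage leading to the human-macaque LCA"),
]


def date_disruptive_mutation(carriers: set) -> str:
    """Place a disruptive mutation on the branch above the clade of species that share it.

    Assumes a single origin, no reversals, and that the carriers form one of the clades above.
    """
    for clade, branch in CLADES:
        if frozenset(carriers) == clade:
            return branch
    raise ValueError(f"carrier set {sorted(carriers)} is not a clade in this tree")


print(date_disruptive_mutation({"human"}))
print(date_disruptive_mutation({"human", "chimp"}))
```

Carrier sets that do not form a clade, such as human and orangutan without chimpanzee, violate the single-origin assumption and are rejected rather than dated. Panel (b) reports numerical estimates with standard errors, which this qualitative sketch does not reproduce.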
Table 2. Human polymorphic pseudogenes
| Gene | CDS disruptive mutation: change [a] | CDS disruptive mutation: location [b] | dbSNP ID [c] | HapMap SNP ID |
|---|---|---|---|---|
| Nonsense mutation | | | | |
| | taT (Y) → taA | chr5+:135,300,350 | rs17169429 (+27) | rs17169429 (+27) |
| | Cag (Q) → Tag | chr1+:159,826,011 | rs3933769 (-60) | rs3933769 (-60) |
| | Cga (R) → Tga | chr14-:31,022,505 | rs17097921 | rs17097921 |
| | Caa (Q) → Taa | chr1+:143,815,304 | rs2794062 | rs16826061 (+95) |
| | Gaa (E) → Taa | chr18+:59,530,818 | rs4940595 | rs4940595 |
| | Aaa (K) → Taa | chr6+:132,901,302 | rs2842899 | rs2842899 |
| Frame-shift mutation | | | | |
| | ΔCA | chr11-:104,268,394-5 | rs497116 (-67) | rs497116 (-67) |
| | ΔT | chr21-:31,123,841 | rs35359062 | rs9982775 (-20) |
| | ∇A | chr4-:7,487,457 | rs58463471 | rs4484302 (+441) |
| | ∇A | chr3-:45,242,396 | rs11402022 | rs33751 (+725) |
| | ΔC | chr16-:1,219,240 | rs2234647 | rs2745145 (-1771) |
[a] Base change, deletion, and insertion are denoted by '→', '∇', and 'Δ', respectively.
[b] The genomic location, based on NCBI build 36 of the human reference genome, includes the chromosome, the strand ('+' for forward, '-' for reverse), and the coordinate of the base change.
[c] The identifier of the mutation in dbSNP (build 129). If a mutation is not in dbSNP, the identifier of the closest SNP and its distance to the mutation (in parentheses) are shown instead.
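The location strings in Tables 1 and 2 follow the convention in footnote b: chromosome, strand, then one coordinate or a start-end range. The sketch below parses them and classifies the two kinds of disruptive mutation. It assumes that an end coordinate such as "-5" in "chr11-:104,268,394-5" abbreviates the trailing digits of the start; the helper names are hypothetical:

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, coding strand

# chromosome, optional strand, start, optional end (e.g. "chr11-:104,268,394-5")
LOCATION_RE = re.compile(r"^(chr[0-9XYM]+)([+-]?):([\d,]+)(?:-([\d,]+))?$")


def parse_location(loc: str) -> tuple:
    """Parse a table location into (chrom, strand, start, end); strand is '' if missing."""
    m = LOCATION_RE.match(loc)
    if m is None:
        raise ValueError(f"unrecognized location: {loc!r}")
    chrom, strand, start_s, end_s = m.groups()
    start = start_s.replace(",", "")
    if end_s is None:
        return chrom, strand, int(start), int(start)
    end = end_s.replace(",", "")
    # Assumption: a short end such as "-5" abbreviates the trailing digits of the start.
    if len(end) < len(start):
        end = start[: len(start) - len(end)] + end
    return chrom, strand, int(start), int(end)


def is_nonsense(before: str, after: str) -> bool:
    """True if a sense codon mutates into a stop codon (case-insensitive)."""
    return before.upper() not in STOP_CODONS and after.upper() in STOP_CODONS


def is_frameshift(indel_length: int) -> bool:
    """An indel in a CDS shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


print(parse_location("chr11-:104,268,394-5"))  # ('chr11', '-', 104268394, 104268395)
print(is_nonsense("Cga", "Tga"))               # True
print(is_frameshift(len("CA")))                # True
```

Both checks use the standard genetic code. A nonsense change is recognized from the mutated codon alone (TAA, TAG, or TGA), while an indel disrupts the frame whenever its length is not divisible by 3, as for the 1- and 2-base indels listed above.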
Table 3. Polymorphic pseudogenes with the disruptive sites typed in the HapMap Project [a]
| CDS disrupted gene | | | |
|---|---|---|---|
| Disruptive mutation [b] | Cga (R) → Tga | Gaa (E) → Taa | Aaa (K) → Taa |
| dbSNP ID | rs17097921 | rs4940595 | rs2842899 |
| Genomic location | chr14-:31,022,505 | chr18+:59,530,818 | chr6+:132,901,302 |
| Disrupted codon position [c] | 140 (332) | 89 (388) | 61 (344) |
| Reference allele in human | T | T | T |
| Reference allele in other primates [d] | C | T | T |
| Test statistic for HWE in the meta-population [e] | 0.285 (…) | 8.659 (…) | 0.071 (…) |
[a] See Table S4 in Additional file 1 for allele frequency information.
[b] Codons before and after the mutation (→) are shown with the affected base capitalized; the amino acid encoded by the original codon is given in parentheses.
[c] The position of the disrupted codon in the coding sequence (CDS); the total number of codons in the CDS is given in parentheses.
[d] Widely regarded as the ancestral allele. The other primates currently included are chimpanzee, orangutan, and macaque.
[e] The χ² goodness-of-fit test is used to test for Hardy-Weinberg equilibrium (HWE) in the meta-population, using the pooled genotype and allele frequency data.
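Footnote e describes a χ² goodness-of-fit test for Hardy-Weinberg equilibrium on pooled genotypes. A minimal sketch of the usual three-genotype Pearson version follows; it has 1 degree of freedom (three classes, minus one for the total, minus one for the estimated allele frequency). The function names are illustrative, and whether the original analysis used exactly this form (for example, without a continuity correction) is an assumption:

```python
from math import erfc, sqrt


def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Pearson chi-square statistic for Hardy-Weinberg equilibrium at one biallelic SNP."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic site: HWE test is undefined")
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_1df_p_value(stat: float) -> float:
    """Upper-tail P-value of a chi-square with 1 degree of freedom: erfc(sqrt(stat / 2))."""
    return erfc(sqrt(stat / 2))


stat = hwe_chi_square(50, 0, 50)  # toy genotype counts with no heterozygotes
print(stat, chi_square_1df_p_value(stat))
```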
Figure 6. Population structure analysis for SNP rs4940595. (a) Hierarchical clustering of 11 populations using the FST metric. Two subdivisions of the meta-population, separated by the dashed line, are clearly visible in the clustering. (b) Histogram of FST values from the permutation test using the population subdivisions shown in (a).
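The caption does not specify the FST estimator or the permutation scheme. A minimal sketch, assuming the heterozygosity-based form FST = (H_T - H_S) / H_T and a permutation that reshuffles alleles between groups of fixed size, is shown below; bias-corrected estimators such as Weir and Cockerham's are common in practice, and the toy allele calls are invented:

```python
import random


def fst(groups: list) -> float:
    """F_ST = (H_T - H_S) / H_T at one biallelic locus.

    Each group is a list of allele calls coded 0/1. H_S is the size-weighted mean
    within-group heterozygosity; H_T is the heterozygosity of the pooled sample.
    """
    n_total = sum(len(g) for g in groups)
    p_bar = sum(sum(g) for g in groups) / n_total
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # monomorphic: no differentiation to measure
    h_s = 0.0
    for g in groups:
        p = sum(g) / len(g)
        h_s += len(g) / n_total * 2 * p * (1 - p)
    return (h_t - h_s) / h_t


def fst_permutation_test(groups: list, n_perm: int = 1000, seed: int = 0):
    """Return observed F_ST, a permutation P-value, and the null F_ST values."""
    rng = random.Random(seed)
    observed = fst(groups)
    pooled = [a for g in groups for a in g]
    sizes = [len(g) for g in groups]
    null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign alleles to groups at random, keeping group sizes
        shuffled, start = [], 0
        for s in sizes:
            shuffled.append(pooled[start:start + s])
            start += s
        null.append(fst(shuffled))
    hits = sum(v >= observed for v in null)
    return observed, (hits + 1) / (n_perm + 1), null


groups = [[0] * 18 + [1] * 2, [0] * 4 + [1] * 16]  # toy allele calls for two subdivisions
obs, p_value, _ = fst_permutation_test(groups)
print(obs, p_value)
```

Pairwise `fst` values between populations could be collected into a distance matrix for hierarchical clustering, as in panel (a); the null values returned by `fst_permutation_test` correspond to the kind of histogram shown in panel (b).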
Figure 7. Unitary pseudogene relativity. Given the phylogeny of human, chimpanzee, and mouse, a human unitary pseudogene can arise from a gene loss in different lineages, namely (a) the human lineage after the human-chimp divergence and (b) the human-ancestral lineage after the human-mouse divergence but before the human-chimp divergence, and under different circumstances, such as (c) the loss of a subfunctionalized gene in the human lineage after a duplication event that preceded the human-chimp divergence. Because the absence of a functional gene in a species can only be detected by comparison with another species that has the functional ortholog, the human unitary pseudogene in (a) can be identified by comparing the human gene set with either the chimp or the mouse gene set, as both contain a functional ortholog. In (b) and (c), however, it can be identified only by comparing the human gene set with one of the two, because, given the evolutionary history of the gene, the other lacks a functional ortholog.
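The reasoning in panels (a) and (b) reduces to a set difference: a loss is visible only from species that still carry a functional ortholog, that is, species outside the lineage where the loss occurred. A minimal sketch for the two single-copy scenarios, assuming no other losses on the tree (the scenario labels are hypothetical, and the duplication case in panel (c) is not modeled):

```python
# Hypothetical labels for the two single-copy scenarios in panels (a) and (b).
SPECIES = frozenset({"human", "chimp", "mouse"})
DESCENDANTS_OF_LOSS = {
    "a: human lineage after human-chimp split": frozenset({"human"}),
    "b: human-chimp ancestor after human-mouse split": frozenset({"human", "chimp"}),
}


def informative_species(scenario: str) -> set:
    """Species that still carry a functional ortholog, so comparing them with human
    reveals the human unitary pseudogene. Assumes no other losses on the tree."""
    return set(SPECIES - DESCENDANTS_OF_LOSS[scenario])


for scenario in DESCENDANTS_OF_LOSS:
    print(scenario, "->", sorted(informative_species(scenario)))
```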
Figure 8. Polymorphic pseudogenes in human populations. (a) Human-specific pseudogenic polymorphism generated by gene inactivation. (b) Pseudogenic polymorphism since the last common ancestor. (c) Human-specific pseudogenic polymorphism generated by pseudogene resurrection.