Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22.

Paul M Harrison, Hedi Hegyi, Suganthi Balasubramanian, Nicholas M Luscombe, Paul Bertone, Nathaniel Echols, Ted Johnson, Mark Gerstein.

Abstract

We have developed an initial approach for annotating and surveying pseudogenes in the human genome. We search human genomic DNA for regions that are similar to known protein sequences and contain obvious disablements (i.e., mid-sequence stop codons or frameshifts), while ensuring minimal overlap with annotations of known genes. Pseudogenes can be divided into "processed" and "nonprocessed"; the former are reverse transcribed from mRNA (and therefore have no intron structure), whereas the latter presumably arise from genomic duplications. We annotate putative processed pseudogenes based on whether there is a continuous span of homology that is >70% of the length of the closest matching human protein (i.e., with introns removed), or whether there is evidence of polyadenylation. We have applied our approach to chromosomes 21 and 22, the first parts of the human genome completely sequenced, finding 190 new pseudogene annotations beyond the 264 reported by the sequencing centers. In total, on chromosomes 21 and 22, there are 189 processed pseudogenes, 195 nonprocessed pseudogenes, and, additionally, 70 pseudogenic immunoglobulin gene segments. (Detailed assignments are available at http://bioinfo.mbb.yale.edu/genome/pseudogene or http://genecensus.org/pseudogene.) By extrapolation, we predict that there could be up to approximately 20,000 pseudogenes in the whole human genome, with a little more than half of them processed. We have determined the main populations and clusters of pseudogenes on chromosomes 21 and 22. There are notable excesses of pseudogenes relative to genes near the centromeres of both chromosomes, indicating the existence of pseudogenic "hot-spots" in the genome. We have looked at the distribution of InterPro families and Gene Ontology (GO) functional categories in our pseudogenes. 
Overall, the families in both processed and nonprocessed pseudogene populations occur according to a power-law distribution similar to that found for the occurrence of gene families, with a few big families and many small ones. The processed population is, in particular, enriched in highly expressed ribosomal-protein sequences (approximately 20%), which appear fairly evenly distributed across the chromosomes. We compared processed pseudogenes of different evolutionary ages, observing a high degree of similarity between "ancient" and "modern" subpopulations. This may be attributable to the consistently high expression of ribosomal proteins over evolutionary time. Finally, we find that the chromosome 22 pseudogene population is dominated by immunoglobulin segments, which have a greater rate of disablement per amino acid than the other pseudogene populations and are also substantially more diverged.
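
The annotation rules described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Candidate` record, its field names, the `classify` function, and the reduction of "minimal overlap with annotations of known genes" to a single boolean are all assumptions made for illustration. Only the 70% coverage threshold, the polyadenylation criterion, the disablement criteria, and the pseudogene counts come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one genomic region matching a known protein."""
    longest_continuous_match: int  # residues in the longest continuous span of homology
    protein_length: int            # length of the closest matching human protein
    has_stop_codon: bool           # mid-sequence stop codon found
    has_frameshift: bool           # frameshift found
    has_polya_evidence: bool       # evidence of polyadenylation
    overlaps_known_gene: bool      # simplification of the "minimal overlap" filter


def classify(c: Candidate, min_fraction: float = 0.70) -> str:
    """Apply the abstract's rules in order; labels are illustrative."""
    if c.overlaps_known_gene:
        return "excluded"
    # A pseudogene candidate must carry an obvious disablement.
    if not (c.has_stop_codon or c.has_frameshift):
        return "not_pseudogene"
    # Processed: continuous homology over >70% of the protein, or poly(A) evidence.
    coverage = c.longest_continuous_match / c.protein_length
    if coverage > min_fraction or c.has_polya_evidence:
        return "processed"
    return "nonprocessed"


# Counts reported in the abstract for chromosomes 21 and 22.
counts = {"processed": 189, "nonprocessed": 195, "immunoglobulin_segments": 70}
new_annotations, previously_reported = 190, 264

# The three populations add up to the new plus previously reported annotations.
assert sum(counts.values()) == new_annotations + previously_reported
```

For example, a candidate with a frameshift whose longest continuous match covers 240 of 300 residues (80%) would be labelled `processed` under these assumptions, while the same candidate with a 150-residue match and no polyadenylation evidence would be labelled `nonprocessed`.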

Year:  2002        PMID: 11827946      PMCID: PMC155275          DOI: 10.1101/gr.207102

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043


