
Analysis of human mRNAs with the reference genome sequence reveals potential errors, polymorphisms, and RNA editing.

Terrence S Furey, Mark Diekhans, Yontao Lu, Tina A Graves, Lachlan Oddy, Jennifer Randall-Maher, LaDeana W Hillier, Richard K Wilson, David Haussler.

Abstract

The NCBI Reference Sequence (RefSeq) project and the NIH Mammalian Gene Collection (MGC) together define a set of approximately 30,000 nonredundant human mRNA sequences with identified coding regions representing 17,000 distinct loci. These high-quality mRNA sequences allow for the identification of transcribed regions in the human genome sequence, and many researchers accept them as the correct representation of each defined gene sequence. Computational comparison of these mRNA sequences with the recently published, essentially finished human genome sequence reveals several thousand undocumented nonsynonymous substitution and frameshift discrepancies between the two resources. Additional analysis is undertaken to verify that the euchromatic human genome is sufficiently complete, containing nearly the whole mRNA collection, to allow a comprehensive analysis. Many of the discrepancies will prove to be genuine polymorphisms in the human population, somatic cell genomic variants, or examples of RNA editing. The genome sequence variant has significant additional support from other mRNAs and ESTs almost four times more often than does the mRNA variant, suggesting that the genome sequence is more accurate. In approximately 15% of these cases, there is substantial support for both variants, suggestive of an undocumented polymorphism. An initial screening against a 24-individual genomic DNA diversity panel verified 60% of a small set of potential single nucleotide polymorphisms from which successful results could be obtained. We also find statistical evidence that a few of these discrepancies are due to RNA editing. Overall, these results suggest that the mRNA collections may contain a substantial number of errors. For current and future mRNA collections, it may be prudent to fully reconcile each genome sequence discrepancy, classifying each as a polymorphism, site of RNA editing or somatic cell variation, or genome sequence error.
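
The support comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Discrepancy` type, the `classify` function, and the `min_support` threshold are hypothetical names and values chosen for illustration, since the abstract does not state what counts as "significant" support.

```python
from dataclasses import dataclass


@dataclass
class Discrepancy:
    """One mRNA-vs-genome mismatch with independent evidence counts.

    Both fields are hypothetical: they stand for the number of OTHER
    mRNAs/ESTs that agree with each variant at the mismatched position.
    """
    genome_support: int
    mrna_support: int


def classify(d: Discrepancy, min_support: int = 2) -> str:
    """Label a discrepancy by which variant has independent support.

    min_support is an assumed cutoff, not a value from the paper.
    """
    genome_ok = d.genome_support >= min_support
    mrna_ok = d.mrna_support >= min_support
    if genome_ok and mrna_ok:
        # Both variants are seen repeatedly: candidate polymorphism.
        return "possible polymorphism"
    if genome_ok:
        return "genome variant supported"
    if mrna_ok:
        return "mRNA variant supported"
    return "unresolved"


if __name__ == "__main__":
    for d in (Discrepancy(5, 0), Discrepancy(3, 4), Discrepancy(0, 1)):
        print(d, "->", classify(d))
```

A case labelled "possible polymorphism" corresponds to the situation the abstract describes as substantial support for both variants; confirming it would still require genotyping, as in the diversity-panel screen.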

Year:  2004        PMID: 15489323      PMCID: PMC528917          DOI: 10.1101/gr.2467904

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043


