A novel algorithm for computational identification of contaminated EST libraries.

Rotem Sorek, Hershel M Safer.

Abstract

A key goal of the Human Genome Project was to understand the complete set of human proteins, the proteome. Since the genome sequence by itself is not sufficient for predicting new genes and alternative splicing events that lead to new proteins, expressed sequence tags (ESTs) are used as the primary tool for these purposes. The high prevalence of artifacts in dbEST, however, often leads to invalid predictions. Here we describe a novel method for recognizing genomic DNA contamination and other artifacts that cannot be identified using current EST cleaning techniques. Our method uses the alignment of the entire set of ESTs to the human genome to identify highly contaminated EST libraries. We discovered 53 highly contaminated libraries and a subset of 24 766 ESTs from these libraries that probably represent contamination with genomic DNA, pre-mRNA, and ESTs that span non-canonical introns. Although this is only a small fraction of the entire EST dataset, each contaminating sequence could create a spurious transcript prediction. Indeed, in the clustering and assembly tool that we used, these sequences would have caused incorrect inference of 9575 new splice variants and 6370 new genes. Conclusions based on EST analysis, including prediction of alternative splicing, should be re-evaluated in light of these results. Our method, along with the identified set of contaminated sequences, will be essential for applications that depend on large EST datasets.
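The library-level idea can be illustrated with a short sketch. The abstract does not state the scoring criterion, so everything specific here is an assumption made for illustration: the `EstAlignment` record, the rule that an EST is "suspect" when its genome alignment is unspliced or contains a non-canonical intron, and the `min_ests` and `max_suspect_fraction` thresholds are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EstAlignment:
    """Hypothetical summary of one EST aligned to the genome."""
    est_id: str
    library_id: str
    n_introns: int        # introns implied by the alignment
    n_noncanonical: int   # introns without canonical splice-site boundaries


def is_suspect(aln):
    # Assumed rule: unspliced, or spliced across a non-canonical intron.
    return aln.n_introns == 0 or aln.n_noncanonical > 0


def flag_libraries(alignments, min_ests=3, max_suspect_fraction=0.5):
    """Return {library_id: suspect fraction} for libraries above the threshold."""
    totals = defaultdict(int)
    suspects = defaultdict(int)
    for aln in alignments:
        totals[aln.library_id] += 1
        if is_suspect(aln):
            suspects[aln.library_id] += 1

    flagged = {}
    for lib, total in totals.items():
        fraction = suspects[lib] / total
        # Skip tiny libraries: a fraction over very few ESTs is uninformative.
        if total >= min_ests and fraction > max_suspect_fraction:
            flagged[lib] = fraction
    return flagged


# Toy data: libA is mostly unspliced, libB is mostly spliced, libC is too small.
sample = [
    EstAlignment("e1", "libA", 0, 0),
    EstAlignment("e2", "libA", 0, 0),
    EstAlignment("e3", "libA", 2, 1),
    EstAlignment("e4", "libA", 3, 0),
    EstAlignment("e5", "libB", 2, 0),
    EstAlignment("e6", "libB", 1, 0),
    EstAlignment("e7", "libB", 4, 0),
    EstAlignment("e8", "libB", 0, 0),
    EstAlignment("e9", "libC", 0, 0),
]
flagged = flag_libraries(sample)
print(flagged)  # {'libA': 0.75}
```

Scoring whole libraries rather than single ESTs matters because an individual unspliced EST is not evidence of contamination on its own: a read from a single-exon gene or a long terminal exon also aligns without introns. Only an unusually high fraction within one library points to a systematic problem.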

Year:  2003        PMID: 12560505      PMCID: PMC149192          DOI: 10.1093/nar/gkg170

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971

