Toward the development of a gene index to the human genome: an assessment of the nature of high-throughput EST sequence data.

J S Aaronson, B Eckman, R A Blevins, J A Borkowski, J Myerson, S Imran, K O Elliston.

Abstract

A rigorous analysis of the Merck-sponsored EST data with respect to known gene sequences increases the utility of the data set and helps refine methods for building a gene index. A highly curated human transcript data base was used as a reference data set of known genes. A detailed analysis of EST sequences derived from known genes was performed to assess the accuracy of EST sequence annotation. The EST data was screened to remove low-quality and low-complexity sequences. A set of high-quality ESTs similar to the transcript data base was identified using BLAST; this subset of ESTs was compared with the set of known genes using the Smith-Waterman algorithm. Error rates of several types were assessed based on a flexible match criterion defining sequence identity. The rate of lane-tracking errors is very low, approximately 0.5%. Insert size data is accurate within approximately 20%. Reversed clone and internal priming error rates are approximately 5% and 2.5%, respectively, contributing to the incorrect identification of reads as 3' ends of genes. Follow-up investigation reveals that a significant number of clones, miscategorized as reversed, represent overlapping genes on the opposite strand of entries in the transcript data base. Relevance of these results to the creation of a high-quality index to the human genome capable of supporting diverse genomic investigations is discussed.
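The comparison step described in the abstract (Smith-Waterman alignment of each EST against a known transcript, followed by a match criterion based on sequence identity) can be illustrated with a minimal sketch. The abstract does not give the scoring scheme or the thresholds, so everything numeric here is an assumption chosen for illustration: the match/mismatch/gap scores, the linear gap penalty, and the `min_identity` and `min_length` cut-offs. The function names `smith_waterman` and `is_match` are likewise hypothetical and are not taken from the paper.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not the parameters used in the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # dynamic-programming score matrix
    best, best_pos = 0, (0, 0)

    # Fill the matrix; local alignment never lets a cell drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def is_match(est, transcript, min_identity=0.95, min_length=20):
    """Hypothetical match criterion: the local alignment must be long
    enough and identical enough. Both thresholds are assumed values."""
    _, al_a, al_b = smith_waterman(est, transcript)
    if not al_a or len(al_a) < min_length:
        return False
    identical = sum(x == y for x, y in zip(al_a, al_b))
    return identical / len(al_a) >= min_identity


est = "ACGTACGTACGTACGTACGTACGT"
transcript = "GGG" + est + "CCC"
print(is_match(est, transcript))
```

A quadratic-time pure-Python implementation like this is only suitable for short sequences; it is meant to show the logic of scoring an EST against a reference transcript, not to reproduce the study's pipeline.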

Year:  1996        PMID: 8889550     DOI: 10.1101/gr.6.9.829

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043


  31 in total

1.  GBuilder--an application for the visualization and integration of EST cluster data.

Authors:  J Muilu; P Rodriguez-Tomé; A Robinson
Journal:  Genome Res       Date:  2001-01       Impact factor: 9.043

2.  Shotgun sequencing of the human transcriptome with ORF expressed sequence tags.

Authors:  E Dias Neto; R G Correa; S Verjovski-Almeida; M R Briones; M A Nagai; W da Silva; M A Zago; S Bordin; F F Costa; G H Goldman; A F Carvalho; A Matsukuma; G S Baia; D H Simpson; A Brunstein; P S de Oliveira; P Bucher; C V Jongeneel; M J O'Hare; F Soares; R R Brentani; L F Reis; S J de Souza; A J Simpson
Journal:  Proc Natl Acad Sci U S A       Date:  2000-03-28       Impact factor: 11.205

3.  Gene2EST: a BLAST2 server for searching expressed sequence tag (EST) databases with eukaryotic gene-sized queries.

Authors:  C Gemünd; C Ramu; B Altenberg-Greulich; T J Gibson
Journal:  Nucleic Acids Res       Date:  2001-03-15       Impact factor: 16.971

4.  In silico cloning of novel endothelial-specific genes.

Authors:  L Huminiecki; R Bicknell
Journal:  Genome Res       Date:  2000-11       Impact factor: 9.043

5.  BodyMap: a collection of 3' ESTs for analysis of human gene expression information.

Authors:  S Kawamoto; J Yoshii; K Mizuno; K Ito; Y Miyamoto; T Ohnishi; R Matoba; N Hori; Y Matsumoto; T Okumura; Y Nakao; H Yoshii; J Arimoto; H Ohashi; H Nakanishi; I Ohno; J Hashimoto; K Shimizu; K Maeda; H Kuriyama; K Nishida; A Shimizu-Matsumoto; W Adachi; R Ito; S Kawasaki; K S Chae
Journal:  Genome Res       Date:  2000-11       Impact factor: 9.043

6.  SAGEmap: a public gene expression resource.

Authors:  A E Lash; C M Tolstoshev; L Wagner; G D Schuler; R L Strausberg; G J Riggins; S F Altschul
Journal:  Genome Res       Date:  2000-07       Impact factor: 9.043

7.  Oligo(dT) primer generates a high frequency of truncated cDNAs through internal poly(A) priming during reverse transcription.

Authors:  Douglas Kyung Nam; Sanggyu Lee; Guolin Zhou; Xiaohong Cao; Clarence Wang; Terry Clark; Jianjun Chen; Janet D Rowley; San Ming Wang
Journal:  Proc Natl Acad Sci U S A       Date:  2002-04-23       Impact factor: 11.205

8.  A quantitative evaluation of SAGE.

Authors:  J Stollberg; J Urschitz; Z Urban; C D Boyd
Journal:  Genome Res       Date:  2000-08       Impact factor: 9.043

9.  Long-range heterogeneity at the 3' ends of human mRNAs.

Authors:  Christian Iseli; Brian J Stevenson; Sandro J de Souza; Helena B Samaia; Anamaria A Camargo; Kenneth H Buetow; Robert L Strausberg; Andrew J G Simpson; Philipp Bucher; C Victor Jongeneel
Journal:  Genome Res       Date:  2002-07       Impact factor: 9.043

10.  Directional cDNA library construction assisted by the in vitro recombination reaction.

Authors:  O Ohara; G Temple
Journal:  Nucleic Acids Res       Date:  2001-02-15       Impact factor: 16.971
