
A comprehensive approach to clustering of expressed human gene sequence: the sequence tag alignment and consensus knowledge base.

R T Miller, A G Christoffels, C Gopalakrishnan, J Burke, A A Ptitsyn, T R Broveak, W A Hide.

Abstract

The expressed human genome is being sequenced and analyzed by disparate groups producing disparate data. The majority of the identified coding portion is in the form of expressed sequence tags (ESTs). The need to discover exonic representation and expression forms of full-length cDNAs for each human gene is frustrated by the partial and variable quality nature of this data delivery. A highly redundant human EST data set has been processed into integrated and unified expressed transcript indices that consist of hierarchically organized human transcript consensi reflecting gene expression forms and genetic polymorphism within an index class. The expression index and its intermediate outputs include cleaned transcript sequence, expression, and alignment information and a higher fidelity subset, SANIGENE. The STACK_PACK clustering system has been applied to dbEST release 121598 (GenBank version 110). Sixty-four percent of 1,313,103 Homo sapiens ESTs are condensed into 143,885 tissue level multiple sequence clusters; linking through clone-ID annotations produces 68,701 total assemblies, such that 81% of the original input set is captured in a STACK multiple sequence or linked cluster. Indexing of alignments by substituent EST accession allows browsing of the data structure and its cross-links to UniGene. STACK metaclusters consolidate a greater number of ESTs by a factor of 1.86 with respect to the corresponding UniGene build. Fidelity comparison with genome reference sequence AC004106 demonstrates consensus expression clusters that reflect significantly lower spurious repeat sequence content and capture alternate splicing within a whole body index cluster and three STACK v.2.3 tissue-level clusters. Statistics of a staggered release whole body index build of STACK v.2.0 are presented.
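
The abstract mentions that clusters are linked through clone-ID annotations. As a rough illustration of that general idea only, the sketch below merges clusters whenever two of their ESTs carry the same clone ID, using a small union-find. It is not the STACK_PACK implementation: the function name `link_clusters_by_clone`, the input dictionaries, and the toy identifiers are all hypothetical, and the real system may apply additional rules that the abstract does not describe.

```python
from collections import defaultdict


def link_clusters_by_clone(cluster_members, est_clone):
    """Group clusters that share at least one clone ID (hypothetical helper).

    cluster_members: dict mapping cluster ID -> list of EST accessions
    est_clone:       dict mapping EST accession -> clone ID (may be partial)
    Returns a sorted list of sorted cluster-ID groups.
    """
    # Union-find parent table: every cluster starts as its own group.
    parent = {c: c for c in cluster_members}

    def find(c):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # Remember the first cluster seen for each clone ID, and merge any
    # later cluster that contains an EST from the same clone.
    first_cluster_for_clone = {}
    for cluster, ests in cluster_members.items():
        for est in ests:
            clone = est_clone.get(est)
            if clone is None:
                continue  # EST without a clone annotation cannot link
            if clone in first_cluster_for_clone:
                root_a = find(cluster)
                root_b = find(first_cluster_for_clone[clone])
                if root_a != root_b:
                    parent[root_a] = root_b
            else:
                first_cluster_for_clone[clone] = cluster

    # Collect clusters by their final root.
    linked = defaultdict(set)
    for cluster in cluster_members:
        linked[find(cluster)].add(cluster)
    return sorted(sorted(group) for group in linked.values())


# Toy data (made up): est1 and est3 come from the same clone,
# so clusters c1 and c2 are linked; c3 stays on its own.
clusters = {"c1": ["est1", "est2"], "c2": ["est3"], "c3": ["est4"]}
clones = {"est1": "cloneA", "est3": "cloneA", "est4": "cloneB"}
print(link_clusters_by_clone(clusters, clones))  # [['c1', 'c2'], ['c3']]
```

In this toy run three clusters collapse into two linked groups, which mirrors in miniature how clone-based linking can reduce the number of assemblies relative to the number of sequence-level clusters.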

Year:  1999        PMID: 10568754      PMCID: PMC310831          DOI: 10.1101/gr.9.11.1143

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043


