
EST clustering error evaluation and correction.

Ji-Ping Z Wang, Bruce G Lindsay, James Leebens-Mack, Liying Cui, Kerr Wall, Webb C Miller, Claude W dePamphilis.

Abstract

MOTIVATION: The gene expression intensity information conveyed by Expressed Sequence Tag (EST) data can be used to infer important cDNA library properties, such as gene number and expression patterns. However, EST clustering errors, which often lead to greatly inflated estimates of the number of unique genes obtained, have become a major obstacle in such analyses. The EST clustering error structure, the relationship between clustering error and clustering criteria, and possible error correction methods need to be systematically investigated.
RESULTS: We identify and quantify two types of EST clustering error, Type I and Type II, in EST clustering with the CAP3 assembly program. A Type I error occurs when ESTs from the same gene do not form a cluster, whereas a Type II error occurs when ESTs from distinct genes are falsely clustered together. While the Type II error rate is <1.5% for both 5' and 3' EST clustering, the Type I error rate in the 5' EST case is approximately 10 times higher than in the 3' EST case (30% versus 3%). An over-stringent identity rule, e.g. P ≥ 95%, may even inflate the Type I error in both cases. We demonstrate that approximately 80% of the Type I error in 5' EST clustering is due to insufficient overlap among sibling ESTs (ISO error). A novel statistical approach is proposed to correct ISO error and provide more accurate estimates of the true gene cluster profile.
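
The two error types defined above can be illustrated with a minimal Python sketch. It assumes the true gene of origin is known for every EST, and it counts a gene as a Type I error if its ESTs fall into more than one cluster and a cluster as a Type II error if it contains ESTs from more than one gene. These per-gene and per-cluster rates, the function name `clustering_error_rates`, and the toy data are illustrative assumptions; the abstract does not state how the authors compute their rates.

```python
from collections import defaultdict


def clustering_error_rates(true_gene, cluster_of):
    """Return (type1_rate, type2_rate) for one EST clustering.

    true_gene:  dict mapping EST id -> true gene id (assumed known)
    cluster_of: dict mapping EST id -> assigned cluster id

    Assumed definitions, for illustration only:
      Type I rate  = fraction of genes whose ESTs are split across clusters
      Type II rate = fraction of clusters that mix ESTs from several genes
    """
    clusters_per_gene = defaultdict(set)
    genes_per_cluster = defaultdict(set)
    for est, gene in true_gene.items():
        cluster = cluster_of[est]
        clusters_per_gene[gene].add(cluster)
        genes_per_cluster[cluster].add(gene)

    split_genes = sum(len(c) > 1 for c in clusters_per_gene.values())
    mixed_clusters = sum(len(g) > 1 for g in genes_per_cluster.values())
    return (split_genes / len(clusters_per_gene),
            mixed_clusters / len(genes_per_cluster))


# Hypothetical toy data: six ESTs drawn from three genes.
true_gene = {"e1": "A", "e2": "A", "e3": "A", "e4": "B", "e5": "B", "e6": "C"}
cluster_of = {"e1": "c1", "e2": "c1", "e3": "c2",
              "e4": "c3", "e5": "c3", "e6": "c3"}

type1, type2 = clustering_error_rates(true_gene, cluster_of)
print(f"Type I rate: {type1:.2f}, Type II rate: {type2:.2f}")
```

In this toy case gene A is split across clusters c1 and c2 (a Type I error) and cluster c3 mixes genes B and C (a Type II error), so one of three genes and one of three clusters are affected. Splitting alone raises the cluster count, which is the mechanism by which Type I errors inflate estimates of the number of unique genes.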

Year: 2004   PMID: 15189818   DOI: 10.1093/bioinformatics/bth342

Source DB: PubMed   Journal: Bioinformatics   ISSN: 1367-4803   Impact factor: 6.937

