Text mining of DNA sequence homology searches.

John McCallum, Siva Ganesh

Abstract

Primary tasks in analysis and annotation of expressed sequence tag (EST) datasets are to identify similarity among sequences by unsupervised clustering and assign putative function based on BLAST homology searches. We investigated the usefulness of text mining as a simple approach for further higher-level clustering of EST datasets using IBM Intelligent Miner for Text v2.3 tools. Agglomerative and k-means clustering tools were used to cluster BLASTx homology search documents from two onion EST datasets and optimised by pre-processing and pruning. Subjective evaluation confirmed that these tools provided biologically useful and complementary views of the two libraries, provided new insights into their composition and revealed clusters previously identified by human experts. We compared BLASTx textual clusters for two gene families with their DNA sequence-based clusters and confirmed that these shared similar morphology.
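The general idea of clustering homology-search text can be illustrated with a minimal sketch. It is not the IBM Intelligent Miner for Text workflow used in the study: it swaps in a plain TF-IDF weighting, cosine similarity and average-linkage agglomerative clustering written with the Python standard library. The document contents, the stopword list and every function name below are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for BLASTx top-hit description lines (not data from the study).
DOCS = {
    "est1": "alliinase precursor Allium cepa",
    "est2": "alliinase Allium sativum",
    "est3": "chlorophyll a/b binding protein Arabidopsis thaliana",
    "est4": "chlorophyll a/b binding protein precursor Oryza sativa",
}

# Assumed pruning list: generic annotation words that carry little signal.
STOPWORDS = {"protein", "precursor", "putative", "hypothetical"}


def tokenize(text):
    """Lower-case the text, split it into terms and prune stopwords."""
    return [t for t in re.findall(r"[a-z0-9/]+", text.lower()) if t not in STOPWORDS]


def tfidf(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n = len(docs)
    df = Counter(t for toks in tokens.values() for t in set(toks))
    vectors = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        # Smoothed inverse document frequency, always positive.
        vectors[name] = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(vectors, k):
    """Merge the most similar pair of clusters (average linkage) until k remain."""
    clusters = [[name] for name in sorted(vectors)]
    while len(clusters) > k:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [cosine(vectors[a], vectors[b]) for a in clusters[i] for b in clusters[j]]
                avg = sum(sims) / len(sims)
                if avg > best:
                    best, pair = avg, (i, j)
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    print(agglomerate(tfidf(DOCS), 2))
```

On this toy input the two alliinase hits end up in one cluster and the two chlorophyll a/b binding hits in the other, because the cross-group documents share no terms after pruning. Real BLASTx reports are far longer and noisier, so the choice of stopwords and cluster count would need tuning.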

Year:  2003        PMID: 15130818

Source DB:  PubMed          Journal:  Appl Bioinformatics        ISSN: 1175-5636


  2 in total

1.  Mapping SNP-anchored genes using high-resolution melting analysis in almond.

Authors:  Shu-Biao Wu; Iraj Tavassolian; Gholamreza Rabiei; Peter Hunt; Michelle Wirthensohn; John P Gibson; Christopher M Ford; Margaret Sedgley
Journal:  Mol Genet Genomics       Date:  2009-06-14       Impact factor: 3.291

2.  MCAM: multiple clustering analysis methodology for deriving hypotheses and insights from high-throughput proteomic datasets.

Authors:  Kristen M Naegle; Roy E Welsch; Michael B Yaffe; Forest M White; Douglas A Lauffenburger
Journal:  PLoS Comput Biol       Date:  2011-07-21       Impact factor: 4.475
