Data mining tools for biological sequences.

Huiqing Liu, Limsoon Wong.

Abstract

We describe a methodology, as well as some related data mining tools, for analyzing sequence data. The methodology comprises three steps: (a) generating candidate features from the sequences, (b) selecting relevant features from the candidates, and (c) integrating the selected features to build a system to recognize specific properties in sequence data. We also give relevant techniques for each of these three steps. For generating candidate features, we present various types of features based on the idea of k-grams. For selecting relevant features, we discuss signal-to-noise, t-statistics, and entropy measures, as well as a correlation-based feature selection method. For integrating selected features, we use machine learning methods, including C4.5, SVM, and Naive Bayes. We illustrate this methodology on the problem of recognizing translation initiation sites. We discuss how to generate and select features that are useful for understanding the distinction between ATG sites that are translation initiation sites and those that are not. We also discuss how to use such features to build reliable systems for recognizing translation initiation sites in DNA sequences.
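Steps (a) and (b) of the methodology can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`kgram_counts`, `feature_matrix`, `signal_to_noise`, `rank_features`) and the toy sequences are hypothetical, and the signal-to-noise score is assumed here to be |mean_pos - mean_neg| / (sd_pos + sd_neg), which is one common definition. The abstract does not specify the formula or the exact feature variants used.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev


def kgram_counts(seq, k):
    """Count every overlapping k-gram (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def feature_matrix(seqs, k, alphabet="ACGT"):
    """Step (a): one row per sequence, one column per possible k-gram."""
    grams = ["".join(p) for p in product(alphabet, repeat=k)]
    rows = []
    for s in seqs:
        counts = kgram_counts(s, k)
        rows.append([counts[g] for g in grams])
    return grams, rows


def signal_to_noise(pos, neg):
    """Assumed definition: |mean_pos - mean_neg| / (sd_pos + sd_neg)."""
    diff = abs(mean(pos) - mean(neg))
    denom = pstdev(pos) + pstdev(neg)
    if denom == 0:
        # No spread in either class: perfectly separating or uninformative.
        return float("inf") if diff > 0 else 0.0
    return diff / denom


def rank_features(pos_seqs, neg_seqs, k):
    """Step (b): score each k-gram and sort from most to least relevant."""
    grams, pos_rows = feature_matrix(pos_seqs, k)
    _, neg_rows = feature_matrix(neg_seqs, k)
    scores = {
        g: signal_to_noise([r[j] for r in pos_rows], [r[j] for r in neg_rows])
        for j, g in enumerate(grams)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy windows around candidate ATG sites, not real data.
    true_sites = ["GCCACCATGG", "GCCGCCATGG", "ACCACCATGG"]
    false_sites = ["TTTTTTATGT", "TATTTAATGA", "TTTATTATGC"]
    for gram, score in rank_features(true_sites, false_sites, k=2)[:5]:
        print(gram, score)
```

The top-ranked k-grams would then be passed as input features to a classifier such as C4.5, SVM, or Naive Bayes for step (c).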

Year: 2003; PMID: 15290785; DOI: 10.1142/s0219720003000216

Source DB: PubMed; Journal: J Bioinform Comput Biol; ISSN: 0219-7200; Impact factor: 1.122

