
Classification of nucleotide sequences using support vector machines.

Tae-Kun Seo

Abstract

Species identification is one of the most important issues in biological studies. Owing to recent increases in the amount of available genomic information and to the development of DNA sequencing technologies, the use of DNA sequences to identify species (commonly referred to as "DNA barcoding") is being tested in many areas. Several methods have been suggested for identifying species from DNA sequences, including similarity scores, analysis of phylogenetic and population genetic information, and detection of species-specific sequence patterns. Although these methods have performed well under a range of circumstances, they also have limitations: they are subject to loss of information, require intensive computation, are sensitive to model misspecification, and can make it difficult to evaluate the significance of an identification. Here, we suggest a new DNA barcoding method that adopts support vector machine (SVM) procedures. Our new method is nonparametric and is therefore expected to be robust across a wide range of evolutionary scenarios as well as in multilocus analyses. Furthermore, we describe bootstrap procedures that can be used to test the significance of species identifications. We implemented a novel conversion technique for transforming sequence data into real-valued vectors, so that bootstrap procedures can be easily combined with our SVM approach. In this study, we present the results of simulation studies and empirical data analyses to demonstrate the performance of our method, and we discuss its properties.
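
The workflow summarized above (convert sequences to real-valued vectors, train an SVM, bootstrap the identification) can be illustrated with a minimal sketch. The abstract does not specify the conversion technique, the kernel, or the resampling unit, so everything below is an assumption made for illustration: a one-hot encoding stands in for the conversion, a linear SVM trained by Pegasos-style sub-gradient descent stands in for the SVM procedure, only two species are handled, and the bootstrap resamples alignment columns. The function names and the toy sequences are hypothetical and are not taken from the paper.

```python
import random

BASES = "ACGT"

def one_hot(seq):
    """Map an aligned DNA sequence to a 0/1 vector with four slots per site.
    Characters other than A, C, G, T (gaps, N) give an all-zero slot.
    Assumption: this is NOT the paper's conversion, only a common stand-in."""
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in BASES]

def train_linear_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Pegasos-style sub-gradient descent on the regularised hinge loss.
    y holds +1/-1 labels. No bias term is fitted: every gap-free one-hot
    vector has the same number of ones, so an offset can be absorbed
    into the weights."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [wj * (1.0 - eta * lam) for wj in w]  # shrink (regulariser)
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def classify(train_seqs, train_labels, query, positive):
    """Train on the labelled sequences and return the predicted species."""
    negative = next(lab for lab in train_labels if lab != positive)
    X = [one_hot(s) for s in train_seqs]
    y = [1 if lab == positive else -1 for lab in train_labels]
    w = train_linear_svm(X, y)
    score = sum(wj * xj for wj, xj in zip(w, one_hot(query)))
    return positive if score > 0 else negative

def bootstrap_support(train_seqs, train_labels, query, positive, reps=50, seed=1):
    """Resample alignment columns with replacement, retrain, and report the
    fraction of replicates that recover the original assignment."""
    rng = random.Random(seed)
    n_sites = len(query)
    original = classify(train_seqs, train_labels, query, positive)
    hits = 0
    for _ in range(reps):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        resample = lambda s: "".join(s[c] for c in cols)
        label = classify([resample(s) for s in train_seqs], train_labels,
                         resample(query), positive)
        hits += label == original
    return original, hits / reps

# Toy aligned sequences, invented for illustration only.
train_seqs = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACCTAC",   # species "A"
              "ACGAACGTTC", "ACGAACGTTA", "ACGAACCTTC"]   # species "B"
train_labels = ["A", "A", "A", "B", "B", "B"]
query = "ACGTTCGTAC"

label, support = bootstrap_support(train_seqs, train_labels, query, positive="A")
print(f"assigned species: {label}, bootstrap support: {support:.2f}")
```

Because each sequence becomes a fixed-length numeric vector, resampling sites amounts to resampling blocks of vector components, which is one way to read the abstract's remark that bootstrap procedures combine easily with the SVM approach. A realistic analysis would need more than two species (for example via one-vs-rest training) and far more replicates than the 50 used here.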

Year: 2010    PMID: 20740280    DOI: 10.1007/s00239-010-9380-9

Source DB: PubMed    Journal: J Mol Evol    ISSN: 0022-2844    Impact factor: 2.395


