Graph Traversal Edit Distance and Extensions.

Ali Ebrahimpour Boroojeny, Akash Shrestha, Ali Sharifi-Zarchi, Suzanne Renick Gallagher, S Cenk Sahinalp, Hamidreza Chitsaz.

Abstract

Many problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space, which renders graph kernels important for the aforementioned applications. In this article, we give a new graph kernel, which we call graph traversal edit distance (GTED). We introduce the GTED problem and give the first polynomial time algorithm for it. Informally, the GTED is the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. Also, GTED is motivated by and provides the first mathematical formalism for sequence co-assembly and de novo variation detection in bioinformatics. We demonstrate that GTED admits a polynomial time algorithm using a linear program in the graph product space that is guaranteed to yield an integer solution. To the best of our knowledge, this is the first approach to this problem. We also give a linear programming relaxation algorithm for a lower bound on GTED. We use GTED as a graph kernel and evaluate it by computing the accuracy of a support vector machine (SVM) classifier on a few data sets in the literature. Our results suggest that our kernel outperforms many of the common graph kernels in the tested data sets. As a second set of experiments, we successfully cluster viral genomes using GTED on their assembly graphs obtained from de novo assembly of next-generation sequencing reads.
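The informal definition above (minimum edit distance over the edge-label strings of Eulerian traversals of the two graphs) can be illustrated with a brute-force sketch for very small graphs. This is not the polynomial-time linear program described in the abstract; it enumerates every closed Eulerian trail and is exponential in the number of edges. The function names, the `(source, target, label)` edge format, and the choice to treat every starting point of a circuit as a separate traversal are assumptions made for illustration only.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance with unit costs (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def eulerian_label_strings(edges):
    """Return the set of label strings spelled by closed Eulerian trails.

    `edges` is a list of directed, labeled edges (source, target, label).
    Assumption: each rotation (starting point) of a circuit counts as its
    own traversal. Backtracking search, so only usable on tiny graphs.
    """
    n = len(edges)
    results = set()

    def extend(node, start, used, labels):
        if len(labels) == n:
            if node == start:          # every edge used once, back at start
                results.add("".join(labels))
            return
        for i, (u, v, lab) in enumerate(edges):
            if not used[i] and u == node:
                used[i] = True
                labels.append(lab)
                extend(v, start, used, labels)
                labels.pop()
                used[i] = False

    for start in {u for u, _, _ in edges}:
        extend(start, start, [False] * n, [])
    return results


def gted_bruteforce(edges1, edges2):
    """Minimum edit distance over all pairs of Eulerian label strings."""
    s1 = eulerian_label_strings(edges1)
    s2 = eulerian_label_strings(edges2)
    if not s1 or not s2:
        raise ValueError("both graphs need at least one Eulerian circuit")
    return min(edit_distance(a, b) for a in s1 for b in s2)


# Two toy cycle graphs: one spells ACG, the other ACTG (up to rotation).
g1 = [(0, 1, "A"), (1, 2, "C"), (2, 0, "G")]
g2 = [(0, 1, "A"), (1, 2, "C"), (2, 3, "T"), (3, 0, "G")]
print(gted_bruteforce(g1, g2))  # one inserted "T" separates ACG from ACTG
```

On these toy inputs the traversals "ACG" and "ACTG" differ by a single insertion, so the sketch returns 1; for graphs of realistic size the exhaustive enumeration is infeasible, which is what motivates the linear programming formulation.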

Keywords:  assembly graph; clustering genera; co-assembly; de novo variation detection; graph comparison; graph kernel

Year: 2020; PMID: 32058803; PMCID: PMC7133423; DOI: 10.1089/cmb.2019.0511

Source DB: PubMed; Journal: J Comput Biol; ISSN: 1066-5277; Impact factor: 1.479


