An efficient, versatile and scalable pattern growth approach to mine frequent patterns in unaligned protein sequences.

Kai Ye, Walter A Kosters, Adriaan P Ijzerman.

Abstract

MOTIVATION: Pattern discovery in protein sequences is often based on multiple sequence alignments (MSA). The procedure can be computationally intensive and often requires manual adjustment, which may be particularly difficult for a set of deviating sequences. In contrast, two algorithms, PRATT2 (http://www.ebi.ac.uk/pratt/) and TEIRESIAS (http://cbcsrv.watson.ibm.com/), identify frequent patterns directly from unaligned biological sequences without attempting to align them. Here we propose a new algorithm that is more efficient and offers more functionality than both PRATT2 and TEIRESIAS, and discuss some of its applications to G protein-coupled receptors, a protein family of important drug targets.
RESULTS: In this study, we designed and implemented six algorithms to mine three different pattern types from either one or two datasets using a pattern growth approach. We compared our approach to PRATT2 and TEIRESIAS in efficiency, completeness and the diversity of pattern types. Compared to PRATT2, our approach is faster, capable of processing large datasets and able to identify the so-called type III patterns. Our approach is comparable to TEIRESIAS in the discovery of the so-called type I patterns but has additional functionality such as mining the so-called type II and type III patterns and finding discriminating patterns between two datasets.
AVAILABILITY: The source code for pattern growth algorithms and their pseudo-code are available at http://www.liacs.nl/home/kosters/pg/.
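
The abstract names a pattern growth approach without giving details. As a rough illustration of the general idea only, the sketch below grows contiguous patterns one residue at a time and keeps a pattern when it occurs in at least `min_support` sequences. It is not the authors' algorithm and does not model their type I, II or III patterns; the function name `mine_frequent_substrings`, its parameters and the toy sequences are all assumptions made for this example.

```python
from collections import defaultdict


def mine_frequent_substrings(sequences, min_support, max_length=10):
    """Return {pattern: support} for contiguous patterns that occur in at
    least `min_support` of the unaligned input sequences.

    Hypothetical illustration of pattern growth; not the published method.
    """
    results = {}

    def grow(pattern, occurrences):
        # occurrences: (sequence index, position just after the pattern).
        # Group every possible one-residue extension of the current pattern.
        extensions = defaultdict(list)
        for seq_idx, end in occurrences:
            seq = sequences[seq_idx]
            if end < len(seq):
                extensions[seq[end]].append((seq_idx, end + 1))

        for residue, occs in sorted(extensions.items()):
            # Support counts distinct sequences, not total occurrences.
            support = len({seq_idx for seq_idx, _ in occs})
            if support >= min_support:
                new_pattern = pattern + residue
                results[new_pattern] = support
                # Only frequent patterns are grown further, so infrequent
                # branches are pruned early.
                if len(new_pattern) < max_length:
                    grow(new_pattern, occs)

    # The empty pattern "ends" before every position of every sequence.
    start = [(i, pos) for i, seq in enumerate(sequences) for pos in range(len(seq))]
    grow("", start)
    return results


if __name__ == "__main__":
    toy = ["ACDEF", "XCDEY", "CDQ"]  # made-up sequences, not real proteins
    print(mine_frequent_substrings(toy, min_support=3))
```

Because a pattern is extended only while it remains frequent, the search never enumerates candidates that cannot meet the support threshold, which is the property that lets pattern growth work without an alignment step.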

Year:  2007        PMID: 17237070     DOI: 10.1093/bioinformatics/btl665

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  10 in total

1.  Structural variations in protein superfamilies: actin and tubulin.

Authors:  Richard H Wade; Isabel Garcia-Saez; Frank Kozielski
Journal:  Mol Biotechnol       Date:  2009-01-08       Impact factor: 2.695

Review 2.  Analysis of next-generation genomic data in cancer: accomplishments and challenges.

Authors:  Li Ding; Michael C Wendl; Daniel C Koboldt; Elaine R Mardis
Journal:  Hum Mol Genet       Date:  2010-09-15       Impact factor: 6.150

3.  Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads.

Authors:  Kai Ye; Marcel H Schulz; Quan Long; Rolf Apweiler; Zemin Ning
Journal:  Bioinformatics       Date:  2009-06-26       Impact factor: 6.937

4.  Breaking the computational barrier: a divide-conquer and aggregate based approach for Alu insertion site characterisation.

Authors:  Kun Zhang; Wei Fan; Prescott Deininger; Andrea Edwards; Zujia Xu; Dongxiao Zhu
Journal:  Int J Comput Biol Drug Des       Date:  2009-01-04

Review 5.  Expanding the computational toolbox for mining cancer genomes.

Authors:  Li Ding; Michael C Wendl; Joshua F McMichael; Benjamin J Raphael
Journal:  Nat Rev Genet       Date:  2014-07-08       Impact factor: 53.242

6.  PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data.

Authors:  Yanju Zhang; Eric-Wubbo Lameijer; Peter A C 't Hoen; Zemin Ning; P Eline Slagboom; Kai Ye
Journal:  Bioinformatics       Date:  2012-01-04       Impact factor: 6.937

7.  Using machine learning tools for protein database biocuration assistance.

Authors:  Caroline König; Ilmira Shaim; Alfredo Vellido; Enrique Romero; René Alquézar; Jesús Giraldo
Journal:  Sci Rep       Date:  2018-07-05       Impact factor: 4.379

8.  PVTree: A Sequential Pattern Mining Method for Alignment Independent Phylogeny Reconstruction.

Authors:  Yongyong Kang; Xiaofei Yang; Jiadong Lin; Kai Ye
Journal:  Genes (Basel)       Date:  2019-01-22       Impact factor: 4.096

9.  Label noise in subtype discrimination of class C G protein-coupled receptors: A systematic approach to the analysis of classification errors.

Authors:  Caroline König; Martha I Cárdenas; Jesús Giraldo; René Alquézar; Alfredo Vellido
Journal:  BMC Bioinformatics       Date:  2015-09-29       Impact factor: 3.169

10.  Systematic discovery of complex insertions and deletions in human cancers.

Authors:  Kai Ye; Jiayin Wang; Reyka Jayasinghe; Eric-Wubbo Lameijer; Joshua F McMichael; Jie Ning; Michael D McLellan; Mingchao Xie; Song Cao; Venkata Yellapantula; Kuan-lin Huang; Adam Scott; Steven Foltz; Beifang Niu; Kimberly J Johnson; Matthijs Moed; P Eline Slagboom; Feng Chen; Michael C Wendl; Li Ding
Journal:  Nat Med       Date:  2015-12-14       Impact factor: 53.440
