
Ranking and compacting binding segments of protein families using aligned pattern clusters.

En-Shiun Lee, Andrew K C Wong.

Abstract

BACKGROUND: Discovering sequence patterns with variation can reveal functions of a protein family that are important for drug discovery. Exploring protein families with existing methods such as multiple sequence alignment is computationally expensive, so pattern search, called motif finding in bioinformatics, is used instead. At present, however, combinatorial algorithms produce large solution sets, and probabilistic models need a richer representation of amino acid associations. To overcome these shortcomings, we present a method for ranking and compacting these solutions in a new representation called Aligned Pattern Clusters (APCs). To tackle the large solution set, our method produces a reduced set of candidate solutions without losing any information. To address the representation problem, our method captures the amino acid associations and conservations of the aligned patterns. Our algorithm produces a set of APCs in which patterns are discovered, pruned, aligned, and synthesized from the input sequences of a protein family.
RESULTS: Our algorithm identifies the binding or other functional segments and their embedded residues, which are important drug targets, in the cytochrome c and ubiquitin protein families taken from UniProt. The results are independently confirmed by Pfam's multiple sequence alignment. For the cytochrome c family, the number of resulting patterns with variations is reduced by 76.62% relative to the number of original patterns without variations. Furthermore, all of the top four candidate APCs correspond to binding segments, each with one of its conserved amino acids as the binding residue. The discovered proximal APCs agree with the Pfam and PROSITE results. Surprisingly, the distal binding site discovered by our algorithm is found by neither Pfam nor PROSITE, but it is confirmed by the three-dimensional structure of cytochrome c. When applied to the ubiquitin protein family, our results agree with Pfam and reveal six of the seven lysine binding residues as conserved aligned columns with an entropy redundancy measure of 1.0.
CONCLUSION: The discovery, ranking, reduction, and representation of a set of patterns are important for avoiding time-consuming and expensive simulations and experiments during proteomic study and drug discovery.
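
The "conserved aligned columns with an entropy redundancy measure of 1.0" mentioned in the results can be illustrated with a small sketch. The abstract does not define the measure, so the code below assumes one common normalization, 1 - H / log(20), where H is the Shannon entropy of the residues in a column and 20 is the size of the amino acid alphabet. The function names and the example patterns are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def column_redundancy(column, alphabet_size=20):
    """Return 1 - H / log(alphabet_size) for one aligned column.

    Assumed definition (not from the paper): H is the Shannon entropy of
    the residue frequencies, so a fully conserved column scores 1.0.
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return 1.0 - entropy / math.log(alphabet_size)


def conserved_columns(aligned_patterns, threshold=1.0, tol=1e-9):
    """Return indices of columns whose redundancy reaches the threshold.

    Assumes every pattern has the same length and contains no gaps.
    """
    columns = zip(*aligned_patterns)
    return [
        i for i, col in enumerate(columns)
        if column_redundancy(col) >= threshold - tol
    ]


# Hypothetical aligned patterns, for illustration only.
patterns = ["CAACH", "CSACH", "CQSCH", "CAQCH"]
print(conserved_columns(patterns))  # columns 0, 3 and 4 are fully conserved
```

Under this assumed definition, a column scores 1.0 only when every aligned pattern carries the same residue at that position, and a column with mixed residues scores strictly less than 1.0.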


Year: 2013    PMID: 24564874    PMCID: PMC3907781    DOI: 10.1186/1477-5956-11-S1-S8

Source DB: PubMed    Journal: Proteome Sci    ISSN: 1477-5956    Impact factor: 2.480


