MinSet: a general approach to derive maximally representative database subsets by using fragment dictionaries and its application to the SCOP database.

Alessandro Pandini, Laura Bonati, Franca Fraternali, Jens Kleinjung.

Abstract

MOTIVATION: The size of current protein databases is a challenge for many bioinformatics applications, both in terms of processing speed and information redundancy. It may therefore be desirable to efficiently reduce the database of interest to a maximally representative subset.
RESULTS: The MinSet method employs a combination of a Suffix Tree and a Genetic Algorithm for the generation, selection and assessment of database subsets. The approach is generally applicable to any type of string-encoded data, allowing for a drastic reduction of the database size whilst retaining most of the information contained in the original set. We demonstrate the performance of the method on a database of protein domain structures encoded as strings. For the SCOP40 domain database, protein structures were translated into character strings by means of a structural alphabet, and optimized subsets were extracted according to an entropy score that is based on a constant-length fragment dictionary. The optimized subsets are therefore maximally representative of the distribution and range of local structures. Subsets containing only 10% of the SCOP structure classes show a coverage of >90% for fragments of length 1-4.
AVAILABILITY: http://mathbio.nimr.mrc.ac.uk/~jkleinj/MinSet.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
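
The fragment-dictionary idea in the abstract can be illustrated with a small sketch. It is not the MinSet implementation: it uses plain counting rather than a Suffix Tree, performs no Genetic Algorithm search, and the Shannon entropy and the coverage ratio below are assumed stand-ins for the scores defined in the paper. The function names (fragment_counts, shannon_entropy, coverage) and the toy strings are hypothetical.

```python
import math
from collections import Counter


def fragment_counts(strings, k):
    """Count all overlapping fragments of length k across the strings."""
    counts = Counter()
    for s in strings:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a fragment frequency distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def coverage(subset, full, k):
    """Fraction of the distinct length-k fragments of `full` found in `subset`."""
    full_frags = set(fragment_counts(full, k))
    if not full_frags:
        return 1.0
    sub_frags = set(fragment_counts(subset, k))
    return len(sub_frags & full_frags) / len(full_frags)


# Toy "database" of string-encoded structures (made-up data).
database = ["ABCAB", "BCABC", "CCCCA", "ABABA"]
subset = database[:2]

for k in (1, 2):
    cov = coverage(subset, database, k)
    ent = shannon_entropy(fragment_counts(subset, k))
    print(f"k={k}: coverage={cov:.2f}, entropy={ent:.3f} bits")
```

A subset search would repeatedly propose candidate subsets and keep those with the best score; here only the scoring of one fixed candidate is shown.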

Year:  2007        PMID: 17204463     DOI: 10.1093/bioinformatics/btl637

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  9 in total

1.  Structural alphabets derived from attractors in conformational space.

Authors:  Alessandro Pandini; Arianna Fornili; Jens Kleinjung
Journal:  BMC Bioinformatics       Date:  2010-02-20       Impact factor: 3.169

2.  GSATools: analysis of allosteric communication and functional local motions using a structural alphabet.

Authors:  Alessandro Pandini; Arianna Fornili; Franca Fraternali; Jens Kleinjung
Journal:  Bioinformatics       Date:  2013-06-05       Impact factor: 6.937

3.  Specialized Dynamical Properties of Promiscuous Residues Revealed by Simulated Conformational Ensembles.

Authors:  Arianna Fornili; Alessandro Pandini; Hui-Chun Lu; Franca Fraternali
Journal:  J Chem Theory Comput       Date:  2013-09-27       Impact factor: 6.006

4.  Minimizing proteome redundancy in the UniProt Knowledgebase.

Authors:  Borisas Bursteinas; Ramona Britto; Benoit Bely; Andrea Auchincloss; Catherine Rivoire; Nicole Redaschi; Claire O'Donovan; Maria Jesus Martin
Journal:  Database (Oxford)       Date:  2016-12-26       Impact factor: 3.451

5.  Exploring the potential of a structural alphabet-based tool for mining multiple target conformations and target flexibility insight.

Authors:  Leslie Regad; Jean-Baptiste Chéron; Dhoha Triki; Caroline Senac; Delphine Flatters; Anne-Claude Camproux
Journal:  PLoS One       Date:  2017-08-17       Impact factor: 3.240

6.  Phosphorylation-mediated unfolding of a KH domain regulates KSRP localization via 14-3-3 binding.

Authors:  Irene Díaz-Moreno; David Hollingworth; Thomas A Frenkiel; Geoff Kelly; Stephen Martin; Steven Howell; María Flor García-Mayoral; Roberto Gherzi; Paola Briata; Andres Ramos
Journal:  Nat Struct Mol Biol       Date:  2009-02-08       Impact factor: 15.369

7.  Implicit Solvation Parameters Derived from Explicit Water Forces in Large-Scale Molecular Dynamics Simulations.

Authors:  Jens Kleinjung; Walter R P Scott; Jane R Allison; Wilfred F van Gunsteren; Franca Fraternali
Journal:  J Chem Theory Comput       Date:  2012-06-12       Impact factor: 6.006

8.  Detecting protein candidate fragments using a structural alphabet profile comparison approach.

Authors:  Yimin Shen; Géraldine Picord; Frédéric Guyon; Pierre Tuffery
Journal:  PLoS One       Date:  2013-11-26       Impact factor: 3.240

9.  Using Local States To Drive the Sampling of Global Conformations in Proteins.

Authors:  Alessandro Pandini; Arianna Fornili
Journal:  J Chem Theory Comput       Date:  2016-02-12       Impact factor: 6.006
