MinSet: a general approach to derive maximally representative database subsets by using fragment dictionaries and its application to the SCOP database.

Alessandro Pandini, Laura Bonati, Franca Fraternali, Jens Kleinjung.

Abstract

MOTIVATION: The size of current protein databases is a challenge for many bioinformatics applications, both in terms of processing speed and information redundancy. It may therefore be desirable to efficiently reduce the database of interest to a maximally representative subset.
RESULTS: The MinSet method employs a combination of a Suffix Tree and a Genetic Algorithm for the generation, selection and assessment of database subsets. The approach is generally applicable to any type of string-encoded data, allowing for a drastic reduction of the database size whilst retaining most of the information contained in the original set. We demonstrate the performance of the method on a database of protein domain structures encoded as strings. We applied it to the SCOP40 domain database, translating protein structures into character strings by means of a structural alphabet and extracting optimized subsets according to an entropy score based on a constant-length fragment dictionary. The optimized subsets are therefore maximally representative of the distribution and range of local structures. Subsets containing only 10% of the SCOP structure classes show a coverage of >90% for fragments of length 1-4.
AVAILABILITY: http://mathbio.nimr.mrc.ac.uk/~jkleinj/MinSet.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
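
The fragment dictionary, entropy score and coverage mentioned in the RESULTS can be illustrated with a short Python sketch. This is not the MinSet implementation: the abstract does not give the exact entropy formula, the suffix-tree construction or the genetic-algorithm settings, so the sketch assumes plain Shannon entropy over fragment frequencies, counts fragments with a simple sliding window instead of a suffix tree, and omits subset optimization entirely. The function names and the toy four-letter alphabet are hypothetical.

```python
from collections import Counter
from math import log2


def fragment_counts(strings, k):
    """Count every length-k substring (fragment) across the given strings."""
    counts = Counter()
    for s in strings:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def shannon_entropy(counts):
    """Shannon entropy in bits of a fragment frequency distribution.

    Assumption: the paper's entropy score may be defined differently.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in counts.values())


def coverage(subset, full, k):
    """Fraction of distinct length-k fragments of `full` also present in `subset`."""
    full_frags = set(fragment_counts(full, k))
    if not full_frags:
        return 0.0
    sub_frags = set(fragment_counts(subset, k))
    return len(sub_frags & full_frags) / len(full_frags)


# Toy "database" of strings over a hypothetical 4-letter structural alphabet.
database = ["ABCD", "ABCA", "DDDD", "BCDA"]
subset = ["ABCD", "BCDA"]

for k in (1, 2):
    print(k, coverage(subset, database, k),
          shannon_entropy(fragment_counts(subset, k)))
```

In a subset search of the kind the abstract describes, a score like `shannon_entropy` would serve as the fitness of a candidate subset, and `coverage` would be reported afterwards for each fragment length.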

Year: 2007 PMID: 17204463 DOI: 10.1093/bioinformatics/btl637

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited by: 9 in total

1. Structural alphabets derived from attractors in conformational space.

Authors: Alessandro Pandini; Arianna Fornili; Jens Kleinjung
Journal: BMC Bioinformatics Date: 2010-02-20 Impact factor: 3.169

2. GSATools: analysis of allosteric communication and functional local motions using a structural alphabet.

Authors: Alessandro Pandini; Arianna Fornili; Franca Fraternali; Jens Kleinjung
Journal: Bioinformatics Date: 2013-06-05 Impact factor: 6.937

3. Specialized Dynamical Properties of Promiscuous Residues Revealed by Simulated Conformational Ensembles.

Authors: Arianna Fornili; Alessandro Pandini; Hui-Chun Lu; Franca Fraternali
Journal: J Chem Theory Comput Date: 2013-09-27 Impact factor: 6.006

4. Minimizing proteome redundancy in the UniProt Knowledgebase.

Authors: Borisas Bursteinas; Ramona Britto; Benoit Bely; Andrea Auchincloss; Catherine Rivoire; Nicole Redaschi; Claire O'Donovan; Maria Jesus Martin
Journal: Database (Oxford) Date: 2016-12-26 Impact factor: 3.451

5. Exploring the potential of a structural alphabet-based tool for mining multiple target conformations and target flexibility insight.

Authors: Leslie Regad; Jean-Baptiste Chéron; Dhoha Triki; Caroline Senac; Delphine Flatters; Anne-Claude Camproux
Journal: PLoS One Date: 2017-08-17 Impact factor: 3.240

6. Phosphorylation-mediated unfolding of a KH domain regulates KSRP localization via 14-3-3 binding.

7. Implicit Solvation Parameters Derived from Explicit Water Forces in Large-Scale Molecular Dynamics Simulations.

8. Detecting protein candidate fragments using a structural alphabet profile comparison approach.

9. Using Local States To Drive the Sampling of Global Conformations in Proteins.