
Comparison of sequence and structure-based datasets for nonredundant structural data mining.

Carmen K Chu, Lina L Feng, Merridee A Wouters.

Abstract

Structural data mining studies attempt to deduce general principles of protein structure from solved structures deposited in the Protein Data Bank (PDB). The entire database is unsuitable for such studies because it is not representative of the ensemble of protein folds. Given that novel folds continue to be unearthed, some folds are currently unrepresented in the PDB while other folds are overrepresented. Overrepresentation can easily be avoided by filtering the dataset. PDB_SELECT is a widely used representative subset of the PDB that has been deduced by sequence comparison. Specifically, structures with sequences that exhibit a pairwise sequence identity above a threshold value are weeded from the dataset. Although length criteria for pairwise alignments have a structural basis, this automated method of pruning is essentially sequence-based and runs into problems in the twilight zone, possibly resulting in some folds being overrepresented. The value-added structure databases SCOP and CATH are also a potential source of a nonredundant dataset. Here we compare the sequence-derived dataset PDB_SELECT with the structural databases SCOP (Structural Classification Of Proteins) and CATH (Class-Architecture-Topology-Homology). We show that some folds remain overrepresented in the PDB_SELECT dataset while other folds are not represented at all. However, SCOP and CATH also have their own problems, such as the labor-intensiveness of the update process and the problem of determining whether all folds are equally or sufficiently distant. We discuss areas where further work is required. Copyright 2005 Wiley-Liss, Inc.
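The sequence-based pruning described in the abstract can be illustrated with a minimal sketch. This is not the PDB_SELECT procedure itself: the identity function below is a toy position-by-position comparison standing in for a real pairwise alignment, the 25% threshold is an assumed value, and the entry names, sequences, and fold labels are invented for illustration. The sketch greedily keeps an entry only if it does not exceed the threshold against every entry already kept, then tallies how many survivors share a fold label.

```python
from collections import Counter


def percent_identity(a: str, b: str) -> float:
    """Toy identity: matching positions over the shorter length.

    Assumption: no alignment is performed; a real pipeline would
    score an actual pairwise alignment instead.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n


def select_nonredundant(entries, threshold=25.0):
    """Greedy pruning by pairwise identity.

    An entry is kept only if its identity to every already-kept
    entry is at or below the (assumed) threshold.
    """
    kept = []
    for name, seq, fold in entries:
        if all(percent_identity(seq, kseq) <= threshold for _, kseq, _ in kept):
            kept.append((name, seq, fold))
    return kept


def fold_counts(entries):
    """Count how many entries carry each fold label."""
    return Counter(fold for _, _, fold in entries)


# Hypothetical entries: (name, sequence, fold label).
entries = [
    ("a1", "MKTAYIAKQR", "fold_A"),
    ("a2", "MKTAYIAKQK", "fold_A"),  # near-duplicate of a1
    ("a3", "GSHWLPEDNV", "fold_A"),  # same fold, dissimilar sequence
    ("b1", "QLVECRFTTS", "fold_B"),
]

selected = select_nonredundant(entries)
counts = fold_counts(selected)
print([name for name, _, _ in selected])
print(dict(counts))
```

In this toy data the near-duplicate a2 is removed, but a3 survives because its sequence is dissimilar to a1 even though both carry the same fold label. That is the kind of residual fold overrepresentation the abstract attributes to purely sequence-based filtering.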

Year: 2005  PMID: 16001417  DOI: 10.1002/prot.20505

Source DB: PubMed  Journal: Proteins  ISSN: 0887-3585



