Exploring bias in the Protein Data Bank using contrast classifiers.

K Peng, Z Obradovic, S Vucetic.

Abstract

In this study we analyzed the bias in the Protein Data Bank (PDB) using a novel contrast classifier approach. We trained an ensemble of neural network classifiers, called a contrast classifier, to learn the distributional differences between non-redundant sequence subsets of PDB and SWISS-PROT. Assuming that SWISS-PROT is representative of the sequence diversity in nature while PDB is a biased sample, the output of the contrast classifier can be used to measure whether the properties of a given sequence, or of a region within it, are underrepresented in PDB. We applied the contrast classifier to SWISS-PROT sequences to analyze the bias in PDB with respect to different functional protein properties. The results showed that transmembrane, signal, disordered, and low-complexity regions are significantly underrepresented in PDB, while disulfide bonds, metal binding sites, and sites involved in enzyme activity are overrepresented. Additionally, hydroxylation and phosphorylation post-translational modification sites were found to be underrepresented, while acetylation sites were significantly overrepresented. These results suggest the potential usefulness of contrast classifiers in the selection of target proteins for structural characterization experiments.
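The idea of a contrast classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes an ensemble of neural networks, whereas the sketch below swaps in a single logistic regression on amino-acid composition, and all names (`composition`, `train_contrast`, `contrast_score`) and the toy sequences are hypothetical stand-ins for real PDB and SWISS-PROT data. A score near 1 means a sequence looks more like the reference sample than the biased one, i.e. its properties are likely underrepresented in the biased sample.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the 20-dimensional amino-acid frequency vector of a sequence."""
    n = max(len(seq), 1)
    return [seq.count(a) / n for a in AMINO_ACIDS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_contrast(biased, reference, epochs=1000, lr=2.0):
    """Fit logistic regression by full-batch gradient descent.

    Label 0 = biased sample (stand-in for PDB),
    label 1 = reference sample (stand-in for SWISS-PROT).
    """
    data = [(composition(s), 0.0) for s in biased]
    data += [(composition(s), 1.0) for s in reference]
    dim = len(AMINO_ACIDS)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_b += err
            for j in range(dim):
                grad_w[j] += err * x[j]
        # Step along the mean gradient of the log-loss.
        b -= lr * grad_b / len(data)
        w = [wi - lr * gj / len(data) for wi, gj in zip(w, grad_w)]
    return w, b

def contrast_score(model, seq):
    """Estimated probability that seq comes from the reference sample."""
    w, b = model
    x = composition(seq)
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

# Toy data (assumption): the biased sample lacks hydrophobic-rich,
# transmembrane-like sequences that the reference sample contains.
rng = random.Random(0)

def random_seq(alphabet, length=30):
    return "".join(rng.choice(alphabet) for _ in range(length))

biased = [random_seq(AMINO_ACIDS) for _ in range(30)]
reference = [random_seq(AMINO_ACIDS) for _ in range(15)]
reference += [random_seq("AILVFMW") for _ in range(15)]

model = train_contrast(biased, reference)
score_tm = contrast_score(model, "LLVVIIAALLFFVVIILLAA")  # hydrophobic-rich
score_glob = contrast_score(model, AMINO_ACIDS * 2)       # uniform composition
print(f"hydrophobic-rich: {score_tm:.2f}  uniform: {score_glob:.2f}")
```

In this toy setting the hydrophobic-rich sequence should receive the higher score, mirroring the reported underrepresentation of transmembrane regions. Applying the same scoring to sliding windows of a sequence would give a per-region view of the kind the abstract mentions.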

Year:  2004        PMID: 14992523     DOI: 10.1142/9789812704856_0041

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


