The importance of residue-level filtering and the Top2018 best-parts dataset of high-quality protein residues.

Christopher J Williams, David C Richardson, Jane S Richardson.

Abstract

We have curated a high-quality, "best-parts" reference dataset of about 3 million protein residues in about 15,000 PDB-format coordinate files, each containing only residues with good electron density support for a physically acceptable model conformation. The resulting prefiltered data typically contain the entire core of each chain, in quite long continuous fragments. Each reference file is a single protein chain, and the total set of files was selected for low redundancy, high resolution, good MolProbity score, and other chain-level criteria. Each residue was then critically tested for adequate local map quality to firmly support its conformation, which must also be free of serious clashes or covalent-geometry outliers. The resulting Top2018 prefiltered datasets have been released on the Zenodo online service and are freely available for all uses under a Creative Commons license. Currently, one dataset is residue-filtered on main chain plus Cβ atoms, and a second dataset is full-residue filtered; each is available at four different sequence-identity levels. Here, we present both statistics and examples that show the beneficial consequences of residue-level filtering. That process is necessary because even the best structures contain a few highly disordered local regions with poor density and low-confidence conformations that should not be included in reference data. Therefore, the open distribution of these very large, prefiltered reference datasets constitutes a notable advance for structural bioinformatics and the fields that depend upon it.
© 2021 The Protein Society.
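
The two-stage idea in the abstract (chains chosen first, then each residue tested individually, leaving continuous fragments) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Residue` class, its boolean flags, and the `best_parts` function are hypothetical names introduced here, and the sketch assumes each per-residue test (map quality, clashes, covalent geometry) has already been reduced to a pass/fail flag by some upstream validation step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Residue:
    index: int               # sequence position within the chain
    density_ok: bool         # hypothetical flag: local map quality supports the conformation
    has_serious_clash: bool  # hypothetical flag: serious steric clash present
    geometry_outlier: bool   # hypothetical flag: covalent-geometry outlier present


def passes_filter(res: Residue) -> bool:
    """A residue is kept only if it passes every per-residue test."""
    return res.density_ok and not res.has_serious_clash and not res.geometry_outlier


def best_parts(chain: List[Residue]) -> List[List[int]]:
    """Return the kept residues as runs of consecutive sequence positions."""
    fragments: List[List[int]] = []
    current: List[int] = []
    for res in chain:
        if passes_filter(res):
            # Start a new fragment if the numbering is not contiguous.
            if current and res.index != current[-1] + 1:
                fragments.append(current)
                current = []
            current.append(res.index)
        elif current:
            # A rejected residue ends the current fragment.
            fragments.append(current)
            current = []
    if current:
        fragments.append(current)
    return fragments
```

In this sketch a single rejected residue splits a chain into two fragments, which is why a well-ordered chain core would tend to survive as a few long continuous pieces.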

Keywords:  Zenodo; protein library; reference data; structural bioinformatics; structure validation

Year: 2021; PMID: 34779043; PMCID: PMC8740842; DOI: 10.1002/pro.4239

Source DB: PubMed; Journal: Protein Sci; ISSN: 0961-8368; Impact factor: 6.725


Date: 2021-11-29