The importance of residue-level filtering and the Top2018 best-parts dataset of high-quality protein residues.

Christopher J Williams, David C Richardson, Jane S Richardson

Abstract

We have curated a high-quality, "best-parts" reference dataset of about 3 million protein residues in about 15,000 PDB-format coordinate files, each containing only residues with good electron density support for a physically acceptable model conformation. The resulting prefiltered data typically contain the entire core of each chain, in quite long continuous fragments. Each reference file is a single protein chain, and the total set of files was selected for low redundancy, high resolution, good MolProbity score, and other chain-level criteria. Then each residue was critically tested for adequate local map quality to firmly support its conformation, which must also be free of serious clashes or covalent-geometry outliers. The resulting Top2018 prefiltered datasets have been released on the Zenodo online web service and are freely available for all uses under a Creative Commons license. Currently, one dataset is residue filtered on main chain plus Cβ atoms, and a second dataset is full-residue filtered; each is available at four different sequence-identity levels. Here, we present both statistics and examples that show the beneficial consequences of residue-level filtering. That process is necessary because even the best of structures contain a few highly disordered local regions with poor density and low-confidence conformations that should not be included in reference data. Therefore, the open distribution of these very large, prefiltered reference datasets constitutes a notable advance for structural bioinformatics and the fields that depend upon it.
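The two-stage idea in the abstract (choose good chains first, then drop individual residues that fail local checks) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Residue` fields, the boolean flags, and the function names are hypothetical stand-ins for the map-quality, clash, and covalent-geometry tests the abstract mentions, and no actual thresholds from the paper are assumed.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Residue:
    """One residue of an already chain-level-selected protein chain.

    All three flags are hypothetical placeholders for per-residue
    validation results; the real criteria are defined in the paper.
    """
    seq_num: int
    density_ok: bool         # local map quality supports the conformation
    has_clash: bool          # serious steric clash present
    geometry_outlier: bool   # covalent-geometry outlier present


def passes_filter(res: Residue) -> bool:
    """Keep a residue only if it passes every per-residue check."""
    return res.density_ok and not res.has_clash and not res.geometry_outlier


def surviving_fragments(chain: Iterable[Residue]) -> List[List[Residue]]:
    """Split a chain into continuous runs of residues that pass the filter."""
    fragments: List[List[Residue]] = []
    current: List[Residue] = []
    for res in chain:
        keep = passes_filter(res)
        contiguous = not current or res.seq_num == current[-1].seq_num + 1
        if keep and contiguous:
            current.append(res)
            continue
        # Either the residue failed or there is a numbering gap:
        # close the running fragment and possibly start a new one.
        if current:
            fragments.append(current)
        current = [res] if keep else []
    if current:
        fragments.append(current)
    return fragments


if __name__ == "__main__":
    # Toy chain: residue 5 lacks density support, residue 8 has a clash.
    chain = [
        Residue(i, density_ok=(i != 5), has_clash=(i == 8), geometry_outlier=False)
        for i in range(1, 11)
    ]
    for frag in surviving_fragments(chain):
        print(frag[0].seq_num, "-", frag[-1].seq_num)
```

In this toy example a single poorly supported residue splits the chain into fragments rather than discarding the whole chain, which is the sense in which residue-level filtering keeps the "best parts" of each structure.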

Entities: Chemical

Keywords: Zenodo; protein library; reference data; structural bioinformatics; structure validation

Substances: Proteins

Year: 2021 PMID: 34779043 PMCID: PMC8740842 DOI: 10.1002/pro.4239

Source DB: PubMed Journal: Protein Sci ISSN: 0961-8368 Impact factor: 6.725