| Literature DB >> 26335248 |
Shane Ó Conchúir1, Kyle A Barlow2, Roland A Pache1, Noah Ollikainen2, Kale Kundert3, Matthew J O'Meara4, Colin A Smith5, Tanja Kortemme6.
Abstract
The development and validation of computational macromolecular modeling and design methods depend on suitable benchmark datasets and informative metrics for comparing protocols. In addition, if a method is intended to be adopted broadly in diverse biological applications, there needs to be information on appropriate parameters for each protocol, as well as metrics describing the expected accuracy compared to experimental data. In certain disciplines, there exist established benchmarks and public resources where experts in a particular methodology are encouraged to supply their most efficient implementation of each particular benchmark. We aim to provide such a resource for protocols in macromolecular modeling and design. We present a freely accessible web resource (https://kortemmelab.ucsf.edu/benchmarks) to guide the development of protocols for protein modeling and design. The site provides benchmark datasets and metrics to compare the performance of a variety of modeling protocols using different computational sampling methods and energy functions, providing a "best practice" set of parameters for each method. Each benchmark has an associated downloadable benchmark capture archive containing the input files, analysis scripts, and tutorials for running the benchmark. The captures may be run with any suitable modeling method; we supply command lines for running the benchmarks using the Rosetta software suite. We have compiled initial benchmarks for the resource spanning three key areas: prediction of energetic effects of mutations, protein design, and protein structure prediction, each with associated state-of-the-art modeling protocols. With the help of the wider macromolecular modeling community, we hope to expand the variety of benchmarks included on the website and continue to evaluate new iterations of current methods as they become available.Entities:
Mesh:
Substances:
Year: 2015 PMID: 26335248 PMCID: PMC4559433 DOI: 10.1371/journal.pone.0130433
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1Types of benchmarks and protocols currently included in the web resource.
Tests estimating energetic effects of mutation (orange, A), design tests (purple, B-E) and structure prediction tests (green, F).
Fig 2Comparison of occurrences of different amino acid residue types observed at buried positions between natural sequences and sequences designed with two different Rosetta energy functions.
Barplot showing the percent occurrence of each type of amino acid found at buried positions in natural and designed sequences across 40 diverse protein families. Buried positions are defined as positions with greater than 14 neighboring positions, where neighboring positions have C-β atoms within 8Å of the C-β atom of the residue of interest. The X-axis is sorted by the magnitude of improvement of the Talaris energy function relative to the previous Score12 energy function with respect to the similarity to the natural percent occurrences.
Fig 3Benchmark and protocols capture website.
Left: The website presents an overview of each benchmark and publishes the performance of different methods using a set of standardized metrics. Parameters important to the protocol performance are also provided. Right: Each benchmark capture is stored in a documented version-controlled archive. The most recent version can be downloaded directly from the website.