Jelena Vucinic (1,2), David Simoncini (1,3), Manon Ruffini (1,2), Sophie Barbe (1), Thomas Schiex (2).
1. LISBP, Université de Toulouse, CNRS, INRA, INSA, 31400 Toulouse, France.
2. MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan Cedex, France.
3. IRIT UMR 5505-CNRS, Université de Toulouse, 31042 Cedex 9, France.
Abstract
MOTIVATION: Structure-based computational protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. The usual approach takes a single rigid backbone as its target and therefore ignores backbone flexibility. Multistate design (MSD) instead considers several backbone states simultaneously, which defines challenging computational problems. RESULTS: We introduce efficient reductions of positive MSD problems to Cost Function Networks with two different fitness definitions and implement them in the Pompd (Positive Multistate Protein design) software. Pompd is able to identify guaranteed optimal sequences for positive multistate full protein redesign problems and to exhaustively enumerate suboptimal sequences close to the MSD optimum. Applying it to nuclear magnetic resonance and back-rubbed X-ray structures, we observe that the average energy fitness provides the best sequence recovery. Our method outperforms state-of-the-art guaranteed computational design approaches by orders of magnitude and can solve MSD problems of sizes previously unreachable with guaranteed algorithms. AVAILABILITY AND IMPLEMENTATION: Pompd is available as documented open-source software at https://forgemia.inra.fr/thomas.schiex/pompd. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
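As an illustration of what positive MSD with an average energy fitness asks for, the following minimal Python sketch scores every sequence on several backbone states, averages the per-state energies, and lists the optimum together with all sequences within a threshold of it. It is not Pompd's algorithm: it substitutes brute-force enumeration for the Cost Function Network reduction, ignores rotamers, and uses a two-letter alphabet with invented energy tables. All names (`state_energy`, `average_fitness`, `enumerate_msd`, `STATES`) are hypothetical, and only the average energy fitness named in the abstract is shown.

```python
from itertools import product


def state_energy(seq, unary, pairwise):
    """Energy of `seq` on one backbone state (toy model without rotamers)."""
    energy = sum(unary[i][aa] for i, aa in enumerate(seq))
    energy += sum(table[(seq[i], seq[j])] for (i, j), table in pairwise.items())
    return energy


def average_fitness(seq, states):
    """Average of the per-state energies; lower is better."""
    return sum(state_energy(seq, u, p) for u, p in states) / len(states)


def enumerate_msd(states, alphabet, n_positions, threshold):
    """Return (sequence, fitness) pairs within `threshold` of the optimum.

    Brute force over all sequences: exact, but only feasible for toy sizes.
    """
    scored = [
        ("".join(seq), average_fitness(seq, states))
        for seq in product(alphabet, repeat=n_positions)
    ]
    best = min(fitness for _, fitness in scored)
    near = [item for item in scored if item[1] <= best + threshold]
    return sorted(near, key=lambda item: (item[1], item[0]))


# Two invented backbone states over two positions and the alphabet "AL".
STATES = [
    (
        [{"A": 0.0, "L": 1.0}, {"A": 2.0, "L": 0.0}],
        {(0, 1): {("A", "A"): 0.0, ("A", "L"): -1.0,
                  ("L", "A"): 0.0, ("L", "L"): 0.5}},
    ),
    (
        [{"A": 1.0, "L": 0.0}, {"A": 0.0, "L": 1.0}],
        {},  # no pairwise terms on the second state
    ),
]

if __name__ == "__main__":
    for sequence, fitness in enumerate_msd(STATES, "AL", 2, threshold=0.8):
        print(sequence, fitness)
```

On this toy input the sequence `AL` is best on the first state but worst on the second, and it still wins on average. Enumeration of this kind grows exponentially with the number of designed positions, which is why an exact method for full protein redesign needs a more efficient formulation such as the Cost Function Network reduction the abstract describes.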