Tyler C Shimko (1), Polly M Fordyce (1,2,3,4), Yaron Orenstein (5)
1. Department of Genetics. 2. Department of Bioengineering. 3. Stanford ChEM-H, Stanford University, Stanford, CA 94305, USA. 4. Chan Zuckerberg Biohub, San Francisco, CA 94158, USA. 5. School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel.
Abstract
MOTIVATION: High-throughput protein screening is a critical technique for dissecting and designing protein function. Libraries for these assays can be created through a number of means, including targeted or random mutagenesis of a template protein sequence or direct DNA synthesis. However, mutagenic library construction methods often yield vastly more nonfunctional than functional variants and, despite advances in large-scale DNA synthesis, individual synthesis of each desired DNA template is often prohibitively expensive. Consequently, many protein-screening libraries rely on the use of degenerate codons (DCs), mixtures of DNA bases incorporated at specific positions during DNA synthesis, to generate highly diverse protein-variant pools from only a few low-cost synthesis reactions. However, selecting DCs for sets of sequences that covary at multiple positions dramatically increases the difficulty of designing a DC library and leads to the creation of many undesired variants that can quickly outstrip screening capacity.

RESULTS: We introduce a novel algorithm for total DC library optimization, degenerate codon design (DeCoDe), based on integer linear programming. DeCoDe significantly outperforms state-of-the-art DC optimization algorithms and scales well to more than a hundred proteins sharing complex patterns of covariation (e.g. the lab-derived avGFP lineage). Moreover, DeCoDe is, to our knowledge, the first DC design algorithm with the capability to encode mixed-length protein libraries. We anticipate DeCoDe to be broadly useful for a variety of library generation problems, ranging from protein engineering attempts that leverage mutual information to the reconstruction of ancestral protein states.

AVAILABILITY AND IMPLEMENTATION: github.com/OrensteinLab/DeCoDe.

CONTACT: yaronore@bgu.ac.il.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
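To make the covariation problem concrete, the following minimal Python sketch expands degenerate codons into the amino acids they encode and enumerates the protein library that results from combining them. It is an illustration only and is not DeCoDe's implementation or its integer linear program; the function names (`expand_codon`, `encoded_amino_acids`, `library`) and the two-residue target set are hypothetical, and the only inputs assumed are the standard IUPAC nucleotide codes and the standard genetic code.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes: each symbol stands for a mixture of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand_codon(dc):
    """Return every concrete codon covered by a three-letter degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in dc))]


def encoded_amino_acids(dc):
    """Return the set of amino acids (and '*' for stop) a degenerate codon encodes."""
    return {CODON_TABLE[codon] for codon in expand_codon(dc)}


def library(dcs):
    """Return all protein sequences produced by one degenerate codon per position."""
    return {"".join(p) for p in product(*(encoded_amino_acids(dc) for dc in dcs))}


if __name__ == "__main__":
    # NNK is a commonly used degenerate codon: 32 codons covering all 20 amino acids.
    print(len(expand_codon("NNK")), sorted(encoded_amino_acids("NNK")))

    # Hypothetical targets that covary at two positions: "AK" and "VE".
    targets = {"AK", "VE"}
    produced = library(["GYT", "RAA"])  # GYT -> A or V, RAA -> K or E
    print(sorted(produced - targets))   # variants encoded but not wanted
```

In this toy case a single degenerate oligo covers both targets but also encodes the two unwanted combinations, because each position is randomized independently. With more covarying positions the undesired fraction grows multiplicatively, which is the cost that splitting a library across several degenerate sequences is meant to control.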