
Improving the chances of successful protein structure determination with a random forest classifier.

Samad Jahandideh, Lukasz Jaroszewski, Adam Godzik.

Abstract

Obtaining diffraction quality crystals remains one of the major bottlenecks in structural biology. The ability to predict the chances of crystallization from the amino-acid sequence of the protein can, at least partly, address this problem by allowing a crystallographer to select homologs that are more likely to succeed and/or to modify the sequence of the target to avoid features that are detrimental to successful crystallization. In 2007, the now widely used XtalPred algorithm [Slabinski et al. (2007), Protein Sci. 16, 2472-2482] was developed. XtalPred classifies proteins into five 'crystallization classes' based on a simple statistical analysis of the physicochemical features of a protein. Here, towards the same goal, advanced machine-learning methods are applied and, in addition, the predictive potential of additional protein features such as predicted surface ruggedness, hydrophobicity, side-chain entropy of surface residues and amino-acid composition of the predicted protein surface is tested. The new XtalPred-RF (random forest) achieves a significant improvement in the prediction of crystallization success over the original XtalPred. To illustrate this, XtalPred-RF was tested by revisiting target selection from 271 Pfam families targeted by the Joint Center for Structural Genomics (JCSG) in PSI-2, and it was estimated that the number of targets entered into the protein-production and crystallization pipeline could have been reduced by 30% without lowering the number of families for which the first structures were solved. The prediction improvement depends on the subset of targets used as a testing set and reaches 100% (i.e. twofold) for the top class of predicted targets.
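
As a rough illustration of the kind of sequence-derived input such a classifier consumes, the sketch below computes two simple features from an amino-acid sequence: mean hydrophobicity on the Kyte-Doolittle scale and amino-acid composition. This is not the XtalPred-RF feature set. The abstract describes features restricted to the predicted protein surface, whereas this sketch uses the whole sequence as a stand-in, and the function names (`gravy`, `amino_acid_composition`, `feature_vector`) are hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def amino_acid_composition(seq):
    """Return the fraction of each of the 20 standard residues in seq."""
    counts = Counter(r for r in seq.upper() if r in KD)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: counts.get(aa, 0) / total for aa in sorted(KD)}


def gravy(seq):
    """Mean Kyte-Doolittle hydropathy over the standard residues in seq."""
    values = [KD[r] for r in seq.upper() if r in KD]
    if not values:
        raise ValueError("sequence contains no standard residues")
    return sum(values) / len(values)


def feature_vector(seq):
    """Assumed layout: [length, mean hydropathy, 20 composition fractions]."""
    comp = amino_acid_composition(seq)
    return [len(seq), gravy(seq)] + [comp[aa] for aa in sorted(KD)]
```

A vector like this, computed for each target, could be passed to any standard random forest implementation together with known crystallization outcomes as labels. The whole-sequence simplification is an assumption of this sketch, not a description of the published method.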

Keywords:  XtalPred; machine-learning methods; structural genomics; target selection

Year: 2014   PMID: 24598732   PMCID: PMC3949519   DOI: 10.1107/S1399004713032070

Source DB: PubMed   Journal: Acta Crystallogr D Biol Crystallogr   ISSN: 0907-4449


  18 in total

1.  Protael: protein data visualization library for the web.

Authors:  Mayya Sedova; Lukasz Jaroszewski; Adam Godzik
Journal:  Bioinformatics       Date:  2015-10-29       Impact factor: 6.937

Review 2.  Critical evaluation of bioinformatics tools for the prediction of protein crystallization propensity.

Authors:  Huilin Wang; Liubin Feng; Geoffrey I Webb; Lukasz Kurgan; Jiangning Song; Donghai Lin
Journal:  Brief Bioinform       Date:  2018-09-28       Impact factor: 11.622

3.  Sequence-Based Prediction of Transmembrane Protein Crystallization Propensity.

Authors:  Qizhi Zhu; Lihua Wang; Ruyu Dai; Wei Zhang; Wending Tang; Yannan Bin; Zeliang Wang; Junfeng Xia
Journal:  Interdiscip Sci       Date:  2021-06-18       Impact factor: 2.233

4.  A statistical model for improved membrane protein expression using sequence-derived features.

Authors:  Shyam M Saladi; Nauman Javed; Axel Müller; William M Clemons
Journal:  J Biol Chem       Date:  2018-01-29       Impact factor: 5.157

5.  Structural Analysis Provides Mechanistic Insight into Nicotine Oxidoreductase from Pseudomonas putida.

Authors:  Margarita A Tararina; Kim D Janda; Karen N Allen
Journal:  Biochemistry       Date:  2016-11-18       Impact factor: 3.162

Review 6.  Computational crystallization.

Authors:  Irem Altan; Patrick Charbonneau; Edward H Snell
Journal:  Arch Biochem Biophys       Date:  2016-01-11       Impact factor: 4.013

Review 7.  The "Sticky Patch" Model of Crystallization and Modification of Proteins for Enhanced Crystallizability.

Authors:  Zygmunt S Derewenda; Adam Godzik
Journal:  Methods Mol Biol       Date:  2017

8.  PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection.

Authors:  Huilin Wang; Mingjun Wang; Hao Tan; Yuan Li; Ziding Zhang; Jiangning Song
Journal:  PLoS One       Date:  2014-08-22       Impact factor: 3.240

Review 9.  Data to knowledge: how to get meaning from your result.

Authors:  Helen M Berman; Margaret J Gabanyi; Colin R Groom; John E Johnson; Garib N Murshudov; Robert A Nicholls; Vijay Reddy; Torsten Schwede; Matthew D Zimmerman; John Westbrook; Wladek Minor
Journal:  IUCrJ       Date:  2015-01-01       Impact factor: 4.769

10.  Predicting Crystallization Propensity of Proteins from Arabidopsis Thaliana.

Authors:  Shaomin Yan; Guang Wu
Journal:  Biol Proced Online       Date:  2015-11-23       Impact factor: 3.244

