Combining machine learning and homology-based approaches to accurately predict subcellular localization in Arabidopsis.

Rakesh Kaundal, Reena Saini, Patrick X Zhao.

Abstract

A complete map of the Arabidopsis (Arabidopsis thaliana) proteome is clearly a major goal for the plant research community in terms of determining the function and regulation of each encoded protein. Developing genome-wide prediction tools such as for localizing gene products at the subcellular level will substantially advance Arabidopsis gene annotation. To this end, we performed a comprehensive study in Arabidopsis and created an integrative support vector machine-based localization predictor called AtSubP (for Arabidopsis subcellular localization predictor) that is based on the combinatorial presence of diverse protein features, such as its amino acid composition, sequence-order effects, terminal information, Position-Specific Scoring Matrix, and similarity search-based Position-Specific Iterated-Basic Local Alignment Search Tool information. When used to predict seven subcellular compartments through a 5-fold cross-validation test, our hybrid-based best classifier achieved an overall sensitivity of 91% with high-confidence precision and Matthews correlation coefficient values of 90.9% and 0.89, respectively. Benchmarking AtSubP on two independent data sets, one from Swiss-Prot and another containing green fluorescent protein- and mass spectrometry-determined proteins, showed a significant improvement in the prediction accuracy of species-specific AtSubP over some widely used "general" tools such as TargetP, LOCtree, PA-SUB, MultiLoc, WoLF PSORT, Plant-PLoc, and our newly created All-Plant method. Cross-comparison of AtSubP on six nontrained eukaryotic organisms (rice [Oryza sativa], soybean [Glycine max], human [Homo sapiens], yeast [Saccharomyces cerevisiae], fruit fly [Drosophila melanogaster], and worm [Caenorhabditis elegans]) revealed inferior predictions. AtSubP significantly outperformed all the prediction tools being currently used for Arabidopsis proteome annotation and, therefore, may serve as a better complement for the plant research community. 
A supplemental Web site that hosts all the training/testing data sets and whole proteome predictions is available at http://bioinfo3.noble.org/AtSubP/.
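
The abstract names amino acid composition as one input feature and reports sensitivity, precision, and Matthews correlation coefficient (MCC) as evaluation measures. The following is a minimal Python sketch of those two ideas only, not the AtSubP implementation. The function names, the one-vs-rest treatment of a single compartment, and the confusion-matrix counts in the usage note are illustrative assumptions and do not come from the paper.

```python
import math
from collections import Counter

# The 20 standard amino acids, in a fixed order so every protein
# yields a feature vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in the sequence (20 values).

    Hypothetical helper: non-standard characters (e.g. X) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, precision, and MCC for one compartment.

    Assumes one-vs-rest counts: tp/fn are proteins of this compartment
    predicted correctly/incorrectly, fp/tn are proteins of other compartments.
    Undefined ratios (zero denominators) are reported as 0.0.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, precision, mcc
```

As a usage example, `amino_acid_composition("MKKA")` gives 0.5 for K and 0.25 each for M and A, and `binary_metrics(90, 10, 890, 10)` on made-up counts gives a sensitivity and precision of 0.9 and an MCC of about 0.889. A composition vector of this kind would be one of several feature blocks passed to a classifier; the sequence-order, terminal, and PSI-BLAST-derived features mentioned in the abstract are not sketched here.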

Year:  2010        PMID: 20647376      PMCID: PMC2938157          DOI: 10.1104/pp.110.156851

Source DB:  PubMed          Journal:  Plant Physiol        ISSN: 0032-0889            Impact factor:   8.340


