
AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling.

Sheng Wang, Siqi Sun, Jinbo Xu.

Abstract

Deep Convolutional Neural Networks (DCNN) have shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Fields (CRF), for sequence labeling with an imbalanced label distribution. The widely used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm for DeepCNF called maximum-AUC. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measure for imbalanced data. To achieve this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function, and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction, whose label distributions are highly imbalanced, and performs similarly to the other two training methods on solvent accessibility prediction, which has three equally distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors on these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC.
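
The pairwise ranking view of AUC described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It computes the empirical AUC as the fraction of positive/negative pairs that are ranked correctly, then replaces the non-differentiable indicator with a polynomial so that a gradient-based optimizer could be applied. The function names, the least-squares Chebyshev fit, the polynomial degree, and the assumption that scores lie in [0, 1] are all illustrative choices, not details taken from the paper.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def _split(scores, labels):
    """Split scores into positive-class and negative-class arrays."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    return scores[labels], scores[~labels]


def empirical_auc(scores, labels):
    """Pairwise (Wilcoxon-Mann-Whitney) AUC: the fraction of
    positive/negative pairs in which the positive example gets the
    higher score. Ties count as half a correct pair."""
    pos, neg = _split(scores, labels)
    diff = pos[:, None] - neg[None, :]  # all pairwise score differences
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def fit_step_polynomial(degree=9, n_grid=2001):
    """Least-squares polynomial approximation of the step function
    on [-1, 1]. Degree and grid size are hypothetical defaults."""
    x = np.linspace(-1.0, 1.0, n_grid)
    step = (x > 0).astype(float)
    return Chebyshev.fit(x, step, degree, domain=[-1, 1])


def surrogate_auc(scores, labels, poly):
    """Smooth AUC surrogate: the indicator on each pairwise difference
    is replaced by the polynomial. Assumes scores lie in [0, 1], so
    every difference falls inside the fitted domain [-1, 1]."""
    pos, neg = _split(scores, labels)
    diff = pos[:, None] - neg[None, :]
    return float(poly(diff).mean())


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    poly = fit_step_polynomial()
    print(empirical_auc([0.9, 0.4, 0.6, 0.1], labels))  # 3 of 4 pairs correct
    print(surrogate_auc([0.9, 0.8, 0.3, 0.1], labels, poly))
    print(surrogate_auc([0.1, 0.2, 0.8, 0.9], labels, poly))
```

Because the surrogate is a polynomial in the score differences, its derivative is available in closed form (for example via `poly.deriv()`), which is what makes a gradient-based procedure applicable. Note that the pairwise formulation depends only on the ordering of positives against negatives, not on how many examples each class has, which is why it suits imbalanced labels.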


Year:  2016        PMID: 28884168      PMCID: PMC5584645          DOI: 10.1007/978-3-319-46227-1_1

Source DB:  PubMed          Journal:  Mach Learn Knowl Discov Databases


