Direct AUC optimization of regulatory motifs.

Lin Zhu, Hong-Bo Zhang, De-Shuang Huang.

Abstract

MOTIVATION: The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanisms of genetic variation under different developmental and environmental conditions. Among the many computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have proven promising for harnessing the discovery power of the large amount of accumulated high-throughput binding data. However, they have to sacrifice accuracy for speed and can fail to fully utilize the information in the input sequences.
RESULTS: We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver-operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real-world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs while being one order of magnitude faster. Preliminary results also show that CDAUC may be useful for improving the interpretability of convolutional kernels generated by emerging deep learning approaches for predicting TF sequence specificities.
AVAILABILITY AND IMPLEMENTATION: CDAUC is available at: https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8
CONTACT: dshuang@tongji.edu.cn
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
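
The piece-wise constant property stated in RESULTS can be illustrated with a small Python sketch. It is not the CDAUC implementation: it assumes a plain linear scorer (score = w · x) instead of a motif model, and the names `auc`, `best_coordinate_value`, and `coordinate_ascent_auc` are hypothetical. With all weights but one held fixed, each positive/negative pair changes its ranking at a single breakpoint, so the AUC is constant between consecutive breakpoints and one candidate value per interval is enough for an exact search. This sketch re-evaluates the AUC from scratch for each candidate, so it does not reflect the efficiency the abstract reports.

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    diff = pos_scores[:, None] - neg_scores[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def best_coordinate_value(w, k, X_pos, X_neg):
    """Exactly maximize the AUC over w[k] with every other weight held fixed."""
    w0 = w.copy()
    w0[k] = 0.0
    # For a pair (p, n): score_p(t) - score_n(t) = base_gap + t * slope.
    base_gap = (X_pos @ w0)[:, None] - (X_neg @ w0)[None, :]
    slope = X_pos[:, k][:, None] - X_neg[:, k][None, :]
    nz = slope != 0
    # Each pair flips its ranking at exactly one value of t (its breakpoint).
    breaks = np.unique(-base_gap[nz] / slope[nz])
    if breaks.size == 0:
        candidates = np.array([w[k]])
    else:
        # One candidate per constant interval, plus the current value so
        # the objective never decreases.
        mids = (breaks[:-1] + breaks[1:]) / 2
        candidates = np.concatenate(
            ([breaks[0] - 1.0], mids, [breaks[-1] + 1.0], [w[k]])
        )

    def auc_at(t):
        gap = base_gap + t * slope
        return float(np.mean((gap > 0) + 0.5 * (gap == 0)))

    values = [auc_at(t) for t in candidates]
    best = int(np.argmax(values))
    return candidates[best], values[best]

def coordinate_ascent_auc(X_pos, X_neg, w_init, n_sweeps=5):
    """Cycle through the coordinates, solving each sub-problem exactly."""
    w = np.asarray(w_init, dtype=float).copy()
    for _ in range(n_sweeps):
        for k in range(w.size):
            w[k], _ = best_coordinate_value(w, k, X_pos, X_neg)
    return w, auc(X_pos @ w, X_neg @ w)
```

A possible call, with `X_pos` and `X_neg` as feature matrices for bound and unbound sequences (both hypothetical inputs), is `w, score = coordinate_ascent_auc(X_pos, X_neg, w_init)`.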

Year:  2017        PMID: 28881989      PMCID: PMC5870558          DOI: 10.1093/bioinformatics/btx255

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


