
NetTIME: a multitask and base-pair resolution framework for improved transcription factor binding site prediction.

Ren Yi, Kyunghyun Cho, Richard Bonneau.

Abstract

MOTIVATION: Machine learning models for predicting cell-type-specific transcription factor (TF) binding sites have become increasingly accurate thanks to the greater availability of next-generation sequencing data and more standardized model evaluation criteria. However, knowledge transfer from data-rich to data-limited TFs and cell types remains crucial for improving TF binding prediction models because available binding labels are highly skewed towards a small collection of TFs and cell types. Transfer prediction of TF binding sites can potentially benefit from a multitask learning approach; however, existing methods typically use shallow single-task models to generate low-resolution predictions. Here, we propose NetTIME, a multitask learning framework for predicting cell-type-specific TF binding sites with base-pair resolution.
RESULTS: We show that the multitask learning strategy for TF binding prediction is more efficient than the single-task approach due to the increased data availability. NetTIME trains high-dimensional embedding vectors to distinguish TF and cell-type identities. We show that this approach is critical for the success of the multitask learning strategy and allows our model to make accurate transfer predictions within and beyond the training panels of TFs and cell types. We additionally train a linear-chain conditional random field (CRF) to classify binding predictions and show that this CRF eliminates the need for setting a probability threshold and reduces classification noise. We compare our method's predictive performance with two state-of-the-art methods, Catchitt and Leopard, and show that our method outperforms previous methods under both supervised and transfer learning settings.
AVAILABILITY AND IMPLEMENTATION: NetTIME is freely available at https://github.com/ryi06/NetTIME and the code is also archived at https://doi.org/10.5281/zenodo.6994897.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2022. Published by Oxford University Press.
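
The abstract's point that a linear-chain CRF removes the need for a probability threshold and reduces classification noise can be illustrated with a small sketch. This is not NetTIME's implementation: it is a generic two-state Viterbi decoder over per-base binding probabilities, and the function name `viterbi_binary` and the transition scores `stay` and `switch` are hypothetical values chosen for illustration (a trained CRF would learn its transition scores from data).

```python
import math

def viterbi_binary(probs, stay=1.0, switch=-1.0):
    """Decode one 0/1 label per position from per-base binding probabilities.

    Emission scores are log-probabilities. `stay` and `switch` are
    hypothetical transition scores; a trained CRF would learn them.
    """
    eps = 1e-9  # guards against log(0)
    emit = [(math.log(1 - p + eps), math.log(p + eps)) for p in probs]
    trans = [[stay, switch], [switch, stay]]  # trans[prev][cur]

    score = list(emit[0])  # best path score ending in state 0 / state 1
    back = []              # back-pointers for positions 1..n-1
    for e in emit[1:]:
        new_score, pointers = [], []
        for cur in (0, 1):
            cands = [score[prev] + trans[prev][cur] for prev in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            new_score.append(cands[best] + e[cur])
            pointers.append(best)
        score = new_score
        back.append(pointers)

    # Trace the highest-scoring path back from the last position.
    label = 0 if score[0] >= score[1] else 1
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    return path[::-1]

# An isolated high-probability base is smoothed away, while a sustained
# run of high probabilities is kept as a bound region.
noisy = viterbi_binary([0.1, 0.1, 0.9, 0.1, 0.1])
region = viterbi_binary([0.05] * 3 + [0.95] * 4 + [0.05] * 3)
```

Under these assumed scores, a fixed 0.5 cutoff would label the single 0.9 position in the first input as bound, whereas the decoder labels the whole sequence as unbound because switching states twice costs more than one unlikely emission. No cutoff is chosen anywhere: the labels come from the highest-scoring path.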

Year: 2022  PMID: 35997560  PMCID: PMC9563695  DOI: 10.1093/bioinformatics/btac569

Source DB: PubMed  Journal: Bioinformatics  ISSN: 1367-4803  Impact factor: 6.931
