
A Contrastive Learning Pre-Training Method for Motif Occupancy Identification.

Ken Lin¹, Xiongwen Quan¹, Wenya Yin¹, Han Zhang¹.

Abstract

Motif occupancy identification is a binary classification task that predicts whether DNA motif instances are bound by transcription factors, and several sequence-based methods have been proposed for it. However, because they are trained directly end to end, these methods lack biological interpretability in their sequence representations. In this work, we propose a contrastive learning method to pre-train an interpretable and robust DNA encoding for motif occupancy identification. We construct two alternative models to pre-train the DNA sequence encoder: a self-supervised model and a supervised model. We augment the original sequences for contrastive learning with the edit operations defined in edit distance. Specifically, we propose a sequence similarity criterion based on the Needleman-Wunsch algorithm to discriminate positive and negative sample pairs in self-supervised learning. Finally, a DNN classifier is fine-tuned along with the pre-trained encoder to predict motif occupancy. Both proposed contrastive learning models outperform the baseline end-to-end CNN model and the SimCLR method, reaching AUCs of 0.811 and 0.823, respectively. Compared with the baseline method, our models are more robust on small samples. In addition, the self-supervised model proves practicable for transfer learning.
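The augmentation and pair-selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring scheme, the number of edits, or the similarity threshold, so the values below (match = 1, mismatch = -1, gap = -1, two edits, threshold 0.8) and the function names `needleman_wunsch_score`, `augment`, and `is_positive_pair` are assumptions chosen only for illustration.

```python
import random


def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b (Needleman-Wunsch, linear gap cost).

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def augment(seq, n_edits=2, rng=None):
    """Apply n_edits random edit operations (substitute, insert, delete)."""
    rng = rng or random.Random()
    s = list(seq)
    for _ in range(n_edits):
        op = rng.choice(("substitute", "insert", "delete"))
        if op == "insert" or not s:  # an empty sequence can only grow
            s.insert(rng.randrange(len(s) + 1), rng.choice("ACGT"))
        elif op == "substitute":
            s[rng.randrange(len(s))] = rng.choice("ACGT")
        else:
            del s[rng.randrange(len(s))]
    return "".join(s)


def is_positive_pair(a, b, threshold=0.8):
    """Hypothetical criterion: length-normalised alignment score >= threshold."""
    longest = max(len(a), len(b), 1)  # avoid division by zero for empty input
    return needleman_wunsch_score(a, b) / longest >= threshold
```

In a self-supervised setting, a call such as `is_positive_pair(seq, augment(seq))` would decide whether an augmented view is similar enough to its source to count as a positive; pairs that fall below the threshold would be treated as negatives.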

Entities: Chemical

Keywords: contrastive learning; data augmentation; edit distance; motif occupancy identification; pre-training; sequence similarity

MeSH: Algorithms

Year: 2022 PMID: 35563090 PMCID: PMC9103107 DOI: 10.3390/ijms23094699

Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
