Hanjun Dai, Ramzan Umarov, Hiroyuki Kuwahara, Yu Li, Le Song, Xin Gao.
Abstract
MOTIVATION: An accurate characterization of the transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, accurately characterizing the TF-DNA binding affinity landscape remains a challenging problem.
Year: 2017 PMID: 28961686 PMCID: PMC5870668 DOI: 10.1093/bioinformatics/btx480
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The proposed graphical model for embedding a 12-mer DNA binding sequence. Here x_i is the nucleotide at position i in the binding sequence, H_i is the hidden variable at position i, and K is the affinity value of this binding sequence.
Fig. 2. The architecture of the baseline CNN model.
Comparison of different methods on the HiTS-FLIP dataset with 83 252 12 bp DNA sequences (Nutiu et al.)
| Measure | PWM | BaMM | LM | SVR | DNN | CNN | S2V |
|---|---|---|---|---|---|---|---|
| RMSE | 181.87 | N/A | 128.61 | 115.16 | 116.70 | 113.70 | |
| PCC | 0.27 | 0.39 | 0.73 | 0.79 | 0.79 | 0.80 | |
| SCC | 0.01 | 0.33 | 0.63 | 0.71 | 0.70 | 0.71 | |
Note: PWM, position weight matrix; BaMM, Bayesian Markov Model motif discovery (Siebert and Söding, 2016); LM, the DREAM-winning HK→ME linear model (Annala et al.); SVR, the two-round weighted degree (WD) kernel-based SVR model (Wang et al.); DNN, the multi-layer neural network model; CNN, the convolutional neural network model (Alipanahi et al.); S2V, the proposed Sequence2Vec method. The best performance under each measure is in bold.
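
The three measures in the table can be illustrated with a short sketch. It assumes the abbreviations carry their usual meanings (RMSE, root mean squared error; PCC, Pearson correlation coefficient; SCC, Spearman rank correlation coefficient); the record itself does not spell them out. The function names and the toy values below are hypothetical and are not taken from the paper or its data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted affinities."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson(x, y):
    """Pearson correlation coefficient (linear agreement)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example (made-up numbers, for illustration only).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.5, 1.8, 3.2, 3.9, 5.6]

print("RMSE:", rmse(y_true, y_pred))
print("PCC: ", pearson(y_true, y_pred))
print("SCC: ", spearman(y_true, y_pred))
```

Lower RMSE is better, while higher PCC and SCC are better. SCC depends only on the ordering of the predictions, so a method can rank sequences well (high SCC) while still being poorly calibrated in absolute terms (high RMSE).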
Fig. 3. (a) The scatter plot between the true and predicted K values by Sequence2Vec on one fold of the HiTS-FLIP dataset; (b) the 7-mer sequence logo made by the top-ranked 7-mers, predicted by Sequence2Vec, of the HiTS-FLIP dataset; and (c) the scatter plot between the AUC of DeepBind and that of Sequence2Vec over the 66 PBM datasets.
Comparison of the average performance of different methods over the MITOMI 2.0 datasets for 28 TFs in Saccharomyces cerevisiae (Fordyce et al.)
| Measure | PWM | BaMM | LM | SVR | DNN | CNN | FS | S2V |
|---|---|---|---|---|---|---|---|---|
| RMSE | 0.049 | N/A | 0.080 | 0.042 | 0.044 | 0.039 | 0.043 | |
| PCC | 0.06 | 0.24 | 0.26 | 0.41 | 0.16 | 0.45 | 0.34 | |
| SCC | 0.07 | 0.23 | 0.11 | 0.23 | 0.13 | 0.20 | 0.26 | |
Note: PWM, position weight matrix; BaMM, Bayesian Markov Model motif discovery (Siebert and Söding, 2016); LM, the DREAM-winning HK→ME linear model (Annala et al.); SVR, the two-round weighted degree (WD) kernel-based SVR model (Wang et al.); DNN, the multi-layer neural network model; CNN, the convolutional neural network model (Alipanahi et al.); FS, the Fisher kernel-based SVR model (Jaakkola and Haussler, 1999); S2V, the proposed Sequence2Vec model. The best performance under each measure is in bold.