Miao Wang, Fuyi Li, Hao Wu, Quanzhong Liu, Shuqin Li.
Abstract
Promoters, short DNA sequences, play vital roles in initiating gene transcription. However, identifying promoters in a high-throughput manner with conventional experimental techniques remains a challenge. To this end, several computational predictors based on machine learning models have been developed, but their performance remains unsatisfactory. In this study, we proposed a novel two-layer predictor, called PredPromoter-MF(2L), based on multi-source feature fusion and ensemble learning. PredPromoter-MF(2L) was developed using both deep features learned by a pre-trained deep learning network model and sequence-derived features. Feature selection based on XGBoost was applied to reduce the dimensionality of the fused features, and a cascade deep forest model was trained on the selected feature subset for promoter prediction. The results of both fivefold cross-validation and an independent test demonstrated that PredPromoter-MF(2L) outperformed state-of-the-art methods.
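The fusion and selection steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which sequence-derived features were used, so k-mer frequencies are an assumption here; the deep feature values and the importance scores are hypothetical placeholders (in the paper, the importances would come from XGBoost), and the cascade deep forest classifier is not shown. All function names are illustrative.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Assumed sequence-derived features: normalized k-mer frequencies over ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {km: 0 for km in kmers}
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k].upper()
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]


def fuse(deep_features, sequence_features):
    """Multi-source feature fusion, sketched here as plain concatenation."""
    return list(deep_features) + list(sequence_features)


def select_top_k(importances, k):
    """Indices of the k most important features, returned in original order."""
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])


def apply_selection(vector, indices):
    """Keep only the selected feature columns of one fused vector."""
    return [vector[i] for i in indices]


# Hypothetical inputs: two "deep" features plus 1-mer frequencies.
deep = [0.1, 0.9]
fused = fuse(deep, kmer_frequencies("ACGTACGT", k=1))

# Placeholder importance scores, one per fused feature.
importances = [0.05, 0.40, 0.10, 0.30, 0.00, 0.15]
selected_indices = select_top_k(importances, 3)
reduced = apply_selection(fused, selected_indices)
```

In a fuller pipeline, the reduced vectors would be passed to the downstream classifier; ranking by importance and keeping a fixed number of features is only one way such a selection could be done.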
Keywords: Deep Forest; Deep learning; Feature fusion; Feature selection; Machine learning; Promoter
Year: 2022 PMID: 35488998 DOI: 10.1007/s12539-022-00520-4
Source DB: PubMed Journal: Interdiscip Sci ISSN: 1867-1462 Impact factor: 3.492