Ying Wang1, Qinke Peng2, Xu Mou1, Xinyuan Wang1, Haozhou Li1, Tian Han1, Zhao Sun1, Xiao Wang1.
Abstract
BACKGROUND: The region adjacent to a transcription start site (TSS), namely the promoter, is primarily involved in the initiation and regulation of DNA transcription. Accurate promoter identification is therefore critical for understanding the networks that control genomic regulation. A number of methods for identifying promoters have been proposed, but because promoters are highly heterogeneous, their results remain unsatisfactory. To extract more discriminative features and recognize promoters accurately, we developed the hybrid model for promoter identification (HMPI), a hybrid deep learning model that characterizes both the native sequences and the structural profiles of promoters at the same time. The HMPI combines the promoter sequence features network (PSFN), which characterizes native promoter sequences and derives sequence features, with the deep structural profiles network (DSPN), which is specifically designed to model promoters in terms of their structural profiles and to derive structural features.
Keywords: Convolutional neural networks (CNNs); Fully connected networks; Promoter identification; Structural profiles
Year: 2022 PMID: 35641900 PMCID: PMC9158169 DOI: 10.1186/s12859-022-04735-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
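The two input views described in the abstract (native sequence and structural profile) can be illustrated with a minimal Python sketch. It one-hot encodes a DNA sequence, a common input format for CNNs that is assumed here rather than taken from the paper, and maps overlapping dinucleotides to a per-position profile. The function names and the values in `TOY_PROPERTY` are hypothetical placeholders, not the structural properties used by the DSPN.

```python
from itertools import product

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows ordered A, C, G, T."""
    rows = []
    for base in seq.upper():
        # Symbols outside ACGT (for example N) become an all-zero row.
        rows.append([1 if base == b else 0 for b in BASES])
    return rows

# Hypothetical placeholder table: one made-up number per dinucleotide.
# A real structural-property table would be substituted here.
TOY_PROPERTY = {
    "".join(pair): round(0.1 * i, 1)
    for i, pair in enumerate(product(BASES, repeat=2))
}

def structural_profile(seq, table=TOY_PROPERTY):
    """Map each overlapping dinucleotide to its property value."""
    seq = seq.upper()
    # A sequence of length n yields n - 1 dinucleotide steps.
    return [table.get(seq[i:i + 2], 0.0) for i in range(len(seq) - 1)]

example = "ACGTTA"
encoded = one_hot(example)            # 6 rows of 4 values
profile = structural_profile(example)  # 5 values, one per dinucleotide step
```

In a hybrid design of this kind, `encoded` would feed the sequence branch and `profile` (one row per structural property) would feed the structural branch; how the HMPI fuses the two branches is not specified in the abstract.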
Datasets and the details of eukaryotic promoters
| Organism | Data sources | Dataset type | Number of sequences | Location/length |
|---|---|---|---|---|
| Human | EPD | Promoters | 29,597 | [−200, +50] bp |
| | UCSC | Non-promoters | 50,000 | 251 bp |
| Plants | PlantProm DB | Promoters | 8,272 | [−200, +50] bp |
| | TAIR | Non-promoters | 12,834 | 251 bp |
Datasets and the details of prokaryotic promoters
| Organism | Data sources | Subtype/type | Number of sequences | Location/length |
|---|---|---|---|---|
| | Regulon DB | σ24 | 484 | [−60, +20] bp |
| | | σ28 | 134 | [−60, +20] bp |
| | | σ32 | 291 | [−60, +20] bp |
| | | σ38 | 163 | [−60, +20] bp |
| | | σ54 | 94 | [−60, +20] bp |
| | | σ70 | 1694 | [−60, +20] bp |
| | | Non-promoters | 2860 | 81 bp |
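The window lengths in the two dataset tables follow from counting the TSS position itself: [−200, +50] covers 200 + 1 + 50 = 251 bp, and [−60, +20] covers 60 + 1 + 20 = 81 bp, matching the non-promoter lengths. A minimal sketch of such a slice is given below. The function names, the 0-based TSS index, and the forward-strand assumption are illustrative, and the convention that the TSS occupies position 0 of the window is inferred from the lengths in the tables.

```python
def window_length(upstream, downstream):
    """Length of [-upstream, +downstream], counting the TSS position itself."""
    return upstream + 1 + downstream

def extract_window(genome, tss_index, upstream, downstream):
    """Slice the window around a 0-based TSS index on the forward strand.

    Returns None when the window would run past either end of the sequence.
    """
    start = tss_index - upstream
    end = tss_index + downstream + 1  # exclusive end of the slice
    if start < 0 or end > len(genome):
        return None
    return genome[start:end]

toy_genome = "ACGT" * 100  # 400 bp of placeholder sequence
eukaryotic = extract_window(toy_genome, 250, 200, 50)   # [-200, +50]
prokaryotic = extract_window(toy_genome, 250, 60, 20)   # [-60, +20]
```

Returning `None` for truncated windows keeps every extracted sequence at a fixed length, which fixed-size network inputs require.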
Detailed outcomes of the four methods mentioned above
| Organism | Method | | | | |
|---|---|---|---|---|---|
| Human | PSFNcce | 92.32 | 90.26 | 0.7930 | |
| | PSFN | 85.16 | | | |
| | ResNet | 82.13 | 90.60 | 87.45 | 0.7303 |
| | GoogLeNet | 83.75 | 86.84 | 85.69 | 0.6982 |
| Plants | PSFNcce | 94.62 | 91.99 | 0.8314 | |
| | PSFN | 86.96 | | | |
| | ResNet | 80.92 | 89.56 | 86.17 | 0.7086 |
| | GoogLeNet | 87.92 | 92.83 | 90.90 | 0.8089 |
The best results of each measure are shown in bold
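The results tables report four measures per method. Assuming these are the sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthews correlation coefficient (MCC) commonly reported for binary promoter classifiers, they can be computed from confusion-matrix counts as sketched below. The function name and the example counts are illustrative and are not values from the paper.

```python
import math

def binary_measures(tp, fn, tn, fp):
    """Return Sn, Sp, Acc and MCC from confusion-matrix counts.

    Assumes at least one positive and one negative example, so that
    the Sn and Sp denominators are non-zero.
    """
    sn = tp / (tp + fn)                    # fraction of promoters recovered
    sp = tn / (tn + fp)                    # fraction of non-promoters rejected
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall fraction correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}

# Made-up counts: 100 promoters and 100 non-promoters.
measures = binary_measures(tp=80, fn=20, tn=90, fp=10)
```

Because Acc is a class-size-weighted average of Sn and Sp, it always lies between the two, whereas MCC also penalizes imbalance between the two error types.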
Fig. 1 (a) Features derived from the training sets of plants through the PSFNcce. (b) Features derived from the training sets of plants through the PSFN. (c) Features derived from the test sets of plants through the PSFNcce. (d) Features derived from the test sets of plants through the PSFN.
Detailed outcomes of the four methods mentioned above with the input of matrices of structural profile properties
| Organism | Method | | | | |
|---|---|---|---|---|---|
| Human | DSPN | 80.60 | | | |
| | CNNs | 86.76 | 85.22 | 0.6874 | |
| | ResNet | 80.81 | 86.84 | 84.59 | 0.6725 |
| | GoogLeNet | 78.31 | 89.56 | 85.37 | 0.6847 |
| Plants | DSPN | 75.24 | 90.33 | | |
| | CNNs | 85.50 | 83.23 | 0.6607 | |
| | ResNet | 71.37 | 90.80 | 83.18 | 0.6428 |
| | GoogLeNet | 67.51 | 83.42 | 0.6500 | |
The best results of each measure are shown in bold
The comparison of the performance of the HMPI and other methods mentioned above at identifying eukaryotic promoters
| Organism | Results source | | | | Method |
|---|---|---|---|---|---|
| Human | This article | | | | HMPI |
| | | 82.13 | 90.60 | 0.7303 | ResNet |
| | | 83.75 | 86.84 | 0.6982 | GoogLeNet |
| | [ | 85.19 | 81.91 | | SD-MSAEs |
| | [ | 78.45 | 0.6413 | | SCS |
| | [ | 79.67 | 78.90 | * | DCDE-MSVM |
| Plants | This article | 90.34 | | | HMPI |
| | | 80.92 | 89.56 | 0.7086 | ResNet |
| | | 87.92 | 92.83 | 0.8089 | GoogLeNet |
| | [ | 89 | 86 | * | PromoBot |
| | [ | 86 | 0.82 | | TSSPlant |
The best results of each measure are shown in bold
*The represented measurements are not calculated
The comparison of performance of the HMPI and other methods mentioned above at identifying prokaryotic promoters
| Organism | Method | | | | |
|---|---|---|---|---|---|
| | HMPI | | | | |
| | Stability | 76.61 | 79.48 | 78.04 | 0.5615 |
| | iPro54 | 77.76 | 83.15 | 80.45 | 0.6100 |
| | iPromoter-2L | 79.20 | 83.15 | 80.45 | 0.6343 |
| | MULTiPly | 87.27 | 86.57 | 86.92 | 0.7385 |
The best results of each measure are shown in bold
Fig. 2 (a) The training set features derived by the second layer of the PSFN within the HMPI. (b) The training set features derived by the second layer of the PSFN within the HMPIat.
Comparison of the performance of the HMPI, HMPIat, HMPIlsr, and iPromoter-2L on identifying subtypes of Escherichia coli K-12
| Organism | Subtype | iPromoter-2L | HMPI | HMPIat | HMPIlsr | ||||
|---|---|---|---|---|---|---|---|---|---|
| | σ24 | 93.50 | 0.7338 | 95.45 | 0.8443 | 94.76 | 0.8138 | | |
| | σ28 | 96.82 | 0.5708 | 97.20 | 0.6547 | 97.20 | 0.6777 | | |
| | σ32 | 94.41 | 0.6524 | 93.71 | 0.6343 | 94.76 | 0.6855 | | |
| | σ38 | 0.2962 | 94.41 | 0.2644 | 94.41 | 94.06 | 0.2219 | | |
| | σ54 | 94.04 | 96.50 | 0.2616 | 0.3196 | 96.15 | 0.2531 | | |
| | σ70 | 80.66 | 0.6056 | 85.66 | 0.7037 | 86.01 | 0.7117 | | |
The best results are shown in bold
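The subtype comparison lists separate scores for each sigma subtype. One common way to obtain per-class scores from a multi-class predictor is one-vs-rest counting, sketched below. Whether the paper evaluated subtypes this way is an assumption, and the function name and labels in the example are made up for illustration.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, FN, TN and FP when one subtype is treated as the positive class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # subtype correctly identified
            else:
                fn += 1  # subtype missed
        else:
            if pred == positive:
                fp += 1  # another subtype mislabeled as this one
            else:
                tn += 1  # another subtype correctly not assigned to this one
    return {"tp": tp, "fn": fn, "tn": tn, "fp": fp}

# Made-up labels for six sequences.
y_true = ["sigma70", "sigma70", "sigma24", "sigma54", "sigma24", "sigma70"]
y_pred = ["sigma70", "sigma24", "sigma24", "sigma70", "sigma24", "sigma70"]

counts = {
    subtype: one_vs_rest_counts(y_true, y_pred, subtype)
    for subtype in sorted(set(y_true))
}
```

With rare subtypes, the true-negative count dominates, so per-subtype accuracy can stay high even when the correlation-style score is low.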
Fig. 3 The framework of the HMPI (hybrid model for promoter identification)
Fig. 4 The framework of the PSFN (promoter sequence features network)
Fig. 5 The framework of the DSPN (deep structural profiles network)