Valentina Rudenko, Eugene Korotkov.
Abstract
In this study, we used a mathematical method for the multiple alignment of highly divergent sequences (MAHDS) to create a database of potential promoter sequences (PPSs) in the Capsicum annuum genome. To search for PPSs, we calculated 20 statistically significant classes of sequences located from -499 to +100 nucleotides relative to the annotated genes. For each class, a position-weight matrix (PWM) was computed and then used to identify PPSs in the C. annuum genome. In total, 825,136 PPSs were detected, with a false positive rate of 0.13%. The PPSs obtained with the MAHDS method were tested using TSSFinder, which detects transcription start sites. The resulting PPS database provides the chromosomal coordinates of each PPS, its alignment with the PWM, and its level of statistical significance expressed as a normal distribution argument; it can be used in genetic engineering and biotechnology.
Keywords: Capsicum annuum; MAHDS; pepper genome; plant promoter database; potential promoter sequences; promoter classification; promoter prediction
Year: 2022 PMID: 35892972 PMCID: PMC9332048 DOI: 10.3390/biology11081117
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
Numbers of elements in the created classes of C. annuum promoters.
| Class № | Number of Elements | Class № | Number of Elements |
|---|---|---|---|
| 1 | 5402 | 11 | 203 |
| 2 | 1976 | 12 | 176 |
| 3 | 993 | 13 | 171 |
| 4 | 695 | 14 | 159 |
| 5 | 515 | 15 | 141 |
| 6 | 400 | 16 | 129 |
| 7 | 390 | 17 | 118 |
| 8 | 321 | 18 | 118 |
| 9 | 230 | 19 | 117 |
| 10 | 214 | 20 | 106 |
Figure 1. Part of the PWM’ for the first C. annuum promoter class. Elements with values <−4 and >4 are highlighted red and green, respectively.
Figure 2. Profile diagrams of C. annuum promoters of class 1 (a) and class 2 (b). Black and white circles indicate X(j) for promoter sequences and random sequences, respectively.
Figure 3. Cluster dendrogram of C. annuum promoter classes. Blue rectangles indicate clusters of promoter classes obtained at the association level of 0.8.
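
The PWM-based search summarized in the abstract can be illustrated with a simplified sketch. The Python below is not the MAHDS algorithm, whose details are not given in this record. It assumes an ungapped sliding-window score, a toy 4-column PWM, and a Z value estimated from shuffled copies of the sequence. All names (`TOY_PWM`, `pwm_score`, `best_window`, `z_value`) are hypothetical and chosen for illustration only.

```python
import random
import statistics

# Toy PWM (assumption): 4 columns favouring the motif "TATA".
# Each column maps a base to a weight: +2 for the preferred base, -1 otherwise.
TOY_PWM = [
    {"A": -1, "C": -1, "G": -1, "T": 2},
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": -1, "T": 2},
    {"A": 2, "C": -1, "G": -1, "T": -1},
]


def pwm_score(window, pwm):
    """Sum the PWM weight of each base at its column (ungapped)."""
    return sum(pwm[i][base] for i, base in enumerate(window))


def best_window(seq, pwm):
    """Return (score, start) for the best-scoring window of PWM width."""
    width = len(pwm)
    return max(
        (pwm_score(seq[start:start + width], pwm), start)
        for start in range(len(seq) - width + 1)
    )


def z_value(seq, pwm, n_shuffles=200, seed=0):
    """Estimate Z = (observed - mean_null) / sd_null from shuffled sequences.

    The shuffled copies keep the base composition of `seq`; this null model
    is an assumption of the sketch, not the procedure used in the study.
    """
    rng = random.Random(seed)
    observed, _ = best_window(seq, pwm)
    null_scores = []
    for _ in range(n_shuffles):
        chars = list(seq)
        rng.shuffle(chars)
        null_scores.append(best_window("".join(chars), pwm)[0])
    sd = statistics.stdev(null_scores)
    if sd == 0:
        return 0.0  # degenerate null distribution; no meaningful Z
    return (observed - statistics.mean(null_scores)) / sd


if __name__ == "__main__":
    sequence = "GCGCGCTATAGCGCGC"
    print(best_window(sequence, TOY_PWM))
    print(z_value(sequence, TOY_PWM))
```

Under these assumptions, a sequence would be reported as a candidate when its Z value reaches a chosen threshold, which is the role the Z levels play in the table that follows.
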
Number of PPSs in the C. annuum genome for different Z levels.
| Results | Z ≥ 5.0 | Z ≥ 5.5 | Z ≥ 6.0 | Z ≥ 6.5 | Z ≥ 7.0 |
|---|---|---|---|---|---|
| Real sequences | 1,679,534 | 1,242,664 | 825,136 | 491,647 | 263,864 |
| Random sequences | 20,490 | 5064 | 1068 | 221 | 44 |
| FDR | 1.21% | 0.41% | 0.13% | 0.04% | 0.02% |
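
The FDR row appears to be the ratio of hits in random sequences to hits in real sequences at the same Z level. The short Python check below assumes that definition (it is inferred from the numbers, not stated in this record) and uses hypothetical names (`TABLE`, `fdr_percent`). The computed ratios agree with the reported percentages to within about 0.01 percentage points.

```python
# Counts copied from the table above: Z level -> (real hits, random hits, reported FDR %).
TABLE = {
    5.0: (1_679_534, 20_490, 1.21),
    5.5: (1_242_664, 5_064, 0.41),
    6.0: (825_136, 1_068, 0.13),
    6.5: (491_647, 221, 0.04),
    7.0: (263_864, 44, 0.02),
}


def fdr_percent(random_hits, real_hits):
    """Assumed definition: random-sequence hits as a percentage of real hits."""
    return 100.0 * random_hits / real_hits


if __name__ == "__main__":
    for z, (real, rand, reported) in TABLE.items():
        print(f"Z >= {z}: computed {fdr_percent(rand, real):.3f}%, reported {reported}%")
```

The Z ≥ 6.0 column is the operating point quoted in the abstract: 825,136 PPSs with a false positive rate of 0.13%.
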
Figure 4. Histogram of Z for PPSs.
Matches of the detected PPSs with promoter regions of the annotated genes in the C. annuum genome (Z ≥ 6.0).
| Chromosome № | Annotated Genes (n) | ++ Matches (n) * | −− Matches (n) * | +− Matches (n) * | −+ Matches (n) * | Total Matches (n) | % of Matches |
|---|---|---|---|---|---|---|---|
| 1 | 2161 | 431 | 387 | 53 | 38 | 909 | 42% |
| 2 | 1759 | 294 | 420 | 56 | 25 | 795 | 45% |
| 3 | 1988 | 426 | 325 | 21 | 50 | 822 | 41% |
| 4 | 1269 | 220 | 286 | 31 | 29 | 566 | 45% |
| 5 | 1054 | 240 | 145 | 11 | 32 | 428 | 41% |
| 6 | 1555 | 425 | 185 | 14 | 110 | 734 | 47% |
| 7 | 1231 | 223 | 223 | 22 | 28 | 496 | 40% |
| 8 | 649 | 113 | 124 | 9 | 15 | 261 | 40% |
| 9 | 1104 | 187 | 241 | 58 | 14 | 500 | 45% |
| 10 | 1075 | 223 | 180 | 19 | 49 | 471 | 44% |
| 11 | 1166 | 187 | 198 | 22 | 30 | 437 | 37% |
| 12 | 1274 | 215 | 222 | 26 | 31 | 494 | 39% |
| Total | 16,285 | 3184 | 2936 | 342 | 451 | 6913 | 42% |
* + and − indicate forward and reverse strands, respectively; the first and second characters refer to the annotated promoters and PPSs, respectively.
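
The totals and percentages in this table can be rechecked from the per-strand counts. The Python sketch below assumes that "Total Matches" is the sum of the four strand combinations and that "% of Matches" is that sum divided by the number of annotated genes, rounded to a whole percent. The names `ROWS` and `summarize` are hypothetical.

```python
# Rows copied from the table above:
# (chromosome, annotated genes, ++, --, +-, -+, reported total, reported %)
ROWS = [
    (1, 2161, 431, 387, 53, 38, 909, 42),
    (2, 1759, 294, 420, 56, 25, 795, 45),
    (3, 1988, 426, 325, 21, 50, 822, 41),
    (4, 1269, 220, 286, 31, 29, 566, 45),
    (5, 1054, 240, 145, 11, 32, 428, 41),
    (6, 1555, 425, 185, 14, 110, 734, 47),
    (7, 1231, 223, 223, 22, 28, 496, 40),
    (8, 649, 113, 124, 9, 15, 261, 40),
    (9, 1104, 187, 241, 58, 14, 500, 45),
    (10, 1075, 223, 180, 19, 49, 471, 44),
    (11, 1166, 187, 198, 22, 30, 437, 37),
    (12, 1274, 215, 222, 26, 31, 494, 39),
]


def summarize(genes, pp, mm, pm, mp):
    """Return (total matches, percent of annotated genes matched)."""
    total = pp + mm + pm + mp
    return total, 100.0 * total / genes


if __name__ == "__main__":
    for chrom, genes, pp, mm, pm, mp, rep_total, rep_pct in ROWS:
        total, pct = summarize(genes, pp, mm, pm, mp)
        print(f"chr {chrom}: total {total} (reported {rep_total}), "
              f"{pct:.1f}% (reported {rep_pct}%)")
    # Genome-wide figures, summed over all chromosomes.
    all_genes = sum(row[1] for row in ROWS)
    all_matches = sum(sum(row[2:6]) for row in ROWS)
    same_strand = sum(row[2] + row[3] for row in ROWS)
    print(all_genes, all_matches, same_strand)
```

Summing the ++ and −− columns also shows how many of the matches fall on the same strand as the annotated promoter.
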
Numbers of TSSs identified in the PPSs of the C. annuum genome using TSSFinder with different training sets.
| Chromosome № | Training Set 1: All PPSs | Training Set 1: TATA-Containing PPSs | Training Set 2: All PPSs | Training Set 2: TATA-Containing PPSs | Training Set 3: All PPSs | Training Set 3: TATA-Containing PPSs |
|---|---|---|---|---|---|---|
| 1 | 46 | 9 | 8 | 0 | 16 | 0 |
| 2 | 38 | 3 | 7 | 0 | 15 | 0 |
| 3 | 60 | 13 | 17 | 0 | 12 | 0 |
| 4 | 37 | 8 | 6 | 0 | 15 | 0 |
| 5 | 61 | 13 | 13 | 0 | 21 | 0 |
| 6 | 72 | 18 | 11 | 0 | 16 | 0 |
| 7 | 49 | 10 | 13 | 0 | 19 | 0 |
| 8 | 60 | 10 | 10 | 0 | 18 | 0 |
| 9 | 41 | 4 | 7 | 0 | 21 | 0 |
| 10 | 65 | 9 | 10 | 0 | 25 | 0 |
| 11 | 52 | 7 | 13 | 0 | 15 | 0 |
| 12 | 48 | 11 | 10 | 0 | 20 | 0 |
| Total | 629 | 115 | 125 | 0 | 213 | 0 |