Duguma Dibbisa1, Gobena Wagari2.
Abstract
Microbial genes and their products are diverse and beneficial for heavy metal bioremediation of contaminated sites. Screening of genes and gene products plays a significant role in the detoxification of pollutants. Understanding the promoter region and its regulatory elements is vital to understanding microbial genes. To the best of our knowledge, no in silico study has been reported so far on the mer gene families used for heavy metal bioremediation. In the current study, motifs were densely distributed upstream of the TSSs (transcription start sites) between +1 and -350 bp and sparsely distributed beyond -350 bp. MEME identified the best common candidate motif for TF (transcription factor) binding; it had the lowest E-value (7.2e-033) and was therefore the most statistically significant candidate motif. The EXPREG output for the 11 TFs with varying functions, such as activation, repression, transcription, and dual roles, was thoroughly examined. The data revealed that transcriptional gene regulation in terms of activation and repression was observed at 36.4% and 54.56%, respectively, which shows that most TFs are involved in transcriptional repression rather than activation. Likewise, the EXPREG output was analyzed for transcriptional conformational modes, such as monomers, dimers, tetramers, and others. The data indicated that the most common transcriptional conformation mode was dual, which accounts for 96%. CpG island analysis using online and offline tools revealed that the gene body had fewer CpG islands than the promoter regions. Understanding the common candidate motifs, transcription factors, and regulatory elements of the mer operon gene cluster using a machine learning approach could help us better understand gene expression patterns in heavy metal bioremediation.
Year: 2022 PMID: 35991673 PMCID: PMC9391164 DOI: 10.1155/2022/6185615
Source DB: PubMed Journal: Int J Genomics ISSN: 2314-436X Impact factor: 2.758
Mercury bioremediation genes and their general function and genome coordinates.
| SN | Gene ID | Gene symbol | Genome coordinate | Gene function |
|---|---|---|---|---|
| 1. | 69751970 | | c33607-31961 | Hg2+ reductase applications |
| 2. | 66762507 | | c3805546-3806184 | Organomercurial lyase |
| 3. | 66762509 | | c3808349-3807915 | Organomercurial transporter |
| 4. | 69747981 | | 188629-188994 | Mercury resistance coregulator |
| 5. | 69751968 | | c31582-31346 | Broad-range mercury transporter |
| 6. | 69751971 | | c33849-33604 | Mercury resistance protein |
| 7. | 69751974 | | 34565-34999 | Hg2+ responsive transcriptional regulator |
| 8. | 69751972 | | c34127-33852 | Mercury resistance system periplasmic binding protein |
| 9. | 69747978 | | 186216-186566 | Mercuric ion transporter |
| 10. | 46432416 | | 5771173-5771826 | Phenyl mercury resistance protein |
Genes extracted from NCBI.
TSS number, its promoter predictive score values, and distance from 5′UTR region of the corresponding gene.
| SN | Gene ID | Gene symbol | No. of predicted promoters | No. of TSSs identified | Predictive scores (cutoff 0.80) | 5′UTR region size (bp) | Strand orientation |
|---|---|---|---|---|---|---|---|
| 1. | 69751970 | | 2 | 2 | 0.97, 0.91 | -929 | -ve |
| 2. | 66762507 | | 2 | 2 | 0.85, 0.82 | -1951 | -ve |
| 3. | 66762509 | | 1 | 1 | 0.85 | -686 | -ve |
| 4. | 69747981 | | 2 | 2 | 0.93, 0.89 | 2921 | +ve |
| 5. | 69751968 | | 2 | 2 | 0.86, 0.94 | -1361 | -ve |
| 6. | 69751971 | | 2 | 2 | 0.97, 0.91 | -687 | -ve |
| 7. | 69751974 | | 4 | 4 | 0.97, 0.89, 0.89, 0.86 | 865 | +ve |
| 8. | 69751972 | | 2 | 2 | 0.97, 0.91 | -409 | -ve |
| 9. | 69747978 | | 3 | 3 | 0.92, 0.93, 0.89 | 663 | +ve |
| 10. | 46432416 | | 3 | 3 | 0.89, 0.94, 0.85 | 2217 | +ve |
NNPP predictions are considered reliable at a cutoff value of 0.8 for prokaryotic organisms [9].
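The cutoff rule in the footnote can be illustrated with a minimal sketch. It assumes each prediction is a (position, score) pair; the function name `filter_predictions`, the positions, and the sub-cutoff scores are hypothetical and are not taken from the NNPP output format.

```python
from typing import List, Tuple

# Cutoff quoted in the table footnote; everything else is illustrative.
SCORE_CUTOFF = 0.80


def filter_predictions(
    predictions: List[Tuple[int, float]], cutoff: float = SCORE_CUTOFF
) -> List[Tuple[int, float]]:
    """Keep (tss_position, score) pairs whose score meets the cutoff."""
    return [(pos, score) for pos, score in predictions if score >= cutoff]


# Hypothetical raw predictions: positions and low scores are made up.
raw = [(-35, 0.97), (-120, 0.62), (-410, 0.91), (-800, 0.45)]
kept = filter_predictions(raw)
print(len(kept), [score for _, score in kept])
```

Under this assumption, the number of retained pairs corresponds to the "No. of TSSs identified" column for a gene.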
List of predicted motifs and the number and proportion of promoter-containing motifs.
| SN | Predicted and discovered candidate motifs | No. of promoters containing the motif (%) | E-value (a) | Motif width | No. of binding sites |
|---|---|---|---|---|---|
| 1. | Motif_1 | 10 (100%) | 7.3 | 50 | 10 |
| 2. | Motif_2 | 7 (70%) | 1.1 | 50 | 7 |
| 3. | Motif_3 | 7 (70%) | 2.0 | 50 | 7 |
| 4. | Motif_4 | 9 (90%) | 1.4 | 50 | 9 |
| 5. | Motif_5 | 7 (70%) | 7.2 | 41 | 7 |
(a) Probability of finding an equally well-conserved motif in random sequences.
List of matching candidates for EXPREG transcription factor (TF).
| SN | Candidate TF | Strains showing motif sequence binding | GC (%) | Regulatory role: Activation (%) | Regulatory role: Repression (%) | Regulatory role: Dual (%) | Regulatory role: Not specified (%) | Statistical significance |
|---|---|---|---|---|---|---|---|---|
| 1. | | | 46.88 | 0 | 100 | 0 | 0 | 2.11 |
| 2. | | | 46.67 | 90 | 10 | 0 | 0 | 2.29 |
| 3. | | | 59.33 | 7 | 0 | 0 | 92 | 3.43 |
| 4. | | | 20.41 | 0 | 0 | 0 | 100 | 3.99 |
| 5. | | | 40.25 | 0 | 13 | 0 | 85 | 4.88 |
| 6. | | | 52.83 | 0 | 0 | 0 | 100 | 5.95 |
| 7. | | | 47.23 | 0 | 0 | 0 | 100 | 6.75 |
| 8. | | | 26.32 | 9 | 36 | 0 | 53 | 6.87 |
| 9. | | | 46.55 | 0 | 100 | 0 | 0 | 7.38 |
| 10. | | | 40.00 | 1 | 1 | 0 | 97 | 7.91 |
| 11. | | | 28.95 | 0 | 0 | 20 | 80 | 9.29 |
CRP: cAMP receptor protein; PhhR: phenylalanine hydroxylase regulator; VqsM: virulence and QS modulator; Fur: Ferric uptake regulation protein; CcpA: Catabolite control protein A; MatP: membrane-associated transfer protein; LrP: leucine-responsive regulatory protein.
Figure 1: Sequence logos for mercuric bioremediation identified common motifs. The analysis was done by the MEME suite.
Figure 2: The relative locations of potential motifs in the promoter region relative to TSSs are illustrated in block diagrams. The nucleotide locations in the promoter region for mer genes encoding for mercury bioremediation are indicated at the bottom of the graph, ranging from +1 (start of TSSs) to upstream 1 kb (-1 kb) from MEME suite output.
List of matching candidates for EXPREG transcription conformation factor (TCF).
| SN | Candidate TF | Strains showing motif sequence binding | GC (%) | TF conformation mode: Monomer (%) | TF conformation mode: Dimer (%) | TF conformation mode: Tetramer (%) | TF conformation mode: Other (%) | Not specified (%) | Statistical significance |
|---|---|---|---|---|---|---|---|---|---|
| 1. | | | 46.88 | 0 | 0 | 0 | 0 | 100 | 2.11 |
| 2. | | | 46.67 | 0 | 100 | 0 | 0 | 0 | 2.29 |
| 3. | | | 59.33 | 0 | 0 | 0 | 0 | 100 | 3.43 |
| 4. | | | 20.41 | 0 | 0 | 0 | 0 | 100 | 3.99 |
| 5. | | | 40.25 | 0 | 100 | 0 | 0 | 0 | 4.88 |
| 6. | | | 52.83 | 0 | 100 | 0 | 0 | 0 | 5.95 |
| 7. | | | 47.23 | 0 | 100 | 0 | 0 | 0 | 6.75 |
| 8. | | | 26.32 | 0 | 0 | 0 | 0 | 100 | 6.87 |
| 9. | | | 46.55 | 0 | 0 | 0 | 0 | 100 | 7.38 |
| 10. | | | 40.00 | 0 | 96 | 0 | 3 | | 7.91 |
| 11. | | | 28.95 | 0 | 0 | 0 | 0 | 100 | 9.29 |
CRP: cAMP receptor protein; PhhR: phenylalanine hydroxylase regulator; VqsM: virulence and QS modulator; Fur: Ferric uptake regulation protein; CcpA: Catabolite control protein A; MatP: membrane-associated transfer protein; LrP: leucine-responsive regulatory protein.
CpG islands identified at both promoter and gene body regions.
| SN | Gene ID | Gene body: Start | Gene body: End | Gene body: Length | Gene body: No. of CpG found | Gene body: GC% | Promoter: Start | Promoter: End | Promoter: Length | Promoter: No. of CpG found | Promoter: GC% |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. | 46432416 | 8 | 631 | 624 | 1 | 57 | 11 | 1970 | 1960 | 1 | 58 |
| 2. | 66762507 | 1 | 631 | 631 | 1 | 50 | 1 | 1987 | 1987 | 1 | 62 |
| 3. | 66762509 | – | – | – | – | – | – | – | – | – | – |
| 4. | 69747978 | – | – | – | – | – | 1 | 1978 | 1978 | 1 | 54 |
| 5. | 69747981 | – | – | – | – | – | 1 | 1990 | 1990 | 1 | 64 |
| 6. | 69751968 | – | – | – | – | – | – | – | – | – | – |
| 7. | 69751970 | 1 | 1639 | 1639 | 1 | 53 | 6 | 1997 | 1992 | 1 | 60 |
| 8. | 69751971 | – | – | – | – | – | 1 | 1964 | 1965 | 1 | 57 |
| 9. | 69751972 | – | – | – | – | – | 1 | 1996 | 1996 | 1 | 70 |
| 10. | 69751974 | – | – | – | – | – | – | – | – | – | – |
Database of CpG islands and analytical tools [16].
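For orientation, the two quantities usually reported for a candidate CpG island (GC content and the observed/expected CpG ratio) can be computed as in the sketch below. The thresholds used here (length of at least 200 bp, GC above 50%, observed/expected above 0.6) are the commonly cited Gardiner-Garden and Frommer style criteria and are an assumption; the source does not state which criteria its tools applied. All function names are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """GC content of a sequence, as a percentage."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(
    seq: str, min_len: int = 200, min_gc: float = 50.0, min_oe: float = 0.6
) -> bool:
    """Apply the assumed thresholds to a whole sequence (no sliding window)."""
    return (
        len(seq) >= min_len
        and gc_percent(seq) > min_gc
        and cpg_obs_exp(seq) > min_oe
    )


# Toy sequences, not taken from the study.
print(looks_like_cpg_island("CG" * 100))
print(looks_like_cpg_island("AT" * 100))
```

Dedicated tools typically scan with a sliding window rather than testing a whole sequence at once, so this is only a simplified check.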
MspI cutting sites and fragment sizes in promoter regions.
| Region | Corresponding sequences | No. of MspI cutting sites (nucleotide positions) | Fragment sizes between 40 and 220 bp |
|---|---|---|---|
| Promoter region | Prom_69751970 | 12 (102, 121, 485, 528, 935, 941, 969, 991, 1324, 1346, 1565, 1877) | 43 |
| | Prom_66762507 | 15 (238, 382, 392, 412, 529, 661, 1021, 1027, 1172, 1230, 1384, 1504, 1579, 1614, 1818) | 144, 117, 132, 145, 58, 154, 120, 75, 204 |
| | Prom_66762509 | 11 (295, 497, 1059, 1165, 1221, 1542, 1558, 1577, 1758, 1797, 1960) | 202, 106, 56, 181, 163 |
| | Prom_69747981 | 19 (35, 317, 433, 445, 470, 589, 768, 901, 1027, 1045, 1167, 1225, 1379, 1499, 1547, 1574, 1609, 1813, 1976) | 116, 119, 179, 133, 126, 122, 58, 154, 120, 48, 204, 163 |
| | Prom_69751968 | 23 (52, 173, 226, 407, 655, 666, 684, 806, 854, 864, 976, 1018, 1138, 1186, 1213, 1248, 1452, 1659, 1676, 1729, 1744, 1848, 1975) | 121, 53, 181, 122, 48, 112, 42, 120, 48, 207, 53, 104, 127 |
| | Prom_69751971 | 14 (6, 160, 185, 344, 363, 727, 770, 1177, 1183, 1211, 1233, 1566, 1588, 1807) | 154, 159, 219 |
| | Prom_69751974 | 15 (40, 152, 162, 210, 332, 350, 361, 609, 790, 843, 964, 1164, 1476, 1695, 1717) | 112, 48, 122, 181, 53, 121, 200, 212, 219 |
| | Prom_69751972 | 15 (63, 271, 284, 438, 463, 622, 641, 1005, 1048, 1455, 1461, 1489, 1511, 1844, 1866) | 208, 154, 159, 43 |
| | Prom_69747978 | 6 (104, 306, 666, 700, 715, 1877) | 104 |
| | Prom_46432416 | 12 (129, 347, 562, 1179, 1361, 1536, 1556, 1591, 1763, 1795, 1836, 1958) | 218, 215, 182, 175, 172, 41, 122 |
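As a rough illustration of how cutting-site positions like those above can be obtained, the sketch below scans a sequence for the MspI recognition sequence CCGG. The position convention (1-based index of the first base of each site) and the function name `mspi_sites` are assumptions; the source does not state which convention or tool produced its coordinates.

```python
import re
from typing import List

MSPI_SITE = "CCGG"  # MspI recognition sequence (cleaves C^CGG)


def mspi_sites(seq: str) -> List[int]:
    """Return assumed 1-based start positions of every CCGG occurrence."""
    return [m.start() + 1 for m in re.finditer(MSPI_SITE, seq.upper())]


# Toy sequence, not taken from the study.
toy = "AACCGGTTTTCCGGA"
print(mspi_sites(toy))  # → [3, 11]
```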
MspI cutting sites and fragment sizes in gene body regions.
| Region | Corresponding sequences | No. of MspI cutting sites (nucleotide positions) | Fragment sizes between 40 and 220 bp |
|---|---|---|---|
| Gene bodies | ORF 69751970 | 17 (77, 198, 251, 432, 680, 691, 709, 831, 879, 889, 1001, 1043, 1163, 1211, 1238, 1273, 1477) | 121, 53, 181, 122, 48, 112, 42, 120, 48, 204 |
| | ORF 66762507 | 3 (54, 279, 319) | 40 |
| | ORF 66762509 | 1 (403) | − |
| | ORF 69747981 | 5 (21, 38, 91, 210, 337) | 53, 119, 127 |
| | ORF 69751968 | 4 (47, 119, 131, 179) | 72, 48 |
| | ORF 69751971 | 1 (119) | − |
| | ORF 69751974 | 4 (50, 72, 100, 106) | − |
| | ORF 69751972 | 1 (85) | − |
| | ORF 69747978 | 3 (17, 210, 232) | 193 |
| | ORF 46432416 | 1 (636) | − |
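The fragment-size column can be related to the cutting positions with a short sketch. It assumes that a fragment size is the distance between two consecutive cutting sites and that the 40 to 220 bp bounds are inclusive; both are assumptions, and the function names are hypothetical. With these assumptions the sketch reproduces the values listed for ORF 69751968, although other rows may follow a different convention.

```python
from typing import Iterable, List


def fragment_sizes(cut_positions: Iterable[int]) -> List[int]:
    """Distances between consecutive cutting sites (assumed definition)."""
    ordered = sorted(cut_positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def size_selected(
    cut_positions: Iterable[int], low: int = 40, high: int = 220
) -> List[int]:
    """Keep fragments whose size lies within [low, high] bp, inclusive."""
    return [s for s in fragment_sizes(cut_positions) if low <= s <= high]


# Cutting positions listed for ORF 69751968 in the table above.
print(size_selected([47, 119, 131, 179]))  # → [72, 48]
```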