Emanuel Flores-Bautista1,2, Carenne Ludeña Cronick3, Anny Rodriguez Fersaca4, Mario Alberto Martinez-Nuñez2, Ernesto Perez-Rueda5,6.
Abstract
The repertoire of 304 DNA-binding transcription factors (TFs) in Escherichia coli K-12 has been described recently, with 196 TFs experimentally characterized and 108 proteins predicted by sequence comparisons. Based on 303 expression profile patterns retrieved from the Colombos database, 12 clusters were identified, including hypothetical and experimentally characterized TFs, using a spectral clustering algorithm based on a 3NN graph built using 14 principal components that represent 65% of the variance of the expression data. In a subsequent step, clusters were characterized in terms of their associated overrepresented functions, based on KEGG, Supfam annotations and Pfam assignments, among other functional categories, using an enrichment test, reinforcing the notion that the identified clusters group functionally similar TFs. Based on these data, we identified 12 clusters in which hypothetical and known TFs share similar regulatory and physiological functions, such as module associations of toxin-antitoxin (TA) systems with DNA repair mechanisms, amino acid biosynthesis, and carbon metabolism/transport, among others. This analysis has increased our knowledge about gene regulation in E. coli K-12 and can be further expanded to other organisms.
Keywords: Escherichia coli; Gene expression; Hypothetical proteins; Spectral clustering; Transcription factor
Year: 2018 PMID: 30050664 PMCID: PMC6055005 DOI: 10.1016/j.csbj.2018.03.003
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
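The abstract mentions a 3NN graph built on principal-component coordinates as the input to spectral clustering. As a rough illustration only, the sketch below builds a k-nearest-neighbour graph in plain Python. The function name `knn_graph`, the toy points, the use of Euclidean distance, and the symmetrization rule (an edge exists if either point lists the other) are all assumptions for illustration; the source does not say how the authors constructed or symmetrized their graph.

```python
import math


def knn_graph(points, k=3):
    """Return a symmetric k-nearest-neighbour adjacency as a dict of sets."""
    n = len(points)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        # Rank every other point by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            # Assumed symmetrization: keep the edge in both directions.
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


# Hypothetical toy coordinates (2 dimensions rather than 14 components).
toy_points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
]
graph = knn_graph(toy_points, k=3)
```

In this toy example the two well-separated groups of four points end up with no edges between them, which is the kind of structure a spectral method would then exploit.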
Fig. 1. a) Cumulative proportion of explained variance as a function of the number of principal components. The dashed line corresponds to 16 components accounting for 68% of total variance. The X-axis shows the number of dimensions and the Y-axis the cumulative variance ratio. b) Heatmap of the absolute value of correlations of projections of the original attributes over the first 14 principal components, accounting for over 65% of the overall variance. Red corresponds to low correlations and white to high correlations.
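The cumulative explained-variance curve in panel a) can be sketched as follows, assuming the per-component variances (eigenvalues of the covariance matrix) are already available. The function names and the toy eigenvalues are hypothetical and do not come from the paper's data.

```python
def cumulative_variance_ratio(eigenvalues):
    """Cumulative share of total variance, largest component first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running, ratios = 0.0, []
    for value in ordered:
        running += value
        ratios.append(running / total)
    return ratios


def components_needed(eigenvalues, target):
    """Smallest number of leading components reaching the target ratio."""
    ratios = cumulative_variance_ratio(eigenvalues)
    for count, ratio in enumerate(ratios, start=1):
        if ratio >= target:
            return count
    return len(eigenvalues)


# Made-up eigenvalues, used only to exercise the functions.
toy_eigenvalues = [4.0, 3.0, 2.0, 1.0]
ratios = cumulative_variance_ratio(toy_eigenvalues)
n_components = components_needed(toy_eigenvalues, target=0.65)
```

With real data, `target=0.65` would correspond to the 65% figure quoted for the first 14 components.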
Clusters of TFs identified by similar profile patterns of expression.
| Cluster | Experimentally characterized (strong evidence) | Experimentally characterized (weak evidence) | Hypothetical TFs | N | WSS |
|---|---|---|---|---|---|
| C1 | AraC, CsgD, FeaR, GadX, GalS, LsrR, MelR, Mlc, RclR (YkgD) | AbgR, CdaR, FucR, MhpR, SrlR (GutR), HcaR, MtlR, PrpR, RhaR, TdcA, YiaJ, YeiL | YahB, YbiI, YneJ, YgeV, YgfI, YihL, ChpS, SfsA (MalQ), DmlR (YeaT), LgoR (yjjM), YebK (HexR) | 32 | 11.81 |
| C2 | GalR, LrhA, NarP, RelE, RelB, PuuR, MqsA, LexA, McbR (YncC), SoxR, YefM | GlcC, Hha, MalI, MntR, NsrR, UidR | YdjF, YeeY, YgjM (HigA), YgiT, YjgJ | 22 | 5.43 |
| C3 | AlaS, ArgR, BaeR, CpxR, CysB, FruR, NhaR, NikR, NrdR, PepA, PurR, SdiA, TreR, TrpR, TyrR, UxuR, YehT, YjiE (HypT), YqhC | AllR, AppY, BglJ, DeoR, EbgR, FabR, IdnR, LacI, UhpA | YcaN, YdiA (PpsR), YfeR, YgbI, YggD (FumE), YhaJ, YidZ | 35 | 10.88 |
| C4 | Ada, CadC, ChbR, GlpR, DpiA, IhfB, HipB, GlrR (YfhA), PhoB, RutR, SlyA, TtdR, YdeO | Crl, CreB, DhaR, IlvY, PerR, SfsB | DicC (regulated by DicA), (regulated by RcsB-BglJ), YbcL, YbcM, YbeF, YbhD, YdaS, YddM, YnfL, YdhB, YfhH, YqeH, YhjB, YjhI, YjjQ, YjjJ | 34 | 9.13 |
| C5 | CaiF, Cbl, CytR, FlhDC, Hns, HupA, HupB, IhfA, Lrp, NarL, GlnG, StpA | CsiR, EutR, MalT | OgrK, YpdC, YphH | 19 | 8.52 |
| C6 | CspA, MarA, UlaR, ComR (YcfQ) | MarR, NemR (YdhM) | CspH, CspG, YdfH, YbaO | 10 | 2.13 |
| C7 | AidB, GadE, PutA | BolA, LldR | CspD, DctR, YhjC, YiaG, YjdC | 10 | 1.59 |
| C8 | AgaR, DicA, ArgP, EvgA, ExuR, FadR, FNR, Fur, IscR, GadW, NagC, NanR, PdhR, Rob, RstA, UvrY, YcgE (BluR), MlrA | GcvA, GntR, KdgR | YbaQ, YciT (DeoT), YfgA (RodZ) | 25 | 3.14 |
| C9 | AdiY, ArcA, CueR, HdfR, LeuO, Nac, OmpR, OxyR, PhoP, PgrR (YcjZ), PspF, RcsA, RcsB, SoxS, YqjI, YeaM (NimR) | AsnC, BetI, DsdC, LysR, PspC, PspF, YfaX (RhmR) | YafC, YbiH, YeiE, YieP, YtfH, YijO | 28 | 7.15 |
| C10 | AtoC, Crp, CusR, DcuR, FhlA, Fis, KdpE, MprA, SgrR, RhaS, XylR | AllS, AlpA, ArsR, EnvR, EnvY, HycA, NadR, GutM, HyfR, PhnF, RbsR, RpiR (AlsR), RtcR, TdcR | YbdO, CspE, YmfL, YdiP, YqeI, YgeH, YidL, YidP, FrvR, FimZ (YbcA), SgcR (YjhJ), SlmA (YicB), DgoR (YidW) | 38 | 9.93 |
| C11 | AcrR, IclR, ModE, RcnR, TorR, YdcN, YedW, YegW, RcdA (YbjK), YahA (PdeL) | Cnu | YagI, YbfE, CspI, CspB, CspF | 16 | 3.44 |
| C12 | BirA, BasR, CynR, MetR, MngR, MurR (YfeT), PaaX | AscG, FrlR (yhfR), NorR, XapR, MazE | YafN, YcjW, YdcQ, YdcR, YeiI, YfiE, YiaU, YihW, YjhU, YjiR | 22 | 3.4 |
Columns are as follows: cluster number; experimentally characterized TFs (strong and weak evidence); hypothetical TFs; number of TFs per cluster (N); and individual within-cluster sum of squares (WSS).
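For readers unfamiliar with the WSS column, the quantity is conventionally the sum of squared Euclidean distances from each cluster member to the cluster centroid. The sketch below uses that conventional definition with hypothetical toy coordinates; the source does not state the exact space or scaling in which the reported values were computed, so this is an assumption.

```python
def within_sum_of_squares(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    n = len(points)
    dims = len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dims)]
    return sum(
        (p[d] - centroid[d]) ** 2 for p in points for d in range(dims)
    )


# Hypothetical cluster of four members in a 2-dimensional space.
toy_cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
wss = within_sum_of_squares(toy_cluster)
```

A lower WSS indicates a tighter cluster, although values are only comparable between clusters once their sizes are taken into account.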
Functional characterization of clusters.
| Cluster no. | Number of target genes | Regulatory mechanism | KEGG | Supfam | PFAM |
|---|---|---|---|---|---|
| C1 | 321 | – | Carbohydrate metabolism; amino acid metabolism; metabolism of other amino acids | Metabolism - amino acids; carbohydrate; energy | FGGY_C; MR_MLE_C; Rieske; FGGY_N; Asp_Glu_race; BPD_transp_2; Aldolase_II; AraC_binding |
| C2 | 272 | Repressor | Replication and repair; Metabolism of other amino acids; drug resistance: antimicrobial | Information - DNA replication-repair; metabolism - other enzymes, redox; Processes_EC - toxins-defense; Processes_IC - proteases | Mur_ligase_C; Mur_ligase_M; HOK_GEF; IMS; IMS_C; IMS_HHH; Peptidase_S24; PhdYeFM_antitox; RelB; UVR; UvrD_C; UvrD-helicase |
| C3 | 371 | Repressor | Amino acid metabolism; energy metabolism; nucleotide metabolism; glycan biosynthesis and metabolism; metabolism of cofactors and vitamins | Metabolism - amino acids, carbohydrate, coenzyme, transferases; Processes_IC – transport | DAHP_synth_1; GATase; LTXXQ; MGS; SBP_bac_3; SKI; CPSase_L_D3; CPSase_sm_chain; PAPS_reduct |
| C4 | 177 | Activator | Lipid metabolism; membrane transport; signal transduction | Information - DNA replication-repair; regulation - signal transduction | HTH_18 |
| C5 | 869 | Repressor | Carbohydrate metabolism; xenobiotics biodegradation and metabolism; energy metabolism; amino acid metabolism; translation; membrane transport | General – general; metabolism – redox; Processes_EC - cell adhesion; Processes_IC – protein modification, transport | Fimbrial; molybdopterin; Molydop_binding; Molybdop_Fe4S4; Fer4_11; ABC_tran; Peripla_BP_6; Fer4_4 |
| C6 | 51 | Activator | Metabolism; carbohydrate metabolism; glycan biosynthesis and metabolism; drug resistance: antimicrobial | Metabolism – carbohydrate, E-transfer; regulation - kinases-phosphatases | AA_permease_2; GerE |
| C7 | 47 | Activator | Amino acid metabolism; glycan biosynthesis and metabolism; metabolism of other amino acids | Information - DNA replication-repair; metabolism - other enzymes | AA_permease_2; ABC_tran |
| C8 | 745 | Dual | Energy metabolism; Lipid metabolism; amino acid metabolism; metabolism of terpenoids and polyketides; membrane transport | Information – transcription; metabolism - coenzyme other; Processes_IC - transport | FecCD; Fe-S_biosyn; Plug; TonB_dep_Rec; N_methyl |
| C9 | 474 | Activator | Metabolism of other amino acids | Metab - energy transfer | BPD_transp_2; Peripla_BP_4; FGGY_C; HTH_8; FGGY_N; Proton_antipo_M |
| C10 | 939 | Activator | Carbohydrate metabolism; metabolism of cofactors and vitamins; translation; membrane transport | Information - DNA replication-repair; metabolism – nucleotide; other enzymes, transferases | Nitroreductase |
| C11 | 86 | – | Drug resistance: antimicrobial; folding, sorting and degradation; membrane transport; signal transduction; cellular community - prokaryotes | Metab - redox | – |
| C12 | 64 | Repressor | Metabolism of cofactors and vitamins; metabolism of other amino acids | Metabolism - other enzymes; Processes_IC - protein modification | PaaA_PaaC |
Columns are as follows: cluster number; number of target genes per cluster; enriched regulatory roles associated with target genes; functions according to KEGG, Supfam and Pfam. A P-value < 0.045 was used as the significance threshold.
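The source refers only to "an enrichment test" without naming it. One common choice for this kind of over-representation analysis is the upper-tail hypergeometric test, sketched below under that assumption. The function name, parameter names and toy counts are hypothetical; only the 0.045 threshold is taken from the text above.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: total number of genes considered
    annotated:  genes in the population carrying the annotation
    sample:     genes in the cluster's target set
    hits:       annotated genes observed in the target set
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts, chosen only to exercise the function.
p = enrichment_p_value(population=20, annotated=5, sample=5, hits=3)
is_enriched = p < 0.045  # threshold quoted in the table note
```

In practice one such P-value would be computed per functional category and per cluster before any multiple-testing correction.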
Fig. 2. Functional assignments based on a) KEGG, b) SUPFAM and c) Pfam annotations. Functional assignments were evaluated per cluster. Only enriched functions were plotted as a heatmap. The color bar represents the −log P-value with Benjamini-Hochberg correction.
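The Benjamini-Hochberg correction mentioned in the caption can be sketched with the standard step-up procedure. The function name and toy P-values below are hypothetical, and the base-10 logarithm is an assumption, since the caption does not state which base was used for the color scale.

```python
from math import log10


def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw P-values for four functional categories.
toy_p = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(toy_p)
# Assumed base-10 transform for plotting on a heatmap color scale.
neg_log = [-log10(q) for q in adjusted]
```

Larger values of `neg_log` correspond to stronger enrichment after correction.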
Fig. 3. Heatmap of cluster 9 TFs (TFs for multiple types of stress response and anaerobic metabolism). Experimental and well-known TFs were mapped into Colombos, and their expression patterns are displayed.