Pritha Ghosh, Ramanathan Sowdhamini.
Abstract
BACKGROUND: Pathogenic bacteria have evolved various strategies to counteract host defences. They are also exposed to constantly changing environments. Hence, in order to survive, bacteria must adapt to changing environmental conditions through regulation at the transcriptional and/or post-transcriptional levels. Roles of RNA-binding proteins (RBPs) as virulence factors have been very well studied. Here, we have used a sequence search-based method to compare and contrast the proteomes of 16 pathogenic and three non-pathogenic E. coli strains, as well as to obtain a global picture of the RBP landscape (RBPome) in E. coli.
Keywords: Escherichia coli; Genome-wide survey; PELOTA; Pathogen; RNA-binding proteins; Ribonuclease PH; Uncharacterised; Virulence
Year: 2017 PMID: 28836963 PMCID: PMC5571608 DOI: 10.1186/s12864-017-4045-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
E. coli proteomes for comparative study. This table lists the 19 E. coli proteomes from UniProt (May 2016) used to compare the RBPomes of pathogenic and non-pathogenic strains. The pathogenic and the non-pathogenic E. coli strains are shown in red and green fonts, respectively
ᵃ Points to the differences among the strains at the genome level, taking strain K12 (the lab strain) as the standard
Complete E. coli proteomes. This table lists the 166 complete E. coli proteomes from RefSeq (May 2016) used in the study
| Strain | Assembly | Proteins |
|---|---|---|
| Sakai substr. RIMD 0509952 | GCA_000008865.1 | 5292 |
| IAI39 | GCA_000026345.1 | 4725 |
| K-12 substr. MG1655 | GCA_000005845.2 | 4140 |
| NRG 857C | GCA_000183345.1 | 4582 |
| 2011C-3493 | GCA_000299455.1 | 5149 |
| CFT073 | GCA_000007445.1 | 4897 |
| BL21(DE3) | GCA_000009565.2 | 4302 |
| K-12 substr. W3110 | GCA_000010245.1 | 4410 |
| SE11 | GCA_000010385.1 | 4968 |
| SE15 | GCA_000010485.1 | 4573 |
| 12009 | GCA_000010745.1 | 5423 |
| 11128 | GCA_000010765.1 | 5673 |
| UTI89 | GCA_000013265.1 | 4963 |
| 536 | GCA_000013305.1 | 4542 |
| APEC O1 | GCA_000014845.1 | 5292 |
| E24377A | GCA_000017745.1 | 5021 |
| HS | GCA_000017765.1 | 4366 |
| REL606 | GCA_000017985.1 | 4344 |
| ATCC 8739 | GCA_000019385.1 | 4434 |
| K-12 substr. DH10B | GCA_000019425.1 | 4450 |
| SMS-3-5 | GCA_000019645.1 | 4908 |
| EC4115 | GCA_000021125.1 | 5631 |
| TW14359 | GCA_000022225.1 | 5537 |
| K-12 substr. BW2952 | GCA_000022345.1 | 4347 |
| BL21(DE3) | GCA_000022665.2 | 4302 |
| DH1 | GCA_000023365.1 | 4369 |
| BL21-Gold(DE3)pLysS AG | GCA_000023665.1 | 4322 |
| CB9615 | GCA_000025165.1 | 5262 |
| IHE3034 | GCA_000025745.1 | 4911 |
| 55989 | GCA_000026245.1 | 4953 |
| IAI1 | GCA_000026265.1 | 4450 |
| S88 | GCA_000026285.1 | 4696 |
| E2348/69 | GCA_000026545.1 | 4924 |
| 42 | GCA_000027125.1 | 5131 |
| 11368 | GCA_000091005.1 | 5833 |
| KO11 | GCA_000147855.3 | 4850 |
| ABU 83972 | GCA_000148365.1 | 4862 |
| UM146 | GCA_000148605.1 | 4779 |
| W | GCA_000184185.1 | 4825 |
| ETEC H10407 | GCA_000210475.1 | 5124 |
| UMNK88 | GCA_000212715.2 | 5542 |
| NA114 | GCA_000214765.2 | 4720 |
| PCN033 | GCA_000219515.3 | 4881 |
| UMNF18 | GCA_000220005.2 | 5521 |
| CE10 | GCA_000227625.1 | 5152 |
| clone D i2 | GCA_000233875.1 | 4740 |
| clone D i14 | GCA_000233895.1 | 4742 |
| RM12579 | GCA_000245515.1 | 5213 |
| P12b | GCA_000257275.1 | 4549 |
| KO11FL | GCA_000258025.1 | 4732 |
| W | GCA_000258145.1 | 4831 |
| Xuzhou21 | GCA_000262125.1 | 5402 |
| K-12 substr. MG1655 | GCA_000269645.2 | 4405 |
| DH1 | GCA_000270105.1 | 4351 |
| K-12 substr. MG1655 | GCA_000273425.1 | 4404 |
| LF82 | GCA_000284495.1 | 4544 |
| EC958 | GCA_000285655.3 | 5037 |
| 2009EL-2050 | GCA_000299255.1 | 5283 |
| 2009EL-2071 | GCA_000299475.1 | 5227 |
| APEC O78 | GCA_000332755.1 | 4598 |
| K-12 substr. MDS42 | GCA_000350185.1 | 3713 |
| LY180 | GCA_000468515.1 | 4586 |
| PMV-1 | GCA_000493595.1 | 5100 |
| JJ1886 | GCA_000493755.1 | 5151 |
| K-12 substr. MC4100 | GCA_000499485.1 | 4284 |
| RM13514 | GCA_000520035.1 | 5524 |
| RM13516 | GCA_000520055.1 | 5354 |
| ST540 | GCA_000597845.1 | 4498 |
| ST540 | GCA_000599625.1 | 4532 |
| ST540 | GCA_000599645.1 | 4550 |
| ST2747 | GCA_000599665.1 | 4665 |
| ST2747 | GCA_000599685.1 | 4585 |
| ST2747 | GCA_000599705.1 | 4547 |
| RM12761 | GCA_000662395.1 | 5349 |
| RM12581 | GCA_000671295.1 | 5520 |
| Nissle 1917 | GCA_000714595.1 | 4990 |
| KLY | GCA_000725305.1 | 4478 |
| SS17 | GCA_000730345.1 | 5532 |
| EDL933 | GCA_000732965.1 | 5530 |
| ATCC 25922 | GCA_000743255.1 | 4940 |
| K-12 substr. BW25113 | GCA_000750555.1 | 4398 |
| ECONIH1 | GCA_000784925.1 | 5320 |
| ER2796 | GCA_000800215.1 | 4311 |
| ER3413 | GCA_000800765.1 | 4309 |
| RS218 | GCA_000800845.2 | 4791 |
| RM9387 | GCA_000801165.1 | 4775 |
| 94-3024 | GCA_000801185.2 | 4792 |
| K-12 substr. MG1655 | GCA_000801205.1 | 4387 |
| SS52 | GCA_000803705.1 | 5489 |
| APEC IMT5155 | GCA_000813165.1 | 4840 |
| 6409 | GCA_000814145.2 | 4893 |
| | GCA_000819645.1 | 4996 |
| Santai | GCA_000827105.1 | 4776 |
| 1303 | GCA_000829985.1 | 4849 |
| C41(DE3) | GCA_000830035.1 | 4302 |
| ECC-1470 | GCA_000831565.1 | 4673 |
| BL21 (TaKaRa) | GCA_000833145.1 | 4262 |
| MNCRE44 | GCA_000931565.1 | 5137 |
| K-12 substr. RV308 | GCA_000952955.1 | 4342 |
| K-12 substr. HMS174 | GCA_000953515.1 | 4344 |
| HUSEC2011 | GCA_000967155.1 | 5294 |
| VR50 | GCA_000968515.1 | 4968 |
| CI5 | GCA_000971615.1 | 4874 |
| ER3454 | GCA_000974405.1 | 4375 |
| ER3440 | GCA_000974465.1 | 4367 |
| ER3476 | GCA_000974505.1 | 4354 |
| ER3445 | GCA_000974535.1 | 4360 |
| ER3466 | GCA_000974575.1 | 4415 |
| ER3446 | GCA_000974825.1 | 4357 |
| ER3475 | GCA_000974865.1 | 4359 |
| ER3435 | GCA_000974885.1 | 4443 |
| K-12 substr. AG100 | GCA_000981485.1 | 4394 |
| C227-11 | GCA_000986765.1 | 5269 |
| SEC470 | GCA_000987875.1 | 4941 |
| SQ37 | GCA_000988355.1 | 4405 |
| SQ88 | GCA_000988385.1 | 4403 |
| SQ2203 | GCA_000988465.1 | 4402 |
| CFSAN029787 | GCA_001007915.1 | 5090 |
| K-12 substr. GM4792 | GCA_001020945.2 | 4362 |
| K-12 substr. GM4792 | GCA_001021005.2 | 4368 |
| PCN061 | GCA_001029125.1 | 4680 |
| C43(DE3) | GCA_001039415.1 | 4254 |
| NCM3722 | GCA_001043215.1 | 4530 |
| ACN001 | GCA_001051135.1 | 4671 |
| DH1Ec095 | GCA_001183645.1 | 4345 |
| DH1Ec104 | GCA_001183665.1 | 4342 |
| DH1Ec169 | GCA_001183685.1 | 4342 |
| RR1 | GCA_001276585.1 | 4337 |
| SF-088 | GCA_001280325.1 | 5019 |
| SF-468 | GCA_001280345.1 | 5218 |
| SF-166 | GCA_001280385.1 | 4773 |
| SF-173 | GCA_001280405.1 | 4936 |
| WS4202 | GCA_001307215.1 | 5294 |
| K-12 substr. MG1655 | GCA_001308065.1 | 4398 |
| K-12 substr. MG1655_TMP32XR1 | GCA_001308125.1 | 4398 |
| K-12 substr. MG1655_TMP32XR2 | GCA_001308165.1 | 4399 |
| 2012C-4227 | GCA_001420935.1 | 5142 |
| 2009C-3133 | GCA_001420955.1 | 5311 |
| YD786 | GCA_001442495.1 | 4604 |
| CQSW20 | GCA_001455385.1 | 4142 |
| uk_P46212 | GCA_001469815.1 | 5023 |
| ST648 | GCA_001485455.1 | 4838 |
| CD306 | GCA_001513615.1 | 5021 |
| JJ2434 | GCA_001513635.1 | 5099 |
| ACN002 | GCA_001515725.1 | 4618 |
| MRE600 | GCA_001542675.2 | 4603 |
| K-12 substr. MG1655 | GCA_001544635.1 | 4407 |
| JEONG-1266 | GCA_001558995.1 | 5358 |
| C2566 | GCA_001559615.1 | 4209 |
| C3029 | GCA_001559635.1 | 4317 |
| DHB4 | GCA_001559655.1 | 4522 |
| C3026 | GCA_001559675.1 | 4731 |
| JW5437-1 substr. MG1655 | GCA_001566335.1 | 4405 |
| SaT040 | GCA_001566615.1 | 4963 |
| G749 | GCA_001566635.1 | 4983 |
| ZH193 | GCA_001566675.1 | 5040 |
| ZH063 | GCA_001577325.1 | 4984 |
| JJ1887 | GCA_001593565.1 | 5142 |
| Sanji | GCA_001610755.1 | 5172 |
| 28RC1 | GCA_001612475.1 | 5504 |
| SRCC 1675 | GCA_001612495.1 | 5511 |
| Ecol_732 | GCA_001617565.1 | 5243 |
| Ecol_743 | GCA_001618325.1 | 4866 |
| Ecol_745 | GCA_001618345.1 | 4803 |
| Ecol_448 | GCA_001618365.1 | 4956 |
| B7A | GCA_000725265.1 | 5384 |
Fig. 1 Search scheme for the genome-wide survey. Schematic representation of the search method used for the genome-wide survey (GWS). Starting from 437 structure-centric and 746 sequence-centric RBP families, a library of 1183 RBP family hidden Markov models (HMMs) was built. These mathematical profiles were then used to search the proteomes of 19 different E. coli strains (16 pathogenic and three non-pathogenic strains). The same search scheme was later used to extend the study to all 166 E. coli proteomes available in the RefSeq database as of May 2016 (see text for further details)
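The search step in this caption can be sketched in Python as follows. This is only an illustration under stated assumptions: the caption does not name the search tool or the significance threshold, so the HMMER-style `--tblout` layout, the `1e-3` E-value cut-off, the sample hits and the helper name `collect_hits` are all hypothetical.

```python
from collections import defaultdict

# Family counts quoted in the caption.
STRUCTURE_CENTRIC = 437
SEQUENCE_CENTRIC = 746
LIBRARY_SIZE = STRUCTURE_CENTRIC + SEQUENCE_CENTRIC  # size of the HMM library

# Hypothetical excerpt laid out like a HMMER `--tblout` table:
# target, accession, query (family), accession, full-sequence E-value, score, bias
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
protA  -  DEAD  -  1.2e-40  140.1  0.1
protB  -  DEAD  -  3.0e-02  12.4  0.0
protA  -  S1  -  5.0e-12  48.0  0.2
protC  -  CSD  -  8.1e-25  88.9  0.0
"""


def collect_hits(tblout_text, max_evalue=1e-3):
    """Map each protein to the set of families hitting it at or below max_evalue.

    The threshold is an assumption made for this sketch, not a value from the paper.
    """
    hits = defaultdict(set)
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split()
        target, family, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= max_evalue:
            hits[target].add(family)
    return dict(hits)


rbp_hits = collect_hits(SAMPLE_TBLOUT)
print(LIBRARY_SIZE, sorted(rbp_hits))
```

Running the same function once per proteome would give one candidate RBP set per strain, which is the unit the later per-strain statistics are computed on.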
Fig. 7 Uncharacterised pathogen-specific RNA-binding protein. Characterisation of the uncharacterised pathogen-specific RBP. a Schematic representation of the domain architecture of the protein. The RNA-binding PELOTA_1 domain and its model are shown. b Structural superposition of the L7Ae K-turn binding domain (PDB code: 4BW0:B) (in red) and the model of the PELOTA_1 domain of the uncharacterised protein (in blue). c Comparison of the kink-turn RNA-bound forms of the L7Ae K-turn binding domain (PDB code: 4BW0:B) (top) and of the model of the PELOTA_1 domain of the uncharacterised protein (bottom). The RNA-binding residues are highlighted in yellow
Fig. 2 Statistics for the genome-wide survey of 19 E. coli strains. In panels a and b, the pathogenic strains are shown in red and the non-pathogenic ones in green. The non-pathogenic strains are also highlighted with green boxes. a The number of RBPs in each strain. The pathogenic O26:H11 strain encodes the highest number of RBPs in its proteome. b The percentage of RBPs in the proteome of each strain, calculated with respect to the proteome size of the strain under consideration. The difference in this percentage between the pathogenic and the non-pathogenic strains is not statistically significant (Welch two-sample t-test: t = 3.2384, df = 2.474, p-value = 0.06272). c The types of Pfam domains encoded by each strain. The differences in the types of Pfam domains, as well as of Pfam RBDs, encoded by the pathogenic and the non-pathogenic strains are not statistically significant (Welch two-sample t-test for types of Pfam domains: t = −1.3876, df = 2.263, p-value = 0.2861; Welch two-sample t-test for types of Pfam RBDs: t = −0.9625, df = 2.138, p-value = 0.4317). d The abundance of Pfam RBDs. 185 types of Pfam RBDs were found to be encoded in the RBPs, of which DEAD domains have the highest representation (approximately 4% of all Pfam RBDs)
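The quantities in panel b can be illustrated with a short Python sketch: the RBP percentage per strain, followed by the statistic and degrees of freedom of Welch's unequal-variance t-test. The per-strain counts below are hypothetical placeholders, not values from the study, and the p-value is omitted because it needs a t-distribution routine outside the standard library.

```python
from math import sqrt
from statistics import mean, variance


def rbp_percentage(n_rbps, proteome_size):
    """Percentage of a proteome annotated as RBPs."""
    return 100.0 * n_rbps / proteome_size


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test with unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean A
    vb = variance(sample_b) / nb  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation; usually not an integer
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical (RBP count, proteome size) pairs, for illustration only.
pathogenic = [rbp_percentage(n, size)
              for n, size in [(400, 5000), (395, 5200), (410, 4900), (405, 5300)]]
non_pathogenic = [rbp_percentage(n, size)
                  for n, size in [(350, 4400), (345, 4300), (360, 4500)]]

t_stat, dof = welch_t(pathogenic, non_pathogenic)
print(f"t = {t_stat:.4f}, df = {dof:.3f}")
```

The fractional degrees of freedom reported in the caption (for example df = 2.474) are what this approximation produces when one group is small, as with three non-pathogenic strains.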
Pfam RNA-binding domains. This table lists the Pfam RBDs and their occurrences in the GWS of 19 E. coli strains. The domains are as defined in the Pfam database (v.28)
| Pfam domain | Number of occurrences | Pfam domain | Number of occurrences |
|---|---|---|---|
| DEAD | 250 | MnmE_helical | 19 |
| S1 | 247 | RNA_pol_Rpb2_7 | 19 |
| GTP_EFTU | 168 | tRNA-synt_2e | 19 |
| GTP_EFTU_D2 | 168 | Ribosomal_L11_N | 19 |
| HOK_GEF | 160 | KilA-N | 19 |
| S4 | 152 | Ribosomal_S4 | 19 |
| PseudoU_synth_2 | 150 | RTC | 19 |
| CSD | 133 | Ribosomal_L2 | 19 |
| SpoU_methylase | 114 | DnaB_bind | 19 |
| tRNA_anti-codon | 113 | IPPT | 19 |
| RNase_T | 99 | IF3_C | 19 |
| tRNA-synt_1 | 95 | UPF0020 | 19 |
| tRNA-synt_2 | 94 | RNA_pol_A_bac | 19 |
| Ribonuc_L-PSP | 91 | RNA_pol_Rpb1_3 | 19 |
| PRD | 90 | Se-cys_synth_N | 19 |
| Anticodon_1 | 76 | RimM | 19 |
| Formyl_trans_N | 75 | Val_tRNA-synt_C | 19 |
| Aconitase | 75 | TruB_C_2 | 19 |
| RF-1 | 70 | RNA_pol_Rpb1_4 | 19 |
| tRNA-synt_1c | 58 | dsrm | 19 |
| RNase_PH | 57 | RNA_pol_Rpb2_1 | 19 |
| RNase_PH_C | 57 | Rho_N | 19 |
| Dus | 57 | CheR | 19 |
| HGTP_anticodon | 57 | SHS2_FTSA | 19 |
| tRNA-synt_2b | 57 | Helicase_RecD | 19 |
| tRNA_bind | 56 | RNA_pol_Rpb6 | 19 |
| Sua5_yciO_yrdC | 56 | RNA_pol_Rpb2_6 | 19 |
| tRNA_U5-meth_tr | 56 | SpoU_methylas_C | 19 |
| zf-FPG_IleRS | 55 | Trm112p | 19 |
| Ldr_toxin | 52 | RNA_pol_Rpb1_1 | 19 |
| RtcB | 50 | CsrA | 19 |
| CorA | 46 | Ribosomal_S20p | 19 |
| MazE_antitoxin | 44 | TruB-C_2 | 19 |
| ProQ | 41 | DALR_2 | 19 |
| DbpA | 39 | IF-2 | 19 |
| HA2 | 39 | tRNA_Me_trans | 19 |
| PolyA_pol_RNAbd | 38 | KH_1 | 19 |
| TruD | 38 | ABC1 | 19 |
| IF2_N | 38 | tRNA-synt_1_2 | 19 |
| PseudoU_synth_1 | 38 | PUA | 19 |
| Methyltr_RsmF_N | 38 | CAT_RBD | 19 |
| SpoU_sub_bind | 38 | PNPase | 19 |
| SgrR_N | 38 | tRNA_m1G_MT | 19 |
| tRNA_edit | 38 | SelB-wing_2 | 19 |
| PolyA_pol | 38 | RNase_H | 19 |
| FtsJ | 38 | PRC | 19 |
| HRDC | 38 | Ribosomal_L18p | 19 |
| tRNA-synt_1b | 38 | GIDA_assoc | 19 |
| THUMP | 38 | RrnaAD | 19 |
| DALR_1 | 38 | YjeF_N | 19 |
| NusB | 38 | LigT_PEase | 19 |
| RNase_E_G | 38 | Ub-RnfH | 19 |
| ASCH | 37 | Nol1_Nop2_Fmu_2 | 19 |
| DNA_pol_A_exo1 | 37 | RRF | 19 |
| tRNA_SAD | 36 | B5 | 18 |
| DHHA1 | 36 | FDX-ACB | 18 |
| GTP_EFTU_D3 | 35 | GidB | 18 |
| MqsA_antitoxin | 27 | Sigma70_r1_1 | 18 |
| TGT | 21 | IF3_N | 18 |
| RNA_pol_Rpb1_2 | 20 | B3_4 | 18 |
| Queuosine_synth | 20 | tRNA-synt_2c | 17 |
| RVT_1 | 20 | Colicin-DNase | 16 |
| tRNA_synt_2f | 19 | PTS_2-RNA | 16 |
| SelB-wing_3 | 19 | CRISPR_Cse1 | 14 |
| Methyltrans_RNA | 19 | CRISPR_Cse2 | 14 |
| Ribosomal_L11 | 19 | FinO_N | 13 |
| CRS1_YhbY | 19 | PIN | 13 |
| Tyr_Deacylase | 19 | GIIM | 11 |
| Ribosomal_L4 | 19 | IlvGEDA_leader | 11 |
| MutS_II | 19 | SymE_toxin | 11 |
| RNA_pol_Rpb1_5 | 19 | PRTase_1 | 10 |
| TilS | 19 | PELOTA_1 | 10 |
| Endonuclease_1 | 19 | DNA_primase_S | 10 |
| Ribosomal_L25p | 19 | IlvB_leader | 9 |
| Sigma54_CBD | 19 | YafO_toxin | 8 |
| RapA_C | 19 | MT-A70 | 8 |
| TilS_C | 19 | RPAP2_Rtr1 | 6 |
| OB_NTP_bind | 19 | TisB_toxin | 5 |
| GlutR_dimer | 19 | MqsR_toxin | 5 |
| RNA_pol_Rpb2_3 | 19 | Ibs_toxin | 5 |
| RtcR | 19 | RNA_ligase | 4 |
| TruB_N | 19 | N36 | 3 |
| RsmJ | 19 | Cloacin | 3 |
| tRNA_bind_2 | 19 | Colicin_D | 2 |
| tRNA-synt_1c_C | 19 | Viral_helicase1 | 2 |
| RNA_pol_Rpb2_45 | 19 | IF-2B | 1 |
| HisG | 19 | Colicin_immun | 1 |
| Rho_RNA_bind | 19 | RPOL_N | 1 |
| PNPase_C | 19 | RNA_pol | 1 |
| RNase_HII | 19 | RnlA_toxin | 1 |
| RNA_pol_A_CTD | 19 | NYN | 1 |
| RNA_pol_L | 19 | DUF3850 | 1 |
| Rsd_AlgQ | 19 | | |
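The abundance figures quoted for Pfam RBDs are shares of the total occurrence count. The following sketch shows that calculation on a five-row subset of the table above. Because the subset is incomplete, the resulting percentages are relative to the subset only and do not reproduce the approximately 4% reported for DEAD domains. The function name `relative_abundance` is illustrative.

```python
def relative_abundance(counts):
    """Convert raw occurrence counts into percentages of their total."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# First five entries of the table (occurrences across the 19 strains).
subset = {"DEAD": 250, "S1": 247, "GTP_EFTU": 168, "HOK_GEF": 160, "S4": 152}

shares = relative_abundance(subset)
most_abundant = max(shares, key=shares.get)

for name, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:10s} {pct:5.1f}% of the subset")
```

Passing the full table as the `counts` dictionary would give the share of each domain among all Pfam RBD occurrences.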
Fig. 3 Clusters of RNA-binding proteins. The percentage of RBPs in the different clusters. The RBPs obtained from each of the 19 E. coli strains (16 pathogenic and three non-pathogenic strains) were clustered on the basis of homology searches (see text for further details). The five biggest clusters and their identities are as follows: Cluster 5 (ATP-binding subunit of transporters), Cluster 41 (Small toxic polypeptides), Cluster 15 (RNA helicases), Cluster 43 (Cold shock proteins) and Cluster 16 (Pseudouridine synthases)
Pathogen-specific RNA-binding protein clusters. This table lists the sizes of the RBP clusters whose members come only from the pathogenic E. coli strains in our GWS of 19 E. coli strains
| Cluster number | Number of members | Cluster name |
|---|---|---|
| Cluster 283 | 13 | KilA-N domain phage proteins |
| Cluster 299 | 13 | tRNA(fMet)-specific VapC endonucleases |
| Cluster 176 | 10 | DEAD/DEAH box helicases |
| Cluster 307 | 10 | CRISPR type I-E/ECOLI-associated proteins CasA/Cse1 |
| Cluster 308 | 10 | CRISPR-associated proteins Cas6/Cse3/CasE, subtype I-E |
| Cluster 309 | 10 | CRISPR type I-E/ECOLI-associated proteins CasB/Cse2 |
| Cluster 310 | 10 | CRISPR-associated proteins Cas5/CasD, subtype I-E |
| Cluster 60 | 10 | Putative ATP/GTP-binding proteins |
| Cluster 318 | 9 | ASCH domain-containing proteins |
| Cluster 122 | 8 | Adenine DNA methyltransferases, phage-associated |
| Cluster 305 | 8 | Post-segregational killing toxins |
| Cluster 116 | 7 | RNA-binding proteins |
| Cluster 298 | 6 | VagC, VagC-homologs |
| Cluster 317 | 6 | Type I restriction-modification enzymes, M subunits |
| Cluster 319 | 6 | Type I restriction-modification enzymes, R subunits |
| Cluster 125 | 5 | Helicases |
| Cluster 246 | 5 | DNA-binding proteins |
| Cluster 287 | 5 | ATP-dependent helicases |
| Cluster 288 | 5 | DEAD/DEAH box helicases |
| Cluster 290 | 5 | DEAD/DEAH box helicases |
| Cluster 63 | 5 | UvrD/REP helicase-like proteins |
| Cluster 98ᵃ | 5 | Protein kinases |
| Cluster 161 | 4 | 2′-5′ RNA ligases |
| Cluster 300 | 4 | ProQ/FinO family proteins |
| Cluster 313 | 4 | ATP-dependent helicases, res subunit of Type III restriction enzyme, SNF2 family helicases |
| Cluster 329 | 4 | Serine/Threonine kinases |
| Cluster 301 | 3 | YajA proteins |
| Cluster 323 | 3 | Putative pyridoxal phosphate-dependent enzymes, Putative transferases |
| Cluster 324 | 3 | Sigma-54 dependent transcription regulators, Putative transcriptional regulators of NtrC family |
| Cluster 326 | 3 | Leucine-rich repeat proteins |
| Cluster 330 | 3 | Ankyrin repeat-containing domain proteins |
| Cluster 172 | 2 | Zn-dependent hydrolases, including glyoxylases, Beta-lactamases |
| Cluster 238 | 2 | Hypothetical proteins |
| Cluster 302 | 2 | Peptidyl-arginine deiminases |
| Cluster 303 | 2 | Exonucleases |
| Cluster 315 | 2 | KilA-N domain phage proteins |
| Cluster 320 | 2 | Pyocin, putative colicin activity proteins |
| Cluster 325 | 2 | Hypothetical proteins |
| Cluster 327 | 2 | KilA-N domain proteins |
| Cluster 332 | 2 | Hypothetical phage associated proteins |
| Cluster 334 | 2 | Hypothetical proteins |
| Cluster 335 | 2 | Chromosome partitioning ParA proteins |
| Cluster 336 | 2 | Serine/Threonine kinases |
ᵃ All the proteins in this cluster have sequence homologues in humans
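The selection rule behind this table (keep a cluster only if every member comes from a pathogenic strain) can be sketched in Python as below. The strain labels, protein identifiers and cluster names are hypothetical placeholders, and the clustering itself is assumed to have been done already.

```python
def pathogen_specific_clusters(clusters, pathogenic_strains):
    """Return {cluster name: size} for clusters drawn only from pathogenic strains."""
    specific = {}
    for name, members in clusters.items():
        strains = {strain for strain, _protein in members}
        # A non-empty cluster qualifies when its strain set is a subset
        # of the pathogenic strain set.
        if strains and strains <= pathogenic_strains:
            specific[name] = len(members)
    return specific


# Hypothetical input: cluster name -> list of (strain label, protein id).
PATHOGENIC = {"path_1", "path_2", "path_3"}
clusters = {
    "cluster_A": [("path_1", "p01"), ("nonpath_1", "p02")],
    "cluster_B": [("path_1", "p03"), ("path_2", "p04"), ("path_3", "p05")],
    "cluster_C": [("nonpath_1", "p06")],
    "cluster_D": [("path_2", "p07"), ("path_2", "p08")],
}

result = pathogen_specific_clusters(clusters, PATHOGENIC)
for name, size in sorted(result.items(), key=lambda item: -item[1]):
    print(name, size)
```

Counting members rather than strains lets a cluster with two proteins from the same strain still report a size of 2, which matches a table headed "Number of members".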
Fig. 4 Statistics for the genome-wide survey of 166 E. coli strains. a The number of RBPs as determined by different methods (see text for further details). b The abundance of Pfam RBDs. 188 types of Pfam RBDs were found to be encoded in the RBPs, of which DEAD domains have the highest representation (approximately 6% of all Pfam RBDs). c The length distribution of RBPs
Fig. 5 Modelling of the RNase PH proteins from two different E. coli strains. a Schematic diagram of the active (above) and the inactive (below) RNase PH proteins. The RNase_PH and the RNase_PH_C domains, as defined by Pfam (v.28), are shown in magenta and pink, respectively. The five residues that are mutated as a result of a point deletion and the ten residues that are missing from the inactive RNase PH protein from strain K12 are depicted in orange and yellow, respectively. These two sets of residues are the ones of interest in this study. b Model of the RNase PH monomer from strain O26:H11. The residues are shown on the structure of the model with the same colour codes as in panel a. The residues that lie within an 8 Å cut-off distance of the residues of interest are highlighted in cyan (left). c Structure of the RNase PH hexamer from strain O26:H11 (left) and the probable structure of the inactive RNase PH hexamer from strain K12 (right). The dimers marked in black boxes are the ones that were randomly selected for MD simulations. d Electrostatic potential on the solvent-accessible surface of the RNase PH hexamer from strain O26:H11 (left) and that of the inactive RNase PH hexamer from strain K12 (right)
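The 8 Å neighbourhood described in panel b can be illustrated with a minimal Python sketch. It assumes one representative coordinate per residue (for example a Cα atom), which the caption does not specify, and the coordinates and residue numbers are hypothetical.

```python
from math import dist


def residues_within_cutoff(coords, residues_of_interest, cutoff=8.0):
    """Return residues lying within `cutoff` Å of any residue of interest.

    coords maps a residue number to one representative (x, y, z) point in Å.
    Residues of interest themselves are excluded from the result.
    """
    neighbours = set()
    for residue, xyz in coords.items():
        if residue in residues_of_interest:
            continue
        if any(dist(xyz, coords[ref]) <= cutoff for ref in residues_of_interest):
            neighbours.add(residue)
    return neighbours


# Hypothetical coordinates, for illustration only.
coords = {
    1: (0.0, 0.0, 0.0),
    2: (3.0, 4.0, 0.0),   # 5.0 Å from residue 1
    3: (10.0, 0.0, 0.0),  # 10.0 Å from residue 1
    4: (0.0, 0.0, 7.9),   # 7.9 Å from residue 1
}

nearby = residues_within_cutoff(coords, {1})
print(sorted(nearby))
```

An all-atom variant would take the minimum distance over every atom pair of the two residues instead of a single representative point, and would generally return a larger neighbourhood.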
Fig. 6 Energy values for the active and inactive RNase PH monomers, dimers and hexamers. The energy values (in kJ/mol) for the active (blue) and the inactive (red) RNase PH proteins, as calculated by SYBYL (panel a) and PPCheck (panel b), are plotted. a The energy values for the active and the inactive RNase PH monomers and hexamers. The results show that both the monomeric and the hexameric forms of the inactive RNase PH protein are less stable than those of the active RNase PH protein. b The interface energy values for the active and the inactive RNase PH dimers (as marked in black boxes in Fig. 5c). The results show that the dimer interface of the inactive RNase PH protein is less stabilised than that of the active RNase PH protein
Fig. 8 Hydrogen bonding patterns in molecular dynamics simulations. The number of H-bonds formed over each picosecond of the MD simulations (described in this study) is shown. Each of the six panels (systems) shows the H-bond traces from three replicates (represented in different colours). a Active RNase PH monomer. b Inactive RNase PH monomer. c Active RNase PH dimer. d Inactive RNase PH dimer. e PELOTA_1 domain from the ‘uncharacterised’ protein in complex with kink-turn RNA. f L7Ae K-turn binding domain from A. fulgidus in complex with kink-turn RNA from H. marismortui
Fig. 9 Sequence analysis of pathogen-specific Cas6-like proteins. Comparison of sequence features of Cas6 proteins from pathogenic strains (Cluster 308) and from the non-pathogenic K12 strain. a Comparison of RNA-binding residues. The RNA-binding residues of the E. coli strain K12 Cas6 protein are highlighted in yellow on its sequence (CAS6_ECOLI) in the multiple sequence alignment (MSA). The corresponding residues in the other proteins in the MSA are highlighted in yellow where they are the same as in CAS6_ECOLI and in red where they differ. b Comparison of protein-interacting residues. The protein-interacting residues of the E. coli strain K12 Cas6 protein are highlighted in yellow on its sequence (CAS6_ECOLI), and the same colour scheme is followed for the other proteins. c Secondary structure prediction. The α-helices are highlighted in cyan and the β-strands in green
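The yellow/red colouring rule in panels a and b amounts to a column-wise comparison against the reference sequence in the alignment. A minimal Python sketch follows. The aligned sequences and binding columns are toy values and not taken from CAS6_ECOLI or Cluster 308.

```python
def compare_functional_residues(reference, other, columns):
    """Classify each functional alignment column as 'same' or 'different'.

    reference and other are rows of the same alignment (equal length, '-' for
    gaps); columns are 0-based alignment positions of the functional residues
    in the reference row.
    """
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have the same length")
    return {
        col: "same" if other[col] == reference[col] else "different"
        for col in columns
    }


# Toy alignment rows and hypothetical RNA-binding columns.
reference_row = "MK-RLSA"
homologue_row = "MKTRISA"
binding_columns = [1, 3, 4]

status = compare_functional_residues(reference_row, homologue_row, binding_columns)
print(status)
```

In the figure's colour scheme, `'same'` columns would be drawn in yellow and `'different'` columns in red. A gap in the homologue at a functional column is reported as `'different'` by this sketch.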