
How Changes in Anti-SD Sequences Would Affect SD Sequences in Escherichia coli and Bacillus subtilis.

Akram Abolbaghaei, Jordan R Silke, Xuhua Xia.

Abstract

The 3' end of the small subunit ribosomal RNA (ssu rRNA) in bacteria is directly involved in the selection and binding of mRNA transcripts during translation initiation via well-documented interactions between a Shine-Dalgarno (SD) sequence located upstream of the initiation codon and an anti-SD (aSD) sequence at the 3' end of the ssu rRNA. Consequently, the 3' end of ssu rRNA (3'TAIL) is strongly conserved among bacterial species because a change in the region may impact the translation of many protein-coding genes. Escherichia coli and Bacillus subtilis differ in their 3' ends of ssu rRNA, being GAUCACCUCCUUA3' in E. coli and GAUCACCUCCUUUCU3' or GAUCACCUCCUUUCUA3' in B. subtilis. Such differences in 3'TAIL lead to species-specific SDs (designated SDEc for E. coli and SDBs for B. subtilis) that can form strong and well-positioned SD/aSD pairing in one species but not in the other. Selection mediated by the species-specific 3'TAIL is expected to favor SDBs against SDEc in B. subtilis, but favor SDEc against SDBs in E. coli. Among well-positioned SDs, SDEc is used more in E. coli than in B. subtilis, and SDBs more in B. subtilis than in E. coli. Highly expressed genes and genes of high translation efficiency tend to have longer SDs than lowly expressed genes and genes with low translation efficiency in both species, but more so in B. subtilis than in E. coli. Both species overuse SDs matching the bolded part of the 3'TAIL shown above. The 3'TAIL difference contributes to the host specificity of phages.
Copyright © 2017 Abolbaghaei et al.


Keywords:  Bacillus subtilis; Escherichia coli; Shine-Dalgarno; anti-SD-sequence; ssu rRNA; translation efficiency


Year:  2017        PMID: 28364038      PMCID: PMC5427494          DOI: 10.1534/g3.117.039305

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


Many studies suggest that initiation is the principal bottleneck of the translation process in bacteria (Liljenstrom and von Heijne 1987; Bulmer 1991; Xia 2007a; Xia ; Kudla ; Tuller ; Prabhakaran ). Successful initiation requires that the ribosome bind to the mRNA template in such a manner that the start codon lines up correctly at the ribosomal P site (Farwell ; Komarova ; Duval ). This translation initiation process in most bacterial species is facilitated by (1) ribosomal protein S1 (RPS1) acting as an RNA chaperone that unfolds secondary structural elements that may otherwise embed the start codon and obscure the start signal (Vellanoweth and Rabinowitz 1992; Duval ; Prabhakaran ), and (2) the Shine-Dalgarno (SD) sequence located upstream of the start codon (Shine and Dalgarno 1974, 1975; Steitz and Jakes 1975; Dunn ; Taniguchi and Weissmann 1978; Eckhardt and Luhrmann 1979; Luhrmann ) that base-pairs with the anti-SD (aSD) sequence located at the free 3′ end of the small subunit ribosomal RNA (ssu rRNA, whose 3′ end will hereafter be referred to as 3′TAIL). A well-positioned SD/aSD pairing and reduced secondary structure in sequences flanking the start codon and SD are the hallmarks of highly expressed genes in Escherichia coli and Staphylococcus aureus, as well as their phages (Prabhakaran ). The SD/aSD pairing offers a simple and elegant solution to start codon recognition in bacteria and their phages (Hui and de Boer 1987; Vimberg ; Prabhakaran ). Because many protein-coding genes depend on aSD motifs located at 3′TAIL for translation, strong sequence conservation is observed in the 3′TAIL among diverse bacterial species (Woese 1987; Orso ; Clarridge 2004; Chakravorty ). Conversely, a change in 3′TAIL is expected to result in fundamental changes in SD usage in protein-coding genes.

E. coli, as a representative of the gram-negative bacteria, and Bacillus subtilis, as a representative of gram-positive bacteria, differ in their 3′TAIL in only a minor detail, with the former ending with A and the latter with 3′UCU or 3′AUCU (Table 1). 3′UCU was suggested by early experimental studies (Murray and Rabinowitz 1982; Band and Henner 1984), and is annotated in the B. subtilis genome database SubtiList (http://genolist.pasteur.fr/SubtiList/). However, 3′AUCU appears in B. subtilis genomes annotated in GenBank (e.g., NC_000964). A recent study on B. subtilis ribosomal structure (e.g., Sohmen ) also assumed a 3′AUCU tail in ssu rRNA (D. Wilson, personal communication). Existing evidence suggests a heterogeneous “mature” ssu rRNA pool, given that mature ssu rRNA in bacterial species results from endoribonuclease digestion of the precursor 30S rRNA followed by exonuclease nibbling (Britton ; Yao ; Kurata ). For example, 3′→5′ exoribonucleases such as RNases II, R, and PH, as well as PNPase, all participate in maturation of the 3′TAIL of ssu rRNA (Sulthana and Deutscher 2013), and endoribonuclease YbeY has also been recently shown to participate in the 3′ end maturation of ssu rRNA (Davies ; Jacob ). In E. coli, 67% of mature ssu rRNA ends with the 3′TAIL in Table 1 (Kurata ). Thus, the trailing 3′UCU and 3′AUCU may both be present in functional ssu rRNA of B. subtilis.
Table 1

ssu rRNA 3′ ends that are free to base-pair with SD motifs in E. coli and B. subtilis and their compatible motifs

| Species | 3′TAIL sequence (a) | SD motifs (b) |
| --- | --- | --- |
| E. coli | 3′-AUUCCUCCACUAG-5′ | UAAG, UAAGG, UAAGGA, UAAGGAG, UAAGGAGG, UAAGGAGGUG |
| B. subtilis | 3′-AUCUUUCCUCCACUAG-5′ | UAGA, UAGAA, UAGAAA, UAGAAAG, UAGAAAGG, UAGAAAGGA, UAGAAAGGAG, UAGAAAGGAGG, UAGAAAGGAGGU |
| | | AGAA, AGAAA, AGAAAG, AGAAAGG, AGAAAGGA, AGAAAGGAG, AGAAAGGAGG, AGAAAGGAGGU, AGAAAGGAGGUG |
| | | AAAG, AAAGG, AAAGGA, AAAGGAG, AAAGGAGG, AAAGGAGGU, AAAGGAGGUG, AAAGGAGGUGA, AAAGGAGGUGAU |
| | | GAAA, GAAAG, GAAAGG, GAAAGGA, GAAAGGAG, GAAAGGAGG, GAAAGGAGGU, GAAAGGAGGUG, GAAAGGAGGUGA |

a Bolded letters show the differences in the base composition between the two species (E. coli ends with A, whereas B. subtilis ends with UCU or AUCU). The underlined nucleotides denote the alternative 3′-AUCU-5′ TAIL and motifs exclusively compatible with it.

b The SD motifs shown are derived from differences in 3′TAIL (boldface) for both species.

The minor difference in 3′TAIL between E. coli and B. subtilis suggests different sets of permissible SDs between the two species, i.e., some SDs that function well in one species may not function at all in the other. These species-specific SDs (Table 1) include six in E. coli (designated SDEc) and 25 in B. subtilis (designated SDBs). Such differences in permissible SDs could contribute to fundamental species differences in translation. Most E. coli mRNAs cannot be efficiently translated in B. subtilis (McLaughlin ,b), but most B. subtilis mRNAs can be efficiently translated in E. coli (Stallcup ). Many gram-negative bacteria, including E. coli, can even translate poly(U) messages (Nirenberg and Matthaei 1961; Stallcup ), but gram-positive bacteria, including B. subtilis, cannot (Stallcup ). In retrospect, it was indeed good luck that Nirenberg and Matthaei (1961) happened to experiment with E. coli instead of B. subtilis; otherwise the landmark study would have ended up with nothing to report. It is also known that the E. coli translation machinery can translate leaderless mRNAs (O’Donnell and Janssen 2002; Krishnan ; Vesper ; Giliberti ), and that its 30S ribosomal subunit can still localize the start codon even when the last 30 nucleotides of ssu rRNA are deleted (Melancon ).

The difference in mRNA permissibility between gram-negative and gram-positive bacteria is often attributed to RPS1, a six-domain protein that is highly conserved in gram-negative bacteria (Subramanian 1983) but absent or highly variable in gram-positive bacteria with translation specificity (Roberts and Rabinowitz 1989). RPS1 facilitates translation initiation by reducing secondary structure that could otherwise embed the translation initiation region (TIR), which includes the SD and the start codon (Roberts and Rabinowitz 1989; Farwell ; Tzareva ). B. subtilis has a homologous gene with four domains that are not conserved among gram-positive bacteria, with Mycoplasma pulmonis and Spiroplasma kunkelii having only one domain with weak homology to any known functional RPS1 (Salah ). These findings corroborate earlier experimental evidence (McLaughlin ; Band and Henner 1984) demonstrating that B. subtilis requires a more stringent SD region for gene expression than does E. coli.

However, the conventional belief that E. coli possesses a more permissive translation machinery than B. subtilis is not always true. In rare cases, some mRNAs that can be translated efficiently in B. subtilis cannot be translated well in E. coli, and one such mRNA is gene 6 of the B. subtilis phage ϕ29 (Vellanoweth and Rabinowitz 1992). In particular, such translation specificity can often be traced to the 30S ribosome and the mRNAs, rather than to other components of the translation machinery, strongly suggesting SD/aSD pairing as the cause of the translation specificity. Indeed, as we show later, gene 6 of phage ϕ29 can form a well-positioned SD/aSD pair only with the 3′TAIL of B. subtilis but not with that of E. coli. Thus, proper SD/aSD pairing of mRNAs may be the key factor in specifying host specificity of phages, in determining whether a horizontally transferred gene will function in the new genetic background of the host cell, and, ultimately, in speciation and diversification of bacterial lineages.

To facilitate the quantification of optimal positioning of SD/aSD base pairing, we adopted a model of SD/aSD interaction proposed recently (Prabhakaran ), illustrated with DtoStart as a better measure of optimal SD/aSD positioning than the conventional distance between SD and the start codon (Figure 1, A and B). DtoStart is constrained within a narrow range in both E. coli (Figure 1C) and B. subtilis (Figure 1D). This observation serves as a justification for excluding putative SD/aSD matchings lying outside of this range (see Materials and Methods section for details).
Figure 1

A model of SD sequence and aSD interactions. (A) The free 3′ end of SSU rRNA (3′TAIL) of E. coli and B. subtilis based on the predicted secondary structure of the 3′ end of the ssu rRNA of E. coli and B. subtilis from mfold 3.1, adapted from the comparative RNA web site and project (http://www.rna.icmb.utexas.edu). (B) A schematic representation of SD and aSD interaction illustrates DtoStart as a better measure for quantifying the optimal positioning of SD and aSD than the conventional distance from putative SD to start codon. SD1 or SD2, as illustrated, are equally good in positioning the start codon AUG against the anticodon of the initiation tRNA, but they differ in their distances to the start codon. DtoStart is the same for the two SDs. (C, D) DtoStart is constrained to a narrow range in E. coli (C) and B. subtilis (D); solid blue line denotes SD hits with the UCU-ending TAIL, and the dashed red line shows SD hits with the UCUA-ending TAIL. The y-axis in (C) and (D) represents the percentage of SD motif hits detected. See Materials and Methods section for details.

The difference in 3′TAIL (Figure 1A and Table 1), and in consequent species-specific compatible motifs (Table 1), between the two bacterial species suggests that selection mediated by 3′TAIL should (1) favor SDEc in E. coli and SDBs in B. subtilis, and (2) be stronger in highly expressed genes (HEGs) than in lowly expressed genes (LEGs). Here, we report results from a comprehensive genomic analysis to test these two predictions.

Materials and Methods

Retrieval of genome sequence and protein abundance data

The annotated whole genome sequences for E. coli K12 (accession number NC_000913.3) and B. subtilis 168 (accession number NC_000964.3) in GenBank format were downloaded from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov). Excluding 180 sequences annotated as pseudogenes in the E. coli genome from the analysis resulted in a final total of 4139 genes from E. coli and 4175 from B. subtilis. Protein abundance data were retrieved from PaxDB (Wang ) at www.pax-db.org. The integrated data sets were downloaded for both B. subtilis and E. coli in order to maximize coverage and consistency scores. We downloaded the paxdb-uniprot-links file relevant to the species (e.g., 224308-paxdb_uniprot.txt for B. subtilis), saved the UniProt ID (the last column) to a file (e.g., BsUniprotID.txt), and browsed to http://www.uniprot.org/uploadlists (last accessed March 7, 2017) to obtain GeneID. Under “Provide your identifiers,” we uploaded the BsUniprotID.txt file; under “Selection options,” we selected the mapping from “UniProtKB AC/ID” to “Gene name” (or GeneID), and clicked “Go”. The STRING identifiers used for each gene in the protein abundance data sets were converted into Gene IDs using UniProt’s retrieve/ID mapping tool (http://www.uniprot.org/uploadlists/) for use in subsequent analyses. The resulting mapping file has two columns: the original input UniProt IDs and the mapped gene names (or GeneIDs), which correspond to gene names or other IDs in a GenBank file. Unmapped IDs are stored in a separate file, which is also available for download.

HEGs and LEGs

Genes were delimited as HEGs or LEGs on the basis of two metrics: steady state protein abundance levels taken from PaxDB, and ITE (index of translation elongation) scores computed with DAMBE (Xia 2013) using the default reference files for E. coli and B. subtilis, which are included in the DAMBE distribution. ITE is advantageous over the codon adaptation index (CAI; Sharp and Li 1987) or its improved form (Xia 2007b) in that it takes background mutation bias into consideration (Xia 2015). DAMBE’s ITE function has four settings that differ in their treatment of synonymous codon families, and we selected the option that breaks sixfold degenerate codon families into fourfold and twofold families. For E. coli and B. subtilis, the top and bottom 10% of genes for both of these metrics were designated as HEGs and LEGs, respectively.
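The decile cut described above can be sketched in a few lines. This is only an illustration: the function name, the gene IDs, and the scores are hypothetical, and how the two metrics (protein abundance and ITE) are combined is not specified here, so the sketch handles one metric at a time.

```python
def top_bottom_fraction(scores, fraction=0.10):
    """Return (high, low) sets of gene IDs from a {gene_id: score} mapping."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    k = max(1, int(len(ranked) * fraction))                 # genes in each tail
    return set(ranked[:k]), set(ranked[-k:])

# Hypothetical scores for 20 genes; real input would be PaxDB abundance or ITE.
abundance = {f"gene{i:02d}": float(i) for i in range(20)}
heg, leg = top_bottom_fraction(abundance)
```

With 20 genes and a 10% cut, each of `heg` and `leg` holds two gene IDs.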

Genes of high translation efficiency (HTE) and low translation efficiency (LTE)

HEGs and LEGs defined as above may not be the same as HTE genes and LTE genes. HTE and LTE genes may be characterized by regressing protein abundance on mRNA abundance, so that, given genes with the same mRNA level, those producing many proteins are translated more efficiently than those producing few. The former would be HTE genes, and the latter LTE genes. This requires proteomic and transcriptomic studies carried out with similar bacterial strains, and under similar culture and growth conditions. For E. coli, we used proteomic data from Lu deposited at PaxDB (Wang ), and transcriptomic data in RPKM (reads per kilobase per million matched reads) from the wild-type strain of E. coli (BioProject PRJNA257498, Pobre and Arraiano 2015). For B. subtilis, the proteomic data are from Chi deposited in PaxDB, and transcriptomic raw counts for three wild-type replicates were downloaded from BioProject PRJNA319983 (GSM2137056 to GSM2137058) and then normalized to RPKM. These two transcriptomic studies ignored reads that match multiple paralogous genes. We reanalyzed the data with the software ARSDA for analyzing RNA-Seq data (Xia 2017), and the results are nearly identical, partly because there are relatively few paralogous genes in the two bacterial species.
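A minimal sketch of the two computations mentioned above: RPKM normalization of raw counts, and the residuals from regressing protein abundance on mRNA abundance. Several choices here are assumptions rather than details taken from the text: the regression is ordinary least squares, both variables are log10-transformed, genes with zero values are dropped, and the function names and example values are hypothetical. Genes with large positive residuals would be HTE candidates, and those with large negative residuals LTE candidates.

```python
import math

def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

def translation_efficiency_residuals(mrna, protein):
    """Residuals of log10(protein) regressed on log10(mRNA) by least squares."""
    genes = [g for g in mrna if g in protein and mrna[g] > 0 and protein[g] > 0]
    x = [math.log10(mrna[g]) for g in genes]
    y = [math.log10(protein[g]) for g in genes]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Positive residual: more protein than expected for that mRNA level.
    return {g: yi - (intercept + slope * xi) for g, xi, yi in zip(genes, x, y)}

# Hypothetical abundances for four genes (arbitrary units).
mrna = {"a": 10.0, "b": 10.0, "c": 100.0, "d": 100.0}
protein = {"a": 5.0, "b": 50.0, "c": 50.0, "d": 500.0}
residuals = translation_efficiency_residuals(mrna, protein)
```

In this toy input, genes "b" and "d" yield ten times more protein than "a" and "c" at the same mRNA level, so they receive the positive residuals.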

Identification of anti-SD and SD sequences

The 3′TAILs for B. subtilis and E. coli used in this paper were based on early empirical evidence (Shine and Dalgarno 1974; Brosius ; Gold ; Luhrmann ; Murray and Rabinowitz 1982; Band and Henner 1984; Tu ), as well as a series of chemical modification and nuclease digestion experiments that aimed to identify the sequence and secondary structure of bacterial ssu rRNAs using E. coli and Bacillus brevis (Woese ). The experimentally derived 3′TAILs for both species are compatible with their corresponding ssu rRNA secondary structure schematics from the Comparative RNA Web Site & Project at www.rna.icmb.utexas.edu, which is curated by the Gutell Lab at the University of Texas at Austin. The schematics include base pairing interactions that are predicted based on the minimum free energy (MFE) state of the structure that in turn were predicted using mfold version 3.1 (http://unafold.rna.albany.edu/?q=mfold; Zuker 2003), with the resulting free 3′ ends shown in Figure 1A. The sequence of the 3′TAIL used in our analysis for E. coli is 3′-AUUCCUCCACUAG-5′ (Shine and Dalgarno 1974; Brosius ; Gold ; Luhrmann ; Band and Henner 1984; Tu ), because, based on the E. coli SSU rRNA secondary structure (Woese ; Noah ; Yassin ; Kitahara ; Prabhakaran ), these are the 13 nt at the 3′ end of the ssu rRNA that are free to base pair with the SD sequence. There are two versions of 3′TAIL for B. subtilis: 3′-UCUUUCCUCCACUAG (Murray and Rabinowitz 1982; Band and Henner 1984), and 3′-AUCUUUCCUCCACUAG in the genomic annotation. We discussed the possibility of heterogeneous “mature” ssu rRNA pool in the Introduction.

Identification of putative SD sequences

We followed the method of Prabhakaran to identify valid SD sequences, as illustrated in Figure 1. For each gene in each species, we extracted the 30 nt upstream of the start codon and searched for matches against the 3′TAIL of the two species by using the “Analyzing 5′UTR” function in DAMBE (Xia 2013). An SD with at least four consecutive nucleotide matches, and positioned with DtoStart in the range of 10–22 nt, was considered a good SD for the E. coli translation machinery. For B. subtilis, a DtoStart range of 12–23 nt was used for the 3′UCU TAIL, or 13–24 nt for the 3′AUCU TAIL. As shown in Figure 1D, the DtoStart values for the 3′-AUCU-5′ TAIL in B. subtilis are shifted by 1 nt because this measure depends on 3′TAIL length. For this reason, taking 13–24 nt as the optimal range for the 16 nt 3′TAIL is equivalent to using 12–23 nt for the 15 nt 3′TAIL.
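The search described above can be sketched as follows. This is not DAMBE's implementation; it is a simplified illustration under stated assumptions: only Watson-Crick pairs are counted (no G-U wobble), DtoStart is taken as the number of nucleotides from the mRNA position opposite the 3′ end of the ssu rRNA up to the start codon (the exact counting convention is an assumption), and the function names and the 30-nt leader are hypothetical. The tail sequences are those given in the abstract, written 5′→3′.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def perfect_sd_template(tail_5to3):
    """mRNA sequence (5'->3') that pairs perfectly with the whole 3'TAIL."""
    return "".join(COMPLEMENT[base] for base in reversed(tail_5to3))

def find_sd_hits(upstream, tail_5to3, d_range, min_len=4):
    """Return (motif, DtoStart) for each maximal run of >= min_len matches."""
    template = perfect_sd_template(tail_5to3)
    n, m = len(upstream), len(template)
    hits = []
    for i in range(n):          # position in the upstream mRNA
        for k in range(m):      # position in the perfect-match template
            # Skip unless (i, k) is the left edge of a maximal matching run.
            if i > 0 and k > 0 and upstream[i - 1] == template[k - 1]:
                continue
            length = 0
            while (i + length < n and k + length < m
                   and upstream[i + length] == template[k + length]):
                length += 1
            if length < min_len:
                continue
            # mRNA index opposite the rRNA 3' end is i - k; count nt to the start codon.
            d_to_start = n - (i - k)
            if d_range[0] <= d_to_start <= d_range[1]:
                hits.append((upstream[i:i + length], d_to_start))
    return hits

ECOLI_TAIL = "GAUCACCUCCUUA"       # 5'->3', from the abstract
BSUB_TAIL = "GAUCACCUCCUUUCUA"     # 5'->3', the 3'AUCU variant

upstream = "C" * 14 + "UAAGGAGG" + "C" * 8   # hypothetical 30-nt leader
ecoli_hits = find_sd_hits(upstream, ECOLI_TAIL, d_range=(10, 22))
bsub_hits = find_sd_hits(upstream, BSUB_TAIL, d_range=(13, 24))
```

For this leader, the E. coli tail pairs with the full UAAGGAGG, whereas the 16-nt B. subtilis tail pairs only with AAGGAGG and, under this convention, reports a DtoStart 3 nt larger, the same offset seen between the E. coli and B. subtilis columns of Table 6.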

Data availability

All data used to generate the results are available upon request. The software DAMBE, for characterizing SD sequences and computing the index of translation elongation (ITE), and the software ARSDA, for characterizing gene expression, are available free at http://dambe.bio.uottawa.ca/Include/software.aspx.

Results and Discussion

E. coli has 4323 protein-coding genes (CDSs), with 180 annotated as pseudogenes in the genome and excluded from the analysis, resulting in 4144 functional CDSs. B. subtilis has 4175 CDSs with none annotated as pseudogenes. The genomic nucleotide frequencies are 0.2462, 0.2542, 0.2537, and 0.2459, respectively for A, C, G, and T in E. coli. The corresponding values in B. subtilis are 0.2818, 0.2181, 0.2171, and 0.2830, respectively.

SDEc and SDBs are used more in E. coli and B. subtilis, respectively

As expected, SDEc are much more frequent in E. coli than in B. subtilis, with 455 in E. coli, in contrast to 267 in B. subtilis (Table 2). The difference is highly significant, either against the null hypothesis of equal frequencies (χ² = 48.9529, P < 0.0001), against the expected value based on the relative number of CDSs (χ² = 50.3648, P < 0.0001; the slightly increased χ² is because E. coli has slightly fewer included CDSs than B. subtilis), or against the expected values based on both relative number of CDSs and genomic nucleotide frequencies (e.g., the expected frequency of AGAA is proportional to P_A^3 P_G, that of AGAAA to P_A^4 P_G, and so on, where P_X is the genomic frequency of nucleotide X in either E. coli or B. subtilis), with χ² = 103.07, P < 0.0001.
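The first two tests above are one-degree-of-freedom goodness-of-fit tests and can be checked with a short sketch. The function names are hypothetical; the CDS counts (4144 and 4175) are those given at the start of Results and Discussion, and the P value uses the standard identity for a χ² distribution with 1 d.f., P = erfc(√(χ²/2)).

```python
import math

def chi_square_gof(observed, expected_weights):
    """Pearson chi-square of observed counts against proportional weights."""
    total = sum(observed)
    weight_sum = sum(expected_weights)
    expected = [total * w / weight_sum for w in expected_weights]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df1(stat):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

sdec_hits = [455, 267]                                  # E. coli, B. subtilis (Table 2)
chi2_equal = chi_square_gof(sdec_hits, [1, 1])          # null: equal frequencies
chi2_cds = chi_square_gof(sdec_hits, [4144, 4175])      # null: proportional to CDS counts
print(round(chi2_equal, 4), round(chi2_cds, 4), p_value_df1(chi2_equal))
```

Both statistics should come out close to the reported 48.9529 and 50.3648. The third test additionally weights each motif by its expected frequency from genomic nucleotide composition and is not reproduced here.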
Table 2

Number of SDEc hits (N) and their proportion (Prop) in E. coli and B. subtilis genes

| SDEc motif | E. coli N | E. coli Prop | B. subtilis N | B. subtilis Prop |
| --- | --- | --- | --- | --- |
| UAAG | 85 | 0.0205 | 15 | 0.0036 |
| UAAGG | 91 | 0.0220 | 54 | 0.0129 |
| UAAGGA | 151 | 0.0365 | 30 | 0.0072 |
| UAAGGAG | 117 | 0.0283 | 74 | 0.0177 |
| UAAGGAGG | 10 | 0.0024 | 74 | 0.0177 |
| UAAGGAGGU | 0 | 0 | 14 | 0.0033 |
| UAAGGAGGUG | 1 | 0.0002 | 6 | 0.0014 |
| Total | 455 | 0.1099 | 267 | 0.0640 |

SDEc, SDs that pair perfectly with the 3′ end of small subunit rRNA from E. coli, but not from B. subtilis.

The relative abundance of different SDs depends on selection favoring an optimal SD length, and on mutations disrupting long SDs. In E. coli, the optimal SD length is six (Vimberg ). B. subtilis favors longer SDs. In an experiment with B. subtilis using SD lengths of 5, 6, 7, and 12, longer SDs consistently produced more protein than shorter ones (Band and Henner 1984). This is consistent with the results presented in Table 2, where UAAG is expected to be strongly selected against in B. subtilis because it can form only 3 bp with the B. subtilis 3′TAIL. However, longer SDEc are not selected against, because an SDEc such as UAAGGAGG can still form 7 bp (all but the first U) with the B. subtilis 3′TAIL.

Also as expected, SDBs are more frequent in B. subtilis than in E. coli, with 1203 SDBs in B. subtilis in contrast to 576 in E. coli (Table 3). The difference is also highly significant (P < 0.0001) using the same tests as for the SDEc results in Table 2. However, one interesting deviation from the SDEc data is that SDBs of length 4 exhibit the opposite pattern, being more frequent in E. coli than in B. subtilis (Table 3, which assumes a 3′UCU-ending in the B. subtilis 3′TAIL). The pattern is the same with the 3′AUCU-ending of the 3′TAIL (Table S1). This observation can be explained by stronger selection against short SD/aSD in B. subtilis than in E. coli. Translation efficiency increases with longer and more stringent SD/aSD binding, and such dependence is much stronger in B. subtilis than in E. coli (Band and Henner 1984). The predicted free energy of SD/aSD for an average B. subtilis message is at least 6 kcal/mol more than that of an average SD/aSD in E. coli (Hager and Rabinowitz 1985). Thus, a short SD is expected to be selected against, and, consequently, rare in B. subtilis, consistent with our results (Table 3), showing that longer SDBs (5–8 nt) are more frequent in B. subtilis than in E. coli.
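Table 3

Number of SDBs hits (N) and their proportion (Prop) in all Bacillus subtilis and Escherichia coli genes considering UCU as the 3′TAIL

| SDBs motif | B. subtilis N | B. subtilis Prop | E. coli N | E. coli Prop |
| --- | --- | --- | --- | --- |
| AGAA | 12 | 0.0029 | 51 | 0.0123 |
| AGAAA | 66 | 0.0158 | 60 | 0.0145 |
| AGAAAG | 60 | 0.0144 | 14 | 0.0034 |
| AGAAAGG | 54 | 0.0129 | 7 | 0.0017 |
| AGAAAGGA | 60 | 0.0144 | 6 | 0.0014 |
| AGAAAGGAG | 28 | 0.0067 | 4 | 0.0010 |
| AGAAAGGAGG | 11 | 0.0026 | 1 | 0.0002 |
| AGAAAGGAGGU | 1 | 0.0002 | 0 | 0 |
| Subtotal | 292 | 0.0699 | 143 | 0.0345 |
| GAAA | 16 | 0.0038 | 65 | 0.0157 |
| GAAAG | 41 | 0.0098 | 28 | 0.0068 |
| GAAAGG | 68 | 0.0163 | 18 | 0.0043 |
| GAAAGGA | 51 | 0.0122 | 15 | 0.0036 |
| GAAAGGAG | 57 | 0.0137 | 10 | 0.0024 |
| GAAAGGAGG | 18 | 0.0043 | 1 | 0.0002 |
| GAAAGGAGGU | 3 | 0.0007 | 0 | 0 |
| GAAAGGAGGUG | 1 | 0.0002 | 0 | 0 |
| GAAAGGAGGUGA | 1 | 0.0002 | 0 | 0 |
| Subtotal | 240 | 0.0575 | 137 | 0.0331 |
| AAAG | 19 | 0.0046 | 38 | 0.0092 |
| AAAGG | 171 | 0.0410 | 83 | 0.0200 |
| AAAGGA | 76 | 0.0182 | 101 | 0.0244 |
| AAAGGAG | 222 | 0.0532 | 64 | 0.0155 |
| AAAGGAGG | 143 | 0.0343 | 6 | 0.0014 |
| AAAGGAGGU | 31 | 0.0074 | 3 | 0.0007 |
| AAAGGAGGUG | 6 | 0.0014 | 0 | 0 |
| AAAGGAGGUGA | 3 | 0.0007 | 1 | 0.0002 |
| Subtotal | 671 | 0.1607 | 296 | 0.0715 |
| Total | 1203 | 0.2881 | 576 | 0.1391 |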

Highly expressed genes tend to have longer SDs

In addition to the observed difference in SD length between E. coli and B. subtilis (Figure 2 and Table 3; B. subtilis SDs tend to be longer than E. coli SDs), there is also a clear difference between HEGs and LEGs, and between genes of HTE and of LTE. Although SDs of length four are the most frequent in E. coli, longer SDs are relatively more represented in HTE genes than in LTE genes (Figure 2A). This is consistent with previous experimental studies demonstrating an optimal SD length of six (Schurr ; Komarova ; Vimberg ). Optimal SDs in B. subtilis are even longer (Band and Henner 1984) than in E. coli (Figure 2). We thus expect HEGs or HTE genes to have relatively longer SDs than LEGs or LTE genes, especially in B. subtilis. Our empirical results (Figure 2) strongly support this expectation. Short SDs are overrepresented in LEGs and LTE genes, and longer SDs are overrepresented in HEGs and HTE genes in both E. coli and B. subtilis, but more so in B. subtilis (Figure 2). This pattern (i.e., association of long SDs with HEGs and HTE genes) is highly significant for B. subtilis (chi-square = 12.0375, d.f. = 1, P-value = 0.0005214) when tested by the Cochran-Armitage test (Agresti 2002, pp. 181–182) for contingency tables with a linear trend as implemented in the coin package in R (Hothorn , 2008). The result for E. coli, while consistent with the expectation, is not significant at the 0.05 level (chi-square = 3.3948, d.f. = 1, P-value = 0.0654).
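For readers who want to see what a trend test of this kind computes, here is a minimal sketch of the textbook Cochran-Armitage statistic for a 2 × k table with ordered columns. It is not the coin implementation, whose variance convention may differ slightly, and the counts below are hypothetical rather than the data behind Figure 2. Equally spaced scores (0, 1, 2, ...) for increasing SD length are an assumption.

```python
import math

def cochran_armitage(group1, group2, scores=None):
    """Chi-square (1 d.f.) and P value for a linear trend in a 2 x k table."""
    k = len(group1)
    if scores is None:
        scores = list(range(k))                 # equally spaced column scores
    r1, r2 = sum(group1), sum(group2)           # row totals
    n = r1 + r2
    cols = [a + b for a, b in zip(group1, group2)]
    # Trend statistic: score-weighted departure from proportional allocation.
    t = sum(s * (a * r2 - b * r1) for s, a, b in zip(scores, group1, group2))
    s1 = sum(s * c for s, c in zip(scores, cols))
    s2 = sum(s * s * c for s, c in zip(scores, cols))
    variance = (r1 * r2 / n) * (n * s2 - s1 ** 2)
    chi2 = t * t / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical SD counts by length (4, 5, 6, 7, 8 nt) in HTE and LTE genes.
hte_counts = [40, 45, 50, 35, 30]
lte_counts = [70, 50, 40, 25, 15]
chi2_trend, p_trend = cochran_armitage(hte_counts, lte_counts)
```

With only two columns the statistic reduces to the ordinary Pearson chi-square for a 2 × 2 table, which is a convenient sanity check.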
Figure 2

Distribution of SDs from 200 HTE genes and 200 LTE genes over SD length for E. coli (A) and B. subtilis (B). Classifying genes into HEGs and LEGs generates equivalent results, with HEGs similar to HTE genes, and LEGs similar to LTE genes. HEGs and HTE genes tend to have longer SDs than LEGs and LTE genes.


Differential usage of SDEc and SDBs in HEGs and LEGs

SDEc is used more frequently in HEGs than in LEGs in E. coli (Table 4). In contrast, SDBs is used mainly in LEGs in B. subtilis (Table 5), prompting the question of what SDs are used by B. subtilis HEGs, and whether the core aSD region (against which most HEGs have an SD to pair) for B. subtilis HEGs includes the trailing 3′UCU (or 3′AUCU). The pattern is similar when contrasting HTE genes with LTE genes (results not shown). The core aSD region is centered at CCUCC in the overwhelming majority of surveyed prokaryotes (Ma ; Nakagawa ; Lim ). If B. subtilis has the same core aSD region, then the trailing 3′UCU (or 3′AUCU) will be used rarely, and consequently few SDBs will pair with it. The distribution of SDs in E. coli and B. subtilis is consistent with this interpretation (Figure 3). SDs overrepresented in HEGs relative to LEGs use exclusively 3′AUUCCUCCA as the core aSD region in E. coli, and 3′UUCCUCCA as the core aSD region in B. subtilis (Figure 3). The trailing 3′UCU (or 3′AUCU) is used as part of the aSD mainly by LEGs in B. subtilis.
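Table 4

Number of SDEc hits (N) and their proportion (Prop) in HEGs and LEGs

| SDEc motif | E. coli HEGs N | E. coli HEGs Prop | E. coli LEGs N | E. coli LEGs Prop | B. subtilis HEGs N | B. subtilis HEGs Prop | B. subtilis LEGs N | B. subtilis LEGs Prop |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UAAG | 22 | 0.0053 | 7 | 0.0017 | 1 | 0.0002 | 3 | 0.0007 |
| UAAGG | 32 | 0.0077 | 6 | 0.0014 | 4 | 0.0010 | 3 | 0.0007 |
| UAAGGA | 36 | 0.0087 | 20 | 0.0048 | 3 | 0.0007 | 0 | 0 |
| UAAGGAG | 40 | 0.0097 | 12 | 0.0029 | 9 | 0.0022 | 10 | 0.0024 |
| UAAGGAGG | 2 | 0.0005 | 1 | 0.0002 | 14 | 0.0034 | 2 | 0.0005 |
| UAAGGAGGU | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.0002 |
| UAAGGAGGUG | 0 | 0 | 0 | 0 | 4 | 0.0010 | 0 | 0 |
| Total | 132 | 0.0319 | 46 | 0.0111 | 35 | 0.0084 | 19 | 0.0046 |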
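Table 5

Number of SDBs hits (N) and their proportion (Prop) in highly and lowly expressed genes

| SDBs motif | B. subtilis HEGs N | B. subtilis HEGs Prop | B. subtilis LEGs N | B. subtilis LEGs Prop | E. coli HEGs N | E. coli HEGs Prop | E. coli LEGs N | E. coli LEGs Prop |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AGAA | 0 | 0 | 2 | 0.0005 | 3 | 0.0007 | 3 | 0.0007 |
| AGAAA | 2 | 0.0005 | 8 | 0.0019 | 7 | 0.0017 | 9 | 0.0022 |
| AGAAAG | 6 | 0.0014 | 4 | 0.0010 | 1 | 0.0002 | 1 | 0.0002 |
| AGAAAGG | 3 | 0.0007 | 6 | 0.0014 | 1 | 0.0002 | 0 | 0 |
| AGAAAGGA | 4 | 0.0010 | 2 | 0.0005 | 2 | 0.0005 | 0 | 0 |
| AGAAAGGAG | 2 | 0.0005 | 3 | 0.0007 | 1 | 0.0002 | 0 | 0 |
| AGAAAGGAGG | 1 | 0.0002 | 2 | 0.0005 | 0 | 0 | 0 | 0 |
| AGAAAGGAGGU | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Subtotal | 18 | 0.0043 | 27 | 0.0065 | 15 | 0.0036 | 13 | 0.0031 |
| GAAA | 0 | 0 | 2 | 0.0005 | 5 | 0.0012 | 10 | 0.0024 |
| GAAAG | 2 | 0.0005 | 7 | 0.0017 | 3 | 0.0007 | 1 | 0.0002 |
| GAAAGG | 3 | 0.0007 | 11 | 0.0026 | 0 | 0 | 0 | 0 |
| GAAAGGA | 4 | 0.0010 | 5 | 0.0012 | 5 | 0.0012 | 0 | 0 |
| GAAAGGAG | 2 | 0.0005 | 6 | 0.0014 | 1 | 0.0002 | 1 | 0.0002 |
| GAAAGGAGG | 2 | 0.0005 | 2 | 0.0005 | 0 | 0 | 0 | 0 |
| GAAAGGAGGU | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| GAAAGGAGGUG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| GAAAGGAGGUGA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Subtotal | 13 | 0.0031 | 33 | 0.0074 | 14 | 0.0034 | 12 | 0.0029 |
| AAAG | 1 | 0.0002 | 4 | 0.0010 | 2 | 0.0005 | 2 | 0.0005 |
| AAAGG | 8 | 0.0019 | 20 | 0.0048 | 7 | 0.0017 | 12 | 0.0029 |
| AAAGGA | 5 | 0.0012 | 10 | 0.0024 | 10 | 0.0024 | 9 | 0.0022 |
| AAAGGAG | 17 | 0.0041 | 26 | 0.0062 | 7 | 0.0017 | 7 | 0.0017 |
| AAAGGAGG | 14 | 0.0033 | 21 | 0.0050 | 1 | 0.0002 | 0 | 0 |
| AAAGGAGGU | 2 | 0.0005 | 1 | 0.0002 | 1 | 0.0002 | 0 | 0 |
| AAAGGAGGUG | 1 | 0.0002 | 0 | 0 | 0 | 0 | 0 | 0 |
| AAAGGAGGUGA | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.0002 |
| Subtotal | 48 | 0.0115 | 82 | 0.0196 | 28 | 0.0068 | 31 | 0.0075 |
| Total | 79 | 0.0189 | 142 | 0.0335 | 57 | 0.0138 | 56 | 0.0135 |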
Figure 3

Distribution of E. coli and B. subtilis SDs for HEGs and LEGs. SDs that are more frequent in HEGs than LEGs match the core aSD (in bold red) of 16S rRNA. The trailing 3′ nucleotides in B. subtilis are used mainly for SD/aSD pairing in LEGs. Classifying genes into genes of HTE and LTE generates similar results.

The mature ssu rRNA pool may be heterogeneous in B. subtilis. A number of 3′→5′ exoribonucleases, such as RNases II, R, and PH, as well as PNPase, participate in maturation of the 3′TAIL of ssu rRNA (Sulthana and Deutscher 2013), and nuclease YbeY has also been shown recently to participate in the 3′ end maturation of ssu rRNA (Davies ; Jacob ). The continuous 3′→5′ digestion implies that the 3′AUCU end will become 3′UCU, 3′CU, and so on. It would make sense for HEGs to use SDs paired with the less volatile part of the 3′TAIL of ssu rRNA (Table 5).

Figure 3, Table 4, and Table 5 suggest that many HEGs in E. coli use the species-specific SDEc and will experience translation initiation problems when translated by the B. subtilis translation machinery. In contrast, most HEGs in B. subtilis do not use the species-specific SDBs, and will have no translation initiation problems when translated by the E. coli translation machinery. Early studies have suggested a more permissive translation machinery in E. coli than in B. subtilis, i.e., most E. coli mRNAs cannot be efficiently translated in B. subtilis (McLaughlin ,b) but most B. subtilis mRNAs can be efficiently translated in E. coli (Stallcup ). This discrepancy in translation permissibility is often attributed to the six-domain, highly conserved RPS1, which is present in gram-negative bacteria (Subramanian 1983) but absent in gram-positive bacteria with translation specificity (Roberts and Rabinowitz 1989). Our results (Figure 3, Table 4, and Table 5) suggest an alternative explanation for the discrepancy. Because these early studies often involve HEGs, and because E. coli HEGs often use species-specific SDEc (Table 4) whereas B. subtilis HEGs rarely use species-specific SDBs, it is not surprising that E. coli HEG messages tend to fail in translation initiation in B. subtilis, but B. subtilis HEG messages tend to have no problem in translation initiation in E. coli.

Species-specific SD and host specificity

One rare exception to the general observation that E. coli possesses a more permissive translation machinery than B. subtilis is gene 6 (gp6) of the B. subtilis phage ϕ29, which can be translated efficiently in B. subtilis but not in E. coli (Vellanoweth and Rabinowitz 1992). Among the 16 nonhypothetical genes in phage ϕ29, gp6 is the only one that uses a species-specific SDBs (UAGAAAG) exclusively (Table 6). This SD uses all four nucleotides at the 3′TAIL of B. subtilis, and consequently cannot form SD/aSD pairing in E. coli (Table 6). Other genes, such as gp7 and gp8, have two alternative SDs, one being the species-specific SDBs and the other an SD that can form SD/aSD binding in E. coli (Table 6). Because gp6 is an essential gene, its use of an SDBs may explain the host specificity of the phage. That is, even if the phage gains entry into an E. coli-like host, it will not be able to survive and reproduce successfully.
Table 6

SD/aSD binding of nonhypothetical genes in B. subtilis phage φ29 in E. coli and B. subtilis

| Gene | E. coli DtoStart (a) | E. coli SD | B. subtilis DtoStart (b) | B. subtilis SD |
|---|---|---|---|---|
| gp2 | 14 | AAGGA | 17 | AAAGGA |
| gp3 | 17 | AAGGAG | 20 | GAAAGGAG |
| gp4 | 18 | AGGAGGU | 21 | AGGAGGU |
| gp5 | 15 | AAGGA | 18 | AAAGGA |
| gp6 | | | 19 | UAGAAAG |
| gp7 | 16 | GAGGUGA | 18, 19 | UAGAAAG, GAGGUGA |
| gp8 | 18 | GAGGU | 21, 21 | AGAAA, GAGGU |
| gp8.5 | 20 | GGAGGUG | 23 | GGAGGUG |
| gp9 | 16, 19 | UAAGG, AGGUG | 22 | AGGUG |
| gp10 | 15 | GAGGUGA | 18 | GAGGUGA |
| gp11 | 16 | GGUGA | 19 | GGUGA |
| gp12 | 15 | UAAGGAGG | 18 | AAGGAGG |
| gp13 | 17 | GAGGU | 20 | GAGGU |
| gp14 | 17 | AAGGAG | 20 | AAAGGAG |
| gp15 | 17 | UAAGGAGG | 20 | AAGGAGG |
| gp16 | 16 | GAGGUG | 19 | GAGGUG |

Gene gp6, which uses a species-specific SDBs, cannot form a well-positioned SD/aSD in E. coli to be translated efficiently.

(a) The optimal DtoStart is within the range of 10–21 in E. coli.

(b) 3′AUCUUUCCUCCACUAG is used as 3′TAIL for B. subtilis, with the optimal DtoStart within the range of 15–25.
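The positional criterion in the footnotes can be sketched as a simple range check. The ranges are those stated above; treating both bounds as inclusive is an assumption, and the names `well_positioned` and `hosts_with_well_positioned_sd` are hypothetical. Only three rows of Table 6 are transcribed, for illustration.

```python
# Optimal DtoStart ranges from the Table 6 footnotes; treating both bounds as
# inclusive is an assumption.
OPTIMAL_DTOSTART = {"E. coli": (10, 21), "B. subtilis": (15, 25)}

# Three rows transcribed from Table 6: gene -> species -> DtoStart of each SD.
TABLE6_DTOSTART = {
    "gp6": {"E. coli": [], "B. subtilis": [19]},
    "gp7": {"E. coli": [16], "B. subtilis": [18, 19]},
    "gp9": {"E. coli": [16, 19], "B. subtilis": [22]},
}


def well_positioned(dtostart: int, species: str) -> bool:
    """True if DtoStart falls inside the optimal range for the species."""
    low, high = OPTIMAL_DTOSTART[species]
    return low <= dtostart <= high


def hosts_with_well_positioned_sd(gene: str) -> list:
    """Species in which the gene has at least one well-positioned SD."""
    return [
        species
        for species, distances in TABLE6_DTOSTART[gene].items()
        if any(well_positioned(d, species) for d in distances)
    ]


for gene in TABLE6_DTOSTART:
    print(gene, hosts_with_well_positioned_sd(gene))
```

In this sketch gp6 is the only one of the three genes with no well-positioned SD in E. coli, because no SD/aSD pairing was found for it there at all.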

Another case of host specificity that may be explained by SD/aSD binding is E. coli phage PRD1, which has codon usage deviating greatly from that of its host, in contrast to the overwhelming majority of E. coli phages, whose codon usage exhibits high concordance with that of the host (Chithambaram ). Phage PRD1 belongs to the peculiar Tectiviridae family whose other members, i.e., phages PR3, PR4, PR5, L17, and PR772, parasitize gram-positive bacteria. Phage PRD1 is the only species in the family known to parasitize a variety of gram-negative bacteria, including Salmonella, Pseudomonas, Escherichia, Proteus, Vibrio, Acinetobacter, and Serratia species (Bamford ; Grahn ). Phage PRD1 is extremely similar to its sister lineages that parasitize gram-positive bacteria; there is only one amino acid difference in the coat protein between PRD1 and PR4 (Bamford ). It is thus quite likely that the ancestor of phage PRD1 parasitized gram-positive bacteria. The lineage leading to phage PRD1 may have switched to gram-negative bacterial hosts only recently, and thus still has codon usage similar to that of its ancestral gram-positive bacterial host, which is indeed the case (Chithambaram ). However, one nonhypothetical gene in phage PRD1 (PRD1_09) has evolved an E. coli-specific SD (UAAG), and does not have an alternative SD that can form a well-positioned SD/aSD with the B. subtilis 3′TAIL. This may have contributed to the host limitation of phage PRD1 within E. coli-like species.

The study of coevolution between SD and aSD sequences would be facilitated if the 3′TAILs of many bacterial species were characterized experimentally, and if these 3′TAILs differed substantially among lineages. At present, strong experimental evidence is available for the 3′TAIL of E. coli and B. subtilis (except for the uncertainty on whether the B. subtilis 3′TAIL ends with 3′UCU or 3′AUCU). However, RNA-Seq data may become available for many bacterial species in the near future, and should pave the way for rapid characterization of the 3′TAIL of different species by simply mapping the sequence reads to ssu rRNA genes on the genome.

One problem to be aware of is that most transcriptomic studies use an rRNA removal kit to remove the large rRNAs, i.e., 16S and 23S rRNA, in bacteria, because otherwise sequence reads from these large rRNAs would dominate the RNA-Seq data. There are two main types of rRNA removal kits on the market. The first type, represented by the RiboMinus Kit from Invitrogen and the MICROBExpress Bacterial mRNA Enrichment Kit (formerly Ambion, now Invitrogen), uses two probes located within the conserved sequence regions at each end of the 16S and 23S rRNAs. Full-length or partial rRNA that pairs with these probes is removed. This implies that such RNA-Seq data will lack reads mapped to the 5′ or 3′ ends of ssu rRNAs. The second type is represented by the Ribo-Zero Kit from Epicentre (an Illumina company). This kit removes rRNA across the entire length and does not specifically target the 5′ and 3′ ends. We used ARSDA (Xia 2017) to confirm that transcriptomic studies using this rRNA removal kit have reads that map to the 3′ end of ssu rRNA.
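The idea of locating the mature 3′ end by mapping reads to the ssu rRNA gene can be sketched with a toy example. This is not the ARSDA procedure: it uses exact string matching instead of a real aligner, assumes that read ends reflect RNA ends, and runs on made-up reads and a made-up downstream flank. The function name `tally_3prime_ends` is hypothetical.

```python
from collections import Counter


def tally_3prime_ends(reads, reference):
    """Count how many exactly matching reads end at each reference position."""
    ends = Counter()
    for read in reads:
        start = reference.find(read)  # first exact match, or -1 if none
        if start != -1:
            # start + len(read) is the 1-based position of the read's last base.
            ends[start + len(read)] += 1
    return ends


# Toy reference: the reported B. subtilis 3' region (GAUCACCUCCUUUCUA) with an
# invented flank on each side. All reads are invented for illustration.
reference = "CG" + "GAUCACCUCCUUUCUA" + "AGGA"
reads = [
    "GAUCACCUCCUUUCU",
    "CACCUCCUUUCU",
    "GAUCACCUCCUUUCUA",
    "GAUCACCUCCUUUCU",
    "GGGGGGGG",  # does not map and is ignored
]

ends = tally_3prime_ends(reads, reference)
best_end, support = ends.most_common(1)[0]
print(reference[:best_end][-4:], support)  # last four bases (5'->3') at the commonest end
```

In a real analysis the reads would be aligned with a dedicated mapper, and the distribution of end positions, not only its mode, would indicate whether the mature pool is heterogeneous.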

Supplementary Material

Supplemental material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.117.039305/-/DC1.
  69 in total

1.  Correlations between Shine-Dalgarno sequences and gene features such as predicted expression levels and operon structures.

Authors:  Jiong Ma; Allan Campbell; Samuel Karlin
Journal:  J Bacteriol       Date:  2002-10       Impact factor: 3.490

2.  A detailed analysis of 16S ribosomal RNA gene segments for the diagnosis of pathogenic bacteria.

Authors:  Soumitesh Chakravorty; Danica Helb; Michele Burday; Nancy Connell; David Alland
Journal:  J Microbiol Methods       Date:  2007-02-22       Impact factor: 2.363

3.  Novel essential gene Involved in 16S rRNA processing in Escherichia coli.

Authors:  Tatsuaki Kurata; Shinobu Nakanishi; Masayuki Hashimoto; Masato Taoka; Yukiko Yamazaki; Toshiaki Isobe; Jun-Ichi Kato
Journal:  J Mol Biol       Date:  2014-12-26       Impact factor: 5.469

4.  Complete nucleotide sequence of a 16S ribosomal RNA gene from Escherichia coli.

Authors:  J Brosius; M L Palmer; P J Kennedy; H F Noller
Journal:  Proc Natl Acad Sci U S A       Date:  1978-10       Impact factor: 11.205

5.  The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications.

Authors:  P M Sharp; W H Li
Journal:  Nucleic Acids Res       Date:  1987-02-11       Impact factor: 16.971

Review 6.  Translational initiation in prokaryotes.

Authors:  L Gold; D Pribnow; T Schneider; S Shinedling; B S Singer; G Stormo
Journal:  Annu Rev Microbiol       Date:  1981       Impact factor: 15.500

7.  Role of Escherichia coli YbeY, a highly conserved protein, in rRNA processing.

Authors:  Bryan W Davies; Caroline Köhrer; Asha I Jacob; Lyle A Simmons; Jianyu Zhu; Lourdes M Aleman; Uttam L Rajbhandary; Graham C Walker
Journal:  Mol Microbiol       Date:  2010-09-16       Impact factor: 3.501

8.  The effect of Escherichia coli ribosomal protein S1 on the translational specificity of bacterial ribosomes.

Authors:  M W Roberts; J C Rabinowitz
Journal:  J Biol Chem       Date:  1989-02-05       Impact factor: 5.157

9.  Structure of the Bacillus subtilis 70S ribosome reveals the basis for species-specific stalling.

Authors:  Daniel Sohmen; Shinobu Chiba; Naomi Shimokawa-Chiba; C Axel Innis; Otto Berninghausen; Roland Beckmann; Koreaki Ito; Daniel N Wilson
Journal:  Nat Commun       Date:  2015-04-23       Impact factor: 14.919

10.  The +4G site in Kozak consensus is not related to the efficiency of translation initiation.

Authors:  Xuhua Xia
Journal:  PLoS One       Date:  2007-02-07       Impact factor: 3.240
