Uri Laserson, Hin Hark Gan, Tamar Schlick.
Abstract
Riboswitches and RNA interference are important emerging mechanisms found in many organisms to control gene expression. To enhance our understanding of such RNA roles, finding small regulatory motifs in genomes presents a challenge on a wide scale. Many simple functional RNA motifs have been found by in vitro selection experiments, which produce synthetic target-binding aptamers as well as catalytic RNAs, including the hammerhead ribozyme. Motivated by the prediction of Piganeau and Schroeder [(2003) Chem. Biol., 10, 103-104] that synthetic RNAs may have natural counterparts, we develop and apply an efficient computational protocol for identifying aptamer-like motifs in genomes. We define motifs from the sequence and structural information of synthetic aptamers, search for sequences in genomes that will produce motif matches, and then evaluate the structural stability and statistical significance of the potential hits. Our application to aptamers for streptomycin, chloramphenicol, neomycin B and ATP identifies 37 candidate sequences (in coding and non-coding regions) that fold to the target aptamer structures in bacterial and archaeal genomes. Further energetic screening reveals that several candidates exhibit energetic properties and sequence conservation patterns that are characteristic of functional motifs. Besides providing candidates for experimental testing, our computational protocol offers an avenue for expanding natural RNA's functional repertoire.
Year: 2005 PMID: 16254081 PMCID: PMC1270951 DOI: 10.1093/nar/gki911
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The hydrogen-bonding scheme of the streptomycin-binding aptamer from the crystal structure [adapted from (32)]. The dotted lines represent hydrogen bonds between the streptomycin molecule (center) and the nucleotides in the binding pocket. Information about binding specificity is critical for creating accurate aptamer motif descriptors.
Figure 2. The four experimental aptamers (left side of each set), constructed motif descriptors (middle) and candidate natural aptamers (right). Regions with many N's are variable in length. The nucleic acid base symbols are defined as follows: N = any base; V = A, C, or G; S = C or G. We search the genomes for any sequence that fits the descriptor consensus (allowing for slight mismatches). The experimental papers associated with the aptamers are (a) ATP (38), (b) chloramphenicol (41), (c) neomycin B (42,43) and (d) streptomycin (37).
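The sequence part of such a descriptor search can be sketched in a few lines. This is a minimal illustration only, not the RNAMotif program used in the study: it ignores base pairing and variable-length regions, and the descriptor string, function names and mismatch threshold are hypothetical.

```python
# Base symbols used in the descriptors (N, V, S) plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "N": "ACGU",   # any base
    "V": "ACG",    # A, C, or G
    "S": "CG",     # C or G
}


def count_mismatches(descriptor, window):
    """Number of positions where the window violates the descriptor."""
    return sum(base not in IUPAC[code] for code, base in zip(descriptor, window))


def find_matches(descriptor, genome, max_mismatches=1):
    """Yield (start, window, mismatches) for each fixed-length window that fits."""
    genome = genome.upper().replace("T", "U")  # treat DNA input as RNA
    width = len(descriptor)
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        mismatches = count_mismatches(descriptor, window)
        if mismatches <= max_mismatches:
            yield start, window, mismatches


# Hypothetical descriptor and toy sequence, for illustration only.
hits = list(find_matches("GSNNVA", "ttGCAUGAccGGUUAAgg", max_mismatches=0))
```

With `max_mismatches=0` the toy sequence yields two exact hits; raising the threshold admits windows that violate the consensus at a few positions, which is the sense of "slight mismatches" assumed here.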
The number of matches to the antibiotic-binding aptamer descriptors in bacterial genomes (‘Obs.’) and the expected number of matches calculated using random sequences with uniformly distributed nucleotides (‘Exp.’)
| Genome (NCBI accession number) | Size (Mb) | Chloramphenicol Obs. | Chloramphenicol Exp. | Streptomycin Obs. | Streptomycin Exp. | Neomycin B Obs. | Neomycin B Exp. | Total Obs. |
|---|---|---|---|---|---|---|---|---|
| | 9.2 | 0 | 3.1 | 2 | 0.7 | 6 | 19.8 | 8 |
| | 8.8 | 0 | 2.9 | 1 | 0.7 | 1 | 18.9 | 2 |
| | 4.7 | 7 | 1.6 | 1 | 0.4 | 11 | 10.1 | 19 |
| | 5.6 | 7 | 1.9 | 1 | 0.4 | 17 | 12.0 | 25 |
| | 5.6 | 7 | 1.9 | 1 | 0.4 | 19 | 12.0 | 27 |
| | 5.3 | 7 | 1.8 | 0 | 0.4 | 17 | 11.4 | 24 |
| | 2.3 | 1 | 0.8 | 0 | 0.2 | 9 | 4.9 | 10 |
| | 2.2 | 5 | 0.7 | 0 | 0.2 | 9 | 4.7 | 14 |
| | 3.7 | 1 | 1.2 | 1 | 0.3 | 3 | 7.9 | 5 |
| | 1.1 | 4 | 0.4 | 0 | 0.1 | 1 | 2.4 | 5 |
| Total | | 39 | | 7 | | 93 | | 139 |
The motif matches are generated by the RNAMotif program without any filtering.
The number of matches to the antibiotic-binding aptamer descriptors in archaeal genomes (‘Obs.’) along with the expected number of matches calculated using random sequences with uniformly distributed nucleotides (‘Exp.’)
| Genome (NCBI accession number) | Size (Mb) | ATP Obs. | ATP Exp. |
|---|---|---|---|
| | 1.7 | 6 | 1.6 |
| | 2.2 | 6 | 2.1 |
| | 2.6 | 4 | 2.5 |
| | 1.8 | 6 | 1.7 |
| | 1.7 | 4 | 1.6 |
| | 1.7 | 3 | 1.6 |
| | 5.8 | 12 | 5.5 |
| | 4.1 | 5 | 3.9 |
| | 2.3 | 2 | 2.2 |
| | 1.8 | 3 | 1.7 |
| | 1.9 | 8 | 1.8 |
| | 1.8 | 3 | 1.7 |
| | 3.0 | 2 | 2.8 |
| | 2.7 | 4 | 2.5 |
| | 1.6 | 2 | 1.5 |
| | 1.6 | 2 | 1.5 |
| Total | | 72 | |
No additional filtering is applied to the motif matches found by the RNAMotif program.
Table 3. The candidate natural aptamer sequences that fold to the target structures
| Sequence | Genome | Location | Gene | ΔG | Tm | ΔG–Tm test | V | ΔG–V test |
|---|---|---|---|---|---|---|---|---|
| Chloramphenicol | | | | | | | | |
| 1 | E.coli K12 | 1 172 203 | Mfd | −11.90 | 78.0 | − | 1.09 | − |
| 2 | E.coli O157:H7 | 1 532 105 | ECs1492 | | | | | |
| 3 | E.coli O157:H7 EDL933 | 1 617 201 | Mfd | −10.58 | 72.6 | − | 2.40 | − |
| 4 | N.meningitidis Z2491 | 2 179 300 | Non-coding | −9.17 | 54.4 | − | 5.28 | − |
| 5 | S.meliloti | 493 471 | Non-coding | −14.44 | 48.8 | − | 6.18 | − |
| Streptomycin | | | | | | | | |
| 6 | S.avermitilis | 7 323 209 | Non-coding | −28.64 | 111.4 | − | 0.84 | − |
| ATP | | | | | | | | |
| **7** | | pNRC200 92 229 | Non-coding | | | | | |
| **8** | | pNRC100 92 229 | Non-coding | −19.43 | 71.6 | + | 4.54 | + |
| 9 | M.acetivorans C2A | 1 295 025 | MA1090 | −16.04 | 71.2 | − | 6.40 | − |
| 10 | M.mazei Goe1 | 3 868 884 | vorA | −13.71 | 69.4 | − | 8.46 | − |
| 11 | P.furiosus DSM 3638 | 168 991 | PF0158 | −7.48 | 63.8 | − | 9.49 | − |
| Neomycin B | | | | | | | | |
| 12 | E.coli K12 | 3 430 475 | smf_1 | −5.37 | 67.2 | − | 8.23 | − |
| 13 | E.coli K12 | 1 832 073 | Non-coding | | | | | |
| 14 | E.coli O157:H7 EDL933 | 2 508 764 | Non-coding | −4.12 | 59.6 | − | 3.25 | − |
| 15 | E.coli CFT073 | 1 987 894 | Non-coding | | | | | |
| 16 | E.coli K12 | 3 973 740 | wecE | −3.54 | 60.0 | − | 4.27 | − |
| **17** | E.coli K12 | 1 324 039 | yciQ | −6.10 | 57.8 | + | 2.24 | − |
| **18** | E.coli O157:H7 | 1 826 735 | ECs1840 | | | | | |
| **19** | E.coli O157:H7 EDL933 | 2 260 441 | Z2542 | | | | | |
| **20** | E.coli CFT073 | 1 569 619 | yciQ | | | | | |
| 21 | E.coli O157:H7 | 1 373 724 | ECs1300 | | | | | |
| 22 | E.coli O157:H7 EDL933 | 1 457 496 | Z1560 | −5.55 | 61.4 | − | 1.41 | − |
| 23 | E.coli O157:H7 EDL933 | 1 061 889 | Z1121 | | | | | |
| 24 | E.coli CFT073 | 1 130 797 | c1166 | | | | | |
| 25 | E.coli O157:H7 | 1 037 740 | ECs0954 | | | | | |
| 26 | E.coli O157:H7 EDL933 | 1 039 400 | Z1102 | −3.65 | 60.8 | − | 1.33 | − |
| 27 | E.coli CFT073 | 960 757 | c1001 | | | | | |
| 28 | E.coli O157:H7 | 3 148 310 | Non-coding | | | | | |
| 29 | E.coli O157:H7 EDL933 | 3 218 265 | Non-coding | −4.98 | 57.4 | − | 0.42 | − |
| 30 | E.coli CFT073 | 2 704 642 | Non-coding | | | | | |
| **31** | E.coli O157:H7 EDL933 | 5 503 670 | Non-coding | −7.61 | 81.0 | + | 1.12 | + |
| **32** | E.coli CFT073 | 3 864 206 | Smf | −10.79 | 83.4 | + | 1.44 | + |
| 33 | N.meningitidis MC58 | 1 221 920 | Non-coding | −7.44 | 65.2 | − | 8.29 | − |
| 34 | N.meningitidis Z2491 | 1 283 084 | Non-coding | | | | | |
| 35 | N.meningitidis MC58 | 1 468 213 | Non-coding | −2.55 | 51.6 | − | 1.22 | − |
| 36 | N.meningitidis Z2491 | 1 553 595 | Non-coding | | | | | |
| 37 | C.trachomatis | 604 373 | yjeE | −9.07 | 83.0 | − | 0.76 | − |
The physical properties, energetic test results, and location(s) of each candidate sequence are reported. ΔG, Tm, and V denote the free energy, melting temperature and Valley index, respectively. The ΔG–Tm and ΔG–V test results (+ or −) at the 90% confidence level are calculated using the thermodynamic scatter plots (see text and Figure 4). Location refers to the position of the start site of a candidate sequence (by nucleotide number) and Gene refers to the gene's name (GenBank annotations) or non-coding region containing the candidate sequence. The locations of sequences 7 and 8 are in the associated species' plasmids. The most promising candidates are numbered in bold.
Figure 3. Expected number of matches versus the observed number of matches for 46 aptamer motif/genome pairs; the expected number is calculated using random sequences. Deviations from the diagonal line represent genomes with either an over- or under-representation of the given motif.
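One way to put a number on a deviation from the diagonal is a Poisson tail probability. This is an assumption of the sketch, not a test described in the source: it treats the match count in a genome as Poisson-distributed with mean equal to the expected number. The function name is hypothetical; the two observed/expected pairs are taken from the bacterial match table.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


# Over-representation: 7 chloramphenicol matches observed, 1.6 expected.
over = poisson_upper_tail(7, 1.6)

# Under-representation: 6 neomycin B matches observed, 19.8 expected.
# The lower tail P(X <= 6) is the complement of P(X >= 7).
under = 1.0 - poisson_upper_tail(6 + 1, 19.8)
```

Small values of `over` or `under` flag motif/genome pairs that lie far from the diagonal under this simplified model.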
The expected frequency of each aptamer descriptor per 1 and 10 million base pairs of random sequence with uniformly distributed nucleotides
| Descriptor | Frequency per 1 Mb (± 1 standard error) | Frequency per 10 Mb (± 1 standard error) |
|---|---|---|
| Chloramphenicol | 0.3320 ± 0.0007 | 3.320 ± 0.007 |
| Streptomycin | 0.0778 ± 0.0003 | 0.778 ± 0.003 |
| Neomycin B | 2.1471 ± 0.0015 | 21.47 ± 0.015 |
| ATP | 0.9424 ± 0.0009 | 9.424 ± 0.009 |
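The 'Exp.' values reported for each genome appear to follow from these frequencies by linear scaling with genome size. This is an inference from the tabulated numbers rather than a formula stated here, and the names in the sketch below are hypothetical.

```python
# Expected matches per 1 Mb of uniform random sequence (values from the table above).
FREQ_PER_MB = {
    "Chloramphenicol": 0.3320,
    "Streptomycin": 0.0778,
    "Neomycin B": 2.1471,
    "ATP": 0.9424,
}


def expected_matches(descriptor, genome_size_mb):
    """Scale the per-Mb frequency linearly with genome size (assumed relationship)."""
    return FREQ_PER_MB[descriptor] * genome_size_mb


# A 9.2 Mb genome, the size listed in the first row of the bacterial table.
for name in ("Chloramphenicol", "Streptomycin", "Neomycin B"):
    print(name, round(expected_matches(name, 9.2), 1))
```

For a 9.2 Mb genome this gives 3.1, 0.7 and 19.8, which agrees with the expected counts listed for the 9.2 Mb bacterial genome.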
The nucleotide distribution of genomes
| Genome | T(U) (%) | C (%) | A (%) | G (%) |
|---|---|---|---|---|
| Streptomyces avermitilis | 14.6 | 35.4 | 14.7 | 35.3 |
| Streptomyces coelicolor | 14.0 | 36.0 | 13.9 | 36.1 |
| Escherichia coli K12 | 24.6 | 25.4 | 24.6 | 25.4 |
| Escherichia coli O157:H7 | 24.7 | 25.2 | 24.8 | 25.3 |
| Escherichia coli O157:H7 EDL933 | 24.7 | 25.2 | 24.8 | 25.2 |
| Escherichia coli CFT073 | 24.7 | 25.3 | 24.8 | 25.2 |
| Neisseria meningitidis MC58 | 24.3 | 25.6 | 24.2 | 26.0 |
| Neisseria meningitidis Z2491 | 24.2 | 25.9 | 24.0 | 25.9 |
| Sinorhizobium meliloti | 18.6 | 31.5 | 18.6 | 31.2 |
| Chlamydia trachomatis | 29.3 | 20.6 | 29.4 | 20.7 |
| Aeropyrum pernix | 22.1 | 28.4 | 21.6 | 28.0 |
| Archaeoglobus fulgidus DSM 4304 | 25.6 | 24.2 | 25.8 | 24.4 |
| Halobacterium sp. NRC-1 | 17.1 | 33.0 | 17.0 | 33.0 |
| Methanobacterium thermoautotrophicum str. ΔH | 25.4 | 24.7 | 25.1 | 24.8 |
| Methanococcus jannaschii | 34.3 | 15.5 | 34.4 | 15.7 |
| Methanopyrus kandleri AV19 | 19.4 | 30.7 | 19.5 | 30.4 |
| Methanosarcina acetivorans C2A | 28.8 | 21.4 | 28.5 | 21.3 |
| Methanosarcina mazei Goe1 | 29.2 | 20.7 | 29.3 | 20.8 |
| Pyrobaculum aerophilum | 24.1 | 25.4 | 24.5 | 25.9 |
| Pyrococcus abyssi | 27.7 | 22.4 | 27.6 | 22.3 |
| Pyrococcus furiosus DSM 3638 | 29.6 | 20.4 | 29.6 | 20.4 |
| Pyrococcus horikoshii | 29.1 | 21.2 | 29.0 | 20.7 |
| Sulfolobus solfataricus | 32.3 | 17.9 | 31.9 | 17.9 |
| Sulfolobus tokodaii | 33.8 | 16.3 | 33.4 | 16.5 |
| Thermoplasma acidophilum | 26.8 | 22.9 | 27.2 | 23.1 |
| Thermoplasma volcanium | 29.9 | 19.9 | 30.2 | 20.0 |
The physical properties and stability test results for five biological RNA sequences
| Sequence | RNA | NCBI accession | ΔG | Tm | ΔG–Tm test | V | ΔG–V test |
|---|---|---|---|---|---|---|---|
| B1 | 5S RNA | M34003 | −61.78 | 93.6 | + | 4.13 | + |
| B2 | Gln tRNA | | −29.65 | 87.4 | + | 2.65 | + |
| B3 | U5 RNA | AF095839 | −42.96 | 86.8 | + | 7.25 | + |
| B4 | U6 RNA | AF053588 | −38.43 | 67.4 | + | 18.06 | − |
| B5 | U7 RNA | M17910 | −19.45 | 93.4 | + | 8.85 | + |
ΔG, Tm and V denote the free energy, melting temperature and Valley index, respectively. The ΔG–Tm and ΔG–V test results (+ or −) at the 90% confidence level are calculated using the thermodynamic scatter plots (see text and Figure 4).
Figure 4. Free energy versus melting temperature Tm (left column) and free energy versus Valley index V (middle column) for four good candidate aptamers (diamond symbol; right column) and their 1000 randomly shuffled sequences. The ellipses enclose 90% of the sequences. Sequence numbers refer to those in Table 3.
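The shuffle-and-ellipse comparison can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the 90% ellipse is read here as a Mahalanobis-distance contour of the shuffled-sequence cloud, the property pairs (for example free energy and Tm) are assumed to be computed elsewhere by a folding program, and the check does not consider in which direction the candidate deviates. All function names are hypothetical.

```python
import random


def shuffled_copies(sequence, n=1000, seed=0):
    """Return n random permutations of the sequence (same base composition)."""
    rng = random.Random(seed)
    copies = []
    for _ in range(n):
        bases = list(sequence)
        rng.shuffle(bases)
        copies.append("".join(bases))
    return copies


def mahalanobis_sq(point, points):
    """Squared Mahalanobis distance of a 2D point from a 2D point cloud."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form with the inverse of the 2x2 covariance matrix.
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def outside_ellipse(candidate, background, coverage=0.90):
    """True if the candidate lies outside the ellipse holding `coverage` of the background."""
    distances = sorted(mahalanobis_sq(p, background) for p in background)
    threshold = distances[int(round(coverage * len(distances))) - 1]
    return mahalanobis_sq(candidate, background) > threshold
```

In use, each shuffled copy would be folded to obtain its property pair, the pairs would form `background`, and the candidate's own pair would be passed as `candidate`.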
Figure 5. Examples of good and poor conformational energy landscapes. The good landscape is represented by candidate sequence 32 (black filled circles, shaded) and the poor landscape by candidate sequence 33/34 (red, diamonds) for the neomycin B antibiotic. The good candidate sequence exhibits a sharp, steep slope while the poor candidate sequence has multiple low-energy minima. The optimal and suboptimal structures are generated using the Vienna RNA Package.