
Regulatory context drives conservation of glycine riboswitch aptamers.

Matt Crum1, Nikhil Ram-Mohan1, Michelle M Meyer1.   

Abstract

In comparison to protein coding sequences, the impact of mutation and natural selection on the sequence and function of non-coding (ncRNA) genes is not well understood. Many ncRNA genes are narrowly distributed to only a few organisms, and appear to be rapidly evolving. Compared to protein coding sequences, there are many challenges associated with assessment of ncRNAs that are not well addressed by conventional phylogenetic approaches, including: short sequence length, lack of primary sequence conservation, and the importance of secondary structure for biological function. Riboswitches are structured ncRNAs that directly interact with small molecules to regulate gene expression in bacteria. They typically consist of a ligand-binding domain (aptamer) whose folding changes drive changes in gene expression. The glycine riboswitch is among the most well-studied due to the widespread occurrence of a tandem aptamer arrangement (tandem), wherein two homologous aptamers interact with glycine and each other to regulate gene expression. However, a significant proportion of glycine riboswitches are comprised of single aptamers (singleton). Here we use graph clustering to circumvent the limitations of traditional phylogenetic analysis when studying the relationship between the tandem and singleton glycine aptamers. Graph clustering enables a broader range of pairwise comparison measures to be used to assess aptamer similarity. Using this approach, we show that one aptamer of the tandem glycine riboswitch pair is typically much more highly conserved, and that which aptamer is conserved depends on the regulated gene. Furthermore, our analysis also reveals that singleton aptamers are more similar to either the first or second tandem aptamer, again based on the regulated gene. Taken together, our findings suggest that tandem glycine riboswitches degrade into functional singletons, with the regulated gene(s) dictating which glycine-binding aptamer is conserved.

Year:  2019        PMID: 31860665      PMCID: PMC6944388          DOI: 10.1371/journal.pcbi.1007564

Source DB:  PubMed          Journal:  PLoS Comput Biol        ISSN: 1553-734X            Impact factor:   4.475


Introduction

Structured RNA motifs play vital roles in all kingdoms of life. They are essential in protein translation [1,2], perform catalytic functions [3,4], and regulate gene expression [5,6]. Understanding the evolution and conservation of structured RNAs provides insight into the cellular processes in which they are involved. However, in comparison to protein coding sequences, the impact of mutation and natural selection on the sequence and function of structured RNAs is not well understood. Some RNAs such as the ribosomal RNA, and other large functional RNAs, are widely distributed across entire kingdoms. However, many others are narrowly distributed to only a few organisms, suggesting that their evolution may be quite rapid [7,8]. The ability of traditional phylogenetic approaches to investigate the evolution of structured RNAs is frequently inadequate due to a reliance on primary sequence information alone. Currently, only RAxML provides an algorithm for phylogenetic analysis that takes secondary structure into account, but it relies on a consensus secondary structure provided after sequence alignment [9]. Though this is an advantage over software that rely solely on the primary sequence, this still removes the potential for subtle structural similarities that might be present in the case of a pairwise assessment of structural similarity. Moreover, the antagonistic role that decreasing sequence length and increasing taxonomic diversity play on phylogenetic confidence limits the investigation of many structured RNA regulators, which tend to be short and have high primary sequence diversity [10]. To overcome these challenges, we utilized graph clustering, in which each vertex corresponds to an RNA sequence and edges are weighted based on a distance metric (e.g. RNA structural similarity, sequence similarity, or some combination of both), to assess the relative similarity of related RNAs. 
A graph-based clustering method was recently utilized to cluster contigs belonging to the same transcripts in de novo transcriptome analysis [11], but it lacks the structural component necessary for accurate clustering of structured ncRNA sequences. Other approaches for detection and clustering of structured RNAs provide robust methods for researchers to classify members of existing RNA families [12-19] and identify novel structured ncRNA [20-25]. However, these tools are fundamentally not designed to compare degrees of conservation across clusters of homologous structured RNAs. To investigate motif divergence an analysis approach that provides fine-scale comparison of similarity within and between groups of homologous RNAs is necessary. Our approach allows any measure of pairwise similarity between two RNA sequences to be utilized for comparison and relative conservation between groups to be investigated. Thus, measures that incorporate purely secondary structure information [26,27], purely sequence information [28,29], or a combination of both [30-34], may be utilized so that all available information can be captured. While this approach does not yield the same kind of inferences concerning the line of descent as traditional phylogenetic approaches, it does enable many more variables to be accounted for while assessing ncRNA similarity. Regulatory RNAs have quickly become the largest class of functional RNAs. Yet, our understanding of their evolution lags that of the larger catalytic RNAs such as the ribosome. One class of regulatory RNAs that, similarly to the ribosome, take their function from a three-dimensional structure are riboswitches. Riboswitches are bacterial cis-regulatory elements that occur within the mRNA 5’-UTR and alter transcription attenuation or translation initiation of gene(s) directly downstream in response to a specific ligand [5,6]. 
This is accomplished by coordination between an aptamer domain, that binds the ligand, and an expression platform that translates this binding into downstream gene expression. Several riboswitches are identified across many phyla of bacteria [6,35-39], and may be ancient in origin [40,41]. However, riboswitches have also evolved under varied environmental and genetic contexts. Typically aptamer domains are structurally well-conserved while expression platforms may be quite variable across different bacterial species with Firmicutes and Fusobacteria species showing a preference for transcription attenuation, and Proteobacteria and Actinobacteria species preferring translation inhibition mechanisms [6]. The wide-spread distribution and structurally well-conserved aptamer domains of riboswitches make them useful models to better understand evolution of structured RNA motifs. The glycine riboswitch provides a unique opportunity to directly assess how riboswitch architecture may change over time or be influenced by which genes are regulated. The glycine riboswitch is commonly found in a tandem conformation where two homologous aptamers interact through tertiary contacts to regulate a single expression platform (tandem) [42]. A more conventional single-aptamer conformation also appears in nature (singleton), but singleton glycine riboswitches require a “ghost-aptamer” that functions as a scaffold for tertiary interactions similar to those observed in tandem glycine riboswitches [43]. Glycine riboswitch singletons are divided into two types distinguished by the location of the ghost-aptamer with respect to the ligand-binding aptamer. Type-1 singletons have a ghost-aptamer 3’ of the aptamer, while the ghost aptamer is 5’ of the glycine aptamer for type-2 singletons [43]. The relationship between singleton and tandem glycine riboswitches is not well characterized, and the how and why of tandem vs singleton riboswitch emergence and conservation is a subject of debate. 
Glycine riboswitches have been identified regulating several different sets of genes (genomic context) [42,44-46], and may function as either activators of expression (On-switch) [42] or repressors (Off-switch) [46]. In order to assess the relationship between singleton and tandem glycine riboswitches, we used both traditional phylogenetics and graph clustering approaches to examine glycine aptamer sequences across a range of diverse bacterial species. Our investigation reveals that genomic context affects which tandem glycine-binding aptamer is more highly conserved. It also demonstrates that singleton riboswitches are more similar to the first or second tandem aptamer based on genomic context. Taken together, our findings strongly suggest that many singleton glycine riboswitches result from degradation of tandems, with the genomic context dictating which glycine-binding aptamer retains ligand responsiveness.

Results

Glycine riboswitches within the Bacillaceae and Vibrionaceae families cluster based on genomic context

The tandem glycine riboswitch conformation is well-studied biophysically [44,45,47-51]. However, there is a lack of consensus regarding the mechanism of ligand-binding and which of the tandem aptamers is more essential for ligand-binding to induce gene regulation. Extensive in vitro investigation of a tandem glycine riboswitch originating from Vibrio cholerae found that ligand binding of the second aptamer (aptamer-2) controlled the expression platform and gene expression, while the first aptamer (aptamer-1) primarily played a role in structural stabilization and aptamer dimerization [44]. However, in vivo investigation of a tandem glycine riboswitch within Bacillus subtilis found that disruption of aptamer-1’s binding pocket impeded riboswitch regulation more strongly than disrupting aptamer-2’s [45]. To resolve the differences observed between the V. cholerae and B. subtilis tandem riboswitches, we conducted a comprehensive sequence analysis of glycine aptamers. To identify glycine riboswitch aptamers, we used the RFAM covariance model RF00504 to search RefSeq77 [14,15,52]. Identified aptamers within 100 nucleotides (nts) of each other were considered to be part of a tandem riboswitch. A tandem aptamer covariance model was created using Infernal, trained on this identified set, and used to search RefSeq77 to supplement the dataset [12,13]. In total, 2,998 individual riboswitches were identified: 2,216 tandem riboswitches and 782 singleton riboswitches. Each was classified by genomic context based on the RefSeq annotated function of the putatively regulated gene. This dataset does not include the variant glycine riboswitches identified in a previous study [53], as the vast majority of these examples are present in metagenomic data, and therefore are not within RefSeq77.
To determine whether the functional differences observed between the glycine riboswitches from B. subtilis and V. cholerae are reflective of detectable sequence variation/divergence across their respective families, we first performed a phylogenetic analysis on examples from our dataset found within the Bacillaceae and Vibrionaceae bacterial families. We gathered sequences spanning both aptamers from 48 Bacillaceae and 37 Vibrionaceae tandem riboswitches. Within this set, all 37 Vibrionaceae riboswitches regulate transport proteins (TP), while 41 Bacillaceae riboswitches regulate the glycine cleavage system (GCV) and the remaining 7 regulate TP. We then utilized RAxML to generate a phylogenetic tree from these sequences and the consensus secondary structure obtained by alignment to our tandem covariance model using Infernal. The tree shows clustering within the group of Bacillaceae riboswitches regulating GCV, as well as within Bacillaceae riboswitches regulating TP. However, there is a clear phylogenetic separation of the two groups of Bacillaceae riboswitches, splitting them into distinct clades. Furthermore, the clade representing Bacillaceae riboswitches regulating TP clustered more closely with the clade of riboswitches from Vibrionaceae regulating TP, although each set forms a distinct group. This finding suggests that genomic context may play a prominent role in the evolution of tandem glycine riboswitches.
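The tandem-versus-singleton assignment described above (aptamer hits within 100 nts of each other are treated as one tandem riboswitch) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the `Hit` record, the `group_hits` function, and the reading of "within 100 nucleotides" as the gap between the end of one hit and the start of the next are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one covariance-model hit (not an Infernal type)."""
    contig: str
    strand: str
    start: int  # 1-based, inclusive; start < end regardless of strand
    end: int


def group_hits(hits, max_gap=100):
    """Split aptamer hits into tandem pairs and singletons.

    Assumption: two hits form a tandem when they lie on the same contig and
    strand and at most `max_gap` nucleotides separate them. Pairs are returned
    in coordinate order, which is not 5'-to-3' order on the minus strand.
    """
    ordered = sorted(hits, key=lambda h: (h.contig, h.strand, h.start))
    tandems, singletons = [], []
    i = 0
    while i < len(ordered):
        cur = ordered[i]
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        close = (
            nxt is not None
            and nxt.contig == cur.contig
            and nxt.strand == cur.strand
            and nxt.start - cur.end - 1 <= max_gap
        )
        if close:
            tandems.append((cur, nxt))
            i += 2  # both hits are consumed by the pair
        else:
            singletons.append(cur)
            i += 1
    return tandems, singletons


# Toy coordinates, invented for illustration only.
toy_hits = [
    Hit("contig1", "+", 10, 100),
    Hit("contig1", "+", 150, 240),    # 49 nts after the previous hit
    Hit("contig1", "+", 1000, 1090),  # isolated
    Hit("contig2", "+", 120, 200),    # different contig
]
toy_tandems, toy_singletons = group_hits(toy_hits)
```

A greedy left-to-right pairing is used here for simplicity; a real pipeline would also need to decide how to handle three or more closely spaced hits.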

Phylogenetic comparison of tandem riboswitches across Bacillaceae and Vibrionaceae.

A) 48 Bacillaceae and 37 Vibrionaceae tandem riboswitches were clustered based on aptamer sequence and structure across both aptamers of the riboswitch. After phylogenetic clustering, individual aptamers were colored based on the class of gene being regulated and the bacterial family of origin (Vibrionaceae TP are orange, Bacillaceae TP are purple, Bacillaceae GCV are green). Clusters have been labeled with the bacterial family and gene class being regulated. Bootstrap support values are displayed for 100 replicates when ≥ 70. B,C) Phylogenetic clustering of 48 tandem riboswitches, separated into aptamer-1 (B) and aptamer-2 (C), taken from the Bacillaceae family and colored according to class of gene being regulated (GCV are green, TP are purple). All trees are midpoint rooted.

Bacillaceae tandem riboswitches display different patterns of aptamer conservation based on genomic context

To investigate whether the evolutionary pressure driving divergence of tandem glycine riboswitches regulating GCV and TP acts evenly across both aptamers, or is specific to a single aptamer, we split tandem riboswitches within the Bacillaceae family into individual aptamers. This provided us with two groups: one containing all aptamer-1’s (first aptamer) and one containing all aptamer-2’s (second aptamer). We then generated a phylogenetic tree for each set to determine whether the phylogenetic divergence seen within the riboswitch set is explained by variation within one specific aptamer or is present in both. Both aptamer-1 and aptamer-2 sets display clear clustering based on genomic context. This indicates that divergence of tandem glycine riboswitches in differing genomic contexts cannot be fully explained by variation within the first or second aptamer alone. Moreover, within the GCV context, it appears that aptamer-1 is more highly conserved than aptamer-2, as indicated by the shorter branch lengths across the clade.

Genomic context dictates aptamer clustering in Bacillaceae and Vibrionaceae

To better understand how the homologous aptamers of the tandem glycine riboswitch have diverged, we broadened our taxonomic scope and focused our investigation on the individual aptamer domains of the glycine riboswitch. However, the shorter sequence length of the individual glycine aptamers confounded our phylogenetic analysis. Thus, relative conservation of aptamer-1 to aptamer-2 in different genomic contexts was investigated using graph clustering of all Bacillaceae and Vibrionaceae aptamers within a given genomic context, excluding identical aptamer pairs coming from different strains of the same species. This set comprised 84 pairs of aptamer-1 and aptamer-2 from Bacillaceae regulating GCV and 36 pairs from Vibrionaceae regulating TP. The number of TP riboswitches was reduced by one in this analysis compared to the previous one, as one of the riboswitches was no longer unique within the set when evaluating only the individual aptamer sequences. We generated networks composed of vertices corresponding to individual glycine riboswitch aptamers, with edges weighted based on the pairwise RNAmountAlign distance score [34]. RNAmountAlign was chosen as the primary metric for edge-weighting in our work because it uses both primary sequence information and the ensemble mountain distance of secondary structure to generate a pairwise score more quickly and efficiently than other software. After weighting with RNAmountAlign, edges were trimmed if they were below a selected RNAmountAlign threshold, thus altering the topology of the network from completely pairwise to containing clusters of aptamers whose similarity is greater than the threshold. Thresholding was done across a range of RNAmountAlign scores to identify conserved aptamer groups which retained tight clustering. Each network corresponds to a specific genomic context, TP or GCV. We find that within these contexts, aptamers group based on their position within the tandem arrangement (aptamer-1 vs. aptamer-2).
Network density, defined as the fraction of edges present within a group compared to the total number of edges in the non-thresholded network, was calculated across a range of RNAmountAlign score thresholds and used to gauge relative conservation of each aptamer type for each genomic context. Differing cluster densities between aptamer types revealed that genomic context affects which aptamer is more highly conserved: aptamer-1 is more highly conserved in riboswitches regulating GCV, while aptamer-2 is more highly conserved in those regulating TP. A Wilcoxon rank-sum analysis of all intra-group edges was performed as well to validate these findings. We obtain very similar findings using a variety of alternative distance metrics calculated using Dynalign [30,31], FoldAlign [32,33], Clustal Omega [28], and RNApdist [26,27].
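The edge trimming and within-group network density described above can be illustrated with a small sketch. It assumes pairwise scores are already available as a dictionary keyed by unordered pairs of aptamer identifiers, and that higher scores mean greater similarity. The function names, identifiers, and scores are hypothetical; they are not RNAmountAlign output or the authors' code.

```python
from itertools import combinations


def threshold_edges(scores, threshold):
    """Keep only edges whose score is at or above the threshold.

    `scores` maps frozenset({id_a, id_b}) to a pairwise similarity score.
    """
    return {pair: s for pair, s in scores.items() if s >= threshold}


def group_density(scores, members, threshold):
    """Fraction of a group's edges in the full network that survive trimming.

    The denominator counts every scored pair within `members` before
    thresholding, so the value lies between 0 and 1.
    """
    pairs = [frozenset(p) for p in combinations(sorted(members), 2)]
    scored = [p for p in pairs if p in scores]
    if not scored:
        return 0.0
    kept = sum(1 for p in scored if scores[p] >= threshold)
    return kept / len(scored)


# Toy scores, invented for illustration only.
toy_scores = {
    frozenset(("apt1_x", "apt1_y")): 8.0,
    frozenset(("apt1_x", "apt1_z")): 7.0,
    frozenset(("apt1_y", "apt1_z")): 6.0,
    frozenset(("apt2_x", "apt2_y")): 9.0,
    frozenset(("apt2_x", "apt2_z")): 2.0,
    frozenset(("apt2_y", "apt2_z")): 1.0,
    frozenset(("apt1_x", "apt2_x")): 3.0,
}
aptamer1 = {"apt1_x", "apt1_y", "apt1_z"}
aptamer2 = {"apt2_x", "apt2_y", "apt2_z"}

kept_edges = threshold_edges(toy_scores, threshold=5.0)
density_apt1 = group_density(toy_scores, aptamer1, threshold=5.0)
density_apt2 = group_density(toy_scores, aptamer2, threshold=5.0)
```

In this toy network the aptamer-1 group keeps all of its edges at a threshold of 5 while the aptamer-2 group keeps one of three, which is the kind of contrast the density curves are meant to expose when the calculation is repeated across a range of thresholds.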

Clustering of tandem riboswitch aptamers across Bacillaceae and Vibrionaceae.

A) Bacillaceae and Vibrionaceae tandem riboswitch aptamers were clustered using RNAmountAlign as a distance metric (threshold of 5). All represented Bacillaceae riboswitches regulate GCV (top), while Vibrionaceae riboswitches regulate TP (bottom). Aptamers are colored based on aptamer type, purple for aptamer-1 and green for aptamer-2. B) Network density was calculated for each aptamer in both networks across a range of RNAmountAlign thresholds. Dotted red line indicates the RNAmountAlign threshold (5) at which the networks in A were visualized. C) Box plots represent all pairwise edge-weights within each aptamer type. **** p-value < 2x10.

Bacilli class of bacteria shows clustering of singleton and tandem aptamers together

To assess the relationship between singleton and tandem riboswitches, we implemented graph clustering of individual aptamers from both tandem and singleton glycine riboswitches. We first categorized singleton aptamers within our dataset (includes all bacteria, refseq77-microbial) into singleton type-1 or singleton type-2 based on whether the ghost aptamer was found 3’ or 5’ of the glycine aptamer. Of 782 singleton riboswitches, 342 were characterized as singleton type-1, 125 as singleton type-2, and 305 were unable to be conclusively characterized as one or the other (designated singleton type-0). We found that singleton type-1 riboswitches regulate GCV 93% of the time, while 90% of the singleton type-2 riboswitches regulate TP. This context-dependent appearance of singleton riboswitches agrees with previous findings and gives confidence in our singleton annotation pipeline [54]. We then implemented graph clustering on a set containing all glycine riboswitch aptamers from Bacilli, excluding identical aptamers coming from different strains of the same species, totaling 436 aptamers. This set from the Bacilli class was selected for its representation of both GCV (58%) and TP (31%) regulating riboswitches. Remaining riboswitches were labeled as regulating genes involved in glycine metabolism (Gly_Met) that are not part of the GCV operon, or as Other. Using four distinct de novo community detection algorithms available in R (see methods) we identified modular communities within the set. Communities were selected based on each group’s core cluster, which was present within all community detection algorithms utilized. Aptamers that were found to be grouped with the core cluster in at least half of the community detection algorithms were subsequently added to the cluster. Cluster stability was verified using 100 replicates of parametric bootstrapping (see methods), as well as comparison to MCL [55] and DBSCAN [56] clustering output.
This resulted in clusters composed of a highly conserved core set and aptamers that closely grouped with them. Most communities contain either aptamers derived from a tandem arrangement or singleton aptamers. However, two communities included both singleton and tandem-derived aptamers. The first contains singleton type-1 aptamers and aptamer-1 of tandem riboswitches, all regulating GCV. The second includes singleton type-2 aptamers and aptamer-2 of tandem riboswitches, all regulating TP.
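The consensus rule for communities (a core present under every detection algorithm, extended by aptamers that group with the core in at least half of the algorithms) can be sketched in a few lines. The sketch assumes each algorithm's output is available as a mapping from aptamer identifier to community label, and it picks out a community by a `seed` aptamer. The seed is an assumption introduced for the example and is not described in the text, and the partitions below are invented toy data rather than igraph output.

```python
def consensus_cluster(partitions, seed):
    """Return (core, extended) membership sets for the community holding `seed`.

    `partitions` is a list of dicts, one per detection algorithm, each mapping
    node -> community label. The core is the set of nodes that share the
    seed's community in every partition. A non-core node joins the extended
    cluster when it shares the seed's community in at least half of them.
    """
    seed_communities = []
    for part in partitions:
        label = part[seed]
        seed_communities.append({n for n, lab in part.items() if lab == label})

    core = set.intersection(*seed_communities)

    votes = {}
    for community in seed_communities:
        for node in community - core:
            votes[node] = votes.get(node, 0) + 1

    needed = len(partitions) / 2
    extended = core | {n for n, v in votes.items() if v >= needed}
    return core, extended


# Four toy partitions standing in for four detection algorithms.
toy_partitions = [
    {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1},
    {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1},
    {"a": 1, "b": 1, "c": 0, "d": 0, "e": 0},
    {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1},
]
toy_core, toy_extended = consensus_cluster(toy_partitions, seed="a")
```

Here `a` and `b` co-occur in all four partitions and form the core, `c` joins them in two of four and is added, and `d` joins them in only one and is left out.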

Clustering of glycine riboswitch aptamers identified within the Bacilli class of bacteria.

A) Aptamers within the Bacilli bacterial class were identified and clustered based on RNAmountAlign pairwise similarity (visualized at threshold of 12). B) Sub-clusters (communities) were identified using the four community detection functions within R’s igraph package. Two communities were identified that contain two different aptamer types: aptamer-1 and singleton type-1, and aptamer-2 and singleton type-2, which regulate GCV and TP respectively. Network shows visualization of the community detection algorithm cluster_fast_greedy (as implemented by R). Node colors correspond to distinct clusters detected. C) The two sub-clusters containing different aptamer types were parsed from the overall network, the tandem aptamers’ partners were added to the set (as an out-group within the same context), and graph clustering was visualized (RNAmountAlign threshold of 5). D) Edge density between aptamer groups was calculated for networks generated across a range of RNAmountAlign edge-weight thresholds. Dotted red line indicates the RNAmountAlign threshold (5) at which the networks in (C) were visualized.

Members of both mixed communities were extracted and networks were generated for each as described above. For aptamers originally part of a tandem arrangement, the paired aptamer was included to assess relative conservation of the singleton aptamers to each tandem aptamer type. We determined relative conservation between aptamer types by calculating the network density of edges connecting each aptamer type (inter-edge density) across a range of RNAmountAlign thresholds. We observe that singleton type-1 aptamers regulating GCV are most similar to aptamer-1 of tandem riboswitches in the same context and, conversely, that singleton type-2 aptamers regulating TP are most similar to aptamer-2 of tandem riboswitches in the same context.
The inter-edge density between singleton type-1 aptamers and tandem aptamer-1’s regulating GCV is comparable to that seen between singleton type-2 aptamers and tandem aptamer-2’s regulating TP. These two groupings also represent the highest conservation across aptamer types within their networks, with other pairings being comparable to inter-edge density measurements with a random set of 40 aptamers. Using Dynalign, FoldAlign, Clustal Omega, and RNApdist as distance metrics yields similar findings. To further investigate the similarities between these aptamers, we generated consensus structures of the riboswitches found within each genomic context using a combination of tools (see methods). Consensus structures of riboswitches regulating GCV show that tandem aptamer-1 and singleton type-1 aptamers have high conservation of the P2 and P3 stems, as well as the binding pocket, while the P1 stem of tandem aptamer-2 shows high conservation with the singleton type-1 ghost aptamer. This conservation of the ghost aptamer P1 stem correlates with the region required for tertiary interactions of tandem and singleton riboswitches [43]. The same pattern is observed within riboswitches regulating TP, except that the singleton type-2 aptamer and tandem aptamer-2 are the conserved aptamers.
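The inter-edge density comparison above, between singleton aptamers and each tandem aptamer type, follows the same idea as within-group density but counts only edges that connect two different groups. The sketch below is an assumption-laden illustration: scores are stored per unordered pair, higher means more similar, and all identifiers and values are invented.

```python
from itertools import product


def inter_edge_density(scores, group_a, group_b, threshold):
    """Fraction of scored edges between two groups that survive trimming.

    `scores` maps frozenset({id_a, id_b}) to a pairwise similarity score.
    Only pairs with one member in each group are counted.
    """
    pairs = {
        frozenset((a, b))
        for a, b in product(group_a, group_b)
        if a != b
    }
    scored = [p for p in pairs if p in scores]
    if not scored:
        return 0.0
    kept = sum(1 for p in scored if scores[p] >= threshold)
    return kept / len(scored)


# Toy GCV-context network: two singletons, two tandem aptamer-1s, two aptamer-2s.
singletons = ["s1", "s2"]
tandem_apt1 = ["t1a", "t1b"]
tandem_apt2 = ["t2a", "t2b"]
toy_scores = {
    frozenset(("s1", "t1a")): 9.0,
    frozenset(("s1", "t1b")): 8.0,
    frozenset(("s2", "t1a")): 7.0,
    frozenset(("s2", "t1b")): 3.0,
    frozenset(("s1", "t2a")): 2.0,
    frozenset(("s1", "t2b")): 1.0,
    frozenset(("s2", "t2a")): 6.0,
    frozenset(("s2", "t2b")): 0.0,
}

to_apt1 = inter_edge_density(toy_scores, singletons, tandem_apt1, threshold=5.0)
to_apt2 = inter_edge_density(toy_scores, singletons, tandem_apt2, threshold=5.0)
```

Running the same function against a randomly drawn set of aptamers would give the kind of baseline the text compares against; that step is omitted here.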

Consensus structures of Bacilli riboswitches within a given genomic context display conservation between tandem and singleton aptamers.

Consensus secondary structure of the singleton and tandem riboswitches delineated by the genomic context. Conservation and covariation of base pairing generated using R2R with the individual covariance models. Tandem (A) and singleton (B) riboswitches regulating GCV. Tandem (C) and singleton (D) riboswitches regulating TP.

Together, our results indicate three things about Bacilli glycine riboswitch aptamers within each genomic context: 1) one tandem aptamer shows high conservation to the singleton aptamer, 2) conservation between the alternative tandem aptamer and singleton aptamers is no greater than conservation of the singleton to a random set of glycine riboswitch aptamers, and 3) ghost aptamer location correlates with the less conserved tandem aptamer. This fits with a model wherein these singleton riboswitches are the result of tandem riboswitch degradation, and which aptamer is conserved and which is degraded depends on genomic context. If the situation were reversed and tandems were the result of duplication events of singleton riboswitches, we would expect higher conservation between the singleton aptamer and both tandem aptamers compared to a random set of glycine riboswitch aptamers. However, we only observe such conservation with one tandem aptamer in each genomic context.

Actinobacteria riboswitches display similar clustering pattern observed in Bacilli

To determine whether these patterns are observed within other clades of bacteria, we gathered all glycine aptamers within our dataset in the Actinobacteria phylum (distantly related to both the Vibrio and Bacilli classes analyzed previously), excluding identical aptamers coming from different strains of the same species, totaling 606 aptamers. We then evaluated all aptamers within the set in the same manner as our Bacilli investigation. Within this phylum, glycine riboswitches primarily regulate GCV (74%) or other genes involved in glycine metabolism (22%). We identified a group of 34 conserved aptamers corresponding to riboswitches regulating GCV and utilized de novo community detection algorithms to validate our observation. Cluster stability was verified using 100 replicates of parametric bootstrapping (see methods), as well as comparison to MCL and DBSCAN (S10 Fig) clustering output. The aptamers within this group comprised primarily singleton type-1 aptamers and tandem aptamer-1 sequences, with five singleton type-0 and two singleton type-2 aptamer sequences accounting for the remainder. The singleton type-2 aptamers within the set may be misclassified aptamers or examples of singleton aptamers which do not conform to the patterns observed for other investigated aptamers. We performed graph clustering on the group, with paired tandem aptamer-2s included as an out-group, to investigate conservation of aptamer types. We then calculated the edge densities within and between singleton type-1 aptamers, tandem aptamer-1s, and tandem aptamer-2s, which demonstrate clear conservation between singleton type-1 aptamers and tandem aptamer-1s. These findings fit our conclusions drawn from the Bacilli class of bacteria. Using Dynalign, FoldAlign, Clustal Omega, and RNApdist as distance metrics yields similar findings.

Clustering based on genomic context is observed throughout the entire bacterial kingdom

To determine whether clustering patterns observed within the Bacilli class and Actinobacteria phylum are reflected throughout the rest of the bacterial kingdom and can be observed among randomly selected aptamers, we randomly selected 150 distinct glycine riboswitch aptamers each for the GCV and TP genomic contexts. Our selection retained comparable numbers of each aptamer type and excluded singleton type-0 aptamers. Singleton type-1 and type-2 aptamers are underrepresented in the TP and GCV regulating sets, respectively, because each aptamer type has few instances within that genomic context. Despite the diverse taxonomic range represented within this dataset, the generated networks display clustering patterns which align with our previous observations: a tendency towards clustering of singleton type-1 aptamers with tandem aptamer-1s when regulating GCV, and clustering of singleton type-2 aptamers with tandem aptamer-2s when regulating TP. Inter-edge density graphs of the aptamers show similar trends to those seen within the Bacilli class and Actinobacteria phylum. Using Dynalign, FoldAlign, Clustal Omega, and RNApdist as distance metrics yields similar findings.
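The random selection step above can be sketched as a filter followed by a seeded draw. This is only an illustration under stated assumptions: the record fields (`context`, `type`, `seq`), the `sample_context` function, and the use of sequence identity to define "distinct" are hypothetical, and the sketch draws uniformly rather than balancing aptamer types as the described selection does.

```python
import random


def sample_context(records, context, k=150, seed=0):
    """Draw up to `k` distinct aptamers regulating one genomic context.

    Records whose type is 'singleton-0' (unclassifiable singletons) are
    excluded, and only the first record for each sequence is kept.
    """
    seen = set()
    pool = []
    for rec in records:
        if rec["context"] != context or rec["type"] == "singleton-0":
            continue
        if rec["seq"] in seen:
            continue
        seen.add(rec["seq"])
        pool.append(rec)
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    return rng.sample(pool, min(k, len(pool)))


# Toy records, invented for illustration only.
toy_records = [
    {"id": "r1", "context": "GCV", "type": "aptamer-1", "seq": "AAAA"},
    {"id": "r2", "context": "GCV", "type": "aptamer-2", "seq": "CCCC"},
    {"id": "r3", "context": "GCV", "type": "singleton-0", "seq": "GGGG"},
    {"id": "r4", "context": "GCV", "type": "singleton-1", "seq": "AAAA"},
    {"id": "r5", "context": "TP", "type": "singleton-2", "seq": "UUUU"},
    {"id": "r6", "context": "GCV", "type": "singleton-1", "seq": "ACGU"},
]
picked = sample_context(toy_records, "GCV", k=2, seed=1)
```

Keeping comparable numbers of each aptamer type would require drawing separately within each type, which is left out of this sketch.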

Clustering of random glycine riboswitch aptamers across the bacterial kingdom.

A) Network visualization of 150 randomly chosen aptamers regulating GCV and clustered based on RNAmountAlign pairwise similarity (threshold -5). B) Network visualization of 150 randomly chosen aptamers regulating TP and clustered based on RNAmountAlign pairwise similarity (threshold -5). Only singletons that could be classified as type-1 or type-2 were included in this set.

Discussion

The tandem aptamers of the glycine riboswitch have fascinated RNA biologists since their identification in 2004 [42,57]. Extensive work has assessed whether the two homologous aptamers of the tandem glycine riboswitch functioned cooperatively [47-50], which tandem aptamer was more important for ligand binding [44,45], and what, if any, benefit a tandem conformation provided over the singleton glycine riboswitch [43]. In this work we use graph clustering analysis to investigate a similarly divisive question: what is the evolutionary relationship between tandem and singleton glycine riboswitches? While it may appear intuitive to believe that the tandem riboswitches that have been identified are the result of a duplication of identified singleton riboswitches, our findings point towards most singleton riboswitches being the result of tandem riboswitch degradation. Phylogenetic evaluation of Bacillaceae and Vibrionaceae tandem riboswitches revealed that genomic context impacts riboswitch evolution. This is illustrated by Bacillaceae riboswitches which regulate TP grouping more closely with Vibrionaceae riboswitches regulating TP than with Bacillaceae riboswitches regulating GCV. Further investigation into the individual aptamers of Bacillaceae tandem riboswitches regulating GCV compared to TP showed that both aptamers individually show the same pattern of divergence. Taking this analysis a step farther using graph clustering, we were able to determine that genomic context dictates which aptamer within a tandem glycine riboswitch is more highly conserved: aptamer-1 is more highly conserved in riboswitches regulating GCV, while aptamer-2 is more highly conserved in those regulating TP. These findings provide an elegant answer to a contradiction within the field in which investigations of diverse glycine riboswitch homologs yielded different results for whether ligand-binding of the first or second aptamer is more important for functionality [44,45]. 
Our results align with both studies’ findings: the aptamer identified as the essential binding partner for regulation in each study is the aptamer found in our study to be more highly conserved within that genomic context. With the results of these previous studies combined with this new perspective provided by our data, it is reasonable to conclude that a difference in genomic context has driven glycine riboswitches to conserve different primary ligand-binding aptamers. Widespread horizontal transfer of the riboswitch with its accompanying gene could account for our findings. To investigate this possibility, we generated gene trees for aminomethyltransferases and symporters preceded by glycine riboswitches. From these trees, there is limited evidence of horizontal transfer of these genes. Our observation that tandem glycine riboswitch evolution is affected by genomic context led us to question the impact of genomic context on singleton glycine riboswitches. We extended our network analysis to singleton riboswitches, which provided valuable insight into the relationship of tandem and singleton glycine riboswitches. Clustering of singleton and tandem aptamers from the Bacilli and Actinobacteria clades revealed that singleton aptamers are more similar to the first or second tandem aptamer based on genomic context: singleton type-1 aptamers regulating GCV are more similar to aptamer-1 of tandems regulating GCV, while singleton type-2 aptamers regulating TP are more similar to aptamer-2 of tandems regulating TP. This similarity of singletons to one tandem aptamer within a genomic context is highlighted by the fact that the singleton aptamers show no higher similarity with the other tandem aptamer than with a random set of 40 glycine riboswitch aptamers. This is observed within both GCV and TP regulating riboswitches and directs us towards the conclusion that singleton riboswitches are the remnants of degraded tandem riboswitches.
We propose a model for the evolutionary path of the glycine riboswitch in which tandem riboswitches become singleton riboswitches by undergoing degradation of one aptamer into a ghost aptamer which retains regions relevant to tertiary interaction. In this model, the aptamer which is conserved is dependent on genomic context. The different conservation of tandem aptamers based on genomic context also fits recent studies that demonstrate a high likelihood that whether a glycine riboswitch regulates TP or GCV is predictive of whether it is an on or off switch [42,45,46,54]. This fits a logical model for cellular response to high concentrations of glycine as a toxin [45,58-62]: genes responsible for glycine degradation become upregulated and those involved in glycine uptake become downregulated. In this way riboswitches in each genomic context protect the cell from glycine toxicity as concentrations increase. This difference in regulation functionality accounts for riboswitches in different genomic contexts diverging, culminating in conservation of different aptamers and ultimately the formation of singleton riboswitches. It is possible that some singletons may have arisen from deletion of the middle section of a tandem riboswitch, leaving the 5’ half of aptamer 1 and the 3’ half of aptamer 2, resulting in a singleton. However, this scenario seems unlikely because it does not account for the ghost aptamer, which is important for structural stability of the glycine riboswitch.

Model of glycine riboswitch evolution.

Model proposed for the evolution and divergence of the glycine riboswitch. In this model a progenitor tandem riboswitch conserves one of the tandem aptamers based on the genomic context of the riboswitch, while the other slowly degrades down to the minimalistic components required for tertiary interaction to drive gene regulation. In this way, tandem glycine riboswitches may degrade into functional singleton riboswitches.

While we have used graph clustering to unravel a specific discrepancy arising in existing experimental data, our approach may be used more broadly to assess how other riboswitches and other ncRNAs evolve and change over time. Variations in homologous riboswitch aptamers have demonstrated functional consequences. There is a range of variant riboswitch classes that interact with differing ligands [53,63,64], the most compelling of which are the homologous ykkC riboswitches [57,65], which include at least five subclasses, each of which binds a distinct ligand involved in guanidine degradation and export [39,66-68]. There are also examples where structurally distinct riboswitches interact with the same or very similar ligands, such as the seven riboswitch classes involved in regulating S-adenosylmethionine concentration [69,70]. These RNAs include the SAM/SAH riboswitch, which has been proposed to be a minimalistic form of a SAM riboswitch that evolved in organisms which readily degrade SAH [71]. The approaches we developed circumvent the limitations of traditional phylogenetic methods for assessing ncRNA similarity and enable identification of patterns in aptamer conservation that may point toward differences in biological function across diverse organisms.

Materials and methods

Riboswitch identification

Infernal’s cmsearch function was used to query all RefSeq77 bacterial genomes using the RFAM glycine riboswitch covariance model to identify all individual putative glycine riboswitch aptamers (RF00504) [13,14,52]. All hits were filtered based on E-value, with a threshold of 1 × 10⁻⁵. Putative riboswitches were sorted into components of a singleton or tandem glycine riboswitch based on their proximity to any other putative aptamer. Two hits within 100 nt of each other were considered to be the two aptamers of a tandem riboswitch; the largest distance between tandem aptamers observed was 32 nt. We then used a set of 30 tandem riboswitches to generate a covariance model which identified both tandem riboswitch aptamers together. The generated model is able to identify tandem glycine riboswitch aptamers, but does not explicitly include the expression platform due to the diversity of mechanisms of action for the glycine riboswitch. This tandem covariance model was used to query the RefSeq77 bacterial database and supplement our current set of putative glycine riboswitches with any tandems that may have been missed by the RFAM covariance model. In total, 2,998 individual riboswitches were identified: 2,216 tandem riboswitches and 782 singleton riboswitches. Singletons were then classified as type-1 or type-2 based on the location of the ghost aptamer, an adjacent stem structure that functions as a scaffold for tertiary interaction with the ligand-binding aptamer. Ghost aptamer location was determined based on conformity to covariance models generated from singleton type-1 and type-2 riboswitches reported in [54]. Of our 782 singleton riboswitches, 342 were characterized as singleton type-1, 125 were characterized as singleton type-2, and 305 were unable to be characterized as one or the other (called singleton type-0). 
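The proximity rule described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Hit` record, the `classify_hits` function, the way the gap is measured (nucleotides between the end of one hit and the start of the next on the same strand), and the use of "E-value at or below the threshold" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Hit:
    """One covariance-model hit (hypothetical record, not an Infernal type)."""
    contig: str    # replicon / sequence identifier
    strand: str    # "+" or "-"
    start: int     # 1-based inclusive, start <= end
    end: int       # 1-based inclusive
    evalue: float


def classify_hits(
    hits: Iterable[Hit], max_gap: int = 100, max_evalue: float = 1e-5
) -> Tuple[List[Tuple[Hit, Hit]], List[Hit]]:
    """Split hits into tandem pairs and singletons.

    Assumption: two neighbouring hits on the same contig and strand form a
    tandem when at most `max_gap` nucleotides separate them.
    """
    kept = sorted(
        (h for h in hits if h.evalue <= max_evalue),
        key=lambda h: (h.contig, h.strand, h.start),
    )
    tandems: List[Tuple[Hit, Hit]] = []
    singletons: List[Hit] = []
    i = 0
    while i < len(kept):
        cur = kept[i]
        nxt = kept[i + 1] if i + 1 < len(kept) else None
        same_locus = (
            nxt is not None
            and nxt.contig == cur.contig
            and nxt.strand == cur.strand
            and nxt.start - cur.end - 1 <= max_gap
        )
        if same_locus:
            tandems.append((cur, nxt))
            i += 2  # both hits are consumed by the tandem
        else:
            singletons.append(cur)
            i += 1
    return tandems, singletons
```

Pairing neighbours greedily after sorting is a design shortcut; three or more closely spaced hits would need an explicit policy that the text does not describe.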
Bedtools was utilized to determine the nearest downstream gene within 500 nucleotides on the same strand, providing a gene that is putatively regulated by each given riboswitch [72]. Genes were binned based on function for determination of genomic context of the glycine riboswitch.
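The downstream-gene assignment can be sketched in plain Python as follows. This is a stand-in for illustration only and does not call or reproduce Bedtools; the `Feature` tuple, the `nearest_downstream_gene` function, and the distance convention (coordinate difference between the riboswitch's 3' end and the gene's 5' end, requiring the gene to lie strictly downstream) are assumptions.

```python
from typing import NamedTuple, Optional, Sequence


class Feature(NamedTuple):
    """Genomic interval (hypothetical record): 1-based, start <= end."""
    contig: str
    strand: str
    start: int
    end: int
    name: str


def nearest_downstream_gene(
    ribo: Feature, genes: Sequence[Feature], max_dist: int = 500
) -> Optional[Feature]:
    """Return the closest same-strand gene downstream of `ribo`, or None."""
    best: Optional[Feature] = None
    best_dist: Optional[int] = None
    for g in genes:
        if g.contig != ribo.contig or g.strand != ribo.strand:
            continue
        # Downstream means higher coordinates on "+" and lower on "-".
        if ribo.strand == "+":
            dist = g.start - ribo.end
        else:
            dist = ribo.start - g.end
        if dist <= 0 or dist > max_dist:
            continue  # upstream, overlapping, or too far away
        if best_dist is None or dist < best_dist:
            best, best_dist = g, dist
    return best
```

A riboswitch with no qualifying gene returns `None`, which corresponds to a riboswitch without an assigned genomic context in this sketch.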

Riboswitch phylogenetic analysis

Tandem riboswitches were grouped based on taxonomic origin and genomic context. In order to incorporate secondary structure information, groups were aligned using LocARNA’s mlocarna function for de novo alignment and folding [73-75] and Infernal’s cmalign function to align to the tandem covariance model [12,13]. Maximum likelihood phylogenetic trees were generated from these aligned groups using RAxML [9]. Trees from alignments generated by cmalign were run with an accompanying secondary structure file to guide phylogenetic maximum likelihood analysis based on aptamer sequence and structure. In each case, 100 bootstrap replicates were performed and maximum likelihood bootstrap confidence values ≥ 70 are reported.

Graph clustering and network generation

Graphs of aptamer sequences were generated, with vertices representing individual aptamers and edges representing a pairwise similarity metric relating aptamer pairs. Edge weights were then thresholded, resulting in trimmed networks of clustered aptamers containing only edges connecting pairs with higher similarity than the threshold value. Aptamer networks were generated to determine clustering based on a number of different pairwise metrics. These included partition function (RNApdist) [26,27], sequence and structural similarity (FoldAlign and Dynalign) [30-33], ensemble expected mountain height (RNAmountAlign) [34], and sequence similarity (Clustal Omega) [28]. Pairwise values were utilized to generate and visualize aptamer networks using the igraph and qgraph R-libraries [76,77]. Optimal visualization thresholds vary between sets relative to the taxonomic diversity represented within them. Clustering of riboswitch groups based on genomic context and aptamer type was compared by network density across a range of thresholds for each distance metric. Modular clusters were identified using igraph’s community identification functions cluster_fast_greedy, cluster_walktrap, cluster_edge_betweenness, and cluster_leading_eigen. Following cluster identification, we performed 100 replicates of a parametric bootstrapping analysis which perturbs 5% of the network and then re-clusters. This analysis perturbs the network by adding/removing edges at random in a 1:1 ratio, resulting in a network that contains the same nodes and an equivalent number of edges, but in which 5% of the edges connect different nodes. For each iteration, we determined the new clustering for our group of interest using the igraph community detection methods. These were then compared to the original group that had no perturbation. This comparison was done across all 100 iterations and used the Jaccard Similarity Index to calculate the similarity of each post-perturbation cluster to the original cluster. 
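The perturb-and-recluster procedure can be sketched in Python as below, under several stated assumptions. The function names are hypothetical; connected components are used as a simple stand-in for the igraph community detection functions named in the text (any function with the same signature can be passed as `detect`); and the per-replicate score is taken as the Jaccard index of the best-matching post-perturbation cluster, which is one plausible reading of the comparison described.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard Similarity Index of two node collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def perturb(nodes, edges, fraction=0.05, rng=random):
    """Remove a fraction of edges and add the same number of new ones.

    `edges` is a set of frozenset({u, v}); node and edge counts are preserved.
    """
    edges = set(edges)
    non_edges = [
        frozenset(p) for p in combinations(nodes, 2) if frozenset(p) not in edges
    ]
    k = min(round(fraction * len(edges)), len(non_edges))
    removed = rng.sample(sorted(edges, key=sorted), k)
    added = rng.sample(non_edges, k)
    return (edges - set(removed)) | set(added)


def components(nodes, edges):
    """Connected components; a stand-in for a community detection method."""
    adj = {n: set() for n in nodes}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    seen, out = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        out.append(comp)
    return out


def cluster_stability(nodes, edges, cluster, replicates=100,
                      fraction=0.05, seed=0, detect=components):
    """Average best-match Jaccard index of `cluster` across perturbed networks."""
    rng = random.Random(seed)
    scores = []
    for _ in range(replicates):
        new_edges = perturb(nodes, edges, fraction, rng)
        scores.append(max(jaccard(cluster, c) for c in detect(nodes, new_edges)))
    return sum(scores) / len(scores)
```

A value near 1 indicates that the cluster of interest is recovered almost unchanged after perturbation, while lower values indicate that it splits or merges with other nodes.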
Average Jaccard Similarity Index across the 100 iterations was used to determine robustness for our clusters. Cluster composition was also validated by MCL [55] and DBSCAN [56] clustering. Network density was calculated by determining the percent of total possible pairwise edges remaining for a given set of vertices after edge-trimming based on a distance metric threshold. Intra- and inter-group density calculations represent edge density within a group and between groups, respectively. The network density measured across a range of thresholds correlates to aptamer similarity with respect to a given distance metric used to weight edges. These generated graphs equate to a flipped cumulative distribution of possible edges and actual edges for a cluster as we threshold the network based on edge weights.
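The density calculation can be sketched in Python as follows. The function names, the dictionary-of-pairs input format, and the strict comparison against the threshold are assumptions for illustration; the `higher_is_similar` flag is a hypothetical switch for metrics where smaller values mean greater similarity.

```python
from itertools import combinations, product


def edge_density(weights, group_a, group_b=None, threshold=0.0,
                 higher_is_similar=True):
    """Percent of possible edges that survive thresholding.

    weights: dict mapping frozenset({u, v}) to a pairwise score.
    With only group_a, the intra-group density is returned; with group_b,
    the inter-group density between the two groups is returned.
    """
    if group_b is None:
        pairs = [frozenset(p) for p in combinations(list(group_a), 2)]
    else:
        pairs = [frozenset(p) for p in product(group_a, group_b) if p[0] != p[1]]
    if not pairs:
        return 0.0

    def keep(w):
        return w > threshold if higher_is_similar else w < threshold

    kept = sum(1 for p in pairs if p in weights and keep(weights[p]))
    return 100.0 * kept / len(pairs)


def density_curve(weights, thresholds, group_a, group_b=None,
                  higher_is_similar=True):
    """Density at each threshold; plotted, this is the flipped cumulative curve."""
    return [
        edge_density(weights, group_a, group_b, t, higher_is_similar)
        for t in thresholds
    ]
```

For a similarity score, the curve starts near 100% at permissive thresholds and falls as the threshold becomes stricter; groups whose curve falls later are more similar under that metric.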

Consensus structure generation

We generated Stockholm files for sets of riboswitches using a combination of LocARNA’s mlocarna function [73] and alignment to our covariance models using Infernal’s cmsearch function [13]. Ralee was then used to perform minor curation of alignments and VARNA was implemented for secondary structure visualization throughout the process [78,79]. R2R was then used to generate consensus structures based on these Stockholm files [80]. The “#=GF R2R SetDrawingParam autoBreakPairs true” flag was used to allow for breaking of base pairs in instances where aptamer stems were not highly conserved.

Clustering of Bacillaceae tandem riboswitch aptamers using Dynalign, FoldAlign, Clustal Omega, and RNApdist.

A) Dynalign intra-edge density across a range of -500 to 0 (x-axis reversed to display decreasing density). B) FoldAlign intra-edge density across a range of 0 to 2000. C) Clustal Omega intra-edge density across a range of 0 to 100. D) RNApdist inter-edge density across a range of 0 to 100 (x-axis reversed to display decreasing density). (EPS)

Clustering of Vibrionaceae tandem riboswitch aptamers using Dynalign, FoldAlign, Clustal Omega, and RNApdist.

A) Dynalign intra-edge density across a range of -500 to 0 (x-axis reversed to display decreasing density). B) FoldAlign intra-edge density across a range of 0 to 2000. C) Clustal Omega intra-edge density across a range of 0 to 100. D) RNApdist inter-edge density across a range of 0 to 100 (x-axis reversed to display decreasing density). (EPS)

Cluster stability after 100 bootstrap replicates for Bacilli and Actinobacteria clustering.

Average Jaccard Similarity Index after 100 bootstrap replicates for the Bacilli cluster regulating GCV, the Bacilli cluster regulating TP, and the Actinobacteria cluster regulating GCV. The first row indicates the cluster and the first column indicates the community detection method used. The methods tend to show good cluster stability, particularly cluster_walktrap. However, the cluster_fast_greedy algorithm tended to over-group clusters after bootstrapping, leading to poor Jaccard Similarity Indexes for some clusters. (EPS)

MCL clustering of Bacilli riboswitch aptamers.

A) Bacilli riboswitch aptamers clustered based on RNAmountAlign pairwise similarity (visualized at threshold of 8). Node colors correspond to aptamer type and node shape corresponds to genomic context. B) MCL clustering of Bacilli riboswitch aptamers, accomplished using R’s MCL package. Nodes are colored to distinguish distinct clusters. Red circles correspond (roughly) to the set of nodes used in our main analysis, as identified using four of R’s igraph community detection functions. (EPS)

DBSCAN clustering of Bacilli riboswitch aptamers.

A) Bacilli riboswitch aptamers clustered based on RNAmountAlign pairwise similarity (visualized at threshold of 12). Node colors correspond to aptamer type and node shape corresponds to genomic context. B) DBSCAN clustering of Bacilli riboswitch aptamers, accomplished using R’s fpc package. An epsilon value of 3.65 was used and the minimum number of neighbors was set to 4. Nodes are colored to distinguish distinct clusters. Clustering of tandem aptamer 1 and singleton type-1 aptamers regulating GCV is observed in light blue. C) DBSCAN clustering of Bacilli riboswitch aptamers, accomplished using R’s fpc package. An epsilon value of 2.85 was used and the minimum number of neighbors was set to 2. Nodes are colored to distinguish distinct clusters. Clustering of tandem aptamer 2 and singleton type-2 aptamers regulating TP is observed in yellow. Red circles correspond (roughly) to the set of nodes used in our main analysis, as identified using four of R’s igraph community detection functions. (EPS)

Clustering of Bacilli aptamer-1 and singleton type-1 aptamer subset using Dynalign, FoldAlign, Clustal Omega, and RNApdist.

A) Dynalign inter-edge density across a range of -500 to 0 (x-axis reversed to display decreasing density). B) FoldAlign inter-edge density across a range of 0 to 2000. C) Clustal Omega inter-edge density across a range of 0 to 100. D) RNApdist inter-edge density across a range of 0 to 100 (x-axis reversed to display decreasing density). (EPS)

Clustering of Bacilli aptamer-2 and singleton type-2 aptamer subset using Dynalign, FoldAlign, Clustal Omega, and RNApdist.

A) Dynalign inter-edge density across a range of -500 to 0 (x-axis reversed to display decreasing density). B) FoldAlign inter-edge density across a range of 0 to 2000. C) Clustal Omega inter-edge density across a range of 0 to 100. D) RNApdist inter-edge density across a range of 0 to 100 (x-axis reversed to display decreasing density). (EPS)

Clustering of glycine riboswitch aptamers identified within the Actinobacteria phylum of bacteria.

A) Aptamers within the Actinobacteria bacterial phylum were identified and clustered based on RNAmountAlign pairwise similarity (visualized at threshold of 12). B) Sub-clusters (communities) were identified using the four community detection functions within R’s igraph package. One community containing primarily two different aptamer types, aptamer-1 and singleton type-1, was identified. Display visualization uses the community detection algorithm cluster_fast_greedy. Node colors correspond to distinct clusters detected. C) The community containing different aptamer types was parsed from the overall network, the tandem aptamers’ partners were added (as an out group within the same context), and graph clustering was visualized (RNAmountAlign threshold of 5). D) Edge density between aptamer groups was calculated for networks generated across a range of RNAmountAlign edge-weight thresholds. Solid lines correspond to edge density within a group and dashed lines correspond to edge density between the two indicated groups. The dotted red line indicates the RNAmountAlign threshold (5) at which the networks in (C) were visualized. (EPS)

MCL clustering of Actinobacteria riboswitch aptamers.

A) Actinobacteria riboswitch aptamers clustered based on RNAmountAlign pairwise similarity (visualized at threshold of 12). Node colors correspond to aptamer type and node shape corresponds to genomic context. B) MCL clustering of Actinobacteria riboswitch aptamers, accomplished using R’s MCL package. Nodes are colored to distinguish distinct clusters. Red circles correspond (roughly) to the set of nodes used in our main analysis, as identified using four of R’s igraph community detection functions. (EPS)

DBSCAN clustering of Actinobacteria riboswitch aptamers.

A) Actinobacteria riboswitch aptamers clustered based on RNAmountAlign pairwise similarity (visualized at threshold of 12). Node colors correspond to aptamer type and node shape corresponds to genomic context. B) DBSCAN clustering of Actinobacteria riboswitch aptamers, accomplished using R’s fpc package. An epsilon value of 4 was used and the minimum number of neighbors was set to 8. Nodes are colored to distinguish distinct clusters. Clustering of tandem aptamer 1 and singleton type-1 aptamers regulating GCV is observed in orange. Red circles correspond (roughly) to the set of nodes used in our main analysis, as identified using four of R’s igraph community detection functions. (EPS)

Clustering of Actinobacteria aptamer-1 and singleton type-1 aptamer subset using Dynalign, FoldAlign, Clustal Omega, and RNApdist.

A) Dynalign inter-edge density across a range of -500 to 0 (x-axis reversed to display decreasing density). B) FoldAlign inter-edge density across a range of 0 to 2000. C) Clustal Omega inter-edge density across a range of 0 to 100. D) RNApdist inter-edge density across a range of 0 to 100 (x-axis reversed to display decreasing density). (EPS)

Inter-edge density of random glycine riboswitch aptamer networks.

150 aptamers regulating GCV and 150 aptamers regulating TP were randomly selected from across the bacterial kingdom and evaluated based on RNAmountAlign similarity score. Inter-edge density was calculated between aptamer types across a range of RNAmountAlign thresholds for the GCV-regulating set (A) and the TP-regulating set (B). The dotted red line on each graph indicates the threshold at which the clusters were visualized in Fig 5. Only singletons that could be classified as type-1 or type-2 were included in this set.
Fig 5

Clustering of random glycine riboswitch aptamers across the bacterial kingdom.

A) Network visualization of 150 randomly chosen aptamers regulating GCV and clustered based on RNAmountAlign pairwise similarity (threshold -5). B) Network visualization of 150 randomly chosen aptamers regulating TP and clustered based on RNAmountAlign pairwise similarity (threshold -5). Only singletons that could be classified as type-1 or type-2 were included in this set.

(EPS)

Clustering of random riboswitch aptamers regulating GCV using Dynalign, FoldAlign, Clustal Omega, and RNApdist.

A) Dynalign inter-edge density across a range of -500 to 0 (x-axis reversed to display decreasing density). B) FoldAlign inter-edge density across a range of -500 to 1500. C) Clustal Omega inter-edge density across a range of 0 to 100. D) RNApdist inter-edge density across a range of 0 to 100 (x-axis reversed to display decreasing density). (EPS)

Clustering of random riboswitch aptamers regulating TP using Dynalign, FoldAlign, Clustal Omega, and RNApdist.

A) Dynalign inter-edge density across a range of -500 to 0 (x-axis reversed to display decreasing density). B) FoldAlign inter-edge density across a range of -500 to 1500. C) Clustal Omega inter-edge density across a range of 0 to 100. D) RNApdist inter-edge density across a range of 0 to 100 (x-axis reversed to display decreasing density). (EPS)

Phylogenetic tree of genes coding for gcvT (aminomethyltransferase) and regulated by a glycine riboswitch.

53 gcvT (aminomethyltransferase) genes regulated by glycine riboswitches were aligned using MUSCLE. RAxML was then used to generate a phylogenetic tree. Tips were then colored based on the phyla of bacteria the gene originates from (Firmicutes are red, Proteobacteria are blue). Bootstrap support values are displayed for 100 replicates when ≥ 70. All trees are midpoint rooted. (EPS)

Phylogenetic tree of genes coding for sodium:amino-acid symporters and regulated by a glycine riboswitch.

80 sodium:amino-acid symporter genes regulated by glycine riboswitches were aligned using MUSCLE. RAxML was then used to generate a phylogenetic tree. Tips were then colored based on the phyla of bacteria the gene originates from (Firmicutes are red, Proteobacteria are blue, Actinobacteria are green). Bootstrap support values are displayed for 100 replicates when ≥ 70. All trees are midpoint rooted. (EPS)

48 Bacillaceae and 37 Vibrionaceae tandem riboswitch sequences used for phylogenetic analysis.

(XLSX)

48 Bacillaceae tandem riboswitch aptamer-1 sequences used for phylogenetic analysis.

(XLSX)

48 Bacillaceae tandem riboswitch aptamer-2 sequences used for phylogenetic analysis.

(XLSX)

168 Bacillaceae tandem riboswitch aptamer sequences used for graph clustering analysis.

(XLSX)

72 Vibrionaceae tandem riboswitch aptamer sequences used for graph clustering analysis.

(XLSX)

782 glycine riboswitch singleton aptamer sequences labeled by type.

(XLSX)

436 Bacilli riboswitch aptamer sequences for graph clustering analysis.

(XLSX)

124 Bacilli riboswitch aptamer sequences from aptamer-1 and singleton type-1 sub-cluster with paired aptamer-2 supplemented in.

(XLSX)

35 Bacilli riboswitch aptamer sequences from aptamer-2 and singleton type-2 sub-cluster with paired aptamer-1 supplemented in.

(XLSX)

40 randomly selected aptamers used as out-group for inter-edge network density.

(XLSX)

606 Actinobacteria riboswitch aptamer sequences used for graph clustering analysis.

(XLSX)

50 Actinobacteria riboswitch aptamer sequences from aptamer-1 and singleton type-1 sub-cluster with paired aptamer-2 supplemented.

(XLSX)

150 riboswitch aptamer sequences regulating GCV for graph clustering analysis.

(XLSX)

150 riboswitch aptamer sequences regulating TP for graph clustering analysis.

(XLSX)

53 gene sequences used for generating gcvT (aminomethyltransferase) gene tree.

(XLSX)

80 gene sequences used for generating sodium:amino-acid symporter gene tree.

(XLSX)

7 Oct 2019

Dear Dr Meyer,

Thank you very much for submitting your manuscript, 'Regulatory context drives conservation of glycine riboswitch aptamers', to PLOS Computational Biology. As with all papers submitted to the journal, yours was fully evaluated by the PLOS Computational Biology editorial team, and in this case, by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some aspects of the manuscript that should be improved. We would therefore like to ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer and we encourage you to respond to particular issues raised.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible to show clearly where changes have been made to their manuscript e.g. by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption.
Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).
- Supporting Information uploaded as separate files, titled 'Dataset', 'Figure', 'Table', 'Text', 'Protocol', 'Audio', or 'Video'.
- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. If you have any questions or concerns while you make these revisions, please let us know.

Sincerely,

Shi-Jie Chen
Associate Editor
PLOS Computational Biology

William Noble
Deputy Editor
PLOS Computational Biology

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately: [LINK]

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.
Reviewer #1: Crum, Ram-Mohan and Meyer have explored the evolutionary relationships between singleton and tandem glycine riboswitches, using a number of different approaches. This provides some evidence for which switch (5′ or 3′) of the tandem riboswitches is governing gene expression.

The number of phylogenetically informative sites in short RNAs is generally very small, hence trees for these are often quite poor. Doublet evolutionary models [1,2], INDEL-aware evolutionary models [3] and flanking sequence [4] are three strategies that have been employed for addressing this problem.

The authors have used a number of approaches to ensure that their results are robust, including more traditional methods (e.g. RAxML) and non-traditional approaches such as RNAmountAlign. I am slightly worried about the latter: if it's based upon the mountain metric proposed by Moulton et al (2000), then there is an issue that the outer-most basepairs in stems have too large an impact on the result; furthermore, the "correction" proposed by Moulton et al fails to correct the problem (based upon my own simulations). While this shouldn't be a problem for this study, I suggest treating this method with caution.

MAJOR COMMENTS:

1. Extensive use of graph clustering has been made; was bootstrapping or similar employed to ensure the clusters are robust? What about alternative clustering approaches, e.g. MCL or DBSCAN?

2. If the tandem alignment/CM had been split into 5′ and 3′ halves, then the two half CMs could have also been used to determine which half the singletons are likely derived from.

3. It is assumed that singletons are exclusively derived from either a 5′ or 3′ half of a tandem ancestor, yet half 5′ + half 3′ hybrids have not been excluded as a possibility. E.g.

tandem       5555555555555555555lllll33333333333333333333
5p-single    5555555555555555555-------------------------
3p-single    ------------------------33333333333333333333
5p3p-hybrid  5555555555------------------------3333333333

4. I didn't really understand the network density plots; are they flipped cumulative distributions?

5. Figures 2, 3, 5 and 6 are very similar; are they all necessary?

MINOR COMMENTS:

1. What was the distribution of distances between the 2 halves of tandem models? This would help justify the "100 nucleotides" threshold.

2. Ralee is often used as well as R2R in order to curate alignments; was this also tried?

3. Were the GA thresholds provided by Rfam not useful, as opposed to an E-value threshold?

4. Were expression platforms included or excluded from the CM models?

5. Pg. 19, had forgotten what a "ghost" aptamer is by the time I encountered it again.

REFERENCES:

1. Schöniger M, von Haeseler A. Toward assigning helical regions in alignments of ribosomal RNA and testing the appropriateness of evolutionary models. Journal of molecular evolution. 1999 Nov 1;49(5):691-8.

2. Gesell T, Von Haeseler A. In silico sequence evolution with site-specific interactions along phylogenetic trees. Bioinformatics. 2005 Dec 6;22(6):716-22.

3. Rivas E, Eddy SR. Probabilistic phylogenetic inference with insertions and deletions. PLoS computational biology. 2008 Sep 19;4(9):e1000172.

4. Pignatelli M, Vilella AJ, Muffato M, Gordon L, White S, Flicek P, Herrero J. ncRNA orthologies in the vertebrate lineage. Database. 2016 Jan 1;2016. http://europepmc.org/articles/PMC4792531

Reviewer #2: review uploaded as an attachment

**********

Have all data underlying the figures and results presented in the manuscript been provided?
Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Paul P. Gardner
Reviewer #2: No

Submitted filename: Review_PLoS-CompBio_2019.pdf

7 Nov 2019

Submitted filename: Crum2019_PLoS_Rebuttal.docx

25 Nov 2019

Dear Dr Meyer,

We are pleased to inform you that your manuscript 'Regulatory context drives conservation of glycine riboswitch aptamers' has been provisionally accepted for publication in PLOS Computational Biology. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Once you have received these formatting requests, please note that your manuscript will not be scheduled for publication until you have made the required changes.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pcompbiol/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process.

One of the goals of PLOS is to make science accessible to educators and the public.
PLOS staff issue occasional press releases and make early versions of PLOS Computational Biology articles available to science writers and journalists. PLOS staff also collaborate with Communication and Public Information Offices and would be happy to work with the relevant people at your institution or funding agency. If your institution or funding agency is interested in promoting your findings, please ask them to coordinate their releases with PLOS (contact ploscompbiol@plos.org).

Thank you again for supporting Open Access publishing. We look forward to publishing your paper in PLOS Computational Biology.

Sincerely,

Shi-Jie Chen
Associate Editor
PLOS Computational Biology

William Noble
Deputy Editor
PLOS Computational Biology

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: I am satisfied by the responses to my comments. Good job.

Reviewer #2: The authors have satisfactorily addressed all my queries and I therefore recommend that the manuscript be accepted for publication.

**********

Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No Reviewer #2: No 13 Dec 2019 PCOMPBIOL-D-19-01503R1 Regulatory context drives conservation of glycine riboswitch aptamers Dear Dr Meyer, I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work! With kind regards, Matt Lyles PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

  3 in total

Review 1.  Structural Insights into RNA Dimerization: Motifs, Interfaces and Functions.

Authors:  Charles Bou-Nader; Jinwei Zhang
Journal:  Molecules       Date:  2020-06-23       Impact factor: 4.411

2.  The asymmetry and cooperativity of tandem glycine riboswitch aptamers.

Authors:  Chad D Torgerson; David A Hiller; Scott A Strobel
Journal:  RNA       Date:  2020-01-28       Impact factor: 4.942

Review 3.  Siblings or doppelgängers? Deciphering the evolution of structured cis-regulatory RNAs beyond homology.

Authors:  Elizabeth C Gray; Daniel M Beringer; Michelle M Meyer
Journal:  Biochem Soc Trans       Date:  2020-10-30       Impact factor: 5.407
