Literature DB >> 31548338

Genetic robustness of let-7 miRNA sequence-structure pairs.

Qijun He¹, Fenix W Huang¹, Christopher Barrett^1,2, Christian M Reidys^1,3.

Abstract

Genetic robustness, the preservation of evolved phenotypes against genotypic mutations, is one of the central concepts in evolution. In recent years a large body of work has focused on the origins, mechanisms, and consequences of robustness in a wide range of biological systems. In particular, research on ncRNAs studied the ability of sequences to maintain folded structures against single-point mutations. In these studies, the structure is merely a reference. However, recent work revealed evidence that structure itself contributes to the genetic robustness of ncRNAs. We follow this line of thought and consider sequence-structure pairs as the unit of evolution and introduce the spectrum of extended mutational robustness (EMR spectrum) as a measurement of genetic robustness. Our analysis of the miRNA let-7 family captures key features of structure-modulated evolution and facilitates the study of robustness against multiple-point mutations.

Keywords: genetic robustness; let-7 miRNA; multiple-point mutation; sequence–structure pairs

Mesh：

Substances：
MicroRNAs

Year: 2019 PMID： 31548338 PMCID： PMC6859847 DOI： 10.1261/rna.065763.118

Source DB: PubMed Journal: RNA ISSN： 1355-8382 Impact factor: 4.942

INTRODUCTION

Genetic robustness can be characterized in terms of the variation of phenotype distribution induced by genotypic change (Schlichting and Pigliucci 1998; de Visser et al. 2003; Gu et al. 2003) and concerns the insensitivity of a phenotype to genetic changes. Mutational robustness has been studied in the context of noncoding RNA (ncRNA) (Borenstein and Ruppin 2006; Rodrigo and Fares 2012). RNA consists of a single strand of nucleotides (A, C, G, U) that can fold and bond to itself through base-pairing. ncRNAs are known to function in aptamer binding as riboswitches, in chemical catalysis as ribozymes, and in RNA splicing such as spliceosome (Breaker and Joyce 1994; Breaker 1996; Serganov and Patel 2007; Darnell 2011). Most importantly, it is the folded structure that is underlying all these mechanisms, allowing for the interaction with and subsequent modification of other biological molecules. The self-folding of RNA makes it an ideal object to study genotype–phenotype relations. The well-established energy-based prediction of RNA secondary structure, a two-dimensional coarse grain of the real three-dimensional structure, makes such studies feasible (Waterman 1978; Zuker and Stiegler 1981; Hofacker et al. 1994; Mathews et al. 1999). Structures are an important determinant of the function of ncRNAs, whence the robustness of an RNA can be characterized in terms of the variation of secondary structure distribution (i.e., the stability of a secondary structure in the face of genetic sequence changes). Structural robustness of ncRNAs is considered to be a key component of the fitness of the molecule, and much research has been conducted to identify the footprints of natural selection on secondary structures of sncRNAs (Borenstein and Ruppin 2006; Rodrigo and Fares 2012). Borenstein and Ruppin (2006) define neutrality of an RNA sequence σ = a1 a2…a by η(σ) = 1 − (d)/n, where (d) denotes the average, taken over all 3n single-point mutants of σ, of the base pair distance d between the minimum free energy (MFE) structure, S0, of σ and the MFE structures of single-point mutants. The RNA sequence, σ, is then defined to be robust if η(σ) is greater than the average neutrality of 1000 control sequences generated by the program RNAinverse (Lorenz et al. 2011), which fold into the same target structure, S0. The main finding of Borenstein and Ruppin (2006) is that precursor miRNAs (pre-miRNA) exhibit a significantly higher level of mutational robustness than random RNA sequences, having the same structure. Subsequently, Rodrigo and Fares (2012) undertook a similar analysis for bacterial small RNAs. Their main finding was that, surprisingly, bacterial sncRNAs are not significantly more robust when compared with 1000 sequences having the same structure, as computed by RNAinverse. Rodrigo and Fares (2012) based their findings on the notion of ensemble diversity defined earlier in Gruber et al. (2008). In the above-mentioned papers, robustness is defined by taking the average of all 3n single-point mutants, the underlying assumption being that all 3n single-point mutations are equally likely to occur and, in addition, taking exclusively single-point mutations into account. Incorporation of multiple-point mutations poses obvious difficulties, because the number of sequences that need to be taken into consideration will grow exponentially. The neutral theory of Kimura (1983) stipulates that evolution is achieved by neutral mutations (i.e., by mutations that are necessarily compatible with the underlying structure). This is in accordance with recent findings showing that secondary structures have a genuine influence on selection in RNA genes. In Mimouni et al. (2008), the authors classify stem positions into structural classes and validate that they are under different selective constraints. In Price et al. (2011), the authors observe neutral evolution in Drosophila miRNA and evolution increasing the thermal dynamic stability of the RNA. Pollard et al. (2006b) and Beniaminov et al. (2008) study Human Accelerated Regions (HARs) in brains of primates—that is, noncoding RNAs with an accelerated rate of nucleotide substitutions along the lineage between human and chimpanzee. Beniaminov et al. (2008) concludes that this increased rate of nucleotide substitutions is of central importance for the evolution of the human brain. The most divergent of these regions, HAR1, has been biochemically confirmed to fold into distinct RNA secondary structures in human and chimpanzee. Interestingly, the mutations in the human HAR1 sequence, compared to the chimpanzee sequence, stabilize their respective RNA structure (Pollard et al. 2006a), suggesting a shape modulated evolution. Recently, Barrett et al. (2017) proposed a framework considering RNA sequences and their RNA secondary structures simultaneously. This gives rise to an information-theoretic framework for RNA sequence–structure pairs. In particular, the authors studied the “dual Boltzmann distribution”—that is, the Boltzmann distribution of sequences with respect to a fixed structure. The authors develop a Boltzmann sampler of sequences with respect to a given structure and study the “inverse folding rate” (IFR) of the sampled sequences—that is, the proportion of the sampled sequences whose MFE structure equals to the given structure. Barrett et al. (2017) reports that natural structures have higher IFR than random structures, suggesting that natural structures have higher intrinsic robustness than random structures. This suggests an approach to consider sequences and structures—as pairs—as the unit of evolution, instead of just sequences in isolation. In this article, we generalize the notion of inverse folding rate (IFR) and introduce a novel profile, the extended mutational robustness spectrum (EMR spectrum), of a sequence–structure pair. By construction, this spectrum entails both sequence and structure information. The key idea is that mutations will be biased by the underlying thermodynamic energy of the respective structure, instead of being equally likely to occur. In this article, we study the let-7 family miRNA, first discovered in Caenorhabditis elegans and functionally conserved along the evolutionary spectrum, from worm to human (Tanzer and Stadler 2004). First, let-7 miRNAs are a class of RNAs of particular interest: They are small endogenous, noncoding RNAs that regulate gene expression at the translation level, playing important roles in regulating cell proliferation and differentiation during development in different species. let-7 is widely viewed as a tumor suppressor miRNA. In particular, deregulation of let-7 family miRNAs has been observed in a variety of cancer types (Takamizawa et al. 2004; Dahiya et al. 2008; O'Hara et al. 2009). The short, mature miRNAs (∼22 nt), that directly regulate target gene expression, originate from longer RNA precursor molecules that fold into a stem–loop hairpin structure. The secondary structure of let-7 precursor miRNAs plays a crucial role in the gene maturation process (Lee et al. 1993; Belter et al. 2014). The stem–loop structure has been under evolutionary pressures to be conserved. Second, the RNA secondary structure of let-7 miRNAs can be derived by secondary structure prediction algorithms as in Lorenz et al. (2011), rendering it to be particularly suited for computational studies. Folding algorithms using the nearest neighbor thermodynamic model (NNTM) have high accuracy for short sequences like let-7 precursor miRNAs (typically <150 nt) (Doshi et al. 2004; Mathews et al. 2004). As a result, the predicted structure is of biological relevance and can be considered as a meaningful representation of the phenotype. Furthermore, prediction algorithms allow us to analyze the structural, mutational robustness by comparing the predicted structure of original sequence with the predicted structure of the mutants. These features make the let-7 gene family a particularly suitable test bed for studying the evolution of genetic robustness. We organize our study as follows: First, we shall extend the analysis of Borenstein and Ruppin (2006) to EMR spectra and observe that most native sequence–structure pairs have a higher EMR spectrum than sequence–structure pairs obtained by the inverse folding algorithm. We shall investigate different aspects of the robustness of EMR spectra of native sequence–structure pairs and show that these are distinctively more robust against multiple-point mutations. Second, we conduct cross-species comparisons of native sequence–structure pairs, observing that higher metazoan species have higher EMR spectra. Our analysis suggests that EMR spectra are being increased in the course of shape-modulated evolution.

RESULTS

We study a novel profile—that is, the EMR spectrum (-spectrum; see Materials and Methods) of sequence–structure pairs . The -spectrum is the family of mutational robustness indexed by the Hamming distance, k. We shall analyze the -spectrum, for 1 ≤ k ≤ 20. is the fraction of k-point mutants of σ that fold back to S and thus characterizes the mutational robustness of sequence–structure pair (σ, S). is computed by Boltzmann sampling. We study mutational robustness of native let-7 sequence–structure pairs in two ways: first by comparison with pairs obtained by inverse folding and second by cross-species comparisons of the entire let-7 gene family.

Robustness of native let-7 sequence–structure pairs

Borenstein and Ruppin (2006) report that miRNA sequences exhibit a high level of neutrality in comparison with random sequences folding into a similar structure, suggesting that native miRNA sequences are more robust against single-point mutations. These results motivate a further analysis of native and random sequences, in particular whether or not this neutrality is confined to a local neighborhood. To this end we uniformly select 50 of the 401 native sequence–structure pairs of the let-7 miRNA family (see Supplemental Figs. 2 and 3). For each such native pair (σ, S), we derive 100 random sequences, σ*, whose MFE structure is identical to S, as a control set. These control sequences are computed using the inverse folding algorithm presented in(Levin et al. (2012), Garcia-Martin et al. (2016), and Barrett et al. (2017). Given structure S, we compute the partition function of all sequences with respect to S. We then Boltzmann-sample sequences and filter (by rejection) those sequences, which fold into S. We then compute the -spectra for the inverse folding solutions and compare, for each native sequence–structure pair (σ, S), the -spectrum with the -spectrum—that is, the mean of all 100 spectra, , taken at each respective k. We call (σ, S) k-robust, if . We then check the significance of k-robustness via Z-tests, assuming is normally distributed.[4] We call (σ, S) significantly k-robust, if P < 0.05. In Figure 1 we depict the -spectrum of api-let-7 (Acyrthosiphon pisum) sequence–structure pair to illustrate k-robustness. In this example, api-let-7 is k-robust for 1 ≤ k ≤ 20. Furthermore, api-let-7 is significantly k-robust for 5 ≤ k ≤ 20.

FIGURE 1.

The spectrum, where the x-axis denotes the Hamming distance and the y-axis the inverse folding rate. The spectrum of the api-let-7 (Acyrthosiphon pisum) gene (yellow) and the spectrum of means of the corresponding control set, (blue). The error bar denotes the standard deviation of the control set of for each k. Because , api-let-7 is k-robust, for all 1 ≤ k ≤ 20. decreases slower than , as k increases. In Table 1 we summarize the data on k-robustness of the 50 let-7 sequence–structure pairs included in our study. We observe that 96% of the selected let-7 miRNAs are 1-robust. This observation is consistent with Borenstein and Ruppin (2006), providing further evidence that native let-7 sequence–structure pairs are robust against single-point mutations. However, the mutational robustness of native let-7 sequence–structure pairs is not restricted to single-point mutations: For all 1 ≤ k ≤ 20, we observe that k-robustness holds for >90% of the selected sequence–structure pairs.

TABLE 1

k-robustness of native sequence–structure pairs

k-robustness of native sequence–structure pairs This suggests that the mutational robustness of let-7 sequence–structure pairs is not a local phenomenon. Furthermore, the percentage of significantly k-robust genes increases as k increases (see Table 1). In Figure 1, we display the spectrum of the api-let-7 sequence–structure pair, illustrating the aforementioned phenomenon. The api-let-7 sequence–structure pair is 1-robust but not significantly 1-robust. However, for 5 ≤ k ≤ 20, the api-let-7 sequence–structure pair is significantly k-robust. We proceed by considering for each native sequence–structure pair, r, its rank among the values, respectively. Figure 2 present the distribution r, for k = 1, 5, 20. Examination of the rank distribution shows that native pairs exhibit a propensity toward high ranks for each k, supporting the observation of k-robustness. As k increases, the propensity toward high ranks becomes more and more pronounced. In the case of k = 1, only four out of 50 native sequence–structure pairs have . For k = 20, however, this quantity increases to 23. This is consistent with the increasing percentage of significantly k-robust genes as k increases, further demonstrating that native let-7 sequence–structure pairs exhibit mutational robustness against multiple-point mutations.

FIGURE 2.

The distribution of ranks, r, of the native sequence–structure pairs . The x-axis denotes ranking and the y-axis frequency, k = 1 (left), k = 5 (center), and k = 20 (right). For each native sequence–structure pair, (σ, S), the is ranked among 100 values, where σ* is a random sequence whose MFE structure is S. By comparing specific native sequence–structure pairs with the respective control pairs (obtained by inverse folding), we observe that native sequence–structure pairs exhibit in general higher values. In this analysis, the sequence–structure pair (σ, S) is fixed and we contrast native pairs with pairs obtained via random inverse folded sequences. We can augment this analysis by considering the ensemble of spectra of native pairs versus the ensemble of all inverse folded sequence–structure pairs. In that, for any fixed k, we can integrate the information of all native pairs contrasting this with the integrated information of the inverse folded pairs. Figure 3 displays these two distributions: of native pairs and of the inverse folded pairs for k = 1, 5, 20. For each k, we not only observe that the mean of the is greater than that of the terms , but also that the distributions of and are distinctively different. Furthermore, the difference between the two distributions for each k is statistically significant (see Table 2, where the P-value is calculated by two-tailed Wilcoxon signed-rank tests for paired data).

FIGURE 3.

TABLE 2

Distinct distribution of and

The distribution of of native pairs (blue) and of the control sets (yellow). The x-axis denotes the inverse folding rate and the y-axis denotes the frequency, k = 1 (left), k = 5 (center), and k = 20 (right). Distinct distribution of and The increase of significantly k-robust native sequence–structure pairs for increasing k, as well as the tendency of native to assume high ranks, for increasing k, suggest that k-robustness of native sequence–structure pairs, 2 ≤ k ≤ 20, is not a byproduct of their 1-robustness. In other words, the high of native sequence–structure pairs, 2 ≤ k ≤ 20, is not solely obtained by optimizing their . To further demonstrate this point, for each native sequence–structure pair (σ, S), we collect the random sequence σ′ exhibiting the highest among the . A comparative analysis of the values with the corresponding values of the native , shows that for k = 1, 46 out of the 50 σ′ sequences exhibit an value exceeding that of the corresponding value. For k = 20, in contrast, that number[5] drops to 16. In analogy to our previous comparative analysis of versus rankings, we proceed by computing for each respective σ′ the rank among values. Integrating over all σ′ sequences, Figure 4 presents the distribution of the ranks, as well as the previously derived (see Fig. 2) distribution of the native ranks. We observe that the distributions of the ranks and ranks are distinctively different. exhibits a propensity toward higher ranks compared to . These findings suggest that there exist factors beyond 1-robustness that contribute to the k-robustness of native sequence–structure pairs.

FIGURE 4.

The distribution of ranks (yellow) and ranks (blue). The x-axis denotes ranking and the y-axis frequency. For each native sequence–structure pair (σ, S), σ is the corresponding native sequence, and σ′ is the random sequence with the highest . The and the are ranked among 100 values, where σ* is a random sequence whose MFE structure is S. We finally study the role of the free energy for k-robustness. Bonnet et al. (2004) reports that miRNA exhibits increased thermodynamic stability and furthermore that mutations in the HAR further stabilize the structure (Pollard et al. 2006a). This gives rise to the question whether increased k-robustness is a result of the increased thermodynamic stability of miRNAs. To test this, we select a native sequence–structure pair from the miRNA let-7 family and generate two sets of sequences—Σ and Σ—each consisting of 30 sequences that fold into S, having higher and lower energy than the native pair, respectively. It turns out, that sequences generated by RNAinverse (Lorenz et al. 2011) tend to have higher energy than the native pair, whereas sequences generated by the dual Boltzmann sampler, filtered by rejection to fold into S, tend to have lower energy. We compute the mean and the standard deviation of the spectra of the sequences of Σ and Σ, respectively, and integrate our findings in Figure 5.

FIGURE 5.

Spectra of a native sequence–structure pair versus those of inverse folding solutions having lower and higher energies. We display the spectrum of the native, let-7 miRNA pair (yellow), the spectrum of the mean of inverse fold solutions having higher (red) and lower (blue) energy, respectively, and their standard deviations. The x-axis denotes the Hamming distance and the y-axis the inverse folding rate. Figure 5 shows that the -spectrum of the native pair is distinctively higher than the ones derived from Σ and Σ. We then perform a correlation analysis between the -spectrum and energy. Let e(σ, S) denote the free energy of (σ, S). For the 61 pairs mentioned above, the Spearman's rank correlation coefficient between and e(σ, S) is −0.16. For and e(σ, S), the correlation coefficient is −0.10. Both quantities exhibit practically no correlation. We have confirmed this result for 10 additional native sequence–structure pairs. The findings suggest that thermodynamic stability is not the sole factor—native pairs have been selected for. The mutational robustness of native pairs is not a byproduct of evolving toward thermodynamic stability.

Robustness of metazoan species

Freilich et al. (2010) studies genetic robustness of networks of bacterial genes, observing variations among species in their level of genetic robustness, reflecting adaptations to different ecological niches and lifestyles. The genetic robustness of a network refers to its ability to buffer mutations via the existence of alternative pathways. The species-specific variations raise the question whether such variations can also be observed for the -spectra across different animal species. Are -spectra to some extent a reflection of phylogenetic relationships? Herein, we perform an evolutionary analysis of the -spectra of let-7 miRNA sequence–structure pairs across different animal species. We first perform our analysis on six selected taxa: Homo sapiens, Pan paniscus, Anolis carolinensis, Ciona, Drosophila, and Chromadorea. There are 12, 12, 11, 11, 14, and seven let-7 genes found in the miRBase within each taxon, respectively. For almost all 1 ≤ k ≤ 20, we observe the mean value within each taxa to decrease from H. sapiens to Chromadorea (see Figure 6 and Table 3). The only exceptions are the values of P. paniscus, which become slightly larger than those of H. sapiens, for 14 ≤ k ≤ 20. We observe that “higher” metazoan species, such as H. sapiens, P. paniscus, A. carolinensis, and Ciona, all of which, being Chordata, exhibit larger values than “lower” metazoan species, such as Drosophila and Chromadorea, all of which being Ecdysozoa. In addition, the difference of values among evolutionary closely related species, such as H. sapiens, P. paniscus, and A. carolinensis, are small. However, comparing taxa less related in the phylogenetic tree of life, the difference becomes significant. Statistical tests (two-tailed Wilcoxon rank-sum test) are conducted to quantify the statistical significance of this difference within the let-7 genes value distribution among these six taxa (see Figure 7), for k = 1, 5, 20.

FIGURE 6.

TABLE 3.

Mean within each taxa at each k

FIGURE 7.

Statistical significance (P-value) of the difference within the let-7 genes distribution between taxa, for k = 1 (left), k = 5 (middle), and k = 20 (right). P-values denote the probability of observing a larger difference in the distribution between two corresponding taxa at random, assuming they are drawn from the same probability distribution, computed by a two-tailed Wilcoxon rank-sum test. Differences between two taxa that are statistically significant (P < 0.05) are highlighted, and phylogenetic relations among the six taxa are displayed.

Mean within each of the six taxa, from left to right: Homo sapiens, Pan paniscus, Anolis carolinensis, Ciona, Drosophila, and Chromadorea. The x-axis denotes the Hamming distance and the y-axis the inverse folding rate. The relative ordering of nvalues among taxa is consistent for each k (from high to low: H. sapiens, P. paniscus, A. carolinensis, Ciona, Drosophila, and Chromadorea, the only exception being values of P. paniscus becoming slightly larger than those of H. sapiens, for 14 ≤ k ≤ 20). Statistical significance (P-value) of the difference within the let-7 genes distribution between taxa, for k = 1 (left), k = 5 (middle), and k = 20 (right). P-values denote the probability of observing a larger difference in the distribution between two corresponding taxa at random, assuming they are drawn from the same probability distribution, computed by a two-tailed Wilcoxon rank-sum test. Differences between two taxa that are statistically significant (P < 0.05) are highlighted, and phylogenetic relations among the six taxa are displayed. Mean within each taxa at each k The results of statistical tests at k = 1, 5, 20 demonstrate that distributions of the six taxa reflect the complexity of the organisms and their phylogenetic relations. H. sapiens, P. paniscus, and A. carolinensis almost always exhibit significantly higher values than Drosophila and Chromadorea, for k = 1, 5, 20. On the other hand, differences among H. sapiens, P. paniscus, and A. carolinensis are statistically insignificant for k = 1, 5, 20. However, we observe, for k = 20 insignificant differences between H. sapiens, P. paniscus, and A. carolinensis with Drosophila, though the P-values are very close to 0.05. A more distinguished variation is observed comparing Ciona with Drosophila. In the case of k = 1, the difference is close to being significant with p = 0.0554, but for k = 20, the difference is insignificant with p = 0.5417. We proceed by conducting the above analysis for all 401 let-7 sequence–structure pairs of the miRBase. let-7 genes have been found in 82 metazoan species. Multiple homologous miRNA genes are commonly observed in vertebrates, whereas single miRNA let-7 genes were more common in other animal species. The phylogenetic tree of all 82 species is constructed using the “Interactive Tree Of Life” (iTOL) (Letunic and Bork 2006, 2016), based on NCBI taxonomy (Federhen 2011). In addition, iTOL determines taxonomic classes of all internal nodes. Given the phylogenetic tree, we not only compute the mean values of the let-7 sequence–structure pairs within each specific species, but also the mean values within taxa, corresponding to the internal nodes of the phylogenetic tree. Figure 8 depicts a subtree of the phylogenetic tree that includes the major taxa as well as the mean values within each taxon, for k = 1, 5, 20. The complete phylogenetic tree is presented in Supplemental Figures 4–6.

FIGURE 8.

Phylogenetic tree of the major taxa and mean of the let-7 sequence–structure pairs within each taxon, for k = 1, 5, 20. Mean value for k = 1 (right), k = 5 (center), and k = 20 (left).

Phylogenetic tree of the major taxa and mean of the let-7 sequence–structure pairs within each taxon, for k = 1, 5, 20. Mean value for k = 1 (right), k = 5 (center), and k = 20 (left). For let-7 miRNAs, we observe that higher metazoan species typically exhibit higher values as well as significant differences in the distribution across taxa. For instance, distributions of Deuterostomia and Protostomia are found to be significantly different (P = 0.00016, 0.00058, 0.00033, for k = 1, 5, 20, respectively, calculated by two-tailed Wilcoxon rank-sum tests).

DISCUSSION

In this article, we augment the analysis of genetic sequences by incorporating structural information. The information represented by structures is distinctively different from that represented by sequences. Structures encode relations between pairs of loci. Such a relation can be realized in multiple ways—that is, by the bond between loci i and j by AU, UA, GU, UG, CG, and GC. In that, one can expect a meaningful enhancement of the sequence information. Incorporating structural information (i.e., considering sequences and structures combined) provides new ways to analyze genetic material. We investigate mutational robustness of let-7 miRNA, making use of this perspective, and introduce the -spectrum, a novel observable, that quantifies mutational robustness. The -spectrum depends on both the reference sequence and structure and allows us to delocalize the study of mutational robustness beyond single-point mutations. Our study provides evidence for direct evolution of increased robustness in let-7 miRNAs, by comparing native let-7 miRNA sequence–structure pairs with the control pairs obtained by inverse folding algorithms. It provides evidence that mutational robustness is not local (i.e., it cannot be deduced from or restricted to single-point mutations). On the contrary, robustness effects become more pronounced in higher Hamming distances: The percentage of significantly k-robust, native let-7 sequence–structure pairs increases as the Hamming distance k increases. By conducting a comparative analysis between native sequences and random sequences with high robustness against single-point mutations, we provide evidence that there exist additional factors that contribute to mutational robustness against multiple-point mutations. The spectrum itself contains information that cannot be inferred from a local analysis. The pronounced mutational robustness of native let-7 sequence–structure pairs against multiple-point mutations might play a role in genetic robustness on a population level. The presence of native sequences in a population allows for significant sequence variation within a species while still preserving the phenotype. This would suggest that evolution is not identifying sequences that are locally robust but robust within entire regions of sequence space. An evolutionary analysis of the -spectrum of let-7 miRNAs shows that the spectrum resembles phylogenetic relationships. Statistical tests exhibit that closely related species tend to have similar -spectra, whereas distant species tend to exhibit distinct -spectra. In general, higher-level animal species have higher values than lower-level animal species. The increased mutational robustness in higher-level animals appears to reflect the complexity of the organisms. This phenomenon is likely related to the particular role of let-7. let-7 plays an important role in the differentiation of multiple cell types across multiple organisms (Boyerinas et al. 2010). Higher-level animal species are subject to higher selection pressures because of their increased diversity of specialized cell types (McShea 1996). The let-7 family showcases the usability of the -spectrum. A natural question is whether other noncoding RNAs exhibit mutational robustness, in particular, against multiple-point mutations. In fact, our method can be applied to any noncoding RNA with one conserved, stable secondary structure. We conduct a preliminary study on a HIV-1 dimerization initiation site (DIS). The DIS, which is an essential part of the dimer linkage structure, adopts a stem–loop structure (Paillart et al. 1996). We study the mutational robustness of five randomly selected HIV-1 DIS sequences in Rfam (Griffiths-Jones et al. 2005) (see Supplemental Fig. 7 for the list of sequences) by comparing the -spectrum of the native sequence–structure pair with that of the inverse folded sequence–structure pairs. We observe that four out of five native pairs are k-robust, but not significantly k-robust, for all 1 ≤ k ≤ 20. The only exception is AJ288982: It is not k-robust for 1 ≤ k ≤ 4, but then becomes k-robust for 5 ≤ k ≤ 20 (see Fig. 9). Interestingly, for all five native pairs, even though they are not significantly k-robust, the Z-score of for each native pair increases as k goes from 1 to 10. The increase in Z-score is consistent with the pronounced mutational robustness of native let-7 sequence–structure pairs against multiple-point mutations. Such a phenomenon might be a result of “viral quasispecies effects” (Wilke et al. 2001). The mutational robustness against multiple-point mutations facilitates the existence of a broad and highly connected region in the neutral network around the native sequence (Reidys et al. 1997). Such a region might allow a quasispecies to evolve more efficiently in the presence of selection pressure.

FIGURE 9.

The -spectrum. The x-axis denotes the Hamming distance and the y-axis the inverse folding rate. The -spectrum of the AJ288982 HIV1-DIS gene is shown in yellow, and the spectrum of means of the corresponding control set, is shown in blue. The error bar denotes the standard deviation of the control set of for each k. AJ288982 is not k-robust for 1 ≤ k ≤ 4, but then becomes k-robust for 5 ≤ k ≤ 20 . Our framework is applicable to RNA with one stable secondary structure. In reality, there exist many functional noncoding RNAs with multiple stable secondary structures, such as riboswitches. To extend our framework to study the mutational robustness of these RNAs, one potential challenge is to generate k-mutants that are compatible with multiple secondary structures. A recent approach for positive RNA design, “RNARedPrint” (Hammer et al. 2019), might facilitate such an extension. RNARedPrint supports generic Boltzmann-weighted sampling, which enables the positive design of RNA sequences with specific free energies (for each of the multiple target structures) and GC content. The dynamic programming routine in RNARedPrint can be modified to incorporate Hamming distance filtration. Such a modification enables us to efficiently Boltzmann-sample k-mutants that are compatible with multiple secondary structures. The role of structure in the context of the -spectrum differs substantially from the role of neutrality in Borenstein and Ruppin (2006): Structure is not merely used to measure the effect of the mutation but also affects how sequences mutate. This constitutes effectively a feedback loop between sequence and structure and is arguably the driving factor enhancing the signal in native sequence–structure pairs. We use a discrete metric for measuring the structural change induced by mutation in the definition of the spectrum, instead of the base pair distance. This metric produces stable data: The standard error of by sampling 5000 times is small, ±0.0065, compared to the difference of the -spectra between native and control sequence–structure pairs and the difference in the -spectra across species, respectively. This allows us to analyze efficiently mutational robustness against multiple-point mutations by Boltzmann sampling, obsoleting the need for large sample sizes. In our analysis, we use the MFE structures, predicted by the RNA minimum free-energy folding algorithm, as phenotypes. The study can be enhanced by passing from MFE structures to partition functions of structures with respect to a fixed sequence (McCaskill 1990). One can envision an analysis that considers both the dual partition function, considered here, and the partition function of structures. Traditionally, the “information” of a sequence is being identified with its actual sequence of nucleotides. This perspective results in using sequence alignments to quantify sequence similarities, the underlying assumption being that similar sequences should exhibit similar, biological properties. The sequence, however, does not represent the entire picture. Huynen et al. (1996) study the ruggedness of genotype to phenotype maps and show that similar sequences can exhibit distinctly different phenotypes. Furthermore, in the context of sequence design, sequences that fold into the same structure are considered to be equal (Busch and Backofen 2006; Lorenz et al. 2011). The presence of neutral networks (Grüner et al. 1996a,b; Reidys et al. 1997) of RNA secondary structures shows that there are complementary sequences that fold into the same structure. In other words, neither does sequence similarity necessarily imply phenotypic similarity nor does sequence dissimilarity imply phenotypic dissimilarity. This motivates one to augment the sequence information by incorporating structural information. The analysis of -spectra represents such an augmentation: The let-7 sequence–structure pairs exhibit a distinctive difference between native and random pairs. As a result, integrating the information of sequences and structures facilitates at minimum the extension of the local analysis conducted in Borenstein and Ruppin (2006) and may as well, as a novel paradigm alone, lead to further biological insights.

MATERIALS AND METHODS

EMR spectrum: a sequence–structure pair profile

In this section we introduce the technical details of our framework, starting with Borenstein and Ruppin's definition of neutrality (Borenstein and Ruppin 2006), as the average of all 3n single-point mutants. By construction, all mutations are equivalent, and structure has no influence on the mutation rate. However, as suggested in Pollard et al. (2006a), Mimouni et al. (2008), and Price et al. (2011), structure genuinely affects mutation rate, and mutations in turn further increase the thermal dynamic stability of the structure. We consider here mutations with respect to a fixed structure and consequently deal exclusively with compatible mutations. Furthermore, the idea of the following is to favor energetically beneficial mutations, while penalizing detrimental mutations. To adequately quantify the above, we revisit some of the basic concepts of the thermodynamic model of RNA secondary structure. The free energy η can be considered as a result of pairing sequences and structures as follows: where and S denote the space of sequences, σ, and the space of secondary structures, S, respectively, and η(σ, S) is the energy of S on σ. This mapping is computed as the sum of the energy contributions of individual base pairs (Nussinov et al. 1978). A more elaborate model (Mathews et al. 1999; Turner and Mathews 2010) evaluates the total free energy to be the sum of the energies of loops involving multiple base pairs. We remark that, if a sequence σ is not compatible with a structure S, then η(σ, S) = +∞. Then MFE-folding is a map from sequences to distinguished structures (i.e., the MFE structures): To incorporate the thermodynamic information into the probability of mutations, we introduce the notion of the Boltzmann distribution of k-point mutants of a sequence–structure pair. To this end, we consider the partition function of k-point mutants of a sequence–structure pair—namely, the partition function of all sequences that are at Hamming distance k to the given sequence with respect to the given structure. Definition 1. Let σ be a sequence of n nucleotides and let S be its associated structure (i.e., its MFE or native structure). Then the partition function of k-point mutants of σ with respect to S is given by where h is the Hamming distance, η(σ′, S) is the energy of S on σ′, R is the universal gas constant, and T is the temperature. Equation (1) represents the “dual” of McCaskill's partition function (McCaskill 1990) with an additional Hamming distance filtration (Barrett et al. 2018). Given this partition function, we are in position to introduce the Boltzmann distribution of k-point mutants of σ with respect to S: The probability of a specific k-point mutant σ* of σ, h(σ*, σ) = k, with respect to S, is This expression allows us to consider Boltzmann weighted mutations, taking into account the free energy, when realizing S. To quantify mutational robustness (i.e., the ability to maintain the structure), we consider all k-point mutants,that fold again into S. We call the fraction of k-point mutants, σ*, that fold into S, the extended mutational robustness (EMR) of the sequence–structure pair (σ, S) at k—that is, we have thus quantifies the mutational robustness of a sequence–structure pair (σ, S), with respect to Boltzmann weighted k-point mutations, taking into account the energy of the sequence, when assuming the structure S. can be viewed as a variation of Borenstein and Ruppin's definition of neutrality. Instead of a uniform distribution of all 3n single-point mutants, a Boltzmann-weighted distribution for these mutants is used. In contrast to using base pair distance as the metric on structure space, we restrict ourselves to a discrete metric. The -spectrum of a sequence–structure pair (σ, S) (i.e., the collection of for varying k) allows us to study multiple-point mutations instead of confining the analysis to single-point mutations. In this study, we shall analyze the -spectrum of sequence–structure pairs of Hamming distances 1 ≤ k ≤ 20. The -spectrum can be viewed as the conditional probability of the IFR in Barrett et al. (2017), conditional to specific Hamming distances. As a result, we have Where , .

Computing the EMR spectrum via Boltzmann sampling

Computing strictly requires folding all σ′ that are S-compatible, such that h(σ, σ′) = k. Although this task is impractical for large k, can be efficiently computed by means of Boltzmann sampling. This is conducted here via the Hamming distance restricted dual Sampler (HamSampler) (Barrett et al. 2018), which facilitates the approximation of for any given sequence–structure pair (σ, S) and Hamming distance k. HamSampler is available at https://github.com/FenixHuang667/HamSampler. HamSampler takes as input (σ, S) and k and outputs sequences σ* having Hamming distance k with probability We then have In this article, is calculated by sampling 5000 sequences and computing their MFE structures. The standard error of measuring by sampling 5000 times is ±0.0065 (derived by computing the of an aae-let-7 sequence–structure pair 100 times). By computing the , for each 1 ≤ k ≤ 20, we obtain the -spectrum of the sequence–structure pairs (σ, S).

miRNA data

We consider 401 miRNA precursor sequences of the let-7 gene family, obtained from the miRBase database (Kozomara and Griffiths-Jones 2013). These originate from 82 different animal species. In vertebrates and urochordates, multiple homologous miRNA genes are commonly observed, whereas single miRNA let-7 genes were more common in other animal species. We provide in the Supplemental Materials (SM) the numbers of the respective let-7 genes (see Supplemental Fig. 1). The secondary structures of these sequences are derived using a MFE folding algorithm, using the Turner energy model (Turner and Mathews 2010). We compute the -spectra, for 1 ≤ k ≤ 20, for all 401 let-7 sequence–structure pairs.

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article.

45 in total

Review 1. Perspective: Evolution and detection of genetic robustness.

Authors: J Arjan G M de Visser; Joachim Hermisson; Günter P Wagner; Lauren Ancel Meyers; Homayoun Bagheri-Chaichian; Jeffrey L Blanchard; Lin Chao; James M Cheverud; Santiago F Elena; Walter Fontana; Greg Gibson; Thomas F Hansen; David Krakauer; Richard C Lewontin; Charles Ofria; Sean H Rice; George von Dassow; Andreas Wagner; Michael C Whitlock
Journal: Evolution Date: 2003-09 Impact factor: 3.694

Genetic robustness of let-7 miRNA sequence-structure pairs.

INTRODUCTION

RESULTS

Robustness of native let-7 sequence–structure pairs

Robustness of metazoan species

DISCUSSION

MATERIALS AND METHODS

EMR spectrum: a sequence–structure pair profile

Computing the EMR spectrum via Boltzmann sampling

miRNA data

SUPPLEMENTAL MATERIAL

Review 1. Perspective: Evolution and detection of genetic robustness.

2. INFO-RNA--a fast approach to inverse RNA folding.

Review 3. Ribozymes, riboswitches and beyond: regulation of gene expression without proteins.

4. PERSPECTIVE METAZOAN COMPLEXITY AND EVOLUTION: IS THERE A TREND?

5. Role of duplicate genes in genetic robustness against null mutations.

6. ViennaRNA Package 2.0.

7. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees.

8. Fixed-parameter tractable sampling for RNA design with multiple target structures.

9. A global sampling approach to designing and reengineering RNA secondary structures.

10. miRBase: annotating high confidence microRNAs using deep sequencing data.

1. Knockdown of circROBO2 attenuates acute myocardial infarction through regulating the miR-1184/TRADD axis.