Literature DB >> 22303350

Improved measurements of RNA structure conservation with generalized centroid estimators.

Yohei Okada¹, Yutaka Saito, Kengo Sato, Yasubumi Sakakibara.

Abstract

Identification of non-protein-coding RNAs (ncRNAs) in genomes is a crucial task for not only molecular cell biology but also bioinformatics. Secondary structures of ncRNAs are employed as a key feature of ncRNA analysis since biological functions of ncRNAs are deeply related to their secondary structures. Although the minimum free energy (MFE) structure of an RNA sequence is regarded as the most stable structure, MFE alone could not be an appropriate measure for identifying ncRNAs since the free energy is heavily biased by the nucleotide composition. Therefore, instead of MFE itself, several alternative measures for identifying ncRNAs have been proposed such as the structure conservation index (SCI) and the base pair distance (BPD), both of which employ MFE structures. However, these measurements are unfortunately not suitable for identifying ncRNAs in some cases including the genome-wide search and incur high false discovery rate. In this study, we propose improved measurements based on SCI and BPD, applying generalized centroid estimators to incorporate the robustness against low quality multiple alignments. Our experiments show that our proposed methods achieve higher accuracy than the original SCI and BPD for not only human-curated structural alignments but also low quality alignments produced by CLUSTAL W. Furthermore, the centroid-based SCI on CLUSTAL W alignments is more accurate than or comparable with that of the original SCI on structural alignments generated with RAF, a high quality structural aligner, for which twofold expensive computational time is required on average. We conclude that our methods are more suitable for genome-wide alignments which are of low quality from the point of view on secondary structures than the original SCI and BPD.

Entities: Chemical Disease Gene Species

Keywords: centroid estimators; non-coding RNAs; structure conservation index

Year: 2011 PMID： 22303350 PMCID： PMC3268607 DOI： 10.3389/fgene.2011.00054

Source DB: PubMed Journal: Front Genet ISSN： 1664-8021 Impact factor: 4.599

Introduction

Many studies have recently discovered essential roles of non-protein-coding RNAs (ncRNAs) in cells such as translation, post-transcriptional gene regulation and maturation of rRNAs, tRNAs, and mRNAs (Eddy, 2001; Mattick and Makunin, 2006). Therefore, to identify ncRNAs in genomes and analyze their functions is a crucial task for not only molecular cell biology but also bioinformatics. It is well-known that such biological functions of ncRNAs are deeply related to their secondary structures since most of ncRNA families share consensus secondary structures but contain highly diversed sequences in terms of sequence identity. This means that standard methods for sequence analysis based on the primary sequence may not help ncRNA analysis, and secondary structures should be employed as a key feature of ncRNA analysis. An RNA secondary structure consists of hydrogen-bonded base pairs including the Watson–Crick base pairs (A-U and G-C), the wobble base pairs (G-U), and other non-canonical base pairs. These base pairs stabilize the structure of RNAs in terms of the free energy. Thus, the secondary structure with the minimum free energy (MFE) has been regarded as the most reliable prediction of RNA secondary structures. However, Rivas and Eddy (2000) have indicated that MFE alone could not be an appropriate measure for identifying ncRNAs since the free energy is heavily biased by the nucleotide composition. Therefore, several comparative approaches for identifying ncRNAs have been proposed (Rivas and Eddy, 2001; di Bernardo et al., 2003; Coventry et al., 2004; Washietl et al., 2005b; Pedersen et al., 2006). In general, most of these methods employ the following strategy: (1) construct a pairwise or multiple alignment of two or more RNA sequences; (2) predict whether each mutation in the alignment occurs under a structurally conservative model or an independent model. Rivas and Eddy (2001) have developed QRNA which classifies a given pairwise alignment as one of three models: the coding model (COD) in which substitutions between synonymous codons occur frequently to conserve amino acid sequences, the non-coding model (RNA) in which covariances of base pairs occur frequently to conserve secondary structures, and the others (OTH). Pedersen et al. (2006) have developed EvoFold based on phylo-SCFGs which assume that any mutations on each column of a given multiple alignment would occur under a given phylogenetic tree of sequences, and the mutation on single bases would occur more frequently than that on base pairs in conserved secondary structures. These assumptions improve the accuracy of predicting secondary structures (Knudsen and Hein, 1999). Washietl et al. (2005b) have developed RNAz which detects a structurally conserved region from a multiple alignment by support vector machines. RNAz employs the averaged z-score of MFE for each sequence and the structure conservation index (SCI). The key idea is that MFE for the common secondary structure is close to that for each sequence if a given multiple alignment is structurally conserved. Thus, SCI is defined as the rate of MFE for the common secondary structure to the averaged MFE for each sequence. MFE values for each sequence and the common secondary structure are calculated by RNAfold and RNAalifold in Vienna RNA package (Hofacker, 2003). Gruber et al. (2008) have employed the base pair distance (BPD) as a measurement of structure conservation. BPD has been originally defined as the normalized Hamming distance between two RNA secondary structures (Flamm et al., 2001). In (Gruber et al., 2008), BPD has been shown to be as accurate as SCI. RNAz with SCI has been used for ncRNA screens in several organisms including humans (Washietl et al., 2005a), nematodes (Missal et al., 2006), plasmodium (Mourier et al., 2008), and arabidopsis (Song et al., 2009), showing that RNAz is one of the most accurate tools for identifying ncRNAs. However, for practical use of RNAz on the genome-wide search, a relatively high false discovery rate has unfortunately been estimated (Washietl et al., 2007). It is conceivable that multiple alignments produced by a standard aligner that does not consider any secondary structures are not suitable for identifying ncRNAs in some cases and incur high false discovery rate. Wang et al. (2007) have also suggested that the genome-wide alignments in the UCSC Genome Browser (Kent et al., 2002) produced by MULTIZ (Blanchette et al., 2004) should be improved in some regions for identifying ncRNAs. To improve the accuracy, two strategies can be considered: the one is to employ a structural aligner such as RAF (Do et al., 2008) to produce high quality alignments, and the other is to develop a more robust method against low quality alignments. Since the former strategy will consume impractical execution time for structural alignments, this study takes the latter strategy. Recently, several researchers have studied high-dimensional space estimation based on maximizing expected accuracy (MEA) including sequence alignment and RNA secondary structure prediction (Ding et al., 2005; Do et al., 2005, 2006; Carvalho and Lawrence, 2008; Hamada et al., 2009, 2011), showing that MEA-based estimation is more reliable than the maximum likelihood estimation. CentroidFold and CentroidFold employ one of MEA-based estimators called γ-centroid estimators for predicting RNA (common) secondary structures, and have been shown to be more accurate than other existing tools such as MFE-based methods (Hamada et al., 2009, 2011). Especially, CentroidFold can predict much more accurate common secondary structures for low quality multiple alignments produced by CLUSTAL W (Thompson et al., 1994) than RNAalifold. In this study, we propose improved measurements for structure conservation based on γ-centroid estimators for RNA (common) secondary structure prediction, instead of MFE-based predictions by RNAfold and RNAalifold, to incorporate the robustness against low quality multiple alignments. We call them C-SCI and C-BPD, which use centroid structures instead of MFE structures to calculate SCI and BPD. Our experiments show that the proposed methods achieve higher accuracy than the original SCI and BPD for not only human-curated structural alignments but also low quality alignments produced by CLUSTAL W. Furthermore, the accuracy of C-SCI on CLUSTAL W alignments is comparable with that of the original SCI on RAF alignments for which twofold expensive computational time is required on average.

Materials and Methods

RNA secondary structure prediction with γ-centroid estimator

CentroidFold implements a γ-centroid estimator which predicts secondary structures with the maximum expected accuracy by a kind of posterior decoding methods on the base-pairing probability matrix (Hamada et al., 2009). Let Σ = {A,C,G,U} and Σ* denote the set of all finite RNA sequences consisting of bases in Σ. For a sequence x = x1 x2…x ∈ Σ*, let |x| denote the number of symbols appearing in x, which is called the length of x. Let 𝒮(x) be a set of secondary structures of an RNA sequence x. An element y ∈ 𝒮 (x) is represented as a |x| × |x| binary-valued triangular matrix y = (y) < j, where y = 1 means that bases x and x form a base pair. CentroidFold employs a gain function between a true secondary structure θ ∈ 𝒮(x) and a predicted secondary structure y ∈ 𝒮(x) defined as where γ is a weight for base pairs, and I(condition) is the indicator function, which takes 1 or 0 depending on whether condition is true or false. The gain function (1) is equal to the weighted sum of the number of true positives and the number of true negatives of base pairs. CentroidFold predicts a secondary structure y ∈ 𝒮(x) which maximizes the expectation of G(θ, y) with respect to an ensemble of all possible secondary structures 𝒮(x) which is distributed under a posterior distribution p(θ|x), where C is a constant independent of y, and p = 𝔼[θ] is the base-pairing probability that the i-th and j-th bases form a base pair. The optimal secondary structure can be calculated efficiently by using the following Nussinov-style DP algorithm: and tracing back from M1,| to recover ŷ. The model of the posterior distribution p(θ|x) can be chosen from various implementations including the McCaskill (1990) model based on the Boltzmann free energy and the CONTRAfold model (Do et al., 2006) based on a machine learning technique. In our experiments, we used the McCaskill model with Boltzmann likelihood parameters (Andronescu et al., 2010). The weight γ in the definition (1) controls the number of predicted base pairs, that is, the trade-off between specificity and sensitivity of predicted base pairs. If γ = 1, this estimator is equivalent to the centroid estimator (Ding et al., 2005; Carvalho and Lawrence, 2008). CentroidFold can predict a common secondary structure of a multiple alignment of RNA sequences by using the γ-centroid estimator under the mixture of the RNAalifold model and the McCaskill model (Hamada et al., 2011). Let A be an alignment of RNA sequences that contains #A sequences. CentroidFold employs a gain function defined as the sum of the gain function (1) for all x ∈ A: CentroidFold predicts a common secondary structure y ∈ 𝒮(A) which maximizes the expectation of under the mixed distribution of the McCaskill model and the RNAalifold model: where p(θ|x) and p(θ|A) are the McCaskill model and the RNAalifold model, respectively. Here, w ∈ [0,1] is a weight between two distributions (w = 0.5 in our experiments). The optimal common secondary structure can similarly be calculated by using the recursion (3) with the averaged base-pairing probability defined as instead of p, where and CentroidFold and CentroidFold have been shown to be more accurate than other existing tools (Hamada et al., 2009, 2011). Especially, CentroidFold can predict much more accurate common secondary structures than RNAalifold for low quality multiple alignments produced by CLUSTAL W.

MFE-based measurements of structure conservation

In this section, we introduce two existing measurements of structure conservation based on prediction of RNA secondary structures that minimizes free energy.

Structure conservation index

The SCI evaluates secondary structure conservation of a given multiple alignment of RNA sequences in terms of the MFE. SCI is defined as For a single sequence x, E(y) denotes the free energy of a secondary structure y ∈ 𝒮(x), and yMFE(x) = arg min E(y) is defined to be the MFE structure of x calculated by RNAfold (Hofacker, 2003). Similarly, for an alignment A, E(y) is the free energy of a consensus structure y ∈ 𝒮(A), and yMFE(A) = arg min E(y) is the consensus MFE structure of A calculated by RNAalifold (Hofacker et al., 2002; Bernhart et al., 2008). The free energy of a consensus structure is defined as the average of the energy contributions of the single sequences plus covariance scores for bonuses of compensatory and consistent co-mutation in the alignment. The consensus MFE alone could be used to identify functional RNAs in terms of thermodynamic stability of consensus structures. However, it is difficult to make straightforward use of it, since the folding energy is heavily biased by the nucleotide composition and the length of the alignment. SCI solved this problem by normalizing E[yMFE(A)] with the average of E[yMFE(x)] for all x ∈ A. From a different view, SCI reflects the idea that for a well-conserved alignment the structure of each sequence resembles each other and the consensus structure resembles all of them, so E[yMFE(A)] would have as low value as E[yMFE(x)], otherwise E[yMFE(A)] would not. SCI is near 0 for an alignment that is not structurally conserved, whereas SCI is near 1 or above for an alignment that is structurally conserved. Especially, if the alignment is structurally well-conserved and compensatory and consistent mutations often occur, SCI may be above 1. As shown in the definition (5), SCI obviously depends on the accuracy of common secondary structure prediction, which is also deeply influenced by the quality of multiple alignments of RNAs. This fact is supported by a previous study (Gruber et al., 2008) and our results shown in Section 3. For the genome-wide search, high quality alignments that consider RNA secondary structures cannot be obtained easily due to the computational cost for calculating structural alignments. Therefore, a robust method that does not require high quality alignments is desired.

Base pair distance

The BPD evaluates secondary structure conservation of a given multiple alignment of RNA sequences by comparing predicted secondary structures directly rather than MFE (Flamm et al., 2001; Gruber et al., 2008). BPD is based on the normalized Hamming distance between two RNA secondary structures defined as: for y ∈ 𝒮(x) and y′ ∈ 𝒮(x′) for two RNA sequences x and x′ with the same length |x| = |x′|. Two variations of BPD can be defined: the first is the mean over all the combination of pairwise distances in the alignment A, that is, The second is the mean distance from the consensus structure to each individual structure in the alignment A, that is, Note that x,x′ ∈ A in Eqs (6) and (7) may contain the gaps (“−”) resulting from the alignment A so that the Hamming distance can be defined. Both variants of BPD have the same drawback as SCI since MFE-based structures are used, that is, unreliable prediction of (common) secondary structures will result in inaccurate identification of ncRNAs.

Centroid-based measurements of structure conservation

Now, we improve the above-mentioned measurements of secondary structure conservation by employing CentroidFold and CentroidFold instead of RNAfold and RNAalifold, respectively, for (common) secondary structure prediction to incorporate the robustness against low quality multiple alignments.

C-SCI

We propose C-SCI based on SCI which employs centroid structures instead of MFE structures. First, we predict the consensus centroid structure for an alignment A, denoted by y(A), and centroid structures for each sequence x ∈ A, denoted by y(x). We map a predicted structure onto each sequence x and calculate its free energy E[y(x)] for all of the sequences. For an alignment, we map a predicted consensus structure onto each sequence x and get rid of gaps and corresponding parts of the structure. To calculate the energy, we use RNAeval (Hofacker, 2003) with the predicted structure on the sequence. The free energy of a consensus secondary structure is calculated from the averaged free energy for all sequences and the covariance score which is implemented according to RNAalifold (Hofacker et al., 2002). Then, C-SCI is calculated as follows: C-SCI has two parameters which affect the discrimination capability. We denote γ as the parameter γ for predicting consensus secondary structures on multiple alignments, and γ as γ for predicting secondary structures on individual sequences. The two parameters for centroid-based measurements were determined empirically; γ = 1.0 and γ = 6.0, which on average gave us accurate results in various conditions.

C-BPD

Since BPD directly compares RNA secondary structures, a more accurate and robust method for predicting RNA secondary structures is desired. We can employ centroid structures instead of MFE structures to calculate BPD. We call this C-BPD. As well as BPD, we can consider two variations of C-BPD:

Results

Dataset and evaluation

To confirm the discrimination capability of C-SCI and C-BPD, we performed the experiments in accordance with the previous study (Gruber et al., 2008) on BRAliBase 2.1 data set (Wilm et al., 2006), which includes 18,990 reference alignments of 36 RNA families. Reference alignments in BRAliBase 2.1 are human-curated alignments which are made from Rfam database (Griffiths-Jones et al., 2005) aiming for evaluating structural alignments. We produced multiple alignments using CLUSTAL W (Thompson et al., 1994) version 1.83 with default settings to investigate the discrimination capability on low quality alignments. We also produced structural alignments using RAF (Do et al., 2008) version 1.00 with default settings. RAF is one of the most efficient structural aligners based on the Sankoff (1985) algorithm which simultaneously aligns and folds given RNA sequences. However, RAF is much slower than CLUSTAL W since secondary structures are taken into account. For each alignment, we generated 10 negative controls by utilizing SISSIz (Gesell and Washietl, 2008), which randomizes columns of a given alignment to destroy its common secondary structure, while maintaining gap patterns, dinucleotide compositions, and sequence length. These alignments were binned according to their normalized Shannon entropy by the size of 0.05. The normalized Shannon entropy is defined as the average of the Shannon entropy of the individual column over all columns in the alignment whose length is |A|: where j is in the alphabet Σ = {A,U,G,C, −} consisting of the four nucleotides and the gap character “−,” and is the relative frequency observing the character j in the column i. We clustered the alignments into the bins from 0.05 to 1.15 stepping with 0.05 of normalized Shannon entropy. Figure 1 shows the distribution of the reference alignments on the bins of the normalized Shannon entropy.

Figure 1

The distribution of the reference alignments over the bins of the normalized Shannon entropy. The density of each bar indicates the number of sequences in the alignments.

The distribution of the reference alignments over the bins of the normalized Shannon entropy. The density of each bar indicates the number of sequences in the alignments. To evaluate the performance of the various strategies, we performed the receiver operating characteristic (ROC) curve analysis. An ROC curve is a plot of true positive rate versus false positive rate in varying the discrimination threshold of a classifier. The area under the ROC curve (AUC) is used for evaluation of the discrimination; the AUC value closer to 1 means better discrimination capability. In our study, we compared C-SCI and C-BPD (pairwise, consensus) with the original SCI and BPD. To compute SCI, we used the program scif available at http://www.biophys.uni-duesseldorf.de/bralibase/. The two variants of BPD were calculated by a modified scif which uses a utility function for the BPD in Vienna RNA package. We implemented C-SCI and C-BPD based on CentroidFold package version 0.0.9 for predicting (common) secondary structures, and Vienna RNA Package version 1.8.5 for calculating the free energy of predicted structures.

Discrimination capability

Figure 2 shows the results of ROC curve analysis of C-SCI and C-BPD comparing with SCI and BPD on the BRAliBase reference alignments and the CLUSTAL W alignments for each bin of normalized Shannon entropy, indicating that C-SCI achieved the highest AUC, especially on low entropy region.

Figure 2

The discrimination capacity of C-SCI, C-BPD, SCI, and BPD in AUC on the reference alignments and the CLUSTAL W alignments for each bin of normalized Shannon entropy.

The discrimination capacity of C-SCI, C-BPD, SCI, and BPD in AUC on the reference alignments and the CLUSTAL W alignments for each bin of normalized Shannon entropy. Table 1 shows the discrimination accuracies for all the alignments of all the methods. This indicates that the centroid-based measurements achieve higher AUC scores on the alignments by all the aligners than their MFE-based counterparts.

Table 1

Area under the ROC curve of all the methods.

Method	Reference	RAF	CLUSTAL W
C-SCI	0.937	0.942	0.837
C-BPD (consensus)	0.890	0.896	0.805
C-BPD (pairwise)	0.744	0.747	0.655
SCI	0.795	0.776	0.632
BPD (consensus)	0.756	0.755	0.672
BPD (pairwise)	0.711	0.713	0.621

Area under the ROC curve of all the methods.

Computational complexity

To address the genome-wide search, the computational cost is a serious problem. As shown in Table 1, it is obvious that the use of reference alignments which are structurally corrected by human curation is desirable. However, it is impractical to always obtain such reference alignments since high quality alignments cannot be obtained due to limited human resources and knowledge. Two alternative approaches are to use structural aligners which can align RNA sequences conserving their secondary structures, and to use the standard aligners like CLUSTAL W. As shown in Table 1, all the measurements on RAF alignments achieve as high accuracy as those on reference alignments, and much higher accuracy than those on CLUSTAL W alignments. However, huge computational time is required for producing structural alignments even by RAF (2.84 s on average), which is known as one of the most efficient structural aligner, comparing with CLUSTAL W (0.0277 s on average). On the other hand, the centroid-based measurements have an advanced property of robustness against low quality alignments. In fact, Table 1 indicates that C-SCI on CLUSTAL W alignments is more accurate than or comparable to SCI on RAF alignments. Furthermore, as shown in Table 2, the elapsed time for calculating SCI through RAF alignments is twice as long as that for C-SCI through CLUSTAL W alignments on average. Figure 3 shows the detailed analysis of elapsed time with respect to the sequence length of the alignments of five sequences, indicating that the non-negligible number of alignments took more than 5 s to compute SCI through RAF alignments even for short sequences less than 200 nt. Therefore, in case that structural alignments might be unavailable such as the genome-wide search, C-SCI is practical to use and is expected to have as high discriminant power as SCI on structural alignments.

Table 2

Calculation time of each measurement.

Method	RAF		CLUSTAL W
	Time	Total time	Time	Total time
C-SCI	0.965 ± 1.65	3.05 ± 7.35	0.979 ± 1.67	1.01 ± 1.71
C-BPD(consensus)	0.948 ± 1.63	3.03 ± 7.33	0.961 ± 1.65	0.989 ± 1.69
C-BPD(pairwise)	0.267 ± 0.444	2.35 ± 6.67	0.268 ± 0.444	0.295 ± 0.477
SCI	0.157 ± 0.270	2.24 ± 6.56	0.159 ± 0.273	0.187 ± 0.306
BPD(consensus)	0.182 ± 0.881	2.27 ± 6.61	0.159 ± 0.274	0.186 ± 0.307
BPD(pairwise)	0.095 ± 0.158	2.18 ± 6.51	0.111 ± 0.809	0.138 ± 0.817

The result of calculation time is shown in seconds. Time: elapsed time for calculating the measurement only. Total time: total elapsed time for aligning sequences and calculating the measurement. All the experiments were executed on a Linux machine with AMD Opteron 2200SE (2.8 GHz).

Figure 3

Elapsed time of calculating SCI following RAF alignments and C-SCI following CLUSTAL W alignments for the alignments of five sequences with respect to the length of sequences. All the experiments were executed on a Linux machine with AMD Opteron 2200SE (2.8 GHz).

Calculation time of each measurement. The result of calculation time is shown in seconds. Time: elapsed time for calculating the measurement only. Total time: total elapsed time for aligning sequences and calculating the measurement. All the experiments were executed on a Linux machine with AMD Opteron 2200SE (2.8 GHz). Elapsed time of calculating SCI following RAF alignments and C-SCI following CLUSTAL W alignments for the alignments of five sequences with respect to the length of sequences. All the experiments were executed on a Linux machine with AMD Opteron 2200SE (2.8 GHz).

Discussion

We proposed centroid-based measurements of secondary structure conservation, and examined their performance. The results clearly show that C-SCI and C-BPD are more accurate than the original SCI and BPD. The discrimination capability of C-SCI for CLUSTAL W alignments is more accurate than or comparative to SCI for RAF structural alignments. This means that our methods are more suitable for genome-wide alignments which are low quality from the point of view on secondary structures. As shown in Figure 2, the original SCI and BPD are inaccurate, especially for low entropy regions, that is, highly conserved alignments. For a highly conserved alignment, the common secondary structure predicted by RNAalifold will be very similar with the individual secondary structure predicted by RNAfold for each sequence in the alignment. This means that the original SCI and BPD cannot distinguish structurally conserved alignments from structurally non-conserved alignments for low entropy regions because of the definition of the original SCI and BPD. Therefore, important genes which are highly conserved would be undetectable as ncRNAs even if such genes could fortunately be aligned between related species by a sequence-based aligner. This is a serious drawback of SCI and BPD. Figure 2 also indicates that our methods are robust on low entropy regions compared with SCI and BPD. For the centroid-based measurements, we used γ = 1.0 and γ = 6.0, which are the weight for base pairs of predicting secondary structures for alignments and individual sequences, respectively. The previous study has shown that almost the best accuracy of predicting secondary structures for individual sequences can be achieved at γ = 6.0 (Hamada et al., 2009), whereas only highly reliable base pairs for alignments might be predicted at γ = 1.0. Therefore, our methods can detect only the alignments with reliable base pairs as structurally conserved. This results in reducing false detections of ncRNAs for low entropy regions as shown in Figure 4.

Figure 4

An example of the results on CLUSTL W alignments with low entropy. Predicted common secondary structures of five sequences in HIV_PBS family are shown. (A) The common secondary structure predicted by CentroidAlifold with γ = 1.0. (B) The common secondary structure predicted by RNAalifold. (C) The common secondary structure predicted by CentroidAlifold with γ = 1.0 for one of the negative control alignments generated by SISSIz. (D) The common secondary structure predicted by RNAalifold for one of the negative control alignments.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

35 in total

1. RNA secondary structure prediction using stochastic context-free grammars and evolutionary history.

Authors: B Knudsen; J Hein
Journal: Bioinformatics Date: 1999-06 Impact factor: 6.937

2. Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs.

Authors: E Rivas; S R Eddy
Journal: Bioinformatics Date: 2000-07 Impact factor: 6.937

3. Secondary structure prediction for aligned RNA sequences.

Authors: Ivo L Hofacker; Martin Fekete; Peter F Stadler
Journal: J Mol Biol Date: 2002-06-21 Impact factor: 5.469

4. ProbCons: Probabilistic consistency-based multiple sequence alignment.

Authors: Chuong B Do; Mahathi S P Mahabhashyam; Michael Brudno; Serafim Batzoglou
Journal: Genome Res Date: 2005-02 Impact factor: 9.043

5. RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble.

Authors: Ye Ding; Chi Yu Chan; Charles E Lawrence
Journal: RNA Date: 2005-08 Impact factor: 4.942

6. CONTRAfold: RNA secondary structure prediction without physics-based models.

Authors: Chuong B Do; Daniel A Woods; Serafim Batzoglou
Journal: Bioinformatics Date: 2006-07-15 Impact factor: 6.937

7. Structured RNAs in the ENCODE selected regions of the human genome.

Authors: Stefan Washietl; Jakob S Pedersen; Jan O Korbel; Claudia Stocsits; Andreas R Gruber; Jörg Hackermüller; Jana Hertel; Manja Lindemeyer; Kristin Reiche; Andrea Tanzer; Catherine Ucla; Carine Wyss; Stylianos E Antonarakis; France Denoeud; Julien Lagarde; Jorg Drenkow; Philipp Kapranov; Thomas R Gingeras; Roderic Guigó; Michael Snyder; Mark B Gerstein; Alexandre Reymond; Ivo L Hofacker; Peter F Stadler
Journal: Genome Res Date: 2007-06 Impact factor: 9.043