Abstract
BACKGROUND: The secondary structure of an RNA must be known before the relationship between its structure and function can be determined. One way to predict the secondary structure of an RNA is to identify covarying residues that maintain the pairings (Watson-Crick, wobble and non-canonical pairings). This "comparative approach" consists of identifying mutations from homologous sequence alignments. The sequences must covary enough for compensatory mutations to be revealed, but comparison is difficult if they are too different. Thus the choice of homologous sequences is critical. While many possible combinations of homologous sequences may be used for prediction, only a few will give good structure predictions. This can be due to poor quality alignment in stems or to the variability of certain sequences. This problem of sequence selection is currently unsolved.
Year: 2007 PMID: 18045491 PMCID: PMC2238770 DOI: 10.1186/1471-2105-8-464
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Characteristics of secondary structure predictions performed using the P-DCfold algorithm on a tmRNA alignment of 44 sequences and an RNaseP alignment of 54 sequences, when all possible combinations of 4 homologous sequences are considered (left) and when only combinations of 4 sequences among 10 homologous sequences initially selected by the common homology model M are considered (right).
| | All sequences | All sequences | Selected by model M | Selected by model M |
| | tmRNA | RNaseP | tmRNA | RNaseP |
| Total number of predictions | 123410 | 266699 | 210 | 210 |
| Number of predictions with MCC > 75 | 1620 | 1958 | 18 | 38 |
| Average MCC | 45.19 | 41.03 | 56.82 | 60.27 |
| Maximum MCC | 89 | 86 | 85 | 84 |
| Minimum MCC | 10 | 5 | 26 | 30 |
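The 210 predictions per RNA in the right-hand columns are what one gets by choosing 4 sequences out of 10. The short Python sketch below reproduces that count and derives the share of predictions with MCC > 75 from the counts in the table. The sequence labels and variable names are placeholders introduced for illustration only.

```python
from itertools import combinations
from math import comb

# Number of ways to choose 4 sequences among the 10 pre-selected ones.
n_subsets = comb(10, 4)

# Enumerating the subsets explicitly gives the same count.
# The labels are placeholders, not identifiers from the alignments.
selected = [f"seq{i}" for i in range(10)]
subsets = list(combinations(selected, 4))

# (predictions with MCC > 75, total predictions), copied from the table.
counts = {
    ("all", "tmRNA"): (1620, 123410),
    ("all", "RNaseP"): (1958, 266699),
    ("selected", "tmRNA"): (18, 210),
    ("selected", "RNaseP"): (38, 210),
}

# Share of good predictions, in percent.
shares = {key: 100.0 * good / total for key, (good, total) in counts.items()}

for key, pct in shares.items():
    print(key, f"{pct:.1f}%")
```

Enumerating the subsets is only practical for small pools such as the 10 pre-selected sequences; for the full alignments the count alone is usually enough.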
Figure 1. Correct and incorrect stem alignments. Single strand regions are generally correctly aligned because they are less variable, whereas the stem regions can be incorrectly aligned. Correct stem alignment results in an alternation between a stem substitution matrix (M) and a single strand substitution matrix (M). If stems are not correctly aligned, the single strand substitution matrix (M) instead alternates with another substitution matrix (M) that differs from the stem substitution matrix.
Figure 2. Theoretical stem substitution matrices. Top left: Stem deviation matrix due to the influences of transitions/transversions and of the GU intermediate state on stem substitution matrices. Bottom left: Stem deviation matrix due to the influence of GC stability. Right: Stem deviation matrix due to all of these influences.
Figure 3. Base pair substitutions in stems. Double mutations are favored or disfavored depending on the stability of the intermediate state. As the GU pair is the most stable and the least deleterious of the intermediate states, the double substitutions that pass through the GU intermediate state (AU ↔ GC and UA ↔ CG) may occur more frequently than the others.
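The reasoning in the Figure 3 caption can be checked by enumeration: for each double substitution between two Watson-Crick pairs, list the two single-mutation intermediates and see whether one of them is a wobble pair. The Python sketch below does this. The function names are illustrative, and treating both GU and UG as the wobble intermediate is an assumption based on the two examples given in the caption.

```python
from itertools import combinations

WATSON_CRICK = ["AU", "UA", "GC", "CG"]
WOBBLE = {"GU", "UG"}  # assumption: both orientations count as the GU state


def intermediates(src, dst):
    """Pairs obtained by mutating exactly one position of src toward dst."""
    return {dst[0] + src[1], src[0] + dst[1]}


def double_substitutions_via_wobble():
    """Watson-Crick pair changes whose single-mutation intermediate is a wobble pair."""
    favored = []
    for a, b in combinations(WATSON_CRICK, 2):
        both_positions_change = a[0] != b[0] and a[1] != b[1]
        if both_positions_change and intermediates(a, b) & WOBBLE:
            favored.append((a, b))
    return favored


favored = double_substitutions_via_wobble()
print(favored)
```

Only AU ↔ GC and UA ↔ CG pass through a wobble intermediate; the other four double substitutions between Watson-Crick pairs go through mismatches such as AC, AG or UU.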
Nucleotide substitution rates calculated with the Higgs model parameters [30].
| | A | C | G | U |
| A | - | 0.0201 | 0.1911 | 0.0642 |
| C | 0.0185 | - | 0.1194 | 0.1948 |
| G | 0.1915 | 0.1180 | - | 0.0871 |
| U | 0.0641 | 0.1947 | 0.0873 | - |
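The rates in this table show the usual bias toward transitions (A ↔ G, C ↔ U) over transversions. The Python sketch below makes the comparison explicit. It assumes that each row lists the rates toward the three other nucleotides in A, C, G, U order with the diagonal left empty; reading rows as the source nucleotide and columns as the target is also an assumption, although the averages do not depend on it.

```python
# Off-diagonal rates copied from the table, keyed as (row, column).
RATES = {
    ("A", "C"): 0.0201, ("A", "G"): 0.1911, ("A", "U"): 0.0642,
    ("C", "A"): 0.0185, ("C", "G"): 0.1194, ("C", "U"): 0.1948,
    ("G", "A"): 0.1915, ("G", "C"): 0.1180, ("G", "U"): 0.0871,
    ("U", "A"): 0.0641, ("U", "C"): 0.1947, ("U", "G"): 0.0873,
}

PURINES = {"A", "G"}


def is_transition(x, y):
    """True for purine-purine or pyrimidine-pyrimidine changes."""
    return (x in PURINES) == (y in PURINES)


transitions = [r for (x, y), r in RATES.items() if is_transition(x, y)]
transversions = [r for (x, y), r in RATES.items() if not is_transition(x, y)]

mean_ts = sum(transitions) / len(transitions)
mean_tv = sum(transversions) / len(transversions)

print(f"mean transition rate:   {mean_ts:.4f}")
print(f"mean transversion rate: {mean_tv:.4f}")
print(f"ratio:                  {mean_ts / mean_tv:.2f}")
```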
Algorithm SSCA
- Build a model
- For each homologous sequence
  - Calculate the substitution matrix
  - Calculate a score for the sequence
- Classify the sequences
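The steps of the algorithm can be sketched in Python as follows. This is only an illustration under stated assumptions, not the published method: the excerpt does not say how the model is built, which sequences are compared, or how the score is computed. Here each homolog is compared with a hypothetical reference sequence, the score is the summed absolute deviation between observed and model substitution frequencies, and sequences are ranked from lowest to highest score, in line with the Figure 4 caption stating that the lowest SSCA scores go with the highest average MCC.

```python
from collections import Counter

NUCLEOTIDES = "ACGU"


def substitution_matrix(reference, homolog):
    """Observed substitution frequencies between two aligned sequences.

    Columns containing a gap or an unknown symbol are skipped.
    """
    counts = Counter()
    compared = 0
    for x, y in zip(reference, homolog):
        if x in NUCLEOTIDES and y in NUCLEOTIDES:
            compared += 1
            if x != y:
                counts[(x, y)] += 1
    if compared == 0:
        return {}
    return {pair: n / compared for pair, n in counts.items()}


def score(observed, model):
    """Hypothetical score: summed absolute deviation from the model matrix."""
    pairs = set(observed) | set(model)
    return sum(abs(observed.get(p, 0.0) - model.get(p, 0.0)) for p in pairs)


def ssca_rank(reference, homologs, model):
    """Return (name, score) pairs sorted from lowest to highest score."""
    scored = [
        (name, score(substitution_matrix(reference, seq), model))
        for name, seq in homologs.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Toy data, invented for the example: a model that expects only transitions.
model = {("A", "G"): 0.1, ("G", "A"): 0.1, ("C", "U"): 0.1, ("U", "C"): 0.1}
reference = "GGCAUCGAUC"
homologs = {
    "far": "CCGAUGCAUC",    # five transversions
    "close": "GGCGUCGAUU",  # two transitions
}

ranking = ssca_rank(reference, homologs, model)
print(ranking)
```

Keeping the matrix computation, the score and the ranking in separate functions makes it easy to swap in a different model or distance without touching the rest.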
MCC distributions (average, maximum and minimum MCC) of tmRNA and RNaseP secondary structure predictions made with the P-DCfold algorithm using all sequences and using four homologous sequence selection models (the common homology model M and three further models). The percentage of predictions with MCC > 75 is also given.
| | tmRNA | | | | | RNaseP | | | | |
| | All | M | | | | All | M | | | |
| Avg MCC | 45.19 | 56.82 | 63.38 | 67.66 | 67.45 | 41.03 | 60.27 | 73.58 | 70.13 | 75.3 |
| Max MCC | 89 | 85 | 84 | 80 | 85 | 86 | 84 | 85 | 80 | 85 |
| Min MCC | 10 | 26 | 41 | 56 | 41 | 5 | 30 | 56 | 56 | 56 |
| % MCC > 75 | 1.3% | 8.6% | 5.7% | 27.6% | 26.7% | 0.7% | 18% | 48.6% | 23.3% | 60.4% |
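The tables report the Matthews correlation coefficient (MCC), apparently on a 0 to 100 scale. A minimal Python sketch of the standard MCC applied to base pairs is given below. Counting true negatives over all position pairs of the sequence is one common convention and is an assumption here, as are the function names and the toy structures; the excerpt does not state the exact definition used for the reported values.

```python
from math import sqrt


def mcc(tp, fp, fn, tn):
    """Standard Matthews correlation coefficient from confusion counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def structure_mcc(predicted, reference, length):
    """MCC (scaled to 0-100) between two sets of base pairs (i, j) with i < j.

    Assumption: every unordered pair of positions is a candidate base pair,
    so true negatives are all candidates absent from both structures.
    """
    all_pairs = length * (length - 1) // 2
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    tn = all_pairs - tp - fp - fn
    return 100.0 * mcc(tp, fp, fn, tn)


# Toy structures on a sequence of length 10, invented for the example.
reference = {(0, 9), (1, 8), (2, 7)}
predicted = {(0, 9), (1, 8), (3, 6)}

value = structure_mcc(predicted, reference, 10)
perfect = structure_mcc(reference, reference, 10)
print(round(value, 2), round(perfect, 2))
```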
Average MCC distributions of tmRNA and RNaseP secondary structure predictions made with the RNAalifold algorithm, using the model M and the model of SSCA for selecting homologous sequences.
| | All sequences | Model M | SSCA model |
| tmRNA | 52.54 | 58.17 | 60.09 |
| RNaseP | 58.92 | 60.93 | 65.37 |
Figure 4. Correlation between SSCA scores (using the model ) and average MCC scores of homologous sequences of tmRNA (left) and RNaseP (right) alignments. Homologous sequences with the lowest SSCA scores have the highest average MCC scores. The correlation is strongest at low SSCA scores.