Jianghui Liu, Jason T L Wang, Jun Hu, Bin Tian.
Abstract
BACKGROUND: Alignment of RNA secondary structures is important in studying functional RNA motifs. In recent years, much progress has been made in RNA motif finding and structure alignment. However, existing tools either require a large number of prealigned structures or suffer from high time complexity. This makes it difficult for these tools to process RNAs whose prealigned structures are unavailable, or to search very large RNA structure databases.
Year: 2005 PMID: 15817128 PMCID: PMC1090556 DOI: 10.1186/1471-2105-6-89
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Performance comparison of RNA secondary structure tools
| Tool Name | Running Time | Space Requirement | Reference |
| Sankoff^a | | | [17] |
| FOLDALIGN^b | | | [18] |
| RAGA^c | | | [22] |
| rna_align^d | min{ | | [26] |
| Dynalign^e | | | [19] |
| stemloc^f | N/A | | [32] |
| Rsearch^g | | | [31] |
| RNAforester^h | | | [27] |
| CARNAC^i | | | [20] |
| comRNA^j | N/A | | [21] |
a N is the average sequence length;
b N is the average length of a given set of RNAs;
c M and N are the lengths of the two given sequences;
d M and N are the two sequence lengths;
e M is the maximum distance allowed to match two nucleotides and N is the length of the shorter sequence;
f L and M are the two RNA sequence lengths; only valid in extreme cases;
g M is the query length and N is the subject sequence length;
h |Fi| is the number of nodes in forest Fi and deg(Fi) is the degree of Fi;
i N is the sequence length; the theoretical time complexity of O(N^6) could be significantly reduced to around O(N^2) by pre-processing of the sequences, as noted by the authors [20].
j M is the maximum number of stems examined and N is the total number of sequences under analysis. The average run time of comRNA can be significantly improved by carefully chosen parameters, as noted by the authors [21].
Figure 1. RNA structure decomposition (A-B) and partial structure determination (C-E). (A) A hypothetical RNA secondary structure is decomposed into a set of circles. (B) The circles are organized into a hierarchical tree. As shown, circle 8 contains only one pair of bases that are bonded with each other; therefore it corresponds to a loop. Circle 7 contains two pairs of bonded bases and also contains a single base (nucleotide C); therefore circle 7 corresponds to a bulge. Circle 6 corresponds to a stem of length two since it does not contain any single base. Circle 2 contains more than two pairs of bonded bases; therefore it corresponds to a junction. (C) A hypothetical RNA secondary structure is used to illustrate how partial structures are determined. (D) The partial structure for the single base G in boldface is shown. (E) The partial structure for the base pair C-G in boldface consists of two parts, a parent structure and a child structure. The base pair itself is included in the child structure.
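The circle classification described in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration rather than the RSmatch implementation: it assumes the structure is given in dot-bracket notation, the function names `pair_table`, `circles`, and `classify` are hypothetical, and it only covers circles closed by a base pair. The caption names only bulges for circles with two base pairs and at least one single base, so internal loops are not distinguished here.

```python
def pair_table(structure):
    """Map the index of each opening bracket to the index of its partner."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs[stack.pop()] = i
    return pairs


def circles(structure):
    """Return (closing_pair, n_pairs, n_single) for every circle closed by a base pair."""
    pairs = pair_table(structure)
    result = []
    for i, j in sorted(pairs.items()):
        n_pairs, n_single = 1, 0          # count the closing pair itself
        k = i + 1
        while k < j:
            if k in pairs:                # a child pair opens here: count it, skip its interior
                n_pairs += 1
                k = pairs[k] + 1
            else:                         # unpaired base lying directly on this circle
                n_single += 1
                k += 1
        result.append(((i, j), n_pairs, n_single))
    return result


def classify(n_pairs, n_single):
    """Label a circle following the rules stated in the Figure 1 caption."""
    if n_pairs == 1:
        return "loop"
    if n_pairs == 2:
        return "stem" if n_single == 0 else "bulge"
    return "junction"


labels = [classify(p, s) for _, p, s in circles("((.(...)))")]
```

For the toy structure `((.(...)))`, the outermost circle holds two stacked pairs, the next holds two pairs plus one single base, and the innermost holds one pair.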
Figure 2. Optimal structure alignment derivation. (A) Structure alignment between the child structure L in the query and the partial structure S in the subject. The substructures enclosed by dashed lines are to be inserted/deleted and the substructures enclosed by solid lines are to be matched. (B) Structure alignment between the partial structure S in the query and the partial structure S in the subject. The substructures enclosed by dashed lines are to be inserted/deleted and the substructures enclosed by solid lines are to be matched.
Figure 3. Database search with an RNA structure containing an IRE motif. A structure element (from base 3,451 to base 3,550) in the 3'UTR of human transferrin receptor (NM_003234) was used as a query to search the UTR structure database. (A) The output from RSmatch showing the top 11 hits. The six columns in the "Hits" section are, from left to right, rank, alignment score, region in the query, name of the hit, region in the hit, and annotation of the hit. (B) A pairwise alignment of the query structure and a hit structure (NM_003234:3401-3500), which is the region from base 3,401 to base 3,500 of transferrin receptor (NM_003234). The sequence length is shown after "Query" on the first line: a 31 nt long query sequence containing 7 nt in the single-stranded (ss) region and 24 nt in the double-stranded (ds) region. Numbers after "Identity" on the second line are percentages of identity of secondary structure (100%) and primary sequence (54%). The latter is further decomposed into two numbers indicating the sequence identity in the ss region (71%) and the ds region (50%), respectively. The number of gaps in the overall alignment is shown after "Gap", followed by the number of gaps in the ss region and the ds region, both shown in parentheses. The same format is used for nucleotide mismatches. Alignments of both structure and sequence are given, where "|" indicates identical nucleotides in either the ss region or the ds region, and ":" indicates identical secondary structures with different sequences. RNA structures are presented as follows: nested parentheses are used for base pairs and dots are used for nucleotides in ss regions. (C) The RNA structures corresponding to the query and the subject (hit) structure in (B). (D) Scoring matrices and the gap penalty used in the search. T and U are used interchangeably in this study.
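The per-region identity figures reported in panel (B) can be illustrated with a short sketch. It assumes a gap-free alignment in which query and subject share one dot-bracket structure, which is a simplification of the real output; the function name `identity_by_region` and the toy sequences are hypothetical. T and U are treated as the same nucleotide, as the caption states.

```python
def identity_by_region(query, subject, structure):
    """Percent sequence identity in ss (dots) and ds (brackets) regions, plus overall."""
    if not structure or not (len(query) == len(subject) == len(structure)):
        raise ValueError("aligned inputs must be non-empty and of equal length")
    matches = {"ss": 0, "ds": 0}
    lengths = {"ss": 0, "ds": 0}
    for q, s, mark in zip(query.upper(), subject.upper(), structure):
        region = "ss" if mark == "." else "ds"
        lengths[region] += 1
        # T and U are interchangeable, so normalise both sides to U before comparing
        if q.replace("T", "U") == s.replace("T", "U"):
            matches[region] += 1
    result = {r: 100.0 * matches[r] / lengths[r] for r in lengths if lengths[r]}
    result["overall"] = 100.0 * sum(matches.values()) / len(structure)
    return result


# Toy hairpin: 4 nt in the ds region, 3 nt in the ss region
report = identity_by_region("GCAAAGC", "GCAUAGU", "((...))")
```

The overall identity is the length-weighted combination of the two regional values, which is why a high ss identity and a lower ds identity can yield an intermediate overall figure.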
Figure 4. The two pattern-based RNA structures used in this study. (A) Histone 3'-UTR (HSL3) motif. (B) Iron Response Element (IRE) motif. A wildcard, represented by a lowercase letter n, is allowed to appear in a motif. When matching the motif with an RNA secondary structure, the wildcard in the motif can be instantiated into zero or one nucleotide in the secondary structure at no cost. Wildcards are used in places where the length of a region, either single-stranded or double-stranded, is variable. For example, the 5' flanking tail of HSL3 can be 4 or 5 nt long, and the lower part of the stem region of IRE can be 2 to 8 nt long.
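The "zero or one nucleotide" behaviour of the wildcard can be illustrated on a single-stranded pattern. The sketch below uses a regular expression instead of the scored structure alignment that RSmatch performs, so it only demonstrates the wildcard semantics; the helper names and the example pattern are hypothetical and are not taken from the motifs in Figure 4.

```python
import re


def motif_to_regex(motif):
    """Translate a motif string into a regex; lowercase 'n' matches zero or one nucleotide."""
    parts = [".?" if ch == "n" else re.escape(ch) for ch in motif]
    return re.compile("".join(parts))


def matches(motif, sequence):
    """True if the whole sequence is an instance of the motif."""
    return motif_to_regex(motif).fullmatch(sequence) is not None


# Hypothetical pattern with one leading wildcard: the tail may be 5 or 6 nt long
pattern = "nCAGUG"
short_ok = matches(pattern, "CAGUG")     # wildcard instantiated to zero nucleotides
long_ok = matches(pattern, "ACAGUG")     # wildcard instantiated to one nucleotide
too_long = matches(pattern, "AACAGUG")   # two extra nucleotides exceed the wildcard
```

A variable-length region such as a tail that is 4 or 5 nt long would then be written as four fixed positions followed by one `n`.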
HSL3 motifs found by RSmatch and PatSearch^a,b
| RefSeq ID | Location by PatSearch^c | Score of RSmatch | Location by RSmatch | Annotation |
| NM_002105 | 551–572 | 16 | 549–574 | Hs H2A histone family, (H2AFX) |
| NM_003493 | 454–475 | 16 | 452–478 | Hs histone 3, H3 (HIST3H3) |
| NM_003495 | 342–363 | 16 | 341–366 | Hs histone 1, H4i (HIST1H4I) |
| NM_003509 | 445–466 | 16 | 444–469 | Hs histone 1, H2ai (HIST1H2AI) |
| NM_003512 | 521–542 | 16 | 520–542 | Hs histone 1, H2ac (HIST1H2AC) |
| NM_003517 | 413–434 | 16 | 412–437 | Hs histone 2, H2ac (HIST2H2AC) |
| NM_003518 | 408–429 | 16 | 407–432 | Hs histone 1, H2bg (HIST1H2BG) |
| NM_003519 | 429–450 | 16 | 428–450 | Hs histone 1, H2bl (HIST1H2BL) |
| NM_003520 | 425–446 | 16 | 424–449 | Hs histone 1, H2bn (HIST1H2BN) |
| NM_003522 | 406–427 | 16 | 405–427 | Hs histone 1, H2bf (HIST1H2BF) |
| NM_003525 | 413–434 | 16 | 413–434 | Hs histone 1, H2bi (HIST1H2BI) |
| NM_003526 | 414–435 | 16 | 413–435 | Hs histone 1, H2bc (HIST1H2BC) |
| NM_003527 | 442–463 | 16 | 441–466 | Hs histone 1, H2bo (HIST1H2BO) |
| NM_003528 | 476–497 | 16 | 475–500 | Hs histone 2, H2be (HIST2H2BE) |
| NM_003530 | 443–464 | 16 | 442–467 | Hs histone 1, H3d (HIST1H3D) |
| NM_003535 | 454–475 | 16 | 453–478 | Hs histone 1, H3j (HIST1H3J) |
| NM_003537 | 445–466 | 16 | 444–469 | Hs histone 1, H3b (HIST1H3B) |
| NM_003539 | 343–364 | 16 | 342–367 | Hs histone 1, H4d (HIST1H4D) |
| NM_003546 | 340–361 | 16 | 339–364 | Hs histone 1, H4l (HIST1H4L) |
| NM_005320 | 753–774 | 16 | 752–777 | Hs histone 1, H1d (HIST1H1D) |
| NM_005325 | 733–754 | 16 | 732–757 | Hs histone 1, H1a (HIST1H1A) |
| NM_021052 | 494–515 | 16 | 493–516 | Hs histone 1, H2ae (HIST1H2AE) |
| NM_021059 | 483–504 | 16 | 483–504 | Hs histone 2, H3c (HIST2H3C) |
| NM_021062 | 407–428 | 16 | 406–431 | Hs histone 1, H2bb (HIST1H2BB) |
| NM_021063 | 463–484 | 16 | 462–484 | Hs histone 1, H2bd (HIST1H2BD) |
| NM_021064 | 470–491 | 16 | 469–494 | Hs histone 1, H2ag (HIST1H2AG) |
| NM_021066 | 414–435 | 16 | 413–435 | Hs histone 1, H2aj (HIST1H2AJ) |
| NM_021968 | 331–352 | 16 | 330–355 | Hs histone 1, H4j (HIST1H4J) |
| NM_170610 | 413–434 | 16 | 412–437 | Hs histone 1, H2ba (HIST1H2BA) |
| NM_175055 | 428–449 | 16 | 427–450 | Hs histone 3, H2bb (HIST3H2BB) |
| NM_003542 | N/A^c | 14 | 365–390 | Hs histone 1, H4c (HIST1H4C) |
| NM_003548 | N/A | 14 | 371–396 | Hs histone 2, H4 (HIST2H4) |
| NM_021058 | 457–478 | 14 | 455–481 | Hs histone 1, H2bj (HIST1H2BJ) |
| NM_003510 | 436–457 | 12 | 435–456 | Hs histone 1, H2ak (HIST1H2AK) |
| NM_003511 | 446–467 | 12 | 445–466 | Hs histone 1, H2al (HIST1H2AL) |
| NM_003514 | 463–484 | 12 | 462–483 | Hs histone 1, H2am (HIST1H2AM) |
| NM_003516 | 510–531 | 12 | 509–530 | Hs histone 2, H2aa (HIST2H2AA) |
| NM_003523 | 411–432 | 12 | 412–435 | Hs histone 1, H2be (HIST1H2BE) |
| NM_003529 | 439–460 | 12 | 440–462 | Hs histone 1, H3a (HIST1H3A) |
| NM_003536 | 449–470 | 12 | 448–469 | Hs histone 1, H3h (HIST1H3H) |
| NM_005319 | 709–730 | 12 | 708–729 | Hs histone 1, H1c (HIST1H1C) |
| NM_005322 | 766–787 | 12 | 767–787 | Hs histone 1, H1b (HIST1H1B) |
| NM_021018 | 444–465 | 12 | 445–466 | Hs histone 1, H3f (HIST1H3F) |
| NM_175054 | 389–410 | 12 | 388–409 | Hs histone 4, H4 (HIST4H4) |
| NM_175065 | 425–446 | 12 | 424–445 | Hs histone 2, H2ab (HIST2H2AB) |
| NM_033445 | 472–493 | 10 | 471–492 | Hs histone 3, H2a (HIST3H2A) |
| NM_003513 | 452–473 | 8 | 454–476 | Hs histone 1, H2ab (HIST1H2AB) |
| NM_003521 | N/A | 8 | 421–441 | Hs histone 1, H2bm (HIST1H2BM) |
| NM_003524 | 401–422 | 8 | 400–420 | Hs histone 1, H2bh (HIST1H2BH) |
| NM_003533 | 453–474 | 8 | 452–472 | Hs histone 1, H3i (HIST1H3I) |
| NM_003534 | N/A | 8 | 442–462 | Hs histone 1, H3g (HIST1H3G) |
| NM_003540 | N/A | 8 | 348–368 | Hs histone 1, H4f (HIST1H4F) |
| NM_003541 | 331–352 | 8 | 330–350 | Hs histone 1, H4k (HIST1H4K) |
| NM_003543 | N/A | 8 | 349–369 | Hs histone 1, H4h (HIST1H4H) |
| NM_003545 | N/A | 8 | 352–372 | Hs histone 1, H4e (HIST1H4E) |
| NM_170745 | 441–462 | 8 | 440–460 | Hs histone 1, H2aa (HIST1H2AA) |
| NM_003531 | 435–456 | 4 | 438–459 | Hs histone 1, H3c (HIST1H3C) |
| NM_003532 | 438–459 | 4 | 441–459 | Hs histone 1, H3e (HIST1H3E) |
| NM_005323 | 701–722 | -4 | 705–721 | Hs histone 1, H1t (HIST1H1T) |
| NM_021065 | 436–457 | -10 | 314–335 | Hs histone 1, H2ad (HIST1H2AD) |
| NM_005321 | 761–782 | -41 | 85–116 | Hs histone 1, H1e (HIST1H1E) |
| NM_014372 | 1345–1366 | -42 | 1381–1389 | Hs ring finger protein 11 (RNF11) |
a Items listed here include those found by PatSearch and those found by RSmatch, using a cutoff value of 8, that are related to histone genes.
b RSmatch returns 33 hits at a cutoff value of 14 and 184 hits at a cutoff value of 8.
c mRNAs in which PatSearch does not detect the HSL3 motif are marked with "N/A".
Performance of RSmatch in the HSL3 experiment^a
| Cutoff Score | Selected Hits^b | True Positives | Specificity | Sensitivity^c |
| 14 | 33 | 33 | 100.0% | 53.2% |
| 12 | 47 | 45 | 95.7% | 72.6% |
| 10 | 69 | 46 | 66.7% | 74.2% |
| 8 | 184 | 56 | 30.4% | 90.3% |
a PatSearch has a specificity of 98.2% and a sensitivity of 87.1%.
b Hits whose scores are greater than or equal to the cutoff value are selected.
c Assumes there are 62 mRNA structures containing the HSL3 motif, which include all histone mRNAs found by RSmatch and PatSearch.
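The percentages in the table above can be reproduced from the hit counts. The formulas in this sketch are inferred from the tabulated numbers and are not stated in the text: the "Specificity" column is consistent with true positives divided by selected hits, and the "Sensitivity" column with true positives divided by the 62 motif-containing mRNAs assumed in footnote c. The function and variable names are illustrative.

```python
TOTAL_WITH_MOTIF = 62  # assumed number of mRNA structures containing HSL3 (footnote c)

# (cutoff score, selected hits, true positives), copied from the table
ROWS = [(14, 33, 33), (12, 47, 45), (10, 69, 46), (8, 184, 56)]


def rates(selected, true_positives, total=TOTAL_WITH_MOTIF):
    """Return (specificity %, sensitivity %) rounded to one decimal place."""
    specificity = 100.0 * true_positives / selected   # share of selected hits that are real
    sensitivity = 100.0 * true_positives / total      # share of real motifs that were found
    return round(specificity, 1), round(sensitivity, 1)


summary = {cutoff: rates(selected, tp) for cutoff, selected, tp in ROWS}
for cutoff, (spec, sens) in summary.items():
    print(f"cutoff {cutoff:>2}: specificity {spec}%, sensitivity {sens}%")
```

Lowering the cutoff from 14 to 8 raises sensitivity because more true motifs are selected, while the much larger pool of selected hits drives the specificity figure down.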
IRE experiment results
| True Positive | RefSeq ID | Location by PatSearch | RSmatch Location | RSmatch Score | RSmatch Rank | Rsearch Location | Rsearch Score | Rsearch Rank | stemloc Location | stemloc Score | stemloc Rank |
| x | NM_000032^a | 13–35 | - | - | - | - | - | - | - | - | - |
| x | NM_014585 | 203–229 | 202–231 | 21 | 1 | 202–231 | 34.11 | 1 | 202–226 | 13.021 | 5 |
| x | NM_003234 | 3479–3511 | 3484–3506 | 19 | 2 | 3480–3510 | 31.42 | 2 | 3486–3503 | 15.936 | 2 |
| x | NM_003234 | 3883–3913 | 3887–3909 | 17 | 3 | 3876–3925 | 27.80 | 6 | 3889–3906 | 10.914 | 7 |
| x | NM_003234 | 3950–3976 | 3952–3974 | 17 | 3 | 3952–3974 | 25.50 | 10 | 3954–3971 | 10.476 | 9 |
| x | NM_003234 | 3996–4024 | 3999–4021 | 17 | 3 | 3999–4022 | 28.53 | 4 | 4042–4048 | 1.149 | 25 |
| x | NM_000146 | 19–41 | 20–40 | 16 | 6 | 7–51 | 27.84 | 5 | 17–41 | 8.574 | 12 |
| | NM_032484 | 2353–2376 | 2358–2373 | 13 | 7 | 2355–2377 | 22.26 | 11 | 2354–2375 | 16.411 | 1 |
| x | NM_003234 | 3429–3461 | 3434–3456 | 13 | 7 | 3433–3458 | 26.40 | 8 | 3436–3453 | 6.218 | 15 |
| | NM_018992 | 2182–2205 | 2186–2202 | 12 | 9 | 2186–2202 | 18.43 | 19 | 2186–2202 | 11.459 | 6 |
| | NM_003449 | 2160–2180 | 2163–2178 | 11 | 10 | 2160–2180 | 26.27 | 9 | 2161–2181 | 13.198 | 4 |
| | NM_002081 | 3449–3469 | 3452–3467 | 11 | 10 | 3446–3472 | 20.47 | 17 | 3450–3470 | 8.290 | 14 |
| | NM_173649 | 1371–1398 | 1431–1446 | 7 | 12 | 1372–1398 | 18.83 | 18 | 1376–1396 | 8.493 | 13 |
| | NM_033337 | 1202–1226 | 1253–1257 | 5 | 13 | 1202–1227 | 21.52 | 14 | 1202–1227 | 4.540 | 18 |
| | NM_001234 | 1106–1130 | 1157–1161 | 5 | 13 | 1106–1131 | 21.52 | 15 | 1106–11331 | 4.540 | 19 |
| | NM_153706 | 174–194 | 108–119 | 5 | 13 | 171–198 | 17.49 | 20 | 219–234 | 4.400 | 20 |
| | NM_003607 | 6892–6914 | 6851–6854 | 4 | 16 | 6892–6914 | 21.98 | 12 | 6930–6950 | 10.827 | 8 |
| | NM_002086 | 82–102 | 126–129 | 4 | 16 | 94–117 | 16.48 | 22 | 101–125 | 2.833 | 22 |
| | NM_012256 | 2594–2617 | 2571–2574 | 4 | 16 | 2536–2570 | 20.76 | 16 | 2590–2606 | 1.770 | 24 |
| x | NM_001098 | 1–23 | 17–19 | 3 | 19 | 1–23 | 30.67 | 3 | 3–20 | 14.185 | 3 |
| | NM_006731 | 4439–4465 | 4487–4489 | 3 | 19 | 4442–4462 | 21.65 | 13 | 4443–4460 | 9.920 | 10 |
| | NM_003672 | 2556–2576 | 2592–2594 | 3 | 19 | 2547–2587 | 26.44 | 7 | 2558–2574 | 9.049 | 11 |
| | NM_018234 | 2038–2058 | 2176–2178 | 3 | 19 | 2035–2061 | 14.11 | 24 | 2046–2058 | 4.986 | 16 |
| | NM_024076 | 1799–1822 | 1816–1818 | 3 | 19 | 1800–1821 | 15.00 | 23 | 1832–1850 | 4.876 | 17 |
| | NM_000877 | 3274–3294 | 3336–3338 | 3 | 19 | 3275–3293 | 13.81 | 25 | 3302–3321 | 3.182 | 21 |
| | NM_003675 | 2–27 | 27–31 | 3 | 19 | 1–29 | 16.74 | 21 | 21–31 | 1.980 | 23 |
| | NM_032323 | 1924–1944 | 1990–1992 | 3 | 19 | 1925–1943 | 12.43 | 26 | 1928–1948 | 0.678 | 26 |
a NM_000032 is used as the query structure for RSmatch, Rsearch, and stemloc; thus it has no values in those columns (shown as "-").
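The RSmatch ranks in the table above share a rank among tied scores and then skip ahead (for example 3, 3, 3, 6). That pattern is consistent with standard competition ranking, which the sketch below reproduces from the RSmatch score column; the ranking rule is inferred from the table and the function name is illustrative.

```python
def competition_rank(scores):
    """Rank scores in descending order; ties share the rank of their first position."""
    first_position = {}
    for position, score in enumerate(sorted(scores, reverse=True), start=1):
        first_position.setdefault(score, position)   # keep only the first (best) position
    return [first_position[score] for score in scores]


# RSmatch scores for the 26 hits, in table order (the query NM_000032 is excluded)
rsmatch_scores = [21, 19, 17, 17, 17, 16, 13, 13, 12, 11, 11, 7,
                  5, 5, 5, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3]
rsmatch_ranks = competition_rank(rsmatch_scores)
```

Because RSmatch scores are integers, ties are frequent in its column, whereas the real-valued Rsearch and stemloc scores in the table produce no shared ranks.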
Figure 5. Performance comparison of Rsearch and RSmatch and an alignment of two 5S rRNAs. (A) Performance comparison for 64 RNA families. Different colors represent structures of different sizes. Each point corresponds to one alignment between a query structure and a subject structure. The x-axis is the percent of coverage by Rsearch and the y-axis is the percent of coverage by RSmatch. (B) Performance comparison for 5S rRNA. A 5S rRNA was randomly chosen as the query structure and ten others as the subject sequences. The median value of the ten structure coverage values was then calculated. This process was repeated 100 times to generate 100 points for the graph. Therefore, each point represents one particular query structure. An example alignment of two 5S rRNAs is shown: (C) the query structure is X07545/505-619; (D) the subject RNA is X02729; and (E) the detailed alignment by RSmatch.
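The sampling procedure described for panel (B) can be sketched as follows. The `coverage` argument is a hypothetical callable standing in for whatever computes the percent coverage of one alignment, and the sketch handles one tool at a time; in the figure each point would pair the median obtained from Rsearch with the one from RSmatch. The seed is added only to make the sketch reproducible.

```python
import random
from statistics import median


def median_coverage_points(structures, coverage, n_subjects=10, n_repeats=100, seed=0):
    """Repeat: pick a random query, pick n_subjects others, record the median coverage."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_repeats):
        query_index = rng.randrange(len(structures))
        query = structures[query_index]
        # every structure except the query is a candidate subject
        others = structures[:query_index] + structures[query_index + 1:]
        subjects = rng.sample(others, n_subjects)
        points.append(median(coverage(query, subject) for subject in subjects))
    return points


# Toy run with placeholder identifiers and a dummy coverage function
toy_structures = [f"rRNA_{i}" for i in range(20)]
toy_points = median_coverage_points(toy_structures, lambda q, s: 50.0)
```

Using the median rather than the mean keeps a single poorly aligned subject from dominating the value plotted for a query.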
Figure 6. CPU time versus database size. From the 5S rRNA family, a randomly picked 5S rRNA was used as the query to search a structure database obtained by folding the remaining seed sequences in the family. The program was run 10 times, and the average running time is shown as a circle in the graph.
Figure 7. Multiple structure alignment and iterative database search. (A) Flowchart of multiple structure alignment and iterative database search. Step (1a) accepts a query structure to start an iterative database search; step (1b) processes a small database for multiple structure alignment; step (2) derives a profile from the seed alignment; step (3) uses the profile to conduct a search; and step (4) updates the profile with the new alignment. (B) Multiple structure alignment of several IRE structures. (C) PSSM of the multiple alignment of IRE in (B). Each column in the PSSM corresponds to the position of a structure component, either a single base or a base pair. The position of a single base is represented by the nucleotide number and the position of a base pair is represented by two nucleotide numbers connected by a dash. For each column, the scores of individual structure components in that position are listed in rows, where "-" means not applicable.
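One common way to turn a column of an alignment into PSSM scores is a log-odds calculation with pseudocounts, sketched below for a single column. The caption does not say how the scores in panel (C) are computed, so the formula, the uniform background, and the pseudocount are assumptions made for illustration only; `column_scores` is a hypothetical name. A column may hold single bases or base pairs written as strings such as "G-C".

```python
import math
from collections import Counter


def column_scores(components, alphabet, pseudocount=1.0):
    """Log2-odds score of each possible component in one alignment column.

    components: observed components in the column, e.g. ["C", "C", "U"]
    alphabet:   all components allowed at this position
    Assumes a uniform background over the alphabet (an illustrative choice).
    """
    counts = Counter(components)
    total = len(components) + pseudocount * len(alphabet)
    background = 1.0 / len(alphabet)
    return {
        symbol: math.log2(((counts[symbol] + pseudocount) / total) / background)
        for symbol in alphabet
    }


# Single-base column observed in four aligned structures
single_base = column_scores(["C", "C", "C", "U"], alphabet=["A", "C", "G", "U"])

# Base-pair column; the six pair types listed here are an assumed alphabet
PAIRS = ["A-U", "U-A", "G-C", "C-G", "G-U", "U-G"]
base_pair = column_scores(["G-C", "G-C", "A-U"], alphabet=PAIRS)
```

In an iterative search, step (4) of the flowchart would correspond to recomputing such column scores after new hits are added to the alignment.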