Thomas King-Fung Wong, Brenda Wing-Yan Cheung, Tak-Wah Lam, Siu-Ming Yiu.
Abstract
BACKGROUND: New non-coding RNAs (ncRNAs) of a family can be predicted by aligning a candidate with a family member whose sequence and secondary structure are known. Existing tools either consider only sequence similarity or cannot handle local alignment with gaps.
Year: 2011 PMID: 21554760 PMCID: PMC3090760 DOI: 10.1186/1753-6561-5-S2-S2
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Figure 1. A long gap may exist in a conserved local region. Multiple sequence alignment of some seed members of the family RF01051 from the Rfam 9.1 database. The base-pair regions are highlighted in red and blue. All sequences are aligned according to their structures. If the two circled sequences are selected as query and target, the circled region is the conserved local region between them, and it contains a long gap.
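To illustrate why an affine gap penalty helps when a long gap sits inside a conserved local region, here is a minimal sketch of sequence-only local alignment with affine gaps (Gotoh-style recurrences). It is a simplification and not the structural alignment algorithm of the paper: base pairs are ignored, and the function name, the scoring values, and the example sequences are all assumptions chosen for illustration.

```python
def local_affine_score(a, b, match=2, mismatch=-1, gap_open=-4, gap_extend=-1):
    """Best local alignment score of a and b with affine gap penalties.

    Sequence-only sketch (no secondary structure). A gap of length k
    costs gap_open + (k - 1) * gap_extend; all scores are assumed values.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j)
    # E: best score ending with a gap that consumes b[j-1]
    # F: best score ending with a gap that consumes a[i-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 option lets a local alignment restart anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# Hypothetical sequences: two conserved flanks separated by a 6-base insertion.
left, right = "ACGTACGTAC", "TTGGCCAATT"
query = left + right
target = left + "GGGGGG" + right
print(local_affine_score(query, target))
```

With these assumed scores, bridging the six-base insertion gives 31 (40 for the matched flanks minus 9 for the gap), whereas either flank alone scores only 20, so the long gap no longer splits the conserved region into two weaker hits.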
The details of the ncRNA families used in the experiments.
| Family | Query Sequence ID | Length | Number of members embedded |
|---|---|---|---|
| RF00014 | CP000468.1/2032552-2032638 | 87 | 96 |
| RF00021 | CP000851.1/113395-113522 | 128 | 100 |
| RF00022 | AAND01000021.1/495-707 | 213 | 100 |
| RF00027 | AAPE01289140.1/8905-8994 | 90 | 100 |
| RF00032 | S49118.1/1081-1106 | 26 | 100 |
| RF00033 | Y15844.1/450-543 | 94 | 100 |
| RF00034 | BX571867.1/288515-288628 | 114 | 100 |
| RF00038 | AJ132964.1/66-198 | 133 | 100 |
| RF00039 | AF370716.1/3603-3656 | 54 | 100 |
| RF00042 | X55895.1/474-565 | 92 | 100 |
| RF00043 | Z47410.1/1220-1294 | 75 | 21 |
| RF00044 | M11813.1/4883-5126 | 244 | 8 |
| RF00046 | AY013245.2/62208-62303 | 96 | 76 |
| RF00048 | AF504534.1/666-726 | 61 | 100 |
| RF00386 | AF363455.1/1-122 | 122 | 100 |
| RF00643 | AASG02000279.1/67999-67862 | 138 | 100 |
| RF00661 | AC154049.1/4734-4855 | 122 | 100 |
| RF01051 | AE014299.1/1112481-1112574 | 94 | 100 |
Comparison of global alignment, local alignment without gap penalty, and local alignment with affine gap penalty, using the smallest threshold that yields no false positives.
| Family | Number of members | Gotohscan misses | Gotohscan % | Global misses | Global % | Local misses | Local % | Local with affine gap misses | Local with affine gap % |
|---|---|---|---|---|---|---|---|---|---|
| RF00014 | 96 | 2 | 2.1% | 0 | 0% | 0 | 0% | 0 | 0% |
| RF00021 | 100 | 10 | 10% | 5 | 5% | 5 | 5% | 2 | 2% |
| RF00022 | 100 | 59 | 59% | 20 | 20% | 19 | 19% | 4 | 4% |
| RF00027 | 100 | 100 | 100% | 15 | 15% | 9 | 9% | 2 | 2% |
| RF00032 | 100 | 59 | 59% | 4 | 4% | 1 | 1% | 0 | 0% |
| RF00033 | 100 | 29 | 29% | 27 | 27% | 27 | 27% | 25 | 25% |
| RF00034 | 100 | 71 | 71% | 11 | 11% | 22 | 22% | 7 | 7% |
| RF00038 | 100 | 88 | 88% | 0 | 0% | 0 | 0% | 0 | 0% |
| RF00039 | 100 | 100 | 100% | 1 | 1% | 1 | 1% | 1 | 1% |
| RF00042 | 100 | 10 | 10% | 0 | 0% | 0 | 0% | 0 | 0% |
| RF00043 | 21 | 3 | 14.3% | 0 | 0% | 0 | 0% | 0 | 0% |
| RF00044 | 8 | 1 | 12.5% | 0 | 0% | 0 | 0% | 0 | 0% |
| RF00046 | 76 | 9 | 11.8% | 2 | 2.6% | 1 | 1.3% | 0 | 0% |
| RF00048 | 100 | 17 | 17% | 0 | 0% | 0 | 0% | 0 | 0% |
| RF00386 | 100 | 88 | 88% | 63 | 63% | 62 | 62% | 6 | 6% |
| RF00643 | 100 | 98 | 98% | 4 | 4% | 13 | 13% | 0 | 0% |
| RF00661 | 100 | 100 | 100% | 87 | 87% | 77 | 77% | 30 | 30% |
| RF01051 | 100 | 100 | 100% | 91 | 91% | 85 | 85% | 52 | 52% |
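The "smallest threshold such that there is no false positive" criterion used in this table can be sketched as follows. This is an illustrative reading, not the authors' evaluation code: it assumes a hit is reported only when its score is strictly above the threshold, and the function name and the scores in the usage line are hypothetical.

```python
def misses_at_zero_fp(true_scores, false_scores):
    """Count real members missed when no false hit may be reported.

    Assumption: a hit is reported only if its score is strictly greater
    than the cutoff, so the cutoff is the highest false-hit score.
    Returns (number of misses, percentage of members missed).
    """
    cutoff = max(false_scores) if false_scores else float("-inf")
    missed = sum(1 for s in true_scores if s <= cutoff)
    return missed, 100.0 * missed / len(true_scores)


# Hypothetical scores, not taken from the paper.
print(misses_at_zero_fp([50, 42, 30, 18], [20, 12, 5]))
```

Under this reading, a family's miss count depends only on how many real members score no higher than the best-scoring false hit, which is why a method that widens the score separation lowers the count.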
Comparison of global alignment, local alignment without gap penalty, and local alignment with affine gap penalty when the threshold is set to allow a false positive rate (FPR) of 5% or 10%.
| Family | Number of members | Misses at FPR 5%: Gotohscan | Misses at FPR 5%: Global | Misses at FPR 5%: Local | Misses at FPR 5%: Local with affine gap | Misses at FPR 10%: Gotohscan | Misses at FPR 10%: Global | Misses at FPR 10%: Local | Misses at FPR 10%: Local with affine gap |
|---|---|---|---|---|---|---|---|---|---|
| RF00014 | 96 | 2 | 0 | 0 | 0 | 2 | 0 | 0 | 0 |
| RF00021 | 100 | 10 | 1 | 1 | 1 | 10 | 1 | 1 | 1 |
| RF00022 | 100 | 51 | 9 | 5 | 2 | 35 | 4 | 4 | 2 |
| RF00027 | 100 | 100 | 3 | 5 | 0 | 100 | 2 | 2 | 0 |
| RF00032 | 100 | 59 | 0 | 0 | 0 | 37 | 0 | 0 | 0 |
| RF00033 | 100 | 27 | 1 | 25 | 24 | 26 | 1 | 1 | 24 |
| RF00034 | 100 | 71 | 1 | 0 | 0 | 71 | 1 | 0 | 0 |
| RF00038 | 100 | 88 | 0 | 0 | 0 | 88 | 0 | 0 | 0 |
| RF00039 | 100 | 100 | 0 | 0 | 0 | 100 | 0 | 0 | 0 |
| RF00042 | 100 | 10 | 0 | 0 | 0 | 10 | 0 | 0 | 0 |
| RF00043 | 21 | 3 | 0 | 0 | 0 | 3 | 0 | 0 | 0 |
| RF00044 | 8 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| RF00046 | 76 | 9 | 0 | 0 | 0 | 9 | 0 | 0 | 0 |
| RF00048 | 100 | 11 | 0 | 0 | 0 | 11 | 0 | 0 | 0 |
| RF00386 | 100 | 88 | 58 | 56 | 1 | 88 | 48 | 38 | 1 |
| RF00643 | 100 | 98 | 1 | 4 | 0 | 98 | 0 | 2 | 0 |
| RF00661 | 100 | 100 | 87 | 66 | 23 | 100 | 81 | 52 | 14 |
| RF01051 | 100 | 100 | 79 | 85 | 47 | 100 | 79 | 81 | 39 |
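Counting misses at a fixed false positive rate can be sketched in the same spirit. The source does not spell out how the threshold is derived from the rate, so this block assumes the false positive rate is the fraction of false hits scoring strictly above the cutoff; the function name and the example scores are hypothetical.

```python
import math


def misses_at_fpr(true_scores, false_scores, max_fpr):
    """Count real members missed when at most max_fpr of false hits pass.

    Assumptions: a hit is reported only if its score is strictly above
    the cutoff, and the false positive rate is the fraction of false
    hits reported.
    """
    # Number of false hits that may score above the cutoff.
    allowed = math.floor(max_fpr * len(false_scores) + 1e-9)
    ranked = sorted(false_scores, reverse=True)
    # The cutoff is the best false hit that must still be rejected.
    cutoff = ranked[allowed] if allowed < len(ranked) else float("-inf")
    return sum(1 for s in true_scores if s <= cutoff)


# Hypothetical scores: 20 false hits scoring 1..20 and four real members.
false_hits = list(range(1, 21))
members = [25, 19.5, 18.5, 10]
for rate in (0.0, 0.05, 0.10):
    print(rate, misses_at_fpr(members, false_hits, rate))
```

Relaxing the rate from 0% to 5% to 10% can only lower the cutoff, so the miss count for a given method never increases as more false positives are tolerated.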
Summary of the area (normalized) under ROC curve for false positive rate ≤ 10%
| Family | Gotohscan | Global | Local | Local with affine gap |
|---|---|---|---|---|
| RF00014 | 0.98 | 1.0 | 1.0 | 1.0 |
| RF00021 | 0.9 | 0.99 | 0.99 | 0.99 |
| RF00022 | 0.53 | 0.92 | 0.93 | 0.98 |
| RF00027 | 0.0 | 0.96 | 0.96 | 1.0 |
| RF00032 | 0.61 | 0.99 | 1.0 | 1.0 |
| RF00033 | 0.73 | 0.93 | 0.79 | 0.76 |
| RF00034 | 0.29 | 0.98 | 0.99 | 0.99 |
| RF00038 | 0.12 | 1.0 | 1.0 | 1.0 |
| RF00039 | 0.0 | 1.0 | 1.0 | 1.0 |
| RF00042 | 0.9 | 1.0 | 1.0 | 1.0 |
| RF00043 | 0.86 | 1.0 | 1.0 | 1.0 |
| RF00044 | 0.88 | 1.0 | 1.0 | 1.0 |
| RF00046 | 0.88 | 1.0 | 1.0 | 1.0 |
| RF00048 | 0.89 | 1.0 | 1.0 | 1.0 |
| RF00386 | 0.12 | 0.42 | 0.49 | 0.98 |
| RF00643 | 0.02 | 0.99 | 0.96 | 1.0 |
| RF00661 | 0.0 | 0.14 | 0.36 | 0.79 |
| RF01051 | 0.0 | 0.18 | 0.17 | 0.56 |
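A normalized area under the ROC curve restricted to false positive rates of at most 10% can be computed along the following lines. The normalization is not defined in this excerpt, so the sketch assumes the partial area is divided by its maximum possible value (the rate cap itself), which makes a perfect separation score 1.0 and a method that ranks no real member above the allowed false hits score 0.0. The function name and example scores are hypothetical.

```python
def normalized_partial_auc(true_scores, false_scores, max_fpr=0.10):
    """Area under the ROC curve for FPR <= max_fpr, divided by max_fpr.

    Assumption: normalization divides by the largest attainable partial
    area. Ties between a real and a false hit are resolved pessimistically
    (the false hit is ranked first).
    """
    labeled = [(s, 1) for s in true_scores] + [(s, 0) for s in false_scores]
    labeled.sort(key=lambda pair: (-pair[0], pair[1]))
    n_true, n_false = len(true_scores), len(false_scores)
    tp = fp = 0
    area = 0.0
    for _, is_true in labeled:
        if is_true:
            tp += 1  # the curve moves up
            continue
        # The curve moves right: add a strip at the current true positive rate.
        prev_fpr = fp / n_false
        fp += 1
        fpr = min(fp / n_false, max_fpr)
        if fpr > prev_fpr:
            area += (fpr - prev_fpr) * (tp / n_true)
        if fp / n_false >= max_fpr:
            break
    return area / max_fpr


# Hypothetical scores: 20 false hits scoring 0..19 and two real members.
print(normalized_partial_auc([30, 18.5], list(range(20))))
```

In this example one member outranks every false hit and the other outranks all but one, so the strip up to 5% has height 0.5 and the strip from 5% to 10% has height 1.0, giving a normalized area of 0.75.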
Figure 2. Score distribution of the real hits and the false hits under different algorithms for the family RF00661. The figure compares the score distributions of real hits (i.e. real members) and false hits for the family RF00661 across the algorithms. The local structural alignment algorithm with affine gap penalty widens the difference between the scores of real hits and those of false hits relative to the other methods, and so has a higher power to distinguish real ncRNA members along a long genome sequence.
$O(mn^3) + O(n^2) = O(mn^3)$.