Dandan Song, Yang Yang, Bin Yu, Binglian Zheng, Zhidong Deng, Bao-Liang Lu, Xuemei Chen, Tao Jiang.
Abstract
BACKGROUND: Non-coding RNA (ncRNA) genes do not encode proteins but produce functional RNA molecules that play crucial roles in many key biological processes. Recent genome-wide transcriptional profiling studies using tiling arrays in organisms such as human and Arabidopsis have revealed a large number of transcripts, many of which have little or no capacity to encode proteins. This unexpected finding suggests that the currently known repertoire of ncRNAs may represent only a small fraction of the ncRNAs in these organisms. Efficient and effective prediction of ncRNAs has therefore become an important task in bioinformatics in recent years. Among the available computational methods, the comparative genomic approach seems to be the most powerful for detecting ncRNAs. The recent completion of the sequencing of several major plant genomes has made this approach feasible for plants.
Year: 2009 PMID: 19208137 PMCID: PMC2648795 DOI: 10.1186/1471-2105-10-S1-S36
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Table 1. Group I results. ARGP denotes the intersection of ARP and AGP.
| | Chr. 1 | Chr. 2 | Chr. 3 | Chr. 4 | Chr. 5 | Total |
|---|---|---|---|---|---|---|
| ARP | 54 | 68 | 107 | 17 | 51 | 297 |
| AGP | 106 | 111 | 639 | 48 | 253 | 1157 |
| ARGP | 38 | 36 | 68 | 7 | 30 | 179 |
| Filter 1 | 4 | 13 | 10 | 0 | 11 | 38 |
| Filter 2 | 4 | 13 | 9 | 0 | 8 | 34 |
| Filter 3 | 4 | 13 | 5 | 0 | 5 | 27 |
The Filter 1 row gives the number of predicted ncRNAs located in TAIR8 intergenic regions. The Filter 2 row gives the number of those putative ncRNAs that are not present in ncRNA databases. The Filter 3 row gives the number of remaining candidate novel ncRNAs that cannot be classified by Rfam.
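The intersection and the three successive filters described above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `Candidate` record, the exact-match intersection, and the three predicate callbacks (`is_intergenic`, `in_ncrna_db`, `rfam_family`) are hypothetical stand-ins, since the record does not say how overlap with TAIR8, the ncRNA databases, or Rfam was actually tested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A predicted ncRNA locus (hypothetical record layout)."""
    chrom: int
    start: int
    end: int
    strand: str  # "W" or "C"


def argp(arp, agp):
    """Candidates present in both prediction sets (assumes exact-match intersection)."""
    return set(arp) & set(agp)


def apply_filters(candidates, is_intergenic, in_ncrna_db, rfam_family):
    """Return the candidate sets surviving Filter 1, Filter 2 and Filter 3, in order."""
    f1 = {c for c in candidates if is_intergenic(c)}        # Filter 1: intergenic only
    f2 = {c for c in f1 if not in_ncrna_db(c)}              # Filter 2: not already known
    f3 = {c for c in f2 if rfam_family(c) is None}          # Filter 3: unclassified by Rfam
    return f1, f2, f3


# Toy data, not taken from the tables.
a = Candidate(1, 100, 200, "W")
b = Candidate(1, 500, 650, "C")
c = Candidate(2, 900, 960, "W")
d = Candidate(3, 40, 90, "W")

shared = argp([a, b, c], [b, c, d])
f1, f2, f3 = apply_filters(
    shared,
    is_intergenic=lambda x: True,    # toy: every candidate is intergenic
    in_ncrna_db=lambda x: x == b,    # toy: b is already in a database
    rfam_family=lambda x: None,      # toy: Rfam classifies nothing
)
print(len(shared), len(f1), len(f2), len(f3))  # 2 2 1 1
```

Each filter only ever removes candidates from the previous set, which is why the counts in the Filter rows can never increase from one row to the next.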
Table 2. Group II results. The three filtration steps are the same as in Table 1.
| | Set | Chr. 1 | Chr. 2 | Chr. 3 | Chr. 4 | Chr. 5 | Total |
|---|---|---|---|---|---|---|---|
| rrp4 | ARP | 17 | 2 | 28 | 40 | 33 | 120 |
| | AGP | 19 | 2 | 495 | 54 | 235 | 805 |
| | ARGP | 14 | 2 | 20 | 25 | 24 | 85 |
| rrp41 | ARP | 8 | 0 | 28 | 12 | 14 | 62 |
| | AGP | 11 | 0 | 490 | 22 | 225 | 748 |
| | ARGP | 6 | 0 | 20 | 6 | 14 | 46 |
| csl42 | ARP | 6 | 2 | 32 | 12 | 14 | 66 |
| | AGP | 9 | 2 | 482 | 25 | 224 | 742 |
| | ARGP | 6 | 2 | 22 | 2 | 13 | 45 |
| Total | ARP | 17 | 4 | 32 | 43 | 41 | 137 |
| | AGP | 19 | 4 | 500 | 57 | 239 | 819 |
| | ARGP | 14 | 4 | 22 | 27 | 33 | 100 |
| Filter 1 | | 6 | 2 | 22 | 12 | 33 | 75 |
| Filter 2 | | 5 | 1 | 22 | 10 | 30 | 68 |
| Filter 3 | | 0 | 0 | 4 | 0 | 18 | 22 |
Table 3. Detailed information on the novel ncRNA candidates in group I.
| Name | Chr. | Intergenic Region | Start | End | Strand | Len. |
|---|---|---|---|---|---|---|
| SEQ1 | 1 | AT1G28750–AT1G28770 | 10091146 | 10091292 | W | 147 |
| SEQ2 | 1 | AT1G30972–AT1G30974 | 11046502 | 11046670 | W | 169 |
| SEQ3‡ | 1 | AT1G43620–AT1G43630 | 16433517 | 16433593 | W | 77 |
| SEQ4 | 1 | AT1G54890–AT1G54905 | 20470604 | 20470753 | C | 150 |
| SEQ5a† | 2 | AT2G01020–AT2G01022 | 6084 | 6646 | C | 563 |
| SEQ6a† | 2 | AT2G01020–AT2G01022 | 6708 | 7257 | C | 550 |
| SEQ7a#,† | 2 | AT2G01020–AT2G01022 | 7288 | 8699 | C | 1412 |
| SEQ8a† | 2 | AT2G01020–AT2G01022 | 8889 | 9367 | C | 479 |
| SEQ9W | 2 | AT2G07590–AT2G07600 | 3187035 | 3187217 | W | 183 |
| SEQ9C*,# | 2 | AT2G07590–AT2G07600 | 3187035 | 3187217 | C | 183 |
| SEQ10W | 2 | AT2G07689–AT2G07691 | 3338151 | 3338560 | W | 410 |
| SEQ10C | 2 | AT2G07689–AT2G07691 | 3338151 | 3338560 | C | 410 |
| SEQ11C | 2 | AT2G07732–AT2G07733 | 3469569 | 3469970 | C | 402 |
| SEQ11W* | 2 | AT2G07732–AT2G07733 | 3469611 | 3470020 | W | 410 |
| SEQ12W | 2 | AT2G09880–AT2G09890 | 3747858 | 3748031 | W | 174 |
| SEQ12C*,‡ | 2 | AT2G09880–AT2G09890 | 3747858 | 3748031 | C | 174 |
| SEQ13‡ | 2 | AT2G12420–AT2G12430 | 5036416 | 5036560 | C | 145 |
| SEQ5b† | 3 | AT3G41979–AT3G42050 | 14211041 | 14211603 | C | 563 |
| SEQ6b† | 3 | AT3G41979–AT3G42050 | 14211665 | 14212214 | C | 550 |
| SEQ7b#,† | 3 | AT3G41979–AT3G42050 | 14212350 | 14213656 | C | 1307 |
| SEQ8b† | 3 | AT3G41979–AT3G42050 | 14213846 | 14214324 | C | 479 |
| SEQ14♭ | 3 | AT3G42803–AT3G42806 | 14923566 | 14923707 | W | 142 |
| SEQ15 | 5 | AT5G09960–AT5G09970 | 3111016 | 3111082 | W | 67 |
| SEQ16 | 5 | AT5G29805–AT5G29890 | 11327943 | 11328007 | W | 65 |
| SEQ17 | 5 | AT5G32410–AT5G32420 | 12042758 | 12042828 | C | 71 |
| SEQ18a | 5 | AT5G34358–AT5G34376 | 12842957 | 12843077 | W | 121 |
| SEQ18b | 5 | AT5G34412–AT5G34431 | 12885676 | 12885796 | W | 121 |
The three sequences marked by * were not expressed in the bi-directional RT-PCR analysis of inflorescence tissue. The three sequences marked by # may code for proteins. Sequences marked by † were annotated as LSU-rRNA Ath by RepeatMasker, those marked by ‡ as LTR, and those marked by ♭ as DNA transposons.
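The Len. column appears to follow from the Start and End columns as End − Start + 1, that is, both coordinates are counted as part of the locus. A minimal check on three rows copied from the table, assuming that inclusive convention (the helper name `locus_length` is illustrative only):

```python
def locus_length(start: int, end: int) -> int:
    """Length in nucleotides of a locus whose start and end are both included."""
    return abs(end - start) + 1


# (name, start, end, listed length), copied from the table above
rows = [
    ("SEQ1", 10091146, 10091292, 147),
    ("SEQ7a", 7288, 8699, 1412),
    ("SEQ11W", 3469611, 3470020, 410),
]

for name, start, end, listed in rows:
    print(name, locus_length(start, end), locus_length(start, end) == listed)
```

Using the absolute difference keeps the helper correct regardless of the order in which the two coordinates are given.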
Table 4. Detailed information on the novel ncRNA candidates in group II. The symbols have the same meaning as in Table 3.
| Name | Chr. | Intergenic Region | Start | End | Strand | Len. |
|---|---|---|---|---|---|---|
| SEQ19a‡ | 3 | At3g33055–At3g33058 | 13598866 | 13598930 | W | 65 |
| SEQ19b‡ | 3 | At3g33055–At3g33058 | 13601377 | 13601440 | W | 64 |
| SEQ20a† | 3 | At3g41979–At3g42050 | 14213822 | 14214274 | C | 453 |
| SEQ20b† | 3 | At3g41979–At3g42050 | 14213822 | 14214274 | W | 453 |
| SEQ21 | 5 | At5g29805–At5g29890 | 11327943 | 11328007 | W | 65 |
| SEQ22C‡ | 5 | At5g34358–At5g34376 | 12828009 | 12828088 | C | 80 |
| SEQ22W | 5 | At5g34358–At5g34376 | 12828009 | 12828088 | W | 80 |
| SEQ23aC‡ | 5 | At5g34358–At5g34376 | 12833485 | 12833557 | C | 73 |
| SEQ23aW | 5 | At5g34358–At5g34376 | 12833485 | 12833557 | W | 73 |
| SEQ23bC‡ | 5 | At5g34358–At5g34376 | 12834229 | 12834301 | C | 73 |
| SEQ23bW | 5 | At5g34358–At5g34376 | 12834229 | 12834301 | W | 73 |
| SEQ24a‡ | 5 | At5g34358–At5g34376 | 12836733 | 12836797 | C | 65 |
| SEQ23c‡ | 5 | At5g34358–At5g34376 | 12841209 | 12841281 | C | 73 |
| SEQ23dC‡ | 5 | At5g34358–At5g34376 | 12841964 | 12842030 | C | 67 |
| SEQ23dW | 5 | At5g34358–At5g34376 | 12841964 | 12842030 | W | 67 |
| SEQ25a | 5 | At5g34358–At5g34376 | 12842908 | 12843077 | W | 170 |
| SEQ24b‡ | 5 | At5g34412–At5g34431 | 12875464 | 12875528 | C | 65 |
| SEQ24c‡ | 5 | At5g34412–At5g34431 | 12879452 | 12879516 | C | 65 |
| SEQ23e‡ | 5 | At5g34412–At5g34431 | 12882930 | 12882999 | C | 70 |
| SEQ23fC‡ | 5 | At5g34412–At5g34431 | 12883928 | 12884000 | C | 73 |
| SEQ23fW | 5 | At5g34412–At5g34431 | 12883928 | 12884000 | W | 73 |
| SEQ25b | 5 | At5g34412–At5g34431 | 12885627 | 12885796 | W | 170 |
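Several candidates in the table above occur as W/C pairs with identical coordinates, meaning the same locus is predicted on both strands. One way to pick such loci out of the table is to group rows by chromosome, start, and end. The sketch below does this for a handful of rows copied from the table; the tuple layout and the function name `both_strand_loci` are assumptions made for illustration.

```python
from collections import defaultdict

# (name, chromosome, start, end, strand), copied from a few rows of the table above
rows = [
    ("SEQ20a", 3, 14213822, 14214274, "C"),
    ("SEQ20b", 3, 14213822, 14214274, "W"),
    ("SEQ21", 5, 11327943, 11328007, "W"),
    ("SEQ22C", 5, 12828009, 12828088, "C"),
    ("SEQ22W", 5, 12828009, 12828088, "W"),
    ("SEQ25a", 5, 12842908, 12843077, "W"),
]


def both_strand_loci(rows):
    """Map (chrom, start, end) -> {strand: name} for loci listed on both strands."""
    by_locus = defaultdict(dict)
    for name, chrom, start, end, strand in rows:
        by_locus[(chrom, start, end)][strand] = name
    # Keep only loci that have both a Watson and a Crick entry.
    return {
        locus: names
        for locus, names in by_locus.items()
        if set(names) >= {"W", "C"}
    }


pairs = both_strand_loci(rows)
for locus, names in sorted(pairs.items()):
    print(locus, names["W"], names["C"])
```

With these six rows, the loci of SEQ20a/SEQ20b and SEQ22C/SEQ22W are reported, while SEQ21 and SEQ25a are not because each is listed on one strand only.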
Figure 1. RT-PCR analysis of the novel ncRNA candidates in group I. (A) ncRNAs predicted to be transcribed in only one direction. Reverse transcription was performed using oligo(dT) primers. 1: SEQ1; 2: SEQ2; 3: SEQ3; 4: SEQ4; 5: SEQ13; 6: SEQ5a or SEQ5b; 7: SEQ6a or SEQ6b; 8: SEQ7a or SEQ7b; 9: SEQ8a or SEQ8b; 10: SEQ5a combined with SEQ6a or SEQ5b combined with SEQ6b; 11: SEQ14; 12: SEQ15; 13: SEQ16; 14: SEQ17; 15: SEQ18a or SEQ18b. (B) ncRNAs predicted to be transcribed in both directions. Reverse transcription was performed using primers complementary to the predicted ncRNAs. -RT denotes the negative control.
Figure 2. Flowchart of the computational prediction pipeline. In the flowchart, a rectangle represents an action and an oval represents a dataset.