Felipe A Lessa, Tainá Raiol, Marcelo M Brigido, Daniele S B Martins Neto, Maria Emília M T Walter, Peter F Stadler.
Abstract
The Rfam database contains information about non-coding RNAs, emphasizing their secondary structures and organizing them into families of homologous RNA genes or functional RNA elements. Recently, a higher-order organization of Rfam in terms of so-called clans was proposed along with its "decimal release". In this proposal, some of the families have been assigned to clans based on experimental and computational data in order to find related families. In the present work we investigate an alternative classification of the RNA families based on tree edit distance. The resulting clustering recovers some of the Rfam clans. The majority of clans, however, are not recovered by the structural clustering. Instead, they are dispersed into larger clusters, which correspond roughly to well-described RNA classes such as snoRNAs, miRNAs, and CRISPRs. In conclusion, a structure-based clustering can contribute to the elucidation of the relationships among the Rfam families beyond the realm of clans and classes.
Year: 2012 PMID: 24704975 PMCID: PMC3899987 DOI: 10.3390/genes3030378
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Dendrograms of the consensus structures of all Rfam 10.1 families computed with three different hierarchical clustering methods. Large, important classes of ncRNAs are highlighted. Reddish colors denote three classes of microRNAs: animal (scarlet), plant (fuchsia), and viral (brown). Box C/D snoRNAs are shown in bright green, while light blue indicates box H/ACA snoRNAs. Prokaryotic CRISPR families are shown in orange.
Figure 2. Distribution of α (maximal Jaccard index), β, and γ for all 102 Rfam clans. These data show that most clans do not appear tightly clustered with respect to any of the three methods. The clans shown on the x-axis, together with their α, β, and γ values, are listed in the supplementary material.
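As an illustration of the maximal Jaccard index named in this caption, the following minimal sketch scores how well one clan is recovered by a dendrogram. It assumes, hypothetically, that the maximum is taken over the leaf sets of all subtrees of the tree; the nested-tuple tree encoding and the function names `subtree_leaf_sets`, `jaccard`, and `max_jaccard` are illustrative and do not come from the paper. β and γ are not defined in this record and are therefore not sketched.

```python
from typing import FrozenSet, Iterable, List, Tuple, Union

# Assumption: a dendrogram is modeled as nested 2-tuples with family names as leaves.
Tree = Union[str, Tuple["Tree", "Tree"]]


def subtree_leaf_sets(tree: Tree) -> List[FrozenSet[str]]:
    """Collect the leaf set of every subtree, including single leaves and the root."""
    collected: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            left, right = node
            leaves = walk(left) | walk(right)
        collected.append(leaves)
        return leaves

    walk(tree)
    return collected


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def max_jaccard(clan: Iterable[str], tree: Tree) -> float:
    """Best Jaccard index between the clan and any subtree of the dendrogram."""
    clan_set = frozenset(clan)
    return max(jaccard(clan_set, leaves) for leaves in subtree_leaf_sets(tree))


# Toy dendrogram over five hypothetical families A..E.
toy_tree: Tree = ((("A", "B"), "C"), ("D", "E"))
print(max_jaccard({"A", "B"}, toy_tree))  # clan coincides with one subtree
print(max_jaccard({"A", "D"}, toy_tree))  # clan is split across the root
```

Under this reading, a value of 1 means the clan forms exactly one subtree, while lower values mean that the best-matching subtree either misses clan members or contains unrelated families.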
Figure 3. Linear representations of the secondary structures of the SRP clan members. The seven depicted SRP clan members display a conserved stem-loop structure. The five that appear as a cluster in the UPGMA tree are delimited by a blue boundary. The Fungal SRP family contains extra loops, while the Small Bacterial SRP family contains only a conserved stem-loop domain; both have therefore been excluded from the cluster. A UPGMA neighbor of the Small Bacterial SRP family, an unrelated virus-derived Rfam family (Corona package), is shown for comparison.
Figure 4. Circular view of the UPGMA structural distance-based tree. (a) Circular view of the complete UPGMA dendrogram. Below the tree, a gradient indicates more complex structures on the left and simpler ones on the right. Vertical bars represent shortened branch lengths; (b) Closer view of the clustered snoRNAs (SNORD1 (dark blue), SNORD2 (light blue), and SNORA (green)), miRNAs (miRNA1 (pink) and miRNA2 (red)), and CRISPR (orange).
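For readers unfamiliar with UPGMA, the following minimal sketch shows how such a dendrogram can be built from pairwise distances. It is a generic textbook version of the method, not the authors' pipeline: the function name `upgma`, the input format, and the toy distances are assumptions made for illustration, and the structural (tree edit) distances used in the paper are simply replaced by hand-picked numbers.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def upgma(names: List[str], dist: Dict[FrozenSet[str], float]):
    """Build a UPGMA tree as nested 2-tuples.

    names: leaf labels; dist: maps frozenset({a, b}) to the distance between a and b.
    """
    # Active clusters: id -> (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by unordered id pairs.
    d = {
        frozenset((i, j)): dist[frozenset((names[i], names[j]))]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        pair = min(
            (frozenset(p) for p in combinations(clusters, 2)),
            key=d.__getitem__,
        )
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance to the merged cluster is the size-weighted average
        # of the distances to its two parts (the defining UPGMA rule).
        for k in clusters:
            d[frozenset((next_id, k))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


# Toy distances between four hypothetical families.
toy = {
    frozenset(("A", "B")): 1.0,
    frozenset(("C", "D")): 1.5,
    frozenset(("A", "C")): 6.0,
    frozenset(("A", "D")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("B", "D")): 6.0,
}
print(upgma(["A", "B", "C", "D"], toy))
```

The sketch recomputes the closest pair by scanning all active pairs at every step, which is simple but slow for large inputs; a practical run over all Rfam families would more likely rely on an established clustering library.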
Clusters of snoRNAs, miRNAs, and CRISPRs.
| Cluster | Number of Rfam families included | Percentage of Rfam families of the expected ncRNA class | Clans (name and identification) with all families included in the cluster |
|---|---|---|---|
| SNORD1 | 334 | 94.9% | SNORD52 (CL00063), U54 (CL00008), SNORD26 (CL00050), SNORD44 (CL00060), SNORD58 (CL00064), SNORD101 (CL00074), SNORD105 (CL00075), SNORND104 (CL00077), SNORD61 (CL00067), SNORD39 (CL00057), SNORD18 (CL00047), SNORD34 (CL00055), SNORD96 (CL00072), SNORD110 (CL00076), SNORD30 (CL00052), SNORD19 (CL00048), SNORD100 (CL00073) |
| SNORD2 | 86 | 81.4% | SNORD15 (CL00045) |
| SNORA | 158 | 81.0% | SNORA7 (CL00025), SNORA28 (CL00033), SNORA44 (CL00036), SNORA17 (CL00029), SNORA35 (CL00034), SNORA5 (CL00024), SCARNA4 (CL00019) |
| miRNA1 | 45 | 86.6% | MIR171 (CL00099) |
| miRNA2 | 472 | 85.6% | mir-34 (CL00087), mir-216 (CL00094), mir-279 (CL00095), mir-36 (CL00088), mir-81 (CL00091), mir-182 (CL00093), mir-3 (CL00084), mir-50 (CL00089), mir-BART (CL00097), mir-137 (CL00092), mir-73 (CL00090) |
| CRISPR | 100 | 59.0% | CRISPR-1 (CL00014), CRISPR-2 (CL00015) |