Christian Selig, Matthias Wolf, Tobias Müller, Thomas Dandekar, Jörg Schultz.
Abstract
An increasing number of phylogenetic analyses are based on the internal transcribed spacer 2 (ITS2). They mainly use its fast-evolving sequence for low-level analyses. When its highly conserved structure is taken into account, the same marker could also be used for higher-level phylogenies. Furthermore, structural features of the ITS2 allow different species to be distinguished from one another. Despite its importance, the correct structure is only rarely found by standard RNA folding algorithms. To overcome this hindrance to a wider application of the ITS2, we have developed a homology modelling approach to predict RNA structure, and we present the results of modelling the ITS2 in the ITS2 Database. Here, we describe the database and the underlying algorithms, which allowed us to predict the structure of 86 784 sequences, more than 55% of all GenBank entries concerning the ITS2. The predicted structures are not equally distributed over genera: for a substantial number of genera the structure of nearly all sequences is predicted, whereas for others no structure at all was found despite high sequence coverage. These genera might have evolved an ITS2 structure diverging from the standard one. The current version of the ITS2 Database can be accessed via http://its2.bioapps.biozentrum.uni-wuerzburg.de.
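The abstract does not spell out the homology modelling algorithm. As a rough illustration of the general idea only, the sketch below maps the base pairs of a template ITS2 with a known structure through a pairwise alignment onto a new sequence. The function names, the dot-bracket input and the rule of keeping only pairs that can still form Watson-Crick or G-U pairs are assumptions for illustration, not the ITS2 Database implementation.

```python
from typing import Optional

# Canonical Watson-Crick pairs plus G-U wobble pairs (assumed pairing rule).
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return base pairs (i, j), i < j, from a dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.append((stack.pop(), pos))
    return sorted(pairs)


def transfer_structure(template_aln: str, target_aln: str, template_structure: str) -> str:
    """Hypothetical helper: map template base pairs onto an aligned target.

    template_structure describes the ungapped template; a pair is kept only if
    both partners align to target nucleotides that can still pair.
    """
    # Alignment column occupied by each ungapped template position.
    template_cols = [col for col, base in enumerate(template_aln) if base != "-"]
    # Ungapped target position in each alignment column (None for a gap).
    target_index: list[Optional[int]] = []
    next_pos = 0
    for base in target_aln:
        if base == "-":
            target_index.append(None)
        else:
            target_index.append(next_pos)
            next_pos += 1
    # GenBank stores DNA, so read T as U before checking pairing.
    target_seq = target_aln.replace("-", "").upper().replace("T", "U")
    result = ["."] * len(target_seq)
    for i, j in parse_dot_bracket(template_structure):
        ti, tj = target_index[template_cols[i]], target_index[template_cols[j]]
        if ti is None or tj is None:
            continue  # a partner falls into a gap in the target
        if (target_seq[ti], target_seq[tj]) in CAN_PAIR:
            result[ti], result[tj] = "(", ")"
    return "".join(result)


# Template hairpin transferred onto a target with one A-C mismatch in the stem.
print(transfer_structure("GGGAAACCC", "GAGAAACCC", "(((...)))"))
```

With a mismatch in the middle stem pair, only the outer and inner pairs survive, so the call prints `(.(...).)`. A practical pipeline would also need an acceptance criterion, for example a minimum fraction of template pairs that must be transferred; no such threshold is implied here.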
Year: 2007 PMID: 17933769 PMCID: PMC2238964 DOI: 10.1093/nar/gkm827
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Methods used for ITS2 structure prediction and number of folded sequences.
| No. | Method | Folded sequences |
|---|---|---|
| 1 | Direct RNAfold | 10 667 |
| 2 | Homology modelling, first iteration | 27 044 |
| 3 | Homology modelling, second iteration | 11 306 |
| 4 | Direct RNAfold, sequence discovery by BLAST | 5 196 |
| 5 | Homology modelling, first iteration, sequence discovery by BLAST | 1 730 |
| 6 | Homology modelling, second iteration, sequence discovery by BLAST | 17 776 |
| 7 | Partial structures from homology modelling, both iterations | 13 065 |
| Total | | 86 784 |
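The table's total can be checked, and the rows regrouped by approach, with a few lines of Python. The grouping labels ("direct", "homology", "partial") and the BLAST flag are our own reading of the method descriptions, not categories defined by the database; row 7 is flagged False only because its description does not mention BLAST.

```python
from collections import Counter

# (description, approach label, BLAST discovery stated?, folded sequences) per row.
METHODS = {
    1: ("Direct RNAfold", "direct", False, 10_667),
    2: ("Homology modelling, first iteration", "homology", False, 27_044),
    3: ("Homology modelling, second iteration", "homology", False, 11_306),
    4: ("Direct RNAfold, sequence discovery by BLAST", "direct", True, 5_196),
    5: ("Homology modelling, first iteration, sequence discovery by BLAST", "homology", True, 1_730),
    6: ("Homology modelling, second iteration, sequence discovery by BLAST", "homology", True, 17_776),
    7: ("Partial structures from homology modelling, both iterations", "partial", False, 13_065),
}

# Sum of all rows; should reproduce the table's total of 86 784.
total = sum(row[3] for row in METHODS.values())

# Folded sequences per approach label.
by_approach: Counter = Counter()
for _, approach, _, count in METHODS.values():
    by_approach[approach] += count

# Rows whose description explicitly mentions discovery by BLAST.
via_blast = sum(count for _, _, blast, count in METHODS.values() if blast)

print(f"total: {total}")
for approach, count in by_approach.items():
    print(f"{approach}: {count} ({100 * count / total:.1f}%)")
print(f"found by BLAST: {via_blast}")
```

Under this grouping, full homology models (rows 2, 3, 5 and 6) account for 57 856 sequences, exactly two thirds of the total, while direct RNAfold contributes 15 863 and partial structures 13 065.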
Figure 1. Re-annotated sequences. Each dot represents a successfully predicted secondary structure; the X-axis shows the shift of the ITS2 5' end and the Y-axis the change in length, compared with the GenBank annotation. The cluster in the upper right corner consists of 206 Trifolium sequences. Six outliers (GI: 5814072, 57999795, 2896060, 13507073, 4006937, 85724147) are not shown.
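Each dot in Figure 1 compares two ITS2 intervals on the same GenBank record: the boundaries given in the GenBank annotation and those obtained after re-annotation. A minimal sketch of the two plotted quantities, assuming 1-based inclusive coordinates and defining both values as re-annotated minus GenBank; the sign convention, the `Interval` type and the example coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical ITS2 location on a GenBank record, 1-based and inclusive."""

    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def reannotation_shift(genbank: Interval, reannotated: Interval) -> tuple[int, int]:
    """Return (5' end shift, length change), both as re-annotated minus GenBank."""
    return reannotated.start - genbank.start, reannotated.length - genbank.length


# Hypothetical record: the re-annotated ITS2 starts 12 nt later and is 5 nt shorter.
print(reannotation_shift(Interval(101, 350), Interval(113, 357)))
```

The call prints `(12, -5)`; a sequence whose annotation was already correct would plot at the origin.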
Figure 2. Structure coverage. Each point represents one genus. The Y-axis shows the square root of the number of sequences in the genus; the X-axis shows the percentage of the genus's sequences for which a correct structure was predicted. A density plot above the scatter plot shows the distribution of coverage over all genera. The colouring indicates relative frequencies. The concentration of points at 50% is caused by genera containing only two sequences; a similar, less pronounced effect is seen at 33.3% and 66.7% for genera with three sequences.
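The clustering described in the caption follows from the fact that a genus with n sequences can only reach coverage values of k/n. A minimal sketch with hypothetical genus names and counts, computing both plot coordinates and listing the attainable coverage values for small genera.

```python
import math


def genus_point(n_sequences: int, n_with_structure: int) -> tuple[float, float]:
    """Return (coverage in %, square root of sequence count) for one genus."""
    coverage = 100 * n_with_structure / n_sequences
    return coverage, math.sqrt(n_sequences)


def attainable_coverages(n_sequences: int) -> list[float]:
    """Coverage values (in %, rounded to 0.1) a genus of this size can take."""
    return [round(100 * k / n_sequences, 1) for k in range(n_sequences + 1)]


# Hypothetical genera: (sequences in genus, sequences with a predicted structure).
genera = {"GenusA": (2, 1), "GenusB": (3, 2), "GenusC": (400, 372)}
for name, (n, k) in genera.items():
    coverage, y = genus_point(n, k)
    print(f"{name}: coverage {coverage:.1f}%, sqrt(n) = {y:.2f}")

print(attainable_coverages(2))
print(attainable_coverages(3))
```

Two-sequence genera can only land at 0%, 50% or 100%, and three-sequence genera at 0%, 33.3%, 66.7% or 100%, which is why points pile up at those X positions near the bottom of the plot.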