Yu Zhou, Chen Lu, Qi-Jia Wu, Yu Wang, Zhi-Tao Sun, Jia-Cong Deng, Yi Zhang.
Abstract
Group I Intron Sequence and Structure Database (GISSD) is a specialized, comprehensive database for group I introns. It integrates useful group I intron information from existing databases and provides de novo data essential for understanding these introns at a systematic level. The database presents 1789 complete intron records. Each record includes the nucleotide sequence of the annotated intron plus 15 nt of the upstream and downstream exons, and a pseudoknot-containing secondary structure predicted by combining comparative sequence analysis with minimum free energy algorithms. These introns represent all 14 subgroups, and a structure-based alignment is provided separately for each subgroup. Both the structure predictions and the alignments were done manually and adjusted iteratively, which yielded a reliable consensus structure for each subgroup. These consensus structures allowed us to judge the confidence of 20 085 group I introns previously found by the INFERNAL program and to classify them into subgroups automatically. The database provides intron-associated taxonomy information from GenBank, allowing users to view the detailed distribution of all group I introns. CDSs residing in introns and 3D structure information are also integrated where available. About 17 000 group I introns have been validated in this database. Approximately 95% of them belong to the IC3 subgroup and reside in the chloroplast tRNA(Leu) gene. The GISSD database can be accessed at http://www.rna.whu.edu.cn/gissd/
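Each record pairs an intron with 15 nt of flanking exon on either side. The Python sketch below shows that extraction step. It assumes the intron is given by 0-based, half-open coordinates on its host sequence. The function name `intron_with_flanks` and the coordinate convention are illustrative assumptions, not GISSD's actual code.

```python
def intron_with_flanks(host_seq: str, start: int, end: int,
                       flank: int = 15) -> tuple[str, str, str]:
    """Split out an intron and up to `flank` nt of exon on each side.

    Hypothetical helper: `start`/`end` are assumed to be 0-based, half-open
    coordinates of the intron within `host_seq` (Python slice convention).
    Returns (upstream_exon, intron, downstream_exon).
    """
    if not 0 <= start < end <= len(host_seq):
        raise ValueError("intron coordinates fall outside the host sequence")
    left = max(0, start - flank)              # clip if the 5' exon is short
    right = min(len(host_seq), end + flank)   # clip if the 3' exon is short
    return host_seq[left:start], host_seq[start:end], host_seq[end:right]


if __name__ == "__main__":
    host = "A" * 20 + "G" * 10 + "U" * 20     # toy sequence, not real data
    up, intron, down = intron_with_flanks(host, 20, 30)
    print(len(up), intron, len(down))         # 15 GGGGGGGGGG 15
```

The function returns the three pieces separately rather than one concatenated string. This keeps the exon/intron boundaries explicit for later steps, such as using the exon sequences to locate P1′ and P10.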
Year: 2007 PMID: 17942415 PMCID: PMC2238919 DOI: 10.1093/nar/gkm766
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. GISSD pipeline. Green blocks indicate foreign databases, blue blocks highlight the core data of GISSD and the orange block indicates the local taxonomy data in GISSD. CM: covariance model, which is computed from intron alignments by the INFERNAL software package.
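The pipeline computes covariance models (CMs) from the subgroup alignments with INFERNAL, and the consensus structures are used to classify INFERNAL hits into subgroups automatically. The Python sketch below shows one simple way to assign a subgroup. It assumes each hit has already been scored against every subgroup's CM; parsing those scores is not shown. The highest-score rule, the `min_bits` cutoff and the name `assign_subgroup` are illustrative assumptions, not the published GISSD procedure.

```python
from typing import Mapping, Optional


def assign_subgroup(scores: Mapping[str, float],
                    min_bits: float = 20.0) -> Optional[str]:
    """Return the subgroup whose covariance model scores this hit highest.

    `scores` maps a subgroup name (e.g. "IC3") to the bit score of one hit
    against that subgroup's CM. `min_bits` is a hypothetical confidence
    cutoff: if even the best score is below it, the hit stays unclassified.
    """
    if not scores:
        return None
    best = max(scores, key=lambda name: scores[name])
    return best if scores[best] >= min_bits else None


if __name__ == "__main__":
    # Toy scores for illustration only; not values from GISSD.
    hit = {"IC3": 85.2, "IC1": 40.1, "IB4": 12.0}
    print(assign_subgroup(hit))             # IC3
    print(assign_subgroup({"IC3": 5.0}))    # None
```

A stricter variant could also require a minimum margin between the best and second-best scores before trusting an assignment.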
Figure 2. Schematic representation of secondary structure prediction and alignment. (I) Locate the core components in the intron using conserved sequence patterns, in the following order: (1) find J6/7-P7, which is highly conserved; (2) find P3′, which usually lies 2 or 3 nt after P7 if there are no insertion sequences; (3) find J8/7-P7′, based on the base pairing of P7 with P7′ and the conservation of J8/7; and (4) find P3, which pairs with P3′. (II) Partition the sequence into four parts. In the first part, the 5′ exon and 3′ exon sequences were used to find P1′ and P10, and the remaining sequence before P3 was used to identify P2 and P2.1. (III) Fold the four parts separately with RNAstructure 4.11 (18). Besides the minimum energy structure, suboptimal structures were also examined. The structure whose pattern best matched known structures was chosen manually. (IV) Complete the whole structure of the intron by integrating the structures of the subsequences. (V) Once structures had been predicted for a sufficient number of introns in a subgroup, begin the alignment. The core components and peripheral elements were aligned sequentially by hand, based on their structures. Importantly, alignment and structure prediction were performed iteratively. When part of an intron proved difficult to align with the other sequences, the corresponding structure was reselected from the RNAstructure candidate structures. Sometimes the core components had to be reconsidered, and the entire structure prediction process was redone for that intron.
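Step (I)(3) locates P7′ through its base pairing with P7. The Python sketch below illustrates that check. It counts positions where two strands, read antiparallel, fail to form Watson-Crick or G·U wobble pairs. It then scans a region for windows that could pair with a given strand. The function names and the mismatch tolerance are hypothetical. This is an illustration only, not the authors' procedure, which also relied on J8/7 conservation and manual inspection.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairing_mismatches(five: str, three: str, allow_wobble: bool = True) -> int:
    """Count unpaired positions in a helix formed by two equal-length strands.

    five[i] is paired with three[-1 - i] (antiparallel), as for P7 and P7'.
    DNA letters are accepted: T is treated as U.
    """
    five = five.upper().replace("T", "U")
    three = three.upper().replace("T", "U")
    if len(five) != len(three):
        raise ValueError("helix strands must have equal length")
    allowed = (CANONICAL | WOBBLE) if allow_wobble else CANONICAL
    return sum((a, b) not in allowed for a, b in zip(five, reversed(three)))


def find_partner_sites(strand: str, region: str,
                       max_mismatches: int = 0) -> list[int]:
    """Return 0-based start positions in `region` where a window of the same
    length as `strand` could pair with it (hypothetical P7' candidates)."""
    n = len(strand)
    return [i for i in range(len(region) - n + 1)
            if pairing_mismatches(strand, region[i:i + n]) <= max_mismatches]


if __name__ == "__main__":
    print(pairing_mismatches("GACUA", "UAGUC"))        # 0
    print(find_partner_sites("GACUA", "AAUAGUCAA"))    # [2]
```

Raising `max_mismatches` tolerates an occasional non-canonical pair but also admits more spurious candidates. Any such hits would still need the conservation and manual checks described in the caption.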
Intron distribution on the host phylogenetic tree
The phylogeny of the host organisms is listed on the left, and the corresponding numbers of introns annotated in Rfam and CRW are listed in the next four columns. Introns from Rfam, and those filtered using different restrictions, are also indicated. The shaded Viridiplantae region, which accounts for the main difference in intron annotation between Rfam and CRW, is expanded by two additional nodes on the right.
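The table counts introns at each node of the host phylogeny for Rfam and CRW. The Python sketch below shows that kind of tally. It assumes each intron's host lineage is available as a root-to-leaf tuple of taxon names, for example derived from its GenBank taxonomy. The function names, the two-source comparison and the toy lineages are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

Node = Tuple[str, ...]  # a node is identified by its full root-to-node path


def count_by_node(lineages: Iterable[Sequence[str]]) -> Counter:
    """Count introns at every node of the host taxonomy.

    Each item is one intron's host lineage from root to leaf. The intron is
    counted once at every node on that path, so a parent's count equals the
    sum over its children plus introns assigned directly to it.
    """
    counts: Counter = Counter()
    for lineage in lineages:
        for depth in range(1, len(lineage) + 1):
            counts[tuple(lineage[:depth])] += 1
    return counts


def compare_sources(a: Iterable[Sequence[str]],
                    b: Iterable[Sequence[str]]) -> Dict[Node, Tuple[int, int]]:
    """Pair up per-node counts from two sources (e.g. Rfam vs. CRW)."""
    ca, cb = count_by_node(a), count_by_node(b)
    return {node: (ca[node], cb[node]) for node in sorted(set(ca) | set(cb))}


if __name__ == "__main__":
    # Toy lineages for illustration only.
    rfam = [("Eukaryota", "Viridiplantae", "Streptophyta"),
            ("Eukaryota", "Fungi")]
    crw = [("Eukaryota", "Viridiplantae", "Chlorophyta")]
    for node, (n_rfam, n_crw) in compare_sources(rfam, crw).items():
        print(" > ".join(node), n_rfam, n_crw)
```

Nodes are keyed by their full path rather than by name alone. This keeps identically named taxa in different branches from being merged.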
Figure 3. Screenshots of GISSD. (A) Intron search page, (B) search result page, (C) sequence, structure and alignment page, (D) intron distribution page and (E) gIRfam page.