Abstract
Intron prediction is an important problem in genome annotation, which is constantly being updated. Using the genomes of two model plants (rice and Arabidopsis), we compared two well-known intron prediction tools: the Blast-Like Alignment Tool (BLAT) and Sim4cc. The results showed that each tool had its own advantages and disadvantages. BLAT predicted more than 99% of all genomic introns, with a small number of false-positive introns. Sim4cc was successful at finding the correct introns, with a false-negative rate of 1.02% to 4.85%, but it needed a longer run time than BLAT. Further, we evaluated the intron information of 10 complete plant genomes. Because introns are non-coding sequences, their lengths are not constrained by the triplet codon frame, so intron lengths fall into three phases: a multiple of three bases (3n), a multiple of three bases plus one (3n + 1), and a multiple of three bases plus two (3n + 2). It is widely accepted that the percentages of 3n, 3n + 1, and 3n + 2 introns are quite similar within a genome. Our study showed that in 80% (8/10) of the species, the numbers of introns in the three phases were similar. The percentage of 3n introns in Ostreococcus lucimarinus was excessive (47.7%), while in Ostreococcus tauri it was deficient (29.1%). This discrepancy could have been the result of errors in intron prediction. We suggest that a three-phase evaluation is a fast and effective method of detecting intron annotation problems.
Keywords: intron length distributions; intron prediction; plant; three phases
Year: 2012 PMID: 23105930 PMCID: PMC3475488 DOI: 10.5808/GI.2012.10.1.58
Source DB: PubMed Journal: Genomics Inform ISSN: 1598-866X
List of intron databases
EST, expressed sequence tag.
Tools for detecting alternative splicing and introns
BLAT, Blast-Like Alignment Tool.
Genome sequence sources for the 10 plant species
Fig. 1. Flowchart of the comparison of BLAT and Sim4cc results in predicting introns. The intron information recorded for each intron comprises the gene name, intron number, intron position in the gene, intron length, intron position in the genome, forward-exon length, backward-exon length, and intron sequence. BLAT, Blast-Like Alignment Tool.
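The comparison step in the flowchart can be sketched as a set comparison between annotated and predicted introns. This is only a minimal illustration, not the authors' pipeline: the `Intron` record, the `compare_introns` function, the coordinate convention, and the example coordinates are all hypothetical, and an intron is assumed to count as correctly predicted only when gene, start, and end match exactly.

```python
from typing import Iterable, NamedTuple


class Intron(NamedTuple):
    """Hypothetical intron record; coordinates assumed 1-based, inclusive."""
    gene: str
    start: int
    end: int


def compare_introns(annotated: Iterable[Intron], predicted: Iterable[Intron]) -> dict:
    """Count exact-match agreements and disagreements between two intron sets."""
    ann, pred = set(annotated), set(predicted)
    true_pos = ann & pred    # annotated introns that were predicted
    false_neg = ann - pred   # annotated introns that were missed
    false_pos = pred - ann   # predicted introns absent from the annotation
    n_ann = len(ann)
    return {
        "true_positives": len(true_pos),
        "false_negatives": len(false_neg),
        "false_positives": len(false_pos),
        "false_negative_rate": len(false_neg) / n_ann if n_ann else 0.0,
        "recovered_fraction": len(true_pos) / n_ann if n_ann else 0.0,
    }


# Made-up coordinates, for illustration only.
annotated = [Intron("g1", 101, 199), Intron("g1", 301, 400),
             Intron("g2", 51, 124), Intron("g2", 500, 590)]
predicted = [Intron("g1", 101, 199), Intron("g1", 301, 400),
             Intron("g2", 51, 124), Intron("g2", 700, 790)]

result = compare_introns(annotated, predicted)
print(result)
```

Exact matching is the strictest choice; a real comparison might tolerate small boundary shifts, which would change the counts.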
Comparison of BLAT- and Sim4cc-predicted intron information with annotated intron information
Fig. 2. An example of the three intron phases from an Arabidopsis gene, AT1G17600.1. Uppercase sequence indicates exons and lowercase sequence indicates introns. Asterisks indicate frameshifts introduced by non-3n introns; intronic in-frame stop codons are underlined. Intron 1 is a 99-bp intron (3n) with one in-frame stop codon. Intron 2 is a 100-bp intron (3n + 1), which has two in-frame stop codons and thus does not interrupt the open reading frame. Intron 3 is a 74-bp intron (3n + 2) with three stop codons.
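The two quantities shown in the figure, the length phase of an intron and its in-frame stop codons, can be computed as in the following sketch. The function names, the `frame_offset` parameter (assumed here to be the index of the first intron base that starts a complete codon when reading continues from the upstream exon), and the example sequence are hypothetical and are not taken from AT1G17600.1.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
PHASE_LABELS = {0: "3n", 1: "3n + 1", 2: "3n + 2"}


def intron_phase(length: int) -> str:
    """Classify an intron by its length modulo 3."""
    return PHASE_LABELS[length % 3]


def in_frame_stops(intron_seq: str, frame_offset: int = 0) -> list:
    """Return the 0-based positions of stop codons read in the given frame."""
    seq = intron_seq.upper()
    return [i for i in range(frame_offset, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


# Made-up lengths covering the three phases.
for length in (90, 91, 92):
    print(length, intron_phase(length))

# Made-up lowercase intron sequence (15 bp), scanned in two frames.
example = "gtataagcttgacag"
print(in_frame_stops(example, frame_offset=0))
print(in_frame_stops(example, frame_offset=1))
```

A retained 3n intron leaves the downstream reading frame unchanged, whereas a 3n + 1 or 3n + 2 intron shifts it, which is why the scan takes the frame as an explicit argument.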
Intron three-phase distributions of 10 plant species
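A three-phase distribution of this kind can be derived from a list of intron lengths, as in the sketch below. The function names and the 4-percentage-point tolerance are assumptions made for illustration; the source does not state a numerical cut-off for calling a distribution excessive or deficient.

```python
from collections import Counter

PHASE_LABELS = {0: "3n", 1: "3n + 1", 2: "3n + 2"}


def phase_percentages(lengths):
    """Percentage of introns in each length phase (3n, 3n + 1, 3n + 2)."""
    if not lengths:
        raise ValueError("at least one intron length is required")
    counts = Counter(length % 3 for length in lengths)
    total = len(lengths)
    return {label: 100.0 * counts[rem] / total
            for rem, label in PHASE_LABELS.items()}


def is_skewed(pct_3n, tolerance=4.0):
    """Flag a 3n percentage that deviates from one third by more than
    `tolerance` percentage points (an arbitrary, illustrative threshold)."""
    return abs(pct_3n - 100.0 / 3.0) > tolerance


# Made-up lengths, for illustration only.
print(phase_percentages([90, 93, 91, 92]))

# 3n percentages reported in the abstract for the two Ostreococcus species.
print(is_skewed(47.7))  # O. lucimarinus
print(is_skewed(29.1))  # O. tauri
```

With real data, a goodness-of-fit test on the raw counts would be a more principled check than a fixed tolerance, since the expected spread depends on the number of introns.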
Comparison of BLAT and Sim4cc in intron prediction
Note: The data in this table are averages over the two model plants (Arabidopsis and rice).