Jun Wang, ShengTing Li, Yong Zhang, HongKun Zheng, Zhao Xu, Jia Ye, Jun Yu, Gane Ka-Shu Wong.
Abstract
To find unknown protein-coding genes, annotation pipelines use a combination of ab initio gene prediction and similarity to experimentally confirmed genes or proteins. Here, we show that although the ab initio predictions have an intrinsically high false-positive rate, they also have a consistently low false-negative rate. The incorporation of similarity information is meant to reduce the false-positive rate, but in doing so it increases the false-negative rate. The crucial variable is gene size (including introns): genes of the most extreme sizes, especially very large genes, are most likely to be incorrectly predicted.
Year: 2003 PMID: 12951575 DOI: 10.1038/nrg1160
Source DB: PubMed Journal: Nat Rev Genet ISSN: 1471-0056 Impact factor: 53.242