
Statistical analysis of GeneMark performance by cross-validation.

J Kleffe, K Hermann, M Borodovsky.

Abstract

We have explored the performance of the GeneMark gene identification method using cross-validation over learning samples of E. coli DNA sequences. The computations gave more accurate estimates of the error rates than previous results, which were obtained when the sample of non-coding regions was derived from GenBank sequences in which many true coding regions were unannotated. The error rate components have been classified and delineated. It was shown that the method performs differently on class I, II and III genes. The most frequent errors come from misinterpreting the coding potential of the complementary sequence in the same frame. The effects of stop codons present in alternative frames were also studied to better understand the main factors contributing to GeneMark performance.


Year:  1996        PMID: 16749185     DOI: 10.1016/s0097-8485(96)80014-3

Source DB:  PubMed          Journal:  Comput Chem        ISSN: 0097-8485


  1 in total

1.  Self-identification of protein-coding regions in microbial genomes.

Authors:  S Audic; J M Claverie
Journal:  Proc Natl Acad Sci U S A       Date:  1998-08-18       Impact factor: 11.205

