Applications of GeneMark in multispecies environments.

J D McIninch, W S Hayes, M Borodovsky.

Abstract

This paper aims to bridge the gap between practical experience in using GeneMark for a rapidly widening repertoire of genomes and the available publications that determine and compare the gene prediction accuracy of the GeneMark method for different genomes. Here we focus on the genome-specific variability of prediction error rates and their sources. DNA sequence inhomogeneity is present in both the training and control sets of coding and non-coding regions. Coding region inhomogeneity, caused by differences in sequence composition between "native" and horizontally transferred genes or between genes expressed at different levels, contributes to the false negative error rate. Inhomogeneity of non-coding regions may frequently be caused by the presence of unnoticed genes and contributes to the false positive error rate. We have documented such unnoticed genes in GenBank sequences for several species. Some of the protein products of these genes have been characterized by similarity search methods. For others, which we call "pioneer genes", no significant similarity has been found at the protein sequence level, although the confidence of the GeneMark prediction is high. For instance, a majority of the pioneer gene predictions made for E. coli now show strong similarity to more recently characterized proteins that have been added to protein sequence databases. Another practical question relates to genomic sequence inhomogeneity at the interspecies level: if GeneMark has not been trained for a particular species, is it possible to apply models derived for phylogenetically close genomes? The answer is yes. The results of cross-species gene prediction experiments show that cross-species prediction can often be reasonably accurate.

Year:  1996        PMID: 8877516

Source DB:  PubMed          Journal:  Proc Int Conf Intell Syst Mol Biol        ISSN: 1553-0833


  3 in total

1.  The secE gene of Helicobacter pylori.

Authors:  Claudine Médigue; Benjamin Chun-Yu Wong; Marie Chia-Mi Lin; Stéphanie Bocs; Antoine Danchin
Journal:  J Bacteriol       Date:  2002-05       Impact factor: 3.490

2.  Correlations between Shine-Dalgarno sequences and gene features such as predicted expression levels and operon structures.

Authors:  Jiong Ma; Allan Campbell; Samuel Karlin
Journal:  J Bacteriol       Date:  2002-10       Impact factor: 3.490

3.  Integrative annotation of 21,037 human genes validated by full-length cDNA clones.

Authors:  Tadashi Imanishi; Takeshi Itoh; Yutaka Suzuki; Claire O'Donovan; Satoshi Fukuchi; Kanako O Koyanagi; Roberto A Barrero; Takuro Tamura; Yumi Yamaguchi-Kabata; Motohiko Tanino; Kei Yura; Satoru Miyazaki; Kazuho Ikeo; Keiichi Homma; Arek Kasprzyk; Tetsuo Nishikawa; Mika Hirakawa; Jean Thierry-Mieg; Danielle Thierry-Mieg; Jennifer Ashurst; Libin Jia; Mitsuteru Nakao; Michael A Thomas; Nicola Mulder; Youla Karavidopoulou; Lihua Jin; Sangsoo Kim; Tomohiro Yasuda; Boris Lenhard; Eric Eveno; Yoshiyuki Suzuki; Chisato Yamasaki; Jun-ichi Takeda; Craig Gough; Phillip Hilton; Yasuyuki Fujii; Hiroaki Sakai; Susumu Tanaka; Clara Amid; Matthew Bellgard; Maria de Fatima Bonaldo; Hidemasa Bono; Susan K Bromberg; Anthony J Brookes; Elspeth Bruford; Piero Carninci; Claude Chelala; Christine Couillault; Sandro J de Souza; Marie-Anne Debily; Marie-Dominique Devignes; Inna Dubchak; Toshinori Endo; Anne Estreicher; Eduardo Eyras; Kaoru Fukami-Kobayashi; Gopal R Gopinath; Esther Graudens; Yoonsoo Hahn; Michael Han; Ze-Guang Han; Kousuke Hanada; Hideki Hanaoka; Erimi Harada; Katsuyuki Hashimoto; Ursula Hinz; Momoki Hirai; Teruyoshi Hishiki; Ian Hopkinson; Sandrine Imbeaud; Hidetoshi Inoko; Alexander Kanapin; Yayoi Kaneko; Takeya Kasukawa; Janet Kelso; Paul Kersey; Reiko Kikuno; Kouichi Kimura; Bernhard Korn; Vladimir Kuryshev; Izabela Makalowska; Takashi Makino; Shuhei Mano; Regine Mariage-Samson; Jun Mashima; Hideo Matsuda; Hans-Werner Mewes; Shinsei Minoshima; Keiichi Nagai; Hideki Nagasaki; Naoki Nagata; Rajni Nigam; Osamu Ogasawara; Osamu Ohara; Masafumi Ohtsubo; Norihiro Okada; Toshihisa Okido; Satoshi Oota; Motonori Ota; Toshio Ota; Tetsuji Otsuki; Dominique Piatier-Tonneau; Annemarie Poustka; Shuang-Xi Ren; Naruya Saitou; Katsunaga Sakai; Shigetaka Sakamoto; Ryuichi Sakate; Ingo Schupp; Florence Servant; Stephen Sherry; Rie Shiba; Nobuyoshi Shimizu; Mary Shimoyama; Andrew J Simpson; Bento Soares; Charles Steward; Makiko Suwa; Mami Suzuki; Aiko Takahashi; Gen Tamiya; 
Hiroshi Tanaka; Todd Taylor; Joseph D Terwilliger; Per Unneberg; Vamsi Veeramachaneni; Shinya Watanabe; Laurens Wilming; Norikazu Yasuda; Hyang-Sook Yoo; Marvin Stodolsky; Wojciech Makalowski; Mitiko Go; Kenta Nakai; Toshihisa Takagi; Minoru Kanehisa; Yoshiyuki Sakaki; John Quackenbush; Yasushi Okazaki; Yoshihide Hayashizaki; Winston Hide; Ranajit Chakraborty; Ken Nishikawa; Hideaki Sugawara; Yoshio Tateno; Zhu Chen; Michio Oishi; Peter Tonellato; Rolf Apweiler; Kousaku Okubo; Lukas Wagner; Stefan Wiemann; Robert L Strausberg; Takao Isogai; Charles Auffray; Nobuo Nomura; Takashi Gojobori; Sumio Sugano
Journal:  PLoS Biol       Date:  2004-04-20       Impact factor: 8.029
