Bulgan Galbadrakh, Kyung-Eun Lee, Hyun-Seok Park.
Abstract
Grammatical inference methods are expected to find grammatical structures hidden in biological sequences. One hopes that studies of grammar serve as an appropriate tool for theory formation. Thus, we have developed JSequitur, which automatically generates the grammatical structure of biological sequences within an inference framework based on string compression algorithms. Our original motivation was to find grammatical traits of several cancer genes that can be detected by string compression algorithms. We have not yet found any meaningful traits unique to the cancer genes, but we observed some interesting traits regarding the relationship among gene length, sequence similarity, the patterns of the generated grammar, and compression rate.
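To make the idea of inferring a grammar by string compression concrete, the following is a minimal sketch in Python. It is not JSequitur and not the Sequitur algorithm itself: it uses a simplified offline scheme that repeatedly replaces the most frequent adjacent symbol pair with a new rule. The function names and the compression rate definition (grammar size divided by sequence length) are assumptions made for illustration, not taken from the paper.

```python
from collections import Counter


def infer_grammar(sequence):
    """Build a small context-free grammar by repeated pair replacement.

    Illustrative only: a simplified offline scheme, not Sequitur's
    online algorithm. Returns (start_symbols, rules).
    """
    symbols = list(sequence)
    rules = {}
    next_id = 1
    while True:
        # Count adjacent pairs; overlapping pairs (e.g. in "AAA") are
        # counted too, so counts are an upper bound on replacements.
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        name = f"R{next_id}"  # hypothetical rule naming scheme
        next_id += 1
        rules[name] = pair
        # Rewrite the sequence left to right, replacing the chosen pair.
        rewritten = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                rewritten.append(name)
                i += 2
            else:
                rewritten.append(symbols[i])
                i += 1
        symbols = rewritten
    return symbols, rules


def expand(symbols, rules):
    """Expand a symbol list back into the original string."""
    out = []
    for s in symbols:
        out.append(expand(rules[s], rules) if s in rules else s)
    return "".join(out)


def grammar_size(start, rules):
    """Total number of symbols on the right-hand sides of all rules."""
    return len(start) + sum(len(rhs) for rhs in rules.values())


def compression_rate(sequence):
    """Assumed definition: grammar size divided by sequence length."""
    start, rules = infer_grammar(sequence)
    return grammar_size(start, rules) / len(sequence)


if __name__ == "__main__":
    seq = "ACGTACGTACGT"
    start, rules = infer_grammar(seq)
    print("start:", start)
    for name, rhs in rules.items():
        print(name, "->", " ".join(rhs))
    print("rate:", compression_rate(seq))
```

Under this assumed definition, a lower rate means the sequence contains more repeated structure that the grammar can capture, which is the kind of quantity one could compare against gene length.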
Keywords: context-free grammar; formal language theory; natural language processing; stochastic modeling
Year: 2012 PMID: 23346041 PMCID: PMC3543929 DOI: 10.5808/GI.2012.10.4.266
Source DB: PubMed Journal: Genomics Inform ISSN: 1598-866X
Fig. 1. User interface of the JSequitur program.
Fig. 2. JSequitur class diagram.
One hundred four genes and their compression rates
Fig. 3. Compression rates in relation to gene length.