Identification of human gene functional regions based on oligonucleotide composition.

V V Solovyev, C B Lawrence.

Abstract

Accurate recognition of coding and intron regions within large regions of uncharacterized genomic DNA is an unsolved problem. A database of more than 4,240,791 bp of coding and 7,790,682 bp of noncoding human sequence was extracted from GenBank to develop a function for locating coding regions in anonymous sequences. Several coding measures based on oligonucleotide preferences were tested on a control set that included 1/3 of all extracted sequences. The accuracy of separating coding from noncoding regions is 87% for 9 bp oligonucleotides on 54 bp windows and 91% on 108 bp windows. For separating coding from intron regions, the accuracy is 89-90% for 8 bp oligonucleotides on 54 bp windows and up to 95% on 108 bp windows. Using information about octanucleotide preferences in protein coding and intron regions, together with significant triplet frequencies as a function of position near splice junctions, a joint splice site prediction scheme was developed. The accuracy of the joint scheme for predicting splice site positions on the test set was about 96-97%, which exceeds the accuracy of a previously reported splice site selection method based on a more complex artificial neural network approach. A model of splicing using poly-G(C) rich exon flanking sequences is suggested. A remarkable difference in oligonucleotide composition between the 5' and 3' gene regions is shown and applied in a gene structure prediction system.
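
The abstract does not state the exact form of its coding measures, so the following is only a minimal sketch of the general idea of an oligonucleotide preference score, assuming a log-ratio of smoothed k-mer frequencies in coding versus noncoding training sequences, averaged over a window. The function names, the pseudocount, and the toy sequences are illustrative assumptions and are not taken from the paper. The toy example uses k = 3 to keep the data small, whereas the abstract reports results for 8 bp and 9 bp oligonucleotides on 54 bp and 108 bp windows.

```python
from collections import Counter
from math import log


def kmer_counts(seqs, k):
    """Count all overlapping k-mers in a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def preference_scorer(coding, noncoding, k, pseudo=1.0):
    """Return a function mapping a k-mer to log(P_coding / P_noncoding).

    Assumption: add-`pseudo` smoothing over the 4**k possible k-mers,
    so k-mers unseen in one training set still get a finite score.
    """
    c = kmer_counts(coding, k)
    n = kmer_counts(noncoding, k)
    c_total = sum(c.values()) + pseudo * 4 ** k
    n_total = sum(n.values()) + pseudo * 4 ** k

    def score(kmer):
        p_coding = (c[kmer] + pseudo) / c_total
        p_noncoding = (n[kmer] + pseudo) / n_total
        return log(p_coding / p_noncoding)

    return score


def window_score(window, score, k):
    """Average k-mer preference over one window (e.g. 54 or 108 bp)."""
    window = window.upper()
    n_kmers = len(window) - k + 1
    if n_kmers <= 0:
        raise ValueError("window is shorter than k")
    return sum(score(window[i:i + k]) for i in range(n_kmers)) / n_kmers


# Toy training data (hypothetical, for illustration only).
toy_coding = ["ATGGCCGCCGCCGCC" * 4]
toy_noncoding = ["ATATATTTTAAATATA" * 4]
toy_score = preference_scorer(toy_coding, toy_noncoding, k=3)

gc_rich = window_score("GCCGCCGCCGCCGCCGCC", toy_score, k=3)
at_rich = window_score("TATATATATATATATA", toy_score, k=3)
```

Under this sketch a window would be called coding when its average score is above zero and noncoding otherwise. An accuracy figure like those quoted above would then be the fraction of held-out windows assigned to the correct class.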

Year:  1993        PMID: 7584359

Source DB:  PubMed          Journal:  Proc Int Conf Intell Syst Mol Biol        ISSN: 1553-0833


  2 in total

1.  Ab initio gene finding in Drosophila genomic DNA.

Authors:  A A Salamov; V V Solovyev
Journal:  Genome Res       Date:  2000-04       Impact factor: 9.043

2.  Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences.

Authors:  Derek Gatherer
Journal:  Bioinform Biol Insights       Date:  2009-11-24
