
Overlapping translation of nucleic acid sequences for bioinformatics applications.

Jan Charles Biro.

Abstract

SUMMARY: An alternative method to TblastX has been developed. Nucleic acids in database and query sequences were translated into overlapping protein-like sequences (overlappingly translated sequences, or OTSs) before searching with blastP. Thus, each nucleic acid sequence is represented by a single 'protein-like' sequence instead of three 'proteins' in different reading frames. The 3x3 comparison of TblastX is replaced by a single comparison, giving faster results. Additional advantages are: (1) it can be more sensitive in detecting weak sequence similarities than either blastN or TblastX; (2) codon redundancy is eliminated; (3) the sensitivity to single nucleotide polymorphisms, mutations and sequencing errors is reduced; (4) it is insensitive to frame shifts.
RESULTS: BlastP using OTSs detected about two thirds of the blastN and TblastX matches but also discovered additional similarities. When blastN and TblastX against nucleic acids were compared to blastP against OTSs, identical matches discovered by blastP were generally longer (602 vs. 213 letters, p<0.01), had higher scores (748 vs. 460 bits, p<0.05) and lower E values (3.16E-20 vs. 1.17E+03, p<0.01), but the percentage identity was lower (25% vs. 61%, p<0.001). A qualitative evaluation with LALIGN showed improved visualization when OTSs were used instead of nucleic acids. Many extensive sequence similarities became more clearly visible, for example the repeating similarity between the prion protein and the human insulin gene microsatellite, and the surprising similarity between the first part of the prion protein coding region and human pro-insulin (34.4% identity and an additional 17.2% similarity over 238 residues, with a score >295, which is expected 4.6e-18 times by chance).
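The overlapping translation described in the SUMMARY can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: the codon window advances one nucleotide at a time, the standard genetic code is used, stop codons are written as `*`, and any triplet containing an ambiguous base is written as `X`. The function name `overlapping_translate` is hypothetical and is not taken from the paper.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def overlapping_translate(seq, unknown="X"):
    """Translate every triplet of `seq`, shifting by one nucleotide each time.

    Assumption: a step of one nucleotide; stop codons are kept as '*'.
    A sequence of n nucleotides yields n - 2 letters (empty if n < 3).
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], unknown) for i in range(len(seq) - 2)
    )


ots = overlapping_translate("ATGGCCTAA")
print(ots)        # all three reading frames interleaved in one string
print(ots[0::3])  # letters contributed by the first reading frame only
```

Under these assumptions, every third letter of the output corresponds to one conventional reading frame, so a single string carries all three frames of a strand. That is one way to read the abstract's claim that a single comparison stands in for the frame-by-frame comparisons of TblastX.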

Year: 2003    PMID: 12710898    DOI: 10.1016/s0306-9877(03)00008-2

Source DB: PubMed    Journal: Med Hypotheses    ISSN: 0306-9877    Impact factor: 1.538


