Searching DNA databases for similarities to DNA sequences: when is a match significant?

I Anderson, A Brass.

Abstract

MOTIVATION: Searching DNA sequences against a DNA database is an essential element of sequence analysis. However, few systematic studies have been carried out to determine when a match between two DNA sequences has biological significance and this is limiting the use that can be made of DNA searching algorithms.
RESULTS: A test set of DNA sequences has been constructed consisting of artificially evolved and real sequences. This set has been used to test various database searching algorithms (BLAST, BLAST2, FASTA and Smith-Waterman) on a subset of the EMBL database. The results of this analysis have been used to determine the sensitivity and coverage of all of the algorithms. Guidelines have been produced which can be used to assess the significance of DNA database search results. The Smith-Waterman algorithm was shown to have the best coverage, but the worst sensitivity, whereas the default BLASTN algorithm (word length set to 11) was shown to have good sensitivity, but poor coverage. A sensible compromise between speed, sensitivity and coverage can be obtained using either the FASTA or BLAST (word length set to 6) algorithms. However, analysis of the results also showed that no algorithm works well when the length of the probe sequence is <200 bases. In general, matches can accurately be identified between coding regions of DNA sequences when there is >35% sequence identity between the corresponding proteins. Searching a DNA sequence against a DNA sequence database can, therefore, be a useful tool in sequence analysis.
AVAILABILITY: The test sets used are available via anonymous ftp from mbisg2.sbc.man.ac.uk in the directory /pub/cabios/testdata/
CONTACT: I.Anderson@stud.man.ac.uk; abrass@man.ac.uk
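
ILLUSTRATION: The two numeric guidelines in the abstract (probe length of at least 200 bases, and >35% identity between the corresponding proteins) can be sketched as a rough screen. This is a minimal sketch and not the authors' method: the function names are hypothetical, and the definition of identity used here (matching residues divided by aligned columns with no gap in either sequence) is an assumption, since the abstract does not say how identity was calculated.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are
    excluded from both the match count and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def passes_guidelines(
    probe_length: int,
    protein_identity: float,
    min_probe_length: int = 200,          # abstract: no algorithm works well below 200 bases
    min_protein_identity: float = 35.0,   # abstract: >35% identity between the proteins
) -> bool:
    """Return True when both thresholds from the abstract are met."""
    return (
        probe_length >= min_probe_length
        and protein_identity > min_protein_identity
    )
```

A caller would align the translated coding regions with a tool of their choice, pass the aligned strings to `percent_identity`, and feed the result together with the probe length in bases to `passes_guidelines`. The thresholds are exposed as parameters because the abstract presents them as general guidance, not as hard cut-offs.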

Year:  1998        PMID: 9632830     DOI: 10.1093/bioinformatics/14.4.349

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  20 in total

1.  Transcript identification by analysis of short sequence tags--influence of tag length, restriction site and transcript database.

Authors:  Per Unneberg; Anders Wennborg; Magnus Larsson
Journal:  Nucleic Acids Res       Date:  2003-04-15       Impact factor: 16.971

2.  Mastering seeds for genomic size nucleotide BLAST searches.

Authors:  Valer Gotea; Vamsi Veeramachaneni; Wojciech Makałowski
Journal:  Nucleic Acids Res       Date:  2003-12-01       Impact factor: 16.971

3.  Identification and binding characterization of three odorant binding proteins and one chemosensory protein from Apolygus lucorum (Meyer-Dur).

Authors:  Jin-Feng Hua; Shuai Zhang; Jin-Jie Cui; Dao-Jie Wang; Chun-Yi Wang; Jun-Yu Luo; Li-Min Lv
Journal:  J Chem Ecol       Date:  2012-10-02       Impact factor: 2.626

4.  Molecular cloning and expression analysis of a heat shock protein (Hsp90) gene from black tiger shrimp (Penaeus monodon).

Authors:  Shigui Jiang; Lihua Qiu; Falin Zhou; Jianhua Huang; Yihui Guo; Keng Yang
Journal:  Mol Biol Rep       Date:  2007-10-13       Impact factor: 2.316

5.  Pitfalls of establishing DNA barcoding systems in protists: the cryptophyceae as a test case.

Authors:  Kerstin Hoef-Emden
Journal:  PLoS One       Date:  2012-08-24       Impact factor: 3.240

6.  A method to construct cDNA library of the entomopathogenic fungus, Metarhizium anisopliae, in the hemolymph of the infected locust.

Authors:  Cangsang Zhang; Yueqing Cao; Zhongkang Wang; Youping Yin; Guoxiong Peng; Yuxian Xia
Journal:  Mol Biotechnol       Date:  2007-05       Impact factor: 2.860

7.  A relational database for the discovery of genes encoding amino acid biosynthetic enzymes in pathogenic fungi.

Authors:  Peter F Giles; Darren M Soanes; Nicholas J Talbot
Journal:  Comp Funct Genomics       Date:  2003

8.  Genes involved in sex pheromone biosynthesis of Ephestia cautella, an important food storage pest, are determined by transcriptome sequencing.

Authors:  Binu Antony; Alan Soffan; Jernej Jakše; Sulieman Alfaifi; Koko D Sutanto; Saleh A Aldosari; Abdulrahman S Aldawood; Arnab Pain
Journal:  BMC Genomics       Date:  2015-07-18       Impact factor: 3.969

9.  ExprAlign--the identification of ESTs in non-model species by alignment of cDNA microarray expression profiles.

Authors:  Weizhong Li; Andrew Y Gracey; Luciane Vieira Mello; Andrew Brass; Andrew R Cossins
Journal:  BMC Genomics       Date:  2009-11-26       Impact factor: 3.969

10.  Identification and comparative expression analysis of odorant binding protein genes in the tobacco cutworm Spodoptera litura.

Authors:  Shao-Hua Gu; Jing-Jiang Zhou; Shang Gao; Da-Hai Wang; Xian-Chun Li; Yu-Yuan Guo; Yong-Jun Zhang
Journal:  Sci Rep       Date:  2015-09-08       Impact factor: 4.379
