Separating significant matches from spurious matches in DNA sequences.

Abstract

Word matches are widely used to compare genomic sequences. Complete genome alignment methods often use matches as anchors for building their alignments, and various alignment-free approaches that characterize similarities between large sequences are based on word matches. Among the matches retrieved from the comparison of two genomic sequences, some may be spurious matches (SMs), that is, matches obtained by chance rather than through homologous relationships. The number of SMs depends on the minimal match length (ℓ) that must be set in the algorithm used to retrieve them. If ℓ is too small, many matches are recovered but most of them are SMs. Conversely, if ℓ is too large, fewer matches are retrieved and many shorter significant matches are missed. To date, the choice of ℓ has mostly relied on empirical threshold values rather than robust statistical methods. To overcome this problem, we propose a statistical approach that uses a mixture model of geometric distributions to characterize the distribution of the lengths of matches obtained from the comparison of two genomic sequences.
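The idea of describing match lengths with a mixture of geometric distributions can be illustrated with a small sketch. The abstract does not state the number of components or the fitting procedure, so everything below is an assumption made for illustration: two components (one "spurious", one "significant"), fitting by expectation-maximization, lengths already shifted so that the shortest retrievable match has length 1, and hypothetical function names (`fit_mixture`, `posterior_significant`, `length_threshold`). It is not the authors' implementation.

```python
import math
import random
from collections import Counter


def sample_geometric(p, rng):
    """Draw k >= 1 with P(k) = (1 - p)**(k - 1) * p by inverse transform."""
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
    return int(math.log(u) / math.log(1.0 - p)) + 1


def _clamp(v):
    """Keep probabilities strictly inside (0, 1) so the logs stay finite."""
    return min(max(v, 1e-9), 1.0 - 1e-9)


def posterior_significant(x, w, p_spur, p_sig):
    """P(significant | length x); w is the weight of the spurious component."""
    log_spur = math.log(w) + math.log(p_spur) + (x - 1) * math.log(1.0 - p_spur)
    log_sig = math.log(1.0 - w) + math.log(p_sig) + (x - 1) * math.log(1.0 - p_sig)
    d = log_spur - log_sig
    if d > 700.0:  # avoid overflow in exp; the posterior is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(d))


def fit_mixture(lengths, n_iter=300):
    """EM for an assumed two-component geometric mixture.

    Returns (w, p_spur, p_sig), where w is the spurious weight and
    p_spur > p_sig (spurious matches are the shorter ones on average).
    """
    counts = Counter(lengths)  # work on distinct lengths for speed
    n = len(lengths)
    mean = sum(lengths) / n
    # Crude starting point: one component shorter, one longer than the mean.
    w, p_spur, p_sig = 0.5, _clamp(2.0 / mean), _clamp(0.5 / mean)
    for _ in range(n_iter):
        sig_n = sig_sum = spur_n = spur_sum = 0.0
        for x, c in counts.items():  # E-step: responsibilities per length
            r = posterior_significant(x, w, p_spur, p_sig)
            sig_n += c * r
            sig_sum += c * r * x
            spur_n += c * (1.0 - r)
            spur_sum += c * (1.0 - r) * x
        # M-step: weighted maximum-likelihood updates (p = 1 / weighted mean).
        w = _clamp(spur_n / n)
        p_spur = _clamp(spur_n / max(spur_sum, 1e-300))
        p_sig = _clamp(sig_n / max(sig_sum, 1e-300))
    if p_spur < p_sig:  # enforce the labelling convention
        w, p_spur, p_sig = _clamp(1.0 - w), p_sig, p_spur
    return w, p_spur, p_sig


def length_threshold(w, p_spur, p_sig, cutoff=0.5, max_len=100000):
    """Smallest length whose posterior of being significant reaches cutoff."""
    for x in range(1, max_len + 1):
        if posterior_significant(x, w, p_spur, p_sig) >= cutoff:
            return x
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated data only: 5000 short "spurious" and 1000 long "significant" lengths.
    data = [sample_geometric(0.3, rng) for _ in range(5000)]
    data += [sample_geometric(0.02, rng) for _ in range(1000)]
    w, p_spur, p_sig = fit_mixture(data)
    print(w, p_spur, p_sig, length_threshold(w, p_spur, p_sig))
```

Under these assumptions, a data-driven choice of ℓ could be read off as the shortest length at which a match is more likely to come from the significant component than from the spurious one. The 0.5 cutoff is arbitrary, and a stricter value would give a longer ℓ.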

Year: 2011 PMID: 22149632 PMCID: PMC3244807 DOI: 10.1089/cmb.2011.0070

Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479
