Yue Huang (1), Ling Zhang. (1) Lynnon Corporation, 116 rue du Milicien, Vaudreuil-Dorion, Quebec, Canada, J7V 9M4. yhuang@lynnon.com
Abstract
MOTIVATION: Dot-matrix plots are widely used for similarity analysis of biological sequences, and many algorithms and software tools have been developed for this purpose. Although some of these tools have been reported to handle sequences of a few hundred kb, analysis of genome sequences longer than 10 Mb on a microcomputer is still impractical because of long execution times and memory requirements. RESULTS: Two dot-matrix comparison methods have been developed for the analysis of large sequences. The methods first locate regions of similarity between two sequences using a fast word search algorithm, then perform an explicit comparison on those regions. Because the initial screening removes most random matches, computing time is substantially reduced. The methods produce high-quality dot-matrix plots with low background noise. Space requirements are linear, so the algorithms can be used to compare genome-size sequences. Computing speed may be affected by the highly repetitive sequence structures of eukaryotic genomes. A dot-matrix plot of the yeast genome (12 Mb), covering both strands, was generated in 80 s on a 1 GHz personal computer.
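The two-stage idea described in the abstract (a fast word search to screen for candidate regions, then an explicit comparison of those regions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `word_index` and `dot_matrix_segments`, the parameters `k` and `min_len`, and the use of exact-match extension along a diagonal are all assumptions made for illustration, and only the forward strand is handled.

```python
from collections import defaultdict


def word_index(seq, k):
    """Map every k-length word in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def dot_matrix_segments(a, b, k=4, min_len=6):
    """Return sorted (i, j, length) exact-match diagonal runs between a and b.

    Hypothetical sketch, not the published method:
    Step 1 (screening): find shared k-words through a hash index of `a`.
    Step 2 (explicit comparison): extend each hit along its diagonal and
    keep only runs of at least `min_len` matching characters.
    """
    index = word_index(a, k)  # memory grows linearly with len(a)
    segments = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            # A hit whose preceding characters also match lies inside a
            # longer run; that run is reported once, from its first position.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = k
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length >= min_len:
                segments.append((i, j, length))
    return sorted(segments)
```

Each returned tuple corresponds to one diagonal line segment in the plot, starting at position `i` in the first sequence and `j` in the second. Raising `min_len` above `k` is what suppresses short random matches in this sketch; highly repetitive input would produce many hits per word and slow the inner loop, which is consistent with the caveat about eukaryotic genomes.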
Authors: Alan M Magee; Sue Aspinall; Danny W Rice; Brian P Cusack; Marie Sémon; Antoinette S Perry; Sasa Stefanović; Dan Milbourne; Susanne Barth; Jeffrey D Palmer; John C Gray; Tony A Kavanagh; Kenneth H Wolfe Journal: Genome Res Date: 2010-10-26 Impact factor: 9.043
Authors: Ryan Whitford; Ute Baumann; Tim Sutton; Luke Gumaelius; Petra Wolters; Scott Tingey; Jason A Able; Peter Langridge Journal: Funct Integr Genomics Date: 2006-03-14 Impact factor: 3.410
Authors: Saluana R Craveiro; Peter W Inglis; Roberto C Togawa; Priscila Grynberg; Fernando L Melo; Zilda Maria A Ribeiro; Bergmann M Ribeiro; Sônia N Báo; Maria Elita B Castro Journal: BMC Genomics Date: 2015-02-25 Impact factor: 3.969