Chunlin Wang, Elliot J Lefkowitz.
Abstract
BACKGROUND: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics, the practice of comparing genomic sequences from different species, plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences, as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most use heuristic strategies to identify a series of strong sequence similarities, which then serve as anchors for aligning the regions between them. The resulting alignment is globally correct, but in many cases is locally suboptimal. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To offset the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program.
Year: 2005 PMID: 16086841 PMCID: PMC1208854 DOI: 10.1186/1471-2105-6-200
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Mutation operators used in GenAlignRefine. a) Random_gap operator. Random gaps (shaded hyphens) are inserted into the parent alignment to produce the offspring sequences. b) Local_gap_shuffle operator. Gaps in the parent are randomly moved to produce new offspring. c) Block_gap_shuffle operator. Contiguous blocks of gaps are randomly moved to new positions.
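The three operators in Figure 1 can be illustrated with a minimal Python sketch. It is not the GenAlignRefine implementation: the representation (a list of equal-length strings with `-` as the gap character), the function names, the choice to mutate one randomly chosen row per call, and the cleanup of all-gap columns are all assumptions made for illustration.

```python
import random
import re

GAP = "-"  # assumed gap character


def drop_all_gap_columns(rows):
    """Remove columns in which every row holds a gap (illustrative cleanup step)."""
    keep = [c for c in range(len(rows[0])) if any(r[c] != GAP for r in rows)]
    return ["".join(r[c] for c in keep) for r in rows]


def random_gap(rows, rng):
    """Insert one gap at a random position in one row; pad the other rows at the end."""
    target = rng.randrange(len(rows))
    pos = rng.randrange(len(rows[0]) + 1)
    out = []
    for i, row in enumerate(rows):
        if i == target:
            out.append(row[:pos] + GAP + row[pos:])
        else:
            out.append(row + GAP)  # keep all rows the same length
    return drop_all_gap_columns(out)


def local_gap_shuffle(rows, rng):
    """Move a single existing gap in one row to a new random position."""
    out = list(rows)
    target = rng.randrange(len(out))
    row = out[target]
    gaps = [i for i, ch in enumerate(row) if ch == GAP]
    if not gaps:
        return out  # nothing to move in this row
    i = rng.choice(gaps)
    rest = row[:i] + row[i + 1:]
    j = rng.randrange(len(rest) + 1)
    out[target] = rest[:j] + GAP + rest[j:]
    return drop_all_gap_columns(out)


def block_gap_shuffle(rows, rng):
    """Move one contiguous run of gaps in one row to a new random position."""
    out = list(rows)
    target = rng.randrange(len(out))
    row = out[target]
    blocks = [m.span() for m in re.finditer(r"-+", row)]
    if not blocks:
        return out  # no gap block in this row
    start, end = rng.choice(blocks)
    rest = row[:start] + row[end:]
    j = rng.randrange(len(rest) + 1)
    out[target] = rest[:j] + GAP * (end - start) + rest[j:]
    return drop_all_gap_columns(out)


if __name__ == "__main__":
    rng = random.Random(0)
    parent = ["AC--GT", "ACG--T", "A-CGT-"]
    for op in (random_gap, local_gap_shuffle, block_gap_shuffle):
        print(op.__name__, op(parent, rng))
```

Each operator only adds or relocates gaps, so the residue order of every sequence is unchanged and the offspring remains a valid alignment of the same sequences.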
Performance of GenAlignRefine on simulated data.
| Program | Before Refinement | After Refinement |
|---|---|---|
| CHAOS/DIALIGN | 78.0%* | 85.1% |
| Multi-LAGAN | 86.3% | 93.0% |
* The numbers indicate the consistency between the alignment generated with the genome alignment tool and the "correct" alignment generated by Rose (see text).
Figure 2. Improvement of COFFEE score for fuzzy regions. The 200 fuzzy regions derived from the starting Orthopoxvirus alignment that showed improvement following application of GenAlignRefine are displayed. For clarity, regions are sorted according to the overall improvement in COFFEE score. Vertical bars connect dots that show the improvement in COFFEE score for each region at each step in the refinement process. Red dots plot the original COFFEE score of the Multi-LAGAN-generated alignment for each region; green dots plot the COFFEE score of the same region after realignment by T-Coffee; blue dots indicate the COFFEE score of the same region after optimization by the genetic algorithm. The small magenta squares plot the overall improvement in COFFEE score for each region.
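The final optimization step plotted in Figure 2 (a genetic algorithm searching for a higher-scoring alignment of one region) can be sketched as a mutation-only evolutionary loop with elitist selection. This is a toy illustration under stated assumptions, not the GenAlignRefine code: `sum_of_pairs` is a simple stand-in objective and is not the COFFEE score, `move_one_gap` is a single hypothetical mutation operator, and the population size and generation count are arbitrary.

```python
import random
from itertools import combinations

GAP = "-"  # assumed gap character


def sum_of_pairs(rows):
    """Stand-in objective (NOT COFFEE): count identical residue pairs per column."""
    score = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == b and a != GAP:
                score += 1
    return score


def move_one_gap(rows, rng):
    """Hypothetical mutation: move one gap in one row to a new random position."""
    out = list(rows)
    target = rng.randrange(len(out))
    gaps = [i for i, ch in enumerate(out[target]) if ch == GAP]
    if gaps:
        i = rng.choice(gaps)
        rest = out[target][:i] + out[target][i + 1:]
        j = rng.randrange(len(rest) + 1)
        out[target] = rest[:j] + GAP + rest[j:]
    return out


def refine(rows, generations=200, pop_size=20, seed=0):
    """Evolve alignments of one region, keeping the best-scoring candidates."""
    rng = random.Random(seed)
    population = [list(rows)]
    for _ in range(generations):
        offspring = [move_one_gap(rng.choice(population), rng)
                     for _ in range(pop_size)]
        # Elitist selection: parents compete with offspring, so the best
        # score in the population never decreases between generations.
        population = sorted(population + offspring,
                            key=sum_of_pairs, reverse=True)[:pop_size]
    return population[0]


if __name__ == "__main__":
    start = ["ACGT-A", "A-CGTA", "ACG-TA"]
    best = refine(start)
    print("before:", sum_of_pairs(start), start)
    print("after: ", sum_of_pairs(best), best)
```

Because parents are retained alongside their offspring, the returned alignment scores at least as well as the starting one under the chosen objective, which mirrors the per-region improvement the figure reports for the genetic algorithm stage.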