Eugene V Korotkov, Yulia M Suvorova, Dmitrii O Kostenko, Maria A Korotkova.
Abstract
In this study, we developed a new mathematical method for performing multiple alignment of highly divergent sequences (MAHDS), i.e., sequences that have on average more than 2.5 substitutions per position (x). We generated sets of artificial DNA sequences with x ranging from 0 to 4.4 and applied MAHDS as well as currently used multiple sequence alignment algorithms, including ClustalW, MAFFT, T-Coffee, Kalign, and Muscle, to these sets. The results indicated that most of the existing methods could produce statistically significant alignments only for the sets with x < 2.5, whereas MAHDS could operate on sequences with x = 4.4. We also used MAHDS to analyze a set of promoter sequences from the Arabidopsis thaliana genome and discovered many conserved regions upstream of the transcription initiation site (from -499 to +1 bp); a part of the downstream region (from +1 to +70 bp) also significantly contributed to the obtained alignments. The possibilities of applying the newly developed method for the identification of promoter sequences in any genome are discussed. A server for multiple alignment of nucleotide sequences has been created.
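To make the quantity x concrete, the following Python sketch generates a set of artificial DNA sequences from a common ancestor at a chosen average number of substitutions per position. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not describe how the artificial sets were built, so the mutation model (uniformly random positions, each event replacing the base with a different one) and the names `mutate`, `make_set`, and `identity` are hypothetical.

```python
import random

BASES = "ACGT"

def mutate(seq, x, rng):
    """Apply round(x * len(seq)) substitution events at uniformly random positions.

    Assumed model: a position may be hit several times, and each event
    replaces the current base with one of the three other bases.
    """
    out = list(seq)
    n_events = round(x * len(out))
    for _ in range(n_events):
        pos = rng.randrange(len(out))
        out[pos] = rng.choice([b for b in BASES if b != out[pos]])
    return "".join(out)

def make_set(n_seqs, length, x, seed=0):
    """Return a random ancestor and n_seqs independently mutated copies of it."""
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(BASES) for _ in range(length))
    return ancestor, [mutate(ancestor, x, rng) for _ in range(n_seqs)]

def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(p == q for p, q in zip(a, b)) / len(a)

if __name__ == "__main__":
    # Sequence length and set size are arbitrary choices for the demo.
    for x in (0.0, 1.0, 2.5, 4.4):
        ancestor, seqs = make_set(n_seqs=5, length=600, x=x)
        mean_id = sum(identity(ancestor, s) for s in seqs) / len(seqs)
        print(f"x = {x}: mean identity to ancestor = {mean_id:.3f}")
```

Under this assumed model, identity to the ancestor falls toward the level expected between unrelated sequences as x grows, which gives some intuition for why alignments become hard to distinguish from chance at large x.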
Keywords: dynamic programming; genetic algorithm; multiple sequence alignment; promoter
Year: 2021 PMID: 33494278 PMCID: PMC7909805 DOI: 10.3390/genes12020135
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096