Kazutaka Katoh1, Hiroyuki Toh. 1. Digital Medicine Initiative, Kyushu University, Fukuoka 812-8582, Japan. katoh@bioreg.kyushu-u.ac.jp
Abstract
MOTIVATION: To construct a multiple sequence alignment (MSA) of a large number (> approximately 10,000) of sequences, the calculation of a guide tree with a complexity of O(N2) to O(N3), where N is the number of sequences, is the most time-consuming process. RESULTS: To overcome this limitation, we have developed an approximate algorithm, PartTree, to construct a guide tree with an average time complexity of O(N log N). The new MSA method with the PartTree algorithm can align approximately 60,000 sequences in several minutes on a standard desktop computer. The loss of accuracy in MSA caused by this approximation was estimated to be several percent in benchmark tests using Pfam. AVAILABILITY: The present algorithm has been implemented in the MAFFT sequence alignment package (http://align.bmr.kyushu-u.ac.jp/mafft/software/). SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.
MOTIVATION: To construct a multiple sequence alignment (MSA) of a large number (> approximately 10,000) of sequences, the calculation of a guide tree with a complexity of O(N2) to O(N3), where N is the number of sequences, is the most time-consuming process. RESULTS: To overcome this limitation, we have developed an approximate algorithm, PartTree, to construct a guide tree with an average time complexity of O(N log N). The new MSA method with the PartTree algorithm can align approximately 60,000 sequences in several minutes on a standard desktop computer. The loss of accuracy in MSA caused by this approximation was estimated to be several percent in benchmark tests using Pfam. AVAILABILITY: The present algorithm has been implemented in the MAFFT sequence alignment package (http://align.bmr.kyushu-u.ac.jp/mafft/software/). SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.