Kazutaka Katoh, John Rozewicki, Kazunori D Yamada.
Abstract
This article describes several features in the MAFFT online service for multiple sequence alignment (MSA). As a result of recent advances in sequencing technologies, huge numbers of biological sequences are available, and the need for MSAs with large numbers of sequences is increasing. To extract biologically relevant information from such data, more sophisticated algorithms are necessary but not sufficient. Intuitive and interactive tools that let experimental biologists handle large data semiautomatically are becoming important. We are developing MAFFT in both of these directions. Here, we explain (i) the Web interface for recently developed options for large data and (ii) interactive usage to refine sequence data sets and MSAs.
Keywords: multiple sequence alignment; phylogenetic tree; sequence analysis
Year: 2019 PMID: 28968734 PMCID: PMC6781576 DOI: 10.1093/bib/bbx108
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. Screenshot of the input page for large MSAs in the MAFFT online service. (A–G) are explained in the main text.
Results of two benchmarks, ContTest (136 entries; 1467–43 912 sequences per entry) [3] and HomFam (89 entries; 93–93 681 sequences per entry) [4], for several MAFFT options available on our online server.
| Method | ContTest accuracy score | ContTest CPU time (minutes) | HomFam SP/TC score | HomFam CPU time (minutes) |
|---|---|---|---|---|
| PartTree (partsize = 50) | 0.4103 | 61 | 0.7862/0.5658 | 47 |
| PartTree (partsize = 1000) | 0.4364 | 140 | 0.8258/0.6377 | 94 |
| DPPartTree (partsize = 50) | 0.4424 | 210 | 0.8413/0.6597 | 160 |
| DPPartTree (partsize = 1000) | 0.4632 | 1000 | 0.8541/0.6934 | 820 |
| FFT-NS-1 | 0.4856 | 170 | 0.8491/0.6669 | 160 |
| FFT-NS-1 (memsavetree) | 0.4835 | 280 | 0.8416/0.6667 | 260 |
| FFT-NS-2 | 0.4998 | 500 | 0.8759/0.7162 | 460 |
| FFT-NS-2 (memsavetree) | 0.5099 | 1100 | 0.8611/0.7023 | 990 |
| mafft-sparsecore (…) | 0.5153 | 730 | 0.8821/0.7274 | 650 |
| mafft-sparsecore (…) | 0.5361 | 1200 | 0.8970/0.7586 | 1300 |
| mafft-sparsecore (…) | 0.5440 | 3400 | 0.9075/0.7810 | 4400 |
| mafft-sparsecore (…) | 0.5298 | 1500 | 0.8845/0.7416 | 1300 |
| mafft-sparsecore (…) | 0.5438 | 2000 | 0.8995/0.7638 | 2000 |
| mafft-sparsecore (…) | 0.5428 | 4200 | 0.9052/0.7826 | 5000 |
| G-INS-1 | 0.5696 | 55 000 | 0.9306/0.8288 | 49 000 |
| Randomchain | 0.5425 | 100 | 0.8349/0.6681 | 88 |
Note: The sum-of-pairs (SP) and total-column (TC) scores for HomFam were calculated with the FastSP program [5]. (A–G) correspond to the techniques explained in the main text. The command-line arguments for each method are displayed by the online service after the calculation and are also listed in the main text. Random numbers are used in (A), (E) and (G); in this test, only one set of random numbers was used for each method. For (E) and (G), the random seed can be specified in the downloadable version (see the last section in the main text) but not in the online version. See https://mafft.sb.ecei.tohoku.ac.jp/ for detailed results.
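To make the two HomFam accuracy measures concrete, here is a minimal sketch of SP and TC scoring on toy alignments. It is not FastSP's implementation (FastSP is built to score very large alignments efficiently); it is a simple version that assumes the reference and test alignments contain the same sequences in the same order, with `-` as the gap character. SP is computed here as the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and TC as the fraction of reference columns with at least two residues that reappear exactly as a column in the test alignment. Conventions for such edge cases may differ between tools, and all function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Residue = Tuple[int, int]  # (sequence index, residue index within the ungapped sequence)


def _columns(alignment: List[str]) -> List[FrozenSet[Residue]]:
    """Turn each alignment column into the set of residues it contains ('-' = gap)."""
    next_residue = [0] * len(alignment)
    columns = []
    for col in range(len(alignment[0])):
        cell = set()
        for s, row in enumerate(alignment):
            if row[col] != "-":
                cell.add((s, next_residue[s]))
                next_residue[s] += 1
        columns.append(frozenset(cell))
    return columns


def _pairs(columns: List[FrozenSet[Residue]]) -> Set[Tuple[Residue, Residue]]:
    """All residue pairs that share a column, i.e. the homologies the alignment asserts."""
    return {pair for cell in columns for pair in combinations(sorted(cell), 2)}


def _check_same_sequences(reference: List[str], test: List[str]) -> None:
    """Both alignments must be built from identical ungapped sequences."""
    if [r.replace("-", "") for r in reference] != [t.replace("-", "") for t in test]:
        raise ValueError("alignments must contain the same ungapped sequences in the same order")


def sp_score(reference: List[str], test: List[str]) -> float:
    """Fraction of reference residue pairs that the test alignment also aligns."""
    _check_same_sequences(reference, test)
    ref_pairs = _pairs(_columns(reference))
    return len(ref_pairs & _pairs(_columns(test))) / len(ref_pairs)


def tc_score(reference: List[str], test: List[str]) -> float:
    """Fraction of reference columns (with >= 2 residues) reproduced exactly in the test."""
    _check_same_sequences(reference, test)
    ref_cols = [c for c in _columns(reference) if len(c) >= 2]
    test_cols = set(_columns(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)


reference = ["ACGT", "A-GT"]
test = ["ACGT", "AG-T"]  # the G of sequence 2 is misaligned against C
print(round(sp_score(reference, test), 4), round(tc_score(reference, test), 4))  # 0.6667 0.6667
```

In a HomFam-style setting, where only a few reference sequences are embedded in a much larger computed alignment, the test alignment would first be restricted to the reference sequences before scoring. TC is the stricter measure, which is consistent with the TC values in the table being lower than the SP values for every method.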
Figure 2. Variants of the --add option.
Figure 3. Interactive sequence selection. A group of sequences in the guide tree (A) is selected at a time in the sequence selection window (B). Several options for tree estimation can be selected (C). The MSA can be visually checked using MSAViewer (D).
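Figure 3 describes selecting one group of sequences from the guide tree at a time. One way to model that operation, purely as an illustration and not the online service's actual code, is to encode the tree as nested Python tuples whose leaves are sequence names, and to take the smallest clade that contains the sequences a user points at. The tree encoding and the names `smallest_clade` and `keep_only` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Union

# Hypothetical tree encoding: a leaf is a sequence name (str);
# an internal node is a tuple of child subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> List[str]:
    """Return the leaf names under `tree`, left to right."""
    if isinstance(tree, str):
        return [tree]
    names: List[str] = []
    for child in tree:
        names.extend(leaves(child))
    return names


def smallest_clade(tree: Tree, targets: Set[str]) -> Optional[List[str]]:
    """Leaves of the smallest subtree containing every name in `targets`,
    or None if `tree` does not contain all of them."""
    if not targets:
        raise ValueError("targets must be non-empty")
    if isinstance(tree, str):
        return [tree] if targets <= {tree} else None
    # If a single child already covers all targets, its (smaller) clade wins.
    for child in tree:
        found = smallest_clade(child, targets)
        if found is not None:
            return found
    names = leaves(tree)
    return names if targets <= set(names) else None


def keep_only(sequences: Dict[str, str], names: Iterable[str]) -> Dict[str, str]:
    """Subset a name -> sequence mapping to the selected names, keeping input order."""
    wanted = set(names)
    return {name: seq for name, seq in sequences.items() if name in wanted}


guide_tree = ((("s1", "s2"), "s3"), ("s4", "s5"))
print(smallest_clade(guide_tree, {"s1", "s3"}))  # ['s1', 's2', 's3']
```

Taking the smallest enclosing clade means every selection corresponds to exactly one node of the guide tree; the selected subset could then be kept or discarded with `keep_only` before the remaining sequences are realigned.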