HaiXia Long1, ManZhi Li2, HaiYan Fu1.
Abstract
BACKGROUND: Multiple sequence alignment (MSA) is one of the most important research topics in bioinformatics, and a number of MSA programs have emerged. The accuracy of an MSA program depends heavily on its parameter settings, chiefly the gap open penalty (GOP), the gap extension penalty (GEP) and the substitution matrix (SM). This study seeks the optimal GOP, GEP and SM for MAFFT rather than relying on the program's default parameters.
Keywords: Default parameters; Gap extension penalties; Gap open penalties; MAFFT program; Multiple sequence alignment; Substitution matrix
Year: 2016 PMID: 27376004 PMCID: PMC4909661 DOI: 10.1186/s40064-016-2526-5
Source DB: PubMed Journal: Springerplus ISSN: 2193-1801
MAFFT algorithms and parameters (substitution matrix is denoted by bl)
| Method type | Algorithm | Parameters | Description |
|---|---|---|---|
| Progressive methods | FFT-NS-1 | gop 1.53 | Approximately two times faster than the default |
| Progressive methods | FFT-NS-2 | gop 1.53 | The accuracy of FFT-NS-2 is slightly higher than that of FFT-NS-1 |
| Iterative refinement method | FFT-NS-i | gop 1.53 | Fastest in this category. Uses WSP score only |
| Iterative refinement method | NW-NS-i | gop 1.53 | Distance is by the 6mer method |
| Iterative refinement methods using WSP and consistency scores | L-INS-i | gop 1.53 | Uses WSP score and consistency score from local alignments |
| Iterative refinement methods using WSP and consistency scores | E-INS-i | gop 1.53 | Uses WSP score and consistency score from local alignments with a generalized affine gap cost |
| Iterative refinement methods using WSP and consistency scores | G-INS-i | gop 1.53 | Uses WSP score and consistency score from global alignments |
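As a rough illustration of how the algorithms in the table and the three tuned parameters map onto a MAFFT invocation, the sketch below assembles a command line. The option names (`--op`, `--ep`, `--bl`, `--retree`, `--maxiterate`, `--localpair`, `--genafpair`, `--globalpair`, `--nofft`) and the defaults 1.53, 0.123 and BLOSUM62 are taken from the MAFFT manual, not from this study; the helper `mafft_command` and the dictionary `ALGORITHM_FLAGS` are hypothetical names introduced here.

```python
# Flag combinations as listed in the MAFFT manual; this mapping is an
# illustration and is not taken from the study itself.
ALGORITHM_FLAGS = {
    "FFT-NS-1": ["--retree", "1"],
    "FFT-NS-2": ["--retree", "2"],
    "FFT-NS-i": ["--retree", "2", "--maxiterate", "1000"],
    "NW-NS-i": ["--retree", "2", "--maxiterate", "2", "--nofft"],
    "L-INS-i": ["--localpair", "--maxiterate", "1000"],
    "E-INS-i": ["--genafpair", "--maxiterate", "1000"],
    "G-INS-i": ["--globalpair", "--maxiterate", "1000"],
}


def mafft_command(fasta_path, algorithm="FFT-NS-2", gop=1.53, gep=0.123, blosum=62):
    """Build (but do not run) a MAFFT command line as a list of arguments.

    gop    -> --op (gap open penalty)
    gep    -> --ep (offset value that acts like a gap extension penalty)
    blosum -> --bl (BLOSUM matrix number: 30, 45, 62 or 80)
    """
    if algorithm not in ALGORITHM_FLAGS:
        raise ValueError(f"unknown algorithm: {algorithm}")
    if blosum not in (30, 45, 62, 80):
        raise ValueError("--bl accepts 30, 45, 62 or 80")
    return [
        "mafft",
        *ALGORITHM_FLAGS[algorithm],
        "--op", str(gop),
        "--ep", str(gep),
        "--bl", str(blosum),
        fasta_path,
    ]
```

If MAFFT is installed, the returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`, whose `stdout` then holds the alignment in FASTA format.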
BAliBASE 3.0 statistics
| | RV11 | RV12 | RV20 | RV30 | RV40 | RV50 | TOTAL |
|---|---|---|---|---|---|---|---|
| Number of alignments | 38 | 45 | 41 | 30 | 48 | 16 | 218 |
| Number of sequences | 265 | 411 | 1896 | 1882 | 1317 | 483 | 6255 |
Fig. 1 The value of SPS
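For readers unfamiliar with the sum-of-pairs score (SPS), the sketch below shows one common way to compute it: the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment. This is a simplified illustration, not the scoring program used in the study; it scores every column, whereas benchmark scorers may restrict scoring to annotated core regions. The function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    alignment: list of equal-length gapped strings, '-' marking a gap.
    Each residue is identified as (sequence index, ungapped residue index),
    so pairs can be compared between two alignments of the same sequences.
    """
    counters = [0] * len(alignment)  # next residue index per sequence
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for seq_idx, row in enumerate(alignment):
            if row[col] != "-":
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # Every two residues in the same column form one aligned pair.
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)
```

With the toy reference `["AC-G", "ACTG"]` and test alignment `["ACG-", "ACTG"]`, two of the three reference pairs are recovered, so the score is 2/3.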
The MAX_MEAN_SPS value of different substitution matrices
| Substitution matrix | RV11 | RV12 | RV20 | RV30 | RV40 | RV50 |
|---|---|---|---|---|---|---|
| BLOSUM30 | 0.5201 | 0.8369 | 0.8532 | 0.7683 | 0.6688 | 0.7427 |
| BLOSUM45 | *0.5912* | *0.8465* | 0.8577 | 0.7727 | *0.6818* | *0.7505* |
| BLOSUM62 | 0.5791 | 0.8380 | *0.8594* | *0.7819* | 0.6745 | 0.7466 |
| BLOSUM80 | 0.5770 | 0.8396 | 0.8573 | 0.7737 | 0.6752 | 0.7468 |
| PAM100 | 0.5453 | 0.8315 | 0.8535 | 0.7694 | 0.6686 | 0.7423 |
| PAM200 | 0.5415 | 0.8309 | 0.8518 | 0.7728 | 0.6682 | 0.7348 |
Italic figures represent the best results
The optimal parameters for each data set
| Data set | Optimal substitution matrix | Best MAX_MEAN_SPS value | Optimal GOP | Optimal GEP |
|---|---|---|---|---|
| RV11 | Blosum45 | 0.5912 | 2 | 0.12 |
| RV12 | Blosum45 | 0.8465 | 2.9 | 1.44 |
| RV20 | Blosum62 | 0.8594 | 2.3 | 0.63 |
| RV30 | Blosum62 | 0.7819 | 2.1 | 0.72 |
| RV40 | Blosum45 | 0.6818 | 2.8 | 0.39 |
| RV50 | Blosum45 | 0.7505 | 2.8 | 0.03 |
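The selection summarized in this table can be sketched as a grid search, assuming that MAX_MEAN_SPS denotes the highest mean SPS over a data set across the candidate (GOP, GEP) pairs; that reading is an assumption, since the term is not defined in this excerpt. The function name `best_gap_penalties` and the input layout are hypothetical.

```python
from statistics import mean


def best_gap_penalties(sps_by_params):
    """Pick the (GOP, GEP) pair with the highest mean SPS.

    sps_by_params: dict mapping (gop, gep) to the list of SPS values
    obtained on each alignment of one data set with those penalties.
    Returns ((gop, gep), mean SPS of that pair).
    """
    mean_sps = {params: mean(scores) for params, scores in sps_by_params.items()}
    best = max(mean_sps, key=mean_sps.get)
    return best, mean_sps[best]


# Toy input with made-up scores, only to show the call shape.
example = {
    (1.53, 0.123): [0.40, 0.60],
    (2.0, 0.12): [0.50, 0.70],
}
params, score = best_gap_penalties(example)
```

Repeating the search once per substitution matrix and keeping the overall maximum would give one row of the table per data set.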
The mean of SPS value from different algorithms
| | RV11 | RV12 | RV20 | RV30 | RV40 | RV50 |
|---|---|---|---|---|---|---|
| mafft-default | 0.4582 | 0.8142 | 0.8301 | 0.737 | 0.6168 | 0.6971 |
| clustalw-default | 0.4758 | 0.7966 | 0.8077 | 0.6802 | 0.5917 | 0.6377 |
| mafft_measure | | | | | | |
Italic figures represent the best SPS value from MAFFT measure parameters
Fig. 2 The SPS values of sequence sets in different algorithms