ManZhi Li1, HaiXia Long2, HongTao Wang1, HaiYan Fu2, Dong Xu2, YouJian Shen1, YuHua Yao1, Bo Liao1.
Abstract
High-accuracy alignment of sequences carrying disease information contributes to disease treatment and prevention. The results of multiple sequence alignment depend on the parameters of the objective function, including the gap open penalty (GOP), the gap extension penalty (GEP), and the substitution matrix (SM). First, theory parameter formulas for GOP, GEP, and SM are derived from the length, number, and identity of the unaligned sequences. Second, we tested the soundness of these formulas in experiments with the ClustalW and MAFFT programs. In addition, we obtained a set of MAFFT parameters from the proposed formulas. In all experiments, the SPS (sum-of-pairs score) obtained with the theory parameters is higher than the SPS obtained with the default parameters of ClustalW and MAFFT. In both theory and practice, our method for determining the parameters is feasible and efficient. It can provide high-accuracy alignment results for precision medicine.
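The abstract reports accuracy as SPS without spelling out the computation. A minimal sketch, assuming the common BAliBASE-style definition (the fraction of residue pairs aligned in the reference that are also aligned in the test alignment); the function names and the toy alignments are hypothetical and are not taken from the paper:

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence index, residue index), so the
    pairs are comparable between two alignments of the same sequences.
    """
    counters = [0] * len(alignment)  # next residue index for each sequence
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_id, symbol in enumerate(column):
            if symbol != gap:
                present.append((seq_id, counters[seq_id]))
                counters[seq_id] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment has no aligned residue pairs")
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)


# Toy data: the same three sequences aligned in two different ways.
reference = ["AC-GT", "ACGGT", "A--GT"]
test = ["ACG-T", "ACGGT", "A-G-T"]
score = sum_of_pairs_score(test, reference)  # 8 of 10 reference pairs match
```

A score of 1.0 means the test alignment reproduces every aligned pair of the reference; the toy example above recovers 8 of 10 pairs.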
Year: 2018 PMID: 29682519 PMCID: PMC5842723 DOI: 10.1155/2018/1718046
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Ratio of the longest sequence and the number of gaps inserted into the sequence.
| Data set | Test 1 | Test 2 | Test 3 | Ref 2 | Ref 3 | RV11 | RV12 |
|---|---|---|---|---|---|---|---|
| Benchmark | BAliBASE 2.0 | BAliBASE 2.0 | BAliBASE 2.0 | BAliBASE 2.0 | BAliBASE 2.0 | BAliBASE 3.0 | BAliBASE 3.0 |
| Mean (ratio) | 0.0769 | 0.0764 | 0.0744 | 0.1439 | 0.1612 | 0.1938 | 0.0784 |
Figure 1. The relationship between the sequence length and the number of gaps.
Figure 2. Unaligned and aligned sequences.
Figure 3. Results of verifying the substitution matrix (11).
The number of sequences meeting the substitution matrix requirements (see (11)).
| Data set | Sequence number | Reference alignment number | BLOSUM30 qualified | BLOSUM45 qualified | BLOSUM62 qualified |
|---|---|---|---|---|---|
| Reference 1 | 4–5 | 78 | 78 (100%) | 78 (100%) | 78 (100%) |
| Reference 2 | 14–19 | 22 | 22 (100%) | 22 (100%) | 22 (100%) |
| Reference 3 | >20 | 12 | 12 (100%) | 12 (100%) | 12 (100%) |
Figure 4. Results of verifying GOP/GEP in (14) and (15).
Determination of the value of n, λ, ω, α, β.
| | BLOSUM30 num | BLOSUM30 SPS | BLOSUM45 num | BLOSUM45 SPS | BLOSUM62 num | BLOSUM62 SPS |
|---|---|---|---|---|---|---|
| 0.01 | 7 | 0.7586 | 11 | 0.7768 | 9 | 0.7697 |
| 0.02 | 9 | 0.761 | 10 | 0.7769 | 8 | 0.7692 |
| 0.03 | 10 | 0.7643 | 11 | 0.7795 | 10 | 0.7703 |
| 0.04 | 12 | 0.782 | 15 | 0.7886 | 12 | 0.7745 |
| 0.05 | 14 | 0.7843 | | | 15 | 0.7846 |
| 0.06 | 14 | 0.7805 | 17 | 0.7924 | 17 | 0.7864 |
| 0.07 | 14 | 0.7767 | 16 | 0.7896 | 16 | 0.786 |
| 0.08 | 14 | 0.7728 | 16 | 0.784 | 14 | 0.7821 |
| 0.09 | 14 | 0.7668 | 15 | 0.7804 | 14 | 0.7777 |
| 0.1 | 14 | 0.764 | 15 | 0.78 | 15 | 0.7826 |
| 0.01 | 7 | 0.7586 | 11 | 0.7768 | 9 | 0.7697 |
| 0.02 | 9 | 0.7614 | 10 | 0.7769 | 9 | 0.77 |
| 0.03 | 10 | 0.7681 | 12 | 0.7818 | 12 | 0.7744 |
| 0.04 | 15 | 0.7874 | 15 | 0.7877 | 13 | 0.7858 |
| 0.05 | 13 | 0.7783 | 15 | 0.79 | 16 | 0.7918 |
| 0.06 | 13 | 0.7736 | 14 | 0.7845 | 14 | 0.7859 |
| 0.07 | 13 | 0.7732 | 14 | 0.7781 | 12 | 0.7819 |
| 0.08 | 13 | 0.7709 | 16 | 0.7846 | 12 | 0.7713 |
| 0.09 | 14 | 0.7779 | 15 | 0.7781 | 13 | 0.7751 |
| 0.1 | 14 | 0.7731 | 15 | 0.7795 | 14 | 0.7779 |
Optimal GOP/GEP/matrix.
| Sequence set | Ref 1-test 1 | Ref 1-test 2 | Ref 1-test 3 | Ref 2 | Ref 3 |
|---|---|---|---|---|---|
| Sequence number | 4–5 | 4–5 | 4–5 | 14–19 | >20 |
| Sequence length | <100 | 100–300 | >300 | 50–600 | 60–600 |
| | 0.03 | 0.05 | 0.08 | 0.02 | 0.02 |
| | 5 | 5 | 10 | 10 | 10 |
| Matrix | BLOSUM45 | BLOSUM45 | BLOSUM62 | BLOSUM45 | BLOSUM45 |
| | 3 | | | | |
| | 0.2 | | | | |
| | 0.9 | | | | |
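The GOP, GEP, and matrix choices in the table above ultimately have to be passed to the aligner. A minimal sketch, assuming MAFFT's standard command-line options `--op` (gap opening penalty), `--ep` (an offset value that acts like a gap extension penalty), and `--bl` (BLOSUM number). The helper name `mafft_command`, the input file name, and the numeric penalties are placeholders (MAFFT's documented defaults), not the paper's optimal values:

```python
def mafft_command(fasta_path, gop, gep, blosum):
    """Build the argument list for a MAFFT run with explicit scoring parameters."""
    # MAFFT ships BLOSUM 30, 45, 62, and 80; the tables above use 30, 45, and 62.
    if blosum not in (30, 45, 62, 80):
        raise ValueError(f"unsupported BLOSUM number: {blosum}")
    return [
        "mafft",
        "--op", str(gop),     # gap opening penalty
        "--ep", str(gep),     # offset value, works like a gap extension penalty
        "--bl", str(blosum),  # BLOSUM substitution matrix number
        fasta_path,
    ]


# Placeholder penalties (MAFFT defaults) combined with the BLOSUM45 choice
# from the "Matrix" row; "seqs.fasta" is a hypothetical input file.
cmd = mafft_command("seqs.fasta", gop=1.53, gep=0.123, blosum=45)

# To run it (requires MAFFT on PATH); the alignment is written to stdout:
#   import subprocess
#   aligned = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Building the argument list separately from running it keeps the parameter choice easy to log and to vary per data set.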
Figure 5. SPS values obtained with MAFFT theory parameters, MAFFT default parameters, and ClustalW default parameters.
SPS mean value.
| Data set | Ref 1-test 1 | Ref 1-test 2 | Ref 1-test 3 | Ref 2 | Ref 3 | RV11 | RV12 |
|---|---|---|---|---|---|---|---|
| Benchmark | BAliBASE 2.0 | BAliBASE 2.0 | BAliBASE 2.0 | BAliBASE 2.0 | BAliBASE 2.0 | BAliBASE 3.0 | BAliBASE 3.0 |
| MAFFT default parameters | 0.7749 | 0.7743 | 0.7460 | 0.8584 | 0.6938 | 0.4582 | 0.8142 |
| ClustalW default parameters | 0.7614 | 0.7732 | 0.7340 | 0.8311 | 0.6189 | 0.4758 | 0.7966 |
| MAFFT theory parameters | 0.7918 | 0.8003 | 0.7652 | 0.8655 | 0.7073 | 0.5183 | 0.8449 |
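The claim that the theory parameters outperform both defaults can be checked directly against the table. A small sketch that recomputes the per-data-set SPS gain; the numbers are copied from the table, while the column labels assume that the three Ref 1 columns correspond to tests 1 to 3, and the helper name `gains` is hypothetical:

```python
datasets = ["Ref 1-test 1", "Ref 1-test 2", "Ref 1-test 3",
            "Ref 2", "Ref 3", "RV11", "RV12"]

# Mean SPS values copied from the table above.
sps = {
    "MAFFT default": [0.7749, 0.7743, 0.7460, 0.8584, 0.6938, 0.4582, 0.8142],
    "ClustalW default": [0.7614, 0.7732, 0.7340, 0.8311, 0.6189, 0.4758, 0.7966],
    "MAFFT theory": [0.7918, 0.8003, 0.7652, 0.8655, 0.7073, 0.5183, 0.8449],
}


def gains(baseline, candidate):
    """Per-data-set SPS difference (candidate minus baseline), rounded to 4 places."""
    return [round(c - b, 4) for b, c in zip(baseline, candidate)]


gain_vs_mafft = gains(sps["MAFFT default"], sps["MAFFT theory"])
gain_vs_clustalw = gains(sps["ClustalW default"], sps["MAFFT theory"])

# Data set with the largest gain over each baseline.
best_vs_mafft = datasets[gain_vs_mafft.index(max(gain_vs_mafft))]
best_vs_clustalw = datasets[gain_vs_clustalw.index(max(gain_vs_clustalw))]

for name, g_m, g_c in zip(datasets, gain_vs_mafft, gain_vs_clustalw):
    print(f"{name:13s} vs MAFFT default: {g_m:+.4f}  vs ClustalW default: {g_c:+.4f}")
```

Every difference is positive. The largest gain over the MAFFT defaults is on RV11, and the largest gain over the ClustalW defaults is on Ref 3.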