Huihui Chang, Yimeng Nie, Nan Zhang, Xue Zhang, Huimin Sun, Ying Mao, Zhongying Qiu, Yuan Huang.
Abstract
BACKGROUND: Amino acid substitution models play an important role in inferring phylogenies from proteins. Although many amino acid substitution models have been proposed, only a few have been estimated from mitochondrial protein sequences of specific taxa, such as the mtArt model for Arthropoda. The growing amount of mitochondrial genome data from a broad range of Orthoptera taxa provides an opportunity to estimate an Orthoptera-specific empirical mitochondrial amino acid model.
Keywords: Amino acid substitution model; Mitochondrial genome; Orthoptera; Phylogeny
Year: 2020 PMID: 32429841 PMCID: PMC7236349 DOI: 10.1186/s12862-020-01623-6
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
The mtOrt model
| | Ala | Arg | Asn | Asp | Cys | Gln | Glu | Gly | His | Ile | Leu | Lys | Met | Phe | Pro | Ser | Thr | Trp | Tyr | Val |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Exchangeability rates | | | | | | | | | | | | | | | | | | | | |
| Ala | | | | | | | | | | | | | | | | | | | | |
| Arg | 0.04 | | | | | | | | | | | | | | | | | | | |
| Asn | 0.03 | 0.22 | | | | | | | | | | | | | | | | | | |
| Asp | 0.14 | 0.09 | 6.01 | | | | | | | | | | | | | | | | | |
| Cys | 0.64 | 1.63 | 0.57 | 0.24 | | | | | | | | | | | | | | | | |
| Gln | 0.09 | 3.01 | 1.44 | 0.30 | 0.25 | | | | | | | | | | | | | | | |
| Glu | 0.20 | 0.03 | 1.64 | 10.55 | 0.25 | 2.50 | | | | | | | | | | | | | | |
| Gly | 1.05 | 0.21 | 0.69 | 1.13 | 1.25 | 0.08 | 1.24 | | | | | | | | | | | | | |
| His | 0.07 | 1.43 | 2.55 | 0.46 | 0.24 | 5.73 | 0.12 | 0.08 | | | | | | | | | | | | |
| Ile | 0.08 | 0.04 | 0.44 | 0.05 | 0.28 | 0.05 | 0.06 | 0.04 | 0.10 | | | | | | | | | | | |
| Leu | 0.06 | 0.07 | 0.07 | 0.02 | 0.28 | 0.23 | 0.05 | 0.03 | 0.14 | 1.59 | | | | | | | | | | |
| Lys | 0.00 | 1.43 | 3.95 | 0.13 | 0.00 | 3.98 | 3.33 | 0.13 | 0.25 | 0.08 | 0.07 | | | | | | | | | |
| Met | 0.41 | 0.02 | 0.43 | 0.05 | 0.27 | 0.25 | 0.29 | 0.17 | 0.14 | 2.95 | 3.98 | 0.80 | | | | | | | | |
| Phe | 0.06 | 0.00 | 0.12 | 0.02 | 1.50 | 0.06 | 0.07 | 0.09 | 0.17 | 1.00 | 2.16 | 0.03 | 0.63 | | | | | | | |
| Pro | 0.52 | 0.42 | 0.27 | 0.06 | 0.00 | 0.95 | 0.11 | 0.01 | 0.73 | 0.07 | 0.29 | 0.43 | 0.07 | 0.08 | | | | | | |
| Ser | 3.26 | 0.25 | 2.51 | 0.49 | 4.04 | 0.45 | 0.57 | 2.47 | 0.20 | 0.18 | 0.45 | 3.04 | 0.75 | 0.62 | 1.51 | | | | | |
| Thr | 4.04 | 0.09 | 1.77 | 0.07 | 0.34 | 0.15 | 0.24 | 0.03 | 0.24 | 2.14 | 0.27 | 0.71 | 3.35 | 0.09 | 1.11 | 3.91 | | | | |
| Trp | 0.03 | 1.21 | 0.10 | 0.19 | 2.09 | 0.01 | 0.18 | 0.25 | 0.02 | 0.07 | 0.42 | 0.32 | 0.21 | 0.52 | 0.06 | 0.34 | 0.02 | | | |
| Tyr | 0.02 | 0.22 | 1.44 | 0.41 | 3.73 | 0.75 | 0.27 | 0.09 | 4.15 | 0.17 | 0.21 | 0.39 | 0.30 | 3.90 | 0.22 | 0.52 | 0.13 | 0.78 | | |
| Val | 2.31 | 0.06 | 0.09 | 0.13 | 1.69 | 0.01 | 0.32 | 0.54 | 0.00 | 9.41 | 0.84 | 0.05 | 2.49 | 0.66 | 0.08 | 0.38 | 1.51 | 0.14 | 0.16 | |
| Amino acid frequencies | 0.04 | 0.01 | 0.06 | 0.02 | 0.01 | 0.02 | 0.02 | 0.04 | 0.01 | 0.11 | 0.16 | 0.03 | 0.09 | 0.09 | 0.03 | 0.10 | 0.06 | 0.02 | 0.05 | 0.05 |
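The two parts of the table combine into a single substitution rate matrix. The sketch below assumes the standard time-reversible formulation, which the table itself does not spell out: the rate from amino acid i to j is the exchangeability r_ij multiplied by the frequency of j, and the matrix is scaled so that the expected substitution rate is 1. The function name `build_rate_matrix` and the three-state toy values are hypothetical and are not taken from the mtOrt table.

```python
def build_rate_matrix(exch_lower, freqs):
    """Build a normalized reversible rate matrix Q (illustrative sketch).

    exch_lower: lower-triangular exchangeabilities, exch_lower[i][j] for j < i
    freqs: equilibrium frequencies (renormalized here, since published
           values are rounded and may not sum to exactly 1)
    """
    n = len(freqs)
    total = sum(freqs)
    pi = [f / total for f in freqs]

    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Exchangeabilities are symmetric: r_ij == r_ji.
                r = exch_lower[i][j] if j < i else exch_lower[j][i]
                q[i][j] = r * pi[j]
        # Each row sums to zero (the diagonal is still 0.0 when summed).
        q[i][i] = -sum(q[i])

    # Scale so that the expected number of substitutions per unit time is 1.
    rate = -sum(pi[i] * q[i][i] for i in range(n))
    return [[x / rate for x in row] for row in q], pi


# Toy three-state alphabet with made-up numbers, for illustration only.
toy_exch = [[], [0.5], [2.0, 1.0]]
toy_freqs = [0.2, 0.3, 0.5]
Q, pi = build_rate_matrix(toy_exch, toy_freqs)
for row in Q:
    print([round(x, 4) for x in row])
```

Passing the 20 lower-triangular rows and the frequency row of the table in the same layout would give a 20 × 20 matrix under the same assumption.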
Log-likelihood of the target function on the training dataset
| Step | Value |
|---|---|
| mtInv (initial model) | −438,493.411 |
| First iteration | −436,598.238 |
| Second iteration | −436,551.677 |
| Third iteration (final model) | −436,550.299 |
| AIC improvement | 3470.224 |
| AIC/site | 0.804 |
Note: AIC/site is the AIC improvement per site of the final model relative to the initial model, mtInv.
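The AIC improvement in the table can be checked from the two log-likelihoods. The sketch below uses the usual definition AIC = 2k − 2 ln L. It assumes that the final model is charged 208 extra free parameters (189 exchangeabilities plus 19 free frequencies) relative to the fixed initial model. That parameter count is an assumption, not something stated in the table, although it does reproduce the reported value.

```python
def aic(log_likelihood, n_free_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_free_params - 2 * log_likelihood


LNL_INITIAL = -438493.411  # mtInv (initial model), from the table
LNL_FINAL = -436550.299    # third iteration (final model), from the table

# Assumption: 189 exchangeabilities + 19 free frequencies are estimated.
EXTRA_PARAMS = 189 + 19

aic_improvement = aic(LNL_INITIAL, 0) - aic(LNL_FINAL, EXTRA_PARAMS)
print(round(aic_improvement, 3))  # 3470.224
```

Dividing this improvement by the number of alignment sites in the training data gives the AIC/site row.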
The correlations between the mtOrt, mtOrt_O, mtOrt_C and mtOrt_E models
| | mtOrt | mtOrt_O | mtOrt_C | mtOrt_E |
|---|---|---|---|---|
| mtOrt | | 1.000** | 0.998** | 0.998** |
| mtOrt_O | 0.999** | | 0.998** | 0.998** |
| mtOrt_C | 0.990** | 0.990** | | 0.992** |
| mtOrt_E | 0.986** | 0.985** | 0.954** | |
Note: The values in the upper triangle are the correlations between frequency vectors, while the values in the lower triangle are the correlations between exchangeability matrices. The greater the absolute value of the Pearson correlation coefficient, the stronger the correlation. **: p < 0.01, extremely significant correlation
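As a rough illustration of how such coefficients can be obtained, the sketch below computes a Pearson correlation between two models, once on their frequency vectors and once on their flattened lower-triangular exchangeabilities. The helper names and the three-state toy models are hypothetical and their numbers are made up. The significance test behind the asterisks is not reproduced here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def flatten_lower(exch_lower):
    """Flatten lower-triangular rows into one vector of exchangeabilities."""
    return [value for row in exch_lower for value in row]


# Two toy three-state models with made-up values, for illustration only.
model_a = {"exch": [[], [0.5], [2.0, 1.0]], "freqs": [0.2, 0.3, 0.5]}
model_b = {"exch": [[], [0.7], [1.8, 1.2]], "freqs": [0.25, 0.3, 0.45]}

freq_corr = pearson(model_a["freqs"], model_b["freqs"])
exch_corr = pearson(flatten_lower(model_a["exch"]),
                    flatten_lower(model_b["exch"]))
print(f"frequency vectors: {freq_corr:.3f}")
print(f"exchangeability matrices: {exch_corr:.3f}")
```

For real 20-state models the exchangeability vectors would each hold 190 values and the frequency vectors 20 values.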
Pearson correlations among 12 models: mtOrt and 11 widely used models
| | Dayhoff | JTT | LG | mtArt | mtDeu | mtInv | mtMet | mtPan2013 | mtPro | mtZoa | WAG | mtOrt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dayhoff | 0.903** | 0.812** | 0.397 | 0.903** | 0.365 | 0.34 | 0.903** | 0.495* | 0.93** | 0.252 | ||
| JTT | 0.896** | 0.961** | 0.559* | 1.000** | 0.444 | 0.523* | 0.517* | 1.000** | 0.643** | 0.975** | 0.436 | |
| LG | 0.854** | 0.914** | 0.541* | 0.961** | 0.457* | 0.527* | 0.538* | 0.961** | 0.619** | 0.912** | 0.457* | |
| mtArt | 0.767** | 0.789** | 0.866** | 0.559* | 0.941** | 0.965** | 0.964** | 0.559* | 0.981** | 0.491* | 0.948** | |
| mtDeu | 0.895** | 1.000** | 0.914** | 0.789** | 0.444 | 0.523* | 0.517* | 1.000** | 0.643** | 0.975** | 0.436 | |
| mtInv | 0.024 | −0.015 | 0.028 | 0.091 | −0.014 | 0.976** | 0.968** | 0.444 | 0.875** | 0.363 | ||
| mtMet | −0.025 | −0.018 | 0.056 | 0.109 | −0.017 | 0.959** | 0.982** | 0.523* | 0.929** | 0.439 | ||
| mtPan2013 | −0.035 | −0.025 | 0.054 | −0.024 | 0.956** | 0.897** | 0.517* | 0.931** | 0.434 | |||
| mtPro | 0.896** | 1.000** | 0.914** | 0.789** | 1.000** | −0.015 | −0.018 | −0.025 | 0.643** | 0.975** | 0.436 | |
| mtZoa | 0.821** | 0.825** | 0.894** | 0.982** | 0.852** | 0.072 | 0.086 | 0.037 | 0.852** | 0.578** | 0.892** | |
| WAG | 0.919** | 0.934** | 0.961** | 0.809** | 0.934** | 0.003 | 0.017 | −0.016 | 0.934** | 0.850** | 0.347 | |
| mtOrt | −0.046 | −0.046 | −0.011 | 0.047 | −0.440 | −0.046 | 0.025 | −0.024 |
Note: The values in the upper triangle are the correlations between frequency vectors, while the values in the lower triangle are the correlations between exchangeability matrices. The greater the absolute value of the Pearson correlation coefficient, the stronger the correlation. *: p < 0.05, significant correlation; **: p < 0.01, extremely significant correlation
Fig. 1 Amino acid exchangeability rates of the mtOrt, mtInv, mtPan2013 and mtMet models
Fig. 2 The ratio of exchangeability rates between mtOrt and the mtMet, mtPan2013 and mtInv models. The size of each circle represents the ratio between the mtOrt exchangeability rate and that of the compared model. Solid (unfilled) circles mark exchangeability rates that are larger (smaller) in mtOrt than in the compared model. For visualization, large ratios are trimmed at 10 and marked with dotted circles
Fig. 3 Amino acid frequencies of the mtOrt, mtInv, mtPan2013 and mtMet models
Fig. 4 The mean difference in log-likelihood and AIC scores per site between mtOrt and the existing models on the testing datasets
Fig. 5 The number and confidence levels of different models that build the optimal topology for each sub-dataset
Fig. 6 The topological distances between trees inferred using mtOrt and five existing models. The horizontal axis indicates the topological distance between two tree topologies, whereas the vertical axis indicates the number of datasets
Fig. 7 The phylogenetic relationships among the higher taxa of Orthoptera
Fig. 8 The number and confidence levels of different models optimized by mtOrt (+R5) that build the optimal topology for each sub-dataset
Fig. 9 The maximum likelihood-based process to estimate an amino acid substitution model for protein sequences