Xiaoli Qiang, Zheng Kou, Gang Fang, Yanfeng Wang.
Abstract
Avian influenza virus (AIV) can directly cross species barriers and infect humans with high fatality. Using machine learning methods, the present paper scores the amino acid mutations and predicts interspecies transmission. Initially, 183 signature positions in 11 viral proteins were screened by the scores of five amino acid factors and their random forest rankings. The most important amino acid factor (Factor 3) and the minimal range of signature positions (50 amino acid residues) were explored by a support vector machine (the highest-performing classifier among four tested classifiers). Based on these results, the avian-to-human transmission of AIVs was analyzed and a prediction model was constructed for virology applications. The distributions of human-origin AIVs suggested that three molecular patterns of interspecies transmission emerge in nature. The novel findings of this paper provide important clues for future epidemic surveillance.
Keywords: amino acid mutation; avian influenza virus; interspecies transmission; machine learning
Year: 2018 PMID: 29966263 PMCID: PMC6100476 DOI: 10.3390/molecules23071584
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Importance score curve and the performances of k-nearest neighbor (KNN), support vector machine (SVM), naïve Bayes (NB), and random forest (RF) classifiers. (a) The ranked scores were calculated from five AA factors using the random forest method. The x and y coordinates denote the total length of the 11 protein alignments and the importance scores, respectively. The cutoff value (9) is indicated by the thin horizontal line. (b) Performances of the four classifiers were evaluated from 100 repeats of 10-fold cross-validation. The area under the curve (AUC) ranges from 0 to 1.
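The evaluation scheme in panel (b) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names `auc` and `repeated_kfold` are hypothetical, the folds are not stratified (the source does not say whether stratification was used), and the classifier itself is left out.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative; ties count as 0.5. The result always lies in [0, 1]."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def repeated_kfold(n_samples, k=10, repeats=100, seed=0):
    """Yield (train_idx, test_idx) pairs. Each repeat reshuffles the sample
    order and then splits it into k disjoint test folds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(k):
            test = order[fold::k]          # every k-th sample, offset by fold
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test
```

With the defaults, `repeated_kfold` yields 100 × 10 = 1000 train/test splits. A classifier would be fitted on each training split, scored with `auc` on the matching test split, and the 1000 AUC values then summarized per classifier.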
Scores for the 183 signature amino acids of avian influenza viruses (AIVs).
| Num | Pro 1 | Pos 2 | Score | Num | Pro | Pos | Score | Num | Pro | Pos | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PB2 | 389 | 11.95 | 62 | HA | 176 | 13.61 | 123 | NA | 65 | 10.98 |
| 2 | PB2 | 478 | 9.81 | 63 | HA | 179 | 10.08 | 124 | NA | 66 | 9.93 |
| 3 | PB2 | 598 | 17.36 | 64 | HA | 185 | 14.73 | 125 | NA | 72 | 10.96 |
| 4 | PB2 | 627 | 9.83 | 65 | HA | 189 | 14.55 | 126 | NA | 79 | 11.38 |
| 5 | PB2 | 648 | 15.55 | 66 | HA | 207 | 9.49 | 127 | NA | 85 | 9.57 |
| 6 | PB2 | 676 | 9.94 | 67 | HA | 211 | 11.15 | 128 | NA | 88 | 10.13 |
| 7 | PB1 | 14 | 19.16 | 68 | HA | 213 | 11.40 | 129 | NA | 100 | 11.34 |
| 8 | PB1 | 48 | 18.13 | 69 | HA | 216 | 12.17 | 130 | NA | 187 | 10.48 |
| 9 | PB1 | 113 | 18.58 | 70 | HA | 221 | 10.57 | 131 | NA | 205 | 9.62 |
| 10 | PB1 | 149 | 11.09 | 71 | HA | 222 | 9.02 | 132 | NA | 233 | 10.13 |
| 11 | PB1 | 257 | 13.74 | 72 | HA | 240 | 17.36 | 133 | NA | 249 | 9.05 |
| 12 | PB1 | 383 | 12.14 | 73 | HA | 251 | 16.26 | 134 | NA | 257 | 17.24 |
| 13 | PB1 | 384 | 9.34 | 74 | HA | 266 | 10.96 | 135 | NA | 265 | 9.29 |
| 14 | PB1 | 387 | 11.50 | 75 | HA | 273 | 12.53 | 136 | NA | 285 | 10.46 |
| 15 | PB1 | 525 | 9.95 | 76 | HA | 274 | 9.23 | 137 | NA | 287 | 10.65 |
| 16 | PB1 | 573 | 13.38 | 77 | HA | 275 | 9.38 | 138 | NA | 288 | 10.28 |
| 17 | PB1 | 628 | 9.59 | 78 | HA | 289 | 10.36 | 139 | NA | 333 | 10.07 |
| 18 | PB1-F2 | 4 | 9.38 | 79 | HA | 290 | 11.74 | 140 | NA | 338 | 9.02 |
| 19 | PB1-F2 | 26 | 9.24 | 80 | HA | 297 | 10.48 | 141 | NA | 347 | 9.82 |
| 20 | PB1-F2 | 48 | 13.50 | 81 | HA | 315 | 11.98 | 142 | NA | 359 | 10.08 |
| 21 | PB1-F2 | 50 | 11.81 | 82 | HA | 323 | 13.04 | 143 | NA | 368 | 11.05 |
| 22 | PB1-F2 | 57 | 16.85 | 83 | HA | 327 | 12.84 | 144 | NA | 369 | 10.82 |
| 23 | PB1-F2 | 77 | 11.29 | 84 | HA | 327 | 16.23 | 145 | NA | 399 | 11.71 |
| 24 | PA | 37 | 18.74 | 85 | HA | 327 | 19.25 | 146 | NA | 415 | 9.43 |
| 25 | PA | 61 | 12.34 | 86 | HA | 327 | 10.41 | 147 | NA | 416 | 13.74 |
| 26 | PA | 63 | 9.70 | 87 | HA | 328 | 16.24 | 148 | NA | 418 | 9.09 |
| 27 | PA | 129 | 9.34 | 88 | HA | 377 | 13.91 | 149 | NA | 445 | 12.13 |
| 28 | PA | 337 | 11.25 | 89 | HA | 397 | 16.18 | 150 | NA | 468 | 9.66 |
| 29 | PA | 356 | 12.77 | 90 | HA | 407 | 9.49 | 151 | M1 | 15 | 9.79 |
| 30 | PA | 367 | 14.56 | 91 | HA | 431 | 13.52 | 152 | M1 | 27 | 12.16 |
| 31 | PA | 405 | 10.01 | 92 | HA | 492 | 9.49 | 153 | M1 | 37 | 14.66 |
| 32 | PA | 554 | 14.67 | 93 | HA | 495 | 11.15 | 154 | M1 | 46 | 14.96 |
| 33 | PA | 607 | 11.97 | 94 | HA | 496 | 10.62 | 155 | M1 | 101 | 13.28 |
| 34 | PA | 684 | 12.20 | 95 | HA | 500 | 11.88 | 156 | M1 | 140 | 12.40 |
| 35 | PA | 712 | 9.25 | 96 | HA | 503 | 12.76 | 157 | M1 | 142 | 11.31 |
| 36 | HA | 40 | 9.42 | 97 | HA | 526 | 11.91 | 158 | M1 | 166 | 17.35 |
| 37 | HA | 42 | 9.21 | 98 | HA | 530 | 11.26 | 159 | M1 | 205 | 11.09 |
| 38 | HA | 45 | 11.92 | 99 | HA | 531 | 11.67 | 160 | M1 | 219 | 13.18 |
| 39 | HA | 46 | 16.27 | 100 | HA | 534 | 12.77 | 161 | M1 | 224 | 23.52 |
| 40 | HA | 53 | 9.87 | 101 | NP | 34 | 17.45 | 162 | M1 | 232 | 14.80 |
| 41 | HA | 57 | 9.42 | 102 | NP | 77 | 12.39 | 163 | M1 | 242 | 19.59 |
| 42 | HA | 65 | 10.99 | 103 | NP | 105 | 10.61 | 164 | M1 | 248 | 11.25 |
| 43 | HA | 66 | 11.13 | 104 | NP | 373 | 14.73 | 165 | M2 | 13 | 13.66 |
| 44 | HA | 79 | 12.71 | 105 | NP | 377 | 21.88 | 166 | M2 | 21 | 10.53 |
| 45 | HA | 81 | 12.03 | 106 | NP | 482 | 19.71 | 167 | M2 | 97 | 15.79 |
| 46 | HA | 84 | 10.27 | 107 | NA | 19 | 9.20 | 168 | NS1 | 77 | 10.59 |
| 47 | HA | 91 | 17.33 | 108 | NA | 23 | 11.02 | 169 | NS1 | 80 | 12.48 |
| 48 | HA | 96 | 14.98 | 109 | NA | 37 | 9.57 | 170 | NS1 | 81 | 12.55 |
| 49 | HA | 102 | 9.04 | 110 | NA | 41 | 11.30 | 171 | NS1 | 82 | 12.01 |
| 50 | HA | 112 | 12.67 | 111 | NA | 42 | 9.33 | 172 | NS1 | 83 | 14.52 |
| 51 | HA | 114 | 19.46 | 112 | NA | 47 | 10.12 | 173 | NS1 | 84 | 10.21 |
| 52 | HA | 115 | 9.66 | 113 | NA | 48 | 11.23 | 174 | NS1 | 172 | 14.21 |
| 53 | HA | 121 | 10.42 | 114 | NA | 49 | 10.85 | 175 | NS1 | 179 | 11.18 |
| 54 | HA | 124 | 10.28 | 115 | NA | 50 | 9.14 | 176 | NS1 | 197 | 9.32 |
| 55 | HA | 131 | 12.31 | 116 | NA | 52 | 12.38 | 177 | NS1 | 212 | 14.19 |
| 56 | HA | 142 | 12.01 | 117 | NA | 52 | 10.34 | 178 | NEP | 14 | 13.01 |
| 57 | HA | 163 | 10.07 | 118 | NA | 52 | 9.75 | 179 | NEP | 22 | 15.38 |
| 58 | HA | 164 | 9.03 | 119 | NA | 53 | 9.03 | 180 | NEP | 40 | 10.28 |
| 59 | HA | 167 | 14.22 | 120 | NA | 58 | 11.05 | 181 | NEP | 60 | 9.17 |
| 60 | HA | 173 | 12.81 | 121 | NA | 60 | 9.34 | 182 | NEP | 100 | 10.58 |
| 61 | HA | 174 | 10.16 | 122 | NA | 63 | 9.44 | 183 | NEP | 115 | 11.10 |
1 Viral protein; 2 Position of the amino acid residue according to H3 subtype numbering.
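The table can be filtered by importance score to obtain smaller position sets, as in the cutoff sweeps (range 9–20) described in the figure captions. The sketch below uses only a handful of rows copied from the table, not the full set of 183. The name `select_positions` is hypothetical, and treating the cutoff as inclusive (`>=`) is an assumption, since the source does not state the comparison used.

```python
# A few (protein, position, score) rows copied from the table; not all 183.
ROWS = [
    ("PB2", 598, 17.36),
    ("PB2", 627, 9.83),
    ("PB1", 14, 19.16),
    ("NP", 377, 21.88),
    ("M1", 224, 23.52),
    ("NA", 257, 17.24),
    ("NEP", 60, 9.17),
]

def select_positions(rows, cutoff):
    """Keep positions whose score meets the cutoff (inclusive: an assumption)."""
    return [(pro, pos) for pro, pos, score in rows if score >= cutoff]

# Sweep the cutoff over the range used in the figures and report how many
# of the sample rows survive at each value.
for cutoff in range(9, 21):
    print(cutoff, len(select_positions(ROWS, cutoff)))
```

Raising the cutoff shrinks the retained set monotonically, so each cutoff value corresponds to one candidate position set that can be passed to a classifier.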
Figure 2. Contributions of AA factors and different mutation sets. (a) Performance of the SVM classifier for different combinations of the five AA factors. The x and y coordinates denote the 31 combination patterns and the AUC values (from 0 to 1), respectively. For example, the label ‘13’ on the x axis denotes that the set of 183 amino acid residues was transformed using AA Factor 1 and AA Factor 3 together. (b) Contributions of mutation positions for different cutoff values (range 9–20). The y coordinate shows the AUC values.
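The 31 combination patterns in panel (a) are the non-empty subsets of the five AA factors (2^5 − 1 = 31). A short sketch of how such labels can be enumerated follows; the function name `factor_combinations` and the ordering of the labels are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

FACTORS = (1, 2, 3, 4, 5)  # the five AA factors, referred to by number

def factor_combinations(factors=FACTORS):
    """Return a label for every non-empty subset of the factors, e.g. '13'
    for the combination of Factor 1 and Factor 3."""
    labels = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            labels.append("".join(str(f) for f in combo))
    return labels

labels = factor_combinations()
print(len(labels))  # number of combination patterns
```

Each label identifies which factors are used to encode the signature residues before the classifier is trained, so one AUC value is obtained per label.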
Figure 3. Minimal amino acid set for predicting AIVs. (a) Contributions of reduced mutation position sets. The x and y coordinates denote the cutoff (range 9–20) and the AUC values (range 0–1), respectively. (b) Profiles of 50 signature positions from human-origin (top) and avian-origin (bottom) AIVs. (c) Three patterns of human-origin AIVs clustered by the multidimensional scaling (MDS) method.
Figure 4. Flowchart of methods used in this paper. (a) High-quality dataset construction; (b) Machine learning algorithm.