Christine L P Eng, Joo Chuan Tong, Tin Wee Tan.
Abstract
BACKGROUND: The majority of influenza A viruses reside and circulate among animal populations and seldom infect humans because of host range restriction. When avian strains do acquire the ability to overcome the species barrier, however, they may adapt to humans, replicate efficiently, and cause disease, potentially leading to a pandemic. Given the large influenza A virus reservoir in wild birds, the emergence of a new strain able to cross the host species barrier is a cause for concern, as the recent H7N9 outbreak in China showed. Several influenza proteins have been shown to be major determinants of host tropism. Better understanding and determination of host tropism would help identify zoonotic influenza virus strains capable of crossing the species barrier and infecting humans.
Year: 2014 PMID: 25521718 PMCID: PMC4290784 DOI: 10.1186/1755-8794-7-S3-S1
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Total number of positive and negative samples for protein datasets and combined dataset.
| Dataset | Training: positive samples | Training: negative samples | Training: total samples | Testing: positive samples | Testing: negative samples | Testing: total samples |
|---|---|---|---|---|---|---|
| HA | 5449 | 5261 | 10710 | 1344 | 1357 | 2701 |
| M1 | 547 | 908 | 1455 | 135 | 219 | 354 |
| M2 | 644 | 1038 | 1682 | 178 | 268 | 446 |
| NA | 3945 | 4315 | 8260 | 963 | 1051 | 2014 |
| NP | 1148 | 2140 | 3288 | 282 | 537 | 819 |
| NS1 | 1706 | 2940 | 4646 | 418 | 748 | 1166 |
| NS2 | 475 | 1157 | 1632 | 133 | 246 | 379 |
| PA | 2135 | 4067 | 6202 | 573 | 997 | 1570 |
| PB1 | 1995 | 3189 | 5184 | 504 | 797 | 1301 |
| PB1-F2 | 722 | 2206 | 2928 | 167 | 588 | 755 |
| PB2 | 2157 | 3327 | 5484 | 565 | 860 | 1425 |
| Combined | 3272 | 3923 | 7195 | 799 | 989 | 1788 |
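As a quick sanity check on the table above, the following sketch verifies that positives plus negatives equal each reported total and computes the share of samples held out for testing. The variable and function names are illustrative and not from the source; the source does not state an intended split ratio, so the fraction is derived purely from the counts.

```python
# Sample counts copied from the table above, per dataset:
# (train_pos, train_neg, train_total, test_pos, test_neg, test_total)
DATASETS = {
    "HA": (5449, 5261, 10710, 1344, 1357, 2701),
    "M1": (547, 908, 1455, 135, 219, 354),
    "M2": (644, 1038, 1682, 178, 268, 446),
    "NA": (3945, 4315, 8260, 963, 1051, 2014),
    "NP": (1148, 2140, 3288, 282, 537, 819),
    "NS1": (1706, 2940, 4646, 418, 748, 1166),
    "NS2": (475, 1157, 1632, 133, 246, 379),
    "PA": (2135, 4067, 6202, 573, 997, 1570),
    "PB1": (1995, 3189, 5184, 504, 797, 1301),
    "PB1-F2": (722, 2206, 2928, 167, 588, 755),
    "PB2": (2157, 3327, 5484, 565, 860, 1425),
    "Combined": (3272, 3923, 7195, 799, 989, 1788),
}


def totals_consistent(row):
    """Return True if positives + negatives equal the reported totals."""
    train_pos, train_neg, train_total, test_pos, test_neg, test_total = row
    return (train_pos + train_neg == train_total
            and test_pos + test_neg == test_total)


def held_out_fraction(row):
    """Share of all samples (training + testing) that is in the testing set."""
    train_total, test_total = row[2], row[5]
    return test_total / (train_total + test_total)


for name, row in DATASETS.items():
    print(f"{name:8s} consistent={totals_consistent(row)} "
          f"held_out={held_out_fraction(row):.3f}")
```

On these counts every row is internally consistent, and the testing set is roughly one fifth of each dataset.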
Random forest optimized parameters.
| Model | Number of trees | Number of features |
|---|---|---|
| HA | 150 | 21 |
| M1 | 110 | 13 |
| M2 | 140 | 17 |
| NA | 150 | 16 |
| NP | 40 | 15 |
| NS1 | 50 | 20 |
| NS2 | 100 | 14 |
| PA | 60 | 18 |
| PB1 | 40 | 10 |
| PB1-F2 | 150 | 13 |
| PB2 | 40 | 16 |
| Combined | 40 | 22 |
Comparison of machine learning classifiers.
| Classifier | Accuracy | Sensitivity | Specificity | AUC | MCC |
|---|---|---|---|---|---|
| Naïve Bayes | 96.42 | 0.942 | 0.988 | 0.970 | 0.930 |
| kNN | 98.24 | 0.982 | 0.983 | 0.983 | 0.965 |
| SVM | 97.38 | 0.953 | 0.996 | 0.974 | 0.948 |
| ANN | 98.40 | 0.977 | 0.991 | 0.993 | 0.968 |
10-fold cross-validation performance on optimized parameters for prediction models.
| Model | Accuracy | Sensitivity | Specificity | AUC | MCC |
|---|---|---|---|---|---|
| HA | 98.62 | 0.986 | 0.993 | 0.998 | 0.972 |
| M1 | 97.66 | 0.977 | 0.987 | 0.985 | 0.950 |
| M2 | 96.73 | 0.967 | 0.973 | 0.989 | 0.931 |
| NA | 98.35 | 0.984 | 0.991 | 0.996 | 0.967 |
| NP | 97.51 | 0.975 | 0.979 | 0.992 | 0.945 |
| NS1 | 97.48 | 0.975 | 0.981 | 0.992 | 0.946 |
| NS2 | 96.57 | 0.966 | 0.971 | 0.980 | 0.916 |
| PA | 98.21 | 0.982 | 0.992 | 0.995 | 0.960 |
| PB1 | 97.26 | 0.973 | 0.990 | 0.992 | 0.942 |
| PB1-F2 | 97.99 | 0.980 | 0.987 | 0.992 | 0.945 |
| PB2 | 98.29 | 0.983 | 0.992 | 0.995 | 0.964 |
| Combined | 99.72 | 0.997 | 0.999 | 0.999 | 0.994 |
Performance evaluation with separate testing dataset.
| Model | Accuracy | Sensitivity | Specificity | AUC | MCC |
|---|---|---|---|---|---|
| HA | 98.78 | 0.988 | 0.992 | 0.997 | 0.976 |
| M1 | 97.18 | 0.972 | 0.984 | 0.984 | 0.940 |
| M2 | 97.09 | 0.971 | 0.971 | 0.993 | 0.939 |
| NA | 98.56 | 0.986 | 0.987 | 0.998 | 0.971 |
| NP | 97.56 | 0.976 | 0.965 | 0.991 | 0.946 |
| NS1 | 97.86 | 0.979 | 0.976 | 0.994 | 0.953 |
| NS2 | 97.63 | 0.976 | 1.000 | 0.976 | 0.948 |
| PA | 97.52 | 0.975 | 0.991 | 0.995 | 0.947 |
| PB1 | 97.23 | 0.972 | 0.988 | 0.994 | 0.942 |
| PB1-F2 | 98.54 | 0.985 | 0.988 | 0.994 | 0.957 |
| PB2 | 97.89 | 0.979 | 0.991 | 0.996 | 0.956 |
| Combined | 99.83 | 0.998 | 1.000 | 0.998 | 0.997 |
Top amino acid physicochemical properties identified using variable importance feature in random forest.
| Model | Amino acid property | AAindex ID |
|---|---|---|
| HA | Charge | KLEP940101 |
| HA | Normalized van der Waals volume | FAUJ880103 |
| HA | Polarizability | CHAM820101 |
| NA | Solvent accessibility | JANJ780102/JANJ780103 |
| NA | Polarity | GRAR740102 |
| NS1 | Charge | KLEP940101 |
| PA | Hydrophobicity | ENGD860101 |
| PA | Polarity | GRAR740102 |
| PB1 | Solvent accessibility | JANJ780102/JANJ780103 |
| PB2 | Charge | KLEP940101 |
| PB2 | Solvent accessibility | JANJ780102/JANJ780103 |
Further information on sample strains used in the demonstration of host tropism prediction system.
| No. | Strain | Subtype | Country | Collection Year | Host |
|---|---|---|---|---|---|
| 1. | A/turkey/England/50-92/1991 | H5N1 | United Kingdom | 1991 | Turkey |
| 2. | A/wild duck/Korea/SH19-50/2010 | H7N9 | South Korea | 2010 | Duck |
| 3. | A/Chicken/Hong Kong/220/97 | H5N1 | Hong Kong | 1997 | Chicken |
| 4. | A/chicken/Shanghai/S1078/2013 | H7N9 | China | 2013 | Chicken |
| 5. | A/Hong Kong/542/97 | H5N1 | Hong Kong | 1997 | Human |
| 6. | A/Shanghai/01/2014 | H7N9 | China | 2014 | Human |
| 7. | A/New York/231/2003 | H1N2 | USA | 2003 | Human |
| 8. | A/Guangdong/ST798/2008 | H3N2 | China | 2008 | Human |
Figure 1. Host tropism prediction results for sample strains. Results for the four avian strains are shown at the top and results for the four human strains at the bottom. The prediction results are strung together to represent an entire influenza A genome, with eight segments encoding 11 proteins. The proteins encoded by each segment are listed at the bottom of the figure. Each protein prediction is independent and is not influenced by the predictions for other proteins. Blue bars represent a prediction of avian by the corresponding protein prediction model, and red bars represent a prediction of human. Grey bars indicate that no prediction was made because the corresponding protein sequence was unavailable or incomplete. All 11 proteins were predicted correctly for the first two avian strains and for the final two human strains. The remaining four strains, from the 1997 H5N1 outbreak in Hong Kong and the 2013 H7N9 outbreak in China, show mixed avian and human predictions. For the human strains isolated during the two outbreaks, the presence of proteins predicted as avian indicates that the source of infection was most likely avian. Conversely, the avian strains isolated from chickens during the two outbreaks have several proteins predicted as human, which suggests that these proteins may have adapted to the human host.
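The per-protein layout described in the caption can be sketched as a simple tally over independent predictions. This is a minimal illustration only: the function name, the label strings, and the example strain are hypothetical, and the predictions shown are made up rather than read from the figure. Proteins are listed in genome segment order (segments 1 to 8).

```python
# The 11 proteins in genome segment order (segments 1 to 8)
PROTEINS = ["PB2", "PB1", "PB1-F2", "PA", "HA", "NP",
            "NA", "M1", "M2", "NS1", "NS2"]


def summarize_predictions(predictions):
    """Tally independent per-protein host predictions for one strain.

    predictions maps protein name -> "avian", "human", or None.
    None (or a missing key) stands for a grey bar: no prediction was made
    because the sequence was unavailable or incomplete.
    """
    counts = {"avian": 0, "human": 0, "unavailable": 0}
    for protein in PROTEINS:
        label = predictions.get(protein)
        if label in ("avian", "human"):
            counts[label] += 1
        else:
            counts["unavailable"] += 1
    return counts


# Hypothetical strain with mixed predictions (illustrative values only)
sample = {
    "PB2": "avian", "PB1": "avian", "PB1-F2": None, "PA": "human",
    "HA": "avian", "NP": "avian", "NA": "avian", "M1": "human",
    "M2": "avian", "NS1": "avian", "NS2": "avian",
}
summary = summarize_predictions(sample)
print(summary)
```

Because each protein is scored by its own model, a strain can carry a mix of labels, which is the pattern the caption highlights for the outbreak isolates.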