Yuhua Yao, Xianhong Li, Bo Liao, Li Huang, Pingan He, Fayou Wang, Jiasheng Yang, Hailiang Sun, Yulong Zhao, Jialiang Yang.
Abstract
Timely identification of emerging antigenic variants is critical to influenza vaccine design. The accuracy of a sequence-based antigenic prediction method relies on the choice of amino acid substitution matrices. In this study, we first compared a comprehensive set of 95 substitution matrices, reflecting various amino acid properties, for predicting the antigenicity of influenza viruses with a random forest model. We then proposed a novel algorithm called joint random forest regression (JRFR) to jointly consider the top substitution matrices. We applied JRFR to human H3N2 seasonal influenza data from 1968 to 2003. A 10-fold cross-validation shows that JRFR outperforms other popular methods in predicting antigenic variants. In addition, our results suggest that structural features are most relevant to influenza antigenicity. By restricting the analysis to data involving two adjacent antigenic clusters, we inferred a few key amino acid mutations driving the 11 historical antigenic drift events, pointing to experimentally validated mutations. Finally, we constructed an antigenic cartography of all H3N2 viruses with hemagglutinin (the glycoprotein on the surface of the influenza virus responsible for its binding to host cells) sequences available from the NCBI flu database, and showed overall correspondence but local inconsistency between the genetic and antigenic evolution of H3N2 influenza viruses.
Year: 2017 PMID: 28484283 PMCID: PMC5431489 DOI: 10.1038/s41598-017-01699-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. A flowchart illustrating the computational framework of this study.
The top 12 amino acid substitution matrices for predicting influenza antigenicity.
| Accession No | Description | RMSE |
|---|---|---|
| NIEK910102 | Structure-derived correlation matrix 2 | 0.965 |
| NIEK910101 | Structure-derived correlation matrix 1 | 0.967 |
| RIER950101 | Hydrophobicity scoring matrix | 0.968 |
| MIYS930101 | Base-substitution-protein-stability matrix | 0.968 |
| DOSZ010102 | Normalised version of SM_SAUSAGE | 0.970 |
| LUTR910102 | Structure-based comparison table for inside other class | 0.970 |
| AZAE970102 | The substitution matrix derived from spatially conserved motifs | 0.971 |
| BENS940101 | Log-odds scoring matrix collected in 6.4–8.7 PAM | 0.972 |
| TUDE900101 | Isomorphicity of replacements | 0.973 |
| AZAE970101 | The single residue substitution matrix from interchanges of spatially neighbouring residues | 0.973 |
| HENS920102 | BLOSUM62 substitution matrix | 0.986 |
| DAYM780301 | Log odds matrix for 250 PAMs | 1.000 |
| Binary | 1 for substitution and 0 for match | 1.098 |
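
As a rough illustration of how a substitution matrix turns a pair of aligned sequences into model features, the sketch below applies the binary matrix from the last row of the table (1 for substitution, 0 for match) site by site. The function names and the two short fragments are hypothetical, and this excerpt does not describe the exact feature construction used in the study, so treat it as an assumption rather than the authors' implementation.

```python
def binary_score(a: str, b: str) -> int:
    """Binary substitution matrix: 1 for a substitution, 0 for a match."""
    return 0 if a == b else 1


def encode_pair(seq1: str, seq2: str, score=binary_score) -> list:
    """Encode two aligned, equal-length sequences as one score per site."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return [score(a, b) for a, b in zip(seq1, seq2)]


# Two hypothetical aligned fragments (illustrative only, not real HA1 data).
features = encode_pair("QKLPGNDNST", "QKLPGSDNSM")
print(features)
```

Any other matrix in the table could be swapped in by passing a different `score` function, for example one that looks up a residue pair in a matrix loaded from a file.
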
Performance of seven prediction models.
| Main data | Merge data 1 | Merge data 2 | RMSE | Accuracy | Specificity | Sensitivity | MCC |
|---|---|---|---|---|---|---|---|
| NIEK910101 | RUSR970101 | KOSJ950107 | 0.941 | 0.963 | 0.775 | 0.980 | 0.756 |
| RIER950101 | FEND850101 | FEND850101 | 0.943 | 0.964 | 0.777 | 0.981 | 0.758 |
| NIEK910101 | RUSR970101 | None | 0.950 | 0.963 | 0.767 | 0.981 | 0.753 |
| RIER950101 | FEND850101 | None | 0.952 | 0.963 | 0.771 | 0.980 | 0.753 |
| NIEK910102 | RUSR970101 | None | 0.952 | 0.963 | 0.767 | 0.981 | 0.754 |
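
The column headings above are standard regression and classification metrics. The following minimal sketch shows how they could be computed from true and predicted antigenic distances. The threshold of 2.0 for calling a pair an antigenic variant, the choice of "variant" as the positive class, and the toy values are all assumptions made for illustration; this excerpt does not state them.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted distances."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def classification_metrics(y_true, y_pred, threshold=2.0):
    """Binarise distances at `threshold` (assumed) and score the predictions."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        actual, predicted = t >= threshold, p >= threshold
        if actual and predicted:
            tp += 1
        elif not actual and not predicted:
            tn += 1
        elif predicted:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy values, not taken from the study.
y_true = [0.5, 1.0, 2.5, 3.0, 4.0, 1.5]
y_pred = [0.7, 2.2, 2.4, 2.9, 3.1, 1.0]
error = rmse(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
print(round(error, 3), metrics)
```
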
Figure 2. The antigenic map of 253 H3N2 influenza viruses predicted by JRFR. The 11 antigenic clusters HK68, EN72, VI75, TX77, BK79, SI87, BE89, BE92, WU95, SY97, and FU02 are marked by different colors.
Figure 3. The genetic (a) and antigenic (b) maps for all 1968 non-redundant H3N2 HA1 protein sequences between 1968 and 2014. The viruses are colored by their year of discovery.
Figure 4. The importance score of all 329 sites predicted by JRFR. The 5 epitopes A, B, C, D, and E are marked in different colors. The remaining sites are classified as “Others”.
The top 34 sites ranked by JRFR importance score for the H3N2 influenza data.
| Site | Importance Score | Antigenic Domain | Site | Importance Score | Antigenic Domain |
|---|---|---|---|---|---|
| 189 | 5.273 | B | 262 | 3.678 | E |
| 2 | 4.651 | other | 157 | 3.664 | B |
| 159 | 4.645 | B | 193 | 3.646 | B |
| 163 | 4.541 | B | 190 | 3.545 | B |
| 278 | 4.421 | C | 131 | 3.460 | A |
| 156 | 4.417 | B | 62 | 3.453 | E |
| 133 | 4.196 | A | 226 | 3.408 | D |
| 158 | 4.167 | B | 299 | 3.406 | C |
| 145 | 4.133 | A | 137 | 3.383 | A |
| 276 | 4.116 | C | 248 | 3.326 | D |
| 135 | 3.915 | A | 144 | 3.270 | A |
| 196 | 3.899 | B | 197 | 3.189 | B |
| 307 | 3.890 | C | 172 | 3.154 | D |
| 173 | 3.862 | D | 142 | 3.091 | A |
| 94 | 3.859 | E | 217 | 2.998 | D |
| 83 | 3.775 | E | 143 | 2.946 | A |
| 155 | 3.712 | B | 121 | 2.936 | D |
Mutation combinations driving 10 antigenic drift events of H3N2 influenza viruses, as inferred by JRFR.
| Antigenic drift events | Combination of mutations |
|---|---|
| HK68-EN72 | S193N-G144D |
| EN72-VI75 | N53D-S193D-S145N |
| VI75-TX77 | S137G |
| TX77-BK79 | D144V-N2K |
| BK79-SI87 | Y155H-K189R |
| SI87-BE89 | N145K, N145K-G135E, N145K-N193S |
| BE89-BE92 | K145N-E156K-R189S, K145N-E156K-T262N-R189S, K145N-E156K-S133D-R189S |
| BE92-WU95 | K135T-N145K-L226V, K135T-N145K-N262S |
| WU95-SY97 | V196A-N276K-E158K-K156Q, V196A-N276K-E158K-K156Q-K62E |
| SY97-FU02 | A131T-H155T-V202I |
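
The table uses a compact notation: each mutation is written as original residue, site number, replacement residue (for example `K145N`), mutations within one combination are joined by hyphens, and alternative combinations for the same drift event are separated by commas. A small parser for this notation might look like the sketch below; the `Mutation` type and the function names are hypothetical helpers introduced only for illustration.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    original: str     # residue before the drift event
    site: int         # HA1 site number
    replacement: str  # residue after the drift event


_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_combination(text: str) -> list:
    """Parse one hyphen-joined combination such as 'S193N-G144D'."""
    mutations = []
    for token in text.strip().split("-"):
        match = _MUTATION.match(token)
        if match is None:
            raise ValueError(f"unrecognised mutation: {token!r}")
        mutations.append(Mutation(match.group(1), int(match.group(2)), match.group(3)))
    return mutations


def parse_cell(cell: str) -> list:
    """Parse a table cell holding comma-separated alternative combinations."""
    return [parse_combination(part) for part in cell.split(",")]


# The BE92-WU95 row from the table above.
combos = parse_cell("K135T-N145K-L226V, K135T-N145K-N262S")
sites = sorted({m.site for combo in combos for m in combo})
print(sites)
```

Collecting the distinct sites this way makes it easy to cross-reference a drift event against the site importance table.
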
Figure 5. Antigenic cartographies illustrating the key mutations driving 4 antigenic drift events: (a) SI87-BE89, (b) BE89-BE92, (c) BE92-WU95, and (d) WU95-SY97.