Majid Forghani, Michael Khachay.
Abstract
Evaluating the degree of antigenic similarity between influenza virus strains is highly important for vaccine production. The conventional way to measure this similarity relies on hemagglutination inhibition (HI) immunological assays: the antigenic distance between two strains is calculated from HI assay results, and such distances are usually visualized with an antigenic cartography method. A known drawback of the HI assay is that it is time-consuming and expensive. In this paper, we propose a novel approach to antigenic distance approximation based on deep learning in the feature spaces induced by hemagglutinin protein sequences and Convolutional Neural Networks (CNNs). To apply a CNN to the comparison of protein sequences, we use an encoding based on the physical and chemical characteristics of amino acids. By varying the (hyper)parameters of the CNN architecture, we find the most robust network. Further, we provide insight into the relationship between approximated antigenic distance and antigenicity by evaluating the network on the HI assay database for the H1N1 subtype. The results indicate that the best-trained network gives a high-precision approximation of the ground-truth antigenic distances and can be used as a good exploratory tool in practical tasks.
Keywords: antigenic distance; convolutional neural network; evolution; influenza; vaccine
Year: 2020 PMID: 32932748 PMCID: PMC7551508 DOI: 10.3390/v12091019
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Figure 1. General diagram of the proposed research.
Figure 2. Each HI assay entry includes identifiers of the test and reference viruses, date of experiment, and the measured titer.
Antigenic and primary sialic acid receptor-binding sub-domains in HA1, taken from [53].
| Epitope Name | Sub-Domain |
|---|---|
| antigenic site Ca | 137, 138, 139, 140, 141, 142, 166, 167, 168, 169, |
| antigenic site Cb | 69, 70, 71, 72, 73, 74 |
| antigenic site Sa | 124, 125, 153, 154, 155, 156, 157, 159, 160, 161, 162, 163, 164 |
| antigenic site Sb | 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195 |
| receptor-binding site | 94, 131, 133, 150, 152, 180, 187, 191, 223, 225 |
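As a rough illustration of how the sub-domains in the table could be used, the sketch below counts residue differences between two aligned HA1 sequences within each site. It assumes that the listed positions are 1-based indices into the aligned HA1 sequence; the function name `site_mismatches` and the toy sequences are hypothetical and not taken from the paper.

```python
# Positions copied from the table above; 1-based HA1 numbering is an assumption.
ANTIGENIC_SITES = {
    "Ca": [137, 138, 139, 140, 141, 142, 166, 167, 168, 169],
    "Cb": [69, 70, 71, 72, 73, 74],
    "Sa": [124, 125, 153, 154, 155, 156, 157, 159, 160, 161, 162, 163, 164],
    "Sb": [184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195],
    "RBS": [94, 131, 133, 150, 152, 180, 187, 191, 223, 225],
}


def site_mismatches(seq_a, seq_b):
    """Count differing residues per site for two aligned HA1 sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = {}
    for site, positions in ANTIGENIC_SITES.items():
        counts[site] = sum(
            1
            for p in positions
            if p <= len(seq_a) and seq_a[p - 1] != seq_b[p - 1]
        )
    return counts


# Toy usage with synthetic sequences (not real strains).
ref = "A" * 327
mutated = list(ref)
mutated[155 - 1] = "K"  # position 155 is listed under Sa
mutated[187 - 1] = "D"  # position 187 is listed under Sb and the receptor-binding site
print(site_mismatches(ref, "".join(mutated)))
# {'Ca': 0, 'Cb': 0, 'Sa': 1, 'Sb': 1, 'RBS': 1}
```

Because positions 187 and 191 appear in both Sb and the receptor-binding site, a single substitution there is counted under both keys.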
Figure 3. An example of an AAindex1 entry representing the hydrophobicity index. The values assigned to amino acids are highlighted in pink.
Figure 4. Variance ratios explained by the first 11 factors obtained by applying PCA to the AAindex1 database. Total explained variance is about 91%.
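The idea of keeping enough PCA factors to reach a given share of explained variance can be sketched as follows. This is only an illustration: the eigenvalues are made-up toy numbers, not values derived from AAindex1, and `components_for_variance` is a hypothetical helper name.

```python
def components_for_variance(eigenvalues, target):
    """Return (k, ratio): the smallest number of leading components whose
    cumulative explained-variance ratio reaches `target`, and that ratio."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value / total
        if cumulative >= target:
            return k, cumulative
    return len(eigenvalues), cumulative


# Toy eigenvalues (placeholders, not from the AAindex1 analysis).
k, explained = components_for_variance([4.0, 3.0, 2.0, 1.0], target=0.85)
print(k)  # 3
```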
Figure 5. The network input tensor represents the HA1 amino acid sequences of the test and reference viruses encoded by 11 synthetic indices from AAindex1.
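A minimal sketch of how such an input could be assembled is given below. The index values are random placeholders standing in for the 11 synthetic indices, and the layout (test and reference sequences stacked into a 2 x L x 11 nested list) is an assumption made for illustration; the caption does not specify the exact tensor layout.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_INDICES = 11  # number of synthetic indices mentioned in the caption

# PLACEHOLDER values: random numbers stand in for the real AAindex1-derived indices.
_rng = random.Random(0)
INDEX_TABLE = {
    aa: [_rng.uniform(-1.0, 1.0) for _ in range(N_INDICES)] for aa in AMINO_ACIDS
}
UNKNOWN = [0.0] * N_INDICES  # assumption: gaps and ambiguous residues map to zeros


def encode_sequence(seq):
    """Map a sequence to a nested list of shape (len(seq), N_INDICES)."""
    return [list(INDEX_TABLE.get(aa, UNKNOWN)) for aa in seq]


def encode_pair(test_seq, ref_seq):
    """Stack two aligned sequences into a nested list of shape (2, L, N_INDICES)."""
    if len(test_seq) != len(ref_seq):
        raise ValueError("sequences must be aligned to the same length")
    return [encode_sequence(test_seq), encode_sequence(ref_seq)]


# Toy usage with short synthetic fragments.
pair = encode_pair("DTLCIGYHA", "DTICIGYHA")
print(len(pair), len(pair[0]), len(pair[0][0]))  # 2 9 11
```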
Figure 6. The layers used in the examined networks.
Figure 7. Common architecture of the tested networks, M1–M32.
Hyperparameters of the examined networks M1–M32.
| Model Name | Number of | Number of | Kernel Size | Total Number |
|---|---|---|---|---|
| M1 | 1 | 32 | | 109 K |
| M2 | 1 | 32 | | 72 K |
| M3 | 1 | 64 | | 428 K |
| M4 | 1 | 64 | | 274 K |
| M5 | 1 | 128 | | 1.7 M |
| M6 | 1 | 128 | | 1.1 M |
| M7 | 1 | 256 | | 6.7 M |
| M8 | 1 | 256 | | 4.2 M |
| M9 | 2 | 32 | | 29 K |
| M10 | 2 | 32 | | 24 K |
| M11 | 2 | 64 | | 104 K |
| M12 | 2 | 64 | | 81 K |
| M13 | 2 | 128 | | 397 K |
| M14 | 2 | 128 | | 293 K |
| M15 | 2 | 256 | | 1.5 M |
| M16 | 2 | 256 | | 1.1 M |
| M17 | 3 | 32 | | 19 K |
| M18 | 3 | 32 | | 23 K |
| M19 | 3 | 64 | | 67 K |
| M20 | 3 | 64 | | 77 K |
| M21 | 3 | 128 | | 250 K |
| M22 | 3 | 128 | | 277 K |
| M23 | 3 | 256 | | 957 K |
| M24 | 3 | 256 | | 1 M |
| M25 | 4 | 32 | | 19 K |
| M26 | 4 | 32 | | 26 K |
| M27 | 4 | 64 | | 67 K |
| M28 | 4 | 64 | | 89 K |
| M29 | 4 | 128 | | 249 K |
| M30 | 4 | 128 | | 326 K |
| M31 | 4 | 256 | | 957 K |
| M32 | 4 | 256 | | 1.2 M |
Figure 8. Architecture of SqueezeNet used as a baseline network.
Figure 9. Experiment in which all models were trained on the unrestricted prehistory.
Figure 10. Experiment in which the models were trained on a five-year prehistory.
Figure 11. Experiment in which the models were trained on a four-year prehistory.
Figure 12. Experiment in which the models were trained on a three-year prehistory.
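The prehistory-based experiments described in these captions can be sketched as a simple temporal split. The sketch assumes that a k-year prehistory means training on the k calendar years immediately preceding the test year, and that each HI entry carries a `year` field; both the field names and the function `temporal_split` are hypothetical.

```python
def temporal_split(entries, test_year, prehistory=None):
    """Split HI entries into (train, test) for one test year.

    prehistory=None keeps every earlier year (unrestricted prehistory);
    prehistory=k keeps only the k years immediately before test_year.
    """
    first_year = None if prehistory is None else test_year - prehistory
    train = [
        e
        for e in entries
        if e["year"] < test_year and (first_year is None or e["year"] >= first_year)
    ]
    test = [e for e in entries if e["year"] == test_year]
    return train, test


# Synthetic entries, one per year; field names and values are placeholders.
entries = [
    {"test_virus": f"T{y}", "reference_virus": f"R{y}", "year": y, "titer": 160}
    for y in range(1998, 2006)
]
train, test = temporal_split(entries, test_year=2005, prehistory=3)
print([e["year"] for e in train], [e["year"] for e in test])
# [2002, 2003, 2004] [2005]
```

Calling `temporal_split(entries, 2005)` without `prehistory` returns all entries from 1998 to 2004 as the training set, which corresponds to the unrestricted setting.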
Top five models for the temporal experiments. Average MAE values are highlighted.
| Model Name | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | Mean | STD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Experiment | | | | | | | | | | | |
| M23 | 0.965 | 0.719 | 0.939 | 0.900 | 0.673 | 0.943 | 1.224 | 1.020 | 1.280 | | 0.165 |
| M11 | 1.030 | 0.713 | 0.961 | 0.910 | 0.633 | 0.900 | 1.299 | 1.013 | 1.097 | | 0.198 |
| M15 | 0.947 | 0.818 | 0.999 | 0.950 | 0.655 | 0.967 | 1.122 | 1.068 | 1.070 | | 0.143 |
| M13 | 1.027 | 0.825 | 0.971 | 0.948 | 0.615 | 0.935 | 1.241 | 1.034 | 1.065 | | 0.172 |
| M21 | 0.945 | 0.870 | 1.046 | 0.876 | 0.645 | 1.084 | 1.187 | 1.079 | 0.994 | | 0.159 |
| SqN | 1.007 | 0.985 | 0.955 | 0.978 | 0.967 | 1.135 | 1.413 | 1.140 | 1.141 | 1.080 | 0.139 |
| Experiment | | | | | | | | | | | |
| M23 | 0.890 | 0.836 | 0.964 | 0.924 | 0.657 | 1.039 | 1.186 | 1.092 | 1.134 | | 0.165 |
| M13 | 1.032 | 0.887 | 1.063 | 0.919 | 0.656 | 0.941 | 1.135 | 1.074 | 1.080 | | 0.146 |
| M10 | 1.004 | 1.034 | 1.008 | 0.962 | 0.528 | 0.943 | 1.197 | 1.119 | 1.005 | | 0.186 |
| M15 | 0.962 | 0.975 | 1.034 | 0.911 | 0.633 | 0.990 | 1.170 | 1.082 | 1.127 | | 0.157 |
| SqN | 0.870 | 0.830 | 0.994 | 0.985 | 0.590 | 1.048 | 1.447 | 1.061 | 1.128 | | 0.220 |
| Experiment | | | | | | | | | | | |
| SqN | 0.877 | 1.110 | 0.949 | 0.887 | 0.618 | 0.882 | 1.330 | 1.133 | 1.038 | | 0.191 |
| M24 | 0.992 | 0.798 | 0.975 | 0.921 | 0.893 | 1.054 | 1.519 | 1.077 | 1.129 | 1.040 | 0.206 |
| M10 | 0.992 | 0.986 | 1.102 | 0.926 | 0.644 | 1.090 | 1.556 | 1.066 | 1.025 | 1.043 | 0.237 |
| M16 | 1.029 | 0.780 | 1.067 | 0.933 | 0.871 | 1.054 | 1.465 | 1.105 | 1.147 | 1.050 | 0.195 |
| M15 | 1.003 | 0.834 | 0.957 | 0.944 | 0.771 | 0.954 | 1.796 | 1.090 | 1.143 | 1.055 | 0.301 |
| Experiment | | | | | | | | | | | |
| SqN | 1.008 | 0.961 | 0.982 | 0.978 | 0.631 | 1.166 | 1.549 | 1.251 | 1.084 | 1.068 | 0.235 |
| M16 | 0.945 | 0.748 | 0.985 | 0.934 | 0.850 | 1.281 | 1.618 | 1.149 | 1.125 | 1.070 | 0.262 |
| M24 | 0.950 | 0.830 | 0.959 | 0.942 | 1.044 | 1.301 | 1.476 | 1.067 | 1.072 | 1.071 | 0.200 |
| M23 | 0.924 | 0.807 | 0.928 | 0.945 | 0.869 | 1.290 | 1.671 | 1.114 | 1.166 | 1.079 | 0.271 |
| M9 | 1.015 | 0.813 | 0.102 | 0.959 | 0.643 | 0.319 | 1.788 | 1.080 | 1.025 | 1.085 | 0.323 |
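The Mean and STD columns can be recomputed from the per-year MAE values. The short check below uses the first SqN row of the table; the population standard deviation (`statistics.pstdev`) reproduces the reported STD for that row, which suggests, though the source does not state it, that the population rather than the sample formula was used.

```python
from statistics import fmean, pstdev

# Per-year MAE values (2001-2009) from the first SqN row of the table.
sqn_mae = [1.007, 0.985, 0.955, 0.978, 0.967, 1.135, 1.413, 1.140, 1.141]

mean = fmean(sqn_mae)
std = pstdev(sqn_mae)  # population standard deviation
print(f"{mean:.3f} {std:.3f}")  # 1.080 0.139
```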
Figure 13. The results of 10-fold cross-validation for models M23 and SqueezeNet.
Cross-validation results in terms of averaged Mean Absolute Error.
| Model Name | Average MAE | STD |
|---|---|---|
| Titer | ||
| M23 | 0.58 | 0.020 |
| SqN | 0.627 | 0.024 |
| Virus | ||
| M23 | 0.871 | 0.154 |
| SqN | 0.895 | 0.154 |
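The two evaluation settings in the table can be sketched with plain and grouped k-fold splits. This is a sketch under an assumption: "Titer" is read here as folds formed over individual HI entries and "Virus" as folds formed over viruses, so that all entries of a held-out virus fall into the same fold. The source does not spell this out, and all function names are hypothetical.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle indices 0..n_items-1 and deal them into k folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grouped_folds(groups, k=10, seed=0):
    """Build k folds of entry indices so that each group stays in one fold."""
    unique = sorted(set(groups))
    folds = []
    for group_fold in k_fold_indices(len(unique), k, seed):
        members = {unique[g] for g in group_fold}
        folds.append([i for i, g in enumerate(groups) if g in members])
    return folds


# Toy usage: six entries belonging to three synthetic viruses.
viruses = ["A", "A", "B", "C", "C", "C"]
entry_folds = k_fold_indices(len(viruses), k=3)  # entry-level split
virus_folds = grouped_folds(viruses, k=3)        # virus-level split
print(mean_absolute_error([1.0, 2.0], [2.0, 4.0]))  # 1.5
```

With the grouped split, a model is always tested on viruses it has not seen during training, which is a harder setting than the entry-level split.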
Figure 14. Titer: linear regression.
Figure 15. Virus: linear regression.