Matthieu Van Tilbeurgh1, Katia Lemdani1, Anne-Sophie Beignon1, Catherine Chapon1, Nicolas Tchitchek2, Lina Cheraitia1, Ernesto Marcos Lopez1, Quentin Pascal1, Roger Le Grand1, Pauline Maisonnasse1, Caroline Manet1.
Abstract
Vaccines represent one of the major advances of modern medicine. Despite the many successes of vaccination, continuous efforts to design new vaccines are needed to fight "old" pandemics, such as tuberculosis and malaria, as well as emerging pathogens, such as Zika virus and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Vaccination aims to achieve sterilizing immunity; however, assessing vaccine efficacy remains challenging, which underscores the need for a better understanding of protective immune responses. Identifying reliable predictive markers of immunogenicity can help to select and develop promising vaccine candidates during early preclinical studies and can lead to improved, personalized vaccination strategies. A systems biology approach is increasingly being adopted to address these major challenges using multiple high-dimensional technologies combined with in silico models. Although the goal is to develop predictive models of vaccine efficacy in humans, applying this approach to animal models empowers basic and translational vaccine research. In this review, we provide an overview of vaccine immune signatures in preclinical models, as well as in target human populations. We also discuss high-throughput technologies used to probe vaccine-induced responses, along with data analysis and computational methodologies applied to the predictive modeling of vaccine efficacy.
Keywords: high-throughput technologies; in vivo imaging; machine learning; preclinical models; predictive biomarkers; systems immunology; unsupervised analyses; vaccine signatures; vaccines
Year: 2021 PMID: 34205932 PMCID: PMC8226531 DOI: 10.3390/vaccines9060579
Source DB: PubMed Journal: Vaccines (Basel) ISSN: 2076-393X
Figure 1. Vaccine efficacy and safety are determined by interactions between innate and adaptive immunity. These interactions are shaped by host factors and can be orientated by vaccine properties.
Figure 2. From individuals to single cells: integrating multi-level data into comprehensive vaccine signatures. Host factors and vaccine properties are important determinants of immune responses. Variations of these determinants, such as genetic polymorphisms, age, host microbiome or immunization procedure, thus condition the definition of vaccine signatures. Systems immunology enables the identification of biomarkers of vaccine responses at multiple scales, from whole-body to cellular factors. Diverse high-throughput technologies, including in vivo imaging, allow the characterization of vaccine immune signatures through various applications, such as immune-cell tracking, cell immunophenotyping, and multiplex profiling. Combining and integrating data at different scales will be of great value in identifying extensive vaccine immune signatures. (a) Positron emission tomography-computed tomography (PET-CT) imaging of the YF preM mRNA vaccine in NHPs [41]. (b) Near-infrared fluorescence (NIR) imaging to follow an anti-Langerin-HIVGag fusion vaccine from the injection site to the draining lymph node [42]. (c) Magnetic resonance imaging (MRI) of a DC-based vaccine in the lymph node [43]. (d) In vivo tracking of Langerhans cells within the skin by fibered confocal fluorescence microscopy (FCFM) [44]. (e) Tracking of fluorescently labeled HIV-1 envelope glycoprotein trimers in lymph nodes by immunohistofluorescence (IHF) [45].
Principle, advantages and drawbacks of common machine learning algorithms.
| Machine Learning Algorithm | Principle | Advantages | Drawbacks |
|---|---|---|---|
| Linear regression | It assumes a linear relationship between the input variables and the output, and models this relationship by fitting a linear equation to the observed data. | Simplicity; ease of implementation | It assumes that the input variables are independent; it risks generating biased models due to oversimplification |
| Linear discriminant analysis (LDA) | It identifies the class to which each sample belongs. Certain statistical properties of the data are first calculated and then substituted into the LDA equation: the mean and variance for a single input, or the means and covariance matrix for multiple inputs. | Simplicity; robust and interpretable classification results | It does not perform well when the discriminant information is not present in the mean; it cannot be applied to non-linear problems |
| Random Forest | It builds a number of decision trees on bootstrapped training sets and, at each split, considers only a random sample of m predictors as split candidates from the full set of p predictors, to overcome the problem of high variance. As a result, many splits do not consider the strongest predictor, which gives the other predictors a better chance. This process can be thought of as decorrelating the trees, thereby making the average of the resulting trees less variable and hence more accurate and reliable. | Reduced variance; accurate and reliable; it works well for both classification and regression problems | It requires considerable computational power and time for training; it is difficult to interpret |
| Support vector machine | It maps a non-linearly separable problem into a higher-dimensional space in which the problem becomes linearly separable. This is accomplished using various types of so-called kernel functions. Classification is then performed by finding the hyperplane that best separates the classes of samples. | It can solve complex problems given an appropriate kernel function; lower risk of overfitting | Choosing the appropriate kernel function is not easy; it does not work well with large or noisy datasets |
| Discriminant analysis via mixed integer programming (DAMIP) | It is a classification model based on a very powerful supervised-learning approach used primarily in the biomedical field. It is a discrete support vector machine coupled with a powerful embedded feature-selection module [ | It reduces noise and errors; it applies constraints that result in superior classification accuracy; universally consistent; it handles imbalanced data well | This algorithm is mainly used in the biomedical field, and little is known about its drawbacks in the literature |
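The LDA row of the table above can be made concrete for a single input. The following is a minimal sketch, not the method of any study cited in this review: it estimates the class means, the class priors and a pooled variance, then substitutes them into the standard single-input linear discriminant score. The function names and the toy values are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def fit_lda_1d(xs, labels):
    """Estimate per-class means, per-class priors and a pooled variance."""
    groups = defaultdict(list)
    for x, label in zip(xs, labels):
        groups[label].append(x)
    n = len(xs)
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    priors = {k: len(v) / n for k, v in groups.items()}
    # LDA assumes one variance shared by all classes (pooled estimate).
    squared_dev = sum((x - means[k]) ** 2 for k, v in groups.items() for x in v)
    variance = squared_dev / (n - len(groups))
    return means, priors, variance


def predict_lda_1d(x, means, priors, variance):
    """Return the class with the largest linear discriminant score."""
    def score(k):
        return (x * means[k] / variance
                - means[k] ** 2 / (2 * variance)
                + math.log(priors[k]))
    return max(means, key=score)


# Hypothetical toy data: one early marker value per vaccinee and a
# binary label for the later response. These are not real measurements.
marker = [1.0, 1.2, 0.8, 1.1, 3.0, 3.2, 2.8, 3.1]
response = ["low", "low", "low", "low", "high", "high", "high", "high"]

model = fit_lda_1d(marker, response)
print(predict_lda_1d(0.9, *model))
print(predict_lda_1d(3.05, *model))
```

With equal priors, as in this toy example, the decision boundary falls midway between the two class means, which illustrates why LDA struggles when the discriminant information is not carried by the mean.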
Machine learning methods to predict vaccine immunogenicity and efficacy. Different machine learning algorithms can be used. The quality of each model needs to be evaluated, and different metrics are available to assess model performance, such as accuracy (defined as the number of correct predictions divided by the total number of input data), Area Under the Receiver Operating Characteristic curve (AUROC), or Root Mean Squared Error for regressions. The appropriate metric depends on the machine learning method itself. (Ab, antibody; ClaNC, classification to nearest centroid; DAMIP, discriminant analysis via mixed integer programming; HAI, hemagglutination-inhibition; CHMI, Controlled Human Malaria Infection; * accuracy unless otherwise mentioned).
| Vaccine | Vaccinees | Predicted Responses | Predictors | Machine Learning Method | Performance * | Reference |
|---|---|---|---|---|---|---|
| Yellow fever vaccine (YF-17D) | Healthy adults | The magnitude of the activated CD8+ T cell and neutralizing Ab responses | Early blood transcriptional signatures | ClaNC and DAMIP | Up to 90% and 100% respectively | [ |
| Seasonal Trivalent Inactivated influenza Vaccine (TIV) | Patients 50–89 years old suffering from multiple chronic medical conditions | The magnitude of plasma HAI Ab response | Baseline signatures among 26 input continuous or categorical variables inc. previous vaccination, low grade chronic inflammation, chronic infections, blood cell counts | Neural network (multilayer perceptron (MLP), radial-basis function network (RBFN) and probabilistic network (PNN)) and Logistic regression | 72.5% of average hit rate across 10 samples | [ |
| Seasonal Trivalent Inactivated influenza Vaccine (TIV) | Healthy adults | The magnitude of plasma HAI Ab response | Early blood transcriptional signatures | DAMIP | Up to 90% | [ |
| Seasonal Trivalent Inactivated influenza Vaccine (TIV) | Healthy adults, inc. young (20–30 years) and older subjects (60 to 89 years) | The magnitude of plasma HAI Ab response | Baseline blood transcriptional, cytokines and cell populations signatures | Logistic regression | 84% | [ |
| Seasonal Trivalent Inactivated influenza Vaccine (TIV) and pandemic H1N1 (pH1N1) vaccine | Healthy adults | The magnitude of the Ab response | Baseline HAI titer, blood cell populations, transcripts and pathways signatures | Diagonal linear discriminant analysis (for cell frequency data and when cell frequency and pathway status were combined); or partial least squares (for data dimension reduction due to the large number of genes) followed by linear discriminant analysis (PLS-LDA) for transcript data alone | 0.86 of AUROC | [ |
| Seasonal Trivalent Inactivated influenza Vaccine (TIV) over 5 seasons | Human adults, inc. elderly subjects (>65 years) | The magnitude of plasma HAI Ab response | Early blood transcriptional signatures | DAMIP and artificial neural network classifier | >80% | [ |
| Seasonal Trivalent Inactivated influenza Vaccine (TIV) | Healthy adults (50 to 74 years) | The magnitude of the B-cell ELISPOT and plasma HAI Ab responses | Early blood cell composition, mRNA-Seq, and DNA methylation signatures | The ensemble learner (inc. Generalized linear models, Recursive Partitioning, and Regression Trees), and random forest models | 0.64–0.79 of AUROC | [ |
| Seasonal Trivalent Inactivated influenza Vaccine (TIV) | Healthy adults | The magnitude of plasma HAI Ab response | Baseline HAI titer and blood transcriptional signatures | Gaussian Mixture Model (GMM) | R2 = 0.64 for the correlation between observed and | [ |
| Seasonal Trivalent Inactivated influenza Vaccine (TIV) | Healthy adults | The magnitude of the Ab response | Early blood transcriptional signatures | Logistic Multiple Network-constrained Regression | 69% | [ |
| Seasonal Trivalent Inactivated influenza Vaccine (TIV) over 8 seasons | Healthy adults | The magnitude of the specific Ab response | Baseline blood cell populations signatures | 128 machine learning algorithms suitable for classification using Sequential Iterative Modeling “OverNight” (SIMON), inc. Diagonal Discriminant Analysis, Partial Least Squares, Linear Discriminant Analysis, Logic Regression, Neural Network, Random Forest | Up to 0.92 of AUROC | [ |
| Seasonal Trivalent Inactivated influenza Vaccine (TIV) given transcutaneously, intradermally or intramuscularly | Healthy adults | The magnitude of the specific CD8+ T cell and Ab responses | Early blood transcriptional and serum cytokines signatures | Logistic regression | 0.93 to 0.96 of AUROC | [ |
| Seasonal Trivalent Inactivated influenza Vaccine (TIV) and 23-valent pneumococcal polysaccharide vaccine | Old patients (>65 years) with chronic kidney disease with or without non-dialysis | The magnitude of the HAI Ab and anti-PnPS IgG responses | Baseline signatures among 30 input continuous or categorical variables inc. previous vaccinations, low grade chronic inflammation, chronic infections, blood cell counts | Multivariable linear regression model | | [ |
| RTS,S malaria vaccine | Healthy adults | The protection against CHMI | Early blood transcriptional signatures | DAMIP | >80% | [ |
| Candidate malaria vaccine composed of self-assembling protein nanoparticles presenting the malarial circumsporozoite protein (CSP) adjuvanted with three different liposomal formulations: liposome plus Alum, liposome plus QS21, or both | Rhesus macaques | Adjuvant condition | Vaccine-induced immune response signatures among many variables inc. serology, fluorospot, ICS from blood, liver, LN and spleen | Random forest followed by linear regression analysis | 92% | [ |
| Live-attenuated varicella zoster virus (VZV) vaccine | Healthy adults, inc. younger (25–40 years) and older (60–79 years) | The magnitude of the specific T and IgG responses | Early blood transcriptional, metabolite clusters, cytokines, and cell populations signatures | Multivariate regression model (partial least squares) | | [ |
| Monovalent oral polio vaccine type 3 (mOPV3) | Infants aged 6–11 months | Seroconversion or shedding of vaccine virus as a marker of vaccine “take” | Baseline enteric pathogens, blood cell populations, and plasma cytokines signatures | Random forest | 58% | [ |
| Two distinct live attenuated Tularemia vaccines administered by scarification | Healthy humans | The magnitude of the specific Ab and activated CD4 and CD8 T cell responses | Early blood transcriptional signatures | Logistic regression | 26% of mean misclassification error | [ |
| rVSV-ZEBOV | Healthy adults | The magnitude of the Ab response | Early blood transcriptional, plasma cytokine and cell populations signatures | Sparse partial least-squares followed by multivariable linear regression | 0.77 of root square residuals leave-one-out explaining 55% of the variability | [ |
| DNA/rAd5 HIV-1 preventive candidate vaccine | Healthy adults | HIV infection | Magnitude and quality of CD4 and CD8 T cells | PCA followed by Cox proportional hazards regression model, and Logistic regression with lasso | Up to 0.75 of AUROC | [ |
| Seven preventive HIV-1 vaccine regimens (inc. DNA, NYVAC, ALVAC, MVA, AIDSVAX) | Healthy adults | The magnitude of long-term immune responses | Baseline demographic variables and peak immune responses | Regularized random forest and linear regression models | R = 0.91 for the correlation between observed and predicted data | [ |
| 41 different vaccine vectors all expressing the same antigen | Mice | The quality of late T-cell responses | Early transcriptome of dendritic cells | Random forest | Up to 98% | [ |
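The performance metrics named in the caption of the table above (accuracy, AUROC and Root Mean Squared Error) can be computed in a few lines. The sketch below uses plain Python and hypothetical toy values. It is meant only to illustrate the definitions and does not reproduce the evaluation pipeline of any listed study. AUROC is computed here through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
import math


def accuracy(y_true, y_pred):
    """Number of correct predictions divided by the total number of samples."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auroc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data: 1 = responder, 0 = non-responder.
observed = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.7]          # classifier scores
predicted = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print(accuracy(observed, predicted))
print(auroc(observed, scores))
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Accuracy depends on the chosen classification threshold (0.5 here, an arbitrary assumption), whereas AUROC summarizes ranking quality across all thresholds. This is one reason the two are not directly comparable across the rows of the table.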