Annelies Emmaneel, Delfien J Bogaert, Sofie Van Gassen, Simon J Tavernier, Melissa Dullaers, Filomeen Haerynck, Yvan Saeys.
Abstract
Common variable immunodeficiency (CVID) is one of the most frequently diagnosed primary antibody deficiencies (PADs), a group of disorders characterized by a decrease in one or more immunoglobulin (sub)classes and/or impaired antibody responses caused by inborn defects in B cells in the absence of other major immune defects. CVID patients suffer from recurrent infections and disease-related, non-infectious complications such as autoimmune manifestations, lymphoproliferation, and malignancies. A timely diagnosis is essential for optimal follow-up and treatment. However, CVID is by definition a diagnosis of exclusion, thereby covering a heterogeneous patient population and making it difficult to establish a definite diagnosis. To aid the diagnosis of CVID patients and distinguish them from patients with other PADs, we developed a machine learning pipeline that performs automated diagnosis based on flow cytometric immunophenotyping. Using this pipeline, we analyzed the immunophenotypic profile in a pediatric and adult cohort of 28 patients with CVID, 23 patients with idiopathic primary hypogammaglobulinemia, 21 patients with IgG subclass deficiency, six patients with isolated IgA deficiency, one patient with isolated IgM deficiency, and 100 unrelated healthy controls. Flow cytometry analysis is traditionally done by manual identification of the cell populations of interest. Yet, this approach has severe limitations, including subjectivity of the manual gating and bias toward known populations. To overcome these limitations, we here propose an automated computational flow cytometry pipeline that successfully distinguishes CVID phenotypes from other PADs and healthy controls.
Compared to the traditional, manual analysis, our pipeline is fully automated: it performs automated quality control, data pre-processing, and population identification (gating), and derives features from these populations to build a machine learning classifier that distinguishes CVID from other PADs and healthy controls. This results in a more reproducible flow cytometry analysis and improves the diagnosis compared to manual analysis: our pipelines achieve an average balanced accuracy of 0.93 (±0.07), whereas the manually extracted populations yield an average balanced accuracy of 0.72 (±0.23).
Keywords: CVID; FlowSOM; PAD; computational pipeline; flow cytometry
Year: 2019 PMID: 31543876 PMCID: PMC6730493 DOI: 10.3389/fimmu.2019.02009
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
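The abstract reports performance as balanced accuracy. As a minimal sketch, assuming the usual definition (the mean of the per-class recalls), the metric can be computed as below. The function name and the toy labels are hypothetical and are not study data; they only illustrate why the metric suits a cohort in which the classes differ in size.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every class weighs equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (NOT study data): 2 CVID patients, 6 healthy controls.
y_true = ["CVID", "CVID", "HC", "HC", "HC", "HC", "HC", "HC"]
y_pred = ["CVID", "HC", "HC", "HC", "HC", "HC", "HC", "HC"]

# Plain accuracy is dominated by the larger class.
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
```

In this toy case one of the two CVID samples is missed: plain accuracy is 7/8, while balanced accuracy is the mean of the CVID recall (1/2) and the control recall (6/6), which penalizes the missed minority-class sample more strongly.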
Figure 1. Left: FlowSOM tree for the PBMCs panel 1. The background coloring indicates the metaclustering. Right: FlowSOM tree where the cells from healthy control PIDHC011 were mapped onto the original FlowSOM tree for panel 1. The colors of the nodes correspond to the manually gated labels.
Figure 2. Top left: FlowSOM tree for the B cell subset panel 2 with expressed CD markers displayed. The background coloring indicates the metaclustering. Top right: FlowSOM tree with expression of immunoglobulins displayed. Bottom: FlowSOM tree where the cells from healthy control PIDHC011 were mapped onto the original FlowSOM tree for panel 2. The colors of the nodes correspond to the manually gated labels.
Figure 3. Left: FlowSOM tree for the T cell subset panel 3. The background coloring indicates the metaclustering. Right: FlowSOM tree where the cells from healthy control PIDHC011 were mapped onto the original FlowSOM tree for panel 3. The colors of the nodes correspond to the manually gated labels.
Figure 4. Overview of the balanced accuracy scores of the different classification models (performed with 21-fold cross-validation). Color indicates whether feature selection was applied; shape indicates which classification model was used. Overall, FlowSOM features clearly improve on features extracted from the manual gating.
Overview of the most frequently misclassified patients.
| Patient (diagnosis) | Two-class models | Three-class models | Remarks |
| --- | --- | --- | --- |
| PID030 (CVID) | 52 | 45 | No explanation found yet. |
| PID040 (CVID) | 70 | 73 | Syndromic primary immunodeficiency initially presenting with CVID phenotype. |
| PID041 (CVID) | 59 | 63 | Syndromic primary immunodeficiency initially presenting with CVID phenotype. First degree family member of PID040. |
| PID043 (Other PAD) | 0 | 64 | Early loss to follow-up, no information on progression of disease phenotype. |
| PID053 (CVID) | 111 | 110 | No explanation found yet. |
| PID054 (CVID) | 104 | 100 | No explanation found yet. First degree family member of PID053. |
| PID055 (CVID) | 78 | 70 | Presumably secondary CVID after autoimmune-induced subacute liver failure with need of liver transplantation. |
| PID060 (CVID) | 42 | | No explanation found yet. |
| PID257 (CVID) | 38 | | No explanation found yet. |
| PID285 (CVID) | 39 | | No explanation found yet. |
All the patients listed in the first column were misclassified in more than one third of the 112 possible model combinations (built with or without feature selection, with either an SVM or a random forest classifier, and with one of six different FlowSOM feature sets or the manually selected populations). The second column shows the results for all two-class models, while the third column shows the results for the three-class models. The last column lists remarks that may explain the frequent misclassification. A number shown in red indicates that the patient was not misclassified in more than one third of the 112 models.
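The one-third criterion in the table note can be checked with simple arithmetic: one third of 112 is about 37.3, so a count of 38 or more exceeds it. The sketch below assumes the criterion is applied to each column separately, which is how the note about red numbers reads; the function name is hypothetical, and only rows that list both counts are used.

```python
N_MODELS = 112  # number of model combinations stated in the table note


def frequently_misclassified(count, n_models=N_MODELS):
    """True if count exceeds one third of n_models (integer-only comparison)."""
    return 3 * count > n_models


# (two-class count, three-class count), copied from table rows listing both.
counts = {
    "PID030": (52, 45),
    "PID040": (70, 73),
    "PID043": (0, 64),
    "PID053": (111, 110),
}

# One flag per column for every patient.
flags = {
    pid: tuple(frequently_misclassified(c) for c in pair)
    for pid, pair in counts.items()
}
```

Under this reading, PID043 exceeds the threshold only in the three-class models (64), while its two-class count of 0 does not.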
Figure 5. (A1-3) t-SNE result of the patient population with manually gated cell populations of the most important features determined by the SVM model for the three panels individually. (B1-3) t-SNE result of the patient population with the total features of FlowSOM of the most important features determined by the SVM model for the three panels individually. The perplexity for both t-SNEs was set to 15. The features used were first z-scored to eliminate age-linked immune changes.
Figure 6. Boxplots calculated for the most important features in the support vector machines built for the three-way classification of all FlowSOM features for either panel 1, 2, or 3. The colored points indicate the values on which the boxplots were built. The features used were first z-scored to eliminate age-linked immune changes.
Figure 7. Boxplots calculated for the features selected by the feature selection step in the three-class classification model calculated on all FlowSOM features for either panel 1, 2, or 3. This feature selection step is performed before the classification step in the automated models. The first three features selected with the lowest p-value are displayed for both panels. The colored points indicate the values on which the boxplots were built. The features used were first z-scored to eliminate age-linked immune changes.
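The captions state that a z-score was applied to remove age-linked immune changes but do not say how. One possible reading, sketched below purely as an assumption, is to standardize each feature value against reference samples (for example healthy controls) from the same age group. The function name, the age groups, and the toy values are all hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def zscore_by_reference(values, groups, is_reference):
    """Standardize each value against the reference samples of its own group.

    ASSUMPTION: reference samples are e.g. age-matched healthy controls;
    the source does not specify the reference population.
    """
    reference = {}
    for value, group, ref in zip(values, groups, is_reference):
        if ref:
            reference.setdefault(group, []).append(value)
    stats = {}
    for group, vals in reference.items():
        if len(vals) < 2:
            raise ValueError(f"group {group!r} needs at least 2 reference samples")
        sd = stdev(vals)  # sample standard deviation
        if sd == 0:
            raise ValueError(f"group {group!r} has zero reference variance")
        stats[group] = (mean(vals), sd)
    missing = set(groups) - set(stats)
    if missing:
        raise ValueError(f"no reference samples for groups: {sorted(missing)}")
    return [(v - stats[g][0]) / stats[g][1] for v, g in zip(values, groups)]


# Hypothetical toy feature (NOT study data): three controls and one patient per group.
values = [10, 20, 30, 40, 4, 6, 8, 2]
groups = ["child"] * 4 + ["adult"] * 4
is_reference = [True, True, True, False] * 2

z = zscore_by_reference(values, groups, is_reference)
```

Here the child patient (40) lies two reference standard deviations above the child controls and the adult patient (2) two below the adult controls, so the two become comparable despite the different raw scales of their age groups.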