Christina Adamichou1, Irini Genitsaridi1, Dionysis Nikolopoulos2, Myrto Nikoloudaki1, Argyro Repa1, Alessandra Bortoluzzi3, Antonis Fanouriakis2,4, Prodromos Sidiropoulos1,5, Dimitrios T Boumpas2,6, George K Bertsias7,5.
Abstract
OBJECTIVES: Diagnostic reasoning in systemic lupus erythematosus (SLE) is a complex process reflecting the probability of disease at a given timepoint against competing diagnoses. We applied machine learning in well-characterised patient data sets to develop an algorithm that can aid SLE diagnosis.
Keywords: autoantibodies; autoimmune diseases; lupus erythematosus, systemic
Year: 2021 PMID: 33568388 PMCID: PMC8142436 DOI: 10.1136/annrheumdis-2020-219069
Source DB: PubMed Journal: Ann Rheum Dis ISSN: 0003-4967 Impact factor: 19.103
Figure 1. Schematic overview of the methodology for developing a machine learning-based diagnostic model for SLE. We used a discovery cohort of 802 randomly selected adults with SLE or control rheumatologic diseases (1:1 ratio) to prepare 20 clinically selected panels of classification criteria items (both in their original version and deconvoluted into subitems in the case of composite items) and non-criteria features. Two machine learning methods were applied for feature selection and model construction for each panel: (A) Random Forests (RF) and (B) Least Absolute Shrinkage and Selection Operator (LASSO) followed by logistic regression (LASSO-LR). The best model (highest accuracy in the 10-fold cross-validation process) was further tested in an independent dataset of 512 patients with systemic lupus erythematosus (SLE) and 143 disease controls (validation cohort). AUC, area under the curve; CV, cross-validation; ROC, receiver operating characteristic.
Figure 2. A Least Absolute Shrinkage and Selection Operator-logistic regression (LASSO-LR) model shows high discriminating capacity for SLE against competing rheumatological diseases. (A) A LASSO-LR model comprising 14 clinical and serological features showed the highest accuracy for SLE in the 10-fold cross-validation runs from the discovery cohort. The plot illustrates the features associated with increased likelihood for SLE as compared with control rheumatological diseases along with the corresponding effect sizes (OR; 95% CI, p value). All model parameters are treated as dichotomous (ie, present=1, absent=0) in the LR equation as follows: F(x)=Intercept + (1.80×mucosal ulcers¹) + (2.96×synovitis¹) + (1.83×serositis¹) + (3.66×immunologic disorder²) + (4.42×antinuclear antibodies (ANA)³) + (2.13×alopecia⁴) + (2.17×neurologic disorder⁴) + (4.25×malar and/or maculopapular rash³) + (2.58×subacute cutaneous lupus erythematosus (SCLE) and/or discoid lupus erythematosus (DLE)³) + (1.82×leucopenia³) + (6.46×thrombocytopenia and/or autoimmune haemolytic anaemia (AIHA)³) + (6.63×low C3 and C4³) – (1.45×interstitial lung disease (ILD)⁵); ¹defined according to the ACR 1997 classification criteria, ²defined according to the ACR 1997 criteria modified to include also positive anti-β2 glycoprotein IgG or IgM antibodies, ³defined according to the EULAR/ACR 2019 classification criteria, ⁴defined according to the SLICC 2012 classification criteria, ⁵see online supplemental table S2 for definition. (B) The LASSO-LR model presented in (A) was further evaluated in an external (validation) cohort of 512 patients with SLE and 143 disease controls. The graph represents the receiver operating characteristic curve with a calculated area under the curve of 0.981, indicating an excellent capacity of the model to discriminate SLE from disease controls.
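The equation in panel (A) can be turned into a risk probability with the standard logistic transform, p = 1 / (1 + e^(−F(x))). The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the coefficients are copied from the caption as printed, only the terms that appear in the caption are included, and the intercept is not reported here, so the caller must supply it. The names `COEFFICIENTS`, `linear_predictor` and `sle_risk_probability` are illustrative.

```python
import math

# Coefficients as printed in the figure caption; each feature is coded
# 1 if present and 0 if absent. Key names are illustrative.
COEFFICIENTS = {
    "mucosal_ulcers": 1.80,
    "synovitis": 2.96,
    "serositis": 1.83,
    "immunologic_disorder": 3.66,
    "ana": 4.42,
    "alopecia": 2.13,
    "neurologic_disorder": 2.17,
    "malar_or_maculopapular_rash": 4.25,
    "scle_or_dle": 2.58,
    "leucopenia": 1.82,
    "thrombocytopenia_or_aiha": 6.46,
    "low_c3_and_c4": 6.63,
    "interstitial_lung_disease": -1.45,  # the only negative term
}


def linear_predictor(present, intercept):
    """Return F(x) = intercept + sum of coefficients of the present features.

    `intercept` is an assumption of this sketch: the caption does not
    report its value, so it has to be passed in explicitly.
    """
    features = set(present)  # ignore accidental duplicates
    unknown = features - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown feature(s): {sorted(unknown)}")
    return intercept + sum(COEFFICIENTS[name] for name in features)


def sle_risk_probability(present, intercept):
    """Map F(x) to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(present, intercept)))
```

With a placeholder intercept of 0 and no features present, the function returns 0.5. Any real use would need the fitted intercept from the original publication.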
Figure 3. The Least Absolute Shrinkage and Selection Operator-logistic regression (LASSO-LR) model can generate SLE risk probabilities, which correspond to distinct diagnostic certainty levels and correlate with disease outcomes. (A) Bar plot representation of the fraction of patients with SLE and disease controls (validation cohort) according to increasing bins of predicted SLE risk probabilities (0%–14%, 15%–43%, 44%–86%, 87%–100%) calculated by the LASSO-LR model shown in figure 2. Superimposed are the diagnostic accuracies (blue-coloured) corresponding to the rates of correct classification of disease controls against patients with SLE in the lower two probability bins (0%–14%, 15%–43%), and of patients with SLE against disease controls in the higher two probability bins (44%–86%, 87%–100%). Results are averages (±SD for patient fractions or 95% CI for the accuracy metric) calculated from randomly generated, non-overlapping subsets of patients with SLE (seven subsets each containing 73 or 74 patients) and disease controls (two subsets containing 71 and 72 patients) from the validation cohort. The majority of control (average 80%) and SLE (average 82%) patients belong to the lowest (0%–14%) and the highest (87%–100%) risk probability groups, respectively. Accordingly, accuracy was highest in these two extreme risk groups but dropped in the intermediate ones (15%–43%, 44%–86%). (B) Bar plot representation of the relative proportion of SLE and disease controls (validation cohort) within each SLE risk probability bin (0%–14%, 15%–43%, 44%–86%, 87%–100%). Calculations were made from the non-overlapping subsets of patients with SLE and disease controls as outlined in (A). (C) Positive- and negative-likelihood ratios (LRs) (mean, 95% CI) for the diagnosis of SLE against control diagnoses, according to different SLE risk probability thresholds (>14%, >43%, >86%) applied to the discovery cohort.
Calculations were made from the non-overlapping subsets of patients with SLE and disease controls as outlined in (A). The >14% threshold had an average LR+ of 5.0 and LR– of 0.017, which correspond to a moderate increase in the likelihood of SLE when tested positive and a large decrease when tested negative, respectively. (D) Matrices of SLE risk probabilities based on different combinations of features included in the LASSO-LR diagnostic model. In each scenario, the calculated probability falls into one of the four SLE risk groups corresponding to varying diagnostic certainty levels (unlikely SLE: 0%–14%, possible/cannot rule out SLE: 15%–43%, likely SLE: 44%–86%, definite SLE: 87%–100%). (E) Dot plot analysis of the model-generated SLE risk probabilities according to the severity of disease manifestations (defined based on the BILAG system) and organ damage (SLICC/ACR Damage Index (SDI)). Data were generated from the validation cohort patients with SLE (n=512) and are presented as mean (95% CI). The Kruskal-Wallis (non-parametric) analysis of variance test was performed and two-tailed p values are shown. ANA, antinuclear antibodies; RMDs, rheumatic diseases; SCLE, subacute cutaneous lupus erythematosus; SDI, SLICC/ACR damage index; SLE, systemic lupus erythematosus.
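The four diagnostic certainty levels in panel (D) amount to a simple binning of the predicted probability. The sketch below is one way to express that mapping in Python. It assumes the cut-offs are applied as the strict thresholds listed in panel (C) (>14%, >43%, >86%), so that fractional values such as 14.5% fall in the lower bin; the caption does not say how such values are handled. The function name `sle_risk_group` is illustrative.

```python
def sle_risk_group(probability_percent):
    """Map a predicted SLE risk probability (0-100) to a certainty level.

    Boundaries follow the thresholds >14%, >43% and >86%. How values
    between integer bin edges are treated is an assumption of this sketch.
    """
    if not 0 <= probability_percent <= 100:
        raise ValueError("probability must be between 0 and 100")
    if probability_percent > 86:
        return "definite SLE"
    if probability_percent > 43:
        return "likely SLE"
    if probability_percent > 14:
        return "possible/cannot rule out SLE"
    return "unlikely SLE"
```

For example, `sle_risk_group(50)` returns `"likely SLE"`, which matches the 44%–86% bin.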
Figure 4. The new diagnostic model has high accuracy for systemic lupus erythematosus (SLE) including early and severe disease requiring immunosuppressive or biologic treatment. (A) Confusion matrix of the actual versus predicted cases of patients with SLE (n=512) and disease controls (n=143) in the validation cohort. The LASSO-LR diagnostic model was operated as binary (SLE or not-SLE) by setting the SLE risk probability threshold at ≥50%. Based on the number of true-positive, true-negative, false-positive and false-negative cases, sensitivity, specificity, accuracy, and positive- and negative-likelihood ratios are estimated as metrics of the model's diagnostic performance. (B) Sensitivity of the LASSO-LR model (operated as binary) for the detection of clinically relevant subsets of SLE including early disease, lupus nephritis, neuropsychiatric lupus, haematological lupus and severe lupus requiring potent immunosuppressive and/or biologic treatment.
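The metrics named in panel (A) follow from the four confusion-matrix cells by their standard definitions. The Python sketch below shows the arithmetic. The function name `diagnostic_metrics` is illustrative, and the counts in the usage note are hypothetical because the caption does not list the individual cell values.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp/fn: patients with the disease predicted positive/negative.
    fp/tn: controls predicted positive/negative.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    # LR+ = sensitivity / (1 - specificity); unbounded if there are no false positives.
    lr_positive = sensitivity / false_positive_rate if fp else float("inf")
    # LR- = (1 - sensitivity) / specificity; unbounded if there are no true negatives.
    lr_negative = (fn / (tp + fn)) / specificity if tn else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }
```

With hypothetical counts `diagnostic_metrics(tp=90, fn=10, fp=5, tn=95)`, sensitivity is 0.90, specificity 0.95, accuracy 0.925, and LR+ is approximately 18.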
A simple scoring system version of the SLE Risk Probability Index*
| Feature | Score |
|---|---|
| Malar rash or maculopapular rash† | 3 |
| Subacute cutaneous lupus erythematosus or discoid lupus erythematosus† | 2 |
| Alopecia‡ | 1.5 |
| Mucosal ulcers§ | 1 |
| Arthritis§ | 2 |
| Serositis§ | 1.5 |
| Leucopenia <4000/μL (at least once)† | 1.5 |
| Thrombocytopenia or autoimmune haemolytic anaemia† | 4.5 |
| Neurological disorder‡ | 1.5 |
| Proteinuria >500 mg/24 hours† | 4.5 |
| ANA† | 3 |
| Low C3 and C4† | 2 |
| Immunological disorder (any of: anti-DNA, anti-Sm, anti-phospholipid antibodies)¶ | 2.5 |
| Interstitial lung disease** | –1 |
| SLE if total score >7†† |
*Apply the model in individuals with clinical suspicion for SLE. Each feature is counted if present (ever) and if not explained by other cause (eg, drug effects, infections, malignant disorders, alternative more likely disease).
†Defined as in Aringer et al.9 10
‡Defined as in Petri et al.8
§Defined as in Hochberg.27
¶Defined as in Hochberg27 modified to include also positive anti-β2 glycoprotein IgG or IgM or IgA antibodies.
**Radiologic features of lung disease suggesting inflammation and fibrosis of the alveoli, distal airways and septal interstitium of the lung, as observed with a high-resolution CT scan of the chest.
††When operated at a threshold (sum of individual scores) of >7 (out of a maximum value 30.5), the sensitivity, specificity and accuracy rates are 94.2%, 94.4% and 94.2%, respectively.
ANA, antinuclear antibodies; SLE, systemic lupus erythematosus.
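The scoring table can be applied by summing the scores of the features that are present and comparing the total with the >7 threshold. The following Python sketch illustrates that arithmetic only. The dictionary keys and function names are illustrative, and judging whether a feature counts as present (ever, and not explained by another cause) remains a clinical decision outside the code.

```python
# Scores copied from the table above; key names are illustrative.
SCORES = {
    "malar_or_maculopapular_rash": 3,
    "scle_or_dle": 2,
    "alopecia": 1.5,
    "mucosal_ulcers": 1,
    "arthritis": 2,
    "serositis": 1.5,
    "leucopenia": 1.5,
    "thrombocytopenia_or_aiha": 4.5,
    "neurological_disorder": 1.5,
    "proteinuria": 4.5,
    "ana": 3,
    "low_c3_and_c4": 2,
    "immunological_disorder": 2.5,
    "interstitial_lung_disease": -1,
}

THRESHOLD = 7  # classified as SLE if the total score is strictly greater


def total_score(present):
    """Sum the scores of the features that are present."""
    features = set(present)  # each feature is counted once
    unknown = features - set(SCORES)
    if unknown:
        raise KeyError(f"unknown feature(s): {sorted(unknown)}")
    return sum(SCORES[name] for name in features)


def meets_sle_threshold(present):
    """Return True if the total score exceeds the threshold of 7."""
    return total_score(present) > THRESHOLD
```

As a check on the transcription, the positive scores sum to 30.5, which matches the maximum value stated in the table footnote. For instance, ANA plus arthritis plus serositis totals 6.5 and stays below the threshold, whereas ANA plus proteinuria totals 7.5 and exceeds it.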