Nathan Radakovich1,2, Manja Meggendorfer3, Luca Malcovati4, C Beau Hilton1,2, Mikkael A Sekeres5, Jacob Shreve6, Yazan Rouphail7, Wencke Walter4, Stephan Hutter4, Anna Galli4, Sara Pozzi4, Chiara Elena4, Eric Padron8, Michael R Savona9,10, Aaron T Gerds1, Sudipto Mukherjee1, Yasunobu Nagata11, Rami S Komrokji8, Babal K Jha11, Claudia Haferlach4, Jaroslaw P Maciejewski11, Torsten Haferlach3, Aziz Nazha1.
Abstract
The differential diagnosis of myeloid malignancies is challenging and subject to interobserver variability. Using clinical and next-generation sequencing (NGS) data from a 3-institution, international cohort of patients, we developed a machine learning model that diagnoses myeloid malignancies without relying on bone marrow biopsy data. The model achieves high performance, and model interpretations indicate that it relies on factors similar to those used by clinicians. In addition, we describe associations between NGS findings and clinically important phenotypes and introduce the use of machine learning algorithms to elucidate clinicogenomic relationships.
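The abstract does not specify how clinical and NGS data are combined into model inputs. As a minimal sketch, assuming a tabular model that takes one fixed-length vector per patient, peripheral blood values can be concatenated with one 0/1 indicator per sequenced gene. The feature names, the six-gene panel, and the `Patient`/`encode` names are hypothetical illustrations, not the study's feature set; bone marrow variables are deliberately left out to mirror the biopsy-independent design.

```python
from dataclasses import dataclass, field

# Hypothetical feature layout for illustration; not the study's actual inputs.
LAB_FEATURES = ["age", "wbc", "hemoglobin", "platelets", "anc", "alc", "amc", "peripheral_blasts"]
GENE_PANEL = ["ASXL1", "DNMT3A", "JAK2", "SF3B1", "SRSF2", "TET2"]  # example genes only


@dataclass
class Patient:
    labs: dict[str, float]  # peripheral blood values keyed by LAB_FEATURES names
    mutated_genes: set[str] = field(default_factory=set)  # genes with a reported NGS variant


def encode(patient: Patient) -> list[float]:
    """Return lab values in a fixed order followed by one 0/1 flag per panel gene.

    No bone marrow variables (such as BM blast %) are included, mirroring the
    goal of a model that does not depend on biopsy data.
    """
    missing = [name for name in LAB_FEATURES if name not in patient.labs]
    if missing:
        raise ValueError(f"missing lab values: {missing}")
    lab_part = [float(patient.labs[name]) for name in LAB_FEATURES]
    gene_part = [1.0 if gene in patient.mutated_genes else 0.0 for gene in GENE_PANEL]
    return lab_part + gene_part


# Illustrative input: the MDS column means from the cohort table, plus two mutations.
example = Patient(
    labs={"age": 69.0, "wbc": 5.8, "hemoglobin": 10.1, "platelets": 183.1,
          "anc": 3.1, "alc": 1.0, "amc": 0.3, "peripheral_blasts": 0.3},
    mutated_genes={"SF3B1", "TET2"},
)
print(encode(example))
```

A fixed gene order matters here: any downstream classifier sees position, not gene name, so the same panel ordering has to be used at training and prediction time.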
Year: 2021 PMID: 34592765 PMCID: PMC8579270 DOI: 10.1182/bloodadvances.2021004755
Source DB: PubMed Journal: Blood Adv ISSN: 2473-9529
Cohort demographics, laboratory parameters, and cytogenetic variables
| Variable | MDS | MDS-MPN/CMML | MPN | ICUS | CCUS |
|---|---|---|---|---|---|
| Mean (2.5th-97.5th percentile) | | | | | |
| Age, y | 69.0 (41.00-85.70) | 70.0 (43.00-85.86) | 62.2 (28.35-83.90) | 56.0 (22.44-84.55) | 68.4 (40.55-85.04) |
| WBC, ×10⁹/L | 5.8 (1.29-20.23) | 19.8 (2.27-84.83) | 15.5 (2.99-61.99) | 4.4 (1.80-10.71) | 4.7 (1.80-10.84) |
| Hemoglobin, g/dL | 10.1 (6.80-14.10) | 11.1 (7.00-15.39) | 12.6 (7.19-20.70) | 12.2 (7.10-16.09) | 11.9 (7.57-15.01) |
| Platelets, ×10⁹/L | 183.1 (15.00-650.15) | 182.6 (15.00-735.75) | 362.5 (10.13-1069.30) | 159.8 (19.52-387.32) | 142.5 (18.10-350.50) |
| ANC, ×10⁹/L | 3.1 (0.27-12.07) | 9.4 (0.59-37.21) | 10.3 (1.34-42.70) | 2.5 (0.19-7.34) | 2.5 (0.52-7.45) |
| ALC, ×10⁹/L | 1.0 (0.04-3.63) | 2.6 (0.44-7.85) | 2.3 (0.31-6.54) | 1.5 (0.46-3.25) | 1.6 (0.45-3.92) |
| AMC, ×10⁹/L | 0.3 (0.00-1.50) | 4.8 (0.08-26.28) | 0.9 (0.00-4.40) | 0.4 (0.02-1.08) | 0.5 (0.07-1.38) |
| BM blasts, % | 5.0 (0.00-17.00) | 5.5 (0.00-19.00) | 1.5 (0.00-6.70) | 1.6 (0.00-4.50) | 1.8 (0.00-4.60) |
| Peripheral blasts, ×10⁹/L | 0.3 (0.00-3.00) | 1.6 (0.00-12.00) | 1.4 (0.00-8.32) | 0.0 (0.00-0.00) | 0.0 (0.00-0.00) |
| Number (%) | | | | | |
| Female | 1005 (37.26) | 392 (14.53) | 89 (3.30) | 67 (2.48) | 57 (2.11) |
| Normal karyotype | 810 (30.03) | 103 (3.82) | 104 (3.86) | 44 (1.63) | 30 (1.11) |
| Chr 5 abnormality | 135 (5.01) | 6 (0.22) | 4 (0.15) | 0 (0.00) | 0 (0.00) |
| Chr 7 abnormality | 73 (2.71) | 18 (0.67) | 5 (0.19) | 0 (0.00) | 0 (0.00) |
| Complex karyotype | 105 (3.89) | 12 (0.44) | 5 (0.19) | 0 (0.00) | 1 (0.04) |

| P value (vs MDS) | MDS | MDS-MPN/CMML | MPN | ICUS | CCUS |
|---|---|---|---|---|---|
| Age, y | reference | 0.043 | 5.04E-12 | 1.27E-17 | 0.948 |
| WBC, ×10⁹/L | reference | 5.47E-122 | 3.94E-49 | 0.13 | 0.579 |
| Hemoglobin, g/dL | reference | 7.32E-21 | 7.99E-19 | 3.13E-25 | 2.86E-17 |
| Platelets, ×10⁹/L | reference | 0.245 | 4.60E-19 | 0.727 | 0.513 |
| ANC, ×10⁹/L | reference | 2.28E-79 | 2.23E-54 | 0.552 | 0.506 |
| ALC, ×10⁹/L | reference | 6.23E-115 | 2.78E-34 | 4.27E-19 | 7.75E-15 |
| AMC, ×10⁹/L | reference | 1.77E-194 | 3.03E-34 | 8.28E-15 | 1.58E-13 |
| BM blasts, % | reference | 0.659 | 2.17E-24 | 1.43E-07 | 2.13E-05 |
| Peripheral blasts, ×10⁹/L | reference | 6.35E-54 | 2.58E-43 | 0.08 | 0.109 |
| Female | reference | 0.468 | 2.94E-08 | 9.87E-04 | 0.969 |
| Normal karyotype | reference | 1.10E-45 | 0.819 | 2.77E-05 | 0.002 |
| Chr 5 abnormality | reference | 3.26E-10 | 0.001 | 6.66E-04 | 0.007 |
| Chr 7 abnormality | reference | 0.118 | 0.199 | 0.019 | 0.069 |
| Complex karyotype | reference | 2.89E-05 | 0.026 | 0.003 | 0.061 |
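ALC, absolute lymphocyte count; AMC, absolute monocyte count; ANC, absolute neutrophil count; BM, bone marrow; CCUS, clonal cytopenia of undetermined significance; Chr, chromosome; CMML, chronic myelomonocytic leukemia; ICUS, idiopathic cytopenia of undetermined significance; MDS, myelodysplastic syndromes; MDS-MPN, myelodysplastic/myeloproliferative neoplasms; MPN, myeloproliferative neoplasms; WBC, white blood cell.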
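The continuous rows report a mean with a 2.5th-97.5th percentile interval. The table does not state which percentile definition was used, so the sketch below assumes linear interpolation between order statistics (the `method="inclusive"` option of Python's `statistics.quantiles`). It also includes a hypothetical `implied_denominator` helper: back-calculating count × 100 / percentage for the larger counts in the Number (%) rows gives roughly the same value (about 2,700) in every subtype column, which suggests, though the table does not say so, that those percentages are taken over the whole cohort rather than within each subtype.

```python
import statistics


def summarize(values):
    """Return (mean, 2.5th percentile, 97.5th percentile) for a numeric sample.

    Assumes linear interpolation between order statistics; the source table
    does not say which percentile definition it used.
    """
    data = sorted(values)
    # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%; keep the first and last.
    cuts = statistics.quantiles(data, n=40, method="inclusive")
    return statistics.fmean(data), cuts[0], cuts[-1]


def implied_denominator(count, percent):
    """Back-calculate the group size that a 'count (percent)' cell implies."""
    return count * 100.0 / percent


print(summarize(range(101)))                     # (50.0, 2.5, 97.5)
print(round(implied_denominator(1005, 37.26)))   # Female, MDS column: 2697
print(round(implied_denominator(392, 14.53)))    # Female, MDS-MPN/CMML column: 2698
```

Small counts such as 1 (0.04) give noisier back-calculations because the percentage is rounded to two decimals.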
Figure 1. Cohort genomic characteristics. (A) Top mutated genes in cohort. (B) Mutation frequency by disease subtype. (C) Cohort-wide oncoprint.
Figure 2. Cohort genotype-genotype and genotype-clinical correlations. (A) Co-occurrence vs mutual exclusivity of mutations by disease subtype. (B) Disease-phenotype correlations. (C) Mutation-phenotype correlations.
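The caption does not say how co-occurrence and exclusivity were scored. One common approach, assumed here, builds a 2×2 table of patients by the mutation status of two genes, summarizes it with a log odds ratio (positive values lean toward co-occurrence, negative toward mutual exclusivity), and tests the co-occurrence direction with a one-sided Fisher's exact test. The function names and the placeholder genes "A" and "B" are hypothetical.

```python
import math


def contingency(patients, gene_a, gene_b):
    """Count patients as (both, a_only, b_only, neither); each patient is a set of mutated genes."""
    both = a_only = b_only = neither = 0
    for mutations in patients:
        has_a, has_b = gene_a in mutations, gene_b in mutations
        if has_a and has_b:
            both += 1
        elif has_a:
            a_only += 1
        elif has_b:
            b_only += 1
        else:
            neither += 1
    return both, a_only, b_only, neither


def log_odds_ratio(both, a_only, b_only, neither, pseudocount=0.5):
    """Positive: co-occurrence; negative: mutual exclusivity.

    The 0.5 pseudocount (Haldane-Anscombe correction) keeps empty cells finite.
    """
    return math.log(((both + pseudocount) * (neither + pseudocount))
                    / ((a_only + pseudocount) * (b_only + pseudocount)))


def fisher_cooccurrence_p(both, a_only, b_only, neither):
    """One-sided Fisher's exact p-value for observing at least `both` double mutants."""
    total = both + a_only + b_only + neither
    n_a, n_b = both + a_only, both + b_only
    # Hypergeometric upper tail: P(X >= both) with n_a "A" patients among total, n_b draws.
    tail = sum(
        math.comb(n_a, k) * math.comb(total - n_a, n_b - k)
        for k in range(both, min(n_a, n_b) + 1)
    )
    return tail / math.comb(total, n_b)


cooccurring = [{"A", "B"}] * 3 + [set()] * 3
exclusive = [{"A"}] * 3 + [{"B"}] * 3
for label, cohort in [("co-occurring", cooccurring), ("exclusive", exclusive)]:
    table = contingency(cohort, "A", "B")
    print(label, table, log_odds_ratio(*table), fisher_cooccurrence_p(*table))
```

Testing exclusivity would use the lower tail (k from the smallest feasible value up to `both`) instead; with many gene pairs per subtype, some multiple-testing correction would normally follow.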
Figure 3. Model feature importance. (A) Global model feature importance. (B) For individual patients, features in blue decrease the probability of a given diagnosis, while features in red increase it. The size of the red or blue bar for a given feature reflects its relative contribution. The probability of each diagnosis is shown as the “output value,” a number between zero and one.
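Panel B describes an additive explanation: each feature pushes the predicted probability up (red) or down (blue) from a starting value, and the pushes sum to the output value. This matches how Shapley-value explanations such as SHAP force plots behave, although the caption does not name the method. As an illustration under that assumption, the sketch computes exact Shapley values by enumerating feature subsets against a single baseline patient, which is only practical for a handful of features; libraries such as SHAP instead approximate them over a background dataset. The toy model, its weights, and its three feature names are hypothetical.

```python
import math
from itertools import combinations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline input.

    A feature outside the coalition takes its baseline value. The cost grows
    as 2**n, so this is only suitable for small n.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical logistic model returning a probability in (0, 1)."""
    # z = [AMC (x10^9/L), JAK2 mutated (0/1), SF3B1 mutated (0/1)]; weights are made up.
    logit = -2.0 + 1.5 * z[0] + 2.0 * z[1] - 1.0 * z[2]
    return 1.0 / (1.0 + math.exp(-logit))


patient = [1.2, 1.0, 0.0]
reference = [0.5, 0.0, 0.0]
phi = shapley_values(toy_model, patient, reference)
base_value = toy_model(reference)
# Efficiency: base value plus all contributions reproduces the model's output value.
print(base_value, phi, base_value + sum(phi), toy_model(patient))
```

A feature whose value equals the baseline (here the third one) receives exactly zero contribution, and positive entries of `phi` correspond to the red bars described in the caption.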
Figure 4. Feature importance for prediction of clinical variables based on NGS data. Importance plots are shown for severe pancytopenia, age <65 years, normal karyotype, abnormal karyotype, and complex karyotype. The bar for each feature represents that feature's relative importance for predicting the given clinical characteristic.
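The caption does not say how the feature importances for each clinical characteristic were computed. One model-agnostic option, assumed here for illustration, is permutation importance: shuffle one feature column, re-score the model, and record how much accuracy drops. The `permutation_importance` helper, the toy data, and the rule-based stand-in model are hypothetical; any fitted classifier with a per-row predict function could take the model's place.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace column j with its shuffled copy, leaving other features intact.
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: rows are 0/1 mutation flags for two hypothetical genes; the binary
# clinical label (e.g. "abnormal karyotype: yes/no") depends only on the first.
X = [[i % 2, (i // 2) % 2] for i in range(20)]
y = [row[0] for row in X]


def stand_in_model(row):
    return row[0]


print(permutation_importance(stand_in_model, X, y))
```

The irrelevant second gene scores exactly zero because shuffling it never changes a prediction; in real data, importances would normally be computed on held-out patients so that the drop reflects generalization rather than memorization.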