Yuki Kataoka, Yuya Kimura, Tatsuyoshi Ikenoue, Yoshinori Matsuoka, Junichi Matsumoto, Junji Kumasawa, Kentaro Tochitatni, Hiraku Funakoshi, Tomohiro Hosoda, Aiko Kugimiya, Michinori Shirano, Fumiko Hamabe, Sachiyo Iwata, Shingo Fukuma.
Abstract
Background: We developed and validated a machine learning diagnostic model for coronavirus disease 2019 (COVID-19) that integrates artificial-intelligence-based computed tomography (CT) imaging and clinical features.
Keywords: COVID-19; Light Gradient Boosting Machine (LightGBM); decision support tool; diagnosis; machine learning
Year: 2022 PMID: 35284557 PMCID: PMC8904977 DOI: 10.21037/atm-21-5571
Source DB: PubMed Journal: Ann Transl Med ISSN: 2305-5839
Patient characteristics
| Variables | COVID-19 PCR: positive (N=326) | COVID-19 PCR: negative (N=377) |
|---|---|---|
| Age, years | 55 [43–68] | 68 [45–79] |
| Sex, male, N (%) | 197 (60.4) | 220 (58.4) |
| Smoking, current/ex-smoker, N (%) | 46 (14.1) | 64 (17.0) |
| Contact history, N (%) | ||
| With family patients | 39 (12.0) | 6 (1.6) |
| With non-family patients | 78 (23.9) | 34 (9.0) |
| None | 209 (64.1) | 337 (89.4) |
| Travel overseas, N (%) | 39 (12.0) | 14 (3.7) |
| Duration of symptoms, days | 6 [4–9] | 4 [2–9] |
| Missing data, N (%) | 15 (4.6) | 15 (4.0) |
| Symptoms, N (%) | ||
| Cough | 123 (37.7) | 109 (28.9) |
| Chill | 48 (14.7) | 43 (11.4) |
| Sore throat | 81 (24.8) | 78 (20.7) |
| Diarrhea | 41 (12.6) | 25 (6.6) |
| Muscle pain | 29 (8.9) | 17 (4.5) |
| Conjunctivitis | 18 (5.5) | 12 (3.2) |
| Taste disorder | 33 (10.1) | 22 (5.8) |
| Complications, N (%) | ||
| Coronary arterial diseases | 12 (3.7) | 38 (10.1) |
| Cerebrovascular diseases | 25 (7.7) | 46 (12.2) |
| Chronic heart failures | 21 (6.4) | 59 (15.6) |
| Chronic kidney diseases | 17 (5.2) | 53 (14.1) |
| COPD | 16 (4.9) | 62 (16.4) |
| Malignancy | 29 (8.9) | 87 (23.1) |
| Immune disorders | 5 (1.5) | 28 (7.4) |
| Hypertension | 52 (16.0) | 85 (22.5) |
| Diabetes mellitus | 56 (17.1) | 84 (22.3) |
| Others | 59 (18.1) | 158 (41.9) |
| Vital signs | ||
| Body temperature, °C | 37.2 [36.6–38.1] | 37.2 [36.7–38.0] |
| Missing data, N (%) | 14 (4.3) | 25 (6.6) |
| Systolic blood pressure, mmHg | 126 [113–138] | 130 [114–148] |
| Missing data, N (%) | 20 (6.1) | 38 (10.1) |
| Diastolic blood pressure, mmHg | 79 [70–89] | 77 [67–87] |
| Missing data, N (%) | 20 (6.1) | 38 (10.1) |
| Heart rate, beats per minute | 86 [78–98] | 93 [80–108] |
| Missing data, N (%) | 11 (3.4) | 35 (9.3) |
| Respiratory rate, breaths per minute | 18 [16–21] | 20 [16–24] |
| Missing data, N (%) | 65 (19.9) | 138 (36.6) |
| Laboratory data | ||
| White blood cell, ×10³/μL | 4.1 [1.8–5.2] | 9.2 [6.4–12.5] |
| Missing data, N (%) | 14 (4.3) | 44 (11.7) |
| Hemoglobin, g/dL | 14.0 [12.9–15.2] | 12.2 [10.3–13.5] |
| Missing data, N (%) | 23 (7.1) | 60 (15.9) |
| Platelet, ×10⁴/μL | 18.9 [15.1–25.0] | 23.5 [16.4–30.0] |
| Missing data, N (%) | 16 (4.9) | 45 (11.9) |
| Aspartate aminotransferase, U/L | 32 [24–54] | 27 [19–40] |
| Missing data, N (%) | 13 (4.0) | 43 (11.4) |
| Alanine aminotransferase, U/L | 30 [17–46] | 20 [13–34] |
| Missing data, N (%) | 13 (4.0) | 43 (11.4) |
| Lactate dehydrogenase, U/L | 282 [216–403] | 244 [186–324] |
| Missing data, N (%) | 15 (4.6) | 54 (14.3) |
| C-reactive protein, mg/dL | 3.7 [0.5–9.5] | 5.5 [1.4–11.9] |
| Missing data, N (%) | 14 (4.3) | 57 (15.1) |
| Computed tomography data | ||
| Ali-M3 confidence, % | 0.93 [0.52–1.00] | 0.25 [0.01–0.71] |
None of the continuous variables was normally distributed, so they are presented as median [interquartile range]; categorical variables are presented as N (%). COPD, chronic obstructive pulmonary disease.
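The summary format used in the table above (median [interquartile range], with missing values counted separately) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the use of `None` for missing values, the "inclusive" quartile method, and the sample ages are all assumptions.

```python
import statistics


def median_iqr(values, digits=0):
    """Format a sample as 'median [Q1–Q3]', ignoring missing (None) entries."""
    observed = [v for v in values if v is not None]
    # quantiles(n=4) returns the three quartile cut points Q1, Q2 (median), Q3.
    q1, q2, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return f"{q2:.{digits}f} [{q1:.{digits}f}–{q3:.{digits}f}]"


def missing_count(values):
    """Format the number of missing entries as 'N (%)'."""
    n_missing = sum(v is None for v in values)
    return f"{n_missing} ({100 * n_missing / len(values):.1f})"


# Hypothetical ages, one of them missing.
ages = [30, 40, 50, 60, 70, None]
print(median_iqr(ages))      # 50 [40–60]
print(missing_count(ages))   # 1 (16.7)
```

Different quartile conventions give slightly different interquartile ranges on small samples; the record does not state which one the authors used.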
Patient characteristics and scanning protocol in each hospital
| Characteristic | H01 (N=94)¹ | H02 (N=158)¹ | H03 (N=19)¹ | H04 (N=70)¹ | H05 (N=71)¹ | H06 (N=32)¹ | H07 (N=21)¹ | H08 (N=68)¹ | H09 (N=110)¹ | H10 (N=34)¹ | H11 (N=43)¹ | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age, years | 57 [44–76] | 60 [44–73] | 63 [53–76] | 60 [39–72] | 59 [44–76] | 60 [46–68] | 65 [41–76] | 63 [42–73] | 59 [42–73] | 78 [58–86] | 68 [46–80] | |
| COVID-19 PCR | ||||||||||||
| Positive | 70 (74%) | 52 (33%) | 18 (95%) | 35 (50%) | 26 (37%) | 21 (66%) | 12 (57%) | 18 (26%) | 37 (34%) | 11 (32%) | 21 (49%) | |
| System | Aquilion PRIME | Optima CT660 | Aquilion PRIME | Optima CT660 | Optima CT660 | Aquilion PRIME | Aquilion CX Edition | Aquilion ONE | Aquilion CXL | Aquilion PRIME | Aquilion CXL | Aquilion CX Edition |
| Vendor | Canon Medical Systems | GE | Canon Medical Systems | GE | GE | Canon Medical Systems | Canon Medical Systems | Canon Medical Systems | Canon Medical Systems | Canon Medical Systems | Canon Medical Systems | Canon Medical Systems |
| Tube voltage (kVp) | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 |
| Automatic tube current modulation (mAs) | Auto | 100–510 | 150–250 | 80–500 | 80–500 | 150–250 | 403–500 | 100–400 | 100–400 | 50–250 | 100–400 | 100–400 |
| Pitch | ||||||||||||
| Standard | 111 | 55 | 65 | 55 | 55 | 65 | – | 65 | 53 | 65 | 53 | – |
| Factor | 0.813 | 0.984 | 0.813 | 0.984 | 0.984 | 0.813 | 1.172 | 0.813 | 0.828 | 0.813 | 0.828 | 1.000 |
| Matrix | 512×512 | 512×512 | 512×512 | 512×512 | 512×512 | 512×512 | 512×512 | 512×512 | 512×512 | 512×512 | 512×512 | 512×512 |
| Slice thickness (mm) | 0.500 | 0.625 | 0.500 | 0.625 | 0.625 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 1.000 | 5.000 |
| Field of view (mm) | 320 | 340 | 320 | – | – | 330 | 350 | 320 | 320 | 320–400 | 320 | 320 |
| Reconstruction interval (mm) | 5 | 0.625 | 2 | 1.25 | 1.25 | 3 | 5 | 5 | 5 | 5 | 5 | 5 |
¹ n (%) or median [interquartile range].
Model discrimination and calibration in the test data
| Model | AUROC (95% CI) | Brier score (95% CI) | Difference in AUROC from Ali-M3 confidence (95% CI) | Difference in Brier score from Ali-M3 confidence (95% CI) |
|---|---|---|---|---|
| Full model* | 0.91 (0.86 to 0.95) | 0.10 (0.07 to 0.13) | 0.13 (0.07 to 0.19) | −0.13 (−0.17 to −0.09) |
| A-blood model | 0.90 (0.86 to 0.94) | 0.12 (0.08 to 0.16) | 0.12 (0.07 to 0.18) | −0.11 (−0.15 to −0.07) |
| Ali-M3 confidence | 0.78 (0.71 to 0.83) | 0.23 (0.19 to 0.27) | – | – |
*, machine learning model using all variables. A-blood model, machine learning model using 8 variables: Ali-M3 confidence, white blood cell, hemoglobin, platelet, aspartate aminotransferase, alanine aminotransferase, lactate dehydrogenase, and C-reactive protein. AUROC, area under the receiver operating characteristic curve; CI, confidence interval.
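To make the two metrics in the table concrete, the sketch below computes AUROC (discrimination; higher is better) and the Brier score (mean squared error of predicted probabilities; lower is better) from scratch, plus a percentile-bootstrap interval for the difference in AUROC between two scores. The record does not say how the authors obtained their confidence intervals, so the bootstrap, the function names, and the toy data are assumptions for illustration only.

```python
import random


def auroc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_auroc_diff(y_true, score_a, score_b, n_boot=1000, seed=0):
    """Approximate 95% percentile-bootstrap interval for AUROC(a) - AUROC(b)."""
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        y = [y_true[i] for i in idx]
        if len(set(y)) < 2:  # AUROC is undefined unless both classes are present
            continue
        a = auroc(y, [score_a[i] for i in idx])
        b = auroc(y, [score_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Toy data: 1 = PCR positive, 0 = PCR negative (hypothetical values).
y = [1, 1, 0, 0]
model_prob = [0.9, 0.4, 0.6, 0.1]
print(auroc(y, model_prob))        # 0.75
print(brier_score(y, model_prob))  # approximately 0.185
```

A negative difference in Brier score, as in the table, means the model's probabilities are closer to the observed outcomes than the comparator's.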
Figure 1. Receiver operating characteristic (ROC) curves of the models predicting COVID-19 PCR positivity in the test data. The models are the machine learning model using all variables, the machine learning model using 8 variables (Ali-M3 confidence; white blood cell; hemoglobin; platelet; aspartate aminotransferase; alanine aminotransferase; lactate dehydrogenase; C-reactive protein), and Ali-M3 confidence. AUROC, area under the receiver operating characteristic curve.
Figure 2. Variables most strongly associated with a positive COVID-19 real-time reverse transcription polymerase chain reaction (RT-PCR) result in the machine learning model using all variables, in the test data. WBC, white blood cells; Hb, hemoglobin; LDH, lactate dehydrogenase; PLT, platelet; RR, respiratory rate; ALT, alanine aminotransferase; HR, heart rate; CRP, C-reactive protein; AST, aspartate aminotransferase; dBP, diastolic blood pressure; sBP, systolic blood pressure; BT, body temperature; SHAP, Shapley Additive exPlanations.
Figure 3. Variables most strongly associated with a positive COVID-19 real-time reverse transcription polymerase chain reaction (RT-PCR) result in the machine learning model using 8 variables (Ali-M3 confidence; white blood cell; hemoglobin; platelet; aspartate aminotransferase; alanine aminotransferase; lactate dehydrogenase; C-reactive protein), in the test data. WBC, white blood cells; Hb, hemoglobin; LDH, lactate dehydrogenase; ALT, alanine aminotransferase; CRP, C-reactive protein; PLT, platelet; AST, aspartate aminotransferase; SHAP, Shapley Additive exPlanations.
Figure 4. Predicted versus observed probability of a positive COVID-19 real-time reverse transcription polymerase chain reaction (RT-PCR) result (calibration; pink line) for Ali-M3 confidence.
Figure 5. Predicted versus observed probability of a positive COVID-19 real-time reverse transcription polymerase chain reaction (RT-PCR) result (calibration; pink line) for the machine learning model using all variables.
Figure 6. Predicted versus observed probability of a positive COVID-19 real-time reverse transcription polymerase chain reaction (RT-PCR) result (calibration; pink line) for the machine learning model using 8 variables (Ali-M3 confidence; white blood cell; hemoglobin; platelet; aspartate aminotransferase; alanine aminotransferase; lactate dehydrogenase; C-reactive protein).
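A calibration plot of the kind described in Figures 4 to 6 compares mean predicted probability with the observed fraction of positives within groups of patients. The sketch below shows one common way to build those points, using equal-width probability bins; the binning scheme, the function name, and the toy data are assumptions, since the record does not describe how the published curves were drawn.

```python
def calibration_bins(y_true, y_prob, n_bins=5):
    """Return (mean predicted, observed fraction, count) for each non-empty equal-width bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        k = min(int(p * n_bins), n_bins - 1)  # a probability of exactly 1.0 goes in the last bin
        bins[k].append((y, p))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            points.append((mean_pred, observed, len(members)))
    return points


# Toy data: 1 = RT-PCR positive, 0 = negative (hypothetical values).
y = [0, 0, 1, 1, 1, 0]
prob = [0.10, 0.15, 0.50, 0.90, 1.00, 0.95]
for mean_pred, observed, count in calibration_bins(y, prob, n_bins=2):
    print(f"predicted {mean_pred:.3f}  observed {observed:.2f}  n={count}")
```

Points on the diagonal (predicted equal to observed) indicate good calibration; in this toy example the upper bin predicts about 0.84 while 0.75 of its patients are positive, so the score is slightly overconfident there.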