Min Hu1, Chikashi Asami1, Hiroshi Iwakura2, Yasuyo Nakajima3, Ryousuke Sema1, Tsuyoshi Kikuchi1, Tsuyoshi Miyata1, Koji Sakamaki4, Takumi Kudo5, Masanobu Yamada3, Takashi Akamizu2,5, Yasubumi Sakakibara1,6.
Abstract
Background: Approximately 2.4 million patients in Japan would benefit from treatment for thyroid disease, including Graves' disease and Hashimoto's disease. However, only 450,000 of them are receiving treatment, and many patients with thyroid dysfunction remain largely overlooked. In this retrospective study, we aimed to develop and conduct preliminary testing on a machine learning method for screening patients with hyperthyroidism and hypothyroidism who would benefit from prompt medical treatment.
Keywords: Diagnostic markers; Thyroid diseases
Year: 2022 PMID: 35603277 PMCID: PMC9053267 DOI: 10.1038/s43856-022-00071-1
Source DB: PubMed Journal: Commun Med (Lond) ISSN: 2730-664X
Summary of the data from each institution.
| Institution | Wakayama Medical University | Gunma University | Hidaka Hospital | Kuma Hospital |
|---|---|---|---|---|
| Number of prescriptions | 8,249,286 | 34,561,268 | 23,450 | 61,590 |
| Number of patients | 14,249 | 27,133 | 10,482 | 124,863 |
| Average age | 60.9 | 51.7 | 47.7 | 50.3 |
| Male/female ratio | 1.03 (5,888/5,723) | 0.53 (8,143/15,296) | 1.82 (15,125/8,325) | 0.21 |
| Data period | 2010–2018 | 2004–2019 | 2004–2007 | 2007–2020 |
The demographic summary is shown for each institution. “Number of prescriptions” is the number of prescription records in each dataset, and “Number of patients” is the number of patients in each dataset. “Average age”, “male/female ratio”, and “data period” summarize the patient demographics and the period covered by the data at each institution.
List of verification items.
| No. | Verification item | Option | | | |
|---|---|---|---|---|---|
| 1 | Training data labeling | Thyroid function test criterion | Prescription criterion | | |
| 2 | Institution combination (for patient data and control group data) | Institution combination 1 (Inst. comb. 1) | Institution combination 2 (Inst. comb. 2) | Institution combination 3 (Inst. comb. 3) | External |
| 3 | Machine-learning algorithm | GBDT | SVM | Logistic regression | ANN |
| 4 | Input features | Feature set 1 | Feature set 2 | | |
Verification items in this study are categorized into four groups: “Training data labeling”, “Institution combination”, “Machine-learning algorithm”, and “Input features”. Each category contains several specific verification options and was verified in our experiments.
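The option list above can be read as a small configuration space. The following is a minimal sketch, assuming that the first option of each item is the baseline and that every other model changes exactly one option relative to it; the dictionary keys and the function name `one_factor_variants` are hypothetical, and the order of the generated configurations need not match the model numbering in the results table.

```python
# Options per verification item, copied from the list of verification items.
# Assumption: the first entry of each list is treated as the baseline choice.
OPTIONS = {
    "labeling": ["Thyroid function test criterion", "Prescription criterion"],
    "institutions": ["Inst. comb. 1", "Inst. comb. 2", "Inst. comb. 3", "External"],
    "algorithm": ["GBDT", "SVM", "Logistic regression", "ANN"],
    "features": ["Feature set 1", "Feature set 2"],
}


def one_factor_variants(options):
    """Return the baseline plus every configuration that differs from it in one item."""
    baseline = {item: choices[0] for item, choices in options.items()}
    configs = [baseline]
    for item, choices in options.items():
        for alternative in choices[1:]:
            # Copy the baseline and override a single item.
            configs.append({**baseline, item: alternative})
    return configs


configs = one_factor_variants(OPTIONS)
print(len(configs))  # number of configurations under this assumption
```

Under this assumption the sketch yields one baseline and eight single-option variants. Note that in the results table "External" appears as a validation setting rather than a training setting, which this simplified enumeration does not distinguish.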
Results of the validation of models with different labeling criteria, machine-learning algorithms, institutions, and input features.
| | No. | I | II | III | IV | V | VI | VII | VIII | IX |
|---|---|---|---|---|---|---|---|---|---|---|
| Training | Data labeling | Thyroid function test criterion | Prescription criterion | Thyroid function test criterion | Thyroid function test criterion | Thyroid function test criterion | Thyroid function test criterion | Thyroid function test criterion | Thyroid function test criterion | Thyroid function test criterion |
| | Institution combination | Inst. comb. 1 | Inst. comb. 1 | Inst. comb. 2 | Inst. comb. 3 | Inst. comb. 1 | Inst. comb. 1 | Inst. comb. 1 | Inst. comb. 1 | Inst. comb. 1 |
| | Machine-learning algorithm | GBDT | GBDT | GBDT | GBDT | SVM | Logistic regression | ANN | GBDT | GBDT |
| | Input features | Feature set 1 | Feature set 1 | Feature set 1 | Feature set 1 | Feature set 1 | Feature set 1 | Feature set 1 | Feature set 2 | Feature set 1 |
| Validation | Labeling criteria | Thyroid function test criterion | Thyroid function test criterion | Thyroid function test criterion | Thyroid function test criterion | Thyroid function test criterion | Thyroid function test criterion | Thyroid function test criterion | Thyroid function test criterion | Thyroid function test criterion |
| | Institution combination | Inst. comb. 1 | Inst. comb. 1 | Inst. comb. 1 | Inst. comb. 1 | Inst. comb. 1 | Inst. comb. 1 | Inst. comb. 1 | Inst. comb. 1 | External |
| Hyperthyroidism | AUROC | 93.8 ± 2.7% | 93.0 ± 2.3% | 93.1 ± 2.4% | 92.8 ± 3.0% | 91.8 ± 3.4% | 90.9 ± 3.3% | 91.9 ± 2.7% | 85.5 ± 3.9% | 97.2 ± 0.5% |
| | PPV | 80.3 ± 6.2% | 81.4 ± 4.7% | 76.6 ± 8.1% | 78.2 ± 7.1% | 71.6 ± 4.5% | 79.4 ± 6.8% | 73.9 ± 7.9% | 72.4 ± 7.0% | 98.5 ± 0.5% |
| | NPV | 94.4 ± 2.7% | 93.1 ± 3.4% | 93.9 ± 2.7% | 92.4 ± 3.5% | 94.1 ± 3.7% | 91.7 ± 3.7% | 93.6 ± 3.4% | 88.5 ± 4.2% | 67.4 ± 6.0% |
| | Sensitivity | 89.1 ± 5.8% | 86.4 ± 7.7% | 88.6 ± 5.5% | 85.4 ± 7.0% | 89.4 ± 6.7% | 83.6 ± 8.4% | 88.3 ± 7.0% | 77.3 ± 9.7% | 90.0 ± 2.9% |
| | Specificity | 88.6 ± 4.7% | 89.9 ± 3.5% | 85.7 ± 6.3% | 87.7 ± 5.1% | 82.0 ± 4.1% | 88.7 ± 4.6% | 83.6 ± 6.7% | 84.6 ± 5.9% | 93.7 ± 2.1% |
| Hypothyroidism | AUROC | 90.9 ± 3.3% | 92.1 ± 3.2% | 89.3 ± 2.2% | 88.5 ± 4.5% | 88.6 ± 4.0% | 86.7 ± 3.1% | 89.0 ± 3.6% | 82.5 ± 3.7% | 94.0 ± 1.5% |
| | PPV | 79.9 ± 8.4% | 73.9 ± 6.2% | 73.2 ± 7.4% | 72.9 ± 8.1% | 74.1 ± 7.0% | 67.7 ± 6.6% | 71.6 ± 9.2% | 70.0 ± 10.3% | 59.8 ± 5.2% |
| | NPV | 91.3 ± 5.3% | 94.8 ± 3.8% | 92.3 ± 4.7% | 91.7 ± 3.6% | 90.1 ± 2.3% | 90.4 ± 4.3% | 92.5 ± 2.9% | 85.2 ± 2.5% | 98.3 ± 0.8% |
| | Sensitivity | 82.4 ± 12.5% | 90.5 ± 7.2% | 85.1 ± 10.2% | 84.9 ± 6.7% | 81.2 ± 4.7% | 82.2 ± 9.2% | 86.4 ± 6.3% | 70.6 ± 8.2% | 91.6 ± 3.9% |
| | Specificity | 86.5 ± 6.8% | 83.7 ± 5.2% | 83.4 ± 7.9% | 83.8 ± 6.2% | 85.3 ± 5.3% | 79.5 ± 7.7% | 81.8 ± 8.0% | 83.4 ± 8.6% | 88.5 ± 2.6% |
The mean and standard deviation for the tenfold cross-validation are shown for each score.
The evaluation metrics (AUROC, PPV, NPV, sensitivity, and specificity) are shown for each model. Two criteria for labeling the data were devised: a thyroid function test criterion and a prescription criterion. Inst. comb. 1 uses thyroid dysfunction group data from both Wakayama Medical University and Gunma University with control group data from Hidaka Hospital; Inst. comb. 2 uses thyroid dysfunction group data from Wakayama Medical University with control group data from Hidaka Hospital; and Inst. comb. 3 uses thyroid dysfunction group data from Gunma University with control group data from Hidaka Hospital. Feature set 1 is the full set of features available in the four hospitals, and Feature set 2 is limited to five routine tests that are mandatory for Japanese national special health check-ups. Four typical machine-learning algorithms for structured data were examined: gradient boosting decision trees (GBDT), support vector machines (SVM), and artificial neural networks (ANN), which have been used in related studies, as well as logistic regression.
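For readers who want to see how scores of this kind are typically computed, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the function names are hypothetical, the AUROC is computed by direct pairwise comparison rather than by any particular library, and the use of the sample standard deviation across folds is an assumption, since the source does not state which estimator was used.

```python
from statistics import mean, stdev


def confusion_metrics(tp, fp, tn, fn):
    """Return PPV, NPV, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "PPV": tp / (tp + fp),          # share of predicted positives that are true
        "NPV": tn / (tn + fn),          # share of predicted negatives that are true
        "Sensitivity": tp / (tp + fn),  # share of actual positives detected
        "Specificity": tn / (tn + fp),  # share of actual negatives detected
    }


def auroc(labels, scores):
    """AUROC as the probability that a positive scores above a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(fold_values):
    """Format per-fold values (fractions) as 'mean ± SD' in percent.

    Assumption: sample standard deviation across folds.
    """
    return f"{100 * mean(fold_values):.1f} ± {100 * stdev(fold_values):.1f}%"


# Illustrative, made-up inputs (not data from the study).
print(confusion_metrics(tp=8, fp=2, tn=18, fn=2))
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
print(summarize([0.9, 0.8]))
```

In a tenfold cross-validation, each metric would be computed once per held-out fold and the ten values passed to `summarize`.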
Evaluation results obtained without considering crosstalk.
| | No. | A-1 | A-2 |
|---|---|---|---|
| Training | Positive label criterion | Thyroid function test criterion | Thyroid function test criterion |
| | Negative label setting | Crosstalk nonaccount | Crosstalk nonaccount |
| Validation | Positive label criterion | Thyroid function test criterion | Thyroid function test criterion |
| | Negative label setting | Crosstalk nonaccount | Crosstalk account |
| Hyperthyroidism | AUROC | 98.0 ± 2.2% | 91.3 ± 2.3% |
| Hypothyroidism | AUROC | 95.7 ± 3.1% | 81.4 ± 4.4% |
The mean and standard deviation for the tenfold cross-validation are shown for the AUROC scores. “Crosstalk account” is the negative label setting in which both the control group and the hypothyroidism group were labeled negative when classifying hyperthyroidism, and both the control group and the hyperthyroidism group were labeled negative when classifying hypothyroidism. “Crosstalk nonaccount” is the negative label setting in which only the control group was labeled negative.
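The two negative label settings can be sketched as a single labeling rule. This is an illustration only: the group codes `"hyper"`, `"hypo"`, and `"control"` and the function name `assign_label` are hypothetical, and the sketch assumes that under “Crosstalk nonaccount” patients with the other thyroid dysfunction are simply left out of the dataset, which the source implies but does not state explicitly.

```python
def assign_label(group, target, crosstalk_account):
    """Label one patient for a binary classifier of `target`.

    group: "hyper", "hypo", or "control" (hypothetical group codes).
    target: "hyper" or "hypo", the dysfunction the classifier detects.
    crosstalk_account: True for "Crosstalk account", False for "Crosstalk nonaccount".
    Returns 1 (positive), 0 (negative), or None (excluded from the dataset).
    """
    if group == target:
        return 1
    if group == "control":
        return 0
    # Remaining case: the patient has the other thyroid dysfunction.
    # Assumption: such patients are excluded when crosstalk is not accounted for.
    return 0 if crosstalk_account else None


# Labels for the hyperthyroidism classifier under both settings.
for setting in (False, True):
    print(setting, [assign_label(g, "hyper", setting) for g in ("hyper", "hypo", "control")])
```

Validating with the “Crosstalk account” setting therefore adds the other dysfunction group to the negatives, which is the harder task reflected in the lower AUROC of A-2.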