Duoru Lin, Jingjing Chen, Zhuoling Lin, Xiaoyan Li, Kai Zhang, Xiaohang Wu, Zhenzhen Liu, Jialing Huang, Jing Li, Yi Zhu, Chuan Chen, Lanqin Zhao, Yifan Xiang, Chong Guo, Liming Wang, Yizhi Liu, Weirong Chen, Haotian Lin.
Abstract
BACKGROUND: Approximately 1 in 33 newborns is affected by congenital anomalies worldwide. We aimed to develop a practical model for identifying infants with a high risk of congenital cataracts (CCs), which are the leading cause of avoidable childhood blindness.
Keywords: Congenital anomaly; Congenital cataract; Identification model; Machine learning
Year: 2020 PMID: 31901869 PMCID: PMC6948173 DOI: 10.1016/j.ebiom.2019.102621
Source DB: PubMed Journal: EBioMedicine ISSN: 2352-3964 Impact factor: 8.143
Fig. 1 Flowchart of the research performed in this study. CC: congenital/infantile cataracts.
Fig. 2 Birth information of children with CCs. Notes: Only the patients with relevant data were included in the distribution analysis. GHG: gestational hyperglycaemia; GHT: gestational hypertension; CHD: congenital heart disease.
Fig. 3 Histories of family heredity and family conditions of children with CC. Notes: Only the patients with relevant data were included in the distribution analysis. *: Because no significant difference was found in education level between the fathers and mothers, the mothers were used to represent the parental education level in this study. ¥: Chinese Yuan.
Comparisons of the pregnancy-labor history, living environment and family variables between the children with CCs and the healthy controls.
| Variable | Children with CCs | Healthy controls | Statistic (t or χ²) | P value |
|---|---|---|---|---|
| Number | 1129 (Bil=807; Unil=322) | 609 | – | – |
| Age (months) | 31.28±33.23 | 39.09±12.80 | 6.99 | <0.001 |
| Male | 59.9% | 53.7% | 6.194 | 0.015 |
| Family history | 23.83% (269/1129) | 0% (0/609) | 171.67 | <0.001 |
| ≥2nd foetus | 48.89% (419/857) | 23.06% (140/607) | 142.74 | <0.001 |
| Pregnant virus infection | 27.83% (310/1114) | 20.39% (124/608) | 11.53 | 0.001 |
| Preterm delivery | 9.97% (112/1123) | 3.78% (23/608) | 21.02 | <0.001 |
| Eutocia | 66.19% (742/1121) | 56.58% (344/608) | 15.59 | <0.001 |
| Oxygen inspiration/ | 22.17% (237/1069) | 6.74% (41/608) | 66.70 | <0.001 |
| Comorbidity | 11.78% (133/1129) | 1.65% (10/606) | 53.51 | <0.001 |
| Radiation/pollution | 11.86% (114/961) | 9.20% (55/598) | 2.71 | 0.111 |
| Parental smoking | 51.05% (537/1052) | 34.44% (208/604) | 42.77 | <0.001 |
| Low/medium parental education level | 77.69% (846/1089) | 33.06% (200/605) | 327.95 | <0.001 |
| Low household income | 60% (391/644) | 22.89% (111/485) | 253.01 | <0.001 |
Notes: †: junior, primary and below; ¶: an average family income less than 71.5 K (Chinese yuan) was defined as a low household income; results are marked if statistically significant according to Pearson's chi-squared test (*) or an independent-sample t-test (#) (P<0.05); Bil: bilateral patients; Unil: unilateral patients.
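The χ² statistics in the table above can be checked from the reported counts. The following is a minimal sketch, assuming a 2×2 Pearson chi-squared test without continuity correction (the source does not say which variant was used); the function name `pearson_chi2_2x2` is hypothetical. Under that assumption, the family-history and preterm-delivery counts give values that agree with the reported 171.67 and 21.02.

```python
def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction, an assumption)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Family history: 269 of 1129 children with CCs vs. 0 of 609 healthy controls.
family_history = pearson_chi2_2x2(269, 1129 - 269, 0, 609 - 0)

# Preterm delivery: 112 of 1123 children with CCs vs. 23 of 608 healthy controls.
preterm = pearson_chi2_2x2(112, 1123 - 112, 23, 608 - 23)

print(round(family_history, 2), round(preterm, 2))
```

The same function can be applied to any other row that reports both counts and denominators.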
Performance of four-fold cross validation and external validation of the CC identification models.
| Validation | Cataract type | Algorithm | Accuracy | Sensitivity | Specificity | False negative rate | False positive rate |
|---|---|---|---|---|---|---|---|
| 4-fold cross validation | Bilateral | RF | 0.81±0.01 | 0.79±0.02 | 0.82±0.04 | 0.21±0.02 | 0.18±0.04 |
| 4-fold cross validation | Bilateral | Ada | 0.79±0.02 | 0.78±0.03 | 0.81±0.03 | 0.22±0.03 | 0.19±0.03 |
| 4-fold cross validation | Unilateral | RF | 0.79±0.01 | 0.56±0.05 | 0.92±0.03 | 0.44±0.05 | 0.08±0.03 |
| 4-fold cross validation | Unilateral | Ada | 0.75±0.01 | 0.70±0.08 | 0.78±0.05 | 0.30±0.08 | 0.22±0.05 |
| External validation | Bilateral | RF | 0.86 | 0.80 | 0.91 | 0.20 | 0.09 |
| External validation | Bilateral | Ada | 0.85 | 0.77 | 0.90 | 0.23 | 0.10 |
| External validation | Unilateral | RF | 0.86 | 0.58 | 0.98 | 0.42 | 0.02 |
| External validation | Unilateral | Ada | 0.85 | 0.58 | 0.97 | 0.42 | 0.03 |
Notes: RF: random forest; Ada: adaptive boosting.
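The five performance columns all derive from a single confusion matrix, and the false negative and false positive rates are the complements of sensitivity and specificity. The sketch below illustrates those standard definitions; the function name `classification_metrics` and the counts passed to it are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: patients classified correctly/incorrectly,
    tn/fp: controls classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": fn / (tp + fn),   # equals 1 - sensitivity
        "false_positive_rate": fp / (tn + fp),   # equals 1 - specificity
    }


# Hypothetical counts for illustration only: 100 patients, 100 controls.
metrics = classification_metrics(tp=80, fn=20, tn=90, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because the two error rates are complements, each sensitivity and false negative rate pair in the table sums to 1, and likewise each specificity and false positive rate pair, up to rounding.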
Fig. 4 ROC curves and AUC values of models with different algorithms and cataract types (bilateral or unilateral) in internal 4-fold cross validation and external validation. CC prediction models performed better in bilateral patients than in unilateral cases, and RF yielded better performance than Ada. ROC: receiver operating characteristic; AUC: area under the curve; RF: random forest; Ada: adaptive boosting; CI: confidence interval.
Fig. 5 Relevance ranks of the 11 relevant factors of CCs in bilateral and unilateral patients. Family history of CC, low parental education level, and comorbidity were identified as the top three most relevant factors for both bilateral and unilateral CC diagnosis.
Clinical test of the stability of CC identification models.
| No. of patients vs. controls | Algorithm | Accuracy | Sensitivity | Specificity | False negative rate | False positive rate |
|---|---|---|---|---|---|---|
| 94 vs. 100 | RF | 0.86 | 0.80 | 0.93 | 0.20 | 0.07 |
| 94 vs. 100 | Ada | 0.88 | 0.82 | 0.93 | 0.18 | 0.07 |
| 50 vs. 100 | RF | 0.86±0.01 | 0.72±0.01 | 0.93±0.01 | 0.28±0.01 | 0.07±0.01 |
| 50 vs. 100 | Ada | 0.85±0.02 | 0.72±0.02 | 0.92±0.02 | 0.28±0.02 | 0.09±0.02 |
| 30 vs. 100 | RF | 0.88±0.01 | 0.69±0 | 0.93±0.01 | 0.31±0 | 0.07±0.01 |
| 30 vs. 100 | Ada | 0.87±0.02 | 0.72±0.02 | 0.91±0.03 | 0.28±0.02 | 0.09±0.03 |
| 10 vs. 100 | RF | 0.92±0.01 | 0.78±0 | 0.93±0.01 | 0.22±0 | 0.07±0.01 |
| 10 vs. 100 | Ada | 0.89±0.02 | 0.75±0.05 | 0.91±0.03 | 0.25±0.05 | 0.09±0.03 |
Fig. 6 ROC curves and AUC values of two AI algorithms for bilateral patients in the clinical test. ROC: receiver operating characteristic; AUC: area under the curve; AI: artificial intelligence; CI: confidence interval.
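For readers unfamiliar with the AUC values reported in the ROC figures, the sketch below shows one common way to compute an AUC: as the fraction of patient and control pairs in which the patient receives the higher model score, with ties counted as half. This is an illustration of the general definition, not the authors' implementation; the function name `auc_from_scores` and the scores are hypothetical.

```python
def auc_from_scores(patient_scores, control_scores):
    """AUC as the probability that a randomly chosen patient scores higher
    than a randomly chosen control (ties count as 0.5)."""
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


# Hypothetical model scores for illustration only.
patient_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2, 0.4]
auc = auc_from_scores(patient_scores, control_scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of patients from controls.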