Liyan Pan1, Guangjian Liu1, Xiaojian Mao2, Huiying Liang1, Xiuzhen Li2, Huixian Li1, Jiexin Zhang1.
Abstract
BACKGROUND: Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The standard diagnostic method, the gonadotropin-releasing hormone (GnRH)-stimulation test or the GnRH analogue (GnRHa)-stimulation test, is expensive and uncomfortable for patients because it requires repeated blood sampling.
Keywords: GnRHa-stimulation test; central precocious puberty; machine learning; prediction model
Year: 2019 PMID: 30747712 PMCID: PMC6390190 DOI: 10.2196/11728
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Training and validation process of prediction models. AUC: area under the receiver operating characteristic curve; XGBoost: extreme gradient boosting.
Basic characteristics of girls who underwent the GnRHa-stimulation test.
| Variables | Non-CPPa (n=966), mean (SD) | CPP (n=791), mean (SD) | P valueb |
| Age (years) | 7.07 (1.11) | 7.52 (0.99) | <.001 |
| LHc (IU/L) | 0.12 (0.23) | 0.93 (1.28) | <.001 |
| FSHd (IU/L) | 1.82 (1.30) | 3.01 (1.62) | <.001 |
| GHe (ng/mL) | 3.27 (3.26) | 4.75 (4.69) | <.001 |
| IGF-If (ng/mL) | 231.35 (65.93) | 317.87 (89.84) | <.001 |
| IGFBP-3g (μg/mL) | 4.55 (0.52) | 4.81 (0.55) | <.001 |
| Estradiol (pmol/L) | 102.56 (50.96) | 125.81 (60.97) | <.001 |
| Prolactin (ng/mL) | 8.73 (5.39) | 8.59 (5.61) | .52 |
| Testosterone (nmol/L) | 0.80 (0.39) | 0.94 (0.49) | <.001 |
| Historyh (months) | 7.67 (10.39) | 9.27 (9.63) | <.001 |
| Menstruation/menarche (yes, no) | N/Ai | N/A | .03 |
| Heightj (cm) | 127.16 (8.61) | 131.61 (8.42) | <.001 |
| Weightj (kg) | 27.32 (5.32) | 29.60 (4.95) | <.001 |
| BMIk (kg/m2) | 16.73 (2.30) | 16.91 (1.96) | .34 |
| Breast core (yes, no) | N/A | N/A | .02 |
| Pubesl (1-5) | 1.06 (0.27) | 1.14 (0.44) | <.001 |
| Pigmentation (yes, no) | N/A | N/A | .87 |
| Left breastl (1-5) | 2.33 (0.84) | 2.76 (0.92) | <.001 |
| Right breastl (1-5) | 2.32 (0.84) | 2.78 (0.92) | <.001 |
aCPP: central precocious puberty.
bThe equality of each indicator was evaluated by Chi-square or Student t test. P<.05 was considered significant.
cLH: luteinizing hormone.
dFSH: follicle-stimulating hormone.
eGH: growth hormone.
fIGF-I: insulin-like growth factor-I.
gIGFBP-3: insulin-like growth factor binding protein-3.
hAbnormal duration in records.
iN/A: not applicable.
jAt stimulation test.
kBMI: body mass index.
lTanner stage.
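As an illustration of footnote b, a two-sample Student t statistic can be computed directly from the summary values in the table. The sketch below uses the Age row (mean, SD, and n for each group). It assumes the pooled-variance form of the test and approximates the two-sided P value with the normal distribution, which is reasonable only because the degrees of freedom are large. The source does not state which t test variant was used, so the function names and the choice of variant are assumptions made for illustration.

```python
from math import sqrt, erfc


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student t statistic (pooled variance) from group means, SDs, and sizes."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m2 - m1) / se, df


def two_sided_p_normal(t):
    """Two-sided P value under a normal approximation (assumes large df)."""
    return erfc(abs(t) / sqrt(2))


# Age row: non-CPP 7.07 (1.11), n=966; CPP 7.52 (0.99), n=791.
t_age, df_age = pooled_t(7.07, 1.11, 966, 7.52, 0.99, 791)
p_age = two_sided_p_normal(t_age)
print(f"t = {t_age:.2f}, df = {df_age}, P < .001: {p_age < 0.001}")
```

Categorical rows such as menstruation or pigmentation would need a Chi-square test on the underlying counts, which the table does not report.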
Predictive performance of classifiers and the corresponding parameters. A paired t test was performed on specificity and sensitivity for comparison against XGBoost.
| Algorithms/Variables | Specificitya (%), mean (SD) | Sensitivityb (%), mean (SD) | AUCc, mean (SD) | Parameters | |
| XGBoostd | 85.39 (1.38) | 77.94 (3.50) | 0.89 (0.02) | Learning rate=0.01, max depth=3, number of trees=500 | |
| Random forest | 84.32 (1.88)e | 77.91 (3.59)f | 0.88 (0.02) | Max depth=3, criterion=gini, number of trees=20 | |
| SVMg | 88.94 (1.76)e | 62.36 (4.12)e | 0.86 (0.04) | Kernel=linear, penalty coefficient=5 | |
| Decision tree | 75.90 (2.47)e | 71.71 (3.99)e | 0.74 (0.02) | Criterion=entropy | |
| XGBoost | 83.17 (5.29) | 75.28 (6.43) | 0.86 (0.04) | Learning rate=0.01, max depth=3, number of trees=500 | |
| Random forest | 83.46 (6.28)f | 74.72 (6.43)f | 0.85 (0.04) | Max depth=3, criterion=gini, number of trees=20 | |
| SVM | 88.94 (4.90)e | 62.36 (7.73)e | 0.86 (0.02) | Kernel=linear, penalty coefficient=5 | |
| Decision tree | 76.25 (7.07)e | 68.06 (7.12)e | 0.72 (0.04) | Criterion=entropy | |
| XGBoostd | 87.66 (5.52) | 76.64 (6.51) | 0.90 (0.04) | Learning rate=0.01, max depth=4, number of trees=500 | |
| Random forest | 87.41 (4.22)f | 75.03 (7.91)f | 0.90 (0.05) | Max depth=3, criterion=entropy, number of trees=20 | |
| SVM | 89.81 (4.28)f | 66.53 (7.01)e | 0.86 (0.02) | Kernel=linear, penalty coefficient=5 | |
| Decision tree | 76.35 (5.51)e | 68.61 (7.16)e | 0.72 (0.05) | Criterion=entropy | |
| LHh, IGF-Ii, FSHj | 83.17 (1.62) | 76.39 (3.57) | 0.86 (0.02) | Learning rate=0.01, max depth=3, number of trees=500 | |
| LHh, IGF-Ii | 83.27 (1.62) | 75.69 (3.61) | 0.86 (0.02) | Learning rate=0.01, max depth=3, number of trees=500 | |
| LHh, FSHj | 83.56 (1.94) | 75.83 (3.13) | 0.84 (0.02) | Learning rate=0.01, max depth=3, number of trees=500 | |
| LHh | 83.37 (2.00) | 75.97 (3.74) | 0.84 (0.02) | Learning rate=0.01, max depth=3, number of trees=500 | |
| IGF-Ii, FSHj | 80.77 (2.47) | 57.08 (3.29) | 0.77 (0.02) | Learning rate=0.01, max depth=3, number of trees=500 | |
| IGF-Ii | 80.19 (3.14) | 53.19 (4.55) | 0.73 (0.02) | Learning rate=0.01, max depth=3, number of trees=500 | |
| FSHj | 84.13 (3.87) | 45.00 (5.34) | 0.68 (0.02) | Learning rate=0.01, max depth=3, number of trees=500 | |
aSpecificity=number of true negatives/(number of true negatives+number of false positives).
bSensitivity=number of true positives/(number of true positives+number of false negatives).
cAUC: area under the receiver operating characteristic curve.
dXGBoost: extreme gradient boosting.
eP<.01
fNot significant.
gSVM: support vector machine.
hLH: luteinizing hormone.
iIGF-I: insulin-like growth factor-I.
jFSH: follicle-stimulating hormone.
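The definitions in footnotes a and b, and the paired t test mentioned in the table caption, can be sketched as follows. The per-fold confusion counts are hypothetical values invented for illustration and are not taken from the study. The helper names are also assumptions, and the source does not specify how folds were paired.

```python
from math import sqrt
from statistics import mean, stdev


def specificity(tn, fp):
    """True negatives / (true negatives + false positives)."""
    return tn / (tn + fp)


def sensitivity(tp, fn):
    """True positives / (true positives + false negatives)."""
    return tp / (tp + fn)


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Hypothetical per-fold counts (tn, fp, fn, tp) for two classifiers
# evaluated on the same folds; NOT data from the paper.
folds_a = [(85, 15, 20, 80), (88, 12, 22, 78), (84, 16, 18, 82),
           (86, 14, 21, 79), (87, 13, 19, 81)]
folds_b = [(76, 24, 28, 72), (78, 22, 30, 70), (75, 25, 27, 73),
           (74, 26, 29, 71), (77, 23, 31, 69)]

spec_a = [specificity(tn, fp) for tn, fp, fn, tp in folds_a]
spec_b = [specificity(tn, fp) for tn, fp, fn, tp in folds_b]
sens_a = [sensitivity(tp, fn) for tn, fp, fn, tp in folds_a]
sens_b = [sensitivity(tp, fn) for tn, fp, fn, tp in folds_b]

t_spec, df = paired_t(spec_a, spec_b)
t_sens, _ = paired_t(sens_a, sens_b)
print(f"specificity: t = {t_spec:.2f}, df = {df}")
print(f"sensitivity: t = {t_sens:.2f}, df = {df}")
```

Pairing by fold matters here: both classifiers see the same train/test splits, so the test is applied to the per-fold differences rather than to two independent samples.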
Figure 2. ROC curves for classifiers with 19 variables for 1757 patients and 25 variables for 436 patients. ROC: receiver operating characteristic; AUC: area under the ROC curve.
Figure 3. Feature importance ranking for 19 variables in two classifiers, as calculated by the models. LH: luteinizing hormone; IGF-I: insulin-like growth factor-I; FSH: follicle-stimulating hormone; PRL: prolactin; GH: growth hormone; E2: estradiol; BMI: body mass index; TTE: testosterone; Rbreast: right breast; Lbreast: left breast; IGFBP-3: insulin-like growth factor binding protein-3; PMT: pigmentation; MST: menstruation.
Figure 4. Results of LIME with XGBoost and Random Forest classifiers applied to one positive (A, B) and one negative (C, D) instance. The left panels show XGBoost and the right panels show Random Forest. Blue denotes the negative instance and orange the positive instance. The first column shows the prediction probabilities of negative and positive results obtained from the classifiers. The second column shows each feature's contribution to the probability. Only the top nine features are displayed for clarity. The third column displays the original data values. LIME: local interpretable model-agnostic explanations; XGBoost: extreme gradient boosting; LH: luteinizing hormone; IGF-I: insulin-like growth factor-I; FSH: follicle-stimulating hormone; PRL: prolactin; GH: growth hormone; E2: estradiol; BMI: body mass index; TTE: testosterone; Rbreast: right breast; Lbreast: left breast; IGFBP-3: insulin-like growth factor binding protein-3; PMT: pigmentation; MST: menstruation.